[00:00:54] <icinga-wm>	 RECOVERY - Check systemd state on phab1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:18:58] <icinga-wm>	 PROBLEM - Check systemd state on an-web1001 is CRITICAL: CRITICAL - degraded: The following units failed: hardsync-published.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:30:18] <icinga-wm>	 RECOVERY - Check systemd state on logstash1023 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:31:38] <icinga-wm>	 RECOVERY - Check systemd state on an-web1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:04:01] <wikibugs>	 ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[01:08:02] <wikibugs>	 ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[01:15:26] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:07:45] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:08:43] <wikibugs>	 (PS1) Dzahn: devtools: set mariadb datadir path for phorge-1001 instance [puppet] - https://gerrit.wikimedia.org/r/890131 (https://phabricator.wikimedia.org/T328595)
[02:11:59] <wikibugs>	 (CR) Dzahn: [C: +2] devtools: set mariadb datadir path for phorge-1001 instance [puppet] - https://gerrit.wikimedia.org/r/890131 (https://phabricator.wikimedia.org/T328595) (owner: Dzahn)
[02:17:17] <wikibugs>	 (PS1) Dzahn: phorge: install php-zip and php-gd packages [puppet] - https://gerrit.wikimedia.org/r/890132 (https://phabricator.wikimedia.org/T328595)
[02:20:08] <wikibugs>	 (CR) Dzahn: [C: +2] phorge: install php-zip and php-gd packages [puppet] - https://gerrit.wikimedia.org/r/890132 (https://phabricator.wikimedia.org/T328595) (owner: Dzahn)
[02:22:45] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:29:52] <wikibugs>	 (PS1) Dzahn: phorge: install php-apcu and python3-pygments [puppet] - https://gerrit.wikimedia.org/r/890133 (https://phabricator.wikimedia.org/T328595)
[02:29:58] <wikibugs>	 (CR) Krinkle: [C: +1] "emotional support and evidence of working mouse. Would clear some warning noise :)" [puppet] - https://gerrit.wikimedia.org/r/889892 (https://phabricator.wikimedia.org/T312823) (owner: BCornwall)
[02:32:23] <wikibugs>	 (CR) Dzahn: [C: +2] phorge: install php-apcu and python3-pygments [puppet] - https://gerrit.wikimedia.org/r/890133 (https://phabricator.wikimedia.org/T328595) (owner: Dzahn)
[02:44:20] <wikibugs>	 (PS1) Dzahn: phorge: add parameter and value for the repo path [puppet] - https://gerrit.wikimedia.org/r/890134 (https://phabricator.wikimedia.org/T328595)
[02:46:54] <wikibugs>	 (CR) Dzahn: [C: +2] phorge: add parameter and value for the repo path [puppet] - https://gerrit.wikimedia.org/r/890134 (https://phabricator.wikimedia.org/T328595) (owner: Dzahn)
[02:48:57] <wikibugs>	 (PS1) Dzahn: devtools: fix typo in hiera key name for phorge [puppet] - https://gerrit.wikimedia.org/r/890135
[02:49:29] <wikibugs>	 (CR) Dzahn: [C: +2] devtools: fix typo in hiera key name for phorge [puppet] - https://gerrit.wikimedia.org/r/890135 (owner: Dzahn)
[04:15:54] <wikibugs>	 (PS1) Sushrith Bogi: Reduce height of the article toolbar [mediawiki-config] - https://gerrit.wikimedia.org/r/890140 (https://phabricator.wikimedia.org/T316950)
[05:08:46] <wikibugs>	 ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[05:12:47] <wikibugs>	 ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[05:51:18] <icinga-wm>	 PROBLEM - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on alert1001 is CRITICAL: 1.017e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[05:57:16] <icinga-wm>	 PROBLEM - WDQS SPARQL on wdqs1016 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[06:00:52] <icinga-wm>	 RECOVERY - WDQS SPARQL on wdqs1016 is OK: HTTP OK: HTTP/1.1 200 OK - 688 bytes in 1.058 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[06:22:45] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:22:50] <icinga-wm>	 RECOVERY - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on alert1001 is OK: (C)1e+05 gt (W)1e+04 gt 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[08:00:04] <icinga-wm>	 PROBLEM - Disk space on kubestagetcd1006 is CRITICAL: DISK CRITICAL - free space: / 709 MB (3% inode=95%): /tmp 709 MB (3% inode=95%): /var/tmp 709 MB (3% inode=95%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=kubestagetcd1006&var-datasource=eqiad+prometheus/ops
[08:00:05] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230218T0800)
[08:21:11] <elukey>	 !log delete /var/log/syslog.1 on kubestagetcd1006 to free space
[08:21:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:31] <elukey>	 !log delete /var/log/{messages,user.log}.1 on kubestagetcd1006 to free space
[08:22:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:22] <icinga-wm>	 PROBLEM - Check systemd state on mirror1001 is CRITICAL: CRITICAL - degraded: The following units failed: update-ubuntu-mirror.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:29:32] <elukey>	 !log kill leftover processes of user `mepps` (offboarded) from stat100[4,5] to unblock puppet
[08:29:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:40] <icinga-wm>	 RECOVERY - Disk space on kubestagetcd1006 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=kubestagetcd1006&var-datasource=eqiad+prometheus/ops
[09:09:00] <wikibugs>	 ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[09:13:04] <wikibugs>	 ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[10:22:45] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[12:25:35] <wikibugs>	 (PS8) Fomafix: Add redirects from 'sgs' to 'bat-smg' [puppet] - https://gerrit.wikimedia.org/r/481540 (https://phabricator.wikimedia.org/T204830)
[12:27:21] <wikibugs>	 (PS4) Fomafix: Add 'rup' as alias for 'roa-rup' [puppet] - https://gerrit.wikimedia.org/r/527917 (https://phabricator.wikimedia.org/T17988)
[12:28:44] <wikibugs>	 (PS4) Fomafix: Add 'vro' as alias for 'fiu-vro' [puppet] - https://gerrit.wikimedia.org/r/527915 (https://phabricator.wikimedia.org/T31186)
[12:29:31] <wikibugs>	 (PS5) Fomafix: Add 'egl' as alias for 'eml' [puppet] - https://gerrit.wikimedia.org/r/527933 (https://phabricator.wikimedia.org/T36217)
[12:34:53] <wikibugs>	 (PS5) Fomafix: Add 'nrf' as alias for 'nrm' [puppet] - https://gerrit.wikimedia.org/r/527909 (https://phabricator.wikimedia.org/T25216)
[12:36:36] <wikibugs>	 (PS9) Fomafix: Add redirects from 'sgs' to 'bat-smg' [puppet] - https://gerrit.wikimedia.org/r/481540 (https://phabricator.wikimedia.org/T204830)
[12:57:28] <wikibugs>	 SRE, Traffic: Let's Encrypt issuance chains update - https://phabricator.wikimedia.org/T283164 (TheDJ) Open→Resolved a:TheDJ
[12:58:20] <wikibugs>	 (CR) Aklapper: "Sushrith: Did you test this locally in your MediaWiki setup and can you confirm that it does fix the problem?" [mediawiki-config] - https://gerrit.wikimedia.org/r/890140 (https://phabricator.wikimedia.org/T316950) (owner: Sushrith Bogi)
[13:13:47] <wikibugs>	 ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[13:17:48] <wikibugs>	 ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[13:58:02] <icinga-wm>	 PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v2/translate/{from}/{to}/{provider} (Machine translate an HTML fragment using TestClient, adapt the links to target language wiki.) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
[14:01:28] <icinga-wm>	 RECOVERY - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
[14:20:30] <icinga-wm>	 RECOVERY - Check systemd state on mirror1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:22:45] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:01:57] <jinxer-wm>	 (ProbeDown) firing: Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip6) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[15:06:57] <jinxer-wm>	 (ProbeDown) resolved: Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip6) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[15:10:32] <wikibugs>	 SRE, DNS, Traffic, Patch-For-Review, Software-Licensing: Add LICENSE to operations/dns scripts - https://phabricator.wikimedia.org/T291323 (Legoktm) >>! In T291323#8626159, @BCornwall wrote: >..., @Legoktm, ... can each of you approve of relicensing the content of your work in the operations/...
[15:11:15] <wikibugs>	 (CR) Legoktm: [C: +1] utils: Add SPDX Apache-2.0 license to utils [dns] - https://gerrit.wikimedia.org/r/890016 (https://phabricator.wikimedia.org/T291323) (owner: BCornwall)
[17:14:01] <wikibugs>	 ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[17:15:58] <icinga-wm>	 PROBLEM - Host thumbor1005 is DOWN: PING CRITICAL - Packet loss = 100%
[17:18:03] <wikibugs>	 ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[18:22:45] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[19:16:50] <icinga-wm>	 PROBLEM - Check systemd state on doc1002 is CRITICAL: CRITICAL - degraded: The following units failed: rsync-doc-doc2001.codfw.wmnet.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:13:36] <icinga-wm>	 RECOVERY - Check systemd state on doc1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:18:47] <wikibugs>	 ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[21:22:48] <wikibugs>	 ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[22:22:45] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable