[00:00:13] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8572 bytes in 3.447 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:01:11] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 51007 bytes in 0.109 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:13:05] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:13:31] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:14:23] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 51007 bytes in 0.066 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:14:51] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.463 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:38:25] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/985953
[00:38:31] (CR) TrainBranchBot: [C: +2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/985953 (owner: TrainBranchBot)
[01:01:31] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/985953 (owner: TrainBranchBot)
[02:09:41] PROBLEM - mailman3_queue_size on lists1001 is CRITICAL: CRITICAL: 1 mailman3 queues above limits: bounces is 218 (limit: 25) https://wikitech.wikimedia.org/wiki/Mailman/Monitoring https://grafana.wikimedia.org/d/GvuAmuuGk/mailman3
[02:20:37] RECOVERY - mailman3_queue_size on lists1001 is OK: OK: mailman3 queues are below the limits https://wikitech.wikimedia.org/wiki/Mailman/Monitoring https://grafana.wikimedia.org/d/GvuAmuuGk/mailman3
[02:37:03] (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:08:47] (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:49:35] PROBLEM - clamd running on vrts1001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 114 (clamav), command name clamd https://wikitech.wikimedia.org/wiki/VRT_System%23ClamAV
[03:49:51] PROBLEM - Check systemd state on vrts1001 is CRITICAL: CRITICAL - degraded: The following units failed: clamav-daemon.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:50:06] (ProbeDown) firing: Service vrts1001:1443 has failed probes (http_ticket_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#vrts1001:1443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[03:55:05] (ProbeDown) resolved: Service vrts1001:1443 has failed probes (http_ticket_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#vrts1001:1443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[03:56:55] RECOVERY - clamd running on vrts1001 is OK: PROCS OK: 1 process with UID = 114 (clamav), command name clamd https://wikitech.wikimedia.org/wiki/VRT_System%23ClamAV
[03:57:13] RECOVERY - Check systemd state on vrts1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:56:09] PROBLEM - Check systemd state on vrts1001 is CRITICAL: CRITICAL - degraded: The following units failed: clamav-daemon.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:57:39] RECOVERY - Check systemd state on vrts1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:25:02] (PS1) Ladsgroup: snapshot: Improve border of dumps cards [puppet] - https://gerrit.wikimedia.org/r/986181
[06:27:15] (PS2) Ladsgroup: snapshot: Improve border of dumps cards [puppet] - https://gerrit.wikimedia.org/r/986181
[08:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231227T0800)
[11:24:00] SRE, SRE-Access-Requests: Requesting access to RESOURCE for USER[S] - https://phabricator.wikimedia.org/T354049 (ArthurTaylor)
[12:28:11] PROBLEM - MariaDB disk space on dbstore1003 is CRITICAL: DISK CRITICAL - free space: /srv 265066 MB (5% inode=99%): https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[12:30:14] SRE, Phabricator: SRE Access requests: Disable Phabricator form 104 in favor of Phabricator form 8 - https://phabricator.wikimedia.org/T354051 (Aklapper) p:Triage→Low
[12:30:58] SRE, Phabricator: SRE Access requests: Disable Phabricator form 104 in favor of Phabricator form 8 - https://phabricator.wikimedia.org/T354051 (Aklapper) Open→Resolved a:Aklapper
[12:31:08] SRE, SRE-Access-Requests: Requesting access to for Arthur Taylor - https://phabricator.wikimedia.org/T354049 (Aklapper)
[12:47:43] PROBLEM - Disk space on dbstore1003 is CRITICAL: DISK CRITICAL - free space: /srv 169638 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=dbstore1003&var-datasource=eqiad+prometheus/ops
[13:56:51] PROBLEM - MariaDB Replica SQL: s1 on dbstore1003 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[13:56:51] PROBLEM - MariaDB Replica Lag: s1 on dbstore1003 is CRITICAL: CRITICAL slave_sql_lag could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[13:57:17] PROBLEM - MariaDB Replica IO: s1 on dbstore1003 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[13:58:03] PROBLEM - MariaDB Replica IO: s5 on dbstore1003 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[13:58:11] PROBLEM - MariaDB Replica SQL: s5 on dbstore1003 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[13:58:11] PROBLEM - MariaDB Replica SQL: s7 on dbstore1003 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[13:58:19] PROBLEM - MariaDB Replica Lag: s5 on dbstore1003 is CRITICAL: CRITICAL slave_sql_lag could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[13:58:31] PROBLEM - MariaDB Replica Lag: s7 on dbstore1003 is CRITICAL: CRITICAL slave_sql_lag could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[13:58:51] PROBLEM - MariaDB Replica IO: s7 on dbstore1003 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[14:37:03] (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:57:03] (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:52:01] RECOVERY - Disk space on dbstore1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=dbstore1003&var-datasource=eqiad+prometheus/ops
[15:52:21] RECOVERY - MariaDB Replica IO: s7 on dbstore1003 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[15:52:55] RECOVERY - MariaDB Replica IO: s5 on dbstore1003 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[15:53:07] RECOVERY - MariaDB Replica SQL: s7 on dbstore1003 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[15:53:07] RECOVERY - MariaDB Replica SQL: s5 on dbstore1003 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[15:54:49] RECOVERY - MariaDB Replica SQL: s1 on dbstore1003 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[15:54:49] RECOVERY - MariaDB Replica Lag: s5 on dbstore1003 is OK: OK slave_sql_lag Replication lag: 0.47 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[15:55:11] RECOVERY - MariaDB Replica IO: s1 on dbstore1003 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:02:21] RECOVERY - MariaDB Replica Lag: s7 on dbstore1003 is OK: OK slave_sql_lag Replication lag: 0.21 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:28:41] RECOVERY - MariaDB Replica Lag: s1 on dbstore1003 is OK: OK slave_sql_lag Replication lag: 0.20 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[22:36:25] (PS1) Novem Linguae: Add "patroller" user group to testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/986200 (https://phabricator.wikimedia.org/T354063)
[22:40:55] !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[22:41:19] !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:46:02] !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[22:46:25] !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:53:25] !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[22:53:41] !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply