[00:06:25] FIRING: [4x] SystemdUnitFailed: mediawiki_job_cirrus_build_completion_indices_codfw.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:21:30] FIRING: [4x] ProbeDown: Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:14:27] PROBLEM - Disk space on archiva1002 is CRITICAL: DISK CRITICAL - free space: / 3559 MB (3% inode=98%): /tmp 3559 MB (3% inode=98%): /var/tmp 3559 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops
[02:03:31] !log removing 9 files for legal compliance
[02:03:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:14:27] PROBLEM - Disk space on archiva1002 is CRITICAL: DISK CRITICAL - free space: / 3555 MB (3% inode=98%): /tmp 3555 MB (3% inode=98%): /var/tmp 3555 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops
[02:31:25] FIRING: [4x] SystemdUnitFailed: mediawiki_job_cirrus_build_completion_indices_codfw.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:37:16] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:54:27] PROBLEM - Disk space on archiva1002 is CRITICAL: DISK CRITICAL - free space: / 3554 MB (3% inode=98%): /tmp 3554 MB (3% inode=98%): /var/tmp 3554 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops
[03:02:16] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:54:27] PROBLEM - Disk space on archiva1002 is CRITICAL: DISK CRITICAL - free space: / 3580 MB (3% inode=98%): /tmp 3580 MB (3% inode=98%): /var/tmp 3580 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops
[04:21:30] FIRING: [4x] ProbeDown: Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[04:52:49] PROBLEM - MD RAID on wikikube-worker2068 is CRITICAL: CRITICAL: State: degraded, Active: 1, Working: 1, Failed: 1, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[04:52:50] ACKNOWLEDGEMENT - MD RAID on wikikube-worker2068 is CRITICAL: CRITICAL: State: degraded, Active: 1, Working: 1, Failed: 1, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T378255 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[05:14:27] PROBLEM - Disk space on archiva1002 is CRITICAL: DISK CRITICAL - free space: / 3170 MB (3% inode=98%): /tmp 3170 MB (3% inode=98%): /var/tmp 3170 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops
[05:46:19] PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:47:09] RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.175 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:54:27] PROBLEM - Disk space on archiva1002 is CRITICAL: DISK CRITICAL - free space: / 3536 MB (3% inode=98%): /tmp 3536 MB (3% inode=98%): /var/tmp 3536 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops
[06:04:21] FIRING: PoolcounterFullQueues: Full queues for poolcounter1006:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[06:09:21] RESOLVED: PoolcounterFullQueues: Full queues for poolcounter1006:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[06:31:25] FIRING: [2x] SystemdUnitFailed: mediawiki_job_growthexperiments-fixLinkRecommendationData-dryrun.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:54:27] PROBLEM - Disk space on archiva1002 is CRITICAL: DISK CRITICAL - free space: / 3627 MB (3% inode=98%): /tmp 3627 MB (3% inode=98%): /var/tmp 3627 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops
[07:21:25] FIRING: [2x] SystemdUnitFailed: mediawiki_job_growthexperiments-fixLinkRecommendationData-dryrun.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:33:54] ops-codfw, SRE, DC-Ops: Degraded RAID on wikikube-worker2068 - https://phabricator.wikimedia.org/T378255 (ops-monitoring-bot) NEW
[07:34:08] (PS1) KartikMistry: Disable MT in Content Translation on Lithuanian Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/1083292 (https://phabricator.wikimedia.org/T364073)
[07:34:52] (Abandoned) Majavah: P:wmcs::nfs: Fix black formatting [puppet] - https://gerrit.wikimedia.org/r/1076426 (owner: Majavah)
[07:35:29] (PS1) Majavah: Remove CI ignored modules mechanism [puppet] - https://gerrit.wikimedia.org/r/1083293
[07:46:25] RESOLVED: SystemdUnitFailed: mediawiki_job_growthexperiments-userImpactUpdateRecentlyEdited.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:21:30] FIRING: [4x] ProbeDown: Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:34:27] PROBLEM - Disk space on archiva1002 is CRITICAL: DISK CRITICAL - free space: / 3330 MB (3% inode=98%): /tmp 3330 MB (3% inode=98%): /var/tmp 3330 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops
[09:24:31] (PS1) Hamish: enwiktionary: Activate wgMinervaTalkAtTop [mediawiki-config] - https://gerrit.wikimedia.org/r/1083294
[09:27:48] (PS2) Hamish: enwiktionary: Activate wgMinervaTalkAtTop [mediawiki-config] - https://gerrit.wikimedia.org/r/1083294 (https://phabricator.wikimedia.org/T377648)
[09:45:40] (PS1) Majavah: base: notify_maintainers: Don't email disabled accounts [puppet] - https://gerrit.wikimedia.org/r/1083295
[10:14:27] PROBLEM - Disk space on archiva1002 is CRITICAL: DISK CRITICAL - free space: / 3648 MB (3% inode=98%): /tmp 3648 MB (3% inode=98%): /var/tmp 3648 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops
[10:34:54] Puppet, MobileFrontend (Tracking): Mobile site does not automatically redirect to desktop version (and not possible to use browser "use desktop view") - https://phabricator.wikimedia.org/T60425#10264627 (D4n2016) As a workaround, is it possible to use a greasemonkey script (browser addon) on Android to m...
[10:52:20] SRE, wikitech.wikimedia.org: Redesign wikitech-static - https://phabricator.wikimedia.org/T376400#10264635 (Peachey88)
[10:55:41] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 46, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:56:03] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 113, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:56:05] RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 71, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[11:10:41] PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 44, down: 2, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[11:11:03] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 112, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[11:11:05] PROBLEM - Router interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 70, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[11:27:41] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 46, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[11:28:03] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 113, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[11:28:05] RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 71, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[11:43:48] (PS1) Majavah: Drop labtestwiki config [mediawiki-config] - https://gerrit.wikimedia.org/r/1083304 (https://phabricator.wikimedia.org/T378260)
[11:43:49] (PS1) Majavah: Stop building LdapAuthentication i10n [mediawiki-config] - https://gerrit.wikimedia.org/r/1083305 (https://phabricator.wikimedia.org/T371592)
[11:44:38] (PS1) Majavah: Drop labtestwikitech name [dns] - https://gerrit.wikimedia.org/r/1083306 (https://phabricator.wikimedia.org/T378260)
[11:55:41] PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 44, down: 2, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[11:56:03] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 112, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[11:56:05] PROBLEM - Router interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 70, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:21:30] FIRING: [4x] ProbeDown: Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:22:41] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 46, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:23:03] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 113, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:23:05] RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 71, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:39:03] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 112, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:39:05] PROBLEM - Router interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 70, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:39:41] PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 44, down: 2, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:56:41] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 46, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:57:03] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 113, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:57:05] RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 71, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[14:22:26] (CR) RhinosF1: "the nonglobal dblist is now empty. Should that be removed or is it expected to be used in future?" [mediawiki-config] - https://gerrit.wikimedia.org/r/1083304 (https://phabricator.wikimedia.org/T378260) (owner: Majavah)
[14:37:16] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:02:16] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:15:25] (PS3) Hamish: enwiktionary: Enable mobile page tabs for non logged in users [mediawiki-config] - https://gerrit.wikimedia.org/r/1083294 (https://phabricator.wikimedia.org/T377648)
[16:16:04] (CR) CI reject: [V:-1] enwiktionary: Enable mobile page tabs for non logged in users [mediawiki-config] - https://gerrit.wikimedia.org/r/1083294 (https://phabricator.wikimedia.org/T377648) (owner: Hamish)
[16:18:13] (PS4) Hamish: enwiktionary: Enable mobile page tabs for non logged in users [mediawiki-config] - https://gerrit.wikimedia.org/r/1083294 (https://phabricator.wikimedia.org/T377648)
[16:21:30] FIRING: [4x] ProbeDown: Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:23:52] PROBLEM - Host db1234 #page is DOWN: PING CRITICAL - Packet loss = 100%
[16:24:22] ugh, really?
[16:26:02] it's not a master node, I think a depool & silence
[16:26:08] RECOVERY - Host db1234 #page is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms
[16:27:17] PROBLEM - MariaDB Replica IO: s1 on db1234 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:28:15] PROBLEM - mysqld processes on db1234 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[16:28:39] PROBLEM - MariaDB Replica SQL: s1 on db1234 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:28:51] !log mvernon@cumin1002 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1234.eqiad.wmnet with reason: spontaneous reboot, depooling 'til Monday
[16:29:05] !log mvernon@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1234.eqiad.wmnet with reason: spontaneous reboot, depooling 'til Monday
[16:29:16] a memory problem it seems
[16:29:46] !log mvernon@cumin1002 dbctl commit (dc=all): 'Depool db1234', diff saved to https://phabricator.wikimedia.org/P70589 and previous config saved to /var/cache/conftool/dbconfig/20241026-162946-mvernon.json
[16:31:21] kernel log certainly has a bunch of mce errors in
[16:32:01] and getsel is saying Description: A critical diagnostic event occurred in the memory device at A6. Contact your service provider for assistance in replacing the device.
[16:32:20] SRE, DBA: db1234 crashed - https://phabricator.wikimedia.org/T378267 (MatthewVernon) NEW
[16:32:43] ^-- ticket created, host depooled, I've resolved the incident on splunk, and I think the hw investigation can wait 'til working hours :)
[16:58:25] A6 was already broken half a year ago, it seems there are more systematic issues with that host: https://phabricator.wikimedia.org/T363102
[20:21:30] FIRING: [4x] ProbeDown: Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[20:30:13] PROBLEM - Disk space on mw2278 is CRITICAL: DISK CRITICAL - free space: / 1066 MB (0% inode=98%): /tmp 1066 MB (0% inode=98%): /var/tmp 1066 MB (0% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=mw2278&var-datasource=codfw+prometheus/ops
[20:46:53] PROBLEM - Disk space on mw2259 is CRITICAL: DISK CRITICAL - free space: / 347 MB (0% inode=98%): /tmp 347 MB (0% inode=98%): /var/tmp 347 MB (0% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=mw2259&var-datasource=codfw+prometheus/ops
[20:50:13] RECOVERY - Disk space on mw2278 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=mw2278&var-datasource=codfw+prometheus/ops
[21:26:53] PROBLEM - Disk space on mw2259 is CRITICAL: DISK CRITICAL - free space: / 4665 MB (3% inode=98%): /tmp 4665 MB (3% inode=98%): /var/tmp 4665 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=mw2259&var-datasource=codfw+prometheus/ops
[21:40:03] PROBLEM - BGP status on cr3-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast, AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[21:40:05] PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[21:47:17] PROBLEM - BGP status on cr2-eqsin is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[22:46:53] RECOVERY - Disk space on mw2259 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=mw2259&var-datasource=codfw+prometheus/ops
[23:38:19] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1083375
[23:38:19] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1083375 (owner: TrainBranchBot)