[00:02:11] <icinga-wm>	 PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 112, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:02:23] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 45, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:03:21] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1065 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[00:11:15] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s1 on db2141 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 657.09 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[00:11:43] <wikibugs>	 (03Abandoned) 10Andrew Bogott: puppetserver: move duplicate directory definition [puppet] - 10https://gerrit.wikimedia.org/r/1121702 (owner: 10Andrew Bogott)
[00:25:56] <wikibugs>	 (03PS10) 10BCornwall: varnish: Upgrade VCL for Varnish 7.0+/modules 0.20 [puppet] - 10https://gerrit.wikimedia.org/r/1113592 (https://phabricator.wikimedia.org/T378737)
[00:32:17] <wikibugs>	 06SRE, 07LDAP, 13Patch-For-Review: ldap-admins POSIX group does not actually give any permissions to its members - https://phabricator.wikimedia.org/T386472#10572508 (10Reedy) {ce50baf54b2a3eb40517381c6d199267a68a4d20} originally added the group... With no linked task.  Unfortunately my memory of these thing...
[00:39:04] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1121717
[00:39:04] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1121717 (owner: 10TrainBranchBot)
[00:43:21] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1065 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[00:50:54] <wikibugs>	 (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1121717 (owner: 10TrainBranchBot)
[01:09:07] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1121718
[01:09:07] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1121718 (owner: 10TrainBranchBot)
[01:30:50] <wikibugs>	 (03Merged) 10jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1121718 (owner: 10TrainBranchBot)
[01:46:29] <icinga-wm>	 PROBLEM - Disk space on releases1003 is CRITICAL: DISK CRITICAL - /srv/docker/overlay2/c131b7f9bc3319d653e893375d2082ab86c8d746bb0f664bb54ef1c1fd9ea209/merged is not accessible: Permission denied https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops
[02:06:29] <icinga-wm>	 RECOVERY - Disk space on releases1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops
[02:10:17] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s1 on db2141 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[02:36:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:06:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:17:22] <jinxer-wm>	 FIRING: [6x] SystemdUnitFailed: user@0.service on testreduce1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:24:37] <wikibugs>	 (03PS1) 10Ryan Kemper: wdqs / scholia: add routing for scholia host [puppet] - 10https://gerrit.wikimedia.org/r/1121726
[04:34:15] <wikibugs>	 (03PS5) 10SD hehua: Set Transwiki namespace on zhwikivoyage and zhwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1121622 (https://phabricator.wikimedia.org/T387055)
[04:44:09] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:44:21] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:44:59] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 53514 bytes in 0.118 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:45:11] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.191 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:11:45] <wikibugs>	 (03CR) 10Stang: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1121622 (https://phabricator.wikimedia.org/T387055) (owner: 10SD hehua)
[05:12:28] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Set Transwiki namespace on zhwikivoyage and zhwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1121622 (https://phabricator.wikimedia.org/T387055) (owner: 10SD hehua)
[05:14:49] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10572671 (10phaultfinder)
[05:27:21] <jinxer-wm>	 RESOLVED: [6x] SystemdUnitFailed: user@0.service on testreduce1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:32:21] <jinxer-wm>	 FIRING: [6x] SystemdUnitFailed: user@0.service on testreduce1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:34:34] <wikibugs>	 (03PS6) 10SD hehua: Set Transwiki namespace on zhwikivoyage and zhwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1121622 (https://phabricator.wikimedia.org/T387055)
[06:42:22] <jinxer-wm>	 RESOLVED: [6x] SystemdUnitFailed: user@0.service on testreduce1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:43:21] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1065 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[06:47:22] <jinxer-wm>	 FIRING: [6x] SystemdUnitFailed: user@0.service on testreduce1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:49:33] <jinxer-wm>	 RESOLVED: [6x] SystemdUnitFailed: user@0.service on testreduce1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:04:33] <jinxer-wm>	 FIRING: [6x] SystemdUnitFailed: user@0.service on testreduce1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:13:21] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1065 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[07:20:11] <icinga-wm>	 PROBLEM - snapshot of s7 in codfw on backupmon1001 is CRITICAL: Last snapshot for s7 at codfw (db2198) taken on 2025-02-22 06:20:54 is 832 GiB, but the previous one was 1159 GiB, a change of -28.2 % https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Rerun_a_failed_backup
[07:25:28] <wikibugs>	 (03CR) 10Anzx: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1121622 (https://phabricator.wikimedia.org/T387055) (owner: 10SD hehua)
[08:07:22] <jinxer-wm>	 RESOLVED: [6x] SystemdUnitFailed: user@0.service on testreduce1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:22:22] <jinxer-wm>	 FIRING: [6x] SystemdUnitFailed: user@0.service on testreduce1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:23:21] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1065 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:45:17] <icinga-wm>	 RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 113, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:45:29] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 46, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:03:21] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1065 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[10:10:53] <icinga-wm>	 PROBLEM - SSH on ml-lab1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[10:13:21] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1065 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[10:21:49] <icinga-wm>	 RECOVERY - SSH on ml-lab1002 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[10:27:35] <icinga-wm>	 PROBLEM - Host ml-lab1002 is DOWN: PING CRITICAL - Packet loss = 100%
[10:40:05] <icinga-wm>	 RECOVERY - Host ml-lab1002 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[10:43:21] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1065 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[11:13:21] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1065 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[11:43:21] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1065 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[12:22:22] <jinxer-wm>	 FIRING: [6x] SystemdUnitFailed: user@0.service on testreduce1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:23:21] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1065 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[12:53:21] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1065 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[12:54:51] <icinga-wm>	 PROBLEM - Disk space on archiva1002 is CRITICAL: DISK CRITICAL - free space: / 3276 MB (3% inode=98%): /tmp 3276 MB (3% inode=98%): /var/tmp 3276 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops
[13:44:24] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: Install and cable Nokia test devices and test servers in codfw - https://phabricator.wikimedia.org/T385217#10572967 (10cmooney) DC-Ops folks Nokia reccomend trying to interrupt the grub bootloader for lsw5-a8-codfw to see if we can boo...
[14:01:17] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - ASunknown/IPv6: Active https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:36:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:06:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:23:21] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1065 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[15:29:37] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10573017 (10phaultfinder)
[15:53:21] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1065 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[16:02:40] <wikibugs>	 06SRE, 07LDAP, 13Patch-For-Review: ldap-admins POSIX group does not actually give any permissions to its members - https://phabricator.wikimedia.org/T386472#10573065 (10Urbanecm) Noting @jrbs was added to the group in T220860, in order to be able to run changePassword.php on Wikitech (which back then was lin...
[16:22:22] <jinxer-wm>	 FIRING: [6x] SystemdUnitFailed: user@0.service on testreduce1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:34:39] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10573095 (10phaultfinder)
[17:03:21] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1065 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[17:33:21] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1065 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[18:18:27] <wikibugs>	 (03PS1) 10Aklapper: Disable weekend rule [phabricator/antivandalism] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1121743
[18:19:06] <wikibugs>	 (03CR) 10Aklapper: [V:03+2 C:03+2] Disable weekend rule [phabricator/antivandalism] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1121743 (owner: 10Aklapper)
[18:24:39] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10573120 (10phaultfinder)
[18:33:01] <wikibugs>	 (03PS1) 10Aklapper: Add Logstash / phlog debug output when users get close to threshold [phabricator/antivandalism] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1121744
[18:33:57] <wikibugs>	 (03CR) 10Aklapper: [V:03+2 C:03+2] Add Logstash / phlog debug output when users get close to threshold [phabricator/antivandalism] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1121744 (owner: 10Aklapper)
[18:35:09] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:35:21] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:36:17] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:43:59] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 53513 bytes in 0.070 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:44:07] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Wed 09 Apr 2025 10:34:17 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:44:11] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.192 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[19:22:15] <wikibugs>	 (03PS1) 10Aklapper: Make antivandalism.edit-period-hours docs mention result limitations [phabricator/antivandalism] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1121745
[19:22:54] <wikibugs>	 (03CR) 10Aklapper: [V:03+2 C:03+2] Make antivandalism.edit-period-hours docs mention result limitations [phabricator/antivandalism] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1121745 (owner: 10Aklapper)
[19:45:00] <wikibugs>	 (03PS1) 10Aklapper: Replace DB transaction query with FeedQuery to estimate recent edits [phabricator/antivandalism] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1121746 (https://phabricator.wikimedia.org/T386704)
[19:48:23] <wikibugs>	 (03CR) 10Aklapper: "I'd not want to merge this for the next deployment but one after, as there have been many changes in this codebase, and this would be the " [phabricator/antivandalism] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1121746 (https://phabricator.wikimedia.org/T386704) (owner: 10Aklapper)
[19:53:21] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1065 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[20:22:22] <jinxer-wm>	 FIRING: [6x] SystemdUnitFailed: user@0.service on testreduce1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:23:19] <wikibugs>	 06SRE, 06Traffic, 10Wikidata, 06Wikidata Dev Team, 07Performance Issue: Frequent 500 Errors and Timeouts When Adding Statements to New Properties - https://phabricator.wikimedia.org/T374230#10573215 (10Bugreporter)
[20:23:21] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1065 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[21:34:49] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10573274 (10phaultfinder)
[22:06:51] <wikibugs>	 06SRE, 10Prod-Kubernetes, 06Traffic, 10Wikidata, and 3 others: Frequent 500 Errors and Timeouts When Adding Statements to New Properties - https://phabricator.wikimedia.org/T374230#10573276 (10Addshore) Interesting, coming here from the merged in ticket, there are some logs that might be relevant, relating...
[22:37:15] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/main (k8s) 802.7ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:42:15] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/main (k8s) 806.4ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:43:21] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1065 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[23:54:38] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10573302 (10phaultfinder)