[01:15:16] PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:34:50] PROBLEM - Varnish traffic drop between 30min ago and now at esams on alert1001 is CRITICAL: 39.87 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[01:47:16] RECOVERY - Varnish traffic drop between 30min ago and now at esams on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[02:16:12] RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:06:52] PROBLEM - mailman3_queue_size on lists1001 is CRITICAL: CRITICAL: 1 mailman3 queues above limits: bounces is 31 (limit: 25) https://wikitech.wikimedia.org/wiki/Mailman/Monitoring https://grafana.wikimedia.org/d/GvuAmuuGk/mailman3
[05:08:13] o.O
[05:08:56] RECOVERY - mailman3_queue_size on lists1001 is OK: OK: mailman3 queues are below the limits https://wikitech.wikimedia.org/wiki/Mailman/Monitoring https://grafana.wikimedia.org/d/GvuAmuuGk/mailman3
[07:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211031T0700)
[07:57:26] PROBLEM - puppet last run on kafka-test1009 is CRITICAL: CRITICAL: Puppet last ran 1 day ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:03:30] RECOVERY - puppet last run on kafka-test1009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[11:02:29] SRE, MediaWiki-Uploading: Unexpected upload speed to commons - https://phabricator.wikimedia.org/T288481 (Xover) @aborrero Did you specify a `-chunked` to pwb.py, and if so what (5MB perhaps?)? And did you give it `-async`? The bursty upload could be consistent with pwb uploading a ~5MB chunk to the api...
[11:49:31] SRE, Wikimedia-Incident: Uncached wiki requests partially unavailable due to excessive request rates from a bot - https://phabricator.wikimedia.org/T280232 (Aklapper) SRE folks: Six months later, is there more to do here in this task?
[13:14:42] !log Re-create global account User:Calvinius and attach existing local accounts to it (T291745)
[13:14:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:14:50] T291745: User:Calvinius has local accounts with edits, but no global account - https://phabricator.wikimedia.org/T291745
[13:26:44] PROBLEM - SSH on bast3005 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring
[13:28:48] RECOVERY - SSH on bast3005 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[14:17:53] (Abandoned) Majavah: P::toolforge: force remove /srv/composer on buster [puppet] - https://gerrit.wikimedia.org/r/730143 (owner: Majavah)
[16:50:26] (PS1) Zabe: Test [mediawiki-config] - https://gerrit.wikimedia.org/r/735745
[16:55:41] (PS2) Zabe: Test [mediawiki-config] - https://gerrit.wikimedia.org/r/735745
[20:19:38] PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[20:25:46] RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 34.56 ms
[21:49:45] !log urbanecm@mwmaint1002:~$ mwscript userOptions.php --wiki=dewiki --nowarn --touserid 3802752 --old 'linkrecommendation' --new 'control' 'growthexperiments-homepage-variant' # T294712
[21:49:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:49:53] T294712: Completely disable the linkrecommendation task type in the Growth module in the German Wikipedia - https://phabricator.wikimedia.org/T294712
[21:56:48] PROBLEM - SSH on puppetmaster1002.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:48:44] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /robots.txt (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Citoid
[22:50:50] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[23:16:38] PROBLEM - MariaDB Replica Lag: s1 on db2141 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1210.37 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:25:32] PROBLEM - MariaDB Replica IO: s6 on db2141 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2026, Errmsg: error reconnecting to master repl@db2129.codfw.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: SSL connection error00000000:lib(0):func(0):reason(0) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:39:46] PROBLEM - MariaDB Replica Lag: s6 on db2141 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1249.89 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:43:52] RECOVERY - MariaDB Replica IO: s6 on db2141 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:45:56] RECOVERY - MariaDB Replica Lag: s6 on db2141 is OK: OK slave_sql_lag Replication lag: 47.54 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:58:46] RECOVERY - SSH on puppetmaster1002.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook