[00:01:39] <icinga-wm>	 PROBLEM - SSH on wtp1036.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[00:15:58] <wikibugs>	 (PS1) Krinkle: ext.wikimediamessages.contactpage: Combine two minor modules [extensions/WikimediaMessages] (wmf/1.39.0-wmf.14) - https://gerrit.wikimedia.org/r/802118
[00:16:12] <wikibugs>	 (CR) Krinkle: [C: +2] ext.wikimediamessages.contactpage: Combine two minor modules [extensions/WikimediaMessages] (wmf/1.39.0-wmf.14) - https://gerrit.wikimedia.org/r/802118 (owner: Krinkle)
[00:17:02] <wikibugs>	 (CR) Krinkle: [C: +2] MetaContactPages: Update reference to `ext.wikimediamessages.contactpage` [mediawiki-config] - https://gerrit.wikimedia.org/r/801423 (owner: Krinkle)
[00:17:16] * Krinkle testing on mwdebug1002
[00:18:03] <wikibugs>	 (Merged) jenkins-bot: MetaContactPages: Update reference to `ext.wikimediamessages.contactpage` [mediawiki-config] - https://gerrit.wikimedia.org/r/801423 (owner: Krinkle)
[00:18:23] <wikibugs>	 (PS3) Krinkle: profiler: Turn from functions into class [mediawiki-config] - https://gerrit.wikimedia.org/r/796300 (https://phabricator.wikimedia.org/T308932)
[00:18:28] <wikibugs>	 (PS2) Krinkle: Profiler: Update wmfSetupProfiler() call [mediawiki-config] - https://gerrit.wikimedia.org/r/801831 (https://phabricator.wikimedia.org/T308932)
[00:18:32] <wikibugs>	 (PS2) Krinkle: Profiler: Remove temporary back-compat for wmfSetupProfiler() [mediawiki-config] - https://gerrit.wikimedia.org/r/801832 (https://phabricator.wikimedia.org/T308932)
[00:23:41] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[00:23:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:24:39] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[00:24:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:24:40] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[00:24:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:28:25] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[00:28:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:34:08] <wikibugs>	 (Merged) jenkins-bot: ext.wikimediamessages.contactpage: Combine two minor modules [extensions/WikimediaMessages] (wmf/1.39.0-wmf.14) - https://gerrit.wikimedia.org/r/802118 (owner: Krinkle)
[00:34:15] <icinga-wm>	 RECOVERY - SSH on wtp1025.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[00:38:29] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[00:38:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:39:32] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[00:39:33] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[00:39:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:39:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:40:27] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[00:40:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:46:29] <logmsgbot>	 !log krinkle@deploy1002 Synchronized php-1.39.0-wmf.14/extensions/WikimediaMessages/: I5a700cd3648 (duration: 03m 01s)
[00:46:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:47:04] <wikibugs>	 (CR) Krinkle: [C: +1] "wmf.14 is scheduled to reach group2 by afternoon UTC." [mediawiki-config] - https://gerrit.wikimedia.org/r/799437 (https://phabricator.wikimedia.org/T134809) (owner: Tim Starling)
[00:50:59] <logmsgbot>	 !log krinkle@deploy1002 Synchronized wmf-config/MetaContactPages.php: Ief1368fd959f428 (duration: 02m 56s)
[00:51:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:57:09] <wikibugs>	 (PS4) Krinkle: profiler: Turn from functions into class [mediawiki-config] - https://gerrit.wikimedia.org/r/796300 (https://phabricator.wikimedia.org/T308932)
[00:57:52] <wikibugs>	 (PS3) Krinkle: Profiler: Update wmfSetupProfiler() call [mediawiki-config] - https://gerrit.wikimedia.org/r/801831 (https://phabricator.wikimedia.org/T308932)
[00:58:40] <wikibugs>	 (PS3) Krinkle: Profiler: Remove temporary back-compat for wmfSetupProfiler() [mediawiki-config] - https://gerrit.wikimedia.org/r/801832 (https://phabricator.wikimedia.org/T308932)
[01:00:03] <wikibugs>	 (CR) Krinkle: [C: +2] profiler: Turn from functions into class [mediawiki-config] - https://gerrit.wikimedia.org/r/796300 (https://phabricator.wikimedia.org/T308932) (owner: Krinkle)
[01:00:46] <wikibugs>	 (CR) Krinkle: [C: +2] profiler: Turn from functions into class [mediawiki-config] - https://gerrit.wikimedia.org/r/796300 (https://phabricator.wikimedia.org/T308932) (owner: Krinkle)
[01:01:01] <wikibugs>	 (CR) Krinkle: [C: +2] Profiler: Update wmfSetupProfiler() call [mediawiki-config] - https://gerrit.wikimedia.org/r/801831 (https://phabricator.wikimedia.org/T308932) (owner: Krinkle)
[01:01:28] <wikibugs>	 (Merged) jenkins-bot: profiler: Turn from functions into class [mediawiki-config] - https://gerrit.wikimedia.org/r/796300 (https://phabricator.wikimedia.org/T308932) (owner: Krinkle)
[01:01:46] <wikibugs>	 (Merged) jenkins-bot: Profiler: Update wmfSetupProfiler() call [mediawiki-config] - https://gerrit.wikimedia.org/r/801831 (https://phabricator.wikimedia.org/T308932) (owner: Krinkle)
[01:01:59] <wikibugs>	 (CR) Krinkle: [C: +2] Profiler: Remove temporary back-compat for wmfSetupProfiler() [mediawiki-config] - https://gerrit.wikimedia.org/r/801832 (https://phabricator.wikimedia.org/T308932) (owner: Krinkle)
[01:02:40] <wikibugs>	 (Merged) jenkins-bot: Profiler: Remove temporary back-compat for wmfSetupProfiler() [mediawiki-config] - https://gerrit.wikimedia.org/r/801832 (https://phabricator.wikimedia.org/T308932) (owner: Krinkle)
[01:02:51] <icinga-wm>	 RECOVERY - SSH on wtp1036.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:05:08] <logmsgbot>	 !log krinkle@deploy1002 Synchronized src/Profiler.php: I93b3e43d32 (duration: 03m 16s)
[01:05:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:05:39] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[01:05:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:06:00] <wikibugs>	 (CR) Krinkle: [C: -1] Add "db-mainstash" entry to $wgObjectCaches (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/752807 (https://phabricator.wikimedia.org/T212129) (owner: Aaron Schulz)
[01:06:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[01:06:59] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[01:07:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:07:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:07:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[01:08:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:09:34] <logmsgbot>	 !log krinkle@deploy1002 Synchronized wmf-config/PhpAutoPrepend.php: Iebd29aaa (duration: 02m 57s)
[01:09:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:10:11] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:13:01] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[01:13:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:14:03] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[01:14:05] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[01:14:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:14:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:15:03] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[01:15:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:15:41] <logmsgbot>	 !log krinkle@deploy1002 Synchronized src/Profiler.php: I257b41a45 (duration: 03m 15s)
[01:15:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:23:21] <icinga-wm>	 PROBLEM - SSH on druid1006.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:23:45] <icinga-wm>	 RECOVERY - SSH on wtp1039.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:23:52] <wikibugs>	 (CR) RLazarus: "The structure looks good! But a couple of the tests are failing:" [puppet] - https://gerrit.wikimedia.org/r/802079 (owner: Lucas Werkmeister (WMDE))
[01:26:01] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1004-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[01:27:41] <wikibugs>	 (PS7) Tim Starling: Implement MediaWiki multi-DC traffic component [puppet] - https://gerrit.wikimedia.org/r/801621 (https://phabricator.wikimedia.org/T91820)
[01:28:04] <wikibugs>	 (PS2) Krinkle: tests: Assert that wikiversions.json is complete as per all.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/796049 (https://phabricator.wikimedia.org/T308932)
[01:28:08] <wikibugs>	 (CR) Krinkle: [C: +2] tests: Assert that wikiversions.json is complete as per all.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/796049 (https://phabricator.wikimedia.org/T308932) (owner: Krinkle)
[01:28:51] <wikibugs>	 (Merged) jenkins-bot: tests: Assert that wikiversions.json is complete as per all.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/796049 (https://phabricator.wikimedia.org/T308932) (owner: Krinkle)
[01:30:32] <wikibugs>	 (CR) CI reject: [V: -1] Implement MediaWiki multi-DC traffic component [puppet] - https://gerrit.wikimedia.org/r/801621 (https://phabricator.wikimedia.org/T91820) (owner: Tim Starling)
[01:35:17] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[01:35:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:35:28] <wikibugs>	 (CR) Krinkle: [C: +2] CommonSettings: Remove redundant array_search and missing.php ref [mediawiki-config] - https://gerrit.wikimedia.org/r/796050 (https://phabricator.wikimedia.org/T308932) (owner: Krinkle)
[01:36:12] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[01:36:13] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[01:36:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:36:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:37:06] <wikibugs>	 (CR) Tim Starling: "I set up ATS locally and tested the module. There were some surprises, which I've documented in the file." [puppet] - https://gerrit.wikimedia.org/r/801621 (https://phabricator.wikimedia.org/T91820) (owner: Tim Starling)
[01:37:08] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[01:37:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:38:33] <logmsgbot>	 !log krinkle@deploy1002 Synchronized multiversion/: Id9b34b755230 no-op (duration: 03m 12s)
[01:38:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:40:36] <wikibugs>	 (PS3) Krinkle: CommonSettings: Remove redundant array_search and missing.php ref [mediawiki-config] - https://gerrit.wikimedia.org/r/796050 (https://phabricator.wikimedia.org/T308932)
[01:40:58] <wikibugs>	 (CR) Krinkle: [C: +2] CommonSettings: Remove redundant array_search and missing.php ref [mediawiki-config] - https://gerrit.wikimedia.org/r/796050 (https://phabricator.wikimedia.org/T308932) (owner: Krinkle)
[01:41:48] <wikibugs>	 (Merged) jenkins-bot: CommonSettings: Remove redundant array_search and missing.php ref [mediawiki-config] - https://gerrit.wikimedia.org/r/796050 (https://phabricator.wikimedia.org/T308932) (owner: Krinkle)
[01:42:02] <wikibugs>	 (PS8) Tim Starling: Implement MediaWiki multi-DC traffic component [puppet] - https://gerrit.wikimedia.org/r/801621 (https://phabricator.wikimedia.org/T91820)
[01:46:17] <wikibugs>	 Puppet, SRE, Cloud-Services, Infrastructure-Foundations, Technical-Debt: Uniform cluster nomenclature across puppet - https://phabricator.wikimedia.org/T159411 (Krinkle) For the record, about "cluster", "dc" and "servergroup" - I took a stab at unifying this as outlined at <https://wikitech.w...
[01:47:13] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[01:47:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:48:16] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[01:48:17] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[01:48:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:48:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:49:09] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[01:49:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:50:17] <wikibugs>	 SRE, Beta-Cluster-Infrastructure, Traffic, Beta-Cluster-reproducible: Beta cluster down: Error: 502, Next Hop Connection Failed (Feb 2022) - https://phabricator.wikimedia.org/T302699 (Krinkle) Open→Resolved Appears resolved. Unless there is a specific common cause recurring, I assume ther...
[01:50:24] <wikibugs>	 Puppet, Beta-Cluster-Infrastructure, Infrastructure-Foundations, Beta-Cluster-reproducible: Beta cluster MediaWiki code not updating - https://phabricator.wikimedia.org/T300591 (Krinkle) Open→Resolved
[01:52:49] <wikibugs>	 SRE, Beta-Cluster-Infrastructure, Technical-Debt, Tracking-Neverending: Minimize infrastructure differences between Beta Cluster and production - https://phabricator.wikimedia.org/T87220 (Krinkle)
[02:00:23] <wikibugs>	 (CR) Krinkle: "Does this/should this affect Beta Cluster appservers?  ref https://phabricator.wikimedia.org/T237033#7975492" [puppet] - https://gerrit.wikimedia.org/r/792984 (https://phabricator.wikimedia.org/T266055) (owner: Giuseppe Lavagetto)
[02:00:31] <wikibugs>	 SRE, Beta-Cluster-Infrastructure, Scap, serviceops, Release-Engineering-Team (Seen): Scap can't clear opcache on mw servers in Beta Cluster - https://phabricator.wikimedia.org/T237033 (Krinkle) @thcipriani @dancy I believe the equivalent of the `beta-scap-eqiad` job from back then (which is n...
[02:01:47] <wikibugs>	 (CR) Krinkle: [C: +2] "Test case: https://nds.wikiversity.org/ - LGTM before and after." [mediawiki-config] - https://gerrit.wikimedia.org/r/796050 (https://phabricator.wikimedia.org/T308932) (owner: Krinkle)
[02:04:06] <logmsgbot>	 !log krinkle@deploy1002 Synchronized wmf-config/CommonSettings.php: Ic0e134c61d6 (duration: 03m 02s)
[02:04:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:04:51] <icinga-wm>	 RECOVERY - Check systemd state on cloudelastic1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:10:02] <logmsgbot>	 !log krinkle@deploy1002 Synchronized docroot/noc/: Ic0e134c61d6 (duration: 03m 20s)
[02:10:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:11:27] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:17:07] <wikibugs>	 (PS1) Andrew Bogott: Cloud VMs: manage resolv.conf with cloud-init [puppet] - https://gerrit.wikimedia.org/r/802220
[02:31:53] <wikibugs>	 (PS2) Andrew Bogott: Cloud VMs: manage resolv.conf with cloud-init [puppet] - https://gerrit.wikimedia.org/r/802220
[02:42:29] <wikibugs>	 (PS3) Andrew Bogott: Cloud VMs: manage resolv.conf with cloud-init [puppet] - https://gerrit.wikimedia.org/r/802220
[02:44:59] <wikibugs>	 (PS4) Andrew Bogott: Cloud VMs: manage resolv.conf with cloud-init [puppet] - https://gerrit.wikimedia.org/r/802220
[02:45:52] <wikibugs>	 (CR) CI reject: [V: -1] Cloud VMs: manage resolv.conf with cloud-init [puppet] - https://gerrit.wikimedia.org/r/802220 (owner: Andrew Bogott)
[02:45:56] <wikibugs>	 (PS5) Andrew Bogott: Cloud VMs: manage resolv.conf with cloud-init [puppet] - https://gerrit.wikimedia.org/r/802220
[02:46:50] <wikibugs>	 (CR) CI reject: [V: -1] Cloud VMs: manage resolv.conf with cloud-init [puppet] - https://gerrit.wikimedia.org/r/802220 (owner: Andrew Bogott)
[02:50:46] <wikibugs>	 (PS6) Andrew Bogott: Cloud VMs: manage resolv.conf with cloud-init [puppet] - https://gerrit.wikimedia.org/r/802220
[02:52:58] <wikibugs>	 (PS11) Tim Starling: Add "db-mainstash" entry to $wgObjectCaches [mediawiki-config] - https://gerrit.wikimedia.org/r/752807 (https://phabricator.wikimedia.org/T212129) (owner: Aaron Schulz)
[02:53:00] <wikibugs>	 (PS3) Tim Starling: Switch wgMainStash to db-mainstash [mediawiki-config] - https://gerrit.wikimedia.org/r/799433 (https://phabricator.wikimedia.org/T212129)
[02:53:02] <wikibugs>	 (PS7) Andrew Bogott: Cloud VMs: manage resolv.conf with cloud-init [puppet] - https://gerrit.wikimedia.org/r/802220
[02:54:04] <wikibugs>	 (CR) Tim Starling: Add "db-mainstash" entry to $wgObjectCaches (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/752807 (https://phabricator.wikimedia.org/T212129) (owner: Aaron Schulz)
[02:56:01] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1004-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[03:12:01] <wikibugs>	 (CR) Andrew Bogott: [C: -2] "This doesn't work because the resolv conf module isn't supported by cloudinit on Debian." [puppet] - https://gerrit.wikimedia.org/r/802220 (owner: Andrew Bogott)
[03:12:06] <wikibugs>	 (PS3) Tim Starling: Enable SSL for master DB connections in the secondary datacenter [mediawiki-config] - https://gerrit.wikimedia.org/r/799437 (https://phabricator.wikimedia.org/T134809)
[03:12:08] <wikibugs>	 (PS4) Tim Starling: Add the master from the primary DC to the secondary DC load arrays [mediawiki-config] - https://gerrit.wikimedia.org/r/799685 (https://phabricator.wikimedia.org/T134809)
[03:12:10] <wikibugs>	 (PS3) Tim Starling: Clean up scap sequencing workaround [mediawiki-config] - https://gerrit.wikimedia.org/r/801836
[03:12:21] <wikibugs>	 (CR) Tim Starling: Add the master from the primary DC to the secondary DC load arrays (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/799685 (https://phabricator.wikimedia.org/T134809) (owner: Tim Starling)
[03:26:15] <icinga-wm>	 PROBLEM - Query Service HTTP Port on wdqs1006 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 380 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[03:27:21] <icinga-wm>	 RECOVERY - Query Service HTTP Port on wdqs1006 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.151 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[03:37:31] <icinga-wm>	 PROBLEM - SSH on wtp1025.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:22:47] <icinga-wm>	 PROBLEM - k8s API server requests latencies on ml-serve-ctrl1001 is CRITICAL: instance=10.64.16.202 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[04:22:47] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:23:03] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:24:21] <icinga-wm>	 PROBLEM - Backup freshness on backup1001 is CRITICAL: All failures: 1 (logstash2023), Fresh: 115 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[04:24:53] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 48249 bytes in 0.119 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:25:09] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.292 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:31:01] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1005-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[04:32:53] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
[04:32:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:32:54] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
[04:32:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:38:39] <icinga-wm>	 RECOVERY - SSH on wtp1025.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:38:39] <icinga-wm>	 PROBLEM - SSH on pki2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:41:03] <icinga-wm>	 RECOVERY - k8s API server requests latencies on ml-serve-ctrl1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[04:45:39] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:45:57] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:52:25] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 48249 bytes in 0.215 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:52:41] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.286 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:05:21] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s7 T309617
[05:05:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:05:25] <stashbot>	 T309617: Switchover s7 master (db1181 -> db1136) - https://phabricator.wikimedia.org/T309617
[05:05:37] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s7 T309617
[05:05:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:05:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Set db1136 with weight 0 T309617', diff saved to https://phabricator.wikimedia.org/P29325 and previous config saved to /var/cache/conftool/dbconfig/20220602-050559-ladsgroup.json
[05:06:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:08:29] <wikibugs>	 (PS3) Ladsgroup: mariadb: Promote db1136 to s7 master [puppet] - https://gerrit.wikimedia.org/r/801684 (https://phabricator.wikimedia.org/T309617)
[05:08:37] <wikibugs>	 (CR) Ladsgroup: [V: +2 C: +2] mariadb: Promote db1136 to s7 master [puppet] - https://gerrit.wikimedia.org/r/801684 (https://phabricator.wikimedia.org/T309617) (owner: Ladsgroup)
[05:09:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298560)', diff saved to https://phabricator.wikimedia.org/P29326 and previous config saved to /var/cache/conftool/dbconfig/20220602-050937-ladsgroup.json
[05:09:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:09:40] <stashbot>	 T298560: Fix mismatching field type of revision.rev_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298560
[05:13:36] <wikibugs>	 (PS1) Marostegui: Revert "db2088: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/802119
[05:14:47] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:14:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db2088 (s1 and s2) T309485', diff saved to https://phabricator.wikimedia.org/P29327 and previous config saved to /var/cache/conftool/dbconfig/20220602-051451-marostegui.json
[05:14:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:14:56] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "db2088: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/802119 (owner: Marostegui)
[05:14:56] <stashbot>	 T309485: db2088 crashed - https://phabricator.wikimedia.org/T309485
[05:15:20] <wikibugs>	 SRE, ops-codfw, DBA: db2088 crashed - https://phabricator.wikimedia.org/T309485 (Marostegui) Open→Resolved a:Marostegui→Papaul db2088 is back in sync with both s1 and s2 master. I have repooled it. Closing this for now. If it happens again we should probably just decommission it.  Tha...
[05:15:46] <ryankemper>	 !log T309720 Finished manual rolling restart of `cloudelastic` cluster to get new S3 plugin operational
[05:15:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:15:50] <stashbot>	 T309720: Deploy S3 plugin on all Search team-managed Elastic hosts - https://phabricator.wikimedia.org/T309720
[05:24:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P29328 and previous config saved to /var/cache/conftool/dbconfig/20220602-052442-ladsgroup.json
[05:24:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:31:01] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) resolved: Elasticsearch instance cloudelastic1006-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[05:33:40] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1137 in x1 with minimal weight to test 10.6.8 T309679 ', diff saved to https://phabricator.wikimedia.org/P29329 and previous config saved to /var/cache/conftool/dbconfig/20220602-053340-marostegui.json
[05:33:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:33:45] <stashbot>	 T309679:  Migrate a x1 DB host to mariadb 10.6 - https://phabricator.wikimedia.org/T309679
[05:39:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P29330 and previous config saved to /var/cache/conftool/dbconfig/20220602-053947-ladsgroup.json
[05:39:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:39:55] <icinga-wm>	 RECOVERY - SSH on pki2001.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:41:11] <wikibugs>	 (CR) Slyngshede: [C: +2] Add clarification to comment, to help avoid mistakes using httpd::site. [puppet] - https://gerrit.wikimedia.org/r/797110 (owner: Slyngshede)
[05:48:24] <wikibugs>	 (PS1) Slyngshede: WIP: P:aptrepo::wikimedia switch public apt repo to use new define. [puppet] - https://gerrit.wikimedia.org/r/802355
[05:52:51] <wikibugs>	 (CR) Slyngshede: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35664/console" [puppet] - https://gerrit.wikimedia.org/r/802355 (owner: Slyngshede)
[05:54:52] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298560)', diff saved to https://phabricator.wikimedia.org/P29331 and previous config saved to /var/cache/conftool/dbconfig/20220602-055452-ladsgroup.json
[05:54:54] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[05:54:55] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[05:54:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:54:57] <stashbot>	 T298560: Fix mismatching field type of revision.rev_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298560
[05:54:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:55:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:55:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1146:3314 (T298560)', diff saved to https://phabricator.wikimedia.org/P29332 and previous config saved to /var/cache/conftool/dbconfig/20220602-055500-ladsgroup.json
[05:55:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:00:05] <jouncebot>	 kormat, marostegui, and Amir1: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Primary database switchover . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220602T0600).
[06:00:08] <Amir1>	 !log Starting s7 eqiad failover from db1181 to db1136 - T309617
[06:00:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:00:12] <stashbot>	 T309617: Switchover s7 master (db1181 -> db1136) - https://phabricator.wikimedia.org/T309617
[06:00:15] <marostegui>	 o/
[06:00:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - T309617', diff saved to https://phabricator.wikimedia.org/P29333 and previous config saved to /var/cache/conftool/dbconfig/20220602-060016-ladsgroup.json
[06:00:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:00:44] <marostegui>	 RO confirmed
[06:00:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Promote db1136 to s7 primary and set section read-write T309617', diff saved to https://phabricator.wikimedia.org/P29334 and previous config saved to /var/cache/conftool/dbconfig/20220602-060053-ladsgroup.json
[06:00:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:01:12] <Amir1>	 and done
[06:01:34] <marostegui>	 Recent changes seem to be moving
[06:03:30] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute for jobrunner on alert1001 is CRITICAL: 190 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[06:03:49] <Amir1>	 that should be us
[06:04:19] <wikibugs>	 (03PS3) 10Ladsgroup: wmnet: Update s7-master CNAME [dns] - 10https://gerrit.wikimedia.org/r/801685 (https://phabricator.wikimedia.org/T309617)
[06:04:33] <wikibugs>	 (03CR) 10Ladsgroup: [V: 03+2 C: 03+2] wmnet: Update s7-master CNAME [dns] - 10https://gerrit.wikimedia.org/r/801685 (https://phabricator.wikimedia.org/T309617) (owner: 10Ladsgroup)
[06:04:35] <marostegui>	 Amir1: orchestrator still showing lag
[06:04:49] <Amir1>	 hmm
[06:04:56] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute for jobrunner on alert1001 is OK: (C)100 gt (W)50 gt 1 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[06:05:02] <marostegui>	 Are you following the steps order?
[06:05:14] <Amir1>	 yup
[06:05:23] <marostegui>	 So was heartbeat cleaned?
[06:05:53] <wikibugs>	 (03PS2) 10Slyngshede: WIP: P:aptrepo::wikimedia switch public apt repo to use new define. [puppet] - 10https://gerrit.wikimedia.org/r/802355
[06:06:16] <Amir1>	 I think it's not done yet
[06:06:26] <marostegui>	 ok
[06:06:47] <marostegui>	 that's why I was asking if the steps are being followed in case it was done but didn't work or if it wasn't
[06:06:53] <marostegui>	 I can do it if you want
[06:07:11] <Amir1>	 sure
[06:07:15] <Amir1>	 that'd be amazing
[06:07:20] <marostegui>	 ok
[06:08:42] <marostegui>	 fixed
[06:08:55] <Amir1>	 \o/
[06:10:10] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35665/console" [puppet] - 10https://gerrit.wikimedia.org/r/802355 (owner: 10Slyngshede)
[06:10:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depool db1181 T309617', diff saved to https://phabricator.wikimedia.org/P29335 and previous config saved to /var/cache/conftool/dbconfig/20220602-061039-ladsgroup.json
[06:10:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:10:43] <stashbot>	 T309617: Switchover s7 master (db1181 -> db1136) - https://phabricator.wikimedia.org/T309617
[06:11:15] <marostegui>	 Amir1: Remember to edit db1181 to give the weight db1136 had before (otherwise it will be pooled with 0 weight)
[06:11:42] <Amir1>	 marostegui: we need to depool it for maint
[06:11:42] <marostegui>	 You can do that when it is depooled too
[06:11:54] <Amir1>	 oh, how can I edit it?
[06:12:04] <marostegui>	 Amir1: yes, what I mean is db1181 current weight is 0 and if you pool it back, it will still have weight 0
[06:12:07] <marostegui>	 Amir1: dbctl instance db1181 edit
[06:12:13] <Amir1>	 awesome
[06:12:19] <marostegui>	 just edit the weight and give the same weight db1136 had before
[06:12:47] <Amir1>	 https://phabricator.wikimedia.org/P29325 
[06:12:51] <Amir1>	 400 it should be
[06:13:20] <marostegui>	 yep!
[06:13:26] <Amir1>	 marostegui: done 
[06:13:43] <marostegui>	 Amir1: but it is pooled, isn't it?
[06:13:44] <wikibugs>	 (03PS3) 10Slyngshede: WIP: P:aptrepo::wikimedia switch public apt repo to use new define. [puppet] - 10https://gerrit.wikimedia.org/r/802355
[06:13:52] <Amir1>	 I depooled it
[06:14:02] <marostegui>	 ah good
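The weight hand-over discussed above can be sketched as a toy model. This is only an illustration, not dbctl's real data model or API: the `Instance` class, the `hand_over_weight` helper, and the `weights_before` snapshot are all hypothetical names. The only values taken from the conversation are that db1181 had weight 0 and that db1136 had weight 400 before the switchover.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical stand-in for one database instance entry."""
    name: str
    weight: int
    pooled: bool


def hand_over_weight(old_primary: Instance, new_primary_name: str,
                     weights_before: dict) -> None:
    """Give the demoted primary the weight the new primary had as a replica.

    The old primary is left depooled, since the chat notes the weight can be
    edited while the host is depooled for maintenance.
    """
    old_primary.weight = weights_before[new_primary_name]
    old_primary.pooled = False


# Snapshot of the weights before the switchover (values from the chat).
weights_before = {"db1181": 0, "db1136": 400}

# After the switchover db1181 still carries its old weight of 0.
db1181 = Instance(name="db1181", weight=0, pooled=True)
hand_over_weight(db1181, "db1136", weights_before)
```

Without this step, repooling db1181 later would bring it back with weight 0, which is the situation marostegui warns about.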
[06:14:16] <Amir1>	 okay, it seems it's all done
[06:14:23] <marostegui>	 zarcillo checked?
[06:14:44] <marostegui>	 (that item isn't marked)
[06:16:30] <Amir1>	 it's correct
[06:17:25] <marostegui>	 Amir1: mark that step as done, so we know it wasn't missed
[06:17:38] <Amir1>	 done 
[06:17:48] <marostegui>	 \o/
[06:17:55] <Amir1>	 I'm closing the ticket and have some ideas on how to improve it later :D
[06:18:01] <marostegui>	 cool!
[06:18:25] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35666/console" [puppet] - 10https://gerrit.wikimedia.org/r/802355 (owner: 10Slyngshede)
[06:18:36] <Amir1>	 I need to run some errands, will be back to run the schema changes, feel free to do whatever you like with it in the mean time
[06:37:10] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Give more weight to db1137 in x1 to test 10.6.8 T309679 ', diff saved to https://phabricator.wikimedia.org/P29336 and previous config saved to /var/cache/conftool/dbconfig/20220602-063710-marostegui.json
[06:37:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:37:14] <stashbot>	 T309679:  Migrate a x1 DB host to mariadb 10.6 - https://phabricator.wikimedia.org/T309679
[06:49:18] <wikibugs>	 (03PS1) 10Muehlenhoff: Add Ferran Tufan to contributors [puppet] - 10https://gerrit.wikimedia.org/r/802423
[06:52:03] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Give more weight to db1137 in x1 to test 10.6.8 T309679 ', diff saved to https://phabricator.wikimedia.org/P29337 and previous config saved to /var/cache/conftool/dbconfig/20220602-065203-marostegui.json
[06:52:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:52:09] <stashbot>	 T309679:  Migrate a x1 DB host to mariadb 10.6 - https://phabricator.wikimedia.org/T309679
[06:53:13] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Add Ferran Tufan to contributors [puppet] - 10https://gerrit.wikimedia.org/r/802423 (owner: 10Muehlenhoff)
[06:59:13] <wikibugs>	 (03PS1) 10Muehlenhoff: Add new bullseye IDPs to acmechief config [puppet] - 10https://gerrit.wikimedia.org/r/802424
[07:00:05] <jouncebot>	 Amir1 and apergos: How many deployers does it take to do UTC morning backport and config training deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220602T0700).
[07:00:50] <apergos>	 hello
[07:01:01] <apergos>	 no trainees have signed up for the window today
[07:01:15] <apergos>	 there are also no patches scheduled for deployment
[07:01:35] <apergos>	 anyone who would like to self-deploy last minute can add themselves to the calendar
[07:01:42] <apergos>	 otherwise in about 15 minutes I will wander off.
[07:05:46] <moritzm>	 !log installing systemd bugfix updates from last bullseye point release, also includes a minor security fix in systemd-tmpfiles
[07:05:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:15:47] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Give more weight to db1137 in x1 to test 10.6.8 T309679 ', diff saved to https://phabricator.wikimedia.org/P29338 and previous config saved to /var/cache/conftool/dbconfig/20220602-071547-marostegui.json
[07:15:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:15:52] <stashbot>	 T309679:  Migrate a x1 DB host to mariadb 10.6 - https://phabricator.wikimedia.org/T309679
[07:22:06] <wikibugs>	 (03PS1) 10David Caro: ceph: remove buster repos and move to croit mirrors for the rest [puppet] - 10https://gerrit.wikimedia.org/r/802425
[07:22:57] <wikibugs>	 (03PS2) 10David Caro: ceph: remove nautilus-buster repos and move to croit [puppet] - 10https://gerrit.wikimedia.org/r/802425
[07:24:06] <wikibugs>	 (03PS4) 10Slyngshede: WIP: P:aptrepo::wikimedia switch public apt repo to use new define. [puppet] - 10https://gerrit.wikimedia.org/r/802355
[07:24:59] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] WIP: P:aptrepo::wikimedia switch public apt repo to use new define. [puppet] - 10https://gerrit.wikimedia.org/r/802355 (owner: 10Slyngshede)
[07:25:48] <wikibugs>	 (03PS5) 10Slyngshede: WIP: P:aptrepo::wikimedia switch public apt repo to use new define. [puppet] - 10https://gerrit.wikimedia.org/r/802355
[07:26:35] <logmsgbot>	 !log joal@deploy1002 Started deploy [analytics/refinery@ef68481]: Additional analytics weekly train [analytics/refinery@ef68481]
[07:26:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:28:35] <icinga-wm>	 RECOVERY - SSH on druid1006.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:30:02] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35667/console" [puppet] - 10https://gerrit.wikimedia.org/r/802355 (owner: 10Slyngshede)
[07:36:19] <wikibugs>	 (03PS4) 10David Caro: Fix spelling errors [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/801730
[07:38:28] <wikibugs>	 (03PS5) 10David Caro: wmcs: added missing __init__.py and relted lint fixes [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/801732
[07:39:59] <wikibugs>	 (03PS4) 10David Caro: Add readme, configure script and missing modules [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/799379
[07:41:09] <wikibugs>	 (03PS6) 10Slyngshede: WIP: P:aptrepo::wikimedia switch public apt repo to use new define. [puppet] - 10https://gerrit.wikimedia.org/r/802355
[07:41:16] <wikibugs>	 (03CR) 10David Caro: wmcs: added missing __init__.py and relted lint fixes (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/801732 (owner: 10David Caro)
[07:42:02] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] WIP: P:aptrepo::wikimedia switch public apt repo to use new define. [puppet] - 10https://gerrit.wikimedia.org/r/802355 (owner: 10Slyngshede)
[07:43:09] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Parsoid: Retire the old Parsoid deb repository? - https://phabricator.wikimedia.org/T309765 (10MoritzMuehlenhoff)
[07:43:31] <icinga-wm>	 ACKNOWLEDGEMENT - Backup freshness on backup1001 is CRITICAL: All failures: 1 (logstash2023), Fresh: 115 jobs Jcrespo known T237224 - The acknowledgement expires at: 2022-06-02 12:42:59. https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[07:45:43] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] wmcs: added missing __init__.py and relted lint fixes [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/801732 (owner: 10David Caro)
[07:45:45] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] Fix spelling errors [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/801730 (owner: 10David Caro)
[07:47:02] <wikibugs>	 (03PS7) 10Slyngshede: WIP: P:aptrepo::wikimedia switch public apt repo to use new define. [puppet] - 10https://gerrit.wikimedia.org/r/802355
[07:49:06] <wikibugs>	 (03Merged) 10jenkins-bot: Fix spelling errors [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/801730 (owner: 10David Caro)
[07:49:41] <wikibugs>	 (03Merged) 10jenkins-bot: wmcs: added missing __init__.py and relted lint fixes [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/801732 (owner: 10David Caro)
[07:51:09] <logmsgbot>	 !log joal@deploy1002 Finished deploy [analytics/refinery@ef68481]: Additional analytics weekly train [analytics/refinery@ef68481] (duration: 24m 33s)
[07:51:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:57] <wikibugs>	 (03PS8) 10Slyngshede: WIP: P:aptrepo::wikimedia switch public apt repo to use new define. [puppet] - 10https://gerrit.wikimedia.org/r/802355
[07:54:11] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] Add new bullseye IDPs to acmechief config [puppet] - 10https://gerrit.wikimedia.org/r/802424 (owner: 10Muehlenhoff)
[07:54:48] <logmsgbot>	 !log joal@deploy1002 Started deploy [analytics/refinery@ef68481] (thin): Additional analytics weekly train THIN [analytics/refinery@ef68481]
[07:54:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:54:56] <logmsgbot>	 !log joal@deploy1002 Finished deploy [analytics/refinery@ef68481] (thin): Additional analytics weekly train THIN [analytics/refinery@ef68481] (duration: 00m 08s)
[07:54:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:55:35] <logmsgbot>	 !log joal@deploy1002 Started deploy [analytics/refinery@ef68481] (hadoop-test): Additional analytics weekly train TEST [analytics/refinery@ef68481]
[07:55:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:57:58] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35669/console" [puppet] - 10https://gerrit.wikimedia.org/r/802355 (owner: 10Slyngshede)
[07:59:35] <wikibugs>	 (03CR) 10Slyngshede: WIP: P:aptrepo::wikimedia switch public apt repo to use new define. [puppet] - 10https://gerrit.wikimedia.org/r/802355 (owner: 10Slyngshede)
[07:59:53] <wikibugs>	 (03PS9) 10Slyngshede: P:aptrepo::wikimedia switch public apt repo to use new define. [puppet] - 10https://gerrit.wikimedia.org/r/802355
[08:01:01] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35670/console" [puppet] - 10https://gerrit.wikimedia.org/r/802355 (owner: 10Slyngshede)
[08:02:25] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35671/console" [puppet] - 10https://gerrit.wikimedia.org/r/802355 (owner: 10Slyngshede)
[08:03:09] <logmsgbot>	 !log joal@deploy1002 Finished deploy [analytics/refinery@ef68481] (hadoop-test): Additional analytics weekly train TEST [analytics/refinery@ef68481] (duration: 07m 33s)
[08:03:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:08:00] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/802355 (owner: 10Slyngshede)
[08:09:45] <wikibugs>	 10SRE, 10LDAP-Access-Requests: Add Evelien WMDE to the ldap/wmde and ldap/nda group - https://phabricator.wikimedia.org/T309700 (10MoritzMuehlenhoff) This also needs sign off by either one of @conny-kawohl_WMDE @WMDE-leszek @darthmon_wmde @Tobi_WMDE_SW @Lea_WMDE @karapayneWMDE
[08:10:07] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to contint-admins for taavi - https://phabricator.wikimedia.org/T309375 (10MoritzMuehlenhoff) p:05Triage→03Medium
[08:10:29] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Parsoid: Retire the old Parsoid deb repository? - https://phabricator.wikimedia.org/T309765 (10MoritzMuehlenhoff) p:05Triage→03Medium
[08:13:34] <wikibugs>	 10SRE, 10Release-Engineering-Team, 10Scap, 10serviceops: Deploy Scap version 4.8.1 - https://phabricator.wikimedia.org/T309116 (10JMeybohm) a:03JMeybohm
[08:13:44] <wikibugs>	 10SRE, 10Infrastructure-Foundations: SSH host key verification failures in Ganeti intra node SSH calls after Bullseye update - https://phabricator.wikimedia.org/T309724 (10MoritzMuehlenhoff) p:05Triage→03Medium
[08:14:06] <wikibugs>	 10SRE, 10LDAP-Access-Requests: Add Evelien WMDE to the ldap/wmde and ldap/nda group - https://phabricator.wikimedia.org/T309700 (10MoritzMuehlenhoff) p:05Triage→03Medium
[08:14:25] <wikibugs>	 10SRE, 10Sustainability (Incident Followup): get a legend for haproxy "anomalous session termination states" - https://phabricator.wikimedia.org/T308952 (10MoritzMuehlenhoff) p:05Triage→03Medium
[08:16:00] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+1 C: 03+2] P:aptrepo::wikimedia switch public apt repo to use new define. [puppet] - 10https://gerrit.wikimedia.org/r/802355 (owner: 10Slyngshede)
[08:22:13] <wikibugs>	 (03PS1) 10Majavah: P:wmcs::prometheus: set openstack scrape_interval to 5m [puppet] - 10https://gerrit.wikimedia.org/r/802434
[08:26:02] <wikibugs>	 (03PS2) 10Majavah: P:wmcs::prometheus: set openstack scrape_interval to 4m [puppet] - 10https://gerrit.wikimedia.org/r/802434
[08:27:00] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Give more weight to db1137 in x1 to test 10.6.8 T309679 ', diff saved to https://phabricator.wikimedia.org/P29339 and previous config saved to /var/cache/conftool/dbconfig/20220602-082700-marostegui.json
[08:27:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:27:05] <stashbot>	 T309679:  Migrate a x1 DB host to mariadb 10.6 - https://phabricator.wikimedia.org/T309679
[08:28:15] <icinga-wm>	 RECOVERY - Backup freshness on backup1001 is OK: Fresh: 116 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[08:32:01] <jayme>	 !log imported scap 4.8.1 to stretch-/buster-/bullseye-wikimedia - T309116
[08:32:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:32:06] <stashbot>	 T309116: Deploy Scap version 4.8.1 - https://phabricator.wikimedia.org/T309116
[08:33:19] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/802424 (owner: 10Muehlenhoff)
[08:34:32] <wikibugs>	 (03PS2) 10Daniel Kinzler: EXPERIMENT: allow DB config reload [mediawiki-config] - 10https://gerrit.wikimedia.org/r/801721 (https://phabricator.wikimedia.org/T298485)
[08:35:34] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Add new bullseye IDPs to acmechief config [puppet] - 10https://gerrit.wikimedia.org/r/802424 (owner: 10Muehlenhoff)
[08:36:12] <wikibugs>	 10SRE, 10Release-Engineering-Team, 10Scap, 10serviceops: Deploy Scap version 4.8.1 - https://phabricator.wikimedia.org/T309116 (10JMeybohm) ` mwdebug1002:~$ scap pull Traceback (most recent call last):   File "/usr/bin/scap", line 32, in <module>     from scap import cli   File "/usr/lib/python3/dist-packa...
[08:39:49] <wikibugs>	 10SRE, 10Release-Engineering-Team, 10Scap, 10serviceops: Deploy Scap version 4.8.1 - https://phabricator.wikimedia.org/T309116 (10JMeybohm) 05Open→03Stalled
[08:43:24] <wikibugs>	 (03CR) 10Jbond: "Just an FYI that i have done a bit of work on systemd-resolvd which i hope to start using in prod as soon as i have some time to prioritis" [puppet] - 10https://gerrit.wikimedia.org/r/802220 (owner: 10Andrew Bogott)
[08:45:23] <wikibugs>	 (03PS1) 10Majavah: wmcs: Add alert for Neutron agents being down [alerts] - 10https://gerrit.wikimedia.org/r/802442 (https://phabricator.wikimedia.org/T302377)
[08:46:33] <wikibugs>	 (03CR) 10Jbond: "LGTM from a puppet PoV" [puppet] - 10https://gerrit.wikimedia.org/r/802040 (owner: 10David Caro)
[08:48:59] <icinga-wm>	 RECOVERY - dump of es4 in eqiad on backupmon1001 is OK: Last dump for es4 at eqiad (es1022) taken on 2022-06-01 09:11:21 (3102 GiB, +0.9 %) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Rerun_a_failed_backup
[08:49:06] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM adding simon as they have been working on this module" [puppet] - 10https://gerrit.wikimedia.org/r/802425 (owner: 10David Caro)
[08:49:32] <wikibugs>	 (03PS1) 10Physikerwelt: Explicitly set math rendering modes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/802443 (https://phabricator.wikimedia.org/T309686)
[08:51:58] <wikibugs>	 (03CR) 10Slyngshede: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/802425 (owner: 10David Caro)
[08:53:58] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Give more weight to db1137 in x1 to test 10.6.8 T309679 ', diff saved to https://phabricator.wikimedia.org/P29340 and previous config saved to /var/cache/conftool/dbconfig/20220602-085357-marostegui.json
[08:54:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:54:03] <stashbot>	 T309679:  Migrate a x1 DB host to mariadb 10.6 - https://phabricator.wikimedia.org/T309679
[08:54:26] <logmsgbot>	 !log joal@deploy1002 Started deploy [airflow-dags/analytics@19cd054]: (no justification provided)
[08:54:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:54:36] <logmsgbot>	 !log joal@deploy1002 Finished deploy [airflow-dags/analytics@19cd054]: (no justification provided) (duration: 00m 09s)
[08:54:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:57:47] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good. Since you are removing a component, please make sure to also take care of https://wikitech.wikimedia.org/wiki/Reprepro#Removin" [puppet] - 10https://gerrit.wikimedia.org/r/802425 (owner: 10David Caro)
[08:58:57] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/799379 (owner: 10David Caro)
[08:59:59] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] check_netbox_report: add url to output [puppet] - 10https://gerrit.wikimedia.org/r/802075 (owner: 10Jbond)
[09:00:33] <wikibugs>	 (03Abandoned) 10Jbond: P:netbox: Add http proxy support to reports [puppet] - 10https://gerrit.wikimedia.org/r/802095 (https://phabricator.wikimedia.org/T296452) (owner: 10Jbond)
[09:03:18] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] P:backup::director: use new sudo_user parameter for nrpe::monitor_service [puppet] - 10https://gerrit.wikimedia.org/r/802165 (owner: 10Jbond)
[09:06:57] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Remove superflous comment [puppet] - 10https://gerrit.wikimedia.org/r/802174 (owner: 10Muehlenhoff)
[09:11:29] <icinga-wm>	 PROBLEM - SSH on wtp1036.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:14:48] <wikibugs>	 (03PS1) 10Muehlenhoff: Apply idp role to idp1002/idp2002 [puppet] - 10https://gerrit.wikimedia.org/r/802444 (https://phabricator.wikimedia.org/T308214)
[09:16:31] <icinga-wm>	 PROBLEM - SSH on wtp1038.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:16:41] <icinga-wm>	 PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:16:57] <wikibugs>	 (03PS1) 10Slyngshede: P::aptrepo::wikimedia install Apache for private repo. [puppet] - 10https://gerrit.wikimedia.org/r/802445
[09:19:29] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:23:39] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35672/console" [puppet] - 10https://gerrit.wikimedia.org/r/802445 (owner: 10Slyngshede)
[09:28:01] <wikibugs>	 (03CR) 10Muehlenhoff: P::aptrepo::wikimedia install Apache for private repo. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/802445 (owner: 10Slyngshede)
[09:28:42] <wikibugs>	 (03CR) 10Jelto: [C: 03+2] wikimedia.org: reduce TTL for gitlab A and AAAA to 5m [dns] - 10https://gerrit.wikimedia.org/r/802090 (https://phabricator.wikimedia.org/T307142) (owner: 10Jelto)
[09:28:46] <wikibugs>	 (03PS2) 10Jelto: wikimedia.org: reduce TTL for gitlab A and AAAA to 5m [dns] - 10https://gerrit.wikimedia.org/r/802090 (https://phabricator.wikimedia.org/T307142)
[09:29:26] <wikibugs>	 (03PS1) 10Jcrespo: backup: Cleanup bacula_check, make dependency explicit [puppet] - 10https://gerrit.wikimedia.org/r/802467
[09:33:57] <wikibugs>	 (03CR) 10Jcrespo: [C: 04-1] "How is this compatible with the other patches merged at https://phabricator.wikimedia.org/T274463 ?" [puppet] - 10https://gerrit.wikimedia.org/r/677970 (https://phabricator.wikimedia.org/T274463) (owner: 10Jbond)
[09:39:03] <wikibugs>	 (03CR) 10Jcrespo: "tried to fix the issue using your advice. I tried to test-unit it to prevent it in the future." [software/transferpy] - 10https://gerrit.wikimedia.org/r/770089 (https://phabricator.wikimedia.org/T256749) (owner: 10Jcrespo)
[09:39:07] <wikibugs>	 (03PS4) 10Jcrespo: Use the shlex.quote method to escape hosts and paths [software/transferpy] - 10https://gerrit.wikimedia.org/r/770089 (https://phabricator.wikimedia.org/T256749)
[09:39:44] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[09:39:46] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[09:39:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:39:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:29] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Use the shlex.quote method to escape hosts and paths [software/transferpy] - 10https://gerrit.wikimedia.org/r/770089 (https://phabricator.wikimedia.org/T256749) (owner: 10Jcrespo)
[09:44:11] <wikibugs>	 (03CR) 10Jcrespo: "Hey, Moritz," [puppet] - 10https://gerrit.wikimedia.org/r/793475 (https://phabricator.wikimedia.org/T283017) (owner: 10Jcrespo)
[09:52:12] <wikibugs>	 (03Abandoned) 10Jcrespo: mariadb: Increase core memory usage to 80% of physical memory [puppet] - 10https://gerrit.wikimedia.org/r/455769 (owner: 10Jcrespo)
[09:52:37] <wikibugs>	 (03PS1) 10Majavah: gridengine: default to buster [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/802470 (https://phabricator.wikimedia.org/T277653)
[09:53:41] <wikibugs>	 (03Abandoned) 10Jcrespo: mariadb: Remove m1 references to old database bacula, leave only bacula9 [puppet] - 10https://gerrit.wikimedia.org/r/658970 (https://phabricator.wikimedia.org/T260717) (owner: 10Jcrespo)
[09:53:53] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[09:53:54] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[09:53:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:53:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:55:00] <wikibugs>	 (03CR) 10Majavah: [C: 03+2] "https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/CUWV6ML7NBLST2XE57BWYM6MV2FVQYOR/" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/802470 (https://phabricator.wikimedia.org/T277653) (owner: 10Majavah)
[09:55:02] <wikibugs>	 (03Abandoned) 10Jcrespo: [WIP] Starting to cleanup mariadb templating structure [puppet] - 10https://gerrit.wikimedia.org/r/324915 (https://phabricator.wikimedia.org/T93645) (owner: 10Jcrespo)
[09:56:07] <wikibugs>	 (03Abandoned) 10Jcrespo: [WIP] Create scripts for batch sql execution [puppet] - 10https://gerrit.wikimedia.org/r/338809 (owner: 10Jcrespo)
[09:56:12] <wikibugs>	 (03Merged) 10jenkins-bot: gridengine: default to buster [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/802470 (https://phabricator.wikimedia.org/T277653) (owner: 10Majavah)
[09:57:32] <wikibugs>	 (03PS1) 10Majavah: d/changelog: Prepare for 0.84 release [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/802471
[09:58:07] <wikibugs>	 (03CR) 10Majavah: [C: 03+2] d/changelog: Prepare for 0.84 release [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/802471 (owner: 10Majavah)
[09:58:31] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34] - https://phabricator.wikimedia.org/T294972 (10cmooney) @Cmjohnson hey are you able to take care of the BIOS / RAID setup for these hosts?  All should be ready for normal deploy anyway...
[10:00:01] <wikibugs>	 (03Merged) 10jenkins-bot: d/changelog: Prepare for 0.84 release [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/802471 (owner: 10Majavah)
[10:00:04] <jouncebot>	 mvolz: My dear minions, it's time we take the moon! Just kidding. Time for Services – Citoid / Zotero deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220602T1000).
[10:02:37] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[10:02:39] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[10:02:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:02:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:08:20] <wikibugs>	 (03PS1) 10Jelto: wikimedia.org: make gitlab1004 the new gitlab production host [dns] - 10https://gerrit.wikimedia.org/r/802473 (https://phabricator.wikimedia.org/T307142)
[10:14:05] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[10:14:06] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[10:14:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:14:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:16:25] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] "Let's play with this yes, though given the current instability we might have to tweak it a few times before finding a good timing." [puppet] - 10https://gerrit.wikimedia.org/r/802434 (owner: 10Majavah)
[10:17:37] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Patch-For-Review: Assign SPDX headers to puppet.git - https://phabricator.wikimedia.org/T308013 (10zhuyifei1999)
[10:17:41] <icinga-wm>	 RECOVERY - SSH on wtp1038.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[10:18:18] <wikibugs>	 (CR) David Caro: wmcs: Add alert for Neutron agents being down (1 comment) [alerts] - https://gerrit.wikimedia.org/r/802442 (https://phabricator.wikimedia.org/T302377) (owner: Majavah)
[10:19:33] <wikibugs>	 (PS2) Majavah: wmcs: Add alert for Neutron agents being down [alerts] - https://gerrit.wikimedia.org/r/802442 (https://phabricator.wikimedia.org/T302377)
[10:20:01] <wikibugs>	 (CR) Majavah: wmcs: Add alert for Neutron agents being down (1 comment) [alerts] - https://gerrit.wikimedia.org/r/802442 (https://phabricator.wikimedia.org/T302377) (owner: Majavah)
[10:22:57] <wikibugs>	 (CR) David Caro: [C: +2] wmcs: Add alert for Neutron agents being down (1 comment) [alerts] - https://gerrit.wikimedia.org/r/802442 (https://phabricator.wikimedia.org/T302377) (owner: Majavah)
[10:23:13] <wikibugs>	 (CR) David Caro: [C: +2] wmcs: Add alert for Neutron agents being down (1 comment) [alerts] - https://gerrit.wikimedia.org/r/802442 (https://phabricator.wikimedia.org/T302377) (owner: Majavah)
[10:25:03] <wikibugs>	 (Merged) jenkins-bot: wmcs: Add alert for Neutron agents being down [alerts] - https://gerrit.wikimedia.org/r/802442 (https://phabricator.wikimedia.org/T302377) (owner: Majavah)
[10:25:48] <wikibugs>	 (Abandoned) Jbond: O:gitlab: add config for backup sets [puppet] - https://gerrit.wikimedia.org/r/677970 (https://phabricator.wikimedia.org/T274463) (owner: Jbond)
[10:28:27] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[10:28:28] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[10:28:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:08] <wikibugs>	 (Abandoned) Jcrespo: acct: Add 2 line cron patch to mitigate cronspam [puppet] - https://gerrit.wikimedia.org/r/569532 (https://phabricator.wikimedia.org/T167035) (owner: Jcrespo)
[10:32:23] <icinga-wm>	 PROBLEM - SSH on druid1006.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[10:32:41] <wikibugs>	 (CR) Jbond: "See note i don't think we need this" [puppet] - https://gerrit.wikimedia.org/r/802467 (owner: Jcrespo)
[10:36:41] <wikibugs>	 (PS1) Jbond: CONTRIBUTORS: Add YiFei Zhu [puppet] - https://gerrit.wikimedia.org/r/802474
[10:37:15] <wikibugs>	 (PS2) Jbond: CONTRIBUTORS: Add YiFei Zhu [puppet] - https://gerrit.wikimedia.org/r/802474 (https://phabricator.wikimedia.org/T308013)
[10:37:23] <wikibugs>	 (CR) Jbond: [V: +2 C: +2] CONTRIBUTORS: Add YiFei Zhu [puppet] - https://gerrit.wikimedia.org/r/802474 (https://phabricator.wikimedia.org/T308013) (owner: Jbond)
[10:39:59] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[10:40:00] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[10:40:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:40:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:40:14] <wikibugs>	 (PS5) David Caro: Add readme, configure script and missing modules [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/799379
[10:40:16] <wikibugs>	 (CR) David Caro: Add readme, configure script and missing modules (4 comments) [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/799379 (owner: David Caro)
[10:40:19] <icinga-wm>	 PROBLEM - SSH on cp5012.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[10:40:22] <wikibugs>	 (CR) Jcrespo: "Thank you for the input. I am doing a bit of cleanup, hopefully getting rid of unneeded old patches :-)" [puppet] - https://gerrit.wikimedia.org/r/677970 (https://phabricator.wikimedia.org/T274463) (owner: Jbond)
[10:40:52] <wikibugs>	 (CR) David Caro: [C: +2] "Thanks for the review!" [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/799379 (owner: David Caro)
[10:45:30] <wikibugs>	 (Merged) jenkins-bot: Add readme, configure script and missing modules [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/799379 (owner: David Caro)
[10:50:28] <wikibugs>	 (PS2) Jcrespo: backup: Cleanup bacula_check [puppet] - https://gerrit.wikimedia.org/r/802467
[10:51:17] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[10:51:18] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[10:51:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:51:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:58:13] <wikibugs>	 (CR) Jcrespo: "Done" [puppet] - https://gerrit.wikimedia.org/r/802467 (owner: Jcrespo)
[11:03:47] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[11:03:49] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[11:03:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:03:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:08:17] <wikibugs>	 (CR) Muehlenhoff: "I love the idea, but haven't found the time to have a closer look so far, will do so next week." [puppet] - https://gerrit.wikimedia.org/r/793475 (https://phabricator.wikimedia.org/T283017) (owner: Jcrespo)
[11:12:59] <wikibugs>	 SRE, SRE-Access-Requests, Infrastructure-Foundations, serviceops: allow certain users to disable puppet on mwdebug hosts - https://phabricator.wikimedia.org/T305979 (MoritzMuehlenhoff) This was discussed in the Infrastructure Foundations team meeting and was found to be a okay (to grant the permi...
[11:13:44] <wikibugs>	 SRE, Infrastructure-Foundations, serviceops: allow certain users to disable puppet on mwdebug hosts - https://phabricator.wikimedia.org/T305979 (MoritzMuehlenhoff)
[11:15:40] <wikibugs>	 (CR) Jcrespo: [WIP]django: Create custom django module and apply it to backupmon1001 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/793475 (https://phabricator.wikimedia.org/T283017) (owner: Jcrespo)
[11:16:40] <wikibugs>	 SRE, SRE-tools, Icinga, Infrastructure-Foundations, observability: Icinga paged for a host that should have been downtimed - https://phabricator.wikimedia.org/T309447 (MoritzMuehlenhoff) p:Triage→High Severity is unclear to me from just reading the task, but since we dislike unnecessa...
[11:20:47] <wikibugs>	 (CR) Jbond: [C: +1] backup: Cleanup bacula_check (1 comment) [puppet] - https://gerrit.wikimedia.org/r/802467 (owner: Jcrespo)
[11:21:38] <wikibugs>	 (Abandoned) Jcrespo: mariadb: table checker for monitoring data drift [puppet] - https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253) (owner: Banyek)
[11:21:41] <wikibugs>	 (PS8) David Caro: wmcs: Added taskircmail, ircmail and pageircmail routings [puppet] - https://gerrit.wikimedia.org/r/802040
[11:21:43] <wikibugs>	 (CR) David Caro: wmcs: Added taskircmail, ircmail and pageircmail routings (1 comment) [puppet] - https://gerrit.wikimedia.org/r/802040 (owner: David Caro)
[11:21:45] <wikibugs>	 (PS1) David Caro: alertmanager.yml.erb: use facts directly instead of lookupvar [puppet] - https://gerrit.wikimedia.org/r/802489
[11:23:05] <wikibugs>	 (CR) Jbond: [C: +1] "LGTM couple of nit/Qs" [software/netbox-extras] - https://gerrit.wikimedia.org/r/802178 (owner: Volans)
[11:23:29] <moritzm>	 !log installing sysvinit-utils bugfix updates from last bullseye point release
[11:23:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:27:24] <wikibugs>	 (CR) Jcrespo: [C: +2] "Don't feel strongly :-)" [puppet] - https://gerrit.wikimedia.org/r/802467 (owner: Jcrespo)
[11:31:19] <hashar>	 !log Restarted Gerrit on replica gerrit2001
[11:31:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:31:31] <wikibugs>	 (CR) Jbond: [C: +1] "nit tested but lgtm" [software/netbox-extras] - https://gerrit.wikimedia.org/r/802179 (https://phabricator.wikimedia.org/T262446) (owner: Volans)
[11:38:34] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bullseye 11.3 point update - https://phabricator.wikimedia.org/T304599 (MoritzMuehlenhoff)
[11:40:48] <moritzm>	 !log installing tasksel updates from bullseye point release
[11:40:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:44:15] <moritzm>	 !log installing python-pip bugfix updates from bullseye point release
[11:44:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:46:04] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bullseye 11.3 point update - https://phabricator.wikimedia.org/T304599 (MoritzMuehlenhoff)
[11:57:24] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Apply idp role to idp1002/idp2002 [puppet] - https://gerrit.wikimedia.org/r/802444 (https://phabricator.wikimedia.org/T308214) (owner: Muehlenhoff)
[12:02:56] <wikibugs>	 (PS1) David Caro: tools: refresh prometheus certificate [puppet] - https://gerrit.wikimedia.org/r/802494 (https://phabricator.wikimedia.org/T308402)
[12:03:21] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Give more weight to db1137 in x1 to test 10.6.8 T309679 ', diff saved to https://phabricator.wikimedia.org/P29343 and previous config saved to /var/cache/conftool/dbconfig/20220602-120320-marostegui.json
[12:03:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:03:25] <stashbot>	 T309679:  Migrate a x1 DB host to mariadb 10.6 - https://phabricator.wikimedia.org/T309679
[12:14:19] <wikibugs>	 (CR) Majavah: [C: +1] "the cert looks good (although I don't think these changes really need review anyways)" [puppet] - https://gerrit.wikimedia.org/r/802494 (https://phabricator.wikimedia.org/T308402) (owner: David Caro)
[12:14:57] <icinga-wm>	 RECOVERY - SSH on wtp1036.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:15:07] <logmsgbot>	 !log joal@deploy1002 Started deploy [airflow-dags/analytics@19b943d]: (no justification provided)
[12:15:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:15:16] <logmsgbot>	 !log joal@deploy1002 Finished deploy [airflow-dags/analytics@19b943d]: (no justification provided) (duration: 00m 09s)
[12:15:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:15:24] <wikibugs>	 SRE, LDAP-Access-Requests: Add Evelien WMDE to the ldap/wmde and ldap/nda group - https://phabricator.wikimedia.org/T309700 (Tobi_WMDE_SW) >>! In T309700#7975820, @MoritzMuehlenhoff wrote: > This also needs sign off by either one of @conny-kawohl_WMDE @WMDE-leszek @darthmon_wmde @Tobi_WMDE_SW @Lea_WMDE @...
[12:16:44] <wikibugs>	 (CR) David Caro: [C: +2] tools: refresh prometheus certificate [puppet] - https://gerrit.wikimedia.org/r/802494 (https://phabricator.wikimedia.org/T308402) (owner: David Caro)
[12:17:41] <wikibugs>	 (PS2) Slyngshede: P::aptrepo::wikimedia install Apache for private repo. [puppet] - https://gerrit.wikimedia.org/r/802445
[12:19:31] <wikibugs>	 (CR) Slyngshede: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/802146 (https://phabricator.wikimedia.org/T286856) (owner: Majavah)
[12:21:55] <wikibugs>	 (CR) Slyngshede: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35673/console" [puppet] - https://gerrit.wikimedia.org/r/802445 (owner: Slyngshede)
[12:22:18] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/802146 (https://phabricator.wikimedia.org/T286856) (owner: Majavah)
[12:23:05] <wikibugs>	 (CR) Slyngshede: [C: +2] aptrepo: add thirdparty/kubeadm-k8s-1-22 [puppet] - https://gerrit.wikimedia.org/r/802146 (https://phabricator.wikimedia.org/T286856) (owner: Majavah)
[12:26:22] <wikibugs>	 (CR) Jbond: [C: +1] "LGTM thanks" [puppet] - https://gerrit.wikimedia.org/r/802489 (owner: David Caro)
[12:28:47] <wikibugs>	 (CR) Muehlenhoff: P::aptrepo::wikimedia install Apache for private repo. (1 comment) [puppet] - https://gerrit.wikimedia.org/r/802445 (owner: Slyngshede)
[12:32:24] <wikibugs>	 (PS1) Stang: itwikiversity: Correct typo of "markbotedits" [mediawiki-config] - https://gerrit.wikimedia.org/r/802498 (https://phabricator.wikimedia.org/T309750)
[12:33:55] <icinga-wm>	 RECOVERY - SSH on druid1006.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:37:52] <wikibugs>	 (PS3) Slyngshede: P::aptrepo::wikimedia install Apache for private repo. [puppet] - https://gerrit.wikimedia.org/r/802445
[12:38:45] <wikibugs>	 (CR) CI reject: [V: -1] P::aptrepo::wikimedia install Apache for private repo. [puppet] - https://gerrit.wikimedia.org/r/802445 (owner: Slyngshede)
[12:39:05] <wikibugs>	 (CR) JMeybohm: [C: +1] "This should be a no-op as the lvs stanza is completely missing. So obviously no need to to the LVS restart dance." [puppet] - https://gerrit.wikimedia.org/r/799357 (https://phabricator.wikimedia.org/T304891) (owner: Hnowlan)
[12:39:34] <wikibugs>	 (CR) JMeybohm: [C: +1] service: image-suggestion state to monitoring_setup (1 comment) [puppet] - https://gerrit.wikimedia.org/r/799358 (https://phabricator.wikimedia.org/T304891) (owner: Hnowlan)
[12:39:37] <wikibugs>	 (PS1) Cathal Mooney: Add cloudsw1-e4 and cloudsw1-f4 to mgmt and adjust existing cloudsw [puppet] - https://gerrit.wikimedia.org/r/802499 (https://phabricator.wikimedia.org/T304989)
[12:39:39] <wikibugs>	 (PS4) Slyngshede: P::aptrepo::wikimedia install Apache for private repo. [puppet] - https://gerrit.wikimedia.org/r/802445
[12:39:55] <wikibugs>	 (CR) JMeybohm: [C: +1] "🎉" [puppet] - https://gerrit.wikimedia.org/r/799998 (https://phabricator.wikimedia.org/T304891) (owner: Hnowlan)
[12:42:07] <icinga-wm>	 RECOVERY - SSH on cp5012.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:46:30] <wikibugs>	 (PS9) David Caro: wmcs: Added taskircmail, ircmail and pagetaskircmail routings [puppet] - https://gerrit.wikimedia.org/r/802040
[12:47:55] <wikibugs>	 (PS1) Kevin Bazira: ml-services: add svwiki & trwiki articlequality isvcs [deployment-charts] - https://gerrit.wikimedia.org/r/802500 (https://phabricator.wikimedia.org/T307418)
[12:48:00] <wikibugs>	 (PS8) David Caro: wmcs: relabel alerts from wmcs cluster with wmcs team [puppet] - https://gerrit.wikimedia.org/r/802074
[12:48:14] <wikibugs>	 (PS2) David Caro: alertmanager.yml.erb: use facts directly instead of lookupvar [puppet] - https://gerrit.wikimedia.org/r/802489
[12:48:43] <wikibugs>	 (CR) Slyngshede: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35674/console" [puppet] - https://gerrit.wikimedia.org/r/802445 (owner: Slyngshede)
[12:49:33] <wikibugs>	 (PS1) Jcrespo: mediabackups: Add test units for the Util helper unit [software/mediabackups] - https://gerrit.wikimedia.org/r/802501 (https://phabricator.wikimedia.org/T262668)
[12:49:41] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[12:52:22] <wikibugs>	 (CR) David Caro: [C: +2] ceph: remove nautilus-buster repos and move to croit (1 comment) [puppet] - https://gerrit.wikimedia.org/r/802425 (owner: David Caro)
[12:53:33] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[12:53:55] <icinga-wm>	 RECOVERY - BGP status on cr2-eqsin is OK: BGP OK - up: 97, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:55:50] <wikibugs>	 (CR) Cathal Mooney: "Overall LGTM... really nice work!  Only nit I would have is that we should probably make a similar addition to templates/asw/bgp_overlay.c" [homer/public] - https://gerrit.wikimedia.org/r/802072 (https://phabricator.wikimedia.org/T302198) (owner: Elukey)
[12:57:37] <icinga-wm>	 PROBLEM - Memcached on idp2002 is CRITICAL: connect to address 208.80.153.108 and port 11000: Connection refused https://wikitech.wikimedia.org/wiki/Memcached
[12:57:41] <wikibugs>	 (CR) Cathal Mooney: Add BGP configuration for the new ML staging codfw cluster (1 comment) [homer/public] - https://gerrit.wikimedia.org/r/802072 (https://phabricator.wikimedia.org/T302198) (owner: Elukey)
[12:59:13] <wikibugs>	 SRE, ops-eqiad, DC-Ops: Recycling Pickup for EQIAD - https://phabricator.wikimedia.org/T307140 (Cmjohnson) {F35200013}. Attached is the final list for recycling.  @wiki_willy   Disks  318 2.5" ssds/disks 249 3.5" disks
[13:00:04] <jouncebot>	 RoanKattouw, Lucas_WMDE, Urbanecm, and awight: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220602T1300).
[13:00:04] <jouncebot>	 MatmaRex and koi: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:01:14] <urbanecm>	 i can deploy today!
[13:01:20] <Lucas_WMDE>	 great!
[13:01:27] <Lucas_WMDE>	 (was about to write I can’t until :45 ^^)
[13:01:28] <urbanecm>	 hi MatmaRex / koi, are you around?
[13:01:39] <koi>	 I'm here
[13:01:44] <urbanecm>	 hello :)
[13:02:19] <MatmaRex>	 hi
[13:03:00] <wikibugs>	 (CR) Urbanecm: [C: +2] itwikiversity: Correct typo of "markbotedits" [mediawiki-config] - https://gerrit.wikimedia.org/r/802498 (https://phabricator.wikimedia.org/T309750) (owner: Stang)
[13:03:47] <wikibugs>	 (Merged) jenkins-bot: itwikiversity: Correct typo of "markbotedits" [mediawiki-config] - https://gerrit.wikimedia.org/r/802498 (https://phabricator.wikimedia.org/T309750) (owner: Stang)
[13:04:28] <urbanecm>	 koi: pulled to mwdebug1001, please check
[13:04:36] <wikibugs>	 (PS3) Urbanecm: Enable DiscussionTools automatic topic subscriptions as beta feature on remaining wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/802214 (https://phabricator.wikimedia.org/T295425) (owner: Bartosz Dziewoński)
[13:04:44] <wikibugs>	 (CR) Urbanecm: [C: +2] Enable DiscussionTools automatic topic subscriptions as beta feature on remaining wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/802214 (https://phabricator.wikimedia.org/T295425) (owner: Bartosz Dziewoński)
[13:04:48] <koi>	 LGTM, thanks!
[13:05:13] <logmsgbot>	 !log cmooney@cumin1001 START - Cookbook sre.dns.netbox
[13:05:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:17] <urbanecm>	 syncing
[13:05:27] <icinga-wm>	 RECOVERY - BGP status on cr1-eqiad is OK: BGP OK - up: 120, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:05:31] <wikibugs>	 (Merged) jenkins-bot: Enable DiscussionTools automatic topic subscriptions as beta feature on remaining wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/802214 (https://phabricator.wikimedia.org/T295425) (owner: Bartosz Dziewoński)
[13:05:45] <urbanecm>	 well, not syncing
[13:06:09] <urbanecm>	 13:05:37 sync-file failed: <LockFailedError> Failed to acquire lock "/var/lock/scap.operations_mediawiki-config.lock"; owner is "jnuche"; reason is "Scap is being updated"
[13:06:33] <jnuche>	 sorry, bad timing, please try again
[13:06:39] <urbanecm>	 thanks!
[13:06:45] <urbanecm>	 now it works
[13:06:57] <wikibugs>	 SRE, ops-eqiad, DC-Ops, cloud-services-team (Hardware): Q3:(Need By: TBD) rack/setup/install cloudvirt10[48-50].eqiad.wmnet - https://phabricator.wikimedia.org/T299574 (cmooney) One thing to note as it's not been mentioned in the task description is that the '--enable-virtualization' flag should...
[13:07:07] <wikibugs>	 (PS15) Jbond: prometheus::blackbox::check: add new blackbox exporter check [puppet] - https://gerrit.wikimedia.org/r/787067
[13:07:09] <wikibugs>	 (PS1) Jbond: wmflib: add new resource::capitalise function [puppet] - https://gerrit.wikimedia.org/r/802504
[13:07:10] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:07:11] <wikibugs>	 (PS1) Jbond: wmflib: add resource reduce function [puppet] - https://gerrit.wikimedia.org/r/802505
[13:07:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:08:03] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:08:04] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:08:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:08:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:08:50] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Patch-For-Review: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F - https://phabricator.wikimedia.org/T304989 (cmooney) The work here is largely complete, merging that last patch to add the new switches to monitoring should be t...
[13:08:56] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:08:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:09:54] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 192c5356e1fb21ba820615085abcb2185fd1864c: itwikiversity: Correct typo of "markbotedits" (T309750) (duration: 03m 13s)
[13:09:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:09:57] <stashbot>	 T309750: Replace markbotedit right with markbotedits in the patrollers group on it.wikiversity - https://phabricator.wikimedia.org/T309750
[13:10:00] <urbanecm>	 koi: should be live now :)
[13:10:24] <urbanecm>	 MatmaRex: your first patch is at mwdebug1001, can you check?
[13:10:27] <icinga-wm>	 RECOVERY - BGP status on cr2-eqdfw is OK: BGP OK - up: 146, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:10:42] <wikibugs>	 (PS3) Urbanecm: Launch DiscussionTools topic subscriptions a/b test [mediawiki-config] - https://gerrit.wikimedia.org/r/801818 (https://phabricator.wikimedia.org/T304029) (owner: Bartosz Dziewoński)
[13:10:58] <wikibugs>	 (CR) Urbanecm: [C: +2] Launch DiscussionTools topic subscriptions a/b test [mediawiki-config] - https://gerrit.wikimedia.org/r/801818 (https://phabricator.wikimedia.org/T304029) (owner: Bartosz Dziewoński)
[13:11:16] <MatmaRex>	 looking
[13:11:33] <wikibugs>	 (PS2) Jbond: wmflib: add new resource::capitalise function [puppet] - https://gerrit.wikimedia.org/r/802504
[13:11:46] <wikibugs>	 (CR) CI reject: [V: -1] prometheus::blackbox::check: add new blackbox exporter check [puppet] - https://gerrit.wikimedia.org/r/787067 (owner: Jbond)
[13:11:59] <wikibugs>	 (Merged) jenkins-bot: Launch DiscussionTools topic subscriptions a/b test [mediawiki-config] - https://gerrit.wikimedia.org/r/801818 (https://phabricator.wikimedia.org/T304029) (owner: Bartosz Dziewoński)
[13:13:21] <MatmaRex>	 urbanecm: yep, looks good
[13:13:50] <wikibugs>	 (CR) CI reject: [V: -1] wmflib: add new resource::capitalise function [puppet] - https://gerrit.wikimedia.org/r/802504 (owner: Jbond)
[13:13:58] <wikibugs>	 (CR) CI reject: [V: -1] wmflib: add resource reduce function [puppet] - https://gerrit.wikimedia.org/r/802505 (owner: Jbond)
[13:14:03] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:14:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:15:05] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:15:06] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:15:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:15:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:07] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:16:07] <icinga-wm>	 RECOVERY - BGP status on cr3-eqsin is OK: BGP OK - up: 346, down: 2, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:16:07] <urbanecm>	 syncing
[13:16:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:37] <wikibugs>	 (CR) CI reject: [V: -1] wmflib: add new resource::capitalise function [puppet] - https://gerrit.wikimedia.org/r/802504 (owner: Jbond)
[13:16:43] <icinga-wm>	 ACKNOWLEDGEMENT - Router interfaces on cr3-knams is CRITICAL: CRITICAL: host 91.198.174.246, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: ayounsi https://phabricator.wikimedia.org/T307121 - The acknowledgement expires at: 2022-06-07 13:16:18. https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[13:19:24] <hashar>	 !log Restarting Gerrit
[13:19:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:27] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 806b8367e3c91a2b6b0dd76cdc66e041199ae834: Enable DiscussionTools automatic topic subscriptions as beta feature on remaining wikis (T295425) (duration: 03m 21s)
[13:19:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:31] <stashbot>	 T295425: [Config Change] Deploy Automatic Topic Subscriptions as Beta Feature at Remaining Wikis - https://phabricator.wikimedia.org/T295425
[13:19:52] <hashar>	 oops sorry I forgot about the deployment windows :-\
[13:20:05] <urbanecm>	 happens :)
[13:20:22] <hashar>	 looks like it is already back 
[13:20:55] <urbanecm>	 great!
[13:21:05] <urbanecm>	 MatmaRex: your second patch is at mwdebug1001
[13:21:13] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:21:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:21:56] <wikibugs>	 (PS3) Jbond: wmflib: add new resource::capitalise function [puppet] - https://gerrit.wikimedia.org/r/802504
[13:22:01] <MatmaRex>	 urbanecm: also looks good
[13:22:05] <urbanecm>	 thanks, syncing
[13:22:10] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:22:11] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:22:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:22:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:22:34] <wikibugs>	 SRE, Infrastructure-Foundations, Parsoid: Retire the old Parsoid deb repository? - https://phabricator.wikimedia.org/T309765 (ssastry) I think so. Parsoid/JS is no longer supported and won't get security releases either. If anyone on the team has any concerns, they will leave their comments here.
[13:23:13] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:23:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:39] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 3c12e779707e3982f973641e2b9c2522a429830f: Launch DiscussionTools topic subscriptions a/b test (T304029) (duration: 03m 16s)
[13:25:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:44] <stashbot>	 T304029: Make config change to start Topic Subscriptions A/B Test - https://phabricator.wikimedia.org/T304029
[13:26:30] <wikibugs>	 (CR) CI reject: [V: -1] wmflib: add new resource::capitalise function [puppet] - https://gerrit.wikimedia.org/r/802504 (owner: Jbond)
[13:26:32] <urbanecm>	 MatmaRex: synced!
[13:26:37] <urbanecm>	 anything else, anyone?
[13:26:53] <MatmaRex>	 thanks urbanecm
[13:27:16] <wikibugs>	 (PS4) Jbond: wmflib: add new resource::capitalise function [puppet] - https://gerrit.wikimedia.org/r/802504
[13:27:25] <wikibugs>	 (CR) David Caro: Create REST api service to manage toolforge replica.my.cnf (4 comments) [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[13:29:16] <urbanecm>	 i guess not
[13:29:28] <urbanecm>	 !log UTC afternoon B&C window done
[13:29:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:29:52] <urbanecm>	 jnuche: hashar: i'm done with deployments now :)
[13:29:57] <wikibugs>	 (PS3) JMeybohm: Fix CI not failing on "helm template" errors [deployment-charts] - https://gerrit.wikimedia.org/r/802137
[13:30:29] <jnuche>	 urbanecm: 👍
[13:30:42] <wikibugs>	 (CR) CI reject: [V: -1] Fix CI not failing on "helm template" errors [deployment-charts] - https://gerrit.wikimedia.org/r/802137 (owner: JMeybohm)
[13:31:40] <wikibugs>	 (CR) CI reject: [V: -1] wmflib: add new resource::capitalise function [puppet] - https://gerrit.wikimedia.org/r/802504 (owner: Jbond)
[13:32:40] <wikibugs>	 (PS5) Jbond: wmflib: add new resource::capitalise function [puppet] - https://gerrit.wikimedia.org/r/802504
[13:34:18] <logmsgbot>	 !log hashar@deploy1002 Started deploy [integration/docroot@b55f30e]: build: Updating eslint-config-wikimedia to 0.22.1
[13:34:19] <wikibugs>	 SRE, serviceops: Migrate node-based services in production to node14 - https://phabricator.wikimedia.org/T306995 (Jdforrester-WMF)
[13:34:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:34:22] <wikibugs>	 (CR) JMeybohm: "https://integration.wikimedia.org/ci/job/helm-lint/7458/console" [deployment-charts] - https://gerrit.wikimedia.org/r/802137 (owner: JMeybohm)
[13:34:27] <logmsgbot>	 !log hashar@deploy1002 Finished deploy [integration/docroot@b55f30e]: build: Updating eslint-config-wikimedia to 0.22.1 (duration: 00m 08s)
[13:34:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:37:03] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] wmflib: add new resource::capitalise function [puppet] - 10https://gerrit.wikimedia.org/r/802504 (owner: 10Jbond)
[13:37:22] <wikibugs>	 (03PS4) 10JMeybohm: Fix CI not failing on "helm template" errors [deployment-charts] - 10https://gerrit.wikimedia.org/r/802137
[13:37:25] <wikibugs>	 (03PS1) 10JMeybohm: Update outdated developer-portal fixture [deployment-charts] - 10https://gerrit.wikimedia.org/r/802526
[13:39:09] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Persistence-Backup: Q4:(Need By: TBD) rack/setup/install backup1009.eqiad.wmnet - https://phabricator.wikimedia.org/T307048 (10Jclark-ctr) a:03Cmjohnson
[13:39:13] <wikibugs>	 (03PS6) 10Jbond: wmflib: add new resource::capitalise function [puppet] - 10https://gerrit.wikimedia.org/r/802504
[13:41:38] <wikibugs>	 (03PS1) 10Andrew Bogott: wmcs-image-create: fix unzipping of .xz files [puppet] - 10https://gerrit.wikimedia.org/r/802527
[13:43:05] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (bad URL) is CRITICAL: Test bad URL returned the unexpected status 503 (expecting: 404) https://wikitech.wikimedia.org/wiki/Citoid
[13:43:29] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 15): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35681/console" [puppet] - 10https://gerrit.wikimedia.org/r/802504 (owner: 10Jbond)
[13:44:08] <urandom>	 !log ALTER-ing system_auth replication strategy, AQS Cassandra cluster -- T307641
[13:44:09] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] wmflib: add new resource::capitalise function [puppet] - 10https://gerrit.wikimedia.org/r/802504 (owner: 10Jbond)
[13:44:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:44:13] <stashbot>	 T307641: AQS multi-datacenter cluster expansion - https://phabricator.wikimedia.org/T307641
[13:45:19] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[13:46:00] <wikibugs>	 (03PS9) 10Slyngshede: Rewrite logster::job to use systemd timers. [puppet] - 10https://gerrit.wikimedia.org/r/790325 (https://phabricator.wikimedia.org/T273673)
[13:46:57] <wikibugs>	 (03CR) 10Slyngshede: Rewrite logster::job to use systemd timers. (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/790325 (https://phabricator.wikimedia.org/T273673) (owner: 10Slyngshede)
[13:47:15] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] Update outdated developer-portal fixture [deployment-charts] - 10https://gerrit.wikimedia.org/r/802526 (owner: 10JMeybohm)
[13:50:17] <wikibugs>	 (03CR) 10Hnowlan: service: image-suggestion state to monitoring_setup (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/799358 (https://phabricator.wikimedia.org/T304891) (owner: 10Hnowlan)
[13:50:36] <wikibugs>	 (03Merged) 10jenkins-bot: Update outdated developer-portal fixture [deployment-charts] - 10https://gerrit.wikimedia.org/r/802526 (owner: 10JMeybohm)
[13:52:54] <wikibugs>	 (03CR) 10David Caro: [C: 03+1] wmcs-image-create: fix unzipping of .xz files [puppet] - 10https://gerrit.wikimedia.org/r/802527 (owner: 10Andrew Bogott)
[13:53:32] <wikibugs>	 (03CR) 10Herron: [C: 03+1] mx: enable tainted data checking [puppet] - 10https://gerrit.wikimedia.org/r/801799 (https://phabricator.wikimedia.org/T286911) (owner: 10JHathaway)
[13:54:06] <wikibugs>	 (03PS7) 10Jbond: wmflib: add new resource::capitalise function [puppet] - 10https://gerrit.wikimedia.org/r/802504
[13:56:19] <wikibugs>	 (03PS1) 10David Caro: ceph: filter out also dbgsym packages [puppet] - 10https://gerrit.wikimedia.org/r/802531
[13:57:20] <logmsgbot>	 !log joal@deploy1002 Started deploy [airflow-dags/analytics@2ad442e]: (no justification provided)
[13:57:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:28] <logmsgbot>	 !log joal@deploy1002 Finished deploy [airflow-dags/analytics@2ad442e]: (no justification provided) (duration: 00m 08s)
[13:57:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:59:21] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] wmcs-image-create: fix unzipping of .xz files [puppet] - 10https://gerrit.wikimedia.org/r/802527 (owner: 10Andrew Bogott)
[14:00:08] <wikibugs>	 (03PS2) 10Jbond: wmflib: add resource reduce function [puppet] - 10https://gerrit.wikimedia.org/r/802505
[14:01:27] <wikibugs>	 10SRE, 10Data-Catalog, 10Data-Engineering, 10serviceops, and 2 others: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (10JMeybohm) >>! In T303049#7898695, @JMeybohm wrote: > I finally managed to verify and document the steps needed to put a service under Ingress. I did also update...
[14:04:17] <wikibugs>	 (03CR) 10Cathal Mooney: Add BGP configuration for the new ML staging codfw cluster (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/802072 (https://phabricator.wikimedia.org/T302198) (owner: 10Elukey)
[14:04:39] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] wmflib: add resource reduce function [puppet] - 10https://gerrit.wikimedia.org/r/802505 (owner: 10Jbond)
[14:07:22] <wikibugs>	 (03CR) 10David Caro: "Example of dbgsym package:" [puppet] - 10https://gerrit.wikimedia.org/r/802531 (owner: 10David Caro)
[14:11:02] <wikibugs>	 (03PS2) 10David Caro: ceph: filter out also dbgsym packages [puppet] - 10https://gerrit.wikimedia.org/r/802531 (https://phabricator.wikimedia.org/T309786)
[14:14:28] <logmsgbot>	 !log herron@cumin1001 START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
[14:14:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:16:19] <wikibugs>	 10SRE, 10Thumbor, 10Thumbor Migration, 10serviceops: Migrate thumbor to Kubernetes - https://phabricator.wikimedia.org/T233196 (10hnowlan)
[14:22:37] <icinga-wm>	 PROBLEM - SSH on wtp1038.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:23:33] <wikibugs>	 (03PS2) 10Jcrespo: mediabackups: Add test units for the Util helper unit [software/mediabackups] - 10https://gerrit.wikimedia.org/r/802501 (https://phabricator.wikimedia.org/T262668)
[14:24:22] <wikibugs>	 (03CR) 10AOkoth: [C: 03+2] vrts: rename module files and classes [puppet] - 10https://gerrit.wikimedia.org/r/776237 (https://phabricator.wikimedia.org/T293942) (owner: 10AOkoth)
[14:24:40] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mediabackups: Add test units for the Util helper unit [software/mediabackups] - 10https://gerrit.wikimedia.org/r/802501 (https://phabricator.wikimedia.org/T262668) (owner: 10Jcrespo)
[14:26:26] <wikibugs>	 10SRE, 10Thumbor, 10Thumbor Migration, 10serviceops: Migrate thumbor to Kubernetes - https://phabricator.wikimedia.org/T233196 (10hnowlan)
[14:29:51] <wikibugs>	 (03PS3) 10Hnowlan: restbase-dev: change role of new hosts [puppet] - 10https://gerrit.wikimedia.org/r/766082 (https://phabricator.wikimedia.org/T295375)
[14:36:12] <wikibugs>	 (03PS2) 10David Caro: network.tests:Use correct object for site [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/802158
[14:38:23] <wikibugs>	 (03PS3) 10Jbond: wmflib: add resource reduce function [puppet] - 10https://gerrit.wikimedia.org/r/802505
[14:40:13] <wikibugs>	 (03PS8) 10Jbond: wmflib: add new resource::capitalise function [puppet] - 10https://gerrit.wikimedia.org/r/802504
[14:40:33] <wikibugs>	 (03PS4) 10Jbond: wmflib: add resource reduce function [puppet] - 10https://gerrit.wikimedia.org/r/802505
[14:41:42] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] network.tests:Use correct object for site [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/802158 (owner: 10David Caro)
[14:42:01] <wikibugs>	 (03PS1) 10AOkoth: vrts: fix apache error when running puppet [puppet] - 10https://gerrit.wikimedia.org/r/802538 (https://phabricator.wikimedia.org/T309788)
[14:43:15] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] Fix CI not failing on "helm template" errors [deployment-charts] - 10https://gerrit.wikimedia.org/r/802137 (owner: 10JMeybohm)
[14:45:16] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] vrts: fix apache error when running puppet [puppet] - 10https://gerrit.wikimedia.org/r/802538 (https://phabricator.wikimedia.org/T309788) (owner: 10AOkoth)
[14:45:57] <wikibugs>	 10SRE, 10Thumbor, 10Thumbor Migration, 10serviceops: Migrate thumbor to Kubernetes - https://phabricator.wikimedia.org/T233196 (10MoritzMuehlenhoff) As for "Thumbor currently runs in firejail, do we lose anything by dropping it in k8s", that's fine. firejail was our workaround for the original service abst...
[14:46:41] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] Fix CI not failing on "helm template" errors [deployment-charts] - 10https://gerrit.wikimedia.org/r/802137 (owner: 10JMeybohm)
[14:47:03] <wikibugs>	 (03CR) 10AOkoth: [C: 03+2] vrts: fix apache error when running puppet [puppet] - 10https://gerrit.wikimedia.org/r/802538 (https://phabricator.wikimedia.org/T309788) (owner: 10AOkoth)
[14:47:29] <wikibugs>	 (03Merged) 10jenkins-bot: network.tests:Use correct object for site [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/802158 (owner: 10David Caro)
[14:48:03] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+1] "LGTM, you can deploy it in a backport window." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/802443 (https://phabricator.wikimedia.org/T309686) (owner: 10Physikerwelt)
[14:48:49] <wikibugs>	 (03PS5) 10Jbond: wmflib: add resource reduce function [puppet] - 10https://gerrit.wikimedia.org/r/802505
[14:49:15] <icinga-wm>	 RECOVERY - Memcached on idp2002 is OK: TCP OK - 0.033 second response time on 208.80.153.108 port 11000 https://wikitech.wikimedia.org/wiki/Memcached
[14:49:48] <wikibugs>	 (03Merged) 10jenkins-bot: Fix CI not failing on "helm template" errors [deployment-charts] - 10https://gerrit.wikimedia.org/r/802137 (owner: 10JMeybohm)
[14:51:16] <wikibugs>	 (03CR) 10JMeybohm: "recheck (sorry for using you as guinea pig 😊)" [deployment-charts] - 10https://gerrit.wikimedia.org/r/801663 (https://phabricator.wikimedia.org/T306963) (owner: 10KartikMistry)
[14:52:54] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 18): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35683/console" [puppet] - 10https://gerrit.wikimedia.org/r/802504 (owner: 10Jbond)
[14:53:01] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+2] wmflib: add new resource::capitalise function [puppet] - 10https://gerrit.wikimedia.org/r/802504 (owner: 10Jbond)
[14:53:18] <wikibugs>	 (03CR) 10Muehlenhoff: P::aptrepo::wikimedia install Apache for private repo. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/802445 (owner: 10Slyngshede)
[14:53:41] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] wmflib: add resource reduce function [puppet] - 10https://gerrit.wikimedia.org/r/802505 (owner: 10Jbond)
[14:56:29] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[14:56:31] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[14:56:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:57:37] <wikibugs>	 (03CR) 10KartikMistry: Update cxserver to  2022-05-31-123738-production (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/801663 (https://phabricator.wikimedia.org/T306963) (owner: 10KartikMistry)
[14:59:23] <wikibugs>	 (03PS1) 10Jbond: P:sretest: Test out new wmflib::resource::reduce function [puppet] - 10https://gerrit.wikimedia.org/r/802540
[14:59:51] <logmsgbot>	 !log bking@cumin1001 START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: restart to enable S3 plugin - bking@cumin1001 - T309720
[14:59:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:59:55] <stashbot>	 T309720: Deploy S3 plugin on all Search team-managed Elastic hosts - https://phabricator.wikimedia.org/T309720
[15:02:12] <wikibugs>	 (03PS2) 10Jbond: P:sretest: Test out new wmflib::resource::reduce function [puppet] - 10https://gerrit.wikimedia.org/r/802540
[15:05:51] <wikibugs>	 (03PS1) 10Muehlenhoff: Failover idp.w.o to idp1002 (new Bullseye node) [dns] - 10https://gerrit.wikimedia.org/r/802541 (https://phabricator.wikimedia.org/T308214)
[15:06:03] <wikibugs>	 (03PS7) 10Vgutierrez: [WIP] esitest service [puppet] - 10https://gerrit.wikimedia.org/r/793561 (https://phabricator.wikimedia.org/T308799) (owner: 10BBlack)
[15:06:07] <wikibugs>	 (03PS1) 10Muehlenhoff: Failover active IDP nodes to idp1002/idp2002 [puppet] - 10https://gerrit.wikimedia.org/r/802542 (https://phabricator.wikimedia.org/T308214)
[15:06:14] <jelto>	 !log start migration to gitlab1004 - T307142
[15:06:14] <wikibugs>	 (03PS3) 10Jbond: P:sretest: Test out new wmflib::resource::reduce function [puppet] - 10https://gerrit.wikimedia.org/r/802540
[15:06:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:06:18] <stashbot>	 T307142: bring new gitlab hardware servers into production - https://phabricator.wikimedia.org/T307142
[15:07:14] <wikibugs>	 (03PS1) 10Muehlenhoff: Update spec file to use new bullseye nodes [puppet] - 10https://gerrit.wikimedia.org/r/802543
[15:10:15] <wikibugs>	 (03PS4) 10Jbond: P:sretest: Test out new wmflib::resource::reduce function [puppet] - 10https://gerrit.wikimedia.org/r/802540
[15:11:31] <icinga-wm>	 PROBLEM - Gitlab SSH healthcheck git daemon on gitlab.wikimedia.org is CRITICAL: connect to address gitlab.wikimedia.org and port 22: Connection refused https://wikitech.wikimedia.org/wiki/GitLab%23Monitoring
[15:11:37] <icinga-wm>	 PROBLEM - Gitlab HTTPS healthcheck on gitlab.wikimedia.org is CRITICAL: connect to address gitlab.wikimedia.org and port 443: Connection refused https://wikitech.wikimedia.org/wiki/GitLab%23Monitoring
[15:11:44] <jelto>	 ^ expected due to T307142
[15:11:45] <stashbot>	 T307142: bring new gitlab hardware servers into production - https://phabricator.wikimedia.org/T307142
[15:12:11] <wikibugs>	 (03PS5) 10Jbond: P:sretest: Test out new wmflib::resource::reduce function [puppet] - 10https://gerrit.wikimedia.org/r/802540
[15:12:59] <mutante>	 !log gitlab migration to new hardware in progress
[15:13:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:14:55] <logmsgbot>	 !log herron@cumin1001 END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
[15:14:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:15:29] <wikibugs>	 (03PS6) 10Jbond: P:sretest: Test out new wmflib::resource::reduce function [puppet] - 10https://gerrit.wikimedia.org/r/802540
[15:15:57] <moritzm>	 !log installing openssl security updates on stretch
[15:15:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:16:30] <wikibugs>	 (03PS7) 10Jbond: P:sretest: Test out new wmflib::resource::reduce function [puppet] - 10https://gerrit.wikimedia.org/r/802540
[15:16:47] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "Adding marostegui for their awareness regarding m5 starting to be used." [deployment-charts] - 10https://gerrit.wikimedia.org/r/801663 (https://phabricator.wikimedia.org/T306963) (owner: 10KartikMistry)
[15:16:51] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] Update cxserver to  2022-05-31-123738-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/801663 (https://phabricator.wikimedia.org/T306963) (owner: 10KartikMistry)
[15:17:03] <icinga-wm>	 PROBLEM - Gitlab HTTPS SSL Expiry on gitlab.wikimedia.org is CRITICAL: connect to address gitlab.wikimedia.org and port 443: Connection refused https://wikitech.wikimedia.org/wiki/GitLab%23Monitoring
[15:17:12] <wikibugs>	 (03PS8) 10Jbond: P:sretest: Test out new wmflib::resource::reduce function [puppet] - 10https://gerrit.wikimedia.org/r/802540
[15:18:18] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35690/console" [puppet] - 10https://gerrit.wikimedia.org/r/802540 (owner: 10Jbond)
[15:21:40] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+2] P:sretest: Test out new wmflib::resource::reduce function [puppet] - 10https://gerrit.wikimedia.org/r/802540 (owner: 10Jbond)
[15:22:43] <icinga-wm>	 RECOVERY - SSH on wtp1038.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:23:31] <moritzm>	 !log installing cups security updates (client-side libs only)
[15:23:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:24:07] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[15:33:16] <wikibugs>	 (03PS1) 10Jbond: wmflib: Add debugging [puppet] - 10https://gerrit.wikimedia.org/r/802548
[15:34:02] <wikibugs>	 (03PS2) 10Jbond: wmflib: Add debugging [puppet] - 10https://gerrit.wikimedia.org/r/802548
[15:34:47] <wikibugs>	 (03CR) 10Jelto: [V: 03+1 C: 03+2] gitlab: make gitlab1004 new production instance [puppet] - 10https://gerrit.wikimedia.org/r/802150 (https://phabricator.wikimedia.org/T307142) (owner: 10Jelto)
[15:34:54] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35692/console" [puppet] - 10https://gerrit.wikimedia.org/r/802548 (owner: 10Jbond)
[15:39:04] <wikibugs>	 (03PS3) 10Jbond: wmflib: Add debugging [puppet] - 10https://gerrit.wikimedia.org/r/802548
[15:40:26] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35693/console" [puppet] - 10https://gerrit.wikimedia.org/r/802548 (owner: 10Jbond)
[15:41:52] <wikibugs>	 (03PS4) 10Jbond: wmflib: Add debugging [puppet] - 10https://gerrit.wikimedia.org/r/802548
[15:42:03] <wikibugs>	 10SRE, 10Release-Engineering-Team, 10Scap, 10serviceops, 10Patch-For-Review: Deploy Scap version 4.8.2 - https://phabricator.wikimedia.org/T309116 (10dancy) 05Stalled→03Open
[15:42:30] <wikibugs>	 10SRE, 10Release-Engineering-Team, 10Scap, 10serviceops, 10Patch-For-Review: Deploy Scap version 4.8.2 - https://phabricator.wikimedia.org/T309116 (10dancy) Fixed at tag 4.8.2.  >>! In T309116#7975904, @JMeybohm wrote: > Probably missing dependencies: > ` > mwdebug1002:~$ scap pull > Traceback (most rece...
[15:45:05] <wikibugs>	 (03CR) 10Jelto: [C: 03+2] wikimedia.org: make gitlab1004 the new gitlab production host [dns] - 10https://gerrit.wikimedia.org/r/802473 (https://phabricator.wikimedia.org/T307142) (owner: 10Jelto)
[15:45:08] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] wikimedia.org: make gitlab1004 the new gitlab production host [dns] - 10https://gerrit.wikimedia.org/r/802473 (https://phabricator.wikimedia.org/T307142) (owner: 10Jelto)
[15:46:40] <wikibugs>	 (03CR) 10Herron: "LGTM overall, please see a few comments inline" [puppet] - 10https://gerrit.wikimedia.org/r/802040 (owner: 10David Caro)
[15:47:07] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] wmflib: Add debugging [puppet] - 10https://gerrit.wikimedia.org/r/802548 (owner: 10Jbond)
[15:49:00] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[15:49:02] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[15:49:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:49:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:49:49] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[15:49:51] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[15:49:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:49:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:07] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[15:50:09] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[15:50:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:33] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[15:50:41] <wikibugs>	 10SRE, 10Wikimedia-Site-requests, 10Performance-Team (Radar), 10Russian-Sites: Increase $wgMaxArticleSize to 4MB for ruwikisource - https://phabricator.wikimedia.org/T308893 (10MoritzMuehlenhoff) p:05Triage→03Medium
[15:50:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P29349 and previous config saved to /var/cache/conftool/dbconfig/20220602-155046-ladsgroup.json
[15:50:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:53:24] <wikibugs>	 10SRE, 10Infrastructure-Foundations: Integrate Buster 10.12 point update - https://phabricator.wikimedia.org/T304546 (10MoritzMuehlenhoff)
[15:56:41] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298560)', diff saved to https://phabricator.wikimedia.org/P29350 and previous config saved to /var/cache/conftool/dbconfig/20220602-155640-ladsgroup.json
[15:56:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:56:45] <stashbot>	 T298560: Fix mismatching field type of revision.rev_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298560
[15:57:35] <icinga-wm>	 RECOVERY - Gitlab HTTPS SSL Expiry on gitlab.wikimedia.org is OK: OK - Certificate gitlab.wikimedia.org will expire on Sun 14 Aug 2022 09:25:34 AM GMT +0000. https://wikitech.wikimedia.org/wiki/GitLab%23Monitoring
[15:57:50] <mutante>	 ^ yay. 
[15:57:59] <icinga-wm>	 RECOVERY - Gitlab SSH healthcheck git daemon on gitlab.wikimedia.org is OK: SSH OK - OpenSSH_8.4p1 Debian-5 (protocol 2.0) https://wikitech.wikimedia.org/wiki/GitLab%23Monitoring
[15:58:01] <icinga-wm>	 RECOVERY - Gitlab HTTPS healthcheck on gitlab.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 121722 bytes in 0.902 second response time https://wikitech.wikimedia.org/wiki/GitLab%23Monitoring
[16:00:05] <jouncebot>	 jbond and rzl: How many deployers does it take to do Puppet request window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220602T1600).
[16:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[16:00:33] <mutante>	 cloudsw* related DNS changes are currently in unmerged state
[16:01:21] <wikibugs>	 (03PS1) 10Jbond: wmflib: Test hack to deduplicate resources [puppet] - 10https://gerrit.wikimedia.org/r/802552
[16:01:56] <mutante>	 gitlab just switched to dedicated hardware and is back up
[16:02:08] <wikibugs>	 (03PS2) 10Jbond: wmflib: Test hack to deduplicate resources [puppet] - 10https://gerrit.wikimedia.org/r/802552
[16:02:46] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35695/console" [puppet] - 10https://gerrit.wikimedia.org/r/802552 (owner: 10Jbond)
[16:04:01] <jinxer-wm>	 (NELHigh) firing: Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh
[16:04:18] <Amir1>	 hi
[16:04:24] <akosiaris>	 here
[16:04:57] <jhathaway>	 here as well
[16:05:22] <sukhe>	 here
[16:05:25] <topranks>	 yep
[16:05:33] <sukhe>	 has someone ACKed it?
[16:05:40] <jhathaway>	 I acked it
[16:05:41] <jbond>	 looks like a blip that has already cleared
[16:05:46] <sukhe>	 thanks jbond 
[16:05:49] <sukhe>	 and jhathaway 
[16:05:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P29351 and previous config saved to /var/cache/conftool/dbconfig/20220602-160550-ladsgroup.json
[16:05:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:05:56] <jbond>	 tcp.timed_out
[16:06:33] <topranks>	 doubled but seems to have dropped again?
[16:07:04] <topranks>	 more than doubled
[16:07:32] <jbond>	 topranks: went from about ~15 to 125 but has normalised
[16:07:55] <topranks>	 yeah
[16:08:09] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+2] wmflib: Test hack to deduplicate resources [puppet] - 10https://gerrit.wikimedia.org/r/802552 (owner: 10Jbond)
[16:08:14] <mutante>	 we are still in gitlab migration but maintenance window ends now
[16:08:16] <vgutierrez>	 traffic dropped in esams https://grafana.wikimedia.org/d/000000479/frontend-traffic?orgId=1&viewPanel=2&var-site=All&var-cache_type=text&var-status_type=1&var-status_type=2&var-status_type=3&var-status_type=4
[16:09:01] <jinxer-wm>	 (NELHigh) resolved: Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh
[16:10:41] <akosiaris>	 I've noticed some increases in RTTs in my smokeping graphs to bastion hosts too. They are normalizing again now.
[16:10:50] <topranks>	 a whole bunch of tls.cert_name_invalid's around the time it peaked
[16:10:53] <topranks>	 hmm of
[16:11:06] <vgutierrez>	 topranks: wut? /o\
[16:11:08] <jbond>	 there was a small peak from Russia as well
[16:11:29] <jhathaway>	 https://intake-analytics.wikimedia.org/v1/events?hasty=true 3,332 in the last 30mins
[16:11:31] <akosiaris>	 interestingly, even 1.1.1.1  RTTs quadrupled
[16:11:32] <jbond>	 censorship test??? sukhe 
[16:11:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P29352 and previous config saved to /var/cache/conftool/dbconfig/20220602-161145-ladsgroup.json
[16:11:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:12:30] <sukhe>	 jbond: yeah, possibly. at least the RU spike and esams traffic drop...
[16:12:52] <sukhe>	 hard to say though without more substantive data, and also the recovery
[16:13:19] <jbond>	 ack
[16:14:07] <wikibugs>	 (03PS2) 10Dzahn: backup: switch fileset for gitlab from /mnt to /srv [puppet] - 10https://gerrit.wikimedia.org/r/800357 (https://phabricator.wikimedia.org/T274463)
[16:14:58] <wikibugs>	 (03PS1) 10Jbond: sretest: stop realising resources whil we fix up names [puppet] - 10https://gerrit.wikimedia.org/r/802557
[16:15:28] <logmsgbot>	 !log bking@cumin1001 END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: restart to enable S3 plugin - bking@cumin1001 - T309720
[16:15:32] <wikibugs>	 (03CR) 10Jbond: [V: 03+2 C: 03+2] sretest: stop realising resources whil we fix up names [puppet] - 10https://gerrit.wikimedia.org/r/802557 (owner: 10Jbond)
[16:15:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:15:33] <stashbot>	 T309720: Deploy S3 plugin on all Search team-managed Elastic hosts - https://phabricator.wikimedia.org/T309720
[16:17:10] <wikibugs>	 (03PS1) 10Jbond: sretest: test reduce function [puppet] - 10https://gerrit.wikimedia.org/r/802558
[16:19:00] <wikibugs>	 (03CR) 10Jelto: [C: 03+2] backup: switch fileset for gitlab from /mnt to /srv [puppet] - 10https://gerrit.wikimedia.org/r/800357 (https://phabricator.wikimedia.org/T274463) (owner: 10Dzahn)
[16:19:14] <logmsgbot>	 !log herron@cumin1001 START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
[16:19:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:19:53] <icinga-wm>	 PROBLEM - SSH on wtp1036.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:20:23] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] sretest: test reduce function [puppet] - 10https://gerrit.wikimedia.org/r/802558 (owner: 10Jbond)
[16:20:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P29353 and previous config saved to /var/cache/conftool/dbconfig/20220602-162053-ladsgroup.json
[16:20:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:24:05] <icinga-wm>	 RECOVERY - SSH on wtp1040.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:24:06] <wikibugs>	 (03Abandoned) 10Dzahn: gitlab::dump: backup files on gitlab1004 in Bacula [puppet] - 10https://gerrit.wikimedia.org/r/800358 (https://phabricator.wikimedia.org/T274463) (owner: 10Dzahn)
[16:26:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P29354 and previous config saved to /var/cache/conftool/dbconfig/20220602-162653-ladsgroup.json
[16:26:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:30:01] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance cloudelastic1004-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[16:33:33] <logmsgbot>	 !log bking@cumin1001 START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: restart to enable S3 plugin - bking@cumin1001 - T309720
[16:33:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:33:38] <stashbot>	 T309720: Deploy S3 plugin on all Search team-managed Elastic hosts - https://phabricator.wikimedia.org/T309720
[16:38:51] <icinga-wm>	 PROBLEM - SSH on wtp1039.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:39:36] <wikibugs>	 (03PS1) 10Zabe: raid: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/802564 (https://phabricator.wikimedia.org/T308013)
[16:39:38] <wikibugs>	 (03PS1) 10Zabe: rabbitmq: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/802565 (https://phabricator.wikimedia.org/T308013)
[16:39:40] <wikibugs>	 (03PS1) 10Zabe: query_service: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/802566 (https://phabricator.wikimedia.org/T308013)
[16:39:44] <wikibugs>	 (03PS1) 10Zabe: puppet_stastd: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/802567 (https://phabricator.wikimedia.org/T308013)
[16:39:46] <wikibugs>	 (03PS1) 10Zabe: presto: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/802568 (https://phabricator.wikimedia.org/T308013)
[16:39:48] <wikibugs>	 (03PS1) 10Zabe: poolcounter: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/802569 (https://phabricator.wikimedia.org/T308013)
[16:39:50] <wikibugs>	 (03PS1) 10Zabe: pontoon: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/802570 (https://phabricator.wikimedia.org/T308013)
[16:39:52] <wikibugs>	 (03PS1) 10Zabe: nftables: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/802571 (https://phabricator.wikimedia.org/T308013)
[16:39:54] <wikibugs>	 (03PS1) 10Zabe: netops: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/802572 (https://phabricator.wikimedia.org/T308013)
[16:41:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298560)', diff saved to https://phabricator.wikimedia.org/P29355 and previous config saved to /var/cache/conftool/dbconfig/20220602-164158-ladsgroup.json
[16:42:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:42:03] <stashbot>	 T298560: Fix mismatching field type of revision.rev_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298560
[16:42:04] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
[16:42:05] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
[16:42:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:42:06] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
[16:42:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:42:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:42:15] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
[16:42:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:43:15] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "thanks! https://puppet-compiler.wmflabs.org/pcc-worker1003/35696/" [puppet] - 10https://gerrit.wikimedia.org/r/791673 (owner: 10Dzahn)
[16:43:29] <mutante>	 deleting expired globalsign certs
[16:44:56] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "btw, there are also keys in [puppetmaster1001:/srv] $ find . | grep globalsign" [puppet] - 10https://gerrit.wikimedia.org/r/791673 (owner: 10Dzahn)
[16:47:14] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops: Recycling Pickup for EQIAD - https://phabricator.wikimedia.org/T307140 (10wiki_willy) Thanks @Cmjohnson, I've emailed the list to Sipi for a quote.  Once we receive that, I'll create a Coupa request, then we can schedule the pickup.  After the vendor picks up all the equipment a...
[16:47:35] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "thanks! https://puppet-compiler.wmflabs.org/pcc-worker1003/35697/" [puppet] - 10https://gerrit.wikimedia.org/r/791678 (owner: 10Dzahn)
[16:47:41] <wikibugs>	 (03PS2) 10Dzahn: delete expired digicert certs [puppet] - 10https://gerrit.wikimedia.org/r/791678
[16:47:55] <mutante>	 !log deleting expired globalsign and digicert TLS certificates
[16:47:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:48:09] <jinxer-wm>	 (MXQueueHigh) firing: MX host mx1001:9100 has many queued messages: 4048 #page - https://wikitech.wikimedia.org/wiki/Exim - https://grafana.wikimedia.org/d/000000451/mail - https://alerts.wikimedia.org/?q=alertname%3DMXQueueHigh
[16:48:30] <Amir1>	 sigh
[16:48:44] <TheresNoTime>	 oof
[16:49:00] <herron>	 well, the new mx queue alert works
[16:49:12] <sukhe>	 hello
[16:49:13] <mutante>	 all the same 2 addresses it seems
[16:49:34] <mutante>	 no-reply@phabricator   mass action?
[16:49:42] <jhathaway>	 here as well
[16:49:55] <sukhe>	 I have ACKed it 
[16:50:02] <jhathaway>	 sukhe: thanks
[16:51:25] <sukhe>	 taking to private
[16:51:48] <wikibugs>	 (03PS10) 10David Caro: wmcs: Added taskircmail, ircmail and pagetaskircmail routings [puppet] - 10https://gerrit.wikimedia.org/r/802040
[16:51:50] <wikibugs>	 (03CR) 10David Caro: wmcs: Added taskircmail, ircmail and pagetaskircmail routings (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/802040 (owner: 10David Caro)
[16:55:13] <wikibugs>	 (03PS7) 10Dduvall: docker_registry_ha: Authorize GitLab trusted runners using JWT [puppet] - 10https://gerrit.wikimedia.org/r/793875 (https://phabricator.wikimedia.org/T308501)
[16:56:11] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] docker_registry_ha: Authorize GitLab trusted runners using JWT [puppet] - 10https://gerrit.wikimedia.org/r/793875 (https://phabricator.wikimedia.org/T308501) (owner: 10Dduvall)
[16:58:09] <jinxer-wm>	 (MXQueueHigh) resolved: MX host mx1001:9100 has many queued messages: 4052 #page - https://wikitech.wikimedia.org/wiki/Exim - https://grafana.wikimedia.org/d/000000451/mail - https://alerts.wikimedia.org/?q=alertname%3DMXQueueHigh
[16:58:31] <wikibugs>	 (03PS8) 10Dduvall: docker_registry_ha: Authorize GitLab trusted runners using JWT [puppet] - 10https://gerrit.wikimedia.org/r/793875 (https://phabricator.wikimedia.org/T308501)
[16:59:38] <mutante>	 !log mx1001 - deleted certain mails from the mail queue - reacting to mx alert
[16:59:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:01:21] <wikibugs>	 (03CR) 10Krinkle: [C: 04-1] GrowthExperiments: Remove GEHomepageSuggestedEditsTopicsRequiresOptIn (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/791303 (https://phabricator.wikimedia.org/T308209) (owner: 10Kosta Harlan)
[17:01:26] <wikibugs>	 (03PS2) 10Krinkle: GrowthExperiments: Remove unused GEHomepageSuggestedEditsRequiresOptIn [mediawiki-config] - 10https://gerrit.wikimedia.org/r/791302 (https://phabricator.wikimedia.org/T308208) (owner: 10Kosta Harlan)
[17:01:30] <wikibugs>	 (03PS2) 10Krinkle: GrowthExperiments: Remove GEHomepageSuggestedEditsTopicsRequiresOptIn [mediawiki-config] - 10https://gerrit.wikimedia.org/r/791303 (https://phabricator.wikimedia.org/T308209) (owner: 10Kosta Harlan)
[17:01:32] <wikibugs>	 (03CR) 10Krinkle: [C: 03+1] GrowthExperiments: Remove GEHomepageSuggestedEditsTopicsRequiresOptIn [mediawiki-config] - 10https://gerrit.wikimedia.org/r/791303 (https://phabricator.wikimedia.org/T308209) (owner: 10Kosta Harlan)
[17:04:33] <wikibugs>	 (03CR) 10Herron: wmcs: Added taskircmail, ircmail and pagetaskircmail routings (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/802040 (owner: 10David Caro)
[17:09:27] <cwhite>	 !log restart logstash on apifeatureusage hosts
[17:09:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:11:24] <cwhite>	 !log rolling restart of codfw logstash cluster
[17:11:26] <wikibugs>	 (03PS1) 10Mabualruz: Remove 6 deprecated ResourceLoader skin modules in core [mediawiki-config] - 10https://gerrit.wikimedia.org/r/802578 (https://phabricator.wikimedia.org/T304322)
[17:11:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:11:54] <wikibugs>	 10SRE, 10ops-eqsin: Decommission cr1-eqsin - https://phabricator.wikimedia.org/T256947 (10RobH) 05Stalled→03Resolved a:03RobH we have decom equipment in the rack there, but we can remove this from open tasks.  It'll stay in netbox until it goes away, but this task can be closed imo.
[17:12:07] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[17:12:32] <wikibugs>	 (03CR) 10Dzahn: "worth switching it if ldap-corp goes away anyways? Does it still go away?" [puppet] - 10https://gerrit.wikimedia.org/r/791677 (owner: 10Dzahn)
[17:14:43] <icinga-wm>	 PROBLEM - SSH on wtp1048.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:14:53] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[17:15:07] <mutante>	 ^ the uncommitted changes are "cloudsw"
[17:15:23] <mutante>	 not sure if it means netops or wmcs but one of those
[17:18:12] <wikibugs>	 (03PS1) 10Dzahn: sre: update renamed otrs role to vrts [puppet] - 10https://gerrit.wikimedia.org/r/802579 (https://phabricator.wikimedia.org/T293942)
[17:19:38] <logmsgbot>	 !log herron@cumin1001 END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
[17:19:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:24] <wikibugs>	 (03CR) 10Ahmon Dancy: docker_registry_ha: Authorize GitLab trusted runners using JWT (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/793875 (https://phabricator.wikimedia.org/T308501) (owner: 10Dduvall)
[17:21:30] <wikibugs>	 (03PS1) 10Dzahn: vrts: adjust tests files to renamed role class [puppet] - 10https://gerrit.wikimedia.org/r/802580 (https://phabricator.wikimedia.org/T293942)
[17:23:26] <wikibugs>	 (03CR) 10Dduvall: docker_registry_ha: Authorize GitLab trusted runners using JWT (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/793875 (https://phabricator.wikimedia.org/T308501) (owner: 10Dduvall)
[17:25:36] <wikibugs>	 (03CR) 10Ahmon Dancy: docker_registry_ha: Authorize GitLab trusted runners using JWT (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/793875 (https://phabricator.wikimedia.org/T308501) (owner: 10Dduvall)
[17:26:33] <wikibugs>	 (03CR) 10Ahmon Dancy: [C: 03+1] docker_registry_ha: Authorize GitLab trusted runners using JWT [puppet] - 10https://gerrit.wikimedia.org/r/793875 (https://phabricator.wikimedia.org/T308501) (owner: 10Dduvall)
[17:26:36] <wikibugs>	 (03CR) 10Dduvall: docker_registry_ha: Authorize GitLab trusted runners using JWT (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/793875 (https://phabricator.wikimedia.org/T308501) (owner: 10Dduvall)
[17:31:48] <wikibugs>	 10SRE, 10ops-eqiad: db1128 faulty memory - https://phabricator.wikimedia.org/T309291 (10Cmjohnson) Chatted with @Marostegui and we are planning downtime for tomorrow 3 June
[17:33:18] <wikibugs>	 10SRE, 10SRE-swift-storage, 10ops-eqiad: Power drain and restart of ms-be1059 - https://phabricator.wikimedia.org/T307667 (10Cmjohnson) This is turning into a pain in the ass, HPE is using some 3rd party company that I've never heard of to do these installs, they never contacted me and then closed the ticket...
[17:39:15] <cwhite>	 !log rolling restart of eqiad logstash cluster
[17:39:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:39:46] <logmsgbot>	 !log bking@cumin1001 END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: restart to enable S3 plugin - bking@cumin1001 - T309720
[17:39:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:39:50] <stashbot>	 T309720: Deploy S3 plugin on all Search team-managed Elastic hosts - https://phabricator.wikimedia.org/T309720
[17:40:03] <icinga-wm>	 RECOVERY - SSH on wtp1039.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:48:15] <icinga-wm>	 PROBLEM - SSH on cp5012.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:00:05] <jouncebot>	 jeena and dancy: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for MediaWiki train - Utc-7 Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220602T1800).
[18:04:07] <logmsgbot>	 !log herron@cumin1001 START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
[18:04:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:04:27] <wikibugs>	 (03PS1) 10Jeena Huneidi: all wikis to 1.39.0-wmf.14  refs T308067 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/802583
[18:04:29] <wikibugs>	 (03CR) 10Jeena Huneidi: [C: 03+2] all wikis to 1.39.0-wmf.14  refs T308067 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/802583 (owner: 10Jeena Huneidi)
[18:05:18] <wikibugs>	 (03Merged) 10jenkins-bot: all wikis to 1.39.0-wmf.14  refs T308067 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/802583 (owner: 10Jeena Huneidi)
[18:08:50] <logmsgbot>	 !log jhuneidi@deploy1002 rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.14  refs T308067
[18:08:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:08:54] <stashbot>	 T308067: 1.39.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T308067
[18:10:57] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[18:11:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:46] <wikibugs>	 10SRE, 10Beta-Cluster-Infrastructure, 10Scap, 10serviceops, 10Release-Engineering-Team (Seen): Scap can't clear opcache on mw servers in Beta Cluster - https://phabricator.wikimedia.org/T237033 (10dancy) >>! In T237033#7975492, @Krinkle wrote: > @thcipriani @dancy I believe the equivalent of the `beta-sc...
[18:14:28] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
[18:14:30] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
[18:14:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1174 (T60674)', diff saved to https://phabricator.wikimedia.org/P29356 and previous config saved to /var/cache/conftool/dbconfig/20220602-181434-ladsgroup.json
[18:14:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:37] <stashbot>	 T60674: Drop page.page_restrictions column from Wikimedia wikis - https://phabricator.wikimedia.org/T60674
[18:14:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[18:15:00] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[18:15:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:15:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:15:49] <icinga-wm>	 RECOVERY - SSH on wtp1048.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:15:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[18:16:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:21:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T60674)', diff saved to https://phabricator.wikimedia.org/P29357 and previous config saved to /var/cache/conftool/dbconfig/20220602-182145-ladsgroup.json
[18:21:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:21:50] <stashbot>	 T60674: Drop page.page_restrictions column from Wikimedia wikis - https://phabricator.wikimedia.org/T60674
[18:22:15] <icinga-wm>	 RECOVERY - SSH on wtp1036.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:24:23] <wikibugs>	 (03PS1) 10Jbond: wmflib: add import/export functions [puppet] - 10https://gerrit.wikimedia.org/r/802585
[18:28:45] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] wmflib: add import/export functions [puppet] - 10https://gerrit.wikimedia.org/r/802585 (owner: 10Jbond)
[18:30:57] <wikibugs>	 (03PS2) 10Jbond: wmflib: add import/export functions [puppet] - 10https://gerrit.wikimedia.org/r/802585
[18:35:57] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] wmflib: add import/export functions [puppet] - 10https://gerrit.wikimedia.org/r/802585 (owner: 10Jbond)
[18:36:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P29358 and previous config saved to /var/cache/conftool/dbconfig/20220602-183650-ladsgroup.json
[18:36:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:43:04] <wikibugs>	 (03PS1) 10Jbond: P:sretest: test new wmflib::import/export functions [puppet] - 10https://gerrit.wikimedia.org/r/802587
[18:43:12] <logmsgbot>	 !log bking@cumin1001 START - Cookbook sre.elasticsearch.force-shard-allocation
[18:43:12] <logmsgbot>	 !log bking@cumin1001 END (FAIL) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=99)
[18:43:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:43:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:43:29] <icinga-wm>	 PROBLEM - Disk space on mx1001 is CRITICAL: DISK CRITICAL - /var/spool/exim4/db is not accessible: Permission denied https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=mx1001&var-datasource=eqiad+prometheus/ops
[18:43:39] <sukhe>	 oh oh
[18:45:33] <wikibugs>	 (03PS2) 10Jbond: P:sretest: test new wmflib::import/export functions [puppet] - 10https://gerrit.wikimedia.org/r/802587
[18:47:40] <wikibugs>	 (03PS3) 10Jbond: P:sretest: test new wmflib::import/export functions [puppet] - 10https://gerrit.wikimedia.org/r/802587
[18:48:30] <sukhe>	 jhathaway: any ongoing work on mx1001? 
[18:48:31] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35700/console" [puppet] - 10https://gerrit.wikimedia.org/r/802587 (owner: 10Jbond)
[18:48:39] <sukhe>	 asking because puppet is disabled
[18:48:48] <wikibugs>	 (03PS1) 10Bking: elastic: add write_queue_datacenters option [cookbooks] - 10https://gerrit.wikimedia.org/r/802588
[18:49:00] <jhathaway>	 sukhe: yes, is something wrong?
[18:49:06] <jhathaway>	 testing an exim patch
[18:49:10] <sukhe>	 oh I see
[18:49:12] <sukhe>	 yeah, got an alert
[18:49:13] <sukhe>	 14:43:30 <+icinga-wm> PROBLEM - Disk space on mx1001 is CRITICAL: DISK CRITICAL - /var/spool/exim4/db is not accessible: Permission denied https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=mx1001&var-datasource=eqiad+prometheus/ops
[18:49:25] <jhathaway>	 oooh, thanks, let me take a look
[18:49:29] <sukhe>	 thanks <3
[18:51:06] <wikibugs>	 (03PS2) 10Ryan Kemper: elastic: add write_queue_datacenters option [cookbooks] - 10https://gerrit.wikimedia.org/r/802588 (owner: 10Bking)
[18:51:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P29359 and previous config saved to /var/cache/conftool/dbconfig/20220602-185155-ladsgroup.json
[18:51:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:54:16] <wikibugs>	 (03PS3) 10Bking: elastic: add write_queue_datacenters option [cookbooks] - 10https://gerrit.wikimedia.org/r/802588
[18:54:21] <wikibugs>	 (03CR) 10Jdlrobson: [C: 04-1] "I think they are used by the performance team. If they are still relevant they will need to regenerate these." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/802578 (https://phabricator.wikimedia.org/T304322) (owner: 10Mabualruz)
[18:54:23] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+2] P:sretest: test new wmflib::import/export functions [puppet] - 10https://gerrit.wikimedia.org/r/802587 (owner: 10Jbond)
[18:56:59] <wikibugs>	 (03PS1) 10Jbond: wmflib::resource:export: export the resource not the title [puppet] - 10https://gerrit.wikimedia.org/r/802591
[18:57:01] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] elastic: add write_queue_datacenters option [cookbooks] - 10https://gerrit.wikimedia.org/r/802588 (owner: 10Bking)
[18:59:16] <wikibugs>	 (03PS2) 10D3r1ck01: Use a service locator to get a job runner [mediawiki-config] - 10https://gerrit.wikimedia.org/r/793837
[19:01:52] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] wmflib::resource:export: export the resource not the title [puppet] - 10https://gerrit.wikimedia.org/r/802591 (owner: 10Jbond)
[19:02:46] <wikibugs>	 (03PS4) 10Bking: elastic: add write_queue_datacenters option [cookbooks] - 10https://gerrit.wikimedia.org/r/802588
[19:05:57] <wikibugs>	 (03CR) 10Gehel: [C: 03+2] elastic: add write_queue_datacenters option [cookbooks] - 10https://gerrit.wikimedia.org/r/802588 (owner: 10Bking)
[19:07:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T60674)', diff saved to https://phabricator.wikimedia.org/P29360 and previous config saved to /var/cache/conftool/dbconfig/20220602-190701-ladsgroup.json
[19:07:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:07:04] <stashbot>	 T60674: Drop page.page_restrictions column from Wikimedia wikis - https://phabricator.wikimedia.org/T60674
[19:07:08] <logmsgbot>	 !log bking@cumin1001 START - Cookbook sre.elasticsearch.force-shard-allocation
[19:07:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:07:12] <logmsgbot>	 !log bking@cumin1001 END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
[19:07:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:08:26] <ryankemper>	 !log T305646 T308647 Unbanned `elastic2033` and `elastic2054` from clusters; also pooled `elastic2033`
[19:08:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:08:31] <stashbot>	 T305646: elastic2033 without bootable devices available (repeat of T281621) - https://phabricator.wikimedia.org/T305646
[19:08:32] <stashbot>	 T308647: elastic2054 is having H/W issues - https://phabricator.wikimedia.org/T308647
[19:10:02] <logmsgbot>	 !log bking@cumin1001 START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: restart to enable S3 plugin - bking@cumin1001 - T309720
[19:10:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:10:07] <stashbot>	 T309720: Deploy S3 plugin on all Search team-managed Elastic hosts - https://phabricator.wikimedia.org/T309720
[19:20:11] <wikibugs>	 (03PS1) 10Andrea Denisse: Add role::netmon for netmon1003 [puppet] - 10https://gerrit.wikimedia.org/r/802593
[19:22:57] <icinga-wm>	 PROBLEM - Check systemd state on elastic2050 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-wmf-elasticsearch-exporter-9400.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:30:10] <wikibugs>	 (03CR) 10Herron: Add role::netmon for netmon1003 (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/802593 (owner: 10Andrea Denisse)
[19:32:50] <wikibugs>	 10ops-eqiad: Port with no description on access switch - https://phabricator.wikimedia.org/T309741 (10phaultfinder)
[19:32:55] <wikibugs>	 (03PS2) 10Andrea Denisse: Add role::netmon for netmon1003 [puppet] - 10https://gerrit.wikimedia.org/r/802593
[19:36:16] <wikibugs>	 (03PS1) 10Huji: Add tfj as a shortcut for toolforge-jobs command [puppet] - 10https://gerrit.wikimedia.org/r/802596 (https://phabricator.wikimedia.org/T309308)
[19:36:28] <wikibugs>	 (03PS3) 10Andrea Denisse: Add role::netmon for netmon1003 [puppet] - 10https://gerrit.wikimedia.org/r/802593 (https://phabricator.wikimedia.org/T309074)
[19:36:31] <icinga-wm>	 RECOVERY - Check systemd state on elastic2050 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:36:47] <wikibugs>	 (03CR) 10AOkoth: [C: 03+1] "Thanks. I missed these." [puppet] - 10https://gerrit.wikimedia.org/r/802580 (https://phabricator.wikimedia.org/T293942) (owner: 10Dzahn)
[19:37:28] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Add tfj as a shortcut for toolforge-jobs command [puppet] - 10https://gerrit.wikimedia.org/r/802596 (https://phabricator.wikimedia.org/T309308) (owner: 10Huji)
[19:37:39] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Add role::netmon for netmon1003 [puppet] - 10https://gerrit.wikimedia.org/r/802593 (https://phabricator.wikimedia.org/T309074) (owner: 10Andrea Denisse)
[19:38:50] <wikibugs>	 (03PS2) 10Huji: Add tfj as a shortcut for toolforge-jobs command [puppet] - 10https://gerrit.wikimedia.org/r/802596 (https://phabricator.wikimedia.org/T309308)
[19:44:09] <wikibugs>	 (03PS4) 10Andrea Denisse: Add role::netmon for netmon1003 [puppet] - 10https://gerrit.wikimedia.org/r/802593 (https://phabricator.wikimedia.org/T309074)
[19:45:01] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) resolved: Elasticsearch instance cloudelastic1004-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[19:45:04] <logmsgbot>	 !log herron@cumin1001 END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
[19:45:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:45:08] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Add role::netmon for netmon1003 [puppet] - 10https://gerrit.wikimedia.org/r/802593 (https://phabricator.wikimedia.org/T309074) (owner: 10Andrea Denisse)
[19:46:48] <wikibugs>	 (03PS5) 10Andrea Denisse: Add role::netmon to the netmon1003 instance. [puppet] - 10https://gerrit.wikimedia.org/r/802593 (https://phabricator.wikimedia.org/T309074)
[19:49:25] <wikibugs>	 (03PS1) 10Milimetric: Split up the tables we sqoop [puppet] - 10https://gerrit.wikimedia.org/r/802598 (https://phabricator.wikimedia.org/T309806)
[19:50:39] <icinga-wm>	 RECOVERY - SSH on cp5012.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:50:53] <wikibugs>	 (03PS1) 10Andrew Bogott: hwraid-2dev.cfg: experiment with vg_name setting [puppet] - 10https://gerrit.wikimedia.org/r/802599
[19:50:55] <wikibugs>	 (03PS1) 10Andrew Bogott: put clouddumps100[12] into service [puppet] - 10https://gerrit.wikimedia.org/r/802600 (https://phabricator.wikimedia.org/T309346)
[19:52:26] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] hwraid-2dev.cfg: experiment with vg_name setting [puppet] - 10https://gerrit.wikimedia.org/r/802599 (owner: 10Andrew Bogott)
[19:52:43] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Split up the tables we sqoop [puppet] - 10https://gerrit.wikimedia.org/r/802598 (https://phabricator.wikimedia.org/T309806) (owner: 10Milimetric)
[19:53:04] <ryankemper>	 !log T294805 Marked `elastic10[68-83]` as Active in netbox (all except `elastic10[77,80]` were erroneously marked as `Staged`)
[19:53:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:10] <stashbot>	 T294805: Service implementation for elastic10[68-83].eqiad.wmnet - https://phabricator.wikimedia.org/T294805
[19:53:45] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
[19:53:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:55] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host clouddumps1001.wikimedia.org w...
[19:54:07] <wikibugs>	 (03CR) 10Andrea Denisse: Add role::netmon to the netmon1003 instance. (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/802593 (https://phabricator.wikimedia.org/T309074) (owner: 10Andrea Denisse)
[19:56:31] <wikibugs>	 (03PS2) 10Bartosz Dziewoński: Make new topic tool available as opt-out almost everywhere (phase 3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/801820 (https://phabricator.wikimedia.org/T309368)
[19:56:31] <icinga-wm>	 PROBLEM - SSH on pki2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:00:05] <jouncebot>	 brennen: That opportune time is upon us again. Time for a UTC late backport and config training deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220602T2000).
[20:03:05] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Persistence-Backup: Q4:(Need By: TBD) rack/setup/install backup1009.eqiad.wmnet - https://phabricator.wikimedia.org/T307048 (10Cmjohnson)
[20:04:35] <wikibugs>	 (03PS1) 10Eevans: WIP: Configure AQS Cassandra hosts [puppet] - 10https://gerrit.wikimedia.org/r/802604 (https://phabricator.wikimedia.org/T307801)
[20:04:45] <wikibugs>	 10SRE, 10Traffic: ATS should alert if the number of total or active connections reached maximum - https://phabricator.wikimedia.org/T292815 (10BCornwall) a:03BCornwall
[20:04:47] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox
[20:04:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:05:24] <wikibugs>	 (03PS1) 10Andrew Bogott: wmcs-image-create: Use openstack cli for creating new glance image [puppet] - 10https://gerrit.wikimedia.org/r/802605
[20:05:57] <wikibugs>	 (03CR) 10Eevans: [C: 04-1] "This is not ready to be merged." [puppet] - 10https://gerrit.wikimedia.org/r/802604 (https://phabricator.wikimedia.org/T307801) (owner: 10Eevans)
[20:07:14] <brennen>	 !log no patches and no new trainees; closing utc late backport & config window
[20:07:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:08:05] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[20:08:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:09:45] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host backup1009.mgmt.eqiad.wmnet with reboot policy FORCED
[20:09:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:12:28] <wikibugs>	 (03PS2) 10Andrew Bogott: put clouddumps100[12] into service [puppet] - 10https://gerrit.wikimedia.org/r/802600 (https://phabricator.wikimedia.org/T309346)
[20:12:30] <wikibugs>	 (03PS1) 10Andrew Bogott: hwraid-2dev.cfg: one more attempt with vg_name [puppet] - 10https://gerrit.wikimedia.org/r/802627
[20:13:23] <wikibugs>	 (03CR) 10Herron: "Question for netops -- Do we risk any side effects deploying this in parallel to netmon1002?  Is there anything that should be silenced/di" [puppet] - 10https://gerrit.wikimedia.org/r/802593 (https://phabricator.wikimedia.org/T309074) (owner: 10Andrea Denisse)
[20:14:02] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] hwraid-2dev.cfg: one more attempt with vg_name [puppet] - 10https://gerrit.wikimedia.org/r/802627 (owner: 10Andrew Bogott)
[20:14:23] <logmsgbot>	 !log andrew@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host clouddumps1001.wikimedia.org with OS bullseye
[20:14:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:14:32] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host clouddumps1001.wikimedia.org with...
[20:14:47] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
[20:14:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:14:55] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddumps1001.wikimedia.org with OS bullseye
[20:14:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:14:56] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host clouddumps1001.wikimedia.org w...
[20:15:03] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host clouddumps1001.wikimedia.org with...
[20:15:53] <icinga-wm>	 RECOVERY - Uncommitted DNS changes in Netbox on netbox1001 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[20:16:23] <ryankemper>	 !log T306449 Marked `elastic1097` as `Staged` in Netbox (was previously failed, but fixed in https://phabricator.wikimedia.org/T306449#7888260)
[20:16:25] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
[20:16:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:16:27] <stashbot>	 T306449: hw troubleshooting: memory error for elastic1097 - https://phabricator.wikimedia.org/T306449
[20:16:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:16:36] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host clouddumps1001.wikimedia.org w...
[20:26:01] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance cloudelastic1004-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[20:26:20] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1009.mgmt.eqiad.wmnet with reboot policy FORCED
[20:26:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:28:49] <wikibugs>	 (03PS1) 10Jbond: wmflib: drop rrsource::reduce and add specs for resource::import [puppet] - 10https://gerrit.wikimedia.org/r/802629
[20:28:56] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q3:(Need By: TBD) rack/setup/install stat1009 - https://phabricator.wikimedia.org/T299466 (Jclark-ctr) a:Jclark-ctr→Cmjohnson stat1009 B1 U17 cableid  1181  port 5
[20:29:12] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Engineering: Q3:(Need By: TBD) rack/setup/install stat1009 - https://phabricator.wikimedia.org/T299466 (10Jclark-ctr)
[20:31:09] <wikibugs>	 (03PS1) 10Samtar: gitignore: add vscode [puppet] - 10https://gerrit.wikimedia.org/r/802630
[20:35:26] <wikibugs>	 10SRE, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on cloudnet1004 - https://phabricator.wikimedia.org/T309576 (10nskaggs) Yes, I agree. Let's focus on bringing the new machines online.
[20:36:32] <wikibugs>	 (03CR) 10Majavah: [C: 04-1] "I don't think this should be added here - different people use different editors, so instead of every single project having the editors of" [puppet] - 10https://gerrit.wikimedia.org/r/802630 (owner: 10Samtar)
[20:37:19] <TheresNoTime>	 I didn't know that was a thing!
[20:37:49] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10Jclark-ctr)
[20:37:57] <wikibugs>	 (CR) Samtar: gitignore: add vscode (1 comment) [puppet] - https://gerrit.wikimedia.org/r/802630 (owner: Samtar)
[20:38:13] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, and 2 others: re-label cloudstore101[01] to clouddumps100[12] - https://phabricator.wikimedia.org/T309338 (Jclark-ctr) Open→Resolved Relabeled Servers
[20:38:20] <wikibugs>	 (03Abandoned) 10Samtar: gitignore: add vscode [puppet] - 10https://gerrit.wikimedia.org/r/802630 (owner: 10Samtar)
[20:42:33] <wikibugs>	 (03PS2) 10Jbond: wmflib: drop rrsource::reduce and add specs for resource::import [puppet] - 10https://gerrit.wikimedia.org/r/802629
[20:43:23] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q4: (Need By: TBD) rack/setup/install 6 wmcs hosts - https://phabricator.wikimedia.org/T304888 (10nskaggs) @cmooney Let's arrange to move some machines so we can have more optimal routing. @dcaro, do you think it would be easier to move a ceph...
[20:43:27] <icinga-wm>	 PROBLEM - SSH on wtp1039.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:44:13] <wikibugs>	 10SRE, 10Cloud-Services, 10Datasets-General-or-Unknown, 10affects-Kiwix-and-openZIM, 10cloud-services-team (Kanban): Mirror more Kiwix downloads directories - https://phabricator.wikimedia.org/T57503 (10nskaggs)
[20:44:25] <wikibugs>	 (03PS1) 10Eevans: Dummy keys and certificates for cassandra (aqs) [labs/private] - 10https://gerrit.wikimedia.org/r/802631 (https://phabricator.wikimedia.org/T307801)
[20:45:34] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddumps1001.wikimedia.org with OS bullseye
[20:45:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:45:43] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host clouddumps1001.wikimedia.org with...
[20:47:00] <wikibugs>	 (03PS3) 10Jbond: wmflib: drop rsource::reduce and add specs for resource::import [puppet] - 10https://gerrit.wikimedia.org/r/802629
[20:47:45] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35708/console" [puppet] - 10https://gerrit.wikimedia.org/r/802629 (owner: 10Jbond)
[20:50:49] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+2] wmflib: drop rsource::reduce and add specs for resource::import [puppet] - 10https://gerrit.wikimedia.org/r/802629 (owner: 10Jbond)
[20:51:00] <wikibugs>	 (03CR) 10Jbond: [V: 03+2 C: 03+2] wmflib: drop rsource::reduce and add specs for resource::import [puppet] - 10https://gerrit.wikimedia.org/r/802629 (owner: 10Jbond)
[20:54:26] <wikibugs>	 (03PS1) 10Jbond: P:sretest: Add merge parameter [puppet] - 10https://gerrit.wikimedia.org/r/802633
[20:55:26] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] P:sretest: Add merge parameter [puppet] - 10https://gerrit.wikimedia.org/r/802633 (owner: 10Jbond)
[20:55:30] <wikibugs>	 (03CR) 10Jbond: [V: 03+2 C: 03+2] P:sretest: Add merge parameter [puppet] - 10https://gerrit.wikimedia.org/r/802633 (owner: 10Jbond)
[20:56:40] <wikibugs>	 SRE, Traffic, SRE Observability (FY2021/2022-Q4), User-fgiunchedi: Migrate Traffic Prometheus alerts from Icinga to Alertmanager - https://phabricator.wikimedia.org/T300723 (BCornwall) a:BCornwall
[20:57:47] <icinga-wm>	 RECOVERY - SSH on pki2001.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:58:57] <wikibugs>	 (03CR) 10JHathaway: [C: 03+2] mx: enable tainted data checking [puppet] - 10https://gerrit.wikimedia.org/r/801799 (https://phabricator.wikimedia.org/T286911) (owner: 10JHathaway)
[20:59:19] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[20:59:51] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
[20:59:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:59:56] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Persistence-Backup: Q4:(Need By: TBD) rack/setup/install backup1009.eqiad.wmnet - https://phabricator.wikimedia.org/T307048 (10Cmjohnson)
[21:00:04] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host clouddumps1001.wikimedia.org w...
[21:00:16] <wikibugs>	 (03CR) 10Nskaggs: [C: 03+1] "Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/801336 (https://phabricator.wikimedia.org/T309342) (owner: 10David Caro)
[21:02:43] <wikibugs>	 (03PS1) 10Cmjohnson: Adding backup1009 to site.pp and netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/802636 (https://phabricator.wikimedia.org/T307048)
[21:03:05] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Adding backup1009 to site.pp and netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/802636 (https://phabricator.wikimedia.org/T307048) (owner: 10Cmjohnson)
[21:06:59] <wikibugs>	 (03Abandoned) 10Cmjohnson: Adding backup1009 to site.pp and netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/802636 (https://phabricator.wikimedia.org/T307048) (owner: 10Cmjohnson)
[21:09:34] <wikibugs>	 (03PS1) 10Zabe: Stop writing to cuc_actor on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/802637 (https://phabricator.wikimedia.org/T233004)
[21:10:15] <wikibugs>	 (03PS1) 10Cmjohnson: Adding backup1009 to site.pp and netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/802638 (https://phabricator.wikimedia.org/T307048)
[21:11:27] <wikibugs>	 (03CR) 10Cmjohnson: [C: 03+2] Adding backup1009 to site.pp and netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/802638 (https://phabricator.wikimedia.org/T307048) (owner: 10Cmjohnson)
[21:11:40] <zabe>	 Does anyone have time to quickly deploy https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/802637/ before the week ends?
[21:11:47] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
[21:11:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:12:27] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Persistence-Backup, 10Patch-For-Review: Q4:(Need By: TBD) rack/setup/install backup1009.eqiad.wmnet - https://phabricator.wikimedia.org/T307048 (10Cmjohnson)
[21:15:05] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
[21:15:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:17:17] <wikibugs>	 (03CR) 10Nskaggs: "Thank you for including a link / runbook as well!" [alerts] - 10https://gerrit.wikimedia.org/r/802442 (https://phabricator.wikimedia.org/T302377) (owner: 10Majavah)
[21:19:15] <James_F>	 zabe: Fine, let's do it.
[21:19:24] <wikibugs>	 (03CR) 10Jforrester: [C: 03+2] Stop writing to cuc_actor on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/802637 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[21:20:00] <icinga-wm>	 PROBLEM - Disk space on centrallog2002 is CRITICAL: DISK CRITICAL - free space: /srv 55426 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=centrallog2002&var-datasource=codfw+prometheus/ops
[21:21:04] <wikibugs>	 (03Merged) 10jenkins-bot: Stop writing to cuc_actor on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/802637 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[21:24:40] <James_F>	 zabe: Done.
[21:25:08] <zabe>	 James_F, thanks :)
[21:25:31] <James_F>	 Though scap seems to have got stuck on the PHP restart step?
[21:25:40] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
[21:25:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:25:48] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Persistence-Backup, 10Patch-For-Review: Q4:(Need By: TBD) rack/setup/install backup1009.eqiad.wmnet - https://phabricator.wikimedia.org/T307048 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host backup1009.eqiad.wmn...
[21:25:57] <logmsgbot>	 !log jforrester@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Emergency deploy: [[gerrit:802637|Stop writing to cuc_actor on all wikis (T233004 T309737)]] (duration: 03m 15s)
[21:26:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:26:01] <stashbot>	 T309737: CannotCreateActorException: Cannot create an actor for a usable name that is not an existing user: user_name="Qwqqwqq" - https://phabricator.wikimedia.org/T309737
[21:26:01] <stashbot>	 T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
[21:26:03] <James_F>	 Finally.
[21:26:23] <James_F>	 "Finished php-fpm-restarts (duration: 02m 36s)" eesh.
[21:27:53] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[21:27:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:27:56] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddumps1001.wikimedia.org with OS bullseye
[21:27:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:28:07] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host clouddumps1001.wikimedia.org with...
[21:28:23] <RhinosF1>	 Bank holidays are confusing. My head wants to think deploys are happening on a Saturday.
[21:28:42] <James_F>	 RhinosF1: I mean, they are; it's the first day of a weekend somewhere.
[21:28:54] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[21:28:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:28:55] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[21:28:55] <James_F>	 But also the clue's in the term "emergency deploy:" ;-)
[21:28:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:29:30] <wikibugs>	 10SRE, 10Cloud-Services, 10Datasets-General-or-Unknown, 10affects-Kiwix-and-openZIM, 10cloud-services-team (Kanban): Mirror more Kiwix downloads directories - https://phabricator.wikimedia.org/T57503 (10nskaggs) Looping in @Andrew. @Kelson note that yes, we are installing new, more capable machines that...
[21:29:51] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[21:29:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:30:59] <wikibugs>	 (CR) Dduvall: "This change is ready for review." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/791655 (https://phabricator.wikimedia.org/T308271) (owner: Dduvall)
[21:31:32] <icinga-wm>	 RECOVERY - Disk space on mx1001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=mx1001&var-datasource=eqiad+prometheus/ops
[21:32:31] <wikibugs>	 (03PS4) 10Dduvall: Provide buildkitd to GitLab runners [puppet] - 10https://gerrit.wikimedia.org/r/791655 (https://phabricator.wikimedia.org/T308271)
[21:33:30] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Provide buildkitd to GitLab runners [puppet] - 10https://gerrit.wikimedia.org/r/791655 (https://phabricator.wikimedia.org/T308271) (owner: 10Dduvall)
[21:34:59] <wikibugs>	 (03PS5) 10Dduvall: Provide buildkitd to GitLab runners [puppet] - 10https://gerrit.wikimedia.org/r/791655 (https://phabricator.wikimedia.org/T308271)
[21:39:57] <wikibugs>	 (03PS1) 10RLazarus: slo: Correct queries for error budget remaining [grafana-grizzly] - 10https://gerrit.wikimedia.org/r/802646 (https://phabricator.wikimedia.org/T302842)
[21:40:30] <wikibugs>	 10SRE, 10Wikimedia-Site-requests, 10Performance-Team (Radar): Raise limit of $wgMaxArticleSize for Hebrew Wikisource - https://phabricator.wikimedia.org/T275319 (10Vladis13) >>! In T275319#6884320, @cscott wrote: > database storage size, database column limits, etc, all scale with bytes not characters.  We s...
[21:44:00] <icinga-wm>	 PROBLEM - SSH on druid1006.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:44:30] <icinga-wm>	 RECOVERY - SSH on wtp1039.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:45:11] <wikibugs>	 (03PS2) 10RLazarus: slo: Correct queries for error budget remaining [grafana-grizzly] - 10https://gerrit.wikimedia.org/r/802646 (https://phabricator.wikimedia.org/T302842)
[21:52:45] <wikibugs>	 (03PS3) 10Andrew Bogott: put clouddumps100[12] into service [puppet] - 10https://gerrit.wikimedia.org/r/802600 (https://phabricator.wikimedia.org/T309346)
[21:52:47] <wikibugs>	 (03PS1) 10Andrew Bogott: hwraid-2dev.cfg: further attempts [puppet] - 10https://gerrit.wikimedia.org/r/802649
[21:54:06] <dancy>	 jouncebot nowandnext
[21:54:07] <jouncebot>	 No deployments scheduled for the next 9 hour(s) and 5 minute(s)
[21:54:07] <jouncebot>	 In 9 hour(s) and 5 minute(s): No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220603T0700)
[21:54:10] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] hwraid-2dev.cfg: further attempts [puppet] - 10https://gerrit.wikimedia.org/r/802649 (owner: 10Andrew Bogott)
[21:54:52] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
[21:54:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:55:02] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host clouddumps1001.wikimedia.org w...
[21:55:35] <wikibugs>	 (03PS3) 10RLazarus: slo: Correct queries for error budget remaining [grafana-grizzly] - 10https://gerrit.wikimedia.org/r/802646 (https://phabricator.wikimedia.org/T302842)
[22:03:38] <wikibugs>	 (03PS4) 10Andrew Bogott: put clouddumps100[12] into service [puppet] - 10https://gerrit.wikimedia.org/r/802600 (https://phabricator.wikimedia.org/T309346)
[22:03:40] <wikibugs>	 (03PS1) 10Andrew Bogott: hwraid-2dev.cfg: etc [puppet] - 10https://gerrit.wikimedia.org/r/802652
[22:05:12] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] hwraid-2dev.cfg: etc [puppet] - 10https://gerrit.wikimedia.org/r/802652 (owner: 10Andrew Bogott)
[22:08:10] <logmsgbot>	 !log andrew@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host clouddumps1001.wikimedia.org with OS bullseye
[22:08:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:08:20] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
[22:08:21] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host clouddumps1001.wikimedia.org with...
[22:08:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:08:28] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddumps1001.wikimedia.org with OS bullseye
[22:08:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:08:34] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host clouddumps1001.wikimedia.org w...
[22:08:40] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
[22:08:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:08:44] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host clouddumps1001.wikimedia.org with...
[22:08:51] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host clouddumps1001.wikimedia.org w...
[22:22:23] <wikibugs>	 (03Abandoned) 10Thcipriani: Beta: Clean puppet cherry-picks [puppet] - 10https://gerrit.wikimedia.org/r/310719 (https://phabricator.wikimedia.org/T135427) (owner: 10Thcipriani)
[22:23:00] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
[22:23:01] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
[22:23:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:23:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:23:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1118 (T298560)', diff saved to https://phabricator.wikimedia.org/P29363 and previous config saved to /var/cache/conftool/dbconfig/20220602-222306-ladsgroup.json
[22:23:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:23:10] <stashbot>	 T298560: Fix mismatching field type of revision.rev_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298560
[22:28:35] <icinga-wm>	 PROBLEM - SSH on wtp1038.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:28:45] <icinga-wm>	 PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:29:20] <wikibugs>	 10SRE, 10ops-ulsfo, 10Traffic, 10decommission-hardware: decommission cp4031 - https://phabricator.wikimedia.org/T301269 (10RobH)
[22:29:48] <wikibugs>	 (03PS1) 10Andrew Bogott: hwraid-2dev.cfg: further tiny tweaks [puppet] - 10https://gerrit.wikimedia.org/r/802656
[22:30:10] <wikibugs>	 10SRE, 10ops-ulsfo, 10Traffic: SMART errors on cp4031 - https://phabricator.wikimedia.org/T300493 (10RobH)
[22:30:25] <wikibugs>	 SRE, ops-ulsfo, Traffic, decommission-hardware: decommission cp4031 - https://phabricator.wikimedia.org/T301269 (RobH) Open→Resolved As this host is in a caching site, we have no out of rack storage.  It will simply sit powered down in the rack until ulsfo is refreshed and it is replaced...
[22:31:05] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] hwraid-2dev.cfg: further tiny tweaks [puppet] - 10https://gerrit.wikimedia.org/r/802656 (owner: 10Andrew Bogott)
[22:31:24] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddumps1001.wikimedia.org with OS bullseye
[22:31:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:31:34] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host clouddumps1001.wikimedia.org with...
[22:31:40] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
[22:31:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:31:50] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host clouddumps1001.wikimedia.org w...
[22:33:08] <logmsgbot>	 !log andrew@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host clouddumps1001.wikimedia.org with OS bullseye
[22:33:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:33:20] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host clouddumps1001.wikimedia.org with...
[22:43:03] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
[22:43:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:43:12] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host clouddumps1001.wikimedia.org w...
[22:50:58] <logmsgbot>	 !log andrew@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host clouddumps1001.wikimedia.org with OS bullseye
[22:51:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:51:09] <wikibugs>	 (03PS1) 10Andrew Bogott: hwraid-2dev.cfg: one last stab before I give up for the day [puppet] - 10https://gerrit.wikimedia.org/r/802661
[22:51:12] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host clouddumps1001.wikimedia.org with...
[22:52:40] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] hwraid-2dev.cfg: one last stab before I give up for the day [puppet] - 10https://gerrit.wikimedia.org/r/802661 (owner: 10Andrew Bogott)
[22:53:14] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
[22:53:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:53:24] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host clouddumps1001.wikimedia.org w...
[23:07:25] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[23:14:23] <wikibugs>	 (03PS12) 10Tim Starling: Add "db-mainstash" entry to $wgObjectCaches [mediawiki-config] - 10https://gerrit.wikimedia.org/r/752807 (https://phabricator.wikimedia.org/T212129) (owner: 10Aaron Schulz)
[23:14:26] <wikibugs>	 (03PS1) 10Brion VIBBER: Disable older WebM VP8 transcodes except 360p [mediawiki-config] - 10https://gerrit.wikimedia.org/r/802665 (https://phabricator.wikimedia.org/T309823)
[23:14:38] <wikibugs>	 (03PS4) 10Tim Starling: Switch wgMainStash to db-mainstash [mediawiki-config] - 10https://gerrit.wikimedia.org/r/799433 (https://phabricator.wikimedia.org/T212129)
[23:22:12] <wikibugs>	 (03CR) 10Tim Starling: [C: 03+2] Add "db-mainstash" entry to $wgObjectCaches [mediawiki-config] - 10https://gerrit.wikimedia.org/r/752807 (https://phabricator.wikimedia.org/T212129) (owner: 10Aaron Schulz)
[23:22:22] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddumps1001.wikimedia.org with OS bullseye
[23:22:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:22:37] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host clouddumps1001.wikimedia.org with...
[23:22:56] <wikibugs>	 (Merged) jenkins-bot: Add "db-mainstash" entry to $wgObjectCaches [mediawiki-config] - https://gerrit.wikimedia.org/r/752807 (https://phabricator.wikimedia.org/T212129) (owner: Aaron Schulz)
[23:25:55] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[23:25:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:26:46] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[23:26:47] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[23:26:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:26:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:27:16] <logmsgbot>	 !log tstarling@deploy1002 Synchronized wmf-config/CommonSettings.php: Add db-mainstash g 752807 (duration: 03m 24s)
[23:27:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:27:43] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[23:27:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:28:48] <wikibugs>	 (PS1) Andrew Bogott: clouddumps: try a different partman recipe [puppet] - https://gerrit.wikimedia.org/r/802666
[23:29:49] <icinga-wm>	 RECOVERY - SSH on wtp1040.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[23:29:52] <wikibugs>	 (CR) Andrew Bogott: [C: +2] clouddumps: try a different partman recipe [puppet] - https://gerrit.wikimedia.org/r/802666 (owner: Andrew Bogott)
[23:30:24] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
[23:30:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:30:34] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host clouddumps1001.wikimedia.org w...
[23:37:52] <wikibugs>	 ops-eqiad: Port with no description on access switch - https://phabricator.wikimedia.org/T309741 (phaultfinder)
[23:42:24] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
[23:42:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:45:32] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
[23:45:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:52:29] <wikibugs>	 SRE, DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (Jclark-ctr)
[23:52:50] <icinga-wm>	 RECOVERY - SSH on druid1006.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[23:56:15] <wikibugs>	 (PS1) Andrew Bogott: clouddumps100x: yet another partman recipe [puppet] - https://gerrit.wikimedia.org/r/802667
[23:56:45] <logmsgbot>	 !log andrew@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host clouddumps1001.wikimedia.org with OS bullseye
[23:56:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:56:55] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host clouddumps1001.wikimedia.org with...
[23:58:14] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
[23:58:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:58:24] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host clouddumps1001.wikimedia.org w...