[00:00:45] <jinxer-wm>	 (JobUnavailable) firing: (5) Reduced availability for job cloud_dev_pdns in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[00:01:41] <wikibugs>	 SRE, ops-codfw, DBA, DC-Ops: Q4:(Need By: TBD) rack/setup/install db2175.codfw.wmnet - db2182.codfw.wmnet - https://phabricator.wikimedia.org/T306849 (RobH)
[00:01:49] <wikibugs>	 SRE, ops-eqiad, DBA, DC-Ops: Q4:(Need By: TBD) rack/setup/install db1196.eqiad.wmnet - db1203.eqiad.wmnet - https://phabricator.wikimedia.org/T306848 (RobH)
[00:07:12] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw2412 is CRITICAL: Host mw2412 is not in mediawiki-installation dsh group https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[00:59:22] <icinga-wm>	 PROBLEM - SSH on analytics1061.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:00:04] <jouncebot>	 Deploy window Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220426T0100)
[01:25:44] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_event.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:40:45] <jinxer-wm>	 (JobUnavailable) firing: (5) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:47:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[01:50:45] <jinxer-wm>	 (JobUnavailable) firing: (5) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:00:32] <icinga-wm>	 RECOVERY - SSH on analytics1061.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:06:34] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[02:06:37] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[02:06:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:06:38] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[02:06:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:06:42] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[02:06:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:06:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:07:28] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/1.39.0-wmf.9 [core] (wmf/1.39.0-wmf.9) - https://gerrit.wikimedia.org/r/785966
[02:07:32] <wikibugs>	 (CR) TrainBranchBot: [C: +2] Branch commit for wmf/1.39.0-wmf.9 [core] (wmf/1.39.0-wmf.9) - https://gerrit.wikimedia.org/r/785966 (owner: TrainBranchBot)
[02:09:16] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:19:07] <wikibugs>	 (CR) jerkins-bot: [V: -1] Branch commit for wmf/1.39.0-wmf.9 [core] (wmf/1.39.0-wmf.9) - https://gerrit.wikimedia.org/r/785966 (owner: TrainBranchBot)
[02:30:24] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[02:36:21] <wikibugs>	 SRE, WMF-JobQueue, serviceops, Sustainability (Incident Followup): Videoscalers fail health checks while CPU is maxed - https://phabricator.wikimedia.org/T306860 (RLazarus) p:Triage→High
[02:37:55] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[02:44:10] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[02:52:40] <jinxer-wm>	 (NodeTextfileStale) resolved: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[03:01:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[03:14:33] <wikibugs>	 SRE, ops-codfw, DC-Ops, cloud-services-team (Kanban): Neutron networking not working for cloudnet200[5,6]-dev.codfw.wmnet - https://phabricator.wikimedia.org/T306861 (Andrew)
[03:18:09] <wikibugs>	 SRE, ops-codfw, DC-Ops, cloud-services-team (Kanban): Neutron networking not working for cloudnet200[5,6]-dev.codfw.wmnet - https://phabricator.wikimedia.org/T306861 (Andrew) @ayounsi I have barely investigated this but I'm guessing that there's some kind of switch binding that needs to be done f...
[03:20:49] <wikibugs>	 SRE, ops-eqiad, DC-Ops, cloud-services-team (Hardware): Q4:(Need By: TBD) rack/setup/install cloudcontrol100[6-7].wikimedia.org - https://phabricator.wikimedia.org/T306853 (Andrew)
[03:21:17] <wikibugs>	 SRE, ops-eqiad, DC-Ops, cloud-services-team (Hardware): Q4:(Need By: TBD) rack/setup/install cloudcontrol100[6-7].wikimedia.org - https://phabricator.wikimedia.org/T306853 (Andrew) You're right, these will need public IPs (but with luck we'll free up the old ones shortly after these go online)
[03:24:02] <icinga-wm>	 PROBLEM - WDQS SPARQL on wdqs1013 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:24:40] <icinga-wm>	 PROBLEM - Query Service HTTP Port on wdqs1013 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 380 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[03:26:06] <icinga-wm>	 RECOVERY - WDQS SPARQL on wdqs1013 is OK: HTTP OK: HTTP/1.1 200 OK - 689 bytes in 1.077 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:26:54] <icinga-wm>	 RECOVERY - Query Service HTTP Port on wdqs1013 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.020 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[03:34:34] <icinga-wm>	 PROBLEM - puppet last run on ml-staging-ctrl2001 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[03:50:39] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[04:15:14] <wikibugs>	 SRE, ops-codfw, DC-Ops, cloud-services-team (Kanban): Neutron networking not working for cloudnet200[5,6]-dev.codfw.wmnet - https://phabricator.wikimedia.org/T306861 (Papaul) @Andrew the racking task for the cloudnet nodes said to setup "cloud-gw-transport and cloud-instance-transport" on the sec...
[04:15:57] <wikibugs>	 (CR) Majavah: [C: +2] "retrying" [core] (wmf/1.39.0-wmf.9) - https://gerrit.wikimedia.org/r/785966 (owner: TrainBranchBot)
[04:27:40] <icinga-wm>	 PROBLEM - SSH on pki2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:32:29] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/1.39.0-wmf.9 [core] (wmf/1.39.0-wmf.9) - https://gerrit.wikimedia.org/r/785966 (owner: TrainBranchBot)
[04:38:22] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[04:38:25] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[04:38:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:38:26] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[04:38:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:38:30] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[04:38:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:38:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:42:59] <wikibugs>	 (CR) Ntubotu: "> Change has been successfully merged into the git repository." [core] (wmf/1.23wmf20) - https://gerrit.wikimedia.org/r/123454 (owner: MaxSem)
[04:53:17] <wikibugs>	 (PS2) Marostegui: mariadb: Promote db1162 to s2 master [puppet] - https://gerrit.wikimedia.org/r/785602 (https://phabricator.wikimedia.org/T306417)
[04:53:22] <wikibugs>	 (PS2) Marostegui: wmnet: Update s2-master alias [dns] - https://gerrit.wikimedia.org/r/785603 (https://phabricator.wikimedia.org/T306417)
[04:53:35] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s2 T306417
[04:53:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:53:41] <stashbot>	 T306417: Switchover s2 master (db1122 -> db1162) - https://phabricator.wikimedia.org/T306417
[04:53:51] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s2 T306417
[04:53:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:54:07] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Set db1162 with weight 0 T306417', diff saved to https://phabricator.wikimedia.org/P26498 and previous config saved to /var/cache/conftool/dbconfig/20220426-045406-root.json
[04:54:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:54:22] <Amir1>	 marostegui: do you want me to do parts of it?
[04:54:26] <marostegui>	 Amir1: nah, it is fine
[04:54:29] <marostegui>	 thanks though
[04:54:48] <Amir1>	 ^^
[04:56:14] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[05:12:06] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Promote db1162 to s2 master [puppet] - https://gerrit.wikimedia.org/r/785602 (https://phabricator.wikimedia.org/T306417) (owner: Marostegui)
[05:14:47] <icinga-wm>	 PROBLEM - Host cr2-eqord is DOWN: PING CRITICAL - Packet loss = 100%
[05:14:56] <wikibugs>	 (PS1) Marostegui: db1122: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/786162 (https://phabricator.wikimedia.org/T306417)
[05:15:22] <rzl>	 ^ looking
[05:15:32] <icinga-wm>	 PROBLEM - Host wikitech-static.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[05:16:06] <icinga-wm>	 RECOVERY - Host wikitech-static.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 21.91 ms
[05:17:40] <icinga-wm>	 PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 133, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:17:48] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 216 probes of 670 (alerts on 90) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:18:06] <wikibugs>	 (CR) Marostegui: [C: +2] db1122: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/786162 (https://phabricator.wikimedia.org/T306417) (owner: Marostegui)
[05:18:18] <icinga-wm>	 PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 250 probes of 763 (alerts on 35) - https://atlas.ripe.net/measurements/32390538/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:18:20] <icinga-wm>	 PROBLEM - Host cr2-eqord IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[05:18:36] <icinga-wm>	 PROBLEM - Router interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:19:54] <icinga-wm>	 PROBLEM - IPv4 ping to eqiad on ripe-atlas-eqiad is CRITICAL: CRITICAL - failed 103 probes of 754 (alerts on 35) - https://atlas.ripe.net/measurements/1790945/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:19:56] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 498 probes of 679 (alerts on 90) - https://atlas.ripe.net/measurements/32390541/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:24:02] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 79 probes of 670 (alerts on 90) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:24:32] <icinga-wm>	 RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 9 probes of 763 (alerts on 35) - https://atlas.ripe.net/measurements/32390538/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:26:08] <icinga-wm>	 RECOVERY - IPv4 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 8 probes of 754 (alerts on 35) - https://atlas.ripe.net/measurements/1790945/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:26:10] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 62 probes of 679 (alerts on 90) - https://atlas.ripe.net/measurements/32390541/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:28:46] <icinga-wm>	 RECOVERY - SSH on pki2001.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:41:27] <wikibugs>	 (PS1) Marostegui: switchover-tmpl.py: Adjust timeout [software] - https://gerrit.wikimedia.org/r/786173
[05:41:46] <wikibugs>	 (CR) Ladsgroup: [C: +2] switchover-tmpl.py: Adjust timeout [software] - https://gerrit.wikimedia.org/r/786173 (owner: Marostegui)
[05:42:14] <wikibugs>	 (Merged) jenkins-bot: switchover-tmpl.py: Adjust timeout [software] - https://gerrit.wikimedia.org/r/786173 (owner: Marostegui)
[05:48:19] <wikibugs>	 SRE, ops-eqiad, DC-Ops, cloud-services-team (Hardware): Q4:(Need By: TBD) rack/setup/install cloudcontrol100[6-7].wikimedia.org - https://phabricator.wikimedia.org/T306853 (RhinosF1)
[05:51:00] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[06:00:04] <jouncebot>	 kormat, marostegui, and Amir1: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Primary database switchover deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220426T0600).
[06:00:17] <marostegui>	 !log Starting s2 eqiad failover from db1122 to db1162 - T306417
[06:00:20] <Amir1>	 o/
[06:00:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:00:23] <stashbot>	 T306417: Switchover s2 master (db1122 -> db1162) - https://phabricator.wikimedia.org/T306417
[06:00:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Set s2 eqiad as read-only for maintenance - T306417', diff saved to https://phabricator.wikimedia.org/P26500 and previous config saved to /var/cache/conftool/dbconfig/20220426-060033-marostegui.json
[06:00:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:00:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Promote db1162 to s2 primary and set section read-write T306417', diff saved to https://phabricator.wikimedia.org/P26501 and previous config saved to /var/cache/conftool/dbconfig/20220426-060058-marostegui.json
[06:01:00] <marostegui>	 all done
[06:01:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:01:02] <marostegui>	 testing now
[06:01:20] <Amir1>	 https://it.wikipedia.org/w/index.php?title=Utente:Ladsgroup/Sandbox&action=history
[06:01:24] <Amir1>	 can edit
[06:01:32] <marostegui>	 same yeah
[06:02:31] <wikibugs>	 (CR) Marostegui: [C: +2] wmnet: Update s2-master alias [dns] - https://gerrit.wikimedia.org/r/785603 (https://phabricator.wikimedia.org/T306417) (owner: Marostegui)
[06:03:44] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1122 T306417', diff saved to https://phabricator.wikimedia.org/P26502 and previous config saved to /var/cache/conftool/dbconfig/20220426-060344-root.json
[06:03:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:06:03] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 is current s2 master, should not be in API T306417', diff saved to https://phabricator.wikimedia.org/P26503 and previous config saved to /var/cache/conftool/dbconfig/20220426-060602-marostegui.json
[06:06:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:06:07] <stashbot>	 T306417: Switchover s2 master (db1122 -> db1162) - https://phabricator.wikimedia.org/T306417
[06:07:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Remove db1100, s5 master from API', diff saved to https://phabricator.wikimedia.org/P26504 and previous config saved to /var/cache/conftool/dbconfig/20220426-060734-marostegui.json
[06:07:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:11:13] <wikibugs>	 (PS1) Marostegui: switchover-tmpl.py: Add depooling comment [software] - https://gerrit.wikimedia.org/r/786175
[06:11:44] <wikibugs>	 (CR) Marostegui: [C: +2] switchover-tmpl.py: Add depooling comment [software] - https://gerrit.wikimedia.org/r/786175 (owner: Marostegui)
[06:12:18] <wikibugs>	 (Merged) jenkins-bot: switchover-tmpl.py: Add depooling comment [software] - https://gerrit.wikimedia.org/r/786175 (owner: Marostegui)
[06:12:40] <wikibugs>	 (PS1) Ladsgroup: db1109: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/786176 (https://phabricator.wikimedia.org/T302185)
[06:13:09] <wikibugs>	 (CR) Ladsgroup: [V: +2 C: +2] db1109: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/786176 (https://phabricator.wikimedia.org/T302185) (owner: Ladsgroup)
[06:14:46] <marostegui>	 !log dbmaint s2@eqiad T298557
[06:14:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:14:50] <stashbot>	 T298557: Fix mismatching field type of page.page_touched on wmf wikis - https://phabricator.wikimedia.org/T298557
[06:15:12] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1109.eqiad.wmnet with reason: Maintenance
[06:15:14] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1109.eqiad.wmnet with reason: Maintenance
[06:15:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:15:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:15:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1109 (T302185)', diff saved to https://phabricator.wikimedia.org/P26505 and previous config saved to /var/cache/conftool/dbconfig/20220426-061519-ladsgroup.json
[06:15:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:15:23] <stashbot>	 T302185: Upgrade s8 to Bullseye - https://phabricator.wikimedia.org/T302185
[06:16:15] <marostegui>	 !log dbmaint s2@eqiad T300381
[06:16:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:16:20] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[06:21:09] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.reimage for host db1109.eqiad.wmnet with OS bullseye
[06:21:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:22:32] <icinga-wm>	 PROBLEM - SSH on wtp1035.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:29:34] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1109.eqiad.wmnet with reason: host reimage
[06:29:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:30:25] <wikibugs>	 (CR) Urbanecm: [C: +1] [beta] Reopen beta eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/785925 (https://phabricator.wikimedia.org/T306833) (owner: Gergő Tisza)
[06:30:36] <wikibugs>	 (CR) Urbanecm: [C: +1] "LGTM, should do the trick." [mediawiki-config] - https://gerrit.wikimedia.org/r/785925 (https://phabricator.wikimedia.org/T306833) (owner: Gergő Tisza)
[06:32:29] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1109.eqiad.wmnet with reason: host reimage
[06:32:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:33:07] <icinga-wm>	 RECOVERY - puppet last run on ml-staging-ctrl2001 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:37:55] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[06:45:30] <jayme>	 !log imported scap 4.7.0 to stretch-/buster-/bullseye-wikimedia - T306827
[06:45:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:45:34] <stashbot>	 T306827: Deploy Scap version 4.7.0 - https://phabricator.wikimedia.org/T306827
[06:46:35] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1109.eqiad.wmnet with OS bullseye
[06:46:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:51:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1109 (T302185)', diff saved to https://phabricator.wikimedia.org/P26506 and previous config saved to /var/cache/conftool/dbconfig/20220426-065112-ladsgroup.json
[06:51:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:51:17] <stashbot>	 T302185: Upgrade s8 to Bullseye - https://phabricator.wikimedia.org/T302185
[06:51:43] <logmsgbot>	 !log jayme@deploy1002 Started deploy [restbase/deploy@0205f1d] (dev-cluster): (no justification provided)
[06:51:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:54:47] <logmsgbot>	 !log jayme@deploy1002 Finished deploy [restbase/deploy@0205f1d] (dev-cluster): (no justification provided) (duration: 03m 05s)
[06:54:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:00:04] <jouncebot>	 Amir1, awight, Urbanecm, and taavi: #bothumor I � Unicode. All rise for UTC morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220426T0700).
[07:00:04] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[07:01:30] <marostegui>	 !log dbmaint s2@eqiad T298554
[07:01:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:01:35] <stashbot>	 T298554: Fix mismatching field type of archive.ar_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298554
[07:01:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[07:06:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P26507 and previous config saved to /var/cache/conftool/dbconfig/20220426-070617-ladsgroup.json
[07:06:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:15:44] <icinga-wm>	 RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:15:45] <icinga-wm>	 RECOVERY - Host cr2-eqord is UP: PING OK - Packet loss = 0%, RTA = 175.97 ms
[07:15:52] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqord is CRITICAL: OSPFv2: 2/2 UP : OSPFv3: 1/1 UP : 2 v2 P2P interfaces vs. 1 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:16:28] <icinga-wm>	 RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 134, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:17:20] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqord is OK: OSPFv2: 2/2 UP : OSPFv3: 2/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:20:13] <icinga-wm>	 PROBLEM - Host cr2-eqord is DOWN: PING CRITICAL - Packet loss = 100%
[07:20:34] <wikibugs>	 (PS1) Elukey: Add calico BGP peering settings for ml-serve100[5-8] [deployment-charts] - https://gerrit.wikimedia.org/r/786264 (https://phabricator.wikimedia.org/T306649)
[07:21:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P26508 and previous config saved to /var/cache/conftool/dbconfig/20220426-072122-ladsgroup.json
[07:21:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:22:00] <icinga-wm>	 PROBLEM - Router interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:22:23] <wikibugs>	 SRE, Infrastructure-Foundations, Prod-Kubernetes, netops, Patch-For-Review: Agree strategy for Kubernetes BGP peering to top-of-rack switches - https://phabricator.wikimedia.org/T306649 (JMeybohm)
[07:22:44] <icinga-wm>	 PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 133, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:24:08] <icinga-wm>	 RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:24:09] <icinga-wm>	 RECOVERY - Host cr2-eqord is UP: PING OK - Packet loss = 0%, RTA = 56.76 ms
[07:24:54] <icinga-wm>	 RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 134, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:25:52] <icinga-wm>	 RECOVERY - Host cr2-eqord IPv6 is UP: PING OK - Packet loss = 0%, RTA = 57.29 ms
[07:26:10] <icinga-wm>	 PROBLEM - IPv4 ping to eqiad on ripe-atlas-eqiad is CRITICAL: CRITICAL - failed 99 probes of 754 (alerts on 35) - https://atlas.ripe.net/measurements/1790945/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[07:29:43] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host bast1003.wikimedia.org
[07:29:48] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host bast3005.wikimedia.org
[07:29:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:29:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:30:56] <wikibugs>	 (PS1) Majavah: hieradata: swap remaining ldap-labs names to ldap-rw [puppet] - https://gerrit.wikimedia.org/r/786265 (https://phabricator.wikimedia.org/T295150)
[07:32:24] <icinga-wm>	 RECOVERY - IPv4 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 7 probes of 754 (alerts on 35) - https://atlas.ripe.net/measurements/1790945/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[07:36:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1109 (T302185)', diff saved to https://phabricator.wikimedia.org/P26509 and previous config saved to /var/cache/conftool/dbconfig/20220426-073627-ladsgroup.json
[07:36:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:36:33] <stashbot>	 T302185: Upgrade s8 to Bullseye - https://phabricator.wikimedia.org/T302185
[07:36:33] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3005.wikimedia.org
[07:36:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:36:43] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast1003.wikimedia.org
[07:36:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:38:45] <wikibugs>	 (PS1) JMeybohm: Fix permissions/ownership of helm directories [puppet] - https://gerrit.wikimedia.org/r/786269 (https://phabricator.wikimedia.org/T305729)
[07:38:48] <wikibugs>	 (PS1) JMeybohm: Clean up helm2 specific code and environment variable [puppet] - https://gerrit.wikimedia.org/r/786270
[07:44:47] <wikibugs>	 (CR) JMeybohm: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34957/console" [puppet] - https://gerrit.wikimedia.org/r/786270 (owner: JMeybohm)
[07:47:58] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.makevm for new host testvm2004.codfw.wmnet
[07:48:00] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.netbox
[07:48:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:48:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:50:54] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[07:55:07] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:55:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:55:26] <wikibugs>	 (PS2) Awight: [beta] Stash all logs for the Kartographer extension [mediawiki-config] - https://gerrit.wikimedia.org/r/785852 (https://phabricator.wikimedia.org/T304813)
[07:56:04] <logmsgbot>	 !log jelto@cumin1001 conftool action : set/pooled=yes; selector: name=mw2287.codfw.wmnet
[07:56:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:56:35] <logmsgbot>	 !log jelto@cumin1001 conftool action : set/pooled=yes; selector: name=mw2288.codfw.wmnet
[07:56:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:56:42] <logmsgbot>	 !log jelto@cumin1001 conftool action : set/pooled=yes; selector: name=mw2289.codfw.wmnet
[07:56:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:58:56] <jelto>	 ^ above hosts were not pooled again because of hardware failure of mw2286 in same cookbook run and failed cookbook
[08:03:49] <logmsgbot>	 !log jelto@cumin1001 START - Cookbook sre.hosts.reboot-cluster
[08:03:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:04:40] <wikibugs>	 (Abandoned) Hashar: Stop including backports in Stretch images [puppet] - https://gerrit.wikimedia.org/r/610050 (https://phabricator.wikimedia.org/T256877) (owner: Muehlenhoff)
[08:04:41] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2004.codfw.wmnet
[08:04:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:08:32] <wikibugs>	 (PS1) Muehlenhoff: Add testvm2004 to DHCP [puppet] - https://gerrit.wikimedia.org/r/786271
[08:08:32] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.reimage for host db1122.eqiad.wmnet with OS bullseye
[08:08:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:16] <wikibugs>	 SRE, Patch-For-Review: Handle sunset of stretch-backports - https://phabricator.wikimedia.org/T256877 (MoritzMuehlenhoff) Open→Resolved Closing this task, the remaining bits will be cleaned out when Stretch is removed completely.
[08:10:35] <wikibugs>	 SRE, Patch-For-Review: Handle sunset of stretch-backports - https://phabricator.wikimedia.org/T256877 (MoritzMuehlenhoff)
[08:11:16] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Add testvm2004 to DHCP [puppet] - https://gerrit.wikimedia.org/r/786271 (owner: Muehlenhoff)
[08:12:46] <icinga-wm>	 PROBLEM - SSH on mw2258.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:14:26] <wikibugs>	 (PS3) Giuseppe Lavagetto: varnish: add new-version dynamic request filter template [puppet] - https://gerrit.wikimedia.org/r/778543 (https://phabricator.wikimedia.org/T305606)
[08:15:35] <godog>	 can someone with access here set me as clinic duty in the topic? thank you!
[08:15:56] <godog>	 tried asking ChanServ for op and I've been denied
[08:16:04] <icinga-wm>	 RECOVERY - SSH on mw2258.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:16:55] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: host reimage
[08:16:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:49] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: host reimage
[08:19:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:48] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:30:00 on db1113.eqiad.wmnet with reason: Rebooting for T303174
[08:21:50] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1113.eqiad.wmnet with reason: Rebooting for T303174
[08:21:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:55] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1113:3315 depooling: Rebooting for T303174', diff saved to https://phabricator.wikimedia.org/P26510 and previous config saved to /var/cache/conftool/dbconfig/20220426-082155-kormat.json
[08:21:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:11] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1113:3316 depooling: Rebooting for T303174', diff saved to https://phabricator.wikimedia.org/P26511 and previous config saved to /var/cache/conftool/dbconfig/20220426-082210-kormat.json
[08:22:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:28] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host deploy1002.eqiad.wmnet
[08:25:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:26:00] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 25%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26512 and previous config saved to /var/cache/conftool/dbconfig/20220426-082559-kormat.json
[08:26:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:28:25] <wikibugs>	 (CR) David Caro: [C: +1] "Let me know when/how you want to deploy this around, might cause a bit of downtime (for the cinder service, not so user-facing)" [puppet] - https://gerrit.wikimedia.org/r/785840 (https://phabricator.wikimedia.org/T297268) (owner: Majavah)
[08:29:00] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:30:58] <godog>	 marostegui: when you have a second could you set me as on clinic duty in topic ? thanks!
[08:31:25] <marostegui>	 godog: sure
[08:31:36] <logmsgbot>	 !log jelto@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
[08:31:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:45] <godog>	 I thought I had access with chanserv, turns out I don't
[08:31:48] <godog>	 marostegui: thank you <3
[08:31:58] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy1002.eqiad.wmnet
[08:32:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:33:44] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1122.eqiad.wmnet with OS bullseye
[08:33:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:34:09] <moritzm>	 !log installing testvm2004 T306499
[08:34:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:34:13] <stashbot>	 T306499: Upgrade ganeti-test to Bullseye - https://phabricator.wikimedia.org/T306499
[08:34:23] <wikibugs>	 (PS1) David Caro: openstack: remove ussuri files [puppet] - https://gerrit.wikimedia.org/r/786274 (https://phabricator.wikimedia.org/T218426)
[08:36:15] <wikibugs>	 (CR) Filippo Giunchedi: cache::haproxy: Log emergency messages to disk (1 comment) [puppet] - https://gerrit.wikimedia.org/r/784256 (https://phabricator.wikimedia.org/T306236) (owner: Vgutierrez)
[08:39:15] <wikibugs>	 (CR) Marostegui: [C: +1] Add fix_img_major_mime_null_T306560.py [software/schema-changes] - https://gerrit.wikimedia.org/r/784762 (https://phabricator.wikimedia.org/T306560) (owner: Ladsgroup)
[08:41:04] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 50%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26513 and previous config saved to /var/cache/conftool/dbconfig/20220426-084103-kormat.json
[08:41:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:16] <wikibugs>	 (CR) Giuseppe Lavagetto: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34958/console" [puppet] - https://gerrit.wikimedia.org/r/778543 (https://phabricator.wikimedia.org/T305606) (owner: Giuseppe Lavagetto)
[08:43:13] <logmsgbot>	 !log jelto@cumin1001 conftool action : set/pooled=yes; selector: name=mw229[7-9].codfw.wmnet
[08:43:15] <jelto>	 !log pool name=mw229[7-9].codfw.wmnet, manual icinga recheck green after reboot
[08:43:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:43:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:43:37] <wikibugs>	 (PS1) Marostegui: Revert "db1122: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/785931
[08:44:12] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "db1122: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/785931 (owner: Marostegui)
[08:44:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1122 (re)pooling @ 1%: After reimage', diff saved to https://phabricator.wikimedia.org/P26514 and previous config saved to /var/cache/conftool/dbconfig/20220426-084437-root.json
[08:44:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:47:21] <logmsgbot>	 !log jelto@cumin1001 START - Cookbook sre.hosts.reboot-cluster
[08:47:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:50:01] <wikibugs>	 SRE, SRE-tools, DNS, Infrastructure-Foundations, and 2 others: sre.dns.netbox cookbook dosn't support period terminated domains - https://phabricator.wikimedia.org/T306809 (Volans) The DNS Name field in Netbox is an FQDN, the same Netbox UI help message for the field is: `Hostname or FQDN (not ca...
[08:54:33] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Essex Igyan eigyan - https://phabricator.wikimedia.org/T305948 (fgiunchedi) Open→Resolved a:fgiunchedi Boldly resolving, though @eigyan please reopen if something is amiss and/or access is not working as expected
[08:56:08] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 75%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26515 and previous config saved to /var/cache/conftool/dbconfig/20220426-085607-kormat.json
[08:56:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:56:27] <wikibugs>	 (CR) WMDE-Fisch: [C: +1] [beta] Stash all logs for the Kartographer extension [mediawiki-config] - https://gerrit.wikimedia.org/r/785852 (https://phabricator.wikimedia.org/T304813) (owner: Awight)
[08:58:09] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for drochford (superset access with no server access) - https://phabricator.wikimedia.org/T305634 (fgiunchedi) Boldly resolving, though @drochford please let us know and reopen is something is amiss!
[08:59:16] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for drochford (superset access with no server access) - https://phabricator.wikimedia.org/T305634 (fgiunchedi) Open→Resolved
[08:59:42] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1122 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P26516 and previous config saved to /var/cache/conftool/dbconfig/20220426-085941-root.json
[08:59:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:03:48] <wikibugs>	 (Abandoned) Hashar: Introduce lint command [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/731149 (https://phabricator.wikimedia.org/T283855) (owner: Hashar)
[09:03:52] <wikibugs>	 (Abandoned) Hashar: Be strict on undefined variables such as seed_image [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/747060 (https://phabricator.wikimedia.org/T297619) (owner: Hashar)
[09:03:54] <wikibugs>	 (CR) Vgutierrez: varnish: add new-version dynamic request filter template (1 comment) [puppet] - https://gerrit.wikimedia.org/r/778543 (https://phabricator.wikimedia.org/T305606) (owner: Giuseppe Lavagetto)
[09:07:25] <wikibugs>	 (PS1) Kevin Bazira: ml-services: add ukwiki & viwiki editquality isvcs [deployment-charts] - https://gerrit.wikimedia.org/r/786276 (https://phabricator.wikimedia.org/T301415)
[09:08:38] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Jim Maddock - https://phabricator.wikimedia.org/T249873 (fgiunchedi) Open→Resolved a:fgiunchedi Boldly resolving, @jmads please reopen if something is amiss and let us know!
[09:10:09] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:30:00 on db1146.eqiad.wmnet with reason: Rebooting for T303174
[09:10:10] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1146.eqiad.wmnet with reason: Rebooting for T303174
[09:10:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:16] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1146:3312 depooling: Rebooting for T303174', diff saved to https://phabricator.wikimedia.org/P26517 and previous config saved to /var/cache/conftool/dbconfig/20220426-091015-kormat.json
[09:10:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:34] <wikibugs>	 (PS3) Cathal Mooney: Move elements from CR BGP policy and group config to separate files [homer/public] - https://gerrit.wikimedia.org/r/785284
[09:11:11] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 100%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26518 and previous config saved to /var/cache/conftool/dbconfig/20220426-091111-kormat.json
[09:11:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:11:16] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 25%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26519 and previous config saved to /var/cache/conftool/dbconfig/20220426-091115-kormat.json
[09:11:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:13:08] <wikibugs>	 SRE, SRE-tools, DNS, Infrastructure-Foundations, and 2 others: sre.dns.netbox cookbook dosn't support period terminated domains - https://phabricator.wikimedia.org/T306809 (jbond) Im  not sure i understand this response.   The value entered which caused an error was `ns-recursor0.openstack.codfw1...
[09:13:30] <wikibugs>	 (PS4) Cathal Mooney: Move elements from CR BGP policy and group config to separate files [homer/public] - https://gerrit.wikimedia.org/r/785284
[09:13:42] <wikibugs>	 SRE, Infrastructure-Foundations, LDAP-Access-Requests: Grant Access to ldap/wmf for Nathillard - https://phabricator.wikimedia.org/T305978 (fgiunchedi) @NHillard-WMF hello, thank you for the extensive testing. Is your access working now ? Also as an additional data point, does https://alerts.wikimedi...
[09:14:36] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reboot-single for host ms-fe1012.eqiad.wmnet
[09:14:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:14:45] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1122 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P26520 and previous config saved to /var/cache/conftool/dbconfig/20220426-091445-root.json
[09:14:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:15:18] <wikibugs>	 SRE: Allow Wikimedia Maps usage on a private project for an university. - https://phabricator.wikimedia.org/T306467 (fgiunchedi) p:Triage→Medium
[09:15:37] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops: Decommission mw13[07-48] - https://phabricator.wikimedia.org/T306162 (fgiunchedi) p:Triage→Medium
[09:15:47] <wikibugs>	 (CR) Elukey: [C: +2] ml-services: add ukwiki & viwiki editquality isvcs [deployment-charts] - https://gerrit.wikimedia.org/r/786276 (https://phabricator.wikimedia.org/T301415) (owner: Kevin Bazira)
[09:15:54] <nemo-yiannis>	 Hi, related to T301600: Can we purge all articles from the cache for a specific mediawiki (in our case ka.wikipedia.org) ?
[09:15:55] <stashbot>	 T301600: REST endpoints cannot handle requests from ka.wikipedia.org with Georgian titles - https://phabricator.wikimedia.org/T301600
[09:16:09] <wikibugs>	 SRE, Traffic, Developer Productivity, Performance-Team (Radar): Let X-Analytics response header pass through with WikimediaDebug - https://phabricator.wikimedia.org/T305794 (fgiunchedi) p:Triage→Medium
[09:16:17] <wikibugs>	 (CR) Cathal Mooney: [C: +2] Move elements from CR BGP policy and group config to separate files [homer/public] - https://gerrit.wikimedia.org/r/785284 (owner: Cathal Mooney)
[09:16:40] <elukey>	 .7
[09:16:43] <elukey>	 uff :)
[09:17:19] <wikibugs>	 (Merged) jenkins-bot: Move elements from CR BGP policy and group config to separate files [homer/public] - https://gerrit.wikimedia.org/r/785284 (owner: Cathal Mooney)
[09:20:18] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1012.eqiad.wmnet
[09:20:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:55] <wikibugs>	 (PS3) WMDE-Fisch: [beta] Stash all logs for the Kartographer extension [mediawiki-config] - https://gerrit.wikimedia.org/r/785852 (https://phabricator.wikimedia.org/T304813) (owner: Awight)
[09:23:09] <logmsgbot>	 !log mvernon@cumin1001 conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1012.eqiad.wmnet
[09:23:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:23:20] <logmsgbot>	 !log mvernon@cumin1001 conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1012.eqiad.wmnet
[09:23:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:23:59] <topranks>	 !log Reconfigure CR routers following bgp policy changes (no-op) CR785284
[09:24:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:24:30] <wikibugs>	 (CR) Giuseppe Lavagetto: [V: +1] varnish: add new-version dynamic request filter template (1 comment) [puppet] - https://gerrit.wikimedia.org/r/778543 (https://phabricator.wikimedia.org/T305606) (owner: Giuseppe Lavagetto)
[09:24:53] <wikibugs>	 SRE-tools, Infrastructure-Foundations: Cumin should group similar SSH errors - https://phabricator.wikimedia.org/T306490 (Majavah) Open→Declined Makes sense. Thanks!
[09:25:01] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on db2093.codfw.wmnet,dborch1001.wikimedia.org with reason: Rebooting db1115 T303174
[09:25:04] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2093.codfw.wmnet,dborch1001.wikimedia.org with reason: Rebooting db1115 T303174
[09:25:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:13] <wikibugs>	 (PS4) Giuseppe Lavagetto: varnish: add new-version dynamic request filter template [puppet] - https://gerrit.wikimedia.org/r/778543 (https://phabricator.wikimedia.org/T305606)
[09:25:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:32] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:30:00 on db1115.eqiad.wmnet with reason: Rebooting for T303174
[09:25:34] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1115.eqiad.wmnet with reason: Rebooting for T303174
[09:25:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:20] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 50%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26521 and previous config saved to /var/cache/conftool/dbconfig/20220426-092619-kormat.json
[09:26:22] <logmsgbot>	 !log kevinbazira@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
[09:26:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:51] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host webperf2001.codfw.wmnet
[09:26:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:27:13] <logmsgbot>	 !log kevinbazira@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
[09:27:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:28:12] <WMDE-Fisch>	 jouncebot: now
[09:28:12] <jouncebot>	 No deployments scheduled for the next 3 hour(s) and 31 minute(s)
[09:28:27] * WMDE-Fisch going to merge a beta cluster config change
[09:28:29] <wikibugs>	 (PS5) Jbond: P:mail: also exclude posfix aliases from vtr router [puppet] - https://gerrit.wikimedia.org/r/785870
[09:29:00] <logmsgbot>	 !log kevinbazira@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
[09:29:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:29:06] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:29:07] <wikibugs>	 (CR) WMDE-Fisch: [C: +2] [beta] Stash all logs for the Kartographer extension [mediawiki-config] - https://gerrit.wikimedia.org/r/785852 (https://phabricator.wikimedia.org/T304813) (owner: Awight)
[09:29:49] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1122 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P26522 and previous config saved to /var/cache/conftool/dbconfig/20220426-092949-root.json
[09:29:51] <logmsgbot>	 !log kevinbazira@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
[09:29:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:29:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:30:03] <wikibugs>	 (Merged) jenkins-bot: [beta] Stash all logs for the Kartographer extension [mediawiki-config] - https://gerrit.wikimedia.org/r/785852 (https://phabricator.wikimedia.org/T304813) (owner: Awight)
[09:30:46] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2001.codfw.wmnet
[09:30:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:30:50] * WMDE-Fisch done
[09:31:50] <wikibugs>	 SRE-swift-storage: swift wmf/rewrite.py middleware broken on bullseye (and its test suite doesn't work either) - https://phabricator.wikimedia.org/T305942 (MatthewVernon) New failures: ` Apr 26 09:25:15 ms-fe1012 proxy-server: Error: An error occurred: #012Traceback (most recent call last):#012  File "/usr/l...
[09:31:54] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host webperf2002.codfw.wmnet
[09:31:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:32:03] <logmsgbot>	 !log kevinbazira@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
[09:32:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:32:32] <wikibugs>	 (CR) Vgutierrez: [C: +1] varnish: add new-version dynamic request filter template [puppet] - https://gerrit.wikimedia.org/r/778543 (https://phabricator.wikimedia.org/T305606) (owner: Giuseppe Lavagetto)
[09:32:56] <logmsgbot>	 !log kevinbazira@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
[09:32:58] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:30:00 on db1146.eqiad.wmnet with reason: Rebooting for T303174
[09:32:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:32:59] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1146.eqiad.wmnet with reason: Rebooting for T303174
[09:33:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:33:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:33:08] <icinga-wm>	 PROBLEM - k8s API server requests latencies on ml-serve-ctrl1002 is CRITICAL: instance=10.64.48.64 verb={CREATE,PATCH,UPDATE} https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[09:33:09] <wikibugs>	 SRE, SRE-tools, DNS, Infrastructure-Foundations, and 2 others: sre.dns.netbox cookbook dosn't support period terminated domains - https://phabricator.wikimedia.org/T306809 (Volans) Sure, but they could cause various unwanted issues in different contexes, like not matching the fingerprint in the k...
[09:33:15] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1146:3314 depooling: Rebooting for T303174', diff saved to https://phabricator.wikimedia.org/P26523 and previous config saved to /var/cache/conftool/dbconfig/20220426-093314-kormat.json
[09:33:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:33:36] <wikibugs>	 (PS6) Jbond: P:mail: also exclude posfix aliases from vtr router [puppet] - https://gerrit.wikimedia.org/r/785870
[09:33:56] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[09:33:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:33:59] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[09:34:00] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[09:34:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:34:04] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[09:34:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:34:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:34:22] <icinga-wm>	 PROBLEM - k8s API server requests latencies on ml-serve-ctrl2001 is CRITICAL: instance=10.192.32.33 verb={CREATE,PATCH} https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[09:34:32] <icinga-wm>	 PROBLEM - k8s API server requests latencies on ml-serve-ctrl2002 is CRITICAL: instance=10.192.48.41 verb={CREATE,PATCH} https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[09:34:59] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] varnish: add new-version dynamic request filter template [puppet] - https://gerrit.wikimedia.org/r/778543 (https://phabricator.wikimedia.org/T305606) (owner: Giuseppe Lavagetto)
[09:35:06] <urbanecm>	 jouncebot: nowandnext
[09:35:06] <jouncebot>	 No deployments scheduled for the next 3 hour(s) and 24 minute(s)
[09:35:06] <jouncebot>	 In 3 hour(s) and 24 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220426T1300)
[09:35:15] <wikibugs>	 (CR) Urbanecm: [C: +2] "merging, docs-only" [mediawiki-config] - https://gerrit.wikimedia.org/r/770502 (owner: Gergő Tisza)
[09:35:56] <wikibugs>	 (Merged) jenkins-bot: Add a note about tox requirements for changing logos [mediawiki-config] - https://gerrit.wikimedia.org/r/770502 (owner: Gergő Tisza)
[09:36:03] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2002.codfw.wmnet
[09:36:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:22] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host aqs1012.eqiad.wmnet
[09:36:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:37:07] * urbanecm done
[09:37:34] <icinga-wm>	 RECOVERY - k8s API server requests latencies on ml-serve-ctrl2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[09:37:40] <icinga-wm>	 RECOVERY - k8s API server requests latencies on ml-serve-ctrl1002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[09:37:50] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26524 and previous config saved to /var/cache/conftool/dbconfig/20220426-093750-kormat.json
[09:37:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:38:35] <wikibugs>	 (CR) Jbond: [C: +1] hieradata: swap remaining ldap-labs names to ldap-rw [puppet] - https://gerrit.wikimedia.org/r/786265 (https://phabricator.wikimedia.org/T295150) (owner: Majavah)
[09:38:58] <icinga-wm>	 RECOVERY - k8s API server requests latencies on ml-serve-ctrl2001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[09:39:18] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[09:39:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:39:21] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[09:39:23] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[09:39:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:39:26] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[09:39:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:39:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:39:42] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host webperf1001.eqiad.wmnet
[09:39:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:41:18] <wikibugs>	 (CR) Jbond: "thanks" [puppet] - https://gerrit.wikimedia.org/r/785870 (owner: Jbond)
[09:41:23] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 75%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26525 and previous config saved to /var/cache/conftool/dbconfig/20220426-094123-kormat.json
[09:41:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:02] <_joe_>	 some CP hosts will alert now
[09:42:06] <_joe_>	 it's my fault, fixing now
[09:42:36] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1001.eqiad.wmnet
[09:42:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:53] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1122 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P26526 and previous config saved to /var/cache/conftool/dbconfig/20220426-094453-root.json
[09:44:55] <wikibugs>	 (CR) Jbond: [C: +2] P:mail: also exclude posfix aliases from vtr router [puppet] - https://gerrit.wikimedia.org/r/785870 (owner: Jbond)
[09:44:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:45:18] <icinga-wm>	 PROBLEM - ganeti-confd running on ganeti-test2001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 112 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[09:45:48] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host ganeti-test2001.codfw.wmnet with OS bullseye
[09:45:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:45:53] <wikibugs>	 SRE: Upgrade ganeti-test to Bullseye - https://phabricator.wikimedia.org/T306499 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti-test2001.codfw.wmnet with OS bullseye
[09:47:21] <logmsgbot>	 !log btullis@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host aqs1012.eqiad.wmnet
[09:47:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:48:20] <icinga-wm>	 PROBLEM - cassandra-b CQL 10.64.32.145:9042 on aqs1012 is CRITICAL: connect to address 10.64.32.145 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[09:50:17] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host webperf1002.eqiad.wmnet
[09:50:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:50:20] <icinga-wm>	 RECOVERY - cassandra-b CQL 10.64.32.145:9042 on aqs1012 is OK: TCP OK - 0.000 second response time on 10.64.32.145 port 9042 https://phabricator.wikimedia.org/T93886
[09:51:00] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:51:04] <wikibugs>	 (PS1) Vgutierrez: Release 8.0.8-1wm6 [debs/trafficserver] - https://gerrit.wikimedia.org/r/786282 (https://phabricator.wikimedia.org/T304835)
[09:51:24] <logmsgbot>	 !log nokafor@deploy1002 Started deploy [airflow-dags/analytics@9dbd5bc]: (no justification provided)
[09:51:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:51:28] <wikibugs>	 (CR) jerkins-bot: [V: -1] Release 8.0.8-1wm6 [debs/trafficserver] - https://gerrit.wikimedia.org/r/786282 (https://phabricator.wikimedia.org/T304835) (owner: Vgutierrez)
[09:51:31] <logmsgbot>	 !log nokafor@deploy1002 Finished deploy [airflow-dags/analytics@9dbd5bc]: (no justification provided) (duration: 00m 07s)
[09:51:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:51:39] <vgutierrez>	 that was fast :)
[09:51:55] <wikibugs>	 (PS1) Giuseppe Lavagetto: cache-frontend: fix confd template [puppet] - https://gerrit.wikimedia.org/r/786283
[09:52:23] <_joe_>	 vgutierrez: jerkins knows you
[09:52:28] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1002.eqiad.wmnet
[09:52:28] <wikibugs>	 (PS2) Vgutierrez: Release 8.0.8-1wm6 [debs/trafficserver] - https://gerrit.wikimedia.org/r/786282 (https://phabricator.wikimedia.org/T304835)
[09:52:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:52:54] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26528 and previous config saved to /var/cache/conftool/dbconfig/20220426-095254-kormat.json
[09:52:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:56:01] <icinga-wm>	 PROBLEM - Confd template for /etc/varnish/requestctl-filters.inc.vcl on cp2031 is CRITICAL: File not found: /etc/varnish/requestctl-filters.inc.vcl https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[09:56:15] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] cache-frontend: fix confd template [puppet] - https://gerrit.wikimedia.org/r/786283 (owner: Giuseppe Lavagetto)
[09:56:27] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 100%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26530 and previous config saved to /var/cache/conftool/dbconfig/20220426-095627-kormat.json
[09:56:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:58:56] <wikibugs>	 SRE, SRE-tools, DNS, Infrastructure-Foundations, and 2 others: sre.dns.netbox cookbook dosn't support period terminated domains - https://phabricator.wikimedia.org/T306809 (jbond) Open→Stalled As per an offline conversation with @Volans.  newer versions of netbox allow us to preform [[ ht...
[09:59:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1122 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P26531 and previous config saved to /var/cache/conftool/dbconfig/20220426-095957-root.json
[10:00:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:00:09] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2001.codfw.wmnet with reason: host reimage
[10:00:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:00:32] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Pool db1122 into API', diff saved to https://phabricator.wikimedia.org/P26532 and previous config saved to /var/cache/conftool/dbconfig/20220426-100031-marostegui.json
[10:00:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:00:57] <icinga-wm>	 PROBLEM - Confd template for /etc/varnish/requestctl-filters.inc.vcl on cp4029 is CRITICAL: File not found: /etc/varnish/requestctl-filters.inc.vcl https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[10:01:16] <vgutierrez>	 _joe_: ^^
[10:03:35] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti-test2001.codfw.wmnet with reason: host reimage
[10:03:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:07] <icinga-wm>	 PROBLEM - Confd template for /etc/varnish/requestctl-filters.inc.vcl on cp5007 is CRITICAL: File not found: /etc/varnish/requestctl-filters.inc.vcl https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[10:05:36] <_joe_>	 uhm
[10:05:48] <_joe_>	 that's quite strange vgutierrez 
[10:05:59] <_joe_>	 but yes I still have puppet disabled
[10:06:03] <_joe_>	 it will be fixed soon
[10:06:20] <vgutierrez>	 ack
[10:07:19] <icinga-wm>	 RECOVERY - Confd template for /etc/varnish/requestctl-filters.inc.vcl on cp2031 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[10:07:23] <_joe_>	 yep
[10:07:33] <_joe_>	 that was me forcing a puppet run on that host :)
[10:07:34] <wikibugs>	 (PS1) Jcrespo: mediabackup: Clone localy the mediawiki-config repo [puppet] - https://gerrit.wikimedia.org/r/786285 (https://phabricator.wikimedia.org/T305446)
[10:07:59] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26533 and previous config saved to /var/cache/conftool/dbconfig/20220426-100758-kormat.json
[10:08:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:08:31] <icinga-wm>	 PROBLEM - Check systemd state on alert1001 is CRITICAL: CRITICAL - degraded: The following units failed: certspotter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:08:41] <icinga-wm>	 PROBLEM - Confd template for /etc/varnish/requestctl-filters.inc.vcl on cp4027 is CRITICAL: File not found: /etc/varnish/requestctl-filters.inc.vcl https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[10:08:42] <wikibugs>	 (CR) Jcrespo: [C: +2] mediabackup: Clone localy the mediawiki-config repo [puppet] - https://gerrit.wikimedia.org/r/786285 (https://phabricator.wikimedia.org/T305446) (owner: Jcrespo)
[10:10:05] <icinga-wm>	 RECOVERY - Confd template for /etc/varnish/requestctl-filters.inc.vcl on cp4027 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[10:13:27] <icinga-wm>	 RECOVERY - Confd template for /etc/varnish/requestctl-filters.inc.vcl on cp4029 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[10:13:55] <icinga-wm>	 RECOVERY - Confd template for /etc/varnish/requestctl-filters.inc.vcl on cp5007 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[10:14:19] <icinga-wm>	 PROBLEM - Check systemd state on ms-be1064 is CRITICAL: CRITICAL - degraded: The following units failed: swift-drive-audit.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:14:27] <logmsgbot>	 !log jynus@cumin2002 START - Cookbook sre.hosts.reimage for host ms-backup2002.codfw.wmnet with OS bullseye
[10:14:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:15:09] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host aqs1013.eqiad.wmnet
[10:15:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:22:32] <icinga-wm>	 PROBLEM - Check systemd state on doc1001 is CRITICAL: CRITICAL - degraded: The following units failed: rsync-doc-doc1002.eqiad.wmnet.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:23:04] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26534 and previous config saved to /var/cache/conftool/dbconfig/20220426-102303-kormat.json
[10:23:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:23:08] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26535 and previous config saved to /var/cache/conftool/dbconfig/20220426-102307-kormat.json
[10:23:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:24:36] <icinga-wm>	 PROBLEM - Confd template for /etc/varnish/requestctl-filters.inc.vcl on cp3060 is CRITICAL: File not found: /etc/varnish/requestctl-filters.inc.vcl https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[10:24:50] <wikibugs>	 (CR) Awight: "This change is ready for review." [mediawiki-config] - https://gerrit.wikimedia.org/r/786289 (https://phabricator.wikimedia.org/T304813) (owner: Awight)
[10:25:41] <logmsgbot>	 !log btullis@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host aqs1013.eqiad.wmnet
[10:25:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:05] <wikibugs>	 SRE: role_contacts (service owners) as a custom puppet fact - https://phabricator.wikimedia.org/T306830 (jbond) > use cumin to ask "what is the kernel version of all machines owned by $subteam" or "which hosts owned by $subteam are still on buster" As we pass this value as a paramter to profile::contacts we...
[10:28:42] <logmsgbot>	 !log jynus@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup2002.codfw.wmnet with reason: host reimage
[10:28:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:32:12] <logmsgbot>	 !log jynus@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-backup2002.codfw.wmnet with reason: host reimage
[10:32:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:04] <wikibugs>	 (PS1) MVernon: swift: wmf/rewrite.py py2->3 HTTPMessage changes [puppet] - https://gerrit.wikimedia.org/r/786290 (https://phabricator.wikimedia.org/T305942)
[10:33:36] <logmsgbot>	 !log jelto@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
[10:33:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:51] <wikibugs>	 (CR) jerkins-bot: [V: -1] swift: wmf/rewrite.py py2->3 HTTPMessage changes [puppet] - https://gerrit.wikimedia.org/r/786290 (https://phabricator.wikimedia.org/T305942) (owner: MVernon)
[10:34:52] <wikibugs>	 SRE: role_contacts (service owners) as a custom puppet fact - https://phabricator.wikimedia.org/T306830 (MoritzMuehlenhoff) And if that syntax is too cumbersome in the day-to-day we could add a few Cumin aliases? like A:hosts-data-persistence and A:hosts-infrastructure-foundations or similar?
[10:34:54] <wikibugs>	 SRE, Infrastructure-Foundations, Prod-Kubernetes, netops, Patch-For-Review: Agree strategy for Kubernetes BGP peering to top-of-rack switches - https://phabricator.wikimedia.org/T306649 (elukey) Created https://gerrit.wikimedia.org/r/786264 to kick off the discussion about the next steps, let...
[10:36:07] <icinga-wm>	 RECOVERY - Check systemd state on alert1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:37:55] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[10:38:00] <wikibugs>	 SRE, DNS, Traffic, Wikimedia Enterprise: 301 redirect setup for wikimediaenterprise - https://phabricator.wikimedia.org/T302756 (Protsack.stephan)
[10:38:12] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26536 and previous config saved to /var/cache/conftool/dbconfig/20220426-103811-kormat.json
[10:38:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:38:21] <wikibugs>	 (PS2) MVernon: swift: wmf/rewrite.py py2->3 HTTPMessage changes [puppet] - https://gerrit.wikimedia.org/r/786290 (https://phabricator.wikimedia.org/T305942)
[10:38:34] <wikibugs>	 (CR) WMDE-Fisch: [C: +1] Watch for mapdata cache misses in production [mediawiki-config] - https://gerrit.wikimedia.org/r/786289 (https://phabricator.wikimedia.org/T304813) (owner: Awight)
[10:40:38] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test2001.codfw.wmnet with OS bullseye
[10:40:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:40:43] <wikibugs>	 SRE: Upgrade ganeti-test to Bullseye - https://phabricator.wikimedia.org/T306499 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti-test2001.codfw.wmnet with OS bullseye completed: - ganeti-test2001 (**PASS**)   - Downtimed on Icinga/Alertmanager   - Disabled P...
[10:42:44] <wikibugs>	 (CR) MVernon: "I've lightly tested this fix on ms-fe1012." [puppet] - https://gerrit.wikimedia.org/r/786290 (https://phabricator.wikimedia.org/T305942) (owner: MVernon)
[10:43:43] <logmsgbot>	 !log jynus@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-backup2002.codfw.wmnet with OS bullseye
[10:43:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:44:40] <topranks>	 !log Reconfigre routing policy lsw1-f3-eqiad, rename policies to use lower-case
[10:44:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:44:53] <wikibugs>	 SRE-swift-storage, Patch-For-Review: swift wmf/rewrite.py middleware broken on bullseye (and its test suite doesn't work either) - https://phabricator.wikimedia.org/T305942 (MatthewVernon) That patch fixes the above error; I've found another...
[10:45:08] <logmsgbot>	 !log jynus@cumin1001 START - Cookbook sre.hosts.reimage for host ms-backup1002.eqiad.wmnet with OS bullseye
[10:45:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:47:32] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/786290 (https://phabricator.wikimedia.org/T305942) (owner: MVernon)
[10:48:42] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good to me!" [debs/trafficserver] - https://gerrit.wikimedia.org/r/786282 (https://phabricator.wikimedia.org/T304835) (owner: Vgutierrez)
[10:48:54] <wikibugs>	 (CR) Vgutierrez: cache::haproxy: Log emergency messages to disk (1 comment) [puppet] - https://gerrit.wikimedia.org/r/784256 (https://phabricator.wikimedia.org/T306236) (owner: Vgutierrez)
[10:50:08] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/784256 (https://phabricator.wikimedia.org/T306236) (owner: Vgutierrez)
[10:50:49] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM when the time comes" [puppet] - https://gerrit.wikimedia.org/r/785927 (owner: Cwhite)
[10:51:59] <wikibugs>	 (CR) Thiemo Kreuz (WMDE): [C: +1] Watch for mapdata cache misses in production [mediawiki-config] - https://gerrit.wikimedia.org/r/786289 (https://phabricator.wikimedia.org/T304813) (owner: Awight)
[10:53:01] <wikibugs>	 (CR) Filippo Giunchedi: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/785921 (https://phabricator.wikimedia.org/T294564) (owner: JHathaway)
[10:53:03] <wikibugs>	 (CR) MVernon: [C: +2] swift: wmf/rewrite.py py2->3 HTTPMessage changes [puppet] - https://gerrit.wikimedia.org/r/786290 (https://phabricator.wikimedia.org/T305942) (owner: MVernon)
[10:53:14] <wikibugs>	 (CR) Vgutierrez: [C: +2] cache::haproxy: Log emergency messages to disk [puppet] - https://gerrit.wikimedia.org/r/784256 (https://phabricator.wikimedia.org/T306236) (owner: Vgutierrez)
[10:53:16] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26537 and previous config saved to /var/cache/conftool/dbconfig/20220426-105315-kormat.json
[10:53:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:53:59] <logmsgbot>	 !log jelto@cumin1001 START - Cookbook sre.hosts.reboot-cluster
[10:54:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:57:05] <logmsgbot>	 !log jynus@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup1002.eqiad.wmnet with reason: host reimage
[10:57:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:57:41] <icinga-wm>	 RECOVERY - Confd template for /etc/varnish/requestctl-filters.inc.vcl on cp3060 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[10:57:48] <topranks>	 !log Reconfigre routing policy lsw1-e3-eqiad, rename policies to use lower-case
[10:57:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:59:19] <icinga-wm>	 PROBLEM - SSH on wtp1045.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:00:31] <logmsgbot>	 !log jynus@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-backup1002.eqiad.wmnet with reason: host reimage
[11:00:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:01:20] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host aqs1014.eqiad.wmnet
[11:01:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:01:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[11:02:05] <icinga-wm>	 RECOVERY - Check systemd state on ms-be1064 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:05:07] <topranks>	 !log Reconfigre routing policy lsw1-f2-eqiad, rename policies to use lower-case
[11:05:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:08:19] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26538 and previous config saved to /var/cache/conftool/dbconfig/20220426-110819-kormat.json
[11:08:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:09:17] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs1014.eqiad.wmnet
[11:09:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:09:49] <topranks>	 !log Reconfigre routing policy lsw1-e2-eqiad, rename policies to use lower-case
[11:09:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:11:26] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
[11:11:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:11:34] <logmsgbot>	 !log jynus@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-backup1002.eqiad.wmnet with OS bullseye
[11:11:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:13:07] <logmsgbot>	 !log jynus@cumin2002 START - Cookbook sre.hosts.reimage for host ms-backup2001.codfw.wmnet with OS bullseye
[11:13:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:16:56] <topranks>	 !log Reconfigre routing policy lsw1-e1-eqiad, rename policies to use lower-case
[11:16:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:17:05] <wikibugs>	 (CR) Jbond: "lgtm, some minor comments/questions inline" [software/spicerack] - https://gerrit.wikimedia.org/r/775904 (owner: Volans)
[11:17:34] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:30:00 on db1170.eqiad.wmnet with reason: Rebooting for T303174
[11:17:36] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1170.eqiad.wmnet with reason: Rebooting for T303174
[11:17:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:17:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:17:41] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1170:3312 depooling: Rebooting for T303174', diff saved to https://phabricator.wikimedia.org/P26539 and previous config saved to /var/cache/conftool/dbconfig/20220426-111741-kormat.json
[11:17:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:17:52] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1170:3317 depooling: Rebooting for T303174', diff saved to https://phabricator.wikimedia.org/P26540 and previous config saved to /var/cache/conftool/dbconfig/20220426-111751-kormat.json
[11:17:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:18:55] <icinga-wm>	 RECOVERY - Check systemd state on doc1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:22:16] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 25%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26541 and previous config saved to /var/cache/conftool/dbconfig/20220426-112215-kormat.json
[11:22:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:22:26] <topranks>	 !log Reconfigre routing policy lsw1-f1-eqiad, rename policies to use lower-case
[11:22:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:22:45] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[11:23:04] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti-test2001.codfw.wmnet
[11:23:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:27:21] <logmsgbot>	 !log jynus@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup2001.codfw.wmnet with reason: host reimage
[11:27:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:28:54] <icinga-wm>	 PROBLEM - configured eth on ganeti-test2001 is CRITICAL: public reporting no carrier. https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[11:29:12] <wikibugs>	 (PS1) Cathal Mooney: Add automation templates for EVPN switch overlay BGP [homer/public] - https://gerrit.wikimedia.org/r/786296 (https://phabricator.wikimedia.org/T299758)
[11:29:44] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add automation templates for EVPN switch overlay BGP [homer/public] - https://gerrit.wikimedia.org/r/786296 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[11:30:48] <logmsgbot>	 !log jynus@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-backup2001.codfw.wmnet with reason: host reimage
[11:30:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:32:31] <wikibugs>	 (PS2) Cathal Mooney: Add automation templates for EVPN switch overlay BGP [homer/public] - https://gerrit.wikimedia.org/r/786296 (https://phabricator.wikimedia.org/T299758)
[11:33:50] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add automation templates for EVPN switch overlay BGP [homer/public] - https://gerrit.wikimedia.org/r/786296 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[11:34:05] <logmsgbot>	 !log jynus@cumin1001 START - Cookbook sre.hosts.reimage for host ms-backup1001.eqiad.wmnet with OS bullseye
[11:34:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:37:19] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 50%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26542 and previous config saved to /var/cache/conftool/dbconfig/20220426-113719-kormat.json
[11:37:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:40:24] <wikibugs>	 (PS3) Cathal Mooney: Add automation templates for EVPN switch overlay BGP [homer/public] - https://gerrit.wikimedia.org/r/786296 (https://phabricator.wikimedia.org/T299758)
[11:40:56] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add automation templates for EVPN switch overlay BGP [homer/public] - https://gerrit.wikimedia.org/r/786296 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[11:42:07] <logmsgbot>	 !log jynus@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-backup2001.codfw.wmnet with OS bullseye
[11:42:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:42:18] <wikibugs>	 (PS4) Cathal Mooney: Add automation templates for EVPN switch overlay BGP [homer/public] - https://gerrit.wikimedia.org/r/786296 (https://phabricator.wikimedia.org/T299758)
[11:46:06] <logmsgbot>	 !log jynus@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup1001.eqiad.wmnet with reason: host reimage
[11:46:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:46:19] <wikibugs>	 (CR) Cathal Mooney: [C: +2] Add automation templates for EVPN switch overlay BGP [homer/public] - https://gerrit.wikimedia.org/r/786296 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[11:46:52] <wikibugs>	 (Merged) jenkins-bot: Add automation templates for EVPN switch overlay BGP [homer/public] - https://gerrit.wikimedia.org/r/786296 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[11:47:06] <wikibugs>	 (CR) David Caro: "Neat, almost there, some comments 😊" [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[11:49:19] <logmsgbot>	 !log jynus@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-backup1001.eqiad.wmnet with reason: host reimage
[11:49:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:50:54] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[11:52:23] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 75%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26543 and previous config saved to /var/cache/conftool/dbconfig/20220426-115223-kormat.json
[11:52:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:27] <wikibugs>	 (PS2) WMDE-Fisch: Watch for mapdata cache misses in production [mediawiki-config] - https://gerrit.wikimedia.org/r/786289 (https://phabricator.wikimedia.org/T304813) (owner: Awight)
[11:59:27] <icinga-wm>	 RECOVERY - configured eth on ganeti-test2001 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[12:00:46] <logmsgbot>	 !log jynus@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-backup1001.eqiad.wmnet with OS bullseye
[12:00:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:03:29] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
[12:03:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:03:43] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
[12:03:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:02] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/785921 (https://phabricator.wikimedia.org/T294564) (owner: JHathaway)
[12:07:27] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 100%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26544 and previous config saved to /var/cache/conftool/dbconfig/20220426-120727-kormat.json
[12:07:28] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] thanos: aggregate exporter 'up' metrics [puppet] - https://gerrit.wikimedia.org/r/784635 (https://phabricator.wikimedia.org/T288726) (owner: Filippo Giunchedi)
[12:07:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:31] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26545 and previous config saved to /var/cache/conftool/dbconfig/20220426-120731-kormat.json
[12:07:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:14:48] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host aqs1015.eqiad.wmnet
[12:14:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:18:08] <wikibugs>	 SRE, Infrastructure-Foundations, Prod-Kubernetes, netops, Patch-For-Review: Agree strategy for Kubernetes BGP peering to top-of-rack switches - https://phabricator.wikimedia.org/T306649 (cmooney) @elukey thanks for the patch, certainly looks ok to me, if indeed it works in terms of the Calico...
[12:22:35] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26546 and previous config saved to /var/cache/conftool/dbconfig/20220426-122235-kormat.json
[12:22:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:23:51] <icinga-wm>	 RECOVERY - SSH on wtp1035.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:24:42] <logmsgbot>	 !log btullis@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host aqs1015.eqiad.wmnet
[12:24:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:29:04] <wikibugs>	 (PS3) Cathal Mooney: Add ml-serve100[5-8] to the ml-serve-eqiad k8s BGP neighbors [homer/public] - https://gerrit.wikimedia.org/r/784703 (https://phabricator.wikimedia.org/T306545) (owner: Elukey)
[12:29:38] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add ml-serve100[5-8] to the ml-serve-eqiad k8s BGP neighbors [homer/public] - https://gerrit.wikimedia.org/r/784703 (https://phabricator.wikimedia.org/T306545) (owner: Elukey)
[12:30:03] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:31:16] <wikibugs>	 SRE, DBA, Security: Reboot pc1011 - https://phabricator.wikimedia.org/T306892 (Kormat)
[12:31:19] <icinga-wm>	 PROBLEM - Host cp1089.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:32:10] <wikibugs>	 SRE, DBA: Reboot pc1011 - https://phabricator.wikimedia.org/T306892 (Kormat)
[12:32:13] <icinga-wm>	 PROBLEM - configured eth on ganeti-test2001 is CRITICAL: public reporting no carrier. https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[12:32:18] <wikibugs>	 SRE, DBA: Reboot pc1011 - https://phabricator.wikimedia.org/T306892 (Kormat) p:Triage→Medium
[12:32:37] <wikibugs>	 SRE, DBA: Reboot pc1011 - https://phabricator.wikimedia.org/T306892 (Kormat)
[12:33:12] <wikibugs>	 (PS4) Cathal Mooney: Add ml-serve100[5-8] to the ml-serve-eqiad k8s BGP neighbors [homer/public] - https://gerrit.wikimedia.org/r/784703 (https://phabricator.wikimedia.org/T306545) (owner: Elukey)
[12:33:23] <icinga-wm>	 PROBLEM - puppet last run on ml-staging-ctrl2001 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[12:33:46] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add ml-serve100[5-8] to the ml-serve-eqiad k8s BGP neighbors [homer/public] - https://gerrit.wikimedia.org/r/784703 (https://phabricator.wikimedia.org/T306545) (owner: Elukey)
[12:34:04] <wikibugs>	 SRE, DBA: Reboot pc1011 - https://phabricator.wikimedia.org/T306892 (Kormat)
[12:35:39] <wikibugs>	 (PS5) Cathal Mooney: Add ml-serve100[5-8] to the ml-serve-eqiad k8s BGP neighbors [homer/public] - https://gerrit.wikimedia.org/r/784703 (https://phabricator.wikimedia.org/T306545) (owner: Elukey)
[12:36:59] <icinga-wm>	 RECOVERY - Host cp1089.mgmt is UP: PING WARNING - Packet loss = 60%, RTA = 0.82 ms
[12:37:35] <wikibugs>	 (CR) Cathal Mooney: [C: +2] Add ml-serve100[5-8] to the ml-serve-eqiad k8s BGP neighbors [homer/public] - https://gerrit.wikimedia.org/r/784703 (https://phabricator.wikimedia.org/T306545) (owner: Elukey)
[12:37:40] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26547 and previous config saved to /var/cache/conftool/dbconfig/20220426-123740-kormat.json
[12:37:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:38:08] <wikibugs>	 (Merged) jenkins-bot: Add ml-serve100[5-8] to the ml-serve-eqiad k8s BGP neighbors [homer/public] - https://gerrit.wikimedia.org/r/784703 (https://phabricator.wikimedia.org/T306545) (owner: Elukey)
[12:43:10] <wikibugs>	 (PS1) Kormat: ProductionServices: Promote pc1014 to primary of pc1. [mediawiki-config] - https://gerrit.wikimedia.org/r/786300 (https://phabricator.wikimedia.org/T306892)
[12:44:11] <wikibugs>	 (CR) Vgutierrez: [C: +2] Release 8.0.8-1wm6 [debs/trafficserver] - https://gerrit.wikimedia.org/r/786282 (https://phabricator.wikimedia.org/T304835) (owner: Vgutierrez)
[12:44:23] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host archiva1002.wikimedia.org
[12:44:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:45:25] <wikibugs>	 SRE, DBA, Patch-For-Review: Reboot pc1011 - https://phabricator.wikimedia.org/T306892 (Kormat)
[12:46:32] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host archiva1002.wikimedia.org
[12:46:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:46:57] <icinga-wm>	 PROBLEM - BGP status on lsw1-e2-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64606/IPv6: Connect - kubernetes-ml-eqiad, AS64606/IPv4: Connect - kubernetes-ml-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:47:27] <icinga-wm>	 PROBLEM - BGP status on lsw1-e3-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64606/IPv6: Connect - kubernetes-ml-eqiad, AS64606/IPv4: Connect - kubernetes-ml-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:48:10] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host druid1004.eqiad.wmnet
[12:48:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:48:21] <kormat>	 jouncebot: nowandnext
[12:48:22] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 11 minute(s)
[12:48:22] <jouncebot>	 In 0 hour(s) and 11 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220426T1300)
[12:48:35] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 81, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:50:21] <icinga-wm>	 PROBLEM - BGP status on lsw1-f2-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64606/IPv4: Connect - kubernetes-ml-eqiad, AS64606/IPv6: Connect - kubernetes-ml-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:50:45] <icinga-wm>	 PROBLEM - BGP status on lsw1-f3-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64606/IPv6: Connect - kubernetes-ml-eqiad, AS64606/IPv4: Connect - kubernetes-ml-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:52:44] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26550 and previous config saved to /var/cache/conftool/dbconfig/20220426-125244-kormat.json
[12:52:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:55:05] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host druid1004.eqiad.wmnet
[12:55:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:55:21] <wikibugs>	 (CR) Marostegui: "The rack location aren't updated, I am fine with that if this will be for a short time. But in case pc1011 doesn't come back, we should up" [mediawiki-config] - https://gerrit.wikimedia.org/r/786300 (https://phabricator.wikimedia.org/T306892) (owner: Kormat)
[12:57:04] <wikibugs>	 (PS2) Kormat: ProductionServices: Promote pc1014 to primary of pc1. [mediawiki-config] - https://gerrit.wikimedia.org/r/786300 (https://phabricator.wikimedia.org/T306892)
[12:57:42] <wikibugs>	 (CR) Kormat: ProductionServices: Promote pc1014 to primary of pc1. (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/786300 (https://phabricator.wikimedia.org/T306892) (owner: Kormat)
[12:59:03] <wikibugs>	 (CR) Marostegui: [C: -1] "I just realised you are depooling pc2011 not pc1011" [mediawiki-config] - https://gerrit.wikimedia.org/r/786300 (https://phabricator.wikimedia.org/T306892) (owner: Kormat)
[13:00:04] <jouncebot>	 RoanKattouw, Lucas_WMDE, and Urbanecm: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220426T1300).
[13:00:04] <jouncebot>	 tgr: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:24] <Lucas_WMDE>	 I’m in a meeting – I assume tgr / tgr_ can self-serve? :)
[13:00:24] <urbanecm>	 hey tgr! do you want to self-serve?
[13:00:41] <urbanecm>	 hey Lucas_WMDE! happy meeting :)
[13:00:41] <tgr>	 sure, i can
[13:01:05] <icinga-wm>	 RECOVERY - SSH on wtp1045.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:01:05] <urbanecm>	 (y)
[13:01:11] <icinga-wm>	 PROBLEM - PHP7 jobrunner on mw1308 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[13:01:12] <wikibugs>	 (PS3) Kormat: ProductionServices: Promote pc1014 to primary of pc1. [mediawiki-config] - https://gerrit.wikimedia.org/r/786300 (https://phabricator.wikimedia.org/T306892)
[13:01:43] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1308 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[13:02:06] <wikibugs>	 (CR) Kormat: ProductionServices: Promote pc1014 to primary of pc1. (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/786300 (https://phabricator.wikimedia.org/T306892) (owner: Kormat)
[13:02:49] <wikibugs>	 (CR) Marostegui: [C: +1] ProductionServices: Promote pc1014 to primary of pc1. [mediawiki-config] - https://gerrit.wikimedia.org/r/786300 (https://phabricator.wikimedia.org/T306892) (owner: Kormat)
[13:03:29] <icinga-wm>	 RECOVERY - configured eth on ganeti-test2001 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[13:07:20] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host druid1005.eqiad.wmnet
[13:07:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:07:58] <wikibugs>	 (PS1) Cathal Mooney: Modify Ganeti addnode.py script function when detecting bridge status [cookbooks] - https://gerrit.wikimedia.org/r/786304
[13:08:44] <wikibugs>	 (CR) Elukey: "Thanks Cathal!" [homer/public] - https://gerrit.wikimedia.org/r/784703 (https://phabricator.wikimedia.org/T306545) (owner: Elukey)
[13:10:53] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1308 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.032 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[13:11:12] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/786304 (owner: Cathal Mooney)
[13:11:25] <wikibugs>	 (CR) Gergő Tisza: [C: +2] [beta] Reopen beta eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/785925 (https://phabricator.wikimedia.org/T306833) (owner: Gergő Tisza)
[13:12:06] <wikibugs>	 (Merged) jenkins-bot: [beta] Reopen beta eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/785925 (https://phabricator.wikimedia.org/T306833) (owner: Gergő Tisza)
[13:13:32] <wikibugs>	 (CR) Cathal Mooney: [C: +2] Modify Ganeti addnode.py script function when detecting bridge status [cookbooks] - https://gerrit.wikimedia.org/r/786304 (owner: Cathal Mooney)
[13:13:40] <wikibugs>	 (CR) Gergő Tisza: [C: +2] Backport video landing page changes [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/785950 (https://phabricator.wikimedia.org/T303785) (owner: Gergő Tisza)
[13:14:01] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host druid1005.eqiad.wmnet
[13:14:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:14:13] <kormat>	 tgr: can you ping me when you've finished deploying, please? i have something i want to deploy, and i don't want to step on your toes
[13:15:12] <tgr>	 kormat: if it's not in wmf.8, go ahead; the patch is going through CI, will take a while
[13:15:39] <kormat>	 tgr: it's a mediawiki-config change. i'm not in any rush, so i'd rather wait
[13:16:27] <tgr>	 as you wish, but it will take an hour at least
[13:16:39] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1308 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[13:16:48] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:16:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:51] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:16:52] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:16:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:57] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:16:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:17:08] <wikibugs>	 (CR) Muehlenhoff: [V: +2 C: +1] Modify Ganeti addnode.py script function when detecting bridge status [cookbooks] - https://gerrit.wikimedia.org/r/786304 (owner: Cathal Mooney)
[13:17:11] <wikibugs>	 (Merged) jenkins-bot: Modify Ganeti addnode.py script function when detecting bridge status [cookbooks] - https://gerrit.wikimedia.org/r/786304 (owner: Cathal Mooney)
[13:17:26] <wikibugs>	 (CR) Gergő Tisza: [C: +2] Add Link: Add 'excluded sections' task setting [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/785926 (https://phabricator.wikimedia.org/T304150) (owner: Gergő Tisza)
[13:18:03] <wikibugs>	 (PS1) Andrew Bogott: wmcs-novastats-dnsleaks.py: make slightly better at handling codfw1dev [puppet] - https://gerrit.wikimedia.org/r/786307
[13:18:24] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
[13:18:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:18:30] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
[13:18:37] <wikibugs>	 (PS2) Elukey: Add calico BGP peering settings for ml-serve100[5-8] [deployment-charts] - https://gerrit.wikimedia.org/r/786264 (https://phabricator.wikimedia.org/T306649)
[13:18:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:05] <kormat>	 tgr: oh really.. ok. i'll do mine now then. wish me luck! ;)
[13:19:20] <wikibugs>	 (CR) Kormat: [C: +2] ProductionServices: Promote pc1014 to primary of pc1. [mediawiki-config] - https://gerrit.wikimedia.org/r/786300 (https://phabricator.wikimedia.org/T306892) (owner: Kormat)
[13:20:08] <wikibugs>	 (Merged) jenkins-bot: ProductionServices: Promote pc1014 to primary of pc1. [mediawiki-config] - https://gerrit.wikimedia.org/r/786300 (https://phabricator.wikimedia.org/T306892) (owner: Kormat)
[13:21:19] <wikibugs>	 SRE, DBA, Patch-For-Review: Reboot pc1011 - https://phabricator.wikimedia.org/T306892 (Kormat)
[13:21:28] <logmsgbot>	 !log kormat@deploy1002 Synchronized wmf-config/ProductionServices.php: Set pc1014 as pc1 primary T306892 (duration: 01m 07s)
[13:21:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:21:44] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: Rebooting pc1011 T306892
[13:21:45] <stashbot>	 T306892: Reboot pc1011 - https://phabricator.wikimedia.org/T306892
[13:21:48] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: Rebooting pc1011 T306892
[13:21:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:22:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:23:30] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:30:00 on pc1011.eqiad.wmnet with reason: Rebooting for T303174
[13:23:31] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on pc1011.eqiad.wmnet with reason: Rebooting for T303174
[13:23:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:23:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:24:19] <wikibugs>	 SRE, DBA, Patch-For-Review: Reboot pc1011 - https://phabricator.wikimedia.org/T306892 (Kormat)
[13:24:24] <wikibugs>	 SRE, DBA, Patch-For-Review: Reboot pc1011 - https://phabricator.wikimedia.org/T306892 (Kormat)
[13:24:44] <wikibugs>	 (CR) Andrew Bogott: [C: +2] wmcs-novastats-dnsleaks.py: make slightly better at handling codfw1dev [puppet] - https://gerrit.wikimedia.org/r/786307 (owner: Andrew Bogott)
[13:25:49] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1308 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.024 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[13:26:39] <icinga-wm>	 RECOVERY - PHP7 jobrunner on mw1308 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.022 second response time https://wikitech.wikimedia.org/wiki/Jobrunner
[13:26:41] <wikibugs>	 (CR) Klausman: [C: +1] Add calico BGP peering settings for ml-serve100[5-8] [deployment-charts] - https://gerrit.wikimedia.org/r/786264 (https://phabricator.wikimedia.org/T306649) (owner: Elukey)
[13:27:00] <wikibugs>	 (CR) Elukey: [C: +2] Add calico BGP peering settings for ml-serve100[5-8] [deployment-charts] - https://gerrit.wikimedia.org/r/786264 (https://phabricator.wikimedia.org/T306649) (owner: Elukey)
[13:27:01] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:27:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:27:04] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:27:05] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:27:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:27:09] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:27:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:27:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:28:14] <logmsgbot>	 !log jelto@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
[13:28:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:29:30] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
[13:29:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:29:42] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
[13:29:43] <icinga-wm>	 RECOVERY - BGP status on lsw1-f2-eqiad.mgmt is OK: BGP OK - up: 4, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:29:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:30:09] <icinga-wm>	 RECOVERY - BGP status on lsw1-f3-eqiad.mgmt is OK: BGP OK - up: 4, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:30:49] <icinga-wm>	 RECOVERY - BGP status on lsw1-e2-eqiad.mgmt is OK: BGP OK - up: 4, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:31:13] <wikibugs>	 (PS1) Kormat: Revert "ProductionServices: Promote pc1014 to primary of pc1." [mediawiki-config] - https://gerrit.wikimedia.org/r/785936
[13:31:19] <icinga-wm>	 RECOVERY - BGP status on lsw1-e3-eqiad.mgmt is OK: BGP OK - up: 4, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:32:23] <icinga-wm>	 PROBLEM - PHP7 jobrunner on mw1445 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[13:32:51] <wikibugs>	 (CR) Kormat: [C: +2] Revert "ProductionServices: Promote pc1014 to primary of pc1." [mediawiki-config] - https://gerrit.wikimedia.org/r/785936 (owner: Kormat)
[13:33:27] <wikibugs>	 (PS1) Jbond: rake_modules: add check for spdk licence header [puppet] - https://gerrit.wikimedia.org/r/786310 (https://phabricator.wikimedia.org/T67270)
[13:33:43] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1445 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[13:34:03] <wikibugs>	 SRE, WMF-General-or-Unknown, WMF-Legal, Documentation, and 2 others: Default license for operations/puppet - https://phabricator.wikimedia.org/T67270 (jbond) >>! In T67270#7832446, @Ladsgroup wrote: > @jbond In the meantime, maybe we can add a rule to lint -1ing any new puppet/or otherwise file t...
[13:34:09] <wikibugs>	 (CR) jerkins-bot: [V: -1] rake_modules: add check for spdk licence header [puppet] - https://gerrit.wikimedia.org/r/786310 (https://phabricator.wikimedia.org/T67270) (owner: Jbond)
[13:34:38] <wikibugs>	 (Merged) jenkins-bot: Revert "ProductionServices: Promote pc1014 to primary of pc1." [mediawiki-config] - https://gerrit.wikimedia.org/r/785936 (owner: Kormat)
[13:35:59] <wikibugs>	 (PS2) Jbond: rake_modules: add check for spdk licence header [puppet] - https://gerrit.wikimedia.org/r/786310 (https://phabricator.wikimedia.org/T67270)
[13:36:33] <logmsgbot>	 !log kormat@deploy1002 Synchronized wmf-config/ProductionServices.php: Set pc1011 as pc1 primary T306892 (duration: 01m 37s)
[13:36:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:36:39] <stashbot>	 T306892: Reboot pc1011 - https://phabricator.wikimedia.org/T306892
[13:36:51] <icinga-wm>	 RECOVERY - PHP7 jobrunner on mw1445 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.005 second response time https://wikitech.wikimedia.org/wiki/Jobrunner
[13:36:56] <wikibugs>	 (Merged) jenkins-bot: Backport video landing page changes [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/785950 (https://phabricator.wikimedia.org/T303785) (owner: Gergő Tisza)
[13:37:18] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:37:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:37:21] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:37:22] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:37:26] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:37:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:37:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:37:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:21] <wikibugs>	 (PS1) MVernon: swift: wmf/rewrite.py say 400 earlier if passed bad UTF-8 [puppet] - https://gerrit.wikimedia.org/r/786311 (https://phabricator.wikimedia.org/T305942)
[13:38:46] <wikibugs>	 (Merged) jenkins-bot: Add Link: Add 'excluded sections' task setting [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/785926 (https://phabricator.wikimedia.org/T304150) (owner: Gergő Tisza)
[13:38:50] <wikibugs>	 (PS1) Giuseppe Lavagetto: requestctl: preserve rules ordering in `requestctl commit` [software/conftool] - https://gerrit.wikimedia.org/r/786312
[13:38:52] <wikibugs>	 (PS1) Giuseppe Lavagetto: New version 2.1.3 [software/conftool] - https://gerrit.wikimedia.org/r/786313
[13:38:59] <wikibugs>	 (CR) jerkins-bot: [V: -1] swift: wmf/rewrite.py say 400 earlier if passed bad UTF-8 [puppet] - https://gerrit.wikimedia.org/r/786311 (https://phabricator.wikimedia.org/T305942) (owner: MVernon)
[13:40:31] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1445 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.022 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[13:41:11] <icinga-wm>	 PROBLEM - PHP7 jobrunner on mw1438 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[13:41:38] <kormat>	 tgr: alright, i'm finished screwing with production. it's all yours again :)
[13:41:50] <tgr>	 thanks!
[13:42:00] <wikibugs>	 (PS2) MVernon: swift: wmf/rewrite.py say 400 earlier if passed bad UTF-8 [puppet] - https://gerrit.wikimedia.org/r/786311 (https://phabricator.wikimedia.org/T305942)
[13:42:27] <wikibugs>	 SRE, DBA: Reboot pc1011 - https://phabricator.wikimedia.org/T306892 (Kormat) Open→Resolved
[13:43:21] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1438 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[13:45:27] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 45, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[13:45:35] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 236, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[13:45:42] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host druid1006.eqiad.wmnet
[13:45:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:47:55] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] swift: wmf/rewrite.py say 400 earlier if passed bad UTF-8 [puppet] - https://gerrit.wikimedia.org/r/786311 (https://phabricator.wikimedia.org/T305942) (owner: MVernon)
[13:49:00] <wikibugs>	 (PS3) Majavah: P:openstack::encapi: add tls for write endpoint [puppet] - https://gerrit.wikimedia.org/r/785110 (https://phabricator.wikimedia.org/T274666)
[13:49:57] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1308 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[13:50:05] <icinga-wm>	 PROBLEM - PHP7 jobrunner on mw1308 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[13:51:00] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[13:51:40] <wikibugs>	 SRE, Data-Catalog, Data-Engineering, serviceops, and 2 others: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (BTullis)
[13:53:39] <logmsgbot>	 !log tgr@deploy1002 Started scap: (no justification provided)
[13:53:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:50] <wikibugs>	 (PS4) Majavah: P:openstack::encapi: add tls for write endpoint [puppet] - https://gerrit.wikimedia.org/r/785110 (https://phabricator.wikimedia.org/T274666)
[13:54:01] <wikibugs>	 (CR) MVernon: [C: +2] swift: wmf/rewrite.py say 400 earlier if passed bad UTF-8 [puppet] - https://gerrit.wikimedia.org/r/786311 (https://phabricator.wikimedia.org/T305942) (owner: MVernon)
[13:54:35] <wikibugs>	 (PS1) Kormat: pc1014: Move to pc2. [puppet] - https://gerrit.wikimedia.org/r/786317 (https://phabricator.wikimedia.org/T303174)
[13:56:17] <logmsgbot>	 !log tgr@deploy1002 Scap failed!: 8/9 canaries failed their endpoint checks(https://en.wikipedia.org).  WARNING: canaries have not been rolled back.
[13:56:17] <logmsgbot>	 !log tgr@deploy1002 scap failed: RuntimeError Scap failed!: 8/9 canaries failed their endpoint checks(https://en.wikipedia.org).  WARNING: canaries have not been rolled back. (duration: 02m 37s)
[13:56:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:56:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:56:31] <wikibugs>	 (PS5) Majavah: P:openstack::encapi: add tls for write endpoint [puppet] - https://gerrit.wikimedia.org/r/785110 (https://phabricator.wikimedia.org/T274666)
[13:56:34] <wikibugs>	 SRE, Data-Catalog, Data-Engineering, serviceops, and 2 others: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (BTullis) Should we call this done, or should we leave it open pending an outcome on {T305358}? Many thanks again for all your support with this request @JMeybohm.
[13:56:53] <logmsgbot>	 !log btullis@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host druid1006.eqiad.wmnet
[13:56:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:32] <wikibugs>	 (CR) Jbond: [C: +1] "LGTM all nits and comments are minor/picky or unrelated to your changes so feel free to ignore them" [puppet] - https://gerrit.wikimedia.org/r/779936 (https://phabricator.wikimedia.org/T305589) (owner: Ssingh)
[13:57:44] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:57:46] <wikibugs>	 SRE, Data-Catalog, Data-Engineering, serviceops, and 2 others: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (JMeybohm) Please keep this open as it is absolutely in a hacky state currently (DNS + service::catalog wise)
[13:57:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:47] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:57:48] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:57:52] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:57:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:02:34] <wikibugs>	 (PS6) Majavah: P:openstack::encapi: add tls for write endpoint [puppet] - https://gerrit.wikimedia.org/r/785110 (https://phabricator.wikimedia.org/T274666)
[14:02:35] <tgr>	 apparently I don't know how to revert submodule commits
[14:02:36] <wikibugs>	 (PS2) Majavah: P:openstack::encapi: add keystone token verification [puppet] - https://gerrit.wikimedia.org/r/785134 (https://phabricator.wikimedia.org/T274666)
[14:03:18] <tgr>	 the bacc command just gives "fatal: bad object"
[14:03:18] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
[14:03:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:03:23] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
[14:03:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:03:47] <tgr>	 manually I end up with "nothing to commit, working tree clean"
[14:03:57] <tgr>	 urbanecm: are you still around by any chance?
[14:04:07] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
[14:04:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:04:11] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
[14:04:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:04:56] <wikibugs>	 (PS1) Klausman: Switch ML staging control plane to lvs_setup [puppet] - https://gerrit.wikimedia.org/r/786319 (https://phabricator.wikimedia.org/T302195)
[14:06:40] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
[14:06:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:06:45] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
[14:07:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:07:52] <wikibugs>	 (CR) Herron: [C: +1] profile: re-enable grafana db sync post 8.x upgrade (1 comment) [puppet] - https://gerrit.wikimedia.org/r/785927 (owner: Cwhite)
[14:07:54] <wikibugs>	 (PS2) Klausman: hiera: Switch ML staging control plane to lvs_setup [puppet] - https://gerrit.wikimedia.org/r/786319 (https://phabricator.wikimedia.org/T302195)
[14:08:50] <wikibugs>	 (CR) Majavah: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34960/console" [puppet] - https://gerrit.wikimedia.org/r/785134 (https://phabricator.wikimedia.org/T274666) (owner: Majavah)
[14:08:53] <wikibugs>	 (CR) Majavah: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34961/console" [puppet] - https://gerrit.wikimedia.org/r/785110 (https://phabricator.wikimedia.org/T274666) (owner: Majavah)
[14:09:39] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
[14:09:42] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
[14:09:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:09:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:10] <tgr>	 I guess I can just interactive-rebase away the commits that need to be reverted, in the short term
[14:11:58] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
[14:12:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:12:02] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
[14:12:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:12:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[14:13:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:01] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[14:13:02] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[14:13:06] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[14:13:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:14:39] <icinga-wm>	 RECOVERY - PHP7 jobrunner on mw1308 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.004 second response time https://wikitech.wikimedia.org/wiki/Jobrunner
[14:15:52] <tgr>	 !ops anyone around with a good command of git?
[14:16:08] <urbanecm>	 tgr: what's up
[14:16:18] <urbanecm>	 (!_ops pings chanops, but i guess this works too)
[14:16:46] <tgr>	 I tried to deploy two GrowthExperiments patches together, the canary broke, and I can't figure out how to revert it
[14:16:55] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1308 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.023 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[14:17:08] <urbanecm>	 tgr: so you want me to revert both patches you merged in the window?
[14:17:49] <tgr>	 I did a bunch of "git reset --hard @~" in the mediawiki dir, so that's on the last good commit now (having the two bad commits + a revert there would be ideal)
[14:18:12] <tgr>	 the submodule still contains those two commits
[14:18:17] <wikibugs>	 (CR) Jbond: [C: +1] "LGTM" [software/conftool] - https://gerrit.wikimedia.org/r/786312 (owner: Giuseppe Lavagetto)
[14:18:33] <tgr>	 urbanecm: yeah, for now please revert both
[14:18:36] <wikibugs>	 (CR) Jbond: [C: +1] "LGTM" [software/conftool] - https://gerrit.wikimedia.org/r/786313 (owner: Giuseppe Lavagetto)
[14:18:53] <urbanecm>	 tgr: fixed (git reset --hard in submodule itself)
[14:18:58] <urbanecm>	 syncing now
[14:19:13] <icinga-wm>	 RECOVERY - PHP7 jobrunner on mw1438 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.022 second response time https://wikitech.wikimedia.org/wiki/Jobrunner
[14:19:21] <tgr>	 ah. I thought git submodule update does that?
[14:19:54] <tgr>	 the scap that broke was a sync-world but I think a normal sync should be fine for restoring functionality
[14:20:06] <tgr>	 it was only needed due to i18n
[14:20:10] <urbanecm>	 ack ack
[14:20:40] <wikibugs>	 (PS1) Urbanecm: Revert "Add Link: Add 'excluded sections' task setting" [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/785937 (https://phabricator.wikimedia.org/T304150)
[14:20:46] <wikibugs>	 (CR) Urbanecm: [V: +2 C: +2] Revert "Add Link: Add 'excluded sections' task setting" [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/785937 (https://phabricator.wikimedia.org/T304150) (owner: Urbanecm)
[14:20:57] <wikibugs>	 (PS1) Klausman: Add service IP for ML staging k8s ctrl plane [dns] - https://gerrit.wikimedia.org/r/786320 (https://phabricator.wikimedia.org/T302195)
[14:21:01] <wikibugs>	 (PS1) Urbanecm: Revert "Backport video landing page changes" [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/785938 (https://phabricator.wikimedia.org/T303785)
[14:21:07] <wikibugs>	 (CR) Urbanecm: [V: +2 C: +2] Revert "Backport video landing page changes" [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/785938 (https://phabricator.wikimedia.org/T303785) (owner: Urbanecm)
[14:21:10] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized php-1.39.0-wmf.8/extensions/GrowthExperiments/: REVERT: Failed backports (duration: 01m 40s)
[14:21:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:00] <urbanecm>	 tgr: hopefully should be reverted (in prod, gerrit and staging dir)
[14:22:11] <wikibugs>	 (CR) Elukey: [C: +1] Add service IP for ML staging k8s ctrl plane [dns] - https://gerrit.wikimedia.org/r/786320 (https://phabricator.wikimedia.org/T302195) (owner: Klausman)
[14:22:12] <tgr>	 thanks urbanecm!
[14:22:18] <urbanecm>	 lemme sync a README now w/o --force to ensure canaries pass now
[14:22:55] <wikibugs>	 (CR) Klausman: [C: +2] Add service IP for ML staging k8s ctrl plane [dns] - https://gerrit.wikimedia.org/r/786320 (https://phabricator.wikimedia.org/T302195) (owner: Klausman)
[14:23:09] <tgr>	 the bacc revert instructions for extensions are incorrect, right? https://deploy-commands.toolforge.org/bacc/785950 I think it's missing a step to actually change the submodule code?
[14:23:24] <urbanecm>	 tgr: no problem. tbh, i usually revert extension backports from gerrit (and bypassing CI with V+2), as it's...easier (and less error-prone)
[14:23:37] <urbanecm>	 in theory git submodule update should do the trick, not 100% sure why it doesn't
[14:23:45] <tgr>	 duh. didn't even think of that.
[14:24:28] <urbanecm>	 okay, canaries pass according to scap. i guess we're done?
[14:24:39] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized README: no op (duration: 02m 11s)
[14:24:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:43] <logmsgbot>	 !log klausman@cumin1001 START - Cookbook sre.dns.netbox
[14:25:00] <wikibugs>	 (CR) Vgutierrez: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34962/console" [puppet] - https://gerrit.wikimedia.org/r/786319 (https://phabricator.wikimedia.org/T302195) (owner: Klausman)
[14:25:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:25:29] <wikibugs>	 (CR) Vgutierrez: [V: +1 C: +1] hiera: Switch ML staging control plane to lvs_setup [puppet] - https://gerrit.wikimedia.org/r/786319 (https://phabricator.wikimedia.org/T302195) (owner: Klausman)
[14:25:56] <tgr>	 there is a single big error spike, looks like the canary prevented the code from getting any real traffic. Thanks, I think we are good for now. I'll debug later, gotta catch the meeting.
[14:26:04] <urbanecm>	 see you there :)
[14:28:18] <logmsgbot>	 !log klausman@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:28:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:31:45] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:37:55] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[14:38:18] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[14:38:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:38:21] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[14:38:23] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[14:38:26] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[14:38:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:38:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:38:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:40:05] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1438 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.005 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[14:43:30] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[14:43:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:33] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[14:43:34] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[14:43:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:38] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[14:43:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:02] <vgutierrez>	 !log upload trafficserver 8.0.8-1wm6 to apt.wm.o (buster) - T304835
[14:44:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:23] <wikibugs>	 (PS1) JMeybohm: Update miscweb relates records for use with k8s ingress [dns] - https://gerrit.wikimedia.org/r/786322 (https://phabricator.wikimedia.org/T305358)
[14:49:07] <vgutierrez>	 !log upgrading trafficserver to 8.0.8-1wm6 on cp4026 - T304835
[14:49:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:52:18] <wikibugs>	 (PS1) JMeybohm: Remove miscweb discovery resources [puppet] - https://gerrit.wikimedia.org/r/786323 (https://phabricator.wikimedia.org/T305358)
[14:52:22] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host druid1007.eqiad.wmnet
[14:52:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:52:26] <wikibugs>	 (PS2) JMeybohm: Update miscweb relates records for use with k8s ingress [dns] - https://gerrit.wikimedia.org/r/786322 (https://phabricator.wikimedia.org/T305358)
[14:56:22] <vgutierrez>	 !log upgrading trafficserver to 8.0.8-1wm6 on cp4032 - T304835
[14:56:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:20] <wikibugs>	 (PS1) Cathal Mooney: CHANGELOG: add changelogs for release v0.4.1 [software/homer] - https://gerrit.wikimedia.org/r/786325
[15:01:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[15:02:08] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [software/homer] - https://gerrit.wikimedia.org/r/786325 (owner: Cathal Mooney)
[15:04:47] <wikibugs>	 (CR) Cathal Mooney: [C: +2] CHANGELOG: add changelogs for release v0.4.1 [software/homer] - https://gerrit.wikimedia.org/r/786325 (owner: Cathal Mooney)
[15:08:08] <wikibugs>	 (Merged) jenkins-bot: CHANGELOG: add changelogs for release v0.4.1 [software/homer] - https://gerrit.wikimedia.org/r/786325 (owner: Cathal Mooney)
[15:09:25] <wikibugs>	 (CR) Klausman: [C: +2] hiera: Switch ML staging control plane to lvs_setup [puppet] - https://gerrit.wikimedia.org/r/786319 (https://phabricator.wikimedia.org/T302195) (owner: Klausman)
[15:10:05] <wikibugs>	 (CR) Ssingh: [V: +1] dnsrecursor: refactor module (see detailed commit message) (1 comment) [puppet] - https://gerrit.wikimedia.org/r/779936 (https://phabricator.wikimedia.org/T305589) (owner: Ssingh)
[15:10:25] <wikibugs>	 (CR) Ladsgroup: [C: +2] Add fix_img_major_mime_null_T306560.py [software/schema-changes] - https://gerrit.wikimedia.org/r/784762 (https://phabricator.wikimedia.org/T306560) (owner: Ladsgroup)
[15:11:09] <wikibugs>	 (Merged) jenkins-bot: Add fix_img_major_mime_null_T306560.py [software/schema-changes] - https://gerrit.wikimedia.org/r/784762 (https://phabricator.wikimedia.org/T306560) (owner: Ladsgroup)
[15:11:48] <wikibugs>	 (CR) Ladsgroup: "it looks good to me and thank you so much for doing it but my knowledge of ruby is not good enough to properly review this." [puppet] - https://gerrit.wikimedia.org/r/786310 (https://phabricator.wikimedia.org/T67270) (owner: Jbond)
[15:12:48] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host druid1007.eqiad.wmnet
[15:12:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:14:24] <klausman>	 !log Restarting pybal on lvs2010 to pick up change 786319 (ML staging k8s service setup)
[15:14:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:18:27] <icinga-wm>	 PROBLEM - PyBal IPVS diff check on lvs2010 is CRITICAL: CRITICAL: Services known to PyBal but not to IPVS: set([10.2.1.72:6443]) https://wikitech.wikimedia.org/wiki/PyBal
[15:22:45] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[15:22:59] <wikibugs>	 (PS2) JMeybohm: Remove miscweb discovery resources [puppet] - https://gerrit.wikimedia.org/r/786323 (https://phabricator.wikimedia.org/T305358)
[15:24:32] <logmsgbot>	 !log klausman@puppetmaster1001 conftool action : set/pooled=yes,weight=10; selector: name=ml-staging-ctrl2001
[15:24:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:24:40] <logmsgbot>	 !log klausman@puppetmaster1001 conftool action : set/pooled=yes,weight=10; selector: name=ml-staging-ctrl2002
[15:24:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:25:56] <wikibugs>	 (PS1) Cathal Mooney: Release v0.4.1 [software/homer/deploy] - https://gerrit.wikimedia.org/r/786329
[15:26:26] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [software/homer/deploy] - https://gerrit.wikimedia.org/r/786329 (owner: Cathal Mooney)
[15:27:50] <wikibugs>	 (CR) Cathal Mooney: [C: +2] Release v0.4.1 [software/homer/deploy] - https://gerrit.wikimedia.org/r/786329 (owner: Cathal Mooney)
[15:27:54] <wikibugs>	 (CR) Cathal Mooney: [V: +2 C: +2] Release v0.4.1 [software/homer/deploy] - https://gerrit.wikimedia.org/r/786329 (owner: Cathal Mooney)
[15:27:58] <logmsgbot>	 !log klausman@puppetmaster1001 conftool action : set/pooled=yes:weight=10; selector: name=ml-staging-ctrl2001.codfw.wmnet
[15:28:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:28:02] <logmsgbot>	 !log klausman@puppetmaster1001 conftool action : set/pooled=yes:weight=10; selector: name=ml-staging-ctrl2002.codfw.wmnet
[15:28:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:29:24] <wikibugs>	 (CR) Cwhite: profile: re-enable grafana db sync post 8.x upgrade (1 comment) [puppet] - https://gerrit.wikimedia.org/r/785927 (owner: Cwhite)
[15:30:33] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[15:30:34] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[15:30:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:30:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:30:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1113:3315 (T298556)', diff saved to https://phabricator.wikimedia.org/P26557 and previous config saved to /var/cache/conftool/dbconfig/20220426-153039-ladsgroup.json
[15:30:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:30:44] <stashbot>	 T298556: Fix mismatching field type of oldimage.oi_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298556
[15:31:44] <wikibugs>	 (PS1) David Caro: wmcs.codfw1: use the correct memcached port for the exporter [puppet] - https://gerrit.wikimedia.org/r/786330
[15:31:55] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs2010 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[15:32:01] <vgutierrez>	 ^^ klausman 
[15:32:10] <klausman>	 excellent
[15:32:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T298556)', diff saved to https://phabricator.wikimedia.org/P26558 and previous config saved to /var/cache/conftool/dbconfig/20220426-153253-ladsgroup.json
[15:32:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:34:01] <wikibugs>	 (CR) David Caro: [V: +1] "PCC SUCCESS (NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34963/console" [puppet] - https://gerrit.wikimedia.org/r/786330 (owner: David Caro)
[15:34:10] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[15:34:11] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[15:34:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:34:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:34:32] <klausman>	 !log Restarting pybal on lvs2009 to pick up change 786319 (ML staging k8s service setup)
[15:34:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:34:43] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[15:34:44] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[15:34:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:34:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:34:49] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1169 (T306560)', diff saved to https://phabricator.wikimedia.org/P26559 and previous config saved to /var/cache/conftool/dbconfig/20220426-153449-ladsgroup.json
[15:34:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:34:55] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[15:35:48] <wikibugs>	 (CR) Ahmon Dancy: [C: +1] Fix permissions/ownership of helm directories [puppet] - https://gerrit.wikimedia.org/r/786269 (https://phabricator.wikimedia.org/T305729) (owner: JMeybohm)
[15:37:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T306560)', diff saved to https://phabricator.wikimedia.org/P26560 and previous config saved to /var/cache/conftool/dbconfig/20220426-153720-ladsgroup.json
[15:37:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:35] <logmsgbot>	 !log cmooney@cumin1001 START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.4.1 - cmooney@cumin1001
[15:40:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:42:13] <logmsgbot>	 !log cmooney@cumin1001 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.4.1 - cmooney@cumin1001
[15:42:13] <wikibugs>	 SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (MatthewVernon)
[15:42:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:42:26] <wikibugs>	 SRE-swift-storage, Patch-For-Review: swift wmf/rewrite.py middleware broken on bullseye (and its test suite doesn't work either) - https://phabricator.wikimedia.org/T305942 (MatthewVernon) Open→Resolved a:MatthewVernon I think this is now all working satisfactorily (ms-fe1012 is now pooled in...
[15:45:13] <wikibugs>	 (PS2) Klausman: labs: Add dummy token for istio-cni on ML staging k8s [labs/private] - https://gerrit.wikimedia.org/r/775823
[15:45:53] <wikibugs>	 (PS3) Klausman: labs: Add dummy token for istio-cni on ML staging k8s [labs/private] - https://gerrit.wikimedia.org/r/775823
[15:47:39] <wikibugs>	 (PS1) Ladsgroup: Set actor migration to read new for medium wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/786341 (https://phabricator.wikimedia.org/T275246)
[15:47:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P26561 and previous config saved to /var/cache/conftool/dbconfig/20220426-154758-ladsgroup.json
[15:48:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:48:50] <wikibugs>	 (PS1) BryanDavis: toolhub: Bump container version to 2022-04-21-215651-production [deployment-charts] - https://gerrit.wikimedia.org/r/786342 (https://phabricator.wikimedia.org/T279713)
[15:50:54] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[15:52:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P26562 and previous config saved to /var/cache/conftool/dbconfig/20220426-155226-ladsgroup.json
[15:52:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:55:23] <wikibugs>	 (CR) BryanDavis: [C: +2] toolhub: Bump container version to 2022-04-21-215651-production [deployment-charts] - https://gerrit.wikimedia.org/r/786342 (https://phabricator.wikimedia.org/T279713) (owner: BryanDavis)
[15:58:08] <logmsgbot>	 !log dancy@deploy1002 Started deploy [restbase/deploy@0205f1d] (dev-cluster): (no justification provided)
[15:58:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:59:59] <wikibugs>	 (Merged) jenkins-bot: toolhub: Bump container version to 2022-04-21-215651-production [deployment-charts] - https://gerrit.wikimedia.org/r/786342 (https://phabricator.wikimedia.org/T279713) (owner: BryanDavis)
[16:00:04] <jouncebot>	 jbond and rzl: (Dis)respected human, time to deploy Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220426T1600). Please do the needful.
[16:00:04] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[16:00:04] <jouncebot>	 bd808: May I have your attention please! Toolhub. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220426T1600)
[16:00:20] <bd808>	 o/
[16:00:51] <logmsgbot>	 !log dancy@deploy1002 Finished deploy [restbase/deploy@0205f1d] (dev-cluster): (no justification provided) (duration: 02m 43s)
[16:00:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:01:25] <logmsgbot>	 !log bd808@deploy1002 helmfile [staging] START helmfile.d/services/toolhub: apply
[16:01:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:03:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P26563 and previous config saved to /var/cache/conftool/dbconfig/20220426-160303-ladsgroup.json
[16:03:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:03:17] <logmsgbot>	 !log bd808@deploy1002 helmfile [staging] DONE helmfile.d/services/toolhub: apply
[16:03:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:03:33] <wikibugs>	 SRE, ops-codfw, DC-Ops, cloud-services-team (Kanban): Neutron networking not working for cloudnet200[5,6]-dev.codfw.wmnet - https://phabricator.wikimedia.org/T306861 (Papaul) a:Papaul→Andrew
[16:03:53] <icinga-wm>	 RECOVERY - puppet last run on ml-staging-ctrl2001 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[16:04:28] <logmsgbot>	 !log bd808@deploy1002 helmfile [codfw] START helmfile.d/services/toolhub: apply
[16:04:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:05:03] <wikibugs>	 (CR) David Caro: "Got a question, otherwise LGTM (if the answer is "it will not" or "yes, but we don't care" feel free to merge)." [puppet] - https://gerrit.wikimedia.org/r/786307 (owner: Andrew Bogott)
[16:06:16] <logmsgbot>	 !log bd808@deploy1002 helmfile [codfw] DONE helmfile.d/services/toolhub: apply
[16:06:25] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.dns.netbox
[16:06:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:06:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:07:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P26564 and previous config saved to /var/cache/conftool/dbconfig/20220426-160731-ladsgroup.json
[16:07:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:08:47] <icinga-wm>	 RECOVERY - Disk space on ml-staging-ctrl2001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=ml-staging-ctrl2001&var-datasource=codfw+prometheus/ops
[16:09:49] <logmsgbot>	 !log dancy@deploy1002 Started deploy [restbase/deploy@0205f1d] (dev-cluster): testing
[16:09:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:10:06] <logmsgbot>	 !log dancy@deploy1002 Finished deploy [restbase/deploy@0205f1d] (dev-cluster): testing (duration: 00m 17s)
[16:10:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:11:04] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:11:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:11:48] <logmsgbot>	 !log bd808@deploy1002 helmfile [eqiad] START helmfile.d/services/toolhub: apply
[16:11:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:12:37] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.dns.netbox
[16:12:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:13:27] <wikibugs>	 SRE, ops-codfw, DC-Ops, decommission-hardware, cloud-services-team (Kanban): Decom cloudcephmon200[2,3]-dev.codfw.wmnet - https://phabricator.wikimedia.org/T306840 (Papaul)
[16:13:41] <logmsgbot>	 !log bd808@deploy1002 helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
[16:13:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:14:41] <icinga-wm>	 RECOVERY - puppet last run on ml-staging-ctrl2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[16:16:02] <wikibugs>	 SRE, ops-codfw, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): Q3:(Need By: TBD) rack/setup/install 7 wmcs hosts - https://phabricator.wikimedia.org/T304881 (Papaul)
[16:16:11] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:16:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:16:38] <wikibugs>	 SRE, ops-codfw, DC-Ops, decommission-hardware, cloud-services-team (Kanban): Decom cloudcephmon200[2,3]-dev.codfw.wmnet - https://phabricator.wikimedia.org/T306840 (Papaul) Open→Resolved complete
[16:17:04] <wikibugs>	 SRE, ops-codfw, DC-Ops, Patch-For-Review, cloud-services-team (Kanban): Decom cloudservices200[2,3]-dev.wikimedia.org - https://phabricator.wikimedia.org/T306669 (Papaul)
[16:18:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T298556)', diff saved to https://phabricator.wikimedia.org/P26566 and previous config saved to /var/cache/conftool/dbconfig/20220426-161808-ladsgroup.json
[16:18:10] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[16:18:11] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[16:18:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:14] <stashbot>	 T298556: Fix mismatching field type of oldimage.oi_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298556
[16:18:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1144:3315 (T298556)', diff saved to https://phabricator.wikimedia.org/P26567 and previous config saved to /var/cache/conftool/dbconfig/20220426-161816-ladsgroup.json
[16:18:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:45] <wikibugs>	 SRE, ops-codfw, DC-Ops, Patch-For-Review, cloud-services-team (Kanban): Decom cloudservices200[2,3]-dev.wikimedia.org - https://phabricator.wikimedia.org/T306669 (Papaul) Open→Resolved complete
[16:18:52] <wikibugs>	 SRE, ops-codfw, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): Q3:(Need By: TBD) rack/setup/install 7 wmcs hosts - https://phabricator.wikimedia.org/T304881 (Papaul)
[16:19:51] <wikibugs>	 (CR) Herron: [C: +1] profile: re-enable grafana db sync post 8.x upgrade (1 comment) [puppet] - https://gerrit.wikimedia.org/r/785927 (owner: Cwhite)
[16:20:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T298556)', diff saved to https://phabricator.wikimedia.org/P26568 and previous config saved to /var/cache/conftool/dbconfig/20220426-162029-ladsgroup.json
[16:20:32] <wikibugs>	 (CR) Jelto: [C: +1] conftool-date: add mw2412 through mw2419 as new appservers [puppet] - https://gerrit.wikimedia.org/r/785918 (https://phabricator.wikimedia.org/T290192) (owner: Dzahn)
[16:20:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:20:45] <jinxer-wm>	 (JobUnavailable) resolved: (4) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:22:08] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.dns.netbox
[16:22:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T306560)', diff saved to https://phabricator.wikimedia.org/P26569 and previous config saved to /var/cache/conftool/dbconfig/20220426-162236-ladsgroup.json
[16:22:38] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
[16:22:39] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
[16:22:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:41] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[16:22:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1163 (T306560)', diff saved to https://phabricator.wikimedia.org/P26570 and previous config saved to /var/cache/conftool/dbconfig/20220426-162244-ladsgroup.json
[16:22:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:56] <bd808>	 !log Toolhub upgrade to 18d94d and post-deploy data migrations complete
[16:22:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:23:38] <wikibugs>	 (CR) Dzahn: [C: +2] conftool-date: add mw2412 through mw2419 as new appservers [puppet] - https://gerrit.wikimedia.org/r/785918 (https://phabricator.wikimedia.org/T290192) (owner: Dzahn)
[16:24:28] <wikibugs>	 (CR) David Caro: "LGTM, one question though." [puppet] - https://gerrit.wikimedia.org/r/785110 (https://phabricator.wikimedia.org/T274666) (owner: Majavah)
[16:25:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1163 (T306560)', diff saved to https://phabricator.wikimedia.org/P26571 and previous config saved to /var/cache/conftool/dbconfig/20220426-162517-ladsgroup.json
[16:25:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:25:35] <icinga-wm>	 RECOVERY - Disk space on ml-staging-ctrl2002 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=ml-staging-ctrl2002&var-datasource=codfw+prometheus/ops
[16:26:35] <wikibugs>	 SRE, ops-codfw, decommission-hardware, cloud-services-team (Kanban): decommission cloudweb2001-dev.wikimedia.org - https://phabricator.wikimedia.org/T306843 (Papaul)
[16:27:09] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:27:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:20] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1021.eqiad.wmnet with reason: Upgrade to bullseye
[16:28:22] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1021.eqiad.wmnet with reason: Upgrade to bullseye
[16:28:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:24] <Amir1>	 jouncebot: nowandnext
[16:28:24] <jouncebot>	 For the next 0 hour(s) and 31 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220426T1600)
[16:28:25] <jouncebot>	 For the next 0 hour(s) and 31 minute(s): Toolhub (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220426T1600)
[16:28:25] <jouncebot>	 In 1 hour(s) and 31 minute(s): MediaWiki train - Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220426T1800)
[16:28:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:51] <wikibugs>	 (CR) Ladsgroup: [C: +2] Set actor migration to read new for medium wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/786341 (https://phabricator.wikimedia.org/T275246) (owner: Ladsgroup)
[16:29:14] <mutante>	 Amir1: adding a few new appservers in codfw in conftool right now. this means "add to scap groups"
[16:29:20] <mutante>	 but not starting yet, right
[16:29:28] <Amir1>	 sure, I'll be quick
[16:29:35] <wikibugs>	 SRE, ops-codfw, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): Q3:(Need By: TBD) rack/setup/install 7 wmcs hosts - https://phabricator.wikimedia.org/T304881 (Papaul)
[16:29:39] <wikibugs>	 (Merged) jenkins-bot: Set actor migration to read new for medium wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/786341 (https://phabricator.wikimedia.org/T275246) (owner: Ladsgroup)
[16:29:45] <wikibugs>	 SRE, ops-codfw, decommission-hardware, cloud-services-team (Kanban): decommission cloudweb2001-dev.wikimedia.org - https://phabricator.wikimedia.org/T306843 (Papaul) Open→Resolved complete
[16:29:47] <mutante>	 ah, ok
[16:30:10] <mutante>	 I need to merge but let me set them to "inactive" asap
[16:30:11] <logmsgbot>	 !log klausman@cumin2002 START - Cookbook sre.hosts.reboot-single for host ml-staging-ctrl2002.codfw.wmnet
[16:30:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:30:21] <mutante>	 inactive = not in scap
[16:32:09] <logmsgbot>	 !log ladsgroup@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:786341|Set actor migration to read new for medium wikis (T275246)]] (duration: 02m 01s)
[16:32:11] <brennen>	 Amir1, mutante: any reason to wait on starting train prep?
[16:32:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:32:16] <stashbot>	 T275246: Populate rev_actor and rev_comment_id - https://phabricator.wikimedia.org/T275246
[16:32:21] <Amir1>	 I am done 
[16:33:16] <mutante>	 brennen: no, the only thing that could happen right now is that during the actual sync you get timeouts from some mw2*. I am waiting for conftool-data to sync
[16:33:45] <icinga-wm>	 RECOVERY - Check systemd state on ml-staging-ctrl2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:35:06] <brennen>	 mutante: ack, thx.
[16:35:19] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[16:35:22] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[16:35:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:35:23] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[16:35:26] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[16:35:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:35:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:35:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P26572 and previous config saved to /var/cache/conftool/dbconfig/20220426-163535-ladsgroup.json
[16:35:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:35:36] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.reimage for host clouddb1021.eqiad.wmnet with OS bullseye
[16:35:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:35:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:36:20] <wikibugs>	 (PS1) Ebernhardson: Add wbsearchentities profiles for testing [mediawiki-config] - https://gerrit.wikimedia.org/r/786347 (https://phabricator.wikimedia.org/T306644)
[16:36:45] <logmsgbot>	 !log klausman@cumin2002 END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-staging-ctrl2002.codfw.wmnet
[16:36:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:37:28] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add wbsearchentities profiles for testing [mediawiki-config] - https://gerrit.wikimedia.org/r/786347 (https://phabricator.wikimedia.org/T306644) (owner: Ebernhardson)
[16:40:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P26573 and previous config saved to /var/cache/conftool/dbconfig/20220426-164022-ladsgroup.json
[16:40:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:41:35] <wikibugs>	 (PS2) Ebernhardson: Add wbsearchentities profiles for testing [mediawiki-config] - https://gerrit.wikimedia.org/r/786347 (https://phabricator.wikimedia.org/T306644)
[16:43:03] <wikibugs>	 (PS1) Brennen Bearnes: testwikis wikis to 1.39.0-wmf.9  refs T305215 [mediawiki-config] - https://gerrit.wikimedia.org/r/786349
[16:43:05] <wikibugs>	 (CR) Brennen Bearnes: [C: +2] testwikis wikis to 1.39.0-wmf.9  refs T305215 [mediawiki-config] - https://gerrit.wikimedia.org/r/786349 (owner: Brennen Bearnes)
[16:43:44] <wikibugs>	 (Merged) jenkins-bot: testwikis wikis to 1.39.0-wmf.9  refs T305215 [mediawiki-config] - https://gerrit.wikimedia.org/r/786349 (owner: Brennen Bearnes)
[16:44:41] <logmsgbot>	 !log brennen@deploy1002 Started scap: testwikis wikis to 1.39.0-wmf.9  refs T305215
[16:44:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:44:47] <stashbot>	 T305215: 1.39.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T305215
[16:46:24] <logmsgbot>	 !log brennen@deploy1002 deploy-promote aborted:  (duration: 03m 22s)
[16:46:24] <logmsgbot>	 !log brennen@deploy1002 stage-train aborted:  (duration: 06m 04s)
[16:46:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:46:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:47:09] <brennen>	 !log forgot SCAP=scap environment variable, re-running testwiki sync
[16:47:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:48:20] <logmsgbot>	 !log brennen@deploy1002 Started scap: testwikis wikis to 1.39.0-wmf.9  refs T305215
[16:48:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:50:39] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[16:50:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P26574 and previous config saved to /var/cache/conftool/dbconfig/20220426-165040-ladsgroup.json
[16:50:42] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[16:50:43] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[16:50:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:50:47] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[16:50:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:50:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:50:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:50:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:51:17] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1021.eqiad.wmnet with reason: host reimage
[16:51:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:01] <icinga-wm>	 PROBLEM - Improperly owned -0:0- files in /srv/mediawiki-staging on deploy2002 is CRITICAL: Improperly owned (0:0) files in /srv/mediawiki-staging https://wikitech.wikimedia.org/wiki/Monitoring/bad_directory_owner
[16:53:27] <icinga-wm>	 PROBLEM - SSH on druid1006.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:53:58] <mutante>	 brennen: ^ root-owned files in staging.. wondering if we need to fix that
[16:54:13] <mutante>	 but that is deploy2002
[16:54:34] <mutante>	 maybe sync related
[16:54:59] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1021.eqiad.wmnet with reason: host reimage
[16:55:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:55:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P26575 and previous config saved to /var/cache/conftool/dbconfig/20220426-165526-ladsgroup.json
[16:55:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:56:53] <brennen>	 yeah, sync-related says dancy, should be self-correcting.
[16:57:02] <mutante>	 yep, sounds like it. ACK
[16:57:05] <mutante>	 ty
[17:00:02] <mutante>	 brennen: I don't see the conftool-data synced on config-master, BUT the hosts are in conftool and are "inactive" by default, which means not in scap/"dsh" groups, so for a deployer like you nothing should happen at all.
[17:00:38] <wikibugs>	 (PS1) Ahmon Dancy: Merge remote-tracking branch 'origin/master' into train-dev [mediawiki-config] (train-dev) - https://gerrit.wikimedia.org/r/786351
[17:01:28] <brennen>	 mutante: ack, thanks.
[17:03:14] <icinga-wm>	 RECOVERY - Improperly owned -0:0- files in /srv/mediawiki-staging on deploy2002 is OK: Files ownership is ok. https://wikitech.wikimedia.org/wiki/Monitoring/bad_directory_owner
[17:03:31] <wikibugs>	 (CR) Ahmon Dancy: [C: +2] Merge remote-tracking branch 'origin/master' into train-dev [mediawiki-config] (train-dev) - https://gerrit.wikimedia.org/r/786351 (owner: Ahmon Dancy)
[17:04:16] <wikibugs>	 (Merged) jenkins-bot: Merge remote-tracking branch 'origin/master' into train-dev [mediawiki-config] (train-dev) - https://gerrit.wikimedia.org/r/786351 (owner: Ahmon Dancy)
[17:04:26] <icinga-wm>	 PROBLEM - SSH on wtp1045.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:05:06] <wikibugs>	 (PS3) Ebernhardson: Add wbsearchentities profiles for testing [mediawiki-config] - https://gerrit.wikimedia.org/r/786347 (https://phabricator.wikimedia.org/T306644)
[17:05:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T298556)', diff saved to https://phabricator.wikimedia.org/P26576 and previous config saved to /var/cache/conftool/dbconfig/20220426-170545-ladsgroup.json
[17:05:47] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1110.eqiad.wmnet with reason: Maintenance
[17:05:48] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1110.eqiad.wmnet with reason: Maintenance
[17:05:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:05:52] <stashbot>	 T298556: Fix mismatching field type of oldimage.oi_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298556
[17:05:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1110 (T298556)', diff saved to https://phabricator.wikimedia.org/P26577 and previous config saved to /var/cache/conftool/dbconfig/20220426-170553-ladsgroup.json
[17:05:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:06:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:06:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:06:15] <brennen>	 did just get some timeouts for codfw hosts.
[17:08:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110 (T298556)', diff saved to https://phabricator.wikimedia.org/P26578 and previous config saved to /var/cache/conftool/dbconfig/20220426-170807-ladsgroup.json
[17:08:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:08:26] <mutante>	 brennen: narff.. were they all starting with mw24*
[17:08:46] <mutante>	 2412 - 2419, right
[17:08:58] <mutante>	 it should not happen though
[17:09:17] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1021.eqiad.wmnet with OS bullseye
[17:09:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:09:32] <brennen>	 mutante: mw2258, mw2366, mw2253, mw2309, parse2012.codfw.wmnet, ...
[17:10:00] <brennen>	 list continues for a bit - you can see it in deploy1002:~brennen/1.39.0-wmf.9.log
[17:10:03] <mutante>	 brennen: oh.. that is NOT what I was doing.. 
[17:10:10] <brennen>	 hrm
[17:10:20] <brennen>	 32 failures total on sync-apaches
[17:10:25] <mutante>	 we have had reboots of codfw hosts though
[17:10:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1163 (T306560)', diff saved to https://phabricator.wikimedia.org/P26579 and previous config saved to /var/cache/conftool/dbconfig/20220426-171032-ladsgroup.json
[17:10:33] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[17:10:35] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[17:10:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:10:38] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[17:10:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:10:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:11:06] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
[17:11:07] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
[17:11:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:11:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:11:17] <brennen>	 all codfw hosts i think, but no obvious pattern to it.
[17:11:28] <brennen>	 oh wait - hrm: mw1362.eqiad.wmnet
[17:11:32] <mutante>	 arg, so.. all of them have been rebooted
[17:11:38] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[17:11:40] <mutante>	 but that's not related to what I was talking about earlier
[17:11:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[17:11:41] <mutante>	 afaict
[17:11:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:11:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1105:3311 (T306560)', diff saved to https://phabricator.wikimedia.org/P26580 and previous config saved to /var/cache/conftool/dbconfig/20220426-171144-ladsgroup.json
[17:11:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:11:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:12:28] <wikibugs>	 (CR) Ebernhardson: "can compare to previous ab test configured in I63e011610" [mediawiki-config] - https://gerrit.wikimedia.org/r/786347 (https://phabricator.wikimedia.org/T306644) (owner: Ebernhardson)
[17:13:24] <mutante>	 looking at mw1362
[17:13:37] <brennen>	 mutante: https://phabricator.wikimedia.org/P26581
[17:13:38] <mutante>	 !log mw1362 - scap pull
[17:13:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:13:48] <mutante>	 17:13:39 Started scap-cdb-rebuild
[17:14:01] <mutante>	 brennen: it's pooled.. and it's online and it can pull... hrmm
[17:14:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T306560)', diff saved to https://phabricator.wikimedia.org/P26582 and previous config saved to /var/cache/conftool/dbconfig/20220426-171418-ladsgroup.json
[17:14:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:14:26] <brennen>	 yeah, this is weird - bunch more eqiad hosts than i thought as well.
[17:14:44] <mutante>	 testing one of the wtp hosts
[17:14:57] <mutante>	 so far everything looks normal and working 
[17:15:16] <mutante>	 could we.. hmm.. just repeat it?
[17:15:43] <brennen>	 once the sync finishes i don't think there's any reason i couldn't run sync-world again.
[17:15:44] <mutante>	 !log wtp1046 - scap pull
[17:15:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:16:51] <brennen>	 should be fast.  i'll plan on that.
[17:16:56] <mutante>	 also this random parsoid machine is in the conftool-data and pooled and can pull
[17:17:23] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Buster 10.12 point update - https://phabricator.wikimedia.org/T304546 (MoritzMuehlenhoff)
[17:21:10] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[17:21:13] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[17:21:14] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[17:21:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:18] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[17:21:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:22:57] <logmsgbot>	 !log brennen@deploy1002 Finished scap: testwikis wikis to 1.39.0-wmf.9  refs T305215 (duration: 34m 37s)
[17:23:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:23:07] <stashbot>	 T305215: 1.39.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T305215
[17:23:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P26583 and previous config saved to /var/cache/conftool/dbconfig/20220426-172312-ladsgroup.json
[17:23:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:23:31] <mutante>	 !log mw2309 - scap pull
[17:23:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:26:20] <logmsgbot>	 !log brennen@deploy1002 Started scap: Re-running sync-world to see if timeouts recur for 32 hosts (T305215)
[17:26:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:28:04] <logmsgbot>	 !log brennen@deploy1002 Finished scap: Re-running sync-world to see if timeouts recur for 32 hosts (T305215) (duration: 01m 43s)
[17:28:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:28:10] <stashbot>	 T305215: 1.39.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T305215
[17:28:18] <brennen>	 mutante, dancy: that one ran cleanly
[17:28:29] <dancy>	 👍🏾 
[17:28:51] <wikibugs>	 (CR) Muehlenhoff: "That sounds great, but let's hold merging the patch until Legal has given the whole approach their blessing." [puppet] - https://gerrit.wikimedia.org/r/786310 (https://phabricator.wikimedia.org/T67270) (owner: Jbond)
[17:29:02] <mutante>	 brennen: uff, glad to hear that
[17:29:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P26584 and previous config saved to /var/cache/conftool/dbconfig/20220426-172923-ladsgroup.json
[17:29:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:29:42] <brennen>	 maybe just network weather, but i haven't generally encountered random timeouts like that in the past.
[17:30:36] <logmsgbot>	 !log brennen@deploy1002 Pruned MediaWiki: 1.39.0-wmf.7 (duration: 01m 29s)
[17:30:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:31:59] <mutante>	 the only thing is that I had just merged that conftool-data change
[17:32:08] <mutante>	 but it makes no sense that this group of hosts was affected
[17:32:12] <wikibugs>	 (CR) Majavah: [V: +1] P:openstack::encapi: add tls for write endpoint (2 comments) [puppet] - https://gerrit.wikimedia.org/r/785110 (https://phabricator.wikimedia.org/T274666) (owner: Majavah)
[17:33:43] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Don't prompt for loading additional firmware in d-i [puppet] - https://gerrit.wikimedia.org/r/784259 (https://phabricator.wikimedia.org/T306148) (owner: Muehlenhoff)
[17:36:34] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[17:36:37] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[17:36:38] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[17:36:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:36:42] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[17:36:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:36:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:36:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:38:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P26585 and previous config saved to /var/cache/conftool/dbconfig/20220426-173817-ladsgroup.json
[17:38:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:38:37] <wikibugs>	 SRE, ops-codfw: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul)
[17:41:12] <wikibugs>	 SRE, ops-codfw: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul)
[17:43:11] <wikibugs>	 (PS1) Muehlenhoff: sre.ganeti.addnode: Fix bridge detection [cookbooks] - https://gerrit.wikimedia.org/r/786356
[17:44:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P26586 and previous config saved to /var/cache/conftool/dbconfig/20220426-174428-ladsgroup.json
[17:44:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:44:41] <wikibugs>	 (CR) JHathaway: [C: +2] icinga: remove SMART check [puppet] - https://gerrit.wikimedia.org/r/785921 (https://phabricator.wikimedia.org/T294564) (owner: JHathaway)
[17:47:28] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/786356 (owner: Muehlenhoff)
[17:53:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110 (T298556)', diff saved to https://phabricator.wikimedia.org/P26587 and previous config saved to /var/cache/conftool/dbconfig/20220426-175322-ladsgroup.json
[17:53:23] <icinga-wm>	 RECOVERY - SSH on druid1006.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:53:24] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[17:53:25] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[17:53:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:29] <stashbot>	 T298556: Fix mismatching field type of oldimage.oi_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298556
[17:53:29] <wikibugs>	 (PS1) Jdlrobson: Enable table of contents a/b test on euwiki and hewiki, enable reading depth [mediawiki-config] - https://gerrit.wikimedia.org/r/786357 (https://phabricator.wikimedia.org/T306606)
[17:53:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:38] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2123.codfw.wmnet with reason: Maintenance
[17:53:39] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2123.codfw.wmnet with reason: Maintenance
[17:53:40] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 20:00:00 on 8 hosts with reason: Maintenance
[17:53:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:46] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 8 hosts with reason: Maintenance
[17:53:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:07] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[17:54:09] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[17:54:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:18] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1130.eqiad.wmnet with reason: Maintenance
[17:54:20] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1130.eqiad.wmnet with reason: Maintenance
[17:54:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1130 (T298556)', diff saved to https://phabricator.wikimedia.org/P26588 and previous config saved to /var/cache/conftool/dbconfig/20220426-175424-ladsgroup.json
[17:54:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:43] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (Papaul) @Eevans I received those nodes today so I will be racking them tomorrow. Here is my racking proposal for tomorrow. |Row| Rack| nodes| |A|A6|aqs2001,aqs2002,...
[17:55:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1130 (T298556)', diff saved to https://phabricator.wikimedia.org/P26589 and previous config saved to /var/cache/conftool/dbconfig/20220426-175536-ladsgroup.json
[17:55:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:55:46] <wikibugs>	 (CR) Gergő Tisza: "Caused:" [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/785950 (https://phabricator.wikimedia.org/T303785) (owner: Gergő Tisza)
[17:57:35] <wikibugs>	 SRE, Infrastructure-Foundations, Prod-Kubernetes, netops: Agree strategy for Kubernetes BGP peering to top-of-rack switches - https://phabricator.wikimedia.org/T306649 (cmooney) The above patch is working, however I'm not 100% the resulting config is what we need.  Looking, for instance, at ml-se...
[17:58:18] <wikibugs>	 (PS1) Gergő Tisza: Re-apply "Backport video landing page changes" [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/785941
[17:59:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T306560)', diff saved to https://phabricator.wikimedia.org/P26590 and previous config saved to /var/cache/conftool/dbconfig/20220426-175933-ladsgroup.json
[17:59:35] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
[17:59:37] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
[17:59:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:59:40] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[17:59:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1132 (T306560)', diff saved to https://phabricator.wikimedia.org/P26591 and previous config saved to /var/cache/conftool/dbconfig/20220426-175941-ladsgroup.json
[17:59:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:59:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:59:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:59:53] <wikibugs>	 (PS1) Jdlrobson: Expand max-width to login, create account, disable on Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/786358 (https://phabricator.wikimedia.org/T300182)
[18:00:05] <jouncebot>	 brennen and jeena: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for MediaWiki train - Utc-7 Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220426T1800).
[18:00:43] <brennen>	 o/ - going to group0 shortly.
[18:02:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1132 (T306560)', diff saved to https://phabricator.wikimedia.org/P26592 and previous config saved to /var/cache/conftool/dbconfig/20220426-180214-ladsgroup.json
[18:02:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:02:21] <wikibugs>	 (PS1) Jdlrobson: [ToC] Increase threshold for ToC collapsing to 1000px [skins/Vector] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/785942 (https://phabricator.wikimedia.org/T306904)
[18:03:19] <wikibugs>	 (PS1) Brennen Bearnes: group0 wikis to 1.39.0-wmf.9  refs T305215 [mediawiki-config] - https://gerrit.wikimedia.org/r/786359
[18:03:21] <wikibugs>	 (CR) Brennen Bearnes: [C: +2] group0 wikis to 1.39.0-wmf.9  refs T305215 [mediawiki-config] - https://gerrit.wikimedia.org/r/786359 (owner: Brennen Bearnes)
[18:03:36] <wikibugs>	 (PS1) Jdlrobson: [ToC] Increase threshold for ToC collapsing to 1000px [skins/Vector] (wmf/1.39.0-wmf.9) - https://gerrit.wikimedia.org/r/785943 (https://phabricator.wikimedia.org/T306904)
[18:04:45] <wikibugs>	 (Merged) jenkins-bot: group0 wikis to 1.39.0-wmf.9  refs T305215 [mediawiki-config] - https://gerrit.wikimedia.org/r/786359 (owner: Brennen Bearnes)
[18:06:02] <logmsgbot>	 !log brennen@deploy1002 rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.9  refs T305215
[18:06:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:06:08] <stashbot>	 T305215: 1.39.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T305215
[18:07:04] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[18:07:07] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[18:07:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:07:08] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[18:07:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:07:12] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[18:07:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:07:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:09:51] <wikibugs>	 (CR) Gergő Tisza: "The canary error was caused by Idf35f67fb298914dad7c80a2ad135909fd344860. This patch looks safe to re-apply." [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/785926 (https://phabricator.wikimedia.org/T304150) (owner: Gergő Tisza)
[18:11:32] <wikibugs>	 (PS1) Gergő Tisza: Re-apply "Add Link: Add 'excluded sections' task setting" [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/785944
[18:13:32] <wikibugs>	 (PS2) Gergő Tisza: Re-apply "Add Link: Add 'excluded sections' task setting" [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/785944
[18:14:04] <wikibugs>	 (PS2) Gergő Tisza: Re-apply "Backport video landing page changes" [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/785941
[18:17:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P26593 and previous config saved to /var/cache/conftool/dbconfig/20220426-181719-ladsgroup.json
[18:17:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:21:35] <wikibugs>	 SRE, ops-codfw, DBA, DC-Ops: Q3:(Need By: TBD) rack/setup/install db2153.codfw.wmnet - db2174.codfw.wmnet - https://phabricator.wikimedia.org/T306927 (RobH)
[18:21:56] <wikibugs>	 SRE, ops-codfw, DBA, DC-Ops: Q3:(Need By: TBD) rack/setup/install db2153.codfw.wmnet - db2174.codfw.wmnet - https://phabricator.wikimedia.org/T306927 (RobH)
[18:25:20] <wikibugs>	 (PS1) Cathal Mooney: Correct wmf-netbox plugin failure with patch panel front ports [software/homer/deploy] - https://gerrit.wikimedia.org/r/786361
[18:25:56] <wikibugs>	 SRE, ops-eqiad, DBA, DC-Ops: Q3:(Need By: TBD) rack/setup/install db1185.eqiad.wmnet - db1195.eqiad.wmnet - https://phabricator.wikimedia.org/T306928 (RobH)
[18:26:18] <wikibugs>	 SRE, ops-eqiad, DBA, DC-Ops: Q3:(Need By: TBD) rack/setup/install db1185.eqiad.wmnet - db1195.eqiad.wmnet - https://phabricator.wikimedia.org/T306928 (RobH)
[18:27:20] <wikibugs>	 (CR) Volans: [C: +1] "LGTM, optional nit inline" [software/homer/deploy] - https://gerrit.wikimedia.org/r/786361 (owner: Cathal Mooney)
[18:31:57] <wikibugs>	 (PS2) Cathal Mooney: Correct wmf-netbox plugin failure with patch panel front ports [software/homer/deploy] - https://gerrit.wikimedia.org/r/786361
[18:32:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P26594 and previous config saved to /var/cache/conftool/dbconfig/20220426-183224-ladsgroup.json
[18:32:26] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [software/homer/deploy] - https://gerrit.wikimedia.org/r/786361 (owner: Cathal Mooney)
[18:32:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:32:45] <wikibugs>	 (CR) Cathal Mooney: [C: +2] Correct wmf-netbox plugin failure with patch panel front ports [software/homer/deploy] - https://gerrit.wikimedia.org/r/786361 (owner: Cathal Mooney)
[18:32:48] <wikibugs>	 (PS1) RobH: updating sku list [software] - https://gerrit.wikimedia.org/r/786363
[18:32:50] <wikibugs>	 (CR) Cathal Mooney: [V: +2 C: +2] Correct wmf-netbox plugin failure with patch panel front ports [software/homer/deploy] - https://gerrit.wikimedia.org/r/786361 (owner: Cathal Mooney)
[18:33:28] <wikibugs>	 (CR) RobH: [C: +2] updating sku list [software] - https://gerrit.wikimedia.org/r/786363 (owner: RobH)
[18:34:40] <wikibugs>	 (CR) Cathal Mooney: [V: +2 C: +2] "Thanks volans." [software/homer/deploy] - https://gerrit.wikimedia.org/r/786361 (owner: Cathal Mooney)
[18:35:07] <wikibugs>	 (PS1) Jbond: C:monitoring: Add define for creating http checks [puppet] - https://gerrit.wikimedia.org/r/786365
[18:37:38] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[18:37:41] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[18:37:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:37:43] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[18:37:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:37:46] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[18:37:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:37:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:37:55] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[18:38:35] <wikibugs>	 (CR) Jbond: "early review to discuss path forward" [puppet] - https://gerrit.wikimedia.org/r/786365 (owner: Jbond)
[18:39:21] <wikibugs>	 (PS3) Gergő Tisza: Re-apply "Backport video landing page changes" [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/785941
[18:39:24] <wikibugs>	 (PS1) Gergő Tisza: Enable SkinAddFooterLinks hook [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/786366
[18:40:27] <wikibugs>	 (CR) Jbond: [C: -1] "-1 awaiting legal sign of" [puppet] - https://gerrit.wikimedia.org/r/786310 (https://phabricator.wikimedia.org/T67270) (owner: Jbond)
[18:40:34] <wikibugs>	 (PS1) Ebernhardson: cirrus: Turn on retry_on_conflict quirk [mediawiki-config] - https://gerrit.wikimedia.org/r/786367
[18:40:52] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[18:40:54] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[18:40:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:40:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1096:3315 (T298556)', diff saved to https://phabricator.wikimedia.org/P26595 and previous config saved to /var/cache/conftool/dbconfig/20220426-184058-ladsgroup.json
[18:41:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:41:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:41:07] <stashbot>	 T298556: Fix mismatching field type of oldimage.oi_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298556
[18:42:19] <wikibugs>	 (CR) Umherirrender: [C: +1] "[Cannot help on deploying this, normal +2 is not enough on this repo, needs to be listed on https://wikitech.wikimedia.org/wiki/Deployment" [mediawiki-config] - https://gerrit.wikimedia.org/r/740304 (owner: Thiemo Kreuz (WMDE))
[18:43:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T298556)', diff saved to https://phabricator.wikimedia.org/P26596 and previous config saved to /var/cache/conftool/dbconfig/20220426-184313-ladsgroup.json
[18:43:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:47:16] <wikibugs>	 (PS1) Cathal Mooney: Release v0.4.1 [software/homer/deploy] - https://gerrit.wikimedia.org/r/786369
[18:47:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1132 (T306560)', diff saved to https://phabricator.wikimedia.org/P26597 and previous config saved to /var/cache/conftool/dbconfig/20220426-184729-ladsgroup.json
[18:47:31] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
[18:47:33] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
[18:47:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:47:36] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[18:47:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:47:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:47:56] <wikibugs>	 (CR) Cathal Mooney: [C: +2] Release v0.4.1 [software/homer/deploy] - https://gerrit.wikimedia.org/r/786369 (owner: Cathal Mooney)
[18:48:00] <wikibugs>	 (CR) Cathal Mooney: [V: +2 C: +2] Release v0.4.1 [software/homer/deploy] - https://gerrit.wikimedia.org/r/786369 (owner: Cathal Mooney)
[18:48:04] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
[18:48:05] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
[18:48:07] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[18:48:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:48:10] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[18:48:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:48:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1106 (T306560)', diff saved to https://phabricator.wikimedia.org/P26598 and previous config saved to /var/cache/conftool/dbconfig/20220426-184815-ladsgroup.json
[18:48:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:48:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:48:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:49:18] <logmsgbot>	 !log cmooney@cumin1001 START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.4.1a - cmooney@cumin1001
[18:49:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:50:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106 (T306560)', diff saved to https://phabricator.wikimedia.org/P26599 and previous config saved to /var/cache/conftool/dbconfig/20220426-185047-ladsgroup.json
[18:50:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:50:54] <logmsgbot>	 !log cmooney@cumin1001 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.4.1a - cmooney@cumin1001
[18:50:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:52:00] <wikibugs>	 (CR) Umherirrender: "[Cannot help on deploying this, normal +2 is not enough on this repo, needs to be listed on https://wikitech.wikimedia.org/wiki/Deployment" [mediawiki-config] - https://gerrit.wikimedia.org/r/737859 (owner: Thiemo Kreuz (WMDE))
[18:53:03] <wikibugs>	 (CR) Raymond Ndibe: Create REST api service to manage toolforge replica.my.cnf (33 comments) [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[18:57:11] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[18:58:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P26601 and previous config saved to /var/cache/conftool/dbconfig/20220426-185818-ladsgroup.json
[18:58:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:01:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[19:02:45] <aqu>	 !log About to deploy analytics/refinery: Weekly deployment train + Artifacts to 0.1.27
[19:02:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:05:52] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P26602 and previous config saved to /var/cache/conftool/dbconfig/20220426-190552-ladsgroup.json
[19:05:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:06:14] <logmsgbot>	 !log aqu@deploy1002 Started deploy [analytics/refinery@96a3934]: Regular analytics weekly train [analytics/refinery@96a3934]
[19:06:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:07:02] <wikibugs>	 (PS1) Gergő Tisza: [beta] Restore eswiki Growth campaigns test config [mediawiki-config] - https://gerrit.wikimedia.org/r/786375 (https://phabricator.wikimedia.org/T306833)
[19:13:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P26603 and previous config saved to /var/cache/conftool/dbconfig/20220426-191323-ladsgroup.json
[19:13:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:20:07] <wikibugs>	 (PS1) Bking: elastic: Add wmf-elasticsearch-search-plugins package for bullseye [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/786376 (https://phabricator.wikimedia.org/T306911)
[19:20:57] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P26604 and previous config saved to /var/cache/conftool/dbconfig/20220426-192057-ladsgroup.json
[19:21:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:22:45] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[19:24:18] <wikibugs>	 (CR) Muehlenhoff: [C: +2] sre.ganeti.addnode: Fix bridge detection [cookbooks] - https://gerrit.wikimedia.org/r/786356 (owner: Muehlenhoff)
[19:27:42] <wikibugs>	 SRE, ops-eqiad, DC-Ops, fundraising-tech-ops: (Need By: TBD) rack/setup/install frdb1005, frdev1003 - https://phabricator.wikimedia.org/T306935 (RobH)
[19:27:59] <wikibugs>	 SRE, ops-eqiad, DC-Ops, fundraising-tech-ops: (Need By: TBD) rack/setup/install frdb1005, frdev1003 - https://phabricator.wikimedia.org/T306935 (RobH)
[19:28:28] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T298556)', diff saved to https://phabricator.wikimedia.org/P26605 and previous config saved to /var/cache/conftool/dbconfig/20220426-192828-ladsgroup.json
[19:28:30] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[19:28:32] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[19:28:33] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[19:28:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:28:35] <stashbot>	 T298556: Fix mismatching field type of oldimage.oi_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298556
[19:28:37] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[19:28:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:28:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1161 (T298556)', diff saved to https://phabricator.wikimedia.org/P26606 and previous config saved to /var/cache/conftool/dbconfig/20220426-192841-ladsgroup.json
[19:28:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:28:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:28:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:28:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:30:49] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [analytics/refinery@96a3934]: Regular analytics weekly train [analytics/refinery@96a3934] (duration: 24m 35s)
[19:30:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:30:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298556)', diff saved to https://phabricator.wikimedia.org/P26607 and previous config saved to /var/cache/conftool/dbconfig/20220426-193055-ladsgroup.json
[19:31:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:34:11] <logmsgbot>	 !log aqu@deploy1002 Started deploy [analytics/refinery@96a3934] (thin): Regular analytics weekly train THIN [analytics/refinery@96a3934]
[19:34:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:34:18] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [analytics/refinery@96a3934] (thin): Regular analytics weekly train THIN [analytics/refinery@96a3934] (duration: 00m 07s)
[19:34:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:34:55] <logmsgbot>	 !log aqu@deploy1002 Started deploy [analytics/refinery@96a3934] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@96a3934]
[19:34:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:36:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106 (T306560)', diff saved to https://phabricator.wikimedia.org/P26608 and previous config saved to /var/cache/conftool/dbconfig/20220426-193602-ladsgroup.json
[19:36:04] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
[19:36:05] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
[19:36:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:36:08] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[19:36:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1164 (T306560)', diff saved to https://phabricator.wikimedia.org/P26609 and previous config saved to /var/cache/conftool/dbconfig/20220426-193610-ladsgroup.json
[19:36:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:36:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:36:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:38:05] <icinga-wm>	 PROBLEM - SSH on pki2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:38:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164 (T306560)', diff saved to https://phabricator.wikimedia.org/P26610 and previous config saved to /var/cache/conftool/dbconfig/20220426-193844-ladsgroup.json
[19:38:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:42:13] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [analytics/refinery@96a3934] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@96a3934] (duration: 07m 19s)
[19:42:17] <wikibugs>	 (PS2) Jbond: C:monitoring: Add define for creating http checks [puppet] - https://gerrit.wikimedia.org/r/786365
[19:42:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:43:07] <dancy>	 jouncebot now
[19:43:08] <jouncebot>	 For the next 0 hour(s) and 16 minute(s): MediaWiki train - Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220426T1800)
[19:45:32] <logmsgbot>	 !log dzahn@cumin2002 conftool action : set/pooled=no; selector: dc=codfw,name=mw2419.codfw.wmnet
[19:45:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:46:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P26611 and previous config saved to /var/cache/conftool/dbconfig/20220426-194600-ladsgroup.json
[19:46:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:46:30] <logmsgbot>	 !log dzahn@cumin2002 conftool action : set/weight=25; selector: dc=codfw,name=mw2419.codfw.wmnet
[19:46:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:48:18] <mutante>	 !log mw2419 - set weight to 25 in conftool, scap pull, first time in production, jobrunner/videoscaler T290192
[19:48:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:48:24] <stashbot>	 T290192: Q1:(Need By: TBD) rack/setup/install mw241[2-9].codfw.wmnet - https://phabricator.wikimedia.org/T290192
[19:50:54] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[19:53:49] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P26612 and previous config saved to /var/cache/conftool/dbconfig/20220426-195349-ladsgroup.json
[19:53:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:54:38] <logmsgbot>	 !log dzahn@cumin2002 conftool action : set/pooled=yes; selector: dc=codfw,name=mw2419.codfw.wmnet
[19:54:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:58:51] <wikibugs>	 SRE: Migrate role::bastionhost::general and role::bastionhost::pop to Buster - https://phabricator.wikimedia.org/T253779 (Dzahn) This looks resolved to me:   ` [cumin2002:~] $ sudo cumin 'bast*' 'lsb_release -c' 8 hosts will be targeted: bast[1003,2002,3004-3005,4003,5001-5002,6001].wikimedia.org Ok to proce...
[19:59:50] <wikibugs>	 SRE, Epic: Migrate all of production metal and VMs to Buster or later - https://phabricator.wikimedia.org/T247045 (Dzahn)
[20:00:05] <jouncebot>	 RoanKattouw, Urbanecm, and cjming: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220426T2000).
[20:00:05] <jouncebot>	 jdrewniak, ebernhardson, and tgr: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:13] <wikibugs>	 SRE: Migrate role::bastionhost::general and role::bastionhost::pop to Buster - https://phabricator.wikimedia.org/T253779 (Dzahn) Open→Resolved boldly setting to resolved, correct me if I'm wrong @Muehlenhoff
[20:00:24] <urbanecm>	 hey
[20:00:26] <urbanecm>	 i can deploy today
[20:00:35] <ebernhardson>	 \o
[20:00:46] <urbanecm>	 ebernhardson: unless you (or others) wish to self-service? :)
[20:01:05] <urbanecm>	 jan_drewniak: hi, around? :)
[20:01:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P26614 and previous config saved to /var/cache/conftool/dbconfig/20220426-200105-ladsgroup.json
[20:01:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:01:40] <wikibugs>	 SRE, Epic: Migrate all of production metal and VMs to Buster or later - https://phabricator.wikimedia.org/T247045 (Dzahn)
[20:01:43] <wikibugs>	 SRE, Discovery-Search (Current work): Upgrade Cirrus Elasticsearch clusters to Debian Bullseye - https://phabricator.wikimedia.org/T289135 (Dzahn)
[20:02:12] <ebernhardson>	 urbanecm: shrug, you can ship if you want :)
[20:02:20] <wikibugs>	 SRE, Epic: Migrate all of production metal and VMs to Buster or later - https://phabricator.wikimedia.org/T247045 (Dzahn)
[20:02:23] <wikibugs>	 SRE, Discovery-Search (Current work): Upgrade Cirrus Elasticsearch clusters to Debian Bullseye - https://phabricator.wikimedia.org/T289135 (Dzahn)
[20:02:34] <wikibugs>	 (CR) Urbanecm: [C: +2] cirrus: Turn on retry_on_conflict quirk [mediawiki-config] - https://gerrit.wikimedia.org/r/786367 (owner: Ebernhardson)
[20:02:52] <ebernhardson>	 the quirks one isn't really testable, it only takes effect on the job runners. Can monitor logstash to see if it fixes the thing it's supposed to
[20:03:14] <urbanecm>	 ebernhardson: okay, good to know. the calendar links only one patch (but twice). can you fix the second link please?
[20:03:21] <wikibugs>	 (Merged) jenkins-bot: cirrus: Turn on retry_on_conflict quirk [mediawiki-config] - https://gerrit.wikimedia.org/r/786367 (owner: Ebernhardson)
[20:03:26] <ebernhardson>	 the AB test one should be safe as well but might as well test on mwdebug host, it's not turning on the test so there should be no visible change
[20:03:28] <ebernhardson>	 urbanecm: sure, sec
[20:04:12] <ebernhardson>	 urbanecm: nope it refers to two different patches, they are exactly 20 patches apart (47 vs 67)
[20:04:28] <urbanecm>	 hmm, must've opened it twice myself then. sorry!
[20:04:34] <wikibugs>	 (PS4) Urbanecm: Add wbsearchentities profiles for testing [mediawiki-config] - https://gerrit.wikimedia.org/r/786347 (https://phabricator.wikimedia.org/T306644) (owner: Ebernhardson)
[20:05:04] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 080b8fc573d9d682038e09a7a7ad875bce478c00: cirrus: Turn on retry_on_conflict quirk (duration: 00m 53s)
[20:05:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:05:11] <urbanecm>	 first patch is live
[20:05:22] <wikibugs>	 (CR) Urbanecm: [C: +2] Add wbsearchentities profiles for testing [mediawiki-config] - https://gerrit.wikimedia.org/r/786347 (https://phabricator.wikimedia.org/T306644) (owner: Ebernhardson)
[20:05:51] <wikibugs>	 (PS2) Cwhite: profile: re-enable grafana db sync post 8.x upgrade [puppet] - https://gerrit.wikimedia.org/r/785927
[20:06:08] <wikibugs>	 (Merged) jenkins-bot: Add wbsearchentities profiles for testing [mediawiki-config] - https://gerrit.wikimedia.org/r/786347 (https://phabricator.wikimedia.org/T306644) (owner: Ebernhardson)
[20:06:18] <ebernhardson>	 cool, dropped 200 deprecation warnings/s, makes things a little quieter :)
[20:06:24] <ebernhardson>	 err, /min
[20:07:04] <urbanecm>	 sounds like a good thing to have :)
[20:07:24] <urbanecm>	 ebernhardson: second patch is at mwdebug1001, can you test please?
[20:07:39] <wikibugs>	 (PS2) Urbanecm: [beta] Restore eswiki Growth campaigns test config [mediawiki-config] - https://gerrit.wikimedia.org/r/786375 (https://phabricator.wikimedia.org/T306833) (owner: Gergő Tisza)
[20:07:42] <wikibugs>	 (CR) Urbanecm: [C: +2] [beta] Restore eswiki Growth campaigns test config [mediawiki-config] - https://gerrit.wikimedia.org/r/786375 (https://phabricator.wikimedia.org/T306833) (owner: Gergő Tisza)
[20:07:59] <ebernhardson>	 urbanecm: checking
[20:08:23] <wikibugs>	 (Merged) jenkins-bot: [beta] Restore eswiki Growth campaigns test config [mediawiki-config] - https://gerrit.wikimedia.org/r/786375 (https://phabricator.wikimedia.org/T306833) (owner: Gergő Tisza)
[20:08:48] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:08:51] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:08:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:08:52] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:08:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P26615 and previous config saved to /var/cache/conftool/dbconfig/20220426-200854-ladsgroup.json
[20:08:56] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:08:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:09:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:09:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:09:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:09:55] <ebernhardson>	 urbanecm: looks to work as expected, good to go
[20:10:02] <urbanecm>	 syncing
[20:10:06] <wikibugs>	 (CR) Cwhite: [C: +2] profile: re-enable grafana db sync post 8.x upgrade (1 comment) [puppet] - https://gerrit.wikimedia.org/r/785927 (owner: Cwhite)
[20:10:57] <urbanecm>	 jan_drewniak: hello, around?
[20:11:11] <jan_drewniak>	 urbanecm: hey, sorry I'm late!
[20:11:18] <urbanecm>	 no problem
[20:11:26] <wikibugs>	 (CR) Urbanecm: [C: +2] [ToC] Increase threshold for ToC collapsing to 1000px [skins/Vector] (wmf/1.39.0-wmf.9) - https://gerrit.wikimedia.org/r/785943 (https://phabricator.wikimedia.org/T306904) (owner: Jdlrobson)
[20:11:27] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/: 9805e61f7006edf45199a3e22494945bffaaeb4d: Add wbsearchentities profiles for testing (T306644) (duration: 00m 53s)
[20:11:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:11:33] <stashbot>	 T306644: re-run wbsearchentities optimization process  - https://phabricator.wikimedia.org/T306644
[20:11:34] <wikibugs>	 (CR) Urbanecm: [C: +2] [ToC] Increase threshold for ToC collapsing to 1000px [skins/Vector] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/785942 (https://phabricator.wikimedia.org/T306904) (owner: Jdlrobson)
[20:11:50] <urbanecm>	 ebernhardson: it's live. anything else from you?
[20:12:19] <wikibugs>	 SRE, WMF-JobQueue, serviceops, Sustainability (Incident Followup): Videoscalers fail health checks while CPU is maxed - https://phabricator.wikimedia.org/T306860 (jhathaway) Another option would be to use cpu pinning via taskset(1), where ffmpeg is assigned to cpus 1-N and cpu 0 is left free to s...
[20:13:59] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:14:02] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:14:03] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:14:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:14:07] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:14:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:14:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:14:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:15:55] <wikibugs>	 (CR) Cwhite: [C: +2] logstash: populate target index format and add pipeline diagnostics [puppet] - https://gerrit.wikimedia.org/r/775375 (https://phabricator.wikimedia.org/T305090) (owner: Cwhite)
[20:16:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298556)', diff saved to https://phabricator.wikimedia.org/P26616 and previous config saved to /var/cache/conftool/dbconfig/20220426-201610-ladsgroup.json
[20:16:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:16:17] <stashbot>	 T298556: Fix mismatching field type of oldimage.oi_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298556
[20:16:43] <urbanecm>	 jan_drewniak: just noticed you've a config too. does it depend on the backport?
[20:17:15] <jan_drewniak>	 urbanecm: nope, two different things
[20:17:21] <urbanecm>	 great
[20:17:44] <urbanecm>	 jan_drewniak: and it's marked as depending on (unscheduled) https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/786357/1. do you want to do both?
[20:17:46] <urbanecm>	 or just the scheduled one?
[20:18:57] <jan_drewniak>	 urbanecm: both :)
[20:19:10] <urbanecm>	 okay
[20:19:13] <wikibugs>	 (PS2) Urbanecm: Enable table of contents a/b test on euwiki and hewiki, enable reading depth [mediawiki-config] - https://gerrit.wikimedia.org/r/786357 (https://phabricator.wikimedia.org/T306606) (owner: Jdlrobson)
[20:19:24] <wikibugs>	 (CR) Urbanecm: [C: +2] Enable table of contents a/b test on euwiki and hewiki, enable reading depth [mediawiki-config] - https://gerrit.wikimedia.org/r/786357 (https://phabricator.wikimedia.org/T306606) (owner: Jdlrobson)
[20:19:53] <wikibugs>	 (PS2) Urbanecm: Expand max-width to login, create account, disable on Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/786358 (https://phabricator.wikimedia.org/T300182) (owner: Jdlrobson)
[20:20:10] <wikibugs>	 (Merged) jenkins-bot: Enable table of contents a/b test on euwiki and hewiki, enable reading depth [mediawiki-config] - https://gerrit.wikimedia.org/r/786357 (https://phabricator.wikimedia.org/T306606) (owner: Jdlrobson)
[20:21:46] <urbanecm>	 jan_drewniak: first one pulled to mwdebug1001. can you test?
[20:22:59] <jan_drewniak>	 urbanecm: ok that one's good
[20:23:03] <urbanecm>	 syncing
[20:23:14] <wikibugs>	 (CR) Urbanecm: [C: +2] Expand max-width to login, create account, disable on Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/786358 (https://phabricator.wikimedia.org/T300182) (owner: Jdlrobson)
[20:24:00] <wikibugs>	 (Merged) jenkins-bot: Expand max-width to login, create account, disable on Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/786358 (https://phabricator.wikimedia.org/T300182) (owner: Jdlrobson)
[20:24:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164 (T306560)', diff saved to https://phabricator.wikimedia.org/P26617 and previous config saved to /var/cache/conftool/dbconfig/20220426-202359-ladsgroup.json
[20:24:01] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[20:24:03] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[20:24:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:07] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[20:24:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1184 (T306560)', diff saved to https://phabricator.wikimedia.org/P26618 and previous config saved to /var/cache/conftool/dbconfig/20220426-202407-ladsgroup.json
[20:24:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:12] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:24:14] <wikibugs>	 (PS1) Ebernhardson: Correct wbsearchentities profiles [mediawiki-config] - https://gerrit.wikimedia.org/r/786381
[20:24:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:16] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:24:17] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:24:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:21] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:24:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:26] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: e3ce97b97c1d83dc4f538040da92a571895cb4d0: Enable table of contents a/b test on euwiki and hewiki, enable reading depth (T306606) (duration: 00m 52s)
[20:24:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:41] <urbanecm>	 jan_drewniak: second patch is at mwdebug1001, please check
[20:24:41] <stashbot>	 T306606: Deploy ToC A/B test to euwiki, hewiki - https://phabricator.wikimedia.org/T306606
[20:26:01] <ebernhardson>	 turns out my AB test profiles patch doesn't work entirely (that's why we deploy it with the test turned off :) and it needs a config fix: https://gerrit.wikimedia.org/r/786381
[20:26:19] <jan_drewniak>	 urbanecm: second patch is good too
[20:26:23] <wikibugs>	 (PS2) Urbanecm: Correct wbsearchentities profiles [mediawiki-config] - https://gerrit.wikimedia.org/r/786381 (owner: Ebernhardson)
[20:26:27] <wikibugs>	 (CR) Urbanecm: [C: +2] Correct wbsearchentities profiles [mediawiki-config] - https://gerrit.wikimedia.org/r/786381 (owner: Ebernhardson)
[20:26:29] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw2419 is OK: OK https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[20:26:35] <urbanecm>	 ebernhardson: okay, let's see :)
[20:26:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T306560)', diff saved to https://phabricator.wikimedia.org/P26619 and previous config saved to /var/cache/conftool/dbconfig/20220426-202641-ladsgroup.json
[20:26:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:27:12] <wikibugs>	 (Merged) jenkins-bot: Correct wbsearchentities profiles [mediawiki-config] - https://gerrit.wikimedia.org/r/786381 (owner: Ebernhardson)
[20:27:51] <logmsgbot>	 !log aqu@deploy1002 Started deploy [airflow-dags/analytics@e177d87]: Bump jar dependency to 0.1.27 in mediarequest/hourly [airflow-dags/analytics@e177d87]
[20:27:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:28:00] <wikibugs>	 (Merged) jenkins-bot: [ToC] Increase threshold for ToC collapsing to 1000px [skins/Vector] (wmf/1.39.0-wmf.9) - https://gerrit.wikimedia.org/r/785943 (https://phabricator.wikimedia.org/T306904) (owner: Jdlrobson)
[20:28:06] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: fe0e119ef7c768373db4afed21537f85004a8ae2: Expand max-width to login, create account, disable on Wikidata (T300182, T306834; 1/2) (duration: 00m 56s)
[20:28:08] <wikibugs>	 (Merged) jenkins-bot: [ToC] Increase threshold for ToC collapsing to 1000px [skins/Vector] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/785942 (https://phabricator.wikimedia.org/T306904) (owner: Jdlrobson)
[20:28:09] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [airflow-dags/analytics@e177d87]: Bump jar dependency to 0.1.27 in mediarequest/hourly [airflow-dags/analytics@e177d87] (duration: 00m 17s)
[20:28:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:28:13] <stashbot>	 T300182: Wikidata.org responsive behaviour conflicts with Vector Max width - https://phabricator.wikimedia.org/T300182
[20:28:13] <stashbot>	 T306834: Add max-width to Log-in & Create account pages - https://phabricator.wikimedia.org/T306834
[20:28:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:28:27] <wikibugs>	 (PS1) BryanDavis: wikireplicas: Improve log message for skipped views [puppet] - https://gerrit.wikimedia.org/r/786382
[20:29:01] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/CommonSettings.php: fe0e119ef7c768373db4afed21537f85004a8ae2: Expand max-width to login, create account, disable on Wikidata (T300182, T306834; 2/2) (duration: 00m 54s)
[20:29:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:29:12] <urbanecm>	 jan_drewniak: and live
[20:29:23] <urbanecm>	 ebernhardson: your fix is at mwdebug1001
[20:29:28] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:29:31] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:29:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:29:32] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:29:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:29:36] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:29:39] <jan_drewniak>	 urbanecm: perfect, thanks!
[20:29:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:29:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:29:44] <urbanecm>	 np
[20:30:26] <wikibugs>	 (CR) BryanDavis: "Code untested. Likely most easily testable via manual editing on single live wiki replica server. I don't know of any equivalent testing e" [puppet] - https://gerrit.wikimedia.org/r/786382 (owner: BryanDavis)
[20:30:50] <urbanecm>	 jan_drewniak: backports are at mwdebug1001. can you test?
[20:33:40] <jan_drewniak>	 urbanecm: alright we can go ahead with it
[20:33:45] <mutante>	 !log mw2412, mw2413, mw2414, mw2415 - scap pull, get into production the first time
[20:33:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:34:26] <urbanecm>	 jan_drewniak: thanks, syncing
[20:34:31] <urbanecm>	 mutante: just fyi i'm deploying atm
[20:34:39] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:34:42] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:34:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:34:43] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:34:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:34:47] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:34:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:34:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:35:13] <mutante>	 urbanecm: ok, thanks. I will just have to repeat it but since it's rsync.. will be quicker next time
[20:35:57] <urbanecm>	 sure, just wanted to make sure you're aware :)
[20:36:23] <urbanecm>	 ebernhardson: how is the fix testing going?
[20:36:39] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized php-1.39.0-wmf.8/skins/Vector/resources/skins.vector.styles/: 31ed884d6eda998f8625a88be0f4aa5fd67aef4b: [ToC] Increase threshold for ToC collapsing to 1000px (T306904) (duration: 00m 50s)
[20:36:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:36:45] <stashbot>	 T306904: [ToC] Increase threshold for ToC collapsing to 1000px  - https://phabricator.wikimedia.org/T306904
[20:37:13] <mutante>	 urbanecm: I will wait before adding them to "dsh" so you should not see issues
[20:37:23] <urbanecm>	 ok
[20:37:30] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized php-1.39.0-wmf.9/skins/Vector/resources/skins.vector.styles/: 019a812176bb940383ddeb22f8a74b5d0f447bf1: [ToC] Increase threshold for ToC collapsing to 1000px (T306904) (duration: 00m 50s)
[20:37:32] <urbanecm>	 jan_drewniak: and live
[20:37:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:37:35] <urbanecm>	 anything else?
[20:38:17] <logmsgbot>	 !log dzahn@cumin2002 conftool action : set/weight=30; selector: dc=codfw,name=mw2412.codfw.wmnet
[20:38:20] <jan_drewniak>	 urbanecm: that's all for today, thanks again!
[20:38:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:38:27] <urbanecm>	 np
[20:39:40] <logmsgbot>	 !log dzahn@cumin2002 conftool action : set/weight=30; selector: dc=codfw,name=mw2413.codfw.wmnet
[20:39:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:39:45] <logmsgbot>	 !log dzahn@cumin2002 conftool action : set/weight=30; selector: dc=codfw,name=mw2414.codfw.wmnet
[20:39:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:39:49] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:39:52] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:39:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:39:53] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:39:54] <wikibugs>	 (PS1) Jbond: C:monitoring::check::http: move config to config ini file [puppet] - https://gerrit.wikimedia.org/r/786384
[20:39:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:39:57] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:40:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:40:02] <logmsgbot>	 !log dzahn@cumin2002 conftool action : set/weight=30; selector: dc=codfw,name=mw2415.codfw.wmnet
[20:40:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:40:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:40:34] <mutante>	 urbanecm: that's it?
[20:40:35] <icinga-wm>	 PROBLEM - BGP status on cr3-eqsin is CRITICAL: BGP CRITICAL - AS6939/IPv6: Active - HE, AS6939/IPv4: Connect - HE https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[20:40:45] <wikibugs>	 (CR) jerkins-bot: [V: -1] C:monitoring::check::http: move config to config ini file [puppet] - https://gerrit.wikimedia.org/r/786384 (owner: Jbond)
[20:40:54] <urbanecm>	 mutante: I'm waiting on ebernhardson's testing of a fix atm
[20:40:57] <icinga-wm>	 PROBLEM - BGP status on cr2-eqsin is CRITICAL: BGP CRITICAL - AS6939/IPv6: Connect - HE, AS6939/IPv4: Connect - HE https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[20:41:06] <mutante>	 ah, alright
[20:41:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P26620 and previous config saved to /var/cache/conftool/dbconfig/20220426-204146-ladsgroup.json
[20:41:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:41:50] <wikibugs>	 (CR) Jbond: C:monitoring: Add define for creating http checks (1 comment) [puppet] - https://gerrit.wikimedia.org/r/786365 (owner: Jbond)
[20:43:40] <urbanecm>	 ebernhardson: how is it going please?
[20:44:49] <wikibugs>	 10SRE, 10serviceops, 10Patch-For-Review: eventgate chart should use common_templates - https://phabricator.wikimedia.org/T303543 (10Aklapper) @Ottomata: A #good_first_task is a self-contained, non-controversial task with a clear approach. It should be well-described with pointers to help a completely new con...
[20:45:12] <RhinosF1>	 mutante: fyi, I believe tgr is going after per -releng
[20:45:13] <wikibugs>	 (03PS2) 10Jbond: C:monitoring::check::http: move config to config ini file [puppet] - 10https://gerrit.wikimedia.org/r/786384
[20:45:59] <icinga-wm>	 PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:46:05] <urbanecm>	 and yes, a window's scheduled right after this one
[20:46:38] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Re-apply "Add Link: Add 'excluded sections' task setting" [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - 10https://gerrit.wikimedia.org/r/785944 (owner: 10Gergő Tisza)
[20:46:44] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Re-apply "Backport video landing page changes" [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - 10https://gerrit.wikimedia.org/r/785941 (owner: 10Gergő Tisza)
[20:47:25] <wikibugs>	 (03PS1) 10Urbanecm: Revert "Correct wbsearchentities profiles" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/786406
[20:47:31] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Revert "Correct wbsearchentities profiles" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/786406 (owner: 10Urbanecm)
[20:47:38] <urbanecm>	 ebernhardson: reverting the fix.
[20:47:40] <ebernhardson>	 urbanecm: doh, sorry i got distracted on another task
[20:47:47] <urbanecm>	 or perhaps not
[20:47:55] <ebernhardson>	 urbanecm: the expected request works now (https://www.wikidata.org/w/api.php?action=wbsearchentities&search=e&format=json&errorformat=plaintext&language=en&uselang=en&type=item&cirrusWBProfile=wikibase_config_prefix_query-202203-en&cirrusRescoreProfile=wikibase_config_entity_weight-202203-en)
[20:47:59] <urbanecm>	 okay, great
[20:48:01] <urbanecm>	 so, syncing
[20:48:15] <wikibugs>	 (03Abandoned) 10Urbanecm: Revert "Correct wbsearchentities profiles" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/786406 (owner: 10Urbanecm)
[20:49:41] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/SearchSettingsForWikidata.php: f76bc806157a3f4c88d44cd467de347b4b471f4e: Correct wbsearchentities profiles (duration: 00m 57s)
[20:49:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:49:47] <urbanecm>	 ebernhardson: and, live
[20:49:50] <urbanecm>	 i believe that's all?
[20:50:13] <logmsgbot>	 !log aqu@deploy1002 Started deploy [airflow-dags/analytics@e177d87]: Bump jar dependency to 0.1.27 in mediarequest/hourly [airflow-dags/analytics@e177d87]
[20:50:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:18] <Jdlrobson>	 urbanecm: looks like I might need to partially revert that last change by Jan
[20:50:21] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [airflow-dags/analytics@e177d87]: Bump jar dependency to 0.1.27 in mediarequest/hourly [airflow-dags/analytics@e177d87] (duration: 00m 07s)
[20:50:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:28] <urbanecm>	 Jdlrobson: what does that mean please?
[20:51:08] <Jdlrobson>	 wmgVectorMaxWidthOptionsNamespaces is not working
[20:51:14] <Jdlrobson>	 it's applying to pages it shouldn't be
[20:51:35] <urbanecm>	 Jdlrobson: so i should revert the config patch, right?
[20:52:46] <wikibugs>	 (03PS1) 10Jdlrobson: wmgVectorMaxWidthOptionsNamespaces not working [mediawiki-config] - 10https://gerrit.wikimedia.org/r/786388
[20:52:50] <Jdlrobson>	 ^ urbanecm 
[20:52:59] <icinga-wm>	 PROBLEM - Disk space on grafana2001 is CRITICAL: DISK CRITICAL - free space: / 156 MB (1% inode=87%): /tmp 156 MB (1% inode=87%): /var/tmp 156 MB (1% inode=87%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=grafana2001&var-datasource=codfw+prometheus/ops
[20:53:00] <urbanecm>	 dunno what "applying to pages it shouldn't be" means exactly, it applies correctly at the very least (on those few wikis) https://www.irccloud.com/pastebin/wbZmfNsD/
[20:53:17] <Jdlrobson>	 Mm on https://en.wikipedia.org/wiki/Special:Contributions/Jdlrobson I'm seeing the max width
[20:53:19] <ebernhardson>	 urbanecm: thanks!
[20:53:26] <ebernhardson>	 sorry bout the delay
[20:53:37] <urbanecm>	 it happens :)
[20:53:38] <Jdlrobson>	 urbanecm: is that for all wikis?
[20:53:51] <Jdlrobson>	 what is wikidatawiki?
[20:53:56] <urbanecm>	 wikidata.org
[20:54:12] <urbanecm>	 Jdlrobson: my paste? just randomly picked some wikipedias (and wikidatawiki) to spot-check the config applies
[20:54:14] <Jdlrobson>	 I'm seeing no max width on https://www.wikidata.org/wiki/Q1
[20:54:21] <Jdlrobson>	 so something's not right here
[20:54:38] <urbanecm>	 k, reverting
[20:54:44] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] wmgVectorMaxWidthOptionsNamespaces not working [mediawiki-config] - 10https://gerrit.wikimedia.org/r/786388 (owner: 10Jdlrobson)
[20:54:44] <Jdlrobson>	 So in short: 1) The content on https://www.wikidata.org/wiki/Q1 should not be limited
[20:54:53] <Jdlrobson>	 2) Or on https://en.wikipedia.org/wiki/Special:Contributions/Jdlrobson
[20:55:08] <Jdlrobson>	 oohhh
[20:55:15] <Jdlrobson>	 it should be $wgVectorMaxWidthOptions['exclude']['namespaces']
[20:55:20] <Jdlrobson>	 that's what's going on here
[20:55:21] <urbanecm>	 oh
[20:55:27] <urbanecm>	 removed the +2
[20:55:33] <urbanecm>	 Jdlrobson: wanna upload a followup?
[20:56:07] <wikibugs>	 (03PS2) 10Jdlrobson: wmgVectorMaxWidthOptionsNamespaces not working [mediawiki-config] - 10https://gerrit.wikimedia.org/r/786388
[20:56:11] <Jdlrobson>	 I amended that patch
[20:56:25] <Jdlrobson>	 yep that will do it
[20:56:52] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P26621 and previous config saved to /var/cache/conftool/dbconfig/20220426-205651-ladsgroup.json
[20:56:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:57:05] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] wmgVectorMaxWidthOptionsNamespaces not working [mediawiki-config] - 10https://gerrit.wikimedia.org/r/786388 (owner: 10Jdlrobson)
[20:57:07] <urbanecm>	 let's hoper
[20:57:09] <urbanecm>	 *hope
[20:57:09] <wikibugs>	 (03PS3) 10Jbond: C:monitoring::check::http: move config to config ini file [puppet] - 10https://gerrit.wikimedia.org/r/786384
[20:57:47] <wikibugs>	 (03Merged) 10jenkins-bot: wmgVectorMaxWidthOptionsNamespaces not working [mediawiki-config] - 10https://gerrit.wikimedia.org/r/786388 (owner: 10Jdlrobson)
[20:58:19] <Jdlrobson>	 urbanecm: sorry about that
[20:58:26] <urbanecm>	 Jdlrobson: pulled to mwdebug1001. can you have a look?
[20:58:34] <urbanecm>	 (and no problem at all, happens from time to time)
[20:58:51] <mutante>	 RhinosF1: ACK, thx. It can wait :)
[20:59:05] <Jdlrobson>	 urbanecm: testing now
[20:59:41] <Jdlrobson>	 urbanecm: that's working now
[20:59:46] <urbanecm>	 excellent
[20:59:47] <urbanecm>	 syncing
[21:00:05] <jouncebot>	 tgr: That opportune time is upon us again. Time for a Retry of UTC afternoon backport window deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220426T2100).
[21:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[21:00:07] <mutante>	 !log mw2416, mw2417, mw2418 - scap pull
[21:00:10] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[21:00:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:00:13] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[21:00:14] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[21:00:15] <urbanecm>	 tgr: please wait for a while
[21:00:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:00:17] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[21:00:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:00:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:00:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:01:08] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/CommonSettings.php: cab00628da0ba6226ff162cfc848bea35a35783a: fix wmgVectorMaxWidthOptionsNamespaces (T300182) (duration: 01m 00s)
[21:01:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:01:14] <urbanecm>	 Jdlrobson: and, live
[21:01:14] <stashbot>	 T300182: Wikidata.org responsive behaviour conflicts with Vector Max width - https://phabricator.wikimedia.org/T300182
[21:01:22] <urbanecm>	 tgr: floor is yours now
[21:01:29] <urbanecm>	 let me know if i can help in any way
[21:01:43] <tgr>	 thanks urbanecm 
[21:01:53] <Jdlrobson>	 thanks urbanecm  :)
[21:01:55] <urbanecm>	 np
[21:02:08] <tgr>	 mutante: do you want to do the scaps first? I'll take a while
[21:02:16] <logmsgbot>	 !log dzahn@cumin2002 conftool action : set/weight=30; selector: dc=codfw,name=mw2416.codfw.wmnet
[21:02:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:02:32] <wikibugs>	 10SRE, 10ops-eqiad, 10Cassandra, 10DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs1016-aqs1021 - https://phabricator.wikimedia.org/T305570 (10Jclark-ctr) Hostname. Rack.  U  Cableid Port aqs1016    a3        u21   1877                                   port23 aqs1017    b5       u38    23000056...
[21:02:50] <mutante>	 tgr: nah, you can ignore my !log line for now. they are not in the scap groups yet. I am just preparing them. thanks for offering
[21:03:22] <logmsgbot>	 !log dzahn@cumin2002 conftool action : set/weight=30; selector: dc=codfw,name=mw2417.codfw.wmnet
[21:03:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:03:28] <logmsgbot>	 !log dzahn@cumin2002 conftool action : set/weight=30; selector: dc=codfw,name=mw2418.codfw.wmnet
[21:03:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:03:34] <mutante>	 stepping back, all yours
[21:05:27] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[21:05:30] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[21:05:32] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[21:05:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:05:35] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[21:05:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:05:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:05:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:10:39] <wikibugs>	 10SRE, 10ops-eqiad, 10Cassandra, 10DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs1016-aqs1021 - https://phabricator.wikimedia.org/T305570 (10Jclark-ctr)
[21:10:46] <ebernhardson>	 ls
[21:11:27] <wikibugs>	 10SRE, 10ops-eqiad, 10Cassandra, 10DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs1016-aqs1021 - https://phabricator.wikimedia.org/T305570 (10Jclark-ctr) a:05Jclark-ctr→03Cmjohnson
[21:11:57] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T306560)', diff saved to https://phabricator.wikimedia.org/P26623 and previous config saved to /var/cache/conftool/dbconfig/20220426-211156-ladsgroup.json
[21:11:58] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
[21:12:00] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
[21:12:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:12:03] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[21:12:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1134 (T306560)', diff saved to https://phabricator.wikimedia.org/P26624 and previous config saved to /var/cache/conftool/dbconfig/20220426-211204-ladsgroup.json
[21:12:05] <wikibugs>	 10SRE, 10serviceops, 10Sustainability (Incident Followup): Set API server weights - https://phabricator.wikimedia.org/T304800 (10Dzahn) There is a new type of servers now:  group D - mw2416, mw2417 and mw2418 - R440 - Xeon Silver 4210R 2.4G - (**40 processors, 128GB RAM**), that's only 40 processors vs 48 bu...
[21:12:06] <urbanecm>	 ebernhardson: ls: cannot open directory '.'
[21:12:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:12:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:12:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:13:12] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops: Q4:(Need By: TBD) rack/setup/install kafka-jumbo101[0-5] - https://phabricator.wikimedia.org/T306939 (10RobH)
[21:14:37] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops: Q4:(Need By: TBD) rack/setup/install kafka-jumbo101[0-5] - https://phabricator.wikimedia.org/T306939 (10RobH)
[21:14:38] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134 (T306560)', diff saved to https://phabricator.wikimedia.org/P26625 and previous config saved to /var/cache/conftool/dbconfig/20220426-211437-ladsgroup.json
[21:14:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:15:39] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 124 probes of 676 (alerts on 90) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[21:17:33] <wikibugs>	 (03PS1) 10Dzahn: parsoid: move template for testing server to profile, remove old module [puppet] - 10https://gerrit.wikimedia.org/r/786391 (https://phabricator.wikimedia.org/T279059)
[21:28:05] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 86 probes of 676 (alerts on 90) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[21:29:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P26626 and previous config saved to /var/cache/conftool/dbconfig/20220426-212943-ladsgroup.json
[21:29:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:37:07] <wikibugs>	 10SRE, 10DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (10RobH)
[21:37:54] <logmsgbot>	 !log aqu@deploy1002 Started deploy [airflow-dags/analytics@e5fecc9]: Fix typo in mediarequest/hourly sensor [airflow-dags/analytics@e5fecc9]
[21:37:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:38:01] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [airflow-dags/analytics@e5fecc9]: Fix typo in mediarequest/hourly sensor [airflow-dags/analytics@e5fecc9] (duration: 00m 07s)
[21:38:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:40:35] <icinga-wm>	 RECOVERY - SSH on pki2001.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:42:25] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10GitLab (Infrastructure): Q3:(Need By: TBD) rack/setup/install gitlab200[2|3] and gitlab-runner200[2|3|4] - https://phabricator.wikimedia.org/T301183 (10Papaul)
[21:44:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P26627 and previous config saved to /var/cache/conftool/dbconfig/20220426-214448-ladsgroup.json
[21:44:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:48:27] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Re-apply "Add Link: Add 'excluded sections' task setting" [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - 10https://gerrit.wikimedia.org/r/785944 (owner: 10Gergő Tisza)
[21:50:44] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "fyi, Effie. I am trying to finish the clean-up. https://puppet-compiler.wmflabs.org/pcc-worker1002/34965/" [puppet] - 10https://gerrit.wikimedia.org/r/786391 (https://phabricator.wikimedia.org/T279059) (owner: 10Dzahn)
[21:53:06] <tgr>	 "Build timed out (after 60 minutes). Marking the build as failed." :/
[21:53:14] <wikibugs>	 (03CR) 10Dzahn: "noop on scandium and testreduce1001, parsoid-test hosts" [puppet] - 10https://gerrit.wikimedia.org/r/786391 (https://phabricator.wikimedia.org/T279059) (owner: 10Dzahn)
[21:53:34] <tgr>	 I think I'll just force-merge that, all the nonselenium jobs passed
[21:59:13] <wikibugs>	 (03PS1) 10Volans: homer: suppress cryptography deprecation warning [puppet] - 10https://gerrit.wikimedia.org/r/786400
[21:59:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134 (T306560)', diff saved to https://phabricator.wikimedia.org/P26628 and previous config saved to /var/cache/conftool/dbconfig/20220426-215953-ladsgroup.json
[21:59:55] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
[21:59:56] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
[21:59:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:00:00] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[22:00:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1135 (T306560)', diff saved to https://phabricator.wikimedia.org/P26629 and previous config saved to /var/cache/conftool/dbconfig/20220426-220001-ladsgroup.json
[22:00:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:00:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:00:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:02:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135 (T306560)', diff saved to https://phabricator.wikimedia.org/P26630 and previous config saved to /var/cache/conftool/dbconfig/20220426-220234-ladsgroup.json
[22:02:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:05:57] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10GitLab (Infrastructure): Q3:(Need By: TBD) rack/setup/install gitlab200[2|3] and gitlab-runner200[2|3|4] - https://phabricator.wikimedia.org/T301183 (10Papaul)
[22:17:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P26631 and previous config saved to /var/cache/conftool/dbconfig/20220426-221739-ladsgroup.json
[22:17:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:18:10] <wikibugs>	 (03Merged) 10jenkins-bot: Re-apply "Backport video landing page changes" [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - 10https://gerrit.wikimedia.org/r/785941 (owner: 10Gergő Tisza)
[22:18:57] <wikibugs>	 (03CR) 10Gergő Tisza: "Error was "Build timed out (after 60 minutes). Marking the build as failed." for the selenium job." [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - 10https://gerrit.wikimedia.org/r/785944 (owner: 10Gergő Tisza)
[22:19:00] <wikibugs>	 (03CR) 10Gergő Tisza: [V: 03+2] Re-apply "Add Link: Add 'excluded sections' task setting" [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - 10https://gerrit.wikimedia.org/r/785944 (owner: 10Gergő Tisza)
[22:25:10] <logmsgbot>	 !log tgr@deploy1002 Started scap: backport with i18n changes: [[gerrit:785944]], [[gerrit:785941]]
[22:25:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:32:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P26632 and previous config saved to /var/cache/conftool/dbconfig/20220426-223244-ladsgroup.json
[22:32:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:37:55] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[22:46:50] <logmsgbot>	 !log tgr@deploy1002 Finished scap: backport with i18n changes: [[gerrit:785944]], [[gerrit:785941]] (duration: 21m 40s)
[22:46:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:47:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135 (T306560)', diff saved to https://phabricator.wikimedia.org/P26633 and previous config saved to /var/cache/conftool/dbconfig/20220426-224749-ladsgroup.json
[22:47:51] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
[22:47:52] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
[22:47:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:47:55] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[22:47:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1119 (T306560)', diff saved to https://phabricator.wikimedia.org/P26634 and previous config saved to /var/cache/conftool/dbconfig/20220426-224757-ladsgroup.json
[22:47:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:48:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:48:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:48:25] <wikibugs>	 10SRE, 10Generated Data Platform, 10Image-Suggestions, 10serviceops, and 2 others: Blubber setup for Image Suggestions Service - https://phabricator.wikimedia.org/T305155 (10Dzahn) for updates here also see T304891#7869885  It seems you have already requested the Gerrit repo.
[22:48:25] <icinga-wm>	 RECOVERY - SSH on wtp1040.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:48:43] <wikibugs>	 (03CR) 10Gergő Tisza: [V: 03+2 C: 03+2] Enable SkinAddFooterLinks hook [extensions/GrowthExperiments] (wmf/1.39.0-wmf.8) - 10https://gerrit.wikimedia.org/r/786366 (owner: 10Gergő Tisza)
[22:50:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119 (T306560)', diff saved to https://phabricator.wikimedia.org/P26635 and previous config saved to /var/cache/conftool/dbconfig/20220426-225030-ladsgroup.json
[22:50:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:51:24] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[22:51:27] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[22:51:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:51:28] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[22:51:32] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[22:51:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:51:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:51:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:53:20] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[22:53:21] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[22:53:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:53:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1113:3316 (T298556)', diff saved to https://phabricator.wikimedia.org/P26636 and previous config saved to /var/cache/conftool/dbconfig/20220426-225326-ladsgroup.json
[22:53:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:53:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:53:34] <stashbot>	 T298556: Fix mismatching field type of oldimage.oi_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298556
[22:54:38] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T298556)', diff saved to https://phabricator.wikimedia.org/P26637 and previous config saved to /var/cache/conftool/dbconfig/20220426-225437-ladsgroup.json
[22:54:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:56:34] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[22:56:37] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[22:56:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:56:39] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[22:56:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:56:42] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[22:56:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:56:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:57:21] <logmsgbot>	 !log tgr@deploy1002 Synchronized php-1.39.0-wmf.8/extensions/GrowthExperiments/extension.json: Backport: [[gerrit:786366|Enable SkinAddFooterLinks hook]] (duration: 00m 51s)
[22:57:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:59:32] <tgr>	 an hour behind schedule (90 minutes CI time for a patch must be a new record) but done
[23:00:07] <icinga-wm>	 RECOVERY - Disk space on grafana2001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=grafana2001&var-datasource=codfw+prometheus/ops
[23:01:46] <wikibugs>	 (03PS1) 10Dzahn: add image-suggestion.discovery.wmnet and point to ingress-wikikube [dns] - 10https://gerrit.wikimedia.org/r/786426 (https://phabricator.wikimedia.org/T304891)
[23:01:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[23:03:45] <wikibugs>	 (03PS2) 10Dzahn: add image-suggestion.discovery.wmnet and point to ingress-wikikube [dns] - 10https://gerrit.wikimedia.org/r/786426 (https://phabricator.wikimedia.org/T304891)
[23:05:31] <wikibugs>	 (03CR) 10Dzahn: "can step 5 be done before step 4 in https://wikitech.wikimedia.org/wiki/Kubernetes#Add_a_new_service? I have questions about step 4 and wh" [deployment-charts] - 10https://gerrit.wikimedia.org/r/775964 (https://phabricator.wikimedia.org/T304891) (owner: 10Dzahn)
[23:05:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P26638 and previous config saved to /var/cache/conftool/dbconfig/20220426-230535-ladsgroup.json
[23:05:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:09:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P26639 and previous config saved to /var/cache/conftool/dbconfig/20220426-230942-ladsgroup.json
[23:09:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:10:47] <icinga-wm>	 RECOVERY - SSH on wtp1045.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[23:20:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P26640 and previous config saved to /var/cache/conftool/dbconfig/20220426-232040-ladsgroup.json
[23:20:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:22:45] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[23:24:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P26641 and previous config saved to /var/cache/conftool/dbconfig/20220426-232447-ladsgroup.json
[23:24:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:24:56] <wikibugs>	 10SRE: role_contacts (service owners) as a custom puppet fact - https://phabricator.wikimedia.org/T306830 (10Dzahn) >>! In T306830#7880182, @jbond wrote: >> use cumin to ask "what is the kernel version of all machines owned by $subteam" or "which hosts owned by $subteam are still on buster" > As we pass this val...
[23:29:14] <wikibugs>	 10SRE: role_contacts (service owners) as a custom puppet fact - https://phabricator.wikimedia.org/T306830 (10Dzahn) >>! In T306830#7880221, @MoritzMuehlenhoff wrote: > And if that syntax is too cumbersome in the day-to-day we could add a few Cumin aliases? like A:hosts-data-persistence and A:hosts-infrastructure...
[23:35:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119 (T306560)', diff saved to https://phabricator.wikimedia.org/P26642 and previous config saved to /var/cache/conftool/dbconfig/20220426-233545-ladsgroup.json
[23:35:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:35:51] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
[23:35:52] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
[23:35:52] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[23:35:54] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
[23:35:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:36:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:36:03] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
[23:36:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:36:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:36:36] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[23:36:37] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[23:36:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:36:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1099:3311 (T306560)', diff saved to https://phabricator.wikimedia.org/P26643 and previous config saved to /var/cache/conftool/dbconfig/20220426-233642-ladsgroup.json
[23:36:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:36:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:39:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T306560)', diff saved to https://phabricator.wikimedia.org/P26644 and previous config saved to /var/cache/conftool/dbconfig/20220426-233917-ladsgroup.json
[23:39:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:39:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T298556)', diff saved to https://phabricator.wikimedia.org/P26645 and previous config saved to /var/cache/conftool/dbconfig/20220426-233953-ladsgroup.json
[23:39:54] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[23:39:56] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[23:39:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:39:59] <stashbot>	 T298556: Fix mismatching field type of oldimage.oi_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298556
[23:40:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1180 (T298556)', diff saved to https://phabricator.wikimedia.org/P26646 and previous config saved to /var/cache/conftool/dbconfig/20220426-234000-ladsgroup.json
[23:40:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:40:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:40:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:42:13] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[23:42:14] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[23:42:16] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[23:42:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:42:19] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[23:42:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:42:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1161 (T298554)', diff saved to https://phabricator.wikimedia.org/P26647 and previous config saved to /var/cache/conftool/dbconfig/20220426-234224-ladsgroup.json
[23:42:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:42:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:42:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:42:34] <stashbot>	 T298554: Fix mismatching field type of archive.ar_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298554
[23:42:48] <wikibugs>	 (PS1) Dzahn: cumin: add "owner" aliases to get lists of host per SRE subteam [puppet] - https://gerrit.wikimedia.org/r/786430 (https://phabricator.wikimedia.org/T306830)
[23:50:02] <mutante>	 jouncebot: test
[23:50:12] <mutante>	 jouncebot: next
[23:50:12] <jouncebot>	 In 7 hour(s) and 9 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220427T0700)
[23:50:54] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[23:54:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P26648 and previous config saved to /var/cache/conftool/dbconfig/20220426-235422-ladsgroup.json
[23:54:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log