[00:04:13] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1019784 (owner: TrainBranchBot)
[00:13:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 871.7ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[00:15:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 1.084s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[00:21:15] <wikibugs>	 ops-codfw, SRE, DC-Ops, cloud-services-team (Hardware), Patch-For-Review: Q3:rack/setup/install cloudcontrol2009-dev.codfw.wmnet - https://phabricator.wikimedia.org/T354896#9720987 (Papaul)
[00:23:42] <wikibugs>	 ops-codfw, SRE, DC-Ops, cloud-services-team (Hardware), Patch-For-Review: Q#:rack/setup/install (2) cloudbackup hosts - https://phabricator.wikimedia.org/T356216#9721003 (Papaul) @Jhancock.wm anything else left to be done on this task?
[00:23:44] <wikibugs>	 ops-codfw, SRE, DC-Ops, cloud-services-team (Hardware), Patch-For-Review: Q3:rack/setup/install cloudcontrol2009-dev.codfw.wmnet - https://phabricator.wikimedia.org/T354896#9720988 (Papaul) Open→Resolved Complete
[00:35:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 810.6ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[00:37:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 864.5ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[00:38:12] <wikibugs>	 ops-magru, DC-Ops, Traffic: Q4:rack/setup/install cp70[01-16] - https://phabricator.wikimedia.org/T362729#9721040 (ssingh) Thanks for the task @RobH! As in the previous runs, please feel free to leave these for Traffic:  `  Update the operations/puppet repo - this should include updates to preseed.ya...
[00:41:55] <wikibugs>	 (CR) Dzahn: [C:+1] phabricator: Switch certificate generation to cfssl [puppet] - https://gerrit.wikimedia.org/r/1020190 (https://phabricator.wikimedia.org/T360413) (owner: EoghanGaffney)
[00:42:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 828.4ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[00:45:38] <wikibugs>	 (CR) Dzahn: [C:+1] "puppet won't delete the old certs on the host, so a revert would only be editing the envoy config to point back to the old cert location a" [puppet] - https://gerrit.wikimedia.org/r/1020190 (https://phabricator.wikimedia.org/T360413) (owner: EoghanGaffney)
[02:02:02] <wikibugs>	 (PS1) DDesouza: miscweb(research-landing-page): bump version [deployment-charts] - https://gerrit.wikimedia.org/r/1020427 (https://phabricator.wikimedia.org/T219903)
[02:02:38] <wikibugs>	 (CR) Ryan Kemper: [C:+1] site.pp: move elastic2088 back into production [puppet] - https://gerrit.wikimedia.org/r/1020375 (https://phabricator.wikimedia.org/T361525) (owner: Bking)
[02:02:39] <wikibugs>	 (CR) Ryan Kemper: [C:+2] site.pp: move elastic2088 back into production [puppet] - https://gerrit.wikimedia.org/r/1020375 (https://phabricator.wikimedia.org/T361525) (owner: Bking)
[02:04:40] <wikibugs>	 (CR) DDesouza: [C:+2] miscweb(research-landing-page): bump version [deployment-charts] - https://gerrit.wikimedia.org/r/1020427 (https://phabricator.wikimedia.org/T219903) (owner: DDesouza)
[02:05:15] <wikibugs>	 (CR) DDesouza: [V:+2 C:+2] miscweb(research-landing-page): bump version [deployment-charts] - https://gerrit.wikimedia.org/r/1020427 (https://phabricator.wikimedia.org/T219903) (owner: DDesouza)
[02:05:53] <wikibugs>	 (Merged) jenkins-bot: miscweb(research-landing-page): bump version [deployment-charts] - https://gerrit.wikimedia.org/r/1020427 (https://phabricator.wikimedia.org/T219903) (owner: DDesouza)
[02:06:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:12:25] <jinxer-wm>	 (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:31:31] <jinxer-wm>	 (Traffic bill over quota) firing: Alert for device cr1-codfw.wikimedia.org - Traffic bill over quota   - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
[02:36:31] <jinxer-wm>	 (Traffic bill over quota) firing: (3) Alert for device cr1-codfw.wikimedia.org - Traffic bill over quota   - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
[02:38:29] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:41:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 839.9ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[02:42:50] <logmsgbot>	 !log dani@deploy1002 helmfile [staging] START helmfile.d/services/miscweb: apply
[02:43:06] <logmsgbot>	 !log dani@deploy1002 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[02:43:07] <logmsgbot>	 !log dani@deploy1002 helmfile [eqiad] START helmfile.d/services/miscweb: apply
[02:43:29] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:43:33] <logmsgbot>	 !log dani@deploy1002 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
[02:43:34] <logmsgbot>	 !log dani@deploy1002 helmfile [codfw] START helmfile.d/services/miscweb: apply
[02:43:57] <logmsgbot>	 !log dani@deploy1002 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[02:46:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 826.7ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[02:48:31] <ryankemper>	 !log T361525 Trying to powercycle `elastic2088` thru mgmt port (host not responding to ssh)
[02:48:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:48:36] <stashbot>	 T361525: Degraded RAID on elastic2088 - https://phabricator.wikimedia.org/T361525
[02:51:31] <jinxer-wm>	 (Traffic bill over quota) firing: (3) Alert for device cr1-codfw.wikimedia.org - Traffic bill over quota   - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
[02:52:11] <wikibugs>	 (PS1) Ryan Kemper: Revert "site.pp: move elastic2088 back into production" [puppet] - https://gerrit.wikimedia.org/r/1020238
[02:52:33] <wikibugs>	 (PS2) Ryan Kemper: Revert "site.pp: move elastic2088 back into production" [puppet] - https://gerrit.wikimedia.org/r/1020238 (https://phabricator.wikimedia.org/T361525)
[02:53:41] <wikibugs>	 ops-eqiad, SRE: Inbound interface errors - https://phabricator.wikimedia.org/T362366#9721140 (phaultfinder)
[02:54:04] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1196 (T352010)', diff saved to https://phabricator.wikimedia.org/P60695 and previous config saved to /var/cache/conftool/dbconfig/20240417-025403-ladsgroup.json
[02:54:11] <stashbot>	 T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[02:54:46] <wikibugs>	 (CR) Ryan Kemper: [C:+2] Revert "site.pp: move elastic2088 back into production" [puppet] - https://gerrit.wikimedia.org/r/1020238 (https://phabricator.wikimedia.org/T361525) (owner: Ryan Kemper)
[02:56:31] <jinxer-wm>	 (Traffic bill over quota) resolved: (2) Alert for device cr2-eqiad.wikimedia.org - Traffic bill over quota   - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
[02:58:29] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:01:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 800.2ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[03:06:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 800.2ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[03:06:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:09:11] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P60696 and previous config saved to /var/cache/conftool/dbconfig/20240417-030911-ladsgroup.json
[03:23:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 917.8ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[03:24:19] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P60697 and previous config saved to /var/cache/conftool/dbconfig/20240417-032418-ladsgroup.json
[03:33:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 802.9ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[03:39:26] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1196 (T352010)', diff saved to https://phabricator.wikimedia.org/P60698 and previous config saved to /var/cache/conftool/dbconfig/20240417-033926-ladsgroup.json
[03:39:29] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
[03:39:32] <stashbot>	 T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[03:39:41] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
[03:39:49] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P60699 and previous config saved to /var/cache/conftool/dbconfig/20240417-033948-ladsgroup.json
[03:43:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 924.7ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[03:53:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 806.7ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[04:00:17] <wikibugs>	 (PS6) JHathaway: Postfix profile [puppet] - https://gerrit.wikimedia.org/r/1019131 (https://phabricator.wikimedia.org/T325398)
[04:03:11] <wikibugs>	 (CR) CI reject: [V:-1] Postfix profile [puppet] - https://gerrit.wikimedia.org/r/1019131 (https://phabricator.wikimedia.org/T325398) (owner: JHathaway)
[04:38:51] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1222.eqiad.wmnet with reason: Maintenance
[04:39:04] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1222.eqiad.wmnet with reason: Maintenance
[04:40:15] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P60700 and previous config saved to /var/cache/conftool/dbconfig/20240417-044015-ladsgroup.json
[04:40:20] <stashbot>	 T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[04:44:57] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[04:45:10] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[04:45:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1166 (T361627)', diff saved to https://phabricator.wikimedia.org/P60701 and previous config saved to /var/cache/conftool/dbconfig/20240417-044517-marostegui.json
[04:45:22] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[04:50:59] <wikibugs>	 ops-codfw, SRE, DBA, decommission-hardware, Patch-For-Review: decommission db2100.codfw.wmnet - https://phabricator.wikimedia.org/T361584#9721220 (Marostegui) This host wasn't removed from zarcillo - I have done so.
[04:51:31] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T361627)', diff saved to https://phabricator.wikimedia.org/P60702 and previous config saved to /var/cache/conftool/dbconfig/20240417-045130-marostegui.json
[04:51:36] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[04:53:50] <wikibugs>	 (PS1) Marostegui: db2182: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1020466
[04:53:54] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db2182', diff saved to https://phabricator.wikimedia.org/P60703 and previous config saved to /var/cache/conftool/dbconfig/20240417-045353-root.json
[04:54:47] <wikibugs>	 (CR) Marostegui: [C:+2] db2182: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1020466 (owner: Marostegui)
[04:55:01] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS bookworm
[04:55:23] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P60704 and previous config saved to /var/cache/conftool/dbconfig/20240417-045522-ladsgroup.json
[04:59:58] <marostegui>	 !log dbmaint Upgrade s7 codfw to Bookworm and MariaDB 10.6 T362745
[05:00:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:00:20] <stashbot>	 T362745: Upgrade s7 to MariaDB 10.6 - https://phabricator.wikimedia.org/T362745
[05:05:51] <marostegui>	 !log Rename machine_vision tables on db1249 eqiad dbmaint s4 T362229
[05:05:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:05:56] <stashbot>	 T362229: Drop MachineVision tables from beta and production - https://phabricator.wikimedia.org/T362229
[05:06:39] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P60705 and previous config saved to /var/cache/conftool/dbconfig/20240417-050638-marostegui.json
[05:10:30] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P60706 and previous config saved to /var/cache/conftool/dbconfig/20240417-051029-ladsgroup.json
[05:12:37] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage
[05:15:26] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage
[05:21:46] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P60707 and previous config saved to /var/cache/conftool/dbconfig/20240417-052145-marostegui.json
[05:22:22] <wikibugs>	 (PS1) Marostegui: Revert "db2182: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1020239
[05:25:37] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P60708 and previous config saved to /var/cache/conftool/dbconfig/20240417-052537-ladsgroup.json
[05:25:39] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
[05:25:42] <stashbot>	 T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[05:25:52] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
[05:26:00] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1207 (T352010)', diff saved to https://phabricator.wikimedia.org/P60709 and previous config saved to /var/cache/conftool/dbconfig/20240417-052600-ladsgroup.json
[05:31:32] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2182 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P60710 and previous config saved to /var/cache/conftool/dbconfig/20240417-053131-root.json
[05:31:39] <wikibugs>	 (CR) Marostegui: [C:+2] Revert "db2182: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1020239 (owner: Marostegui)
[05:33:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 957.7ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[05:34:15] <wikibugs>	 ops-codfw, ops-eqiad, SRE-swift-storage, DC-Ops, Traffic: Reimage cookbook on new eqiad hosts stuck at PXE booting - https://phabricator.wikimedia.org/T350179#9721277 (Papaul) @ssingh After 2 days working on this issue, I finally got to the bottom of the problem. After many reboots on cp11...
[05:35:21] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2182.codfw.wmnet with OS bookworm
[05:36:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T361627)', diff saved to https://phabricator.wikimedia.org/P60711 and previous config saved to /var/cache/conftool/dbconfig/20240417-053653-marostegui.json
[05:36:55] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[05:36:58] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[05:37:09] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[05:37:16] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1175 (T361627)', diff saved to https://phabricator.wikimedia.org/P60712 and previous config saved to /var/cache/conftool/dbconfig/20240417-053716-marostegui.json
[05:43:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 809ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[05:43:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T361627)', diff saved to https://phabricator.wikimedia.org/P60713 and previous config saved to /var/cache/conftool/dbconfig/20240417-054333-marostegui.json
[05:43:39] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[05:46:38] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2182 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P60714 and previous config saved to /var/cache/conftool/dbconfig/20240417-054637-root.json
[05:56:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:58:41] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P60715 and previous config saved to /var/cache/conftool/dbconfig/20240417-055841-marostegui.json
[06:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240417T0600)
[06:01:44] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2182 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60716 and previous config saved to /var/cache/conftool/dbconfig/20240417-060143-root.json
[06:04:21] <jinxer-wm>	 (PoolcounterFullQueues) firing: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[06:09:21] <jinxer-wm>	 (PoolcounterFullQueues) resolved: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[06:12:25] <jinxer-wm>	 (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:13:49] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P60717 and previous config saved to /var/cache/conftool/dbconfig/20240417-061349-marostegui.json
[06:16:49] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2182 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60718 and previous config saved to /var/cache/conftool/dbconfig/20240417-061649-root.json
[06:25:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 820.3ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[06:27:15] <wikibugs>	 (Abandoned) Hashar: Increase default thumbnail display size from 220px to 300px [mediawiki-config] - https://gerrit.wikimedia.org/r/154408 (owner: Jforrester)
[06:27:29] <wikibugs>	 (Abandoned) Hashar: Match 'editcontentmodel' permission with 'move' [mediawiki-config] - https://gerrit.wikimedia.org/r/309066 (https://phabricator.wikimedia.org/T85847) (owner: Legoktm)
[06:28:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T361627)', diff saved to https://phabricator.wikimedia.org/P60719 and previous config saved to /var/cache/conftool/dbconfig/20240417-062856-marostegui.json
[06:28:58] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1189.eqiad.wmnet with reason: Maintenance
[06:29:03] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[06:29:11] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1189.eqiad.wmnet with reason: Maintenance
[06:29:19] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1189 (T361627)', diff saved to https://phabricator.wikimedia.org/P60720 and previous config saved to /var/cache/conftool/dbconfig/20240417-062918-marostegui.json
[06:30:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 807.1ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[06:31:55] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2182 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60721 and previous config saved to /var/cache/conftool/dbconfig/20240417-063155-root.json
[06:34:46] <wikibugs>	 (PS2) Hashar: logging: pluralize $wmgDefaultMonologHandler [mediawiki-config] - https://gerrit.wikimedia.org/r/1019267 (https://phabricator.wikimedia.org/T238838)
[06:34:46] <wikibugs>	 (PS3) Hashar: logging: always register udp2log handlers [mediawiki-config] - https://gerrit.wikimedia.org/r/1019253 (https://phabricator.wikimedia.org/T228838)
[06:35:37] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1189 (T361627)', diff saved to https://phabricator.wikimedia.org/P60722 and previous config saved to /var/cache/conftool/dbconfig/20240417-063537-marostegui.json
[06:35:45] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[06:40:46] <wikibugs>	 (PS3) Anzx: mlwiki: create draft namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/1020242 (https://phabricator.wikimedia.org/T362653)
[06:47:01] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2182 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60723 and previous config saved to /var/cache/conftool/dbconfig/20240417-064700-root.json
[06:50:45] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P60724 and previous config saved to /var/cache/conftool/dbconfig/20240417-065044-marostegui.json
[06:53:29] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job pdu_sentry4 in ops@ulsfo - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:00:05] <jouncebot>	 Amir1 and Urbanecm: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240417T0700).
[07:00:05] <jouncebot>	 anzx: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:02:07] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2182 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P60725 and previous config saved to /var/cache/conftool/dbconfig/20240417-070206-root.json
[07:05:52] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P60726 and previous config saved to /var/cache/conftool/dbconfig/20240417-070552-marostegui.json
[07:05:59] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1020190 (https://phabricator.wikimedia.org/T360413) (owner: EoghanGaffney)
[07:09:11] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1019887 (https://phabricator.wikimedia.org/T360414) (owner: Dzahn)
[07:10:30] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Remove obsolete restbase discovery cert [puppet] - https://gerrit.wikimedia.org/r/1020258 (https://phabricator.wikimedia.org/T360636) (owner: Muehlenhoff)
[07:15:42] <wikibugs>	 (PS1) Muehlenhoff: Remove obsolete stub cert [labs/private] - https://gerrit.wikimedia.org/r/1020624 (https://phabricator.wikimedia.org/T360636)
[07:18:27] <wikibugs>	 (CR) Muehlenhoff: [V:+2 C:+2] Remove obsolete stub cert [labs/private] - https://gerrit.wikimedia.org/r/1020624 (https://phabricator.wikimedia.org/T360636) (owner: Muehlenhoff)
[07:21:00] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1189 (T361627)', diff saved to https://phabricator.wikimedia.org/P60727 and previous config saved to /var/cache/conftool/dbconfig/20240417-072059-marostegui.json
[07:21:02] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1198.eqiad.wmnet with reason: Maintenance
[07:21:05] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[07:21:15] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1198.eqiad.wmnet with reason: Maintenance
[07:21:15] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db2150', diff saved to https://phabricator.wikimedia.org/P60728 and previous config saved to /var/cache/conftool/dbconfig/20240417-072115-root.json
[07:21:23] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1198 (T361627)', diff saved to https://phabricator.wikimedia.org/P60729 and previous config saved to /var/cache/conftool/dbconfig/20240417-072122-marostegui.json
[07:21:48] <wikibugs>	 (PS1) Marostegui: db2150: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1020625
[07:22:22] <wikibugs>	 (CR) Marostegui: [C:+2] db2150: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1020625 (owner: Marostegui)
[07:22:51] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS bookworm
[07:26:24] <jynus>	 !log restart db1240 database for mariadb upgrade
[07:26:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:27:27] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-host for host db2214.codfw.wmnet
[07:27:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1198 (T361627)', diff saved to https://phabricator.wikimedia.org/P60730 and previous config saved to /var/cache/conftool/dbconfig/20240417-072733-marostegui.json
[07:27:38] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[07:30:00] <wikibugs>	 (PS1) Jcrespo: mariadbd: Upgrade mariadb package on db1240 from 10.4 to 10.6 [puppet] - https://gerrit.wikimedia.org/r/1020685 (https://phabricator.wikimedia.org/T360751)
[07:31:13] <wikibugs>	 (CR) Jcrespo: [C:+2] mariadbd: Upgrade mariadb package on db1240 from 10.4 to 10.6 [puppet] - https://gerrit.wikimedia.org/r/1020685 (https://phabricator.wikimedia.org/T360751) (owner: Jcrespo)
[07:32:30] <wikibugs>	 (PS1) Muehlenhoff: Switch db2214 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020691 (https://phabricator.wikimedia.org/T349619)
[07:33:35] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Switch db2214 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020691 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[07:35:56] <wikibugs>	 (PS1) Fabfur: benthos: added some labels [puppet] - https://gerrit.wikimedia.org/r/1020692 (https://phabricator.wikimedia.org/T358109)
[07:37:34] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2214.codfw.wmnet
[07:38:21] <wikibugs>	 (PS1) Jcrespo: mariadb: Upgrade mariadb package on db1216 from 10.4 to 10.6 [puppet] - https://gerrit.wikimedia.org/r/1020693 (https://phabricator.wikimedia.org/T360751)
[07:38:53] <jynus>	 !log restart db1216 database for mariadb upgrade
[07:38:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:39:13] <aqu>	 !log analytics/refinery deploy begin (added source jars 0.2.35)
[07:39:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:39:39] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage
[07:39:54] <logmsgbot>	 !log aqu@deploy1002 Started deploy [analytics/refinery@c4e197f]: Regular analytics weekly train [analytics/refinery@c4e197fa]
[07:40:30] <wikibugs>	 (CR) Jcrespo: [C:+2] mariadb: Upgrade mariadb package on db1216 from 10.4 to 10.6 [puppet] - https://gerrit.wikimedia.org/r/1020693 (https://phabricator.wikimedia.org/T360751) (owner: Jcrespo)
[07:40:48] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-host for host db1173.eqiad.wmnet
[07:41:50] <wikibugs>	 (PS1) Muehlenhoff: Switch db1173 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020694 (https://phabricator.wikimedia.org/T349619)
[07:42:41] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P60731 and previous config saved to /var/cache/conftool/dbconfig/20240417-074241-marostegui.json
[07:42:52] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage
[07:45:45] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Switch db1173 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020694 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[07:45:48] <jynus>	 some ulsfo routers went unreachable
[07:46:40] <wikibugs>	 (PS1) Fabfur: benthos: fix check for possible empty values [puppet] - https://gerrit.wikimedia.org/r/1020695 (https://phabricator.wikimedia.org/T358109)
[07:49:49] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1173.eqiad.wmnet
[07:54:06] <wikibugs>	 (PS1) Marostegui: Revert "db2150: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1020243
[07:55:36] <wikibugs>	 (CR) Filippo Giunchedi: benthos: added some labels (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1020692 (https://phabricator.wikimedia.org/T358109) (owner: Fabfur)
[07:57:40] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] benthos: fix check for possible empty values [puppet] - https://gerrit.wikimedia.org/r/1020695 (https://phabricator.wikimedia.org/T358109) (owner: Fabfur)
[07:57:49] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P60732 and previous config saved to /var/cache/conftool/dbconfig/20240417-075748-marostegui.json
[07:58:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2150 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P60733 and previous config saved to /var/cache/conftool/dbconfig/20240417-075850-root.json
[07:58:57] <wikibugs>	 (CR) Marostegui: [C:+2] Revert "db2150: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1020243 (owner: Marostegui)
[08:00:47] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.hosts.reboot-single for host kubestage2002.codfw.wmnet
[08:03:10] <wikibugs>	 (PS2) Fabfur: benthos: added some labels [puppet] - https://gerrit.wikimedia.org/r/1020692 (https://phabricator.wikimedia.org/T358109)
[08:03:23] <wikibugs>	 (CR) Fabfur: benthos: added some labels (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1020692 (https://phabricator.wikimedia.org/T358109) (owner: Fabfur)
[08:03:36] <wikibugs>	 (PS1) Jcrespo: site.pp: Reorder backup sources by server name and update comments [puppet] - https://gerrit.wikimedia.org/r/1020697 (https://phabricator.wikimedia.org/T360751)
[08:03:56] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS bookworm
[08:05:34] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] benthos: added some labels [puppet] - https://gerrit.wikimedia.org/r/1020692 (https://phabricator.wikimedia.org/T358109) (owner: Fabfur)
[08:07:51] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [analytics/refinery@c4e197f]: Regular analytics weekly train [analytics/refinery@c4e197fa] (duration: 27m 57s)
[08:10:17] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2002.codfw.wmnet
[08:12:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1198 (T361627)', diff saved to https://phabricator.wikimedia.org/P60734 and previous config saved to /var/cache/conftool/dbconfig/20240417-081256-marostegui.json
[08:13:00] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1212.eqiad.wmnet with reason: Maintenance
[08:13:01] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[08:13:03] <logmsgbot>	 !log aqu@deploy1002 Started deploy [analytics/refinery@c4e197f] (thin): Regular analytics weekly train THIN [analytics/refinery@c4e197fa]
[08:13:13] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1212.eqiad.wmnet with reason: Maintenance
[08:13:14] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[08:13:19] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[08:13:27] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1212 (T361627)', diff saved to https://phabricator.wikimedia.org/P60735 and previous config saved to /var/cache/conftool/dbconfig/20240417-081326-marostegui.json
[08:13:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2150 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P60736 and previous config saved to /var/cache/conftool/dbconfig/20240417-081356-root.json
[08:15:34] <wikibugs>	 (CR) Jcrespo: [C:+2] "https://puppet-compiler.wmflabs.org/output/1020697/1964/" [puppet] - https://gerrit.wikimedia.org/r/1020697 (https://phabricator.wikimedia.org/T360751) (owner: Jcrespo)
[08:16:42] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [analytics/refinery@c4e197f] (thin): Regular analytics weekly train THIN [analytics/refinery@c4e197fa] (duration: 03m 39s)
[08:19:54] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1212 (T361627)', diff saved to https://phabricator.wikimedia.org/P60737 and previous config saved to /var/cache/conftool/dbconfig/20240417-081953-marostegui.json
[08:19:59] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[08:23:57] <wikibugs>	 (PS1) JMeybohm: kubernetes::node: Ensure apparmor profiles are loaded automatically [puppet] - https://gerrit.wikimedia.org/r/1020700 (https://phabricator.wikimedia.org/T326785)
[08:24:18] <logmsgbot>	 !log aqu@deploy1002 Started deploy [analytics/refinery@c4e197f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@c4e197fa]
[08:24:28] <wikibugs>	 (PS1) Kevin Bazira: ml-services: add logo-detection isvc to experimental namespace [deployment-charts] - https://gerrit.wikimedia.org/r/1020706 (https://phabricator.wikimedia.org/T362749)
[08:25:36] <wikibugs>	 (PS1) JMeybohm: wikifunction: Move apparmor annotation to pod template [deployment-charts] - https://gerrit.wikimedia.org/r/1020701 (https://phabricator.wikimedia.org/T326785)
[08:26:41] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [analytics/refinery@c4e197f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@c4e197fa] (duration: 02m 23s)
[08:27:00] <wikibugs>	 (CR) CI reject: [V:-1] kubernetes::node: Ensure apparmor profiles are loaded automatically [puppet] - https://gerrit.wikimedia.org/r/1020700 (https://phabricator.wikimedia.org/T326785) (owner: JMeybohm)
[08:27:46] <wikibugs>	 (CR) JMeybohm: [V:+1] "PCC SUCCESS (CORE_DIFF 18): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1965/c" [puppet] - https://gerrit.wikimedia.org/r/1020700 (https://phabricator.wikimedia.org/T326785) (owner: JMeybohm)
[08:29:01] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2150 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60738 and previous config saved to /var/cache/conftool/dbconfig/20240417-082901-root.json
[08:31:41] <wikibugs>	 (PS1) SimmeD: Updated wmf-config/InitialiseSettings.php by adding single space expected between "//" and comment [mediawiki-config] - https://gerrit.wikimedia.org/r/1020707
[08:33:20] <wikibugs>	 (CR) Fabfur: [C:+2] benthos: added some labels [puppet] - https://gerrit.wikimedia.org/r/1020692 (https://phabricator.wikimedia.org/T358109) (owner: Fabfur)
[08:35:01] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P60739 and previous config saved to /var/cache/conftool/dbconfig/20240417-083501-marostegui.json
[08:36:07] <wikibugs>	 (CR) Hashar: logging: always register udp2log handlers (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/1019253 (https://phabricator.wikimedia.org/T228838) (owner: Hashar)
[08:37:29] <hashar>	 jouncebot: owandnext
[08:37:40] <hashar>	 jouncebot: nowandnext
[08:37:40] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 22 minute(s)
[08:37:40] <jouncebot>	 In 1 hour(s) and 22 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240417T1000)
[08:38:12] <wikibugs>	 (PS2) JMeybohm: kubernetes::node: Ensure apparmor profiles are loaded automatically [puppet] - https://gerrit.wikimedia.org/r/1020700 (https://phabricator.wikimedia.org/T326785)
[08:38:19] <wikibugs>	 (CR) Hashar: [C:+2] logging: pluralize $wmgDefaultMonologHandler [mediawiki-config] - https://gerrit.wikimedia.org/r/1019267 (https://phabricator.wikimedia.org/T238838) (owner: Hashar)
[08:39:09] <wikibugs>	 (Merged) jenkins-bot: logging: pluralize $wmgDefaultMonologHandler [mediawiki-config] - https://gerrit.wikimedia.org/r/1019267 (https://phabricator.wikimedia.org/T238838) (owner: Hashar)
[08:39:18] <wikibugs>	 (CR) Fabfur: [C:+2] benthos: fix check for possible empty values [puppet] - https://gerrit.wikimedia.org/r/1020695 (https://phabricator.wikimedia.org/T358109) (owner: Fabfur)
[08:39:24] <wikibugs>	 (Abandoned) SimmeD: Updated wmf-config/InitialiseSettings.php by adding single space expected between "//" and comment [mediawiki-config] - https://gerrit.wikimedia.org/r/1020707 (owner: SimmeD)
[08:39:51] <wikibugs>	 (PS3) JMeybohm: kubernetes::node: Ensure apparmor profiles are loaded automatically [puppet] - https://gerrit.wikimedia.org/r/1020700 (https://phabricator.wikimedia.org/T326785)
[08:40:49] <aqu>	 !log Deployed refinery using scap, then deployed onto hdfs
[08:40:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:17] <logmsgbot>	 !log hashar@deploy1002 Started scap: Backport for [[gerrit:1019267|logging: pluralize $wmgDefaultMonologHandler (T238838)]]
[08:41:21] <stashbot>	 T238838: Disabling old AWB versions - https://phabricator.wikimedia.org/T238838
[08:41:44] <hashar>	 hmm
[08:41:46] <hashar>	 wrong bug
[08:42:57] <wikibugs>	 (CR) Hashar: [C:+2] logging: pluralize $wmgDefaultMonologHandler (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1019267 (https://phabricator.wikimedia.org/T238838) (owner: Hashar)
[08:44:07] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2150 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60741 and previous config saved to /var/cache/conftool/dbconfig/20240417-084407-root.json
[08:44:13] <wikibugs>	 (CR) JMeybohm: [V:+1] "PCC SUCCESS (CORE_DIFF 14 NOOP 8): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node" [puppet] - https://gerrit.wikimedia.org/r/1020700 (https://phabricator.wikimedia.org/T326785) (owner: JMeybohm)
[08:44:26] <logmsgbot>	 !log hashar@deploy1002 hashar: Backport for [[gerrit:1019267|logging: pluralize $wmgDefaultMonologHandler (T238838)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[08:44:32] <logmsgbot>	 !log hashar@deploy1002 hashar: Continuing with sync
[08:46:40] <wikibugs>	 SRE, SRE-tools, collaboration-services, Infrastructure-Foundations, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9721680 (MoritzMuehlenhoff)
[08:50:09] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P60742 and previous config saved to /var/cache/conftool/dbconfig/20240417-085009-marostegui.json
[08:55:01] <wikibugs>	 (PS3) Effie Mouzeli: mediawiki deployments: use mcrouter daemonset for both DCs [deployment-charts] - https://gerrit.wikimedia.org/r/1020251 (https://phabricator.wikimedia.org/T346690)
[08:57:54] <logmsgbot>	 !log hashar@deploy1002 Finished scap: Backport for [[gerrit:1019267|logging: pluralize $wmgDefaultMonologHandler (T238838)]] (duration: 16m 37s)
[08:57:59] <stashbot>	 T238838: Disabling old AWB versions - https://phabricator.wikimedia.org/T238838
[08:58:21] <wikibugs>	 (CR) Clément Goubert: [C:+1] mediawiki deployments: use mcrouter daemonset for both DCs [deployment-charts] - https://gerrit.wikimedia.org/r/1020251 (https://phabricator.wikimedia.org/T346690) (owner: Effie Mouzeli)
[08:58:27] <effie>	 jouncebot:  now
[08:58:27] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 1 minute(s)
[08:59:13] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2150 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60743 and previous config saved to /var/cache/conftool/dbconfig/20240417-085912-root.json
[09:01:23] <wikibugs>	 (CR) Effie Mouzeli: [C:+2] mediawiki deployments: use mcrouter daemonset for both DCs [deployment-charts] - https://gerrit.wikimedia.org/r/1020251 (https://phabricator.wikimedia.org/T346690) (owner: Effie Mouzeli)
[09:02:48] <wikibugs>	 (Merged) jenkins-bot: mediawiki deployments: use mcrouter daemonset for both DCs [deployment-charts] - https://gerrit.wikimedia.org/r/1020251 (https://phabricator.wikimedia.org/T346690) (owner: Effie Mouzeli)
[09:03:17] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
[09:05:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1212 (T361627)', diff saved to https://phabricator.wikimedia.org/P60744 and previous config saved to /var/cache/conftool/dbconfig/20240417-090516-marostegui.json
[09:05:19] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1223.eqiad.wmnet with reason: Maintenance
[09:05:22] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[09:05:29] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
[09:05:32] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1223.eqiad.wmnet with reason: Maintenance
[09:05:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1223 (T361627)', diff saved to https://phabricator.wikimedia.org/P60745 and previous config saved to /var/cache/conftool/dbconfig/20240417-090539-marostegui.json
[09:08:42] <logmsgbot>	 !log jiji@deploy1002 Started scap: Switch mediawiki in eqiad to use node-local mcrouter ds - T346690
[09:08:47] <stashbot>	 T346690: mcrouter daemonset on mw-on-k8s - https://phabricator.wikimedia.org/T346690
[09:08:52] <wikibugs>	 SRE, Infrastructure-Foundations, Mail, MediaWiki-Email: Old "Email this user" email is repeatedly resent - https://phabricator.wikimedia.org/T361860#9721761 (Xover) And now I just got a resend of a different email to a different user, originally sent on April 11. That’s something like two out of...
[09:12:02] <wikibugs>	 (PS2) Msz2001: Only people who belong to 'editor' or 'sysop' groups will be able to publish translations directly. [mediawiki-config] - https://gerrit.wikimedia.org/r/1020729 (https://phabricator.wikimedia.org/T362756)
[09:12:06] <hashar>	 effie: since you have pushed your change we get roughly 800 messages per minute stating "Duplicate get(): "{key}" fetched {count} times" from `objectcache`  https://logstash.wikimedia.org/goto/d17686d0fd57c0a2c94dfc5348991efa
[09:12:07] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1223 (T361627)', diff saved to https://phabricator.wikimedia.org/P60746 and previous config saved to /var/cache/conftool/dbconfig/20240417-091203-marostegui.json
[09:12:12] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[09:12:42] <hashar>	 that is for fetches of keys such as `wikidatawiki:MWSession:......` and I have no idea what it means
[09:12:43] <effie>	 hashar: do we get the same on codfw ?
[09:13:15] <wikibugs>	 (PS3) Msz2001: [plwiki] Limit Content Translation publishing to mainspace for non-editors [mediawiki-config] - https://gerrit.wikimedia.org/r/1020729 (https://phabricator.wikimedia.org/T362756)
[09:13:27] <wikibugs>	 (CR) Hashar: [C:+2] "I have confirmed we still have logs :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/1019267 (https://phabricator.wikimedia.org/T238838) (owner: Hashar)
[09:13:33] <effie>	 hashar: let's give it a little time to see if it will stop
[09:14:07] <hashar>	 and we got a similar bump under the `session` channel https://logstash.wikimedia.org/goto/5335cc7943b94d988c7b4631c2e05e7e
[09:14:17] <hashar>	 I haven't looked which message exactly
[09:14:17] <effie>	 it has slowed down already 
[09:14:19] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2150 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60747 and previous config saved to /var/cache/conftool/dbconfig/20240417-091418-root.json
[09:15:21] <hashar>	 Session "{session}": Metadata merge failed: {exception}
[09:15:37] <hashar>	 https://logstash.wikimedia.org/goto/745c3f29242f28c23867f6fc7d267b9e
[09:16:07] <hashar>	 those are exceptions being thrown, though they get logged at warning level
[09:17:33] <effie>	 they are not errors though 
[09:17:49] <hashar>	 I have no idea about the impacts
[09:17:59] <hashar>	 I just found the elevated logging while I was looking for something else
[09:18:01] <effie>	 obviously they are related to the change, no doubt 
[09:18:15] <jinxer-wm>	 (PHPFPMTooBusy) firing: (3) Not enough idle PHP-FPM workers for Mediawiki mw-api-ext at eqiad: 30.2% idle - https://bit.ly/wmf-fpmsat  - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[09:19:02] <wikibugs>	 (CR) Jaime Nuche: scap: introduce bootstrapping mechanism specific to deployment hosts (1 comment) [puppet] - https://gerrit.wikimedia.org/r/820749 (owner: Jaime Nuche)
[09:19:03] <effie>	 now that is an actual problem 
[09:20:10] <wikibugs>	 (PS1) Jcrespo: dbbackups: Setup dbprov1005 as new host to send s3 and s5 backups [puppet] - https://gerrit.wikimedia.org/r/1020750 (https://phabricator.wikimedia.org/T362509)
[09:21:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: (4) p75 latency high: eqiad mw-api-ext (k8s) 5.255s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[09:21:55] <jinxer-wm>	 (MaxConntrack) firing: Max conntrack at 93.03% on kubernetes1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[09:21:57] <jinxer-wm>	 (ProbeDown) firing: Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:22:34] <jinxer-wm>	 (ProbeDown) firing: (9) Service appservers-https:443 has failed probes (http_appservers-https_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:23:15] <jinxer-wm>	 (PHPFPMTooBusy) firing: (3) Not enough idle PHP-FPM workers for Mediawiki mw-api-ext at eqiad: 17.66% idle - https://bit.ly/wmf-fpmsat  - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[09:23:51] <jinxer-wm>	 (SwaggerProbeHasFailures) firing: Not all openapi/swagger endpoints returned healthy - https://wikitech.wikimedia.org/wiki/Runbook#https://wikifeeds.svc.eqiad.wmnet:4101 - https://grafana.wikimedia.org/d/_77ik484k/openapi-swagger-endpoint-state?var-site=eqiad - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[09:23:54] <jayme>	 effie: is this you?
[09:24:27] <effie>	 jayme: yes, I am trying to understand why, let's go to sre 
[09:24:30] <effie>	 -sre
[09:25:27] <hashar>	 the authentication metrics are showing that user logins and the Central Login went down   https://grafana.wikimedia.org/d/000000004/authentication-metrics?orgId=1
[09:25:30] <hashar>	 so I guess something is broken
[09:25:33] <jynus>	 oh, I didn't realize it was a deploy
[09:25:47] <jynus>	 I will report on status page
[09:25:55] <hashar>	 and that more or less aligns with the elevated rates of logs in `session` and `objectcache`
[09:25:59] <AzaTht>	 just got "Request from 87.96.230.208 via cp3067.esams.wmnet, ATS/9.1.4
[09:25:59] <AzaTht>	 Error: 502, Broken pipe at 2024-04-17 09:24:38 GMT"
[09:26:15] <effie>	 jynus: this is me, on -sre
[09:26:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: (4) p75 latency high: eqiad mw-api-ext (k8s) 6.79s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[09:26:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (3) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:26:40] <jynus>	 AzaTht: https://www.wikimediastatus.net/
[09:26:55] <jinxer-wm>	 (MaxConntrack) resolved: Max conntrack at 100% on kubernetes1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[09:26:57] <jinxer-wm>	 (ProbeDown) firing: (8) Service mw-api-int:4446 has failed probes (http_mw-api-int_ip4) #page  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:27:16] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P60748 and previous config saved to /var/cache/conftool/dbconfig/20240417-092714-marostegui.json
[09:27:34] <jinxer-wm>	 (ProbeDown) firing: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:27:40] <AzaTht>	 jynus: it was "All systems operational" when I looked right before I posted ツ
[09:27:44] <jinxer-wm>	 (HaproxyUnavailable) firing: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable
[09:27:53] <jinxer-wm>	 (KubernetesAPILatency) firing: (3) High Kubernetes API latency (GET pods) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[09:28:15] <jinxer-wm>	 (MediaWikiMemcachedHighErrorRate) firing: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?var-datasource=eqiad%20prometheus/ops&viewPanel=19 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[09:28:15] <jinxer-wm>	 (PHPFPMTooBusy) firing: (5) Not enough idle PHP-FPM workers for Mediawiki mw-api-ext at eqiad: 23.67% idle - https://bit.ly/wmf-fpmsat  - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[09:28:51] <jinxer-wm>	 (SwaggerProbeHasFailures) firing: (6) Not all openapi/swagger endpoints returned healthy   - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[09:29:14] <wikibugs>	 (CR) Klausman: [C:+1] ml-services: fix indentation in mistral model resources and increase memory [deployment-charts] - https://gerrit.wikimedia.org/r/1018646 (https://phabricator.wikimedia.org/T357986) (owner: Ilias Sarantopoulos)
[09:29:19] <wikibugs>	 (CR) Btullis: [C:+1] "Looks good, thanks." [puppet] - https://gerrit.wikimedia.org/r/1019726 (https://phabricator.wikimedia.org/T362518) (owner: Muehlenhoff)
[09:29:23] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2150 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P60749 and previous config saved to /var/cache/conftool/dbconfig/20240417-092923-root.json
[09:29:36] <jinxer-wm>	 (GatewayBackendErrorsHigh) firing: rest-gateway: elevated 5xx errors from wikifeeds_cluster in eqiad #page - https://wikitech.wikimedia.org/wiki/API_Gateway#How_to_debug_it - https://grafana.wikimedia.org/d/UOH-5IDMz/api-and-rest-gateway?orgId=1&refresh=30s&viewPanel=57&var-datasource=eqiad%20prometheus/k8s&var-instance=rest-gateway - https://alerts.wikimedia.org/?q=alertname%3DGatewayBackendErrorsHigh
[09:30:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-parsoid - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[09:31:03] <logmsgbot>	 !log jiji@deploy1002 scap failed: KeyError 'production' (duration: 22m 21s)
[09:31:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: (5) p75 latency high: eqiad mw-api-ext (k8s) 1.702s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[09:31:57] <jinxer-wm>	 (ProbeDown) firing: (12) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:32:34] <jinxer-wm>	 (ProbeDown) firing: (16) Service appservers-https:443 has failed probes (http_appservers-https_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:32:53] <jinxer-wm>	 (KubernetesAPILatency) resolved: (4) High Kubernetes API latency (GET pods) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[09:33:15] <jinxer-wm>	 (MediaWikiMemcachedHighErrorRate) firing: (2) MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[09:33:22] <wikibugs>	 (PS1) Arnaudb: mariadb: removes underscore on striker database name [puppet] - https://gerrit.wikimedia.org/r/1020709 (https://phabricator.wikimedia.org/T360149)
[09:33:43] <jinxer-wm>	 (VarnishUnavailable) firing: varnish-text has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/Varnish#Diagnosing_Varnish_alerts - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DVarnishUnavailable
[09:33:51] <jinxer-wm>	 (SwaggerProbeHasFailures) firing: (7) Not all openapi/swagger endpoints returned healthy   - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[09:34:36] <jinxer-wm>	 (GatewayBackendErrorsHigh) firing: (2) rest-gateway: elevated 5xx errors from wikifeeds_cluster in codfw #page - https://wikitech.wikimedia.org/wiki/API_Gateway#How_to_debug_it  - https://alerts.wikimedia.org/?q=alertname%3DGatewayBackendErrorsHigh
[09:34:51] <jinxer-wm>	 (ATSBackendErrorsHigh) firing: (3) ATS: elevated 5xx errors from mw-web-ro.discovery.wmnet #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging  - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[09:35:00] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+1] kubernetes::node: Ensure apparmor profiles are loaded automatically [puppet] - https://gerrit.wikimedia.org/r/1020700 (https://phabricator.wikimedia.org/T326785) (owner: JMeybohm)
[09:35:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (4) Elevated rate of MediaWiki errors - kube-mw-parsoid - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[09:36:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: (5) p75 latency high: eqiad mw-api-ext (k8s) 807ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[09:36:57] <jinxer-wm>	 (ProbeDown) firing: (10) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:37:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-parsoid - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[09:37:34] <jinxer-wm>	 (ProbeDown) firing: (14) Service appservers-https:443 has failed probes (http_appservers-https_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:38:15] <jinxer-wm>	 (PHPFPMTooBusy) firing: (6) Not enough idle PHP-FPM workers for Mediawiki mw-api-ext at eqiad: 25.46% idle - https://bit.ly/wmf-fpmsat  - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[09:38:51] <jinxer-wm>	 (SwaggerProbeHasFailures) firing: (7) Not all openapi/swagger endpoints returned healthy   - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[09:39:51] <jinxer-wm>	 (ATSBackendErrorsHigh) firing: (9) ATS: elevated 5xx errors from mw-web-ro.discovery.wmnet #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging  - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[09:41:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: (4) p75 latency high: eqiad mw-api-ext (k8s) 1.765s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[09:41:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (4) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:41:49] <wikibugs>	 (CR) Klausman: ml-services: add logo-detection isvc to experimental namespace (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1020706 (https://phabricator.wikimedia.org/T362749) (owner: Kevin Bazira)
[09:41:57] <jinxer-wm>	 (ProbeDown) firing: (10) Service mw-api-int:4446 has failed probes (http_mw-api-int_ip4) #page  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:42:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (4) Elevated rate of MediaWiki errors - kube-mw-parsoid - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[09:42:24] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P60750 and previous config saved to /var/cache/conftool/dbconfig/20240417-094223-marostegui.json
[09:42:34] <jinxer-wm>	 (ProbeDown) firing: (15) Service appservers-https:443 has failed probes (http_appservers-https_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:43:15] <jinxer-wm>	 (PHPFPMTooBusy) firing: (6) Not enough idle PHP-FPM workers for Mediawiki mw-api-ext at eqiad: 17.76% idle - https://bit.ly/wmf-fpmsat  - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[09:43:51] <jinxer-wm>	 (SwaggerProbeHasFailures) firing: (8) Not all openapi/swagger endpoints returned healthy   - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[09:44:12] <logmsgbot>	 !log cgoubert@cumin1002 conftool action : set/pooled=false; selector: dnsdisc=mw-web-ro,name=eqiad
[09:44:20] <logmsgbot>	 !log cgoubert@cumin1002 conftool action : set/pooled=false; selector: dnsdisc=mw-api-int-ro,name=eqiad
[09:44:29] <logmsgbot>	 !log cgoubert@cumin1002 conftool action : set/pooled=false; selector: dnsdisc=mw-api-ext-ro,name=eqiad
[09:44:51] <jinxer-wm>	 (ATSBackendErrorsHigh) firing: (12) ATS: elevated 5xx errors from mw-api-ext-ro.discovery.wmnet #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging  - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[09:46:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: (4) p75 latency high: eqiad mw-api-ext (k8s) 807ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[09:46:57] <jinxer-wm>	 (ProbeDown) firing: (9) Service mw-api-int:4446 has failed probes (http_mw-api-int_ip4) #page  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:47:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (6) Elevated rate of MediaWiki errors - kube-mw-api-int - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[09:47:34] <jinxer-wm>	 (ProbeDown) firing: (11) Service appservers-https:443 has failed probes (http_appservers-https_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:47:44] <jinxer-wm>	 (HaproxyUnavailable) resolved: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable
[09:48:15] <jinxer-wm>	 (PHPFPMTooBusy) firing: (6) Not enough idle PHP-FPM workers for Mediawiki mw-api-ext at eqiad: 12.93% idle - https://bit.ly/wmf-fpmsat  - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[09:48:44] <jinxer-wm>	 (HaproxyUnavailable) firing: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable
[09:48:51] <jinxer-wm>	 (SwaggerProbeHasFailures) firing: (10) Not all openapi/swagger endpoints returned healthy   - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[09:49:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-parsoid - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[09:49:51] <jinxer-wm>	 (ATSBackendErrorsHigh) firing: (13) ATS: elevated 5xx errors from mw-api-ext-ro.discovery.wmnet #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging  - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[09:51:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: (3) p75 latency high: eqiad mw-api-ext (k8s) 807ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[09:51:56] <jinxer-wm>	 (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=rdf-streaming-updater&var-helm_release=wikidata - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable
[09:51:58] <jinxer-wm>	 (ProbeDown) resolved: (7) Service mw-api-int:4446 has failed probes (http_mw-api-int_ip4) #page  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:52:34] <jinxer-wm>	 (ProbeDown) resolved: (7) Service mw-api-int:4446 has failed probes (http_mw-api-int_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:53:15] <jinxer-wm>	 (MediaWikiMemcachedHighErrorRate) resolved: (2) MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[09:53:15] <jinxer-wm>	 (PHPFPMTooBusy) resolved: (5) Not enough idle PHP-FPM workers for Mediawiki mw-api-ext at eqiad: 14.83% idle - https://bit.ly/wmf-fpmsat  - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[09:53:43] <jinxer-wm>	 (VarnishUnavailable) resolved: varnish-text has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/Varnish#Diagnosing_Varnish_alerts - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DVarnishUnavailable
[09:53:44] <jinxer-wm>	 (HaproxyUnavailable) resolved: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable
[09:53:51] <jinxer-wm>	 (SwaggerProbeHasFailures) resolved: (5) Not all openapi/swagger endpoints returned healthy   - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[09:54:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-parsoid - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[09:54:36] <jinxer-wm>	 (GatewayBackendErrorsHigh) firing: (2) rest-gateway: elevated 5xx errors from wikifeeds_cluster in codfw #page - https://wikitech.wikimedia.org/wiki/API_Gateway#How_to_debug_it  - https://alerts.wikimedia.org/?q=alertname%3DGatewayBackendErrorsHigh
[09:54:51] <jinxer-wm>	 (ATSBackendErrorsHigh) resolved: (9) ATS: elevated 5xx errors from mw-web-ro.discovery.wmnet #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging  - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[09:56:56] <jinxer-wm>	 (RdfStreamingUpdaterFlinkJobUnstable) resolved: WDQS_Streaming_Updater in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=rdf-streaming-updater&var-helm_release=wikidata - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable
[09:57:31] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1223 (T361627)', diff saved to https://phabricator.wikimedia.org/P60753 and previous config saved to /var/cache/conftool/dbconfig/20240417-095731-marostegui.json
[09:57:33] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
[09:57:37] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[09:57:47] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
[10:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240417T1000)
[10:02:57] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Only install Go from backports on bullseye-based stat hosts [puppet] - https://gerrit.wikimedia.org/r/1019726 (https://phabricator.wikimedia.org/T362518) (owner: Muehlenhoff)
[10:04:37] <wikibugs>	 SRE, SRE-tools, collaboration-services, Infrastructure-Foundations, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9721866 (MoritzMuehlenhoff)
[10:06:25] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[10:06:38] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[10:06:40] <wikibugs>	 (CR) Ilias Sarantopoulos: [C:+2] ml-services: fix indentation in mistral model resources and increase memory [deployment-charts] - https://gerrit.wikimedia.org/r/1018646 (https://phabricator.wikimedia.org/T357986) (owner: Ilias Sarantopoulos)
[10:07:30] <thedj>	 fyi, replag for tools is still increasing, it seems?
[10:07:51] <wikibugs>	 (Merged) jenkins-bot: ml-services: fix indentation in mistral model resources and increase memory [deployment-charts] - https://gerrit.wikimedia.org/r/1018646 (https://phabricator.wikimedia.org/T357986) (owner: Ilias Sarantopoulos)
[10:08:14] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
[10:08:21] <wikibugs>	 (PS1) Clément Goubert: admin_ng: Bump coredns replicas to 6 [deployment-charts] - https://gerrit.wikimedia.org/r/1020765 (https://phabricator.wikimedia.org/T346690)
[10:08:36] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
[10:08:47] <wikibugs>	 (CR) Hnowlan: [C:+1] admin_ng: Bump coredns replicas to 6 [deployment-charts] - https://gerrit.wikimedia.org/r/1020765 (https://phabricator.wikimedia.org/T346690) (owner: Clément Goubert)
[10:09:32] <wikibugs>	 (PS1) David Caro: dynamicproxy: disable response buffering [puppet] - https://gerrit.wikimedia.org/r/1020767
[10:11:46] <wikibugs>	 (CR) Clément Goubert: [C:+2] admin_ng: Bump coredns replicas to 6 [deployment-charts] - https://gerrit.wikimedia.org/r/1020765 (https://phabricator.wikimedia.org/T346690) (owner: Clément Goubert)
[10:12:25] <jinxer-wm>	 (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:12:59] <wikibugs>	 (PS1) Effie Mouzeli: mediawiki-common: add a dot to the mcrouter url [deployment-charts] - https://gerrit.wikimedia.org/r/1020768 (https://phabricator.wikimedia.org/T346690)
[10:13:56] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+1] mediawiki-common: add a dot to the mcrouter url [deployment-charts] - https://gerrit.wikimedia.org/r/1020768 (https://phabricator.wikimedia.org/T346690) (owner: Effie Mouzeli)
[10:14:26] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
[10:14:36] <jinxer-wm>	 (GatewayBackendErrorsHigh) resolved: rest-gateway: elevated 5xx errors from wikifeeds_cluster in eqiad #page - https://wikitech.wikimedia.org/wiki/API_Gateway#How_to_debug_it - https://grafana.wikimedia.org/d/UOH-5IDMz/api-and-rest-gateway?orgId=1&refresh=30s&viewPanel=57&var-datasource=eqiad%20prometheus/k8s&var-instance=rest-gateway - https://alerts.wikimedia.org/?q=alertname%3DGatewayBackendErrorsHigh
[10:14:39] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
[10:14:47] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2105 (T361627)', diff saved to https://phabricator.wikimedia.org/P60755 and previous config saved to /var/cache/conftool/dbconfig/20240417-101446-marostegui.json
[10:14:53] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[10:14:59] <wikibugs>	 (Merged) jenkins-bot: admin_ng: Bump coredns replicas to 6 [deployment-charts] - https://gerrit.wikimedia.org/r/1020765 (https://phabricator.wikimedia.org/T346690) (owner: Clément Goubert)
[10:15:06] <wikibugs>	 (CR) Effie Mouzeli: [C:+2] mediawiki-common: add a dot to the mcrouter url [deployment-charts] - https://gerrit.wikimedia.org/r/1020768 (https://phabricator.wikimedia.org/T346690) (owner: Effie Mouzeli)
[10:16:20] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C:+1] "LGTM." [puppet] - https://gerrit.wikimedia.org/r/1020767 (owner: David Caro)
[10:16:27] <wikibugs>	 (Merged) jenkins-bot: mediawiki-common: add a dot to the mcrouter url [deployment-charts] - https://gerrit.wikimedia.org/r/1020768 (https://phabricator.wikimedia.org/T346690) (owner: Effie Mouzeli)
[10:17:29] <wikibugs>	 (PS1) Btullis: Disable CustomVariables and CustomPiwikJs on new matomo server [puppet] - https://gerrit.wikimedia.org/r/1020769 (https://phabricator.wikimedia.org/T349397)
[10:18:28] <wikibugs>	 (PS2) David Caro: dynamicproxy: disable response buffering to files [puppet] - https://gerrit.wikimedia.org/r/1020767
[10:18:41] <wikibugs>	 (PS3) David Caro: dynamicproxy: disable response buffering to files [puppet] - https://gerrit.wikimedia.org/r/1020767
[10:18:42] <wikibugs>	 (PS2) Btullis: Disable CustomVariables and CustomPiwikJs on new matomo server [puppet] - https://gerrit.wikimedia.org/r/1020769 (https://phabricator.wikimedia.org/T349397)
[10:19:32] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C:+1] "also LGTM." [puppet] - https://gerrit.wikimedia.org/r/1020767 (owner: David Caro)
[10:20:00] <wikibugs>	 (CR) Btullis: [V:+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1967/co" [puppet] - https://gerrit.wikimedia.org/r/1020769 (https://phabricator.wikimedia.org/T349397) (owner: Btullis)
[10:20:30] <wikibugs>	 (CR) Slavina Stefanova: dynamicproxy: disable response buffering to files (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1020767 (owner: David Caro)
[10:22:50] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-host for host es2028.codfw.wmnet
[10:23:46] <wikibugs>	 (CR) David Caro: dynamicproxy: disable response buffering to files (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1020767 (owner: David Caro)
[10:23:49] <wikibugs>	 (PS4) David Caro: dynamicproxy: disable response buffering to files [puppet] - https://gerrit.wikimedia.org/r/1020767
[10:23:56] <wikibugs>	 (PS1) Muehlenhoff: Switch es2028 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020772 (https://phabricator.wikimedia.org/T349619)
[10:25:04] <wikibugs>	 (PS1) Effie Mouzeli: mediawiki-common: use mcrouter ds only on codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1020774 (https://phabricator.wikimedia.org/T346690)
[10:25:28] <wikibugs>	 (CR) David Caro: [C:+2] dynamicproxy: disable response buffering to files [puppet] - https://gerrit.wikimedia.org/r/1020767 (owner: David Caro)
[10:26:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (4) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:27:35] <wikibugs>	 (PS3) Btullis: Disable CustomVariables and CustomPiwikJs on new matomo server [puppet] - https://gerrit.wikimedia.org/r/1020769 (https://phabricator.wikimedia.org/T349397)
[10:27:38] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+1] mediawiki-common: use mcrouter ds only on codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1020774 (https://phabricator.wikimedia.org/T346690) (owner: Effie Mouzeli)
[10:28:38] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Switch es2028 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020772 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[10:29:18] <wikibugs>	 (CR) Btullis: [C:+1] "Looks good to me." [puppet] - https://gerrit.wikimedia.org/r/1020266 (https://phabricator.wikimedia.org/T352647) (owner: Elukey)
[10:29:36] <wikibugs>	 (CR) Btullis: [V:+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1968/co" [puppet] - https://gerrit.wikimedia.org/r/1020769 (https://phabricator.wikimedia.org/T349397) (owner: Btullis)
[10:29:45] <wikibugs>	 (PS2) Effie Mouzeli: mediawiki-common: use mcrouter ds only on codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1020774 (https://phabricator.wikimedia.org/T346690)
[10:30:09] <wikibugs>	 (CR) Effie Mouzeli: [C:+2] mediawiki-common: use mcrouter ds only on codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1020774 (https://phabricator.wikimedia.org/T346690) (owner: Effie Mouzeli)
[10:31:35] <wikibugs>	 (Merged) jenkins-bot: mediawiki-common: use mcrouter ds only on codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1020774 (https://phabricator.wikimedia.org/T346690) (owner: Effie Mouzeli)
[10:33:05] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2028.codfw.wmnet
[10:33:14] <wikibugs>	 (CR) David Caro: [C:+2] "Forgot to add the task: https://phabricator.wikimedia.org/T354116" [puppet] - https://gerrit.wikimedia.org/r/1020767 (owner: David Caro)
[10:33:52] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
[10:34:12] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-host for host es1027.eqiad.wmnet
[10:34:13] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[10:34:16] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[10:34:29] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
[10:34:51] <akosiaris>	 !log apply the coredns patches for bumping instances from 4 to 6. They are noop, I am applying them to update helm's state.
[10:34:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:34:55] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2105 (T361627)', diff saved to https://phabricator.wikimedia.org/P60756 and previous config saved to /var/cache/conftool/dbconfig/20240417-103455-marostegui.json
[10:34:58] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [codfw] START helmfile.d/admin 'apply'.
[10:35:00] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[10:35:04] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[10:35:05] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
[10:35:09] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
[10:35:13] <wikibugs>	 (PS1) Muehlenhoff: Switch es1027 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020775 (https://phabricator.wikimedia.org/T349619)
[10:36:03] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
[10:36:03] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
[10:36:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (4) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:36:51] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
[10:37:29] <wikibugs>	 (CR) Btullis: [V:+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1969/co" [puppet] - https://gerrit.wikimedia.org/r/1020769 (https://phabricator.wikimedia.org/T349397) (owner: Btullis)
[10:37:51] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Switch es1027 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020775 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[10:37:54] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
[10:38:21] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
[10:38:30] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job pdu_sentry4 in ops@ulsfo - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[10:39:01] <wikibugs>	 (PS2) Kevin Bazira: ml-services: add logo-detection isvc to experimental namespace [deployment-charts] - https://gerrit.wikimedia.org/r/1020706 (https://phabricator.wikimedia.org/T362749)
[10:40:10] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
[10:41:03] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-web: apply
[10:41:17] <wikibugs>	 (PS4) Btullis: Disable CustomVariables and CustomPiwikJs on new matomo server [puppet] - https://gerrit.wikimedia.org/r/1020769 (https://phabricator.wikimedia.org/T349397)
[10:41:24] <wikibugs>	 (PS3) Kevin Bazira: ml-services: add logo-detection isvc to experimental namespace [deployment-charts] - https://gerrit.wikimedia.org/r/1020706 (https://phabricator.wikimedia.org/T362749)
[10:41:54] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
[10:42:07] <wikibugs>	 (CR) Kevin Bazira: ml-services: add logo-detection isvc to experimental namespace (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1020706 (https://phabricator.wikimedia.org/T362749) (owner: Kevin Bazira)
[10:42:08] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
[10:42:29] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es1027.eqiad.wmnet
[10:42:32] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
[10:42:34] <wikibugs>	 (CR) Btullis: [V:+1] "PCC SUCCESS (CORE_DIFF 1 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1020769 (https://phabricator.wikimedia.org/T349397) (owner: Btullis)
[10:44:56] <logmsgbot>	 !log jiji@cumin1002 conftool action : set/pooled=true; selector: dnsdisc=mw-api-int-ro,name=eqiad
[10:45:03] <wikibugs>	 (PS1) Clément Goubert: admin_ng: Bump coredns memory for wikikube [deployment-charts] - https://gerrit.wikimedia.org/r/1020778 (https://phabricator.wikimedia.org/T346690)
[10:45:33] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+1] admin_ng: Bump coredns memory for wikikube [deployment-charts] - https://gerrit.wikimedia.org/r/1020778 (https://phabricator.wikimedia.org/T346690) (owner: Clément Goubert)
[10:45:59] <effie>	 !log pool eqiad back for mw-web-ro, mw-api-int-ro and mw-api-ext-ro
[10:46:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:46:24] <logmsgbot>	 !log jiji@cumin1002 conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-ro,name=eqiad
[10:49:05] <wikibugs>	 (CR) Clément Goubert: [C:+2] admin_ng: Bump coredns memory for wikikube [deployment-charts] - https://gerrit.wikimedia.org/r/1020778 (https://phabricator.wikimedia.org/T346690) (owner: Clément Goubert)
[10:50:02] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P60757 and previous config saved to /var/cache/conftool/dbconfig/20240417-105002-marostegui.json
[10:51:32] <wikibugs>	 (PS1) Btullis: Install a matomo plugin on the new host [puppet] - https://gerrit.wikimedia.org/r/1020780 (https://phabricator.wikimedia.org/T349397)
[10:52:40] <wikibugs>	 (Merged) jenkins-bot: admin_ng: Bump coredns memory for wikikube [deployment-charts] - https://gerrit.wikimedia.org/r/1020778 (https://phabricator.wikimedia.org/T346690) (owner: Clément Goubert)
[10:52:56] <wikibugs>	 (CR) Btullis: [V:+1] "PCC SUCCESS (CORE_DIFF 1 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1020780 (https://phabricator.wikimedia.org/T349397) (owner: Btullis)
[10:53:04] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[10:53:27] <wikibugs>	 (CR) Btullis: [V:+1 C:+2] Disable CustomVariables and CustomPiwikJs on new matomo server [puppet] - https://gerrit.wikimedia.org/r/1020769 (https://phabricator.wikimedia.org/T349397) (owner: Btullis)
[10:53:49] <logmsgbot>	 !log jiji@cumin1002 conftool action : set/pooled=true; selector: dnsdisc=mw-web-ro,name=eqiad
[10:53:56] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[10:54:05] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [codfw] START helmfile.d/admin 'apply'.
[11:00:05] <jouncebot>	 mvolz: #bothumor My software never has bugs. It just develops random features. Rise for Services – Citoid / Zotero. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240417T1100).
[11:00:27] <wikibugs>	 SRE, conftool, Data-Persistence, Infrastructure-Foundations: Integrate dbctl IP changes as part of VLAN changes. - https://phabricator.wikimedia.org/T360029#9722005 (Ladsgroup) >>! In T360029#9658042, @CDanis wrote: > Just to make sure I understand, the request here is an easy-to-automate way of...
[11:04:30] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[11:05:10] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P60758 and previous config saved to /var/cache/conftool/dbconfig/20240417-110510-marostegui.json
[11:06:10] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-host for host es2030.codfw.wmnet
[11:07:05] <wikibugs>	 (PS1) Muehlenhoff: Switch es2030 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020788 (https://phabricator.wikimedia.org/T349619)
[11:07:40] <wikibugs>	 (PS1) Alexandros Kosiaris: coredns: Switch podAntiAffinity rules [deployment-charts] - https://gerrit.wikimedia.org/r/1020789
[11:10:31] <wikibugs>	 (CR) CI reject: [V:-1] coredns: Switch podAntiAffinity rules [deployment-charts] - https://gerrit.wikimedia.org/r/1020789 (owner: Alexandros Kosiaris)
[11:11:08] <wikibugs>	 (CR) Btullis: [V:+1 C:+2] Install a matomo plugin on the new host [puppet] - https://gerrit.wikimedia.org/r/1020780 (https://phabricator.wikimedia.org/T349397) (owner: Btullis)
[11:11:11] <effie>	 jouncebot: now
[11:11:11] <jouncebot>	 For the next 0 hour(s) and 48 minute(s): Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240417T1100)
[11:12:56] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Switch es2030 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020788 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[11:13:18] <wikibugs>	 (PS2) Alexandros Kosiaris: coredns: Switch podAntiAffinity rules [deployment-charts] - https://gerrit.wikimedia.org/r/1020789
[11:13:25] <logmsgbot>	 !log jiji@deploy1002 Started scap: NoOp
[11:17:42] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2030.codfw.wmnet
[11:19:59] <wikibugs>	 (CR) JMeybohm: [C:+1] coredns: Switch podAntiAffinity rules [deployment-charts] - https://gerrit.wikimedia.org/r/1020789 (owner: Alexandros Kosiaris)
[11:20:18] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2105 (T361627)', diff saved to https://phabricator.wikimedia.org/P60759 and previous config saved to /var/cache/conftool/dbconfig/20240417-112017-marostegui.json
[11:20:20] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
[11:20:23] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+2] coredns: Switch podAntiAffinity rules [deployment-charts] - https://gerrit.wikimedia.org/r/1020789 (owner: Alexandros Kosiaris)
[11:20:33] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[11:20:33] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
[11:20:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2109 (T361627)', diff saved to https://phabricator.wikimedia.org/P60760 and previous config saved to /var/cache/conftool/dbconfig/20240417-112040-marostegui.json
[11:20:50] <wikibugs>	 (CR) Ayounsi: [C:+1] Netbox custom script to add additional IPv4 addresses to host [software/netbox-extras] - https://gerrit.wikimedia.org/r/1017064 (https://phabricator.wikimedia.org/T358096) (owner: Cathal Mooney)
[11:22:12] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-host for host es1032.eqiad.wmnet
[11:23:03] <logmsgbot>	 !log jiji@deploy1002 Finished scap: NoOp (duration: 09m 38s)
[11:23:09] <wikibugs>	 (Merged) jenkins-bot: coredns: Switch podAntiAffinity rules [deployment-charts] - https://gerrit.wikimedia.org/r/1020789 (owner: Alexandros Kosiaris)
[11:23:12] <wikibugs>	 (PS1) Muehlenhoff: Switch es1032 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020795 (https://phabricator.wikimedia.org/T349619)
[11:23:41] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
[11:23:54] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
[11:23:55] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[11:24:11] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[11:24:18] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1167 (T352010)', diff saved to https://phabricator.wikimedia.org/P60761 and previous config saved to /var/cache/conftool/dbconfig/20240417-112418-ladsgroup.json
[11:24:25] <stashbot>	 T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[11:24:25] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [codfw] START helmfile.d/admin 'apply'.
[11:25:39] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Switch es1032 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020795 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[11:27:28] <wikibugs>	 (PS1) Btullis: Swith matomo/piwik to the new host [puppet] - https://gerrit.wikimedia.org/r/1020798 (https://phabricator.wikimedia.org/T351552)
[11:28:16] <wikibugs>	 (CR) Btullis: [V:+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1972/console" [puppet] - https://gerrit.wikimedia.org/r/1020798 (https://phabricator.wikimedia.org/T351552) (owner: Btullis)
[11:29:21] <wikibugs>	 (PS2) Btullis: Swith matomo/piwik to the new host [puppet] - https://gerrit.wikimedia.org/r/1020798 (https://phabricator.wikimedia.org/T351552)
[11:29:34] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[11:29:39] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es1032.eqiad.wmnet
[11:30:27] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[11:30:41] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[11:33:23] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-host for host es2032.codfw.wmnet
[11:33:40] <wikibugs>	 (PS3) Muehlenhoff: Move cloudcephosd2001-dev to nftables [puppet] - https://gerrit.wikimedia.org/r/1017248 (https://phabricator.wikimedia.org/T361913)
[11:35:30] <wikibugs>	 (PS1) Muehlenhoff: Switch es2032 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020799 (https://phabricator.wikimedia.org/T349619)
[11:36:09] <logmsgbot>	 !log stevemunene@deploy1002 helmfile [codfw] START helmfile.d/services/datahub: apply on main
[11:38:57] <wikibugs>	 (CR) Elukey: {echo,session}store (staging): use wmf-ca-certificates.crt (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1020356 (https://phabricator.wikimedia.org/T352647) (owner: Eevans)
[11:42:02] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2109 (T361627)', diff saved to https://phabricator.wikimedia.org/P60762 and previous config saved to /var/cache/conftool/dbconfig/20240417-114201-marostegui.json
[11:42:08] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[11:42:23] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Switch es2032 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020799 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[11:44:39] <vgutierrez>	 !log depool ncredir2001
[11:44:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:46:51] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2032.codfw.wmnet
[11:47:27] <wikibugs>	 (CR) Jcrespo: "So it is my understanding that usernames with underscores require escaping (\_), otherwise they grant the rights to any user containing an" [puppet] - https://gerrit.wikimedia.org/r/1020709 (https://phabricator.wikimedia.org/T360149) (owner: Arnaudb)
[11:48:47] <jinxer-wm>	 (HelmReleaseBadStatus) firing: Helm release datahub/main on k8s@codfw in state pending-upgrade - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=datahub - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[11:48:51] <wikibugs>	 (CR) Muehlenhoff: [C:+2] alertmanager: Avoid Ferm-specific syntax [puppet] - https://gerrit.wikimedia.org/r/1020198 (owner: Muehlenhoff)
[11:52:27] <wikibugs>	 SRE, SRE-tools, collaboration-services, Infrastructure-Foundations, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9722192 (MoritzMuehlenhoff)
[11:53:11] <jayme>	 stevemunene: ^ I've seen the datahub release alert flying by a couple of times now - is that expected or is there something wrong with the deployment?
[11:55:17] <stevemunene>	 hi jayme That is from some ongoing work
[11:56:00] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Traffic: ASW single-point of failure for LVS VIPs at POPs - https://phabricator.wikimedia.org/T362772 (cmooney) NEW p:Triage→Medium
[11:56:24] <jayme>	 stevemunene: so you're fixing the deployment?
[11:57:00] <logmsgbot>	 !log stevemunene@deploy1002 helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
[11:57:11] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P60763 and previous config saved to /var/cache/conftool/dbconfig/20240417-115709-marostegui.json
[11:57:46] <stevemunene>	 jayme: I was in the middle of an upgrade, and yes, sorry I was a bit unclear
[11:57:57] <jayme>	 ack, okay
[11:58:47] <jinxer-wm>	 (HelmReleaseBadStatus) resolved: Helm release datahub/main on k8s@codfw in state pending-upgrade - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=datahub - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[12:00:03] <wikibugs>	 (CR) JMeybohm: [V:+1 C:+2] kubernetes::node: Ensure apparmor profiles are loaded automatically [puppet] - https://gerrit.wikimedia.org/r/1020700 (https://phabricator.wikimedia.org/T326785) (owner: JMeybohm)
[12:01:25] <wikibugs>	 (PS1) Slyngshede: Initial documentation for the Bitu API. [software/bitu] - https://gerrit.wikimedia.org/r/1020802
[12:02:12] <wikibugs>	 (PS1) JMeybohm: kubernetes::node: Remove apparmor cleanup code [puppet] - https://gerrit.wikimedia.org/r/1020803 (https://phabricator.wikimedia.org/T326785)
[12:03:47] <wikibugs>	 (CR) JMeybohm: [C:+1] "Cory/James: Feel free to merge and deploy when you see fit" [deployment-charts] - https://gerrit.wikimedia.org/r/1020701 (https://phabricator.wikimedia.org/T326785) (owner: JMeybohm)
[12:04:29] <wikibugs>	 (PS2) Slyngshede: Initial documentation for the Bitu API. [software/bitu] - https://gerrit.wikimedia.org/r/1020802
[12:05:43] <wikibugs>	 (CR) JMeybohm: "ocf. no merge before ~13.00 UTC" [puppet] - https://gerrit.wikimedia.org/r/1020803 (https://phabricator.wikimedia.org/T326785) (owner: JMeybohm)
[12:06:33] <moritzm>	 !log upgrading PHP on mediawiki baremetal canaries servers T362511
[12:06:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:31] <wikibugs>	 (CR) Btullis: [V:+1] "PCC SUCCESS (CORE_DIFF 1 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1020798 (https://phabricator.wikimedia.org/T351552) (owner: Btullis)
[12:07:45] <jinxer-wm>	 (WidespreadPuppetFailure) firing: (2) Puppet has failed in codfw - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[12:07:56] <wikibugs>	 (CR) JMeybohm: [V:+1] "PCC SUCCESS (NOOP 8 CORE_DIFF 6): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1020803 (https://phabricator.wikimedia.org/T326785) (owner: JMeybohm)
[12:08:20] <jayme>	 WidespreadPuppetFailure is me
[12:11:35] <wikibugs>	 (CR) Btullis: Swith matomo/piwik to the new host [puppet] - https://gerrit.wikimedia.org/r/1020798 (https://phabricator.wikimedia.org/T351552) (owner: Btullis)
[12:12:18] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P60765 and previous config saved to /var/cache/conftool/dbconfig/20240417-121218-marostegui.json
[12:12:53] <vgutierrez>	 !log repool ncredir2001
[12:12:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:13:45] <wikibugs>	 (PS4) Kevin Bazira: ml-services: add logo-detection isvc to experimental namespace [deployment-charts] - https://gerrit.wikimedia.org/r/1020706 (https://phabricator.wikimedia.org/T362749)
[12:16:28] <wikibugs>	 (PS2) JMeybohm: kubernetes::node: Remove apparmor cleanup code [puppet] - https://gerrit.wikimedia.org/r/1020803 (https://phabricator.wikimedia.org/T326785)
[12:16:28] <wikibugs>	 (PS1) JMeybohm: apparmor::profile: Don't try to define /etc/apparmor.d resource [puppet] - https://gerrit.wikimedia.org/r/1020805 (https://phabricator.wikimedia.org/T326785)
[12:19:41] <wikibugs>	 (CR) JMeybohm: [V:+1] "PCC SUCCESS (CORE_DIFF 14): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1975/c" [puppet] - https://gerrit.wikimedia.org/r/1020805 (https://phabricator.wikimedia.org/T326785) (owner: JMeybohm)
[12:20:01] <wikibugs>	 (CR) JMeybohm: [V:+1 C:+2] apparmor::profile: Don't try to define /etc/apparmor.d resource [puppet] - https://gerrit.wikimedia.org/r/1020805 (https://phabricator.wikimedia.org/T326785) (owner: JMeybohm)
[12:21:34] <jynus>	 thanks for the heads up, jayme!
[12:21:49] <wikibugs>	 (PS1) Marostegui: db2120: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1020806
[12:21:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db2120', diff saved to https://phabricator.wikimedia.org/P60766 and previous config saved to /var/cache/conftool/dbconfig/20240417-122150-root.json
[12:22:17] <wikibugs>	 (CR) Marostegui: [C:+2] db2120: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1020806 (owner: Marostegui)
[12:23:12] <jayme>	 jynus: sure - code is fixed, alert should go away in a bit
[12:25:28] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.reimage for host db2120.codfw.wmnet with OS bookworm
[12:27:26] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2109 (T361627)', diff saved to https://phabricator.wikimedia.org/P60767 and previous config saved to /var/cache/conftool/dbconfig/20240417-122725-marostegui.json
[12:27:28] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
[12:27:31] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[12:27:41] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
[12:27:49] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2127 (T361627)', diff saved to https://phabricator.wikimedia.org/P60768 and previous config saved to /var/cache/conftool/dbconfig/20240417-122748-marostegui.json
[12:28:06] <wikibugs>	 (PS1) Muehlenhoff: Remove parsoid-canary Cumin alias [puppet] - https://gerrit.wikimedia.org/r/1020807 (https://phabricator.wikimedia.org/T359387)
[12:29:59] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
[12:31:01] <wikibugs>	 (PS1) Elukey: knative-serving: move net_istio configs to a dict [deployment-charts] - https://gerrit.wikimedia.org/r/1020808 (https://phabricator.wikimedia.org/T353622)
[12:32:32] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Remove parsoid-canary Cumin alias [puppet] - https://gerrit.wikimedia.org/r/1020807 (https://phabricator.wikimedia.org/T359387) (owner: Muehlenhoff)
[12:32:57] <wikibugs>	 (CR) Cathal Mooney: [C:+1] Puppet: add magru (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1019810 (owner: Ayounsi)
[12:40:58] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on db2120.codfw.wmnet with reason: host reimage
[12:41:20] <wikibugs>	 (PS1) Marostegui: Revert "db2120: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1020730
[12:44:14] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2120.codfw.wmnet with reason: host reimage
[12:45:08] <wikibugs>	 (CR) Elukey: "The diff is a little mixed since probably the current order from the list/array is not the same compared to the one that the dict creates." [deployment-charts] - https://gerrit.wikimedia.org/r/1020808 (https://phabricator.wikimedia.org/T353622) (owner: Elukey)
[12:46:20] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-host for host es2026.codfw.wmnet
[12:47:08] <wikibugs>	 (PS1) Muehlenhoff: Switch es2026 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020819 (https://phabricator.wikimedia.org/T349619)
[12:47:23] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1017248 (https://phabricator.wikimedia.org/T361913) (owner: Muehlenhoff)
[12:47:35] <wikibugs>	 (CR) Jforrester: "Thanks, will do in an hour's time in our window." [deployment-charts] - https://gerrit.wikimedia.org/r/1020701 (https://phabricator.wikimedia.org/T326785) (owner: JMeybohm)
[12:47:45] <jinxer-wm>	 (WidespreadPuppetFailure) firing: (2) Puppet has failed in codfw - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[12:47:57] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2127 (T361627)', diff saved to https://phabricator.wikimedia.org/P60769 and previous config saved to /var/cache/conftool/dbconfig/20240417-124756-marostegui.json
[12:48:02] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[12:49:56] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Switch es2026 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020819 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[12:50:36] <wikibugs>	 (PS1) Ssingh: geo-maps: add drmrs LVS map [dns] - https://gerrit.wikimedia.org/r/1020823
[12:52:45] <jinxer-wm>	 (WidespreadPuppetFailure) resolved: (2) Puppet has failed in codfw - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[12:54:11] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2026.codfw.wmnet
[12:57:13] <wikibugs>	 (CR) Ssingh: [C:+2] Puppet: add magru [puppet] - https://gerrit.wikimedia.org/r/1019810 (owner: Ayounsi)
[13:00:05] <jouncebot>	 RoanKattouw, Lucas_WMDE, Urbanecm, awight, and TheresNoTime: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240417T1300).
[13:00:05] <jouncebot>	 anzx and kostajh: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:10] <kostajh>	 hello
[13:00:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2120 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P60770 and previous config saved to /var/cache/conftool/dbconfig/20240417-130027-root.json
[13:00:35] <wikibugs>	 (CR) Marostegui: [C:+2] Revert "db2120: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1020730 (owner: Marostegui)
[13:00:45] <Lucas_WMDE>	 o/
[13:01:10] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-host for host es2031.codfw.wmnet
[13:01:56] <wikibugs>	 (PS1) Muehlenhoff: Switch es2031 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020824 (https://phabricator.wikimedia.org/T349619)
[13:02:03] <kostajh>	 Lucas_WMDE: does the patch for T362653 look OK to you? 
[13:02:03] <stashbot>	 T362653: Create Draft Namespace in Malayalam Wikipedia - https://phabricator.wikimedia.org/T362653
[13:03:00] <Lucas_WMDE>	 I think so, yeah
[13:03:04] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P60771 and previous config saved to /var/cache/conftool/dbconfig/20240417-130303-marostegui.json
[13:03:10] <Lucas_WMDE>	 we just need to not forget to run namespaceDupes ^^
[13:03:27] <wikibugs>	 ops-codfw, ops-eqiad, SRE-swift-storage, DC-Ops, Traffic: Reimage cookbook on new eqiad hosts stuck at PXE booting - https://phabricator.wikimedia.org/T350179#9722316 (ssingh) @Papaul: Thanks for the update! Looks promising indeed and to actually close this, we should downgrade another host i...
[13:03:32] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Switch es2031 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020824 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[13:04:17] <kostajh>	 anzx: are you around?
[13:04:17] <Lucas_WMDE>	 what’s the current status of production after the incident earlier (T362766)? is it okay to deploy normal changes?
[13:04:17] <stashbot>	 T362766: 2024-04-17 mw-* went down in eqiad - https://phabricator.wikimedia.org/T362766
[13:05:37] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2120.codfw.wmnet with OS bookworm
[13:06:00] <anzx>	 kostajh: I am around
[13:06:03] <Lucas_WMDE>	 (pinging jynus as the IC but would also be happy for anyone else to respond ^^)
[13:06:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (4) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:06:26] <wikibugs>	 (PS1) Ssingh: geo-maps: add magru to geo maps [dns] - https://gerrit.wikimedia.org/r/1020825 (https://phabricator.wikimedia.org/T346722)
[13:06:41] <wikibugs>	 (CR) Kosta Harlan: "Comparing with Ibe5548a6759e794c125a81a59d87bde0134da825, do we want to set noindex/nofollow, and enable VisualEditor for the Draft namesp" [mediawiki-config] - https://gerrit.wikimedia.org/r/1020242 (https://phabricator.wikimedia.org/T362653) (owner: Anzx)
[13:06:41] <jynus>	 yes, status is resolved, no blockers (there may be some followups, but ok with normal operations)
[13:06:47] <Lucas_WMDE>	 okay, thanks!
[13:07:01] <kostajh>	 anzx: hi :) I left a comment on your patch
[13:07:03] <Lucas_WMDE>	 kostajh: do you want to do the deployments or should I start with mlwiki?
[13:07:07] <Lucas_WMDE>	 ah, ok
[13:08:04] <kostajh>	 Lucas_WMDE: if the mlwiki patch looks good to you, please start
[13:08:11] <kostajh>	 otherwise, my 3 patches can be synced together
[13:08:22] <Lucas_WMDE>	 I think you raised a valid point so I’ll let anzx reply to that ^^
[13:08:25] * Lucas_WMDE looks at your changes
[13:08:43] <wikibugs>	 (CR) Ssingh: [C:+2] "Task for this patch is https://phabricator.wikimedia.org/T346722." [puppet] - https://gerrit.wikimedia.org/r/1019810 (owner: Ayounsi)
[13:09:08] <wikibugs>	 (PS2) Kosta Harlan: beta: Disable wgWikimediaEventsIPoidUrl [mediawiki-config] - https://gerrit.wikimedia.org/r/1015295 (https://phabricator.wikimedia.org/T354597)
[13:09:14] <wikibugs>	 (PS2) Kosta Harlan: WikimediaEvents: Set IPoid URL [mediawiki-config] - https://gerrit.wikimedia.org/r/1015296 (https://phabricator.wikimedia.org/T354597)
[13:09:17] <wikibugs>	 (PS2) Kosta Harlan: EventStreamConfig: Register ip_reputation/score [mediawiki-config] - https://gerrit.wikimedia.org/r/1015299 (https://phabricator.wikimedia.org/T354597)
[13:09:26] <Lucas_WMDE>	 (rebased to get some new CI builds after the old ones were gone)
[13:10:23] <kostajh>	 thx
[13:10:42] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2031.codfw.wmnet
[13:11:06] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): "VE probably makes sense, good point. I think noindex/nofollow is already set (line 4786); if I understand correctly, the `wmgExemptFromUse" [mediawiki-config] - https://gerrit.wikimedia.org/r/1020242 (https://phabricator.wikimedia.org/T362653) (owner: Anzx)
[13:11:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (5) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:11:28] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-host for host es2033.codfw.wmnet
[13:12:02] <Lucas_WMDE>	 kostajh: I wonder why diffConfig detects no change in the first change (Disable …IPoidUrl)
[13:12:06] <sukhe>	 topranks: this build2001 one doesn't seem to be related to us
[13:12:25] <wikibugs>	 (PS1) Muehlenhoff: Switch es2033 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020826 (https://phabricator.wikimedia.org/T349619)
[13:12:36] <wikibugs>	 ops-codfw, Data-Platform-SRE (2024.04.15 - 2024.05.05), Patch-For-Review: Degraded RAID on elastic2088 - https://phabricator.wikimedia.org/T361525#9722367 (Jhancock.wm)
[13:12:42] <Lucas_WMDE>	 ohh, it’s only set (outside of beta) in the following change?
[13:13:08] <Lucas_WMDE>	 hm, but I’m not sure if overriding CS.php in IS-labs.php works like that
[13:13:15] <topranks>	 sukhe: yeah that seems wide of the changes we're making alright 
[13:13:33] <wikibugs>	 ops-codfw, Data-Platform-SRE (2024.04.15 - 2024.05.05), Patch-For-Review: Degraded RAID on elastic2088 - https://phabricator.wikimedia.org/T361525#9722368 (bking) a:bking→None
[13:13:34] <wikibugs>	 (PS1) Ssingh: magru: add geo-resources and update wikimedia.org zone [dns] - https://gerrit.wikimedia.org/r/1020827 (https://phabricator.wikimedia.org/T346722)
[13:13:40] <kostajh>	 Lucas_WMDE: the goal is to unset the variable in beta and enable in production
[13:13:45] <kostajh>	 Maybe I did it wrong
[13:14:08] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Switch es2033 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020826 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[13:14:14] <wikibugs>	 ops-codfw, Data-Platform-SRE (2024.04.15 - 2024.05.05), Patch-For-Review: Degraded RAID on elastic2088 - https://phabricator.wikimedia.org/T361525#9722374 (Jhancock.wm) @RKemper I am going to check it out and get back in touch with dell. These are the same errors we were getting before the card was r...
[13:14:18] <anzx>	 kostajh, Lucas_WMDE: they didn't ask for VisualEditor, so should I update the patch to enable it anyway?
[13:14:21] <wikibugs>	 (03CR) 10CI reject: [V:04-1] magru: add geo-resources and update wikimedia.org zone [dns] - 10https://gerrit.wikimedia.org/r/1020827 (https://phabricator.wikimedia.org/T346722) (owner: 10Ssingh)
[13:15:17] <Lucas_WMDE>	 anzx: https://ml.wikipedia.org/wiki/Special:Tags looks like VE is used a lot on that wiki, so IMHO it would be fine to just guess that they’ll be fine with enabling it
[13:15:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2120 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P60772 and previous config saved to /var/cache/conftool/dbconfig/20240417-131533-root.json
[13:15:47] <anzx>	 Lucas_WMDE: I will update the patch
[13:15:49] <wikibugs>	 (03PS2) 10Ssingh: magru: add geo-resources and update wikimedia.org zone [dns] - 10https://gerrit.wikimedia.org/r/1020827 (https://phabricator.wikimedia.org/T346722)
[13:16:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (6) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:16:43] <wikibugs>	 (03CR) 10CI reject: [V:04-1] magru: add geo-resources and update wikimedia.org zone [dns] - 10https://gerrit.wikimedia.org/r/1020827 (https://phabricator.wikimedia.org/T346722) (owner: 10Ssingh)
[13:17:04] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:04-1] "I don’t think this will work – I857cefbd4a sets the IPoid URL in `CommonSettings.php`, and according to [the comment near the top of `Init" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1015295 (https://phabricator.wikimedia.org/T354597) (owner: 10Kosta Harlan)
[13:17:07] <wikibugs>	 (03PS1) 10Ladsgroup: Setting dummy password for cumin dedicated mysql user [labs/private] - 10https://gerrit.wikimedia.org/r/1020828
[13:17:57] <wikibugs>	 (03CR) 10Ssingh: "Failure is expected since we don't have the data center name magru yet. Will merge after I93fe45bed44583c86680b5595c481181e048282b" [dns] - 10https://gerrit.wikimedia.org/r/1020827 (https://phabricator.wikimedia.org/T346722) (owner: 10Ssingh)
[13:17:57] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2033.codfw.wmnet
[13:18:08] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-host for host es1026.eqiad.wmnet
[13:18:11] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P60773 and previous config saved to /var/cache/conftool/dbconfig/20240417-131811-marostegui.json
[13:20:41] <wikibugs>	 (03PS2) 10Ssingh: realm: fix consistency for site IPs [puppet] - 10https://gerrit.wikimedia.org/r/1019843
[13:20:55] <kostajh>	 Lucas_WMDE: I guess I should use CommonSettings-Labs.php to set the URL to null
[13:21:09] <Lucas_WMDE>	 yeah, I guess that would also work
[13:21:29] <Lucas_WMDE>	 I was wondering if CS.php could just check if the variable was already set via isset(), but then I remembered that isset() is false for null values
[13:21:40] <Lucas_WMDE>	 so that wouldn’t quite work out, annoyingly
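A minimal sketch of the isset() pitfall mentioned above, written as a Python analogy rather than PHP. The `settings` dicts, the `ipoid_url` key, and both helper names are hypothetical; the only point is that a check which treats null as "not set" cannot tell an explicit null override apart from a value that was never configured.

```python
# Python analogy for PHP's isset(): isset() is false both when a
# variable is undefined and when it is explicitly null.

def isset_like(settings: dict, key: str) -> bool:
    """Mimics PHP isset(): a key holding None counts as 'not set'."""
    return settings.get(key) is not None


def key_exists(settings: dict, key: str) -> bool:
    """Mimics PHP array_key_exists(): a None value still counts as present."""
    return key in settings


beta = {"ipoid_url": None}  # explicitly nulled, e.g. for beta (hypothetical key)
untouched = {}              # never configured at all

# The isset-like check gives the same answer for both cases.
print(isset_like(beta, "ipoid_url"), isset_like(untouched, "ipoid_url"))
# A key-existence check can tell them apart.
print(key_exists(beta, "ipoid_url"), key_exists(untouched, "ipoid_url"))
```

So a guard of the form "only set the default if not isset" would overwrite a deliberate null, which is why that approach does not quite work here.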
[13:22:00] <Lucas_WMDE>	 (I think you could do that in one change btw, set the URL for production and unset it for beta)
[13:23:16] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.hosts.remove-downtime for cp1115.eqiad.wmnet
[13:23:17] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1115.eqiad.wmnet
[13:23:18] <wikibugs>	 (03PS1) 10Ladsgroup: mariadb: Set up dedicated cumin user [puppet] - 10https://gerrit.wikimedia.org/r/1020830
[13:23:31] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch es1026 to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/1020831 (https://phabricator.wikimedia.org/T349619)
[13:23:38] <Lucas_WMDE>	 kostajh: alternatively… do you even need to do anything? it looks like $wmgLocalServices['ipoid'] might be null on beta anyways
[13:23:54] <Lucas_WMDE>	 (checked in `mwscript shell testwiki` on deployment-deploy03.deployment-prep.eqiad1.wikimedia.cloud)
[13:23:56] <kostajh>	 yeah just reached that conclusion :)
[13:24:02] <kostajh>	 labs.php sets it to null
[13:24:05] <Lucas_WMDE>	 but I don’t know where $wmgLocalServices is set otherwise
[13:24:05] <Lucas_WMDE>	 ah ok
[13:24:06] <kostajh>	 I'll update the patches
[13:24:12] <Lucas_WMDE>	 yeah then that’s probably enough ^^
[13:24:28] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Switch es1026 to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/1020831 (https://phabricator.wikimedia.org/T349619) (owner: 10Muehlenhoff)
[13:24:40] <Lucas_WMDE>	 could add a like // can be null, e.g. on Beta
[13:24:42] <Lucas_WMDE>	 maybe ^^
[13:24:44] <Lucas_WMDE>	 *a comment like
[13:25:14] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+1] "LGTM.  Apart from eqiad and codfw these private LVS ranges don't seem to have any usage, so I'm not sure if we should keep them longer term" [dns] - 10https://gerrit.wikimedia.org/r/1020823 (owner: 10Ssingh)
[13:25:23] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) ferm.service on mw1367:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:25:30] <sukhe>	 ^ this might be us, checking
[13:25:42] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+1] Setting dummy password for cumin dedicated mysql user [labs/private] - 10https://gerrit.wikimedia.org/r/1020828 (owner: 10Ladsgroup)
[13:26:01] <wikibugs>	 (03PS4) 10Anzx: mlwiki: create draft namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1020242 (https://phabricator.wikimedia.org/T362653)
[13:26:10] <wikibugs>	 (03CR) 10Ladsgroup: [V:03+2 C:03+2] Setting dummy password for cumin dedicated mysql user [labs/private] - 10https://gerrit.wikimedia.org/r/1020828 (owner: 10Ladsgroup)
[13:26:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:27:11] <wikibugs>	 (03PS3) 10Kosta Harlan: WikimediaEvents: Set IPoid URL and enable ip_reputation/score [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1015295 (https://phabricator.wikimedia.org/T354597)
[13:27:16] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] mlwiki: create draft namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1020242 (https://phabricator.wikimedia.org/T362653) (owner: 10Anzx)
[13:27:50] <sukhe>	 mw1367 should be recovering, we have seen this in the past where ferm doesn't reload and so needs a manual push
[13:27:54] <wikibugs>	 (03PS4) 10Kosta Harlan: WikimediaEvents: Set IPoid URL and enable ip_reputation/score [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1015295 (https://phabricator.wikimedia.org/T354597)
[13:28:02] <wikibugs>	 (03Abandoned) 10Kosta Harlan: WikimediaEvents: Set IPoid URL [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1015296 (https://phabricator.wikimedia.org/T354597) (owner: 10Kosta Harlan)
[13:28:07] <wikibugs>	 (03Abandoned) 10Kosta Harlan: EventStreamConfig: Register ip_reputation/score [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1015299 (https://phabricator.wikimedia.org/T354597) (owner: 10Kosta Harlan)
[13:28:18] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:04-1] "sorry, just one small copy+paste mistake and then this should be good to go ^^" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1020242 (https://phabricator.wikimedia.org/T362653) (owner: 10Anzx)
[13:28:27] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+1] geo-maps: add magru to geo maps [dns] - 10https://gerrit.wikimedia.org/r/1020825 (https://phabricator.wikimedia.org/T346722) (owner: 10Ssingh)
[13:28:41] <kostajh>	 Lucas_WMDE: ready for review
[13:29:04] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es1026.eqiad.wmnet
[13:29:12] <Lucas_WMDE>	 looking
[13:29:24] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 10cloud-services-team (Hardware), 13Patch-For-Review: Q#:rack/setup/install (2) cloudbackup hosts - https://phabricator.wikimedia.org/T356216#9722407 (10Andrew) 05Open→03Resolved These are now in service and working fine.
[13:29:26] <Lucas_WMDE>	 oh, and another patch appeared on the wikitech page
[13:29:48] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-host for host es1033.eqiad.wmnet
[13:30:23] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:30:41] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2120 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60774 and previous config saved to /var/cache/conftool/dbconfig/20240417-133040-root.json
[13:30:47] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] WikimediaEvents: Set IPoid URL and enable ip_reputation/score [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1015295 (https://phabricator.wikimedia.org/T354597) (owner: 10Kosta Harlan)
[13:31:12] <Lucas_WMDE>	 kostajh: I’ll go ahead with your change then
[13:31:16] <Lucas_WMDE>	 and then hopefully anzx right afterwards
[13:31:27] <Lucas_WMDE>	 DreamRimmer: not sure we’ll have time for your change :/
[13:31:33] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1015295 (https://phabricator.wikimedia.org/T354597) (owner: 10Kosta Harlan)
[13:31:35] <kostajh>	 Lucas_WMDE: ty
[13:32:05] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp1115.eqiad.wmnet,service=(cdn|ats-be)
[13:32:36] <DreamRimmer>	 no worries, we can see next time
[13:32:41] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch es1033 to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/1020833 (https://phabricator.wikimedia.org/T349619)
[13:32:59] <wikibugs>	 (03Merged) 10jenkins-bot: WikimediaEvents: Set IPoid URL and enable ip_reputation/score [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1015295 (https://phabricator.wikimedia.org/T354597) (owner: 10Kosta Harlan)
[13:33:19] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2127 (T361627)', diff saved to https://phabricator.wikimedia.org/P60775 and previous config saved to /var/cache/conftool/dbconfig/20240417-133318-marostegui.json
[13:33:21] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
[13:33:23] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
[13:33:25] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[13:33:26] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [[gerrit:1015295|WikimediaEvents: Set IPoid URL and enable ip_reputation/score (T354597)]]
[13:33:38] <stashbot>	 T354597: Record IP reputation data for account creations and edits - https://phabricator.wikimedia.org/T354597
[13:33:50] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:34:22] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Switch es1033 to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/1020833 (https://phabricator.wikimedia.org/T349619) (owner: 10Muehlenhoff)
[13:34:53] <wikibugs>	 (03PS5) 10Anzx: mlwiki: create draft namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1020242 (https://phabricator.wikimedia.org/T362653)
[13:35:10] <wikibugs>	 (03CR) 10Anzx: "i think enabl" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1020242 (https://phabricator.wikimedia.org/T362653) (owner: 10Anzx)
[13:35:57] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] "good to go once the current deployment is done" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1020242 (https://phabricator.wikimedia.org/T362653) (owner: 10Anzx)
[13:36:30] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] geo-maps: add drmrs LVS map [dns] - 10https://gerrit.wikimedia.org/r/1020823 (owner: 10Ssingh)
[13:36:32] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 kharlan and lucaswerkmeister-wmde: Backport for [[gerrit:1015295|WikimediaEvents: Set IPoid URL and enable ip_reputation/score (T354597)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[13:36:39] <wikibugs>	 (03PS1) 10Bking: query_service: enable CPU performance governor for w[cd]qs [puppet] - 10https://gerrit.wikimedia.org/r/1020834 (https://phabricator.wikimedia.org/T336443)
[13:36:43] <Lucas_WMDE>	 kostajh: is the production part of the change testable?
[13:36:48] <sukhe>	 !log running authdns-update for CR 1020823
[13:36:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:36:52] <kostajh>	 Lucas_WMDE: yes
[13:36:59] <Lucas_WMDE>	 okay, then please test :)
[13:37:06] <wikibugs>	 (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1020834 (https://phabricator.wikimedia.org/T336443) (owner: 10Bking)
[13:37:16] <kostajh>	 ok! I'll need a few minutes
[13:37:37] <wikibugs>	 (03CR) 10Eevans: {echo,session}store (staging): use wmf-ca-certificates.crt (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1020356 (https://phabricator.wikimedia.org/T352647) (owner: 10Eevans)
[13:37:41] <kostajh>	 Lucas_WMDE: which mwdebug backend to use?
[13:37:42] <wikibugs>	 (03PS3) 10Eevans: {echo,session}store (staging): use wmf-ca-certificates.crt [deployment-charts] - 10https://gerrit.wikimedia.org/r/1020356 (https://phabricator.wikimedia.org/T352647)
[13:37:48] <Lucas_WMDE>	 any of them
[13:37:59] <Lucas_WMDE>	 `scap backport` always syncs changes to all of them
[13:38:09] <Lucas_WMDE>	 (including “k8s-experimental” which will soon be renamed to be less experimental)
[13:38:16] <kostajh>	 ok
[13:39:28] <wikibugs>	 (03PS1) 10Slyngshede: Keymanagement, fix parsing and display of FIDO/U2F keys [software/bitu] - 10https://gerrit.wikimedia.org/r/1020836
[13:39:43] <wikibugs>	 (03PS1) 10Jforrester: wikifunctions: Upgrade orchestrator 2024-04-04-132719 to 2024-04-17-125039 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1020837 (https://phabricator.wikimedia.org/T302519)
[13:40:14] <wikibugs>	 (03CR) 10Fabfur: [C:03+1] "lgtm" [dns] - 10https://gerrit.wikimedia.org/r/1020825 (https://phabricator.wikimedia.org/T346722) (owner: 10Ssingh)
[13:40:17] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es1033.eqiad.wmnet
[13:41:56] <wikibugs>	 (03PS2) 10Jforrester: wikifunctions: Move apparmor annotation to pod template [deployment-charts] - 10https://gerrit.wikimedia.org/r/1020701 (https://phabricator.wikimedia.org/T326785) (owner: 10JMeybohm)
[13:43:26] <kostajh>	 Lucas_WMDE: I don't see the events generated when using https://wikitech.wikimedia.org/wiki/Kafka#kafkacat but it's possible I'm doing something wrong there. 
[13:43:50] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:44:16] <wikibugs>	 06SRE, 10SRE-tools, 06collaboration-services, 06Infrastructure-Foundations, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9722479 (10MoritzMuehlenhoff)
[13:44:28] <Lucas_WMDE>	 hmm
[13:44:31] <kostajh>	 Lucas_WMDE: hmm, I do see an error in logstash https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2024.04.17?id=29JL7I4BXQUFBRtCE_7H
[13:44:51] <Lucas_WMDE>	 “Event submitted for unregistered stream name "mediawiki.ip_reputation.score"”
[13:45:06] <kostajh>	 yeah
[13:45:07] <wikibugs>	 (03CR) 10DCausse: [C:03+1] query_service: enable CPU performance governor for w[cd]qs [puppet] - 10https://gerrit.wikimedia.org/r/1020834 (https://phabricator.wikimedia.org/T336443) (owner: 10Bking)
[13:45:16] <kostajh>	 I think I have the wrong name reference in WikimediaEvents, looking
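The logstash message quoted above reads like a guard that rejects events whose stream name is not in a set of registered streams. A rough Python sketch of that kind of check follows. The `REGISTERED_STREAMS` set, the `submit` function, and the exception class are all hypothetical and are not the EventLogging or EventStreamConfig API; the registry contents are made up for illustration.

```python
class UnregisteredStreamError(ValueError):
    """Raised when an event names a stream that was never registered."""


# Hypothetical registry; real stream configuration lives elsewhere.
REGISTERED_STREAMS = {"example.stream.a", "example.stream.b"}


def submit(stream_name: str, event: dict) -> dict:
    """Reject events for unknown streams instead of silently accepting them."""
    if stream_name not in REGISTERED_STREAMS:
        raise UnregisteredStreamError(
            f'Event submitted for unregistered stream name "{stream_name}"'
        )
    # Tag the event with its stream so downstream consumers can route it.
    return {"stream": stream_name, **event}


try:
    submit("mediawiki.ip_reputation.score", {"score": 1})
except UnregisteredStreamError as err:
    print(err)
```

Under this model, a mismatch between the name used by the producer and the name in the registry is enough to trigger the error, even if both sides were configured.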
[13:45:21] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] realm: fix consistency for site IPs [puppet] - 10https://gerrit.wikimedia.org/r/1019843 (owner: 10Ssingh)
[13:45:46] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2120 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60776 and previous config saved to /var/cache/conftool/dbconfig/20240417-134545-root.json
[13:46:06] * Lucas_WMDE knows very little about event stream stuff
[13:46:33] <wikibugs>	 (03CR) 10Bking: [C:03+2] query_service: enable CPU performance governor for w[cd]qs [puppet] - 10https://gerrit.wikimedia.org/r/1020834 (https://phabricator.wikimedia.org/T336443) (owner: 10Bking)
[13:47:41] <Lucas_WMDE>	 not a huge fan of how https://wikitech.wikimedia.org/wiki/Backport_windows/Deployers#Using_scap_backport and https://wikitech.wikimedia.org/wiki/Scap#Backport_Deployments point to each other saying “look over there for more details”
[13:47:57] <kostajh>	 Lucas_WMDE: I'm also confused about what I am doing wrong here.
[13:48:02] <Lucas_WMDE>	 but I think it’s clear enough how to revert the change if necessary
[13:48:46] <Lucas_WMDE>	 kostajh: is it urgent to get this configuration deployed? otherwise I’d say revert now, understand later :/
[13:49:29] <kostajh>	 Lucas_WMDE: yeah let's revert it. Sorry for the trouble.
[13:49:34] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Sync cancelled.
[13:49:59] <wikibugs>	 (03PS1) 10TrainBranchBot: Revert "WikimediaEvents: Set IPoid URL and enable ip_reputation/score" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1020839
[13:49:59] <wikibugs>	 (03CR) 10TrainBranchBot: "lucaswerkmeister-wmde@deploy1002 created a revert of this change as I6a299ce2c67f81faa520a3366bd83657988f96f6" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1015295 (https://phabricator.wikimedia.org/T354597) (owner: 10Kosta Harlan)
[13:50:05] <Lucas_WMDE>	 kostajh: no problem at all
[13:50:18] <Lucas_WMDE>	 hopefully you’ll be able to figure out what’s wrong
[13:50:23] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:50:36] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1020839 (owner: 10TrainBranchBot)
[13:51:24] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "WikimediaEvents: Set IPoid URL and enable ip_reputation/score" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1020839 (owner: 10TrainBranchBot)
[13:51:32] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): "Details: It produced [one error in logstash](https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2024" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1020839 (owner: 10TrainBranchBot)
[13:51:53] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [[gerrit:1020839|Revert "WikimediaEvents: Set IPoid URL and enable ip_reputation/score"]]
[13:51:58] <wikibugs>	 (03PS2) 10Ssingh: geo-maps: add magru to geo maps [dns] - 10https://gerrit.wikimedia.org/r/1020825 (https://phabricator.wikimedia.org/T346722)
[13:52:12] <wikibugs>	 (03CR) 10Ssingh: "rebased for drmrs LVS change" [dns] - 10https://gerrit.wikimedia.org/r/1020825 (https://phabricator.wikimedia.org/T346722) (owner: 10Ssingh)
[13:52:33] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
[13:52:46] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
[13:52:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2149 (T361627)', diff saved to https://phabricator.wikimedia.org/P60777 and previous config saved to /var/cache/conftool/dbconfig/20240417-135253-marostegui.json
[13:52:58] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[13:53:02] <wikibugs>	 (03PS1) 10Hashar: wm-zuul-status: filter based solely on change number [software/gerrit] (deploy/wmf/stable-3.8) - 10https://gerrit.wikimedia.org/r/1020840 (https://phabricator.wikimedia.org/T358253)
[13:53:41] <wikibugs>	 (03PS2) 10Slyngshede: Keymanagement, fix parsing and display of FIDO/U2F keys [software/bitu] - 10https://gerrit.wikimedia.org/r/1020836
[13:53:49] <wikibugs>	 (03CR) 10Hashar: "It was a bit long to reach the task you filed back in February, but that is the implementation/fix :)" [software/gerrit] (deploy/wmf/stable-3.8) - 10https://gerrit.wikimedia.org/r/1020840 (https://phabricator.wikimedia.org/T358253) (owner: 10Hashar)
[13:53:50] <jinxer-wm>	 (SystemdUnitFailed) firing: (7) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:54:23] <Lucas_WMDE>	 kostajh: maybe something like https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/992631 was missing?
[13:54:24] <Lucas_WMDE>	 *looks closer*
[13:55:06] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 trainbranchbot and lucaswerkmeister-wmde: Backport for [[gerrit:1020839|Revert "WikimediaEvents: Set IPoid URL and enable ip_reputation/score"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[13:55:13] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 trainbranchbot and lucaswerkmeister-wmde: Continuing with sync
[13:55:18] <Lucas_WMDE>	 (no need to test the revert I think)
[13:56:03] <wikibugs>	 (03PS3) 10Slyngshede: Keymanagement, fix parsing and display of FIDO/U2F keys [software/bitu] - 10https://gerrit.wikimedia.org/r/1020836
[13:56:46] <wikibugs>	 (03PS3) 10Ssingh: geo-maps: add magru to geo maps [dns] - 10https://gerrit.wikimedia.org/r/1020825 (https://phabricator.wikimedia.org/T346722)
[13:57:04] <Lucas_WMDE>	 kostajh: although the streams next to the one you added (mediawiki.cirrussearch.page_rerender.v1, mediawiki.page-create) don’t show up in wgEventLoggingStreamNames either, so maybe that’s not the problem after all
[13:57:33] <wikibugs>	 (03PS4) 10Ssingh: geo-maps: add magru to geo maps [dns] - 10https://gerrit.wikimedia.org/r/1020825 (https://phabricator.wikimedia.org/T346722)
[13:57:37] <kostajh>	 Yeah those were the ones I was referencing when writing my patch
[13:58:50] <jinxer-wm>	 (SystemdUnitFailed) firing: (7) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:58:58] <Lucas_WMDE>	 jouncebot: next
[13:58:59] <jouncebot>	 In 0 hour(s) and 1 minute(s): Wikifunction Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240417T1400)
[13:59:04] <Lucas_WMDE>	 we’ll definitely run into that, sorry
[14:00:05] <jouncebot>	 Deploy window Wikifunction Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240417T1400)
[14:00:10] <James_F>	 :-(
[14:00:14] <Lucas_WMDE>	 I’m still deploying, sorry :(
[14:00:26] <James_F>	 I mean, my deploy tool is different from yours, so I /can/ deploy.
[14:00:34] <James_F>	 But it's probably better that I don't. :-)
[14:00:38] <Lucas_WMDE>	 heh
[14:00:46] <Lucas_WMDE>	 I mean, in theory what I’m deploying right now should be a 100% no-op
[14:00:48] <domas>	 hello
[14:00:54] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2120 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60778 and previous config saved to /var/cache/conftool/dbconfig/20240417-140051-root.json
[14:00:56] <James_F>	 `helmfile` vs. `scap`.
[14:00:56] <James_F>	 Hey domas, how's life?
[14:00:57] <Lucas_WMDE>	 since it’s a revert of a change that never made it beyond mwdebug
[14:01:05] <Lucas_WMDE>	 so now the production servers are getting the same code deployed again (in theory)
[14:01:44] <Lucas_WMDE>	 (I also wanted to deploy anzx’ namespace change but I guess that’s not happening, damn)
[14:01:47] <domas>	 @James_F, same old same old! 
[14:01:55] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] geo-maps: add magru to geo maps [dns] - 10https://gerrit.wikimedia.org/r/1020825 (https://phabricator.wikimedia.org/T346722) (owner: 10Ssingh)
[14:01:55] <James_F>	 Lucas_WMDE: Ack.
[14:02:07] <sukhe>	 !log running authdns-update for adding magru to geo-maps: T346722
[14:02:10] * Lucas_WMDE peeks at kubectl for progress info
[14:02:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:02:12] <stashbot>	 T346722: Sao Paulo, Brazil, South America POP tracking task - https://phabricator.wikimedia.org/T346722
[14:03:10] <Lucas_WMDE>	 179/223 up-to-date, probably a few more minutes
[14:03:22] <domas>	 James_F, became an IG food influencer nowadays, doing everything I can do to avoid working on AI :-D 
[14:03:50] <James_F>	 domas: … isn't that just working /for/ AI, namely the feed algo? 
[14:03:50] <jinxer-wm>	 (SystemdUnitFailed) firing: (6) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:04:16] <wikibugs>	 (03PS3) 10Ssingh: magru: add geo-resources and update wikimedia.org zone [dns] - 10https://gerrit.wikimedia.org/r/1020827 (https://phabricator.wikimedia.org/T346722)
[14:04:50] <wikibugs>	 (03CR) 10Cory Massaro: [C:03+1] wikifunctions: Move apparmor annotation to pod template [deployment-charts] - 10https://gerrit.wikimedia.org/r/1020701 (https://phabricator.wikimedia.org/T326785) (owner: 10JMeybohm)
[14:05:11] <wikibugs>	 (03CR) 10CI reject: [V:04-1] magru: add geo-resources and update wikimedia.org zone [dns] - 10https://gerrit.wikimedia.org/r/1020827 (https://phabricator.wikimedia.org/T346722) (owner: 10Ssingh)
[14:05:23] <domas>	 true true
[14:05:39] <domas>	 has wikipedia been replaced by LLM yet?
[14:05:46] <domas>	 I saw there was some director-of-ML role !
[14:05:46] <James_F>	 Always has been.
[14:05:47] <wikibugs>	 (03PS1) 10Cathal Mooney: Add new BGP group for cross-rack PyBal peerings at L3 POPs [homer/public] - 10https://gerrit.wikimedia.org/r/1020843 (https://phabricator.wikimedia.org/T362772)
[14:05:50] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1207 (T352010)', diff saved to https://phabricator.wikimedia.org/P60779 and previous config saved to /var/cache/conftool/dbconfig/20240417-140549-ladsgroup.json
[14:05:53] <vgutierrez>	 !log depool ncredir2001
[14:06:01] <stashbot>	 T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[14:06:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:06:20] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Add new BGP group for cross-rack PyBal peerings at L3 POPs [homer/public] - 10https://gerrit.wikimedia.org/r/1020843 (https://phabricator.wikimedia.org/T362772) (owner: 10Cathal Mooney)
[14:06:55] <wikibugs>	 (03PS1) 10Cathal Mooney: Adjust LVS config in esams, drmrs to peer with both ASWs [puppet] - 10https://gerrit.wikimedia.org/r/1020844 (https://phabricator.wikimedia.org/T362772)
[14:07:15] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Adjust LVS config in esams, drmrs to peer with both ASWs [puppet] - 10https://gerrit.wikimedia.org/r/1020844 (https://phabricator.wikimedia.org/T362772) (owner: 10Cathal Mooney)
[14:07:26] <wikibugs>	 (03PS3) 10Jforrester: wikifunctions: Move apparmor annotation to pod template [deployment-charts] - 10https://gerrit.wikimedia.org/r/1020701 (https://phabricator.wikimedia.org/T326785) (owner: 10JMeybohm)
[14:07:28] <wikibugs>	 (03CR) 10Jforrester: [C:03+2] wikifunctions: Move apparmor annotation to pod template [deployment-charts] - 10https://gerrit.wikimedia.org/r/1020701 (https://phabricator.wikimedia.org/T326785) (owner: 10JMeybohm)
[14:07:34] * domas looks at shard groups in DBs... they did not change in 10+ years?!!??!?
[14:07:58] <wikibugs>	 (03CR) 10Ssingh: "10:04:47 error: CNAME 'measure-magru.wikimedia.org.' points to known same-zone NXDOMAIN 'upload-lb.magru.wikimedia.org.'" [dns] - 10https://gerrit.wikimedia.org/r/1020827 (https://phabricator.wikimedia.org/T346722) (owner: 10Ssingh)
[14:07:59] <James_F>	 domas: Not much; s5 is more the default for new wikis than s3, but otherwise we've scaled hardware just about fast enough.
[14:08:24] <James_F>	 Plus some feature re-writing / endless DB query tuning to keep pace.
[14:08:24] <wikibugs>	 (03Merged) 10jenkins-bot: wikifunctions: Move apparmor annotation to pod template [deployment-charts] - 10https://gerrit.wikimedia.org/r/1020701 (https://phabricator.wikimedia.org/T326785) (owner: 10JMeybohm)
[14:08:29] <domas>	 yea, looks like operation is at the level where everything is still done manually! nice jobs program tho!
[14:08:42] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:1020839|Revert "WikimediaEvents: Set IPoid URL and enable ip_reputation/score"]] (duration: 16m 49s)
[14:08:43] <domas>	 /o\
[14:08:47] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-role for role: insetup::data_persistence
[14:08:51] <domas>	 just morebot is now stashbot :( 
[14:08:53] <James_F>	 Lucas_WMDE: All done?
[14:08:55] <domas>	 morebots
[14:09:02] <Lucas_WMDE>	 yes, sorry, you’re good to go
[14:09:05] <James_F>	 Awesome.
[14:09:14] <Lucas_WMDE>	 was distracted for a moment
[14:09:15] <logmsgbot>	 !log jforrester@deploy1002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[14:09:19] <logmsgbot>	 !log jforrester@deploy1002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[14:09:42] <logmsgbot>	 !log jforrester@deploy1002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[14:09:57] <domas>	 has mediawiki been implemented in wikifunctions yet?
[14:10:08] <James_F>	 No. :-)
[14:10:14] <domas>	 you're not serious people
[14:10:18] <James_F>	 Next step is async content loading for WF into MW pages.
[14:10:22] <logmsgbot>	 !log jforrester@deploy1002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[14:10:25] <James_F>	 Which'll be fun.
[14:10:48] <logmsgbot>	 !log jforrester@deploy1002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[14:11:56] <James_F>	 domas: Hackathon is in Tallinn in a couple of weeks' time. You should come by! ;-)
[14:13:08] <logmsgbot>	 !log jforrester@deploy1002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[14:13:16] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149 (T361627)', diff saved to https://phabricator.wikimedia.org/P60780 and previous config saved to /var/cache/conftool/dbconfig/20240417-141314-marostegui.json
[14:13:17] <logmsgbot>	 !log jforrester@deploy1002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[14:13:21] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[14:13:29] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C:+1] "Unfortunately there wasn’t enough time to deploy this today, but it should be okay to deploy at any time later." [mediawiki-config] - https://gerrit.wikimedia.org/r/1020242 (https://phabricator.wikimedia.org/T362653) (owner: Anzx)
[14:13:49] <wikibugs>	 (PS2) Cathal Mooney: Add new BGP group for cross-rack PyBal peerings at L3 POPs [homer/public] - https://gerrit.wikimedia.org/r/1020843 (https://phabricator.wikimedia.org/T362772)
[14:13:50] <jinxer-wm>	 (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:14:23] <wikibugs>	 (CR) CI reject: [V:-1] Add new BGP group for cross-rack PyBal peerings at L3 POPs [homer/public] - https://gerrit.wikimedia.org/r/1020843 (https://phabricator.wikimedia.org/T362772) (owner: Cathal Mooney)
[14:14:36] <James_F>	 domas: We now have fancy bot-maintained pages to tell us what bits of the DBs are likely to flake today: https://wikitech.wikimedia.org/wiki/Map_of_database_maintenance ;-)
[14:14:39] <domas>	 James_F, heh, weird location
[14:14:55] <wikibugs>	 (CR) Ssingh: Reverse DNS changes for new Magru prefixes (1 comment) [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421) (owner: Cathal Mooney)
[14:15:04] <domas>	 hah
[14:15:26] <logmsgbot>	 !log jforrester@deploy1002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[14:15:50] <wikibugs>	 (PS1) Muehlenhoff: Switch insetup::data_persistence to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020845 (https://phabricator.wikimedia.org/T349619)
[14:15:58] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2120 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60781 and previous config saved to /var/cache/conftool/dbconfig/20240417-141557-root.json
[14:16:10] <wikibugs>	 (PS2) Jforrester: wikifunctions: Upgrade orchestrator 2024-04-04-132719 to 2024-04-17-125039 [deployment-charts] - https://gerrit.wikimedia.org/r/1020837 (https://phabricator.wikimedia.org/T302519)
[14:16:13] <wikibugs>	 (CR) Jforrester: [C:+2] wikifunctions: Upgrade orchestrator 2024-04-04-132719 to 2024-04-17-125039 [deployment-charts] - https://gerrit.wikimedia.org/r/1020837 (https://phabricator.wikimedia.org/T302519) (owner: Jforrester)
[14:16:29] <domas>	 jee this channel got spammy
[14:16:45] <wikibugs>	 (PS5) Cathal Mooney: DNS zone changes for new Magru prefixes [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421)
[14:17:10] <wikibugs>	 (Merged) jenkins-bot: wikifunctions: Upgrade orchestrator 2024-04-04-132719 to 2024-04-17-125039 [deployment-charts] - https://gerrit.wikimedia.org/r/1020837 (https://phabricator.wikimedia.org/T302519) (owner: Jforrester)
[14:17:36] <wikibugs>	 (CR) CI reject: [V:-1] DNS zone changes for new Magru prefixes [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421) (owner: Cathal Mooney)
[14:18:24] <wikibugs>	 (PS6) Cathal Mooney: DNS zone changes for new Magru prefixes [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421)
[14:18:34] <logmsgbot>	 !log jforrester@deploy1002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[14:18:35] <wikibugs>	 (CR) Cathal Mooney: DNS zone changes for new Magru prefixes (1 comment) [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421) (owner: Cathal Mooney)
[14:18:46] <wikibugs>	 (PS7) Cathal Mooney: DNS zone changes for new Magru prefixes [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421)
[14:19:10] <logmsgbot>	 !log jforrester@deploy1002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[14:19:38] <wikibugs>	 (CR) CI reject: [V:-1] DNS zone changes for new Magru prefixes [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421) (owner: Cathal Mooney)
[14:19:40] <logmsgbot>	 !log jforrester@deploy1002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[14:20:36] <sukhe>	 !log depool cp1114.eqiad.wmnet for PXE boot testing issues and downgrade NIC firmware: T350179
[14:20:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:20:43] <stashbot>	 T350179: Reimage cookbook on new eqiad hosts stuck at PXE booting - https://phabricator.wikimedia.org/T350179
[14:20:49] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp1114.eqiad.wmnet,service=(cdn|ats-be)
[14:20:49] <logmsgbot>	 !log jforrester@deploy1002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[14:20:56] <logmsgbot>	 !log jforrester@deploy1002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[14:20:59] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P60782 and previous config saved to /var/cache/conftool/dbconfig/20240417-142057-ladsgroup.json
[14:21:10] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1114.eqiad.wmnet
[14:21:13] <wikibugs>	 (PS1) Majavah: P:toolforge::bastion: add rsync [puppet] - https://gerrit.wikimedia.org/r/1020847 (https://phabricator.wikimedia.org/T362679)
[14:22:00] <logmsgbot>	 !log sukhe@cumin1002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp1114.eqiad.wmnet
[14:22:06] <logmsgbot>	 !log jforrester@deploy1002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[14:23:03] <James_F>	 (Done with our deploy window if others need it.)
[14:23:13] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Switch insetup::data_persistence to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020845 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[14:23:34] <Lucas_WMDE>	 anzx: if you’re still around, it sounds like we could deploy the mlwiki draft namespace now?
[14:23:48] <Lucas_WMDE>	 (unless someone objects ^^)
[14:23:51] <Lucas_WMDE>	 (and thanks James_F!)
[14:25:43] <domas>	 excited to see a database person as CTO ! 
[14:28:23] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P60783 and previous config saved to /var/cache/conftool/dbconfig/20240417-142823-marostegui.json
[14:28:32] <anzx>	 Lucas_WMDE: yeah I am available 
[14:28:50] <Lucas_WMDE>	 alright!
[14:29:21] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C:+1] "Deploying in a gap between other windows now (Wikifunctions didn’t need their full window)." [mediawiki-config] - https://gerrit.wikimedia.org/r/1020242 (https://phabricator.wikimedia.org/T362653) (owner: Anzx)
[14:29:24] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by lucaswerkmeister-wmde@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1020242 (https://phabricator.wikimedia.org/T362653) (owner: Anzx)
[14:29:46] <wikibugs>	 (PS6) Anzx: mlwiki: create draft namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/1020242 (https://phabricator.wikimedia.org/T362653)
[14:29:50] <wikibugs>	 (CR) TrainBranchBot: "Approved by lucaswerkmeister-wmde@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1020242 (https://phabricator.wikimedia.org/T362653) (owner: Anzx)
[14:30:36] <wikibugs>	 (Merged) jenkins-bot: mlwiki: create draft namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/1020242 (https://phabricator.wikimedia.org/T362653) (owner: Anzx)
[14:31:03] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [[gerrit:1020242|mlwiki: create draft namespace (T362653)]]
[14:31:04] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2120 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P60784 and previous config saved to /var/cache/conftool/dbconfig/20240417-143103-root.json
[14:31:16] <stashbot>	 T362653: Create Draft Namespace in Malayalam Wikipedia - https://phabricator.wikimedia.org/T362653
[14:33:40] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: insetup::data_persistence
[14:34:05] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 anzx and lucaswerkmeister-wmde: Backport for [[gerrit:1020242|mlwiki: create draft namespace (T362653)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:34:09] <Lucas_WMDE>	 anzx: please test :)
[14:34:27] <Lucas_WMDE>	 https://ml.wikipedia.org/wiki/Draft:XYZ redirects me to a URL with localized namespace name, that already sounds like a good sign
[14:34:58] <wikibugs>	 (PS1) Muehlenhoff: Add explicit Hiera host entries for es1035-es1040,es2035-es2040 [puppet] - https://gerrit.wikimedia.org/r/1020849 (https://phabricator.wikimedia.org/T349619)
[14:36:06] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P60785 and previous config saved to /var/cache/conftool/dbconfig/20240417-143606-ladsgroup.json
[14:36:12] <wikibugs>	 (CR) David Caro: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1020847 (https://phabricator.wikimedia.org/T362679) (owner: Majavah)
[14:37:23] <wikibugs>	 (CR) Majavah: [C:+2] P:toolforge::bastion: add rsync [puppet] - https://gerrit.wikimedia.org/r/1020847 (https://phabricator.wikimedia.org/T362679) (owner: Majavah)
[14:38:50] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:40:01] <Lucas_WMDE>	 anzx: are you still there?
[14:42:26] <wikibugs>	 (PS1) Muehlenhoff: Remove now obsolete Hiera host entries for Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020850 (https://phabricator.wikimedia.org/T349619)
[14:43:02] <wikibugs>	 (PS1) Ssingh: hiera: add magru installserver in dhcp.yaml [puppet] - https://gerrit.wikimedia.org/r/1020851 (https://phabricator.wikimedia.org/T346722)
[14:43:31] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P60786 and previous config saved to /var/cache/conftool/dbconfig/20240417-144330-marostegui.json
[14:44:15] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
[14:44:20] <wikibugs>	 SRE, SRE-tools, collaboration-services, Infrastructure-Foundations, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9722701 (MoritzMuehlenhoff)
[14:44:38] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
[14:44:45] <wikibugs>	 (CR) Ssingh: [V:+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1976/console" [puppet] - https://gerrit.wikimedia.org/r/1020851 (https://phabricator.wikimedia.org/T346722) (owner: Ssingh)
[14:44:48] <wikibugs>	 (CR) Fabfur: [C:+1] "ok" [puppet] - https://gerrit.wikimedia.org/r/1020851 (https://phabricator.wikimedia.org/T346722) (owner: Ssingh)
[14:44:56] <Lucas_WMDE>	 anzx: ping
[14:45:11] <wikibugs>	 (CR) Marostegui: [C:+1] Add explicit Hiera host entries for es1035-es1040,es2035-es2040 [puppet] - https://gerrit.wikimedia.org/r/1020849 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[14:45:38] <wikibugs>	 (CR) Cathal Mooney: [C:+1] hiera: add magru installserver in dhcp.yaml [puppet] - https://gerrit.wikimedia.org/r/1020851 (https://phabricator.wikimedia.org/T346722) (owner: Ssingh)
[14:46:00] <wikibugs>	 (CR) Ssingh: [V:+1 C:+2] hiera: add magru installserver in dhcp.yaml [puppet] - https://gerrit.wikimedia.org/r/1020851 (https://phabricator.wikimedia.org/T346722) (owner: Ssingh)
[14:46:24] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Add explicit Hiera host entries for es1035-es1040,es2035-es2040 [puppet] - https://gerrit.wikimedia.org/r/1020849 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[14:49:13] <Lucas_WMDE>	 no sign of anzx :(
[14:49:57] <Lucas_WMDE>	 as far as I can tell the namespace is working, so I’ll just go ahead and deploy it anyway
[14:50:01] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 anzx and lucaswerkmeister-wmde: Continuing with sync
[14:51:14] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1207 (T352010)', diff saved to https://phabricator.wikimedia.org/P60787 and previous config saved to /var/cache/conftool/dbconfig/20240417-145113-ladsgroup.json
[14:51:16] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
[14:51:20] <stashbot>	 T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[14:51:29] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
[14:51:37] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1218 (T352010)', diff saved to https://phabricator.wikimedia.org/P60788 and previous config saved to /var/cache/conftool/dbconfig/20240417-145136-ladsgroup.json
[14:51:55] <Lucas_WMDE>	 hm, those were some *very* suspiciously fast helmfile runs
[14:52:08] <wikibugs>	 (PS1) Clément Goubert: kubernetes: move 6 appservers from codfw [puppet] - https://gerrit.wikimedia.org/r/1020852 (https://phabricator.wikimedia.org/T351074)
[14:52:17] <Lucas_WMDE>	 I’m not used to mw-web taking just 1 minute (eqiad) or 52 seconds (codfw)
[14:52:35] <claime>	 hmm
[14:53:05] <Lucas_WMDE>	 `kube_env mw-web eqiad; kubectl get deployments` reports 223 somethings though, that’s the same number as earlier
[14:53:15] <Lucas_WMDE>	 (somethings = pods, I think? 😅)
[14:53:49] <claime>	 pods are being replaced as we speak
[14:54:12] <claime>	 weird that helmfile returned before it was done replacing the pods
[14:54:17] <Lucas_WMDE>	 ohhhhh
[14:54:19] <Lucas_WMDE>	 no, I’m just an idiot
[14:54:24] <Lucas_WMDE>	 that was --selector name=canary :)
[14:54:28] <claime>	 hahaha
[14:54:30] <Lucas_WMDE>	 the --selector name=main ones are ongoing now
[14:54:32] <Lucas_WMDE>	 :D
[14:54:37] <claime>	 yeah, that's *a lot* faster
[14:54:46] <Lucas_WMDE>	 weird that :D
[14:55:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 986.5ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[14:55:27] <Lucas_WMDE>	 yeah now `kubectl get deployments` shows a lower number of up-to-date, as expected
[14:55:37] <Lucas_WMDE>	 jouncebot: next
[14:55:38] <jouncebot>	 In 2 hour(s) and 4 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240417T1700)
[14:55:41] <claime>	 yes yes parsoid, you're slow, it's ok
[14:56:10] <claime>	 or is it though :/
[14:58:38] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149 (T361627)', diff saved to https://phabricator.wikimedia.org/P60789 and previous config saved to /var/cache/conftool/dbconfig/20240417-145838-marostegui.json
[14:58:41] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
[14:58:43] <claime>	 It's serving more 400s than usual since 1443
[14:58:44] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[14:58:50] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:58:54] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
[14:58:56] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
[14:59:09] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
[14:59:16] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2156 (T361627)', diff saved to https://phabricator.wikimedia.org/P60790 and previous config saved to /var/cache/conftool/dbconfig/20240417-145916-marostegui.json
[14:59:39] <claime>	 latency is coming down, must have been a bunch of reparses
[15:00:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 935.7ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[15:01:24] <wikibugs>	 (CR) Ssingh: DNS zone changes for new Magru prefixes (3 comments) [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421) (owner: Cathal Mooney)
[15:01:38] <anzx>	 Lucas_WMDE: sorry I didn't notice ping, was having lunch in that time, thanks for deploy 
[15:01:59] <Lucas_WMDE>	 you can still test it now, just in case any follow-up fixes are necessary ^^
[15:02:06] <anzx>	 Testing 
[15:02:22] <Lucas_WMDE>	 ok, thanks!
[15:02:39] <Lucas_WMDE>	 and I’ll run namespaceDupes in a moment, apparently there are 0 pages but 82 links to fix
[15:03:27] <anzx>	 Lucas_WMDE: looks good 
[15:03:33] <Lucas_WMDE>	 great, thanks!
[15:03:46] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:1020242|mlwiki: create draft namespace (T362653)]] (duration: 32m 43s)
[15:03:59] <stashbot>	 T362653: Create Draft Namespace in Malayalam Wikipedia - https://phabricator.wikimedia.org/T362653
[15:04:21] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes mlwiki --fix # T362653: 0 pages to fix, 0 were resolvable; 82 links to fix, 82 were resolvable, 0 were deleted.
[15:04:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:05:20] <Lucas_WMDE>	 !log UTC afternoon backport+config window (belatedly) done
[15:05:22] * Lucas_WMDE done
[15:05:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:06:33] <vgutierrez>	 !log repool ncredir2001
[15:06:34] <anzx>	 Lucas_WMDE: thank you 
[15:06:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:06:55] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [software/bitu] - https://gerrit.wikimedia.org/r/1018256 (https://phabricator.wikimedia.org/T361066) (owner: Slyngshede)
[15:07:58] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
[15:08:06] <wikibugs>	 ops-codfw, ops-eqiad, SRE, SRE-swift-storage, and 2 others: Reimage cookbook on new eqiad hosts stuck at PXE booting - https://phabricator.wikimedia.org/T350179#9722796 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1002 for host cp1114.eqiad.wmnet with OS bul...
[15:08:24] <wikibugs>	 (PS8) Cathal Mooney: DNS zone changes for new Magru prefixes [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421)
[15:09:19] <wikibugs>	 (CR) CI reject: [V:-1] DNS zone changes for new Magru prefixes [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421) (owner: Cathal Mooney)
[15:09:45] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp1115.eqiad.wmnet
[15:12:51] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
[15:13:13] <wikibugs>	 (CR) Muehlenhoff: Initial documentation for the Bitu API. (8 comments) [software/bitu] - https://gerrit.wikimedia.org/r/1020802 (owner: Slyngshede)
[15:13:14] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
[15:13:36] <wikibugs>	 (PS2) Ladsgroup: mariadb: Set up dedicated cumin user [puppet] - https://gerrit.wikimedia.org/r/1020830
[15:15:04] <wikibugs>	 (CR) Hnowlan: [C:+1] kubernetes: move 6 appservers from codfw [puppet] - https://gerrit.wikimedia.org/r/1020852 (https://phabricator.wikimedia.org/T351074) (owner: Clément Goubert)
[15:16:39] <wikibugs>	 (CR) CI reject: [V:-1] mariadb: Set up dedicated cumin user [puppet] - https://gerrit.wikimedia.org/r/1020830 (owner: Ladsgroup)
[15:16:41] <wikibugs>	 (CR) Ladsgroup: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1020830 (owner: Ladsgroup)
[15:16:44] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
[15:16:46] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
[15:16:53] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2127 (T360332)', diff saved to https://phabricator.wikimedia.org/P60792 and previous config saved to /var/cache/conftool/dbconfig/20240417-151653-arnaudb.json
[15:17:22] <wikibugs>	 SRE, serviceops, Data Products (Data Products Sprint 12), Service-deployment-requests: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production - https://phabricator.wikimedia.org/T361835#9722869 (WDoranWMF)
[15:17:27] <stashbot>	 T360332: Make the cupe_actor column nullable on WMF wikis - https://phabricator.wikimedia.org/T360332
[15:17:28] <wikibugs>	 (PS3) Ladsgroup: mariadb: Set up dedicated cumin user [puppet] - https://gerrit.wikimedia.org/r/1020830
[15:17:54] <wikibugs>	 (CR) Fabfur: [C:+1] "I double checked the PTRs for ip6 and looks good now" [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421) (owner: Cathal Mooney)
[15:18:12] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db2120 depool T358741', diff saved to https://phabricator.wikimedia.org/P60793 and previous config saved to /var/cache/conftool/dbconfig/20240417-151811-arnaudb.json
[15:18:27] <stashbot>	 T358741: Decommission db2096-db2120 - https://phabricator.wikimedia.org/T358741
[15:18:40] <jinxer-wm>	 (KubernetesRsyslogDown) firing: rsyslog on mw2412:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw2412 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[15:20:09] <wikibugs>	 (PS1) Arnaudb: mariadb: remove db2120 [puppet] - https://gerrit.wikimedia.org/r/1020716 (https://phabricator.wikimedia.org/T358741)
[15:20:25] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156 (T361627)', diff saved to https://phabricator.wikimedia.org/P60794 and previous config saved to /var/cache/conftool/dbconfig/20240417-152023-marostegui.json
[15:20:38] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[15:21:02] <wikibugs>	 ops-codfw, ops-eqiad, SRE, SRE-swift-storage, and 2 others: Reimage cookbook on new eqiad hosts stuck at PXE booting - https://phabricator.wikimedia.org/T350179#9722934 (ssingh) Open→Resolved @Papaul deserves a lot of love for fixing this persistent issue. The 21.x firmware (specifica...
[15:22:39] <wikibugs>	 (CR) Muehlenhoff: Keymanagement, fix parsing and display of FIDO/U2F keys (1 comment) [software/bitu] - https://gerrit.wikimedia.org/r/1020836 (owner: Slyngshede)
[15:23:13] <wikibugs>	 (CR) Ssingh: [C:+1] "Looks good, thanks!" [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421) (owner: Cathal Mooney)
[15:25:49] <wikibugs>	 ops-codfw, ops-eqiad, SRE, SRE-swift-storage, and 2 others: Reimage cookbook on new eqiad hosts stuck at PXE booting - https://phabricator.wikimedia.org/T350179#9722986 (MoritzMuehlenhoff) >>! In T350179#9722934, @ssingh wrote: > @Papaul deserves a lot of love for fixing this persistent issue...
[15:26:49] <wikibugs>	 ops-codfw, ops-eqiad, SRE, SRE-swift-storage, and 2 others: Reimage cookbook on new eqiad hosts stuck at PXE booting - https://phabricator.wikimedia.org/T350179#9722990 (MatthewVernon) +1 to thanks to Papaul for getting to the bottom of this!
[15:27:09] <wikibugs>	 SRE, Infrastructure-Foundations, Mail, MediaWiki-Email: Old "Email this user" email is repeatedly resent - https://phabricator.wikimedia.org/T361860#9722988 (jhathaway) Given that this has reoccurred and from the emails you provided looks to be duplication on the application layer I think we need...
[15:27:59] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
[15:28:03] <domas>	 ROB!
[15:28:20] <domas>	 Can you please unplug all cables!
[15:30:30] <topranks>	 !log making magru IPs live in netbox and generating DNS records with cookbook T362421
[15:30:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:30:40] <stashbot>	 T362421: magru network setup - https://phabricator.wikimedia.org/T362421
[15:31:34] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.dns.netbox
[15:31:50] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
[15:32:05] <wikibugs>	 (PS1) Jdlrobson: Upstream tablet infobox styles [extensions/WikimediaMessages] (wmf/1.42.0-wmf.26) - https://gerrit.wikimedia.org/r/1020736 (https://phabricator.wikimedia.org/T3603861)
[15:32:38] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2127 (T360332)', diff saved to https://phabricator.wikimedia.org/P60795 and previous config saved to /var/cache/conftool/dbconfig/20240417-153238-arnaudb.json
[15:32:40] <wikibugs>	 (PS1) Jdlrobson: Upstream tablet infobox styles [extensions/WikimediaMessages] (wmf/1.43.0-wmf.1) - https://gerrit.wikimedia.org/r/1020737 (https://phabricator.wikimedia.org/T3603861)
[15:32:43] <stashbot>	 T360332: Make the cupe_actor column nullable on WMF wikis - https://phabricator.wikimedia.org/T360332
[15:32:53] <wikibugs>	 SRE, LDAP-Access-Requests, Patch-For-Review: Grant Access to Superset for aitolkyn - https://phabricator.wikimedia.org/T362533#9723032 (ssingh) Open→Resolved a:ssingh @Aitolkyn I am marking this as resolved but if that's not the case, please re-open it again thanks!
[15:33:04] <wikibugs>	 (CR) Marostegui: [C:+1] mariadb: remove db2120 [puppet] - https://gerrit.wikimedia.org/r/1020716 (https://phabricator.wikimedia.org/T358741) (owner: Arnaudb)
[15:33:20] <wikibugs>	 (CR) Arnaudb: [C:+2] mariadb: remove db2120 [puppet] - https://gerrit.wikimedia.org/r/1020716 (https://phabricator.wikimedia.org/T358741) (owner: Arnaudb)
[15:33:44] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
[15:34:34] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
[15:34:34] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:34:50] <wikibugs>	 (PS1) Jdlrobson: Enable WikimediaSkinStyles on English Wikipedia Vector 2022 skin [mediawiki-config] - https://gerrit.wikimedia.org/r/1020854 (https://phabricator.wikimedia.org/T362726)
[15:35:31] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.decommission for hosts db2120.codfw.wmnet
[15:36:54] <wikibugs>	 (CR) Ladsgroup: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1020830 (owner: Ladsgroup)
[15:39:29] <wikibugs>	 (CR) Fabfur: [C:+1] "ok for me" [dns] - https://gerrit.wikimedia.org/r/1020827 (https://phabricator.wikimedia.org/T346722) (owner: Ssingh)
[15:40:39] <topranks>	 !log merging patch and updating dns servers with new magru ranges T362421
[15:40:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:51] <stashbot>	 T362421: magru network setup - https://phabricator.wikimedia.org/T362421
[15:41:27] <wikibugs>	 (PS9) Cathal Mooney: DNS zone changes for new Magru ranges [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421)
[15:42:10] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.dns.netbox
[15:42:14] <wikibugs>	 (CR) CI reject: [V:-1] DNS zone changes for new Magru ranges [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421) (owner: Cathal Mooney)
[15:44:41] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2120.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
[15:45:48] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2120.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
[15:45:48] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:45:49] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2120.codfw.wmnet
[15:47:19] <wikibugs>	 SRE, Machine-Learning-Team, MW-on-K8s, serviceops, Patch-For-Review: Migrate ml-services to mw-api-int - https://phabricator.wikimedia.org/T362316#9723080 (elukey) Added some thoughts to T353622#9723070, I found out a big can of worms while testing staging :) The upgrade is more complex than...
[15:50:48] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.dns.netbox
[15:51:38] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1114.eqiad.wmnet with OS bullseye
[15:52:37] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
[15:53:07] <wikibugs>	 ops-codfw, ops-eqiad, SRE, SRE-swift-storage, and 2 others: Reimage cookbook on new eqiad hosts stuck at PXE booting - https://phabricator.wikimedia.org/T350179#9723209 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1002 for host cp1114.eqiad.wmnet with OS bul...
[15:53:28] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
[15:53:28] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:53:59] <wikibugs>	 (PS10) Cathal Mooney: DNS zone changes for new Magru prefixes [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421)
[15:54:30] <wikibugs>	 (CR) CI reject: [V:-1] DNS zone changes for new Magru prefixes [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421) (owner: Cathal Mooney)
[15:54:48] <wikibugs>	 (PS1) Arnaudb: mariadb: removes db2119 [puppet] - https://gerrit.wikimedia.org/r/1020717 (https://phabricator.wikimedia.org/T362790)
[15:56:43] <wikibugs>	 (CR) Ladsgroup: "https://puppet-compiler.wmflabs.org/output/1020830/830/cumin1002.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/1020830 (owner: Ladsgroup)
[15:57:41] <wikibugs>	 (CR) Marostegui: [C:+1] mariadb: removes db2119 [puppet] - https://gerrit.wikimedia.org/r/1020717 (https://phabricator.wikimedia.org/T362790) (owner: Arnaudb)
[15:57:57] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.dns.netbox
[15:59:45] <wikibugs>	 SRE, conftool, Data-Persistence, Infrastructure-Foundations: Integrate dbctl IP changes as part of VLAN changes. - https://phabricator.wikimedia.org/T360029#9723345 (CDanis) >>! In T360029#9722005, @Ladsgroup wrote: >>>! In T360029#9658042, @CDanis wrote: > Actually the idea is that dbctl should...
[15:59:49] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
[16:00:24] <wikibugs>	 (CR) Btullis: [C:+2] Migrate geo-analytics to use the new aqs-http-gateway chart [deployment-charts] - https://gerrit.wikimedia.org/r/1014659 (https://phabricator.wikimedia.org/T360531) (owner: Btullis)
[16:00:40] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
[16:00:40] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:01:00] <wikibugs>	 (CR) Arnaudb: [C:+2] mariadb: removes db2119 [puppet] - https://gerrit.wikimedia.org/r/1020717 (https://phabricator.wikimedia.org/T362790) (owner: Arnaudb)
[16:01:22] <wikibugs>	 (Merged) jenkins-bot: Migrate geo-analytics to use the new aqs-http-gateway chart [deployment-charts] - https://gerrit.wikimedia.org/r/1014659 (https://phabricator.wikimedia.org/T360531) (owner: Btullis)
[16:02:21] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.decommission for hosts db2119.codfw.wmnet
[16:03:08] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/geo-analytics: apply
[16:03:27] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
[16:03:47] <logmsgbot>	 !log btullis@deploy1002 helmfile [codfw] START helmfile.d/services/geo-analytics: apply
[16:03:56] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.dns.netbox
[16:04:06] <logmsgbot>	 !log btullis@deploy1002 helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
[16:04:12] <logmsgbot>	 !log btullis@deploy1002 helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
[16:04:16] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp1114.eqiad.wmnet,service=(cdn|ats-be)
[16:04:39] <logmsgbot>	 !log btullis@deploy1002 helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
[16:04:44] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db2119 depool T358741', diff saved to https://phabricator.wikimedia.org/P60796 and previous config saved to /var/cache/conftool/dbconfig/20240417-160443-arnaudb.json
[16:04:52] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P60797 and previous config saved to /var/cache/conftool/dbconfig/20240417-160451-arnaudb.json
[16:04:59] <stashbot>	 T358741: Decommission db2096-db2120 - https://phabricator.wikimedia.org/T358741
[16:05:01] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P60798 and previous config saved to /var/cache/conftool/dbconfig/20240417-160501-marostegui.json
[16:05:57] <logmsgbot>	 !log cdanis@cumin1002 conftool action : set/host_ip=69.69.69.69; selector: name=db1211
[16:06:02] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
[16:06:06] <logmsgbot>	 !log cdanis@cumin1002 conftool action : set/host_ip=10.64.16.8; selector: name=db1211
[16:07:15] <logmsgbot>	 !log cdanis@cumin1002 conftool action : set/host_ip=1.1.1.1; selector: name=db1211
[16:07:21] <logmsgbot>	 !log cdanis@cumin1002 conftool action : set/host_ip=10.64.16.8; selector: name=db1211
[16:08:03] <wikibugs>	 SRE, conftool, Data-Persistence, Infrastructure-Foundations: Integrate dbctl IP changes as part of VLAN changes. - https://phabricator.wikimedia.org/T360029#9723385 (CDanis) @Marostegui As it turns out, plain old `confctl` can be used to do this already.  You can for instance do `sh sudo confctl...
[16:08:35] <wikibugs>	 (CR) Cathal Mooney: "recheck" [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421) (owner: Cathal Mooney)
[16:08:40] <jinxer-wm>	 (KubernetesRsyslogDown) resolved: rsyslog on mw2412:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw2412 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[16:08:40] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
[16:08:40] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:08:44] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.dns.netbox
[16:09:39] <cdanis>	 !log above conftool actions had no impact on production, no dbctl config commit was performed.
[16:09:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:10:08] <wikibugs>	 (CR) Elukey: [C:+1] "The CI's diff is lovely, now the new ca file rendered is the one from values.yaml, namely the rootCa configured for the session store prod" [deployment-charts] - https://gerrit.wikimedia.org/r/1020356 (https://phabricator.wikimedia.org/T352647) (owner: Eevans)
[16:10:08] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:10:09] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2119.codfw.wmnet
[16:10:58] <wikibugs>	 (CR) Ladsgroup: "According to my reporting, the new user should be everywhere with all the rights." [puppet] - https://gerrit.wikimedia.org/r/1020830 (owner: Ladsgroup)
[16:12:21] <wikibugs>	 ops-codfw, decommission-hardware, Patch-For-Review: decommission db2119.codfw.wmnet - https://phabricator.wikimedia.org/T362790#9723389 (ABran-WMF) a:ABran-WMF→None
[16:13:21] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.dns.netbox
[16:13:40] <jinxer-wm>	 (KubernetesRsyslogDown) firing: rsyslog on mw2412:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw2412 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[16:14:42] <claime>	 !log restarted rsyslog on mw2412 - T357616
[16:14:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:14:47] <stashbot>	 T357616: Logs from containers sometimes not visible in logstash - https://phabricator.wikimedia.org/T357616
[16:17:41] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
[16:18:33] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
[16:18:33] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:18:40] <jinxer-wm>	 (KubernetesRsyslogDown) resolved: rsyslog on mw2412:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw2412 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[16:19:59] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P60799 and previous config saved to /var/cache/conftool/dbconfig/20240417-161958-arnaudb.json
[16:20:12] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P60800 and previous config saved to /var/cache/conftool/dbconfig/20240417-162008-marostegui.json
[16:21:58] <wikibugs>	 SRE, conftool, Data-Persistence, Infrastructure-Foundations: Integrate dbctl IP changes as part of VLAN changes. - https://phabricator.wikimedia.org/T360029#9723417 (Marostegui) That's awesome!! Then I guess the cookbook to orchestrate all this can be done? Do we need something else?
[16:24:21] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.dns.netbox
[16:24:36] <wikibugs>	 (PS7) Btullis: Migrate image-suggestions to use the new aqs-http-gateway chart [deployment-charts] - https://gerrit.wikimedia.org/r/1014660 (https://phabricator.wikimedia.org/T360531)
[16:24:52] <wikibugs>	 (PS7) JHathaway: Postfix profile [puppet] - https://gerrit.wikimedia.org/r/1019131 (https://phabricator.wikimedia.org/T325398)
[16:24:52] <wikibugs>	 (CR) Btullis: [C:+2] Migrate media-analytics to use the new aqs-http-gateway chart [deployment-charts] - https://gerrit.wikimedia.org/r/1014661 (https://phabricator.wikimedia.org/T360531) (owner: Btullis)
[16:25:10] <wikibugs>	 (PS7) Btullis: Migrate media-analytics to use the new aqs-http-gateway chart [deployment-charts] - https://gerrit.wikimedia.org/r/1014661 (https://phabricator.wikimedia.org/T360531)
[16:25:44] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:27:08] <wikibugs>	 (CR) Btullis: [V:+2 C:+2] Migrate media-analytics to use the new aqs-http-gateway chart [deployment-charts] - https://gerrit.wikimedia.org/r/1014661 (https://phabricator.wikimedia.org/T360531) (owner: Btullis)
[16:27:22] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.dns.netbox
[16:28:01] <wikibugs>	 (CR) CI reject: [V:-1] Postfix profile [puppet] - https://gerrit.wikimedia.org/r/1019131 (https://phabricator.wikimedia.org/T325398) (owner: JHathaway)
[16:28:03] <wikibugs>	 (Merged) jenkins-bot: Migrate media-analytics to use the new aqs-http-gateway chart [deployment-charts] - https://gerrit.wikimedia.org/r/1014661 (https://phabricator.wikimedia.org/T360531) (owner: Btullis)
[16:29:00] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
[16:29:51] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
[16:29:51] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:30:38] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.dns.netbox
[16:35:06] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2127 (T360332)', diff saved to https://phabricator.wikimedia.org/P60801 and previous config saved to /var/cache/conftool/dbconfig/20240417-163506-arnaudb.json
[16:35:20] <stashbot>	 T360332: Make the cupe_actor column nullable on WMF wikis - https://phabricator.wikimedia.org/T360332
[16:35:20] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156 (T361627)', diff saved to https://phabricator.wikimedia.org/P60802 and previous config saved to /var/cache/conftool/dbconfig/20240417-163518-marostegui.json
[16:35:22] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
[16:35:25] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
[16:35:32] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2177 (T361627)', diff saved to https://phabricator.wikimedia.org/P60803 and previous config saved to /var/cache/conftool/dbconfig/20240417-163532-marostegui.json
[16:35:32] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[16:35:42] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
[16:36:33] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
[16:36:33] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:36:40] <wikibugs>	 (PS11) Cathal Mooney: DNS zone changes for new Magru prefixes [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421)
[16:37:35] <wikibugs>	 (CR) CI reject: [V:-1] DNS zone changes for new Magru prefixes [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421) (owner: Cathal Mooney)
[16:38:31] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/media-analytics: apply
[16:38:50] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] DONE helmfile.d/services/media-analytics: apply
[16:39:00] <logmsgbot>	 !log btullis@deploy1002 helmfile [codfw] START helmfile.d/services/media-analytics: apply
[16:39:19] <logmsgbot>	 !log btullis@deploy1002 helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
[16:39:32] <logmsgbot>	 !log btullis@deploy1002 helmfile [eqiad] START helmfile.d/services/media-analytics: apply
[16:39:48] <logmsgbot>	 !log btullis@deploy1002 helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
[16:40:50] <wikibugs>	 (PS7) Btullis: Migrate page-analytics to use the new aqs-http-gateway chart [deployment-charts] - https://gerrit.wikimedia.org/r/1014662 (https://phabricator.wikimedia.org/T360531)
[16:41:56] <wikibugs>	 (CR) Btullis: [C:+2] Migrate page-analytics to use the new aqs-http-gateway chart [deployment-charts] - https://gerrit.wikimedia.org/r/1014662 (https://phabricator.wikimedia.org/T360531) (owner: Btullis)
[16:42:55] <wikibugs>	 (Merged) jenkins-bot: Migrate page-analytics to use the new aqs-http-gateway chart [deployment-charts] - https://gerrit.wikimedia.org/r/1014662 (https://phabricator.wikimedia.org/T360531) (owner: Btullis)
[16:44:57] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/page-analytics: apply
[16:45:17] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] DONE helmfile.d/services/page-analytics: apply
[16:45:22] <logmsgbot>	 !log btullis@deploy1002 helmfile [codfw] START helmfile.d/services/page-analytics: apply
[16:45:41] <logmsgbot>	 !log btullis@deploy1002 helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
[16:46:41] <logmsgbot>	 !log btullis@deploy1002 helmfile [eqiad] START helmfile.d/services/page-analytics: apply
[16:47:07] <logmsgbot>	 !log btullis@deploy1002 helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
[16:50:35] <wikibugs>	 (PS12) Cathal Mooney: DNS zone changes for new Magru prefixes [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421)
[16:53:51] <wikibugs>	 (PS13) Cathal Mooney: DNS zone changes for new Magru prefixes [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421)
[16:55:34] <wikibugs>	 (CR) Cathal Mooney: [C:+2] DNS zone changes for new Magru prefixes [dns] - https://gerrit.wikimedia.org/r/1020196 (https://phabricator.wikimedia.org/T362421) (owner: Cathal Mooney)
[16:56:34] <topranks>	 !log running authdns-update to make magru dns records live T362421
[16:56:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:56:39] <stashbot>	 T362421: magru network setup - https://phabricator.wikimedia.org/T362421
[16:56:48] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177 (T361627)', diff saved to https://phabricator.wikimedia.org/P60804 and previous config saved to /var/cache/conftool/dbconfig/20240417-165647-marostegui.json
[16:56:52] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[16:59:54] <wikibugs>	 (PS4) Ssingh: magru: add geo-resources and update wikimedia.org zone [dns] - https://gerrit.wikimedia.org/r/1020827 (https://phabricator.wikimedia.org/T346722)
[17:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240417T1700)
[17:01:29] <wikibugs>	 SRE, conftool, Data-Persistence, Infrastructure-Foundations: Integrate dbctl IP changes as part of VLAN changes. - https://phabricator.wikimedia.org/T360029#9723602 (CDanis) I think you should be able to use the existing spicerack interface to confctl to do the `set/host_ip=...` action -- that sh...
[17:01:38] <wikibugs>	 ops-magru, DC-Ops, Traffic: Q4:rack/setup/install cp70[01-16] - https://phabricator.wikimedia.org/T362729#9723606 (RobH) >>! In T362729#9721040, @ssingh wrote: > Thanks for the task @RobH! As in the previous runs, please feel free to leave these for Traffic: >  > ` >  Update the operations/puppet rep...
[17:08:14] <wikibugs>	 (PS8) CDobbins: purged: add PKI cert handling [puppet] - https://gerrit.wikimedia.org/r/1019866
[17:09:40] <wikibugs>	 (CR) CDobbins: [V:+1] "PCC SUCCESS (DIFF 1 CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1019866 (owner: CDobbins)
[17:11:55] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P60805 and previous config saved to /var/cache/conftool/dbconfig/20240417-171154-marostegui.json
[17:14:06] <wikibugs>	 (CR) Ssingh: [C:+2] magru: add geo-resources and update wikimedia.org zone [dns] - https://gerrit.wikimedia.org/r/1020827 (https://phabricator.wikimedia.org/T346722) (owner: Ssingh)
[17:14:27] <sukhe>	 !log running authdns-update for adding magru geo-resources/IPs: T346722
[17:14:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:14:32] <stashbot>	 T346722: Sao Paulo, Brazil, South America POP tracking task - https://phabricator.wikimedia.org/T346722
[17:18:13] <wikibugs>	 (PS1) Jforrester: wikifunctions: Configure prometheus endpoints on both services [deployment-charts] - https://gerrit.wikimedia.org/r/1020872
[17:20:16] <wikibugs>	 (CR) Jforrester: [C:-1] "https://integration.wikimedia.org/ci/job/helm-lint/16896/console shows no diff (except for the chart version), so this is probably wrong?" [deployment-charts] - https://gerrit.wikimedia.org/r/1020872 (owner: Jforrester)
[17:21:28] <wikibugs>	 (PS9) CDobbins: purged: add PKI cert handling [puppet] - https://gerrit.wikimedia.org/r/1019866
[17:22:30] <wikibugs>	 (PS1) Jforrester: wikifunctions: Upgrade orchestrator from 2024-04-17-125039 to 2024-04-17-163312 [deployment-charts] - https://gerrit.wikimedia.org/r/1020874
[17:22:50] <wikibugs>	 (CR) CDobbins: [V:+1] "PCC SUCCESS (CORE_DIFF 1 DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1019866 (owner: CDobbins)
[17:27:03] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P60807 and previous config saved to /var/cache/conftool/dbconfig/20240417-172702-marostegui.json
[17:37:55] <wikibugs>	 SRE, conftool, Data-Persistence, Infrastructure-Foundations: Integrate dbctl IP changes as part of VLAN changes. - https://phabricator.wikimedia.org/T360029#9723736 (Ladsgroup) >>! In T360029#9723602, @CDanis wrote: > I don't see why you couldn't do a simple `subprocess.run` to do a commit, proba...
[17:42:11] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177 (T361627)', diff saved to https://phabricator.wikimedia.org/P60808 and previous config saved to /var/cache/conftool/dbconfig/20240417-174210-marostegui.json
[17:42:13] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2190.codfw.wmnet with reason: Maintenance
[17:42:17] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[17:42:26] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2190.codfw.wmnet with reason: Maintenance
[17:42:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2190 (T361627)', diff saved to https://phabricator.wikimedia.org/P60809 and previous config saved to /var/cache/conftool/dbconfig/20240417-174233-marostegui.json
[17:44:44] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to 'wmf' ldap group for DErenrich to allow logstash access - https://phabricator.wikimedia.org/T362731#9723754 (ssingh) @NBaca-WMF: This needs your approval, thanks!
[17:52:33] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ldap/wmf  for kgraessle - https://phabricator.wikimedia.org/T362812 (Kgraessle) NEW
[17:54:31] <wikibugs>	 (PS1) Ssingh: admin: add derenrich to ldap_only_users [puppet] - https://gerrit.wikimedia.org/r/1020879 (https://phabricator.wikimedia.org/T362731)
[17:57:28] <logmsgbot>	 !log ebernhardson@deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[17:57:35] <logmsgbot>	 !log ebernhardson@deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[17:59:32] <logmsgbot>	 !log ebernhardson@deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[17:59:37] <logmsgbot>	 !log ebernhardson@deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[18:00:04] <jouncebot>	 dancy and hashar: Deploy window MediaWiki train - Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240417T1800)
[18:00:58] <dancy>	 o/
[18:01:27] <wikibugs>	 (PS1) Ssingh: admin: add kgraessle to ldap_only_users [puppet] - https://gerrit.wikimedia.org/r/1020881 (https://phabricator.wikimedia.org/T362812)
[18:01:46] * dancy browses logspam
[18:03:28] <wikibugs>	 SRE, LDAP-Access-Requests, Patch-For-Review: Grant Access to ldap/wmf  for kgraessle - https://phabricator.wikimedia.org/T362812#9723831 (ssingh) @DMburugu: this requires your approval, thanks!
[18:03:47] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2190 (T361627)', diff saved to https://phabricator.wikimedia.org/P60810 and previous config saved to /var/cache/conftool/dbconfig/20240417-180346-marostegui.json
[18:03:50] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:03:54] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[18:11:29] <wikibugs>	 (PS1) Jdlrobson: Enable night mode in AMC for all projects [mediawiki-config] - https://gerrit.wikimedia.org/r/1020883 (https://phabricator.wikimedia.org/T361555)
[18:13:50] <jinxer-wm>	 (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:14:02] <wikibugs>	 (PS1) Jdlrobson: Enable limited width on all main pages [mediawiki-config] - https://gerrit.wikimedia.org/r/1020886 (https://phabricator.wikimedia.org/T357706)
[18:18:20] <wikibugs>	 (PS1) TrainBranchBot: group1 wikis to 1.43.0-wmf.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/1020889 (https://phabricator.wikimedia.org/T361395)
[18:18:22] <wikibugs>	 (CR) TrainBranchBot: [C:+2] group1 wikis to 1.43.0-wmf.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/1020889 (https://phabricator.wikimedia.org/T361395) (owner: TrainBranchBot)
[18:18:54] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P60812 and previous config saved to /var/cache/conftool/dbconfig/20240417-181854-marostegui.json
[18:19:05] <wikibugs>	 (Merged) jenkins-bot: group1 wikis to 1.43.0-wmf.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/1020889 (https://phabricator.wikimedia.org/T361395) (owner: TrainBranchBot)
[18:34:02] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P60813 and previous config saved to /var/cache/conftool/dbconfig/20240417-183401-marostegui.json
[18:35:33] <logmsgbot>	 !log dancy@deploy1002 rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.1  refs T361395
[18:35:40] <stashbot>	 T361395: 1.43.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T361395
[18:43:46] <wikibugs>	 (PS1) Cathal Mooney: Remove comment added in error [dns] - https://gerrit.wikimedia.org/r/1020901 (https://phabricator.wikimedia.org/T362421)
[18:49:09] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2190 (T361627)', diff saved to https://phabricator.wikimedia.org/P60814 and previous config saved to /var/cache/conftool/dbconfig/20240417-184908-marostegui.json
[18:49:11] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2194.codfw.wmnet with reason: Maintenance
[18:49:16] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[18:49:24] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2194.codfw.wmnet with reason: Maintenance
[18:49:32] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2194 (T361627)', diff saved to https://phabricator.wikimedia.org/P60815 and previous config saved to /var/cache/conftool/dbconfig/20240417-184931-marostegui.json
[18:50:19] <wikibugs>	 (PS3) Cathal Mooney: Add new BGP group for cross-rack PyBal peerings at L3 POPs [homer/public] - https://gerrit.wikimedia.org/r/1020843 (https://phabricator.wikimedia.org/T362772)
[18:53:11] <dancy>	 Train is blocked on https://phabricator.wikimedia.org/T362817
[18:54:54] <wikibugs>	 (PS2) Cathal Mooney: Adjust LVS config in esams, drmrs to peer bit both ASWs [puppet] - https://gerrit.wikimedia.org/r/1020844 (https://phabricator.wikimedia.org/T362772)
[18:56:40] <jinxer-wm>	 (KubernetesRsyslogDown) firing: rsyslog on kubernetes2040:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubernetes2040 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[18:58:39] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Traffic, Patch-For-Review: ASW single-point of failure for LVS VIPs at POPs - https://phabricator.wikimedia.org/T362772#9724034 (cmooney) I believe the two patches above, once merged, will add the required redundancy.  Following option 1 above, creatin...
[19:02:45] <wikibugs>	 (CR) Ssingh: [C:+1] Remove comment added in error [dns] - https://gerrit.wikimedia.org/r/1020901 (https://phabricator.wikimedia.org/T362421) (owner: Cathal Mooney)
[19:02:45] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Traffic, Patch-For-Review: ASW single-point of failure for LVS VIPs at POPs - https://phabricator.wikimedia.org/T362772#9724049 (cmooney) Perhaps one option would be to ignore the puppet patch to change drmrs and esams for now - but merge the Homer one...
[19:10:44] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2194 (T361627)', diff saved to https://phabricator.wikimedia.org/P60816 and previous config saved to /var/cache/conftool/dbconfig/20240417-191043-marostegui.json
[19:10:49] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[19:11:40] <jinxer-wm>	 (KubernetesRsyslogDown) resolved: rsyslog on kubernetes2040:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubernetes2040 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[19:12:03] <wikibugs>	 SRE, Infrastructure-Foundations, Traffic: Slowly ramping up traffic to the Brazil data center (magru) and related geo-maps - https://phabricator.wikimedia.org/T359054#9724065 (CDanis) I largely agree with Arzhel's assessment.  At a cursory glance, Uruguay or Paraguay look ideal as first candidates....
[19:18:44] <wikibugs>	 (PS1) Majavah: P:toolforge::bastion: install locales-all [puppet] - https://gerrit.wikimedia.org/r/1020906 (https://phabricator.wikimedia.org/T362680)
[19:19:47] <wikibugs>	 (CR) Majavah: [V:+1] "PCC SUCCESS (CORE_DIFF 2 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1020906 (https://phabricator.wikimedia.org/T362680) (owner: Majavah)
[19:25:48] <wikibugs>	 (PS1) Jforrester: Revert "REST: Deprecate using "post" as the parameter source" [core] (wmf/1.43.0-wmf.1) - https://gerrit.wikimedia.org/r/1020910 (https://phabricator.wikimedia.org/T362817)
[19:25:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P60817 and previous config saved to /var/cache/conftool/dbconfig/20240417-192551-marostegui.json
[19:40:59] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P60818 and previous config saved to /var/cache/conftool/dbconfig/20240417-194058-marostegui.json
[19:43:38] <wikibugs>	 (PS1) Ebernhardson: cirrus: Update container image and increase metaspace [deployment-charts] - https://gerrit.wikimedia.org/r/1020913
[19:50:09] <wikibugs>	 (CR) CI reject: [V:-1] Revert "REST: Deprecate using "post" as the parameter source" [core] (wmf/1.43.0-wmf.1) - https://gerrit.wikimedia.org/r/1020910 (https://phabricator.wikimedia.org/T362817) (owner: Jforrester)
[19:50:56] <wikibugs>	 (CR) Jforrester: "recheck" [core] (wmf/1.43.0-wmf.1) - https://gerrit.wikimedia.org/r/1020910 (https://phabricator.wikimedia.org/T362817) (owner: Jforrester)
[19:55:40] <jinxer-wm>	 (KubernetesRsyslogDown) firing: rsyslog on kubernetes2040:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubernetes2040 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[19:56:06] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2194 (T361627)', diff saved to https://phabricator.wikimedia.org/P60819 and previous config saved to /var/cache/conftool/dbconfig/20240417-195605-marostegui.json
[19:56:08] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2209.codfw.wmnet with reason: Maintenance
[19:56:11] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[19:56:21] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2209.codfw.wmnet with reason: Maintenance
[19:56:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2209 (T361627)', diff saved to https://phabricator.wikimedia.org/P60820 and previous config saved to /var/cache/conftool/dbconfig/20240417-195628-marostegui.json
[20:00:05] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: gettimeofday() says it's time for UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240417T2000)
[20:00:05] <jouncebot>	 Jdlrobson: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:37] <Jdlrobson>	 o/
[20:01:23] <cjming>	 o/
[20:01:42] <cjming>	 Jdlrobson: can the 2 pairs of patches go out together? config + backports?
[20:03:19] <wikibugs>	 (CR) Clare Ming: [C:+2] Upstream tablet infobox styles [extensions/WikimediaMessages] (wmf/1.42.0-wmf.26) - https://gerrit.wikimedia.org/r/1020736 (https://phabricator.wikimedia.org/T3603861) (owner: Jdlrobson)
[20:03:22] <wikibugs>	 (CR) Clare Ming: [C:+2] Upstream tablet infobox styles [extensions/WikimediaMessages] (wmf/1.43.0-wmf.1) - https://gerrit.wikimedia.org/r/1020737 (https://phabricator.wikimedia.org/T3603861) (owner: Jdlrobson)
[20:04:31] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by cjming@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1020854 (https://phabricator.wikimedia.org/T362726) (owner: Jdlrobson)
[20:04:31] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by cjming@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1020883 (https://phabricator.wikimedia.org/T361555) (owner: Jdlrobson)
[20:05:02] <Jdlrobson>	 cjming: they can yet
[20:05:04] <Jdlrobson>	 *yes
[20:05:40] <jinxer-wm>	 (KubernetesRsyslogDown) resolved: rsyslog on kubernetes2040:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubernetes2040 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[20:06:10] <wikibugs>	 (Merged) jenkins-bot: Enable WikimediaSkinStyles on English Wikipedia Vector 2022 skin [mediawiki-config] - https://gerrit.wikimedia.org/r/1020854 (https://phabricator.wikimedia.org/T362726) (owner: Jdlrobson)
[20:06:12] <wikibugs>	 (Merged) jenkins-bot: Enable night mode in AMC for all projects [mediawiki-config] - https://gerrit.wikimedia.org/r/1020883 (https://phabricator.wikimedia.org/T361555) (owner: Jdlrobson)
[20:06:46] <logmsgbot>	 !log cjming@deploy1002 Started scap: Backport for [[gerrit:1020854|Enable WikimediaSkinStyles on English Wikipedia Vector 2022 skin (T362726)]], [[gerrit:1020883|Enable night mode in AMC for all projects (T361555)]]
[20:06:52] <stashbot>	 T362726: [config] Enable night mode styles on Vector 2022 skin - https://phabricator.wikimedia.org/T362726
[20:06:53] <stashbot>	 T361555: [Config] Enable night mode for logged in AMC users on mobile for more projects and include template namespace - https://phabricator.wikimedia.org/T361555
[20:09:48] <logmsgbot>	 !log cjming@deploy1002 cjming and jdlrobson: Backport for [[gerrit:1020854|Enable WikimediaSkinStyles on English Wikipedia Vector 2022 skin (T362726)]], [[gerrit:1020883|Enable night mode in AMC for all projects (T361555)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[20:09:52] <cjming>	 Jdlrobson: 1st 2 config patches on test servers
[20:10:39] <Jdlrobson>	 cjming: on it
[20:11:21] <Jdlrobson>	 cjming: and LGTM!
[20:11:28] <logmsgbot>	 !log cjming@deploy1002 cjming and jdlrobson: Continuing with sync
[20:17:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2209 (T361627)', diff saved to https://phabricator.wikimedia.org/P60821 and previous config saved to /var/cache/conftool/dbconfig/20240417-201733-marostegui.json
[20:17:50] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[20:21:15] <wikibugs>	 (Merged) jenkins-bot: Upstream tablet infobox styles [extensions/WikimediaMessages] (wmf/1.42.0-wmf.26) - https://gerrit.wikimedia.org/r/1020736 (https://phabricator.wikimedia.org/T3603861) (owner: Jdlrobson)
[20:23:12] <wikibugs>	 ops-codfw, Data-Persistence, DC-Ops: Q#:rack/setup/install dbproxy200[5-8] - https://phabricator.wikimedia.org/T362824#9724322 (RobH)
[20:24:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 1.303s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[20:24:49] <wikibugs>	 (Merged) jenkins-bot: Upstream tablet infobox styles [extensions/WikimediaMessages] (wmf/1.43.0-wmf.1) - https://gerrit.wikimedia.org/r/1020737 (https://phabricator.wikimedia.org/T3603861) (owner: Jdlrobson)
[20:24:59] <logmsgbot>	 !log cjming@deploy1002 Finished scap: Backport for [[gerrit:1020854|Enable WikimediaSkinStyles on English Wikipedia Vector 2022 skin (T362726)]], [[gerrit:1020883|Enable night mode in AMC for all projects (T361555)]] (duration: 18m 13s)
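The duration scap reports can be cross-checked against the channel timestamps. A minimal Python sketch, assuming the bracketed times are same-day UTC wall-clock values; `elapsed` is a hypothetical helper for illustration, not part of scap:

```python
from datetime import datetime, timedelta


def elapsed(start: str, end: str) -> timedelta:
    """Return the difference between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


# "Started scap" was logged at 20:06:46 and "Finished scap" at 20:24:59.
print(elapsed("20:06:46", "20:24:59"))  # 0:18:13
```

The result matches the "duration: 18m 13s" in the log line above. Channel timestamps record when the bot posted, so a difference of a second or so from scap's own figure would not be surprising.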
[20:25:08] <stashbot>	 T362726: [config] Enable night mode styles on Vector 2022 skin - https://phabricator.wikimedia.org/T362726
[20:25:08] <stashbot>	 T361555: [Config] Enable night mode for logged in AMC users on mobile for more projects and include template namespace - https://phabricator.wikimedia.org/T361555
[20:26:01] <logmsgbot>	 !log cjming@deploy1002 Started scap: Backport for [[gerrit:1020736|Upstream tablet infobox styles (T3603861)]], [[gerrit:1020737|Upstream tablet infobox styles (T3603861)]]
[20:29:02] <logmsgbot>	 !log cjming@deploy1002 cjming and jdlrobson: Backport for [[gerrit:1020736|Upstream tablet infobox styles (T3603861)]], [[gerrit:1020737|Upstream tablet infobox styles (T3603861)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[20:29:07] <cjming>	 Jdlrobson: config patches should be live! backports on test servers
[20:29:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 1.125s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[20:29:39] <Jdlrobson>	 cjming: looking! :D
[20:29:45] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 909.7ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[20:30:48] <Jdlrobson>	 cjming: yep that's working! Please sync!
[20:30:53] <logmsgbot>	 !log cjming@deploy1002 cjming and jdlrobson: Continuing with sync
[20:32:41] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P60822 and previous config saved to /var/cache/conftool/dbconfig/20240417-203241-marostegui.json
[20:35:29] <Jdlrobson>	 thanks cjming - how come this went so much quicker today?! :)
[20:36:28] <cjming>	 yw! shipping in pairs helps lol -- and i +2'd the backports 1st thing
[20:39:01] <Jdlrobson>	 makes sense
[20:39:45] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 1.023s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[20:40:29] <wikibugs>	 (CR) Eevans: [C:+2] {echo,session}store (staging): use wmf-ca-certificates.crt [deployment-charts] - https://gerrit.wikimedia.org/r/1020356 (https://phabricator.wikimedia.org/T352647) (owner: Eevans)
[20:41:30] <wikibugs>	 (Merged) jenkins-bot: {echo,session}store (staging): use wmf-ca-certificates.crt [deployment-charts] - https://gerrit.wikimedia.org/r/1020356 (https://phabricator.wikimedia.org/T352647) (owner: Eevans)
[20:43:32] <logmsgbot>	 !log cjming@deploy1002 Finished scap: Backport for [[gerrit:1020736|Upstream tablet infobox styles (T3603861)]], [[gerrit:1020737|Upstream tablet infobox styles (T3603861)]] (duration: 17m 30s)
[20:43:34] <cjming>	 Jdlrobson: alrighty - backports should be live!
[20:43:35] <logmsgbot>	 !log eevans@deploy1002 helmfile [staging] START helmfile.d/services/sessionstore: apply
[20:44:13] <logmsgbot>	 !log eevans@deploy1002 helmfile [staging] DONE helmfile.d/services/sessionstore: apply
[20:44:19] <cjming>	 !log end of UTC late backport window
[20:44:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:44:31] <logmsgbot>	 !log eevans@deploy1002 helmfile [staging] START helmfile.d/services/echostore: apply
[20:44:45] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 959.3ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[20:44:55] <logmsgbot>	 !log eevans@deploy1002 helmfile [staging] DONE helmfile.d/services/echostore: apply
[20:47:49] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P60823 and previous config saved to /var/cache/conftool/dbconfig/20240417-204748-marostegui.json
[20:48:20] <Jdlrobson>	 thanks cjming !
[20:49:00] <cjming>	 ur welcome!
[20:49:45] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 913.4ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[20:50:49] <wikibugs>	 (CR) Dzahn: [C:+2] create ae.wikimedia.org for United Arab Emirates User Group [dns] - https://gerrit.wikimedia.org/r/1020311 (https://phabricator.wikimedia.org/T362529) (owner: Dzahn)
[20:51:12] <wikibugs>	 (PS3) Dzahn: create ae.wikimedia.org for United Arab Emirates User Group [dns] - https://gerrit.wikimedia.org/r/1020311 (https://phabricator.wikimedia.org/T362529)
[20:55:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 864.3ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[21:00:05] <jouncebot>	 Deploy window Wikifunction Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240417T2100)
[21:00:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 857.1ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[21:02:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2209 (T361627)', diff saved to https://phabricator.wikimedia.org/P60824 and previous config saved to /var/cache/conftool/dbconfig/20240417-210256-marostegui.json
[21:03:02] <stashbot>	 T361627: Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis - https://phabricator.wikimedia.org/T361627
[21:06:29] <wikibugs>	 (CR) Dzahn: "recheck" [dns] - https://gerrit.wikimedia.org/r/1020311 (https://phabricator.wikimedia.org/T362529) (owner: Dzahn)
[21:09:47] <mutante>	 !log DNS - created ae.wikimedia.org for United Arab Emirates User Group wiki - T362529
[21:09:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:09:52] <stashbot>	 T362529: Create a Wikimedians of United Arab Emirates User Group Wiki - https://phabricator.wikimedia.org/T362529
[21:15:40] <jinxer-wm>	 (KubernetesRsyslogDown) firing: rsyslog on mw2413:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw2413 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[21:15:52] <wikibugs>	 (PS1) Zabe: Add Apache configuration for ae.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/1020920 (https://phabricator.wikimedia.org/T362529)
[21:20:16] <wikibugs>	 (CR) Dzahn: [C:+1] Add Apache configuration for ae.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/1020920 (https://phabricator.wikimedia.org/T362529) (owner: Zabe)
[21:20:40] <jinxer-wm>	 (KubernetesRsyslogDown) resolved: rsyslog on mw2413:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw2413 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[21:26:33] <wikibugs>	 (CR) Dzahn: [C:+1] "created in DNS today - user group confirmed - and that we are using country TLD here" [puppet] - https://gerrit.wikimedia.org/r/1020920 (https://phabricator.wikimedia.org/T362529) (owner: Zabe)
[21:29:18] <wikibugs>	 (CR) Dzahn: [C:+1] "follows  I23cb7cd2911ff7" [puppet] - https://gerrit.wikimedia.org/r/1020920 (https://phabricator.wikimedia.org/T362529) (owner: Zabe)
[21:29:48] <wikibugs>	 (CR) Cathal Mooney: [C:+1] "LGTM!" [cookbooks] - https://gerrit.wikimedia.org/r/1020087 (https://phabricator.wikimedia.org/T362421) (owner: Volans)
[21:30:34] <wikibugs>	 (CR) Cathal Mooney: [C:-1] "Acknowledged" [mediawiki-config] - https://gerrit.wikimedia.org/r/1020202 (owner: Ssingh)
[21:31:34] <wikibugs>	 (PS2) Cathal Mooney: Remove comment added in error [dns] - https://gerrit.wikimedia.org/r/1020901 (https://phabricator.wikimedia.org/T362421)
[21:32:47] <wikibugs>	 (CR) Cathal Mooney: [C:+2] Remove comment added in error [dns] - https://gerrit.wikimedia.org/r/1020901 (https://phabricator.wikimedia.org/T362421) (owner: Cathal Mooney)
[21:33:41] <wikibugs>	 (PS34) Ryan Kemper: Add Flink alerts for Cirrus Streaming Updater [alerts] - https://gerrit.wikimedia.org/r/1009359 (https://phabricator.wikimedia.org/T359213) (owner: Bking)
[21:35:13] <wikibugs>	 (CR) CI reject: [V:-1] Add Flink alerts for Cirrus Streaming Updater [alerts] - https://gerrit.wikimedia.org/r/1009359 (https://phabricator.wikimedia.org/T359213) (owner: Bking)
[21:35:40] <jinxer-wm>	 (KubernetesRsyslogDown) firing: rsyslog on mw2414:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw2414 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[21:39:57] <wikibugs>	 (PS35) Bking: Add Flink alerts for Cirrus Streaming Updater [alerts] - https://gerrit.wikimedia.org/r/1009359 (https://phabricator.wikimedia.org/T359213)
[21:40:40] <jinxer-wm>	 (KubernetesRsyslogDown) resolved: rsyslog on mw2414:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw2414 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[21:41:04] <wikibugs>	 (CR) CI reject: [V:-1] Add Flink alerts for Cirrus Streaming Updater [alerts] - https://gerrit.wikimedia.org/r/1009359 (https://phabricator.wikimedia.org/T359213) (owner: Bking)
[21:41:40] <wikibugs>	 (CR) Zabe: [C:+2] Revert "REST: Deprecate using "post" as the parameter source" [core] (wmf/1.43.0-wmf.1) - https://gerrit.wikimedia.org/r/1020910 (https://phabricator.wikimedia.org/T362817) (owner: Jforrester)
[21:42:38] <wikibugs>	 SRE, SRE-swift-storage, Data-Persistence, Thumbor, and 6 others: Change default image thumbnail size - https://phabricator.wikimedia.org/T355914#9724526 (Jdlrobson)
[21:46:42] <wikibugs>	 (CR) Thcipriani: [C:+1] "🎉 no more strange symlink!" [puppet] - https://gerrit.wikimedia.org/r/1020321 (https://phabricator.wikimedia.org/T359643) (owner: Ahmon Dancy)
[21:47:23] <wikibugs>	 (CR) Dzahn: [C:+2] scap.cfg.erb: Disable /srv/mediawiki-staging/php symlink management [puppet] - https://gerrit.wikimedia.org/r/1020321 (https://phabricator.wikimedia.org/T359643) (owner: Ahmon Dancy)
[21:47:48] <mutante>	 jouncebot: nowandnext
[21:47:48] <jouncebot>	 For the next 0 hour(s) and 12 minute(s): Wikifunction Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240417T2100)
[21:47:48] <jouncebot>	 In 8 hour(s) and 12 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240418T0600)
[21:47:49] <jouncebot>	 In 8 hour(s) and 12 minute(s): Primary database switchover (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240418T0600)
[21:50:26] <mutante>	 !log deploying scap config change (gerrit:1020321) - [cumin2002:~] $ sudo cumin -b 4 -s 40 'C:scap AND mw*' 'run-puppet-agent' T359643
[21:50:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:50:30] <stashbot>	 T359643: Get rid of the /srv/mediawiki/php symbolic link - https://phabricator.wikimedia.org/T359643
[21:52:53] <wikibugs>	 (CR) Dzahn: [C:+2] "running puppet on all mw* via cumin, slowly" [puppet] - https://gerrit.wikimedia.org/r/1020321 (https://phabricator.wikimedia.org/T359643) (owner: Ahmon Dancy)
[21:55:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 1.017s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[21:56:49] <wikibugs>	 (PS36) Bking: Add Flink alerts for Cirrus Streaming Updater [alerts] - https://gerrit.wikimedia.org/r/1009359 (https://phabricator.wikimedia.org/T359213)
[21:57:55] <wikibugs>	 (CR) CI reject: [V:-1] Add Flink alerts for Cirrus Streaming Updater [alerts] - https://gerrit.wikimedia.org/r/1009359 (https://phabricator.wikimedia.org/T359213) (owner: Bking)
[22:00:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 870.5ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:02:09] <wikibugs>	 (PS2) Muehlenhoff: Remove now obsolete Hiera host entries for Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020850 (https://phabricator.wikimedia.org/T349619)
[22:02:28] <wikibugs>	 (PS37) Bking: Add Flink alerts for Cirrus Streaming Updater [alerts] - https://gerrit.wikimedia.org/r/1009359 (https://phabricator.wikimedia.org/T359213)
[22:02:30] <wikibugs>	 (CR) Jcrespo: [C:+1] Remove now obsolete Hiera host entries for Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1020850 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[22:03:50] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:03:52] <wikibugs>	 (Merged) jenkins-bot: Revert "REST: Deprecate using "post" as the parameter source" [core] (wmf/1.43.0-wmf.1) - https://gerrit.wikimedia.org/r/1020910 (https://phabricator.wikimedia.org/T362817) (owner: Jforrester)
[22:06:31] <zabe>	 mutante: could you ping me when it is okay to deploy?
[22:07:19] <wikibugs>	 (CR) RLazarus: [C:+1] "I haven't double-checked the policy or approvals or anything, but LGTM for the config change." [puppet] - https://gerrit.wikimedia.org/r/1020920 (https://phabricator.wikimedia.org/T362529) (owner: Zabe)
[22:08:17] <mutante>	 zabe: unless I abort the cumin run, it will take hours, but also no deployments are scheduled for another 8 hours?
[22:08:59] <zabe>	 heh, I +2'ed https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1020910 like 30min ago
[22:10:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 832.4ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:10:23] <jinxer-wm>	 (RdfStreamingUpdaterHighConsumerUpdateLag) firing: (10) wdqs2013:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[22:10:23] <jinxer-wm>	 (RdfStreamingUpdaterHighConsumerUpdateLag) firing: (2) wdqs2014:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[22:10:56] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 19 hosts with reason: T362508
[22:11:03] <stashbot>	 T362508: WDQS updater misbehaving in codfw - https://phabricator.wikimedia.org/T362508
[22:11:27] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 19 hosts with reason: T362508
[22:11:42] <mutante>	 just being cautious, maybe it's not a problem.. but deploying while scap config is being changed seems like it could potentially be messy
[22:11:49] <mutante>	 I can speed it up though
[22:13:50] <jinxer-wm>	 (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:14:24] <rzl>	 mutante: puppet will have run everywhere after 30 min anyhow, right? :)
[22:15:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 805.3ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:15:24] <mutante>	 rzl: that's right, I just canceled cumin
[22:15:36] <mutante>	 zabe: it's ok in ~ 8 minutes
[22:16:01] <mutante>	 checks where it was actually applied
[22:17:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 1.149s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:19:13] <zabe>	 alright
[22:21:14] <wikibugs>	 (CR) Dzahn: [C:+1] "approval via https://phabricator.wikimedia.org/T362529#9713714 and https://meta.wikimedia.org/wiki/Affiliations_Committee/Resolutions/Reco" [puppet] - https://gerrit.wikimedia.org/r/1020920 (https://phabricator.wikimedia.org/T362529) (owner: Zabe)
[22:21:16] <mutante>	 zabe: it's ok right now. scap.cfg was already edited on 141/143  mw* with scap.. and now on all. go ahead.
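The earlier remark that the throttled cumin run "will take hours" can be sanity-checked with rough arithmetic. A minimal Python sketch under loudly stated assumptions: hosts are processed in waves of `-b 4`, each wave costs one puppet run plus the `-s 40` sleep, and a puppet run takes about 120 seconds. The 143 hosts come from the message above; the 120-second figure and the wave model are guesses for illustration, and `estimate_rollout` is a hypothetical helper, not part of cumin:

```python
import math
from datetime import timedelta


def estimate_rollout(hosts: int, batch_size: int,
                     batch_sleep_s: float, run_s: float) -> timedelta:
    """Rough total time if hosts run in waves of `batch_size`.

    Assumption (not taken from cumin's documentation): each wave takes
    one command run (`run_s`) plus one sleep (`batch_sleep_s`).
    """
    waves = math.ceil(hosts / batch_size)
    return timedelta(seconds=waves * (run_s + batch_sleep_s))


# 143 mw* hosts, -b 4, -s 40, and an assumed 120 s per puppet run.
print(estimate_rollout(143, 4, 40, 120))  # 1:36:00
```

Under these assumptions the run lands well above an hour, which is consistent with cancelling it and relying on the regular 30-minute puppet cycle instead.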
[22:22:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 934.3ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:24:32] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 1.36s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:24:51] <logmsgbot>	 !log zabe@deploy1002 Started scap: Backport for [[gerrit:1020910|Revert "REST: Deprecate using "post" as the parameter source" (T362817)]]
[22:25:01] <stashbot>	 T362817: PHP Deprecated: The "post" source is deprecated, use "body" instead [Called from MediaWiki\Rest\Validator\ParamValidatorCallbacks::getValue] - https://phabricator.wikimedia.org/T362817
[22:27:58] <logmsgbot>	 !log zabe@deploy1002 jforrester and zabe: Backport for [[gerrit:1020910|Revert "REST: Deprecate using "post" as the parameter source" (T362817)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[22:28:40] <jinxer-wm>	 (KubernetesRsyslogDown) firing: rsyslog on mw2415:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw2415 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[22:29:08] <logmsgbot>	 !log zabe@deploy1002 jforrester and zabe: Continuing with sync
[22:29:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 1.085s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:33:40] <jinxer-wm>	 (KubernetesRsyslogDown) resolved: rsyslog on mw2415:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw2415 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[22:38:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 1.183s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:42:06] <logmsgbot>	 !log zabe@deploy1002 Finished scap: Backport for [[gerrit:1020910|Revert "REST: Deprecate using "post" as the parameter source" (T362817)]] (duration: 17m 14s)
[22:42:15] <stashbot>	 T362817: PHP Deprecated: The "post" source is deprecated, use "body" instead [Called from MediaWiki\Rest\Validator\ParamValidatorCallbacks::getValue] - https://phabricator.wikimedia.org/T362817
[22:42:34] * zabe done
[22:43:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 828.5ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:45:17] <mutante>	 zabe: nice confirmation that nothing was wrong with scap
[22:48:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 925.5ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:52:07] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1167 (T352010)', diff saved to https://phabricator.wikimedia.org/P60825 and previous config saved to /var/cache/conftool/dbconfig/20240417-225206-ladsgroup.json
[22:52:12] <stashbot>	 T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[22:53:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 925.5ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:54:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 904.2ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:57:37] <wikibugs>	 (PS1) Dzahn: ci: test data_rsync dest host change [puppet] - https://gerrit.wikimedia.org/r/1020949
[22:59:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 831.5ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:01:40] <jinxer-wm>	 (KubernetesRsyslogDown) firing: rsyslog on mw2318:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw2318 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[23:03:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 932.8ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:07:14] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P60826 and previous config saved to /var/cache/conftool/dbconfig/20240417-230714-ladsgroup.json
[23:14:01] <mutante>	 !log rsyncing jenkins data from contint2002 to contint1002, pre-sync in preparation for migration next week - /srv/jenkins (291G) and much smaller zuul and jenkins data dirs T334517
[23:14:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:14:06] <stashbot>	 T334517: upgrade contint servers to bullseye - https://phabricator.wikimedia.org/T334517
[23:17:59] <wikibugs>	 (Abandoned) Dzahn: ci: test data_rsync dest host change [puppet] - https://gerrit.wikimedia.org/r/1020949 (owner: Dzahn)
[23:18:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 818.6ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:19:57] <wikibugs>	 (PS2) Dzahn: create wikipedia-pl-sysop.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/1018747 (https://phabricator.wikimedia.org/T361041)
[23:20:39] <wikibugs>	 (PS3) Dzahn: create wikipedia-pl-sysop.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/1018747 (https://phabricator.wikimedia.org/T361041)
[23:20:51] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1218 (T352010)', diff saved to https://phabricator.wikimedia.org/P60827 and previous config saved to /var/cache/conftool/dbconfig/20240417-232050-ladsgroup.json
[23:20:56] <stashbot>	 T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[23:22:24] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P60828 and previous config saved to /var/cache/conftool/dbconfig/20240417-232221-ladsgroup.json
[23:22:55] <sukhe>	 !log sukhe@cp1114:~$ sudo -i haproxy-restart
[23:22:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:26:00] <wikibugs>	 (PS1) Dzahn: ci: disable zuul merger on contint2002 for migration [puppet] - https://gerrit.wikimedia.org/r/1020950 (https://phabricator.wikimedia.org/T334517)
[23:27:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 848.4ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:31:38] <wikibugs>	 (PS1) Dzahn: switch contint.wikimedia.org from contint2002 to contint1002 [dns] - https://gerrit.wikimedia.org/r/1020951 (https://phabricator.wikimedia.org/T334517)
[23:31:55] <wikibugs>	 (CR) Dzahn: [C:-2] "next week" [puppet] - https://gerrit.wikimedia.org/r/1020950 (https://phabricator.wikimedia.org/T334517) (owner: Dzahn)
[23:32:03] <wikibugs>	 (CR) Dzahn: [C:-2] "next week" [dns] - https://gerrit.wikimedia.org/r/1020951 (https://phabricator.wikimedia.org/T334517) (owner: Dzahn)
[23:35:58] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P60829 and previous config saved to /var/cache/conftool/dbconfig/20240417-233557-ladsgroup.json
[23:37:32] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1167 (T352010)', diff saved to https://phabricator.wikimedia.org/P60830 and previous config saved to /var/cache/conftool/dbconfig/20240417-233731-ladsgroup.json
[23:37:34] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[23:37:37] <stashbot>	 T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[23:37:47] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[23:38:13] <wikibugs>	 (PS1) Dzahn: ci: switch contint manager_host from 2002 to 1002 [puppet] - https://gerrit.wikimedia.org/r/1020954 (https://phabricator.wikimedia.org/T334517)
[23:38:16] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1020721
[23:38:16] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1020721 (owner: TrainBranchBot)
[23:47:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 832.5ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:47:19] <wikibugs>	 (PS1) Dzahn: ci: switch gearman_server IP from contint2002 to contint1002 [puppet] - https://gerrit.wikimedia.org/r/1020955 (https://phabricator.wikimedia.org/T334517)
[23:47:23] <logmsgbot>	 !log amastilovic@deploy1002 Started deploy [airflow-dags/analytics@c9d6969]: (no justification provided)
[23:48:00] <logmsgbot>	 !log amastilovic@deploy1002 Finished deploy [airflow-dags/analytics@c9d6969]: (no justification provided) (duration: 00m 37s)
[23:51:10] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P60831 and previous config saved to /var/cache/conftool/dbconfig/20240417-235105-ladsgroup.json
[23:52:36] <wikibugs>	 (PS1) Dzahn: ci: switch source and destination server for data rsync [puppet] - https://gerrit.wikimedia.org/r/1020957 (https://phabricator.wikimedia.org/T334517)
[23:54:38] <wikibugs>	 (PS2) Dzahn: ci: switch gearman_server IP from contint2002 to contint1002 [puppet] - https://gerrit.wikimedia.org/r/1020955 (https://phabricator.wikimedia.org/T334517)
[23:57:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 845.6ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:59:30] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1020721 (owner: TrainBranchBot)