[00:01:16] <wikibugs>	 (CR) Eccenux: "^" [mediawiki-config] - https://gerrit.wikimedia.org/r/1051469 (https://phabricator.wikimedia.org/T368712) (owner: Wargo)
[00:01:28] <wikibugs>	 (CR) CI reject: [V:-1] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1051486 (owner: TrainBranchBot)
[00:02:07] <wikibugs>	 (PS1) Arlolra: Change Linter log level to info [mediawiki-config] - https://gerrit.wikimedia.org/r/1051487
[00:05:08] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1222 (T367856)', diff saved to https://phabricator.wikimedia.org/P65683 and previous config saved to /var/cache/conftool/dbconfig/20240703-000506-marostegui.json
[00:05:10] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
[00:05:13] <stashbot>	 T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856
[00:05:24] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
[00:06:45] <wikibugs>	 (CR) Arlolra: Change Linter log level to info (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1051487 (owner: Arlolra)
[00:09:38] <wikibugs>	 (CR) Arlolra: [C:-1] Change Linter log level to info [mediawiki-config] - https://gerrit.wikimedia.org/r/1051487 (owner: Arlolra)
[00:15:13] <wikibugs>	 (PS2) Arlolra: Change Linter log level to info [mediawiki-config] - https://gerrit.wikimedia.org/r/1051487
[00:15:15] <wikibugs>	 ops-eqiad, SRE, Cassandra, DC-Ops: Degraded RAID on aqs1013 - https://phabricator.wikimedia.org/T362033#9947564 (wiki_willy) Hi @Eevans - since we've replaced all hardware parts on this host, and the error is still showing up, it doesn't seem like it's a hardware problem.  It's also really odd th...
[00:16:00] <wikibugs>	 (CR) Arlolra: Change Linter log level to info [mediawiki-config] - https://gerrit.wikimedia.org/r/1051487 (owner: Arlolra)
[00:16:16] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_drmrs
[00:20:11] <wikibugs>	 (PS1) RLazarus: deployment_server: Add a daily systemd timer for mwscript_cleanup [puppet] - https://gerrit.wikimedia.org/r/1051489 (https://phabricator.wikimedia.org/T341553)
[00:27:06] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_drmrs
[01:15:53] <icinga-wm>	 PROBLEM - Disk space on restbase2023 is CRITICAL: DISK CRITICAL - free space: /srv/sdb4 96068 MB (5% inode=99%): /srv/sdc4 68760 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=restbase2023&var-datasource=codfw+prometheus/ops
[01:16:41] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
[01:16:54] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
[01:17:01] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65684 and previous config saved to /var/cache/conftool/dbconfig/20240703-011701-marostegui.json
[01:17:05] <stashbot>	 T364069: Rebuild pagelinks tables - https://phabricator.wikimedia.org/T364069
[01:17:57] <wikibugs>	 (CR) Scott French: [C:+1] "Had to check that "1 day" is a valid time span definition :) LGTM" [puppet] - https://gerrit.wikimedia.org/r/1051489 (https://phabricator.wikimedia.org/T341553) (owner: RLazarus)
[01:25:41] <wikibugs>	 (CR) Scott French: [C:+1] "Hmmm ... actually, I ran a PCC diff on this, and indeed it complains about the interval definition [0]." [puppet] - https://gerrit.wikimedia.org/r/1051489 (https://phabricator.wikimedia.org/T341553) (owner: RLazarus)
[01:54:16] <jinxer-wm>	 FIRING: SystemdUnitFailed: generate_vrts_aliases.service on mx2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:39:16] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:40:11] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s1 on clouddb1017 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 336.73 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[02:48:11] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s1 on clouddb1017 is OK: OK slave_sql_lag Replication lag: 9.81 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[02:50:33] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: generate_vrts_aliases.service on mx2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:59:16] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:04:16] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:47:52] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65685 and previous config saved to /var/cache/conftool/dbconfig/20240703-034751-marostegui.json
[03:47:54] <stashbot>	 T364069: Rebuild pagelinks tables - https://phabricator.wikimedia.org/T364069
[03:56:02] <wikibugs>	 (CR) Krinkle: Handle sso.wikimedia.org domain (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1036245 (https://phabricator.wikimedia.org/T365162) (owner: Gergő Tisza)
[04:00:34] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: generate_vrts_aliases.service on mx-in2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:02:59] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P65686 and previous config saved to /var/cache/conftool/dbconfig/20240703-040258-marostegui.json
[04:03:14] <wikibugs>	 (CR) Krinkle: Handle sso.wikimedia.org domain (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1036245 (https://phabricator.wikimedia.org/T365162) (owner: Gergő Tisza)
[04:03:59] <wikibugs>	 (CR) Krinkle: [C:-1] "Looks like the wgLoadScript comment still applies." [mediawiki-config] - https://gerrit.wikimedia.org/r/1036245 (https://phabricator.wikimedia.org/T365162) (owner: Gergő Tisza)
[04:18:06] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P65687 and previous config saved to /var/cache/conftool/dbconfig/20240703-041805-marostegui.json
[04:19:48] <wikibugs>	 (PS1) Andrew Bogott: deployment-prep mcrouter: replace old memc servers with new ones [puppet] - https://gerrit.wikimedia.org/r/1051499 (https://phabricator.wikimedia.org/T361384)
[04:33:13] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65688 and previous config saved to /var/cache/conftool/dbconfig/20240703-043312-marostegui.json
[04:33:15] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
[04:33:16] <stashbot>	 T364069: Rebuild pagelinks tables - https://phabricator.wikimedia.org/T364069
[04:33:28] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
[04:33:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2216 (T364069)', diff saved to https://phabricator.wikimedia.org/P65689 and previous config saved to /var/cache/conftool/dbconfig/20240703-043335-marostegui.json
[04:46:36] <wikibugs>	 (PS1) Marostegui: db22[21-40]: Add new hosts [puppet] - https://gerrit.wikimedia.org/r/1051500 (https://phabricator.wikimedia.org/T368922)
[04:47:26] <wikibugs>	 (CR) Marostegui: [C:+2] db22[21-40]: Add new hosts [puppet] - https://gerrit.wikimedia.org/r/1051500 (https://phabricator.wikimedia.org/T368922) (owner: Marostegui)
[04:50:18] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2129 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65690 and previous config saved to /var/cache/conftool/dbconfig/20240703-045018-root.json
[04:51:10] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Pool with small weight T365805', diff saved to https://phabricator.wikimedia.org/P65691 and previous config saved to /var/cache/conftool/dbconfig/20240703-045109-marostegui.json
[04:51:12] <stashbot>	 T365805: Test MariaDB 10.11 - https://phabricator.wikimedia.org/T365805
[04:57:55] <wikibugs>	 SRE, Data-Engineering, Dumps-Generation, Data Products (Data Products Sprint 15), and 2 others: Dumps generation without prefetch cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#9947893 (Marostegui) >>! In T368098#9946355, @xcollazo wrote: >>>! In T36809...
[05:05:24] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2129 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65692 and previous config saved to /var/cache/conftool/dbconfig/20240703-050523-root.json
[05:06:02] <wikibugs>	 (PS1) Gerrit maintenance bot: mariadb: Promote db2204 to s2 master [puppet] - https://gerrit.wikimedia.org/r/1051502 (https://phabricator.wikimedia.org/T369130)
[05:06:40] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369130
[05:06:43] <stashbot>	 T369130: Switchover s2 master (db2207 -> db2204) - https://phabricator.wikimedia.org/T369130
[05:06:48] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Set db2204 with weight 0 T369130', diff saved to https://phabricator.wikimedia.org/P65693 and previous config saved to /var/cache/conftool/dbconfig/20240703-050647-root.json
[05:07:04] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369130
[05:07:41] <wikibugs>	 (CR) Marostegui: [C:+2] mariadb: Promote db2204 to s2 master [puppet] - https://gerrit.wikimedia.org/r/1051502 (https://phabricator.wikimedia.org/T369130) (owner: Gerrit maintenance bot)
[05:14:19] <wikibugs>	 (PS1) Marostegui: site.pp: Add db22[21-40] [puppet] - https://gerrit.wikimedia.org/r/1051504 (https://phabricator.wikimedia.org/T368922)
[05:14:53] <wikibugs>	 (CR) Marostegui: [C:+2] site.pp: Add db22[21-40] [puppet] - https://gerrit.wikimedia.org/r/1051504 (https://phabricator.wikimedia.org/T368922) (owner: Marostegui)
[05:20:06] <marostegui>	 !log Starting s2 codfw failover from db2207 to db2204 - T369130
[05:20:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:20:11] <stashbot>	 T369130: Switchover s2 master (db2207 -> db2204) - https://phabricator.wikimedia.org/T369130
[05:20:30] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote db2204 to s2 primary T369130', diff saved to https://phabricator.wikimedia.org/P65694 and previous config saved to /var/cache/conftool/dbconfig/20240703-052029-root.json
[05:20:36] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65695 and previous config saved to /var/cache/conftool/dbconfig/20240703-052035-root.json
[05:21:19] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db2207 T369130', diff saved to https://phabricator.wikimedia.org/P65696 and previous config saved to /var/cache/conftool/dbconfig/20240703-052118-root.json
[05:22:58] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Long schema change
[05:23:01] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Long schema change
[05:23:48] <marostegui>	 !log Deploy schema change on db2207 s2 codfw dbmaint T367856
[05:23:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:23:51] <stashbot>	 T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856
[05:23:58] <wikibugs>	 SRE, serviceops, Data Products (Data Products Sprint 15), Patch-For-Review, Service-deployment-requests: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production - https://phabricator.wikimedia.org/T361835#9947951 (SGupta-WMF) @xcollazo The column renaming is done to match api outp...
[05:35:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65697 and previous config saved to /var/cache/conftool/dbconfig/20240703-053541-root.json
[05:50:47] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65698 and previous config saved to /var/cache/conftool/dbconfig/20240703-055046-root.json
[06:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T0600)
[06:04:21] <jinxer-wm>	 FIRING: PoolcounterFullQueues: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[06:05:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65699 and previous config saved to /var/cache/conftool/dbconfig/20240703-060552-root.json
[06:09:21] <jinxer-wm>	 RESOLVED: PoolcounterFullQueues: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[06:20:58] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65700 and previous config saved to /var/cache/conftool/dbconfig/20240703-062057-root.json
[06:43:58] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.dns.netbox
[06:46:38] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 208.80.152.129 - ayounsi@cumin1002"
[06:47:36] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 208.80.152.129 - ayounsi@cumin1002"
[06:47:36] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[06:58:00] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2216 (T364069)', diff saved to https://phabricator.wikimedia.org/P65701 and previous config saved to /var/cache/conftool/dbconfig/20240703-065759-marostegui.json
[06:58:04] <stashbot>	 T364069: Rebuild pagelinks tables - https://phabricator.wikimedia.org/T364069
[06:59:02] <wikibugs>	 (CR) Slyngshede: [C:+2] LDAP key sync: Improvements to SSH key sync with LDAP. [software/bitu] - https://gerrit.wikimedia.org/r/1051293 (https://phabricator.wikimedia.org/T366525) (owner: Slyngshede)
[07:00:05] <jouncebot>	 Amir1 and Urbanecm: OwO what's this, a deployment window?? UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T0700). nyaa~
[07:00:05] <jouncebot>	 wargo: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:00:32] <wikibugs>	 (Merged) jenkins-bot: LDAP key sync: Improvements to SSH key sync with LDAP. [software/bitu] - https://gerrit.wikimedia.org/r/1051293 (https://phabricator.wikimedia.org/T366525) (owner: Slyngshede)
[07:01:54] <kart_>	 Can I deploy MinT since there are no patches to deploy in the backport/config?
[07:03:20] <kart_>	 1.. 2.. 3.. seems no one deploying. I'll go ahead.
[07:03:53] <wikibugs>	 (CR) KartikMistry: [C:+2] Update MinT to 2024-07-02-060114-production [deployment-charts] - https://gerrit.wikimedia.org/r/1051290 (https://phabricator.wikimedia.org/T364525) (owner: KartikMistry)
[07:04:45] <wikibugs>	 (Merged) jenkins-bot: Update MinT to 2024-07-02-060114-production [deployment-charts] - https://gerrit.wikimedia.org/r/1051290 (https://phabricator.wikimedia.org/T364525) (owner: KartikMistry)
[07:07:29] <logmsgbot>	 !log kartik@deploy1002 helmfile [staging] START helmfile.d/services/machinetranslation: apply
[07:12:06] <logmsgbot>	 !log kartik@deploy1002 helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
[07:13:07] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P65702 and previous config saved to /var/cache/conftool/dbconfig/20240703-071306-marostegui.json
[07:14:09] <logmsgbot>	 !log kartik@deploy1002 helmfile [codfw] START helmfile.d/services/machinetranslation: apply
[07:17:43] <wikibugs>	 (CR) Arnaudb: [C:+1] DHCP: send subnet-mask 255.255.255.255 for routed ganeti VMs [puppet] - https://gerrit.wikimedia.org/r/1051366 (https://phabricator.wikimedia.org/T300152) (owner: Ayounsi)
[07:21:59] <logmsgbot>	 !log kartik@deploy1002 helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
[07:23:40] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
[07:24:49] <wikibugs>	 (CR) Superpes15: [C:-1] "It doesn't work like this, you have to follow logos/README.md and run Tox, thanks" [mediawiki-config] - https://gerrit.wikimedia.org/r/1051469 (https://phabricator.wikimedia.org/T368712) (owner: Wargo)
[07:28:14] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P65704 and previous config saved to /var/cache/conftool/dbconfig/20240703-072814-marostegui.json
[07:31:51] <wikibugs>	 (PS1) JMeybohm: kubernetes: Remove etcd_urls from wikikube clusters [puppet] - https://gerrit.wikimedia.org/r/1051678 (https://phabricator.wikimedia.org/T353464)
[07:32:13] <wikibugs>	 (CR) CI reject: [V:-1] kubernetes: Remove etcd_urls from wikikube clusters [puppet] - https://gerrit.wikimedia.org/r/1051678 (https://phabricator.wikimedia.org/T353464) (owner: JMeybohm)
[07:32:19] <wikibugs>	 (CR) JMeybohm: "Feel free to merge as you see fit" [puppet] - https://gerrit.wikimedia.org/r/1051678 (https://phabricator.wikimedia.org/T353464) (owner: JMeybohm)
[07:32:36] <wikibugs>	 (PS2) JMeybohm: kubernetes: Remove etcd_urls from wikikube clusters [puppet] - https://gerrit.wikimedia.org/r/1051678 (https://phabricator.wikimedia.org/T353464)
[07:33:09] <wikibugs>	 (CR) JMeybohm: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1051678 (https://phabricator.wikimedia.org/T353464) (owner: JMeybohm)
[07:33:38] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
[07:34:09] <wikibugs>	 (CR) Ayounsi: [C:+2] DHCP: send subnet-mask 255.255.255.255 for routed ganeti VMs [puppet] - https://gerrit.wikimedia.org/r/1051366 (https://phabricator.wikimedia.org/T300152) (owner: Ayounsi)
[07:36:57] <kart_>	 !log Updated MinT to 2024-07-02-060114-production (T364525)
[07:37:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:37:00] <stashbot>	 T364525: Ignore extra spaces form source text in the MinT test instance - https://phabricator.wikimedia.org/T364525
[07:38:41] <wikibugs>	 (PS1) Brouberol: OpenJDK: build JDK/JDE 17 production images [docker-images/production-images] - https://gerrit.wikimedia.org/r/1051679 (https://phabricator.wikimedia.org/T363461)
[07:40:11] <wikibugs>	 (PS2) Brouberol: OpenJDK: build JDK/JRE 17 production images [docker-images/production-images] - https://gerrit.wikimedia.org/r/1051679 (https://phabricator.wikimedia.org/T363461)
[07:43:22] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2216 (T364069)', diff saved to https://phabricator.wikimedia.org/P65705 and previous config saved to /var/cache/conftool/dbconfig/20240703-074321-marostegui.json
[07:43:25] <stashbot>	 T364069: Rebuild pagelinks tables - https://phabricator.wikimedia.org/T364069
[07:47:57] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] logstash: route thumbor logs in routing filter [puppet] - https://gerrit.wikimedia.org/r/1051214 (https://phabricator.wikimedia.org/T368180) (owner: Cwhite)
[07:48:24] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] logstash: add curator delete job for ecs-k8s indices [puppet] - https://gerrit.wikimedia.org/r/1051427 (https://phabricator.wikimedia.org/T368186) (owner: Cwhite)
[07:50:41] <wikibugs>	 (CR) Volans: "Given the comment from Andrew Otto on task I think it's fine with just Research as approvers." [puppet] - https://gerrit.wikimedia.org/r/1049239 (https://phabricator.wikimedia.org/T276465) (owner: Dzahn)
[07:52:25] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[07:52:38] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[07:52:46] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1169 (T364069)', diff saved to https://phabricator.wikimedia.org/P65706 and previous config saved to /var/cache/conftool/dbconfig/20240703-075245-marostegui.json
[07:52:48] <stashbot>	 T364069: Rebuild pagelinks tables - https://phabricator.wikimedia.org/T364069
[07:57:35] <wikibugs>	 (PS2) Filippo Giunchedi: shellboxen: enable mesh tracing [deployment-charts] - https://gerrit.wikimedia.org/r/1043085 (https://phabricator.wikimedia.org/T320563)
[08:00:05] <jouncebot>	 hashar and jeena: Deploy window MediaWiki train - Utc-0+Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T0800)
[08:00:41] <wikibugs>	 ops-eqiad, SRE, Cloud-VPS, DC-Ops, cloud-services-team (FY2023/2024-Q3-Q4): cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#9948101 (dcaro) Doing some tests this morning with rados bench from several of the nodes.  Running on 12 osd nodes...
[08:00:57] <hashar>	 jouncebot: now
[08:00:57] <jouncebot>	 For the next 1 hour(s) and 59 minute(s): MediaWiki train - Utc-0+Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T0800)
[08:00:58] <hashar>	 hi
[08:00:59] <hashar>	 ;)
[08:02:57] <wikibugs>	 (PS1) TrainBranchBot: group1 wikis to 1.43.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/1051681 (https://phabricator.wikimedia.org/T366957)
[08:02:59] <wikibugs>	 (CR) TrainBranchBot: [C:+2] group1 wikis to 1.43.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/1051681 (https://phabricator.wikimedia.org/T366957) (owner: TrainBranchBot)
[08:03:40] <wikibugs>	 (Merged) jenkins-bot: group1 wikis to 1.43.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/1051681 (https://phabricator.wikimedia.org/T366957) (owner: TrainBranchBot)
[08:04:56] <wikibugs>	 (CR) Filippo Giunchedi: "Cleaning up my queue, feel free to add me again as needed" [puppet] - https://gerrit.wikimedia.org/r/912872 (owner: Majavah)
[08:05:12] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s1 on clouddb1017 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 324.90 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[08:05:37] <wikibugs>	 (CR) Filippo Giunchedi: "Cleaning up my queue, feel free to add me again as needed" [puppet] - https://gerrit.wikimedia.org/r/966804 (https://phabricator.wikimedia.org/T288053) (owner: Majavah)
[08:09:22] <logmsgbot>	 !log brouberol@cumin1002 START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
[08:09:24] <logmsgbot>	 !log brouberol@cumin1002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host karapace1001.eqiad.wmnet
[08:09:40] <logmsgbot>	 !log brouberol@cumin1002 START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
[08:10:05] <wikibugs>	 (PS1) JMeybohm: Add securityContext to istio components [deployment-charts] - https://gerrit.wikimedia.org/r/1051685 (https://phabricator.wikimedia.org/T362978)
[08:10:47] <wikibugs>	 (CR) JMeybohm: "Differences in manifests are:" [deployment-charts] - https://gerrit.wikimedia.org/r/1051685 (https://phabricator.wikimedia.org/T362978) (owner: JMeybohm)
[08:11:00] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Give more weight to db2136 - running 10.11 T365805', diff saved to https://phabricator.wikimedia.org/P65707 and previous config saved to /var/cache/conftool/dbconfig/20240703-081059-marostegui.json
[08:11:03] <stashbot>	 T365805: Test MariaDB 10.11 - https://phabricator.wikimedia.org/T365805
[08:11:25] <logmsgbot>	 !log hashar@deploy1002 rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.12  refs T366957
[08:11:27] <stashbot>	 T366957: 1.43.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T366957
[08:15:12] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s1 on clouddb1017 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[08:18:38] <logmsgbot>	 !log brouberol@cumin1002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host karapace1001.eqiad.wmnet
[08:20:49] <wikibugs>	 (CR) Cathal Mooney: [C:+1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/1051458 (https://phabricator.wikimedia.org/T362330) (owner: Ayounsi)
[08:22:36] <logmsgbot>	 !log brouberol@cumin1002 START - Cookbook sre.hosts.reboot-single for host karapace1002.eqiad.wmnet
[08:22:41] <wikibugs>	 (CR) Elukey: [C:+1] "LGTM! I guess that a similar thing should be done for istio sidecars in ML-land, adding Tobias as FYI." [deployment-charts] - https://gerrit.wikimedia.org/r/1051685 (https://phabricator.wikimedia.org/T362978) (owner: JMeybohm)
[08:23:13] <wikibugs>	 SRE, LDAP-Access-Requests, Patch-For-Review: Update terms and timeline of access already granted for AndyRussG - https://phabricator.wikimedia.org/T367681#9948167 (WMDECyn) hello @Dzahn , @AndyRussG  WMDE email address is: andrew.green@extern.wikimedia.de in case this is still required.
[08:23:43] <wikibugs>	 (CR) JMeybohm: [C:+2] Add securityContext to istio components [deployment-charts] - https://gerrit.wikimedia.org/r/1051685 (https://phabricator.wikimedia.org/T362978) (owner: JMeybohm)
[08:24:18] <wikibugs>	 (Merged) jenkins-bot: Add securityContext to istio components [deployment-charts] - https://gerrit.wikimedia.org/r/1051685 (https://phabricator.wikimedia.org/T362978) (owner: JMeybohm)
[08:31:04] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations, Puppet-Infrastructure, and 2 others: Migrate puppet merges to a cookbook - https://phabricator.wikimedia.org/T366355#9948184 (elukey) Reporting some thoughts from IRC:  ` 10:48  <elukey> Generic question about the future of puppet-merge, I'll write some...
[08:31:46] <logmsgbot>	 !log brouberol@cumin1002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host karapace1002.eqiad.wmnet
[08:35:44] <logmsgbot>	 !log brouberol@cumin1002 START - Cookbook sre.hosts.reboot-single for host kafka-stretch1001.eqiad.wmnet
[08:36:28] <godog>	 jouncebot: now and next
[08:36:29] <jouncebot>	 For the next 1 hour(s) and 23 minute(s): MediaWiki train - Utc-0+Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T0800)
[08:36:36] <wikibugs>	 (CR) Ayounsi: [C:+2] Routed Ganeti: add public v4 tap_ip [puppet] - https://gerrit.wikimedia.org/r/1051458 (https://phabricator.wikimedia.org/T362330) (owner: Ayounsi)
[08:36:48] <godog>	 I'm going ahead with a few mesh tracing patches for non-mw services btw
[08:37:09] <wikibugs>	 (CR) Filippo Giunchedi: [V:+2 C:+2] shellboxen: enable mesh tracing [deployment-charts] - https://gerrit.wikimedia.org/r/1043085 (https://phabricator.wikimedia.org/T320563) (owner: Filippo Giunchedi)
[08:38:31] <logmsgbot>	 !log filippo@deploy1002 helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
[08:38:55] <wikibugs>	 (PS1) Jgiannelos: pcs: Connect to eventgate staging using ip for debugging [deployment-charts] - https://gerrit.wikimedia.org/r/1051688
[08:39:01] <logmsgbot>	 !log filippo@deploy1002 helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
[08:39:06] <logmsgbot>	 !log filippo@deploy1002 helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
[08:39:37] <logmsgbot>	 !log filippo@deploy1002 helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
[08:39:59] <logmsgbot>	 !log filippo@deploy1002 helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
[08:40:11] <wikibugs>	 (PS2) Jgiannelos: pcs: Connect to eventgate staging using ipv4 [deployment-charts] - https://gerrit.wikimedia.org/r/1051688
[08:40:13] <logmsgbot>	 !log filippo@deploy1002 helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
[08:40:13] <logmsgbot>	 !log filippo@deploy1002 helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
[08:40:22] <logmsgbot>	 !log filippo@deploy1002 helmfile [codfw] START helmfile.d/services/shellbox-media: apply
[08:40:29] <logmsgbot>	 !log filippo@deploy1002 helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[08:40:35] <logmsgbot>	 !log filippo@deploy1002 helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
[08:40:37] <logmsgbot>	 !log filippo@deploy1002 helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
[08:40:47] <wikibugs>	 (03PS1) 10JMeybohm: Add securityContext to opentelemetry pods [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051690 (https://phabricator.wikimedia.org/T362978)
[08:40:48] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.ganeti.makevm for new host testvm2008.wikimedia.org
[08:40:49] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.dns.netbox
[08:40:51] <logmsgbot>	 !log filippo@deploy1002 helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[08:40:57] <logmsgbot>	 !log filippo@deploy1002 helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
[08:41:20] <logmsgbot>	 !log filippo@deploy1002 helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
[08:41:23] <logmsgbot>	 !log filippo@deploy1002 helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[08:41:27] <logmsgbot>	 !log filippo@deploy1002 helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
[08:41:28] <logmsgbot>	 !log filippo@deploy1002 helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
[08:41:54] <logmsgbot>	 !log filippo@deploy1002 helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
[08:41:58] <logmsgbot>	 !log filippo@deploy1002 helmfile [codfw] START helmfile.d/services/shellbox: apply
[08:42:13] <logmsgbot>	 !log brouberol@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1001.eqiad.wmnet
[08:42:38] <logmsgbot>	 !log filippo@deploy1002 helmfile [codfw] DONE helmfile.d/services/shellbox: apply
[08:42:39] <logmsgbot>	 !log filippo@deploy1002 helmfile [eqiad] START helmfile.d/services/shellbox: apply
[08:42:54] <Lucas_WMDE>	 how do I clean up after a broken mwscript-k8s command?
[08:42:59] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "consider having interface primary from facts instead of hiera. Can't think of a VM in toolforge with an interface different than interface" [puppet] - 10https://gerrit.wikimedia.org/r/1051444 (https://phabricator.wikimedia.org/T311905) (owner: 10Andrew Bogott)
[08:43:04] <Lucas_WMDE>	 there’s a broken pod in `kube_env mw-script eqiad` now
[08:43:15] <Lucas_WMDE>	 (`mwscript-cleanup --dry-run eqiad` prints no output…)
[08:43:16] <logmsgbot>	 !log filippo@deploy1002 helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
[08:43:24] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2008.wikimedia.org - ayounsi@cumin1002"
[08:43:32] <wikibugs>	 (03CR) 10Brouberol: [C:03+1] "Let's find out!" [puppet] - 10https://gerrit.wikimedia.org/r/1051415 (https://phabricator.wikimedia.org/T367076) (owner: 10Kamila Součková)
[08:44:26] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2008.wikimedia.org - ayounsi@cumin1002"
[08:44:26] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:44:26] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.dns.wipe-cache testvm2008.wikimedia.org on all recursors
[08:44:29] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2008.wikimedia.org on all recursors
[08:44:42] <logmsgbot>	 !log brouberol@cumin1002 START - Cookbook sre.hosts.reboot-single for host kafka-stretch1002.eqiad.wmnet
[08:44:43] <wikibugs>	 (03PS4) 10Filippo Giunchedi: Allow running CI in a container when using rootless podman [deployment-charts] - 10https://gerrit.wikimedia.org/r/1040218 (owner: 10Giuseppe Lavagetto)
[08:44:43] <wikibugs>	 (03PS3) 10Filippo Giunchedi: wikifeeds: enable mesh tracing [deployment-charts] - 10https://gerrit.wikimedia.org/r/1043078 (https://phabricator.wikimedia.org/T320563)
[08:44:43] <wikibugs>	 (03PS2) 10Filippo Giunchedi: mobileapps: enable mesh tracing [deployment-charts] - 10https://gerrit.wikimedia.org/r/1043107 (https://phabricator.wikimedia.org/T320563)
[08:44:58] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2008.wikimedia.org - ayounsi@cumin1002"
[08:45:49] <wikibugs>	 10ops-eqiad, 06SRE, 10Cloud-VPS, 06DC-Ops, 10cloud-services-team (FY2023/2024-Q3-Q4): cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#9948214 (10dcaro) To minimize the routers load I'm going to use a spread-out set of nodes for the tests and try agai...
[08:45:52] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2008.wikimedia.org - ayounsi@cumin1002"
[08:45:59] <Lucas_WMDE>	 okay, if I add --debug I can see that mwscript-cleanup is skipping release r72z2aop because the job completed recently
[08:46:47] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+2] wikifeeds: enable mesh tracing [deployment-charts] - 10https://gerrit.wikimedia.org/r/1043078 (https://phabricator.wikimedia.org/T320563) (owner: 10Filippo Giunchedi)
[08:46:50] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.hosts.reimage for host testvm2008.wikimedia.org with OS bookworm
[08:47:38] <claime>	 Lucas_WMDE: yeah the script skips removing deployments if they're less than 5 minutes old
[08:47:45] <wikibugs>	 (03CR) 10Filippo Giunchedi: "LGTM! Adding Chris too as heads up" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051690 (https://phabricator.wikimedia.org/T362978) (owner: 10JMeybohm)
[08:47:56] <claime>	 (I'm currently reading it)
[08:48:16] <Lucas_WMDE>	 alright, this worked:
[08:48:16] <Lucas_WMDE>	 RELEASE_NAME=r72z2aop helmfile --file /srv/deployment-charts/helmfile.d/services/mw-script/helmfile.yaml --environment eqiad --selector name=r72z2aop destroy
[08:48:28] <Lucas_WMDE>	 just cobbled together from what mwscript-cleanup would’ve done ^^
[08:48:36] <Lucas_WMDE>	 (should I !log that?)
[08:48:44] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V:03+2 C:03+2] wikifeeds: enable mesh tracing [deployment-charts] - 10https://gerrit.wikimedia.org/r/1043078 (https://phabricator.wikimedia.org/T320563) (owner: 10Filippo Giunchedi)
[08:48:49] <wikibugs>	 (03PS4) 10Filippo Giunchedi: wikifeeds: enable mesh tracing [deployment-charts] - 10https://gerrit.wikimedia.org/r/1043078 (https://phabricator.wikimedia.org/T320563)
[08:49:02] <hashar>	 Lucas_WMDE: if in doubt: !log
[08:49:15] <Lucas_WMDE>	 sure
[08:49:26] <Lucas_WMDE>	 !log RELEASE_NAME=r72z2aop helmfile --file /srv/deployment-charts/helmfile.d/services/mw-script/helmfile.yaml --environment eqiad --selector name=r72z2aop destroy # clean up broken mwscript-k8s run I did just to test something
[08:49:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:49:30] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+1] Add securityContext to opentelemetry pods [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051690 (https://phabricator.wikimedia.org/T362978) (owner: 10JMeybohm)
[08:49:32] <hashar>	 maybe that can prove to be helpful later ;)
[08:49:35] <claime>	 Lucas_WMDE: You can yeah. Also drop a message to r.zl to ask why we're not cleaning up fail releases immediately
[08:49:42] <claime>	 failed*
[08:49:53] <wikibugs>	 (03CR) 10JMeybohm: [C:03+2] Add securityContext to opentelemetry pods [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051690 (https://phabricator.wikimedia.org/T362978) (owner: 10JMeybohm)
[08:51:35] <logmsgbot>	 !log brouberol@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1002.eqiad.wmnet
[08:51:40] <Lucas_WMDE>	 claime: I’m filing a few tasks yeah
[08:51:48] <logmsgbot>	 !log brouberol@cumin1002 START - Cookbook sre.hosts.reboot-single for host kafka-stretch2001.codfw.wmnet
[08:53:02] <jayme>	 !log deployed istio (adding securityContext) to wikikube clusters - T362978
[08:53:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:53:05] <stashbot>	 T362978: Update all helm modules and charts to be compatible with the restricted PSS - https://phabricator.wikimedia.org/T362978
[08:54:12] <wikibugs>	 (03Merged) 10jenkins-bot: Add securityContext to opentelemetry pods [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051690 (https://phabricator.wikimedia.org/T362978) (owner: 10JMeybohm)
[08:56:33] <wikibugs>	 10ops-eqiad, 06SRE, 10Cloud-VPS, 06DC-Ops, 10cloud-services-team (FY2023/2024-Q3-Q4): cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#9948349 (10dcaro) using 12 spread nodes hits the discards again:  {F56197512}  and nothing popping up on the disks s...
[08:57:11] <wikibugs>	 (03PS1) 10Matthias Mullie: Handle campaigns where wikibase is not enabled [extensions/UploadWizard] (wmf/1.43.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1051696 (https://phabricator.wikimedia.org/T369085)
[08:57:13] <logmsgbot>	 !log jayme@deploy1002 helmfile [codfw] START helmfile.d/admin 'apply'.
[08:57:14] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V:03+2 C:03+2] wikifeeds: enable mesh tracing [deployment-charts] - 10https://gerrit.wikimedia.org/r/1043078 (https://phabricator.wikimedia.org/T320563) (owner: 10Filippo Giunchedi)
[08:57:37] <Lucas_WMDE>	 claime: created T369142 and T369143 if you’re interested
[08:57:38] <stashbot>	 T369142: Show more useful information when mwscript-k8s fails to launch - https://phabricator.wikimedia.org/T369142
[08:57:38] <stashbot>	 T369143: Allow cleaning up specific mwscript-k8s runs - https://phabricator.wikimedia.org/T369143
[08:58:07] <wikibugs>	 (03CR) 10Matthias Mullie: [C:03+2] Handle campaigns where wikibase is not enabled [extensions/UploadWizard] (wmf/1.43.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1051696 (https://phabricator.wikimedia.org/T369085) (owner: 10Matthias Mullie)
[08:58:37] <logmsgbot>	 !log brouberol@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2001.codfw.wmnet
[08:58:49] <logmsgbot>	 !log jayme@deploy1002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[08:59:25] <claime>	 Lucas_WMDE: Thanks for that
[08:59:59] <logmsgbot>	 !log filippo@deploy1002 helmfile [codfw] START helmfile.d/services/wikifeeds: apply
[09:00:20] <logmsgbot>	 !log filippo@deploy1002 helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
[09:00:21] <logmsgbot>	 !log filippo@deploy1002 helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
[09:00:52] <logmsgbot>	 !log filippo@deploy1002 helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
[09:01:12] <logmsgbot>	 !log jayme@deploy1002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[09:01:19] <logmsgbot>	 !log jayme@deploy1002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[09:01:54] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, July 03 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/UploadWizard] (wmf/1.43.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1051696 (https://phabricator.wikimedia.org/T369085) (owner: 10Matthias Mullie)
[09:02:04] <logmsgbot>	 !log brouberol@cumin1002 START - Cookbook sre.hosts.reboot-single for host kafka-stretch2002.codfw.wmnet
[09:02:41] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2008.wikimedia.org with reason: host reimage
[09:04:06] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Prod-Kubernetes, 06serviceops: kubernetes1051.eqiad.wmnet failed to pull mediawiki images - https://phabricator.wikimedia.org/T369011#9948452 (10JMeybohm) I've deleted the node from the k8s API as a required istio update would not finish successfully because it was waiting...
[09:06:02] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2008.wikimedia.org with reason: host reimage
[09:07:34] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+2] Add DSCP marking options to current firewall classes [puppet] - 10https://gerrit.wikimedia.org/r/1007437 (https://phabricator.wikimedia.org/T339850) (owner: 10Cathal Mooney)
[09:08:19] <wikibugs>	 (03Merged) 10jenkins-bot: Handle campaigns where wikibase is not enabled [extensions/UploadWizard] (wmf/1.43.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1051696 (https://phabricator.wikimedia.org/T369085) (owner: 10Matthias Mullie)
[09:09:06] <logmsgbot>	 !log brouberol@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2002.codfw.wmnet
[09:11:47] <wikibugs>	 10SRE-swift-storage, 10CX-deployments, 10LPL Essential, 10MinT: Provide better long-term storage for translation models - https://phabricator.wikimedia.org/T335491#9948480 (10Pginer-WMF)
[09:13:37] <wikibugs>	 (03CR) 10Elukey: [C:03+1] "I agree 100%" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051407 (https://phabricator.wikimedia.org/T251812) (owner: 10Alexandros Kosiaris)
[09:14:02] <wikibugs>	 (03PS5) 10Filippo Giunchedi: Allow running CI in a container when using rootless podman [deployment-charts] - 10https://gerrit.wikimedia.org/r/1040218 (owner: 10Giuseppe Lavagetto)
[09:14:03] <wikibugs>	 (03PS1) 10Filippo Giunchedi: wikifeeds: lower tracing sample rate [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051699 (https://phabricator.wikimedia.org/T320563)
[09:15:09] <matthiasmullie>	 gah
[09:15:54] <matthiasmullie>	 I accidentally +2'ed (now merged) a patch-to-be-backported later today
[09:16:06] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+1] wikifeeds: lower tracing sample rate [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051699 (https://phabricator.wikimedia.org/T320563) (owner: 10Filippo Giunchedi)
[09:16:23] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V:03+2 C:03+2] wikifeeds: lower tracing sample rate [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051699 (https://phabricator.wikimedia.org/T320563) (owner: 10Filippo Giunchedi)
[09:16:26] <matthiasmullie>	 should we leave it merged (and deploy in couple of hrs), revert, or deploy now?
[09:16:28] <wikibugs>	 (03PS2) 10Filippo Giunchedi: wikifeeds: lower tracing sample rate [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051699 (https://phabricator.wikimedia.org/T320563)
[09:17:06] <Lucas_WMDE>	 matthiasmullie: I’d lean towards “deploy now” if that’s okay with hashar and jeena 
[09:17:15] <wikibugs>	 (03PS4) 10Anzx: mswikisource: create author and translation namespaces and add namespace aliases  [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051503 (https://phabricator.wikimedia.org/T369047)
[09:17:20] <hashar>	 please do yes
[09:17:27] <matthiasmullie>	 will do, thanks
[09:17:40] <matthiasmullie>	 and thanks Lucas_WMDE for pointing it out; wasn't aware I had merged that :D
[09:17:48] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V:03+2 C:03+2] wikifeeds: lower tracing sample rate [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051699 (https://phabricator.wikimedia.org/T320563) (owner: 10Filippo Giunchedi)
[09:17:48] <Lucas_WMDE>	 np ^^
[09:17:53] <hashar>	 hehe
[09:17:54] <Lucas_WMDE>	 I was just randomly looking at the deployment calendar and noticed it
[09:18:09] <Lucas_WMDE>	 (and apparently I happened to look at it like one or two minutes after the merge)
[09:18:13] <hashar>	 if it is already merged, I imagine it is quite quick to deploy it
[09:18:15] <wikibugs>	 (03CR) 10Elukey: [C:03+1] "One question though - should we merge this after removing it from the api-gateway in deployment-charts first?" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051407 (https://phabricator.wikimedia.org/T251812) (owner: 10Alexandros Kosiaris)
[09:18:56] <logmsgbot>	 !log mlitn@deploy1002 Started scap sync-world: Backport for [[gerrit:1051696|Handle campaigns where wikibase is not enabled (T369085)]]
[09:18:59] <stashbot>	 T369085: Cannot upload! – TypeError: Cannot read properties of undefined (reading 'dataValueType') - https://phabricator.wikimedia.org/T369085
[09:19:57] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Give more weight to db2136 - running 10.11 T365805', diff saved to https://phabricator.wikimedia.org/P65709 and previous config saved to /var/cache/conftool/dbconfig/20240703-091956-marostegui.json
[09:19:59] <stashbot>	 T365805: Test MariaDB 10.11 - https://phabricator.wikimedia.org/T365805
[09:20:28] <logmsgbot>	 !log filippo@deploy1002 helmfile [codfw] START helmfile.d/services/wikifeeds: apply
[09:20:35] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2008.wikimedia.org with OS bookworm
[09:20:35] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2008.wikimedia.org
[09:20:42] <logmsgbot>	 !log filippo@deploy1002 helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
[09:20:43] <logmsgbot>	 !log filippo@deploy1002 helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
[09:20:55] <logmsgbot>	 !log filippo@deploy1002 helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
[09:21:29] <logmsgbot>	 !log mlitn@deploy1002 mlitn: Backport for [[gerrit:1051696|Handle campaigns where wikibase is not enabled (T369085)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[09:26:13] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testvm2008 - ayounsi@cumin1002"
[09:26:33] <logmsgbot>	 !log mlitn@deploy1002 mlitn: Continuing with sync
[09:27:30] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testvm2008 - ayounsi@cumin1002"
[09:28:27] <wikibugs>	 10ops-eqiad, 06SRE, 10Cloud-VPS, 06DC-Ops, 10cloud-services-team (FY2023/2024-Q3-Q4): cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#9948601 (10dcaro) Created the data: ` dcaro@cumin1002:~$ sudo cumin -x cloudcephosd[1006,1016,1021].eqiad.wmnet,clou...
[09:31:56] <logmsgbot>	 !log mlitn@deploy1002 Finished scap: Backport for [[gerrit:1051696|Handle campaigns where wikibase is not enabled (T369085)]] (duration: 12m 59s)
[09:32:00] <stashbot>	 T369085: Cannot upload! – TypeError: Cannot read properties of undefined (reading 'dataValueType') - https://phabricator.wikimedia.org/T369085
[09:32:43] <matthiasmullie>	 Lucas_WMDE & hashar - backport of my messed up merge is complete; thanks!
[09:32:53] <Lucas_WMDE>	 \o/ thanks for deploying it ^^
[09:33:32] <matthiasmullie>	 haha; was the least I could do :D
[09:37:29] <wikibugs>	 10SRE-swift-storage, 10CX-deployments, 10LPL Essential, 10MinT: Provide better long-term storage for translation models - https://phabricator.wikimedia.org/T335491#9948650 (10elukey) >>! In T335491#9925777, @santhosh wrote: > @elukey Thanks for these details. Currently in our code, models are downloaded [[...
[09:41:48] <wikibugs>	 10ops-eqiad, 06SRE, 10Cloud-VPS, 06DC-Ops, 10cloud-services-team (FY2023/2024-Q3-Q4): cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#9948660 (10dcaro) Compare the traffic generated when the cluster is rebalancing some data:  {F56198741}  :/
[09:46:38] <hashar>	 matthiasmullie: congratulations! :)
[09:48:50] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: Create Quality of Service design for WMF internal networks - https://phabricator.wikimedia.org/T316358#9948690 (10cmooney) 05Open→03Resolved Gonna close this one as the design is finalised, see detail on wikitech here:  https://wikitech.wikimedia.org/wik...
[09:49:48] <logmsgbot>	 !log andrewtavis-wmde@deploy1002 Started deploy [airflow-dags/wmde@d773cac]: (no justification provided)
[09:49:55] <logmsgbot>	 !log andrewtavis-wmde@deploy1002 Finished deploy [airflow-dags/wmde@d773cac]: (no justification provided) (duration: 00m 07s)
[09:51:08] <wikibugs>	 (03PS1) 10Gmodena: beta: eventbus: enable instrumentation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051709 (https://phabricator.wikimedia.org/T363587)
[09:53:55] <wikibugs>	 (03PS3) 10Jgiannelos: pcs: Connect to eventgate staging using cluster IP [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051688 (https://phabricator.wikimedia.org/T366819)
[09:58:21] <wikibugs>	 (03PS1) 10Clément Goubert: mw-on-k8s: Move php.envvars to mediawiki-common [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051711 (https://phabricator.wikimedia.org/T365265)
[10:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T1000)
[10:01:28] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+1] pcs: Connect to eventgate staging using cluster IP [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051688 (https://phabricator.wikimedia.org/T366819) (owner: 10Jgiannelos)
[10:05:47] <wikibugs>	 (03CR) 10Jgiannelos: [C:03+2] pcs: Connect to eventgate staging using cluster IP [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051688 (https://phabricator.wikimedia.org/T366819) (owner: 10Jgiannelos)
[10:06:42] <wikibugs>	 (03Merged) 10jenkins-bot: pcs: Connect to eventgate staging using cluster IP [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051688 (https://phabricator.wikimedia.org/T366819) (owner: 10Jgiannelos)
[10:12:28] <wikibugs>	 10ops-eqiad, 06SRE, 10Cloud-VPS, 06DC-Ops, 10cloud-services-team (FY2023/2024-Q3-Q4): cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#9948751 (10dcaro) Ok, using 16 nodes, with 64 parallel operations each still does not trigger any issues on the driv...
[10:16:47] <wikibugs>	 (03PS1) 10PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051716
[10:28:48] <wikibugs>	 (03PS1) 10Jgiannelos: mobileapps: Bump staging to latest image [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051719
[10:28:56] <wikibugs>	 (03CR) 10CI reject: [V:04-1] mobileapps: Bump staging to latest image [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051719 (owner: 10Jgiannelos)
[10:29:01] <wikibugs>	 (03PS2) 10Jgiannelos: mobileapps: Bump staging to latest image [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051719
[10:29:13] <wikibugs>	 06SRE, 10SRE-swift-storage, 10Thumbor, 06Traffic: Cache thumbs in our caching infrastructure (e.g. ATS) - https://phabricator.wikimedia.org/T345334#9948790 (10Midleading) Due to T266155, I have to keep refreshing the category page, about 5~10 times, until all 200 thumbnails are generated. Therefore some "c...
[10:30:37] <wikibugs>	 (03CR) 10Jgiannelos: [C:03+2] mobileapps: Bump staging to latest image [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051719 (owner: 10Jgiannelos)
[10:31:25] <wikibugs>	 (03Merged) 10jenkins-bot: mobileapps: Bump staging to latest image [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051719 (owner: 10Jgiannelos)
[10:32:06] <wikibugs>	 (03PS1) 10Btullis: Add an-conf100[4-6] role and partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/1051720 (https://phabricator.wikimedia.org/T364429)
[10:32:35] <logmsgbot>	 !log jgiannelos@deploy1002 helmfile [staging] START helmfile.d/services/mobileapps: apply
[10:32:41] <logmsgbot>	 !log jgiannelos@deploy1002 helmfile [staging] START helmfile.d/services/mobileapps: apply
[10:33:07] <logmsgbot>	 !log jgiannelos@deploy1002 helmfile [staging] DONE helmfile.d/services/mobileapps: apply
[10:36:50] <wikibugs>	 (03PS2) 10Clément Goubert: mw-on-k8s: Move php.envvars to mediawiki-common [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051711 (https://phabricator.wikimedia.org/T365265)
[10:37:55] <wikibugs>	 (03CR) 10Jgiannelos: [C:03+1] Change Linter log level to info [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051487 (owner: 10Arlolra)
[10:38:20] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
[10:38:33] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
[10:38:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65710 and previous config saved to /var/cache/conftool/dbconfig/20240703-103839-marostegui.json
[10:38:43] <stashbot>	 T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856
[10:41:53] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Looks good to me." [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051679 (https://phabricator.wikimedia.org/T363461) (owner: 10Brouberol)
[10:45:19] <wikibugs>	 (03CR) 10Milimetric: [C:03+1] Add wikilambda_zobject_join to puppet script for sqooping Wikifunctions tables [puppet] - 10https://gerrit.wikimedia.org/r/1041817 (https://phabricator.wikimedia.org/T363435) (owner: 10David Martin)
[10:52:39] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+1] mw-on-k8s: Move php.envvars to mediawiki-common [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051711 (https://phabricator.wikimedia.org/T365265) (owner: 10Clément Goubert)
[10:54:53] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Commons, 10MediaWiki-Uploading, and 2 others: 502 Server Hangup Error on esams for "Upload a new version of this file" on Special:Upload on Commons - https://phabricator.wikimedia.org/T247454#9948905 (10Aklapper) 05Stalled→03Invalid Unfortunately closing this Phabricat...
[10:55:31] <wikibugs>	 (03PS3) 10Fabfur: benthos:cache: encode problematic fields as b64url [puppet] - 10https://gerrit.wikimedia.org/r/1051198 (https://phabricator.wikimedia.org/T365718)
[10:58:46] <wikibugs>	 (03PS4) 10Fabfur: benthos:cache: encode problematic fields as b64url [puppet] - 10https://gerrit.wikimedia.org/r/1051198 (https://phabricator.wikimedia.org/T365718)
[10:59:37] <wikibugs>	 (03CR) 10Fabfur: benthos:cache: encode problematic fields as b64url (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1051198 (https://phabricator.wikimedia.org/T365718) (owner: 10Fabfur)
[11:00:01] <wikibugs>	 (03CR) 10Fabfur: "Done" [puppet] - 10https://gerrit.wikimedia.org/r/1051198 (https://phabricator.wikimedia.org/T365718) (owner: 10Fabfur)
[11:00:05] <jouncebot>	 mvolz: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Services – Citoid / Zotero. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T1100).
[11:02:36] <mvolz>	 I moved this window to later today but I guess the bot put it back. Anyway, this window is free :).
[11:03:50] <Lucas_WMDE>	 jouncebot: now
[11:03:50] <jouncebot>	 For the next 0 hour(s) and 56 minute(s): Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T1100)
[11:03:53] <Lucas_WMDE>	 jouncebot: refresh
[11:03:53] <jouncebot>	 I refreshed my knowledge about deployments.
[11:03:55] <Lucas_WMDE>	 jouncebot: now
[11:03:55] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 56 minute(s)
[11:04:04] <Lucas_WMDE>	 I thought it was supposed to auto-refresh before each window, weird
[11:04:22] <Lucas_WMDE>	 maybe it was only done for the backport+config windows? idk
[11:06:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T364069)', diff saved to https://phabricator.wikimedia.org/P65711 and previous config saved to /var/cache/conftool/dbconfig/20240703-110627-marostegui.json
[11:06:31] <stashbot>	 T364069: Rebuild pagelinks tables - https://phabricator.wikimedia.org/T364069
[11:08:47] <Lucas_WMDE>	 ah, IIUC it’ll refresh the *contents* of each window just before notifying about it, but if the window was dropped in the meantime it won’t delete it
[11:12:39] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+2] mw-on-k8s: Move php.envvars to mediawiki-common [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051711 (https://phabricator.wikimedia.org/T365265) (owner: 10Clément Goubert)
[11:14:13] <wikibugs>	 (03Merged) 10jenkins-bot: mw-on-k8s: Move php.envvars to mediawiki-common [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051711 (https://phabricator.wikimedia.org/T365265) (owner: 10Clément Goubert)
[11:15:24] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[11:15:47] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[11:16:55] <logmsgbot>	 !log cgoubert@deploy1002 Started scap sync-world: mw-on-k8s: Move php.envvars to mediawiki-common - T365265
[11:16:58] <stashbot>	 T365265: Create a per-release deployment of statsd-exporter for mw-on-k8s - https://phabricator.wikimedia.org/T365265
[11:18:32] <wikibugs>	 (03PS1) 10Btullis: cephcsi: Grant elevated privileges to the driver-registrar container [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051732 (https://phabricator.wikimedia.org/T327259)
[11:19:54] <wikibugs>	 (03PS2) 10Btullis: cephcsi: Grant elevated privileges to the driver-registrar container [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051732 (https://phabricator.wikimedia.org/T327259)
[11:20:11] <wikibugs>	 (03CR) 10Brouberol: [C:03+1] Add an-conf100[4-6] role and partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/1051720 (https://phabricator.wikimedia.org/T364429) (owner: 10Btullis)
[11:20:22] <wikibugs>	 06SRE, 06Data-Engineering, 10Dumps-Generation, 10Data Products (Data Products Sprint 15), and 2 others: Dumps generation without prefetch cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#9948995 (10Ladsgroup) The explain: ` *************************** 1. row ******...
[11:21:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P65712 and previous config saved to /var/cache/conftool/dbconfig/20240703-112135-marostegui.json
[11:21:45] <logmsgbot>	 !log cgoubert@deploy1002 Finished scap: mw-on-k8s: Move php.envvars to mediawiki-common - T365265 (duration: 05m 22s)
[11:24:53] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P65713 and previous config saved to /var/cache/conftool/dbconfig/20240703-112452-ladsgroup.json
[11:27:09] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
[11:27:22] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
[11:27:29] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P65714 and previous config saved to /var/cache/conftool/dbconfig/20240703-112728-ladsgroup.json
[11:27:32] <stashbot>	 T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[11:27:45] <wikibugs>	 (03CR) 10Vgutierrez: benthos:cache: encode problematic fields as b64url (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1051198 (https://phabricator.wikimedia.org/T365718) (owner: 10Fabfur)
[11:31:56] <wikibugs>	 (03CR) 10Btullis: [C:03+2] OpenJDK: build JDK/JRE 17 production images [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051679 (https://phabricator.wikimedia.org/T363461) (owner: 10Brouberol)
[11:31:59] <wikibugs>	 (03CR) 10Btullis: [V:03+2 C:03+2] OpenJDK: build JDK/JRE 17 production images [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051679 (https://phabricator.wikimedia.org/T363461) (owner: 10Brouberol)
[11:32:57] <Amir1>	 jouncebot: nowandnext
[11:32:57] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 27 minute(s)
[11:32:57] <jouncebot>	 In 1 hour(s) and 27 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T1300)
[11:33:04] <wikibugs>	 (03PS2) 10VolkerE: Optimize static footer 'a Wikimedia project' icon further [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1047521 (https://phabricator.wikimedia.org/T256190)
[11:33:12] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] Optimize static footer 'a Wikimedia project' icon further [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1047521 (https://phabricator.wikimedia.org/T256190) (owner: 10VolkerE)
[11:33:56] <wikibugs>	 (03Merged) 10jenkins-bot: Optimize static footer 'a Wikimedia project' icon further [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1047521 (https://phabricator.wikimedia.org/T256190) (owner: 10VolkerE)
[11:34:48] <wikibugs>	 (03CR) 10Btullis: [C:03+2] Add an-conf100[4-6] role and partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/1051720 (https://phabricator.wikimedia.org/T364429) (owner: 10Btullis)
[11:35:35] <logmsgbot>	 !log ladsgroup@deploy1002 Started scap sync-world: Backport for [[gerrit:1047521|Optimize static footer 'a Wikimedia project' icon further (T256190)]]
[11:35:38] <stashbot>	 T256190: Update footer image links on all MediaWiki skins to be legible and accessible - https://phabricator.wikimedia.org/T256190
[11:36:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P65715 and previous config saved to /var/cache/conftool/dbconfig/20240703-113642-marostegui.json
[11:38:15] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Engineering, 06DC-Ops, 13Patch-For-Review: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#9949035 (10BTullis) a:05BTullis→03Jclark-ctr Hi @Jclark-ctr - apologies for the delay. I've updated the required files, so please feel free to reimag...
[11:38:17] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Engineering, 06DC-Ops, 13Patch-For-Review: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#9949037 (10BTullis)
[11:38:53] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, July 03 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051503 (https://phabricator.wikimedia.org/T369047) (owner: 10Anzx)
[11:39:17] <logmsgbot>	 !log ladsgroup@deploy1002 volker-e, ladsgroup: Backport for [[gerrit:1047521|Optimize static footer 'a Wikimedia project' icon further (T256190)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[11:39:33] <wikibugs>	 (03PS2) 10Anzx: kawikisource: create author namespace, add namespace aliases and sitename [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051733 (https://phabricator.wikimedia.org/T363243)
[11:39:59] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P65716 and previous config saved to /var/cache/conftool/dbconfig/20240703-113958-ladsgroup.json
[11:40:01] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, July 03 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051733 (https://phabricator.wikimedia.org/T363243) (owner: 10Anzx)
[11:40:02] <logmsgbot>	 !log ladsgroup@deploy1002 volker-e, ladsgroup: Continuing with sync
[11:42:57] <wikibugs>	 (03PS1) 10Jcrespo: dbbackups: Set dbprov[12]00[12] to insetup [puppet] - 10https://gerrit.wikimedia.org/r/1051735 (https://phabricator.wikimedia.org/T362509)
[11:44:41] <wikibugs>	 (03CR) 10Jcrespo: "This requires a follow up patch to clean up ip grants from those servers and a careful deploy, but wanted at least to be aware of this." [puppet] - 10https://gerrit.wikimedia.org/r/1051735 (https://phabricator.wikimedia.org/T362509) (owner: 10Jcrespo)
[11:45:03] <logmsgbot>	 !log ladsgroup@deploy1002 Finished scap: Backport for [[gerrit:1047521|Optimize static footer 'a Wikimedia project' icon further (T256190)]] (duration: 09m 28s)
[11:45:06] <stashbot>	 T256190: Update footer image links on all MediaWiki skins to be legible and accessible - https://phabricator.wikimedia.org/T256190
[11:45:49] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1038785 (https://phabricator.wikimedia.org/T363839) (owner: 10Ladsgroup)
[11:46:32] <wikibugs>	 (03Merged) 10jenkins-bot: rpc: Update function call in RunSingleJob [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1038785 (https://phabricator.wikimedia.org/T363839) (owner: 10Ladsgroup)
[11:47:02] <logmsgbot>	 !log ladsgroup@deploy1002 Started scap sync-world: Backport for [[gerrit:1038785|rpc: Update function call in RunSingleJob (T363839)]]
[11:47:04] <wikibugs>	 (03PS2) 10Jcrespo: dbbackups: Set dbprov[12]00[12] to insetup [puppet] - 10https://gerrit.wikimedia.org/r/1051735 (https://phabricator.wikimedia.org/T362509)
[11:47:05] <stashbot>	 T363839: Remove old/unused/internal methods in rdbms library from the public APIs - https://phabricator.wikimedia.org/T363839
[11:48:02] <wikibugs>	 (03PS1) 10Effie Mouzeli: mw-mcrouter: bump number of proxies [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051736 (https://phabricator.wikimedia.org/T346690)
[11:48:13] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: ManagementSSHDown - https://phabricator.wikimedia.org/T368766#9949051 (10phaultfinder)
[11:48:16] <wikibugs>	 (03PS1) 10Brouberol: OpenJRE 17: prevent the openjdk-jre-headless post-inst step from crashing [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051737 (https://phabricator.wikimedia.org/T363461)
[11:48:54] <wikibugs>	 (03CR) 10CI reject: [V:04-1] mw-mcrouter: bump number of proxies [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051736 (https://phabricator.wikimedia.org/T346690) (owner: 10Effie Mouzeli)
[11:49:56] <logmsgbot>	 !log ladsgroup@deploy1002 ladsgroup: Backport for [[gerrit:1038785|rpc: Update function call in RunSingleJob (T363839)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[11:50:01] <logmsgbot>	 !log ladsgroup@deploy1002 ladsgroup: Continuing with sync
[11:50:29] <wikibugs>	 (03PS1) 10Kevin Bazira: ml-services: use MAX_FEATURE_VALS in articlequality [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051738 (https://phabricator.wikimedia.org/T368875)
[11:50:41] <wikibugs>	 (03CR) 10Btullis: OpenJRE 17: prevent the openjdk-jre-headless post-inst step from crashing (031 comment) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051737 (https://phabricator.wikimedia.org/T363461) (owner: 10Brouberol)
[11:51:32] <wikibugs>	 (03PS2) 10Brouberol: OpenJRE 17: prevent the openjdk-jre-headless post-inst step from crashing [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051737 (https://phabricator.wikimedia.org/T363461)
[11:51:42] <wikibugs>	 (03CR) 10Brouberol: OpenJRE 17: prevent the openjdk-jre-headless post-inst step from crashing (031 comment) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051737 (https://phabricator.wikimedia.org/T363461) (owner: 10Brouberol)
[11:51:50] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T364069)', diff saved to https://phabricator.wikimedia.org/P65717 and previous config saved to /var/cache/conftool/dbconfig/20240703-115149-marostegui.json
[11:51:51] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[11:51:53] <stashbot>	 T364069: Rebuild pagelinks tables - https://phabricator.wikimedia.org/T364069
[11:52:04] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[11:52:10] <wikibugs>	 (03PS2) 10Effie Mouzeli: mw-mcrouter: bump number of proxies [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051736 (https://phabricator.wikimedia.org/T346690)
[11:52:12] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1184 (T364069)', diff saved to https://phabricator.wikimedia.org/P65718 and previous config saved to /var/cache/conftool/dbconfig/20240703-115211-marostegui.json
[11:52:36] <wikibugs>	 (03CR) 10Btullis: [C:03+2] OpenJRE 17: prevent the openjdk-jre-headless post-inst step from crashing [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051737 (https://phabricator.wikimedia.org/T363461) (owner: 10Brouberol)
[11:52:38] <wikibugs>	 (03CR) 10Btullis: [V:03+2 C:03+2] OpenJRE 17: prevent the openjdk-jre-headless post-inst step from crashing [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051737 (https://phabricator.wikimedia.org/T363461) (owner: 10Brouberol)
[11:54:33] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+1] copy patch [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051449 (https://phabricator.wikimedia.org/T363407) (owner: 10CDanis)
[11:54:48] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+1] mesh: use namespace for default service name [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051450 (https://phabricator.wikimedia.org/T363407) (owner: 10CDanis)
[11:55:04] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P65719 and previous config saved to /var/cache/conftool/dbconfig/20240703-115504-ladsgroup.json
[11:55:11] <logmsgbot>	 !log ladsgroup@deploy1002 Finished scap: Backport for [[gerrit:1038785|rpc: Update function call in RunSingleJob (T363839)]] (duration: 08m 08s)
[11:55:13] <stashbot>	 T363839: Remove old/unused/internal methods in rdbms library from the public APIs - https://phabricator.wikimedia.org/T363839
[11:55:35] <wikibugs>	 (03PS1) 10Effie Mouzeli: mw-mcouter: use bookworm images [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051740 (https://phabricator.wikimedia.org/T368366)
[11:56:04] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+1] Bump mediawiki chart version & mesh version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051453 (https://phabricator.wikimedia.org/T363407) (owner: 10CDanis)
[11:56:32] <wikibugs>	 (03CR) 10CI reject: [V:04-1] mw-mcouter: use bookworm images [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051740 (https://phabricator.wikimedia.org/T368366) (owner: 10Effie Mouzeli)
[11:59:59] <wikibugs>	 (03PS5) 10Fabfur: benthos:cache: encode problematic fields as b64url [puppet] - 10https://gerrit.wikimedia.org/r/1051198 (https://phabricator.wikimedia.org/T365718)
[12:01:12] <wikibugs>	 (03CR) 10Fabfur: benthos:cache: encode problematic fields as b64url (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1051198 (https://phabricator.wikimedia.org/T365718) (owner: 10Fabfur)
[12:06:03] <wikibugs>	 (03PS2) 10Effie Mouzeli: mw-mcouter: use bookworm images [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051740 (https://phabricator.wikimedia.org/T368366)
[12:06:51] <wikibugs>	 (03PS1) 10Jcrespo: dbbackups: Disable es read-only backups and reenable rw ones [puppet] - 10https://gerrit.wikimedia.org/r/1051744 (https://phabricator.wikimedia.org/T363812)
[12:07:22] <wikibugs>	 (03CR) 10Jcrespo: "FYI" [puppet] - 10https://gerrit.wikimedia.org/r/1051744 (https://phabricator.wikimedia.org/T363812) (owner: 10Jcrespo)
[12:07:55] <wikibugs>	 (03PS3) 10Effie Mouzeli: mw-mcrouter: bump number of proxies [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051736 (https://phabricator.wikimedia.org/T346690)
[12:08:21] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] mw-mcouter: use bookworm images [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051740 (https://phabricator.wikimedia.org/T368366) (owner: 10Effie Mouzeli)
[12:08:40] <wikibugs>	 (03CR) 10Effie Mouzeli: mw-mcouter: use bookworm images [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051740 (https://phabricator.wikimedia.org/T368366) (owner: 10Effie Mouzeli)
[12:08:48] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] mw-mcrouter: bump number of proxies [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051736 (https://phabricator.wikimedia.org/T346690) (owner: 10Effie Mouzeli)
[12:09:17] <wikibugs>	 (03CR) 10Volans: "Will this leave processes on the hosts running that might fail and are not managed anymore by Puppet?" [puppet] - 10https://gerrit.wikimedia.org/r/1051735 (https://phabricator.wikimedia.org/T362509) (owner: 10Jcrespo)
[12:09:41] <wikibugs>	 (03Merged) 10jenkins-bot: mw-mcrouter: bump number of proxies [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051736 (https://phabricator.wikimedia.org/T346690) (owner: 10Effie Mouzeli)
[12:09:56] <wikibugs>	 (03PS7) 10Gergő Tisza: varnish: Copy value of X-Wikimedia-Debug cookie to header [puppet] - 10https://gerrit.wikimedia.org/r/1030591 (https://phabricator.wikimedia.org/T350094)
[12:10:10] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P65720 and previous config saved to /var/cache/conftool/dbconfig/20240703-121009-ladsgroup.json
[12:11:29] <wikibugs>	 (03PS3) 10Effie Mouzeli: mw-mcouter: use bookworm images [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051740 (https://phabricator.wikimedia.org/T368366)
[12:11:46] <wikibugs>	 (03PS4) 10Effie Mouzeli: mw-mcouter: use bookworm images [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051740 (https://phabricator.wikimedia.org/T368366)
[12:11:54] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on mw1454 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[12:12:31] <effie>	 jouncebot: now
[12:12:31] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 47 minute(s)
[12:12:32] <wikibugs>	 (03PS1) 10Brouberol: OpenJDK17: sync both the openjdk-{jdk,jre} debian version [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051745 (https://phabricator.wikimedia.org/T363461)
[12:12:35] <effie>	 jouncebot: next
[12:12:36] <jouncebot>	 In 0 hour(s) and 47 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T1300)
[12:13:06] <wikibugs>	 (03CR) 10Btullis: [C:03+2] OpenJDK17: sync both the openjdk-{jdk,jre} debian version [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051745 (https://phabricator.wikimedia.org/T363461) (owner: 10Brouberol)
[12:13:08] <wikibugs>	 (03CR) 10Btullis: [V:03+2 C:03+2] OpenJDK17: sync both the openjdk-{jdk,jre} debian version [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051745 (https://phabricator.wikimedia.org/T363461) (owner: 10Brouberol)
[12:13:38] <wikibugs>	 (03CR) 10Jcrespo: [C:03+1] "I will make sure it doesn't, by disabling them manually + deleting the existing passwords. If it was important, I would disable and delete" [puppet] - 10https://gerrit.wikimedia.org/r/1051735 (https://phabricator.wikimedia.org/T362509) (owner: 10Jcrespo)
[12:14:14] <wikibugs>	 (03CR) 10Vgutierrez: [C:04-1] "it looks like the benthos tests need to be updated as well" [puppet] - 10https://gerrit.wikimedia.org/r/1051198 (https://phabricator.wikimedia.org/T365718) (owner: 10Fabfur)
[12:15:03] <wikibugs>	 (03CR) 10Jcrespo: [C:03+1] "Note also dbprovs are for the most part "temporary storage" (glorified disk space), other than the backups they shouldn't have any importa" [puppet] - 10https://gerrit.wikimedia.org/r/1051735 (https://phabricator.wikimedia.org/T362509) (owner: 10Jcrespo)
[12:15:48] <wikibugs>	 (03CR) 10Jcrespo: [C:03+2] dbbackups: Disable es read-only backups and reenable rw ones [puppet] - 10https://gerrit.wikimedia.org/r/1051744 (https://phabricator.wikimedia.org/T363812) (owner: 10Jcrespo)
[12:16:12] <wikibugs>	 (03PS1) 10Ayounsi: Add public1-virtual-codfw PTR [dns] - 10https://gerrit.wikimedia.org/r/1051746 (https://phabricator.wikimedia.org/T362330)
[12:17:11] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Add public1-virtual-codfw PTR [dns] - 10https://gerrit.wikimedia.org/r/1051746 (https://phabricator.wikimedia.org/T362330) (owner: 10Ayounsi)
[12:17:23] <logmsgbot>	 !log jiji@deploy1002 helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
[12:20:56] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s4 on clouddb1015 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 320.14 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[12:21:28] <wikibugs>	 (03CR) 10Volans: "Sure, but if CI was able to run 3.12 it would fail. Same on any local checkout. Hence my reluctance to commit to the repo something that i" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1050452 (owner: 10Ayounsi)
[12:22:15] <jinxer-wm>	 FIRING: MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?var-datasource=codfw%20prometheus/ops&viewPanel=19 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[12:23:31] <wikibugs>	 10ops-eqiad, 06SRE, 10Cloud-VPS, 06DC-Ops, 10cloud-services-team (FY2023/2024-Q3-Q4): cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#9949203 (10dcaro) I'll try adding the `sdc` drive to `cloudcephosd1034`, that should force it to get populated with...
[12:23:42] <wikibugs>	 (03CR) 10Filippo Giunchedi: "LGTM, though hard to judge accurately until the heartbeat metrics are in prometheus" [alerts] - 10https://gerrit.wikimedia.org/r/1047983 (https://phabricator.wikimedia.org/T367278) (owner: 10Arnaudb)
[12:23:48] <claime>	 effie: ^
[12:23:53] <claime>	 expected?
[12:24:08] <effie>	 yes, it is still rolling out
[12:24:14] <claime>	 ack
[12:24:30] <wikibugs>	 (03CR) 10Volans: Spicerack: fix Netbox 4 breaking changes (033 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1050453 (https://phabricator.wikimedia.org/T336275) (owner: 10Ayounsi)
[12:24:32] <wikibugs>	 10ops-eqiad, 06SRE, 10Cloud-VPS, 06DC-Ops, 10cloud-services-team (FY2023/2024-Q3-Q4): cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#9949207 (10dcaro) Current error counters (before adding `sdc`): ` root@cloudcephosd1034:~# for i in /dev/sd?; do ech...
[12:24:34] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on mw1454 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[12:25:47] <wikibugs>	 06SRE, 06Data-Engineering, 10Dumps-Generation, 10Data Products (Data Products Sprint 15), and 2 others: Dumps generation without prefetch cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#9949210 (10Ladsgroup) The prefetch has been done now so these are causing issu...
[12:27:15] <jinxer-wm>	 FIRING: [2x] MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[12:27:56] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s4 on clouddb1015 is OK: OK slave_sql_lag Replication lag: 0.34 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[12:29:47] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release mw-mcrouter/main on k8s@codfw in state pending-upgrade - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=mw-mcrouter - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[12:29:56] <wikibugs>	 (03PS1) 10Lucas Werkmeister (WMDE): PropertyValueExpertsModule: Turn on enableModuleContentVersion() [extensions/Wikibase] (wmf/1.43.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1051748 (https://phabricator.wikimedia.org/T369155)
[12:30:19] <effie>	 sigh, 199 out of 214 new pods have been updated, it shouldn't complain
[12:30:49] <logmsgbot>	 !log jiji@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
[12:30:56] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, July 03 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/Wikibase] (wmf/1.43.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1051748 (https://phabricator.wikimedia.org/T369155) (owner: 10Lucas Werkmeister (WMDE))
[12:31:35] <wikibugs>	 (03PS1) 10Kosta Harlan: GlobalRenameQueue: Fix issues with wiki ID and row query [extensions/CentralAuth] (wmf/1.43.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1051749 (https://phabricator.wikimedia.org/T369147)
[12:31:59] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, July 03 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/CentralAuth] (wmf/1.43.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1051749 (https://phabricator.wikimedia.org/T369147) (owner: 10Kosta Harlan)
[12:32:31] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, July 03 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1037587 (owner: 10DCausse)
[12:33:02] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, July 03 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1037783 (owner: 10DCausse)
[12:33:54] <wikibugs>	 (03PS2) 10Ayounsi: Add public1-virtual-codfw PTR [dns] - 10https://gerrit.wikimedia.org/r/1051746 (https://phabricator.wikimedia.org/T362330)
[12:34:20] <logmsgbot>	 !log elukey@deploy1002 helmfile [codfw] START helmfile.d/services/thumbor: sync
[12:34:47] <jinxer-wm>	 RESOLVED: HelmReleaseBadStatus: Helm release mw-mcrouter/main on k8s@codfw in state pending-upgrade - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=mw-mcrouter - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[12:36:02] <cdanis>	 jouncebot: nowandnext
[12:36:02] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 23 minute(s)
[12:36:02] <jouncebot>	 In 0 hour(s) and 23 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T1300)
[12:36:36] <cdanis>	 hm that's a busy patch window, I'll wait
[12:36:48] <Lucas_WMDE>	 yeah, pretty full
[12:37:57] <logmsgbot>	 !log elukey@deploy1002 helmfile [codfw] DONE helmfile.d/services/thumbor: sync
[12:38:35] <wikibugs>	 (03CR) 10Dzahn: [C:04-1] admin: add approvers to group analytics-research-admins (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1049239 (https://phabricator.wikimedia.org/T276465) (owner: 10Dzahn)
[12:39:47] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
[12:39:49] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
[12:42:43] <wikibugs>	 (03Abandoned) 10Wargo: Set logo and favicon for sysop_plwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051469 (https://phabricator.wikimedia.org/T368712) (owner: 10Wargo)
[12:42:53] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] mswikisource: create author and translation namespaces and add namespace aliases  [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051503 (https://phabricator.wikimedia.org/T369047) (owner: 10Anzx)
[12:43:36] <wikibugs>	 (03PS1) 10Brouberol: OpenJDK17: fix typos in the changelog [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051753 (https://phabricator.wikimedia.org/T363461)
[12:44:49] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] kawikisource: create author namespace, add namespace aliases and sitename [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051733 (https://phabricator.wikimedia.org/T363243) (owner: 10Anzx)
[12:45:05] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 13Patch-For-Review: add approvers to analytics-research-admins - https://phabricator.wikimedia.org/T368435#9949268 (10Dzahn) @Miriam Would you be ok with becoming a formal "group approver" for the group "analytics-research-admins"?  That would mean we'd ask...
[12:47:26] <wikibugs>	 (03CR) 10Jgiannelos: [C:04-1] Remove page html endpoints from changeprop (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051361 (https://phabricator.wikimedia.org/T367418) (owner: 10Jgiannelos)
[12:47:37] <logmsgbot>	 !log elukey@deploy1002 helmfile [eqiad] START helmfile.d/services/thumbor: sync
[12:47:44] <wikibugs>	 (03CR) 10Dzahn: [C:04-1] "The email address has been provided now:  andrew.green@extern.wikimedia.de" [puppet] - 10https://gerrit.wikimedia.org/r/1047473 (https://phabricator.wikimedia.org/T367681) (owner: 10Kamila Součková)
[12:48:32] <wikibugs>	 (03PS5) 10Anzx: mswikisource: create author and translation namespaces and add namespace aliases  [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051503 (https://phabricator.wikimedia.org/T369047)
[12:48:46] <wikibugs>	 (03PS3) 10Anzx: kawikisource: create author namespace, add namespace aliases and sitename [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051733 (https://phabricator.wikimedia.org/T363243)
[12:49:00] <jinxer-wm>	 RESOLVED: [2x] MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[12:49:46] <wikibugs>	 (03PS4) 10Dzahn: admin: Extend access for AndyRussG [puppet] - 10https://gerrit.wikimedia.org/r/1047473 (https://phabricator.wikimedia.org/T367681) (owner: 10Kamila Součková)
[12:50:02] <wikibugs>	 (03CR) 10Dzahn: admin: Extend access for AndyRussG (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1047473 (https://phabricator.wikimedia.org/T367681) (owner: 10Kamila Součková)
[12:50:37] <wikibugs>	 (03CR) 10Dzahn: [C:03+1] admin: Extend access for AndyRussG [puppet] - 10https://gerrit.wikimedia.org/r/1047473 (https://phabricator.wikimedia.org/T367681) (owner: 10Kamila Součková)
[12:51:02] <wikibugs>	 06SRE, 10LDAP-Access-Requests, 13Patch-For-Review: Update terms and timeline of access already granted for AndyRussG - https://phabricator.wikimedia.org/T367681#9949324 (10Dzahn) @WMDECyn Yes, it was still needed. Thank you!   I updated https://gerrit.wikimedia.org/r/c/operations/puppet/+/1047473
[12:51:08] <wikibugs>	 06SRE, 10LDAP-Access-Requests, 13Patch-For-Review: Update terms and timeline of access already granted for AndyRussG - https://phabricator.wikimedia.org/T367681#9949326 (10Dzahn) 05Stalled→03In progress
[12:51:36] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "kicking off gate-and-submit ahead of backport window" [extensions/Wikibase] (wmf/1.43.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1051748 (https://phabricator.wikimedia.org/T369155) (owner: 10Lucas Werkmeister (WMDE))
[12:51:44] <wikibugs>	 06SRE, 10LDAP-Access-Requests, 13Patch-For-Review: Update terms and timeline of access already granted for AndyRussG - https://phabricator.wikimedia.org/T367681#9949330 (10Dzahn) a:05AndyRussG→03None
[12:52:00] <logmsgbot>	 !log elukey@deploy1002 helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
[12:59:20] <wikibugs>	 (03CR) 10Klausman: "I'd prefer going with Go 1.22 (I added the image for that a week or two back). Go has strong compatibility guarantees: code building and w" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051387 (https://phabricator.wikimedia.org/T368359) (owner: 10Elukey)
[13:00:05] <jouncebot>	 RoanKattouw, Lucas_WMDE, Urbanecm, awight, and TheresNoTime: How many deployers does it take to do UTC afternoon backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T1300).
[13:00:05] <jouncebot>	 anzx, Lucas_WMDE, kostajh, and dcausse: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:07] <anzx>	 o/
[13:00:11] <Lucas_WMDE>	 o/
[13:00:11] <kostajh>	 hi
[13:00:13] <Lucas_WMDE>	 I can deploy!
[13:00:14] <dcausse>	 o/
[13:00:17] <dcausse>	 thx!
[13:00:17] <kostajh>	 thanks Lucas_WMDE 
[13:00:33] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051503 (https://phabricator.wikimedia.org/T369047) (owner: 10Anzx)
[13:00:33] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051733 (https://phabricator.wikimedia.org/T363243) (owner: 10Anzx)
[13:00:38] <Lucas_WMDE>	 let’s start with anzx 
[13:00:42] <kostajh>	 Lucas_WMDE: urbanecm will verify the CentralAuth patch
[13:00:45] <wikibugs>	 (03CR) 10CDanis: [C:03+2] CHANGELOG for configuration 1.8.0 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051412 (https://phabricator.wikimedia.org/T362310) (owner: 10CDanis)
[13:00:47] <wikibugs>	 (03CR) 10CDanis: [C:03+2] copy patch [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051449 (https://phabricator.wikimedia.org/T363407) (owner: 10CDanis)
[13:00:50] <Lucas_WMDE>	 ack
[13:00:51] * urbanecm waves
[13:00:54] <wikibugs>	 (03CR) 10CDanis: [C:03+2] mesh: use namespace for default service name [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051450 (https://phabricator.wikimedia.org/T363407) (owner: 10CDanis)
[13:01:03] <wikibugs>	 (03CR) 10CDanis: [C:04-2] DO NOT SUBMIT, testing mesh change against mediawiki [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051466 (owner: 10CDanis)
[13:01:08] <Lucas_WMDE>	 dcausse: hi! could you maybe take a quick look at https://phabricator.wikimedia.org/T369149?
[13:01:14] <wikibugs>	 (03Merged) 10jenkins-bot: mswikisource: create author and translation namespaces and add namespace aliases  [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051503 (https://phabricator.wikimedia.org/T369047) (owner: 10Anzx)
[13:01:15] <Lucas_WMDE>	 mainly in case I shouldn’t run the maintenance script ^^
[13:01:18] <wikibugs>	 (Merged) jenkins-bot: kawikisource: create author namespace, add namespace aliases and sitename [mediawiki-config] - https://gerrit.wikimedia.org/r/1051733 (https://phabricator.wikimedia.org/T363243) (owner: Anzx)
[13:01:26] <dcausse>	 Lucas_WMDE: looking
[13:01:29] <Lucas_WMDE>	 thanks
[13:01:39] <wikibugs>	 (Merged) jenkins-bot: CHANGELOG for configuration 1.8.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1051412 (https://phabricator.wikimedia.org/T362310) (owner: CDanis)
[13:01:47] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for [[gerrit:1051503|mswikisource: create author and translation namespaces and add namespace aliases  (T369047)]], [[gerrit:1051733|kawikisource: create author namespace, add namespace aliases and sitename (T363243)]]
[13:01:51] <stashbot>	 T369047: Configure the namespaces on Malay Wikisource - https://phabricator.wikimedia.org/T369047
[13:01:52] <stashbot>	 T363243: Post-creation work for kawikisource - https://phabricator.wikimedia.org/T363243
[13:02:06] <Lucas_WMDE>	 (that’s a property with a new datatype that we enabled yesterday, so it’s possible some code is erroring out about it… but I didn’t see anything in logstash)
[13:02:07] <wikibugs>	 (Merged) jenkins-bot: copy patch [deployment-charts] - https://gerrit.wikimedia.org/r/1051449 (https://phabricator.wikimedia.org/T363407) (owner: CDanis)
[13:02:08] <wikibugs>	 (Merged) jenkins-bot: mesh: use namespace for default service name [deployment-charts] - https://gerrit.wikimedia.org/r/1051450 (https://phabricator.wikimedia.org/T363407) (owner: CDanis)
[13:02:39] <dcausse>	 Lucas_WMDE: ForceSearchIndex might not work... I'll dig into it
[13:02:50] <Lucas_WMDE>	 alright, then I’ll skip that for now
[13:02:51] <Lucas_WMDE>	 thanks!
[13:04:31] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, anzx: Backport for [[gerrit:1051503|mswikisource: create author and translation namespaces and add namespace aliases  (T369047)]], [[gerrit:1051733|kawikisource: create author namespace, add namespace aliases and sitename (T363243)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[13:04:35] <wikibugs>	 (CR) Elukey: "Sure, feel free to amend the patch with go 1.22, I'd be more conservative but if you feel strong about it I am +1." [docker-images/production-images] - https://gerrit.wikimedia.org/r/1051387 (https://phabricator.wikimedia.org/T368359) (owner: Elukey)
[13:04:36] <anzx>	 Lucas_WMDE: checking
[13:04:38] <Lucas_WMDE>	 thanks!
[13:05:42] <wikibugs>	 SRE, Infrastructure-Foundations: Request access to servers Dcops group - https://phabricator.wikimedia.org/T360356#9949479 (elukey) Sorry for the late reply, I had a chat with Willy and with the I/F team, this is our proposal:  * We create a new POSIX group for `dcops` that gets deployed to all productio...
[13:06:23] <Lucas_WMDE>	 hm, I just realized that the namespace (alias) Perbualan Wikisource now completely vanished from mswikisource
[13:06:24] <anzx>	 Lucas_WMDE: looks good to me
[13:06:27] <Lucas_WMDE>	 (it has Perbualan Wikisumber instead)
[13:06:29] <Lucas_WMDE>	 is that okay?
[13:06:51] <wikibugs>	 (PS6) Fabfur: benthos:cache: encode problematic fields as b64url [puppet] - https://gerrit.wikimedia.org/r/1051198 (https://phabricator.wikimedia.org/T365718)
[13:07:11] <wikibugs>	 (Merged) jenkins-bot: PropertyValueExpertsModule: Turn on enableModuleContentVersion() [extensions/Wikibase] (wmf/1.43.0-wmf.12) - https://gerrit.wikimedia.org/r/1051748 (https://phabricator.wikimedia.org/T369155) (owner: Lucas Werkmeister (WMDE))
[13:07:17] <Lucas_WMDE>	 eh, it’s a pretty new wiki, probably okay
[13:07:19] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, anzx: Continuing with sync
[13:07:56] <wikibugs>	 SRE, cloud-services-team, Data-Services: [wikireplicas] Make sure there is no sensitive data in clouddb hosts - https://phabricator.wikimedia.org/T368136#9949511 (Ladsgroup) >>! In T368136#9935249, @fnegri wrote: > Can we somehow remove the data that is currently filtered at the view layer, and inste...
[13:08:51] <anzx>	 Lucas_WMDE: OK, I thought it would get fixed through namespaceDupes.php, or I can add Perbualan Wikisumber as a namespace alias
[13:09:04] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations: Pairing tool for new SREs using sudo under supervision - https://phabricator.wikimedia.org/T299989#9949514 (elukey) To keep archives happy: T360356#9949479  We filed a proposal to basically implement sudo_pair "socially", as starting experiment. While at it...
[13:09:15] <Lucas_WMDE>	 I’ll run namespaceDupes once the deployment is done
[13:12:27] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:1051503|mswikisource: create author and translation namespaces and add namespace aliases  (T369047)]], [[gerrit:1051733|kawikisource: create author namespace, add namespace aliases and sitename (T363243)]] (duration: 10m 39s)
[13:12:31] <stashbot>	 T369047: Configure the namespaces on Malay Wikisource - https://phabricator.wikimedia.org/T369047
[13:12:31] <stashbot>	 T363243: Post-creation work for kawikisource - https://phabricator.wikimedia.org/T363243
[13:12:49] <anzx>	 Lucas_WMDE: there were no pages left in that namespace, so there is probably no need to add an alias
[13:13:05] <Lucas_WMDE>	 “TypeError: 'NoneType' object is not iterable” meh
[13:13:10] <Lucas_WMDE>	 let’s try non-k8s mwscript then
[13:13:25] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.dns.netbox
[13:14:25] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes mswikisource --fix # T369047; 6 pages to fix, 6 were resolvable; 76 links to fix, 73 were resolvable, 3 were deleted
[13:14:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:15:46] <wikibugs>	 (CR) Bking: "Thanks for the tip, I didn't know that was an option. Will start checking this out." [puppet] - https://gerrit.wikimedia.org/r/1051369 (https://phabricator.wikimedia.org/T366405) (owner: Bking)
[13:15:50] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
[13:15:50] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes kawikisource --fix # T363243; 34 pages to fix, 34 were resolvable; 774 links to fix, 774 were resolvable, 0 were deleted
[13:15:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:41] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
[13:16:42] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:17:06] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for [[gerrit:1051748|PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155)]]
[13:17:08] <stashbot>	 T369155: New data type not available for all users after being enabled - https://phabricator.wikimedia.org/T369155
[13:17:16] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'T365994 - depool db1191,db1196,db1197', diff saved to https://phabricator.wikimedia.org/P65721 and previous config saved to /var/cache/conftool/dbconfig/20240703-131715-arnaudb.json
[13:17:18] <stashbot>	 T365994: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994
[13:17:21] <anzx>	 Lucas_WMDE: thanks for the deployment
[13:17:26] <Lucas_WMDE>	 np :)
[13:17:30] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.dns.wipe-cache sretest2002.mgmt.codfw.wmnet on all recursors
[13:17:33] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest2002.mgmt.codfw.wmnet on all recursors
[13:17:56] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.dns.wipe-cache 49.3.193.10.in-addr.arpa. on all recursors
[13:17:59] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 49.3.193.10.in-addr.arpa. on all recursors
[13:18:00] <wikibugs>	 (CR) Vgutierrez: [C:+1] benthos:cache: encode problematic fields as b64url [puppet] - https://gerrit.wikimedia.org/r/1051198 (https://phabricator.wikimedia.org/T365718) (owner: Fabfur)
[13:18:10] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on db[1191,1196-1197].eqiad.wmnet with reason: T365994
[13:18:20] <wikibugs>	 SRE, serviceops, Data Products (Data Products Sprint 15), Patch-For-Review, Service-deployment-requests: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production - https://phabricator.wikimedia.org/T361835#9949542 (xcollazo) >>! In T361835#9947951, @SGupta-WMF wrote: > @xcollazo Th...
[13:18:25] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[1191,1196-1197].eqiad.wmnet with reason: T365994
[13:19:20] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C:+2] "starting gate-and-submit ahead of deployment" [extensions/CentralAuth] (wmf/1.43.0-wmf.12) - https://gerrit.wikimedia.org/r/1051749 (https://phabricator.wikimedia.org/T369147) (owner: Kosta Harlan)
[13:19:43] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host parsoidtest1001
[13:19:43] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for [[gerrit:1051748|PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[13:19:49] <Lucas_WMDE>	 testing
[13:20:22] <Lucas_WMDE>	 seems to work as far as I can tell
[13:20:24] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
[13:20:54] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host parsoidtest1001
[13:22:45] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye
[13:22:53] <wikibugs>	 ops-eqiad, SRE, DC-Ops, serviceops, Patch-For-Review: Q4:rack/setup/install parsoidtest1001 - https://phabricator.wikimedia.org/T363399#9949563 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host parsoidtest1001.eqiad.wmnet with OS bullseye
[13:23:23] <wikibugs>	 (PS1) Kgraessle: Remove QuickSurvey coverage rate for Automoderator patroller workstream survey [mediawiki-config] - https://gerrit.wikimedia.org/r/1051756 (https://phabricator.wikimedia.org/T362969)
[13:23:24] <wikibugs>	 ops-eqiad, SRE, DC-Ops, serviceops, Patch-For-Review: Q4:rack/setup/install parsoidtest1001 - https://phabricator.wikimedia.org/T363399#9949564 (Jclark-ctr)
[13:23:54] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations, Puppet-Infrastructure, and 2 others: Migrate puppet merges to a cookbook - https://phabricator.wikimedia.org/T366355#9949559 (elukey) Proposed plan:  * In T368023 we move the private repo to puppetserver1001, and we add a git pre-commit hook config to t...
[13:24:10] <wikibugs>	 (PS6) Elukey: role::puppetserver: skip puppet-merge [puppet] - https://gerrit.wikimedia.org/r/1050607 (https://phabricator.wikimedia.org/T368023)
[13:24:11] <wikibugs>	 ops-eqiad, SRE, DC-Ops, serviceops, Patch-For-Review: Q4:rack/setup/install parsoidtest1001 - https://phabricator.wikimedia.org/T363399#9949567 (Jclark-ctr) a:Jclark-ctr
[13:24:21] <wikibugs>	 (PS7) Elukey: role::puppetserver: skip puppet-merge [puppet] - https://gerrit.wikimedia.org/r/1050607 (https://phabricator.wikimedia.org/T368023)
[13:24:23] <wikibugs>	 (CR) CI reject: [V:-1] role::puppetserver: skip puppet-merge [puppet] - https://gerrit.wikimedia.org/r/1050607 (https://phabricator.wikimedia.org/T368023) (owner: Elukey)
[13:25:26] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:1051748|PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155)]] (duration: 08m 20s)
[13:25:29] <stashbot>	 T369155: New data type not available for all users after being enabled - https://phabricator.wikimedia.org/T369155
[13:25:54] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by lucaswerkmeister-wmde@deploy1002 using scap backport" [extensions/CentralAuth] (wmf/1.43.0-wmf.12) - https://gerrit.wikimedia.org/r/1051749 (https://phabricator.wikimedia.org/T369147) (owner: Kosta Harlan)
[13:26:21] <wikibugs>	 (PS2) Vgutierrez: varnish: Fix text/02-frontend-headers.vtc [puppet] - https://gerrit.wikimedia.org/r/1051750 (https://phabricator.wikimedia.org/T369162)
[13:26:21] <wikibugs>	 (CR) Vgutierrez: "tests are happy:" [puppet] - https://gerrit.wikimedia.org/r/1051750 (https://phabricator.wikimedia.org/T369162) (owner: Vgutierrez)
[13:26:26] <wikibugs>	 SRE, Data-Engineering, Dumps-Generation, Data Products (Data Products Sprint 15), and 2 others: Dumps generation without prefetch cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#9949572 (xcollazo) >>! In T368098#9949210, @Ladsgroup wrote: > The prefetch...
[13:27:33] <wikibugs>	 (PS8) Elukey: role::puppetserver: skip puppet-merge [puppet] - https://gerrit.wikimedia.org/r/1050607 (https://phabricator.wikimedia.org/T368023)
[13:28:06] <wikibugs>	 SRE, Data-Engineering, Dumps-Generation, Data Products (Data Products Sprint 15), and 2 others: Dumps generation without prefetch cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#9949574 (xcollazo) Ok I am going to postpone re-enabling the Commons RDF/JSO...
[13:28:14] <wikibugs>	 (Merged) jenkins-bot: GlobalRenameQueue: Fix issues with wiki ID and row query [extensions/CentralAuth] (wmf/1.43.0-wmf.12) - https://gerrit.wikimedia.org/r/1051749 (https://phabricator.wikimedia.org/T369147) (owner: Kosta Harlan)
[13:28:44] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for [[gerrit:1051749|GlobalRenameQueue: Fix issues with wiki ID and row query (T369147)]]
[13:28:47] <stashbot>	 T369147: GlobalRenameQueue shows internal error Wikimedia\Assert\PreconditionException when opening requests - https://phabricator.wikimedia.org/T369147
[13:29:09] <wikibugs>	 (PS9) Elukey: role::puppetserver: skip puppet-merge [puppet] - https://gerrit.wikimedia.org/r/1050607 (https://phabricator.wikimedia.org/T368023)
[13:29:34] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host an-conf1004.eqiad.wmnet with OS bookworm
[13:29:35] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host an-conf1005.eqiad.wmnet with OS bookworm
[13:29:36] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host an-conf1006.eqiad.wmnet with OS bookworm
[13:29:51] <wikibugs>	 ops-eqiad, SRE, Data-Engineering, DC-Ops, Patch-For-Review: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#9949577 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host an-conf1004.eqiad.wmnet with OS bookworm
[13:29:52] <wikibugs>	 ops-eqiad, SRE, Data-Engineering, DC-Ops, Patch-For-Review: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#9949578 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host an-conf1005.eqiad.wmnet with OS bookworm
[13:29:58] <wikibugs>	 ops-eqiad, SRE, Data-Engineering, DC-Ops, Patch-For-Review: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#9949579 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host an-conf1006.eqiad.wmnet with OS bookworm
[13:30:04] <effie>	 jouncebot: now
[13:30:04] <jouncebot>	 For the next 0 hour(s) and 29 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T1300)
[13:30:09] <wikibugs>	 (CR) CDanis: [C:+1] "lgtm, thank you" [puppet] - https://gerrit.wikimedia.org/r/1051750 (https://phabricator.wikimedia.org/T369162) (owner: Vgutierrez)
[13:30:37] <elukey>	 /14
[13:30:39] <elukey>	 err :)
[13:31:14] <wikibugs>	 (CR) Elukey: [V:+1] "PCC SUCCESS (NOOP 1 CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1050607 (https://phabricator.wikimedia.org/T368023) (owner: Elukey)
[13:31:21] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 kharlan, lucaswerkmeister-wmde: Backport for [[gerrit:1051749|GlobalRenameQueue: Fix issues with wiki ID and row query (T369147)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[13:31:42] <Lucas_WMDE>	 urbanecm: can you test the CentralAuth change?
[13:32:27] <urbanecm>	 Lucas_WMDE: sure
[13:32:55] <wikibugs>	 (PS6) DCausse: noc: fail with a 404 when the selected wiki is nonexistent [mediawiki-config] - https://gerrit.wikimedia.org/r/1037587
[13:32:59] <urbanecm>	 Lucas_WMDE: it works
[13:33:00] <wikibugs>	 (PS1) Superpes15: [sysop_plwiki] Change the logo/icon and the favicon [mediawiki-config] - https://gerrit.wikimedia.org/r/1051757 (https://phabricator.wikimedia.org/T368712)
[13:33:02] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 kharlan, lucaswerkmeister-wmde: Continuing with sync
[13:33:05] <Lucas_WMDE>	 great, thanks!
[13:33:08] <wikibugs>	 (Abandoned) Jforrester: wikifunctions: Reduce helm deploy timeout from 600s default to 120s [deployment-charts] - https://gerrit.wikimedia.org/r/975873 (owner: Jforrester)
[13:33:14] <wikibugs>	 (PS2) DCausse: CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup [mediawiki-config] - https://gerrit.wikimedia.org/r/1037783
[13:33:35] <wikibugs>	 (CR) CI reject: [V:-1] [sysop_plwiki] Change the logo/icon and the favicon [mediawiki-config] - https://gerrit.wikimedia.org/r/1051757 (https://phabricator.wikimedia.org/T368712) (owner: Superpes15)
[13:34:02] <kostajh>	 thank you urbanecm 
[13:34:07] <urbanecm>	 np
[13:34:30] <wikibugs>	 (CR) Btullis: [C:+2] OpenJDK17: fix typos in the changelog [docker-images/production-images] - https://gerrit.wikimedia.org/r/1051753 (https://phabricator.wikimedia.org/T363461) (owner: Brouberol)
[13:34:31] <wikibugs>	 (CR) Btullis: [V:+2 C:+2] OpenJDK17: fix typos in the changelog [docker-images/production-images] - https://gerrit.wikimedia.org/r/1051753 (https://phabricator.wikimedia.org/T363461) (owner: Brouberol)
[13:34:46] <wikibugs>	 (PS1) Kgraessle: Remove QuickSurvey coverage rate for Automoderator patroller workstream survey [mediawiki-config] - https://gerrit.wikimedia.org/r/1051756 (https://phabricator.wikimedia.org/T362969)
[13:35:20] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, July 03 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [mediawiki-config] - https://gerrit.wikimedia.org/r/1051756 (https://phabricator.wikimedia.org/T362969) (owner: Kgraessle)
[13:35:52] <wikibugs>	 ops-eqiad, SRE, DC-Ops: ManagementSSHDown - https://phabricator.wikimedia.org/T368766#9949603 (Eevans) >>! In T368766#9935779, @VRiley-WMF wrote: > Not that I'm aware of. I used the same cable for everything. @Eevans would you happen to know if the IP address changed on this?  @VRiley-WMF when the ma...
[13:36:04] <wikibugs>	 (PS2) Superpes15: [sysop_plwiki] Change the logo/icon and the favicon [mediawiki-config] - https://gerrit.wikimedia.org/r/1051757 (https://phabricator.wikimedia.org/T368712)
[13:36:42] <wikibugs>	 (PS2) Kgraessle: Remove QuickSurvey for Automoderator patroller workstream survey [mediawiki-config] - https://gerrit.wikimedia.org/r/1051756 (https://phabricator.wikimedia.org/T362969)
[13:38:13] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:1051749|GlobalRenameQueue: Fix issues with wiki ID and row query (T369147)]] (duration: 09m 28s)
[13:38:16] <stashbot>	 T369147: GlobalRenameQueue shows internal error Wikimedia\Assert\PreconditionException when opening requests - https://phabricator.wikimedia.org/T369147
[13:38:33] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by lucaswerkmeister-wmde@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1037587 (owner: DCausse)
[13:38:33] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by lucaswerkmeister-wmde@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1037783 (owner: DCausse)
[13:39:14] <wikibugs>	 (Merged) jenkins-bot: noc: fail with a 404 when the selected wiki is nonexistent [mediawiki-config] - https://gerrit.wikimedia.org/r/1037587 (owner: DCausse)
[13:39:16] <wikibugs>	 (Merged) jenkins-bot: CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup [mediawiki-config] - https://gerrit.wikimedia.org/r/1037783 (owner: DCausse)
[13:39:47] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for [[gerrit:1037587|noc: fail with a 404 when the selected wiki is nonexistent]], [[gerrit:1037783|CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup]]
[13:40:00] <wikibugs>	 SRE, Observability-Metrics: statsd-exporter in k8s is not configured to use its mapping configuration - https://phabricator.wikimedia.org/T369080#9949616 (colewhite) Open→Resolved a:Joe
[13:42:29] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 dcausse, lucaswerkmeister-wmde: Backport for [[gerrit:1037587|noc: fail with a 404 when the selected wiki is nonexistent]], [[gerrit:1037783|CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[13:42:37] <Lucas_WMDE>	 dcausse: can the wgCirrusSearchIndexFieldsToCleanup change be tested on mwdebug?
[13:42:43] <wikibugs>	 SRE, SRE-swift-storage, Data-Persistence, DBA, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949632 (JMeybohm)
[13:42:52] <Lucas_WMDE>	 (the other change already seems to be live, I can see the new message at https://noc.wikimedia.org/wiki.php?wiki=foobar)
[13:43:05] <dcausse>	 Lucas_WMDE: the noc change seems good, the wgCirrusSearchIndexFieldsToCleanup change can't be tested
[13:43:11] <dcausse>	 yes
[13:43:12] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 dcausse, lucaswerkmeister-wmde: Continuing with sync
[13:43:14] <Lucas_WMDE>	 alright :)
[13:43:18] <Lucas_WMDE>	 thanks!
[13:43:18] <dcausse>	 thanks! :)
[13:44:04] <jayme>	 !log draining wikikube-worker1007.eqiad.wmnet wikikube-worker1021.eqiad.wmnet kubernetes1060.eqiad.wmnet for T365994
[13:44:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:44:08] <stashbot>	 T365994: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994
[13:48:26] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:1037587|noc: fail with a 404 when the selected wiki is nonexistent]], [[gerrit:1037783|CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup]] (duration: 08m 38s)
[13:48:49] <Lucas_WMDE>	 !log UTC afternoon backport+config window done
[13:48:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:50] * Lucas_WMDE done
[13:49:00] <Lucas_WMDE>	 cc effie you pinged jouncebot earlier ^^
[13:49:03] <dcausse>	 thanks for the deploy!
[13:49:21] <Lucas_WMDE>	 I’m shocked, six patches deployed and we finished *before* time :D
[13:49:29] <Lucas_WMDE>	 two of them backports even
[13:49:45] <Lucas_WMDE>	 kicking bare-metal (almost fully) out of the deploy really sped it up
[13:50:35] <wikibugs>	 SRE, cloud-services-team, Data-Services: [wikireplicas] Make sure there is no sensitive data in clouddb hosts - https://phabricator.wikimedia.org/T368136#9949639 (fnegri) Open→Declined > I highly doubt it'd be possible honestly for everything.  I tend to agree, I underestimated the amount...
[13:51:22] <wikibugs>	 (PS1) Ssingh: dnsbox and Wikimedia DNS: revert usage of LE's alternate chain [puppet] - https://gerrit.wikimedia.org/r/1051759
[13:52:23] <wikibugs>	 (CR) Ssingh: [V:+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3160/co" [puppet] - https://gerrit.wikimedia.org/r/1051759 (owner: Ssingh)
[13:52:47] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e2-eqiad
[13:53:02] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e2-eqiad
[13:53:17] <wikibugs>	 SRE, SRE-swift-storage, Data-Persistence, DBA, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949642 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=c8dbb89d-640c-4078-bc10-bbbe9c30f3ef) set by cmooney...
[13:55:44] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.hosts.downtime for 1:20:00 on kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet with reason: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2
[13:55:54] <icinga-wm>	 RECOVERY - Disk space on restbase2023 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=restbase2023&var-datasource=codfw+prometheus/ops
[13:56:00] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:20:00 on kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet with reason: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2
[13:56:12] <wikibugs>	 SRE, SRE-swift-storage, Data-Persistence, DBA, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949650 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=753739a5-e1fb-44b6-9174-f7b3a8c4b73b) set by jayme@c...
[13:56:45] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1091*,elastic1092* for T348977 - bking@cumin2002
[13:56:48] <stashbot>	 T348977: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - https://phabricator.wikimedia.org/T348977
[13:56:48] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1091*,elastic1092* for T348977 - bking@cumin2002
[13:57:43] <logmsgbot>	 !log jayme@cumin1002 conftool action : set/pooled=no; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet)
[13:58:15] <wikibugs>	 (CR) CDanis: [C:+1] "I'd like it a little better if running puppet-merge on a puppetserver gave you a helpful error message instead of just command not found, " [puppet] - https://gerrit.wikimedia.org/r/1050607 (https://phabricator.wikimedia.org/T368023) (owner: Elukey)
[13:58:30] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e2-eqiad,lsw1-e2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e2-eqiad
[13:58:47] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e2-eqiad,lsw1-e2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e2-eqiad
[13:58:48] <wikibugs>	 (CR) Herron: [C:+1] admin: add new ssh key for cwhite [puppet] - https://gerrit.wikimedia.org/r/1051421 (owner: Cwhite)
[13:58:55] <wikibugs>	 SRE, SRE-swift-storage, Data-Persistence, DBA, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949656 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=185956f6-b0e6-4a89-9e32-6a8223f5678e) set by cmooney...
[13:59:13] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 4:00:00 on elastic[1091-1092].eqiad.wmnet,wdqs[1018,1020].eqiad.wmnet with reason: T348977
[13:59:21] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic[1091-1092].eqiad.wmnet,wdqs[1018,1020].eqiad.wmnet with reason: T348977
[14:00:04] <jouncebot>	 Deploy window Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T1400)
[14:00:05] <wikibugs>	 SRE, SRE-swift-storage, Data-Persistence, DBA, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949655 (JMeybohm) !log jayme@cumin1002 conftool action : set/pooled=no; selector: name=(wikikube-worker1007.eqiad.wmnet|wikik...
[14:00:54] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.downtime for 0:40:00 on 22 hosts with reason: JunOS upgrade lsw1-e2-eqiad
[14:01:03] <wikibugs>	 (PS5) Jgiannelos: Remove page html endpoints from changeprop [deployment-charts] - https://gerrit.wikimedia.org/r/1051361 (https://phabricator.wikimedia.org/T367418)
[14:01:13] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on 22 hosts with reason: JunOS upgrade lsw1-e2-eqiad
[14:01:25] <wikibugs>	 SRE, SRE-swift-storage, Data-Persistence, DBA, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949662 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=11036a9f-0b48-4b07-9e63-571b4f67c201) set by cmooney...
[14:03:05] <wikibugs>	 (CR) Ssingh: [C:+1] "Nice catch and fix." [puppet] - https://gerrit.wikimedia.org/r/1051750 (https://phabricator.wikimedia.org/T369162) (owner: Vgutierrez)
[14:03:25] <wikibugs>	 (CR) Jgiannelos: Remove page html endpoints from changeprop (3 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/1051361 (https://phabricator.wikimedia.org/T367418) (owner: Jgiannelos)
[14:04:09] <wikibugs>	 (PS3) Vgutierrez: varnish: Fix text/02-frontend-headers.vtc [puppet] - https://gerrit.wikimedia.org/r/1051750 (https://phabricator.wikimedia.org/T369162)
[14:04:33] <topranks>	 !log rebooting lsw1-e2-eqiad to install updated JunOS version T365994
[14:04:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:04:36] <stashbot>	 T365994: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994
[14:04:37] <wikibugs>	 (CR) Ssingh: [C:+1] varnish: Fix text/02-frontend-headers.vtc [puppet] - https://gerrit.wikimedia.org/r/1051750 (https://phabricator.wikimedia.org/T369162) (owner: Vgutierrez)
[14:06:20] <wikibugs>	 (CR) Kamila Součková: [C:+2] benthos/mw_accesslog_metrics: Add buffer [puppet] - https://gerrit.wikimedia.org/r/1051415 (https://phabricator.wikimedia.org/T367076) (owner: Kamila Součková)
[14:07:38] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye
[14:07:52] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06serviceops, 13Patch-For-Review: Q4:rack/setup/install parsoidtest1001 - https://phabricator.wikimedia.org/T363399#9949685 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host parsoidtest1001.eqiad.wmnet with OS bullseye executed...
[14:08:47] <logmsgbot>	 !log klausman@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
[14:09:04] <logmsgbot>	 !log bking@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
[14:09:07] <logmsgbot>	 !log klausman@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
[14:09:33] <logmsgbot>	 !log klausman@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
[14:10:01] <effie>	 Lucas_WMDE: thank you
[14:10:10] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye
[14:10:34] <jinxer-wm>	 FIRING: [3x] ProbeDown: Service aqs1020-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:10:58] <icinga-wm>	 PROBLEM - MariaDB Replica IO: s1 on db1154 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2003, Errmsg: error reconnecting to master repl2024@db1196.eqiad.wmnet:3306 - retry-time: 60 maximum-retries: 100000 message: Cant connect to server on db1196.eqiad.wmnet (110 Connection timed out) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[14:11:00] <logmsgbot>	 !log klausman@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
[14:11:17] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06serviceops, 13Patch-For-Review: Q4:rack/setup/install parsoidtest1001 - https://phabricator.wikimedia.org/T363399#9949691 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host parsoidtest1001.eqiad.wmnet with OS bullseye
[14:11:56] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+2] varnish: Fix text/02-frontend-headers.vtc [puppet] - 10https://gerrit.wikimedia.org/r/1051750 (https://phabricator.wikimedia.org/T369162) (owner: 10Vgutierrez)
[14:14:16] <jinxer-wm>	 FIRING: [4x] ProbeDown: Service aqs1020-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:14:48] <wikibugs>	 (03PS1) 10Effie Mouzeli: mw-mcrouter: bump eqiad proxies [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051762 (https://phabricator.wikimedia.org/T346690)
[14:15:18] <arnaudb>	 ah I forgot about that
[14:15:23] <arnaudb>	 fixing, sorry for the alert
[14:16:50] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 0:45:00 on db1154.eqiad.wmnet with reason: T365994
[14:16:53] <stashbot>	 T365994: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994
[14:17:03] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on db1154.eqiad.wmnet with reason: T365994
[14:17:36] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s1 on clouddb1021 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 617.25 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[14:17:42] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 0:45:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet with reason: T365994
[14:17:58] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet with reason: T365994
[14:18:03] <jinxer-wm>	 FIRING: [2x] KafkaUnderReplicatedPartitions: Under replicated partitions for Kafka cluster jumbo-eqiad in eqiad - https://wikitech.wikimedia.org/wiki/Kafka/Administration  - https://alerts.wikimedia.org/?q=alertname%3DKafkaUnderReplicatedPartitions
[14:18:27] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65722 and previous config saved to /var/cache/conftool/dbconfig/20240703-141826-marostegui.json
[14:18:30] <stashbot>	 T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856
[14:18:39] <logmsgbot>	 !log bking@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
[14:21:39] <logmsgbot>	 !log jayme@cumin1002 conftool action : set/pooled=inactive; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet)
[14:22:09] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DBA, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949750 (10cmooney) Switch is back up, all looks good at first glance from the network side.
[14:22:19] <wikibugs>	 (03PS1) 10Effie Mouzeli: mw-parsoid: enable mcrouter ds [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051764 (https://phabricator.wikimedia.org/T346690)
[14:22:33] <wikibugs>	 (03PS2) 10Ssingh: dnsbox and Wikimedia DNS: revert usage of LE's alternate chain [puppet] - 10https://gerrit.wikimedia.org/r/1051759
[14:22:46] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] mw-mcrouter: bump eqiad proxies [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051762 (https://phabricator.wikimedia.org/T346690) (owner: 10Effie Mouzeli)
[14:22:58] <icinga-wm>	 RECOVERY - MariaDB Replica IO: s1 on db1154 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[14:23:31] <wikibugs>	 (03CR) 10Ssingh: [V:03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3162/co" [puppet] - 10https://gerrit.wikimedia.org/r/1051759 (owner: 10Ssingh)
[14:23:35] <wikibugs>	 (03Merged) 10jenkins-bot: mw-mcrouter: bump eqiad proxies [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051762 (https://phabricator.wikimedia.org/T346690) (owner: 10Effie Mouzeli)
[14:24:08] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1051759 (owner: 10Ssingh)
[14:24:16] <jinxer-wm>	 RESOLVED: [4x] ProbeDown: Service aqs1020-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:24:36] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s1 on clouddb1021 is OK: OK slave_sql_lag Replication lag: 0.06 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[14:25:11] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DBA, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949772 (10ABran-WMF) db hosts as well, repooling
[14:25:17] <logmsgbot>	 !log klausman@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
[14:25:41] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1197 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65723 and previous config saved to /var/cache/conftool/dbconfig/20240703-142541-arnaudb.json
[14:25:44] <stashbot>	 T365994: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994
[14:25:54] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65724 and previous config saved to /var/cache/conftool/dbconfig/20240703-142553-arnaudb.json
[14:26:04] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+1] admin: add new ssh key for cwhite [puppet] - 10https://gerrit.wikimedia.org/r/1051421 (owner: 10Cwhite)
[14:26:14] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1191 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65725 and previous config saved to /var/cache/conftool/dbconfig/20240703-142614-arnaudb.json
[14:26:31] <wikibugs>	 (03CR) 10Ssingh: [V:03+1 C:03+2] dnsbox and Wikimedia DNS: revert usage of LE's alternate chain [puppet] - 10https://gerrit.wikimedia.org/r/1051759 (owner: 10Ssingh)
[14:26:54] <wikibugs>	 (03PS2) 10Effie Mouzeli: mw-parsoid: enable mcrouter ds [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051764 (https://phabricator.wikimedia.org/T346690)
[14:27:03] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
[14:27:11] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
[14:28:03] <jinxer-wm>	 RESOLVED: [2x] KafkaUnderReplicatedPartitions: Under replicated partitions for Kafka cluster jumbo-eqiad in eqiad - https://wikitech.wikimedia.org/wiki/Kafka/Administration  - https://alerts.wikimedia.org/?q=alertname%3DKafkaUnderReplicatedPartitions
[14:30:04] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[14:30:59] <logmsgbot>	 !log jayme@cumin1002 conftool action : set/pooled=yes; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet)
[14:32:24] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.hosts.remove-downtime for kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet
[14:32:25] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet
[14:32:31] <wikibugs>	 (03CR) 10Vgutierrez: "please could you rebase this change on top of https://gerrit.wikimedia.org/r/c/operations/puppet/+/1051750 and adjust the 02-frontend-head" [puppet] - 10https://gerrit.wikimedia.org/r/1030591 (https://phabricator.wikimedia.org/T350094) (owner: 10Gergő Tisza)
[14:32:45] <sukhe>	 !log sudo cumin "A:wikidough" "run-puppet-agent"
[14:32:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:02] <sukhe>	 !log sudo cumin "A:dnsbox" "run-puppet-agent"
[14:33:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:13] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DBA, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949834 (10JMeybohm) >>! In T365994#9949655, @JMeybohm wrote: > !log jayme@cumin1002 conftool action : set/pooled=no; selector:...
[14:33:20] <wikibugs>	 (03CR) 10Volans: [C:03+1] "Ack, thx for the context" [puppet] - 10https://gerrit.wikimedia.org/r/1051735 (https://phabricator.wikimedia.org/T362509) (owner: 10Jcrespo)
[14:33:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P65726 and previous config saved to /var/cache/conftool/dbconfig/20240703-143334-marostegui.json
[14:33:46] <wikibugs>	 (03CR) 10Clément Goubert: "LGTM, extremely minor nit that can be fixed when going global." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051764 (https://phabricator.wikimedia.org/T346690) (owner: 10Effie Mouzeli)
[14:33:54] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+1] mw-parsoid: enable mcrouter ds [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051764 (https://phabricator.wikimedia.org/T346690) (owner: 10Effie Mouzeli)
[14:34:35] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+1] mw-parsoid: enable mcrouter ds [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051764 (https://phabricator.wikimedia.org/T346690) (owner: 10Effie Mouzeli)
[14:34:58] <wikibugs>	 (03CR) 10Volans: [C:03+2] admin: Extend access for AndyRussG [puppet] - 10https://gerrit.wikimedia.org/r/1047473 (https://phabricator.wikimedia.org/T367681) (owner: 10Kamila Součková)
[14:35:15] <sukhe>	 !log [correction of previous A:dnsbox run] sudo cumin -b1 -s60 "A:dnsbox" "run-puppet-agent"
[14:35:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:56] <wikibugs>	 (03PS3) 10Effie Mouzeli: mw-parsoid: enable mcrouter ds [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051764 (https://phabricator.wikimedia.org/T346690)
[14:36:09] <wikibugs>	 (03CR) 10Effie Mouzeli: mw-parsoid: enable mcrouter ds (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051764 (https://phabricator.wikimedia.org/T346690) (owner: 10Effie Mouzeli)
[14:36:30] <wikibugs>	 (03CR) 10Andrea Denisse: [C:03+1] admin: add new ssh key for cwhite [puppet] - 10https://gerrit.wikimedia.org/r/1051421 (owner: 10Cwhite)
[14:37:17] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] mw-parsoid: enable mcrouter ds [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051764 (https://phabricator.wikimedia.org/T346690) (owner: 10Effie Mouzeli)
[14:38:07] <wikibugs>	 (03Merged) 10jenkins-bot: mw-parsoid: enable mcrouter ds [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051764 (https://phabricator.wikimedia.org/T346690) (owner: 10Effie Mouzeli)
[14:38:54] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[14:38:57] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[14:39:05] <wikibugs>	 06SRE, 10LDAP-Access-Requests, 13Patch-For-Review: Update terms and timeline of access already granted for AndyRussG - https://phabricator.wikimedia.org/T367681#9949849 (10Volans) 05In progress→03Resolved a:03Volans This should be all done, resolving. Please feel free to re-open it if you encounter...
[14:39:16] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:39:21] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[14:39:49] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[14:40:24] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1004.eqiad.wmnet with OS bookworm
[14:40:27] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1005.eqiad.wmnet with OS bookworm
[14:40:31] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1006.eqiad.wmnet with OS bookworm
[14:40:40] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Engineering, 06DC-Ops, 13Patch-For-Review: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#9949855 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host an-conf1004.eqiad.wmnet with OS bookworm execute...
[14:40:44] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Engineering, 06DC-Ops, 13Patch-For-Review: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#9949856 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host an-conf1005.eqiad.wmnet with OS bookworm execute...
[14:40:47] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1197 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65727 and previous config saved to /var/cache/conftool/dbconfig/20240703-144046-arnaudb.json
[14:40:48] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Engineering, 06DC-Ops, 13Patch-For-Review: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#9949857 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host an-conf1006.eqiad.wmnet with OS bookworm execute...
[14:40:50] <stashbot>	 T365994: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994
[14:41:00] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65728 and previous config saved to /var/cache/conftool/dbconfig/20240703-144059-arnaudb.json
[14:41:20] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1191 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65729 and previous config saved to /var/cache/conftool/dbconfig/20240703-144119-arnaudb.json
[14:41:38] <icinga-wm>	 PROBLEM - Druid overlord on druid1009 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.druid.cli.Main server overlord https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid
[14:41:38] <icinga-wm>	 PROBLEM - Druid coordinator on druid1009 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.druid.cli.Main server coordinator https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid
[14:45:37] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
[14:46:24] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: deployment::rsync: Remove long absented resources [puppet] - 10https://gerrit.wikimedia.org/r/1051772 (https://phabricator.wikimedia.org/T364417)
[14:46:50] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
[14:48:15] <wikibugs>	 06SRE, 10MW-on-K8s, 06serviceops, 06Traffic, and 2 others: Turn down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949#9949876 (10Clement_Goubert) 05Open→03In progress p:05Triage→03Medium
[14:48:38] <icinga-wm>	 RECOVERY - Druid overlord on druid1009 is OK: PROCS OK: 1 process with command name java, args org.apache.druid.cli.Main server overlord https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid
[14:48:38] <icinga-wm>	 RECOVERY - Druid coordinator on druid1009 is OK: PROCS OK: 1 process with command name java, args org.apache.druid.cli.Main server coordinator https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid
[14:48:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P65730 and previous config saved to /var/cache/conftool/dbconfig/20240703-144841-marostegui.json
[14:49:46] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DBA, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949915 (10Eevans) >>! In T365994#9949750, @cmooney wrote: > Switch is back up, all looks good at first glance from the network...
[14:50:27] <wikibugs>	 06SRE, 10MW-on-K8s, 10Observability-Logging, 06serviceops, 13Patch-For-Review: benthos mw-accesslog-metrics kafka lag and interpolation errors - https://phabricator.wikimedia.org/T367076#9949907 (10kamila) 05Open→03Resolved a:03kamila Increasing batch size slightly improved the situation, very...
[14:50:41] <wikibugs>	 (03PS3) 10Kgraessle: Remove QuickSurvey for Automoderator patroller workstream survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051756 (https://phabricator.wikimedia.org/T362969)
[14:50:54] <icinga-wm>	 PROBLEM - BGP status on cr3-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:50:57] <wikibugs>	 06SRE, 10LDAP-Access-Requests, 13Patch-For-Review: Update terms and timeline of access already granted for AndyRussG - https://phabricator.wikimedia.org/T367681#9949920 (10AndyRussG) Yaayy thanks so much @Volans, @Dzahn, @kamila, @WMDECyn!
[14:51:20] <fabfur>	 !log start rebooting A:cp-drmrs (upload|text in parallel) for T366555
[14:51:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:28] <logmsgbot>	 !log fabfur@cumin1002 START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_drmrs
[14:51:31] <logmsgbot>	 !log fabfur@cumin1002 START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_drmrs
[14:53:30] <wikibugs>	 (03PS1) 10Arnaudb: mariadb: recording rules to monitor [puppet] - 10https://gerrit.wikimedia.org/r/1050376 (https://phabricator.wikimedia.org/T367283)
[14:54:50] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye
[14:55:04] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06serviceops, 13Patch-For-Review: Q4:rack/setup/install parsoidtest1001 - https://phabricator.wikimedia.org/T363399#9949929 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host parsoidtest1001.eqiad.wmnet with OS bullseye executed...
[14:55:52] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1197 (re)pooling @ 25%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65731 and previous config saved to /var/cache/conftool/dbconfig/20240703-145552-arnaudb.json
[14:55:55] <stashbot>	 T365994: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994
[14:56:05] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65732 and previous config saved to /var/cache/conftool/dbconfig/20240703-145604-arnaudb.json
[14:56:26] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1191 (re)pooling @ 25%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65733 and previous config saved to /var/cache/conftool/dbconfig/20240703-145625-arnaudb.json
[14:59:16] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:59:20] <wikibugs>	 (03PS5) 10Arnaudb: mariadb: monitoring memory pressure [alerts] - 10https://gerrit.wikimedia.org/r/1049159 (https://phabricator.wikimedia.org/T367280)
[15:00:50] <wikibugs>	 (03CR) 10Arnaudb: mariadb: monitoring memory pressure (033 comments) [alerts] - 10https://gerrit.wikimedia.org/r/1049159 (https://phabricator.wikimedia.org/T367280) (owner: 10Arnaudb)
[15:00:55] <wikibugs>	 (03CR) 10CI reject: [V:04-1] mariadb: monitoring memory pressure [alerts] - 10https://gerrit.wikimedia.org/r/1049159 (https://phabricator.wikimedia.org/T367280) (owner: 10Arnaudb)
[15:00:56] <icinga-wm>	 PROBLEM - IPv4 ping to eqsin on ripe-atlas-eqsin is CRITICAL: CRITICAL - failed 36 probes of 794 (alerts on 35) - https://atlas.ripe.net/measurements/11645085/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[15:01:22] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T364069)', diff saved to https://phabricator.wikimedia.org/P65734 and previous config saved to /var/cache/conftool/dbconfig/20240703-150121-marostegui.json
[15:01:25] <stashbot>	 T364069: Rebuild pagelinks tables - https://phabricator.wikimedia.org/T364069
[15:02:53] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: WIP deployment::rsync: Temporarily disable stunnel [puppet] - 10https://gerrit.wikimedia.org/r/1051782 (https://phabricator.wikimedia.org/T364417)
[15:03:27] <wikibugs>	 (03CR) 10Ilias Sarantopoulos: [C:03+1] ml-services: use MAX_FEATURE_VALS in articlequality [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051738 (https://phabricator.wikimedia.org/T368875) (owner: 10Kevin Bazira)
[15:03:49] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65735 and previous config saved to /var/cache/conftool/dbconfig/20240703-150348-marostegui.json
[15:03:51] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
[15:03:52] <stashbot>	 T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856
[15:04:04] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
[15:04:12] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1233 (T367856)', diff saved to https://phabricator.wikimedia.org/P65736 and previous config saved to /var/cache/conftool/dbconfig/20240703-150411-marostegui.json
[15:04:55] <wikibugs>	 (03PS6) 10Arnaudb: mariadb: monitoring memory pressure [alerts] - 10https://gerrit.wikimedia.org/r/1049159 (https://phabricator.wikimedia.org/T367280)
[15:05:54] <icinga-wm>	 RECOVERY - IPv4 ping to eqsin on ripe-atlas-eqsin is OK: OK - failed 34 probes of 794 (alerts on 35) - https://atlas.ripe.net/measurements/11645085/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[15:06:06] <wikibugs>	 (03CR) 10CI reject: [V:04-1] mariadb: monitoring memory pressure [alerts] - 10https://gerrit.wikimedia.org/r/1049159 (https://phabricator.wikimedia.org/T367280) (owner: 10Arnaudb)
[15:06:16] <wikibugs>	 (03CR) 10Jsn.sherman: [C:03+1] "thanks for the decommissioning patch!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051756 (https://phabricator.wikimedia.org/T362969) (owner: 10Kgraessle)
[15:08:05] <wikibugs>	 (03PS3) 10Jcrespo: dbbackups: Set dbprov[12]00[12] to insetup [puppet] - 10https://gerrit.wikimedia.org/r/1051735 (https://phabricator.wikimedia.org/T362509)
[15:09:13] <wikibugs>	 (03PS1) 10Ahmon Dancy: InitialiseSettings-dev: Disable wmgUseEntitySchema,enable wgShowExceptionDetails [mediawiki-config] (train-dev) - 10https://gerrit.wikimedia.org/r/1051785
[15:09:24] <wikibugs>	 (03CR) 10Cwhite: [C:03+2] "Attested the accuracy of this in our team meeting." [puppet] - 10https://gerrit.wikimedia.org/r/1051421 (owner: 10Cwhite)
[15:10:19] <wikibugs>	 (03CR) 10Ahmon Dancy: [C:03+2] InitialiseSettings-dev: Disable wmgUseEntitySchema,enable wgShowExceptionDetails [mediawiki-config] (train-dev) - 10https://gerrit.wikimedia.org/r/1051785 (owner: 10Ahmon Dancy)
[15:10:32] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.dns.netbox
[15:10:33] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10Puppet-Infrastructure, 13Patch-For-Review: Move the private Puppet repository to puppetserver1001 - https://phabricator.wikimedia.org/T368023#9949993 (10elukey) Reporting a summary of various chats with Moritz:  * On `puppetmasterXXXX` (Puppet 5 infra), the authoritat...
[15:10:58] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1197 (re)pooling @ 50%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65737 and previous config saved to /var/cache/conftool/dbconfig/20240703-151057-arnaudb.json
[15:11:01] <stashbot>	 T365994: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994
[15:11:11] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65738 and previous config saved to /var/cache/conftool/dbconfig/20240703-151110-arnaudb.json
[15:11:23] <wikibugs>	 (03Merged) 10jenkins-bot: InitialiseSettings-dev: Disable wmgUseEntitySchema,enable wgShowExceptionDetails [mediawiki-config] (train-dev) - 10https://gerrit.wikimedia.org/r/1051785 (owner: 10Ahmon Dancy)
[15:11:31] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1191 (re)pooling @ 50%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65739 and previous config saved to /var/cache/conftool/dbconfig/20240703-151131-arnaudb.json
[15:11:46] <wikibugs>	 (03PS1) 10Brouberol: datahub-next: upgrade datahub to 0.13.3 (latest version) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051786 (https://phabricator.wikimedia.org/T363461)
[15:12:35] <wikibugs>	 (03CR) 10CI reject: [V:04-1] datahub-next: upgrade datahub to 0.13.3 (latest version) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051786 (https://phabricator.wikimedia.org/T363461) (owner: 10Brouberol)
[15:13:23] <wikibugs>	 (03PS7) 10Arnaudb: mariadb: monitoring memory pressure [alerts] - 10https://gerrit.wikimedia.org/r/1049159 (https://phabricator.wikimedia.org/T367280)
[15:13:41] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 208.80.152.129 v6 - ayounsi@cumin1002"
[15:14:32] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 208.80.152.129 v6 - ayounsi@cumin1002"
[15:14:32] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:14:47] <wikibugs>	 (03CR) 10Jcrespo: [C:03+2] dbbackups: Set dbprov[12]00[12] to insetup [puppet] - 10https://gerrit.wikimedia.org/r/1051735 (https://phabricator.wikimedia.org/T362509) (owner: 10Jcrespo)
[15:15:25] <wikibugs>	 (03PS2) 10Brouberol: datahub-next: upgrade datahub to 0.13.3 (latest version) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051786 (https://phabricator.wikimedia.org/T363461)
[15:16:29] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P65740 and previous config saved to /var/cache/conftool/dbconfig/20240703-151628-marostegui.json
[15:18:46] <wikibugs>	 (03PS3) 10Brouberol: datahub-next: upgrade datahub to 0.13.3 (latest version) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051786 (https://phabricator.wikimedia.org/T363461)
[15:20:04] <icinga-wm>	 RECOVERY - Uncommitted DNS changes in Netbox on netbox1002 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[15:20:53] <wikibugs>	 (03PS3) 10Arnaudb: mariadb: add monitoring on io pressure for mariadb hosts [alerts] - 10https://gerrit.wikimedia.org/r/1049196 (https://phabricator.wikimedia.org/T367281)
[15:21:12] <wikibugs>	 (03CR) 10Arnaudb: mariadb: add monitoring on io pressure for mariadb hosts (034 comments) [alerts] - 10https://gerrit.wikimedia.org/r/1049196 (https://phabricator.wikimedia.org/T367281) (owner: 10Arnaudb)
[15:26:04] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1197 (re)pooling @ 75%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65741 and previous config saved to /var/cache/conftool/dbconfig/20240703-152603-arnaudb.json
[15:26:07] <stashbot>	 T365994: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994
[15:26:14] <wikibugs>	 (03CR) 10Ayounsi: "> comments where we use the [0] approach would be awesome :)" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1050453 (https://phabricator.wikimedia.org/T336275) (owner: 10Ayounsi)
[15:26:16] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65742 and previous config saved to /var/cache/conftool/dbconfig/20240703-152616-arnaudb.json
[15:26:21] <wikibugs>	 (03PS4) 10Ayounsi: Spicerack: fix Netbox 4 breaking changes [software/spicerack] - 10https://gerrit.wikimedia.org/r/1050453 (https://phabricator.wikimedia.org/T336275)
[15:26:37] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1191 (re)pooling @ 75%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65743 and previous config saved to /var/cache/conftool/dbconfig/20240703-152636-arnaudb.json
[15:27:06] <wikibugs>	 (03PS5) 10Ayounsi: Spicerack: fix Netbox 4 breaking changes [software/spicerack] - 10https://gerrit.wikimedia.org/r/1050453 (https://phabricator.wikimedia.org/T336275)
[15:27:06] <wikibugs>	 (03PS2) 10Ayounsi: Tox: add Python3.12 support [software/spicerack] - 10https://gerrit.wikimedia.org/r/1050452
[15:27:20] <wikibugs>	 (03CR) 10Kevin Bazira: [C:03+2] ml-services: use MAX_FEATURE_VALS in articlequality [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051738 (https://phabricator.wikimedia.org/T368875) (owner: 10Kevin Bazira)
[15:27:32] <wikibugs>	 10ops-eqiad, 06SRE, 10Cloud-VPS, 06DC-Ops, 10cloud-services-team (FY2023/2024-Q3-Q4): cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#9950063 (10dcaro) The osd is now in, no changes in the error counter: ` root@cloudcephosd1034:~# for i in /dev/sd?;...
[15:27:49] <wikibugs>	 (03CR) 10Ayounsi: "Sounds good! Ok to wait. I re-ordered them so we can merge the Netbox 4 breaking changes." [software/spicerack] - 10https://gerrit.wikimedia.org/r/1050452 (owner: 10Ayounsi)
[15:28:14] <wikibugs>	 (03Merged) 10jenkins-bot: ml-services: use MAX_FEATURE_VALS in articlequality [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051738 (https://phabricator.wikimedia.org/T368875) (owner: 10Kevin Bazira)
[15:28:31] <wikibugs>	 (03PS8) 10Clare Ming: Deploy MetricsPlatform to beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1046732 (https://phabricator.wikimedia.org/T366234)
[15:29:12] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Deploy MetricsPlatform to beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1046732 (https://phabricator.wikimedia.org/T366234) (owner: 10Clare Ming)
[15:29:20] <wikibugs>	 (03CR) 10Jforrester: [C:03+1] "Cherry-picked without incident to Beta Cluster's puppetmaster." [puppet] - 10https://gerrit.wikimedia.org/r/1051499 (https://phabricator.wikimedia.org/T361384) (owner: 10Andrew Bogott)
[15:29:45] <wikibugs>	 (03CR) 10Ayounsi: "It's in a lot of places so I worry it would make the code more difficult to read." [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/1050379 (https://phabricator.wikimedia.org/T336275) (owner: 10Ayounsi)
[15:31:36] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P65744 and previous config saved to /var/cache/conftool/dbconfig/20240703-153136-marostegui.json
[15:31:53] <sukhe>	 !log restart haproxy on dns1005
[15:31:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:32:47] <logmsgbot>	 !log kevinbazira@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
[15:33:05] <wikibugs>	 (03CR) 10Dzahn: [C:03+1] "thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/1047473 (https://phabricator.wikimedia.org/T367681) (owner: 10Kamila Součková)
[15:35:32] <wikibugs>	 (03PS2) 10Elukey: knative: upgrade all images to Bullseye and Golang 1.19 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051387 (https://phabricator.wikimedia.org/T368359)
[15:35:32] <wikibugs>	 (03PS2) 10Elukey: wmfdebug: Upgrade to Bookworm [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051402 (https://phabricator.wikimedia.org/T368366)
[15:36:10] <wikibugs>	 (03CR) 10Elukey: "Found some time and I tried the upgrade, so far it seems building:" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051387 (https://phabricator.wikimedia.org/T368359) (owner: 10Elukey)
[15:41:10] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1197 (re)pooling @ 100%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65746 and previous config saved to /var/cache/conftool/dbconfig/20240703-154109-arnaudb.json
[15:41:13] <stashbot>	 T365994: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994
[15:41:22] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65747 and previous config saved to /var/cache/conftool/dbconfig/20240703-154121-arnaudb.json
[15:41:42] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1191 (re)pooling @ 100%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65748 and previous config saved to /var/cache/conftool/dbconfig/20240703-154142-arnaudb.json
[15:41:43] <wikibugs>	 (03PS3) 10Elukey: knative: upgrade all images to Bullseye and Golang 1.19 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051387 (https://phabricator.wikimedia.org/T368359)
[15:41:43] <wikibugs>	 (03PS3) 10Elukey: wmfdebug: Upgrade to Bookworm [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051402 (https://phabricator.wikimedia.org/T368366)
[15:46:08] <wikibugs>	 (03PS4) 10Elukey: knative: upgrade all images to Bookworm and Golang 1.22 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051387 (https://phabricator.wikimedia.org/T368359)
[15:46:44] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T364069)', diff saved to https://phabricator.wikimedia.org/P65749 and previous config saved to /var/cache/conftool/dbconfig/20240703-154643-marostegui.json
[15:46:46] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
[15:46:48] <stashbot>	 T364069: Rebuild pagelinks tables - https://phabricator.wikimedia.org/T364069
[15:47:10] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
[15:47:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1186 (T364069)', diff saved to https://phabricator.wikimedia.org/P65750 and previous config saved to /var/cache/conftool/dbconfig/20240703-154716-marostegui.json
[15:47:19] <wikibugs>	 (03CR) 10Aaron Schulz: Set "s3" as the default section name (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/909763 (owner: 10Aaron Schulz)
[15:47:28] <wikibugs>	 (03PS5) 10Aaron Schulz: Set "s3" as the default section name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/909763
[15:49:09] <wikibugs>	 (03CR) 10Klausman: [C:03+1] "Thank you for taking care of this!" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1051387 (https://phabricator.wikimedia.org/T368359) (owner: 10Elukey)
[15:56:20] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 13Patch-For-Review: magru network setup - https://phabricator.wikimedia.org/T362421#9950212 (10ayounsi) 05Open→03Resolved All is done here.
[15:57:08] <wikibugs>	 (03PS1) 10JHathaway: add mx-in{1001,2001) as MX servers [dns] - 10https://gerrit.wikimedia.org/r/1051797 (https://phabricator.wikimedia.org/T367517)
[15:58:15] <wikibugs>	 (03PS1) 10DCausse: cirrus: re-enable search updates on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051798
[15:58:51] <wikibugs>	 (03PS1) 10Kevin Bazira: ml-services: assign MAX_FEATURE_VALS in articlequality [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051799 (https://phabricator.wikimedia.org/T368875)
[16:00:15] <jinxer-wm>	 FIRING: [2x] PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-api-ext at eqiad: 5.61% idle - https://bit.ly/wmf-fpmsat  - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[16:00:34] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 112665312 and 21 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[16:01:04] <wikibugs>	 (03CR) 10Ilias Sarantopoulos: [C:03+1] ml-services: assign MAX_FEATURE_VALS in articlequality [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051799 (https://phabricator.wikimedia.org/T368875) (owner: 10Kevin Bazira)
[16:01:34] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 71912 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[16:02:09] <wikibugs>	 (03CR) 10Kevin Bazira: [C:03+2] ml-services: assign MAX_FEATURE_VALS in articlequality [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051799 (https://phabricator.wikimedia.org/T368875) (owner: 10Kevin Bazira)
[16:02:58] <wikibugs>	 (03Merged) 10jenkins-bot: ml-services: assign MAX_FEATURE_VALS in articlequality [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051799 (https://phabricator.wikimedia.org/T368875) (owner: 10Kevin Bazira)
[16:04:12] <logmsgbot>	 !log kevinbazira@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
[16:05:15] <jinxer-wm>	 RESOLVED: [3x] PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-api-ext at eqiad: 6.735% idle - https://bit.ly/wmf-fpmsat  - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[16:05:16] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+1] "LGTM!" [dns] - 10https://gerrit.wikimedia.org/r/1051746 (https://phabricator.wikimedia.org/T362330) (owner: 10Ayounsi)
[16:05:26] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2207 (re)pooling @ 1%: Repooling', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240703-160521-root.json
[16:06:23] <wikibugs>	 (03CR) 10Ayounsi: [C:03+2] Add public1-virtual-codfw PTR [dns] - 10https://gerrit.wikimedia.org/r/1051746 (https://phabricator.wikimedia.org/T362330) (owner: 10Ayounsi)
[16:06:46] <jinxer-wm>	 FIRING: [2x] Primary inbound port utilisation over 80%  #page: Alert for device cr1-eqiad.wikimedia.org - Primary inbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+inbound+port+utilisation+over+80%25++%23page
[16:06:56] <wikibugs>	 (03PS2) 10DCausse: cirrus: re-enable search updates on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051798
[16:07:37] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Grant Access to analytics-privatedata-users for Sharvaniharan - https://phabricator.wikimedia.org/T368566#9950297 (10Sharvaniharan) Hi @Volans  Thank you for getting the patch going.  Confirming that I have read the user responsibilities doc and will adhere t...
[16:09:44] <XioNoX>	 ^ analytics https://librenms.wikimedia.org/bill/bill_id=28/ (cc topranks )
[16:09:56] <akosiaris>	 ah, I was about to ask
[16:10:41] <XioNoX>	 do we know who ran the job? and if they can stop it?
[16:11:47] <jinxer-wm>	 RESOLVED: [2x] Primary inbound port utilisation over 80%  #page: Device cr1-eqiad.wikimedia.org recovered from Primary inbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+inbound+port+utilisation+over+80%25++%23page
[16:12:52] <cdanis>	 XioNoX: has this been happening more often lately?
[16:14:44] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 1103717840 and 85 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[16:15:01] <wikibugs>	 (03PS1) 10Kevin Bazira: Revert "ml-services: assign MAX_FEATURE_VALS in articlequality" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051800
[16:16:12] <wikibugs>	 (03PS13) 10Jdlrobson: [July 4th] Reduce list of exclusions for dark mode (1.43.0-wmf.12) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050671 (https://phabricator.wikimedia.org/T366366)
[16:16:44] <topranks>	 last time we had something similar it was tricky to find exactly who 
[16:16:45] <topranks>	 https://phabricator.wikimedia.org/T364893#9800673
[16:17:16] <cdanis>	 jouncebot: nowandnext
[16:17:16] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 42 minute(s)
[16:17:16] <jouncebot>	 In 0 hour(s) and 42 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T1700)
[16:17:46] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 0 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[16:17:58] <cwhite>	 looks like all the an-worker nodes are pulling a lot of data
[16:18:37] <wikibugs>	 (03PS2) 10Kevin Bazira: Revert "ml-services: assign MAX_FEATURE_VALS in articlequality" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051800
[16:18:50] <topranks>	 ok
[16:18:58] <cdanis>	 I'm checking if it is Presto again
[16:19:38] <topranks>	 quick check of that dashboard doesn't look like it was the exact same this time 
[16:19:42] <topranks>	 https://grafana-rw.wikimedia.org/d/000000006/presto-server-utilization-btullis?orgId=1&refresh=30s&viewPanel=27&from=1720012683206&to=1720023483207
[16:19:44] <wikibugs>	 (03CR) 10Kevin Bazira: [C:03+2] Revert "ml-services: assign MAX_FEATURE_VALS in articlequality" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051800 (owner: 10Kevin Bazira)
[16:19:46] <cdanis>	 yeah, not nearly the same magnitude
[16:20:14] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "ml-services: assign MAX_FEATURE_VALS in articlequality" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051800 (owner: 10Kevin Bazira)
[16:20:23] <cdanis>	 topranks: https://grafana-rw.wikimedia.org/d/ZvSPbGOnz/hadoop-server-utilization-btullis?orgId=1&from=1720017829797&to=1720023514494
[16:20:33] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2207 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65751 and previous config saved to /var/cache/conftool/dbconfig/20240703-162032-root.json
[16:20:49] <cdanis>	 so I'm guessing it's HDFS + a query on Hive/Yarn (/Spark?)
[16:22:23] <wikibugs>	 (03PS2) 10Andrew Bogott: deployment-prep mcrouter: replace old memc servers with new ones [puppet] - 10https://gerrit.wikimedia.org/r/1051499 (https://phabricator.wikimedia.org/T361384)
[16:22:23] <wikibugs>	 (03PS1) 10Andrew Bogott: environment: add wikimediacloud.org to no_proxy domains [puppet] - 10https://gerrit.wikimedia.org/r/1051802
[16:22:35] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] "just out of curiosity, what are we considering here "low traffic"?" [puppet] - 10https://gerrit.wikimedia.org/r/1047191 (owner: 10BCornwall)
[16:23:12] <cdanis>	 I'm trying to grok https://yarn.wikimedia.org/cluster/scheduler 
[16:24:46] <wikibugs>	 (03PS1) 10JHathaway: Move outbound email to mx-out{1001,2001}.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/1051803 (https://phabricator.wikimedia.org/T365395)
[16:26:09] <wikibugs>	 (03CR) 10Tjones: [C:03+1] "looks good to me" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051798 (owner: 10DCausse)
[16:26:35] <cwhite>	 network rx picked up around 15:58 - one job started ~10min before that (https://yarn.wikimedia.org/cluster/app/application_1719935448343_10585)
[16:27:19] <cwhite>	 although starttime on that page differs from the scheduler page
[16:27:53] <cwhite>	 nvm, wrong link
[16:27:58] <cwhite>	 https://yarn.wikimedia.org/cluster/app/application_1719935448343_13378
[16:28:12] <wikibugs>	 (03PS2) 10JHathaway: add mx-in{1001,2001) as MX servers [dns] - 10https://gerrit.wikimedia.org/r/1051797 (https://phabricator.wikimedia.org/T367517)
[16:28:53] <wikibugs>	 (03CR) 10JHathaway: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1051803 (https://phabricator.wikimedia.org/T365395) (owner: 10JHathaway)
[16:32:04] <wikibugs>	 (03PS2) 10JHathaway: Move outbound email to mx-out{1001,2001}.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/1051803 (https://phabricator.wikimedia.org/T365395)
[16:32:10] <wikibugs>	 (03CR) 10JHathaway: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1051803 (https://phabricator.wikimedia.org/T365395) (owner: 10JHathaway)
[16:34:13] <wikibugs>	 (03CR) 10Andrew Bogott: [C:03+2] deployment-prep mcrouter: replace old memc servers with new ones [puppet] - 10https://gerrit.wikimedia.org/r/1051499 (https://phabricator.wikimedia.org/T361384) (owner: 10Andrew Bogott)
[16:35:39] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2207 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65752 and previous config saved to /var/cache/conftool/dbconfig/20240703-163538-root.json
[16:38:54] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 1236769896 and 111 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[16:41:54] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 168840 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[16:42:23] <wikibugs>	 (03CR) 10JHathaway: [C:03+2] add mx-in{1001,2001) as MX servers [dns] - 10https://gerrit.wikimedia.org/r/1051797 (https://phabricator.wikimedia.org/T367517) (owner: 10JHathaway)
[16:44:20] <jhathaway>	 !log adding inbound email servers mx-in{1001,2001} to our MX record
[16:44:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:46:50] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-presto1004.eqiad.wmnet with reason: Cold booting to investigate RAM issue
[16:47:06] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-presto1004.eqiad.wmnet with reason: Cold booting to investigate RAM issue
[16:48:15] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-api-ext at eqiad: 20.93% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-ext&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[16:48:32] <wikibugs>	 (03PS1) 10Ilias Sarantopoulos: ml-services: deploy gemma2-27b-it on ml-staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051806 (https://phabricator.wikimedia.org/T369055)
[16:50:44] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2207 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65754 and previous config saved to /var/cache/conftool/dbconfig/20240703-165044-root.json
[16:53:15] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-api-ext at eqiad: 20.93% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-ext&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[16:54:28] <wikibugs>	 06SRE, 06serviceops, 10Data Products (Data Products Sprint 16), 13Patch-For-Review, 07Service-deployment-requests: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production - https://phabricator.wikimedia.org/T361835#9950626 (10WDoranWMF)
[16:56:45] <wikibugs>	 06SRE, 06Data-Engineering, 10Dumps-Generation, 10Data Products (Data Products Sprint 16), and 2 others: Dumps generation without prefetch cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#9950632 (10WDoranWMF)
[17:00:03] <wikibugs>	 (03CR) 10Ottomata: [C:03+1] beta: eventbus: enable instrumentation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051709 (https://phabricator.wikimedia.org/T363587) (owner: 10Gmodena)
[17:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T1700)
[17:02:09] <wikibugs>	 06SRE, 06Data-Engineering, 10Dumps-Generation, 10Data Products (Data Products Sprint 16), and 2 others: Dumps generation without prefetch cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#9950686 (10xcollazo) I played with the offending SQL statements from T368098#9...
[17:02:17] <wikibugs>	 (03PS1) 10Jforrester: wikifunctions: Upgrade orchestrator from 2024-06-17-221517 to 2024-07-03-155425 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051807 (https://phabricator.wikimedia.org/T364413)
[17:02:34] <wikibugs>	 (03PS1) 10Jforrester: wikifunctions: Upgrade evaluators from 2024-06-11-161031 to 2024-07-03-153821 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051808 (https://phabricator.wikimedia.org/T364413)
[17:03:49] <wikibugs>	 (03CR) 10Ottomata: [C:03+1] EventStreamConfig: Add hive ingestion defaults (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050596 (https://phabricator.wikimedia.org/T367134) (owner: 10TChin)
[17:05:20] <wikibugs>	 (03CR) 10Jforrester: [C:03+2] wikifunctions: Upgrade orchestrator from 2024-06-17-221517 to 2024-07-03-155425 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051807 (https://phabricator.wikimedia.org/T364413) (owner: 10Jforrester)
[17:05:47] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10Mail, 13Patch-For-Review: Postfix inbound rollout sequence, mx-in - https://phabricator.wikimedia.org/T367517#9950703 (10bcampbell) I see the new MX records in Google Workspace Admin now @jhathaway.  {F56203753}
[17:05:50] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2207 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65755 and previous config saved to /var/cache/conftool/dbconfig/20240703-170549-root.json
[17:06:20] <wikibugs>	 (03Merged) 10jenkins-bot: wikifunctions: Upgrade orchestrator from 2024-06-17-221517 to 2024-07-03-155425 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051807 (https://phabricator.wikimedia.org/T364413) (owner: 10Jforrester)
[17:07:02] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Grant Access to analytics-privatedata-users for Sharvaniharan - https://phabricator.wikimedia.org/T368566#9950724 (10Volans) a:03ATsay-WMF
[17:07:32] <logmsgbot>	 !log jforrester@deploy1002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[17:08:08] <logmsgbot>	 !log jforrester@deploy1002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[17:09:08] <logmsgbot>	 !log jforrester@deploy1002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[17:10:19] <logmsgbot>	 !log jforrester@deploy1002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[17:10:23] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Request access to servers Dcops group - https://phabricator.wikimedia.org/T360356#9950740 (10wiki_willy) Thanks so much @elukey for putting this proposal together, and for the chat during office hours today.  I like the entire idea, and will run it by the rest of the team d...
[17:10:38] <logmsgbot>	 !log jforrester@deploy1002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[17:11:47] <logmsgbot>	 !log jforrester@deploy1002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[17:13:16] <wikibugs>	 (03CR) 10Jforrester: [C:03+2] wikifunctions: Upgrade evaluators from 2024-06-11-161031 to 2024-07-03-153821 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051808 (https://phabricator.wikimedia.org/T364413) (owner: 10Jforrester)
[17:14:16] <wikibugs>	 (03Merged) 10jenkins-bot: wikifunctions: Upgrade evaluators from 2024-06-11-161031 to 2024-07-03-153821 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051808 (https://phabricator.wikimedia.org/T364413) (owner: 10Jforrester)
[17:14:46] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DBA, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9950763 (10cmooney) 05Open→03Resolved
[17:15:19] <cdanis>	 jouncebot: nowandnext
[17:15:19] <jouncebot>	 For the next 0 hour(s) and 44 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T1700)
[17:15:20] <jouncebot>	 In 0 hour(s) and 44 minute(s): MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T1800)
[17:15:38] <logmsgbot>	 !log jforrester@deploy1002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[17:15:42] <wikibugs>	 (03CR) 10CDanis: [C:03+2] Bump mediawiki chart version & mesh version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051453 (https://phabricator.wikimedia.org/T363407) (owner: 10CDanis)
[17:17:16] <logmsgbot>	 !log jforrester@deploy1002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[17:17:23] <wikibugs>	 (03Merged) 10jenkins-bot: Bump mediawiki chart version & mesh version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051453 (https://phabricator.wikimedia.org/T363407) (owner: 10CDanis)
[17:17:53] <logmsgbot>	 !log jforrester@deploy1002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[17:19:46] <logmsgbot>	 !log jforrester@deploy1002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[17:19:50] <logmsgbot>	 !log jforrester@deploy1002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[17:20:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2207 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65756 and previous config saved to /var/cache/conftool/dbconfig/20240703-172055-root.json
[17:22:06] <logmsgbot>	 !log jforrester@deploy1002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[17:22:45] <wikibugs>	 (03PS1) 10Dreamrimmer: Remove "Create a book" link from sidebar on German Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051809 (https://phabricator.wikimedia.org/T368900)
[17:23:11] <wikibugs>	 (03PS1) 10CDanis: actually bump mediawiki chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051810
[17:23:23] <wikibugs>	 (03CR) 10CDanis: [C:03+2] actually bump mediawiki chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051810 (owner: 10CDanis)
[17:25:12] <wikibugs>	 (03Merged) 10jenkins-bot: actually bump mediawiki chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051810 (owner: 10CDanis)
[17:25:21] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, July 04 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051809 (https://phabricator.wikimedia.org/T368900) (owner: 10Dreamrimmer)
[17:26:38] <wikibugs>	 (03PS2) 10Dreamrimmer: [Wikitech] Remove namespace 666 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1043797 (https://phabricator.wikimedia.org/T367254)
[17:27:31] <wikibugs>	 (03PS1) 10Jforrester: wikifunctions: Raise CPU limit in orchestrator from 200m to 400m [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051813 (https://phabricator.wikimedia.org/T368892)
[17:27:38] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, July 04 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1043797 (https://phabricator.wikimedia.org/T367254) (owner: 10Dreamrimmer)
[17:28:25] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[17:28:53] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[17:29:54] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[17:30:15] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[17:30:18] <wikibugs>	 (03PS21) 10Gergő Tisza: Handle sso.wikimedia.org domain [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1036245 (https://phabricator.wikimedia.org/T365162)
[17:31:30] <wikibugs>	 (03CR) 10Gergő Tisza: "Done." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1036245 (https://phabricator.wikimedia.org/T365162) (owner: 10Gergő Tisza)
[17:31:54] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
[17:33:13] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
[17:33:14] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
[17:34:26] <wikibugs>	 (03CR) 10Andrew Bogott: [C:03+2] environment: add wikimediacloud.org to no_proxy domains [puppet] - 10https://gerrit.wikimedia.org/r/1051802 (owner: 10Andrew Bogott)
[17:34:27] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
[17:34:28] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
[17:34:50] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
[17:34:51] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
[17:35:12] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
[17:35:13] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] START helmfile.d/services/mw-misc: apply
[17:35:36] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
[17:35:37] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-misc: apply
[17:35:57] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
[17:36:01] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2207 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65758 and previous config saved to /var/cache/conftool/dbconfig/20240703-173601-root.json
[17:36:41] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
[17:37:54] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
[17:37:55] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
[17:37:58] <wikibugs>	 (03PS1) 10Pppery: WIP: Add wmf-config changes for mos: interwiki hack [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051814 (https://phabricator.wikimedia.org/T363538)
[17:38:38] <wikibugs>	 (03CR) 10CI reject: [V:04-1] WIP: Add wmf-config changes for mos: interwiki hack [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051814 (https://phabricator.wikimedia.org/T363538) (owner: 10Pppery)
[17:40:11] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
[17:40:11] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] START helmfile.d/services/mw-api-int: apply
[17:41:30] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
[17:41:31] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
[17:41:33] <wikibugs>	 (03PS8) 10Gergő Tisza: varnish: Copy value of X-Wikimedia-Debug cookie to header [puppet] - 10https://gerrit.wikimedia.org/r/1030591 (https://phabricator.wikimedia.org/T350094)
[17:41:46] <wikibugs>	 06SRE, 06serviceops: k8s master capacity issues - https://phabricator.wikimedia.org/T366094#9950976 (10CDanis) 05In progress→03Resolved a:03CDanis Boldly closing this because we've resolved all of {T353464} and the two tasks for 10G NICs T366204 T366205
[17:42:02] <wikibugs>	 (03CR) 10Gergő Tisza: "Done." [puppet] - 10https://gerrit.wikimedia.org/r/1030591 (https://phabricator.wikimedia.org/T350094) (owner: 10Gergő Tisza)
[17:43:06] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
[17:43:25] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
[17:44:38] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
[17:44:39] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
[17:44:44] <jinxer-wm>	 FIRING: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[17:45:30] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
[17:45:33] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
[17:46:08] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
[17:48:09] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] START helmfile.d/services/mw-web: apply
[17:49:30] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-web: apply
[17:49:31] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-web: apply
[17:50:46] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
[17:53:18] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 304364432 and 14 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[17:54:20] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 56520 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:00:05] <jouncebot>	 hashar and jeena: Deploy window MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T1800)
[18:02:39] <jinxer-wm>	 FIRING: [2x] CirrusSearchNodeIndexingNotIncreasing: Elasticsearch instance elastic1091-production-search-eqiad is not indexing - https://wikitech.wikimedia.org/wiki/Search#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing
[18:04:12] <inflatador>	 ^^ Elastic alert should clear shortly, see unban cmd a few lines up
[18:10:36] <wikibugs>	 (03PS61) 10Bking: dse-k8s-services: Add net-new chart for Airflow [deployment-charts] - 10https://gerrit.wikimedia.org/r/1041759 (https://phabricator.wikimedia.org/T363001)
[18:11:26] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 29524856 and 4 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:11:35] <wikibugs>	 (03CR) 10Bking: "Yeah, this seems like the proper path forward. I've actually already started work on this, I just wanted to leave that for a different CR." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1041759 (https://phabricator.wikimedia.org/T363001) (owner: 10Bking)
[18:11:40] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 13Patch-For-Review: Add per-output queue monitoring for Juniper network devices - https://phabricator.wikimedia.org/T326322#9951125 (10cmooney) So one thing I noticed is that we are not getting the stats for LAG/ae interfaces with the current setup, nor routed...
[18:12:26] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 2448 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:14:43] <jinxer-wm>	 RESOLVED: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[18:17:30] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 1433152560 and 91 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:17:39] <jinxer-wm>	 FIRING: [2x] CirrusSearchNodeIndexingNotIncreasing: Elasticsearch instance elastic1091-production-search-eqiad is not indexing - https://wikitech.wikimedia.org/wiki/Search#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing
[18:21:30] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 11848 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:22:39] <jinxer-wm>	 RESOLVED: [2x] CirrusSearchNodeIndexingNotIncreasing: Elasticsearch instance elastic1091-production-search-eqiad is not indexing - https://wikitech.wikimedia.org/wiki/Search#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing
[18:25:32] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:26:52] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:28:17] <wikibugs>	 (03PS1) 10CDanis: otelcol: update hardcoded k8s master IPs for the last time [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051820 (https://phabricator.wikimedia.org/T365855)
[18:28:24] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.186 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:28:44] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 52339 bytes in 0.119 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:29:10] <sukhe>	 cdanis: did someone already give you the "famous last words" punchline on this or not :)
[18:29:32] <cdanis>	 sukhe: lol
[18:29:41] <sukhe>	 every time I put last/temporary/fix somewhere, someone messages me to tell me that it won't be the case
[18:29:48] <cdanis>	 well it *looks* trivial to do it the right way soon, even with an external chart
[18:31:36] <wikibugs>	 (03CR) 10CDanis: [C:03+2] otelcol: update hardcoded k8s master IPs for the last time [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051820 (https://phabricator.wikimedia.org/T365855) (owner: 10CDanis)
[18:34:33] <wikibugs>	 (03Merged) 10jenkins-bot: otelcol: update hardcoded k8s master IPs for the last time [deployment-charts] - 10https://gerrit.wikimedia.org/r/1051820 (https://phabricator.wikimedia.org/T365855) (owner: 10CDanis)
[18:34:54] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] START helmfile.d/admin 'apply'.
[18:35:05] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[18:36:48] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[18:36:56] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[18:39:33] <wikibugs>	 (03CR) 10Mforns: [C:03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/1041817 (https://phabricator.wikimedia.org/T363435) (owner: 10David Martin)
[18:40:38] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 568625416 and 61 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:41:38] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 20624 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:54:40] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
[18:54:53] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: Get test host connected to codfw row c/d lsw's - https://phabricator.wikimedia.org/T367512#9951390 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1002 for host sretest2002.codfw.wmnet with OS bo...
[18:55:12] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1186 (T364069)', diff saved to https://phabricator.wikimedia.org/P65759 and previous config saved to /var/cache/conftool/dbconfig/20240703-185511-marostegui.json
[18:55:15] <stashbot>	 T364069: Rebuild pagelinks tables - https://phabricator.wikimedia.org/T364069
[18:59:26] <jinxer-wm>	 FIRING: RoutinatorRsyncErrors: Routinator rsync fetching issue in eqiad - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[19:08:57] <SandraEbele_>	 !log deploying airflow dags
[19:08:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:10:20] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P65760 and previous config saved to /var/cache/conftool/dbconfig/20240703-191019-marostegui.json
[19:11:54] <logmsgbot>	 !log ebysans@deploy1002 Started deploy [airflow-dags/analytics@d773cac]: (no justification provided)
[19:12:27] <logmsgbot>	 !log ebysans@deploy1002 Finished deploy [airflow-dags/analytics@d773cac]: (no justification provided) (duration: 00m 33s)
[19:16:16] <logmsgbot>	 !log cmooney@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2002.codfw.wmnet with OS bookworm
[19:19:03] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
[19:19:09] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: Get test host connected to codfw row c/d lsw's - https://phabricator.wikimedia.org/T367512#9951466 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1002 for host sretest2002.codfw.wmnet with OS bo...
[19:24:22] <logmsgbot>	 !log cmooney@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bookworm
[19:24:31] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: Get test host connected to codfw row c/d lsw's - https://phabricator.wikimedia.org/T367512#9951473 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1002 for host sretest2002.codfw.wmnet with OS bookwo...
[19:25:17] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
[19:25:26] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: Get test host connected to codfw row c/d lsw's - https://phabricator.wikimedia.org/T367512#9951474 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1002 for host sretest2002.codfw.wmnet with OS bo...
[19:25:27] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P65761 and previous config saved to /var/cache/conftool/dbconfig/20240703-192526-marostegui.json
[19:30:19] <logmsgbot>	 !log bking@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
[19:30:21] <logmsgbot>	 !log bking@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
[19:38:21] <wikibugs>	 06SRE, 10LDAP-Access-Requests, 13Patch-For-Review: Update terms and timeline of access already granted for AndyRussG - https://phabricator.wikimedia.org/T367681#9951496 (10KFrancis) I went ahead and processed an NDA here.  It's just better to have our bases covered.  I'll confirm when it's complete.
[19:40:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1186 (T364069)', diff saved to https://phabricator.wikimedia.org/P65765 and previous config saved to /var/cache/conftool/dbconfig/20240703-194033-marostegui.json
[19:40:36] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
[19:40:37] <stashbot>	 T364069: Rebuild pagelinks tables - https://phabricator.wikimedia.org/T364069
[19:40:49] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
[19:40:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65766 and previous config saved to /var/cache/conftool/dbconfig/20240703-194055-marostegui.json
[19:49:47] <logmsgbot>	 !log cmooney@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2002.codfw.wmnet with OS bookworm
[19:54:05] <wikibugs>	 06SRE, 06Data-Engineering, 10Dumps-Generation, 10Data Products (Data Products Sprint 16), and 2 others: Dumps generation without prefetch cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#9951531 (10xcollazo) In {T29112} they modified the code to `ORDER BY page_id A...
[19:54:24] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
[19:55:05] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.dns.netbox
[19:55:43] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: Get test host connected to codfw row c/d lsw's - https://phabricator.wikimedia.org/T367512#9951547 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1002 for host sretest2002.codfw.wmnet with OS bo...
[19:56:52] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[20:00:05] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: OwO what's this, a deployment window?? UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T2000). nyaa~
[20:00:05] <jouncebot>	 katherine_g: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:12] <katherine_g>	 here! 
[20:00:38] <cjming>	 hi katherine_g - i can deploy for you unless you can self-deploy?
[20:00:54] <katherine_g>	 I cannot self deploy yet, so that would be great! 
[20:01:04] <cjming>	 alrighty - let's go!
[20:01:53] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by cjming@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051756 (https://phabricator.wikimedia.org/T362969) (owner: 10Kgraessle)
[20:02:34] <wikibugs>	 (03Merged) 10jenkins-bot: Remove QuickSurvey for Automoderator patroller workstream survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1051756 (https://phabricator.wikimedia.org/T362969) (owner: 10Kgraessle)
[20:03:07] <logmsgbot>	 !log cjming@deploy1002 Started scap sync-world: Backport for [[gerrit:1051756|Remove QuickSurvey for Automoderator patroller workstream survey (T362969)]]
[20:03:10] <stashbot>	 T362969: Deploy QuickSurvey for Automoderator patroller workstream survey - https://phabricator.wikimedia.org/T362969
[20:04:41] <logmsgbot>	 !log bking@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
[20:04:43] <logmsgbot>	 !log bking@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
[20:05:06] <logmsgbot>	 !log bking@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
[20:05:48] <logmsgbot>	 !log cjming@deploy1002 kgraessle, cjming: Backport for [[gerrit:1051756|Remove QuickSurvey for Automoderator patroller workstream survey (T362969)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[20:05:51] <cjming>	 katherine_g: up on test servers if you want to check - lmk if/when to sync
[20:06:10] <katherine_g>	 looks good to sync! 
[20:06:17] <cjming>	 yay!
[20:06:20] <logmsgbot>	 !log cjming@deploy1002 kgraessle, cjming: Continuing with sync
[20:09:26] <jinxer-wm>	 RESOLVED: RoutinatorRsyncErrors: Routinator rsync fetching issue in eqiad - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[20:10:13] <logmsgbot>	 !log cmooney@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2002.codfw.wmnet with OS bookworm
[20:11:29] <logmsgbot>	 !log cjming@deploy1002 Finished scap: Backport for [[gerrit:1051756|Remove QuickSurvey for Automoderator patroller workstream survey (T362969)]] (duration: 08m 22s)
[20:11:32] <stashbot>	 T362969: Deploy QuickSurvey for Automoderator patroller workstream survey - https://phabricator.wikimedia.org/T362969
[20:11:41] <cjming>	 katherine_g: should be live!
[20:11:55] <katherine_g>	 ok, thanks! 
[20:12:05] <cjming>	 yw
[20:13:07] <cjming>	 ok - closing window bec i have to wrap up stuff for the upcoming long weekend - if someone needs something deployed in the next 45 mins, please ping me here or on slack and i can hop on
[20:13:52] <cjming>	 !log end of UTC late backport window
[20:13:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:15:31] <wikibugs>	 (03PS1) 10Andrew Bogott: wmcs-image-create: update with g4 flavors [puppet] - 10https://gerrit.wikimedia.org/r/1051836
[20:47:01] <wikibugs>	 (03PS1) 10RLazarus: systemd: Expand Systemd::Timer::Interval pattern [puppet] - 10https://gerrit.wikimedia.org/r/1051839
[20:47:23] <wikibugs>	 (03CR) 10RLazarus: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1051839 (owner: 10RLazarus)
[21:00:05] <jouncebot>	 Deploy window Wikifunctions Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T2100)
[21:04:26] <jinxer-wm>	 FIRING: RoutinatorRsyncErrors: Routinator rsync fetching issue in eqiad - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[21:09:37] <wikibugs>	 (03PS2) 10RLazarus: systemd: Expand Systemd::Timer::Interval pattern [puppet] - 10https://gerrit.wikimedia.org/r/1051839
[21:10:26] <wikibugs>	 (03CR) 10RLazarus: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1051839 (owner: 10RLazarus)
[21:12:30] <wikibugs>	 (03PS3) 10Bking: wdqs: detune blackbox checks [puppet] - 10https://gerrit.wikimedia.org/r/1051369 (https://phabricator.wikimedia.org/T366405)
[21:13:32] <wikibugs>	 (03CR) 10RLazarus: "As discussed!" [puppet] - 10https://gerrit.wikimedia.org/r/1051839 (owner: 10RLazarus)
[21:14:21] <wikibugs>	 (03PS2) 10RLazarus: deployment_server: Add a daily systemd timer for mwscript_cleanup [puppet] - 10https://gerrit.wikimedia.org/r/1051489 (https://phabricator.wikimedia.org/T341553)
[21:16:04] <wikibugs>	 (03CR) 10RLazarus: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1051489 (https://phabricator.wikimedia.org/T341553) (owner: 10RLazarus)
[21:16:05] <wikibugs>	 (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1051369 (https://phabricator.wikimedia.org/T366405) (owner: 10Bking)
[21:18:09] <wikibugs>	 (03PS3) 10RLazarus: deployment_server: Add a daily systemd timer for mwscript_cleanup [puppet] - 10https://gerrit.wikimedia.org/r/1051489 (https://phabricator.wikimedia.org/T341553)
[21:18:20] <wikibugs>	 (03CR) 10RLazarus: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1051489 (https://phabricator.wikimedia.org/T341553) (owner: 10RLazarus)
[21:20:12] <wikibugs>	 (03PS4) 10Bking: wdqs: detune blackbox checks [puppet] - 10https://gerrit.wikimedia.org/r/1051369 (https://phabricator.wikimedia.org/T366405)
[21:22:15] <wikibugs>	 (03CR) 10JHathaway: [C:03+2] Move outbound email to mx-out{1001,2001}.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/1051803 (https://phabricator.wikimedia.org/T365395) (owner: 10JHathaway)
[21:24:03] <wikibugs>	 (03PS4) 10RLazarus: deployment_server: Add a daily systemd timer for mwscript_cleanup [puppet] - 10https://gerrit.wikimedia.org/r/1051489 (https://phabricator.wikimedia.org/T341553)
[21:24:14] <wikibugs>	 (03PS5) 10Bking: wdqs: detune blackbox checks [puppet] - 10https://gerrit.wikimedia.org/r/1051369 (https://phabricator.wikimedia.org/T366405)
[21:24:26] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[21:25:43] <wikibugs>	 06SRE, 06collaboration-services, 06Infrastructure-Foundations, 10Mail, 13Patch-For-Review: Postfix outbound rollout sequence, mx-out - https://phabricator.wikimedia.org/T365395#9951919 (10jhathaway)
[21:25:54] <wikibugs>	 (03CR) 10RLazarus: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1051489 (https://phabricator.wikimedia.org/T341553) (owner: 10RLazarus)
[21:29:25] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+1] wdqs: detune blackbox checks [puppet] - 10https://gerrit.wikimedia.org/r/1051369 (https://phabricator.wikimedia.org/T366405) (owner: 10Bking)
[21:29:27] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] wdqs: detune blackbox checks [puppet] - 10https://gerrit.wikimedia.org/r/1051369 (https://phabricator.wikimedia.org/T366405) (owner: 10Bking)
[21:33:16] <wikibugs>	 (03CR) 10RLazarus: "Damn, nice digging! As discussed I addressed this at the regex, mostly out of indignation." [puppet] - 10https://gerrit.wikimedia.org/r/1051489 (https://phabricator.wikimedia.org/T341553) (owner: 10RLazarus)
[21:34:26] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[21:35:21] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
[21:36:25] <wikibugs>	 (03CR) 10Scott French: [C:03+1] "Nice! Thank you :)" [puppet] - 10https://gerrit.wikimedia.org/r/1051839 (owner: 10RLazarus)
[21:39:12] <wikibugs>	 (03CR) 10Scott French: "Thanks for updating the regex!" [puppet] - 10https://gerrit.wikimedia.org/r/1051489 (https://phabricator.wikimedia.org/T341553) (owner: 10RLazarus)
[21:40:33] <logmsgbot>	 !log ryankemper@cumin2002 END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
[21:40:49] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
[21:42:28] <wikibugs>	 (03Abandoned) 10Dzahn: gerrit: remove NRPE process monitoring [puppet] - 10https://gerrit.wikimedia.org/r/1032526 (owner: 10Dzahn)
[21:43:18] <icinga-wm>	 PROBLEM - Dell PowerEdge RAID Controller on db2161 is CRITICAL: communication: 0 OK https://wikitech.wikimedia.org/wiki/PERCCli%23Monitoring
[21:43:20] <icinga-wm>	 ACKNOWLEDGEMENT - Dell PowerEdge RAID Controller on db2161 is CRITICAL: communication: 0 OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T369229 https://wikitech.wikimedia.org/wiki/PERCCli%23Monitoring
[21:43:24] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Degraded RAID on db2161 - https://phabricator.wikimedia.org/T369229 (10ops-monitoring-bot) 03NEW
[21:49:27] <jinxer-wm>	 RESOLVED: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[21:50:08] <wikibugs>	 (03PS2) 10Andrew Bogott: wmcs-image-create: update with g4 flavors [puppet] - 10https://gerrit.wikimedia.org/r/1051836
[21:50:08] <wikibugs>	 (03PS1) 10Andrew Bogott: wmcs-image-create: clear image id in base image [puppet] - 10https://gerrit.wikimedia.org/r/1051845 (https://phabricator.wikimedia.org/T351507)
[21:51:18] <wikibugs>	 (03CR) 10Andrew Bogott: [C:03+2] wmcs-image-create: update with g4 flavors [puppet] - 10https://gerrit.wikimedia.org/r/1051836 (owner: 10Andrew Bogott)
[21:51:30] <wikibugs>	 (03CR) 10Andrew Bogott: [C:03+2] wmcs-image-create: clear image id in base image [puppet] - 10https://gerrit.wikimedia.org/r/1051845 (https://phabricator.wikimedia.org/T351507) (owner: 10Andrew Bogott)
[21:55:32] <wikibugs>	 (03PS1) 10Dzahn: puppetmaster: change git sender email address to git@wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/1051846
[21:56:47] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye
[21:57:04] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06serviceops, 13Patch-For-Review: Q4:rack/setup/install parsoidtest1001 - https://phabricator.wikimedia.org/T363399#9952054 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host parsoidtest1001.eqiad.wmnet with OS bullseye
[22:08:31] <wikibugs>	 (03CR) 10RLazarus: [C:03+2] systemd: Expand Systemd::Timer::Interval pattern [puppet] - 10https://gerrit.wikimedia.org/r/1051839 (owner: 10RLazarus)
[22:08:40] <wikibugs>	 (03CR) 10RLazarus: [C:03+2] deployment_server: Add a daily systemd timer for mwscript_cleanup [puppet] - 10https://gerrit.wikimedia.org/r/1051489 (https://phabricator.wikimedia.org/T341553) (owner: 10RLazarus)
[22:20:10] <wikibugs>	 (03PS1) 10RLazarus: deployment_server: Run mwscript-cleanup as mwdeploy, not www-data [puppet] - 10https://gerrit.wikimedia.org/r/1051848
[22:21:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: mwscript-cleanup.service on deploy1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:27:30] <wikibugs>	 (03PS5) 10Jdlrobson: [July 15th] Deploy dark mode to all logged-in users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050082 (https://phabricator.wikimedia.org/T368795)
[22:36:14] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye
[22:36:25] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06serviceops, 13Patch-For-Review: Q4:rack/setup/install parsoidtest1001 - https://phabricator.wikimedia.org/T363399#9952219 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host parsoidtest1001.eqiad.wmnet with OS bullseye executed...
[22:36:25] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: mwscript-cleanup.service on deploy1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:36:33] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P65768 and previous config saved to /var/cache/conftool/dbconfig/20240703-223632-ladsgroup.json
[22:36:35] <stashbot>	 T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[22:37:00] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65769 and previous config saved to /var/cache/conftool/dbconfig/20240703-223659-marostegui.json
[22:37:03] <stashbot>	 T364069: Rebuild pagelinks tables - https://phabricator.wikimedia.org/T364069
[22:37:41] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: Should we add links between our spine switches aggregating each row of two? - https://phabricator.wikimedia.org/T369238 (10cmooney) 03NEW p:05Triage→03Low
[22:38:44] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06serviceops, 13Patch-For-Review: Q4:rack/setup/install parsoidtest1001 - https://phabricator.wikimedia.org/T363399#9952225 (10Jclark-ctr) @Papaul  if you get a chance can you look at this one?
[22:47:28] <wikibugs>	 (03CR) 10Scott French: [C:03+1] deployment_server: Run mwscript-cleanup as mwdeploy, not www-data [puppet] - 10https://gerrit.wikimedia.org/r/1051848 (owner: 10RLazarus)
[22:51:40] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P65770 and previous config saved to /var/cache/conftool/dbconfig/20240703-225139-ladsgroup.json
[22:52:07] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P65771 and previous config saved to /var/cache/conftool/dbconfig/20240703-225206-marostegui.json
[22:53:50] <icinga-wm>	 PROBLEM - Check unit status of mwscript-cleanup on deploy1002 is CRITICAL: CRITICAL: Status of the systemd unit mwscript-cleanup https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[23:00:04] <jouncebot>	 mvolz: Time to do the Services – Citoid / Zotero deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240703T2300).
[23:00:31] <icinga-wm>	 PROBLEM - Check unit status of mwscript-cleanup on deploy1003 is CRITICAL: CRITICAL: Status of the systemd unit mwscript-cleanup https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[23:03:17] <icinga-wm>	 PROBLEM - Check unit status of mwscript-cleanup on deploy2002 is CRITICAL: CRITICAL: Status of the systemd unit mwscript-cleanup https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[23:06:45] <rzl>	 ^ mwscript-cleanup is me, working on it
[23:06:47] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P65772 and previous config saved to /var/cache/conftool/dbconfig/20240703-230646-ladsgroup.json
[23:07:14] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P65773 and previous config saved to /var/cache/conftool/dbconfig/20240703-230713-marostegui.json
[23:08:00] <wikibugs>	 (CR) RLazarus: [C:+2] deployment_server: Run mwscript-cleanup as mwdeploy, not www-data [puppet] - https://gerrit.wikimedia.org/r/1051848 (owner: RLazarus)
[23:08:40] <wikibugs>	 ops-eqiad, SRE, DC-Ops, serviceops, Patch-For-Review: Q4:rack/setup/install parsoidtest1001 - https://phabricator.wikimedia.org/T363399#9952366 (Dzahn) We can see in reimage-extended.log that the reimage fails but it's not immediately clear why.  ` 2024-07-03 22:36:13,115 jclark 2636322 [ERRO...
[23:09:26] <jinxer-wm>	 FIRING: RoutinatorRsyncErrors: Routinator rsync fetching issue in eqiad - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[23:16:25] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: mwscript-cleanup.service on deploy1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:21:54] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P65774 and previous config saved to /var/cache/conftool/dbconfig/20240703-232154-ladsgroup.json
[23:21:57] <stashbot>	 T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[23:22:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65775 and previous config saved to /var/cache/conftool/dbconfig/20240703-232221-marostegui.json
[23:22:24] <stashbot>	 T364069: Rebuild pagelinks tables - https://phabricator.wikimedia.org/T364069
[23:22:24] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
[23:22:37] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
[23:22:39] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[23:22:55] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[23:23:03] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1196 (T364069)', diff saved to https://phabricator.wikimedia.org/P65776 and previous config saved to /var/cache/conftool/dbconfig/20240703-232302-marostegui.json
[23:23:17] <icinga-wm>	 RECOVERY - Check unit status of mwscript-cleanup on deploy2002 is OK: OK: Status of the systemd unit mwscript-cleanup https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[23:23:51] <icinga-wm>	 RECOVERY - Check unit status of mwscript-cleanup on deploy1002 is OK: OK: Status of the systemd unit mwscript-cleanup https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[23:24:04] <wikibugs>	 (PS1) Dwisehaupt: crm: Stop civicrm callouts to the internet for version checks [puppet] - https://gerrit.wikimedia.org/r/1051851 (https://phabricator.wikimedia.org/T343486)
[23:24:38] <wikibugs>	 (CR) Dwisehaupt: "This is the ET change we worked through at the offsite." [puppet] - https://gerrit.wikimedia.org/r/1051851 (https://phabricator.wikimedia.org/T343486) (owner: Dwisehaupt)
[23:26:25] <jinxer-wm>	 RESOLVED: [2x] SystemdUnitFailed: mwscript-cleanup.service on deploy1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:29:26] <jinxer-wm>	 RESOLVED: RoutinatorRsyncErrors: Routinator rsync fetching issue in eqiad - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[23:30:10] <wikibugs>	 (Abandoned) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1051486 (owner: TrainBranchBot)
[23:30:31] <icinga-wm>	 RECOVERY - Check unit status of mwscript-cleanup on deploy1003 is OK: OK: Status of the systemd unit mwscript-cleanup https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[23:33:03] <wikibugs>	 SRE, collaboration-services, Wikimedia-Mailing-lists, Patch-For-Review: Migrate Mailman/lists to Bullseye/Bookworm - https://phabricator.wikimedia.org/T331706#9952430 (Dzahn)
[23:34:07] <wikibugs>	 (PS1) Dzahn: Revert "Phabricator: Add safe.directory directive" [puppet] - https://gerrit.wikimedia.org/r/1051852
[23:34:38] <wikibugs>	 (PS1) Catrope: Graph extension: Add tracking for data sources used in <graph> tags [mediawiki-config] - https://gerrit.wikimedia.org/r/1051853
[23:38:34] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1051854
[23:38:34] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1051854 (owner: TrainBranchBot)
[23:40:11] <wikibugs>	 (CR) Dzahn: [C:+2] Revert "Phabricator: Add safe.directory directive" [puppet] - https://gerrit.wikimedia.org/r/1051852 (owner: Dzahn)
[23:41:20] <wikibugs>	 (PS1) RLazarus: deployment_server: Handle None container_statuses in mwscript-k8s [puppet] - https://gerrit.wikimedia.org/r/1051855 (https://phabricator.wikimedia.org/T369175)
[23:46:11] <wikibugs>	 ops-eqiad, SRE, DC-Ops, serviceops, Patch-For-Review: Q4:rack/setup/install parsoidtest1001 - https://phabricator.wikimedia.org/T363399#9952433 (Papaul) @Jclark-ctr @Dzahn  this is what i have on the console   [            (1*installer)  2 shell  3 shell  4- log           ][ Jul 03 23:44 ]...
[23:47:39] <tzatziki>	 !log removing 11 files for legal compliance
[23:47:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:55:32] <wikibugs>	 (CR) Scott French: [C:+1] deployment_server: Handle None container_statuses in mwscript-k8s [puppet] - https://gerrit.wikimedia.org/r/1051855 (https://phabricator.wikimedia.org/T369175) (owner: RLazarus)
[23:56:54] <wikibugs>	 (CR) RLazarus: [C:+2] deployment_server: Handle None container_statuses in mwscript-k8s [puppet] - https://gerrit.wikimedia.org/r/1051855 (https://phabricator.wikimedia.org/T369175) (owner: RLazarus)
[23:59:43] <wikibugs>	 (PS1) Dzahn: installserver: add parsoidtest1001 to partman [puppet] - https://gerrit.wikimedia.org/r/1051856 (https://phabricator.wikimedia.org/T363399)