[00:04:39] FIRING: [2x] SystemdUnitFailed: cowbuilder_update_bookworm-amd64.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:07:30] RECOVERY - Check unit status of security_group_ssh-from-restricted-bastion_to_project_zuul on cloudcontrol1006 is OK: OK: Status of the systemd unit security_group_ssh-from-restricted-bastion_to_project_zuul https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[00:18:30] PROBLEM - Check unit status of security_group_ssh-from-restricted-bastion_to_project_zuul on cloudcontrol1006 is CRITICAL: CRITICAL: Status of the systemd unit security_group_ssh-from-restricted-bastion_to_project_zuul https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[00:19:37] jouncebot: nowandnext
[00:19:37] No deployments scheduled for the next 5 hour(s) and 40 minute(s)
[00:19:37] In 5 hour(s) and 40 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T0600)
[00:19:37] In 5 hour(s) and 40 minute(s): Primary database switchover (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T0600)
[00:21:31] (PS1) Ladsgroup: Init magwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1303594 (https://phabricator.wikimedia.org/T428266)
[00:24:18] (CR) TrainBranchBot: [C:+2] "Approved by ladsgroup@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1303594 (https://phabricator.wikimedia.org/T428266) (owner: Ladsgroup)
[00:25:10] (Merged) jenkins-bot: Init magwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1303594 (https://phabricator.wikimedia.org/T428266) (owner: Ladsgroup)
[00:26:21] !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1303594|Init magwiki (T428266)]]
[00:26:26] T428266: Create Wikipedia Magahi - https://phabricator.wikimedia.org/T428266
[00:28:21] !log ladsgroup@deploy1003 ladsgroup: Backport for [[gerrit:1303594|Init magwiki (T428266)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[00:29:16] !log ladsgroup@deploy1003 ladsgroup: Continuing with deployment
[00:33:36] !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1303594|Init magwiki (T428266)]] (duration: 07m 14s)
[00:33:40] T428266: Create Wikipedia Magahi - https://phabricator.wikimedia.org/T428266
[00:35:25] FIRING: SystemdUnitFailed: community_civicrm-cv-job-run.service on crm2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:37:42] (PS1) Ladsgroup: Activate magwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1303596 (https://phabricator.wikimedia.org/T428266)
[00:38:30] RECOVERY - Check unit status of security_group_ssh-from-restricted-bastion_to_project_zuul on cloudcontrol1006 is OK: OK: Status of the systemd unit security_group_ssh-from-restricted-bastion_to_project_zuul https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[00:38:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:38:49] (CR) TrainBranchBot: [C:+2] "Approved by ladsgroup@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1303596 (https://phabricator.wikimedia.org/T428266) (owner: Ladsgroup)
[00:40:13] (Merged) jenkins-bot: Activate magwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1303596 (https://phabricator.wikimedia.org/T428266) (owner: Ladsgroup)
[00:40:25] RESOLVED: SystemdUnitFailed: community_civicrm-cv-job-run.service on crm2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:40:39] !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1303596|Activate magwiki (T428266)]]
[00:40:43] T428266: Create Wikipedia Magahi - https://phabricator.wikimedia.org/T428266
[00:42:40] !log ladsgroup@deploy1003 ladsgroup: Backport for [[gerrit:1303596|Activate magwiki (T428266)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[00:43:52] !log ladsgroup@deploy1003 ladsgroup: Continuing with deployment
[00:48:04] !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1303596|Activate magwiki (T428266)]] (duration: 07m 25s)
[00:48:10] T428266: Create Wikipedia Magahi - https://phabricator.wikimedia.org/T428266
[00:49:30] PROBLEM - Check unit status of security_group_ssh-from-restricted-bastion_to_project_zuul on cloudcontrol1006 is CRITICAL: CRITICAL: Status of the systemd unit security_group_ssh-from-restricted-bastion_to_project_zuul https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[00:59:39] FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:04:39] RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:09:30] RECOVERY - Check unit status of security_group_ssh-from-restricted-bastion_to_project_zuul on cloudcontrol1006 is OK: OK: Status of the systemd unit security_group_ssh-from-restricted-bastion_to_project_zuul https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[01:10:07] (PS1) Ladsgroup: Update interwiki map [mediawiki-config] - https://gerrit.wikimedia.org/r/1303600 (https://phabricator.wikimedia.org/T428266)
[01:11:04] (CR) TrainBranchBot: [C:+2] "Approved by ladsgroup@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1303600 (https://phabricator.wikimedia.org/T428266) (owner: Ladsgroup)
[01:12:04] (Merged) jenkins-bot: Update interwiki map [mediawiki-config] - https://gerrit.wikimedia.org/r/1303600 (https://phabricator.wikimedia.org/T428266) (owner: Ladsgroup)
[01:12:29] !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1303600|Update interwiki map (T428266)]]
[01:12:34] T428266: Create Wikipedia Magahi - https://phabricator.wikimedia.org/T428266
[01:12:44] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1303601
[01:12:44] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1303601 (owner: TrainBranchBot)
[01:14:29] !log ladsgroup@deploy1003 ladsgroup: Backport for [[gerrit:1303600|Update interwiki map (T428266)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[01:15:06] !log ladsgroup@deploy1003 ladsgroup: Continuing with deployment
[01:16:08] FIRING: SLOBudgetBurn: Standalone event system success rate is below 99.9% target - https://alerts.wikimedia.org/?q=alertname%3DSLOBudgetBurn
[01:19:24] !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1303600|Update interwiki map (T428266)]] (duration: 06m 55s)
[01:19:29] T428266: Create Wikipedia Magahi - https://phabricator.wikimedia.org/T428266
[01:20:30] PROBLEM - Check unit status of security_group_ssh-from-restricted-bastion_to_project_zuul on cloudcontrol1006 is CRITICAL: CRITICAL: Status of the systemd unit security_group_ssh-from-restricted-bastion_to_project_zuul https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[01:21:05] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1303601 (owner: TrainBranchBot)
[01:30:30] RECOVERY - Check unit status of security_group_ssh-from-restricted-bastion_to_project_zuul on cloudcontrol1006 is OK: OK: Status of the systemd unit security_group_ssh-from-restricted-bastion_to_project_zuul https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[01:49:39] FIRING: PuppetFailure: Puppet has failed on cumin2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[01:51:30] PROBLEM - Check unit status of security_group_ssh-from-restricted-bastion_to_project_zuul on cloudcontrol1006 is CRITICAL: CRITICAL: Status of the systemd unit security_group_ssh-from-restricted-bastion_to_project_zuul https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[01:54:30] (PS2) Krinkle: Disable ShortUrl on mrwiki, newiki, pawiki, tawiki, orwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1303006 (https://phabricator.wikimedia.org/T107188)
[02:00:46] !log mwpresync@deploy1003 Started scap build-images: Publishing wmf/next image
[02:01:30] RECOVERY - Check unit status of security_group_ssh-from-restricted-bastion_to_project_zuul on cloudcontrol1006 is OK: OK: Status of the systemd unit security_group_ssh-from-restricted-bastion_to_project_zuul https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[02:07:31] !log mwpresync@deploy1003 Finished scap build-images: Publishing wmf/next image (duration: 06m 45s)
[02:08:21] SRE, SRE-Access-Requests: Requesting access to deployment for caro - https://phabricator.wikimedia.org/T426995#12032211 (medelius) Hi, I'm back, here's my public key: `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAbbmEfRM12l9Ii5cVi9v9cMeUcwvgVIov2d+b1mgQgQ cmedeli@wmf3715` Thanks for the patience + the help!
[02:09:39] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:14:39] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:22:30] PROBLEM - Check unit status of security_group_ssh-from-restricted-bastion_to_project_zuul on cloudcontrol1006 is CRITICAL: CRITICAL: Status of the systemd unit security_group_ssh-from-restricted-bastion_to_project_zuul https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[02:23:24] (PS3) Krinkle: Disable ShortUrl on mrwiki, newiki, orwiki, pawiki, sawiki, tawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1303006 (https://phabricator.wikimedia.org/T107188)
[02:32:30] RECOVERY - Check unit status of security_group_ssh-from-restricted-bastion_to_project_zuul on cloudcontrol1006 is OK: OK: Status of the systemd unit security_group_ssh-from-restricted-bastion_to_project_zuul https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[02:33:25] FIRING: [2x] BFDdown: BFD session down between asw1-b4-magru and 195.200.68.37 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=asw1-b4-magru:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[02:39:39] FIRING: [6x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-wdqs2001:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished
[02:53:30] PROBLEM - Check unit status of security_group_ssh-from-restricted-bastion_to_project_zuul on cloudcontrol1006 is CRITICAL: CRITICAL: Status of the systemd unit security_group_ssh-from-restricted-bastion_to_project_zuul https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[03:03:30] RECOVERY - Check unit status of security_group_ssh-from-restricted-bastion_to_project_zuul on cloudcontrol1006 is OK: OK: Status of the systemd unit security_group_ssh-from-restricted-bastion_to_project_zuul https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[03:24:30] PROBLEM - Check unit status of security_group_ssh-from-restricted-bastion_to_project_zuul on cloudcontrol1006 is CRITICAL: CRITICAL: Status of the systemd unit security_group_ssh-from-restricted-bastion_to_project_zuul https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[03:34:30] RECOVERY - Check unit status of security_group_ssh-from-restricted-bastion_to_project_zuul on cloudcontrol1006 is OK: OK: Status of the systemd unit security_group_ssh-from-restricted-bastion_to_project_zuul https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[03:37:25] (CR) RLazarus: "This is the end result of the instructions at https://gerrit.wikimedia.org/r/plugins/gitiles/operations/docker-images/docker-pkg/deploy/+/" [docker-images/docker-pkg/deploy] - https://gerrit.wikimedia.org/r/1303560 (owner: RLazarus)
[03:39:56] (PS1) Andrew Bogott: Replace zuul project name w/ID for security group management [puppet] - https://gerrit.wikimedia.org/r/1303614 (https://phabricator.wikimedia.org/T422801)
[03:42:34] (CR) Andrew Bogott: [C:+2] Replace zuul project name w/ID for security group management [puppet] - https://gerrit.wikimedia.org/r/1303614 (https://phabricator.wikimedia.org/T422801) (owner: Andrew Bogott)
[03:45:30] PROBLEM - Check unit status of security_group_ssh-from-restricted-bastion_to_project_zuul on cloudcontrol1006 is CRITICAL: CRITICAL: Status of the systemd unit security_group_ssh-from-restricted-bastion_to_project_zuul https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[03:53:26] (CR) Scott French: [C:+1] "Thanks, Reuven! I was briefly puzzled by the relative delta on the artifacts files, but I now see a couple of recent patches that bumped r" [docker-images/docker-pkg/deploy] - https://gerrit.wikimedia.org/r/1303560 (owner: RLazarus)
[03:56:39] SRE-swift-storage, EasyTimeline: "Timeline error. Could not store output files" - https://phabricator.wikimedia.org/T428063#12032241 (Pppery) a:SomeRandomDeveloper→None
[03:59:39] FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:04:39] FIRING: [2x] SystemdUnitFailed: cowbuilder_update_bookworm-amd64.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:04:39] RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:22:17] (PS3) Giuseppe Lavagetto: hiddenparma: switch to native CAS authentication [puppet] - https://gerrit.wikimedia.org/r/1299475 (https://phabricator.wikimedia.org/T422235)
[04:38:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:46:08] RESOLVED: SLOBudgetBurn: Standalone event system success rate is below 99.9% target - https://alerts.wikimedia.org/?q=alertname%3DSLOBudgetBurn
[05:12:40] !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade
[05:12:40] !log marostegui@cumin1003 dbmaint on es7@codfw T429463
[05:12:48] T429463: Migrate es7 section to Debian Trixie - https://phabricator.wikimedia.org/T429463
[05:13:02] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es2040: Upgrading es2040.codfw.wmnet
[05:13:23] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es2040: Upgrading es2040.codfw.wmnet
[05:13:43] (CR) Marostegui: "This has been tested with a real reimage and this is what it produces:" [cookbooks] - https://gerrit.wikimedia.org/r/1303438 (owner: Marostegui)
[05:14:36] !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es2040.codfw.wmnet with OS trixie
[05:22:36] ops-eqiad, SRE, DBA, DC-Ops: db1224 is unreachable - https://phabricator.wikimedia.org/T427535#12032281 (Marostegui)
[05:22:58] ops-eqiad, SRE, DBA, DC-Ops: db1224 is unreachable - https://phabricator.wikimedia.org/T427535#12032283 (Marostegui) Open→Resolved This host will be decommissioned: T429561
[05:23:30] (CR) Marostegui: "And now this correctly shows at: https://wikitech.wikimedia.org/wiki/Map_of_database_maintenance" [cookbooks] - https://gerrit.wikimedia.org/r/1303438 (owner: Marostegui)
[05:24:48] (PS1) Marostegui: instances.yaml: Remove db1224 from dbctl [puppet] - https://gerrit.wikimedia.org/r/1303629 (https://phabricator.wikimedia.org/T429561)
[05:26:03] (CR) Marostegui: [C:+2] instances.yaml: Remove db1224 from dbctl [puppet] - https://gerrit.wikimedia.org/r/1303629 (https://phabricator.wikimedia.org/T429561) (owner: Marostegui)
[05:27:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove db1224 from dbctl T429561', diff saved to https://phabricator.wikimedia.org/P94269 and previous config saved to /var/cache/conftool/dbconfig/20260618-052737-marostegui.json
[05:27:42] T429561: decommission db1224.eqiad.wmnet - https://phabricator.wikimedia.org/T429561
[05:28:59] RECOVERY - jenkins_service_running on contint1003 is OK: PROCS OK: 1 process with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins
[05:29:26] (PS1) Marostegui: mariadb: Decommission db1224 [puppet] - https://gerrit.wikimedia.org/r/1303652 (https://phabricator.wikimedia.org/T429561)
[05:30:53] !log marostegui@cumin1003 START - Cookbook sre.mysql.decommission
[05:31:04] !log marostegui@cumin1003 START - Cookbook sre.hosts.decommission for hosts db1224.eqiad.wmnet
[05:31:17] !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es2040.codfw.wmnet with reason: host reimage
[05:31:19] (CR) Marostegui: [C:+2] mariadb: Decommission db1224 [puppet] - https://gerrit.wikimedia.org/r/1303652 (https://phabricator.wikimedia.org/T429561) (owner: Marostegui)
[05:31:59] PROBLEM - jenkins_service_running on contint1003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins
[05:35:34] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2040.codfw.wmnet with reason: host reimage
[05:36:22] !log marostegui@cumin1003 START - Cookbook sre.dns.netbox
[05:40:48] !log marostegui@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1224.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
[05:41:03] !log marostegui@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1224.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
[05:41:04] !log marostegui@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[05:41:05] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1224.eqiad.wmnet
[05:41:22] !log marostegui@cumin1003 Removing db1224 from zarcillo T429561
[05:41:24] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.decommission (exit_code=99)
[05:41:26] T429561: decommission db1224.eqiad.wmnet - https://phabricator.wikimedia.org/T429561
[05:42:04] (CR) Marostegui: "I tested this cookbook today with a decommissioning and I got:" [cookbooks] - https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) (owner: Federico Ceratto)
[05:42:53] (CR) Marostegui: "Probably better formatted at: https://phabricator.wikimedia.org/P94270" [cookbooks] - https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) (owner: Federico Ceratto)
[05:43:49] PROBLEM - orchestrator resolve cache non-FQDNs on dborch1002 is CRITICAL: CRITICAL: 1 non-FQDN entries in orchestrator resolve cache: https://wikitech.wikimedia.org/wiki/Orchestrator
[05:44:21] ops-eqiad, DBA, DC-Ops, decommission-hardware, Patch-For-Review: decommission db1224.eqiad.wmnet - https://phabricator.wikimedia.org/T429561#12032320 (Marostegui) This host is ready for dc-ops
[05:44:27] ops-eqiad, DBA, DC-Ops, decommission-hardware, Patch-For-Review: decommission db1224.eqiad.wmnet - https://phabricator.wikimedia.org/T429561#12032323 (Marostegui) a:Marostegui→None
[05:44:49] RECOVERY - orchestrator resolve cache non-FQDNs on dborch1002 is OK: OK: all orchestrator resolve cache entries are FQDNs https://wikitech.wikimedia.org/wiki/Orchestrator
[05:49:39] FIRING: PuppetFailure: Puppet has failed on cumin2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[05:52:20] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2040.codfw.wmnet with OS trixie
[06:00:05] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T0600)
[06:00:05] marostegui, Amir1, and federico3: Time to snap out of that daydream and deploy Primary database switchover. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T0600).
[06:04:36] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es2040: Migration of es2040.codfw.wmnet completed
[06:15:37] (CR) Elukey: sre.hosts.provision: introduce the wmfroot user (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1291994 (https://phabricator.wikimedia.org/T426180) (owner: Elukey)
[06:31:00] (PS1) DDesouza: Deploy English Wikipedia Mobile App Survey [mediawiki-config] - https://gerrit.wikimedia.org/r/1303895 (https://phabricator.wikimedia.org/T428876)
[06:32:42] ops-eqiad, SRE, DBA, DC-Ops, decommission-hardware: decommission db1224.eqiad.wmnet - https://phabricator.wikimedia.org/T429561#12032355 (Jclark-ctr) a:Jclark-ctr
[06:33:25] FIRING: [2x] BFDdown: BFD session down between asw1-b4-magru and 195.200.68.37 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=asw1-b4-magru:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[06:34:00] ops-eqiad, SRE, DBA, DC-Ops, decommission-hardware: decommission db1224.eqiad.wmnet - https://phabricator.wikimedia.org/T429561#12032361 (Jclark-ctr) Location D6 u37
[06:34:55] ops-eqiad, SRE, DBA, DC-Ops, decommission-hardware: decommission db1224.eqiad.wmnet - https://phabricator.wikimedia.org/T429561#12032362 (Jclark-ctr)
[06:36:37] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, June 18 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-ite" [mediawiki-config] - https://gerrit.wikimedia.org/r/1303895 (https://phabricator.wikimedia.org/T428876) (owner: DDesouza)
[06:38:16] (PS1) Muehlenhoff: Record LDAP access for rscout [puppet] - https://gerrit.wikimedia.org/r/1303896
[06:39:39] FIRING: [6x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-wdqs2001:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished
[06:39:46] ops-eqiad, SRE, DBA, DC-Ops, decommission-hardware: decommission db1224.eqiad.wmnet - https://phabricator.wikimedia.org/T429561#12032364 (Jclark-ctr)
[06:39:53] (CR) Muehlenhoff: [C:+1] "LGTM" [software/httpbb] - https://gerrit.wikimedia.org/r/1303557 (https://phabricator.wikimedia.org/T427899) (owner: RLazarus)
[06:40:43] (CR) Muehlenhoff: [C:+2] Record LDAP access for rscout [puppet] - https://gerrit.wikimedia.org/r/1303896 (owner: Muehlenhoff)
[06:50:06] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es2040: Migration of es2040.codfw.wmnet completed
[06:50:07] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
[07:00:05] Amir1, urbanecm, and awight: Time to do the UTC morning backport window deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T0700).
[07:00:05] inflatador/dcausse: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:00:14] o/
[07:00:24] I can deploy
[07:03:19] (CR) TrainBranchBot: [C:+2] "Approved by dcausse@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1302956 (https://phabricator.wikimedia.org/T425585) (owner: Bking)
[07:04:15] (Merged) jenkins-bot: deployment-prep: Update cirrussearch (OpenSearch) config [mediawiki-config] - https://gerrit.wikimedia.org/r/1302956 (https://phabricator.wikimedia.org/T425585) (owner: Bking)
[07:06:52] I'm done with scap
[07:13:49] (PS2) Slyngshede: IDP: Bump local version, 7.3.7.2+wmf13u2 [dns] - https://gerrit.wikimedia.org/r/1303380 (https://phabricator.wikimedia.org/T372892)
[07:14:16] (CR) Slyngshede: IDP: Bump local version, 7.3.7.2+wmf13u2 (1 comment) [dns] - https://gerrit.wikimedia.org/r/1303380 (https://phabricator.wikimedia.org/T372892) (owner: Slyngshede)
[07:30:36] (CR) Jelto: [C:+1] "lgtm" [docker-images/production-images] - https://gerrit.wikimedia.org/r/1303475 (https://phabricator.wikimedia.org/T427401) (owner: JMeybohm)
[07:35:30] (PS1) Muehlenhoff: package_builder: Export environment variables for web proxy access [puppet] - https://gerrit.wikimedia.org/r/1303913 (https://phabricator.wikimedia.org/T416707)
[07:37:57] (PS1) Giuseppe Lavagetto: Add session secret key [labs/private] - https://gerrit.wikimedia.org/r/1303915
[07:38:30] (CR) Giuseppe Lavagetto: [V:+2 C:+2] Add session secret key [labs/private] - https://gerrit.wikimedia.org/r/1303915 (owner: Giuseppe Lavagetto)
[07:39:39] FIRING: [8x] SystemdUnitFailed: cowbuilder_update_bookworm-amd64.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:40:13] (CR) Muehlenhoff: [V:+2 C:+2] mirrors: Disable mirror_age_metrics metric [puppet] - https://gerrit.wikimedia.org/r/1303405 (https://phabricator.wikimedia.org/T416707) (owner: Muehlenhoff)
[07:40:36] (CR) Giuseppe Lavagetto: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8763/co" [puppet] - https://gerrit.wikimedia.org/r/1299475 (https://phabricator.wikimedia.org/T422235) (owner: Giuseppe Lavagetto)
[07:47:58] (CR) Slyngshede: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1303913 (https://phabricator.wikimedia.org/T416707) (owner: Muehlenhoff)
[07:50:39] (CR) Fabfur: cache::haproxy: changing req.provenance to sess.provenance and log (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1303473 (https://phabricator.wikimedia.org/T427068) (owner: Fabfur)
[07:54:35] (CR) Muehlenhoff: [C:+2] package_builder: Export environment variables for web proxy access [puppet] - https://gerrit.wikimedia.org/r/1303913 (https://phabricator.wikimedia.org/T416707) (owner: Muehlenhoff)
[07:55:07] (PS2) Fabfur: cache::haproxy: changing req.provenance to txn.provenance and log [puppet] - https://gerrit.wikimedia.org/r/1303473 (https://phabricator.wikimedia.org/T427068)
[07:59:51] (CR) Jelto: [C:+2] helm: remove helm311 package and make helm317 default [puppet] - https://gerrit.wikimedia.org/r/1303367 (https://phabricator.wikimedia.org/T341984) (owner: Jelto)
[08:01:17] !log regenerate pbuilder environments on build2002 to use deb.debian.org T416707
[08:01:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:01:22] T416707: Sunsetting mirrors.wikimedia.org - https://phabricator.wikimedia.org/T416707
[08:02:55] !log uploaded wmf-laptop 1.0.6 to component/wmf-laptop on apt.wikimedia.org
[08:02:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:03:10] (CR) Trueg: "How do we handle the `wdqs-next` namespace here? I assume there is a standard way of handling staging?" [mediawiki-config] - https://gerrit.wikimedia.org/r/1302923 (https://phabricator.wikimedia.org/T429380) (owner: Lerickson)
[08:05:18] !log regenerate pbuilder environments on build2001 to use deb.debian.org T416707
[08:05:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:00] (CR) CWilliams: major-upgrade.py: Add !log dbmaint on the start (2 comments) [cookbooks] - https://gerrit.wikimedia.org/r/1303438 (owner: Marostegui)
[08:12:23] SRE-swift-storage, EasyTimeline: "Timeline error. Could not store output files" - https://phabricator.wikimedia.org/T428063#12032539 (lado85) >>! In T428063#12032091, @Fuyo21 wrote: > Still happens on Russian wiki: "Timeline error. Could not store output files" > https://ru.wikipedia.org/wiki/Шаблон:Хро...
[08:15:58] (PS1) Muehlenhoff: mirrors: Remove obsolete users, groups and monitoring [puppet] - https://gerrit.wikimedia.org/r/1303959 (https://phabricator.wikimedia.org/T416707)
[08:16:04] jouncebot: nowandnext
[08:16:04] No deployments scheduled for the next 1 hour(s) and 43 minute(s)
[08:16:04] In 1 hour(s) and 43 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1000)
[08:19:33] SRE, Infrastructure-Foundations: Integrate Bookworm 12.14 point update - https://phabricator.wikimedia.org/T426759#12032575 (MoritzMuehlenhoff)
[08:19:56] !log urbanecm@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
[08:20:25] !log urbanecm@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
[08:20:54] (PS1) Volans: Cinder backups: enable transport encryption part 2 [puppet] - https://gerrit.wikimedia.org/r/1303961 (https://phabricator.wikimedia.org/T294432)
[08:21:03] !log jelto@deploy1003 helmfile [staging] START helmfile.d/services/miscweb: apply
[08:22:07] (CR) Fabfur: [C:+1] images/haproxy: set owner to Traffic [docker-images/production-images] - https://gerrit.wikimedia.org/r/1303420 (owner: Ssingh)
[08:22:08] !log jelto@deploy1003 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[08:22:46] (CR) Giuseppe Lavagetto: [C:+1] "LGTM at a very high level. Please check on turnilo that provenance-dependent requestctl rules still work once you've deployed" [puppet] - https://gerrit.wikimedia.org/r/1303473 (https://phabricator.wikimedia.org/T427068) (owner: Fabfur)
[08:23:52] ops-eqiad, SRE, DC-Ops: C/D refresh Nokia switches Exhaust direction is reversed - https://phabricator.wikimedia.org/T428260#12032610 (ayounsi) Good news, I was able to find someone knowledgable on our hardware at the Nokia conference. He told me that we can reverse the airflow by replacing the fans...
[08:26:03] (PS1) Filippo Giunchedi: Put cloudvirt10[77-80] in service [puppet] - https://gerrit.wikimedia.org/r/1303962 (https://phabricator.wikimedia.org/T429563)
[08:27:10] (CR) Urbanecm: [C:-1] "see inline comments" [mediawiki-config] - https://gerrit.wikimedia.org/r/1285467 (https://phabricator.wikimedia.org/T425402) (owner: Acamicamacaraca)
[08:27:54] (CR) Jelto: [C:+2] helm: install helm317 and helm319 in parallel [puppet] - https://gerrit.wikimedia.org/r/1303368 (https://phabricator.wikimedia.org/T341984) (owner: Jelto)
[08:28:04] (PS3) Jelto: helm: install helm317 and helm319 in parallel [puppet] - https://gerrit.wikimedia.org/r/1303368 (https://phabricator.wikimedia.org/T341984)
[08:29:50] (CR) Jelto: [C:+2] helm: install helm317 and helm319 in parallel [puppet] - https://gerrit.wikimedia.org/r/1303368 (https://phabricator.wikimedia.org/T341984) (owner: Jelto)
[08:33:17] !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[08:33:49] !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[08:34:17] !log brouberol@deploy1003 helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
[08:35:10] !log brouberol@deploy1003 helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
[08:35:39] (CR) Filippo Giunchedi: "I don't know what else is needed after this patch !" [puppet] - https://gerrit.wikimedia.org/r/1303962 (https://phabricator.wikimedia.org/T429563) (owner: Filippo Giunchedi)
[08:38:28] (CR) Filippo Giunchedi: [C:+1] Cinder backups: enable transport encryption part 2 [puppet] - https://gerrit.wikimedia.org/r/1303961 (https://phabricator.wikimedia.org/T294432) (owner: Volans)
[08:38:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:51:00] !log jelto@deploy1003 helmfile [staging] START helmfile.d/services/miscweb: apply
[08:51:07] !log jelto@deploy1003 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[08:51:17] !log jelto@deploy1003 helmfile [codfw] START helmfile.d/services/miscweb: apply
[08:53:15] !log jelto@deploy1003 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[08:53:42] !log jelto@deploy1003 helmfile [eqiad] START helmfile.d/services/miscweb: apply
[08:55:43] !log jelto@deploy1003 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
[08:58:52] SRE, Infrastructure-Foundations, Traffic: Scaling urldownloaders by adding redundancy and load balancing - https://phabricator.wikimedia.org/T429175#12032690 (MLechvien-WMF) That's correct, we could not prioritize Sophroid remaining work so far and our immediate/Q1 capacity is limited, but we're inte...
[09:00:01] (CR) Muehlenhoff: "Looks good, two questions inline" [puppet] - https://gerrit.wikimedia.org/r/1303304 (owner: Slyngshede)
[09:00:43] SRE, Infrastructure-Foundations, ServiceOps new, Traffic: Scaling urldownloaders by adding redundancy and load balancing - https://phabricator.wikimedia.org/T429175#12032722 (MLechvien-WMF)
[09:00:59] (CR) Fabfur: cache::haproxy: changing req.provenance to txn.provenance and log (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1303473 (https://phabricator.wikimedia.org/T427068) (owner: Fabfur)
[09:01:01] (CR) Fabfur: [C:+2] cache::haproxy: changing req.provenance to txn.provenance and log [puppet] - https://gerrit.wikimedia.org/r/1303473 (https://phabricator.wikimedia.org/T427068) (owner: Fabfur)
[09:02:25] SRE, Infrastructure-Foundations, ServiceOps new, Traffic: Scaling urldownloaders by adding redundancy and load balancing - https://phabricator.wikimedia.org/T429175#12032735 (MoritzMuehlenhoff) >>! In T429175#12032690, @MLechvien-WMF wrote: > That's correct, we could not prioritize Sophroid remai...
[09:09:54] (CR) Volans: "post-merge -1, there is at least one error and some refactoring to be done IMHO" [puppet] - https://gerrit.wikimedia.org/r/1302236 (https://phabricator.wikimedia.org/T422801) (owner: Andrew Bogott)
[09:11:38] !log installing apache2 security updates
[09:11:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:19:15] (CR) Effie Mouzeli: [C:+2] mcrouter_wancache: Bring mc-gp1004 back into use.
[puppet] - https://gerrit.wikimedia.org/r/1303469 (https://phabricator.wikimedia.org/T426044) (owner: Blake)
[09:32:56] !log blake@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
[09:33:06] !log blake@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
[09:33:12] !log blake@deploy1003 helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
[09:33:19] !log blake@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
[09:41:25] (PS1) Santiago Faci: Test Kitchen UI: Deploy v1.4.4 release to production [deployment-charts] - https://gerrit.wikimedia.org/r/1303980 (https://phabricator.wikimedia.org/T428985)
[09:43:41] (PS1) Zabe: LocalFileMoveBatch: Also update fr_archive_name when moving file [core] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1303981 (https://phabricator.wikimedia.org/T428406)
[09:44:43] (PS1) Kosta Harlan: CaptchaScoreHooks: Log risk score for every non-exempt edit [extensions/WikimediaEvents] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1303982 (https://phabricator.wikimedia.org/T429481)
[09:45:14] (PS1) Kosta Harlan: CaptchaScoreHooks: Log risk score for every non-exempt edit [extensions/WikimediaEvents] (wmf/1.47.0-wmf.6) - https://gerrit.wikimedia.org/r/1303983 (https://phabricator.wikimedia.org/T429481)
[09:45:31] jouncebot: nowandnext
[09:45:31] No deployments scheduled for the next 0 hour(s) and 14 minute(s)
[09:45:31] In 0 hour(s) and 14 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1000)
[09:47:21] (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy1003 using scap backport" [extensions/WikimediaEvents] (wmf/1.47.0-wmf.6) - https://gerrit.wikimedia.org/r/1303983 (https://phabricator.wikimedia.org/T429481) (owner: Kosta Harlan)
[09:47:22] (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy1003 using scap backport"
[extensions/WikimediaEvents] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1303982 (https://phabricator.wikimedia.org/T429481) (owner: Kosta Harlan)
[09:47:26] (CR) Muehlenhoff: [C:+2] mirrors: Remove obsolete users, groups and monitoring [puppet] - https://gerrit.wikimedia.org/r/1303959 (https://phabricator.wikimedia.org/T416707) (owner: Muehlenhoff)
[09:49:39] FIRING: PuppetFailure: Puppet has failed on cumin2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[09:51:15] (Merged) jenkins-bot: CaptchaScoreHooks: Log risk score for every non-exempt edit [extensions/WikimediaEvents] (wmf/1.47.0-wmf.6) - https://gerrit.wikimedia.org/r/1303983 (https://phabricator.wikimedia.org/T429481) (owner: Kosta Harlan)
[09:51:17] (Merged) jenkins-bot: CaptchaScoreHooks: Log risk score for every non-exempt edit [extensions/WikimediaEvents] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1303982 (https://phabricator.wikimedia.org/T429481) (owner: Kosta Harlan)
[09:51:47] !log kharlan@deploy1003 Started scap sync-world: Backport for [[gerrit:1303983|CaptchaScoreHooks: Log risk score for every non-exempt edit (T429481)]], [[gerrit:1303982|CaptchaScoreHooks: Log risk score for every non-exempt edit (T429481)]]
[09:51:51] T429481: hCaptcha: Fix logging of risk scores for edits - https://phabricator.wikimedia.org/T429481
[09:53:39] (PS1) Btullis: Allow trueg to run mediawiki dumps from his airflow devenv [deployment-charts] - https://gerrit.wikimedia.org/r/1303984 (https://phabricator.wikimedia.org/T422179)
[09:54:00] !log kharlan@deploy1003 kharlan: Backport for [[gerrit:1303983|CaptchaScoreHooks: Log risk score for every non-exempt edit (T429481)]], [[gerrit:1303982|CaptchaScoreHooks: Log risk score for every non-exempt edit (T429481)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug).
Changes can now be verified there.
[09:54:32] (CR) Trueg: [C:+1] "ty" [deployment-charts] - https://gerrit.wikimedia.org/r/1303984 (https://phabricator.wikimedia.org/T422179) (owner: Btullis)
[09:55:33] !log kharlan@deploy1003 kharlan: Continuing with deployment
[09:57:06] SRE, Infrastructure-Foundations, netops: Create cookbook to add BGP peering for host by triggering Homer run on correct device - https://phabricator.wikimedia.org/T429488#12032936 (cmooney) >>! In T429488#12029612, @ayounsi wrote: > Not sure if it's worth adding something temporary to setup BGP on th...
[09:57:31] (PS1) Fabfur: Code changes [software/hiddenparma/deploy] - https://gerrit.wikimedia.org/r/1303985
[09:57:56] (CR) Fabfur: [V:+2 C:+2] Code changes [software/hiddenparma/deploy] - https://gerrit.wikimedia.org/r/1303985 (owner: Fabfur)
[09:59:57] !log kharlan@deploy1003 Finished scap sync-world: Backport for [[gerrit:1303983|CaptchaScoreHooks: Log risk score for every non-exempt edit (T429481)]], [[gerrit:1303982|CaptchaScoreHooks: Log risk score for every non-exempt edit (T429481)]] (duration: 08m 10s)
[09:59:59] (CR) Btullis: [C:+2] Allow trueg to run mediawiki dumps from his airflow devenv [deployment-charts] - https://gerrit.wikimedia.org/r/1303984 (https://phabricator.wikimedia.org/T422179) (owner: Btullis)
[09:59:59] !log fabfur@cumin1003 START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Change provenance var context - fabfur@cumin1003 - T427068"
[10:00:01] !log fabfur@cumin1003 START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Change provenance var context - fabfur@cumin1003 - T427068
[10:00:01] T429481: hCaptcha: Fix logging of risk scores for edits - https://phabricator.wikimedia.org/T429481
[10:00:05] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1000)
[10:00:05] T427068: Add X-Provenance data to webrequest_sampled_live - https://phabricator.wikimedia.org/T427068
[10:00:59] !log fabfur@cumin1003 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Change provenance var context - fabfur@cumin1003 - T427068
[10:01:00] !log fabfur@cumin1003 END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Change provenance var context - fabfur@cumin1003 - T427068"
[10:01:44] (CR) Marostegui: major-upgrade.py: Add !log dbmaint on the start (2 comments) [cookbooks] - https://gerrit.wikimedia.org/r/1303438 (owner: Marostegui)
[10:02:02] (Merged) jenkins-bot: Allow trueg to run mediawiki dumps from his airflow devenv [deployment-charts] - https://gerrit.wikimedia.org/r/1303984 (https://phabricator.wikimedia.org/T422179) (owner: Btullis)
[10:02:41] (PS1) Dreamy Jazz: hCaptcha: Recompute blocked-edit risk score block IDs server-side [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1303986 (https://phabricator.wikimedia.org/T428394)
[10:02:46] jouncebot: nowandnext
[10:02:46] For the next 0 hour(s) and 57 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1000)
[10:02:47] In 1 hour(s) and 57 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1200)
[10:02:48] (PS3) Marostegui: major-upgrade.py: Add !log dbmaint on the start [cookbooks] - https://gerrit.wikimedia.org/r/1303438
[10:03:01] (CR) Marostegui: major-upgrade.py: Add !log dbmaint on the start (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1303438 (owner: Marostegui)
[10:03:25] (CR) TrainBranchBot: [C:+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) -
https://gerrit.wikimedia.org/r/1303986 (https://phabricator.wikimedia.org/T428394) (owner: Dreamy Jazz)
[10:03:40] SRE, Data-Engineering, observability, serviceops-radar, and 3 others: Upgrade Kafka to from 1.x to later version - https://phabricator.wikimedia.org/T300102#12032970 (BTullis) Open→Resolved
[10:05:00] !log btullis@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
[10:05:06] !log btullis@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
[10:10:38] (PS1) Fabfur: cache::haproxy: remove temporary workaround for req.provenance [puppet] - https://gerrit.wikimedia.org/r/1303988 (https://phabricator.wikimedia.org/T427068)
[10:11:29] (Merged) jenkins-bot: hCaptcha: Recompute blocked-edit risk score block IDs server-side [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1303986 (https://phabricator.wikimedia.org/T428394) (owner: Dreamy Jazz)
[10:11:59] !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1303986|hCaptcha: Recompute blocked-edit risk score block IDs server-side (T428394)]]
[10:13:34] (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1303988 (https://phabricator.wikimedia.org/T427068) (owner: Fabfur)
[10:14:02] !log dreamyjazz@deploy1003 dreamyjazz: Backport for [[gerrit:1303986|hCaptcha: Recompute blocked-edit risk score block IDs server-side (T428394)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[10:18:18] Still testing...
[10:19:56] !log dreamyjazz@deploy1003 dreamyjazz: Continuing with deployment
[10:24:13] !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1303986|hCaptcha: Recompute blocked-edit risk score block IDs server-side (T428394)]] (duration: 12m 13s)
[10:25:23] (CR) Clément Goubert: "I can leave in the old labels so they're backwards compatible, wdyt?" [deployment-charts] - https://gerrit.wikimedia.org/r/1303389 (owner: Clément Goubert)
[10:31:27] (CR) Slyngshede: [C:+1] cache::haproxy: remove temporary workaround for req.provenance [puppet] - https://gerrit.wikimedia.org/r/1303988 (https://phabricator.wikimedia.org/T427068) (owner: Fabfur)
[10:33:25] FIRING: [2x] BFDdown: BFD session down between asw1-b4-magru and 195.200.68.37 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=asw1-b4-magru:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[10:33:57] (CR) Fabfur: [C:+2] cache::haproxy: remove temporary workaround for req.provenance [puppet] - https://gerrit.wikimedia.org/r/1303988 (https://phabricator.wikimedia.org/T427068) (owner: Fabfur)
[10:35:10] (PS3) Slyngshede: C:apereo_cas: Script for resetting webauthn device registration [puppet] - https://gerrit.wikimedia.org/r/1303304
[10:37:04] (CR) Slyngshede: C:apereo_cas: Script for resetting webauthn device registration (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1303304 (owner: Slyngshede)
[10:37:16] !log jmm@cumin2002 START - Cookbook sre.puppet.disable-merges
[10:37:29] !log jmm@cumin2002 END (FAIL) - Cookbook sre.puppet.disable-merges (exit_code=99)
[10:38:10] (CR) Muehlenhoff: [C:+2] ganeti: Remove validate-ganeti-firewall [puppet] - https://gerrit.wikimedia.org/r/1289936 (owner: Muehlenhoff)
[10:39:39] FIRING: [6x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-wdqs2001:0 has a BGP
session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished
[10:42:39] (CR) Btullis: [V:+1 C:+2] Presto memory tuning, resource groups [puppet] - https://gerrit.wikimedia.org/r/1285926 (https://phabricator.wikimedia.org/T424112) (owner: Aleksandar Mastilovic)
[10:46:36] (PS1) Muehlenhoff: sre.puppet.disable-merges: Avoid using puppet-merge [cookbooks] - https://gerrit.wikimedia.org/r/1303997 (https://phabricator.wikimedia.org/T423121)
[10:46:40] (PS1) Fabfur: Revert "cache::haproxy: remove temporary workaround for req.provenance" [puppet] - https://gerrit.wikimedia.org/r/1303998
[10:46:44] (CR) CI reject: [V:-1] sre.puppet.disable-merges: Avoid using puppet-merge [cookbooks] - https://gerrit.wikimedia.org/r/1303997 (https://phabricator.wikimedia.org/T423121) (owner: Muehlenhoff)
[10:47:25] (PS1) Btullis: Revert "Presto memory tuning, resource groups" [puppet] - https://gerrit.wikimedia.org/r/1304000
[10:49:01] (PS2) Clément Goubert: ratelimit: Unify statsd-exporter labels [deployment-charts] - https://gerrit.wikimedia.org/r/1303389
[10:49:07] (CR) Fabfur: [C:+2] Revert "cache::haproxy: remove temporary workaround for req.provenance" [puppet] - https://gerrit.wikimedia.org/r/1303998 (owner: Fabfur)
[10:49:34] (PS1) Valn_ilyo: Fix autonym for Khasi (kha) in wmgExtraLanguageNames [mediawiki-config] - https://gerrit.wikimedia.org/r/1304001 (https://phabricator.wikimedia.org/T427917)
[10:50:27] (CR) Btullis: [C:+2] Revert "Presto memory tuning, resource groups" [puppet] - https://gerrit.wikimedia.org/r/1304000 (owner: Btullis)
[10:50:55] (CR) Aleksandar Mastilovic: [V:+1 C:+1] "Copied votes on follow-up patch sets have been updated:" [puppet] - https://gerrit.wikimedia.org/r/1304000 (owner: Btullis)
[10:54:51] (PS2)
Muehlenhoff: sre.puppet.disable-merges: Avoid using puppet-merge [cookbooks] - https://gerrit.wikimedia.org/r/1303997 (https://phabricator.wikimedia.org/T423121)
[10:55:53] (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1303304 (owner: Slyngshede)
[11:06:37] (PS1) JavierMonton: stream: webrequest.page_trending.dev0 [mediawiki-config] - https://gerrit.wikimedia.org/r/1304004 (https://phabricator.wikimedia.org/T429588)
[11:09:22] (PS1) Btullis: Add the new dse-k8s-wdqs nodes to the dse-k8s-codfw cluster [puppet] - https://gerrit.wikimedia.org/r/1304005 (https://phabricator.wikimedia.org/T423312)
[11:09:55] (CR) CI reject: [V:-1] Add the new dse-k8s-wdqs nodes to the dse-k8s-codfw cluster [puppet] - https://gerrit.wikimedia.org/r/1304005 (https://phabricator.wikimedia.org/T423312) (owner: Btullis)
[11:10:01] (CR) Slyngshede: [C:+2] C:apereo_cas: Script for resetting webauthn device registration [puppet] - https://gerrit.wikimedia.org/r/1303304 (owner: Slyngshede)
[11:10:08] !log atsuko updated charlie to 0.0.19 https://w.wiki/RPKN
[11:10:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:11:53] (PS2) Btullis: Add the new dse-k8s-wdqs nodes to the dse-k8s-codfw cluster [puppet] - https://gerrit.wikimedia.org/r/1304005 (https://phabricator.wikimedia.org/T423312)
[11:18:07] (PS3) Btullis: Add the new dse-k8s-wdqs nodes to the dse-k8s-codfw cluster [puppet] - https://gerrit.wikimedia.org/r/1304005 (https://phabricator.wikimedia.org/T423312)
[11:25:01] (CR) Aqu: [C:+1] "We are using canary events in refine to differentiate between an hour without data from a problem upstream.
Without them the strategy is t" [mediawiki-config] - https://gerrit.wikimedia.org/r/1304004 (https://phabricator.wikimedia.org/T429588) (owner: JavierMonton)
[11:36:29] (PS1) Btullis: Add the four new dse-k8s-workers in eqiad to the cluster [puppet] - https://gerrit.wikimedia.org/r/1304009 (https://phabricator.wikimedia.org/T421465)
[11:39:09] (CR) Atsuko: [C:+1] Add the new dse-k8s-wdqs nodes to the dse-k8s-codfw cluster [puppet] - https://gerrit.wikimedia.org/r/1304005 (https://phabricator.wikimedia.org/T423312) (owner: Btullis)
[11:39:39] FIRING: [8x] SystemdUnitFailed: cowbuilder_update_bookworm-amd64.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:41:29] (CR) Atsuko: [C:+1] Add the four new dse-k8s-workers in eqiad to the cluster [puppet] - https://gerrit.wikimedia.org/r/1304009 (https://phabricator.wikimedia.org/T421465) (owner: Btullis)
[11:49:39] FIRING: [8x] SystemdUnitFailed: cowbuilder_update_bookworm-amd64.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:00:05] Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1200)
[12:00:12] (PS1) Dreamy Jazz: TranslatePage: Cast to string before using htmlspecialchars [extensions/SecurePoll] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1304016 (https://phabricator.wikimedia.org/T429459)
[12:00:22] (PS1) Dreamy Jazz: TranslatePage: Cast to string before using htmlspecialchars [extensions/SecurePoll] (wmf/1.47.0-wmf.6) - https://gerrit.wikimedia.org/r/1304017 (https://phabricator.wikimedia.org/T429459)
[12:00:33] jouncebot: nowandnext
[12:00:33] For the next 0
hour(s) and 59 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1200)
[12:00:33] In 0 hour(s) and 59 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1300)
[12:02:09] (CR) Btullis: [C:+2] Add the new dse-k8s-wdqs nodes to the dse-k8s-codfw cluster [puppet] - https://gerrit.wikimedia.org/r/1304005 (https://phabricator.wikimedia.org/T423312) (owner: Btullis)
[12:02:17] (CR) Marostegui: "See https://phabricator.wikimedia.org/T429581" [cookbooks] - https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) (owner: Federico Ceratto)
[12:08:34] (PS2) Santiago Faci: Test Kitchen UI: Deploy v1.4.4 release to production [deployment-charts] - https://gerrit.wikimedia.org/r/1303980 (https://phabricator.wikimedia.org/T428985)
[12:11:26] jouncebot: nowandnext
[12:11:26] For the next 0 hour(s) and 48 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1200)
[12:11:26] In 0 hour(s) and 48 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1300)
[12:11:36] (PS3) Santiago Faci: Test Kitchen UI: Deploy v1.4.4 release to production [deployment-charts] - https://gerrit.wikimedia.org/r/1303980 (https://phabricator.wikimedia.org/T428985)
[12:11:54] (CR) TrainBranchBot: [C:+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/SecurePoll] (wmf/1.47.0-wmf.6) - https://gerrit.wikimedia.org/r/1304017 (https://phabricator.wikimedia.org/T429459) (owner: Dreamy Jazz)
[12:11:54] (CR) TrainBranchBot: [C:+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/SecurePoll] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1304016 (https://phabricator.wikimedia.org/T429459) (owner: Dreamy Jazz)
[12:13:35] (Merged) jenkins-bot:
TranslatePage: Cast to string before using htmlspecialchars [extensions/SecurePoll] (wmf/1.47.0-wmf.6) - https://gerrit.wikimedia.org/r/1304017 (https://phabricator.wikimedia.org/T429459) (owner: Dreamy Jazz)
[12:13:37] (Merged) jenkins-bot: TranslatePage: Cast to string before using htmlspecialchars [extensions/SecurePoll] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1304016 (https://phabricator.wikimedia.org/T429459) (owner: Dreamy Jazz)
[12:14:07] !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1304017|TranslatePage: Cast to string before using htmlspecialchars (T429459)]], [[gerrit:1304016|TranslatePage: Cast to string before using htmlspecialchars (T429459)]]
[12:14:12] T429459: SecurePoll translation: TypeError: htmlspecialchars(): Argument #1 ($string) must be of type string, false given - https://phabricator.wikimedia.org/T429459
[12:14:39] RESOLVED: [6x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-wdqs2001:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished
[12:16:10] !log dreamyjazz@deploy1003 dreamyjazz: Backport for [[gerrit:1304017|TranslatePage: Cast to string before using htmlspecialchars (T429459)]], [[gerrit:1304016|TranslatePage: Cast to string before using htmlspecialchars (T429459)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[12:16:11] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, June 18 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - https://gerrit.wikimedia.org/r/1303004 (https://phabricator.wikimedia.org/T422771) (owner: BPirkle)
[12:16:59] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net.
[phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1304025 (owner: L10n-bot)
[12:19:23] (PS4) Santiago Faci: Test Kitchen UI: Deploy v1.4.4 release to production [deployment-charts] - https://gerrit.wikimedia.org/r/1303980 (https://phabricator.wikimedia.org/T428985)
[12:23:08] (PS1) Trueg: wikibase: fixed bash syntax error [dumps] - https://gerrit.wikimedia.org/r/1304029 (https://phabricator.wikimedia.org/T425036)
[12:27:23] (PS5) Santiago Faci: Test Kitchen UI: Deploy v1.4.4 release to production [deployment-charts] - https://gerrit.wikimedia.org/r/1303980 (https://phabricator.wikimedia.org/T428985)
[12:27:36] !log dreamyjazz@deploy1003 dreamyjazz: Continuing with deployment
[12:29:54] !log cwilliams@cumin1003 START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis magwiki in section s5
[12:30:13] (CR) Clare Ming: [C:+2] Test Kitchen UI: Deploy v1.4.4 release to production [deployment-charts] - https://gerrit.wikimedia.org/r/1303980 (https://phabricator.wikimedia.org/T428985) (owner: Santiago Faci)
[12:30:33] ops-eqiad, SRE, DBA, DC-Ops, decommission-hardware: decommission db1224.eqiad.wmnet - https://phabricator.wikimedia.org/T429561#12033397 (Jclark-ctr)
[12:30:46] ops-eqiad, SRE, DC-Ops, decommission-hardware: decommission cloudcontrol1008-dev.eqiad.wmnet - https://phabricator.wikimedia.org/T429527#12033398 (Jclark-ctr) Open→Resolved
[12:30:54] ops-eqiad, SRE, DBA, DC-Ops, decommission-hardware: decommission db1224.eqiad.wmnet - https://phabricator.wikimedia.org/T429561#12033400 (Jclark-ctr) Open→Resolved
[12:31:56] !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304017|TranslatePage: Cast to string before using htmlspecialchars (T429459)]], [[gerrit:1304016|TranslatePage: Cast to string before using htmlspecialchars (T429459)]] (duration: 17m 49s)
[12:32:01] T429459: SecurePoll translation: TypeError: htmlspecialchars():
Argument #1 ($string) must be of type string, false given - https://phabricator.wikimedia.org/T429459
[12:32:18] (Merged) jenkins-bot: Test Kitchen UI: Deploy v1.4.4 release to production [deployment-charts] - https://gerrit.wikimedia.org/r/1303980 (https://phabricator.wikimedia.org/T428985) (owner: Santiago Faci)
[12:32:22] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis magwiki in section s5
[12:34:46] !log sfaci@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply
[12:34:56] !log cwilliams@cumin1003 START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis magwiki in section s5
[12:35:53] !log sfaci@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply
[12:36:46] !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus5003.eqsin.wmnet to drbd
[12:38:31] (PS1) Gergő Tisza: Fix CentralAuthPostLoginRedirect type parameter on token loss [extensions/CentralAuth] (wmf/1.47.0-wmf.6) - https://gerrit.wikimedia.org/r/1304038 (https://phabricator.wikimedia.org/T429495)
[12:38:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:38:55] (PS1) Gergő Tisza: Fix CentralAuthPostLoginRedirect type parameter on token loss [extensions/CentralAuth] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1304039 (https://phabricator.wikimedia.org/T429495)
[12:39:10] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, June 18 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [extensions/CentralAuth] (wmf/1.47.0-wmf.6) - https://gerrit.wikimedia.org/r/1304038
(https://phabricator.wikimedia.org/T429495) (owner: Gergő Tisza)
[12:39:19] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, June 18 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [extensions/CentralAuth] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1304039 (https://phabricator.wikimedia.org/T429495) (owner: Gergő Tisza)
[12:39:32] !log upgrade haproxykafka on cp1111 to test for new x-provenance field (T427068)
[12:39:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:39:36] T427068: Add X-Provenance data to webrequest_sampled_live - https://phabricator.wikimedia.org/T427068
[12:39:46] jmm@cumin2002 changedisk (PID 4187642) is awaiting input
[12:42:15] cwilliams@cumin1003 sanitize-wiki (PID 647071) is awaiting input
[12:42:42] SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for EChukwukere-WMF - https://phabricator.wikimedia.org/T428827#12033442 (EChukwukere-WMF) My Staging is set..
thank you all
[12:48:58] (PS1) Fabfur: Revert^2 "cache::haproxy: remove temporary workaround for req.provenance" [puppet] - https://gerrit.wikimedia.org/r/1304041
[12:51:13] (CR) Fabfur: [C:+2] Revert^2 "cache::haproxy: remove temporary workaround for req.provenance" [puppet] - https://gerrit.wikimedia.org/r/1304041 (owner: Fabfur)
[12:55:50] (PS2) Federico Ceratto: files/updates: Add wmfdb packages [puppet] - https://gerrit.wikimedia.org/r/1304018 (https://phabricator.wikimedia.org/T427900)
[12:56:23] (PS1) Anzx: magwiki: add wordmark, metanamespace, sitename and timezone [mediawiki-config] - https://gerrit.wikimedia.org/r/1303613 (https://phabricator.wikimedia.org/T428279)
[12:56:23] (PS1) Sbisson: [labs] Enable Article Guidance wikidata connect on beta enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1304042 (https://phabricator.wikimedia.org/T421250)
[12:56:27] !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of prometheus5003.eqsin.wmnet to drbd
[12:57:03] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, June 18 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - https://gerrit.wikimedia.org/r/1304042 (https://phabricator.wikimedia.org/T421250) (owner: Sbisson)
[12:57:42] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, June 18 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - https://gerrit.wikimedia.org/r/1303613 (https://phabricator.wikimedia.org/T428279) (owner: Anzx)
[13:00:04] Lucas_WMDE, urbanecm, and TheresNoTime: Time to do the UTC afternoon backport window deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1300).
[13:00:04] lerickson, tchin, bpirkle, tgr, stephanebisson, and anzx: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:09] (PS1) Fabfur: cache::haproxy: fix provenance field name [puppet] - https://gerrit.wikimedia.org/r/1304046 (https://phabricator.wikimedia.org/T427068)
[13:00:11] I'm here
[13:00:15] o/
[13:00:21] o/
[13:00:22] o/
[13:00:32] !log btullis@puppetserver1001 conftool action : set/weight=10; selector: service=kubesvc,cluster=dse-k8s,dc=codfw,name=dse-k8s-wdqs2001.codfw.wmnet
[13:00:38] !log btullis@puppetserver1001 conftool action : set/weight=10; selector: service=kubesvc,cluster=dse-k8s,dc=codfw,name=dse-k8s-wdqs2002.codfw.wmnet
[13:00:44] !log btullis@puppetserver1001 conftool action : set/weight=10; selector: service=kubesvc,cluster=dse-k8s,dc=codfw,name=dse-k8s-wdqs2003.codfw.wmnet
[13:00:52] !log btullis@puppetserver1001 conftool action : set/weight=10; selector: service=kubesvc,cluster=dse-k8s,dc=codfw,name=dse-k8s-wdqs2004.codfw.wmnet
[13:01:08] let me know if any of you can self serve
[13:01:29] or you need me to push it. I can do it
[13:01:53] I have no spiderpig yet. :( However, my change is very trivial and could ride with someone else's.
[13:01:58] I can do mine and it's labs only so it could go out at the same time as anything else if we want to save a little time
[13:01:59] lerickson: I'm around to shadow
[13:02:05] I have no spiderpig yet either.
[13:02:16] need someone to deploy mine [13:02:28] I have spiderpig but no idea how to use it [13:02:39] (03CR) 10Ladsgroup: [C:03+2] [labs] Enable Article Guidance wikidata connect on beta enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304042 (https://phabricator.wikimedia.org/T421250) (owner: 10Sbisson) [13:03:14] (03CR) 10Elukey: [C:03+1] cache::haproxy: fix provenance field name [puppet] - 10https://gerrit.wikimedia.org/r/1304046 (https://phabricator.wikimedia.org/T427068) (owner: 10Fabfur) [13:03:25] RESOLVED: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:03:31] (03Merged) 10jenkins-bot: [labs] Enable Article Guidance wikidata connect on beta enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304042 (https://phabricator.wikimedia.org/T421250) (owner: 10Sbisson) [13:03:34] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis magwiki in section s5 [13:03:36] !log btullis@puppetserver1001 conftool action : set/pooled=yes; selector: service=kubesvc,cluster=dse-k8s,dc=codfw,name=dse-k8s-wdqs2004.codfw.wmnet [13:03:37] (03CR) 10Fabfur: [C:03+2] cache::haproxy: fix provenance field name [puppet] - 10https://gerrit.wikimedia.org/r/1304046 (https://phabricator.wikimedia.org/T427068) (owner: 10Fabfur) [13:03:41] !log btullis@puppetserver1001 conftool action : set/pooled=yes; selector: service=kubesvc,cluster=dse-k8s,dc=codfw,name=dse-k8s-wdqs2003.codfw.wmnet [13:03:45] !log btullis@puppetserver1001 conftool action : set/pooled=yes; selector: service=kubesvc,cluster=dse-k8s,dc=codfw,name=dse-k8s-wdqs2002.codfw.wmnet [13:03:49] !log btullis@puppetserver1001 conftool action : set/pooled=yes; selector: 
service=kubesvc,cluster=dse-k8s,dc=codfw,name=dse-k8s-wdqs2001.codfw.wmnet [13:04:21] stephanebisson: the beta patch is merged and rebased. it should be in beta cluster in ten minutes automatically (I have no control over it) [13:04:32] (03CR) 10Tiziano Fogli: [C:03+2] nrpewrapper: Improve team/severity override handling [puppet] - 10https://gerrit.wikimedia.org/r/1302785 (https://phabricator.wikimedia.org/T395446) (owner: 10Tiziano Fogli) [13:05:08] lerickson gmodena: yours is going first [13:05:14] !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of testvm2005.codfw.wmnet to plain [13:06:03] Amir1 TY [13:06:10] Amir1: thanks [13:06:32] PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - k8s-ingress-dse_30443: Servers dse-k8s-wdqs2001.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [13:06:57] (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1302923 (https://phabricator.wikimedia.org/T429380) (owner: 10Lerickson) [13:06:58] PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - k8s-ingress-dse_30443: Servers dse-k8s-wdqs2001.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [13:07:08] !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of testvm2005.codfw.wmnet to plain [13:07:52] (03Merged) 10jenkins-bot: EventStreamConfig: add stream for WDQS V2 external/internal queries. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1302923 (https://phabricator.wikimedia.org/T429380) (owner: 10Lerickson) [13:07:53] (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1304018 (https://phabricator.wikimedia.org/T427900) (owner: 10Federico Ceratto) [13:08:19] !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1302923|EventStreamConfig: add stream for WDQS V2 external/internal queries. (T429380)]] [13:08:23] T429380: wdqs: integrate with eventgate - https://phabricator.wikimedia.org/T429380 [13:08:30] !log deploying new haproxykafka on A:cp to parse for x_provenance (T427068) [13:08:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:08:34] T427068: Add X-Provenance data to webrequest_sampled_live - https://phabricator.wikimedia.org/T427068 [13:08:46] !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of testvm2005.codfw.wmnet to drbd [13:10:18] !log ladsgroup@deploy1003 ladsgroup, lerickson: Backport for [[gerrit:1302923|EventStreamConfig: add stream for WDQS V2 external/internal queries. (T429380)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [13:10:31] !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of testvm2005.codfw.wmnet to drbd [13:14:17] lerickson gmodena: it's on testservers [13:14:29] can you test it or we should push it forward? [13:14:51] Yes, we just tested it. It works! Thanks! [13:14:59] !log ladsgroup@deploy1003 ladsgroup, lerickson: Continuing with deployment [13:15:06] okie dokie. 
Moving forward [13:17:11] o/ sorry, was in meetings [13:17:18] shout if you still need another deployer :) [13:17:26] tchin: the patch your patch depends on still is not deployed to all wikis (https://en.wikipedia.org/wiki/Special:Version says wmf.6, your patch in core is in wmf.7) [13:17:46] Lucas_WMDE: actually, I'm about to go to another meeting at half the mark, can you take it from there? [13:17:55] siro [13:17:56] *sure [13:18:12] ha! psych! I forgot my yubikey at home 😔 [13:18:13] sorry [13:18:23] tchin: so the question is, do you still want it deployed? [13:18:27] oop [13:18:42] (and my spiderpig session is expired, so I couldn’t deploy even if I wanted to risk spiderpig’ing without shell access, which, no thank you) [13:18:44] Technically it should be a no-op so we could still deploy it [13:19:07] Unless there's some linter in MW for unused config or something [13:19:14] !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1302923|EventStreamConfig: add stream for WDQS V2 external/internal queries. (T429380)]] (duration: 10m 55s) [13:19:19] T429380: wdqs: integrate with eventgate - https://phabricator.wikimedia.org/T429380 [13:19:40] lol no. We forget to remove things years after they are undeployed :P [13:19:48] :D [13:20:02] (03PS1) 10Blake: deployment_server: Add a new mw-pretrain service. [puppet] - 10https://gerrit.wikimedia.org/r/1304051 (https://phabricator.wikimedia.org/T427668) [13:20:52] lol then I think it should be fine to deploy, I can test it on a group 1 wiki [13:21:15] tchin: there’s a linter in the sense of, every few years someone will do a big grep + codesearch manually and post the results to slack [13:21:25] actually, apparently there’s an automated tool? 
according to https://wikitech.wikimedia.org/wiki/Technical_debt/Unused_config anyway [13:21:30] but yeah :) [13:21:45] !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of testvm2005.codfw.wmnet to drbd [13:21:49] ugh, spiderpig doesn't let me push your patch tchin :( [13:22:13] (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1303004 (https://phabricator.wikimedia.org/T422771) (owner: 10BPirkle) [13:22:35] * tchin oof [13:22:40] Oh well I guess [13:23:07] (03Merged) 10jenkins-bot: REST: Adjust key of Reading Lists OpenAPI spec in RestSandboxSpecs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1303004 (https://phabricator.wikimedia.org/T422771) (owner: 10BPirkle) [13:23:29] !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of testvm2005.codfw.wmnet to drbd [13:23:33] !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1303004|REST: Adjust key of Reading Lists OpenAPI spec in RestSandboxSpecs (T422771)]] [13:23:37] T422771: REST: Audience Designations - publish modules to REST Sandbox by default - https://phabricator.wikimedia.org/T422771 [13:24:58] (03PS1) 10Muehlenhoff: Record LDAP access for laurabarluzzi [puppet] - 10https://gerrit.wikimedia.org/r/1304054 [13:25:15] (03CR) 10Slyngshede: [C:03+1] Record LDAP access for laurabarluzzi [puppet] - 10https://gerrit.wikimedia.org/r/1304054 (owner: 10Muehlenhoff) [13:25:35] !log ladsgroup@deploy1003 ladsgroup, bpirkle: Backport for [[gerrit:1303004|REST: Adjust key of Reading Lists OpenAPI spec in RestSandboxSpecs (T422771)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [13:25:55] Confirmed good to go [13:26:10] !log ladsgroup@deploy1003 ladsgroup, bpirkle: Continuing with deployment [13:26:18] It's blocked because the dependency isn't on all wikis? 
I'll just move it to Monday then [13:26:43] (03CR) 10Muehlenhoff: [C:03+2] Record LDAP access for laurabarluzzi [puppet] - 10https://gerrit.wikimedia.org/r/1304054 (owner: 10Muehlenhoff) [13:28:43] (03PS2) 10JavierMonton: stream: webrequest.page_trending.dev0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304004 (https://phabricator.wikimedia.org/T429588) [13:28:58] So I need to go to meetings for a while. If anyone else can pick it up from here. I'd be grateful. If not, I can pick it up in an hour and deploy the rest (as long as it's not stepping on others' toes) [13:30:12] (03CR) 10Ottomata: [C:03+1] stream: webrequest.page_trending.dev0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304004 (https://phabricator.wikimedia.org/T429588) (owner: 10JavierMonton) [13:30:29] !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1303004|REST: Adjust key of Reading Lists OpenAPI spec in RestSandboxSpecs (T422771)]] (duration: 06m 56s) [13:30:33] T422771: REST: Audience Designations - publish modules to REST Sandbox by default - https://phabricator.wikimedia.org/T422771 [13:31:08] Amir1: thanks for deploying [13:32:07] (03CR) 10Gmodena: [C:03+1] wikibase: fixed bash syntax error [dumps] - 10https://gerrit.wikimedia.org/r/1304029 (https://phabricator.wikimedia.org/T425036) (owner: 10Trueg) [13:32:21] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, June 18 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304004 (https://phabricator.wikimedia.org/T429588) (owner: 10JavierMonton) [13:33:03] I can continue if someone tells me where we are [13:33:28] !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus5003.eqsin.wmnet to drbd [13:33:37] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy2005.codfw.wmnet with reason: Reboots 
T426633 [13:34:07] looks like the config changes from anzx and JavierMonton are left? [13:34:52] sorry, I just added my change a couple of minutes ago. I can deploy it from spiderpig, I can help deploying others if needed too [13:35:02] tgr_: I think so, yeah [13:35:05] (03CR) 10Tiziano Fogli: [C:03+1] hieradata: tlsproxy::envoy: Default to listening on IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/1237215 (https://phabricator.wikimedia.org/T255568) (owner: 10Majavah) [13:35:13] and your backports [13:35:14] (03CR) 10CDanis: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1271028 (https://phabricator.wikimedia.org/T416948) (owner: 10CDanis) [13:35:14] JavierMonton: can it be deployed together with other things? [13:35:39] sure, it's just a new stream config, it shouldn't affect anything [13:36:34] (03CR) 10Gergő Tisza: [C:03+2] Fix CentralAuthPostLoginRedirect type parameter on token loss [extensions/CentralAuth] (wmf/1.47.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1304038 (https://phabricator.wikimedia.org/T429495) (owner: 10Gergő Tisza) [13:36:35] (03CR) 10Gergő Tisza: [C:03+2] Fix CentralAuthPostLoginRedirect type parameter on token loss [extensions/CentralAuth] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304039 (https://phabricator.wikimedia.org/T429495) (owner: 10Gergő Tisza) [13:36:58] (03CR) 10TrainBranchBot: [C:03+2] "Approved by tgr@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1303613 (https://phabricator.wikimedia.org/T428279) (owner: 10Anzx) [13:36:58] (03CR) 10TrainBranchBot: [C:03+2] "Approved by tgr@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304004 (https://phabricator.wikimedia.org/T429588) (owner: 10JavierMonton) [13:37:55] (03Merged) 10jenkins-bot: magwiki: add wordmark, metanamespace, sitename and timezone [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1303613 (https://phabricator.wikimedia.org/T428279) (owner: 10Anzx) 
[13:37:59] (03Merged) 10jenkins-bot: stream: webrequest.page_trending.dev0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304004 (https://phabricator.wikimedia.org/T429588) (owner: 10JavierMonton) [13:38:00] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2232.codfw.wmnet with reason: Reboots T426633 [13:38:12] 06SRE, 10SRE-Access-Requests: Requesting access to "analytics-privatedata-users" for Mahmoud Abdelsattar (WMDE) - https://phabricator.wikimedia.org/T428416#12033707 (10karapayneWMDE) Yes, approved on my side! [13:38:22] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2160.codfw.wmnet with reason: Reboots T426633 [13:38:24] (03Merged) 10jenkins-bot: Fix CentralAuthPostLoginRedirect type parameter on token loss [extensions/CentralAuth] (wmf/1.47.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1304038 (https://phabricator.wikimedia.org/T429495) (owner: 10Gergő Tisza) [13:38:25] !log tgr@deploy1003 Started scap sync-world: Backport for [[gerrit:1303613|magwiki: add wordmark, metanamespace, sitename and timezone (T428279)]], [[gerrit:1304004|stream: webrequest.page_trending.dev0 (T429588)]] [13:38:32] T428279: Post-creation work for magwiki - https://phabricator.wikimedia.org/T428279 [13:38:33] T429588: Relative Trending - Milestone 3 - Stream & Schema - https://phabricator.wikimedia.org/T429588 [13:38:37] (03Merged) 10jenkins-bot: Fix CentralAuthPostLoginRedirect type parameter on token loss [extensions/CentralAuth] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304039 (https://phabricator.wikimedia.org/T429495) (owner: 10Gergő Tisza) [13:40:31] (03CR) 10Federico Ceratto: [C:03+2] files/updates: Add wmfdb packages [puppet] - 10https://gerrit.wikimedia.org/r/1304018 (https://phabricator.wikimedia.org/T427900) (owner: 10Federico Ceratto) [13:40:33] !log tgr@deploy1003 javiermonton, tgr, anzx: Backport for [[gerrit:1303613|magwiki: add 
wordmark, metanamespace, sitename and timezone (T428279)]], [[gerrit:1304004|stream: webrequest.page_trending.dev0 (T429588)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [13:40:48] looking [13:40:49] anzx: want to test it? [13:41:38] there's a pending change in puppet for modules/admin/data/data.yaml [13:41:46] can I merge it? [13:41:50] !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus5003.eqsin.wmnet to drbd [13:42:13] tgr_: tested looks good, ok to sync [13:42:24] !log tgr@deploy1003 javiermonton, tgr, anzx: Continuing with deployment [13:42:28] tgr_ my config change is ok too, ok to sync [13:43:50] (03CR) 10Andrew Bogott: [C:03+2] "I'm happy to tidy up but interested in your thoughts about pulling hiera settings from the path that's meant for VMs." [puppet] - 10https://gerrit.wikimedia.org/r/1302236 (https://phabricator.wikimedia.org/T422801) (owner: 10Andrew Bogott) [13:44:29] anzx: needs some purges and running namespaceDupes, right? 
[13:44:52] (03PS5) 10Effie Mouzeli: Add /llms.txt where honest robots can read our API Policy [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1303454 [13:45:20] tgr_: no need, as it is new wiki no pages have imported yet [13:45:26] (03PS6) 10Effie Mouzeli: Add /llms.txt where honest robots can read our API Policy [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1303454 (https://phabricator.wikimedia.org/T429599) [13:46:40] !log tgr@deploy1003 Finished scap sync-world: Backport for [[gerrit:1303613|magwiki: add wordmark, metanamespace, sitename and timezone (T428279)]], [[gerrit:1304004|stream: webrequest.page_trending.dev0 (T429588)]] (duration: 08m 15s) [13:46:46] T428279: Post-creation work for magwiki - https://phabricator.wikimedia.org/T428279 [13:46:46] T429588: Relative Trending - Milestone 3 - Stream & Schema - https://phabricator.wikimedia.org/T429588 [13:46:50] 10ops-codfw, 06cloud-services-team, 10Cloud-VPS, 06DC-Ops: Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T429608 (10Andrew) 03NEW [13:46:58] 10ops-codfw, 06cloud-services-team, 10Cloud-VPS, 06DC-Ops: Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T429608#12033802 (10Andrew) p:05Triage→03High [13:47:08] (03CR) 10Anzx: [C:03+1] "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304001 (https://phabricator.wikimedia.org/T427917) (owner: 10Valn_ilyo) [13:48:36] !log tgr@deploy1003 Started scap sync-world: Backport for [[gerrit:1304038|Fix CentralAuthPostLoginRedirect type parameter on token loss (T429495)]], [[gerrit:1304039|Fix CentralAuthPostLoginRedirect type parameter on token loss (T429495)]] [13:48:41] T429495: The type parameter of CentralAuthPostLoginRedirect is incorrect when recovering from lost tokenstore data - https://phabricator.wikimedia.org/T429495 [13:48:56] 06SRE, 10Ganeti, 06Infrastructure-Foundations: Raise DRBD replication speed for Ganeti clusters - 
https://phabricator.wikimedia.org/T428878#12033804 (10MoritzMuehlenhoff) [13:49:39] FIRING: PuppetFailure: Puppet has failed on cumin2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [13:50:32] 06SRE, 10Ganeti, 06Infrastructure-Foundations: Raise DRBD replication speed for Ganeti clusters - https://phabricator.wikimedia.org/T428878#12033813 (10MoritzMuehlenhoff) Turns out //metavg// needs to match the configured volume group,which caused an allocation error in eqsin when moving from "plain" to DRBD... [13:50:37] !log tgr@deploy1003 tgr: Backport for [[gerrit:1304038|Fix CentralAuthPostLoginRedirect type parameter on token loss (T429495)]], [[gerrit:1304039|Fix CentralAuthPostLoginRedirect type parameter on token loss (T429495)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [13:51:37] !log fceratto@cumin1003 START - Cookbook sre.hosts.remove-downtime for db2160.codfw.wmnet [13:51:37] !log fceratto@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2160.codfw.wmnet [13:51:38] (03PS1) 10Elukey: Revert "role::docker_registry_had::registry: disable nginx cache" [puppet] - 10https://gerrit.wikimedia.org/r/1304057 [13:51:50] !log fceratto@cumin1003 START - Cookbook sre.hosts.remove-downtime for db2232.codfw.wmnet [13:51:50] !log fceratto@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2232.codfw.wmnet [13:51:59] (03CR) 10CI reject: [V:04-1] Revert "role::docker_registry_had::registry: disable nginx cache" [puppet] - 10https://gerrit.wikimedia.org/r/1304057 (owner: 10Elukey) [13:52:04] !log fceratto@cumin1003 START - Cookbook sre.hosts.remove-downtime for dbproxy2005.codfw.wmnet [13:52:05] !log fceratto@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbproxy2005.codfw.wmnet [13:54:06] 
10SRE-Access-Requests: Requesting access for lerickson to deploy the RDF streaming updater on wikikube - https://phabricator.wikimedia.org/T429610 (10lerickson) 03NEW [13:54:33] (03Abandoned) 10Elukey: Revert "role::docker_registry_had::registry: disable nginx cache" [puppet] - 10https://gerrit.wikimedia.org/r/1304057 (owner: 10Elukey) [13:54:38] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy2007.codfw.wmnet with reason: Reboots T426633 [13:54:51] (03CR) 10CDanis: [V:03+1 C:03+2] "PCC LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1271028 (https://phabricator.wikimedia.org/T416948) (owner: 10CDanis) [13:55:00] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2160.codfw.wmnet with reason: Reboots T426633 [13:55:09] (03PS1) 10Elukey: role::docker_registry: re-enable the blob cache [puppet] - 10https://gerrit.wikimedia.org/r/1304060 (https://phabricator.wikimedia.org/T390251) [13:55:23] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2234.codfw.wmnet with reason: Reboots T426633 [13:56:10] !log tgr@deploy1003 tgr: Continuing with deployment [13:57:23] (03CR) 10Elukey: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1304060 (https://phabricator.wikimedia.org/T390251) (owner: 10Elukey) [13:59:44] (03CR) 10Muehlenhoff: [C:03+1] "idp1005 is working fine for me." 
[dns] - 10https://gerrit.wikimedia.org/r/1303380 (https://phabricator.wikimedia.org/T372892) (owner: 10Slyngshede) [14:00:11] (03PS2) 10Atsuko: deployment_server: adding dse monitoring [puppet] - 10https://gerrit.wikimedia.org/r/1303488 (https://phabricator.wikimedia.org/T423078) [14:00:27] (03PS8) 10Trueg: dse-k8s-services: Enable ingress on WDQS namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1302784 (https://phabricator.wikimedia.org/T429313) [14:00:27] !log tgr@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304038|Fix CentralAuthPostLoginRedirect type parameter on token loss (T429495)]], [[gerrit:1304039|Fix CentralAuthPostLoginRedirect type parameter on token loss (T429495)]] (duration: 11m 51s) [14:00:31] T429495: The type parameter of CentralAuthPostLoginRedirect is incorrect when recovering from lost tokenstore data - https://phabricator.wikimedia.org/T429495 [14:00:48] !log UTC afternoon deploys done [14:00:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:01:40] (03CR) 10Elukey: "Created the patch to discuss if we want to re-enable this cache or not :) Thoughts/Opinions?" [puppet] - 10https://gerrit.wikimedia.org/r/1304060 (https://phabricator.wikimedia.org/T390251) (owner: 10Elukey) [14:02:47] 06SRE, 10SRE-tools, 06Infrastructure-Foundations: Terminal configuration for cookbooks - https://phabricator.wikimedia.org/T429129#12033892 (10MoritzMuehlenhoff) >>! In T429129#12030649, @MoritzMuehlenhoff wrote: > Thanks! I'll retest the cookbook tomorrow. That didn't work, puppet-merge leaves some "Intern... [14:04:21] 06SRE, 06Infrastructure-Foundations, 06ServiceOps new, 06Traffic: Scaling urldownloaders by adding redundancy and load balancing - https://phabricator.wikimedia.org/T429175#12033926 (10ssingh) >>! In T429175#12032690, @MLechvien-WMF wrote: > That's correct, we could not prioritize Sophroid remaining work s... 
[14:04:39] FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:04:58] I'm going to do a deployment of private code in a moment [14:06:03] (03PS1) 10CWilliams: mariadb: Support argument for mysql-section.sh [puppet] - 10https://gerrit.wikimedia.org/r/1304065 (https://phabricator.wikimedia.org/T429613) [14:07:36] !log fceratto@cumin1003 START - Cookbook sre.hosts.remove-downtime for db2234.codfw.wmnet [14:07:36] !log fceratto@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2234.codfw.wmnet [14:08:17] !log installing unbound security updates [14:08:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:08:32] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy2008.codfw.wmnet with reason: Reboots T426633 [14:08:51] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2235.codfw.wmnet with reason: Reboots T426633 [14:09:39] RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:10:41] 10ops-eqiad, 06SRE, 06DC-Ops: C/D refresh Nokia switches Exhaust direction is reversed - https://phabricator.wikimedia.org/T428260#12033967 (10RobH) I neglected to update this task (but updated in dc/netops meeting), but we have an ongoing email thread with Myriad and Nokia. They confirmed this (that we can... 
[14:10:41] (03PS1) 10Jforrester: SpecialSpecialPages: Guard against special pages with no content-language alias [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304067 (https://phabricator.wikimedia.org/T429584) [14:13:12] train: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DiscussionTools/+/1304061 fixes one of the two open train blockers - what's the procedure there, backport to wmf.7 and proceed, or something else? [14:14:13] !log Finished deploying private code change [14:14:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:14:42] (03CR) 10Lerickson: [C:03+1] "Thank you for fixing this!!" [dumps] - 10https://gerrit.wikimedia.org/r/1304029 (https://phabricator.wikimedia.org/T425036) (owner: 10Trueg) [14:14:51] ihurbain: probably next train window (now+4h) or next backport window (now+6h) [14:15:01] although if a train deployer is around they sometimes do it earlier if it unblocks the train [14:16:05] (03CR) 10Muehlenhoff: Makefike: don't try to install wheel*.whl (031 comment) [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/1302127 (owner: 10Ayounsi) [14:16:18] (03CR) 10Scott French: [C:03+1] "Thanks, Blake!" [puppet] - 10https://gerrit.wikimedia.org/r/1304051 (https://phabricator.wikimedia.org/T427668) (owner: 10Blake) [14:17:03] 'k. i'll add a comment on the train and let whoever it may concern do their thing, i guess :D [14:18:02] (03CR) 10Blake: [C:03+2] deployment_server: Add a new mw-pretrain service. 
[puppet] - 10https://gerrit.wikimedia.org/r/1304051 (https://phabricator.wikimedia.org/T427668) (owner: 10Blake) [14:19:25] !log fceratto@cumin1003 START - Cookbook sre.hosts.remove-downtime for db2235.codfw.wmnet [14:19:26] !log fceratto@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2235.codfw.wmnet [14:20:44] !log fceratto@cumin1003 START - Cookbook sre.hosts.remove-downtime for db2160.codfw.wmnet [14:20:44] !log fceratto@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2160.codfw.wmnet [14:20:53] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-wmde-users for Seanleong-WMDE - https://phabricator.wikimedia.org/T429474#12034056 (10SuzanneWood-WMDE) Approved! [14:20:58] jouncebot: nowandnext [14:20:58] No deployments scheduled for the next 0 hour(s) and 9 minute(s) [14:20:58] In 0 hour(s) and 9 minute(s): Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1430) [14:21:04] !log fceratto@cumin1003 START - Cookbook sre.hosts.remove-downtime for dbproxy2008.codfw.wmnet [14:21:05] !log fceratto@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbproxy2008.codfw.wmnet [14:21:09] !log fceratto@cumin1003 START - Cookbook sre.hosts.remove-downtime for dbproxy2007.codfw.wmnet [14:21:09] !log fceratto@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbproxy2007.codfw.wmnet [14:21:19] ihurbain: IMHO you could also deploy it now if nobody objects [14:21:47] (03CR) 10Brouberol: [C:03+1] Added DNS entries for the new WDQS 2 deployments in DSE K8s. 
[dns] - 10https://gerrit.wikimedia.org/r/1301301 (https://phabricator.wikimedia.org/T428925) (owner: 10Trueg) [14:22:12] i'm in a meeting, so i'm going to do a stupid if i try to do that now [14:22:46] (03PS1) 10Jgreen: Remove frmx1001.wikimedia.org and add SPF for frmx1002.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/1304074 (https://phabricator.wikimedia.org/T429529) [14:24:04] (03CR) 10Federico Ceratto: major-upgrade.py: Add !log dbmaint on the start (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1303438 (owner: 10Marostegui) [14:25:18] (03CR) 10Jgreen: [C:03+2] Remove frmx1001.wikimedia.org and add SPF for frmx1002.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/1304074 (https://phabricator.wikimedia.org/T429529) (owner: 10Jgreen) [14:25:41] !log jgreen@dns1004 START - running authdns-update [14:27:19] (03CR) 10JHathaway: [C:03+1] sre.puppet.disable-merges: Avoid using puppet-merge [cookbooks] - 10https://gerrit.wikimedia.org/r/1303997 (https://phabricator.wikimedia.org/T423121) (owner: 10Muehlenhoff) [14:27:33] !log jgreen@dns1004 END - running authdns-update [14:29:48] (03PS3) 10Ayounsi: Makefike: don't try to install wheel*.whl [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/1302127 [14:30:04] Deploy window Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1430) [14:30:05] 06SRE, 10SRE-tools, 06Infrastructure-Foundations: Terminal configuration for cookbooks - https://phabricator.wikimedia.org/T429129#12034139 (10jhathaway) I think that makes sense, "Internal error!" is the helpful message that the args to the script were wrong. However, the whole dual python and shell script... 
[14:31:16] (03CR) 10Ayounsi: Makefike: don't try to install wheel*.whl (032 comments) [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/1302127 (owner: 10Ayounsi) [14:31:27] (03CR) 10Brouberol: dse-k8s-services: Enable ingress on WDQS namespaces (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1302784 (https://phabricator.wikimedia.org/T429313) (owner: 10Trueg) [14:33:25] FIRING: [2x] BFDdown: BFD session down between asw1-b4-magru and 195.200.68.37 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=asw1-b4-magru:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown [14:33:35] (03CR) 10Trueg: dse-k8s-services: Enable ingress on WDQS namespaces (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1302784 (https://phabricator.wikimedia.org/T429313) (owner: 10Trueg) [14:34:24] (03PS4) 10Ayounsi: Makefile: don't try to install wheel*.whl [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/1302127 [14:35:09] (03CR) 10Brouberol: [C:03+2] Added DNS entries for the new WDQS 2 deployments in DSE K8s. 
[dns] - 10https://gerrit.wikimedia.org/r/1301301 (https://phabricator.wikimedia.org/T428925) (owner: 10Trueg) [14:35:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.31% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [14:35:28] !log brouberol@dns1004 START - running authdns-update [14:35:29] (03PS5) 10Ayounsi: Makefile: don't try to install wheel*.whl [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/1302127 [14:36:59] (03CR) 10Ayounsi: Makefile: don't try to install wheel*.whl (031 comment) [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/1302127 (owner: 10Ayounsi) [14:37:09] 10ops-eqiad, 06SRE, 06DC-Ops, 10decommission-hardware: decommission frdb1003.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T401611#12034184 (10Jgreen) [14:37:17] !log brouberol@dns1004 END - running authdns-update [14:38:04] (03CR) 10Elukey: "My current worry is that the /var/cache/nginx dir could take max 10G and we have 12G avail for the root partition on registry2005." [puppet] - 10https://gerrit.wikimedia.org/r/1304060 (https://phabricator.wikimedia.org/T390251) (owner: 10Elukey) [14:38:36] (03CR) 10Elukey: [C:03+1] Makefile: don't try to install wheel*.whl [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/1302127 (owner: 10Ayounsi) [14:40:09] (03CR) 10Ottomata: [C:03+1] "Hm, you know, this will break things if the schema is not merged!" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304004 (https://phabricator.wikimedia.org/T429588) (owner: 10JavierMonton) [14:40:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.72% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [14:40:24] 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.14 point update - https://phabricator.wikimedia.org/T426759#12034223 (10MoritzMuehlenhoff) [14:42:06] (03CR) 10Ayounsi: [C:03+2] Makefile: don't try to install wheel*.whl [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/1302127 (owner: 10Ayounsi) [14:42:12] !log installing zsh updates from Bookworm point release [14:42:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:17] 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.14 point update - https://phabricator.wikimedia.org/T426759#12034279 (10MoritzMuehlenhoff) [14:45:57] (03PS1) 10Isabelle Hurbain-Palatin: Check that data-parsoid is an array before accessing it as such [extensions/DiscussionTools] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304082 (https://phabricator.wikimedia.org/T429582) [14:46:21] !log ayounsi@cumin1003 START - Cookbook sre.deploy.python-code homer to cumin2003.codfw.wmnet with reason: trixie homer deploy - ayounsi@cumin1003 [14:46:22] 06SRE, 06Infrastructure-Foundations: Integrate Trixie 13.5 point update - https://phabricator.wikimedia.org/T427072#12034286 (10MoritzMuehlenhoff) [14:47:43] (03PS1) 10Blake: main: Add a namespace for the mw-pretrain service. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1304083 (https://phabricator.wikimedia.org/T427668) [14:47:46] (03PS1) 10Sbisson: [labs] Article Guidance WD connect use proper beta WD [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304086 (https://phabricator.wikimedia.org/T421250) [14:48:04] 10ops-codfw, 06SRE, 06cloud-services-team, 10Cloud-VPS, 06DC-Ops: Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T429608#12034300 (10Jhancock.wm) pulled the cable and reseated psu1 to get the alert to clear. should be okay now. [14:48:51] (03PS1) 10Ayounsi: Makefile: fix ifeq indent [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/1304087 [14:48:55] 10ops-codfw, 06SRE, 06DBA, 06DC-Ops: Degraded RAID on db2247 - https://phabricator.wikimedia.org/T429348#12034303 (10Jhancock.wm) they shipped the drive late yesterday. hopefully it comes in today and I can get this swapped out before the long weekend. [14:49:10] jouncebot now [14:49:10] For the next 0 hour(s) and 10 minute(s): Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1430) [14:49:20] jouncebot next [14:49:20] In 0 hour(s) and 10 minute(s): Train log triage (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1500) [14:49:25] ayounsi@cumin1003 python-code (PID 704957) is awaiting input [14:49:43] claime, arnoldokoth: if no one is deploying in the test kitchen window, I have a patch I'd like to backport to fix a train blocker [14:49:56] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DiscussionTools/+/1304082 [14:50:19] stephanebisson: ^ [14:50:23] cscott: sgtm [14:50:51] ok, shouldn't take long, i'm going to spiderpig it [14:51:03] thanks scott :) [14:51:20] (03CR) 10Ayounsi: [C:03+2] Makefile: fix ifeq indent [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/1304087 (owner: 10Ayounsi) [14:51:29] !log ayounsi@cumin1003 END (FAIL) - Cookbook 
sre.deploy.python-code (exit_code=99) homer to cumin2003.codfw.wmnet with reason: trixie homer deploy - ayounsi@cumin1003 [14:51:32] 10ops-codfw, 06SRE, 06DBA, 06DC-Ops: Degraded RAID on db2247 - https://phabricator.wikimedia.org/T429348#12034313 (10Marostegui) Thank you - feel free to replace it anytime [14:51:34] (03CR) 10TrainBranchBot: [C:03+2] "Approved by cscott@deploy1003 using scap backport" [extensions/DiscussionTools] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304082 (https://phabricator.wikimedia.org/T429582) (owner: 10Isabelle Hurbain-Palatin) [14:51:40] !log ayounsi@cumin1003 START - Cookbook sre.deploy.python-code homer to cumin2003.codfw.wmnet with reason: trixie homer deploy - ayounsi@cumin1003 [14:51:51] ihurbain can you help test (aka check the logs when the time comes) [14:52:17] i can try yeah [14:52:28] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2003.codfw.wmnet with reason: trixie homer deploy - ayounsi@cumin1003 [14:53:02] (03Merged) 10jenkins-bot: Check that data-parsoid is an array before accessing it as such [extensions/DiscussionTools] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304082 (https://phabricator.wikimedia.org/T429582) (owner: 10Isabelle Hurbain-Palatin) [14:53:32] !log cscott@deploy1003 Started scap sync-world: Backport for [[gerrit:1304082|Check that data-parsoid is an array before accessing it as such (T429582)]] [14:53:36] T429582: DiscussionTools: Error: Cannot use object of type stdClass as array - https://phabricator.wikimedia.org/T429582 [14:53:44] (03PS3) 10CWilliams: Cookbook sre.mysql.upgrade should not accept multiple hosts [cookbooks] - 10https://gerrit.wikimedia.org/r/1302745 (https://phabricator.wikimedia.org/T429230) [14:54:08] (03CR) 10CWilliams: Cookbook sre.mysql.upgrade should not accept multiple hosts (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1302745 (https://phabricator.wikimedia.org/T429230) (owner: 10CWilliams) 
[14:55:02] (03CR) 10Marostegui: [C:03+1] mariadb: Support argument for mysql-section.sh [puppet] - 10https://gerrit.wikimedia.org/r/1304065 (https://phabricator.wikimedia.org/T429613) (owner: 10CWilliams) [14:55:32] !log cscott@deploy1003 ihurbain, cscott: Backport for [[gerrit:1304082|Check that data-parsoid is an array before accessing it as such (T429582)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [14:56:13] ihurbain: that was quick, looks like it is testable [14:56:51] (03CR) 10CI reject: [V:04-1] Cookbook sre.mysql.upgrade should not accept multiple hosts [cookbooks] - 10https://gerrit.wikimedia.org/r/1302745 (https://phabricator.wikimedia.org/T429230) (owner: 10CWilliams) [14:57:10] i haven't found yet how to check whether i was seeing "the thing i am looking for" vs "nothing" :| [14:57:28] !log ayounsi@cumin1003 START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: trixie homer deploy - ayounsi@cumin1003 [14:58:16] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: trixie homer deploy - ayounsi@cumin1003 [14:58:32] (03CR) 10Scott French: "Thanks, Blake!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304083 (https://phabricator.wikimedia.org/T427668) (owner: 10Blake) [14:59:58] well, i've confirmed it doesn't crash w/ the new patch at least :) [15:00:05] jeena and dduvall: Train log triage (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1500). Please do the needful. [15:00:05] i think we're good [15:00:15] !log cscott@deploy1003 ihurbain, cscott: Continuing with deployment [15:01:53] (i went to a page that had issues, got "verbose logs", didn't find anything looking like the issue. then it doesn't necessarily say much, because i didn't confirm that i COULD see the issue before that, so... we'll see.) 
[15:02:15] (03CR) 10CWilliams: major-upgrade.py: Add !log dbmaint on the start (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1303438 (owner: 10Marostegui) [15:02:26] (03PS1) 10Santiago Faci: test-kitchen-next: Set `ui_url` explicitly [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304092 [15:02:51] (03PS2) 10DDesouza: Deploy English Wikipedia Mobile App Survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1303895 (https://phabricator.wikimedia.org/T428876) [15:03:42] I can see it generate logs for https://it.wikipedia.org/wiki/Wikipedia:Bar/2008_01_1 when I have x-wikimedia-debug off, and not generate logs when i turn it on [15:03:50] (03CR) 10CWilliams: major-upgrade.py: Add !log dbmaint on the start (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1303438 (owner: 10Marostegui) [15:04:39] FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:04:50] !log cscott@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304082|Check that data-parsoid is an array before accessing it as such (T429582)]] (duration: 11m 17s) [15:04:54] T429582: DiscussionTools: Error: Cannot use object of type stdClass as array - https://phabricator.wikimedia.org/T429582 [15:06:00] ihurbain: all done, i think that did it, i'm not seeing any more logs in logstash [15:06:12] claime: all done, thanks! [15:06:25] (03PS1) 10Elukey: accounting.py: fix spurious data lines [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/1304094 (https://phabricator.wikimedia.org/T428936) [15:06:29] agreed [15:06:38] cscott: thanks! 
[15:06:54] (03CR) 10Elukey: [V:03+2 C:03+2] accounting.py: fix spurious data lines [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/1304094 (https://phabricator.wikimedia.org/T428936) (owner: 10Elukey) [15:07:37] 10SRE-tools, 06DC-Ops, 06Infrastructure-Foundations, 10netbox, 13Patch-For-Review: netbox accounting report error - - https://phabricator.wikimedia.org/T428936#12034372 (10elukey) @RobH it should be fixed now, see https://netbox.wikimedia.org/extras/scripts/results/356792/ [15:08:29] !log elukey@cumin1003 START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox [15:08:59] !log elukey@cumin1003 END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox [15:09:39] RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:09:51] 10SRE-tools, 06DC-Ops, 06Infrastructure-Foundations, 10netbox, 13Patch-For-Review: netbox accounting report error - - https://phabricator.wikimedia.org/T428936#12034379 (10elukey) 05Open→03Resolved a:03elukey [15:11:53] (03CR) 10Btullis: [C:03+2] Add the four new dse-k8s-workers in eqiad to the cluster [puppet] - 10https://gerrit.wikimedia.org/r/1304009 (https://phabricator.wikimedia.org/T421465) (owner: 10Btullis) [15:12:42] Requesting permission to deploy a simple labs-only config patch (https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1304086) [15:13:26] claime, arnoldokoth ^ [15:13:44] jouncebot: nowandnext [15:13:44] For the next 0 hour(s) and 46 minute(s): Train log triage (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1500) [15:13:44] In 0 hour(s) and 46 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1600) [15:14:06] stephanebisson: sgtm 
[15:14:19] claime thank you [15:14:40] (03PS1) 10Elukey: role::kafka::main: add missing ACL for statsv [puppet] - 10https://gerrit.wikimedia.org/r/1304096 (https://phabricator.wikimedia.org/T425528) [15:14:47] (03CR) 10TrainBranchBot: [C:03+2] "Approved by sbisson@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304086 (https://phabricator.wikimedia.org/T421250) (owner: 10Sbisson) [15:15:46] (03Merged) 10jenkins-bot: [labs] Article Guidance WD connect use proper beta WD [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304086 (https://phabricator.wikimedia.org/T421250) (owner: 10Sbisson) [15:17:06] claime: Too quick with it. :D [15:17:35] (03CR) 10Marostegui: major-upgrade.py: Add !log dbmaint on the start (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1303438 (owner: 10Marostegui) [15:17:47] (03PS4) 10Marostegui: major-upgrade.py: Add !log dbmaint on the start [cookbooks] - 10https://gerrit.wikimedia.org/r/1303438 [15:19:15] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-wmde-users for Seanleong-WMDE - https://phabricator.wikimedia.org/T429474#12034443 (10BCornwall) [15:20:35] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-wmde-users for Seanleong-WMDE - https://phabricator.wikimedia.org/T429474#12034448 (10BCornwall) Thanks! For the rest of this, Amir will be on clinic duty this week and will finish up once the remaining requirements are satisfied. [15:20:52] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for laurabarluzzi - https://phabricator.wikimedia.org/T429431#12034450 (10BCornwall) Amir will be on clinic duty this week and will finish up once the remaining requirements are satisfied. 
[15:20:58] 06SRE, 10SRE-Access-Requests: Change SSH key for denisse after new laptop provissioning - https://phabricator.wikimedia.org/T429429#12034452 (10BCornwall) Amir will be on clinic duty this week and will finish up once the remaining requirements are satisfied. [15:21:11] 06SRE, 10SRE-Access-Requests: Change SSH key for denisse after new laptop provissioning - https://phabricator.wikimedia.org/T429429#12034453 (10BCornwall) [15:21:15] !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply [15:21:18] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users level 1 for chudson - https://phabricator.wikimedia.org/T429353#12034454 (10BCornwall) Amir will be on clinic duty this week and will finish up once the remaining requirements are satisfied. [15:21:24] !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply [15:21:34] !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply [15:21:42] 06SRE, 10SRE-Access-Requests: Requesting access to "analytics-privatedata-users" for Mahmoud Abdelsattar (WMDE) - https://phabricator.wikimedia.org/T428416#12034479 (10BCornwall) [15:21:43] !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply [15:24:04] (03PS1) 10BCornwall: admin: Add mahmoud-abdelsattar to a-p-d [puppet] - 10https://gerrit.wikimedia.org/r/1304099 (https://phabricator.wikimedia.org/T428416) [15:25:28] (03CR) 10Scott French: "Interesting find!" 
[puppet] - 10https://gerrit.wikimedia.org/r/1304060 (https://phabricator.wikimedia.org/T390251) (owner: 10Elukey) [15:30:05] (03PS1) 10CDanis: fundraising_data_import: hostname != FQDN [puppet] - 10https://gerrit.wikimedia.org/r/1304104 (https://phabricator.wikimedia.org/T416948) [15:30:21] (03CR) 10CDanis: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1304104 (https://phabricator.wikimedia.org/T416948) (owner: 10CDanis) [15:33:44] (03CR) 10Brouberol: [C:03+2] data-platform: add alert on kafka-jumbo partition sizes [alerts] - 10https://gerrit.wikimedia.org/r/1302737 (https://phabricator.wikimedia.org/T429127) (owner: 10Brouberol) [15:34:03] (03CR) 10CDanis: [V:03+1] "https://puppet-compiler.wmflabs.org/output/1304104/7044/deploy1003.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/1304104 (https://phabricator.wikimedia.org/T416948) (owner: 10CDanis) [15:34:08] 06SRE, 10SRE-swift-storage, 07Essential-Work: Migrate production swift clusters to trixie - https://phabricator.wikimedia.org/T429630 (10MatthewVernon) 03NEW [15:34:17] (03PS4) 10Brouberol: data-platform: add alert on kafka-jumbo partition sizes [alerts] - 10https://gerrit.wikimedia.org/r/1302737 (https://phabricator.wikimedia.org/T429127) [15:34:36] (03CR) 10Scott French: [C:03+1] "Whoops, good catch! Definitely present now." 
[puppet] - 10https://gerrit.wikimedia.org/r/1304104 (https://phabricator.wikimedia.org/T416948) (owner: 10CDanis) [15:34:37] (03CR) 10Brouberol: data-platform: add alert on kafka-jumbo partition sizes (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/1302737 (https://phabricator.wikimedia.org/T429127) (owner: 10Brouberol) [15:34:56] (03CR) 10CDanis: [V:03+1 C:03+2] fundraising_data_import: hostname != FQDN [puppet] - 10https://gerrit.wikimedia.org/r/1304104 (https://phabricator.wikimedia.org/T416948) (owner: 10CDanis) [15:35:05] (03PS4) 10AOkoth: hiera: promote phab2003 to passive_server [puppet] - 10https://gerrit.wikimedia.org/r/1302894 (https://phabricator.wikimedia.org/T423727) [15:36:28] 06SRE, 10SRE-swift-storage, 07Essential-Work: Migrate production swift clusters to trixie - https://phabricator.wikimedia.org/T429630#12034569 (10MatthewVernon) [15:36:29] 06SRE, 06Infrastructure-Foundations, 07Epic: Tracking task for Bullseye migrations in production - https://phabricator.wikimedia.org/T291916#12034570 (10MatthewVernon) [15:38:04] (03PS7) 10Effie Mouzeli: Add /llms.txt where honest robots can read our API Policy [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1303454 (https://phabricator.wikimedia.org/T429599) [15:39:01] 10ops-codfw, 06DC-Ops: upgrade selecte servers from 1G to 10G - https://phabricator.wikimedia.org/T429631 (10Jhancock.wm) 03NEW [15:40:40] (03CR) 10Zabe: [C:03+2] LocalFileMoveBatch: Also update fr_archive_name when moving file [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1303981 (https://phabricator.wikimedia.org/T428406) (owner: 10Zabe) [15:41:06] (03CR) 10TrainBranchBot: [C:03+2] "Approved by zabe@deploy1003 using scap backport" [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1303981 (https://phabricator.wikimedia.org/T428406) (owner: 10Zabe) [15:44:58] 10ops-codfw, 06SRE, 06DC-Ops, 06ServiceOps new, 10ServiceOps-Upgrades-Hardware: Q3:rack/setup/install conf200[7-9] - 
https://phabricator.wikimedia.org/T418914#12034597 (10Jhancock.wm) [15:45:17] (03CR) 10Elukey: "Left some comments for the Python code and some nits for the puppet one, lemme know!" [puppet] - 10https://gerrit.wikimedia.org/r/1298294 (https://phabricator.wikimedia.org/T425795) (owner: 10Tiziano Fogli) [15:46:58] (03CR) 10Elukey: slothslos/report2drive: add profiles (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1298295 (https://phabricator.wikimedia.org/T425795) (owner: 10Tiziano Fogli) [15:48:23] FIRING: SLOBudgetBurn: Standalone event system success rate is below 99.9% target - https://alerts.wikimedia.org/?q=alertname%3DSLOBudgetBurn [15:48:29] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for laurabarluzzi - https://phabricator.wikimedia.org/T429431#12034603 (10Ottomata) Approved! FYI though approval is [[ https://github.com/wikimedia/operations-puppet/blob/production/modules/admin/data/data.yaml#L495-L502 | not need... [15:48:36] (03Merged) 10jenkins-bot: LocalFileMoveBatch: Also update fr_archive_name when moving file [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1303981 (https://phabricator.wikimedia.org/T428406) (owner: 10Zabe) [15:49:02] !log zabe@deploy1003 Started scap sync-world: Backport for [[gerrit:1303981|LocalFileMoveBatch: Also update fr_archive_name when moving file (T428406)]] [15:49:06] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-wmde-users for Seanleong-WMDE - https://phabricator.wikimedia.org/T429474#12034611 (10Ottomata) Approved! FYI though approval is [[ https://github.com/wikimedia/operations-puppet/blob/production/modules/admin/data/data.yaml#L495-L502 | not needed for... 
[15:49:07] T428406: old file revisions missing of File:A_Warm_Shade_of_Ivory_-_Henry_Mancini_album_cover.jpg - https://phabricator.wikimedia.org/T428406 [15:50:02] FIRING: [5x] SystemdUnitFailed: cowbuilder_update_bookworm-amd64.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [15:51:08] !log zabe@deploy1003 zabe: Backport for [[gerrit:1303981|LocalFileMoveBatch: Also update fr_archive_name when moving file (T428406)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [15:51:32] !log zabe@deploy1003 zabe: Continuing with deployment [15:52:39] 10ops-codfw, 06SRE, 06DC-Ops, 10observability: Q3:rack/setup/install kafka-logging200[6-8] - https://phabricator.wikimedia.org/T418931#12034630 (10Jhancock.wm) sent! [15:52:44] (03CR) 10Elukey: "I found these interesting things:" [puppet] - 10https://gerrit.wikimedia.org/r/1304060 (https://phabricator.wikimedia.org/T390251) (owner: 10Elukey) [15:54:20] 10ops-codfw, 06SRE, 06DC-Ops, 06Wikidata Platform Team, 06Data-Platform-SRE (2026-06-05 - 2026-06-26): Q4:rack/setup/install dse-k8s-wdqs200[1-4] (formerly wdqs20[28-31]) - https://phabricator.wikimedia.org/T423312#12034648 (10Jhancock.wm) [15:54:54] 10ops-codfw, 06SRE, 06cloud-services-team, 10Cloud-VPS, 06DC-Ops: Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T429608#12034649 (10Andrew) 05Open→03Resolved a:03Andrew Indeed, the alert seems to have cleared. Thank you! 
[15:55:51] !log zabe@deploy1003 Finished scap sync-world: Backport for [[gerrit:1303981|LocalFileMoveBatch: Also update fr_archive_name when moving file (T428406)]] (duration: 06m 49s) [15:55:56] T428406: old file revisions missing of File:A_Warm_Shade_of_Ivory_-_Henry_Mancini_album_cover.jpg - https://phabricator.wikimedia.org/T428406 [16:00:05] jhathaway and rzl: gettimeofday() says it's time for Puppet request window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1600) [16:00:05] dancy: A patch you scheduled for Puppet request window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [16:00:37] o/ [16:01:54] hi! looking [16:03:35] dancy: my inclination is to just say "yep, that's beta all right" and merge without looking too closely at the details, does that work for you or would you like a closer review? :) [16:04:31] Just merge please. This stuff has been running in beta for a few weeks [16:04:42] (03CR) 10RLazarus: [C:03+2] beta: Add a wmf-beta-update-all timer and script [puppet] - 10https://gerrit.wikimedia.org/r/1276813 (https://phabricator.wikimedia.org/T256168) (owner: 10BryanDavis) [16:05:06] ty! [16:05:42] done, thanks! [16:07:00] (03CR) 10Scott French: "Aside from the extra quotes in CAS_SERVER_URL, this looks good to me!" 
[puppet] - 10https://gerrit.wikimedia.org/r/1299475 (https://phabricator.wikimedia.org/T422235) (owner: 10Giuseppe Lavagetto) [16:07:10] (03CR) 10Scott French: [C:03+1] hiddenparma: switch to native CAS authentication [puppet] - 10https://gerrit.wikimedia.org/r/1299475 (https://phabricator.wikimedia.org/T422235) (owner: 10Giuseppe Lavagetto) [16:09:39] FIRING: [3x] JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:10:24] (03PS8) 10Effie Mouzeli: Add /llms.txt where honest robots can read our API Policy [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1303454 (https://phabricator.wikimedia.org/T429599) [16:10:25] (03PS1) 10Effie Mouzeli: Extend robots.php to serve llms-*.txt files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304109 (https://phabricator.wikimedia.org/T429599) [16:11:46] (03PS1) 10Jforrester: [abstractwiki] Update favicon with new version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304110 (https://phabricator.wikimedia.org/T429620) [16:12:38] (03PS2) 10Effie Mouzeli: Extend robots.php to serve llms-*.txt files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304109 (https://phabricator.wikimedia.org/T429599) [16:13:16] (03CR) 10Volans: "replies inline" [puppet] - 10https://gerrit.wikimedia.org/r/1302236 (https://phabricator.wikimedia.org/T422801) (owner: 10Andrew Bogott) [16:14:39] RESOLVED: [3x] JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:15:38] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users level 1 for chudson - 
https://phabricator.wikimedia.org/T429353#12034754 (10CHudson-WMF) Thanks, I'll bring this up with my manager again in our 1:1 today. [16:16:43] (03PS1) 10Zabe: Add script to fix fr_archive_name drifts [extensions/WikimediaMaintenance] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304112 (https://phabricator.wikimedia.org/T428406) [16:17:00] FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished [16:17:10] (03CR) 10RLazarus: [C:03+2] Rebuild for Trixie [software/httpbb] - 10https://gerrit.wikimedia.org/r/1303557 (https://phabricator.wikimedia.org/T427899) (owner: 10RLazarus) [16:17:59] (03CR) 10RLazarus: [V:03+2 C:03+2] "Thanks!" [docker-images/docker-pkg/deploy] - 10https://gerrit.wikimedia.org/r/1303560 (owner: 10RLazarus) [16:18:21] (03CR) 10Zabe: [C:03+2] Add script to fix fr_archive_name drifts [extensions/WikimediaMaintenance] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304112 (https://phabricator.wikimedia.org/T428406) (owner: 10Zabe) [16:19:12] (03Merged) 10jenkins-bot: Rebuild for Trixie [software/httpbb] - 10https://gerrit.wikimedia.org/r/1303557 (https://phabricator.wikimedia.org/T427899) (owner: 10RLazarus) [16:21:33] (03Merged) 10jenkins-bot: Add script to fix fr_archive_name drifts [extensions/WikimediaMaintenance] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304112 (https://phabricator.wikimedia.org/T428406) (owner: 10Zabe) [16:21:41] (03PS1) 10Effie Mouzeli: mediawiki-vhost.conf: Route llms*.txt requests to robots.php [puppet] - 10https://gerrit.wikimedia.org/r/1304114 (https://phabricator.wikimedia.org/T429599) [16:22:06] !log zabe@deploy1003 Started scap sync-world: Backport for [[gerrit:1304112|Add script to fix fr_archive_name drifts (T428406)]] 
[16:22:10] T428406: old file revisions missing of File:A_Warm_Shade_of_Ivory_-_Henry_Mancini_album_cover.jpg - https://phabricator.wikimedia.org/T428406 [16:23:03] (03PS3) 10Effie Mouzeli: Extend robots.php to serve llms-*.txt files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304109 (https://phabricator.wikimedia.org/T429599) [16:23:24] (03PS9) 10Effie Mouzeli: Add /llms.txt where honest robots can read our API Policy [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1303454 (https://phabricator.wikimedia.org/T429599) [16:24:05] !log zabe@deploy1003 zabe: Backport for [[gerrit:1304112|Add script to fix fr_archive_name drifts (T428406)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [16:24:34] !log zabe@deploy1003 zabe: Continuing with deployment [16:25:24] (03PS4) 10Effie Mouzeli: Extend robots.php to serve llms-*.txt files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304109 (https://phabricator.wikimedia.org/T429599) [16:28:23] RESOLVED: SLOBudgetBurn: Standalone event system success rate is below 99.9% target - https://alerts.wikimedia.org/?q=alertname%3DSLOBudgetBurn [16:28:23] (03CR) 10Federico Ceratto: "This code implements a different logic: it receives sections as an argument and iterates over the required sections." 
[cookbooks] - 10https://gerrit.wikimedia.org/r/1277076 (https://phabricator.wikimedia.org/T419874) (owner: 10Federico Ceratto) [16:28:52] !log zabe@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304112|Add script to fix fr_archive_name drifts (T428406)]] (duration: 06m 46s) [16:28:56] T428406: old file revisions missing of File:A_Warm_Shade_of_Ivory_-_Henry_Mancini_album_cover.jpg - https://phabricator.wikimedia.org/T428406 [16:31:46] (03CR) 10Volans: [C:03+2] Cinder backups: enable transport encryption part 2 [puppet] - 10https://gerrit.wikimedia.org/r/1303961 (https://phabricator.wikimedia.org/T294432) (owner: 10Volans) [16:34:59] (03CR) 10Jforrester: "Won't we want to land llms-full.txt or whatever with this? Or is waiting OK?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304109 (https://phabricator.wikimedia.org/T429599) (owner: 10Effie Mouzeli) [16:35:05] (03CR) 10Volans: "Question inline" [cookbooks] - 10https://gerrit.wikimedia.org/r/1303997 (https://phabricator.wikimedia.org/T423121) (owner: 10Muehlenhoff) [16:40:18] 10SRE-swift-storage, 06Commons: Compressing TIFF files from the Library of Congress - https://phabricator.wikimedia.org/T429264#12034813 (10Ladsgroup) Yeah, it'll be lower. Depends on whether they have been compressed already and to what level. We can take a random sample and check. OTOH. I have already the bo... 
[16:40:34] (03PS2) 10RLazarus: [WIP] Periodic jobs: Add abstractwiki_update_generated_articles [puppet] - 10https://gerrit.wikimedia.org/r/1302213 (https://phabricator.wikimedia.org/T422628) (owner: 10Jforrester) [16:48:50] jouncebot: nowandnext [16:48:50] For the next 0 hour(s) and 11 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1600) [16:48:50] In 0 hour(s) and 11 minute(s): Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1700) [16:48:50] In 0 hour(s) and 11 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1700) [16:51:06] (03CR) 10Atsuko: dse-k8s-services: Enable ingress on WDQS namespaces (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1302784 (https://phabricator.wikimedia.org/T429313) (owner: 10Trueg) [16:51:12] (03CR) 10RLazarus: [V:03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8765/co" [puppet] - 10https://gerrit.wikimedia.org/r/1302213 (https://phabricator.wikimedia.org/T422628) (owner: 10Jforrester) [16:52:32] (03PS5) 10AOkoth: hiera: promote phab2003 to passive_server [puppet] - 10https://gerrit.wikimedia.org/r/1302894 (https://phabricator.wikimedia.org/T423727) [16:56:42] (03PS3) 10Atsuko: deployment_server: adding dse monitoring [puppet] - 10https://gerrit.wikimedia.org/r/1303488 (https://phabricator.wikimedia.org/T423078) [16:58:28] (03CR) 10Atsuko: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1303488 (https://phabricator.wikimedia.org/T423078) (owner: 10Atsuko) [16:58:37] (03CR) 10Ryan Kemper: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1303488 (https://phabricator.wikimedia.org/T423078) (owner: 10Atsuko) [17:00:05] bd808: Your horoscope predicts another 
Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1700). [17:00:05] Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1700) [17:01:47] 06SRE, 06Product Safety and Integrity: EtcdConfig failed to fetch data: (curl error: 28) Timeout was reached - https://phabricator.wikimedia.org/T429156#12034865 (10Dreamy_Jazz) Happened again: ` bhwiktionary Warning: EtcdConfig failed to fetch data: (curl error: 28) Timeout was reached Resolving timed out aft... [17:03:12] 06SRE, 06Product Safety and Integrity: EtcdConfig failed to fetch data: (curl error: 28) Timeout was reached - https://phabricator.wikimedia.org/T429156#12034867 (10Dreamy_Jazz) And again: ` eswiki Warning: EtcdConfig failed to fetch data: (curl error: 28) Timeout was reached Resolving timed out after 2000 mil... [17:08:05] (03CR) 10Dzahn: [C:03+1] hiera: promote phab2003 to passive_server [puppet] - 10https://gerrit.wikimedia.org/r/1302894 (https://phabricator.wikimedia.org/T423727) (owner: 10AOkoth) [17:14:39] FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [17:18:31] (03CR) 10Ryan Kemper: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1303488 (https://phabricator.wikimedia.org/T423078) (owner: 10Atsuko) [17:18:56] 06SRE, 06Product Safety and Integrity: EtcdConfig failed to fetch data: (curl error: 28) Timeout was reached - https://phabricator.wikimedia.org/T429156#12034912 (10kostajh) p:05Triage→03High [17:19:39] RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - 
https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [17:22:41] (03PS1) 10Mmartorana: config: Enable EmailConfirmationBanner on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304122 (https://phabricator.wikimedia.org/T428292) [17:23:07] (03PS1) 10CDobbins: Add learn.wiki cname for AWS [dns] - 10https://gerrit.wikimedia.org/r/1304123 (https://phabricator.wikimedia.org/T429628) [17:24:00] (03CR) 10CI reject: [V:04-1] Add learn.wiki cname for AWS [dns] - 10https://gerrit.wikimedia.org/r/1304123 (https://phabricator.wikimedia.org/T429628) (owner: 10CDobbins) [17:24:50] PROBLEM - orchestrator resolve cache non-FQDNs on dborch1002 is CRITICAL: CRITICAL: 1 non-FQDN entries in orchestrator resolve cache: https://wikitech.wikimedia.org/wiki/Orchestrator [17:26:16] (03PS2) 10CDobbins: Add learn.wiki cname for AWS [dns] - 10https://gerrit.wikimedia.org/r/1304123 (https://phabricator.wikimedia.org/T429628) [17:28:58] 06SRE, 10DNS, 06Traffic, 13Patch-For-Review: new CNAME record for WikiLearn - https://phabricator.wikimedia.org/T429628#12034949 (10CDobbins) 05Open→03In progress a:03CDobbins [17:31:42] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 22 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304122 (https://phabricator.wikimedia.org/T428292) (owner: 10Mmartorana) [17:31:46] 06SRE, 10SRE-Access-Requests: Requesting access to deployment for caro - https://phabricator.wikimedia.org/T426995#12034960 (10Dzahn) Hey @medelius thanks for adding the public key. Now we just need to verify it somehow outside of this ticket. Could you send a direct email to one or all of us who have commen... 
[17:33:51] cdobbins@cumin2002 reimage (PID 53235) is awaiting input [17:34:09] (03PS1) 10Mmartorana: Add email confirmation banner Test Kitchen instrumentation (long-term) [extensions/WikimediaEvents] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304125 (https://phabricator.wikimedia.org/T428293) [17:36:51] (03CR) 10BCornwall: Add learn.wiki cname for AWS (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/1304123 (https://phabricator.wikimedia.org/T429628) (owner: 10CDobbins) [17:37:21] !log cdobbins@cumin2002 START - Cookbook sre.hosts.reimage for host dns7002.wikimedia.org with OS bookworm [17:38:06] 06SRE, 10SRE-Access-Requests: Requesting access to deployment for caro - https://phabricator.wikimedia.org/T426995#12034975 (10medelius) Sure thing, sent. [17:38:16] (03CR) 10Ssingh: Add learn.wiki cname for AWS (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/1304123 (https://phabricator.wikimedia.org/T429628) (owner: 10CDobbins) [17:39:39] FIRING: JobUnavailable: Reduced availability for job mysql-test in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [17:44:19] (03PS2) 10Andrea Denisse: admin: Remove deprecated SSH key for denisse [puppet] - 10https://gerrit.wikimedia.org/r/1304124 (https://phabricator.wikimedia.org/T429429) [17:44:39] FIRING: [3x] JobUnavailable: Reduced availability for job haproxy in ops@magru - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [17:45:08] PROBLEM - Host 2a02:ec80:700:2:195:200:68:37 is DOWN: PING CRITICAL - Packet loss = 100% [17:45:28] (03CR) 10Andrea Denisse: [C:03+2] admin: Remove deprecated SSH key for denisse [puppet] - 10https://gerrit.wikimedia.org/r/1304124 
(https://phabricator.wikimedia.org/T429429) (owner: 10Andrea Denisse) [17:49:29] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 22 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [extensions/WikimediaEvents] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304125 (https://phabricator.wikimedia.org/T428293) (owner: 10Mmartorana) [17:49:39] FIRING: PuppetFailure: Puppet has failed on cumin2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [17:50:16] (03CR) 10Ryan Kemper: [C:03+1] deployment_server: adding dse monitoring [puppet] - 10https://gerrit.wikimedia.org/r/1303488 (https://phabricator.wikimedia.org/T423078) (owner: 10Atsuko) [17:52:06] 06SRE, 10SRE-Access-Requests: Requesting access to deployment for caro - https://phabricator.wikimedia.org/T426995#12035007 (10Dzahn) Thanks! Received and verified the key. [17:53:11] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users level 1 for chudson - https://phabricator.wikimedia.org/T429353#12035010 (10IAckerman-WMF) Approved, thanks! ~ ilse [17:54:03] 06SRE, 10SRE-Access-Requests: Requesting access to deployment for caro - https://phabricator.wikimedia.org/T426995#12035012 (10Dzahn) [18:00:05] jeena and dduvall: MediaWiki train - Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T1800). Please do the needful. 
[18:02:09] (03PS1) 10CDobbins: hieradata: remove pdns cfg override for dns7002 [puppet] - 10https://gerrit.wikimedia.org/r/1304138 (https://phabricator.wikimedia.org/T401832) [18:02:58] (03CR) 10Ssingh: [C:03+1] hieradata: remove pdns cfg override for dns7002 [puppet] - 10https://gerrit.wikimedia.org/r/1304138 (https://phabricator.wikimedia.org/T401832) (owner: 10CDobbins) [18:04:21] James_F: can I merge https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1304067 to unblock the train? [18:04:29] !log cdobbins@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on dns7002.wikimedia.org with reason: host reimage [18:04:37] jeena: Yes, go for it. [18:04:43] okay thanks! [18:05:08] (03PS1) 10Andrew Bogott: cloud-vps vendordata: refactor cumin bastion logic a bit [puppet] - 10https://gerrit.wikimedia.org/r/1304139 (https://phabricator.wikimedia.org/T422801) [18:05:46] (03CR) 10CI reject: [V:04-1] cloud-vps vendordata: refactor cumin bastion logic a bit [puppet] - 10https://gerrit.wikimedia.org/r/1304139 (https://phabricator.wikimedia.org/T422801) (owner: 10Andrew Bogott) [18:07:41] (03PS1) 10Dzahn: admin: add Caro to deployers and add SSH key [puppet] - 10https://gerrit.wikimedia.org/r/1304140 (https://phabricator.wikimedia.org/T426995) [18:08:15] (03CR) 10TrainBranchBot: [C:03+2] "Approved by jhuneidi@deploy1003 using scap backport" [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304067 (https://phabricator.wikimedia.org/T429584) (owner: 10Jforrester) [18:09:15] (03CR) 10CDobbins: [C:03+2] hieradata: remove pdns cfg override for dns7002 [puppet] - 10https://gerrit.wikimedia.org/r/1304138 (https://phabricator.wikimedia.org/T401832) (owner: 10CDobbins) [18:09:48] !log cdobbins@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns7002.wikimedia.org with reason: host reimage [18:10:13] (03PS2) 10Andrew Bogott: cloud-vps vendordata: refactor cumin bastion logic a bit [puppet] - 10https://gerrit.wikimedia.org/r/1304139 
(https://phabricator.wikimedia.org/T422801) [18:13:01] (03PS5) 10Dzahn: contint: add second proxy for jenkins on an external host [puppet] - 10https://gerrit.wikimedia.org/r/1300916 (https://phabricator.wikimedia.org/T418521) [18:13:05] (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1304139 (https://phabricator.wikimedia.org/T422801) (owner: 10Andrew Bogott) [18:16:44] (03Merged) 10jenkins-bot: SpecialSpecialPages: Guard against special pages with no content-language alias [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304067 (https://phabricator.wikimedia.org/T429584) (owner: 10Jforrester) [18:17:13] !log jhuneidi@deploy1003 Started scap sync-world: Backport for [[gerrit:1304067|SpecialSpecialPages: Guard against special pages with no content-language alias (T429584)]] [18:17:17] T429584: PHP Warning: foreach() argument must be of type array|object, null given - https://phabricator.wikimedia.org/T429584 [18:19:12] !log jhuneidi@deploy1003 jhuneidi, jforrester: Backport for [[gerrit:1304067|SpecialSpecialPages: Guard against special pages with no content-language alias (T429584)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[18:19:39] FIRING: [3x] JobUnavailable: Reduced availability for job haproxy in ops@magru - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [18:21:24] PROBLEM - Recursive DNS on 195.200.68.37 is CRITICAL: DNS_QUERY CRITICAL - query timed out https://wikitech.wikimedia.org/wiki/DNS [18:21:38] !log jhuneidi@deploy1003 jhuneidi, jforrester: Continuing with deployment [18:22:32] RECOVERY - Dell PowerEdge or Supermicro Broadcom RAID Controller on db2247 is OK: communication: 0 OK : controller: 0 OK : physical_disk: 0 OK : virtual_disk: 0 OK : bbu: 0 OK : enclosure: 0 OK https://wikitech.wikimedia.org/wiki/PERCCli%23Monitoring [18:23:31] (03CR) 10Dzahn: [V:03+1 C:03+1] "here you can see: on the prod contint server it only drops a second apache config - does not change existing one:" [puppet] - 10https://gerrit.wikimedia.org/r/1300916 (https://phabricator.wikimedia.org/T418521) (owner: 10Dzahn) [18:24:35] (03PS1) 10Jforrester: wikifunctions: Upgrade evaluators from 2026-06-17-184727 to 2026-06-18-181627 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304145 (https://phabricator.wikimedia.org/T428126) [18:25:01] 06SRE, 10DNS, 06Traffic, 13Patch-For-Review: new CNAME record for WikiLearn - https://phabricator.wikimedia.org/T429628#12035061 (10ssingh) @Asaf: It seems like the record already exists and is being served: ` dig _e8216d92d36158dd2198ac46e3739de7.learn.wiki +short _58bdabc6b3bcd7a4a822c4b55d531e26.tjxr... 
[18:25:11] (03CR) 10Jforrester: [C:03+2] wikifunctions: Upgrade evaluators from 2026-06-17-184727 to 2026-06-18-181627 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304145 (https://phabricator.wikimedia.org/T428126) (owner: 10Jforrester) [18:25:59] !log jhuneidi@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304067|SpecialSpecialPages: Guard against special pages with no content-language alias (T429584)]] (duration: 08m 46s) [18:26:04] T429584: PHP Warning: foreach() argument must be of type array|object, null given - https://phabricator.wikimedia.org/T429584 [18:27:32] !log (eqiad) kubectl delete pod coredns-54cdd9bdf-6n4ps -n kube-system - T429156 [18:27:34] (03Merged) 10jenkins-bot: wikifunctions: Upgrade evaluators from 2026-06-17-184727 to 2026-06-18-181627 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304145 (https://phabricator.wikimedia.org/T428126) (owner: 10Jforrester) [18:27:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:27:37] T429156: EtcdConfig failed to fetch data: (curl error: 28) Timeout was reached - https://phabricator.wikimedia.org/T429156 [18:27:46] !log (eqiad) kubectl delete pod coredns-54cdd9bdf-6hwb5 -n kube-system - T429156 [18:27:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:28:21] !log jforrester@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: apply [18:28:33] (03PS3) 10CDobbins: Add learn.wiki cname for AWS [dns] - 10https://gerrit.wikimedia.org/r/1304123 (https://phabricator.wikimedia.org/T429628) [18:28:42] (03CR) 10CDobbins: Add learn.wiki cname for AWS (032 comments) [dns] - 10https://gerrit.wikimedia.org/r/1304123 (https://phabricator.wikimedia.org/T429628) (owner: 10CDobbins) [18:29:10] !log jforrester@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply [18:29:29] (03CR) 10CI reject: [V:04-1] Add learn.wiki cname for AWS [dns] - 10https://gerrit.wikimedia.org/r/1304123 
(https://phabricator.wikimedia.org/T429628) (owner: 10CDobbins) [18:31:21] !log jforrester@deploy1003 helmfile [codfw] START helmfile.d/services/wikifunctions: apply [18:31:24] PROBLEM - Recursive DNS on 2a02:ec80:700:2:195:200:68:37 is CRITICAL: DNS_QUERY CRITICAL - query timed out https://wikitech.wikimedia.org/wiki/DNS [18:32:04] (03PS1) 10TrainBranchBot: group2 to 1.47.0-wmf.7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304146 (https://phabricator.wikimedia.org/T423916) [18:32:07] (03CR) 10TrainBranchBot: [C:03+2] "Initiated by jhuneidi@deploy1003" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304146 (https://phabricator.wikimedia.org/T423916) (owner: 10TrainBranchBot) [18:32:52] (03PS1) 10PipelineBot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304147 [18:33:17] (03Merged) 10jenkins-bot: group2 to 1.47.0-wmf.7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304146 (https://phabricator.wikimedia.org/T423916) (owner: 10TrainBranchBot) [18:33:55] !log jforrester@deploy1003 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply [18:34:00] !log jforrester@deploy1003 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply [18:34:02] (03Abandoned) 10CDobbins: Add learn.wiki cname for AWS [dns] - 10https://gerrit.wikimedia.org/r/1304123 (https://phabricator.wikimedia.org/T429628) (owner: 10CDobbins) [18:34:39] FIRING: [3x] JobUnavailable: Reduced availability for job haproxy in ops@magru - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [18:34:44] (03CR) 10Ssingh: "A few basic questions and needs a bit of research on the options presented. 
Can you let me know when you plan to deploy this so we can mak" [puppet] - 10https://gerrit.wikimedia.org/r/1290731 (https://phabricator.wikimedia.org/T425441) (owner: 10Arnaudb) [18:36:16] (03PS1) 10FNegri: WIP: add option to specify target hosts [cookbooks] - 10https://gerrit.wikimedia.org/r/1304149 (https://phabricator.wikimedia.org/T393387) [18:36:48] (03CR) 10Cathal Mooney: [C:03+1] change back interface to ge-0/0/0 reboot needed [homer/public] - 10https://gerrit.wikimedia.org/r/1303441 (https://phabricator.wikimedia.org/T421674) (owner: 10Papaul) [18:37:10] !log jforrester@deploy1003 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply [18:37:11] 06SRE, 06Product Safety and Integrity: EtcdConfig failed to fetch data: (curl error: 28) Timeout was reached - https://phabricator.wikimedia.org/T429156#12035091 (10Scott_French) The rate of DNS-related fetch timeouts has definitely kicked up over the last week in eqiad, e.g., https://logstash.wikimedia.org/go... [18:37:25] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:39:01] (03CR) 10CI reject: [V:04-1] WIP: add option to specify target hosts [cookbooks] - 10https://gerrit.wikimedia.org/r/1304149 (https://phabricator.wikimedia.org/T393387) (owner: 10FNegri) [18:39:36] !log jhuneidi@deploy1003 rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.7 refs T423916 [18:39:41] T423916: 1.47.0-wmf.7 deployment blockers - https://phabricator.wikimedia.org/T423916 [18:41:31] (03PS4) 10Cathal Mooney: Cookbook to enable BGP for a given host and configure network [cookbooks] - 10https://gerrit.wikimedia.org/r/1304137 (https://phabricator.wikimedia.org/T429488) [18:42:22] RECOVERY - Recursive DNS on 195.200.68.37 is OK: DNS_QUERY OK - Success https://wikitech.wikimedia.org/wiki/DNS 
[18:42:22] RECOVERY - Recursive DNS on 2a02:ec80:700:2:195:200:68:37 is OK: DNS_QUERY OK - Success https://wikitech.wikimedia.org/wiki/DNS [18:43:52] (03PS1) 10Jdlrobson: Ensure page tools icons are only shown on small viewports [skins/Vector] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304151 (https://phabricator.wikimedia.org/T426131) [18:44:39] RESOLVED: JobUnavailable: Reduced availability for job pdnsrec in ops@magru - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [18:45:14] (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1304139 (https://phabricator.wikimedia.org/T422801) (owner: 10Andrew Bogott) [18:47:28] RECOVERY - BFD status on asw1-b4-magru.mgmt is OK: UP: 2 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status [18:47:30] (03CR) 10Muehlenhoff: sre.puppet.disable-merges: Avoid using puppet-merge (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1303997 (https://phabricator.wikimedia.org/T423121) (owner: 10Muehlenhoff) [18:49:58] (03CR) 10JHathaway: "recheck" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1303529 (owner: 10JHathaway) [18:49:59] (03PS3) 10Andrew Bogott: cloud-vps vendordata: refactor cumin bastion logic a bit [puppet] - 10https://gerrit.wikimedia.org/r/1304139 (https://phabricator.wikimedia.org/T422801) [18:50:37] (03CR) 10Ssingh: "@bcornwall@wikimedia.org FYI, no review expected but for your awareness since Cathal is doing good things for us :)" [cookbooks] - 10https://gerrit.wikimedia.org/r/1304137 (https://phabricator.wikimedia.org/T429488) (owner: 10Cathal Mooney) [18:50:39] (03CR) 10AOkoth: [C:03+2] hiera: promote phab2003 to passive_server [puppet] - 10https://gerrit.wikimedia.org/r/1302894 (https://phabricator.wikimedia.org/T423727) (owner: 10AOkoth) [18:51:12] (03CR) 10Andrew 
Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1304139 (https://phabricator.wikimedia.org/T422801) (owner: 10Andrew Bogott) [18:53:09] 10SRE-swift-storage, 06Commons, 06DBA, 10media-backups, and 6 others: old file revisions missing of File:A_Warm_Shade_of_Ivory_-_Henry_Mancini_album_cover.jpg - https://phabricator.wikimedia.org/T428406#12035126 (10TheDJ) we really should have some sort 'health check' when we drop the old stuff. A bug simi... [18:53:40] FIRING: KubernetesRsyslogDown: rsyslog on wikikube-worker2070:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker2070 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown [18:53:59] (03CR) 10Catrope: [C:03+1] config: Enable EmailConfirmationBanner on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304122 (https://phabricator.wikimedia.org/T428292) (owner: 10Mmartorana) [18:54:10] !log cdobbins@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns7002.wikimedia.org with OS bookworm [18:58:18] 14SRE-Sprint-Week-Sustainability-March2023, 06Traffic, 07Sustainability (Incident Followup): Experiment with single backend CDN nodes - https://phabricator.wikimedia.org/T288106#12035135 (10BCornwall) Due to budgetary restraints, we're going to experiment with seeing how much worse off we'd be with single NV... 
[18:58:40] RESOLVED: KubernetesRsyslogDown: rsyslog on wikikube-worker2070:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker2070 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown [18:58:59] (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1304139 (https://phabricator.wikimedia.org/T422801) (owner: 10Andrew Bogott) [19:01:12] !log cdobbins@cumin2002 START - Cookbook sre.hosts.remove-downtime for dns7002.wikimedia.org [19:01:14] !log cdobbins@cumin2002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns7002.wikimedia.org [19:03:51] !log cdobbins@dns1004 START - running authdns-update [19:04:06] !log aokoth@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on phab2002.codfw.wmnet with reason: Host Replacement [19:04:57] (03CR) 10CDanis: "Dumb question: At what point in this patch stack could we write a non-trivial httpbb test case?" 
[puppet] - 10https://gerrit.wikimedia.org/r/1304114 (https://phabricator.wikimedia.org/T429599) (owner: 10Effie Mouzeli) [19:05:23] (03PS1) 10Catrope: Permissions: Create wmf-officeit group on collabwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304156 [19:05:33] !log cdobbins@dns1004 END - running authdns-update [19:06:03] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 22 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304156 (owner: 10Catrope) [19:07:51] !log cdobbins@cumin2002 conftool action : set/pooled=yes; selector: name=dns7002.*,service=authdns-update [19:08:05] !log cdobbins@dns1004 START - running authdns-update [19:09:46] !log cdobbins@dns1004 END - running authdns-update [19:11:45] !log cdobbins@cumin2002 conftool action : set/pooled=yes; selector: name=dns7002.* [19:15:14] (03PS1) 10Andrew Bogott: Superset, Quarry: Open security groups from cumin to magnum workers [puppet] - 10https://gerrit.wikimedia.org/r/1304157 (https://phabricator.wikimedia.org/T422801) [19:28:46] PROBLEM - Check unit status of httpbb_kubernetes_mw-web_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-web_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [19:31:44] PROBLEM - Ensure acme-chief-backend is running only in the active node on acmechief2002 is CRITICAL: PROCS CRITICAL: 0 processes with args acme-chief-backend https://wikitech.wikimedia.org/wiki/Acme-chief [19:35:09] (03PS4) 10Andrew Bogott: cloud-vps vendordata: refactor cumin bastion logic a bit [puppet] - 10https://gerrit.wikimedia.org/r/1304139 (https://phabricator.wikimedia.org/T422801) [19:36:45] PROBLEM - Check unit status of acme-chief #page on acmechief2002 is CRITICAL: CRITICAL: Status of the systemd unit acme-chief https://wikitech.wikimedia.org/wiki/Acme-chief%23Monitoring [19:37:42] 👀 [19:40:44] RECOVERY 
- Ensure acme-chief-backend is running only in the active node on acmechief2002 is OK: PROCS OK: 1 process with args acme-chief-backend https://wikitech.wikimedia.org/wiki/Acme-chief [19:41:25] (03PS1) 10Clare Ming: Include Phabricator specific config for Test Kitchen [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304165 (https://phabricator.wikimedia.org/T428986) [19:42:32] (03PS2) 10Clare Ming: Include Phabricator specific config for Test Kitchen [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304165 (https://phabricator.wikimedia.org/T428986) [19:43:44] PROBLEM - Ensure acme-chief-backend is running only in the active node on acmechief2002 is CRITICAL: PROCS CRITICAL: 0 processes with args acme-chief-backend https://wikitech.wikimedia.org/wiki/Acme-chief [19:44:38] (03CR) 10CI reject: [V:04-1] Include Phabricator specific config for Test Kitchen [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304165 (https://phabricator.wikimedia.org/T428986) (owner: 10Clare Ming) [19:50:44] RECOVERY - Ensure acme-chief-backend is running only in the active node on acmechief2002 is OK: PROCS OK: 1 process with args acme-chief-backend https://wikitech.wikimedia.org/wiki/Acme-chief [19:54:39] FIRING: [5x] SystemdUnitFailed: cowbuilder_update_bookworm-amd64.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:55:21] (03PS1) 10JHathaway: WIP: CI test [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1304167 [19:55:44] PROBLEM - Ensure acme-chief-backend is running only in the active node on acmechief2002 is CRITICAL: PROCS CRITICAL: 0 processes with args acme-chief-backend https://wikitech.wikimedia.org/wiki/Acme-chief [20:00:04] RoanKattouw, urbanecm, TheresNoTime, kindrobot, and cjming: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC late backport window deploy. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T2000). [20:00:04] danisztls: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [20:00:22] o/ [20:00:25] I can self-deploy [20:01:10] (03CR) 10CI reject: [V:04-1] WIP: CI test [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1304167 (owner: 10JHathaway) [20:02:30] (03PS1) 10Effie Mouzeli: Add /llms-rate-limits.txt and /llms-content-reuse.txt #2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304168 (https://phabricator.wikimedia.org/T429599) [20:03:00] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dani@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1303895 (https://phabricator.wikimedia.org/T428876) (owner: 10DDesouza) [20:03:56] (03Merged) 10jenkins-bot: Deploy English Wikipedia Mobile App Survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1303895 (https://phabricator.wikimedia.org/T428876) (owner: 10DDesouza) [20:04:11] !log dani@deploy1003 Started scap sync-world: Backport for [[gerrit:1303895|Deploy English Wikipedia Mobile App Survey (T428876)]] [20:04:16] T428876: Quick survey on Wikipedia - Mobile App Survey (WP25) - https://phabricator.wikimedia.org/T428876 [20:06:02] !log dani@deploy1003 dani: Backport for [[gerrit:1303895|Deploy English Wikipedia Mobile App Survey (T428876)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[20:08:13] !log dani@deploy1003 dani: Continuing with deployment [20:12:32] !log dani@deploy1003 Finished scap sync-world: Backport for [[gerrit:1303895|Deploy English Wikipedia Mobile App Survey (T428876)]] (duration: 08m 20s) [20:12:37] T428876: Quick survey on Wikipedia - Mobile App Survey (WP25) - https://phabricator.wikimedia.org/T428876 [20:15:14] all done [20:17:15] FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished [20:17:41] (03PS1) 10BPirkle: REST: remove obsolete and unnecessary config entries [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304173 (https://phabricator.wikimedia.org/T422770) [20:19:16] (03Abandoned) 10Clare Ming: Include Phabricator specific config for Test Kitchen [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304165 (https://phabricator.wikimedia.org/T428986) (owner: 10Clare Ming) [20:20:44] RECOVERY - Ensure acme-chief-backend is running only in the active node on acmechief2002 is OK: PROCS OK: 1 process with args acme-chief-backend https://wikitech.wikimedia.org/wiki/Acme-chief [20:25:44] (03PS3) 10RLazarus: Periodic jobs: Add abstractwiki_update_generated_articles [puppet] - 10https://gerrit.wikimedia.org/r/1302213 (https://phabricator.wikimedia.org/T422628) (owner: 10Jforrester) [20:26:46] RECOVERY - Check unit status of acme-chief #page on acmechief2002 is OK: OK: Status of the systemd unit acme-chief https://wikitech.wikimedia.org/wiki/Acme-chief%23Monitoring [20:28:46] RECOVERY - Check unit status of httpbb_kubernetes_mw-web_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-web_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [20:29:14] (03PS1) 10BPirkle: REST: adjust analytics and 
wikifunctions REST Sandbox visibility [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304175 (https://phabricator.wikimedia.org/T422770) [20:30:32] (03CR) 10RLazarus: [V:03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8767/co" [puppet] - 10https://gerrit.wikimedia.org/r/1302213 (https://phabricator.wikimedia.org/T422628) (owner: 10Jforrester) [20:55:52] (03PS5) 10Effie Mouzeli: Extend robots.php to serve llms-*.txt files #3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304109 (https://phabricator.wikimedia.org/T429599) [20:56:20] (03CR) 10RLazarus: [V:03+1 C:03+2] Periodic jobs: Add abstractwiki_update_generated_articles (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1302213 (https://phabricator.wikimedia.org/T422628) (owner: 10Jforrester) [20:56:31] (03PS3) 10Jdlrobson: Prevent surveys being automatically added to non-Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1303493 (https://phabricator.wikimedia.org/T393436) [21:00:04] Deploy window Readers deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260618T2100) [21:00:26] (03CR) 10Effie Mouzeli: "we can do so in a future separate patch 😊" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304109 (https://phabricator.wikimedia.org/T429599) (owner: 10Effie Mouzeli) [21:02:14] Okay to proceed with a single deployment? [21:02:56] yes but let me know when you're done Jdlrobson I have some things I want to deploy as well [21:04:43] (03CR) 10TrainBranchBot: [C:03+2] "Approved by jdlrobson@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1303493 (https://phabricator.wikimedia.org/T393436) (owner: 10Jdlrobson) [21:04:49] Just one config change so should be quick! 
[21:05:29] lovely [21:06:32] (03Merged) 10jenkins-bot: Prevent surveys being automatically added to non-Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1303493 (https://phabricator.wikimedia.org/T393436) (owner: 10Jdlrobson) [21:06:50] !log jdlrobson@deploy1003 Started scap sync-world: Backport for [[gerrit:1303493|Prevent surveys being automatically added to non-Wikipedias (T393436)]] [21:06:55] T393436: Improve QuickSurveys placement algorithm for non-Wikipedia support - https://phabricator.wikimedia.org/T393436 [21:08:38] !log jdlrobson@deploy1003 jdlrobson: Backport for [[gerrit:1303493|Prevent surveys being automatically added to non-Wikipedias (T393436)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [21:08:39] (03PS10) 10Effie Mouzeli: Add /llms.txt where honest robots can read our API Policy #1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1303454 (https://phabricator.wikimedia.org/T429599) [21:09:47] !log jdlrobson@deploy1003 jdlrobson: Continuing with deployment [21:10:26] !log rzl@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-cron: apply [21:11:24] !log rzl@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply [21:14:44] (03PS1) 10JHathaway: WIP: fix tests? 
[software/pywmflib] - 10https://gerrit.wikimedia.org/r/1304180 [21:14:45] !log jdlrobson@deploy1003 Finished scap sync-world: Backport for [[gerrit:1303493|Prevent surveys being automatically added to non-Wikipedias (T393436)]] (duration: 07m 54s) [21:14:50] T393436: Improve QuickSurveys placement algorithm for non-Wikipedia support - https://phabricator.wikimedia.org/T393436 [21:15:15] (03CR) 10Milazg: [C:03+1] REST: adjust analytics and wikifunctions REST Sandbox visibility [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304175 (https://phabricator.wikimedia.org/T422770) (owner: 10BPirkle) [21:16:59] 06SRE, 10DNS: Wikimedia DNS over Port 53 support - https://phabricator.wikimedia.org/T429650#12035535 (10Peachey88) [21:17:30] jdlrobson looks like you're finished [21:17:44] so I'll go ahead and get started on my one security patch that's going out [21:18:06] ok maryum all yours [21:18:12] lovely thanks [21:22:38] (03PS2) 10Effie Mouzeli: mediawiki-vhost.conf: Route llms*.txt requests to robots.php #4 [puppet] - 10https://gerrit.wikimedia.org/r/1304114 (https://phabricator.wikimedia.org/T429599) [21:22:46] 06SRE, 10DNS: Wikimedia DNS over Port 53 support - https://phabricator.wikimedia.org/T429650#12035548 (10Phantom2026) [21:24:13] running scap for the security deploy [21:25:28] (03CR) 10Effie Mouzeli: "it is sorted" [puppet] - 10https://gerrit.wikimedia.org/r/1304114 (https://phabricator.wikimedia.org/T429599) (owner: 10Effie Mouzeli) [21:26:56] 10ops-codfw, 06SRE, 06DC-Ops: upgrade selected servers from 1G to 10G - https://phabricator.wikimedia.org/T429631#12035598 (10Reedy) [21:27:06] 10ops-codfw, 06SRE, 06DC-Ops: upgrade selected servers from 1G to 10G - https://phabricator.wikimedia.org/T429631#12035600 (10Reedy) [21:29:05] !log Deployed security fix for T428833 [21:29:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:29:11] security deploy is done [21:30:23] (03PS1) 10JHathaway: WIP: CI? 
[software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 [21:33:12] (03PS3) 10Effie Mouzeli: mediawiki-vhost.conf: Route llms*.txt requests to robots.php #4 [puppet] - 10https://gerrit.wikimedia.org/r/1304114 (https://phabricator.wikimedia.org/T429599) [21:45:51] (03CR) 10JHathaway: sre.hosts.provision: introduce the wmfroot user (034 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/1291994 (https://phabricator.wikimedia.org/T426180) (owner: 10Elukey) [21:45:54] (03PS1) 10SBassett: Add info-level logging to wmgMonologChannels for timeline [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304187 (https://phabricator.wikimedia.org/T429654) [21:46:27] (03CR) 10SBassett: [C:04-1] "Hold for config deployment" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304187 (https://phabricator.wikimedia.org/T429654) (owner: 10SBassett) [21:49:12] (03CR) 10SomeRandomDeveloper: [C:03+1] Add info-level logging to wmgMonologChannels for timeline [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304187 (https://phabricator.wikimedia.org/T429654) (owner: 10SBassett) [21:49:39] FIRING: PuppetFailure: Puppet has failed on cumin2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [21:55:04] (03Abandoned) 10Acamicamacaraca: Gender namespaces on Serbo-Croatian Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1285467 (https://phabricator.wikimedia.org/T425402) (owner: 10Acamicamacaraca) [21:57:59] (03CR) 10CI reject: [V:04-1] WIP: fix tests? 
[software/pywmflib] - 10https://gerrit.wikimedia.org/r/1304180 (owner: 10JHathaway) [22:14:29] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 22 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304175 (https://phabricator.wikimedia.org/T422770) (owner: 10BPirkle) [22:19:05] 10ops-codfw, 06SRE, 06DBA, 06DC-Ops: Degraded RAID on db2247 - https://phabricator.wikimedia.org/T429348#12035716 (10Jhancock.wm) 05Open→03Resolved swapped! looks good on this side. feel free to reopen this ticket if anything comes up. [22:20:16] 10ops-codfw, 06SRE, 06DC-Ops, 06Wikidata Platform Team, 06Data-Platform-SRE (2026-06-05 - 2026-06-26): Q4:rack/setup/install dse-k8s-wdqs200[1-4] (formerly wdqs20[28-31]) - https://phabricator.wikimedia.org/T423312#12035718 (10Jhancock.wm) 05Open→03Resolved [22:21:20] 10ops-codfw, 06SRE, 06DC-Ops, 06ServiceOps new, 10ServiceOps-Upgrades-Hardware: Q3:rack/setup/install conf200[7-9] - https://phabricator.wikimedia.org/T418914#12035721 (10Jhancock.wm) 05Open→03Resolved [22:26:23] FIRING: SLOBudgetBurn: Standalone event system success rate is below 99.9% target - https://alerts.wikimedia.org/?q=alertname%3DSLOBudgetBurn [22:37:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [22:40:48] (03PS2) 10BCornwall: admin: Add mahmoud-abdelsattar to a-p-d [puppet] - 10https://gerrit.wikimedia.org/r/1304099 (https://phabricator.wikimedia.org/T428416) [22:41:42] jouncebot: nowandnext [22:41:43] No deployments scheduled for the next 7 hour(s) and 18 minute(s) [22:41:43] In 7 hour(s) and 18 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260619T0600) [22:41:43] 
(03PS1) 10Dreamy Jazz: hCaptcha: Re-enable for mcrundo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304195 (https://phabricator.wikimedia.org/T427612) [22:42:00] (03CR) 10Ladsgroup: [V:03+2 C:03+2] admin: Add mahmoud-abdelsattar to a-p-d [puppet] - 10https://gerrit.wikimedia.org/r/1304099 (https://phabricator.wikimedia.org/T428416) (owner: 10BCornwall) [22:42:07] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304195 (https://phabricator.wikimedia.org/T427612) (owner: 10Dreamy Jazz) [22:43:10] 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to "analytics-privatedata-users" for Mahmoud Abdelsattar (WMDE) - https://phabricator.wikimedia.org/T428416#12035771 (10Ladsgroup) 05In progress→03Resolved [22:44:17] (03Merged) 10jenkins-bot: hCaptcha: Re-enable for mcrundo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304195 (https://phabricator.wikimedia.org/T427612) (owner: 10Dreamy Jazz) [22:44:33] !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1304195|hCaptcha: Re-enable for mcrundo (T427612)]] [22:44:37] T427612: hCaptcha: mcrundo cannot be used when hCaptcha is enabled for editing - https://phabricator.wikimedia.org/T427612 [22:46:21] !log dreamyjazz@deploy1003 dreamyjazz: Backport for [[gerrit:1304195|hCaptcha: Re-enable for mcrundo (T427612)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[22:46:42] (03PS2) 10Dzahn: admin: add Caro to deployers and add SSH key [puppet] - 10https://gerrit.wikimedia.org/r/1304140 (https://phabricator.wikimedia.org/T426995) [22:46:49] (03CR) 10Ladsgroup: [V:03+2 C:03+2] admin: add Caro to deployers and add SSH key [puppet] - 10https://gerrit.wikimedia.org/r/1304140 (https://phabricator.wikimedia.org/T426995) (owner: 10Dzahn) [22:47:43] !log dreamyjazz@deploy1003 dreamyjazz: Continuing with deployment [22:47:49] 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to deployment for caro - https://phabricator.wikimedia.org/T426995#12035778 (10Ladsgroup) 05In progress→03Resolved [22:48:20] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users level 1 for chudson - https://phabricator.wikimedia.org/T429353#12035784 (10Ladsgroup) [22:49:59] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for laurabarluzzi - https://phabricator.wikimedia.org/T429431#12035786 (10Ladsgroup) [22:51:18] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for laurabarluzzi - https://phabricator.wikimedia.org/T429431#12035795 (10Ladsgroup) Not a big deal but would be possible not to use RSA? [22:51:58] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for laurabarluzzi - https://phabricator.wikimedia.org/T429431#12035796 (10Ladsgroup) a:03XenoRyet [22:51:58] !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304195|hCaptcha: Re-enable for mcrundo (T427612)]] (duration: 07m 25s) [22:52:03] T427612: hCaptcha: mcrundo cannot be used when hCaptcha is enabled for editing - https://phabricator.wikimedia.org/T427612 [22:54:30] 06SRE, 10SRE-Access-Requests: Change SSH key for denisse after new laptop provissioning - https://phabricator.wikimedia.org/T429429#12035811 (10Ladsgroup) >>! 
In T429429#12034452, @BCornwall wrote: > Amir will be on clinic duty this week and will finish up once the remaining requirements are satisfied. My apo... [23:03:26] !log rzl@apt1002:~$ sudo -i reprepro -C main include trixie-wikimedia ${HOME}/httpbb/trixie/httpbb_${VERSION?}-1+deb13u1_amd64.changes # T427899 [23:03:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:03:30] T427899: Build httpbb for Trixie - https://phabricator.wikimedia.org/T427899 [23:03:51] er. that's less helpful with those variables still in there, fixing in the on-wiki SAL [23:05:15] 06SRE, 06ServiceOps new: Build httpbb for Trixie - https://phabricator.wikimedia.org/T427899#12035839 (10RLazarus) Above SAL line should have read: ` sudo -i reprepro -C main include trixie-wikimedia /home/rzl/httpbb/trixie/httpbb_0.0.5-1+deb13u1_amd64.changes ` [23:08:10] 06SRE, 10SRE-Access-Requests: Requesting access for lerickson to deploy the RDF streaming updater on wikikube - https://phabricator.wikimedia.org/T429610#12035844 (10Ladsgroup) It seems you're not in the deployment group. I don't think we have a dedicated rdf-streaming-updater deployment right. Let me ask what... [23:10:03] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-wmde-users for Seanleong-WMDE - https://phabricator.wikimedia.org/T429474#12035846 (10Ladsgroup) [23:11:56] 06SRE, 06ServiceOps new: Build httpbb for Trixie - https://phabricator.wikimedia.org/T427899#12035847 (10RLazarus) 05Open→03Resolved Done! ` rzl@cumin2002:~$ sudo debdeploy deploy -u 2026-06-18-httpbb.yaml -Q C:httpbb Rolling out httpbb: Non-daemon update, no service restart needed httpbb was updated... [23:13:08] 06SRE, 10SRE-Access-Requests, 06Data-Engineering: Requesting access to analytics-privatedata-users level 1 for chudson - https://phabricator.wikimedia.org/T429353#12035851 (10Ladsgroup) According to the note on data.yaml > Approval requests for this group can be expedited by tagging Data-Engineering on phabr... 
[23:15:09] (03PS1) 10Ladsgroup: admin: Add chudson to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/1304201 (https://phabricator.wikimedia.org/T429353) [23:16:07] (03CR) 10CI reject: [V:04-1] admin: Add chudson to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/1304201 (https://phabricator.wikimedia.org/T429353) (owner: 10Ladsgroup) [23:16:29] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-wmde-users for Seanleong-WMDE - https://phabricator.wikimedia.org/T429474#12035861 (10Ladsgroup) I will confirm the ssh key out of band. Just a request. No big deal. Is it possible to avoid RSA? [23:23:45] (03PS2) 10Ladsgroup: admin: Add chudson to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/1304201 (https://phabricator.wikimedia.org/T429353) [23:26:45] (03Abandoned) 10Jdlrobson: Ensure page tools icons are only shown on small viewports [skins/Vector] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304151 (https://phabricator.wikimedia.org/T426131) (owner: 10Jdlrobson) [23:27:46] !log rzl@deploy1003 Started deploy [docker-pkg/deploy@f030aed]: (no justification provided) [23:28:11] !log rzl@deploy1003 Finished deploy [docker-pkg/deploy@f030aed]: (no justification provided) (duration: 00m 26s) [23:33:23] !log rzl@deploy1003 Started deploy [docker-pkg/deploy@f030aed]: (no justification provided) [23:34:07] !log rzl@deploy1003 Finished deploy [docker-pkg/deploy@f030aed]: (no justification provided) (duration: 00m 45s) [23:42:57] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1304203 [23:42:58] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1304203 (owner: 10TrainBranchBot) [23:46:01] !log ALTER TABLE reading_list_project AUTO_INCREMENT = 882; on wikishared on x1 master (T428002) [23:46:04] Logged the message 
at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:46:05] T428002: [Reading List, Bug] On some Wikis, saving an article to reading list results in error message - https://phabricator.wikimedia.org/T428002 [23:51:23] (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1304203 (owner: 10TrainBranchBot) [23:54:39] FIRING: [5x] SystemdUnitFailed: cowbuilder_update_bookworm-amd64.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed