[00:00:06] (Merged) jenkins-bot: CommonSettings: Stop setting wgDBuser [mediawiki-config] - https://gerrit.wikimedia.org/r/1174071 (owner: Zabe)
[00:01:06] !log zabe@deploy1003 Started scap sync-world: Backport for [[gerrit:1174071|CommonSettings: Stop setting wgDBuser]]
[00:05:30] !log zabe@deploy1003 zabe: Backport for [[gerrit:1174071|CommonSettings: Stop setting wgDBuser]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[00:07:10] !log zabe@deploy1003 zabe: Continuing with sync
[00:08:28] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1174573
[00:08:28] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1174573 (owner: TrainBranchBot)
[00:08:38] (PS1) Zabe: group0: Stop writing to cl_to and cl_collation [mediawiki-config] - https://gerrit.wikimedia.org/r/1174574 (https://phabricator.wikimedia.org/T399579)
[00:10:44] (CR) Zabe: [C:+2] group0: Stop writing to cl_to and cl_collation [mediawiki-config] - https://gerrit.wikimedia.org/r/1174574 (https://phabricator.wikimedia.org/T399579) (owner: Zabe)
[00:11:08] SRE, LDAP-Access-Requests: Grant Access to nda & logstash for Novem Linguae - https://phabricator.wikimedia.org/T400176#11048694 (KFrancis) I just pinged legal counsel to counter sign. I'll confirm when it's complete.
[00:11:32] (Merged) jenkins-bot: group0: Stop writing to cl_to and cl_collation [mediawiki-config] - https://gerrit.wikimedia.org/r/1174574 (https://phabricator.wikimedia.org/T399579) (owner: Zabe)
[00:14:39] !log zabe@deploy1003 Finished scap sync-world: Backport for [[gerrit:1174071|CommonSettings: Stop setting wgDBuser]] (duration: 13m 32s)
[00:15:04] !log zabe@deploy1003 Started scap sync-world: Backport for [[gerrit:1174574|group0: Stop writing to cl_to and cl_collation (T399579)]]
[00:15:09] T399579: Stop writing to cl_to and cl_collation - https://phabricator.wikimedia.org/T399579
[00:17:17] !log zabe@deploy1003 zabe: Backport for [[gerrit:1174574|group0: Stop writing to cl_to and cl_collation (T399579)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[00:18:22] !log zabe@deploy1003 zabe: Continuing with sync
[00:19:48] SRE, LDAP-Access-Requests: Grant Access to nda & logstash for Novem Linguae - https://phabricator.wikimedia.org/T400176#11048701 (KFrancis) The NDA is complete. Thanks!
[00:23:44] !log zabe@deploy1003 Finished scap sync-world: Backport for [[gerrit:1174574|group0: Stop writing to cl_to and cl_collation (T399579)]] (duration: 08m 39s)
[00:23:49] T399579: Stop writing to cl_to and cl_collation - https://phabricator.wikimedia.org/T399579
[00:27:16] FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.281s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[00:29:54] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1174573 (owner: TrainBranchBot)
[00:32:15] RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.19s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[00:37:19] (CR) Pppery: [C:+1] ncredir: Add batch of pay-for-edit domains [puppet] - https://gerrit.wikimedia.org/r/1174539 (https://phabricator.wikimedia.org/T400731) (owner: BCornwall)
[00:40:33] SRE, LDAP-Access-Requests: Grant Access to gerritadmin for qchris (NDA refresh) - https://phabricator.wikimedia.org/T400847#11048756 (KFrancis) I've reached out to @qchris. I'll confirm when the NDA is complete
[01:00:41] !log mwpresync@deploy1003 Started scap build-images: Publishing wmf/next image
[01:09:31] (PS1) Krinkle: MobileUrlCallback: Disable for thankyou.wikipedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/1174578 (https://phabricator.wikimedia.org/T400855)
[01:11:29] !log mwpresync@deploy1003 Finished scap build-images: Publishing wmf/next image (duration: 10m 48s)
[01:20:58] (PS1) RLazarus: deployment_server: Add --sal to mwscript_k8s [puppet] - https://gerrit.wikimedia.org/r/1174579 (https://phabricator.wikimedia.org/T376776)
[01:25:01] !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host clouddb1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[01:28:15] vriley@cumin1002 provision (PID 4066885) is awaiting input
[01:32:06] (CR) RLazarus: deployment_server: Add --sal to mwscript_k8s (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1174579 (https://phabricator.wikimedia.org/T376776) (owner: RLazarus)
[01:35:08] !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host clouddb1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[01:36:14] !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host clouddb1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[01:37:45] !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host clouddb1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[01:38:44] !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host clouddb1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[01:45:39] !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host clouddb1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[01:48:47] !log vriley@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host clouddb1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[01:51:17] !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host clouddb1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[01:54:34] !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host clouddb1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[01:55:37] !log vriley@cumin1002 START - Cookbook sre.hosts.reimage for host clouddb1025.eqiad.wmnet with OS bookworm
[01:55:47] ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11048799 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host clouddb1025.eqiad.wmnet with OS bookworm
[02:00:45] !log vriley@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host clouddb1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[02:00:53] (PS1) Theprotonade: Enable bulk OCR on beta wikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/1174583 (https://phabricator.wikimedia.org/T400281)
[02:03:07] !log vriley@cumin1002 START - Cookbook sre.hosts.reimage for host clouddb1024.eqiad.wmnet with OS bookworm
[02:03:19] ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11048802 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host clouddb1024.eqiad.wmnet with OS bookworm
[02:07:30] !log vriley@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1025.eqiad.wmnet with reason: host reimage
[02:13:31] !log vriley@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1025.eqiad.wmnet with reason: host reimage
[02:14:51] (CR) Samwilson: [C:+1] Enable bulk OCR on beta wikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/1174583 (https://phabricator.wikimedia.org/T400281) (owner: Theprotonade)
[02:15:08] !log vriley@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1024.eqiad.wmnet with reason: host reimage
[02:18:40] !log vriley@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1024.eqiad.wmnet with reason: host reimage
[02:19:59] PROBLEM - OSPF status on cr3-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:20:25] PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:20:59] RECOVERY - OSPF status on cr3-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:21:21] RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:23:25] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:23:28] ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11048849 (VRiley-WMF)
[02:28:31] !log vriley@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
[02:28:57] !log vriley@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
[02:28:58] !log vriley@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1025.eqiad.wmnet with OS bookworm
[02:29:07] ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11048852 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host clouddb1025.eqiad.wmnet with OS bookworm completed: - c...
[02:33:34] !log vriley@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
[02:33:54] !log vriley@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
[02:33:55] !log vriley@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1024.eqiad.wmnet with OS bookworm
[02:34:03] ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11048866 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host clouddb1024.eqiad.wmnet with OS bookworm completed: - c...
[02:34:41] ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11048867 (VRiley-WMF) Open→Resolved This has been completed
[03:09:28] FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate thanos-query.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[03:10:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:25:48] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:49:50] FIRING: NetworkDeviceAlarmActive: Alarm active on ssw1-f1-eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive
[04:41:01] PROBLEM - Backup freshness on backup1014 is CRITICAL: All failures: 1 (gitlab1004), Fresh: 136 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[05:09:43] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:19:04] (PS2) Tiziano Fogli: alertmanager/api/ro: permit ro api calls to domain_networks [puppet] - https://gerrit.wikimedia.org/r/1174500 (https://phabricator.wikimedia.org/T400443)
[05:28:16] SRE, Traffic: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11048975 (Joe) >>! In T400119#11047688, @DavidBrooks wrote: > @Joe I wasn't addressing AWB used as a bot, but as an interactive Windows app. Still, the rest of your comment seems applicable. T...
[05:40:51] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-codfw:xe-0/0/1:1 (Transport: cr2-eqord:xe-0/1/0 (Arelion, IC-314534 29ms 10Gbps wave) {#10694_12249-2}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[05:41:39] FIRING: [2x] CoreBGPDown: Core BGP session down between cr2-codfw and cr2-eqord (208.80.154.198) - group Confed_eqord - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[05:41:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:00:05] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250731T0600)
[06:00:05] marostegui, Amir1, and federico3: #bothumor I � Unicode. All rise for Primary database switchover deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250731T0600).
[06:09:43] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[06:23:40] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:26:48] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:30:41] (CR) Tiziano Fogli: [C:+2] nrpe wrapper: enable nrpe2nodexp for check_disk_space (for testing) [puppet] - https://gerrit.wikimedia.org/r/1174411 (https://phabricator.wikimedia.org/T395446) (owner: Tiziano Fogli)
[06:38:25] RESOLVED: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:38:33] (CR) Arnaudb: gitlab: binding nft throttling and its monitoring (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1174479 (https://phabricator.wikimedia.org/T400252) (owner: Arnaudb)
[06:41:05] RECOVERY - Backup freshness on backup1014 is OK: Fresh: 137 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[06:42:22] (CR) Jelto: [V:+1] "PCC SUCCESS (NOOP 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6464/console" [puppet] - https://gerrit.wikimedia.org/r/1174479 (https://phabricator.wikimedia.org/T400252) (owner: Arnaudb)
[06:43:56] (CR) Jelto: [V:+1 C:+1] "lgtm, thanks for the patch!" [puppet] - https://gerrit.wikimedia.org/r/1174479 (https://phabricator.wikimedia.org/T400252) (owner: Arnaudb)
[06:44:17] (CR) Arnaudb: [C:+2] gitlab: binding nft throttling and its monitoring [puppet] - https://gerrit.wikimedia.org/r/1174479 (https://phabricator.wikimedia.org/T400252) (owner: Arnaudb)
[06:54:21] (CR) Jelto: [C:+2] "The coobook stops when this dns check fails and you can just type "retry". So I don't think we need a dedicated retry here" [cookbooks] - https://gerrit.wikimedia.org/r/1174417 (https://phabricator.wikimedia.org/T400252) (owner: Jelto)
[07:00:04] Amir1, Urbanecm, and awight: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250731T0700).
[07:00:05] koi: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:00:11] o/
[07:01:59] (Merged) jenkins-bot: sre.gitlab.failover: use hostname in wipe-cache [cookbooks] - https://gerrit.wikimedia.org/r/1174417 (https://phabricator.wikimedia.org/T400252) (owner: Jelto)
[07:07:08] hello, anyone can deploy here?
[07:09:28] FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate thanos-query.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[07:30:31] PROBLEM - Disk space on an-worker1128 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/c 149587 MB (3% inode=99%): /var/lib/hadoop/data/d 149028 MB (3% inode=99%): /var/lib/hadoop/data/j 162298 MB (4% inode=99%): /var/lib/hadoop/data/f 152526 MB (4% inode=99%): /var/lib/hadoop/data/g 150553 MB (4% inode=99%): /var/lib/hadoop/data/i 150093 MB (3% inode=99%): /var/lib/hadoop/data/b 149573 MB (3% inode=99%): /var/lib/hadoop/data
[07:30:31] 4 MB (4% inode=99%): /var/lib/hadoop/data/e 154133 MB (4% inode=99%): /var/lib/hadoop/data/h 154237 MB (4% inode=99%): /var/lib/hadoop/data/k 152415 MB (4% inode=99%): /var/lib/hadoop/data/m 150585 MB (4% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-worker1128&var-datasource=eqiad+prometheus/ops
[07:31:32] (CR) Vgutierrez: "text tests are happy: `0 tests failed, 0 tests skipped, 39 tests passed`" [puppet] - https://gerrit.wikimedia.org/r/1174501 (https://phabricator.wikimedia.org/T400753) (owner: CDanis)
[07:35:17] (CR) Vgutierrez: [C:+1] varnish: tests: env var for which docker (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1174505 (owner: CDanis)
[07:37:59] SRE, Traffic, User-notice: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11049050 (jeremyb-phone) please send advanced notice of changes like this to wikitech-ambassadors and make sure it gets tagged #User-notice on phab. thank you!
[07:43:11] (CR) Vgutierrez: [C:+1] traffic: copied some fixes from haproxykafka [alerts] - https://gerrit.wikimedia.org/r/1174416 (owner: Fabfur)
[07:43:35] (CR) Filippo Giunchedi: [C:+1] alertmanager/api/ro: permit ro api calls to domain_networks [puppet] - https://gerrit.wikimedia.org/r/1174500 (https://phabricator.wikimedia.org/T400443) (owner: Tiziano Fogli)
[07:44:26] (CR) Filippo Giunchedi: [C:+1] centrallog: Add sampling rules for debug logging [puppet] - https://gerrit.wikimedia.org/r/1173442 (https://phabricator.wikimedia.org/T383309) (owner: Andrea Denisse)
[07:48:14] (CR) Tiziano Fogli: [C:+2] alertmanager/api/ro: permit ro api calls to domain_networks [puppet] - https://gerrit.wikimedia.org/r/1174500 (https://phabricator.wikimedia.org/T400443) (owner: Tiziano Fogli)
[07:49:50] FIRING: NetworkDeviceAlarmActive: Alarm active on ssw1-f1-eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive
[07:51:08] (CR) Fabfur: "tnx" [alerts] - https://gerrit.wikimedia.org/r/1174416 (owner: Fabfur)
[07:51:15] (CR) Fabfur: [C:+2] traffic: copied some fixes from haproxykafka [alerts] - https://gerrit.wikimedia.org/r/1174416 (owner: Fabfur)
[08:18:49] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, July 31 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - https://gerrit.wikimedia.org/r/1100228 (https://phabricator.wikimedia.org/T380020) (owner: Stang)
[08:21:09] SRE, LDAP-Access-Requests: Grant Access to gerritadmin for qchris (NDA refresh) - https://phabricator.wikimedia.org/T400847#11049106 (QChris) >>! In T400847#11048604, @Dzahn wrote: > The correct spelling of the domain is `@quelltextlich.at`. Correct! Thanks for spotting it and calling it out. (email fro...
[08:43:31] I'm starting a series of partition reassignments on kafka-jumbo. An alert might fire, for which I'll then put the appropriate silence
[08:52:03] FIRING: KafkaUnderReplicatedPartitions: Under replicated partitions for Kafka cluster jumbo-eqiad in eqiad - https://wikitech.wikimedia.org/wiki/Kafka/Administration - https://grafana.wikimedia.org/d/000000027/kafka?orgId=1&var-datasource=eqiad%20prometheus/ops&var-kafka_cluster=jumbo-eqiad - https://alerts.wikimedia.org/?q=alertname%3DKafkaUnderReplicatedPartitions
[08:52:31] ^ acked and silenced
[08:53:10] (if there is a better way, as in putting a pre-emptive silence, I looked for that in alertmanager as well as grafana.wikimedia.org/alerts, to no avail)
[09:02:25] FIRING: SystemdUnitFailed: httpbb_kubernetes_mw-api-ext_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:02:35] !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
[09:04:59] PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[09:07:29] (CR) Vgutierrez: [C:+1] "NS records looking good now :)" [dns] - https://gerrit.wikimedia.org/r/1174007 (https://phabricator.wikimedia.org/T400731) (owner: BCornwall)
[09:13:15] !log elukey@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bullseye
[09:14:09] (PS3) Arnaudb: gerrit: Switchover gerrit1003 → gerrit2003 [puppet] - https://gerrit.wikimedia.org/r/1172625 (https://phabricator.wikimedia.org/T338470)
[09:15:47] (CR) Arnaudb: "I've added 2 lookups to avoid breaking the rename plugin during the switchover." [puppet] - https://gerrit.wikimedia.org/r/1172625 (https://phabricator.wikimedia.org/T338470) (owner: Arnaudb)
[09:24:31] (CR) MVernon: "Hi," [debs/cassandra-tools-wmf] - https://gerrit.wikimedia.org/r/1156924 (owner: Eevans)
[09:25:51] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-codfw:xe-0/0/1:1 (Transport: cr2-eqord:xe-0/1/0 (Arelion, IC-314534 29ms 10Gbps wave) {#10694_12249-2}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[09:26:39] RESOLVED: [2x] CoreBGPDown: Core BGP session down between cr2-codfw and cr2-eqord (208.80.154.198) - group Confed_eqord - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[09:28:11] (Abandoned) FNegri: wikireplicas: delete obsolete maintain* users [puppet] - https://gerrit.wikimedia.org/r/1151655 (https://phabricator.wikimedia.org/T395432) (owner: FNegri)
[09:30:51] RESOLVED: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-codfw:xe-0/0/1:1 (Transport: cr2-eqord:xe-0/1/0 (Arelion, IC-314534 29ms 10Gbps wave) {#10694_12249-2}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[09:31:05] (PS1) Arnaudb: gerrit: default owner: Gerrit Managers > Admnistrators [puppet] - https://gerrit.wikimedia.org/r/1174669 (https://phabricator.wikimedia.org/T399689)
[09:31:07] (CR) Arnaudb: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1174669 (https://phabricator.wikimedia.org/T399689) (owner: Arnaudb)
[09:43:26] (PS1) Arnaudb: gerrit: site.pp cleanup [puppet] - https://gerrit.wikimedia.org/r/1174671 (https://phabricator.wikimedia.org/T338470)
[09:43:26] (CR) Arnaudb: "little trimming" [puppet] - https://gerrit.wikimedia.org/r/1174671 (https://phabricator.wikimedia.org/T338470) (owner: Arnaudb)
[09:44:03] (CR) Hnowlan: [C:-1] Add configurable --kill-after parameter for subprocess timeout (2 comments) [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/1174445 (https://phabricator.wikimedia.org/T374350) (owner: Effie Mouzeli)
[09:45:03] (CR) Hnowlan: "We should wrap the image version bump with this release once it's built in the dependent change" [deployment-charts] - https://gerrit.wikimedia.org/r/1174444 (https://phabricator.wikimedia.org/T374350) (owner: Effie Mouzeli)
[09:50:24] !log upgrading haproxykafka to v0.3.13 on A:cp (T400199)
[09:50:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:50:29] T400199: Prevent HaproxyKafka from hanging - https://phabricator.wikimedia.org/T400199
[09:51:33] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST certificaterequests) on k8s-mlstaging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-mlstaging&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[09:51:45] (CR) Jelto: [V:+1] "PCC SUCCESS (CORE_DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6466/co" [puppet] - https://gerrit.wikimedia.org/r/1172625 (https://phabricator.wikimedia.org/T338470) (owner: Arnaudb)
[09:52:53] (Abandoned) FNegri: wikireplicas: split db config into separate file [puppet] - https://gerrit.wikimedia.org/r/1151656 (https://phabricator.wikimedia.org/T395266) (owner: FNegri)
[09:56:33] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST certificaterequests) on k8s-mlstaging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-mlstaging&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[10:00:05] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250731T1000)
[10:01:47] !log haproxykafka upgraded to v0.3.13 on A:cp (T400199)
[10:01:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:53] T400199: Prevent HaproxyKafka from hanging - https://phabricator.wikimedia.org/T400199
[10:02:14] (CR) Jelto: [V:+1] gerrit: Switchover gerrit1003 → gerrit2003 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1172625 (https://phabricator.wikimedia.org/T338470) (owner: Arnaudb)
[10:02:25] RESOLVED: SystemdUnitFailed: httpbb_kubernetes_mw-api-ext_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:02:51] ops-codfw, SRE, DC-Ops, serviceops: Reimage sretest2009 as a wikikube worker and assess performance - https://phabricator.wikimedia.org/T400871 (Clement_Goubert) NEW
[10:03:15] ops-codfw, SRE, DC-Ops, serviceops: Reimage sretest2009 as a wikikube worker and assess performance - https://phabricator.wikimedia.org/T400871#11049267 (Clement_Goubert) p:Triage→Medium
[10:04:59] RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[10:13:35] (PS1) Elukey: CHANGELOG: add changelogs for release v11.4.0 [software/spicerack] - https://gerrit.wikimedia.org/r/1174682
[10:13:43] (PS1) Clément Goubert: abstractwiki-rust-web: remove out of repo Depends [docker-images/production-images] - https://gerrit.wikimedia.org/r/1174684
[10:14:24] (CR) Elukey: [C:+2] CHANGELOG: add changelogs for release v11.4.0 [software/spicerack] - https://gerrit.wikimedia.org/r/1174682 (owner: Elukey)
[10:17:04] (CR) Elukey: [V:+2 C:+2] CHANGELOG: add changelogs for release v11.4.0 [software/spicerack] - https://gerrit.wikimedia.org/r/1174682 (owner: Elukey)
[10:19:22] !log cmooney@cumin1003 START - Cookbook sre.dns.netbox
[10:19:53] (PS5) Clément Goubert: python-build: README.md: Clarify some text [docker-images/production-images] - https://gerrit.wikimedia.org/r/1174499 (owner: Ahmon Dancy)
[10:19:53] (PS3) Clément Goubert: python: Include python3-venv package in python base image [docker-images/production-images] - https://gerrit.wikimedia.org/r/1174506 (owner: Ahmon Dancy)
[10:20:22] (PS1) Gergő Tisza: varnish: Fix X-Wikimedia-Debug cookie VCL rule [puppet] - https://gerrit.wikimedia.org/r/1174685 (https://phabricator.wikimedia.org/T350094)
[10:21:10] (CR) Clément Goubert: "This was failing because of an out-of repository `Depends` in the `abstractwiki-rust-web` image. I've pushed a patch to fix and rebased ov" [docker-images/production-images] - https://gerrit.wikimedia.org/r/1174506 (owner: Ahmon Dancy)
[10:21:28] (PS1) Arnaudb: gerrit: add httpbb monitoring to gerrit-spare [puppet] - https://gerrit.wikimedia.org/r/1174674 (https://phabricator.wikimedia.org/T387833)
[10:21:42] (PS1) Arnaudb: gerrit: add certificate request for gerrit-spare.w.o [puppet] - https://gerrit.wikimedia.org/r/1174672 (https://phabricator.wikimedia.org/T387833)
[10:22:21] (PS1) Majavah: acme_chief: designate: Include problematic zone in error message [puppet] - https://gerrit.wikimedia.org/r/1174686 (https://phabricator.wikimedia.org/T400873)
[10:22:32] (PS3) Ahmon Dancy: python-build/bookworm/Dockerfile.template: Modernize [docker-images/production-images] - https://gerrit.wikimedia.org/r/1174507
[10:25:00] cmooney@cumin1003 netbox (PID 4168377) is awaiting input
[10:25:21] (CR) FNegri: [C:+1] acme_chief: designate: Include problematic zone in error message [puppet] - https://gerrit.wikimedia.org/r/1174686 (https://phabricator.wikimedia.org/T400873) (owner: Majavah)
[10:25:28] !log cmooney@cumin1003 START - Cookbook sre.dns.netbox
[10:25:32] (CR) Majavah: [C:+2] acme_chief: designate: Include problematic zone in error message [puppet] - https://gerrit.wikimedia.org/r/1174686 (https://phabricator.wikimedia.org/T400873) (owner: Majavah)
[10:26:30] (PS4) Clément Goubert: python-build/bookworm/Dockerfile.template: Modernize [docker-images/production-images] - https://gerrit.wikimedia.org/r/1174507 (owner: Ahmon Dancy)
[10:26:35] (PS1) Elukey: Upstream release v11.4.0 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/1174687
[10:26:52] (CR) Elukey: [V:+2 C:+2] Upstream release v11.4.0 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/1174687 (owner: Elukey)
[10:27:15] (CR) Clément Goubert: [C:+1] python-build: README.md: Clarify some text [docker-images/production-images] - https://gerrit.wikimedia.org/r/1174499 (owner: Ahmon Dancy)
[10:31:10] cmooney@cumin1003 netbox (PID 4170276) is awaiting input
[10:31:56] !log cmooney@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for nokia switches eqiad - cmooney@cumin1003"
[10:32:01] !log cmooney@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for nokia switches eqiad - cmooney@cumin1003"
[10:32:01] !log cmooney@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:36:49] !log uploaded spicerack_11.4.0 to apt.wikimedia.org bullseye-wikimedia,bookworm-wikimedia
[10:36:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:38:14] (PS2) Effie Mouzeli: Add configurable --kill-after parameter for subprocess timeout [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/1174445 (https://phabricator.wikimedia.org/T374350)
[10:46:30] (CR) CI reject: [V:-1] Add configurable --kill-after parameter for subprocess timeout [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/1174445 (https://phabricator.wikimedia.org/T374350) (owner: Effie Mouzeli)
[10:51:32] (CR) Vgutierrez: [C:+1] "checked that the 98 unique domains have a symlink on the DNS repo with:" [puppet] - https://gerrit.wikimedia.org/r/1174010 (https://phabricator.wikimedia.org/T400731) (owner: BCornwall)
[10:53:31] ops-codfw, SRE, SRE-swift-storage, DC-Ops: Install new disk controllers to SM swift backends (codfw) - https://phabricator.wikimedia.org/T400876 (MatthewVernon) NEW
[10:57:35] ops-eqiad, SRE, SRE-swift-storage, DC-Ops: Install new disk controllers to SM swift backends (eqiad) - https://phabricator.wikimedia.org/T400877 (MatthewVernon) NEW
[11:03:28] SRE, SRE-swift-storage, DC-Ops: Install new JBOD
disk controllers into SM swift backends - https://phabricator.wikimedia.org/T400878 (10MatthewVernon) 03NEW [11:03:42] 06SRE, 10SRE-swift-storage, 06DC-Ops: Install new JBOD disk controllers into SM swift backends - https://phabricator.wikimedia.org/T400878#11049458 (10MatthewVernon) [11:03:43] 10ops-codfw, 06SRE, 10SRE-swift-storage, 06DC-Ops: Install new disk controllers to SM swift backends (codfw) - https://phabricator.wikimedia.org/T400876#11049460 (10MatthewVernon) [11:03:44] 10ops-eqiad, 06SRE, 10SRE-swift-storage, 06DC-Ops: Install new disk controllers to SM swift backends (eqiad) - https://phabricator.wikimedia.org/T400877#11049459 (10MatthewVernon) [11:07:45] !log cmooney@cumin1003 START - Cookbook sre.network.tls for network device ssw1-d8-eqiad [11:08:04] !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-d8-eqiad [11:09:28] FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate thanos-query.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [11:14:01] (03CR) 10Cathal Mooney: [C:03+1] "LGTM! Great work. One open question I wonder does it make sense to delete the 'self-signed' cert? 
I guess tricky maybe as the server-pr" [cookbooks] - 10https://gerrit.wikimedia.org/r/1174407 (owner: 10Ayounsi) [11:21:10] 06SRE, 06Traffic, 07User-notice: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11049531 (10Joe) [11:25:26] 06SRE, 06Traffic, 07User-notice: Make InstantCommons and other uses of ForeignApiRepo use policy-compliant user agents - https://phabricator.wikimedia.org/T400881 (10Joe) 03NEW [11:26:13] 06SRE, 10MediaWiki-File-management, 06Traffic, 07User-notice: Make InstantCommons and other uses of ForeignApiRepo use policy-compliant user agents - https://phabricator.wikimedia.org/T400881#11049563 (10Joe) a:05Joe→03None [11:32:38] (03PS3) 10Effie Mouzeli: Add configurable --kill-after parameter for subprocess timeout [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1174445 (https://phabricator.wikimedia.org/T374350) [11:34:49] (03CR) 10CI reject: [V:04-1] Add configurable --kill-after parameter for subprocess timeout [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1174445 (https://phabricator.wikimedia.org/T374350) (owner: 10Effie Mouzeli) [11:36:36] (03PS4) 10Effie Mouzeli: Add configurable --kill-after parameter for subprocess timeout [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1174445 (https://phabricator.wikimedia.org/T374350) [11:42:08] (03CR) 10Effie Mouzeli: Add configurable --kill-after parameter for subprocess timeout (032 comments) [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1174445 (https://phabricator.wikimedia.org/T374350) (owner: 10Effie Mouzeli) [11:42:17] (03CR) 10Effie Mouzeli: Add configurable --kill-after parameter for subprocess timeout (031 comment) [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1174445 (https://phabricator.wikimedia.org/T374350) (owner: 10Effie Mouzeli) [11:42:41] (03PS5) 10Effie Mouzeli: Add configurable --kill-after parameter for subprocess timeout [software/thumbor-plugins] - 
10https://gerrit.wikimedia.org/r/1174445 (https://phabricator.wikimedia.org/T374350) [11:44:22] 07Puppet, 06SRE, 10Cloud-Services, 13Patch-For-Review: Puppet tab in Horizon unusably slow - https://phabricator.wikimedia.org/T149589#11049640 (10taavi) The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ an... [11:45:09] 07Puppet, 06SRE, 10Cloud-VPS, 13Patch-For-Review: Puppet tab in Horizon unusably slow - https://phabricator.wikimedia.org/T149589#11049643 (10taavi) [11:45:53] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [11:50:52] (03PS1) 10Majavah: openstack: designate: Test makedomain with Python 3 [puppet] - 10https://gerrit.wikimedia.org/r/1174696 (https://phabricator.wikimedia.org/T218426) [11:51:49] (03CR) 10Majavah: [C:03+2] openstack: designate: Test makedomain with Python 3 [puppet] - 10https://gerrit.wikimedia.org/r/1174696 (https://phabricator.wikimedia.org/T218426) (owner: 10Majavah) [11:53:01] 07Puppet, 10MediaWiki-Vagrant: Vagrant -> mwvagrant alias in role::labs::mediawiki_vagrant is brittle - https://phabricator.wikimedia.org/T195592#11049703 (10taavi) [12:00:04] Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250731T1200) [12:04:59] (03CR) 10Clément Goubert: [C:03+1] "LGTM, thanks for adding tests!" 
[software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1174445 (https://phabricator.wikimedia.org/T374350) (owner: 10Effie Mouzeli) [12:06:18] 06SRE, 10MediaWiki-File-management, 06Traffic, 07User-notice: Make InstantCommons and other uses of ForeignApiRepo use policy-compliant user agents - https://phabricator.wikimedia.org/T400881#11049785 (10Peachey88) [12:07:59] (03CR) 10Effie Mouzeli: [C:03+2] Add configurable --kill-after parameter for subprocess timeout [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1174445 (https://phabricator.wikimedia.org/T374350) (owner: 10Effie Mouzeli) [12:09:37] 06SRE, 10MediaWiki-File-management, 06Traffic: Make InstantCommons and other uses of ForeignApiRepo use policy-compliant user agents - https://phabricator.wikimedia.org/T400881#11049796 (10taavi) (#user-notice, not relevant to editors of Wikimedia wikis.) [12:11:33] (03Merged) 10jenkins-bot: Add configurable --kill-after parameter for subprocess timeout [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1174445 (https://phabricator.wikimedia.org/T374350) (owner: 10Effie Mouzeli) [12:13:42] 06SRE, 06cloud-services-team, 10Cloud-VPS, 06Infrastructure-Foundations, 10netops: Use vlan trunking instead of multiple physical interfaces - https://phabricator.wikimedia.org/T316114#11049843 (10taavi) 05Open→03Resolved I /think/ this is done for cloudvirts and ceph nodes are tracked separately... 
[12:16:00] (03CR) 10Dbrant: [C:03+2] mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174080 (owner: 10PipelineBot) [12:17:41] (03Merged) 10jenkins-bot: mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174080 (owner: 10PipelineBot) [12:18:13] !log dbrant@deploy1003 helmfile [staging] START helmfile.d/services/mobileapps: apply [12:19:01] !log dbrant@deploy1003 helmfile [staging] DONE helmfile.d/services/mobileapps: apply [12:19:20] (03PS2) 10Mszwarc: Set `wgCheckUserUserInfoCardFeatureVisible` to true on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1174700 [12:19:29] !log dbrant@deploy1003 helmfile [eqiad] START helmfile.d/services/mobileapps: apply [12:20:35] (03CR) 10Mszwarc: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1174700 (owner: 10Mszwarc) [12:20:40] !log dbrant@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply [12:20:47] !log dbrant@deploy1003 helmfile [codfw] START helmfile.d/services/mobileapps: apply [12:21:31] !log dbrant@deploy1003 helmfile [codfw] DONE helmfile.d/services/mobileapps: apply [12:22:04] (03CR) 10Dbrant: [C:03+2] wikifeeds: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174530 (owner: 10PipelineBot) [12:23:44] (03Merged) 10jenkins-bot: wikifeeds: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174530 (owner: 10PipelineBot) [12:24:19] !log dbrant@deploy1003 helmfile [staging] START helmfile.d/services/wikifeeds: apply [12:24:40] !log dbrant@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifeeds: apply [12:24:42] (03CR) 10CI reject: [V:04-1] Localisation updates from https://translatewiki.net. 
[phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1174701 (owner: 10L10n-bot) [12:25:09] !log dbrant@deploy1003 helmfile [eqiad] START helmfile.d/services/wikifeeds: apply [12:25:39] !log dbrant@deploy1003 helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply [12:25:50] !log dbrant@deploy1003 helmfile [codfw] START helmfile.d/services/wikifeeds: apply [12:26:16] !log dbrant@deploy1003 helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply [12:33:37] !log cmooney@cumin1003 START - Cookbook sre.network.tls for network device lsw1-d8-eqiad [12:35:22] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, July 31 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1174700 (owner: 10Mszwarc) [12:35:57] !log cmooney@cumin1003 END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device lsw1-d8-eqiad [12:36:03] !log cmooney@cumin1003 START - Cookbook sre.network.tls for network device lsw1-d1-eqiad [12:37:09] 10ops-eqiad, 06SRE, 06DC-Ops: ssw1-f1-eqiad: Fan Spinning Upgraded - https://phabricator.wikimedia.org/T400783#11049950 (10Jclark-ctr) a:03Jclark-ctr This might be due to the recently installed hot aisle containment in EQIAD. Adjusted the hotlocks on the cold aisle side of the rack and checked airflow from... [12:38:43] (03CR) 10Gergő Tisza: ""Enable multiple authenticators for 10% of users" would be a better description of what this patch does." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1174049 (https://phabricator.wikimedia.org/T400579) (owner: 10Mstyles) [12:40:03] (03PS3) 10Fabfur: traffic: added alert for haproxykafka_socket_dropped_messages [alerts] - 10https://gerrit.wikimedia.org/r/1174565 (https://phabricator.wikimedia.org/T400684) [12:41:13] cmooney@cumin1003 tls (PID 4182340) is awaiting input [12:41:39] (03CR) 10CI reject: [V:04-1] traffic: added alert for haproxykafka_socket_dropped_messages [alerts] - 10https://gerrit.wikimedia.org/r/1174565 (https://phabricator.wikimedia.org/T400684) (owner: 10Fabfur) [12:46:28] (03PS4) 10Fabfur: traffic: added alert for haproxykafka_socket_dropped_messages [alerts] - 10https://gerrit.wikimedia.org/r/1174565 (https://phabricator.wikimedia.org/T400684) [12:47:41] (03PS1) 10Zabe: Stop setting wgGlobalUsageDatabase [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1174715 (https://phabricator.wikimedia.org/T400169) [12:48:18] (03CR) 10CI reject: [V:04-1] traffic: added alert for haproxykafka_socket_dropped_messages [alerts] - 10https://gerrit.wikimedia.org/r/1174565 (https://phabricator.wikimedia.org/T400684) (owner: 10Fabfur) [12:49:20] (03PS2) 10Effie Mouzeli: thumbor: Add configurable SUBPROCESS_TIMEOUT_KILL_AFTER [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174444 (https://phabricator.wikimedia.org/T374350) [12:49:39] (03PS1) 10Jelto: sre.gitlab.failover: manage home_page_url in cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1174718 (https://phabricator.wikimedia.org/T400695) [12:50:40] (03CR) 10CI reject: [V:04-1] thumbor: Add configurable SUBPROCESS_TIMEOUT_KILL_AFTER [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174444 (https://phabricator.wikimedia.org/T374350) (owner: 10Effie Mouzeli) [12:52:21] (03PS5) 10Fabfur: traffic: added alert for haproxykafka_socket_dropped_messages [alerts] - 10https://gerrit.wikimedia.org/r/1174565 (https://phabricator.wikimedia.org/T400684) [12:52:53] (03PS3) 10Effie Mouzeli: 
thumbor: Add configurable SUBPROCESS_TIMEOUT_KILL_AFTER [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174444 (https://phabricator.wikimedia.org/T374350) [12:53:32] (03CR) 10CI reject: [V:04-1] traffic: added alert for haproxykafka_socket_dropped_messages [alerts] - 10https://gerrit.wikimedia.org/r/1174565 (https://phabricator.wikimedia.org/T400684) (owner: 10Fabfur) [12:55:03] (03CR) 10Arnaudb: [C:03+1] "lgtm!" [cookbooks] - 10https://gerrit.wikimedia.org/r/1174718 (https://phabricator.wikimedia.org/T400695) (owner: 10Jelto) [12:55:08] 06SRE, 10MediaWiki-extensions-QuickInstantCommons, 10MediaWiki-File-management, 06Traffic: Make InstantCommons and other uses of ForeignApiRepo use policy-compliant user agents - https://phabricator.wikimedia.org/T400881#11050004 (10A_smart_kitten) (assuming this will also apply to QuickInstantCommons, rem... [12:56:11] (03CR) 10Hashar: [C:03+1] "I don't have +2 😊" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1174499 (owner: 10Ahmon Dancy) [12:58:45] 06SRE, 10SRE-Access-Requests, 06MW-Interfaces-Team: Requesting access to analytics-privatedata-users, SSH and Kerberos for HCoplin-WMF - https://phabricator.wikimedia.org/T400897 (10HCoplin-WMF) 03NEW [12:59:31] !log install spicerack 11.4.0 on cumin2002 [12:59:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:05] Lucas_WMDE, Urbanecm, and TheresNoTime: May I have your attention please! UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250731T1300) [13:00:05] koi and Msz2001: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [13:00:16] o/ [13:00:19] o/ [13:00:19] o/ [13:01:02] I can deploy! [13:02:06] koi: clarification question: where is it defined that stewards can add/remove the scrutineer group? 
[13:03:09] well, if this group is not defined in wgAddGroups/wgRemoveGroups, it can only be add/remove by stew [13:03:19] ah, can stewards add/remove all groups? [13:03:29] yes, the local 'steward' group has 'userrights' [13:03:32] yes, stew can modify all groups [13:03:46] I see, thanks [13:03:48] (03CR) 10Effie Mouzeli: thumbor: Add configurable SUBPROCESS_TIMEOUT_KILL_AFTER (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174444 (https://phabricator.wikimedia.org/T374350) (owner: 10Effie Mouzeli) [13:04:11] (03PS6) 10Fabfur: traffic: added alert for haproxykafka_socket_dropped_messages [alerts] - 10https://gerrit.wikimedia.org/r/1174565 (https://phabricator.wikimedia.org/T400684) [13:04:29] ok then I’ll just edit the patch to add two trailing commas and it should be good to go [13:04:32] (03PS1) 10Brouberol: deployment_server: define opensearch-test kubeconfigs in dse-k8s-eqiad [puppet] - 10https://gerrit.wikimedia.org/r/1174720 (https://phabricator.wikimedia.org/T397246) [13:04:37] and I should check if the tables were already created or not [13:05:07] (03CR) 10Clément Goubert: [V:03+2 C:03+2] abstractwiki-rust-web: remove out of repo Depends [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1174684 (owner: 10Clément Goubert) [13:05:14] looks like securepoll_log doesn’t exist yet [13:05:16] (03CR) 10Clément Goubert: [C:03+2] python-build: README.md: Clarify some text [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1174499 (owner: 10Ahmon Dancy) [13:05:18] (03CR) 10Clément Goubert: [V:03+2 C:03+2] python-build: README.md: Clarify some text [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1174499 (owner: 10Ahmon Dancy) [13:05:23] (03CR) 10CI reject: [V:04-1] traffic: added alert for haproxykafka_socket_dropped_messages [alerts] - 10https://gerrit.wikimedia.org/r/1174565 (https://phabricator.wikimedia.org/T400684) (owner: 10Fabfur) [13:06:03] all other tables are present AFAICT 
[13:06:05] oh i just notice i forget some trailing commas :) [13:06:20] thanks for checking [13:06:22] (03PS1) 10Brouberol: dse-k8s-eqaid: define an opensearch-test namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174721 (https://phabricator.wikimedia.org/T397246) [13:06:35] np, I didn’t point it out earlier because I didn’t want to lose all the +1 votes [13:07:05] uhhhh [13:07:09] https://wikitech.wikimedia.org/wiki/Creating_new_tables suggests I might need DBA signoff [13:07:40] (03PS2) 10Brouberol: deployment_server: define opensearch-test kubeconfigs in dse-k8s-eqiad [puppet] - 10https://gerrit.wikimedia.org/r/1174720 (https://phabricator.wikimedia.org/T400898) [13:07:48] Amir1 or other DBAs around: can I get approval to create securepoll_log on zhwiki? context is https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1100228 [13:07:50] (03PS2) 10Brouberol: dse-k8s-eqaid: define an opensearch-test namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174721 (https://phabricator.wikimedia.org/T400898) [13:09:19] (03PS7) 10Fabfur: traffic: added alert for haproxykafka_socket_dropped_messages [alerts] - 10https://gerrit.wikimedia.org/r/1174565 (https://phabricator.wikimedia.org/T400684) [13:10:23] (03PS1) 10Elukey: services: update proton's Docker image [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174722 [13:10:26] Lucas_WMDE: go for it [13:10:30] thanks! 
[13:11:45] hmm, I just realized there’s no SQL file for just this table [13:11:56] so sql.php might not be the best maintenance script [13:12:00] !log install spicerack 11.4.0 on cumin100* [13:12:03] (03PS3) 10CDanis: varnish: fix nocookies / wmf-uniq interaction [puppet] - 10https://gerrit.wikimedia.org/r/1174501 (https://phabricator.wikimedia.org/T400753) [13:12:03] (03PS2) 10CDanis: varnish: tests: env var for which docker [puppet] - 10https://gerrit.wikimedia.org/r/1174505 [13:12:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:12:16] yeah definitely not, that file says `CREATE TABLE` not `CREATE TABLE IF NOT EXIST` /o\ [13:12:20] so idk what it would do to the existing tables [13:12:33] (03CR) 10CDanis: varnish: fix nocookies / wmf-uniq interaction (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1174501 (https://phabricator.wikimedia.org/T400753) (owner: 10CDanis) [13:12:40] sql zhwiki --write [13:12:43] ? [13:12:44] so… I probably want create ExtensionTables? [13:13:13] Amir1: based on https://wikitech.wikimedia.org/wiki/Creating_new_tables#Deployment I thought we should call sql.php with the path to an SQL file, but there’s no SQL file for just the one table we want to create [13:13:36] do you mean I should just the relevant part of it into an interactive SQL script? [13:13:41] yeah [13:13:44] ok [13:14:16] > ERROR: Unexpected option --write! [13:14:23] I think it’s the default if I don’t specify --replicadb ^^ [13:15:12] sql.php is different sql bin [13:15:21] !log created securepoll_log on zhwiki via sql.php (T380020) [13:15:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:15:27] T380020: Enable SecurePoll extension on zhwiki - https://phabricator.wikimedia.org/T380020 [13:15:27] oh [13:15:32] sql ≠ mwscript sql? 
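[editor's note] The concern raised at 13:12:16, that the extension's SQL file uses a plain `CREATE TABLE` rather than `CREATE TABLE IF NOT EXISTS`, can be illustrated with a minimal sketch. It makes several assumptions: it uses Python's built-in SQLite module as a stand-in rather than the production MariaDB servers, and the one-column schema (`spl_id`) is hypothetical, because the real `securepoll_log` definition does not appear in this log. It only shows the general difference between the two statement forms when the table already exists.

```python
import sqlite3

# Hypothetical one-column schema; the real securepoll_log definition is not in the log.
PLAIN = "CREATE TABLE securepoll_log (spl_id INTEGER PRIMARY KEY)"
GUARDED = "CREATE TABLE IF NOT EXISTS securepoll_log (spl_id INTEGER PRIMARY KEY)"


def run_twice(statement):
    """Run the statement twice on a fresh in-memory database and
    report what happens the second time, when the table already exists."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(statement)  # first run creates the table
        try:
            conn.execute(statement)  # second run hits the existing table
        except sqlite3.OperationalError as exc:
            return f"error: {exc}"
        return "ok"
    finally:
        conn.close()


plain_result = run_twice(PLAIN)
guarded_result = run_twice(GUARDED)
print("plain CREATE TABLE, second run:  ", plain_result)
print("CREATE TABLE IF NOT EXISTS, again:", guarded_result)
```

In this sketch the plain form raises an error on the second run and leaves the existing table untouched, while the guarded form is a no-op. How a multi-statement file behaves after such an error depends on the client running it, which this sketch does not model; that uncertainty is consistent with the choice made in the channel to run only the one relevant statement interactively.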
[13:16:14] (03CR) 10Elukey: [C:03+2] services: update proton's Docker image [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174722 (owner: 10Elukey) [13:16:25] meh, I tried to `DESCRIBE securepoll_log` in `mwscript sql zhwiki --replicadb=any` and it gave me an error that the server is read-only o_O [13:16:25] I think sql bin is mysql.php [13:16:30] since when is DESCRIBE a write query [13:16:36] !log andrew@cumin1003 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon2006-dev.codfw.wmnet with OS bullseye [13:16:53] but SELECT COUNT(*) FROM securepoll_log works without an error, so I assume the table creation replicated properly [13:17:18] https://orchestrator.wikimedia.org/web/cluster/alias/s2 [13:17:26] replication is not broken [13:17:55] Amir1: you’re right, /usr/local/bin/sql is a pretty thin bash wrapper around mwscript mysql.php [13:18:14] ok then let’s spiderpig that config change [13:19:09] (03PS11) 10Lucas Werkmeister (WMDE): zhwiki: Allow local securepoll setup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1100228 (https://phabricator.wikimedia.org/T380020) (owner: 10Stang) [13:19:35] (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] "Done: https://sal.toolforge.org/log/d0qfYJgB8tZ8Ohr0e8WO" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1100228 (https://phabricator.wikimedia.org/T380020) (owner: 10Stang) [13:19:44] 06SRE, 10SRE-Access-Requests: Restore Taavi's analytics-privatedata-users membership - https://phabricator.wikimedia.org/T400900 (10taavi) 03NEW [13:19:58] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1100228 (https://phabricator.wikimedia.org/T380020) (owner: 10Stang) [13:20:22] (03CR) 10Clément Goubert: thumbor: Add configurable SUBPROCESS_TIMEOUT_KILL_AFTER (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174444 (https://phabricator.wikimedia.org/T374350) (owner: 10Effie 
Mouzeli) [13:20:30] !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye [13:20:48] (03PS1) 10Cathal Mooney: Add RO user for jclark [homer/public] - 10https://gerrit.wikimedia.org/r/1174724 [13:20:48] (03Merged) 10jenkins-bot: zhwiki: Allow local securepoll setup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1100228 (https://phabricator.wikimedia.org/T380020) (owner: 10Stang) [13:21:04] (03Abandoned) 10Elukey: DNM: reimage - test for new cp nodes [cookbooks] - 10https://gerrit.wikimedia.org/r/1173418 (https://phabricator.wikimedia.org/T392851) (owner: 10Elukey) [13:21:15] !log lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for [[gerrit:1100228|zhwiki: Allow local securepoll setup (T380020)]] [13:21:20] T380020: Enable SecurePoll extension on zhwiki - https://phabricator.wikimedia.org/T380020 [13:21:24] (03Abandoned) 10Elukey: DNM - test for ML hosts [cookbooks] - 10https://gerrit.wikimedia.org/r/1170463 (owner: 10Elukey) [13:21:41] (03Abandoned) 10Elukey: DNM - test for ML hosts [cookbooks] - 10https://gerrit.wikimedia.org/r/1170462 (owner: 10Elukey) [13:21:42] (03CR) 10Ssingh: [C:03+1] Add RO user for jclark [homer/public] - 10https://gerrit.wikimedia.org/r/1174724 (owner: 10Cathal Mooney) [13:22:05] By the Lucas_WMDE, Amir1, sql should work with mwscript-k8s now, libmysqlclient is in the mediawiki-cli image [13:22:09] (03PS1) 10Ayounsi: dhcpd.conf: add Nokia specific class [puppet] - 10https://gerrit.wikimedia.org/r/1174725 [13:22:11] !log cmooney@cumin1003 END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-d1-eqiad [13:22:17] By the way* [13:22:17] (03PS1) 10Vgutierrez: cache::haproxy: Deploy public JWT key used by MW [puppet] - 10https://gerrit.wikimedia.org/r/1174726 (https://phabricator.wikimedia.org/T400238) [13:23:17] (03PS4) 10Effie Mouzeli: thumbor: Add configurable SUBPROCESS_TIMEOUT_KILL_AFTER [deployment-charts] - 
10https://gerrit.wikimedia.org/r/1174444 (https://phabricator.wikimedia.org/T374350) [13:23:33] !log lucaswerkmeister-wmde@deploy1003 stang, lucaswerkmeister-wmde: Backport for [[gerrit:1100228|zhwiki: Allow local securepoll setup (T380020)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [13:23:55] ooh, that's nice [13:23:55] (03CR) 10Hashar: ":old-man-yields-at-debian: that is partly because Debian pip tries to prevent people from overriding python code provided by Debian. The" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1174506 (owner: 10Ahmon Dancy) [13:24:19] 06SRE, 13Patch-For-Review: FY25/26 WE4.3.1: edge uniques in requestctl - https://phabricator.wikimedia.org/T400753#11050143 (10CDanis) [13:24:22] claime: oooh nice [13:24:32] 06SRE, 13Patch-For-Review: FY25/26 WE4.3.1: edge uniques in requestctl - https://phabricator.wikimedia.org/T400753#11050144 (10CDanis) p:05Triage→03Medium [13:24:43] 06SRE, 10Hiddenparma, 13Patch-For-Review: FY25/26 WE4.3.1: edge uniques in requestctl - https://phabricator.wikimedia.org/T400753#11050145 (10CDanis) [13:24:43] claime: nice! 
[13:24:46] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1174726 (https://phabricator.wikimedia.org/T400238) (owner: 10Vgutierrez) [13:24:47] Feel free to test it and tell us if something's borked [13:24:47] (I was in the middle of cobbling together an sql.php call with --file= when I realized there was no suitable file :D) [13:24:53] (so I still haven’t tried out the --file thing) [13:25:11] Lucas_WMDE, I've tested this patch and LGTM [13:25:22] (03CR) 10Cathal Mooney: [C:03+2] Add RO user for jclark [homer/public] - 10https://gerrit.wikimedia.org/r/1174724 (owner: 10Cathal Mooney) [13:25:28] !log elukey@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bullseye [13:25:30] !log lucaswerkmeister-wmde@deploy1003 stang, lucaswerkmeister-wmde: Continuing with sync [13:25:31] thanks! [13:25:52] (03Merged) 10jenkins-bot: Add RO user for jclark [homer/public] - 10https://gerrit.wikimedia.org/r/1174724 (owner: 10Cathal Mooney) [13:25:58] echo "whateverSQL" | maintenance/run.php sql ? [13:26:12] btw, would you mind have a look at this one? 
not sure if it's the right place, but https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaMessages/+/1171756 [13:28:26] (03CR) 10Ssingh: [C:03+2] beta: redirect misc *.beta.wmflabs.org to *.beta.wmcloud.org [puppet] - 10https://gerrit.wikimedia.org/r/1170188 (https://phabricator.wikimedia.org/T289318) (owner: 10Krinkle) [13:28:45] (03CR) 10CDanis: [C:03+1] cache::haproxy: Deploy public JWT key used by MW [puppet] - 10https://gerrit.wikimedia.org/r/1174726 (https://phabricator.wikimedia.org/T400238) (owner: 10Vgutierrez) [13:29:01] !log cmooney@cumin1003 START - Cookbook sre.network.tls for network device lsw1-d1-eqiad [13:29:15] !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d1-eqiad [13:29:34] !log elukey@deploy1003 helmfile [staging] START helmfile.d/services/proton: sync [13:29:41] koi: looks reasonable to me, added some minor comments [13:30:14] !log elukey@deploy1003 helmfile [staging] DONE helmfile.d/services/proton: sync [13:30:39] !log lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for [[gerrit:1100228|zhwiki: Allow local securepoll setup (T380020)]] (duration: 09m 24s) [13:30:47] T380020: Enable SecurePoll extension on zhwiki - https://phabricator.wikimedia.org/T380020 [13:31:09] (03PS1) 10Brouberol: kafka: reach out to the newly introduce spicerack.kafka.admin_client method [cookbooks] - 10https://gerrit.wikimedia.org/r/1174727 [13:31:18] !log ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance [13:31:36] !log ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance [13:31:39] (03PS3) 10Bking: opensearch-operator: Add chart for review [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174038 (https://phabricator.wikimedia.org/T397246) [13:31:43] 
!log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1165 (T400854)', diff saved to https://phabricator.wikimedia.org/P80337 and previous config saved to /var/cache/conftool/dbconfig/20250731-133143-ladsgroup.json [13:31:52] T400854: Add rc_source_name_timestamp index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T400854 [13:32:28] (03PS6) 10Ssingh: hiera: service.yaml: use better aliasing for text/upload [puppet] - 10https://gerrit.wikimedia.org/r/1168192 [13:32:36] 06SRE, 10SRE-Access-Requests, 06MW-Interfaces-Team: Requesting access to analytics-privatedata-users, SSH and Kerberos for HCoplin-WMF - https://phabricator.wikimedia.org/T400897#11050165 (10Bmueller) Approved! @HCoplin-WMF [13:33:39] (03CR) 10Jelto: [C:03+2] sre.gitlab.failover: manage home_page_url in cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1174718 (https://phabricator.wikimedia.org/T400695) (owner: 10Jelto) [13:33:59] (03CR) 10Lucas Werkmeister (WMDE): Set `wgCheckUserUserInfoCardFeatureVisible` to true on testwiki (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1174700 (owner: 10Mszwarc) [13:34:11] Msz2001: ^ question for you in there [13:34:19] (disregard the second comment, that’s just me having fun :P) [13:35:05] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T400854)', diff saved to https://phabricator.wikimedia.org/P80338 and previous config saved to /var/cache/conftool/dbconfig/20250731-133504-ladsgroup.json [13:35:11] it’s still possible to deploy the config change, but usually we would wait until the relevant commit is live on testwiki [13:35:17] 06SRE, 10SRE-Access-Requests: Restore Taavi's analytics-privatedata-users membership - https://phabricator.wikimedia.org/T400900#11050172 (10joanna_borun) Approved [13:35:44] (03CR) 10Mszwarc: Set `wgCheckUserUserInfoCardFeatureVisible` to true on testwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1174700 
(owner: 10Mszwarc) [13:36:47] (03CR) 10Vgutierrez: [C:03+2] cache::haproxy: Deploy public JWT key used by MW [puppet] - 10https://gerrit.wikimedia.org/r/1174726 (https://phabricator.wikimedia.org/T400238) (owner: 10Vgutierrez) [13:37:18] (03PS2) 10Ayounsi: [WIP] Nokia ZTP [puppet] - 10https://gerrit.wikimedia.org/r/1174725 [13:37:44] (03CR) 10CI reject: [V:04-1] [WIP] Nokia ZTP [puppet] - 10https://gerrit.wikimedia.org/r/1174725 (owner: 10Ayounsi) [13:38:30] (03CR) 10Ssingh: [C:03+2] hiera: service.yaml: use better aliasing for text/upload [puppet] - 10https://gerrit.wikimedia.org/r/1168192 (owner: 10Ssingh) [13:38:36] !log cmooney@cumin1003 START - Cookbook sre.network.tls for network device ssw1-d8-eqiad [13:38:45] (03CR) 10CI reject: [V:04-1] kafka: reach out to the newly introduce spicerack.kafka.admin_client method [cookbooks] - 10https://gerrit.wikimedia.org/r/1174727 (owner: 10Brouberol) [13:38:49] !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-d8-eqiad [13:40:00] Lucas_WMDE: I've responded. Generally, PSI team plans to showcase the feature on Wikimania and while this config patch for now is not critical, without it may happen that some people will be unable to use this feature there (even if it's just testwiki). But there's no pressure. If you prefer, we can have it deployed next week [13:40:41] ok, trying to understand the linked change now [13:41:15] previously, it looked for the option, which on testwiki is set to be conditional, and enabled for all named users (IIUC) [13:41:50] (03Merged) 10jenkins-bot: sre.gitlab.failover: manage home_page_url in cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1174718 (https://phabricator.wikimedia.org/T400695) (owner: 10Jelto) [13:42:13] Previously, the UIC preference was visible only if the UIC was default on that wiki. 
This is something that doesn't scale, so we added a proper switch for making UIC preference visible, even if it's not the default [13:42:23] oh [13:42:28] this is changing the *type* of the preference [13:42:35] between “toggle on Special:Preferences” and “only set via api.php” [13:42:42] is that what “api” means in that context [13:42:45] FIRING: WidespreadPuppetFailure: Puppet has failed in esams - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure [13:42:49] Yes [13:42:53] I see [13:43:01] ok yeah fair enough [13:43:03] let’s go ahead then [13:43:07] thanks for explaining :) [13:43:10] Okay, thanks! [13:43:13] (03CR) 10Lucas Werkmeister (WMDE): Set `wgCheckUserUserInfoCardFeatureVisible` to true on testwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1174700 (owner: 10Mszwarc) [13:43:14] yw :) [13:43:26] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1174700 (owner: 10Mszwarc) [13:44:28] (03Merged) 10jenkins-bot: Set `wgCheckUserUserInfoCardFeatureVisible` to true on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1174700 (owner: 10Mszwarc) [13:44:52] !log lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for [[gerrit:1174700|Set `wgCheckUserUserInfoCardFeatureVisible` to true on testwiki]] [13:47:02] !log lucaswerkmeister-wmde@deploy1003 mszwarc, lucaswerkmeister-wmde: Backport for [[gerrit:1174700|Set `wgCheckUserUserInfoCardFeatureVisible` to true on testwiki]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [13:47:05] even though I’m still a bit confused :D wouldn’t getBoolOption() still return true (and make the option visible) on testwiki, due to the $wgConditionalUserOptions? 
[13:47:35] Except for those people who turned it off by themselves [13:47:44] ah, true [13:47:45] FIRING: [3x] WidespreadPuppetFailure: Puppet has failed in drmrs - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure [13:47:53] so now they can see the preference and, if they like, turn it on again [13:47:56] These may include some early-adopters, whose opinion can still be valuable [13:48:02] Exactly [13:48:10] ^ widespread failure is known, related to a recent cp change, we are looking [13:48:12] anyway, is there anything to test? ^^ [13:48:31] Not really, at least nothing appears to be broken [13:48:33] sukhe: let me know if I should hold deploying (currently at mwdebug) [13:48:36] Msz2001: ok [13:48:43] Lucas_WMDE: not at all, please continue. thanks for checking. [13:48:46] ok :) [13:48:50] !log lucaswerkmeister-wmde@deploy1003 mszwarc, lucaswerkmeister-wmde: Continuing with sync [13:50:12] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P80339 and previous config saved to /var/cache/conftool/dbconfig/20250731-135012-ladsgroup.json [13:52:27] (03PS1) 10Vgutierrez: cache::haproxy:: Create /etc/haproxy/jwt explicitly [puppet] - 10https://gerrit.wikimedia.org/r/1174731 (https://phabricator.wikimedia.org/T400238) [13:52:42] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1174731 (https://phabricator.wikimedia.org/T400238) (owner: 10Vgutierrez) [13:52:45] FIRING: [4x] WidespreadPuppetFailure: Puppet has failed in drmrs - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure [13:52:53] (03CR) 10CI reject: [V:04-1] cache::haproxy:: Create /etc/haproxy/jwt explicitly 
[puppet] - 10https://gerrit.wikimedia.org/r/1174731 (https://phabricator.wikimedia.org/T400238) (owner: 10Vgutierrez) [13:53:46] (03PS2) 10Vgutierrez: cache::haproxy:: Create /etc/haproxy/jwt explicitly [puppet] - 10https://gerrit.wikimedia.org/r/1174731 (https://phabricator.wikimedia.org/T400238) [13:54:26] !log lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for [[gerrit:1174700|Set `wgCheckUserUserInfoCardFeatureVisible` to true on testwiki]] (duration: 09m 33s) [13:54:42] Thanks! [13:54:45] np :) [13:54:48] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1174731 (https://phabricator.wikimedia.org/T400238) (owner: 10Vgutierrez) [13:54:49] !log UTC afternoon backport+config window done [13:54:51] just in time ^^ [13:54:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:57:45] FIRING: [5x] WidespreadPuppetFailure: Puppet has failed in drmrs - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure [14:04:06] (03PS3) 10Vgutierrez: cache::haproxy: Fix /etc/haproxy/jwt on upload [puppet] - 10https://gerrit.wikimedia.org/r/1174731 (https://phabricator.wikimedia.org/T400238) [14:04:33] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1174731 (https://phabricator.wikimedia.org/T400238) (owner: 10Vgutierrez) [14:05:20] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P80340 and previous config saved to /var/cache/conftool/dbconfig/20250731-140519-ladsgroup.json [14:06:59] (03PS4) 10Vgutierrez: cache::haproxy: Fix /etc/haproxy/jwt on upload [puppet] - 10https://gerrit.wikimedia.org/r/1174731 (https://phabricator.wikimedia.org/T400238) [14:07:02] (03PS11) 10Arnaudb: gerrit: Switchover gerrit1003 → gerrit2003 [puppet] - 
10https://gerrit.wikimedia.org/r/1172625 (https://phabricator.wikimedia.org/T338470) [14:08:16] (03CR) 10Ssingh: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1174731 (https://phabricator.wikimedia.org/T400238) (owner: 10Vgutierrez) [14:08:36] (03CR) 10Arnaudb: [C:04-2] "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1172625 (https://phabricator.wikimedia.org/T338470) (owner: 10Arnaudb) [14:08:40] (03PS5) 10Vgutierrez: cache::haproxy: Fix /etc/haproxy/jwt on upload [puppet] - 10https://gerrit.wikimedia.org/r/1174731 (https://phabricator.wikimedia.org/T400238) [14:09:01] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1174731 (https://phabricator.wikimedia.org/T400238) (owner: 10Vgutierrez) [14:11:17] (03CR) 10Ssingh: [C:03+1] cache::haproxy: Fix /etc/haproxy/jwt on upload [puppet] - 10https://gerrit.wikimedia.org/r/1174731 (https://phabricator.wikimedia.org/T400238) (owner: 10Vgutierrez) [14:11:32] 06SRE, 10SRE-Access-Requests, 06Gerrit-Privilege-Requests, 10LDAP-Access-Requests: Offboard Noarave from WMF systems - https://phabricator.wikimedia.org/T399953#11050299 (10taavi) [14:11:43] (03CR) 10Vgutierrez: [C:03+2] cache::haproxy: Fix /etc/haproxy/jwt on upload [puppet] - 10https://gerrit.wikimedia.org/r/1174731 (https://phabricator.wikimedia.org/T400238) (owner: 10Vgutierrez) [14:18:04] (03PS1) 10Vgutierrez: cache::haproxy: Get rid of recurselimit warning on upload [puppet] - 10https://gerrit.wikimedia.org/r/1174738 (https://phabricator.wikimedia.org/T400238) [14:18:21] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1174738 (https://phabricator.wikimedia.org/T400238) (owner: 10Vgutierrez) [14:19:42] (03CR) 10Bking: [C:03+1] deployment_server: define opensearch-test kubeconfigs in dse-k8s-eqiad [puppet] - 10https://gerrit.wikimedia.org/r/1174720 (https://phabricator.wikimedia.org/T400898) (owner: 10Brouberol) [14:20:12] (03CR) 10Bking: [C:03+1] 
dse-k8s-eqaid: define an opensearch-test namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174721 (https://phabricator.wikimedia.org/T400898) (owner: 10Brouberol) [14:20:28] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T400854)', diff saved to https://phabricator.wikimedia.org/P80341 and previous config saved to /var/cache/conftool/dbconfig/20250731-142027-ladsgroup.json [14:20:33] T400854: Add rc_source_name_timestamp index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T400854 [14:20:43] !log ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance [14:20:50] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1168 (T400854)', diff saved to https://phabricator.wikimedia.org/P80342 and previous config saved to /var/cache/conftool/dbconfig/20250731-142050-ladsgroup.json [14:21:36] (03CR) 10Ssingh: [C:03+1] cache::haproxy: Get rid of recurselimit warning on upload [puppet] - 10https://gerrit.wikimedia.org/r/1174738 (https://phabricator.wikimedia.org/T400238) (owner: 10Vgutierrez) [14:23:12] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T400854)', diff saved to https://phabricator.wikimedia.org/P80343 and previous config saved to /var/cache/conftool/dbconfig/20250731-142311-ladsgroup.json [14:27:24] (03PS6) 10Ayounsi: sre.network.tls: add Nokia SR-Linux support [cookbooks] - 10https://gerrit.wikimedia.org/r/1174407 [14:27:45] FIRING: [5x] WidespreadPuppetFailure: Puppet has failed in drmrs - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure [14:30:04] Deploy window xLab Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250731T1430) [14:32:38] (03CR) 10Ayounsi: "Good idea! 
I now first clear all the possible certs." [cookbooks] - 10https://gerrit.wikimedia.org/r/1174407 (owner: 10Ayounsi) [14:35:50] 06SRE, 10MediaWiki-extensions-QuickInstantCommons, 10MediaWiki-File-management, 06Traffic: Make InstantCommons and other uses of ForeignApiRepo use policy-compliant user agents - https://phabricator.wikimedia.org/T400881#11050371 (10Bawolff) > Provide a way to override the UA used by ForeignFileRepo to add... [14:38:20] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P80344 and previous config saved to /var/cache/conftool/dbconfig/20250731-143819-ladsgroup.json [14:42:45] FIRING: [5x] WidespreadPuppetFailure: Puppet has failed in drmrs - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure [14:44:08] !log cwhite@cumin2002 START - Cookbook sre.hosts.reimage for host logstash1033.eqiad.wmnet with OS bookworm [14:47:45] RESOLVED: [4x] WidespreadPuppetFailure: Puppet has failed in drmrs - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure [14:49:48] (03CR) 10Vgutierrez: [C:03+2] cache::haproxy: Get rid of recurselimit warning on upload [puppet] - 10https://gerrit.wikimedia.org/r/1174738 (https://phabricator.wikimedia.org/T400238) (owner: 10Vgutierrez) [14:53:27] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P80345 and previous config saved to /var/cache/conftool/dbconfig/20250731-145326-ladsgroup.json [14:54:29] (03PS1) 10JHathaway: postfix: allow sasl auth clients to use non-FQDN in helo [puppet] - 10https://gerrit.wikimedia.org/r/1174741 (https://phabricator.wikimedia.org/T400882) 
[14:54:45] (03CR) 10JHathaway: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1174741 (https://phabricator.wikimedia.org/T400882) (owner: 10JHathaway) [15:00:05] brennen and dduvall: That opportune time for a Train log triage deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250731T1500). [15:03:05] !log cwhite@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1033.eqiad.wmnet with reason: host reimage [15:03:36] (03CR) 10Ahmon Dancy: [C:03+1] scap: Limit foreachwikiindblist and expanddblist in beta to beta wikis [puppet] - 10https://gerrit.wikimedia.org/r/941479 (https://phabricator.wikimedia.org/T357877) (owner: 10Krinkle) [15:08:35] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T400854)', diff saved to https://phabricator.wikimedia.org/P80346 and previous config saved to /var/cache/conftool/dbconfig/20250731-150834-ladsgroup.json [15:08:40] T400854: Add rc_source_name_timestamp index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T400854 [15:08:49] !log ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [15:08:51] !log cwhite@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1033.eqiad.wmnet with reason: host reimage [15:08:57] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1173 (T400854)', diff saved to https://phabricator.wikimedia.org/P80347 and previous config saved to /var/cache/conftool/dbconfig/20250731-150856-ladsgroup.json [15:09:28] FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate thanos-query.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [15:09:43] FIRING: JobUnavailable: Reduced availability for job 
sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:12:07] (03CR) 10Hashar: [C:03+1] gerrit: default owner: Gerrit Managers > Admnistrators [puppet] - 10https://gerrit.wikimedia.org/r/1174669 (https://phabricator.wikimedia.org/T399689) (owner: 10Arnaudb) [15:13:10] (03CR) 10Hnowlan: [C:03+1] "lgtm! Please roll to staging and make some requests just to verify everything is okay." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174444 (https://phabricator.wikimedia.org/T374350) (owner: 10Effie Mouzeli) [15:13:13] !log elukey@deploy1003 helmfile [codfw] START helmfile.d/services/proton: sync [15:13:19] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T400854)', diff saved to https://phabricator.wikimedia.org/P80348 and previous config saved to /var/cache/conftool/dbconfig/20250731-151318-ladsgroup.json [15:14:22] !log elukey@deploy1003 helmfile [codfw] DONE helmfile.d/services/proton: sync [15:14:44] !log elukey@deploy1003 helmfile [eqiad] START helmfile.d/services/proton: sync [15:14:47] 06SRE, 06collaboration-services, 06Infrastructure-Foundations, 10Mail, and 2 others: Replace Exim on VRTS servers with Postfix - https://phabricator.wikimedia.org/T378028#11050555 (10jhathaway) [15:15:58] !log elukey@deploy1003 helmfile [eqiad] DONE helmfile.d/services/proton: sync [15:19:43] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:26:27] (03CR) 10Jelto: [C:03+2] gerrit: default owner: Gerrit Managers > Admnistrators [puppet] - 10https://gerrit.wikimedia.org/r/1174669 (https://phabricator.wikimedia.org/T399689) (owner: 10Arnaudb) 
[15:27:41] !log cwhite@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1033.eqiad.wmnet with OS bookworm [15:28:26] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P80349 and previous config saved to /var/cache/conftool/dbconfig/20250731-152826-ladsgroup.json [15:30:59] (03PS3) 10CDanis: varnish: tests: env var for which docker impl [puppet] - 10https://gerrit.wikimedia.org/r/1174505 [15:31:27] (03CR) 10CDanis: varnish: tests: env var for which docker impl (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1174505 (owner: 10CDanis) [15:31:31] (03PS1) 10Ottomata: eventgate-analytics-external - bump to 1.19.0 to pick up recent ExP changes [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174745 (https://phabricator.wikimedia.org/T396359) [15:32:06] (03PS4) 10CDanis: varnish: fix nocookies / wmf-uniq interaction [puppet] - 10https://gerrit.wikimedia.org/r/1174501 (https://phabricator.wikimedia.org/T400753) [15:32:16] (03PS4) 10CDanis: varnish: tests: env var for which docker impl [puppet] - 10https://gerrit.wikimedia.org/r/1174505 [15:33:16] (03CR) 10CDanis: [C:03+2] varnish: fix nocookies / wmf-uniq interaction [puppet] - 10https://gerrit.wikimedia.org/r/1174501 (https://phabricator.wikimedia.org/T400753) (owner: 10CDanis) [15:35:54] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [15:36:56] (03CR) 10Cathal Mooney: [C:03+1] sre.network.tls: add Nokia SR-Linux support [cookbooks] - 10https://gerrit.wikimedia.org/r/1174407 (owner: 10Ayounsi) [15:38:40] !log Restarting Gerrit [15:38:44] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:38:46] to apply a config change [15:39:28] 06SRE, 06Traffic, 07User-notice: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11050767 (10DavidBrooks) >>! In T400119#11048975, @Joe wrote: > Oh sorry, I've never used AWB (or windows, in the last few decades). If it's a Windows application, as in used by... [15:40:43] PROBLEM - Host ps1-f4-codfw is DOWN: PING CRITICAL - Packet loss = 100% [15:41:01] RECOVERY - Host ps1-f4-codfw is UP: PING OK - Packet loss = 0%, RTA = 31.35 ms [15:41:36] FIRING: [2x] ProbeDown: Service gerrit1003:443 has failed probes (http_gerrit_tls_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#gerrit1003:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [15:41:40] Any update on if the restart has finished? [15:41:49] it hasn't [15:42:05] gerrit is back [15:42:08] :D [15:42:35] yeah usually it is a couple of minutes [15:43:34] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P80350 and previous config saved to /var/cache/conftool/dbconfig/20250731-154333-ladsgroup.json [15:44:11] generally you can tell that on whether you get an error message or not [15:44:25] FIRING: SystemdUnitFailed: helm-chartctl-package-all.service on chartmuseum1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [15:45:21] (03PS5) 10CDanis: varnish: tests: env var for which docker impl [puppet] - 10https://gerrit.wikimedia.org/r/1174505 [15:45:28] (03CR) 10CDanis: [C:03+2] varnish: tests: env var for which docker impl [puppet] - 10https://gerrit.wikimedia.org/r/1174505 (owner: 10CDanis) [15:45:59] !log dancy@deploy1003 
Installing scap version "4.194.1" for 2 host(s) [15:46:11] 06SRE, 10Hiddenparma, 13Patch-For-Review: FY25/26 WE4.3.1: edge uniques in requestctl - https://phabricator.wikimedia.org/T400753#11050837 (10CDanis) [15:46:31] RESOLVED: [4x] ProbeDown: Service gerrit1003:29418 has failed probes (tcp_gerrit_ssh_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [15:46:40] 06SRE, 10Hiddenparma, 13Patch-For-Review: FY25/26 WE4.3.1: edge uniques in requestctl - https://phabricator.wikimedia.org/T400753#11050838 (10CDanis) [15:47:45] !log dancy@deploy1003 Installation of scap version "4.194.1" completed for 2 hosts [15:47:51] (03CR) 10Vgutierrez: [C:03+1] traffic: added alert for haproxykafka_socket_dropped_messages [alerts] - 10https://gerrit.wikimedia.org/r/1174565 (https://phabricator.wikimedia.org/T400684) (owner: 10Fabfur) [15:49:25] RESOLVED: SystemdUnitFailed: helm-chartctl-package-all.service on chartmuseum1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [15:49:51] (03CR) 10JHathaway: [C:03+2] postfix: allow sasl auth clients to use non-FQDN in helo [puppet] - 10https://gerrit.wikimedia.org/r/1174741 (https://phabricator.wikimedia.org/T400882) (owner: 10JHathaway) [15:50:17] !log dancy@deploy1003 Started scap build-images: Publishing wmf/next image [15:57:13] (03PS1) 10Elukey: profile::thanos::recording_rules: add two rules for the EditCheck SLO [puppet] - 10https://gerrit.wikimedia.org/r/1174748 (https://phabricator.wikimedia.org/T395444) [15:57:15] (03PS1) 10Elukey: profile::pyrra::filesystem::slos: add edit-check ratio [puppet] - 10https://gerrit.wikimedia.org/r/1174749 (https://phabricator.wikimedia.org/T395444) [15:58:06] (03CR) 10Ottomata: [C:03+2] eventgate-analytics-external - bump to 1.19.0 to 
pick up recent ExP changes [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174745 (https://phabricator.wikimedia.org/T396359) (owner: 10Ottomata) [15:58:41] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T400854)', diff saved to https://phabricator.wikimedia.org/P80351 and previous config saved to /var/cache/conftool/dbconfig/20250731-155840-ladsgroup.json [15:58:46] (03PS1) 10Jly: miscweb(research-landing-page): bump image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174750 (https://phabricator.wikimedia.org/T399132) [15:58:47] T400854: Add rc_source_name_timestamp index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T400854 [15:58:56] !log ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance [15:59:03] (03PS1) 10Bking: golang: add trixie-based golang-1.24 image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1174751 (https://phabricator.wikimedia.org/T400295) [15:59:04] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1180 (T400854)', diff saved to https://phabricator.wikimedia.org/P80352 and previous config saved to /var/cache/conftool/dbconfig/20250731-155903-ladsgroup.json [15:59:53] (03Merged) 10jenkins-bot: eventgate-analytics-external - bump to 1.19.0 to pick up recent ExP changes [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174745 (https://phabricator.wikimedia.org/T396359) (owner: 10Ottomata) [15:59:57] !log deploying eventgate-analytics external for T398922 and T396359 [16:00:04] jhathaway and moritzm: Your horoscope predicts another Puppet request window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250731T1600). [16:00:05] Krinkle: A patch you scheduled for Puppet request window is about to be deployed. Please be around during the process. 
Note: If you break AND fix the wikis, you will be rewarded with a sticker. [16:00:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:00:06] T398922: EventGate: Add Prometheus metric for hoisting errors - https://phabricator.wikimedia.org/T398922 [16:00:06] T396359: EventGate: Log unparseable X-Experiment-Enrollments headers to an error stream - https://phabricator.wikimedia.org/T396359 [16:00:11] o/ [16:00:15] !log otto@deploy1003 helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply [16:00:25] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T400854)', diff saved to https://phabricator.wikimedia.org/P80353 and previous config saved to /var/cache/conftool/dbconfig/20250731-160024-ladsgroup.json [16:00:31] o/ [16:00:49] !log otto@deploy1003 helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply [16:00:53] !log dancy@deploy1003 Finished scap build-images: Publishing wmf/next image (duration: 10m 36s) [16:01:13] !log otto@deploy1003 helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply [16:01:28] (03PS2) 10Elukey: profile::thanos::recording_rules: add two rules for the EditCheck SLO [puppet] - 10https://gerrit.wikimedia.org/r/1174748 (https://phabricator.wikimedia.org/T395444) [16:01:28] (03PS2) 10Elukey: profile::pyrra::filesystem::slos: add edit-check ratio [puppet] - 10https://gerrit.wikimedia.org/r/1174749 (https://phabricator.wikimedia.org/T395444) [16:01:44] (03CR) 10JHathaway: [C:03+2] scap: Limit foreachwikiindblist and expanddblist in beta to beta wikis [puppet] - 10https://gerrit.wikimedia.org/r/941479 (https://phabricator.wikimedia.org/T357877) (owner: 10Krinkle) [16:02:05] Krinkle: merged [16:02:08] !log otto@deploy1003 helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply [16:02:22] jhathaway: ok, standing by on deploy03 to verify that it is a no-op. 
[16:02:25] !log otto@deploy1003 helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply [16:02:37] deploy1003* [16:02:43] Krinkle: Woohoo!! [16:02:48] (03CR) 10SBassett: [C:03+2] miscweb(research-landing-page): bump image version (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174750 (https://phabricator.wikimedia.org/T399132) (owner: 10Jly) [16:03:13] !log otto@deploy1003 helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply [16:03:25] (03CR) 10Ahmon Dancy: "Thanks for looking this over Antoine. Based on everything you've said, this is what I propose:" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1174506 (owner: 10Ahmon Dancy) [16:05:27] (03PS1) 10Jasmine: mwmaint: decommission mwmaint servers [puppet] - 10https://gerrit.wikimedia.org/r/1174753 (https://phabricator.wikimedia.org/T400442) [16:05:29] (03Merged) 10jenkins-bot: miscweb(research-landing-page): bump image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174750 (https://phabricator.wikimedia.org/T399132) (owner: 10Jly) [16:08:34] 06SRE, 10SRE-Access-Requests: Restore Taavi's analytics-privatedata-users membership - https://phabricator.wikimedia.org/T400900#11051032 (10CDobbins) [16:08:51] 06SRE, 10SRE-Access-Requests: Restore Taavi's analytics-privatedata-users membership - https://phabricator.wikimedia.org/T400900#11051033 (10CDobbins) 05Open→03In progress p:05Triage→03Medium a:03CDobbins [16:09:24] (03PS1) 10SBassett: Revert "miscweb(research-landing-page): bump image version" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174755 [16:10:09] (03PS1) 10Clare Ming: xLab: Deploy v0.8.0 release to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174756 (https://phabricator.wikimedia.org/T394527) [16:10:48] (03CR) 10Fabfur: [C:03+2] traffic: added alert for haproxykafka_socket_dropped_messages [alerts] - 10https://gerrit.wikimedia.org/r/1174565 
(https://phabricator.wikimedia.org/T400684) (owner: 10Fabfur) [16:12:17] (03CR) 10SBassett: [C:03+2] "Self +2 to fix this issue quickly." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174755 (owner: 10SBassett) [16:12:51] 06SRE, 06Traffic, 07User-notice: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11051059 (10Alien333) Where does UAs like `MediaWiki-JS/1.45.0-wmf.12`, the defaults used by a plain `new mw.Api()` in an on-wiki script, stand with this? (If they're going to... [16:14:29] (03Merged) 10jenkins-bot: Revert "miscweb(research-landing-page): bump image version" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174755 (owner: 10SBassett) [16:15:33] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P80354 and previous config saved to /var/cache/conftool/dbconfig/20250731-161532-ladsgroup.json [16:15:49] (03PS4) 10Ahmon Dancy: python: Include virtualenv packages in python base image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1174506 [16:15:58] (03CR) 10Krinkle: [C:03+1] "Applied in Beta Cluster." [puppet] - 10https://gerrit.wikimedia.org/r/1174685 (https://phabricator.wikimedia.org/T350094) (owner: 10Gergő Tisza) [16:16:50] (03CR) 10Ahmon Dancy: "Proposal reflected in patchset 4." [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1174506 (owner: 10Ahmon Dancy) [16:21:13] !log dancy@deploy1003 Started scap sync-world: Testing T398875 [16:21:19] T398875: Publish updated wmf/next container when deploying config backport or security patch - https://phabricator.wikimedia.org/T398875 [16:24:46] !log cmooney@cumin1003 START - Cookbook sre.network.tls for network device lsw1-d4-eqiad [16:24:56] !log dancy@deploy1003 Finished scap sync-world: Testing T398875 (duration: 03m 43s) [16:25:06] (03CR) 10SBassett: [C:03+2] "Ok, this LGTM now." 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1174763 (https://phabricator.wikimedia.org/T399132) (owner: 10Jly) [16:25:13] !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d4-eqiad [16:26:41] (03CR) 10Hashar: [C:03+1] "I have missed `python3-dev`! 😊 Thanks for the update!" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1174506 (owner: 10Ahmon Dancy) [16:27:25] (03Merged) 10jenkins-bot: values-security-landing-page.yaml: bump image version to 2025-07-31-153443 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174763 (https://phabricator.wikimedia.org/T399132) (owner: 10Jly) [16:29:45] (03CR) 10Ahmon Dancy: "Thanks Clément! docker-pkg is working now and I've revised this commit based on Antoine's discussion. Please re-review and merge is poss" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1174506 (owner: 10Ahmon Dancy) [16:30:40] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P80355 and previous config saved to /var/cache/conftool/dbconfig/20250731-163040-ladsgroup.json [16:31:03] !log jly@deploy1003 helmfile [staging] START helmfile.d/services/miscweb: apply [16:31:29] !log jly@deploy1003 helmfile [staging] DONE helmfile.d/services/miscweb: apply [16:31:51] !log jly@deploy1003 helmfile [codfw] START helmfile.d/services/miscweb: apply [16:32:11] !log jly@deploy1003 helmfile [codfw] DONE helmfile.d/services/miscweb: apply [16:32:12] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for resquito - https://phabricator.wikimedia.org/T399899#11051157 (10dr0ptp4kt) Thanks @ssingh. For the exchange on the documentation starting from T399899#11017204 ... > I will bring this up internally and to Moritz/Simon late... 
[16:32:21] !log jly@deploy1003 helmfile [eqiad] START helmfile.d/services/miscweb: apply [16:32:41] !log jly@deploy1003 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply [16:33:18] !log jly@deploy1003 helmfile [codfw] START helmfile.d/services/miscweb: apply [16:33:40] (03CR) 10RLazarus: [C:03+1] "Nice, thanks! Filippo, you'd have more to say about the promql style than me, but I imagine it's a busy week. :)" [puppet] - 10https://gerrit.wikimedia.org/r/1174748 (https://phabricator.wikimedia.org/T395444) (owner: 10Elukey) [16:35:35] !log jly@deploy1003 helmfile [codfw] DONE helmfile.d/services/miscweb: apply [16:35:45] !log jly@deploy1003 helmfile [eqiad] START helmfile.d/services/miscweb: apply [16:35:48] !log jly@deploy1003 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply [16:36:09] (03CR) 10RLazarus: [C:03+1] profile::pyrra::filesystem::slos: add edit-check ratio [puppet] - 10https://gerrit.wikimedia.org/r/1174749 (https://phabricator.wikimedia.org/T395444) (owner: 10Elukey) [16:45:48] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T400854)', diff saved to https://phabricator.wikimedia.org/P80356 and previous config saved to /var/cache/conftool/dbconfig/20250731-164547-ladsgroup.json [16:45:54] T400854: Add rc_source_name_timestamp index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T400854 [16:46:04] !log ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1187.eqiad.wmnet with reason: Maintenance [16:46:11] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1187 (T400854)', diff saved to https://phabricator.wikimedia.org/P80357 and previous config saved to /var/cache/conftool/dbconfig/20250731-164610-ladsgroup.json [16:51:12] !log cwhite@cumin2002 START - Cookbook sre.hosts.reimage for host logstash1034.eqiad.wmnet with OS bookworm [16:53:59] (03CR) 10BCornwall: [C:03+1] "Looks good, but let's see what @vgutierrez@wikimedia.org 
thinks" [puppet] - 10https://gerrit.wikimedia.org/r/1154085 (owner: 10CDobbins) [16:54:33] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1187 (T400854)', diff saved to https://phabricator.wikimedia.org/P80358 and previous config saved to /var/cache/conftool/dbconfig/20250731-165432-ladsgroup.json [16:54:38] T400854: Add rc_source_name_timestamp index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T400854 [16:57:12] (03PS1) 10CDanis: mesh: networkpolicy: copy patch [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174767 [16:57:12] (03PS1) 10CDanis: mesh: networkpolicy: also allow egress to 4318 (http) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174768 (https://phabricator.wikimedia.org/T344177) [17:00:05] bd808: #bothumor I ♥ Unicode. All rise for Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250731T1700). [17:00:05] Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250731T1700) [17:02:03] 06SRE, 10MediaWiki-extensions-QuickInstantCommons, 10MediaWiki-File-management, 06Traffic: Make InstantCommons and other uses of ForeignApiRepo use WMF policy-compliant user agents - https://phabricator.wikimedia.org/T400881#11051324 (10Reedy) [17:03:23] 10SRE-Access-Requests, 06Data-Engineering, 13Patch-For-Review: Restore Taavi's analytics-privatedata-users membership - https://phabricator.wikimedia.org/T400900#11051327 (10BCornwall) [17:03:31] (03CR) 10BCornwall: [C:03+1] admin: add taavi to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/1174761 (https://phabricator.wikimedia.org/T400900) (owner: 10CDobbins) [17:04:18] 06SRE, 06Traffic, 07User-notice: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11051328 (10jeremyb-phone) >>! 
In T400119#11051059, @Alien333 wrote: > Where does UAs like `MediaWiki-JS/1.45.0-wmf.12`, the defaults used by a plain `new mw.Api()` in an on-wik... [17:04:19] (03CR) 10Bking: [C:03+1] mesh: networkpolicy: copy patch [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174767 (owner: 10CDanis) [17:04:20] I don't have anything to do in my window this week. I did a developer-portal deploy on Monday that caught things up. [17:04:24] (03CR) 10BCornwall: [C:03+2] Add batch of pay-for-edit domains [dns] - 10https://gerrit.wikimedia.org/r/1174007 (https://phabricator.wikimedia.org/T400731) (owner: 10BCornwall) [17:04:25] (03CR) 10Bking: [C:03+1] mesh: networkpolicy: also allow egress to 4318 (http) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174768 (https://phabricator.wikimedia.org/T344177) (owner: 10CDanis) [17:05:00] !log brett@dns1004 START - running authdns-update [17:06:04] !log brett@dns1004 END - running authdns-update [17:07:00] (03CR) 10Ssingh: [C:03+2] varnish: Fix X-Wikimedia-Debug cookie VCL rule [puppet] - 10https://gerrit.wikimedia.org/r/1174685 (https://phabricator.wikimedia.org/T350094) (owner: 10Gergő Tisza) [17:07:42] !log sudo cumin "A:cp" "disable-puppet 'merging CR 1174685'" [17:07:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:08:21] !log cmooney@cumin1003 START - Cookbook sre.network.tls for network device lsw1-d6-eqiad [17:08:35] 10ops-eqiad, 06SRE, 06DC-Ops: RMA Damaged Pdu E14 - https://phabricator.wikimedia.org/T395971#11051336 (10Jclark-ctr) @VRiley-WMF Please complete the physical swap before Equinix installs power above, to avoid generating additional tickets for unplugging PDUs from receptacles. 
[17:09:40] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P80359 and previous config saved to /var/cache/conftool/dbconfig/20250731-170940-ladsgroup.json [17:09:46] !log cwhite@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1034.eqiad.wmnet with reason: host reimage [17:11:09] (03PS1) 10CDanis: wikifunctions: allow egress to otelcol port 4318 (http) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174772 (https://phabricator.wikimedia.org/T344177) [17:13:32] cmooney@cumin1003 tls (PID 18279) is awaiting input [17:14:05] !log cmooney@cumin1003 END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device lsw1-d6-eqiad [17:14:32] !log cwhite@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1034.eqiad.wmnet with reason: host reimage [17:21:36] (03PS2) 10Bking: P:docker::builder: Build a Trixie image [puppet] - 10https://gerrit.wikimedia.org/r/1140695 (https://phabricator.wikimedia.org/T393173) (owner: 10Majavah) [17:21:52] (03CR) 10CI reject: [V:04-1] P:docker::builder: Build a Trixie image [puppet] - 10https://gerrit.wikimedia.org/r/1140695 (https://phabricator.wikimedia.org/T393173) (owner: 10Majavah) [17:22:31] !log sudo cumin -b21 "A:cp" "run-puppet-agent --enable 'merging CR 1174685'" [17:22:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:24:48] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P80360 and previous config saved to /var/cache/conftool/dbconfig/20250731-172447-ladsgroup.json [17:25:02] (03PS3) 10Bking: P:docker::builder: Build a Trixie image [puppet] - 10https://gerrit.wikimedia.org/r/1140695 (https://phabricator.wikimedia.org/T393173) (owner: 10Majavah) [17:25:18] (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1140695 
(https://phabricator.wikimedia.org/T393173) (owner: 10Majavah) [17:25:27] (03CR) 10CI reject: [V:04-1] P:docker::builder: Build a Trixie image [puppet] - 10https://gerrit.wikimedia.org/r/1140695 (https://phabricator.wikimedia.org/T393173) (owner: 10Majavah) [17:27:33] (03PS4) 10Bking: P:docker::builder: Build a Trixie image [puppet] - 10https://gerrit.wikimedia.org/r/1140695 (https://phabricator.wikimedia.org/T393173) (owner: 10Majavah) [17:28:56] !log cmooney@cumin1003 START - Cookbook sre.network.tls for network device lsw1-d7-eqiad [17:28:57] (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1140695 (https://phabricator.wikimedia.org/T393173) (owner: 10Majavah) [17:29:12] !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d7-eqiad [17:31:17] 06SRE, 10DNS, 06FR-donorrelations, 06Traffic: Custom URL for survey pop-up - https://phabricator.wikimedia.org/T400278#11051448 (10CDobbins) [17:35:36] !log cwhite@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1034.eqiad.wmnet with OS bookworm [17:35:47] 06SRE, 10LDAP-Access-Requests: Logstash Access for gergesshamon - https://phabricator.wikimedia.org/T399421#11051480 (10CDobbins) 05Open→03Stalled p:05Triage→03Medium [17:37:55] (03CR) 10Dzahn: [C:03+2] gerrit: add httpbb monitoring to gerrit-spare [puppet] - 10https://gerrit.wikimedia.org/r/1174674 (https://phabricator.wikimedia.org/T387833) (owner: 10Arnaudb) [17:38:14] (03CR) 10Dzahn: [C:03+2] gerrit: add certificate request for gerrit-spare.w.o [puppet] - 10https://gerrit.wikimedia.org/r/1174672 (https://phabricator.wikimedia.org/T387833) (owner: 10Arnaudb) [17:38:27] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for resquito - https://phabricator.wikimedia.org/T399899#11051485 (10ssingh) >>! In T399899#11051157, @dr0ptp4kt wrote: > Thanks @ssingh. For the exchange on the documentation starting from T399899#11017204 ... 
> >> I will brin... [17:38:38] 06SRE, 10Phabricator, 06Traffic: traffic from Discord and Slack unfurler service is blocked by phabricator.wikimedia.org - https://phabricator.wikimedia.org/T400540#11051487 (10CDobbins) [17:39:56] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1187 (T400854)', diff saved to https://phabricator.wikimedia.org/P80361 and previous config saved to /var/cache/conftool/dbconfig/20250731-173955-ladsgroup.json [17:39:58] (03CR) 10Bking: "I updated this CR as T400295 is possibly blocked on a Trixie image release." [puppet] - 10https://gerrit.wikimedia.org/r/1140695 (https://phabricator.wikimedia.org/T393173) (owner: 10Majavah) [17:40:01] T400854: Add rc_source_name_timestamp index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T400854 [17:40:12] !log ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance [17:40:59] !log ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1231.eqiad.wmnet with reason: Maintenance [17:41:07] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1231 (T400854)', diff saved to https://phabricator.wikimedia.org/P80362 and previous config saved to /var/cache/conftool/dbconfig/20250731-174106-ladsgroup.json [17:42:18] !log cmooney@cumin1003 START - Cookbook sre.network.tls for network device lsw1-d8-eqiad [17:42:32] !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d8-eqiad [17:46:41] (03CR) 10Jforrester: "Can we ship it? Or is it still too premature?" 
[puppet] - 10https://gerrit.wikimedia.org/r/1140695 (https://phabricator.wikimedia.org/T393173) (owner: 10Majavah) [17:47:26] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1231 (T400854)', diff saved to https://phabricator.wikimedia.org/P80363 and previous config saved to /var/cache/conftool/dbconfig/20250731-174725-ladsgroup.json [17:47:31] T400854: Add rc_source_name_timestamp index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T400854 [17:51:12] !log cmooney@cumin1003 START - Cookbook sre.network.tls for network device lsw1-c2-eqiad [17:51:32] !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c2-eqiad [17:53:50] !log Import trafficserver 9.2.11-1wm2 into bullseye-wikimedia [17:53:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:54:38] 06SRE, 10LDAP-Access-Requests: Grant Access to gerritadmin for qchris (NDA refresh) - https://phabricator.wikimedia.org/T400847#11051553 (10Dzahn) @KFrancis In the "NDA and MOU" spreadsheet I see the columns "Has server access" and "Has LDAP NDA access". These should be updated now. Chris does NOT have shell... [17:56:06] !log brett@cumin2002 START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on P{cp7001.magru.wmnet} and A:cp - 9.2.11-1wm2 upgrade () [17:57:14] !log cmooney@cumin1003 START - Cookbook sre.network.tls for network device lsw1-c3-eqiad [17:57:25] !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c3-eqiad [18:00:04] brennen and dduvall: Your horoscope predicts another MediaWiki train - Utc-7 Version deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250731T1800). 
[18:00:34] (03CR) 10Clare Ming: [C:03+2] xLab: Deploy v0.8.0 release to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174756 (https://phabricator.wikimedia.org/T394527) (owner: 10Clare Ming) [18:00:40] (03CR) 10CDanis: [C:03+2] mesh: networkpolicy: copy patch [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174767 (owner: 10CDanis) [18:00:42] (03CR) 10CDanis: [C:03+2] mesh: networkpolicy: also allow egress to 4318 (http) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174768 (https://phabricator.wikimedia.org/T344177) (owner: 10CDanis) [18:01:53] !log brett@cumin2002 END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on P{cp7001.magru.wmnet} and A:cp - 9.2.11-1wm2 upgrade () [18:02:09] (03Merged) 10jenkins-bot: xLab: Deploy v0.8.0 release to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174756 (https://phabricator.wikimedia.org/T394527) (owner: 10Clare Ming) [18:02:34] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P80364 and previous config saved to /var/cache/conftool/dbconfig/20250731-180233-ladsgroup.json [18:02:40] (03Merged) 10jenkins-bot: mesh: networkpolicy: copy patch [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174767 (owner: 10CDanis) [18:02:41] (03Merged) 10jenkins-bot: mesh: networkpolicy: also allow egress to 4318 (http) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174768 (https://phabricator.wikimedia.org/T344177) (owner: 10CDanis) [18:05:52] !log cmooney@cumin1003 START - Cookbook sre.network.tls for network device lsw1-c4-eqiad [18:06:05] !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c4-eqiad [18:07:18] (03PS5) 10Dzahn: redirects: update SVN rewrite rules, do not link to Phabricator anymore [puppet] - 10https://gerrit.wikimedia.org/r/1167306 (https://phabricator.wikimedia.org/T119846) [18:08:18] (03CR) 
10Dzahn: "moved 2 "realistic URL"-tests to svn.wikimedia.org section. updated target, but still not 100% sure if anything /doc/foo would really be " [puppet] - 10https://gerrit.wikimedia.org/r/1167306 (https://phabricator.wikimedia.org/T119846) (owner: 10Dzahn) [18:11:24] (03CR) 10Dzahn: [C:03+2] "this seems all fine - except we can't run the tests just yet, waiting for the new cert to be issued" [puppet] - 10https://gerrit.wikimedia.org/r/1174674 (https://phabricator.wikimedia.org/T387833) (owner: 10Arnaudb) [18:12:41] (03CR) 10Dzahn: [C:03+2] "next step the apache config on gerrit2003 needs the matching virtual host" [puppet] - 10https://gerrit.wikimedia.org/r/1174672 (https://phabricator.wikimedia.org/T387833) (owner: 10Arnaudb) [18:13:41] (03CR) 10Dzahn: [C:03+1] gerrit: site.pp cleanup [puppet] - 10https://gerrit.wikimedia.org/r/1174671 (https://phabricator.wikimedia.org/T338470) (owner: 10Arnaudb) [18:15:12] !log cmooney@cumin1003 START - Cookbook sre.network.tls for network device lsw1-c5-eqiad [18:15:15] (03CR) 10Dzahn: [C:03+2] gerrit: site.pp cleanup [puppet] - 10https://gerrit.wikimedia.org/r/1174671 (https://phabricator.wikimedia.org/T338470) (owner: 10Arnaudb) [18:15:30] !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c5-eqiad [18:17:41] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P80365 and previous config saved to /var/cache/conftool/dbconfig/20250731-181740-ladsgroup.json [18:23:01] !log cmooney@cumin1003 START - Cookbook sre.network.tls for network device lsw1-c6-eqiad [18:23:47] (03PS2) 10Dzahn: zuul: add initial new-zuul config from template [puppet] - 10https://gerrit.wikimedia.org/r/1174570 (https://phabricator.wikimedia.org/T395938) [18:23:48] !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c6-eqiad [18:24:13] (03CR) 10CI reject: [V:04-1] zuul: add 
initial new-zuul config from template [puppet] - 10https://gerrit.wikimedia.org/r/1174570 (https://phabricator.wikimedia.org/T395938) (owner: 10Dzahn) [18:25:04] o/ [18:27:04] !log train 1.45.0-wmf.12 status (T396373): blocked on T400899, holding at group1 [18:27:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:27:11] T396373: 1.45.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T396373 [18:27:12] T400899: EventFormatter::formatAddress(): Argument #3 ($noAddressMsg) must be of type ?string, MediaWiki\Message\Message given in RegistrationNotificationPresentationModel - https://phabricator.wikimedia.org/T400899 [18:29:32] !log cmooney@cumin1003 START - Cookbook sre.network.tls for network device lsw1-c7-eqiad [18:29:46] !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c7-eqiad [18:32:49] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1231 (T400854)', diff saved to https://phabricator.wikimedia.org/P80366 and previous config saved to /var/cache/conftool/dbconfig/20250731-183248-ladsgroup.json [18:32:54] T400854: Add rc_source_name_timestamp index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T400854 [18:33:04] !log ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance [18:33:46] !log ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2151.codfw.wmnet with reason: Maintenance [18:33:54] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2151 (T400854)', diff saved to https://phabricator.wikimedia.org/P80367 and previous config saved to /var/cache/conftool/dbconfig/20250731-183353-ladsgroup.json [18:36:22] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2151 (T400854)', diff saved to https://phabricator.wikimedia.org/P80368 and previous config 
saved to /var/cache/conftool/dbconfig/20250731-183622-ladsgroup.json [18:43:24] !log cjming@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply [18:44:07] !log cjming@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply [18:49:43] (03PS2) 10CDanis: wikifunctions: allow egress to otelcol port 4318 (http) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174772 (https://phabricator.wikimedia.org/T344177) [18:50:59] (03PS1) 10DLynch: Fix placement of toolbar insert group on mobile [extensions/VisualEditor] (wmf/1.45.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1174795 (https://phabricator.wikimedia.org/T400933) [18:51:01] (03PS1) 10Brennen Bearnes: Notifications: fix type error and add regression test [extensions/CampaignEvents] (wmf/1.45.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1174796 (https://phabricator.wikimedia.org/T400899) [18:51:20] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, July 31 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-ite" [extensions/VisualEditor] (wmf/1.45.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1174795 (https://phabricator.wikimedia.org/T400933) (owner: 10DLynch) [18:51:30] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P80369 and previous config saved to /var/cache/conftool/dbconfig/20250731-185129-ladsgroup.json [18:52:04] 06SRE, 10LDAP-Access-Requests: Logstash Access for gergesshamon - https://phabricator.wikimedia.org/T399421#11051705 (10CDobbins) @Gerges do you still need access to Logstash? If so, is there anything we can do to help with getting the NDA and other steps done? 
[18:53:06] 06SRE, 10SRE-Access-Requests, 06MW-Interfaces-Team: Requesting access to analytics-privatedata-users, SSH and Kerberos for HCoplin-WMF - https://phabricator.wikimedia.org/T400897#11051709 (10CDobbins) 05Open→03In progress p:05Triage→03Medium a:03CDobbins [18:53:10] (03CR) 10Jforrester: [C:03+2] wikifunctions: allow egress to otelcol port 4318 (http) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174772 (https://phabricator.wikimedia.org/T344177) (owner: 10CDanis) [18:54:51] (03CR) 10Krinkle: [C:03+1] redirects: update SVN rewrite rules, do not link to Phabricator anymore [puppet] - 10https://gerrit.wikimedia.org/r/1167306 (https://phabricator.wikimedia.org/T119846) (owner: 10Dzahn) [18:54:58] (03Merged) 10jenkins-bot: wikifunctions: allow egress to otelcol port 4318 (http) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174772 (https://phabricator.wikimedia.org/T344177) (owner: 10CDanis) [18:56:21] (03CR) 10Krinkle: "Indeed. The override rule for this has not changed in this patch, and it seems broken today. 
https://svn.wikimedia.org/doc/foo today falls" [puppet] - 10https://gerrit.wikimedia.org/r/1167306 (https://phabricator.wikimedia.org/T119846) (owner: 10Dzahn) [18:57:25] !log jforrester@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: apply [18:57:28] !log jforrester@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply [18:58:10] !log jforrester@deploy1003 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply [18:58:57] !log jforrester@deploy1003 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply [18:59:02] !log jforrester@deploy1003 helmfile [codfw] START helmfile.d/services/wikifunctions: apply [18:59:40] !log jforrester@deploy1003 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply [19:06:37] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P80370 and previous config saved to /var/cache/conftool/dbconfig/20250731-190637-ladsgroup.json [19:06:52] (03PS1) 10Jforrester: wikifunctions: Set OTEL_SERVICE_NAME values here [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174799 [19:07:30] jouncebot nowandnext [19:07:30] For the next 0 hour(s) and 52 minute(s): MediaWiki train - Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250731T1800) [19:07:31] In 0 hour(s) and 52 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250731T2000) [19:08:21] (03CR) 10TrainBranchBot: [C:03+2] "Approved by brennen@deploy1003 using scap backport" [extensions/CampaignEvents] (wmf/1.45.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1174796 (https://phabricator.wikimedia.org/T400899) (owner: 10Brennen Bearnes) [19:08:41] thx James_F [19:08:51] brennen: Good luck! 
[19:09:09] (03PS2) 10Jforrester: wikifunctions: Set OTEL_SERVICE_NAME values here [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174799 [19:09:29] FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate thanos-query.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [19:09:34] (03CR) 10Jforrester: [C:03+2] wikifunctions: Set OTEL_SERVICE_NAME values here [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174799 (owner: 10Jforrester) [19:09:54] (03CR) 10CDanis: [C:03+1] wikifunctions: Set OTEL_SERVICE_NAME values here [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174799 (owner: 10Jforrester) [19:10:00] (03Merged) 10jenkins-bot: Notifications: fix type error and add regression test [extensions/CampaignEvents] (wmf/1.45.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1174796 (https://phabricator.wikimedia.org/T400899) (owner: 10Brennen Bearnes) [19:10:27] !log brennen@deploy1003 Started scap sync-world: Backport for [[gerrit:1174796|Notifications: fix type error and add regression test (T400899)]] [19:10:33] T400899: EventFormatter::formatAddress(): Argument #3 ($noAddressMsg) must be of type ?string, MediaWiki\Message\Message given in RegistrationNotificationPresentationModel - https://phabricator.wikimedia.org/T400899 [19:11:18] (03Merged) 10jenkins-bot: wikifunctions: Set OTEL_SERVICE_NAME values here [deployment-charts] - 10https://gerrit.wikimedia.org/r/1174799 (owner: 10Jforrester) [19:12:14] !log jforrester@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: apply [19:12:33] !log brennen@deploy1003 brennen: Backport for [[gerrit:1174796|Notifications: fix type error and add regression test (T400899)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[19:13:12] !log jforrester@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply [19:13:25] !log jforrester@deploy1003 helmfile [codfw] START helmfile.d/services/wikifunctions: apply [19:13:36] !log brennen@deploy1003 brennen: Continuing with sync [19:14:04] !log jforrester@deploy1003 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply [19:14:10] !log jforrester@deploy1003 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply [19:14:49] !log jforrester@deploy1003 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply [19:19:13] !log brennen@deploy1003 Finished scap sync-world: Backport for [[gerrit:1174796|Notifications: fix type error and add regression test (T400899)]] (duration: 08m 46s) [19:19:20] T400899: EventFormatter::formatAddress(): Argument #3 ($noAddressMsg) must be of type ?string, MediaWiki\Message\Message given in RegistrationNotificationPresentationModel - https://phabricator.wikimedia.org/T400899 [19:21:45] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2151 (T400854)', diff saved to https://phabricator.wikimedia.org/P80371 and previous config saved to /var/cache/conftool/dbconfig/20250731-192144-ladsgroup.json [19:21:51] T400854: Add rc_source_name_timestamp index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T400854 [19:22:01] !log ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2158.codfw.wmnet with reason: Maintenance [19:22:08] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2158 (T400854)', diff saved to https://phabricator.wikimedia.org/P80372 and previous config saved to /var/cache/conftool/dbconfig/20250731-192208-ladsgroup.json [19:23:20] (03PS1) 10TrainBranchBot: group2 to 1.45.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1174802 (https://phabricator.wikimedia.org/T396373) [19:23:22] (03CR) 10TrainBranchBot: [C:03+2] group2 to 1.45.0-wmf.12 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1174802 (https://phabricator.wikimedia.org/T396373) (owner: 10TrainBranchBot) [19:23:52] 06SRE, 10Phabricator, 06Traffic: traffic from Discord and Slack unfurler service is blocked by phabricator.wikimedia.org - https://phabricator.wikimedia.org/T400540#11051804 (10ssingh) Hi, thanks for reporting @Novem_Linguae. The issue should be resolved now; I did a quick test but let us know if there is an... [19:24:19] (03Merged) 10jenkins-bot: group2 to 1.45.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1174802 (https://phabricator.wikimedia.org/T396373) (owner: 10TrainBranchBot) [19:24:40] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2158 (T400854)', diff saved to https://phabricator.wikimedia.org/P80373 and previous config saved to /var/cache/conftool/dbconfig/20250731-192440-ladsgroup.json [19:26:29] !log cwhite@cumin2002 START - Cookbook sre.hosts.reimage for host logstash1035.eqiad.wmnet with OS bookworm [19:28:31] 06SRE, 10LDAP-Access-Requests: Grant Access to gerritadmin for qchris (NDA refresh) - https://phabricator.wikimedia.org/T400847#11051813 (10KFrancis) Hi @Dzahn, I have made the corrections on the spreadsheet. I'll also add an expiration date for those NDAs with a specific end date. [19:28:34] 06SRE, 10Phabricator, 06Traffic: traffic from Discord and Slack unfurler service is blocked by phabricator.wikimedia.org - https://phabricator.wikimedia.org/T400540#11051814 (10Michael) Woohoo! Can confirm! Thank you so much @Novem_Linguae, @ssingh and Traffic Team! 
🏆 [19:32:19] !log brennen@deploy1003 rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.12 refs T396373 [19:32:25] T396373: 1.45.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T396373 [19:37:27] 06SRE, 10SRE-Access-Requests, 06MW-Interfaces-Team: Requesting access to analytics-privatedata-users, SSH and Kerberos for HCoplin-WMF - https://phabricator.wikimedia.org/T400897#11051857 (10CDobbins) @HCoplin-WMF would you mind pasting your pubkey on your metawiki page? If that isn't an option, feel free to... [19:39:48] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P80374 and previous config saved to /var/cache/conftool/dbconfig/20250731-193948-ladsgroup.json [19:45:09] !log cwhite@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1035.eqiad.wmnet with reason: host reimage [19:48:52] !log cwhite@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1035.eqiad.wmnet with reason: host reimage [19:54:48] FIRING: ThumborHighHaproxyErrorRate: Thumbor haproxy error rate for pod thumbor-main-5575cbcfcf-2flqn - eqiad - https://wikitech.wikimedia.org/wiki/Thumbor - https://grafana.wikimedia.org/d/Pukjw6cWk/thumbor - https://alerts.wikimedia.org/?q=alertname%3DThumborHighHaproxyErrorRate [19:54:56] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P80375 and previous config saved to /var/cache/conftool/dbconfig/20250731-195455-ladsgroup.json [20:00:04] RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: Your horoscope predicts another UTC late backport window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250731T2000). [20:00:05] kemayo: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. 
Note: If you break AND fix the wikis, you will be rewarded with a sticker. [20:01:50] I am here, and can deploy my own patch. [20:02:06] Looks like I am the only one in the window, so I will get started with that. [20:02:33] (03CR) 10TrainBranchBot: [C:03+2] "Approved by kemayo@deploy1003 using scap backport" [extensions/VisualEditor] (wmf/1.45.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1174795 (https://phabricator.wikimedia.org/T400933) (owner: 10DLynch) [20:07:44] !log cwhite@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1035.eqiad.wmnet with OS bookworm [20:09:30] (03CR) 10Dzahn: "should euwikipedia.in also redirect like euwikipedia.org to eu.wikipedia.org ? do you have an opinion Pppery?" [puppet] - 10https://gerrit.wikimedia.org/r/1174539 (https://phabricator.wikimedia.org/T400731) (owner: 10BCornwall) [20:10:03] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2158 (T400854)', diff saved to https://phabricator.wikimedia.org/P80376 and previous config saved to /var/cache/conftool/dbconfig/20250731-201003-ladsgroup.json [20:10:09] T400854: Add rc_source_name_timestamp index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T400854 [20:10:19] !log ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance [20:10:27] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2169 (T400854)', diff saved to https://phabricator.wikimedia.org/P80377 and previous config saved to /var/cache/conftool/dbconfig/20250731-201026-ladsgroup.json [20:10:43] (03PS3) 10Dzahn: zuul: add initial new-zuul config from template [puppet] - 10https://gerrit.wikimedia.org/r/1174570 (https://phabricator.wikimedia.org/T395938) [20:12:31] (03CR) 10Pppery: [C:03+1] "No, that domain is noise. `.in` is the TLD for India, which has no relation whatsoever to the Basque Country in France and Spain." 
[puppet] - 10https://gerrit.wikimedia.org/r/1174539 (https://phabricator.wikimedia.org/T400731) (owner: 10BCornwall) [20:12:51] 06SRE, 10LDAP-Access-Requests: Grant Access to gerritadmin for qchris (NDA refresh) - https://phabricator.wikimedia.org/T400847#11051898 (10KFrancis) Also, the NDA is out for signatures. I'll confirm when it's complete [20:12:59] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2169 (T400854)', diff saved to https://phabricator.wikimedia.org/P80378 and previous config saved to /var/cache/conftool/dbconfig/20250731-201259-ladsgroup.json [20:13:38] (03CR) 10BCornwall: [C:03+2] ncredir: Add batch of pay-for-edit domains [puppet] - 10https://gerrit.wikimedia.org/r/1174539 (https://phabricator.wikimedia.org/T400731) (owner: 10BCornwall) [20:14:17] (03CR) 10Dzahn: [C:03+1] "ack, thanks! lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/1174539 (https://phabricator.wikimedia.org/T400731) (owner: 10BCornwall) [20:15:22] 06SRE, 10LDAP-Access-Requests: Grant Access to gerritadmin for qchris (NDA refresh) - https://phabricator.wikimedia.org/T400847#11051902 (10Dzahn) Thanks, Katie, perfect! [20:15:28] (03CR) 10CI reject: [V:04-1] Fix placement of toolbar insert group on mobile [extensions/VisualEditor] (wmf/1.45.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1174795 (https://phabricator.wikimedia.org/T400933) (owner: 10DLynch) [20:16:41] Well, that test failed on `"https://repo.packagist.org/p2/spomky-labs/base64url.json" does not contain valid JSON` ...so I'm going right back to try again. 
[20:17:10] (03CR) 10TrainBranchBot: [C:03+2] "Approved by kemayo@deploy1003 using scap backport" [extensions/VisualEditor] (wmf/1.45.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1174795 (https://phabricator.wikimedia.org/T400933) (owner: 10DLynch) [20:17:25] !log brett@cumin2002 START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on P{cp7002.magru.wmnet} and A:cp - 9.2.11-1wm2 upgrade () [20:18:10] (03CR) 10DLynch: "recheck" [extensions/VisualEditor] (wmf/1.45.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1174795 (https://phabricator.wikimedia.org/T400933) (owner: 10DLynch) [20:20:49] (03Merged) 10jenkins-bot: Fix placement of toolbar insert group on mobile [extensions/VisualEditor] (wmf/1.45.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1174795 (https://phabricator.wikimedia.org/T400933) (owner: 10DLynch) [20:21:22] !log kemayo@deploy1003 Started scap sync-world: Backport for [[gerrit:1174795|Fix placement of toolbar insert group on mobile (T400933)]] [20:21:28] T400933: Insert menu in wrong place on mobile, on wikis using $wgVisualEditorMobileInsertMenu - https://phabricator.wikimedia.org/T400933 [20:22:46] (03PS1) 10BCornwall: ncredir: "rewrite", not "redirect" [puppet] - 10https://gerrit.wikimedia.org/r/1174811 [20:23:17] !log brett@cumin2002 END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on P{cp7002.magru.wmnet} and A:cp - 9.2.11-1wm2 upgrade () [20:24:09] (03PS2) 10BCornwall: ncredir: "rewrite", not "redirect" [puppet] - 10https://gerrit.wikimedia.org/r/1174811 [20:26:07] (03CR) 10Dzahn: [C:03+1] "yea, given that the file is called redirects.dat and the entire server nc-redir, this is like a trap.." [puppet] - 10https://gerrit.wikimedia.org/r/1174811 (owner: 10BCornwall) [20:26:44] (03CR) 10Pppery: [C:03+1] "Indeed." 
[puppet] - 10https://gerrit.wikimedia.org/r/1174811 (owner: 10BCornwall) [20:27:09] (03PS3) 10BCornwall: ncredir: "rewrite", not "redirect" [puppet] - 10https://gerrit.wikimedia.org/r/1174811 [20:28:07] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P80379 and previous config saved to /var/cache/conftool/dbconfig/20250731-202806-ladsgroup.json [20:28:27] (03CR) 10BCornwall: [V:03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6472/console" [puppet] - 10https://gerrit.wikimedia.org/r/1174811 (owner: 10BCornwall) [20:28:35] (03CR) 10Dzahn: [C:03+2] "https://puppet-compiler.wmflabs.org/output/1174570/6473/zuul1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/1174570 (https://phabricator.wikimedia.org/T395938) (owner: 10Dzahn) [20:29:06] (03CR) 10BCornwall: [V:03+1 C:03+2] ncredir: "rewrite", not "redirect" [puppet] - 10https://gerrit.wikimedia.org/r/1174811 (owner: 10BCornwall) [20:32:42] (03CR) 10BCornwall: [C:03+2] acme-chief: Add batch of pay-for-edit domains [puppet] - 10https://gerrit.wikimedia.org/r/1174010 (https://phabricator.wikimedia.org/T400731) (owner: 10BCornwall) [20:35:11] !log kemayo@deploy1003 kemayo: Backport for [[gerrit:1174795|Fix placement of toolbar insert group on mobile (T400933)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [20:35:17] T400933: Insert menu in wrong place on mobile, on wikis using $wgVisualEditorMobileInsertMenu - https://phabricator.wikimedia.org/T400933 [20:38:12] kemayo: I'm taking a look [20:39:01] dancy: thanks! [20:39:34] (Context since it's not obvious from the above logs: the testservers are all 503ing.) 
[20:40:41] !log dancy@deploy1003 Started scap sync-world: Backport for [[gerrit:1174795|Fix placement of toolbar insert group on mobile (T400933)]] [20:40:47] T400933: Insert menu in wrong place on mobile, on wikis using $wgVisualEditorMobileInsertMenu - https://phabricator.wikimedia.org/T400933 [20:43:15] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P80382 and previous config saved to /var/cache/conftool/dbconfig/20250731-204314-ladsgroup.json [20:43:26] kemayo: I think the issue was caused by a recent change to scap. I'm working on a retry right now. It'll be about 10 minutes before something interesting happens. [20:44:27] dancy: Thanks -- I don't have to be anywhere, so I can hang around to test things if it gets to a state where that's doable. [20:44:39] Thx [20:44:50] 06SRE, 10Phabricator, 06Traffic: traffic from Discord and Slack unfurler service is blocked by phabricator.wikimedia.org - https://phabricator.wikimedia.org/T400540#11051974 (10AntiCompositeNumber) Still no cards on Discord, including brand new tasks. 
[20:46:14] PROBLEM - Disk space on an-worker1129 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/b 149859 MB (3% inode=99%): /var/lib/hadoop/data/l 158522 MB (4% inode=99%): /var/lib/hadoop/data/k 152537 MB (4% inode=99%): /var/lib/hadoop/data/c 155188 MB (4% inode=99%): /var/lib/hadoop/data/d 158392 MB (4% inode=99%): /var/lib/hadoop/data/e 153591 MB (4% inode=99%): /var/lib/hadoop/data/g 154126 MB (4% inode=99%): /var/lib/hadoop/data [20:46:14] 9 MB (4% inode=99%): /var/lib/hadoop/data/i 153526 MB (4% inode=99%): /var/lib/hadoop/data/j 151269 MB (4% inode=99%): /var/lib/hadoop/data/m 151103 MB (4% inode=99%): /var/lib/hadoop/data/f 158847 MB (4% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-worker1129&var-datasource=eqiad+prometheus/ops [20:46:38] !log brett@cumin2002 START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on P{cp70[03-16].magru.wmnet} and A:cp - 9.2.11-1wm2 upgrade () [20:52:10] 10ops-codfw, 06SRE, 10SRE-swift-storage, 06DC-Ops: Install new disk controllers to SM swift backends (codfw) - https://phabricator.wikimedia.org/T400876#11052002 (10wiki_willy) Hi @Jclark-ctr - can you provide info on where the controllers from T393941 are, for you and @VRiley-WMF to work with Matthew on t... 
[20:52:48] 06SRE, 10LDAP-Access-Requests: Grant Access to nda & logstash for Novem Linguae - https://phabricator.wikimedia.org/T400176#11052005 (10CDobbins) 05Open→03In progress p:05Triage→03Medium a:03CDobbins [20:53:17] (03PS1) 10Ottomata: WIP - Tweak EventgateProduceRateAnomaly [alerts] - 10https://gerrit.wikimedia.org/r/1174819 [20:55:03] 06SRE, 10LDAP-Access-Requests: Grant Access to nda & logstash for Novem Linguae - https://phabricator.wikimedia.org/T400176#11052011 (10CDobbins) [20:55:34] 06SRE, 10LDAP-Access-Requests: Grant Access to nda & logstash for Novem Linguae - https://phabricator.wikimedia.org/T400176#11052014 (10CDobbins) [20:55:59] PROBLEM - HTTPS non-canonical-redirect-15 on ncredir1001 is CRITICAL: SSL CRITICAL - failed to verify wiki-experts.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 20:35:17 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [20:56:11] PROBLEM - HTTPS non-canonical-redirect-15 on ncredir4002 is CRITICAL: SSL CRITICAL - failed to verify wiki-experts.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 20:35:17 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [20:58:22] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2169 (T400854)', diff saved to https://phabricator.wikimedia.org/P80383 and previous config saved to /var/cache/conftool/dbconfig/20250731-205821-ladsgroup.json [20:58:28] T400854: Add rc_source_name_timestamp index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T400854 [20:58:31] ^ bunch of new domains was added earlier.. 
thats why [20:58:37] !log ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance [20:58:45] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2180 (T400854)', diff saved to https://phabricator.wikimedia.org/P80384 and previous config saved to /var/cache/conftool/dbconfig/20250731-205844-ladsgroup.json [20:58:46] ACK. [20:59:17] brett: ncredir expiry warnings.. somehow Snakeoil cert [21:00:05] Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250731T2100) [21:00:25] mutante: ack [21:00:29] PROBLEM - HTTPS non-canonical-redirect-15 on ncredir4001 is CRITICAL: SSL CRITICAL - failed to verify wiki-experts.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 20:35:17 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:00:29] PROBLEM - HTTPS non-canonical-redirect-15 on ncredir6001 is CRITICAL: SSL CRITICAL - failed to verify wiki-experts.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 20:35:17 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:01:09] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2180 (T400854)', diff saved to https://phabricator.wikimedia.org/P80385 and previous config saved to /var/cache/conftool/dbconfig/20250731-210108-ladsgroup.json [21:01:10] !log dancy@deploy1003 kemayo, dancy: Backport for [[gerrit:1174795|Fix placement of toolbar insert group on mobile (T400933)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[21:01:20] T400933: Insert menu in wrong place on mobile, on wikis using $wgVisualEditorMobileInsertMenu - https://phabricator.wikimedia.org/T400933 [21:01:54] 06SRE, 10LDAP-Access-Requests: Grant Access to nda & logstash for Novem Linguae - https://phabricator.wikimedia.org/T400176#11052028 (10CDobbins) @Milimetric @Ahoelzl @Ottomata could any of you confirm that access should be granted to the logstash group? Thanks! [21:01:55] dancy: Things look good on the testservers now. [21:02:44] !log dancy@deploy1003 kemayo, dancy: Continuing with sync [21:04:45] PROBLEM - HTTPS non-canonical-redirect-18 on ncredir2001 is CRITICAL: SSL CRITICAL - failed to verify wikiwritingservice.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 20:35:17 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:08:45] RECOVERY - HTTPS non-canonical-redirect-18 on ncredir2001 is OK: SSL OK - Certificate wikiwritingservice.com valid until 2025-10-29 19:44:50 +0000 (expires in 89 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:09:03] PROBLEM - HTTPS non-canonical-redirect-15 on ncredir2002 is CRITICAL: SSL CRITICAL - failed to verify wiki-experts.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 20:35:17 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:09:03] PROBLEM - HTTPS non-canonical-redirect-15 on ncredir3004 is CRITICAL: SSL CRITICAL - failed to verify wiki-experts.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 20:35:17 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:09:03] PROBLEM - HTTPS non-canonical-redirect-15 on ncredir5002 is CRITICAL: SSL CRITICAL - failed to verify wiki-experts.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 20:35:17 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:09:03] PROBLEM - HTTPS non-canonical-redirect-15 on ncredir7004 is CRITICAL: SSL 
CRITICAL - failed to verify wiki-experts.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 20:35:17 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:14:35] !log dancy@deploy1003 Finished scap sync-world: Backport for [[gerrit:1174795|Fix placement of toolbar insert group on mobile (T400933)]] (duration: 33m 54s) [21:14:41] T400933: Insert menu in wrong place on mobile, on wikis using $wgVisualEditorMobileInsertMenu - https://phabricator.wikimedia.org/T400933 [21:15:18] !log dancy@deploy1003 Installing scap version "4.193.0" for 2 host(s) [21:16:17] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P80386 and previous config saved to /var/cache/conftool/dbconfig/20250731-211616-ladsgroup.json [21:16:58] !log dancy@deploy1003 Installation of scap version "4.193.0" completed for 2 hosts [21:17:37] PROBLEM - HTTPS non-canonical-redirect-15 on ncredir3003 is CRITICAL: SSL CRITICAL - failed to verify wiki-experts.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 20:35:17 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:17:37] PROBLEM - HTTPS non-canonical-redirect-15 on ncredir7003 is CRITICAL: SSL CRITICAL - failed to verify wiki-experts.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 20:35:17 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:17:39] PROBLEM - HTTPS non-canonical-redirect-15 on ncredir5001 is CRITICAL: SSL CRITICAL - failed to verify wiki-experts.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 20:35:17 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:18:11] brett: ^ this known? [21:18:14] *us [21:18:16] er *is [21:18:29] Yep, working on it now [21:19:07] <3 [21:21:01] kemayo: I'm going to use your change for more backport testing. 
[21:21:51] only domains are affected that were not working in the first place [21:21:53] PROBLEM - HTTPS non-canonical-redirect-15 on ncredir1002 is CRITICAL: SSL CRITICAL - failed to verify wiki-experts.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 20:35:17 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:21:53] PROBLEM - HTTPS non-canonical-redirect-15 on ncredir2001 is CRITICAL: SSL CRITICAL - failed to verify wiki-experts.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 20:35:17 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:21:54] !log dancy@deploy1003 Installing scap version "4.194.1" for 2 host(s) [21:21:55] PROBLEM - HTTPS non-canonical-redirect-15 on ncredir6002 is CRITICAL: SSL CRITICAL - failed to verify wiki-experts.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 20:35:17 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:22:02] dancy: Go for it. Ping me if it turns out I need to do anything, I guess. [21:22:03] !log Deleting /var/lib/acme-chief/certs/non-canonical-redirect-{13..18} from acme-chief to force regeneration of certs [21:22:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:22:44] Kemayo: You will get the notifications but you can ignore them. 
[21:22:59] 👍🏻 [21:23:34] !log dancy@deploy1003 Installation of scap version "4.194.1" completed for 2 hosts [21:24:16] !log dancy@deploy1003 Started scap sync-world: Backport for [[gerrit:1174795|Fix placement of toolbar insert group on mobile (T400933)]] [21:24:22] T400933: Insert menu in wrong place on mobile, on wikis using $wgVisualEditorMobileInsertMenu - https://phabricator.wikimedia.org/T400933 [21:27:37] PROBLEM - HTTPS non-canonical-redirect-17 on ncredir3004 is CRITICAL: SSL CRITICAL - failed to verify wikipediaparticlecreation.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:25:20 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:27:45] PROBLEM - HTTPS non-canonical-redirect-14 on ncredir3004 is CRITICAL: SSL CRITICAL - failed to verify indwikipedia.in against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:25:20 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:27:55] PROBLEM - HTTPS non-canonical-redirect-18 on ncredir3004 is CRITICAL: SSL CRITICAL - failed to verify wikiwritingservice.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:25:20 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:28:15] PROBLEM - HTTPS non-canonical-redirect-13 on ncredir3004 is CRITICAL: SSL CRITICAL - failed to verify 2wikipedia.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:25:20 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:28:19] PROBLEM - HTTPS non-canonical-redirect-16 on ncredir3004 is CRITICAL: SSL CRITICAL - failed to verify wikipediaadmin.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:25:20 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:29:21] 06SRE, 10Phabricator, 06Traffic: traffic from Discord and Slack unfurler service is blocked by phabricator.wikimedia.org - 
https://phabricator.wikimedia.org/T400540#11052148 (10ssingh) >>! In T400540#11051974, @AntiCompositeNumber wrote: > Still no cards on Discord, including brand new tasks. Sorry about th... [21:29:44] !log dancy@deploy1003 dancy, kemayo: Backport for [[gerrit:1174795|Fix placement of toolbar insert group on mobile (T400933)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [21:29:50] T400933: Insert menu in wrong place on mobile, on wikis using $wgVisualEditorMobileInsertMenu - https://phabricator.wikimedia.org/T400933 [21:30:31] !log dancy@deploy1003 dancy, kemayo: Continuing with sync [21:31:24] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P80387 and previous config saved to /var/cache/conftool/dbconfig/20250731-213124-ladsgroup.json [21:32:55] PROBLEM - HTTPS non-canonical-redirect-16 on ncredir7003 is CRITICAL: SSL CRITICAL - failed to verify wikipediaadmin.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:29:07 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:33:11] PROBLEM - HTTPS non-canonical-redirect-17 on ncredir7003 is CRITICAL: SSL CRITICAL - failed to verify wikipediaparticlecreation.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:29:07 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:33:21] PROBLEM - HTTPS non-canonical-redirect-14 on ncredir7003 is CRITICAL: SSL CRITICAL - failed to verify indwikipedia.in against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:29:07 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:33:23] PROBLEM - HTTPS non-canonical-redirect-13 on ncredir7003 is CRITICAL: SSL CRITICAL - failed to verify 2wikipedia.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:29:07 +0000 (expires in 2 days) 
https://wikitech.wikimedia.org/wiki/Ncredir [21:33:29] PROBLEM - HTTPS non-canonical-redirect-18 on ncredir7003 is CRITICAL: SSL CRITICAL - failed to verify wikiwritingservice.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:29:07 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:34:33] PROBLEM - HTTPS non-canonical-redirect-17 on ncredir1001 is CRITICAL: SSL CRITICAL - failed to verify wikipediaparticlecreation.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:29:07 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:34:51] PROBLEM - HTTPS non-canonical-redirect-18 on ncredir1001 is CRITICAL: SSL CRITICAL - failed to verify wikiwritingservice.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:29:07 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:34:55] PROBLEM - HTTPS non-canonical-redirect-13 on ncredir1001 is CRITICAL: SSL CRITICAL - failed to verify 2wikipedia.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:29:07 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:35:11] PROBLEM - HTTPS non-canonical-redirect-14 on ncredir1001 is CRITICAL: SSL CRITICAL - failed to verify indwikipedia.in against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:29:07 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:35:15] PROBLEM - HTTPS non-canonical-redirect-16 on ncredir1001 is CRITICAL: SSL CRITICAL - failed to verify wikipediaadmin.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:29:07 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:35:51] PROBLEM - HTTPS non-canonical-redirect-13 on ncredir1002 is CRITICAL: SSL CRITICAL - failed to verify 2wikipedia.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:29:07 +0000 (expires in 2 days) 
https://wikitech.wikimedia.org/wiki/Ncredir [21:38:10] !log dancy@deploy1003 Finished scap sync-world: Backport for [[gerrit:1174795|Fix placement of toolbar insert group on mobile (T400933)]] (duration: 13m 54s) [21:38:16] T400933: Insert menu in wrong place on mobile, on wikis using $wgVisualEditorMobileInsertMenu - https://phabricator.wikimedia.org/T400933 [21:41:55] PROBLEM - HTTPS non-canonical-redirect-13 on ncredir2001 is CRITICAL: SSL CRITICAL - failed to verify 2wikipedia.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:29:07 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:42:15] FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.376s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [21:42:52] !log dzahn@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1002.eqiad.wmnet with reason: known and WIP [21:43:19] !log dzahn@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1001.eqiad.wmnet with reason: known and WIP [21:43:43] 10ops-eqiad, 06SRE, 10SRE-swift-storage, 06DC-Ops: Install new disk controllers to SM swift backends (eqiad) - https://phabricator.wikimedia.org/T400877#11052206 (10wiki_willy) Hi @Jclark-ctr - can you provide info on where the controllers from T393941 are, so that you and @VRiley-WMF can work with Matthew... 
[21:43:46] !log dzahn@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir2001.codfw.wmnet with reason: known and WIP [21:45:01] PROBLEM - HTTPS non-canonical-redirect-13 on ncredir6002 is CRITICAL: SSL CRITICAL - failed to verify 2wikipedia.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:29:07 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:45:21] I'm going to revert [21:45:59] PROBLEM - HTTPS non-canonical-redirect-13 on ncredir4002 is CRITICAL: SSL CRITICAL - failed to verify 2wikipedia.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:29:07 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:46:00] ack [21:46:10] (03PS1) 10BCornwall: Revert "acme-chief: Add batch of pay-for-edit domains" [puppet] - 10https://gerrit.wikimedia.org/r/1174829 [21:46:25] PROBLEM - HTTPS non-canonical-redirect-13 on ncredir5001 is CRITICAL: SSL CRITICAL - failed to verify 2wikipedia.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:29:07 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:46:29] 10ops-eqiad, 06SRE, 10SRE-swift-storage, 06DC-Ops: Install new disk controllers to SM swift backends (eqiad) - https://phabricator.wikimedia.org/T400877#11052211 (10Jclark-ctr) >>! In T400877#11052205, @wiki_willy wrote: > Hi @Jclark-ctr - can you provide info on where the controllers from T393941 are, so... 
[21:46:32] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2180 (T400854)', diff saved to https://phabricator.wikimedia.org/P80388 and previous config saved to /var/cache/conftool/dbconfig/20250731-214631-ladsgroup.json [21:46:38] T400854: Add rc_source_name_timestamp index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T400854 [21:46:47] !log ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2193.codfw.wmnet with reason: Maintenance [21:46:55] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2193 (T400854)', diff saved to https://phabricator.wikimedia.org/P80389 and previous config saved to /var/cache/conftool/dbconfig/20250731-214654-ladsgroup.json [21:47:15] RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.376s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [21:47:59] PROBLEM - HTTPS non-canonical-redirect-13 on ncredir4001 is CRITICAL: SSL CRITICAL - failed to verify 2wikipedia.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:29:07 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:48:05] (03PS1) 10BCornwall: ncredir: Revert addition of for-pay domains [puppet] - 10https://gerrit.wikimedia.org/r/1174830 [21:48:18] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2193 (T400854)', diff saved to https://phabricator.wikimedia.org/P80390 and previous config saved to /var/cache/conftool/dbconfig/20250731-214817-ladsgroup.json [21:49:25] PROBLEM - HTTPS non-canonical-redirect-13 on ncredir5002 is CRITICAL: SSL CRITICAL - failed to verify 
2wikipedia.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:29:07 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:49:32] (03CR) 10Dzahn: [C:03+1] Revert "acme-chief: Add batch of pay-for-edit domains" [puppet] - 10https://gerrit.wikimedia.org/r/1174829 (owner: 10BCornwall) [21:49:56] (03CR) 10BCornwall: [C:03+2] Revert "acme-chief: Add batch of pay-for-edit domains" [puppet] - 10https://gerrit.wikimedia.org/r/1174829 (owner: 10BCornwall) [21:50:59] (03CR) 10Dzahn: [C:03+1] ncredir: Revert addition of for-pay domains [puppet] - 10https://gerrit.wikimedia.org/r/1174830 (owner: 10BCornwall) [21:51:21] (03CR) 10BCornwall: [C:03+2] ncredir: Revert addition of for-pay domains [puppet] - 10https://gerrit.wikimedia.org/r/1174830 (owner: 10BCornwall) [21:52:33] PROBLEM - HTTPS non-canonical-redirect-13 on ncredir2002 is CRITICAL: SSL CRITICAL - failed to verify 2wikipedia.com against Snakeoil cert:Certificate Snakeoil cert valid until 2025-08-03 21:29:07 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:53:26] 10ops-codfw, 06SRE, 10SRE-swift-storage, 06DC-Ops: Install new disk controllers to SM swift backends (codfw) - https://phabricator.wikimedia.org/T400876#11052260 (10Jhancock.wm) @MatthewVernon I can take care of that first one in the morning. Hopefully starting next week we can get one done a day to get th... 
[21:55:29] 10ops-codfw, 06SRE, 10SRE-swift-storage, 06DC-Ops: Install new disk controllers to SM swift backends (codfw) - https://phabricator.wikimedia.org/T400876#11052263 (10wiki_willy) a:03Jhancock.wm [21:55:29] jouncebot: nowandnext [21:55:29] For the next 0 hour(s) and 4 minute(s): Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250731T2100) [21:55:30] In 8 hour(s) and 4 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250801T0600) [21:55:34] (03CR) 10Zabe: [C:03+2] Stop setting wgGlobalUsageDatabase [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1174715 (https://phabricator.wikimedia.org/T400169) (owner: 10Zabe) [21:56:15] RECOVERY - HTTPS non-canonical-redirect-13 on ncredir3004 is OK: SSL OK - Certificate 2wikipedia.com valid until 2025-10-29 20:56:09 +0000 (expires in 89 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:56:26] (03Merged) 10jenkins-bot: Stop setting wgGlobalUsageDatabase [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1174715 (https://phabricator.wikimedia.org/T400169) (owner: 10Zabe) [21:56:33] RECOVERY - HTTPS non-canonical-redirect-13 on ncredir2002 is OK: SSL OK - Certificate 2wikipedia.com valid until 2025-10-29 20:56:09 +0000 (expires in 89 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:56:45] !log zabe@deploy1003 Started scap sync-world: Backport for [[gerrit:1174715|Stop setting wgGlobalUsageDatabase (T400169)]] [21:56:51] T400169: Convert GlobalUsage to virtual domains - https://phabricator.wikimedia.org/T400169 [21:56:51] RECOVERY - HTTPS non-canonical-redirect-13 on ncredir1002 is OK: SSL OK - Certificate 2wikipedia.com valid until 2025-10-29 20:56:09 +0000 (expires in 89 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:56:55] RECOVERY - HTTPS non-canonical-redirect-13 on ncredir1001 is OK: SSL OK - Certificate 2wikipedia.com valid until 2025-10-29 20:56:09 +0000 (expires in 89 days) 
https://wikitech.wikimedia.org/wiki/Ncredir [21:56:55] RECOVERY - HTTPS non-canonical-redirect-13 on ncredir2001 is OK: SSL OK - Certificate 2wikipedia.com valid until 2025-10-29 20:56:09 +0000 (expires in 89 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:56:59] RECOVERY - HTTPS non-canonical-redirect-13 on ncredir4001 is OK: SSL OK - Certificate 2wikipedia.com valid until 2025-10-29 20:56:09 +0000 (expires in 89 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:56:59] RECOVERY - HTTPS non-canonical-redirect-13 on ncredir4002 is OK: SSL OK - Certificate 2wikipedia.com valid until 2025-10-29 20:56:09 +0000 (expires in 89 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:57:10] Zabe: Lemme know if you have any deployment issues. [21:57:23] RECOVERY - HTTPS non-canonical-redirect-13 on ncredir7003 is OK: SSL OK - Certificate 2wikipedia.com valid until 2025-10-29 20:56:09 +0000 (expires in 89 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:57:23] RECOVERY - HTTPS non-canonical-redirect-13 on ncredir5001 is OK: SSL OK - Certificate 2wikipedia.com valid until 2025-10-29 20:56:09 +0000 (expires in 89 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:57:25] RECOVERY - HTTPS non-canonical-redirect-13 on ncredir5002 is OK: SSL OK - Certificate 2wikipedia.com valid until 2025-10-29 20:56:09 +0000 (expires in 89 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:57:45] dancy: Is there any issue with me deploying right now or is it just a new scap version or something like that? [21:57:59] RECOVERY - HTTPS non-canonical-redirect-13 on ncredir6002 is OK: SSL OK - Certificate 2wikipedia.com valid until 2025-10-29 20:56:09 +0000 (expires in 89 days) https://wikitech.wikimedia.org/wiki/Ncredir [21:58:06] zabe: We're running a new version of scap which might have a bug. 
[21:58:11] Alright [21:58:19] PROBLEM - HTTPS non-canonical-redirect-16 on ncredir2002 is CRITICAL: SSL CRITICAL - failed to verify wikipediaadmin.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, voyagewi [21:58:19] voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:58:37] PROBLEM - HTTPS non-canonical-redirect-17 on ncredir2002 is CRITICAL: SSL CRITICAL - failed to verify wikipediaparticlecreation.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.co [21:58:37] ewiki.com, voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:58:43] !log zabe@deploy1003 zabe: Backport for [[gerrit:1174715|Stop setting wgGlobalUsageDatabase (T400169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[21:58:45] PROBLEM - HTTPS non-canonical-redirect-14 on ncredir2002 is CRITICAL: SSL CRITICAL - failed to verify indwikipedia.in against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, voyagewiki. [21:58:45] agewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:58:45] PROBLEM - HTTPS non-canonical-redirect-17 on ncredir4002 is CRITICAL: SSL CRITICAL - failed to verify wikipediaparticlecreation.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.co [21:58:45] ewiki.com, voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:58:45] PROBLEM - HTTPS non-canonical-redirect-16 on ncredir4001 is CRITICAL: SSL CRITICAL - failed to verify wikipediaadmin.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, voyagewi [21:58:45] voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir 
[21:58:55] PROBLEM - HTTPS non-canonical-redirect-18 on ncredir2002 is CRITICAL: SSL CRITICAL - failed to verify wikiwritingservice.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, voya [21:58:55] om, voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:58:57] PROBLEM - HTTPS non-canonical-redirect-14 on ncredir4002 is CRITICAL: SSL CRITICAL - failed to verify indwikipedia.in against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, voyagewiki. 
[21:58:57] agewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:58:57] PROBLEM - HTTPS non-canonical-redirect-16 on ncredir3003 is CRITICAL: SSL CRITICAL - failed to verify wikipediaadmin.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, voyagewi [21:58:57] voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:58:59] PROBLEM - HTTPS non-canonical-redirect-18 on ncredir7004 is CRITICAL: SSL CRITICAL - failed to verify wikiwritingservice.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, voya [21:58:59] om, voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:03] PROBLEM - HTTPS non-canonical-redirect-17 on ncredir6001 is CRITICAL: SSL CRITICAL - failed to verify wikipediaparticlecreation.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.co 
[21:59:03] ewiki.com, voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:03] PROBLEM - HTTPS non-canonical-redirect-18 on ncredir4002 is CRITICAL: SSL CRITICAL - failed to verify wikiwritingservice.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, voya [21:59:03] om, voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:05] PROBLEM - HTTPS non-canonical-redirect-17 on ncredir4001 is CRITICAL: SSL CRITICAL - failed to verify wikipediaparticlecreation.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.co [21:59:05] ewiki.com, voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:11] PROBLEM - HTTPS non-canonical-redirect-14 on ncredir4001 is CRITICAL: SSL CRITICAL - failed to verify indwikipedia.in against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, 
mediawiki.com, voyagewiki. [21:59:11] agewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:11] PROBLEM - HTTPS non-canonical-redirect-16 on ncredir6002 is CRITICAL: SSL CRITICAL - failed to verify wikipediaadmin.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, voyagewi [21:59:11] voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:11] PROBLEM - HTTPS non-canonical-redirect-14 on ncredir6001 is CRITICAL: SSL CRITICAL - failed to verify indwikipedia.in against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, voyagewiki. 
[21:59:12] agewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:12] PROBLEM - HTTPS non-canonical-redirect-17 on ncredir3003 is CRITICAL: SSL CRITICAL - failed to verify wikipediaparticlecreation.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.co [21:59:13] ewiki.com, voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:19] PROBLEM - HTTPS non-canonical-redirect-18 on ncredir4001 is CRITICAL: SSL CRITICAL - failed to verify wikiwritingservice.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, voya [21:59:19] om, voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:19] PROBLEM - HTTPS non-canonical-redirect-14 on ncredir3003 is CRITICAL: SSL CRITICAL - failed to verify indwikipedia.in against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, 
voyagewiki. [21:59:19] agewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:21] PROBLEM - HTTPS non-canonical-redirect-18 on ncredir6001 is CRITICAL: SSL CRITICAL - failed to verify wikiwritingservice.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, voya [21:59:21] om, voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:21] PROBLEM - HTTPS non-canonical-redirect-16 on ncredir7004 is CRITICAL: SSL CRITICAL - failed to verify wikipediaadmin.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, voyagewi [21:59:21] voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:21] PROBLEM - HTTPS non-canonical-redirect-14 on ncredir5001 is CRITICAL: SSL CRITICAL - failed to verify indwikipedia.in against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, 
voyagewiki. [21:59:22] agewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:22] PROBLEM - HTTPS non-canonical-redirect-16 on ncredir5002 is CRITICAL: SSL CRITICAL - failed to verify wikipediaadmin.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, voyagewi [21:59:23] voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:29] PROBLEM - HTTPS non-canonical-redirect-16 on ncredir4002 is CRITICAL: SSL CRITICAL - failed to verify wikipediaadmin.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, voyagewi [21:59:29] voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:29] PROBLEM - HTTPS non-canonical-redirect-18 on ncredir3003 is CRITICAL: SSL CRITICAL - failed to verify wikiwritingservice.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, voya 
[21:59:29] om, voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:29] PROBLEM - HTTPS non-canonical-redirect-17 on ncredir6002 is CRITICAL: SSL CRITICAL - failed to verify wikipediaparticlecreation.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.co [21:59:29] ewiki.com, voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:29] PROBLEM - HTTPS non-canonical-redirect-18 on ncredir5001 is CRITICAL: SSL CRITICAL - failed to verify wikiwritingservice.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, voya [21:59:30] om, voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:35] !log zabe@deploy1003 zabe: Continuing with sync [21:59:37] PROBLEM - HTTPS non-canonical-redirect-14 on ncredir6002 is CRITICAL: SSL CRITICAL - failed to verify indwikipedia.in against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, 
*.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.com, voyagewiki. [21:59:37] agewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:39] PROBLEM - HTTPS non-canonical-redirect-17 on ncredir7004 is CRITICAL: SSL CRITICAL - failed to verify wikipediaparticlecreation.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.co [21:59:39] ewiki.com, voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:39] PROBLEM - HTTPS non-canonical-redirect-17 on ncredir5002 is CRITICAL: SSL CRITICAL - failed to verify wikipediaparticlecreation.com against wikipedia.com, *.en-wp.com, *.en-wp.org, *.mediawiki.com, *.voyagewiki.com, *.voyagewiki.org, *.wiikipedia.com, *.wikibook.com, *.wikibooks.com, *.wikiepdia.com, *.wikiepdia.org, *.wikiipedia.org, *.wikijunior.com, *.wikijunior.net, *.wikijunior.org, *.wikipedia.com, en-wp.com, en-wp.org, mediawiki.co [21:59:39] ewiki.com, voyagewiki.org, wiikipedia.com, wikibook.com, wikibooks.com, wikiepdia.com, wikiepdia.org, wikiipedia.org, wikijunior.com, wikijunior.net, wikijunior.org https://wikitech.wikimedia.org/wiki/Ncredir [21:59:44] are you shitting me [21:59:51] ignore this spam [21:59:59] ok [22:02:35] all green again [22:03:01] Sorry for the noise [22:03:25] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P80391 and previous config saved to 
/var/cache/conftool/dbconfig/20250731-220324-ladsgroup.json [22:03:47] !log brett@cumin2002 START - Cookbook sre.hosts.remove-downtime for ncredir2001.codfw.wmnet [22:03:48] !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ncredir2001.codfw.wmnet [22:03:58] !log brett@cumin2002 START - Cookbook sre.hosts.remove-downtime for ncredir1002.eqiad.wmnet [22:03:59] !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ncredir1002.eqiad.wmnet [22:04:03] !log brett@cumin2002 START - Cookbook sre.hosts.remove-downtime for ncredir1001.eqiad.wmnet [22:04:05] !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ncredir1001.eqiad.wmnet [22:05:01] :) cool, thanks brett [22:05:09] !log zabe@deploy1003 Finished scap sync-world: Backport for [[gerrit:1174715|Stop setting wgGlobalUsageDatabase (T400169)]] (duration: 08m 24s) [22:05:15] T400169: Convert GlobalUsage to virtual domains - https://phabricator.wikimedia.org/T400169 [22:08:08] !log dancy@deploy1003 Installing scap version "4.193.0" for 2 host(s) [22:09:48] !log dancy@deploy1003 Installation of scap version "4.193.0" completed for 2 hosts [22:11:22] !log brett@cumin2002 END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on P{cp70[03-16].magru.wmnet} and A:cp - 9.2.11-1wm2 upgrade () [22:14:48] RESOLVED: ThumborHighHaproxyErrorRate: Thumbor haproxy error rate for pod thumbor-main-5575cbcfcf-2flqn - eqiad - https://wikitech.wikimedia.org/wiki/Thumbor - https://grafana.wikimedia.org/d/Pukjw6cWk/thumbor - https://alerts.wikimedia.org/?q=alertname%3DThumborHighHaproxyErrorRate [22:17:18] FIRING: [2x] ThumborHighHaproxyErrorRate: Thumbor haproxy error rate for pod thumbor-main-5575cbcfcf-2flqn - eqiad - https://wikitech.wikimedia.org/wiki/Thumbor - https://grafana.wikimedia.org/d/Pukjw6cWk/thumbor - https://alerts.wikimedia.org/?q=alertname%3DThumborHighHaproxyErrorRate [22:18:08] 06SRE: Setting 
up Wikimedia Trust and Safety Help Center with Zendesk product: Seeking Guidance on host mapping - https://phabricator.wikimedia.org/T400952 (10JAbrams) 03NEW [22:18:33] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P80392 and previous config saved to /var/cache/conftool/dbconfig/20250731-221832-ladsgroup.json [22:24:48] 06SRE, 06Traffic, 07User-notice: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11052410 (10Alien333) >>! In T400119#11051328, @jeremyb-phone wrote: > if they're requests in response to user clicks then they should be fine. And if not? For instance, for scr... [22:27:40] !log dancy@deploy1003 Installing scap version "4.194.2" for 2 host(s) [22:28:10] 06SRE: Setting up Wikimedia Trust and Safety Help Center with Zendesk product: Seeking Guidance on host mapping - https://phabricator.wikimedia.org/T400952#11052416 (10JAbrams) {F65698830} [22:29:27] !log dancy@deploy1003 Installation of scap version "4.194.2" completed for 2 hosts [22:30:20] !log dancy@deploy1003 Started scap build-images: Publishing wmf/next image [22:31:26] !log dancy@deploy1003 build-images aborted: Publishing wmf/next image (duration: 01m 05s) [22:33:40] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2193 (T400854)', diff saved to https://phabricator.wikimedia.org/P80393 and previous config saved to /var/cache/conftool/dbconfig/20250731-223339-ladsgroup.json [22:33:46] T400854: Add rc_source_name_timestamp index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T400854 [22:33:55] !log ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2197.codfw.wmnet with reason: Maintenance [22:34:23] (03PS2) 10Jforrester: Clean up wmgWikibaseSiteGroup list, alpha-sort and de-dupe [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1170565 [22:34:46] !log 
ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2214.codfw.wmnet with reason: Maintenance [22:34:53] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2214 (T400854)', diff saved to https://phabricator.wikimedia.org/P80394 and previous config saved to /var/cache/conftool/dbconfig/20250731-223453-ladsgroup.json [22:37:19] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2214 (T400854)', diff saved to https://phabricator.wikimedia.org/P80395 and previous config saved to /var/cache/conftool/dbconfig/20250731-223719-ladsgroup.json [22:46:09] 06SRE: Setting up Wikimedia Trust and Safety Help Center with Zendesk product: Seeking Guidance on host mapping - https://phabricator.wikimedia.org/T400952#11052476 (10Dzahn) Please talk to traffic, but afaict the core of the issue is that wikimedia.org represents the brand and users can/would trust that domain... [22:52:27] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P80396 and previous config saved to /var/cache/conftool/dbconfig/20250731-225226-ladsgroup.json [22:57:18] FIRING: [2x] ThumborHighHaproxyErrorRate: Thumbor haproxy error rate for pod thumbor-main-5575cbcfcf-bbtrp - eqiad - https://wikitech.wikimedia.org/wiki/Thumbor - https://grafana.wikimedia.org/d/Pukjw6cWk/thumbor - https://alerts.wikimedia.org/?q=alertname%3DThumborHighHaproxyErrorRate [23:01:02] (03PS1) 10Dzahn: zuul: add mysql prod password in new zuul config [puppet] - 10https://gerrit.wikimedia.org/r/1174837 (https://phabricator.wikimedia.org/T395938) [23:01:19] (03PS2) 10Dzahn: zuul: add mysql prod password in new zuul config [puppet] - 10https://gerrit.wikimedia.org/r/1174837 (https://phabricator.wikimedia.org/T395938) [23:01:19] (03CR) 10CI reject: [V:04-1] zuul: add mysql prod password in new zuul config [puppet] - 10https://gerrit.wikimedia.org/r/1174837 
(https://phabricator.wikimedia.org/T395938) (owner: 10Dzahn) [23:05:36] (03PS3) 10Dzahn: zuul: add mysql prod password in new zuul config [puppet] - 10https://gerrit.wikimedia.org/r/1174837 (https://phabricator.wikimedia.org/T395938) [23:07:35] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P80397 and previous config saved to /var/cache/conftool/dbconfig/20250731-230734-ladsgroup.json [23:07:44] FIRING: KubernetesDeploymentUnavailableReplicas: ... [23:07:44] Deployment thumbor-main in thumbor at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s&var-namespace=thumbor&var-deployment=thumbor-main - https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplica [23:09:29] FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate thanos-query.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [23:11:01] (03CR) 10Dzahn: [V:03+1 C:03+2] "https://puppet-compiler.wmflabs.org/output/1174837/6474/zuul1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/1174837 (https://phabricator.wikimedia.org/T395938) (owner: 10Dzahn) [23:16:47] (03PS1) 10BryanDavis: proxy: Allow outbound HTTPS connections to port 6443 [puppet] - 10https://gerrit.wikimedia.org/r/1174842 (https://phabricator.wikimedia.org/T394838) [23:22:42] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2214 (T400854)', diff saved to https://phabricator.wikimedia.org/P80398 and previous config saved to /var/cache/conftool/dbconfig/20250731-232241-ladsgroup.json [23:22:48] T400854: Add rc_source_name_timestamp 
index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T400854 [23:22:57] !log ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2217.codfw.wmnet with reason: Maintenance [23:23:05] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2217 (T400854)', diff saved to https://phabricator.wikimedia.org/P80399 and previous config saved to /var/cache/conftool/dbconfig/20250731-232304-ladsgroup.json [23:25:33] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2217 (T400854)', diff saved to https://phabricator.wikimedia.org/P80400 and previous config saved to /var/cache/conftool/dbconfig/20250731-232532-ladsgroup.json [23:25:44] (03CR) 10BryanDavis: "PCC output at https://puppet-compiler.wmflabs.org/output/1174842/6475/" [puppet] - 10https://gerrit.wikimedia.org/r/1174842 (https://phabricator.wikimedia.org/T394838) (owner: 10BryanDavis) [23:30:36] (03PS1) 10Dzahn: zuul: use mariadb connector and Hiera'ize mysql_host name [puppet] - 10https://gerrit.wikimedia.org/r/1174844 (https://phabricator.wikimedia.org/T395938) [23:36:16] (03CR) 10Dzahn: "Yes, I understand why we want this but I don't think I have the authority to just merge it without at least consulting infra foundations." 
[puppet] - 10https://gerrit.wikimedia.org/r/1174842 (https://phabricator.wikimedia.org/T394838) (owner: 10BryanDavis) [23:38:26] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1174845 [23:38:26] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1174845 (owner: 10TrainBranchBot) [23:38:32] (03CR) 10Dzahn: [V:03+1 C:03+2] "https://puppet-compiler.wmflabs.org/output/1174844/6476/zuul1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/1174844 (https://phabricator.wikimedia.org/T395938) (owner: 10Dzahn) [23:39:42] (03CR) 10BryanDavis: "+1. I figured you would know who we needed to talk this out with. It would also be possible for me to change things to work over port 443 " [puppet] - 10https://gerrit.wikimedia.org/r/1174842 (https://phabricator.wikimedia.org/T394838) (owner: 10BryanDavis) [23:40:40] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P80401 and previous config saved to /var/cache/conftool/dbconfig/20250731-234040-ladsgroup.json [23:52:52] (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1174845 (owner: 10TrainBranchBot) [23:55:48] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P80402 and previous config saved to /var/cache/conftool/dbconfig/20250731-235547-ladsgroup.json [23:58:33] (03PS1) 10Dzahn: zuul: puppetize password for zuul->gerrit http connection [puppet] - 10https://gerrit.wikimedia.org/r/1174850 (https://phabricator.wikimedia.org/T395938) [23:59:50] (03PS2) 10Dzahn: zuul: puppetize password for zuul->gerrit http connection [puppet] - 10https://gerrit.wikimedia.org/r/1174850 
(https://phabricator.wikimedia.org/T395938)