[00:05:06] !log cmooney@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "import lswtest-d8-eqiad - cmooney@cumin1003"
[00:05:26] !log cmooney@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "import lswtest-d8-eqiad - cmooney@cumin1003"
[00:05:39] RESOLVED: CirrusSearchNodeIndexingNotIncreasing: Elasticsearch instance cirrussearch1112-production-search-eqiad is not indexing - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing
[00:07:48] RECOVERY - Check correctness of the icinga configuration on alert1002 is OK: Icinga configuration is correct https://wikitech.wikimedia.org/wiki/Icinga
[00:09:03] RESOLVED: MediaWikiEditFailures: Elevated MediaWiki edit failures (session_loss) for cluster - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures
[00:12:10] !log cmooney@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on sretest1006.eqiad.wmnet with reason: doing network tests
[00:12:49] !log cmooney@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on lswtest-d8-eqiad with reason: doing network tests
[00:19:33] (PS1) DDesouza: Undeploy 2025 Global Readers Survey [mediawiki-config] - https://gerrit.wikimedia.org/r/1211855 (https://phabricator.wikimedia.org/T410696)
[00:26:47] FIRING: HelmReleaseBadStatus: Helm release mw-script/utk6lsuw on k8s@codfw in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[00:30:16] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, November 27 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - https://gerrit.wikimedia.org/r/1211855 (https://phabricator.wikimedia.org/T410696) (owner: DDesouza)
[00:39:37] FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[00:40:20] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1211862
[00:40:20] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1211862 (owner: TrainBranchBot)
[00:52:20] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1211862 (owner: TrainBranchBot)
[01:00:43] !log mwpresync@deploy2002 Started scap build-images: Publishing wmf/next image
[01:10:44] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1211871
[01:10:44] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1211871 (owner: TrainBranchBot)
[01:13:38] !log mwpresync@deploy2002 Finished scap build-images: Publishing wmf/next image (duration: 12m 55s)
[01:13:56] (PS1) Cory Massaro: wikifunctions: Upgrade evaluators from 2025-11-12-122736 to 2025-11-17-175029 [deployment-charts] - https://gerrit.wikimedia.org/r/1211872 (https://phabricator.wikimedia.org/T305612)
[01:16:40] (PS2) Cory Massaro: wikifunctions: Upgrade evaluators from 2025-11-12-122736 to 2025-11-17-175029 [deployment-charts] - https://gerrit.wikimedia.org/r/1211872 (https://phabricator.wikimedia.org/T305612)
[01:17:02] (CR) Cory Massaro: [C:+2] wikifunctions: Upgrade evaluators from 2025-11-12-122736 to 2025-11-17-175029 [deployment-charts] - https://gerrit.wikimedia.org/r/1211872 (https://phabricator.wikimedia.org/T305612) (owner: Cory Massaro)
[01:17:20] FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[01:17:26] FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[01:18:47] (Merged) jenkins-bot: wikifunctions: Upgrade evaluators from 2025-11-12-122736 to 2025-11-17-175029 [deployment-charts] - https://gerrit.wikimedia.org/r/1211872 (https://phabricator.wikimedia.org/T305612) (owner: Cory Massaro)
[01:21:51] (PS1) Cory Massaro: wikifunctions: Upgrade orchestrator from 2025-11-18-175356 to 2025-11-26-175208 [deployment-charts] - https://gerrit.wikimedia.org/r/1211875 (https://phabricator.wikimedia.org/T382921)
[01:22:20] RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[01:22:26] RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[01:23:18] !log apine@deploy2002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[01:23:59] !log apine@deploy2002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[01:24:40] !log apine@deploy2002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[01:25:32] !log apine@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[01:25:57] !log apine@deploy2002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[01:26:45] !log apine@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[01:27:42] (CR) Cory Massaro: [C:+2] wikifunctions: Upgrade orchestrator from 2025-11-18-175356 to 2025-11-26-175208 [deployment-charts] - https://gerrit.wikimedia.org/r/1211875 (https://phabricator.wikimedia.org/T382921) (owner: Cory Massaro)
[01:29:31] (Merged) jenkins-bot: wikifunctions: Upgrade orchestrator from 2025-11-18-175356 to 2025-11-26-175208 [deployment-charts] - https://gerrit.wikimedia.org/r/1211875 (https://phabricator.wikimedia.org/T382921) (owner: Cory Massaro)
[01:33:11] !log apine@deploy2002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[01:33:36] !log apine@deploy2002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[01:34:11] !log apine@deploy2002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[01:34:30] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1211871 (owner: TrainBranchBot)
[01:34:44] !log apine@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[01:35:42] !log apine@deploy2002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[01:36:25] !log apine@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[01:42:27] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1197 (T410589)', diff saved to https://phabricator.wikimedia.org/P85809 and previous config saved to /var/cache/conftool/dbconfig/20251127-014226-ladsgroup.json
[01:42:32] T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589
[01:57:34] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P85810 and previous config saved to /var/cache/conftool/dbconfig/20251127-015733-ladsgroup.json
[02:12:42] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P85811 and previous config saved to /var/cache/conftool/dbconfig/20251127-021241-ladsgroup.json
[02:24:37] FIRING: [6x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage
[02:27:49] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1197 (T410589)', diff saved to https://phabricator.wikimedia.org/P85812 and previous config saved to /var/cache/conftool/dbconfig/20251127-022749-ladsgroup.json
[02:27:54] T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589
[02:28:05] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
[02:39:47] (PS2) Samuel (WMF): Set $wgRateLimits['hcaptchaedit'] for edit attempt log [mediawiki-config] - https://gerrit.wikimedia.org/r/1211295 (https://phabricator.wikimedia.org/T406865)
[02:41:04] (PS3) Samuel (WMF): Set new $wgRateLimits config for edit attempt log [mediawiki-config] - https://gerrit.wikimedia.org/r/1211295 (https://phabricator.wikimedia.org/T406865)
[02:54:37] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[03:06:25] RESOLVED: SystemdUnitFailed: docker-reporter-kubernetes-wikikube_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:29:37] FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[04:27:02] FIRING: HelmReleaseBadStatus: Helm release mw-script/utk6lsuw on k8s@codfw in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[04:39:37] FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[05:08:59] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:33:59] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[06:11:54] !log marostegui@cumin1003 START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis tokwiki in section s5
[06:12:41] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2218.codfw.wmnet with reason: Maintenance
[06:17:06] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1167.eqiad.wmnet with reason: Maintenance
[06:17:26] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[06:17:34] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1167 (T410531)', diff saved to https://phabricator.wikimedia.org/P85814 and previous config saved to /var/cache/conftool/dbconfig/20251127-061733-marostegui.json
[06:17:40] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[06:21:17] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis tokwiki in section s5
[06:24:37] FIRING: [6x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage
[06:24:42] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1167 (T410531)', diff saved to https://phabricator.wikimedia.org/P85815 and previous config saved to /var/cache/conftool/dbconfig/20251127-062441-marostegui.json
[06:24:52] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[06:28:11] (PS1) Marostegui: clouddb1023: Remove note [puppet] - https://gerrit.wikimedia.org/r/1211968
[06:30:41] (CR) Marostegui: [C:+2] clouddb1023: Remove note [puppet] - https://gerrit.wikimedia.org/r/1211968 (owner: Marostegui)
[06:39:49] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P85816 and previous config saved to /var/cache/conftool/dbconfig/20251127-063949-marostegui.json
[06:54:37] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[06:54:57] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P85817 and previous config saved to /var/cache/conftool/dbconfig/20251127-065456-marostegui.json
[07:00:04] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T0700)
[07:00:05] marostegui, Amir1, and federico3: Primary database switchover (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T0700). Please do the needful.
[07:10:05] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1167 (T410531)', diff saved to https://phabricator.wikimedia.org/P85818 and previous config saved to /var/cache/conftool/dbconfig/20251127-071004-marostegui.json
[07:10:10] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[07:10:11] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[07:15:31] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1172.eqiad.wmnet with reason: Maintenance
[07:15:39] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1172 (T410531)', diff saved to https://phabricator.wikimedia.org/P85819 and previous config saved to /var/cache/conftool/dbconfig/20251127-071538-marostegui.json
[07:15:44] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[07:22:44] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1172 (T410531)', diff saved to https://phabricator.wikimedia.org/P85820 and previous config saved to /var/cache/conftool/dbconfig/20251127-072243-marostegui.json
[07:22:58] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[07:23:10] (PS1) Arnaudb: aptrepo: update gitlab-ce & gitlab-runner to 18.4 [puppet] - https://gerrit.wikimedia.org/r/1211985 (https://phabricator.wikimedia.org/T411160)
[07:29:37] FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[07:31:43] (CR) Awight: "Not yet—the job will be running intermittently and its metrics endpoint goes down along with the job. For the current phase we'll use the" [puppet] - https://gerrit.wikimedia.org/r/1207174 (https://phabricator.wikimedia.org/T402613) (owner: Awight)
[07:37:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P85821 and previous config saved to /var/cache/conftool/dbconfig/20251127-073751-marostegui.json
[07:39:11] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, November 27 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - https://gerrit.wikimedia.org/r/1211799 (https://phabricator.wikimedia.org/T408734) (owner: DCausse)
[07:51:07] (CR) Muehlenhoff: [V:+2 C:+2] Remove obsolete stub secrets [labs/private] - https://gerrit.wikimedia.org/r/1211684 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[07:52:59] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P85822 and previous config saved to /var/cache/conftool/dbconfig/20251127-075259-marostegui.json
[07:55:42] !log marostegui@cumin1003 dbctl commit (dc=all): 'Fully repool db1172', diff saved to https://phabricator.wikimedia.org/P85823 and previous config saved to /var/cache/conftool/dbconfig/20251127-075541-marostegui.json
[07:58:06] (CR) LSobanski: "Approved." [puppet] - https://gerrit.wikimedia.org/r/1211653 (https://phabricator.wikimedia.org/T395939) (owner: Muehlenhoff)
[07:59:58] (CR) Muehlenhoff: [C:+2] Allow smartctl for datacenter-ops [puppet] - https://gerrit.wikimedia.org/r/1211653 (https://phabricator.wikimedia.org/T395939) (owner: Muehlenhoff)
[08:00:05] Amir1, Urbanecm, and awight: May I have your attention please! UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T0800)
[08:00:05] dcausse: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:00:16] o/
[08:00:19] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1209.eqiad.wmnet with reason: Maintenance
[08:00:23] I can deploy
[08:01:05] SRE, Infrastructure-Foundations, Patch-For-Review: Request additional access for Dcops group - https://phabricator.wikimedia.org/T395939#11411948 (MoritzMuehlenhoff) @Jclark-ctr : You should now be able to run smartctl, let me know if you run into any issues
[08:01:30] (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1211985 (https://phabricator.wikimedia.org/T411160) (owner: Arnaudb)
[08:04:26] (CR) TrainBranchBot: [C:+2] "Approved by dcausse@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1211799 (https://phabricator.wikimedia.org/T408734) (owner: DCausse)
[08:05:22] (Merged) jenkins-bot: cirrus: enable DWIM wrong keyboard second try on all he & ru wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1211799 (https://phabricator.wikimedia.org/T408734) (owner: DCausse)
[08:06:19] !log dcausse@deploy2002 Started scap sync-world: Backport for [[gerrit:1211799|cirrus: enable DWIM wrong keyboard second try on all he & ru wikis (T408734)]]
[08:06:24] T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete - https://phabricator.wikimedia.org/T408734
[08:08:16] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2152.codfw.wmnet with reason: Maintenance
[08:08:24] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2152 (T410531)', diff saved to https://phabricator.wikimedia.org/P85824 and previous config saved to /var/cache/conftool/dbconfig/20251127-080823-marostegui.json
[08:08:29] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[08:08:42] !log dcausse@deploy2002 dcausse: Backport for [[gerrit:1211799|cirrus: enable DWIM wrong keyboard second try on all he & ru wikis (T408734)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[08:10:50] !log dcausse@deploy2002 dcausse: Continuing with sync
[08:12:29] (CR) Ayounsi: [C:+2] Add nokia Console and PSx ports [software/netbox-extras] - https://gerrit.wikimedia.org/r/1206816 (https://phabricator.wikimedia.org/T410073) (owner: Ayounsi)
[08:14:56] !log dcausse@deploy2002 Finished scap sync-world: Backport for [[gerrit:1211799|cirrus: enable DWIM wrong keyboard second try on all he & ru wikis (T408734)]] (duration: 08m 37s)
[08:14:56] (Merged) jenkins-bot: Add nokia Console and PSx ports [software/netbox-extras] - https://gerrit.wikimedia.org/r/1206816 (https://phabricator.wikimedia.org/T410073) (owner: Ayounsi)
[08:15:02] T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete - https://phabricator.wikimedia.org/T408734
[08:15:21] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2152 (T410531)', diff saved to https://phabricator.wikimedia.org/P85825 and previous config saved to /var/cache/conftool/dbconfig/20251127-081520-marostegui.json
[08:15:27] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[08:16:03] !log closing the UTC morning backport window
[08:16:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:16:14] SRE, MinT, Prod-Kubernetes, serviceops, and 3 others: machinetranslation eqiad pods in state ContainerStatusUnknown - https://phabricator.wikimedia.org/T411058#11411996 (Nikerabbit)
[08:17:11] !log ayounsi@cumin1003 START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
[08:17:40] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
[08:18:43] !log installing perl security updates
[08:18:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:05] ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, and 2 others: Netbox Cable report - incorrectly parsing Nokia power supplies - https://phabricator.wikimedia.org/T410073#11412010 (ayounsi) Open→Resolved All good ! https://netbox.wikimedia.org/extras/scripts/results/274160/
[08:19:59] (CR) Bunnypranav: [C:+1] trwikisource: Create rollbacker user group [mediawiki-config] - https://gerrit.wikimedia.org/r/1210681 (https://phabricator.wikimedia.org/T410931) (owner: Neriah)
[08:24:19] (PS1) Kevin Bazira: ml-services: update llm model-server image [deployment-charts] - https://gerrit.wikimedia.org/r/1212056 (https://phabricator.wikimedia.org/T410906)
[08:27:02] FIRING: HelmReleaseBadStatus: Helm release mw-script/utk6lsuw on k8s@codfw in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[08:28:56] (PS1) Muehlenhoff: Mark Tyler as group approver for deployment-jenkins [puppet] - https://gerrit.wikimedia.org/r/1212057 (https://phabricator.wikimedia.org/T276465)
[08:30:28] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P85826 and previous config saved to /var/cache/conftool/dbconfig/20251127-083028-marostegui.json
[08:30:33] SRE, SRE-swift-storage, Infrastructure Security, Data-Platform-SRE (2025.11.07 - 2025.11.28), and 3 others: October 2025 Bullseye reboots: Search Platform-owned hosts - https://phabricator.wikimedia.org/T410573#11412023 (RKemper) >>! In T410573#11410948, @MoritzMuehlenhoff wrote: > @RKemper There...
[08:31:01] (PS1) Ayounsi: Add new Yubikey ssh key [homer/public] - https://gerrit.wikimedia.org/r/1212058
[08:34:40] (PS1) Muehlenhoff: Add Lukasz as approver for mailman3-roots [puppet] - https://gerrit.wikimedia.org/r/1212060 (https://phabricator.wikimedia.org/T276465)
[08:35:59] !log T410573 Rebooting `apifeatureusage[1,2]001*`, one host at a time
[08:36:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:36:04] T410573: October 2025 Bullseye reboots: Search Platform-owned hosts - https://phabricator.wikimedia.org/T410573
[08:36:43] (PS1) Muehlenhoff: Add Guillaume as appprover for analytics-search-admins [puppet] - https://gerrit.wikimedia.org/r/1212061 (https://phabricator.wikimedia.org/T276465)
[08:37:55] (CR) Ayounsi: [C:-1] "I see that it would be the first `sk-ssh-ed25519` key, so it's also possible that some devices don't support it. To be investigated." [homer/public] - https://gerrit.wikimedia.org/r/1212058 (owner: Ayounsi)
[08:39:37] FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[08:42:19] (Abandoned) Ayounsi: Add new Yubikey ssh key [homer/public] - https://gerrit.wikimedia.org/r/1212058 (owner: Ayounsi)
[08:42:53] ops-eqiad, SRE, DC-Ops, Wikidata, and 3 others: Racking request for wdqs10(2[8-9]|3[0-2]) - https://phabricator.wikimedia.org/T410406#11412039 (RKemper) >>! In T410406#11409668, @Jclark-ctr wrote: > @bking Did you have any luck with reimage? or do you need any assistance? I believe we need hel...
[08:43:21] (PS1) Muehlenhoff: postgis: Remove support for buster [puppet] - https://gerrit.wikimedia.org/r/1212062 (https://phabricator.wikimedia.org/T381565)
[08:45:02] (CR) Dpogorzelski: [C:+1] ml-services: update llm model-server image [deployment-charts] - https://gerrit.wikimedia.org/r/1212056 (https://phabricator.wikimedia.org/T410906) (owner: Kevin Bazira)
[08:45:36] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P85827 and previous config saved to /var/cache/conftool/dbconfig/20251127-084535-marostegui.json
[08:48:21] (CR) Kevin Bazira: [C:+2] ml-services: update llm model-server image [deployment-charts] - https://gerrit.wikimedia.org/r/1212056 (https://phabricator.wikimedia.org/T410906) (owner: Kevin Bazira)
[08:48:51] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1212062 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[08:50:10] (Merged) jenkins-bot: ml-services: update llm model-server image [deployment-charts] - https://gerrit.wikimedia.org/r/1212056 (https://phabricator.wikimedia.org/T410906) (owner: Kevin Bazira)
[08:51:45] !log kevinbazira@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
[08:57:12] (CR) Arnaudb: [C:+2] aptrepo: update gitlab-ce & gitlab-runner to 18.4 [puppet] - https://gerrit.wikimedia.org/r/1211985 (https://phabricator.wikimedia.org/T411160) (owner: Arnaudb)
[08:57:54] (CR) Daniel Kinzler: [C:-1] api-gateway: Rest-gateway Read `ratelimit_class` and `user_id` from JWT (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/1192579 (https://phabricator.wikimedia.org/T405578) (owner: Pmiazga)
[09:00:05] jnuche and brennen: gettimeofday() says it's time for MediaWiki train - Utc-0+Utc-7 Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T0900)
[09:00:19] hi there, train will rollout soon
[09:00:44] SRE, SRE-swift-storage, Infrastructure Security, Data-Platform-SRE (2025.11.07 - 2025.11.28), and 3 others: October 2025 Bullseye reboots: Search Platform-owned hosts - https://phabricator.wikimedia.org/T410573#11412071 (MoritzMuehlenhoff) >>! In T410573#11412023, @RKemper wrote: > Ah yeah the la...
[09:00:44] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2152 (T410531)', diff saved to https://phabricator.wikimedia.org/P85828 and previous config saved to /var/cache/conftool/dbconfig/20251127-090043-marostegui.json
[09:00:49] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[09:01:00] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2154.codfw.wmnet with reason: Maintenance
[09:01:08] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2154 (T410531)', diff saved to https://phabricator.wikimedia.org/P85829 and previous config saved to /var/cache/conftool/dbconfig/20251127-090107-marostegui.json
[09:04:17] (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1211727 (owner: AOkoth)
[09:04:50] (PS1) TrainBranchBot: group2 to 1.46.0-wmf.4 [mediawiki-config] - https://gerrit.wikimedia.org/r/1212065 (https://phabricator.wikimedia.org/T408274)
[09:04:53] (CR) TrainBranchBot: [C:+2] "Initiated by jnuche@deploy2002" [mediawiki-config] - https://gerrit.wikimedia.org/r/1212065 (https://phabricator.wikimedia.org/T408274) (owner: TrainBranchBot)
[09:05:47] (Merged) jenkins-bot: group2 to 1.46.0-wmf.4 [mediawiki-config] - https://gerrit.wikimedia.org/r/1212065 (https://phabricator.wikimedia.org/T408274) (owner: TrainBranchBot)
[09:08:17] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2154 (T410531)', diff saved to https://phabricator.wikimedia.org/P85830 and previous config saved to /var/cache/conftool/dbconfig/20251127-090816-marostegui.json
[09:08:22] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[09:11:19] !log installing gdk-pixbuf security updates
[09:11:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:11:52] !log jnuche@deploy2002 rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.4 refs T408274
[09:11:54] (CR) LSobanski: [C:+1] Add Lukasz as approver for mailman3-roots [puppet] - https://gerrit.wikimedia.org/r/1212060 (https://phabricator.wikimedia.org/T276465) (owner: Muehlenhoff)
[09:11:57] T408274: 1.46.0-wmf.4 deployment blockers - https://phabricator.wikimedia.org/T408274
[09:15:55] (CR) Muehlenhoff: [C:+2] Add Lukasz as approver for mailman3-roots [puppet] - https://gerrit.wikimedia.org/r/1212060 (https://phabricator.wikimedia.org/T276465) (owner: Muehlenhoff)
[09:16:09] (PS1) Jcrespo: bacula: Add helper function to check for database files size on bacula db [puppet] - https://gerrit.wikimedia.org/r/1212067
[09:16:43] SRE, Infrastructure Security, Infrastructure-Foundations, Patch-For-Review: puppet admin module: Assign approvers to unix groups - https://phabricator.wikimedia.org/T276465#11412130 (MoritzMuehlenhoff)
[09:16:45] (CR) CI reject: [V:-1] bacula: Add helper function to check for database files size on bacula db [puppet] - https://gerrit.wikimedia.org/r/1212067 (owner: Jcrespo)
[09:18:34] jouncebot: nowandnext
[09:18:34] For the next 1 hour(s) and 41 minute(s): MediaWiki train - Utc-0+Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T0900)
[09:18:34] In 1 hour(s) and 41 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T1100)
[09:18:47] ah, it's train time. okay.
[09:19:11] (PS2) Jcrespo: bacula: Add helper function to check for database files size on bacula db [puppet] - https://gerrit.wikimedia.org/r/1212067
[09:20:46] (CR) Jcrespo: [C:+2] bacula: Add helper function to check for database files size on bacula db [puppet] - https://gerrit.wikimedia.org/r/1212067 (owner: Jcrespo)
[09:21:22] !log upgrade Envoy on cloudweb* T405808
[09:21:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:21:27] T405808: Upgrade Envoy to v1.32.12 - https://phabricator.wikimedia.org/T405808
[09:23:24] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P85831 and previous config saved to /var/cache/conftool/dbconfig/20251127-092323-marostegui.json
[09:24:04] SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for dsmit - https://phabricator.wikimedia.org/T410426#11412139 (DSmit-WMF) Thank you. I see data in the dashboard now. thanks!
[09:25:31] urbanecm: train deploy is complete and things look healthy right now. You can go ahead if you need to backport something
[09:26:58] (CR) FNegri: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313) (owner: Volans)
[09:27:13] (CR) FNegri: [C:+1] toolforge: add ingress for infra-tracing-loki (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313) (owner: Volans)
[09:28:11] (PS2) Jcrespo: Revert^2 "garage: Add a first role and profile" [puppet] - https://gerrit.wikimedia.org/r/1211693
[09:28:40] (CR) CI reject: [V:-1] Revert^2 "garage: Add a first role and profile" [puppet] - https://gerrit.wikimedia.org/r/1211693 (owner: Jcrespo)
[09:33:51] (PS3) Jcrespo: Revert^2 "garage: Add a first role and profile" [puppet] - https://gerrit.wikimedia.org/r/1211693
[09:34:37] jnuche: thank you!
[09:34:48] (CR) Urbanecm: [C:+2] beta: Enable UserEmailConfirmationUseHTML [mediawiki-config] - https://gerrit.wikimedia.org/r/1211809 (https://phabricator.wikimedia.org/T396155) (owner: Urbanecm)
[09:34:50] (CR) Urbanecm: [C:+2] testwiki: Enable HTML confirmation email [mediawiki-config] - https://gerrit.wikimedia.org/r/1211812 (https://phabricator.wikimedia.org/T396155) (owner: Urbanecm)
[09:35:59] (Merged) jenkins-bot: beta: Enable UserEmailConfirmationUseHTML [mediawiki-config] - https://gerrit.wikimedia.org/r/1211809 (https://phabricator.wikimedia.org/T396155) (owner: Urbanecm)
[09:36:02] (Merged) jenkins-bot: testwiki: Enable HTML confirmation email [mediawiki-config] - https://gerrit.wikimedia.org/r/1211812 (https://phabricator.wikimedia.org/T396155) (owner: Urbanecm)
[09:36:52] urbanecm: I have a config patch to sync when you're done
[09:38:32] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P85832 and previous config saved to /var/cache/conftool/dbconfig/20251127-093831-marostegui.json
[09:38:49] (PS4) Jcrespo: Revert^2 "garage: Add a first role and profile" [puppet] - https://gerrit.wikimedia.org/r/1211693 (https://phabricator.wikimedia.org/T410020)
[09:38:54] (CR) Jcrespo: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1211693 (https://phabricator.wikimedia.org/T410020) (owner: Jcrespo)
[09:39:54] !log ayounsi@cumin1003 START - Cookbook sre.dns.netbox
[09:40:47] (CR) CI reject: [V:-1] Revert^2 "garage: Add a first role and profile" [puppet] - https://gerrit.wikimedia.org/r/1211693 (https://phabricator.wikimedia.org/T410020) (owner: Jcrespo)
[09:40:50] PROBLEM - MegaRAID on an-worker1148 is CRITICAL: CRITICAL: 12 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[09:42:05] (PS5) Jcrespo: Revert^2 "garage: Add a first role and profile" [puppet] - https://gerrit.wikimedia.org/r/1211693 (https://phabricator.wikimedia.org/T410020)
[09:43:08] (PS1) DCausse: cirrus: stop writing to relforge [deployment-charts] - https://gerrit.wikimedia.org/r/1212069
[09:43:15] (PS3) Kosta Harlan: CheckUser/UserInfoCard: Enable by default for some privileged groups on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1204243 (https://phabricator.wikimedia.org/T409840)
[09:43:29] !log ayounsi@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA for maps1014 - ayounsi@cumin1003"
[09:43:33] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA for maps1014 - ayounsi@cumin1003"
[09:43:33] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:45:16] !log ayounsi@cumin1003 START - Cookbook sre.dns.wipe-cache maps1014.eqiad.wmnet on all recursors
[09:45:20] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) maps1014.eqiad.wmnet on all recursors
[09:47:16] urbanecm: are you still syncing?
[09:48:56] (PS6) Jcrespo: Revert^2 "garage: Add a first role and profile" [puppet] - https://gerrit.wikimedia.org/r/1211693
[09:49:01] !log installing krb5 security updates
[09:49:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:49:25] (PS7) Jcrespo: Revert^2 "garage: Add a first role and profile" [puppet] - https://gerrit.wikimedia.org/r/1211693 (https://phabricator.wikimedia.org/T41002)
[09:49:40] (CR) Jcrespo: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1211693 (https://phabricator.wikimedia.org/T41002) (owner: Jcrespo)
[09:51:30] (CR) CI reject: [V:-1] Revert^2 "garage: Add a first role and profile" [puppet] - https://gerrit.wikimedia.org/r/1211693 (https://phabricator.wikimedia.org/T41002) (owner: Jcrespo)
[09:52:06] (CR) Btullis: [C:+1] "Looks good to me." [deployment-charts] - https://gerrit.wikimedia.org/r/1212069 (owner: DCausse)
[09:52:07] jnuche: I think urbanecm is finished, is it ok if I sync a patch now?
[09:52:28] (PS1) Fabfur: cache::text: enable unid and browser flags rate limits in magru [puppet] - https://gerrit.wikimedia.org/r/1212073 (https://phabricator.wikimedia.org/T406545)
[09:52:30] (PS1) Fabfur: cache::text: enable unid and browser flags rate limits in drmrs [puppet] - https://gerrit.wikimedia.org/r/1212074 (https://phabricator.wikimedia.org/T406545)
[09:52:37] (CR) Elukey: [C:+1] postgis: Remove support for buster [puppet] - https://gerrit.wikimedia.org/r/1212062 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[09:53:15] (CR) DCausse: [C:+2] "thanks Ben!" [deployment-charts] - https://gerrit.wikimedia.org/r/1212069 (owner: DCausse)
[09:53:32] (PS8) Jcrespo: Revert^2 "garage: Add a first role and profile" [puppet] - https://gerrit.wikimedia.org/r/1211693 (https://phabricator.wikimedia.org/T41002)
[09:53:38] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet
[09:53:39] jouncebot: nowandnext
[09:53:39] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2154 (T410531)', diff saved to https://phabricator.wikimedia.org/P85833 and previous config saved to /var/cache/conftool/dbconfig/20251127-095339-marostegui.json
[09:53:39] For the next 1 hour(s) and 6 minute(s): MediaWiki train - Utc-0+Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T0900)
[09:53:40] In 1 hour(s) and 6 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T1100)
[09:53:48] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[09:53:55] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2163.codfw.wmnet with reason: Maintenance
[09:54:02] (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1204243 (https://phabricator.wikimedia.org/T409840) (owner: Kosta Harlan)
[09:54:03] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2163 (T410531)', diff saved to https://phabricator.wikimedia.org/P85834 and previous config saved to /var/cache/conftool/dbconfig/20251127-095402-marostegui.json
[09:54:03] (CR) Jcrespo: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1211693 (https://phabricator.wikimedia.org/T41002) (owner: Jcrespo)
[09:54:19] (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1212074 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[09:54:21] (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1212074 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[09:54:44] kostajh: well, I see you already started
[09:54:54] (Merged) jenkins-bot: CheckUser/UserInfoCard: Enable by default for some privileged groups on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1204243 (https://phabricator.wikimedia.org/T409840) (owner: Kosta Harlan)
[09:55:11] (Merged) jenkins-bot: cirrus: stop writing to relforge [deployment-charts] - https://gerrit.wikimedia.org/r/1212069 (owner: DCausse)
[09:55:17] jnuche: it looks like urbanecm patch didn't actually sync
[09:55:19] (PS2) Alexandros Kosiaris: base: Switch away from legacy fact, lint ignore $::realm [puppet] - https://gerrit.wikimedia.org/r/1208013
[09:55:21] it's just been merged :/
[09:55:28] (CR) Alexandros Kosiaris: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1208013 (owner: Alexandros Kosiaris)
[09:55:44] however, it's just beta and testwiki, so I am fine syncing this alongside mine
[09:55:58] yeah, I think it's ok
[09:56:25] ok
[09:56:31] !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1204243|CheckUser/UserInfoCard: Enable by default for some privileged groups on enwiki (T409840)]]
[09:56:36] T409840: Enable UIC by default for privileged user groups on enwiki - https://phabricator.wikimedia.org/T409840
[09:56:46] (PS9) Jcrespo: Revert^2 "garage: Add a first role and profile" [puppet] - https://gerrit.wikimedia.org/r/1211693 (https://phabricator.wikimedia.org/T410020)
[09:57:08] (CR) Jcrespo: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1211693 (https://phabricator.wikimedia.org/T410020) (owner: Jcrespo)
[09:58:29] !log kharlan@deploy2002 kharlan: Backport for [[gerrit:1204243|CheckUser/UserInfoCard: Enable by default for some privileged groups on enwiki (T409840)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[10:00:08] !log kharlan@deploy2002 kharlan: Continuing with sync
[10:00:33] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet
[10:00:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2163 (T410531)', diff saved to https://phabricator.wikimedia.org/P85835 and previous config saved to /var/cache/conftool/dbconfig/20251127-100054-marostegui.json
[10:00:59] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[10:02:57] !log ayounsi@cumin1003 START - Cookbook sre.dns.netbox
[10:03:01] (CR) Jcrespo: [C:+2] Revert^2 "garage: Add a first role and profile" [puppet] - https://gerrit.wikimedia.org/r/1211693 (https://phabricator.wikimedia.org/T410020) (owner: Jcrespo)
[10:03:08] !log arnaudb@cumin1003 START - Cookbook sre.gerrit.localbackup Prepare local backup on: gerrit2003.wikimedia.org
[10:04:21] (PS1) Jcrespo: Revert^3 "garage: Add a first role and profile" [puppet] - https://gerrit.wikimedia.org/r/1212077
[10:04:27] (CR) Jcrespo: [V:+2 C:+2] Revert^3 "garage: Add a first role and profile" [puppet] - https://gerrit.wikimedia.org/r/1212077 (owner: Jcrespo)
[10:05:42] !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1204243|CheckUser/UserInfoCard: Enable by default for some privileged groups on enwiki (T409840)]] (duration: 09m 11s)
[10:05:47] T409840: Enable UIC by default for privileged user groups on enwiki - https://phabricator.wikimedia.org/T409840
[10:06:39] !log ayounsi@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA for maps1013 - ayounsi@cumin1003"
[10:06:43] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA for maps1013 - ayounsi@cumin1003"
[10:06:43] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:06:53] !log ayounsi@cumin1003 START - Cookbook sre.dns.wipe-cache maps1013.eqiad.wmnet on all recursors
[10:06:56] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) maps1013.eqiad.wmnet on all recursors
[10:07:06] (PS1) Jcrespo: Revert^4 "garage: Add a first role and profile" [puppet] - https://gerrit.wikimedia.org/r/1212080
[10:07:41] (CR) Jcrespo: "this works now for all hosts, but it fails for backup2014 with:" [puppet] - https://gerrit.wikimedia.org/r/1212080 (owner: Jcrespo)
[10:07:42] (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1212074 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[10:08:41] !log arnaudb@cumin1003 START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade gitlab
[10:08:44] (CR) Majavah: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313) (owner: Volans)
[10:08:48] (CR) Majavah: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313) (owner: Volans)
[10:09:05] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet
[10:09:32] (PS1) Gkyziridis: ores-extension: Enable revertrisk filters for multiple wikis. [mediawiki-config] - https://gerrit.wikimedia.org/r/1212081 (https://phabricator.wikimedia.org/T408607)
[10:09:35] SRE, SRE-Unowned, Maps: Setup a maps staging DB - https://phabricator.wikimedia.org/T409528#11412245 (elukey) Next steps: * Wait for imposm to catch up from OSM upstream. * Warm up the Tegola cache on Swift. After the above two steps the stack will be ready to be used :)
[10:10:17] !log dcausse@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[10:10:19] (CR) CI reject: [V:-1] ores-extension: Enable revertrisk filters for multiple wikis. [mediawiki-config] - https://gerrit.wikimedia.org/r/1212081 (https://phabricator.wikimedia.org/T408607) (owner: Gkyziridis)
[10:10:39] !log dcausse@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[10:11:07] (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1212073 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[10:11:41] (PS2) Gkyziridis: ores-extension: Enable revertrisk filters for multiple wikis. [mediawiki-config] - https://gerrit.wikimedia.org/r/1212081 (https://phabricator.wikimedia.org/T408607)
[10:12:11] (CR) Muehlenhoff: [C:+2] postgis: Remove support for buster [puppet] - https://gerrit.wikimedia.org/r/1212062 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[10:12:35] (CR) CI reject: [V:-1] ores-extension: Enable revertrisk filters for multiple wikis. [mediawiki-config] - https://gerrit.wikimedia.org/r/1212081 (https://phabricator.wikimedia.org/T408607) (owner: Gkyziridis)
[10:13:04] !log arnaudb@cumin1003 END (PASS) - Cookbook sre.gerrit.localbackup (exit_code=0) Prepare local backup on: gerrit2003.wikimedia.org
[10:13:18] !log arnaudb@cumin1003 START - Cookbook sre.gerrit.localbackup Prepare local backup on: gerrit2003.wikimedia.org
[10:13:46] !log arnaudb@cumin1003 END (PASS) - Cookbook sre.gerrit.localbackup (exit_code=0) Prepare local backup on: gerrit2003.wikimedia.org
[10:13:56] (CR) Giuseppe Lavagetto: [C:+1] cache::text: enable unid and browser flags rate limits in magru [puppet] - https://gerrit.wikimedia.org/r/1212073 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[10:14:03] (CR) Majavah: [C:+1] toolforge: add ingress for infra-tracing-loki [puppet] - https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313) (owner: Volans)
[10:14:05] (CR) Giuseppe Lavagetto: [C:+1] cache::text: enable unid and browser flags rate limits in drmrs [puppet] - https://gerrit.wikimedia.org/r/1212074 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[10:14:22] (PS3) Gkyziridis: ores-extension: Enable revertrisk filters for multiple wikis. [mediawiki-config] - https://gerrit.wikimedia.org/r/1212081 (https://phabricator.wikimedia.org/T408607)
[10:15:10] (CR) CI reject: [V:-1] ores-extension: Enable revertrisk filters for multiple wikis. [mediawiki-config] - https://gerrit.wikimedia.org/r/1212081 (https://phabricator.wikimedia.org/T408607) (owner: Gkyziridis)
[10:16:02] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P85836 and previous config saved to /var/cache/conftool/dbconfig/20251127-101601-marostegui.json
[10:16:11] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet
[10:17:08] (PS1) DCausse: cirrus: set null sink in staging [deployment-charts] - https://gerrit.wikimedia.org/r/1212083
[10:17:11] (CR) Gkyziridis: "Thnx for creating this patch for deploying rr filter for thwiki." [mediawiki-config] - https://gerrit.wikimedia.org/r/1207932 (https://phabricator.wikimedia.org/T409438) (owner: Kgraessle)
[10:17:51] (CR) Gkyziridis: Enable revertrisk filters in thwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1207932 (https://phabricator.wikimedia.org/T409438) (owner: Kgraessle)
[10:18:49] SRE-Access-Requests: Yubikey-SSH-FIDO for Tiziano Fogli (tappof) - https://phabricator.wikimedia.org/T411167 (tappof) NEW
[10:18:54] !log arnaudb@cumin1003 END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade gitlab
[10:19:45] (CR) Fabfur: [C:+2] cache::text: enable unid and browser flags rate limits in magru [puppet] - https://gerrit.wikimedia.org/r/1212073 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[10:20:59] (CR) DCausse: [C:+2] cirrus: set null sink in staging [deployment-charts] - https://gerrit.wikimedia.org/r/1212083 (owner: DCausse)
[10:21:40] (PS1) Tiziano Fogli: ssh: FIDO key for Tiziano Fogli [puppet] - https://gerrit.wikimedia.org/r/1212085 (https://phabricator.wikimedia.org/T411167)
[10:22:21] (CR) CI reject: [V:-1] ssh: FIDO key for Tiziano Fogli [puppet] - https://gerrit.wikimedia.org/r/1212085 (https://phabricator.wikimedia.org/T411167) (owner: Tiziano Fogli)
[10:22:51] !log ayounsi@cumin1003 START - Cookbook sre.dns.netbox
[10:22:57] (Merged) jenkins-bot: cirrus: set null sink in staging [deployment-charts] - https://gerrit.wikimedia.org/r/1212083 (owner: DCausse)
[10:23:00] jnuche: I have another one (fix for T411166) that I'd like to sync, probably in about 20-30 minutes from now
[10:23:00] T411166: TypeError: MediaWiki\Extension\ConfirmEdit\hCaptcha\HCaptcha::logCheckError(): Argument #3 ($token) must be of type string, null given, called in /srv/mediawiki/php-1.46.0-wmf.4/extensions/ConfirmEdit/includes/hCaptcha/HCaptcha - https://phabricator.wikimedia.org/T411166
[10:23:16] (PS2) Tiziano Fogli: ssh: FIDO key for Tiziano Fogli [puppet] - https://gerrit.wikimedia.org/r/1212085 (https://phabricator.wikimedia.org/T411167)
[10:24:37] FIRING: [6x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage
[10:25:05] !log arnaudb@cumin1003 START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade gitlab
[10:25:08] !log dcausse@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[10:25:17] !log dcausse@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[10:25:20] (CR) Filippo Giunchedi: [C:+1] toolforge: add ingress for infra-tracing-loki [puppet] - https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313) (owner: Volans)
[10:26:36] (PS1) Gkyziridis: ores-extension: Enable revertrisklanguageagnostic on multiple wikis. [mediawiki-config] - https://gerrit.wikimedia.org/r/1212086 (https://phabricator.wikimedia.org/T408607)
[10:27:02] !log ayounsi@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA for maps1012 - ayounsi@cumin1003"
[10:27:07] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA for maps1012 - ayounsi@cumin1003"
[10:27:07] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:27:27] (Abandoned) Gkyziridis: ores-extension: Enable revertrisk filters for multiple wikis. [mediawiki-config] - https://gerrit.wikimedia.org/r/1212081 (https://phabricator.wikimedia.org/T408607) (owner: Gkyziridis)
[10:27:45] !log ayounsi@cumin1003 START - Cookbook sre.dns.wipe-cache maps1012.eqiad.wmnet on all recursors
[10:27:49] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) maps1012.eqiad.wmnet on all recursors
[10:28:10] !log dcausse@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[10:28:40] !log dcausse@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[10:29:03] (CR) Gkyziridis: "I avoided to run `composer manage-dblist add {wiki_name} ores` so I just enabled the filters and the rr_languageagnostic model for the wik" [mediawiki-config] - https://gerrit.wikimedia.org/r/1212086 (https://phabricator.wikimedia.org/T408607) (owner: Gkyziridis)
[10:29:31] (CR) Majavah: [V:+1 C:+2] P:wmcs::cloudgw: Use interface::route wrapper [puppet] - https://gerrit.wikimedia.org/r/1196367 (owner: Majavah)
[10:31:04] (CR) Fabfur: [C:+2] cache::text: enable unid and browser flags rate limits in drmrs [puppet] - https://gerrit.wikimedia.org/r/1212074 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[10:31:10] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P85837 and previous config saved to /var/cache/conftool/dbconfig/20251127-103109-marostegui.json
[10:32:31] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet
[10:33:32] !log taavi@cumin1003 START - Cookbook sre.hosts.reboot-single for host cloudgw2003-dev.codfw.wmnet
[10:34:08] (CR) Ayounsi: [C:+2] Outbound saturation: add transport interfaces [alerts] - https://gerrit.wikimedia.org/r/1206849 (https://phabricator.wikimedia.org/T409330) (owner: Ayounsi)
[10:34:25] (CR) FNegri: [C:+1] hieradata: cloudlb: Add x4 section to wiki replicas [puppet] - https://gerrit.wikimedia.org/r/1203042 (https://phabricator.wikimedia.org/T409560) (owner: Majavah)
[10:34:59] (CR) FNegri: [C:+1] conftool-data: Add clouddb1024/5 as x4 [puppet] - https://gerrit.wikimedia.org/r/1211083 (https://phabricator.wikimedia.org/T409557) (owner: Majavah)
[10:35:17] !log arnaudb@cumin1003 END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade gitlab
[10:35:48] (Merged) jenkins-bot: Outbound saturation: add transport interfaces [alerts] - https://gerrit.wikimedia.org/r/1206849 (https://phabricator.wikimedia.org/T409330) (owner: Ayounsi)
[10:36:12] (CR) Majavah: [C:+2] conftool-data: Add clouddb1024/5 as x4 [puppet] - https://gerrit.wikimedia.org/r/1211083 (https://phabricator.wikimedia.org/T409557) (owner: Majavah)
[10:36:21] (CR) Majavah: [V:+1 C:+2] hieradata: cloudlb: Add x4 section to wiki replicas [puppet] - https://gerrit.wikimedia.org/r/1203042 (https://phabricator.wikimedia.org/T409560) (owner: Majavah)
[10:37:39] !log taavi@cumin1003 conftool action : set/pooled=yes:weight=100; selector: cluster=wikireplica-db-analytics,service=x4
[10:37:39] (CR) Muehlenhoff: [C:+1] "Looks good and verified out of band" [puppet] - https://gerrit.wikimedia.org/r/1212085 (https://phabricator.wikimedia.org/T411167) (owner: Tiziano Fogli)
[10:37:45] !log taavi@cumin1003 conftool action : set/pooled=yes:weight=100; selector: cluster=wikireplica-db-web,service=x4
[10:38:42] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance
[10:39:02] !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1187 (T299441)', diff saved to https://phabricator.wikimedia.org/P85838 and previous config saved to /var/cache/conftool/dbconfig/20251127-103901-fceratto.json
[10:39:07] T299441: Avoid depooling hosts if the schema change has been applied before - https://phabricator.wikimedia.org/T299441
[10:39:33] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet
[10:39:55] !log taavi@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2003-dev.codfw.wmnet
[10:39:58] (PS5) Majavah: P:wmcs::cloudgw: Remove absented resources [puppet] - https://gerrit.wikimedia.org/r/1211012
[10:39:59] (PS2) Majavah: hieradata: cloudgw: Move shared data to role file [puppet] - https://gerrit.wikimedia.org/r/1211666
[10:39:59] (PS2) Majavah: hieradata: cloudgw: Configure individual v6 networks [puppet] - https://gerrit.wikimedia.org/r/1211667 (https://phabricator.wikimedia.org/T411081)
[10:41:13] (PS1) DCausse: cirrus: bump job image version [deployment-charts] - https://gerrit.wikimedia.org/r/1212092 (https://phabricator.wikimedia.org/T410602)
[10:41:47] SRE, Infrastructure-Foundations: Updates of passwords of users created with postgresql::user / PostgreSQL change to scram-sha256 - https://phabricator.wikimedia.org/T326325#11412364 (fgiunchedi) Leaving a suggestion here for a workaround for the record: while having a native pg facility to detect passwor...
[10:42:20] (PS2) Ayounsi: Add alerting for core link saturation [alerts] - https://gerrit.wikimedia.org/r/1206855 (https://phabricator.wikimedia.org/T409330)
[10:43:38] (CR) Filippo Giunchedi: [C:+1] "LGTM, didn't run PCC though I'm assuming you tried this in Pontoon and it worked as expected" [puppet] - https://gerrit.wikimedia.org/r/1211666 (owner: Majavah)
[10:45:33] (CR) Ayounsi: [C:+2] Add alerting for core link saturation [alerts] - https://gerrit.wikimedia.org/r/1206855 (https://phabricator.wikimedia.org/T409330) (owner: Ayounsi)
[10:46:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2163 (T410531)', diff saved to https://phabricator.wikimedia.org/P85839 and previous config saved to /var/cache/conftool/dbconfig/20251127-104617-marostegui.json
[10:46:23] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[10:46:34] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2164.codfw.wmnet with reason: Maintenance
[10:46:35] !log ayounsi@cumin1003 START - Cookbook sre.dns.netbox
[10:46:41] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2164 (T410531)', diff saved to https://phabricator.wikimedia.org/P85840 and previous config saved to /var/cache/conftool/dbconfig/20251127-104641-marostegui.json
[10:46:42] (Merged) jenkins-bot: Add alerting for core link saturation [alerts] - https://gerrit.wikimedia.org/r/1206855 (https://phabricator.wikimedia.org/T409330) (owner: Ayounsi)
[10:47:33] (PS1) Fabfur: hiera: remove custom ratelimit for cp7001 [puppet] - https://gerrit.wikimedia.org/r/1212093 (https://phabricator.wikimedia.org/T406545)
[10:48:27] (PS1) Fabfur: Revert "cache::text: enable unid and browser flags rate limits in magru" [puppet] - https://gerrit.wikimedia.org/r/1212095
[10:48:42] (PS1) Fabfur: Revert "cache::text: enable unid and browser flags rate limits in drmrs" [puppet] - https://gerrit.wikimedia.org/r/1212096
[10:49:13] (PS2) Majavah: wmflib: hosts2ips: Allow passing in IP ranges [puppet] - https://gerrit.wikimedia.org/r/1211650
[10:49:13] (PS2) Majavah: firewall: Use exported resources to fix ordering issues [puppet] - https://gerrit.wikimedia.org/r/1211651 (https://phabricator.wikimedia.org/T411089)
[10:49:13] (PS2) Majavah: P:wmcs::instance: Convert to firewall wrapper [puppet] - https://gerrit.wikimedia.org/r/1211652 (https://phabricator.wikimedia.org/T411089)
[10:49:14] (PS1) Majavah: nftables::service: Improve src/dst filter handling [puppet] - https://gerrit.wikimedia.org/r/1212097 (https://phabricator.wikimedia.org/T411102)
[10:50:03] (CR) CI reject: [V:-1] wmflib: hosts2ips: Allow passing in IP ranges [puppet] - https://gerrit.wikimedia.org/r/1211650 (owner: Majavah)
[10:50:20] (CR) Tiziano Fogli: [C:+2] ssh: FIDO key for Tiziano Fogli [puppet] - https://gerrit.wikimedia.org/r/1212085 (https://phabricator.wikimedia.org/T411167) (owner: Tiziano Fogli)
[10:50:41] (CR) CI reject: [V:-1] nftables::service: Improve src/dst filter handling [puppet] - https://gerrit.wikimedia.org/r/1212097 (https://phabricator.wikimedia.org/T411102) (owner: Majavah)
[10:50:58] (PS2) Majavah: nftables::service: Improve src/dst filter handling [puppet] - https://gerrit.wikimedia.org/r/1212097 (https://phabricator.wikimedia.org/T411102)
[10:50:58] (PS3) Majavah: wmflib: hosts2ips: Allow passing in IP ranges [puppet] - https://gerrit.wikimedia.org/r/1211650
[10:50:58] (PS3) Majavah: firewall: Use exported resources to fix ordering issues [puppet] - https://gerrit.wikimedia.org/r/1211651 (https://phabricator.wikimedia.org/T411089)
[10:50:59] (PS3) Majavah: P:wmcs::instance: Convert to firewall wrapper [puppet] - https://gerrit.wikimedia.org/r/1211652 (https://phabricator.wikimedia.org/T411089)
[10:51:11] (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1212093 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[10:51:22] !log btullis@deploy2002 helmfile [staging] START helmfile.d/services/eventgate-main: sync
[10:51:31] !log btullis@deploy2002 helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
[10:52:15] ayounsi@cumin1003 netbox (PID 2450158) is awaiting input
[10:53:43] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2164 (T410531)', diff saved to https://phabricator.wikimedia.org/P85841 and previous config saved to /var/cache/conftool/dbconfig/20251127-105341-marostegui.json
[10:53:51] (CR) Majavah: [V:+1] "PCC SUCCESS (CORE_DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7768/co" [puppet] - https://gerrit.wikimedia.org/r/1211666 (owner: Majavah)
[10:53:53] (CR) CI reject: [V:-1] P:wmcs::instance: Convert to firewall wrapper [puppet] - https://gerrit.wikimedia.org/r/1211652 (https://phabricator.wikimedia.org/T411089) (owner: Majavah)
[10:53:55] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[10:54:11] (CR) Muehlenhoff: [C:+2] Remove maintenance-log-readers [puppet] - https://gerrit.wikimedia.org/r/1211181 (owner: Muehlenhoff)
[10:54:31] !log ayounsi@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA for maps1011 - ayounsi@cumin1003"
[10:54:35] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA for maps1011 - ayounsi@cumin1003"
[10:54:35] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:54:37] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[10:54:57] !log ayounsi@cumin1003 START - Cookbook sre.dns.wipe-cache maps1011.eqiad.wmnet on all recursors
[10:55:01] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) maps1011.eqiad.wmnet on all recursors
[10:55:01] SRE, Infrastructure-Foundations, netops, Traffic, Patch-For-Review: Transport link saturation not alerting - https://phabricator.wikimedia.org/T409330#11412388 (ayounsi) Open→Resolved Paging alerting added. I won't disable the LibreNMS one for now, but only in the future to make s...
[10:55:14] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host maps1011.eqiad.wmnet
[10:56:27] (CR) Majavah: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1212097 (https://phabricator.wikimedia.org/T411102) (owner: Majavah)
[10:57:16] !log a-pizzata@deploy2002 helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
[10:57:38] !log a-pizzata@deploy2002 helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
[10:58:27] (PS1) Muehlenhoff: sre.maps.roll-restart-reboot-master: Adapt to Bookworm changes [cookbooks] - https://gerrit.wikimedia.org/r/1212099 (https://phabricator.wikimedia.org/T381565)
[10:58:49] !log a-pizzata@deploy2002 helmfile [codfw] START helmfile.d/services/eventgate-main: sync
[10:59:10] !log a-pizzata@deploy2002 helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
[11:00:05] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T1100)
[11:00:52] (PS1) Muehlenhoff: sre.maps.roll-restart-reboot: Adapt for Bookworm [cookbooks] - https://gerrit.wikimedia.org/r/1212100 (https://phabricator.wikimedia.org/T381565)
[11:01:56] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet
[11:05:51] SRE, Cassandra, Data-Persistence: Discovery of Cassandra cluster nodes - https://phabricator.wikimedia.org/T410075#11412416 (elukey) >>! In T410075#11409819, @Eevans wrote: > > Well, like I mentioned above, (among other things) it avoids a significant number of network hops by not relying on randoml...
[11:06:18] (CR) Elukey: [C:+1] sre.maps.roll-restart-reboot-master: Adapt to Bookworm changes [cookbooks] - https://gerrit.wikimedia.org/r/1212099 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[11:08:51] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P85842 and previous config saved to /var/cache/conftool/dbconfig/20251127-110850-marostegui.json
[11:14:18] (CR) Muehlenhoff: [C:+2] sre.maps.roll-restart-reboot-master: Adapt to Bookworm changes [cookbooks] - https://gerrit.wikimedia.org/r/1212099 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[11:16:44] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool db2164.codfw.wmnet gradually with 4 steps - repool after schema change
[11:16:47] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db2164.codfw.wmnet gradually with 4 steps - repool after schema change
[11:17:13] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depool db2164', diff saved to https://phabricator.wikimedia.org/P85843 and previous config saved to /var/cache/conftool/dbconfig/20251127-111712-marostegui.json
[11:17:21] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool db2164.codfw.wmnet quickly with 2 steps - repool after schema change
[11:17:24] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db2164.codfw.wmnet quickly with 2 steps - repool after schema change
[11:17:40] (PS1) Michael Große: fix(ReviseTone): only initialize once [extensions/GrowthExperiments] (wmf/1.46.0-wmf.4) - 
10https://gerrit.wikimedia.org/r/1212106 [11:18:06] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool db2164.codfw.wmnet gradually with 4 steps - repool after schema change [11:18:10] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db2164.codfw.wmnet gradually with 4 steps - repool after schema change [11:18:19] (03PS1) 10Michael Große: fix(ReviseTone): render behind EditNotice on mobile [extensions/GrowthExperiments] (wmf/1.46.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1212108 [11:18:28] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2161.codfw.wmnet with reason: Maintenance [11:18:43] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance [11:19:28] apergos: 👋 I'm running the train this week but something came up and I won't be able to join the train log triage. Just wanted to give you a headsup [11:19:33] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool db2164.codfw.wmnet gradually with 4 steps - repool after schema change [11:19:35] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db2164.codfw.wmnet gradually with 4 steps - repool after schema change [11:24:43] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance [11:24:46] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance [11:24:53] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2147 (T410531)', diff saved to https://phabricator.wikimedia.org/P85846 and previous config saved to /var/cache/conftool/dbconfig/20251127-112453-marostegui.json [11:24:58] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [11:25:05] !log marostegui@cumin1003 START - Cookbook 
sre.mysql.pool db2164 gradually with 4 steps - repool after schema change [11:28:05] (03PS1) 10Ayounsi: re-add pc* and restbase* to the clusters with no AAAA [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/1212110 (https://phabricator.wikimedia.org/T253173) [11:29:37] FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [11:30:17] (03CR) 10Majavah: [C:03+2] P:wmcs::cloudgw: Remove absented resources [puppet] - 10https://gerrit.wikimedia.org/r/1211012 (owner: 10Majavah) [11:30:25] 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: Move sretest1006 to rack D8 and connect to lswtest-d8-eqiad - https://phabricator.wikimedia.org/T411098#11412473 (10cmooney) 05Open→03Resolved [11:30:35] (03CR) 10Muehlenhoff: [C:03+1] "Looks good, but adding also Manuel and Eric for confirmation" [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/1212110 (https://phabricator.wikimedia.org/T253173) (owner: 10Ayounsi) [11:31:24] (03CR) 10Marostegui: [C:03+1] "+1 for the pc*. For restbase, we should wait for Eric." 
[software/netbox-extras] - 10https://gerrit.wikimedia.org/r/1212110 (https://phabricator.wikimedia.org/T253173) (owner: 10Ayounsi) [11:31:31] (03PS1) 10Federico Ceratto: sre.mysql.pool: Pass hostname to dbctl's get() [cookbooks] - 10https://gerrit.wikimedia.org/r/1212114 (https://phabricator.wikimedia.org/T391581) [11:31:56] (03CR) 10Majavah: [V:03+1 C:03+2] hieradata: cloudgw: Move shared data to role file [puppet] - 10https://gerrit.wikimedia.org/r/1211666 (owner: 10Majavah) [11:31:59] (03CR) 10Marostegui: [C:03+1] sre.mysql.pool: Pass hostname to dbctl's get() [cookbooks] - 10https://gerrit.wikimedia.org/r/1212114 (https://phabricator.wikimedia.org/T391581) (owner: 10Federico Ceratto) [11:33:06] !log marostegui@cumin1003 END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db2164 gradually with 4 steps - repool after schema change [11:33:16] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depool db2164', diff saved to https://phabricator.wikimedia.org/P85849 and previous config saved to /var/cache/conftool/dbconfig/20251127-113315-marostegui.json [11:33:19] (03CR) 10Ayounsi: "For context I removed them recently in that change : I6bc586cf4661d7cfa17f66d6efcc340b0ab5994a" [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/1212110 (https://phabricator.wikimedia.org/T253173) (owner: 10Ayounsi) [11:34:23] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool db2164 gradually with 4 steps - repool after schema change [11:35:35] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance [11:37:18] (03CR) 10CI reject: [V:04-1] sre.mysql.pool: Pass hostname to dbctl's get() [cookbooks] - 10https://gerrit.wikimedia.org/r/1212114 (https://phabricator.wikimedia.org/T391581) (owner: 10Federico Ceratto) [11:41:13] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1190.eqiad.wmnet with reason: Maintenance [11:41:21] !log 
marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1190 (T410531)', diff saved to https://phabricator.wikimedia.org/P85851 and previous config saved to /var/cache/conftool/dbconfig/20251127-114120-marostegui.json [11:41:26] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [11:43:05] 10SRE-tools, 06cloud-services-team, 06Infrastructure-Foundations, 07IPv6: Some WMCS clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271139#11412503 (10ayounsi) 05Open→03Resolved a:03ayounsi All solved. [11:43:35] (03CR) 10Majavah: "See the Phabricator task for additional context. PCC seems to show a diff for every single host due to a change in the `require` field, bu" [puppet] - 10https://gerrit.wikimedia.org/r/1212097 (https://phabricator.wikimedia.org/T411102) (owner: 10Majavah) [11:43:56] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2147 (T410531)', diff saved to https://phabricator.wikimedia.org/P85852 and previous config saved to /var/cache/conftool/dbconfig/20251127-114355-marostegui.json [11:44:48] (03PS3) 10Majavah: hieradata: cloudgw: Configure individual v6 networks [puppet] - 10https://gerrit.wikimedia.org/r/1211667 (https://phabricator.wikimedia.org/T411081) [11:46:51] (03CR) 10Majavah: [V:03+1] "PCC SUCCESS (CORE_DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7769/co" [puppet] - 10https://gerrit.wikimedia.org/r/1211667 (https://phabricator.wikimedia.org/T411081) (owner: 10Majavah) [11:49:34] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance [11:52:38] 06SRE, 10SRE-tools, 06Infrastructure-Foundations, 07IPv6: Some Foundation clusters do not appear to support IPv6 - https://phabricator.wikimedia.org/T271136#11412520 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlenhoff These are all done, the 
remaining Ganeti nodes w/o AAAA records were... [11:57:26] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, November 27 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [extensions/GrowthExperiments] (wmf/1.46.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1212106 (owner: 10Michael Große) [11:57:46] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, November 27 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [extensions/GrowthExperiments] (wmf/1.46.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1212108 (owner: 10Michael Große) [11:58:08] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1190 (T410531)', diff saved to https://phabricator.wikimedia.org/P85855 and previous config saved to /var/cache/conftool/dbconfig/20251127-115807-marostegui.json [11:58:13] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [11:59:03] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P85856 and previous config saved to /var/cache/conftool/dbconfig/20251127-115903-marostegui.json [12:01:42] (03PS1) 10Kosta Harlan: hCaptcha: Handle requests without a token [extensions/ConfirmEdit] (wmf/1.46.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1212119 (https://phabricator.wikimedia.org/T411166) [12:02:01] jouncebot: nowandnext [12:02:01] No deployments scheduled for the next 0 hour(s) and 57 minute(s) [12:02:01] In 0 hour(s) and 57 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T1300) [12:02:14] I will deploy the wmf.4 patch above, unless there are objections [12:03:15] no objection [12:05:10] (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy2002 using scap backport" [extensions/ConfirmEdit] (wmf/1.46.0-wmf.4) - 
10https://gerrit.wikimedia.org/r/1212119 (https://phabricator.wikimedia.org/T411166) (owner: 10Kosta Harlan) [12:06:09] (03PS5) 10Blake: alerting: Add an alert for when Kafka brokers need a rolling restart. [alerts] - 10https://gerrit.wikimedia.org/r/1212113 (https://phabricator.wikimedia.org/T410552) [12:13:15] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P85858 and previous config saved to /var/cache/conftool/dbconfig/20251127-121315-marostegui.json [12:13:16] jouncebot: nowandnext [12:13:16] No deployments scheduled for the next 0 hour(s) and 46 minute(s) [12:13:16] In 0 hour(s) and 46 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T1300) [12:13:30] kostajh: Can you ping me when you are done? [12:13:42] Like to apply a private code change [12:13:58] Dreamy_Jazz: sure [12:14:11] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P85859 and previous config saved to /var/cache/conftool/dbconfig/20251127-121410-marostegui.json [12:16:44] (03CR) 10Volans: [C:03+2] toolforge: add ingress for infra-tracing-loki [puppet] - 10https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313) (owner: 10Volans) [12:19:24] (03Merged) 10jenkins-bot: hCaptcha: Handle requests without a token [extensions/ConfirmEdit] (wmf/1.46.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1212119 (https://phabricator.wikimedia.org/T411166) (owner: 10Kosta Harlan) [12:19:44] !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1212119|hCaptcha: Handle requests without a token (T411166)]] [12:19:49] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2164 gradually with 4 steps - repool after schema change [12:19:50] T411166: TypeError: 
MediaWiki\Extension\ConfirmEdit\hCaptcha\HCaptcha::logCheckError(): Argument #3 ($token) must be of type string, null given, called in /srv/mediawiki/php-1.46.0-wmf.4/extensions/ConfirmEdit/includes/hCaptcha/HCaptcha - https://phabricator.wikimedia.org/T411166 [12:21:40] !log kharlan@deploy2002 kharlan: Backport for [[gerrit:1212119|hCaptcha: Handle requests without a token (T411166)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [12:22:06] (03PS1) 10Sergio Gimeno: instrumentation(ReviseTone): fix stream for edits and refine exposure [extensions/GrowthExperiments] (wmf/1.46.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1212128 (https://phabricator.wikimedia.org/T405177) [12:22:18] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, November 27 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [extensions/GrowthExperiments] (wmf/1.46.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1212128 (https://phabricator.wikimedia.org/T405177) (owner: 10Sergio Gimeno) [12:24:13] (03PS1) 10Btullis: Bind the spark-job-orchestration role to the default serviceaccount [deployment-charts] - 10https://gerrit.wikimedia.org/r/1212130 (https://phabricator.wikimedia.org/T410017) [12:26:31] !log kharlan@deploy2002 kharlan: Continuing with sync [12:27:02] FIRING: HelmReleaseBadStatus: Helm release mw-script/utk6lsuw on k8s@codfw in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [12:28:23] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P85861 and previous config saved to /var/cache/conftool/dbconfig/20251127-122822-marostegui.json [12:28:47] 06SRE, 
10Data-Persistence-Backup, 07IPv6: update bacula-sd config so that it listens on IPv6 - https://phabricator.wikimedia.org/T253986#11412594 (10ayounsi) @jcrespo do you know if that bug got fixed and if we could have that daemon listen on IPv6 now ? [12:29:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2147 (T410531)', diff saved to https://phabricator.wikimedia.org/P85862 and previous config saved to /var/cache/conftool/dbconfig/20251127-122918-marostegui.json [12:29:23] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [12:29:34] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance [12:29:42] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2155 (T410531)', diff saved to https://phabricator.wikimedia.org/P85863 and previous config saved to /var/cache/conftool/dbconfig/20251127-122941-marostegui.json [12:30:34] !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1212119|hCaptcha: Handle requests without a token (T411166)]] (duration: 10m 50s) [12:30:39] T411166: TypeError: MediaWiki\Extension\ConfirmEdit\hCaptcha\HCaptcha::logCheckError(): Argument #3 ($token) must be of type string, null given, called in /srv/mediawiki/php-1.46.0-wmf.4/extensions/ConfirmEdit/includes/hCaptcha/HCaptcha - https://phabricator.wikimedia.org/T411166 [12:30:42] Dreamy_Jazz: over to you [12:30:52] Thanks! 
[12:32:13] jouncebot: nowandnext
[12:32:13] No deployments scheduled for the next 0 hour(s) and 27 minute(s)
[12:32:13] In 0 hour(s) and 27 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T1300)
[12:34:20] !log installing sqlite3 security updates
[12:34:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:36:22] SRE, Data-Persistence-Backup, IPv6: update bacula-sd config so that it listens on IPv6 - https://phabricator.wikimedia.org/T253986#11412635 (jcrespo) No idea, it is the first time I've seen this task. Do you want me to test a bacula storage to listen on :: ?
[12:38:49] SRE, Data-Persistence-Backup, IPv6: update bacula-sd config so that it listens on IPv6 - https://phabricator.wikimedia.org/T253986#11412660 (ayounsi) Sure, thanks ! I also discovered it just now by reviewing the sub-tasks of {T253173}.
[12:39:31] (PS1) Marco Fossati: ReaderExperiments' StickyHeaders stream configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/1212134 (https://phabricator.wikimedia.org/T410533)
[12:39:37] FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[12:42:48] SRE, SRE-tools, Infrastructure-Foundations: sre.ganeti.makevm: Create machine types - https://phabricator.wikimedia.org/T344972#11412674 (MoritzMuehlenhoff) a:MoritzMuehlenhoff→None
[12:43:31] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1190 (T410531)', diff saved to https://phabricator.wikimedia.org/P85864 and previous config saved to /var/cache/conftool/dbconfig/20251127-124330-marostegui.json
[12:43:36] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[12:43:46] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1199.eqiad.wmnet with reason: Maintenance
[12:43:54] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1199 (T410531)', diff saved to https://phabricator.wikimedia.org/P85865 and previous config saved to /var/cache/conftool/dbconfig/20251127-124353-marostegui.json
[12:45:06] I'm done with my deploy
[12:48:25] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2155 (T410531)', diff saved to https://phabricator.wikimedia.org/P85866 and previous config saved to /var/cache/conftool/dbconfig/20251127-124824-marostegui.json
[12:50:23] (PS2) Federico Ceratto: sre.mysql.pool: Pass hostname to dbctl's get() [cookbooks] - https://gerrit.wikimedia.org/r/1212114 (https://phabricator.wikimedia.org/T391581)
[12:52:02] (CR) Muehlenhoff: [C:+2] Switch the cluster::cloud_management role to nftables [puppet] - https://gerrit.wikimedia.org/r/1210395 (owner: Muehlenhoff)
[12:52:13] (CR) Klausman: [C:+1] ml-services: Separate eqiad and codfw deployments for Revise Tone. [deployment-charts] - https://gerrit.wikimedia.org/r/1211640 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[12:58:50] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
[12:59:48] (CR) Urbanecm: CheckUser/UserInfoCard: Enable by default for some privileged groups on enwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1204243 (https://phabricator.wikimedia.org/T409840) (owner: Kosta Harlan)
[12:59:52] SRE, Data-Persistence-Backup, IPv6: update bacula-sd config so that it listens on IPv6 - https://phabricator.wikimedia.org/T253986#11412742 (jcrespo) Doing: ` SDAddresses = { ipv4 = { addr = backup1012.eqiad.wmnet; port = 9103; } ipv6 = {...
[13:00:05] Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T1300)
[13:00:45] (CR) Kosta Harlan: CheckUser/UserInfoCard: Enable by default for some privileged groups on enwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1204243 (https://phabricator.wikimedia.org/T409840) (owner: Kosta Harlan)
[13:02:09] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1199 (T410531)', diff saved to https://phabricator.wikimedia.org/P85867 and previous config saved to /var/cache/conftool/dbconfig/20251127-130208-marostegui.json
[13:02:14] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[13:02:40] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin1001.eqiad.wmnet
[13:03:33] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P85868 and previous config saved to /var/cache/conftool/dbconfig/20251127-130332-marostegui.json
[13:04:43] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
[13:05:02] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[13:05:10] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1165 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P85869 and previous config saved to /var/cache/conftool/dbconfig/20251127-130509-marostegui.json
[13:05:17] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163
[13:05:18] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164
[13:07:10] SRE, Infrastructure-Foundations: Integrate Trixie 13.2 point update - https://phabricator.wikimedia.org/T410147#11412788 (MoritzMuehlenhoff)
[13:07:19] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P85870 and previous config saved to /var/cache/conftool/dbconfig/20251127-130719-marostegui.json
[13:07:25] SRE, Infrastructure-Foundations: Integrate Trixie 13.2 point update - https://phabricator.wikimedia.org/T410147#11412789 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff All done
[13:14:42] (CR) Filippo Giunchedi: [C:+1] "Chatted with Taavi on IRC: modulo s/exported/virtual/ in the commit message this LGTM" [puppet] - https://gerrit.wikimedia.org/r/1211651 (https://phabricator.wikimedia.org/T411089) (owner: Majavah)
[13:14:57] SRE, Data-Persistence-Backup, IPv6: update bacula-sd config so that it listens on IPv6 - https://phabricator.wikimedia.org/T253986#11412871 (jcrespo) Interestingly, if I do: ` SDAddresses = { ipv4 = { addr = 0.0.0.0; port = 9103; } ipv6 = {...
[13:14:57] !log taavi@deploy2002 mwscript-k8s job started: initEditCount --wiki=tokwiki
[13:16:14] (PS3) Majavah: nftables::service: Improve src/dst filter handling [puppet] - https://gerrit.wikimedia.org/r/1212097 (https://phabricator.wikimedia.org/T411102)
[13:16:14] (PS4) Majavah: wmflib: hosts2ips: Allow passing in IP ranges [puppet] - https://gerrit.wikimedia.org/r/1211650
[13:16:14] (PS4) Majavah: firewall: Use virtual resources to fix ordering issues [puppet] - https://gerrit.wikimedia.org/r/1211651 (https://phabricator.wikimedia.org/T411089)
[13:16:14] (PS4) Majavah: P:wmcs::instance: Convert to firewall wrapper [puppet] - https://gerrit.wikimedia.org/r/1211652 (https://phabricator.wikimedia.org/T411089)
[13:16:15] (PS1) Majavah: P:cloudceph::osd: Convert drange to an array [puppet] - https://gerrit.wikimedia.org/r/1212138
[13:16:16] (PS1) Majavah: P:postfix::mx: Convert port to an integer [puppet] - https://gerrit.wikimedia.org/r/1212139
[13:17:16] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P85871 and previous config saved to /var/cache/conftool/dbconfig/20251127-131715-marostegui.json
[13:17:49] (CR) CI reject: [V:-1] wmflib: hosts2ips: Allow passing in IP ranges [puppet] - https://gerrit.wikimedia.org/r/1211650 (owner: Majavah)
[13:18:04] (CR) Majavah: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1211650 (owner: Majavah)
[13:18:40] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P85872 and previous config saved to /var/cache/conftool/dbconfig/20251127-131839-marostegui.json
[13:22:00] (CR) Urbanecm: CheckUser/UserInfoCard: Enable by default for some privileged groups on enwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1204243 (https://phabricator.wikimedia.org/T409840) (owner: Kosta Harlan)
[13:22:27] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P85873 and previous config saved to /var/cache/conftool/dbconfig/20251127-132226-marostegui.json
[13:26:59] jouncebot: nowandnext
[13:27:00] For the next 0 hour(s) and 32 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T1300)
[13:27:00] In 0 hour(s) and 32 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T1400)
[13:27:19] (CR) DCausse: [C:+2] cirrus: bump job image version [deployment-charts] - https://gerrit.wikimedia.org/r/1212092 (https://phabricator.wikimedia.org/T410602) (owner: DCausse)
[13:27:26] (PS1) Muehlenhoff: Remove unused cassandra-test-roots group [puppet] - https://gerrit.wikimedia.org/r/1212140
[13:29:11] (Merged) jenkins-bot: cirrus: bump job image version [deployment-charts] - https://gerrit.wikimedia.org/r/1212092 (https://phabricator.wikimedia.org/T410602) (owner: DCausse)
[13:31:01] SRE, Data-Persistence-Backup, IPv6: update bacula-sd config so that it listens on IPv6 - https://phabricator.wikimedia.org/T253986#11412976 (Volans) If you want to bind any address, from the docs at [1] it seems that you can just omit the setting and not specify any of `SDAddresses` and `SDAddress`....
[13:32:23] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P85875 and previous config saved to /var/cache/conftool/dbconfig/20251127-133223-marostegui.json
[13:32:54] (CR) Alexandros Kosiaris: [C:+2] "INFO: Nodes: 307 NOOP 5 FAIL 21 CORE_DIFF 1 CANCELLED" [puppet] - https://gerrit.wikimedia.org/r/1208013 (owner: Alexandros Kosiaris)
[13:33:29] (CR) Elukey: [C:+1] sre.maps.roll-restart-reboot: Adapt for Bookworm [cookbooks] - https://gerrit.wikimedia.org/r/1212100 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[13:33:48] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2155 (T410531)', diff saved to https://phabricator.wikimedia.org/P85877 and previous config saved to /var/cache/conftool/dbconfig/20251127-133347-marostegui.json
[13:33:53] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[13:34:01] !log dcausse@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[13:34:04] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
[13:34:11] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2172 (T410531)', diff saved to https://phabricator.wikimedia.org/P85878 and previous config saved to /var/cache/conftool/dbconfig/20251127-133411-marostegui.json
[13:34:13] !log dcausse@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[13:34:18] !log taavi@cumin1003 START - Cookbook sre.wikireplicas.add-wiki for database tokwiki (T404570)
[13:34:25] T404570: [wikireplicas] Create views for new wiki tokwiki - https://phabricator.wikimedia.org/T404570
[13:35:30] (PS1) Anzx: tokwiki: add logos and sitename [mediawiki-config] - https://gerrit.wikimedia.org/r/1212143 (https://phabricator.wikimedia.org/T411119)
[13:36:28] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, November 27 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [mediawiki-config] - https://gerrit.wikimedia.org/r/1212143 (https://phabricator.wikimedia.org/T411119) (owner: Anzx)
[13:37:41] !log dcausse@deploy2002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
[13:38:14] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repool db1165', diff saved to https://phabricator.wikimedia.org/P85879 and previous config saved to /var/cache/conftool/dbconfig/20251127-133813-marostegui.json
[13:38:16] !log dcausse@deploy2002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
[13:38:23] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
[13:38:31] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[13:38:39] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1165 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P85880 and previous config saved to /var/cache/conftool/dbconfig/20251127-133838-marostegui.json
[13:38:46] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163
[13:38:47] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164
[13:39:47] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P85881 and previous config saved to /var/cache/conftool/dbconfig/20251127-133947-marostegui.json
[13:40:50] !log dcausse@deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[13:41:03] !log dcausse@deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[13:42:25] (PS3) Alexandros Kosiaris: Fix CSS in Docker registry builder [puppet] - https://gerrit.wikimedia.org/r/1113430 (owner: Mvolz)
[13:42:58] (CR) Alexandros Kosiaris: [V:+2 C:+2] "CI isn't going to check something here, merging. Thanks for this and sorry for not tending to it earlier." [puppet] - https://gerrit.wikimedia.org/r/1113430 (owner: Mvolz)
[13:44:07] (CR) Federico Ceratto: [C:+2] sre.mysql.pool: Pass hostname to dbctl's get() [cookbooks] - https://gerrit.wikimedia.org/r/1212114 (https://phabricator.wikimedia.org/T391581) (owner: Federico Ceratto)
[13:44:15] SRE, Data-Persistence-Backup, IPv6: update bacula-sd config so that it listens on IPv6 - https://phabricator.wikimedia.org/T253986#11413026 (jcrespo) >>! In T253986#11412976, @Volans wrote: > If you want to bind any address, from the docs at [1] it seems that you can just omit the setting and not spe...
[13:47:31] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1199 (T410531)', diff saved to https://phabricator.wikimedia.org/P85883 and previous config saved to /var/cache/conftool/dbconfig/20251127-134730-marostegui.json [13:47:37] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [13:47:47] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1221.eqiad.wmnet with reason: Maintenance [13:48:09] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on 6 hosts with reason: Maintenance [13:48:18] (03CR) 10Alexandros Kosiaris: [C:03+1] network data: increase size of public1-ulsfo IPv4 range [puppet] - 10https://gerrit.wikimedia.org/r/1205135 (https://phabricator.wikimedia.org/T410047) (owner: 10Cathal Mooney) [13:48:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1221 (T410531)', diff saved to https://phabricator.wikimedia.org/P85884 and previous config saved to /var/cache/conftool/dbconfig/20251127-134816-marostegui.json [13:52:23] (03Abandoned) 10Alexandros Kosiaris: PHPFPMTooBusy: Point to public available runbook [alerts] - 10https://gerrit.wikimedia.org/r/954947 (owner: 10Alexandros Kosiaris) [13:52:51] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2172 (T410531)', diff saved to https://phabricator.wikimedia.org/P85885 and previous config saved to /var/cache/conftool/dbconfig/20251127-135251-marostegui.json [13:52:57] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [13:54:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P85886 and previous config saved to /var/cache/conftool/dbconfig/20251127-135454-marostegui.json [13:55:53] (03CR) 10AikoChou: ml-services: Separate eqiad and codfw deployments for 
Revise Tone. (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1211640 (https://phabricator.wikimedia.org/T408538) (owner: 10Bartosz Wójtowicz) [13:58:50] (03PS1) 10Filippo Giunchedi: partman: fix db-efi.cfg [puppet] - 10https://gerrit.wikimedia.org/r/1212144 (https://phabricator.wikimedia.org/T410400) [14:00:04] Urbanecm and TheresNoTime: gettimeofday() says it's time for UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T1400) [14:00:05] MichaelG_WMF, Sergi0, and anzx: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [14:00:18] * MichaelG_WMF is here [14:00:26] o/ [14:00:30] o/ [14:03:32] I can deploy if there are no deployers around [14:04:25] that would be great, thank you [14:04:27] @MichaelG_WMF is it ok if we do all GE patches together? [14:04:35] yes, I think that makes sense [14:05:03] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1221 (T410531)', diff saved to https://phabricator.wikimedia.org/P85887 and previous config saved to /var/cache/conftool/dbconfig/20251127-140502-marostegui.json [14:05:08] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [14:05:33] (03CR) 10TrainBranchBot: [C:03+2] "Approved by sgimeno@deploy2002 using scap backport" [extensions/GrowthExperiments] (wmf/1.46.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1212106 (owner: 10Michael Große) [14:05:34] (03CR) 10TrainBranchBot: [C:03+2] "Approved by sgimeno@deploy2002 using scap backport" [extensions/GrowthExperiments] (wmf/1.46.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1212108 (owner: 10Michael Große) [14:05:35] (03CR) 10TrainBranchBot: [C:03+2] "Approved by sgimeno@deploy2002 using scap backport" [extensions/GrowthExperiments] (wmf/1.46.0-wmf.4) - 
10https://gerrit.wikimedia.org/r/1212128 (https://phabricator.wikimedia.org/T405177) (owner: 10Sergio Gimeno) [14:07:20] (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1212144 (https://phabricator.wikimedia.org/T410400) (owner: 10Filippo Giunchedi) [14:07:59] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P85888 and previous config saved to /var/cache/conftool/dbconfig/20251127-140758-marostegui.json [14:08:29] (03CR) 10Filippo Giunchedi: [C:03+2] partman: fix db-efi.cfg [puppet] - 10https://gerrit.wikimedia.org/r/1212144 (https://phabricator.wikimedia.org/T410400) (owner: 10Filippo Giunchedi) [14:08:58] (03CR) 10Vgutierrez: [C:03+1] hiera: remove custom ratelimit for cp7001 [puppet] - 10https://gerrit.wikimedia.org/r/1212093 (https://phabricator.wikimedia.org/T406545) (owner: 10Fabfur) [14:09:29] (03CR) 10Muehlenhoff: [C:03+2] sre.maps.roll-restart-reboot: Adapt for Bookworm [cookbooks] - 10https://gerrit.wikimedia.org/r/1212100 (https://phabricator.wikimedia.org/T381565) (owner: 10Muehlenhoff) [14:10:02] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P85889 and previous config saved to /var/cache/conftool/dbconfig/20251127-141002-marostegui.json [14:11:54] (03PS1) 10Tiziano Fogli: ssh: FIDO key for Tiziano Fogli [puppet] - 10https://gerrit.wikimedia.org/r/1212147 (https://phabricator.wikimedia.org/T411167) [14:12:10] (03CR) 10Fabfur: [C:03+2] hiera: remove custom ratelimit for cp7001 [puppet] - 10https://gerrit.wikimedia.org/r/1212093 (https://phabricator.wikimedia.org/T406545) (owner: 10Fabfur) [14:14:30] (03Merged) 10jenkins-bot: fix(ReviseTone): only initialize once [extensions/GrowthExperiments] (wmf/1.46.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1212106 (owner: 10Michael Große) [14:16:23] (03Merged) 10jenkins-bot: fix(ReviseTone): render behind 
EditNotice on mobile [extensions/GrowthExperiments] (wmf/1.46.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1212108 (owner: 10Michael Große) [14:17:35] (03Merged) 10jenkins-bot: instrumentation(ReviseTone): fix stream for edits and refine exposure [extensions/GrowthExperiments] (wmf/1.46.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1212128 (https://phabricator.wikimedia.org/T405177) (owner: 10Sergio Gimeno) [14:17:58] !log sgimeno@deploy2002 Started scap sync-world: Backport for [[gerrit:1212106|fix(ReviseTone): only initialize once]], [[gerrit:1212108|fix(ReviseTone): render behind EditNotice on mobile]], [[gerrit:1212128|instrumentation(ReviseTone): fix stream for edits and refine exposure (T405177 T406252)]] [14:18:04] T405177: Revise Tone: Instrumentation - https://phabricator.wikimedia.org/T405177 [14:18:05] T406252: 🧑‍💻 Instrument the Revise Tone Onboarding Quiz - https://phabricator.wikimedia.org/T406252 [14:18:25] (03PS1) 10AikoChou: changeprop: enable remaining pilot wikis for revise-tone-task-generator [deployment-charts] - 10https://gerrit.wikimedia.org/r/1212148 (https://phabricator.wikimedia.org/T408538) [14:19:53] !log sgimeno@deploy2002 sgimeno, migr: Backport for [[gerrit:1212106|fix(ReviseTone): only initialize once]], [[gerrit:1212108|fix(ReviseTone): render behind EditNotice on mobile]], [[gerrit:1212128|instrumentation(ReviseTone): fix stream for edits and refine exposure (T405177 T406252)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[14:20:10] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P85891 and previous config saved to /var/cache/conftool/dbconfig/20251127-142009-marostegui.json [14:20:12] * MichaelG_WMF is testing [14:21:42] lgtm on my side, @MichaelG_WMF let me know once you're done [14:23:06] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P85892 and previous config saved to /var/cache/conftool/dbconfig/20251127-142306-marostegui.json [14:24:35] I think it does fix what it was supposed to fix. This is ready to roll forward [14:24:37] FIRING: [6x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [14:24:43] !log sgimeno@deploy2002 sgimeno, migr: Continuing with sync [14:25:10] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P85893 and previous config saved to /var/cache/conftool/dbconfig/20251127-142509-marostegui.json [14:25:12] sergi0: would you also deploy change i scheduled , i need someone to deploy [14:25:18] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163 [14:25:18] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [14:25:26] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance [14:25:34] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1168 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P85894 and previous config saved to 
/var/cache/conftool/dbconfig/20251127-142533-marostegui.json [14:25:38] @anzx sure [14:25:58] thanks [14:28:43] !log sgimeno@deploy2002 Finished scap sync-world: Backport for [[gerrit:1212106|fix(ReviseTone): only initialize once]], [[gerrit:1212108|fix(ReviseTone): render behind EditNotice on mobile]], [[gerrit:1212128|instrumentation(ReviseTone): fix stream for edits and refine exposure (T405177 T406252)]] (duration: 10m 46s) [14:28:50] T405177: Revise Tone: Instrumentation - https://phabricator.wikimedia.org/T405177 [14:28:51] T406252: 🧑‍💻 Instrument the Revise Tone Onboarding Quiz - https://phabricator.wikimedia.org/T406252 [14:29:12] I guess I need to run purge list for your change to take effect in the browser [14:29:50] (03PS1) 10DCausse: cirrus: enable georgian transliteration second try profile [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1212150 (https://phabricator.wikimedia.org/T408737) [14:30:44] sergi0: if you mean for my change no needed to run script , will ask if necessary [14:31:02] ack [14:31:08] (03CR) 10TrainBranchBot: [C:03+2] "Approved by sgimeno@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1212143 (https://phabricator.wikimedia.org/T411119) (owner: 10Anzx) [14:31:16] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance [14:31:16] !log taavi@cumin1003 END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database tokwiki (T404570) [14:31:23] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1229 (T410589)', diff saved to https://phabricator.wikimedia.org/P85895 and previous config saved to /var/cache/conftool/dbconfig/20251127-143123-ladsgroup.json [14:31:25] T404570: [wikireplicas] Create views for new wiki tokwiki - https://phabricator.wikimedia.org/T404570 [14:31:31] T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589 [14:32:09] 
(03Merged) 10jenkins-bot: tokwiki: add logos and sitename [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1212143 (https://phabricator.wikimedia.org/T411119) (owner: 10Anzx) [14:32:25] !log sgimeno@deploy2002 Started scap sync-world: Backport for [[gerrit:1212143|tokwiki: add logos and sitename (T411119)]] [14:32:30] T411119: Change logo for tok.wikipedia.org - https://phabricator.wikimedia.org/T411119 [14:34:29] !log sgimeno@deploy2002 anzx, sgimeno: Backport for [[gerrit:1212143|tokwiki: add logos and sitename (T411119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [14:35:09] sergi0: looks good, ok to continue [14:35:15] !log sgimeno@deploy2002 anzx, sgimeno: Continuing with sync [14:35:17] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P85896 and previous config saved to /var/cache/conftool/dbconfig/20251127-143517-marostegui.json [14:35:47] nice wordmark! 
[14:36:26] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance [14:36:50] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance [14:37:37] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance [14:37:42] (03CR) 10Fabfur: [C:03+1] "lgtm, [nit] I'd remove the trailing dots if right after the URLs" [puppet] - 10https://gerrit.wikimedia.org/r/1211749 (owner: 10BryanDavis) [14:38:14] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2172 (T410531)', diff saved to https://phabricator.wikimedia.org/P85897 and previous config saved to /var/cache/conftool/dbconfig/20251127-143813-marostegui.json [14:38:20] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [14:38:30] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2199.codfw.wmnet with reason: Maintenance [14:39:09] PROBLEM - Check unit status of httpbb_kubernetes_mw-web-next_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-web-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [14:39:12] (03CR) 10Ssingh: [C:03+1] "Thanks for the patch!" [puppet] - 10https://gerrit.wikimedia.org/r/1211750 (owner: 10BryanDavis) [14:39:13] (03CR) 10Ssingh: [C:03+2] varnish: Use full URL in UA block message [puppet] - 10https://gerrit.wikimedia.org/r/1211750 (owner: 10BryanDavis) [14:39:19] !log sgimeno@deploy2002 Finished scap sync-world: Backport for [[gerrit:1212143|tokwiki: add logos and sitename (T411119)]] (duration: 06m 53s) [14:39:24] T411119: Change logo for tok.wikipedia.org - https://phabricator.wikimedia.org/T411119 [14:39:56] (03CR) 10Ssingh: "Yeah that's fair. 
But I also realize we have a dot after the wiki one above so it _should_ be fine? Let's try and merge and then revisit." [puppet] - 10https://gerrit.wikimedia.org/r/1211749 (owner: 10BryanDavis) [14:39:57] (03CR) 10Ssingh: [C:03+2] haproxy: Use full URL in UA block message [puppet] - 10https://gerrit.wikimedia.org/r/1211749 (owner: 10BryanDavis) [14:40:08] sergi0: thanks for deploying [14:40:09] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance [14:40:10] 10ops-eqiad, 06DC-Ops: eqiad: cleanup Interface enabled but not connected alert - https://phabricator.wikimedia.org/T411194 (10ayounsi) 03NEW [14:40:32] 10ops-eqiad, 06DC-Ops: eqiad: cleanup Interface enabled but not connected alert - https://phabricator.wikimedia.org/T411194#11413256 (10ayounsi) [14:40:34] (03CR) 10Majavah: "[bikeshedding, feel free to ignore] what about putting the links in ?" [puppet] - 10https://gerrit.wikimedia.org/r/1211749 (owner: 10BryanDavis) [14:41:18] (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1212147 (https://phabricator.wikimedia.org/T411167) (owner: 10Tiziano Fogli) [14:41:20] yw! [14:41:33] (03CR) 10Ssingh: [C:03+2] "Yeah we can do that as well. I am assuming most people will see this message in their CLI and not the browser interface, so there's that?" 
[puppet] - 10https://gerrit.wikimedia.org/r/1211749 (owner: 10BryanDavis) [14:41:55] 10ops-eqiad, 06DC-Ops: eqiad: cleanup Interface enabled but not connected alert - https://phabricator.wikimedia.org/T411194#11413259 (10ayounsi) [14:41:57] (03PS1) 10Esanders: Enable DiscussionTools visual enhancements on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1212157 (https://phabricator.wikimedia.org/T409297) [14:42:13] 10ops-codfw, 06DC-Ops: codfw: cleanup Interface enabled but not connected alert - https://phabricator.wikimedia.org/T411195 (10ayounsi) 03NEW [14:42:25] FIRING: SystemdUnitFailed: httpbb_kubernetes_mw-web-next_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:42:26] 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.12 point update - https://phabricator.wikimedia.org/T403852#11413273 (10MoritzMuehlenhoff) [14:44:31] (03PS1) 10Hnowlan: thumbor: refuse to generate SVGs larger than 4096px [deployment-charts] - 10https://gerrit.wikimedia.org/r/1212159 (https://phabricator.wikimedia.org/T411076) [14:48:03] (03CR) 10Ladsgroup: [C:03+1] thumbor: refuse to generate SVGs larger than 4096px [deployment-charts] - 10https://gerrit.wikimedia.org/r/1212159 (https://phabricator.wikimedia.org/T411076) (owner: 10Hnowlan) [14:48:14] jouncebot: nowandnext [14:48:14] For the next 0 hour(s) and 11 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T1400) [14:48:14] In 0 hour(s) and 41 minute(s): xLab Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T1530) [14:49:16] !log installing expat security updates [14:49:19] (03PS1) 10Esanders: DiscussionTools: cleanup unused config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1212161 [14:49:20] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:50:25] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1221 (T410531)', diff saved to https://phabricator.wikimedia.org/P85898 and previous config saved to /var/cache/conftool/dbconfig/20251127-145024-marostegui.json [14:50:32] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [14:50:41] (03CR) 10Hnowlan: [C:03+2] thumbor: refuse to generate SVGs larger than 4096px [deployment-charts] - 10https://gerrit.wikimedia.org/r/1212159 (https://phabricator.wikimedia.org/T411076) (owner: 10Hnowlan) [14:50:41] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1238.eqiad.wmnet with reason: Maintenance [14:50:48] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance [14:50:49] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1238 (T410531)', diff saved to https://phabricator.wikimedia.org/P85899 and previous config saved to /var/cache/conftool/dbconfig/20251127-145048-marostegui.json [14:52:04] (03CR) 10Tiziano Fogli: [C:03+2] ssh: FIDO key for Tiziano Fogli [puppet] - 10https://gerrit.wikimedia.org/r/1212147 (https://phabricator.wikimedia.org/T411167) (owner: 10Tiziano Fogli) [14:52:35] (03Merged) 10jenkins-bot: thumbor: refuse to generate SVGs larger than 4096px [deployment-charts] - 10https://gerrit.wikimedia.org/r/1212159 (https://phabricator.wikimedia.org/T411076) (owner: 10Hnowlan) [14:53:00] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2206.codfw.wmnet with reason: Maintenance [14:53:08] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2206 (T410531)', diff saved to https://phabricator.wikimedia.org/P85900 and previous config saved to /var/cache/conftool/dbconfig/20251127-145307-marostegui.json [14:54:37] FIRING: 
CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [14:57:32] !log hnowlan@deploy2002 helmfile [staging] START helmfile.d/services/thumbor: apply [14:57:44] !log hnowlan@deploy2002 helmfile [staging] DONE helmfile.d/services/thumbor: apply [14:58:05] (03PS1) 10Filippo Giunchedi: installserver: fix partmain typo [puppet] - 10https://gerrit.wikimedia.org/r/1212164 (https://phabricator.wikimedia.org/T406795) [15:01:17] (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1212164 (https://phabricator.wikimedia.org/T406795) (owner: 10Filippo Giunchedi) [15:07:02] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1238 (T410531)', diff saved to https://phabricator.wikimedia.org/P85901 and previous config saved to /var/cache/conftool/dbconfig/20251127-150701-marostegui.json [15:07:07] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [15:10:13] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2206 (T410531)', diff saved to https://phabricator.wikimedia.org/P85902 and previous config saved to /var/cache/conftool/dbconfig/20251127-151012-marostegui.json [15:20:33] (03PS1) 10Muehlenhoff: Add Guillaume as approver for two more analytics groups [puppet] - 10https://gerrit.wikimedia.org/r/1212168 (https://phabricator.wikimedia.org/T276465) [15:21:01] (03CR) 10Dpogorzelski: [C:03+1] changeprop: enable remaining pilot wikis for revise-tone-task-generator [deployment-charts] - 10https://gerrit.wikimedia.org/r/1212148 (https://phabricator.wikimedia.org/T408538) (owner: 10AikoChou) [15:21:04] (03PS2) 10Muehlenhoff: Add Guillaume as approver for two more analytics groups [puppet] - 10https://gerrit.wikimedia.org/r/1212168 
(https://phabricator.wikimedia.org/T276465) [15:21:30] (03CR) 10AikoChou: [C:03+2] changeprop: enable remaining pilot wikis for revise-tone-task-generator [deployment-charts] - 10https://gerrit.wikimedia.org/r/1212148 (https://phabricator.wikimedia.org/T408538) (owner: 10AikoChou) [15:21:37] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P85903 and previous config saved to /var/cache/conftool/dbconfig/20251127-152136-marostegui.json [15:21:41] (03CR) 10Ssingh: [C:03+2] sre.loadbalancer: patch to fix reboot action [cookbooks] - 10https://gerrit.wikimedia.org/r/1211241 (https://phabricator.wikimedia.org/T395240) (owner: 10CDobbins) [15:21:44] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163 [15:21:44] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [15:22:08] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P85904 and previous config saved to /var/cache/conftool/dbconfig/20251127-152208-marostegui.json [15:23:20] (03Merged) 10jenkins-bot: changeprop: enable remaining pilot wikis for revise-tone-task-generator [deployment-charts] - 10https://gerrit.wikimedia.org/r/1212148 (https://phabricator.wikimedia.org/T408538) (owner: 10AikoChou) [15:24:02] !log dpogorzelski@deploy2002 helmfile [eqiad] START helmfile.d/services/changeprop: sync [15:24:31] !log dpogorzelski@deploy2002 helmfile [eqiad] DONE helmfile.d/services/changeprop: sync [15:24:34] (03CR) 10Filippo Giunchedi: [C:03+2] installserver: fix partmain typo [puppet] - 10https://gerrit.wikimedia.org/r/1212164 (https://phabricator.wikimedia.org/T406795) (owner: 10Filippo Giunchedi) [15:24:45] !log dpogorzelski@deploy2002 helmfile [codfw] START helmfile.d/services/changeprop: sync [15:25:08] !log dpogorzelski@deploy2002 
helmfile [codfw] DONE helmfile.d/services/changeprop: sync [15:25:20] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P85905 and previous config saved to /var/cache/conftool/dbconfig/20251127-152519-marostegui.json [15:27:34] (03CR) 10Elukey: [C:03+1] sre.ganeti.reboot-vm: Use skip_acked=True [cookbooks] - 10https://gerrit.wikimedia.org/r/1203483 (https://phabricator.wikimedia.org/T330136) (owner: 10Muehlenhoff) [15:28:19] (03Merged) 10jenkins-bot: sre.loadbalancer: patch to fix reboot action [cookbooks] - 10https://gerrit.wikimedia.org/r/1211241 (https://phabricator.wikimedia.org/T395240) (owner: 10CDobbins) [15:29:37] FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [15:29:42] (03CR) 10Muehlenhoff: [C:03+2] sre.ganeti.reboot-vm: Use skip_acked=True [cookbooks] - 10https://gerrit.wikimedia.org/r/1203483 (https://phabricator.wikimedia.org/T330136) (owner: 10Muehlenhoff) [15:30:05] Deploy window xLab Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T1530) [15:30:19] (03PS1) 10Gehel: query_service: alert on high number of JVM thread [alerts] - 10https://gerrit.wikimedia.org/r/1212170 (https://phabricator.wikimedia.org/T389859) [15:30:58] 10SRE-tools, 06Infrastructure-Foundations: Decide which cookbooks using icinga_hosts.wait_for_optimal() should use skip_acked=True - https://phabricator.wikimedia.org/T330136#11413437 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlenhoff I think all the relevant cookbooks have this enabled no... 
[15:31:06] !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host db1169.eqiad.wmnet with OS bookworm [15:32:47] !log installing libarchive security updates [15:32:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:33:21] !log sukhe@cumin1003 START - Cookbook sre.loadbalancer.admin rebooting P{lvs4010.ulsfo.wmnet} and A:liberica [15:34:47] (03Abandoned) 10Ssingh: Revert "conftool-data: proxoid: remove urldownloader machines" [puppet] - 10https://gerrit.wikimedia.org/r/1194984 (owner: 10Ssingh) [15:36:45] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P85906 and previous config saved to /var/cache/conftool/dbconfig/20251127-153644-marostegui.json [15:37:16] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P85907 and previous config saved to /var/cache/conftool/dbconfig/20251127-153715-marostegui.json [15:39:10] RECOVERY - Check unit status of httpbb_kubernetes_mw-web-next_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-web-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [15:39:24] 06SRE, 06Infrastructure-Foundations, 06serviceops, 13Patch-For-Review: etcd in codfw burned all latency SLO error budget - https://phabricator.wikimedia.org/T345738#11413482 (10akosiaris) 05Open→03Resolved a:03akosiaris Resolving per last comment. 2 year old task anyway. [15:39:35] (03CR) 10Clément Goubert: "LGTM other than narrowing down the team label." 
[alerts] - 10https://gerrit.wikimedia.org/r/1212113 (https://phabricator.wikimedia.org/T410552) (owner: 10Blake) [15:39:38] !log sukhe@cumin1003 END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P{lvs4010.ulsfo.wmnet} and A:liberica [15:40:01] (03Abandoned) 10Alexandros Kosiaris: configcluster: Disable cadvisor in codfw [puppet] - 10https://gerrit.wikimedia.org/r/955583 (https://phabricator.wikimedia.org/T345738) (owner: 10Alexandros Kosiaris) [15:40:28] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P85908 and previous config saved to /var/cache/conftool/dbconfig/20251127-154027-marostegui.json [15:41:53] (03CR) 10Ssingh: [C:03+2] "This didn't work and it makes sense why. Didn't we put the class variable in the wrong place? It should go under SREBatchRunnerBase" [cookbooks] - 10https://gerrit.wikimedia.org/r/1211241 (https://phabricator.wikimedia.org/T395240) (owner: 10CDobbins) [15:42:25] RESOLVED: SystemdUnitFailed: httpbb_kubernetes_mw-web-next_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [15:42:52] (03PS6) 10Blake: alerting: Add an alert for when Kafka brokers need a rolling restart. [alerts] - 10https://gerrit.wikimedia.org/r/1212113 [15:43:21] (03CR) 10Blake: alerting: Add an alert for when Kafka brokers need a rolling restart. 
(031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/1212113 (owner: 10Blake) [15:43:54] (03CR) 10Clément Goubert: [C:03+1] "LGTM" [alerts] - 10https://gerrit.wikimedia.org/r/1212113 (owner: 10Blake) [15:47:30] !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on db1169.eqiad.wmnet with reason: host reimage [15:51:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P85910 and previous config saved to /var/cache/conftool/dbconfig/20251127-155151-marostegui.json [15:51:57] (03PS2) 10Marco Fossati: ReaderExperiments' StickyHeaders stream configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1212134 (https://phabricator.wikimedia.org/T410533) [15:52:24] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1238 (T410531)', diff saved to https://phabricator.wikimedia.org/P85911 and previous config saved to /var/cache/conftool/dbconfig/20251127-155223-marostegui.json [15:52:29] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [15:52:39] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1241.eqiad.wmnet with reason: Maintenance [15:52:46] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1241 (T410531)', diff saved to https://phabricator.wikimedia.org/P85912 and previous config saved to /var/cache/conftool/dbconfig/20251127-155246-marostegui.json [15:53:54] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1169.eqiad.wmnet with reason: host reimage [15:55:36] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2206 (T410531)', diff saved to https://phabricator.wikimedia.org/P85914 and previous config saved to /var/cache/conftool/dbconfig/20251127-155535-marostegui.json [15:55:51] !log marostegui@cumin1003 DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 4:00:00 on db2210.codfw.wmnet with reason: Maintenance [15:55:59] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2210 (T410531)', diff saved to https://phabricator.wikimedia.org/P85915 and previous config saved to /var/cache/conftool/dbconfig/20251127-155559-marostegui.json [15:57:29] (03PS1) 10Kamila Součková: service::catalog: update hcaptcha-proxy entry [puppet] - 10https://gerrit.wikimedia.org/r/1212179 (https://phabricator.wikimedia.org/T411097) [16:00:05] jnuche and brennen: Time to snap out of that daydream and deploy Train log triage. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T1600). [16:00:20] (03CR) 10Ssingh: service::catalog: update hcaptcha-proxy entry (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1212179 (https://phabricator.wikimedia.org/T411097) (owner: 10Kamila Součková) [16:06:13] (03CR) 10Fabfur: [C:03+2] Revert "cache::text: enable unid and browser flags rate limits in magru" [puppet] - 10https://gerrit.wikimedia.org/r/1212095 (owner: 10Fabfur) [16:06:18] (03CR) 10Fabfur: [C:03+2] Revert "cache::text: enable unid and browser flags rate limits in drmrs" [puppet] - 10https://gerrit.wikimedia.org/r/1212096 (owner: 10Fabfur) [16:06:28] jnuche: it's ok, the Us has us on holiday today 9Thur) and tomorrow so I declined it earleir. Good luck with the train! 
[16:07:00] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P85918 and previous config saved to /var/cache/conftool/dbconfig/20251127-160659-marostegui.json
[16:07:07] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163
[16:07:07] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164
[16:07:16] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
[16:07:24] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1173 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P85919 and previous config saved to /var/cache/conftool/dbconfig/20251127-160723-marostegui.json
[16:08:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1241 (T410531)', diff saved to https://phabricator.wikimedia.org/P85920 and previous config saved to /var/cache/conftool/dbconfig/20251127-160852-marostegui.json
[16:08:54] !log installing unbound security updates
[16:08:57] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[16:09:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:12:41] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2210 (T410531)', diff saved to https://phabricator.wikimedia.org/P85921 and previous config saved to /var/cache/conftool/dbconfig/20251127-161240-marostegui.json
[16:15:32] SRE, SRE-Access-Requests: Yubikey-SSH-FIDO for Tiziano Fogli (tappof) - https://phabricator.wikimedia.org/T411167#11413661 (tappof) Open→Resolved
[16:16:03] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1169.eqiad.wmnet with OS bookworm
[16:17:26] !log upgrade Envoy on chartmuseum* T405808
[16:17:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:17:31] T405808: Upgrade Envoy to v1.32.12 - https://phabricator.wikimedia.org/T405808
[16:22:35] (PS1) Majavah: P:toolforge::prometheus: Collect metrics for infra-tracing-loki [puppet] - https://gerrit.wikimedia.org/r/1212186 (https://phabricator.wikimedia.org/T399313)
[16:23:08] SRE, Infrastructure-Foundations, netops, Traffic: Users reporting issues connecting to Gerrit with HTTPS from Orange, FR mobile network (AS 3215) - https://phabricator.wikimedia.org/T411203 (cmooney) NEW p:Triage→Low
[16:24:00] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P85922 and previous config saved to /var/cache/conftool/dbconfig/20251127-162359-marostegui.json
[16:26:36] (PS1) Fabfur: cache::text: revert rate_limiting_flags in drmrs and magru [puppet] - https://gerrit.wikimedia.org/r/1212187 (https://phabricator.wikimedia.org/T406545)
[16:27:02] FIRING: HelmReleaseBadStatus: Helm release mw-script/utk6lsuw on k8s@codfw in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[16:27:47] SRE, Infrastructure-Foundations, netops, Traffic: Users reporting issues connecting to Gerrit with HTTPS from Orange, FR mobile network (AS 3215) - https://phabricator.wikimedia.org/T411203#11413739 (cmooney)
[16:27:49] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P85923 and previous config saved to /var/cache/conftool/dbconfig/20251127-162748-marostegui.json
[16:30:59] (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1212187 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[16:33:54] (PS2) Fabfur: cache::text: revert rate_limiting_flags in drmrs and magru [puppet] - https://gerrit.wikimedia.org/r/1212187 (https://phabricator.wikimedia.org/T406545)
[16:33:59] (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1212187 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[16:34:00] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
[16:36:27] sre-alert-triage, Data-Platform-SRE (2025.11.07 - 2025.11.28), Essential-Work: Alert in need of triage: Dell PowerEdge or Supermicro Broadcom RAID Controller (instance an-worker1187) - https://phabricator.wikimedia.org/T405217#11413762 (Gehel) And now 6 hosts in error {F70689260}
[16:39:07] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P85924 and previous config saved to /var/cache/conftool/dbconfig/20251127-163907-marostegui.json
[16:39:37] FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[16:41:40] (CR) Giuseppe Lavagetto: [C:+1] cache::text: revert rate_limiting_flags in drmrs and magru [puppet] - https://gerrit.wikimedia.org/r/1212187 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[16:42:41] (CR) Fabfur: [C:+2] cache::text: revert rate_limiting_flags in drmrs and magru [puppet] - https://gerrit.wikimedia.org/r/1212187 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[16:42:56] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P85925 and previous config saved to /var/cache/conftool/dbconfig/20251127-164255-marostegui.json
[16:47:12] (PS1) Hnowlan: thumbor: limit SVGs based on original file format, not output [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/1212191 (https://phabricator.wikimedia.org/T411076)
[16:54:15] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1241 (T410531)', diff saved to https://phabricator.wikimedia.org/P85926 and previous config saved to /var/cache/conftool/dbconfig/20251127-165414-marostegui.json
[16:54:21] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[16:54:31] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1242.eqiad.wmnet with reason: Maintenance
[16:54:39] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1242 (T410531)', diff saved to https://phabricator.wikimedia.org/P85927 and previous config saved to /var/cache/conftool/dbconfig/20251127-165438-marostegui.json
[16:58:04] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2210 (T410531)', diff saved to https://phabricator.wikimedia.org/P85928 and previous config saved to /var/cache/conftool/dbconfig/20251127-165803-marostegui.json
[16:58:20] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
[16:58:28] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2219 (T410531)', diff saved to https://phabricator.wikimedia.org/P85929 and previous config saved to /var/cache/conftool/dbconfig/20251127-165827-marostegui.json
[16:59:32] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 01 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - https://gerrit.wikimedia.org/r/1212134 (https://phabricator.wikimedia.org/T410533) (owner: Marco Fossati)
[16:59:53] (PS1) Hnowlan: Revert "thumbor: refuse to generate SVGs larger than 4096px" [deployment-charts] - https://gerrit.wikimedia.org/r/1212192
[17:00:30] (PS2) Hnowlan: Revert "thumbor: refuse to generate SVGs larger than 4096px" [deployment-charts] - https://gerrit.wikimedia.org/r/1212192
[17:08:37] (PS11) Matthieulec: [WIP] Adding a --rack flag to pool-depool-node cookbook for more intuitive operations, and more validations to avoid mistakes [cookbooks] - https://gerrit.wikimedia.org/r/1212089
[17:08:41] (CR) Volans: [C:+1] "Thanks, LGTM" [puppet] - https://gerrit.wikimedia.org/r/1212186 (https://phabricator.wikimedia.org/T399313) (owner: Majavah)
[17:09:28] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P85930 and previous config saved to /var/cache/conftool/dbconfig/20251127-170927-marostegui.json
[17:09:34] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163
[17:09:35] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164
[17:10:53] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1242 (T410531)', diff saved to https://phabricator.wikimedia.org/P85931 and previous config saved to /var/cache/conftool/dbconfig/20251127-171052-marostegui.json
[17:10:59] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[17:13:33] (PS3) Daniel Kinzler: api-gateway: add lua tests [deployment-charts] - https://gerrit.wikimedia.org/r/1212107
[17:14:31] (PS4) Daniel Kinzler: api-gateway: add lua tests [deployment-charts] - https://gerrit.wikimedia.org/r/1212107
[17:14:48] (PS5) Daniel Kinzler: api-gateway: add lua tests [deployment-charts] - https://gerrit.wikimedia.org/r/1212107
[17:15:31] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2219 (T410531)', diff saved to https://phabricator.wikimedia.org/P85932 and previous config saved to /var/cache/conftool/dbconfig/20251127-171530-marostegui.json
[17:16:23] (CR) CI reject: [V:-1] api-gateway: add lua tests [deployment-charts] - https://gerrit.wikimedia.org/r/1212107 (owner: Daniel Kinzler)
[17:24:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P85933 and previous config saved to /var/cache/conftool/dbconfig/20251127-172434-marostegui.json
[17:25:59] (CR) Hnowlan: "Looks pretty good! Some minor nits/style comments but logic makes sense to me." [cookbooks] - https://gerrit.wikimedia.org/r/1212089 (owner: Matthieulec)
[17:26:00] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P85934 and previous config saved to /var/cache/conftool/dbconfig/20251127-172559-marostegui.json
[17:30:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P85935 and previous config saved to /var/cache/conftool/dbconfig/20251127-173038-marostegui.json
[17:39:43] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P85936 and previous config saved to /var/cache/conftool/dbconfig/20251127-173942-marostegui.json
[17:41:08] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P85937 and previous config saved to /var/cache/conftool/dbconfig/20251127-174107-marostegui.json
[17:44:34] (CR) Effie Mouzeli: [WIP] Adding a --rack flag to pool-depool-node cookbook for more intuitive operations, and more validations to avoid mistakes (3 comments) [cookbooks] - https://gerrit.wikimedia.org/r/1212089 (owner: Matthieulec)
[17:45:46] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P85938 and previous config saved to /var/cache/conftool/dbconfig/20251127-174545-marostegui.json
[17:47:35] (CR) Hnowlan: [C:+2] Revert "thumbor: refuse to generate SVGs larger than 4096px" [deployment-charts] - https://gerrit.wikimedia.org/r/1212192 (owner: Hnowlan)
[17:48:38] PROBLEM - Dell PowerEdge or Supermicro Broadcom RAID Controller on an-worker1191 is CRITICAL: communication: 0 OK : controller: 1 Needs Attention : physical_disk: 2 Failed : virtual_disk: 1 OfLn : bbu: 0 OK : enclosure: 0 OK : CLI Version = 007.1910.0000.0000 Oct 08, 2021 https://wikitech.wikimedia.org/wiki/PERCCli%23Monitoring
[17:48:40] ACKNOWLEDGEMENT - Dell PowerEdge or Supermicro Broadcom RAID Controller on an-worker1191 is CRITICAL: communication: 0 OK : controller: 1 Needs Attention : physical_disk: 2 Failed : virtual_disk: 1 OfLn : bbu: 0 OK : enclosure: 0 OK : CLI Version = 007.1910.0000.0000 Oct 08, 2021 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T411209 https://wikitech.wikimedia.org/wiki/PERCCli%23Monitoring
[17:48:46] ops-eqiad, SRE, DC-Ops: Degraded RAID on an-worker1191 - https://phabricator.wikimedia.org/T411209 (ops-monitoring-bot) NEW
[17:49:27] (Merged) jenkins-bot: Revert "thumbor: refuse to generate SVGs larger than 4096px" [deployment-charts] - https://gerrit.wikimedia.org/r/1212192 (owner: Hnowlan)
[17:54:50] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P85939 and previous config saved to /var/cache/conftool/dbconfig/20251127-175449-marostegui.json
[17:54:57] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163
[17:54:58] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164
[17:55:06] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[17:55:14] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1180 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P85940 and previous config saved to /var/cache/conftool/dbconfig/20251127-175513-marostegui.json
[17:56:15] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1242 (T410531)', diff saved to https://phabricator.wikimedia.org/P85941 and previous config saved to /var/cache/conftool/dbconfig/20251127-175615-marostegui.json
[17:56:21] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[17:56:31] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1243.eqiad.wmnet with reason: Maintenance
[17:56:39] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1243 (T410531)', diff saved to https://phabricator.wikimedia.org/P85942 and previous config saved to /var/cache/conftool/dbconfig/20251127-175638-marostegui.json
[17:57:41] (PS1) DDesouza: Deploy 2025 Global Readers Survey (non-enwiki) [mediawiki-config] - https://gerrit.wikimedia.org/r/1212204 (https://phabricator.wikimedia.org/T410696)
[17:58:00] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, November 27 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - https://gerrit.wikimedia.org/r/1210729 (https://phabricator.wikimedia.org/T410696) (owner: DDesouza)
[17:58:32] (PS2) DDesouza: Undeploy 2025 Global Readers Survey on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1211855 (https://phabricator.wikimedia.org/T410696)
[17:58:32] (CR) CI reject: [V:-1] Deploy 2025 Global Readers Survey (non-enwiki) [mediawiki-config] - https://gerrit.wikimedia.org/r/1212204 (https://phabricator.wikimedia.org/T410696) (owner: DDesouza)
[18:00:05] bd808: May I have your attention please! Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T1800)
[18:00:05] Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T1800)
[18:00:53] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2219 (T410531)', diff saved to https://phabricator.wikimedia.org/P85943 and previous config saved to /var/cache/conftool/dbconfig/20251127-180053-marostegui.json
[18:01:09] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2236.codfw.wmnet with reason: Maintenance
[18:01:17] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2236 (T410531)', diff saved to https://phabricator.wikimedia.org/P85944 and previous config saved to /var/cache/conftool/dbconfig/20251127-180116-marostegui.json
[18:01:22] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[18:02:04] (PS2) DDesouza: Deploy 2025 Global Readers Survey (non-enwiki) [mediawiki-config] - https://gerrit.wikimedia.org/r/1212204 (https://phabricator.wikimedia.org/T410696)
[18:04:21] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, November 27 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - https://gerrit.wikimedia.org/r/1212204 (https://phabricator.wikimedia.org/T410696) (owner: DDesouza)
[18:12:05] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1243 (T410531)', diff saved to https://phabricator.wikimedia.org/P85945 and previous config saved to /var/cache/conftool/dbconfig/20251127-181205-marostegui.json
[18:12:12] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[18:17:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2236 (T410531)', diff saved to https://phabricator.wikimedia.org/P85946 and previous config saved to /var/cache/conftool/dbconfig/20251127-181751-marostegui.json
[18:17:58] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[18:24:37] FIRING: [6x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage
[18:27:13] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P85947 and previous config saved to /var/cache/conftool/dbconfig/20251127-182712-marostegui.json
[18:33:00] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P85948 and previous config saved to /var/cache/conftool/dbconfig/20251127-183259-marostegui.json
[18:42:21] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P85949 and previous config saved to /var/cache/conftool/dbconfig/20251127-184220-marostegui.json
[18:48:07] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P85950 and previous config saved to /var/cache/conftool/dbconfig/20251127-184806-marostegui.json
[18:54:37] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[18:56:15] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P85951 and previous config saved to /var/cache/conftool/dbconfig/20251127-185615-marostegui.json
[18:56:22] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163
[18:56:23] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164
[18:57:28] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1243 (T410531)', diff saved to https://phabricator.wikimedia.org/P85952 and previous config saved to /var/cache/conftool/dbconfig/20251127-185727-marostegui.json
[18:57:34] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[18:57:44] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1244.eqiad.wmnet with reason: Maintenance
[18:57:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1244 (T410531)', diff saved to https://phabricator.wikimedia.org/P85953 and previous config saved to /var/cache/conftool/dbconfig/20251127-185751-marostegui.json
[19:00:04] jnuche and brennen: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T1900).
[19:03:15] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2236 (T410531)', diff saved to https://phabricator.wikimedia.org/P85954 and previous config saved to /var/cache/conftool/dbconfig/20251127-190314-marostegui.json [19:03:20] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [19:03:31] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2237.codfw.wmnet with reason: Maintenance [19:03:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2237 (T410531)', diff saved to https://phabricator.wikimedia.org/P85955 and previous config saved to /var/cache/conftool/dbconfig/20251127-190338-marostegui.json [19:11:23] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P85956 and previous config saved to /var/cache/conftool/dbconfig/20251127-191122-marostegui.json [19:14:02] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1244 (T410531)', diff saved to https://phabricator.wikimedia.org/P85957 and previous config saved to /var/cache/conftool/dbconfig/20251127-191402-marostegui.json [19:14:08] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [19:19:39] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2237 (T410531)', diff saved to https://phabricator.wikimedia.org/P85958 and previous config saved to /var/cache/conftool/dbconfig/20251127-191939-marostegui.json [19:19:45] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [19:26:31] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P85959 and previous config saved to /var/cache/conftool/dbconfig/20251127-192630-marostegui.json [19:29:10] !log 
marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P85960 and previous config saved to /var/cache/conftool/dbconfig/20251127-192909-marostegui.json [19:29:37] FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [19:34:47] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P85961 and previous config saved to /var/cache/conftool/dbconfig/20251127-193446-marostegui.json [19:41:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P85962 and previous config saved to /var/cache/conftool/dbconfig/20251127-194137-marostegui.json [19:41:43] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance [19:41:45] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163 [19:41:46] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [19:41:48] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance [19:44:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P85963 and previous config saved to /var/cache/conftool/dbconfig/20251127-194417-marostegui.json [19:44:53] 06SRE, 10LDAP-Access-Requests: Access to logstash for OKryva-WMF - https://phabricator.wikimedia.org/T410115#11414122 (10MoritzMuehlenhoff) 05Stalled→03Resolved Access has been granted via 
Wikimedia IDM on Nov. 26, 2025, 4:38 p.m. Marking this task as resolved [19:49:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P85964 and previous config saved to /var/cache/conftool/dbconfig/20251127-194954-marostegui.json [19:59:25] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1244 (T410531)', diff saved to https://phabricator.wikimedia.org/P85965 and previous config saved to /var/cache/conftool/dbconfig/20251127-195925-marostegui.json [19:59:31] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [19:59:41] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance [20:05:02] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2237 (T410531)', diff saved to https://phabricator.wikimedia.org/P85966 and previous config saved to /var/cache/conftool/dbconfig/20251127-200502-marostegui.json [20:05:09] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [20:05:19] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2239.codfw.wmnet with reason: Maintenance [20:14:08] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1247.eqiad.wmnet with reason: Maintenance [20:14:16] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1247 (T410531)', diff saved to https://phabricator.wikimedia.org/P85967 and previous config saved to /var/cache/conftool/dbconfig/20251127-201415-marostegui.json [20:14:22] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [20:14:37] (03PS16) 10Daniel Kinzler: api-gateway: Rest-gateway Read `ratelimit_class` and `user_id` from JWT [deployment-charts] - 
10https://gerrit.wikimedia.org/r/1192579 (https://phabricator.wikimedia.org/T405578) (owner: 10Pmiazga) [20:19:52] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2240.codfw.wmnet with reason: Maintenance [20:19:59] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2240 (T410531)', diff saved to https://phabricator.wikimedia.org/P85968 and previous config saved to /var/cache/conftool/dbconfig/20251127-201958-marostegui.json [20:20:05] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [20:22:06] (03PS6) 10Daniel Kinzler: api-gateway: add lua tests [deployment-charts] - 10https://gerrit.wikimedia.org/r/1212107 [20:22:24] (03PS17) 10Daniel Kinzler: api-gateway: Rest-gateway Read `ratelimit_class` and `user_id` from JWT [deployment-charts] - 10https://gerrit.wikimedia.org/r/1192579 (https://phabricator.wikimedia.org/T405578) (owner: 10Pmiazga) [20:27:02] FIRING: HelmReleaseBadStatus: Helm release mw-script/utk6lsuw on k8s@codfw in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [20:30:25] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1247 (T410531)', diff saved to https://phabricator.wikimedia.org/P85969 and previous config saved to /var/cache/conftool/dbconfig/20251127-203024-marostegui.json [20:30:31] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [20:31:01] (03PS1) 10Daniel Kinzler: rest-gateway: add prefix to all user IDs [deployment-charts] - 10https://gerrit.wikimedia.org/r/1212239 [20:33:20] FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - 
https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [20:35:20] FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh [20:38:00] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2240 (T410531)', diff saved to https://phabricator.wikimedia.org/P85970 and previous config saved to /var/cache/conftool/dbconfig/20251127-203759-marostegui.json [20:38:06] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [20:39:37] FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed [20:43:20] RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [20:45:20] RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - 
https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh [20:45:32] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P85971 and previous config saved to /var/cache/conftool/dbconfig/20251127-204532-marostegui.json [20:48:20] FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [20:51:20] FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh [20:53:07] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P85972 and previous config saved to /var/cache/conftool/dbconfig/20251127-205307-marostegui.json [20:58:20] RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [21:00:05] RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: It is that lovely time of the day again! 
You are hereby commanded to deploy UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T2100). [21:00:05] danisztls: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [21:00:19] o/ [21:00:40] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P85973 and previous config saved to /var/cache/conftool/dbconfig/20251127-210039-marostegui.json [21:01:20] RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh [21:04:09] (CR) TrainBranchBot: [C:+2] "Approved by dani@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1211855 (https://phabricator.wikimedia.org/T410696) (owner: DDesouza) [21:04:10] (CR) TrainBranchBot: [C:+2] "Approved by dani@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1210729 (https://phabricator.wikimedia.org/T410696) (owner: DDesouza) [21:04:10] (CR) TrainBranchBot: [C:+2] "Approved by dani@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1212204 (https://phabricator.wikimedia.org/T410696) (owner: DDesouza) [21:04:59] (Merged) jenkins-bot: Undeploy 2025 Global Readers Survey on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1211855 (https://phabricator.wikimedia.org/T410696) (owner: DDesouza) [21:05:05] (Merged) jenkins-bot: Deploy experiment for 2025 Global Readers Survey [mediawiki-config] -
https://gerrit.wikimedia.org/r/1210729 (https://phabricator.wikimedia.org/T410696) (owner: DDesouza) [21:05:07] (Merged) jenkins-bot: Deploy 2025 Global Readers Survey (non-enwiki) [mediawiki-config] - https://gerrit.wikimedia.org/r/1212204 (https://phabricator.wikimedia.org/T410696) (owner: DDesouza) [21:05:26] !log dani@deploy2002 Started scap sync-world: Backport for [[gerrit:1211855|Undeploy 2025 Global Readers Survey on enwiki (T410696)]], [[gerrit:1210729|Deploy experiment for 2025 Global Readers Survey (T410696)]], [[gerrit:1212204|Deploy 2025 Global Readers Survey (non-enwiki) (T410696)]] [21:05:32] T410696: Deploy enwiki edition of 2025 GRS - https://phabricator.wikimedia.org/T410696 [21:07:20] FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [21:07:29] !log dani@deploy2002 dani: Backport for [[gerrit:1211855|Undeploy 2025 Global Readers Survey on enwiki (T410696)]], [[gerrit:1210729|Deploy experiment for 2025 Global Readers Survey (T410696)]], [[gerrit:1212204|Deploy 2025 Global Readers Survey (non-enwiki) (T410696)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [21:08:15] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P85974 and previous config saved to /var/cache/conftool/dbconfig/20251127-210814-marostegui.json [21:10:14] (PS1) E75ti: capirca: python 3.12 deprecates datetime.utcnow() [software/homer] - https://gerrit.wikimedia.org/r/1212243 [21:11:28] (CR) Ladsgroup: "I think I confirmed it but maybe done by someone else now?"
[puppet] - https://gerrit.wikimedia.org/r/1196894 (https://phabricator.wikimedia.org/T406590) (owner: Ladsgroup) [21:15:47] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1247 (T410531)', diff saved to https://phabricator.wikimedia.org/P85975 and previous config saved to /var/cache/conftool/dbconfig/20251127-211547-marostegui.json [21:15:53] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [21:16:03] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1248.eqiad.wmnet with reason: Maintenance [21:16:11] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1248 (T410531)', diff saved to https://phabricator.wikimedia.org/P85976 and previous config saved to /var/cache/conftool/dbconfig/20251127-211610-marostegui.json [21:16:57] !log dani@deploy2002 dani: Continuing with sync [21:17:20] RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [21:21:00] !log dani@deploy2002 Finished scap sync-world: Backport for [[gerrit:1211855|Undeploy 2025 Global Readers Survey on enwiki (T410696)]], [[gerrit:1210729|Deploy experiment for 2025 Global Readers Survey (T410696)]], [[gerrit:1212204|Deploy 2025 Global Readers Survey (non-enwiki) (T410696)]] (duration: 15m 34s) [21:21:06] T410696: Deploy enwiki edition of 2025 GRS - https://phabricator.wikimedia.org/T410696 [21:23:22] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2240 (T410531)', diff saved to https://phabricator.wikimedia.org/P85977 and previous config saved to /var/cache/conftool/dbconfig/20251127-212322-marostegui.json
[21:23:28] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [21:23:33] all done [21:23:38] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2245.codfw.wmnet with reason: Maintenance [21:23:46] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2245 (T410531)', diff saved to https://phabricator.wikimedia.org/P85978 and previous config saved to /var/cache/conftool/dbconfig/20251127-212345-marostegui.json [21:25:45] (CR) CI reject: [V:-1] capirca: python 3.12 deprecates datetime.utcnow() [software/homer] - https://gerrit.wikimedia.org/r/1212243 (owner: E75ti) [21:27:04] SRE, Traffic, Patch-For-Review, User-notice-archive: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11414318 (Guycn2) Thank you both. I'm not sure what caused the issue in [[https://web.archive.org/web/20251121062412/en.wikipedia.org/wiki/M...
[21:27:20] FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [21:32:19] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1248 (T410531)', diff saved to https://phabricator.wikimedia.org/P85979 and previous config saved to /var/cache/conftool/dbconfig/20251127-213218-marostegui.json [21:32:20] RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [21:32:24] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [21:41:12] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2245 (T410531)', diff saved to https://phabricator.wikimedia.org/P85980 and previous config saved to /var/cache/conftool/dbconfig/20251127-214112-marostegui.json [21:41:18] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [21:46:20] FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [21:46:26] FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 
95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh [21:47:28] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P85981 and previous config saved to /var/cache/conftool/dbconfig/20251127-214726-marostegui.json [21:51:20] RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [21:51:31] RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh [21:56:20] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P85982 and previous config saved to /var/cache/conftool/dbconfig/20251127-215619-marostegui.json [22:00:04] Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251127T2200) [22:02:36] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P85983 and previous config saved to /var/cache/conftool/dbconfig/20251127-220235-marostegui.json [22:07:05] (PS2) E75ti:
capirca: python 3.12 deprecates datetime.utcnow() [software/homer] - https://gerrit.wikimedia.org/r/1212243 [22:11:28] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P85984 and previous config saved to /var/cache/conftool/dbconfig/20251127-221127-marostegui.json [22:13:16] (CR) E75ti: "recheck" [software/homer] - https://gerrit.wikimedia.org/r/1212243 (owner: E75ti) [22:17:43] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1248 (T410531)', diff saved to https://phabricator.wikimedia.org/P85985 and previous config saved to /var/cache/conftool/dbconfig/20251127-221742-marostegui.json [22:17:48] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [22:17:59] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1249.eqiad.wmnet with reason: Maintenance [22:18:06] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1249 (T410531)', diff saved to https://phabricator.wikimedia.org/P85986 and previous config saved to /var/cache/conftool/dbconfig/20251127-221806-marostegui.json [22:21:20] FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [22:22:20] FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 -
https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh [22:24:37] FIRING: [6x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [22:26:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2245 (T410531)', diff saved to https://phabricator.wikimedia.org/P85987 and previous config saved to /var/cache/conftool/dbconfig/20251127-222635-marostegui.json [22:26:41] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [22:26:51] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2246.codfw.wmnet with reason: Maintenance [22:26:59] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2246 (T410531)', diff saved to https://phabricator.wikimedia.org/P85988 and previous config saved to /var/cache/conftool/dbconfig/20251127-222658-marostegui.json [22:27:20] RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh [22:30:20] FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh [22:34:19] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1249 
(T410531)', diff saved to https://phabricator.wikimedia.org/P85989 and previous config saved to /var/cache/conftool/dbconfig/20251127-223418-marostegui.json [22:34:25] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [22:35:20] RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh [22:36:20] RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [22:42:09] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1229 (T410589)', diff saved to https://phabricator.wikimedia.org/P85990 and previous config saved to /var/cache/conftool/dbconfig/20251127-224208-ladsgroup.json [22:42:15] T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589 [22:44:19] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2246 (T410531)', diff saved to https://phabricator.wikimedia.org/P85991 and previous config saved to /var/cache/conftool/dbconfig/20251127-224418-marostegui.json [22:44:25] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [22:45:20] FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - 
https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [22:45:31] FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh [22:49:27] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P85992 and previous config saved to /var/cache/conftool/dbconfig/20251127-224926-marostegui.json [22:50:20] RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [22:50:31] RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh [22:54:37] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [22:55:10] FIRING: BFDdown: BFD session down between cr2-drmrs and fe80::ee38:7300:1ae8:9c56 - 
https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-drmrs:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown [22:57:17] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P85993 and previous config saved to /var/cache/conftool/dbconfig/20251127-225716-ladsgroup.json [22:59:27] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P85994 and previous config saved to /var/cache/conftool/dbconfig/20251127-225926-marostegui.json [23:00:10] RESOLVED: BFDdown: BFD session down between cr2-drmrs and fe80::ee38:7300:1ae8:9c56 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-drmrs:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown [23:01:20] FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [23:01:26] FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh [23:04:34] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to 
https://phabricator.wikimedia.org/P85995 and previous config saved to /var/cache/conftool/dbconfig/20251127-230433-marostegui.json [23:04:57] (CR) Zabe: Deploy experiment for 2025 Global Readers Survey (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1210729 (https://phabricator.wikimedia.org/T410696) (owner: DDesouza) [23:06:20] RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [23:06:31] RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh [23:12:20] FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [23:12:24] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P85996 and previous config saved to /var/cache/conftool/dbconfig/20251127-231224-ladsgroup.json [23:12:53] (PS1) Zabe: BETA: Fix 'Use of instanceTokenParameterName' deprecation warning [mediawiki-config] - https://gerrit.wikimedia.org/r/1212265 (https://phabricator.wikimedia.org/T410696) [23:13:39] (CR) CI
reject: [V:-1] BETA: Fix 'Use of instanceTokenParameterName' deprecation warning [mediawiki-config] - https://gerrit.wikimedia.org/r/1212265 (https://phabricator.wikimedia.org/T410696) (owner: Zabe) [23:14:34] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P85997 and previous config saved to /var/cache/conftool/dbconfig/20251127-231433-marostegui.json [23:17:20] RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [23:19:42] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1249 (T410531)', diff saved to https://phabricator.wikimedia.org/P85998 and previous config saved to /var/cache/conftool/dbconfig/20251127-231941-marostegui.json [23:19:48] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [23:19:58] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1252.eqiad.wmnet with reason: Maintenance [23:20:06] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1252 (T410531)', diff saved to https://phabricator.wikimedia.org/P85999 and previous config saved to /var/cache/conftool/dbconfig/20251127-232005-marostegui.json [23:20:20] FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[23:21:20] FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh [23:27:32] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1229 (T410589)', diff saved to https://phabricator.wikimedia.org/P86000 and previous config saved to /var/cache/conftool/dbconfig/20251127-232731-ladsgroup.json [23:27:37] T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589 [23:27:48] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance [23:27:56] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1233 (T410589)', diff saved to https://phabricator.wikimedia.org/P86001 and previous config saved to /var/cache/conftool/dbconfig/20251127-232755-ladsgroup.json [23:28:42] (CR) Zabe: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/1212265 (https://phabricator.wikimedia.org/T410696) (owner: Zabe) [23:29:37] FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [23:29:42] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2246 (T410531)', diff saved to https://phabricator.wikimedia.org/P86002 and previous config saved to /var/cache/conftool/dbconfig/20251127-232941-marostegui.json [23:29:47] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [23:29:58] !log marostegui@cumin1003 DONE (PASS) -
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2247.codfw.wmnet with reason: Maintenance [23:30:06] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2247 (T410531)', diff saved to https://phabricator.wikimedia.org/P86003 and previous config saved to /var/cache/conftool/dbconfig/20251127-233005-marostegui.json [23:30:20] RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [23:31:20] RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh [23:32:01] (CR) Zabe: [C:+2] BETA: Fix 'Use of instanceTokenParameterName' deprecation warning [mediawiki-config] - https://gerrit.wikimedia.org/r/1212265 (https://phabricator.wikimedia.org/T410696) (owner: Zabe) [23:32:48] (Merged) jenkins-bot: BETA: Fix 'Use of instanceTokenParameterName' deprecation warning [mediawiki-config] - https://gerrit.wikimedia.org/r/1212265 (https://phabricator.wikimedia.org/T410696) (owner: Zabe) [23:34:06] (CR) Zabe: Deploy experiment for 2025 Global Readers Survey (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1210729 (https://phabricator.wikimedia.org/T410696) (owner: DDesouza) [23:35:50] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1252 (T410531)', diff saved to https://phabricator.wikimedia.org/P86004 and previous config saved to
/var/cache/conftool/dbconfig/20251127-233549-marostegui.json [23:35:55] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [23:37:34] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance [23:46:44] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2247 (T410531)', diff saved to https://phabricator.wikimedia.org/P86005 and previous config saved to /var/cache/conftool/dbconfig/20251127-234643-marostegui.json [23:46:49] T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531 [23:47:20] FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh [23:50:57] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P86006 and previous config saved to /var/cache/conftool/dbconfig/20251127-235056-marostegui.json [23:52:20] RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh