[00:04:26] FIRING: [9x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:15:07] (Abandoned) Krinkle: Use Request-Timeout header to set jobrunner PHP timeouts [mediawiki-config] - https://gerrit.wikimedia.org/r/577642 (https://phabricator.wikimedia.org/T247114) (owner: Ppchelko)
[00:33:11] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2217 (T415786)', diff saved to https://phabricator.wikimedia.org/P88056 and previous config saved to /var/cache/conftool/dbconfig/20260129-003310-marostegui.json
[00:33:16] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[00:40:22] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1234550
[00:40:23] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1234550 (owner: TrainBranchBot)
[00:48:19] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P88057 and previous config saved to /var/cache/conftool/dbconfig/20260129-004818-marostegui.json
[00:53:47] (CR) CI reject: [V:-1] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1234550 (owner: TrainBranchBot)
[01:03:28] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P88058 and previous config saved to /var/cache/conftool/dbconfig/20260129-010327-marostegui.json
[01:10:31] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1234553
[01:10:31] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1234553 (owner: TrainBranchBot)
[01:18:36] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2217 (T415786)', diff saved to https://phabricator.wikimedia.org/P88059 and previous config saved to /var/cache/conftool/dbconfig/20260129-011836-marostegui.json
[01:18:42] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[01:18:52] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2224.codfw.wmnet with reason: Maintenance
[01:19:01] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2224 (T415786)', diff saved to https://phabricator.wikimedia.org/P88060 and previous config saved to /var/cache/conftool/dbconfig/20260129-011900-marostegui.json
[01:29:54] PROBLEM - MariaDB Replica Lag: m2 on db1217 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 2231.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[01:31:54] RECOVERY - MariaDB Replica Lag: m2 on db1217 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[01:36:11] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1234553 (owner: TrainBranchBot)
[01:41:00] PROBLEM - MariaDB Replica Lag: m2 on db2160 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 647.38 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[01:44:02] RECOVERY - MariaDB Replica Lag: m2 on db2160 is OK: OK slave_sql_lag Replication lag: 30.65 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[02:01:00] !log mwpresync@deploy2002 Started scap build-images: Publishing wmf/next image
[02:13:44] !log mwpresync@deploy2002 Finished scap build-images: Publishing wmf/next image (duration: 12m 44s)
[02:14:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2224 (T415786)', diff saved to https://phabricator.wikimedia.org/P88061 and previous config saved to /var/cache/conftool/dbconfig/20260129-021418-marostegui.json
[02:14:23] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[02:29:27] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P88062 and previous config saved to /var/cache/conftool/dbconfig/20260129-022926-marostegui.json
[02:44:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P88063 and previous config saved to /var/cache/conftool/dbconfig/20260129-024435-marostegui.json
[02:59:44] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2224 (T415786)', diff saved to https://phabricator.wikimedia.org/P88064 and previous config saved to /var/cache/conftool/dbconfig/20260129-025943-marostegui.json
[02:59:49] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[03:00:00] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2229.codfw.wmnet with reason: Maintenance
[03:00:09] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2229 (T415786)', diff saved to https://phabricator.wikimedia.org/P88065 and previous config saved to /var/cache/conftool/dbconfig/20260129-030008-marostegui.json
[03:04:08] FIRING: [2x] CertAlmostExpired: Certificate for service titan2001:443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#titan2001:443 - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[03:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[03:39:15] FIRING: JobUnavailable: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:49:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229 (T415786)', diff saved to https://phabricator.wikimedia.org/P88066 and previous config saved to /var/cache/conftool/dbconfig/20260129-034917-marostegui.json
[03:49:23] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[04:02:56] (PS6) Ryan Kemper: opensearch-semantic-search: provision namespaces [deployment-charts] - https://gerrit.wikimedia.org/r/1230512 (https://phabricator.wikimedia.org/T414702)
[04:02:56] (PS2) Ryan Kemper: opensearch-semantic-search: deploy eqiad & codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234128 (https://phabricator.wikimedia.org/T414691)
[04:04:26] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P88067 and previous config saved to /var/cache/conftool/dbconfig/20260129-040426-marostegui.json
[04:04:41] FIRING: [8x] SystemdUnitFailed: nginx.service on urldownloader1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:08:34] (PS7) Ryan Kemper: opensearch-semantic-search: provision ns [deployment-charts] - https://gerrit.wikimedia.org/r/1230512 (https://phabricator.wikimedia.org/T414702)
[04:08:34] (PS3) Ryan Kemper: opensearch-semantic-search: deploy eqiad & codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234128 (https://phabricator.wikimedia.org/T414691)
[04:08:34] (PS1) Ryan Kemper: opensearch-semantic-search-test: provision ns [deployment-charts] - https://gerrit.wikimedia.org/r/1234593 (https://phabricator.wikimedia.org/T414702)
[04:08:36] (PS1) Ryan Kemper: opensearch-semantic-search-test: depl eqiad, codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234594 (https://phabricator.wikimedia.org/T414691)
[04:18:37] (PS8) Ryan Kemper: opensearch-semantic-search: provision ns [deployment-charts] - https://gerrit.wikimedia.org/r/1230512 (https://phabricator.wikimedia.org/T414702)
[04:18:37] (PS4) Ryan Kemper: opensearch-semantic-search: deploy eqiad & codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234128 (https://phabricator.wikimedia.org/T414691)
[04:18:37] (PS2) Ryan Kemper: opensearch-semantic-search-test: provision ns [deployment-charts] - https://gerrit.wikimedia.org/r/1234593 (https://phabricator.wikimedia.org/T414702)
[04:18:38] (PS2) Ryan Kemper: opensearch-semantic-search-test: depl eqiad, codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234594 (https://phabricator.wikimedia.org/T414691)
[04:19:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P88068 and previous config saved to /var/cache/conftool/dbconfig/20260129-041934-marostegui.json
[04:24:08] (CR) Ryan Kemper: "Should be ready for final review now" [deployment-charts] - https://gerrit.wikimedia.org/r/1234128 (https://phabricator.wikimedia.org/T414691) (owner: Ryan Kemper)
[04:34:43] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229 (T415786)', diff saved to https://phabricator.wikimedia.org/P88069 and previous config saved to /var/cache/conftool/dbconfig/20260129-043443-marostegui.json
[04:34:49] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[04:48:06] PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - kubemaster_6443: Servers wikikube-ctrl2002.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[04:49:06] RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[05:09:15] FIRING: [3x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:14:46] PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:18:46] RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 04 Apr 2026 07:22:16 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:21:46] PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:22:36] RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 04 Apr 2026 07:22:16 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:33:14] PROBLEM - Kafka MirrorMaker main-codfw_to_main-eqiad max lag in last 10 minutes on alert1002 is CRITICAL: 1.096e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=codfw+prometheus/ops&var-mirror_name=main-codfw_to_main-eqiad
[05:34:15] FIRING: [3x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:42:42] (CR) Ryan Kemper: Replace elasticsearch lib w/ spicerack APIClient (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1167299 (https://phabricator.wikimedia.org/T390860) (owner: Ryan Kemper)
[05:49:05] (PS8) Ryan Kemper: hadoop.reboot-workers: make host override smarter [cookbooks] - https://gerrit.wikimedia.org/r/1214664 (https://phabricator.wikimedia.org/T411568)
[06:03:30] (Abandoned) Ryan Kemper: wdqs: Add new endpoints to allowlist [puppet] - https://gerrit.wikimedia.org/r/1201296 (https://phabricator.wikimedia.org/T407407) (owner: Bking)
[06:06:37] (Abandoned) Ryan Kemper: flink-kubernetes-operator: change flink download URL [docker-images/production-images] - https://gerrit.wikimedia.org/r/1008534 (https://phabricator.wikimedia.org/T358879) (owner: Bking)
[06:16:32] (PS3) Bking: wdqs-categories: enable scrapes for jmx exporter [puppet] - https://gerrit.wikimedia.org/r/1118162 (https://phabricator.wikimedia.org/T385236)
[06:16:50] (CR) Ryan Kemper: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1118162 (https://phabricator.wikimedia.org/T385236) (owner: Bking)
[06:17:14] RECOVERY - Kafka MirrorMaker main-codfw_to_main-eqiad max lag in last 10 minutes on alert1002 is OK: (C)1e+05 gt (W)1e+04 gt 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=codfw+prometheus/ops&var-mirror_name=main-codfw_to_main-eqiad
[06:17:18] (CR) Ryan Kemper: "addressed by ps2" [puppet] - https://gerrit.wikimedia.org/r/1118162 (https://phabricator.wikimedia.org/T385236) (owner: Bking)
[06:22:57] (Abandoned) Ryan Kemper: elasticsearch: move to opensearch client [software/spicerack] - https://gerrit.wikimedia.org/r/966492 (https://phabricator.wikimedia.org/T345337) (owner: David Caro)
[06:35:37] (PS1) Marostegui: Revert "db2212: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1234780
[06:35:44] !log marostegui@cumin1003 START - Cookbook sre.mysql.newpool pool db2212: After schema change
[06:36:41] (CR) Marostegui: [C:+2] Revert "db2212: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1234780 (owner: Marostegui)
[06:38:06] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2157.codfw.wmnet with reason: Maintenance
[06:38:13] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1159.eqiad.wmnet with reason: Maintenance
[06:38:14] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2157 (T415786)', diff saved to https://phabricator.wikimedia.org/P88071 and previous config saved to /var/cache/conftool/dbconfig/20260129-063813-marostegui.json
[06:38:21] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[06:38:21] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1159 (T415786)', diff saved to https://phabricator.wikimedia.org/P88072 and previous config saved to /var/cache/conftool/dbconfig/20260129-063820-marostegui.json
[06:52:08] (PS1) Gerrit maintenance bot: mariadb: Promote db1173 to s6 master [puppet] - https://gerrit.wikimedia.org/r/1234787 (https://phabricator.wikimedia.org/T415861)
[06:52:38] (PS1) Gerrit maintenance bot: mariadb: Promote db2229 to s6 master [puppet] - https://gerrit.wikimedia.org/r/1234788 (https://phabricator.wikimedia.org/T415862)
[06:52:45] (PS1) Gerrit maintenance bot: wmnet: Update s6-master alias [dns] - https://gerrit.wikimedia.org/r/1234789 (https://phabricator.wikimedia.org/T415862)
[06:55:29] !log marostegui@cumin1003 dbctl commit (dc=all): 'Set db1173 with weight 0 T415861', diff saved to https://phabricator.wikimedia.org/P88074 and previous config saved to /var/cache/conftool/dbconfig/20260129-065528-marostegui.json
[06:55:38] T415861: Switchover s6 master (db1201 -> db1173) - https://phabricator.wikimedia.org/T415861
[06:55:40] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 22 hosts with reason: Primary switchover s6 T415861
[06:56:02] (CR) Marostegui: [C:+2] mariadb: Promote db1173 to s6 master [puppet] - https://gerrit.wikimedia.org/r/1234787 (https://phabricator.wikimedia.org/T415861) (owner: Gerrit maintenance bot)
[06:57:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Promote db1173 to s6 primary T415861', diff saved to https://phabricator.wikimedia.org/P88075 and previous config saved to /var/cache/conftool/dbconfig/20260129-065753-marostegui.json
[06:58:39] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depool db1201 T415861', diff saved to https://phabricator.wikimedia.org/P88076 and previous config saved to /var/cache/conftool/dbconfig/20260129-065838-marostegui.json
[06:58:48] !log Starting s6 eqiad failover from db1201 to db1173 - T415861
[06:58:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:36] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1201.eqiad.wmnet with reason: Schema change on db1201
[07:00:04] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T0700)
[07:00:04] marostegui, Amir1, and federico3: #bothumor My software never has bugs. It just develops random features. Rise for Primary database switchover. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T0700).
[07:04:08] FIRING: [2x] CertAlmostExpired: Certificate for service titan2001:443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#titan2001:443 - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[07:17:25] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159 (T415786)', diff saved to https://phabricator.wikimedia.org/P88078 and previous config saved to /var/cache/conftool/dbconfig/20260129-071724-marostegui.json
[07:17:31] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[07:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[07:21:13] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db2212: After schema change
[07:32:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P88080 and previous config saved to /var/cache/conftool/dbconfig/20260129-073232-marostegui.json
[07:41:32] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2157 (T415786)', diff saved to https://phabricator.wikimedia.org/P88081 and previous config saved to /var/cache/conftool/dbconfig/20260129-074130-marostegui.json
[07:41:38] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[07:47:44] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P88082 and previous config saved to /var/cache/conftool/dbconfig/20260129-074742-marostegui.json
[07:53:30] SRE, MinT, Prod-Kubernetes, ServiceOps new, and 3 others: Can't deploy machinetranslation due to exceeding resource quotas - https://phabricator.wikimedia.org/T411058#11565014 (KartikMistry) I'm still debugging, and probably best way to check with reverting original memory allocation. Patch is co...
[07:54:04] SRE, MinT, Prod-Kubernetes, ServiceOps new, and 3 others: Can't deploy machinetranslation due to exceeding resource quotas - https://phabricator.wikimedia.org/T411058#11565016 (KartikMistry) Open→In progress
[07:54:38] SRE, MinT, Prod-Kubernetes, ServiceOps new, and 3 others: Can't deploy machinetranslation due to exceeding resource quotas - https://phabricator.wikimedia.org/T411058#11565017 (KartikMistry) a:KartikMistry
[07:56:40] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P88083 and previous config saved to /var/cache/conftool/dbconfig/20260129-075639-marostegui.json
[07:57:31] Hey folks! My name is Charlie and I found my way here from the "Get involved" page on WikiTech. I just moved on from 13 years with Puppet Labs as a tech lead on their support team. It looks like you folks are using Puppet for this and that and I would love to put my experience to work volunteering if there's anything I could help with.
[07:58:19] Also, if anyone happens to be in Belgium this weekend for Fosdem or CfgMgmtCamp next week, I would love to say hi!
[07:59:08] (CR) Brouberol: [C:+1] opensearch-semantic-search: provision ns [deployment-charts] - https://gerrit.wikimedia.org/r/1230512 (https://phabricator.wikimedia.org/T414702) (owner: Ryan Kemper)
[07:59:45] (CR) Brouberol: [C:+1] opensearch-semantic-search: deploy eqiad & codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234128 (https://phabricator.wikimedia.org/T414691) (owner: Ryan Kemper)
[07:59:57] (CR) Brouberol: [C:+1] opensearch-semantic-search-test: provision ns [deployment-charts] - https://gerrit.wikimedia.org/r/1234593 (https://phabricator.wikimedia.org/T414702) (owner: Ryan Kemper)
[08:00:05] Amir1, Urbanecm, and awight: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T0800).
[08:00:05] No Gerrit patches in the queue for this window AFAICS.
[08:00:12] (CR) Brouberol: [C:+1] opensearch-semantic-search-test: depl eqiad, codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234594 (https://phabricator.wikimedia.org/T414691) (owner: Ryan Kemper)
[08:01:37] csharpsteen: you'll probably have more of a chance of not getting lost in noise in #wikimedia-sre
[08:02:23] Awesome. Thanks for the pointer!
[08:02:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159 (T415786)', diff saved to https://phabricator.wikimedia.org/P88084 and previous config saved to /var/cache/conftool/dbconfig/20260129-080251-marostegui.json
[08:02:58] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[08:03:09] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[08:03:20] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[08:03:28] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1161 (T415786)', diff saved to https://phabricator.wikimedia.org/P88085 and previous config saved to /var/cache/conftool/dbconfig/20260129-080327-marostegui.json
[08:04:26] FIRING: [9x] SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:11:49] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P88086 and previous config saved to /var/cache/conftool/dbconfig/20260129-081148-marostegui.json
[08:26:57] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2157 (T415786)', diff saved to https://phabricator.wikimedia.org/P88087 and previous config saved to /var/cache/conftool/dbconfig/20260129-082656-marostegui.json
[08:27:00] I have a patch to backport
[08:27:03] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[08:27:14] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
[08:27:22] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2171 (T415786)', diff saved to https://phabricator.wikimedia.org/P88088 and previous config saved to /var/cache/conftool/dbconfig/20260129-082722-marostegui.json
[08:30:37] (PS1) Kosta Harlan: BlockUtils: Remove x-provenance [extensions/WikimediaEvents] (wmf/1.46.0-wmf.13) - https://gerrit.wikimedia.org/r/1234918 (https://phabricator.wikimedia.org/T415354)
[08:30:50] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 29 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/WikimediaEvents] (wmf/1.46.0-wmf.13) - https://gerrit.wikimedia.org/r/1234918 (https://phabricator.wikimedia.org/T415354) (owner: Kosta Harlan)
[08:31:52] (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy2002 using scap backport" [extensions/WikimediaEvents] (wmf/1.46.0-wmf.13) - https://gerrit.wikimedia.org/r/1234918 (https://phabricator.wikimedia.org/T415354) (owner: Kosta Harlan)
[08:32:54] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[08:35:16] (Merged) jenkins-bot: BlockUtils: Remove x-provenance [extensions/WikimediaEvents] (wmf/1.46.0-wmf.13) - https://gerrit.wikimedia.org/r/1234918 (https://phabricator.wikimedia.org/T415354) (owner: Kosta Harlan)
[08:36:33] !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1234918|BlockUtils: Remove x-provenance (T415354)]]
[08:36:38] T415354: Record CDN/Backend api values in editattemptsblocked schema - https://phabricator.wikimedia.org/T415354
[08:38:55] !log kharlan@deploy2002 kharlan: Backport for [[gerrit:1234918|BlockUtils: Remove x-provenance (T415354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[08:40:04] !log kharlan@deploy2002 kharlan: Continuing with sync
[08:42:17] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T415786)', diff saved to https://phabricator.wikimedia.org/P88089 and previous config saved to /var/cache/conftool/dbconfig/20260129-084216-marostegui.json
[08:42:22] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[08:44:18] !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1234918|BlockUtils: Remove x-provenance (T415354)]] (duration: 07m 45s)
[08:44:23] T415354: Record CDN/Backend api values in editattemptsblocked schema - https://phabricator.wikimedia.org/T415354
[08:57:25] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P88091 and previous config saved to /var/cache/conftool/dbconfig/20260129-085724-marostegui.json
[09:00:05] brennen and andre: Time to do the MediaWiki train - Utc-7+Utc-0 Version (secondary timeslot) deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T0900).
[09:00:25] nah
[09:05:53] SRE, SRE-Access-Requests: Requesting access to deployment for trueg - https://phabricator.wikimedia.org/T415632#11565160 (DSantamaria) Approved!
[09:06:29] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2171 (T415786)', diff saved to https://phabricator.wikimedia.org/P88092 and previous config saved to /var/cache/conftool/dbconfig/20260129-090628-marostegui.json [09:06:35] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [09:12:33] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P88093 and previous config saved to /var/cache/conftool/dbconfig/20260129-091232-marostegui.json [09:21:36] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P88094 and previous config saved to /var/cache/conftool/dbconfig/20260129-092135-marostegui.json [09:27:42] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T415786)', diff saved to https://phabricator.wikimedia.org/P88095 and previous config saved to /var/cache/conftool/dbconfig/20260129-092741-marostegui.json [09:27:50] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [09:27:58] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1185.eqiad.wmnet with reason: Maintenance [09:28:07] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1185 (T415786)', diff saved to https://phabricator.wikimedia.org/P88096 and previous config saved to /var/cache/conftool/dbconfig/20260129-092806-marostegui.json [09:30:05] (03PS1) 10Jgiannelos: beta: Fix duplicate definition of site.v1.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234927 [09:32:11] (03PS2) 10Jgiannelos: beta: Fix duplicate definition of site.v1.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234927 (https://phabricator.wikimedia.org/T415877) [09:34:15] FIRING: JobUnavailable: Reduced availability for job 
thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [09:36:44] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P88097 and previous config saved to /var/cache/conftool/dbconfig/20260129-093644-marostegui.json [09:41:16] (03PS1) 10Gerrit maintenance bot: mariadb: Promote db1193 to s8 master [puppet] - 10https://gerrit.wikimedia.org/r/1234935 (https://phabricator.wikimedia.org/T415879) [09:42:46] (03PS1) 10Jgiannelos: beta: Fix duplicate definition of site.v1.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234940 [09:51:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2171 (T415786)', diff saved to https://phabricator.wikimedia.org/P88098 and previous config saved to /var/cache/conftool/dbconfig/20260129-095151-marostegui.json [09:51:59] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [09:52:08] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2178.codfw.wmnet with reason: Maintenance [09:52:17] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2178 (T415786)', diff saved to https://phabricator.wikimedia.org/P88099 and previous config saved to /var/cache/conftool/dbconfig/20260129-095216-marostegui.json [10:01:58] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185 (T415786)', diff saved to https://phabricator.wikimedia.org/P88100 and previous config saved to /var/cache/conftool/dbconfig/20260129-100158-marostegui.json [10:02:04] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [10:02:30] hi everyone! 
can someone help me with this problem https://phabricator.wikimedia.org/T415876 ? on it.wiki the recent deploy broke the modules/templates that handle datetime, triggered by a faulty localization on translatewiki. the changes were reverted but we don't want to wait another week to fix the problem. thanks! [10:17:08] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P88101 and previous config saved to /var/cache/conftool/dbconfig/20260129-101706-marostegui.json [10:17:22] (PS3) Jgiannelos: Remove duplicate definition of site.v1.json [mediawiki-config] - https://gerrit.wikimedia.org/r/1234927 (https://phabricator.wikimedia.org/T415877) [10:18:06] (PS4) Jgiannelos: Remove duplicate definition of site.v1.json [mediawiki-config] - https://gerrit.wikimedia.org/r/1234927 (https://phabricator.wikimedia.org/T415877) [10:28:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2178 (T415786)', diff saved to https://phabricator.wikimedia.org/P88103 and previous config saved to /var/cache/conftool/dbconfig/20260129-102834-marostegui.json [10:28:40] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [10:32:16] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P88104 and previous config saved to /var/cache/conftool/dbconfig/20260129-103215-marostegui.json [10:43:43] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P88105 and previous config saved to /var/cache/conftool/dbconfig/20260129-104343-marostegui.json [10:47:25] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185 (T415786)', diff saved to https://phabricator.wikimedia.org/P88106 and previous config saved to 
/var/cache/conftool/dbconfig/20260129-104723-marostegui.json [10:47:33] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [10:47:40] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1200.eqiad.wmnet with reason: Maintenance [10:47:49] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1200 (T415786)', diff saved to https://phabricator.wikimedia.org/P88107 and previous config saved to /var/cache/conftool/dbconfig/20260129-104748-marostegui.json [10:58:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P88108 and previous config saved to /var/cache/conftool/dbconfig/20260129-105851-marostegui.json [11:00:05] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T1100) [11:01:17] FIRING: [2x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1014:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [11:04:08] FIRING: [2x] CertAlmostExpired: Certificate for service titan2001:443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#titan2001:443 - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [11:04:41] !log root@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None [11:14:00] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2178 (T415786)', diff saved to https://phabricator.wikimedia.org/P88109 and previous config saved to /var/cache/conftool/dbconfig/20260129-111359-marostegui.json 
[11:14:06] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [11:14:17] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2201.codfw.wmnet with reason: Maintenance [11:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [11:21:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1200 (T415786)', diff saved to https://phabricator.wikimedia.org/P88110 and previous config saved to /var/cache/conftool/dbconfig/20260129-112137-marostegui.json [11:21:45] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [11:24:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1226 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P88111 and previous config saved to /var/cache/conftool/dbconfig/20260129-112437-marostegui.json [11:24:47] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163 [11:24:47] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [11:34:47] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P88112 and previous config saved to /var/cache/conftool/dbconfig/20260129-113446-marostegui.json [11:36:46] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P88113 and previous config saved to /var/cache/conftool/dbconfig/20260129-113645-marostegui.json