Wikimedia IRC logs browser - #wikimedia-operations

Displaying 1067 items:

2026-01-15 00:00:47 <wikibugs> ('Merged) ''jenkins-bot: Start reading from il_target_id on testwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1226965 (https://phabricator.wikimedia.org/T413669) (owner: ''Zabe)'
2026-01-15 00:01:23 <logmsgbot> !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1226965|Start reading from il_target_id on testwiki (T413669)]]
2026-01-15 00:01:29 <stashbot> T413669: Set imagelinks migration to read new - https://phabricator.wikimedia.org/T413669
2026-01-15 00:03:30 <logmsgbot> !log zabe@deploy2002 zabe: Backport for [[gerrit:1226965|Start reading from il_target_id on testwiki (T413669)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-01-15 00:05:30 <logmsgbot> !log zabe@deploy2002 zabe: Continuing with sync
2026-01-15 00:09:36 <logmsgbot> !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1226965|Start reading from il_target_id on testwiki (T413669)]] (duration: 08m 13s)
2026-01-15 00:09:41 <stashbot> T413669: Set imagelinks migration to read new - https://phabricator.wikimedia.org/T413669
2026-01-15 00:14:45 <wikibugs> 'ops-ulsfo, ''SRE, ''DC-Ops, ''Infrastructure-Foundations, ''netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11523618 (''Papaul) Phase 1 of ULSFO migration which was changing the loopback addresses of cr1,cr4 ,mr1 and the IP address of the link between cr3 and cr4 was...'
2026-01-15 00:23:57 <icinga-wm> PROBLEM - Host an-worker1159 is DOWN: PING CRITICAL - Packet loss = 100%
2026-01-15 00:23:57 <icinga-wm> PROBLEM - Host an-worker1160 is DOWN: PING CRITICAL - Packet loss = 100%
2026-01-15 00:38:40 <jinxer-wm> FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-01-15 00:41:16 <wikibugs> ('PS1) ''TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - ''https://gerrit.wikimedia.org/r/1226973'
2026-01-15 00:41:16 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - ''https://gerrit.wikimedia.org/r/1226973 (owner: ''TrainBranchBot)'
2026-01-15 00:50:20 <wikibugs> ('PS1) ''Sbisson: CX3 Build 1.0.0+20260114 [extensions/ContentTranslation] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1226976 (https://phabricator.wikimedia.org/T413646)'
2026-01-15 00:50:43 <wikibugs> ('PS1) ''Sbisson: Fallback to source title if target title is not provided by cxserver [extensions/ContentTranslation] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1226977 (https://phabricator.wikimedia.org/T414558)'
2026-01-15 00:51:41 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 15 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo"; [extensions/ContentTranslation] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1226976 (https://phabricator.wikimedia.org/T413646) (owner: ''Sbisson)'
2026-01-15 00:52:08 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 15 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo"; [extensions/ContentTranslation] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1226977 (https://phabricator.wikimedia.org/T414558) (owner: ''Sbisson)'
2026-01-15 00:54:26 <wikibugs> ('Merged) ''jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - ''https://gerrit.wikimedia.org/r/1226973 (owner: ''TrainBranchBot)'
2026-01-15 00:56:59 <logmsgbot> ryankemper@cumin2002 reboot-workers (PID 2845277) is awaiting input
2026-01-15 00:57:44 <logmsgbot> !log ryankemper@cumin2002 END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
2026-01-15 01:00:49 <logmsgbot> !log mwpresync@deploy2002 Started scap build-images: Publishing wmf/next image
2026-01-15 01:10:46 <wikibugs> ('PS1) ''TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - ''https://gerrit.wikimedia.org/r/1226980'
2026-01-15 01:10:47 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] Branch commit for wmf/next [core] (wmf/next) - ''https://gerrit.wikimedia.org/r/1226980 (owner: ''TrainBranchBot)'
2026-01-15 01:13:47 <logmsgbot> !log mwpresync@deploy2002 Finished scap build-images: Publishing wmf/next image (duration: 12m 57s)
2026-01-15 01:18:39 <wikibugs> ('PS1) ''Jdrewniak: Update portals submodule for WP25 birthday preview. [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1226981 (https://phabricator.wikimedia.org/T128546)'
2026-01-15 01:24:11 <jinxer-wm> FIRING: KubernetesCalicoDown: ml-serve2004.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlserve&var-instance=ml-serve2004.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
2026-01-15 01:27:28 <wikibugs> ('Abandoned) ''Jdrewniak: Bumping portals to master [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1226477 (https://phabricator.wikimedia.org/T128546) (owner: ''Jdrewniak)'
2026-01-15 01:33:20 <wikibugs> ('Merged) ''jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - ''https://gerrit.wikimedia.org/r/1226980 (owner: ''TrainBranchBot)'
2026-01-15 01:41:45 <jinxer-wm> FIRING: [4x] LibericaUnhealthyRealserverPooled: Liberica service gerrit-sshlb6_29418 has 2 unhealthy realservers pooled on lvs7001:3003 - https://wikitech.wikimedia.org/wiki/Liberica#LibericaUnhealthyRealserverPooled - https://alerts.wikimedia.org/?q=alertname%3DLibericaUnhealthyRealserverPooled
2026-01-15 02:38:06 <jinxer-wm> FIRING: CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth1 (Subnet frack-fundraising-codfw in F5) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
2026-01-15 03:40:12 <wikibugs> 'ops-ulsfo, ''SRE, ''DC-Ops, ''Infrastructure-Foundations, ''netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11523794 (''Papaul)'
2026-01-15 04:00:37 <wikibugs> ('PS1) ''Clare Ming: Enable Test Kitchen on all prod wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227004 (https://phabricator.wikimedia.org/T407806)'
2026-01-15 04:02:17 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2172 (T413525)', diff saved to https://phabricator.wikimedia.org/P87525 and previous config saved to /var/cache/conftool/dbconfig/20260115-040216-marostegui.json
2026-01-15 04:02:22 <stashbot> T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525
2026-01-15 04:06:59 <icinga-wm> PROBLEM - Backup freshness on backup1014 is CRITICAL: Stale: 1 (dbprov1004), Fresh: 139 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
2026-01-15 04:12:26 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P87526 and previous config saved to /var/cache/conftool/dbconfig/20260115-041225-marostegui.json
2026-01-15 04:22:35 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P87527 and previous config saved to /var/cache/conftool/dbconfig/20260115-042233-marostegui.json
2026-01-15 04:28:45 <wikibugs> 'ops-ulsfo, ''SRE, ''DC-Ops, ''Infrastructure-Foundations, ''netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11523834 (''Papaul)'
2026-01-15 04:32:43 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2172 (T413525)', diff saved to https://phabricator.wikimedia.org/P87528 and previous config saved to /var/cache/conftool/dbconfig/20260115-043242-marostegui.json
2026-01-15 04:32:48 <stashbot> T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525
2026-01-15 04:33:00 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
2026-01-15 04:38:40 <jinxer-wm> FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-01-15 05:04:49 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1261 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87529 and previous config saved to /var/cache/conftool/dbconfig/20260115-050448-marostegui.json
2026-01-15 05:04:55 <stashbot> T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163
2026-01-15 05:04:55 <stashbot> T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164
2026-01-15 05:09:11 <jinxer-wm> FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2026-01-15 05:14:56 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1261', diff saved to https://phabricator.wikimedia.org/P87530 and previous config saved to /var/cache/conftool/dbconfig/20260115-051455-marostegui.json
2026-01-15 05:24:11 <jinxer-wm> FIRING: KubernetesCalicoDown: ml-serve2004.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlserve&var-instance=ml-serve2004.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
2026-01-15 05:25:04 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1261', diff saved to https://phabricator.wikimedia.org/P87532 and previous config saved to /var/cache/conftool/dbconfig/20260115-052504-marostegui.json
2026-01-15 05:34:11 <jinxer-wm> RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2026-01-15 05:35:09 <wikibugs> 'ops-ulsfo, ''SRE, ''DC-Ops, ''Infrastructure-Foundations, ''netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11523872 (''Papaul)'
2026-01-15 05:35:13 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1261 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87533 and previous config saved to /var/cache/conftool/dbconfig/20260115-053512-marostegui.json
2026-01-15 05:35:19 <stashbot> T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163
2026-01-15 05:35:19 <stashbot> T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164
2026-01-15 05:35:30 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1262.eqiad.wmnet with reason: Maintenance
2026-01-15 05:35:38 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1262 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87534 and previous config saved to /var/cache/conftool/dbconfig/20260115-053537-marostegui.json
2026-01-15 06:28:55 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
2026-01-15 06:29:03 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1169 (T413525)', diff saved to https://phabricator.wikimedia.org/P87535 and previous config saved to /var/cache/conftool/dbconfig/20260115-062902-marostegui.json
2026-01-15 06:29:07 <stashbot> T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525
2026-01-15 06:30:12 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T413525)', diff saved to https://phabricator.wikimedia.org/P87536 and previous config saved to /var/cache/conftool/dbconfig/20260115-063011-marostegui.json
2026-01-15 06:32:55 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
2026-01-15 06:33:21 <logmsgbot> !log marostegui@cumin1003 START - Cookbook sre.mysql.pool db1169 gradually with 4 steps - After schema change
2026-01-15 06:35:25 <wikibugs> ('CR) ''Marostegui: [C:''+1] sre.mysql.newpool: [de]pool various section kinds [cookbooks] - ''https://gerrit.wikimedia.org/r/1215575 (https://phabricator.wikimedia.org/T411573) (owner: ''Federico Ceratto)'
2026-01-15 06:38:06 <jinxer-wm> FIRING: CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth1 (Subnet frack-fundraising-codfw in F5) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
2026-01-15 06:43:12 <wikibugs> ('PS1) ''Giuseppe Lavagetto: cache::upload: rate-limit rather than blocking bingbot [puppet] - ''https://gerrit.wikimedia.org/r/1227202'
2026-01-15 06:45:13 <wikibugs> 'SRE, ''collaboration-services, ''Patch-For-Review, ''PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11523917 (''Dzahn) a:''ATitkov→''Dzahn - site updated to version: 2026-01-14-150341 https://gerrit.wikimedia.org/r/c/operations/deploymen...'
2026-01-15 06:46:01 <wikibugs> 'SRE, ''collaboration-services, ''Patch-For-Review, ''PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11523919 (''Dzahn) ''Open→''In progress'
2026-01-15 07:00:05 <jouncebot> Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T0700)
2026-01-15 07:00:05 <jouncebot> marostegui, Amir1, and federico3: gettimeofday() says it's time for Primary database switchover. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T0700)
2026-01-15 07:01:17 <XioNoX> !log restart snmp and MIB processes on asw1-b12-drmrs - T413181
2026-01-15 07:01:20 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-01-15 07:01:21 <stashbot> T413181: asw1-b12-drmrs stopped reporting metrics - https://phabricator.wikimedia.org/T413181
2026-01-15 07:02:46 <wikibugs> ('CR) ''Dzahn: [C:''+2] Revert "trafficserver: disable wikipedia25" [puppet] - ''https://gerrit.wikimedia.org/r/1224959 (https://phabricator.wikimedia.org/T408592) (owner: ''Dzahn)'
2026-01-15 07:03:43 <logmsgbot> !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1169 gradually with 4 steps - After schema change
2026-01-15 07:06:43 <wikibugs> ('PS1) ''Marostegui: dbproxy2005: Add Debian Trixie note [puppet] - ''https://gerrit.wikimedia.org/r/1227204 (https://phabricator.wikimedia.org/T409398)'
2026-01-15 07:08:55 <wikibugs> ('CR) ''Marostegui: [C:''+2] dbproxy2005: Add Debian Trixie note [puppet] - ''https://gerrit.wikimedia.org/r/1227204 (https://phabricator.wikimedia.org/T409398) (owner: ''Marostegui)'
2026-01-15 07:16:14 <wikibugs> ('CR) ''JMeybohm: [C:''+1] "sgtm" [puppet] - ''https://gerrit.wikimedia.org/r/1226914 (https://phabricator.wikimedia.org/T394476) (owner: ''Elukey)'
2026-01-15 07:18:13 <wikibugs> 'SRE, ''collaboration-services, ''Patch-For-Review, ''PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11523949 (''Dzahn) The site is active: https://www.wikipedia25.org'
2026-01-15 07:25:26 <wikibugs> ('PS1) ''Superpes15: [slwiki] Fix temporary logo for Wikipedia 25 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227210 (https://phabricator.wikimedia.org/T414265)'
2026-01-15 07:28:34 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
2026-01-15 07:33:14 <wikibugs> 'SRE, ''collaboration-services, ''Patch-For-Review, ''PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11523959 (''A_smart_kitten) Just a note (apologies if there's a better place to raise this): When I click on any of the 'Transcript' buttons...'
2026-01-15 07:51:59 <wikibugs> ('PS1) ''Muehlenhoff: Record LDAP access for tadeleye [puppet] - ''https://gerrit.wikimedia.org/r/1227214'
2026-01-15 07:53:48 <wikibugs> ('CR) ''Muehlenhoff: [C:''+2] Record LDAP access for tadeleye [puppet] - ''https://gerrit.wikimedia.org/r/1227214 (owner: ''Muehlenhoff)'
2026-01-15 07:54:36 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
2026-01-15 07:54:44 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2206 (T413525)', diff saved to https://phabricator.wikimedia.org/P87540 and previous config saved to /var/cache/conftool/dbconfig/20260115-075444-marostegui.json
2026-01-15 07:54:49 <stashbot> T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525
2026-01-15 07:54:56 <wikibugs> ('PS2) ''Gergő Tisza: debug: Add X-Provenance header to Logstash [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1226903 (https://phabricator.wikimedia.org/T412396)'
2026-01-15 07:55:05 <wikibugs> ('CR) ''CI reject: [V:''-1] debug: Add X-Provenance header to Logstash [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1226903 (https://phabricator.wikimedia.org/T412396) (owner: ''Gergő Tisza)'
2026-01-15 07:55:42 <wikibugs> ('PS3) ''Gergő Tisza: debug: Add X-Provenance header to Logstash [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1226903 (https://phabricator.wikimedia.org/T412396)'
2026-01-15 07:55:51 <wikibugs> ('CR) ''CI reject: [V:''-1] debug: Add X-Provenance header to Logstash [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1226903 (https://phabricator.wikimedia.org/T412396) (owner: ''Gergő Tisza)'
2026-01-15 07:55:54 <wikibugs> ('PS4) ''Gergő Tisza: debug: Add X-Provenance header to Logstash [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1226903 (https://phabricator.wikimedia.org/T412396)'
2026-01-15 08:00:52 <hashar> good morning
2026-01-15 08:01:37 <hashar> Superpes: hello, I'll deploy your change
2026-01-15 08:01:47 <Superpes> Hi thanks hashar :)
2026-01-15 08:01:57 <hashar> artemkloko: good morning, I am going to deploy the WP25 change for portals
2026-01-15 08:02:22 <hashar> reads the changes
2026-01-15 08:03:32 <wikibugs> 'SRE, ''Data-Platform-SRE (2026.01.05 - 2026.01.23), ''Patch-For-Review: October 2025 Bullseye reboots: Data Platform Engineering-owned hosts - https://phabricator.wikimedia.org/T411568#11523973 (''RKemper) Got about 40 `an-worker*` hosts done, but there's still another ~80 left to be done'
2026-01-15 08:04:43 <hashar> I'll start
2026-01-15 08:05:46 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by hashar@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227210 (https://phabricator.wikimedia.org/T414265) (owner: ''Superpes15)'
2026-01-15 08:05:46 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by hashar@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1226981 (https://phabricator.wikimedia.org/T128546) (owner: ''Jdrewniak)'
2026-01-15 08:06:32 <hashar> changes are in the pipe https://integration.wikimedia.org/zuul/#q=mediawiki-config
2026-01-15 08:06:38 <wikibugs> ('Merged) ''jenkins-bot: [slwiki] Fix temporary logo for Wikipedia 25 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227210 (https://phabricator.wikimedia.org/T414265) (owner: ''Superpes15)'
2026-01-15 08:06:42 <wikibugs> ('Merged) ''jenkins-bot: Update portals submodule for WP25 birthday preview. [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1226981 (https://phabricator.wikimedia.org/T128546) (owner: ''Jdrewniak)'
2026-01-15 08:07:52 <logmsgbot> !log hashar@deploy2002 Started scap sync-world: Backport for [[gerrit:1227210|[slwiki] Fix temporary logo for Wikipedia 25 (T414265)]], [[gerrit:1226981|Update portals submodule for WP25 birthday preview. (T128546)]]
2026-01-15 08:07:57 <stashbot> T414265: Requesting temporary logo change for sl.wikipedia.org (WP25) - https://phabricator.wikimedia.org/T414265
2026-01-15 08:07:57 <stashbot> T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
2026-01-15 08:10:18 <logmsgbot> !log hashar@deploy2002 hashar, jdrewniak, superpes: Backport for [[gerrit:1227210|[slwiki] Fix temporary logo for Wikipedia 25 (T414265)]], [[gerrit:1226981|Update portals submodule for WP25 birthday preview. (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-01-15 08:10:24 <Superpes> Testing!
2026-01-15 08:12:55 <Superpes> Uhm... looks weird! hashar Are you able to quickly test via browser?
2026-01-15 08:13:12 <Superpes> Oh now it looks fine lmao
2026-01-15 08:13:18 <Superpes> Maybe a cache issue?
2026-01-15 08:13:18 <hashar> caches!! :b
2026-01-15 08:13:38 <Superpes> Yep lol It's fine thanks :)
2026-01-15 08:13:47 <hashar> of course I have a wrong link
2026-01-15 08:13:48 <hashar> :b
2026-01-15 08:14:09 <hashar> artemkloko: I have pushed the change for the portal and the orange button points to a link that does not exist :/
2026-01-15 08:14:40 <hashar> I guess cause the wikimediafoundation.org page has not been published
2026-01-15 08:14:43 <wikibugs> ('PS4) ''Dreamy Jazz: Write new for CheckUser user agent table migration on group1 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1223674 (https://phabricator.wikimedia.org/T361196)'
2026-01-15 08:14:44 <wikibugs> ('PS4) ''Dreamy Jazz: Write new for CheckUser user agent table migration everywhere [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1223675 (https://phabricator.wikimedia.org/T361196)'
2026-01-15 08:15:21 <hashar> Superpes: great thanks
2026-01-15 08:15:37 <hashar> I'll most probably cancel, revert the portals update change and deploy again
2026-01-15 08:16:22 <logmsgbot> !log hashar@deploy2002 Sync cancelled.
2026-01-15 08:17:58 <wikibugs> 'SRE, ''collaboration-services, ''Patch-For-Review, ''PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11523999 (''Dzahn) @A_smart_kitten Thanks for reporting. The issue is known and currently a fix is being worked on.'
2026-01-15 08:18:44 <wikibugs> ('PS1) ''Hashar: Revert "Update portals submodule for WP25 birthday preview." [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227256 (https://phabricator.wikimedia.org/T128546)'
2026-01-15 08:19:24 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by dreamyjazz@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1223674 (https://phabricator.wikimedia.org/T361196) (owner: ''Dreamy Jazz)'
2026-01-15 08:20:04 <Dreamy_Jazz> jouncebot: nowandnext
2026-01-15 08:20:05 <jouncebot> For the next 0 hour(s) and 39 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T0800)
2026-01-15 08:20:05 <jouncebot> In 2 hour(s) and 39 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T1100)
2026-01-15 08:20:30 <Dreamy_Jazz> I've stopped the +2, waiting for others to finish their changes
2026-01-15 08:21:00 <Dreamy_Jazz> hashar: Could you ping me when you are done?
2026-01-15 08:21:35 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by hashar@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227256 (https://phabricator.wikimedia.org/T128546) (owner: ''Hashar)'
2026-01-15 08:21:37 <hashar> Dreamy_Jazz: sure!
2026-01-15 08:22:23 <wikibugs> ('Merged) ''jenkins-bot: Revert "Update portals submodule for WP25 birthday preview." [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227256 (https://phabricator.wikimedia.org/T128546) (owner: ''Hashar)'
2026-01-15 08:22:54 <logmsgbot> !log hashar@deploy2002 Started scap sync-world: Backport for [[gerrit:1227256|Revert "Update portals submodule for WP25 birthday preview." (T128546 T414533)]]
2026-01-15 08:23:00 <stashbot> T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
2026-01-15 08:23:00 <stashbot> T414533: Update the url of the CTA button for Wikipedia25 portal customisation - https://phabricator.wikimedia.org/T414533
2026-01-15 08:23:43 <wikibugs> ('PS1) ''Hashar: Update portals submodule for WP25 birthday preview [2] [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227258 (https://phabricator.wikimedia.org/T128546)'
2026-01-15 08:25:15 <logmsgbot> !log hashar@deploy2002 hashar: Backport for [[gerrit:1227256|Revert "Update portals submodule for WP25 birthday preview." (T128546 T414533)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-01-15 08:25:47 <wikibugs> 'SRE-Sprint-Week-Sustainability-March2023, ''DBA, ''Sustainability (Incident Followup): Improve slow read query handling - https://phabricator.wikimedia.org/T293530#11524041 (''Marostegui) ''Open→''Resolved a:''Ladsgroup I think we can consider this done. @Ladsgroup has done lots of work to 1) re...'
2026-01-15 08:25:52 <logmsgbot> !log hashar@deploy2002 hashar: Continuing with sync
2026-01-15 08:28:28 <hashar> Dreamy_Jazz: my changes are syncing
2026-01-15 08:29:21 <Dreamy_Jazz> Thanks
2026-01-15 08:29:52 <wikibugs> 'SRE, ''collaboration-services, ''Wikimedia-Mailing-lists, ''Patch-For-Review: Put lists.wikimedia.org web interface behind LVS - https://phabricator.wikimedia.org/T286066#11524049 (''ABran-WMF)'
2026-01-15 08:29:58 <logmsgbot> !log hashar@deploy2002 Finished scap sync-world: Backport for [[gerrit:1227256|Revert "Update portals submodule for WP25 birthday preview." (T128546 T414533)]] (duration: 07m 04s)
2026-01-15 08:30:03 <stashbot> T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
2026-01-15 08:30:04 <stashbot> T414533: Update the url of the CTA button for Wikipedia25 portal customisation - https://phabricator.wikimedia.org/T414533
2026-01-15 08:30:56 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by dreamyjazz@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1223674 (https://phabricator.wikimedia.org/T361196) (owner: ''Dreamy Jazz)'
2026-01-15 08:31:46 <wikibugs> ('CR) ''Vgutierrez: [C:''+1] "VTCs are happy and condition properly matches the intended traffic" [puppet] - ''https://gerrit.wikimedia.org/r/1227202 (owner: ''Giuseppe Lavagetto)'
2026-01-15 08:31:50 <wikibugs> ('Merged) ''jenkins-bot: Write new for CheckUser user agent table migration on group1 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1223674 (https://phabricator.wikimedia.org/T361196) (owner: ''Dreamy Jazz)'
2026-01-15 08:32:21 <logmsgbot> !log dreamyjazz@deploy2002 Started scap sync-world: Backport for [[gerrit:1223674|Write new for CheckUser user agent table migration on group1 (T361196)]]
2026-01-15 08:32:25 <stashbot> T361196: Write to the cu_useragent table and agent_id columns on WMF wikis - https://phabricator.wikimedia.org/T361196
2026-01-15 08:32:53 <hashar> still running
2026-01-15 08:32:56 <wikibugs> 'SRE, ''collaboration-services, ''Patch-For-Review, ''PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11524061 (''Dzahn) Unrelated to the issue reported above, but for the record. We had an initial problem with the bare domain without www be...'
2026-01-15 08:34:27 <phuedx> jouncebot: next
2026-01-15 08:34:27 <jouncebot> In 2 hour(s) and 25 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T1100)
2026-01-15 08:34:32 <logmsgbot> !log dreamyjazz@deploy2002 dreamyjazz: Backport for [[gerrit:1223674|Write new for CheckUser user agent table migration on group1 (T361196)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-01-15 08:36:17 <logmsgbot> !log dreamyjazz@deploy2002 dreamyjazz: Continuing with sync
2026-01-15 08:36:40 <Dreamy_Jazz> I hadn't finished testing?
2026-01-15 08:36:48 <hashar> I did
2026-01-15 08:36:56 <Dreamy_Jazz> Okay
2026-01-15 08:36:57 <hashar> I pushed a rollback :b
2026-01-15 08:37:56 <Dreamy_Jazz> Ah, okay
2026-01-15 08:38:07 <hashar> pff
2026-01-15 08:38:12 <hashar> of course the page has been published now
2026-01-15 08:38:17 <wikibugs> ('PS1) ''Dzahn: miscweb: update wikipedia25 image to latest version [deployment-charts] - ''https://gerrit.wikimedia.org/r/1227260 (https://phabricator.wikimedia.org/T408592)'
2026-01-15 08:38:19 <phuedx> hashar, Dreamy_Jazz: I'd like to enable the TestKitchen extension everywhere. It looks like we've got a lot of time after the window. If not, I can do it in the afternoon window
2026-01-15 08:38:20 <hashar> so I gotta deploy again
2026-01-15 08:38:30 <Dreamy_Jazz> :D
2026-01-15 08:38:32 <phuedx> Or maybe not :D :D :D
2026-01-15 08:38:40 <jinxer-wm> FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-01-15 08:39:09 <hashar> phuedx: has that TestKitchen extension been fixed? It overlapped/clashed with MetricsPlatform :b
2026-01-15 08:39:25 <wikibugs> ('CR) ''Dzahn: [C:''+2] miscweb: update wikipedia25 image to latest version [deployment-charts] - ''https://gerrit.wikimedia.org/r/1227260 (https://phabricator.wikimedia.org/T408592) (owner: ''Dzahn)'
2026-01-15 08:39:48 <hashar> (I suspect the code got copy-pasted between repos, losing the history, but I digress)
2026-01-15 08:39:51 <hashar> anyway yea
2026-01-15 08:39:58 <hashar> but I have to push again that portals update change
2026-01-15 08:40:15 <mutante> same here with updating the birthday page.. in progress
2026-01-15 08:40:15 <logmsgbot> !log dreamyjazz@deploy2002 Finished scap sync-world: Backport for [[gerrit:1223674|Write new for CheckUser user agent table migration on group1 (T361196)]] (duration: 07m 54s)
2026-01-15 08:40:19 <stashbot> T361196: Write to the cu_useragent table and agent_id columns on WMF wikis - https://phabricator.wikimedia.org/T361196
2026-01-15 08:40:39 <phuedx> hashar: Yes. It's currently enabled on testwiki. I believe the CI issues have been fixed
2026-01-15 08:41:25 <wikibugs> ('Merged) ''jenkins-bot: miscweb: update wikipedia25 image to latest version [deployment-charts] - ''https://gerrit.wikimedia.org/r/1227260 (https://phabricator.wikimedia.org/T408592) (owner: ''Dzahn)'
2026-01-15 08:41:46 <hashar> phuedx: great :]
2026-01-15 08:41:51 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by hashar@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227258 (https://phabricator.wikimedia.org/T128546) (owner: ''Hashar)'
2026-01-15 08:42:41 <logmsgbot> !log dzahn@deploy2002 helmfile [staging] START helmfile.d/services/miscweb: apply
2026-01-15 08:42:44 <wikibugs> ('Merged) ''jenkins-bot: Update portals submodule for WP25 birthday preview [2] [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227258 (https://phabricator.wikimedia.org/T128546) (owner: ''Hashar)'
2026-01-15 08:43:02 <logmsgbot> !log dzahn@deploy2002 helmfile [staging] DONE helmfile.d/services/miscweb: apply
2026-01-15 08:43:15 <logmsgbot> !log hashar@deploy2002 Started scap sync-world: Backport for [[gerrit:1227258|Update portals submodule for WP25 birthday preview [2] (T128546 T414533)]]
2026-01-15 08:43:21 <stashbot> T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
2026-01-15 08:43:21 <logmsgbot> !log dzahn@deploy2002 helmfile [codfw] START helmfile.d/services/miscweb: apply
2026-01-15 08:43:21 <stashbot> T414533: Update the url of the CTA button for Wikipedia25 portal customisation - https://phabricator.wikimedia.org/T414533
2026-01-15 08:43:40 <logmsgbot> !log dzahn@deploy2002 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
2026-01-15 08:44:06 <logmsgbot> !log dzahn@deploy2002 helmfile [eqiad] START helmfile.d/services/miscweb: apply
2026-01-15 08:44:30 <logmsgbot> !log dzahn@deploy2002 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
2026-01-15 08:44:45 <wikibugs> ('CR) ''Muehlenhoff: [C:''+2] Remove profile::puppet::agent::force_puppet7 from observability roles [puppet] - ''https://gerrit.wikimedia.org/r/1226178 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-01-15 08:45:30 <logmsgbot> !log hashar@deploy2002 hashar: Backport for [[gerrit:1227258|Update portals submodule for WP25 birthday preview [2] (T128546 T414533)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-01-15 08:45:52 <logmsgbot> !log hashar@deploy2002 hashar: Continuing with sync
2026-01-15 08:46:05 <hashar> ah this time the link worked
2026-01-15 08:46:29 <hashar> so that is poor synchronization with me deploying the www.wikipedia.org update before the target page got published by comm
2026-01-15 08:46:31 <hashar> fun times
2026-01-15 08:46:48 <wikibugs> 'SRE, ''collaboration-services, ''Patch-For-Review, ''PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11524096 (''Dzahn) deployed latest version 2026-01-15-080024 - @A_smart_kitten is it gone for you too?'
2026-01-15 08:48:11 <wikibugs> 'SRE, ''collaboration-services, ''Patch-For-Review, ''PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11524099 (''A_smart_kitten) @dzahn checking just now on the device I used before, the 'Not Found' page was initially cached, but once I refre...'
2026-01-15 08:49:57 <logmsgbot> !log hashar@deploy2002 Finished scap sync-world: Backport for [[gerrit:1227258|Update portals submodule for WP25 birthday preview [2] (T128546 T414533)]] (duration: 06m 42s)
2026-01-15 08:50:03 <stashbot> T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
2026-01-15 08:50:03 <stashbot> T414533: Update the url of the CTA button for Wikipedia25 portal customisation - https://phabricator.wikimedia.org/T414533
2026-01-15 08:50:52 <hashar> let's bust the cache
2026-01-15 08:52:28 <hashar> !log purged portals URLs using: `cat /srv/mediawiki-staging/portals/urls-to-purge.txt | MEDIAWIKI_STAGING_DIR=/srv/mediawiki-staging mwscript purgeList.php` # T414533
2026-01-15 08:52:31 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-01-15 08:52:34 <hashar> !log https://www.wikipedia.org/ and click that orange button! # T414533
2026-01-15 08:52:38 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-01-15 08:52:43 <hashar> artemkloko: change is live!
2026-01-15 08:52:52 <hashar> Dreamy_Jazz: phuedx: it is all yours
2026-01-15 08:53:01 <hashar> https://www.wikipedia.org/ has been updated
2026-01-15 08:53:44 <Superpes> hashar What about my patch? :)
2026-01-15 08:53:59 <hashar> Superpes: yes it should be live now
2026-01-15 08:54:55 <Superpes> Wonderful! I asked because I didn't check SAL
2026-01-15 08:55:00 <Superpes> Thanks for your assistance :3
2026-01-15 08:55:49 <hashar> Superpes: thank you for the logo fix!
2026-01-15 08:56:45 <wikibugs> ('PS1) ''Muehlenhoff: Remove profile::puppet::agent::force_puppet7 from serviceops roles [puppet] - ''https://gerrit.wikimedia.org/r/1227261 (https://phabricator.wikimedia.org/T365798)'
2026-01-15 08:57:44 <wikibugs> 'SRE, ''collaboration-services, ''Patch-For-Review, ''PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11524113 (''Dzahn) @A_smart_kitten Yea, that is also what we saw over here. Thanks!:)'
2026-01-15 08:57:46 <phuedx> Hrrm. I think I can see a bug with the TestKitchen config. I'm going to hold off on the deployment until others in my team are online
2026-01-15 08:57:54 <phuedx> hashar: I think you can close the window now
2026-01-15 08:58:54 <Dreamy_Jazz> Thanks for the ping hashar, mine should have been done by that one scap I did
2026-01-15 08:59:05 <wikibugs> 'SRE, ''collaboration-services, ''Patch-For-Review, ''PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11524114 (''Dzahn) ''In progress''Resolved We are live - QA happening now.'
2026-01-15 09:04:01 <icinga-wm> PROBLEM - Check unit status of statograph_post on alert1002 is CRITICAL: CRITICAL: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
2026-01-15 09:04:46 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to L3 data access for kimpham (developer name Kim.pham) - https://phabricator.wikimedia.org/T414660 (''kimpham) ''NEW'
2026-01-15 09:05:12 <kostajh> hashar: I have a patch to backport to wmf.11, but I could do it later as well
2026-01-15 09:06:13 <hashar> kostajh: looks like phuedx and Dreamy_Jazz have finished so feel free to deploy
2026-01-15 09:06:23 <hashar> I am off, I have an appointment
2026-01-15 09:06:29 <artemkloko> Hello everyone, is there someone knowledgeable about how to deploy the portals?
2026-01-15 09:06:45 <artemkloko> We just deployed a version, but it seems to need a fix
2026-01-15 09:06:51 <wikibugs> ('CR) ''Elukey: [C:''+2] profile::docker_registry: tune the s3 config for /restricted [puppet] - ''https://gerrit.wikimedia.org/r/1226914 (https://phabricator.wikimedia.org/T394476) (owner: ''Elukey)'
2026-01-15 09:07:17 <mutante> artemkloko: hashar just had to go
2026-01-15 09:07:32 <kostajh> thanks
2026-01-15 09:07:45 <kostajh> will start deployment soon
2026-01-15 09:08:04 <mutante> kostajh: would you be able to deploy portal changes like hashar just did?
2026-01-15 09:08:10 <mutante> to help out artemkloko
2026-01-15 09:08:23 <icinga-wm> RECOVERY - Host an-conf1006 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
2026-01-15 09:08:49 <artemkloko> i have a doc that could help kostajh
2026-01-15 09:09:03 <kostajh> artemkloko: sure, I can look at it
2026-01-15 09:09:13 <kostajh> can you share the document with me please?
2026-01-15 09:09:13 <hashar> I think there is an issue in the build step that generates the assets for wikimedia/portals/deploy
2026-01-15 09:09:33 <hashar> there is a Gulp project in wikimedia/portals which is built/invoked by a CI job which builds the assets
2026-01-15 09:09:48 <hashar> and some .webm files are not added to the assets dir
2026-01-15 09:10:12 <hashar> they are thus not added when doing a `git commit -A`
2026-01-15 09:10:44 <hashar> it looks like an issue with the `npm run build-all-portals` script from wikimedia/portals
2026-01-15 09:10:58 <hashar> thus I imagine that potentially needs Jan to look into it
2026-01-15 09:11:56 <hashar> and the job building the assets is https://integration.wikimedia.org/ci/job/wikimedia-portals-build/ (which results in publishing a change for the deploy repo at https://gerrit.wikimedia.org/r/q/project:wikimedia/portals/deploy )
2026-01-15 09:12:01 <hashar> so it is not trivial :\
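The symptom hashar describes (some .webm files never reaching the built assets directory) could be checked with a small script before committing the build output. What follows is a minimal sketch, not part of the wikimedia/portals tooling: the function name `missing_assets`, both directory arguments, and the assumption that built assets keep their source file names are hypothetical.

```python
import sys
from pathlib import Path


def missing_assets(source_dir, build_dir, suffix=".webm"):
    """List file names ending in `suffix` that exist somewhere under
    source_dir but nowhere under build_dir.

    Assumption: the build step copies files without renaming them, so a
    comparison by bare file name is enough to spot dropped files.
    """
    source_names = {
        p.name for p in Path(source_dir).rglob(f"*{suffix}") if p.is_file()
    }
    built_names = {
        p.name for p in Path(build_dir).rglob(f"*{suffix}") if p.is_file()
    }
    # Names present in the source tree but absent from the build output.
    return sorted(source_names - built_names)


if __name__ == "__main__":
    # Hypothetical usage: python check_assets.py <source_dir> <build_dir>
    if len(sys.argv) != 3:
        sys.exit("usage: check_assets.py <source_dir> <build_dir>")
    missing = missing_assets(sys.argv[1], sys.argv[2])
    for name in missing:
        print(f"missing from build output: {name}")
    # Exit non-zero when something is missing so a CI job could fail on it.
    sys.exit(1 if missing else 0)
```

Run against a source checkout and the generated assets directory, a non-empty result would suggest the build step dropped files, rather than the later commit step failing to pick them up.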
2026-01-15 09:12:02 <wikibugs> ('PS1) ''Muehlenhoff: Remove profile::puppet::agent::force_puppet7 from cloud roles [puppet] - ''https://gerrit.wikimedia.org/r/1227264 (https://phabricator.wikimedia.org/T365798)'
2026-01-15 09:12:08 <hashar> I am off for that appointment, I'll be back at 13:30
2026-01-15 09:13:15 <wikibugs> ('PS1) ''Kosta Harlan: WebRequest::getSecurityLogContext: Log if user is a bot [core] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1227265 (https://phabricator.wikimedia.org/T395204)'
2026-01-15 09:13:39 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by kharlan@deploy2002 using scap backport" [core] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1227265 (https://phabricator.wikimedia.org/T395204) (owner: ''Kosta Harlan)'
2026-01-15 09:13:47 <kostajh> artemkloko: which patch are you trying to deploy?
2026-01-15 09:14:01 <icinga-wm> RECOVERY - Check unit status of statograph_post on alert1002 is OK: OK: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
2026-01-15 09:16:55 <artemkloko> I am still looking into the bug, have to look into what hashar mentioned
2026-01-15 09:18:47 <wikibugs> ('PS2) ''Dzahn: microsites: monitor wikipedia25.org (WIP) [puppet] - ''https://gerrit.wikimedia.org/r/1224575'
2026-01-15 09:19:13 <wikibugs> ('PS3) ''Dzahn: microsites: monitor wikipedia25.org [puppet] - ''https://gerrit.wikimedia.org/r/1224575'
2026-01-15 09:19:25 <wikibugs> ('CR) ''Dzahn: microsites: monitor wikipedia25.org [puppet] - ''https://gerrit.wikimedia.org/r/1224575 (owner: ''Dzahn)'
2026-01-15 09:22:20 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Yubikey-SSH-FIDO access for dduvall - https://phabricator.wikimedia.org/T414619#11524174 (''JMeybohm) a:''MoritzMuehlenhoff @MoritzMuehlenhoff assigning to you so the next clinic duty person knows you're working on this with Dan, thanks'
2026-01-15 09:22:33 <wikibugs> ('PS4) ''Dzahn: microsites: monitor wikipedia25.org [puppet] - ''https://gerrit.wikimedia.org/r/1224575'
2026-01-15 09:24:11 <jinxer-wm> FIRING: KubernetesCalicoDown: ml-serve2004.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlserve&var-instance=ml-serve2004.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
2026-01-15 09:24:34 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 15 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227004 (https://phabricator.wikimedia.org/T407806) (owner: ''Clare Ming)'
2026-01-15 09:24:57 <wikibugs> ('PS1) ''Muehlenhoff: Remove profile::puppet::agent::force_puppet7 from search roles [puppet] - ''https://gerrit.wikimedia.org/r/1227270 (https://phabricator.wikimedia.org/T365798)'
2026-01-15 09:25:00 <wikibugs> ('PS13) ''Daniel Kinzler: rest gateway: add tests for chart rendering [deployment-charts] - ''https://gerrit.wikimedia.org/r/1225085'
2026-01-15 09:25:06 <wikibugs> ('CR) ''Daniel Kinzler: rest gateway: add tests for chart rendering (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/1225085 (owner: ''Daniel Kinzler)'
2026-01-15 09:26:48 <wikibugs> ('PS2) ''Arnaudb: gerrit: Switchover gerrit1003 → gerrit2003 [puppet] - ''https://gerrit.wikimedia.org/r/1217133 (https://phabricator.wikimedia.org/T338470)'
2026-01-15 09:26:49 <wikibugs> ('PS5) ''Dzahn: microsites: monitor wikipedia25.org [puppet] - ''https://gerrit.wikimedia.org/r/1224575'
2026-01-15 09:27:13 <wikibugs> ('CR) ''Arnaudb: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1217133 (https://phabricator.wikimedia.org/T338470) (owner: ''Arnaudb)'
2026-01-15 09:27:17 <wikibugs> ('Merged) ''jenkins-bot: WebRequest::getSecurityLogContext: Log if user is a bot [core] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1227265 (https://phabricator.wikimedia.org/T395204) (owner: ''Kosta Harlan)'
2026-01-15 09:27:47 <logmsgbot> !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1227265|WebRequest::getSecurityLogContext: Log if user is a bot (T395204)]]
2026-01-15 09:27:52 <stashbot> T395204: MediaWiki should log request information (IP, user agent, referrer, HTTP method, etc) in a more uniform and predictable way - https://phabricator.wikimedia.org/T395204
2026-01-15 09:28:52 <wikibugs> 'SRE, ''Infrastructure-Foundations, ''netops: Cloudcephosd: migrate to single network uplink - https://phabricator.wikimedia.org/T399180#11524188 (''fgiunchedi) ''Open''Resolved a:''fgiunchedi All hosts that are not pending decom have been migrated to single uplink, resolving.'
2026-01-15 09:29:53 <logmsgbot> !log kharlan@deploy2002 kharlan: Backport for [[gerrit:1227265|WebRequest::getSecurityLogContext: Log if user is a bot (T395204)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-01-15 09:30:12 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Data-Engineering, ''Data-Platform-SRE: Grant Access to analytics-privatedata-users for hmonroy - https://phabricator.wikimedia.org/T414375#11524193 (''JMeybohm) >>! In T414375#11523067, @HMonroy wrote: > @JMeybohm Hi! I'm trying a query wmf.mediawiki_history in superset. I'm...'
2026-01-15 09:32:47 <logmsgbot> !log kharlan@deploy2002 kharlan: Continuing with sync
2026-01-15 09:33:13 <wikibugs> ('PS4) ''Daniel Kinzler: rest gateway: implement per-policy shadow mode [deployment-charts] - ''https://gerrit.wikimedia.org/r/1225699 (https://phabricator.wikimedia.org/T413183)'
2026-01-15 09:36:51 <logmsgbot> !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1227265|WebRequest::getSecurityLogContext: Log if user is a bot (T395204)]] (duration: 09m 04s)
2026-01-15 09:36:55 <stashbot> T395204: MediaWiki should log request information (IP, user agent, referrer, HTTP method, etc) in a more uniform and predictable way - https://phabricator.wikimedia.org/T395204
2026-01-15 09:36:59 <wikibugs> ('CR) ''Dzahn: [C:''+2] microsites: monitor wikipedia25.org [puppet] - ''https://gerrit.wikimedia.org/r/1224575 (owner: ''Dzahn)'
2026-01-15 09:37:56 <wikibugs> ('PS5) ''Daniel Kinzler: rest-gateway: generate retry-after header for rate-limited requests [deployment-charts] - ''https://gerrit.wikimedia.org/r/1224937 (https://phabricator.wikimedia.org/T405636)'
2026-01-15 09:38:21 <wikibugs> ('CR) ''JMeybohm: [C:''+1] "🎉" [puppet] - ''https://gerrit.wikimedia.org/r/1227261 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-01-15 09:38:43 <wikibugs> ('CR) ''Daniel Kinzler: rest-gateway: generate retry-after header for rate-limited requests (''2 comments) [deployment-charts] - ''https://gerrit.wikimedia.org/r/1224937 (https://phabricator.wikimedia.org/T405636) (owner: ''Daniel Kinzler)'
2026-01-15 09:39:32 <wikibugs> ('PS2) ''Daniel Kinzler: rest gateway: include a meaningful body with 429 responses [deployment-charts] - ''https://gerrit.wikimedia.org/r/1226827 (https://phabricator.wikimedia.org/T405636)'
2026-01-15 09:39:41 <wikibugs> ('CR) ''Majavah: [C:''+1] Remove profile::puppet::agent::force_puppet7 from cloud roles [puppet] - ''https://gerrit.wikimedia.org/r/1227264 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-01-15 09:42:58 <wikibugs> ('PS14) ''Daniel Kinzler: charts: add redioscope chart and service [deployment-charts] - ''https://gerrit.wikimedia.org/r/1207256 (https://phabricator.wikimedia.org/T407999)'
2026-01-15 09:44:05 <wikibugs> ('CR) ''Muehlenhoff: [C:''+1] "Looks good" [puppet] - ''https://gerrit.wikimedia.org/r/1226774 (https://phabricator.wikimedia.org/T402512) (owner: ''Elukey)'
2026-01-15 09:45:55 <wikibugs> ('CR) ''Filippo Giunchedi: [C:''+1] Remove profile::puppet::agent::force_puppet7 from cloud roles [puppet] - ''https://gerrit.wikimedia.org/r/1227264 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-01-15 09:46:38 <wikibugs> ('CR) ''Muehlenhoff: "Looks good" [puppet] - ''https://gerrit.wikimedia.org/r/1226775 (https://phabricator.wikimedia.org/T402512) (owner: ''Elukey)'
2026-01-15 09:47:43 <wikibugs> ('CR) ''Elukey: [C:''+2] admin: add the analytics-sre uid and gid [puppet] - ''https://gerrit.wikimedia.org/r/1226774 (https://phabricator.wikimedia.org/T402512) (owner: ''Elukey)'
2026-01-15 09:56:54 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.hosts.reboot-single for host an-worker1200.eqiad.wmnet
2026-01-15 09:57:16 <wikibugs> 'ops-eqiad, ''SRE, ''DC-Ops, ''Data-Platform-SRE (2026.01.05 - 2026.01.23): Degraded RAID on an-worker1200 - https://phabricator.wikimedia.org/T413360#11524257 (''ops-monitoring-bot) Host an-worker1200.eqiad.wmnet rebooted by btullis@cumin1003 with reason: Rebooting to allow unmounting failed disk'
2026-01-15 09:58:32 <logmsgbot> !log dzahn@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on tcp-proxy1001.eqiad.wmnet with reason: remove nftables
2026-01-15 10:04:01 <wikibugs> ('PS1) ''D3r1ck01: Control: Handle accepted consumers with "auth-only" grants [extensions/OAuth] (wmf/1.46.0-wmf.10) - ''https://gerrit.wikimedia.org/r/1227280 (https://phabricator.wikimedia.org/T413947)'
2026-01-15 10:04:36 <wikibugs> ('PS1) ''D3r1ck01: Control: When saving grants, ensure array has no gaps [extensions/OAuth] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1227281'
2026-01-15 10:05:01 <wikibugs> ('PS1) ''D3r1ck01: Control: Keep irrevocable grants when accepting new OAuth 2 consumers [extensions/OAuth] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1227282 (https://phabricator.wikimedia.org/T413947)'
2026-01-15 10:05:28 <logmsgbot> !log dzahn@cumin2002 START - Cookbook sre.hosts.reboot-cluster
2026-01-15 10:05:29 <logmsgbot> !log dzahn@cumin2002 END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
2026-01-15 10:06:01 <logmsgbot> !log dzahn@cumin2002 START - Cookbook sre.hosts.reboot-cluster
2026-01-15 10:06:02 <logmsgbot> !log dzahn@cumin2002 END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
2026-01-15 10:07:19 <logmsgbot> !log dzahn@cumin2002 START - Cookbook sre.hosts.reboot-single for host tcp-proxy1001.eqiad.wmnet
2026-01-15 10:07:20 <wikibugs> ('Abandoned) ''D3r1ck01: Control: Handle accepted consumers with "auth-only" grants [extensions/OAuth] (wmf/1.46.0-wmf.10) - ''https://gerrit.wikimedia.org/r/1227280 (https://phabricator.wikimedia.org/T413947) (owner: ''D3r1ck01)'
2026-01-15 10:08:43 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 15 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [extensions/OAuth] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1227281 (owner: ''D3r1ck01)'
2026-01-15 10:08:57 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 15 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [extensions/OAuth] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1227282 (https://phabricator.wikimedia.org/T413947) (owner: ''D3r1ck01)'
2026-01-15 10:09:21 <wikibugs> ('CR) ''Vgutierrez: [C:''+2] cache::upload: rate-limit rather than blocking bingbot [puppet] - ''https://gerrit.wikimedia.org/r/1227202 (owner: ''Giuseppe Lavagetto)'
2026-01-15 10:10:39 <wikibugs> 'ops-ulsfo, ''SRE, ''DC-Ops, ''Infrastructure-Foundations, ''netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11524278 (''cmooney) >>! In T408892#11523618, @Papaul wrote: > Phase 1 of ULSFO migration which was changing the loopback addresses of cr1,cr4 ,mr1 and the IP...'
2026-01-15 10:11:07 <logmsgbot> !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy1001.eqiad.wmnet
2026-01-15 10:12:05 <wikibugs> ('PS2) ''Elukey: role::puppetserver: deploy kerberos keytab for analytics-sre [puppet] - ''https://gerrit.wikimedia.org/r/1226775 (https://phabricator.wikimedia.org/T402512)'
2026-01-15 10:13:24 <wikibugs> ('CR) ''Elukey: [C:''+2] role::puppetserver: deploy kerberos keytab for analytics-sre [puppet] - ''https://gerrit.wikimedia.org/r/1226775 (https://phabricator.wikimedia.org/T402512) (owner: ''Elukey)'
2026-01-15 10:14:54 <wikibugs> ('PS2) ''Elukey: WIP: profile::puppetserver::volatile: add hdfs rsync job [puppet] - ''https://gerrit.wikimedia.org/r/1226776 (https://phabricator.wikimedia.org/T402512)'
2026-01-15 10:16:57 <icinga-wm> PROBLEM - Host an-worker1200 is DOWN: PING CRITICAL - Packet loss = 100%
2026-01-15 10:19:41 <wikibugs> 'SRE, ''Kubernetes, ''ServiceOps new: Failing docker registry tests - https://phabricator.wikimedia.org/T414576#11524310 (''JMeybohm) p:''Triage''Medium The 403 vs. 401 or 404 are the result of the tests being run against a read-only (`profile::docker_registry::read_only_mode`) instance of the registry...'
2026-01-15 10:19:53 <wikibugs> 'SRE, ''Kubernetes, ''ServiceOps new: Failing docker registry httpbb tests - https://phabricator.wikimedia.org/T414576#11524313 (''JMeybohm)'
2026-01-15 10:20:24 <wikibugs> ('CR) ''Muehlenhoff: [C:''+2] Remove profile::puppet::agent::force_puppet7 from cloud roles [puppet] - ''https://gerrit.wikimedia.org/r/1227264 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-01-15 10:22:32 <wikibugs> ('CR) ''Dzahn: [C:''+2] "had to follow-up and remove the nftables package via cumin and reboot the hosts - normally we don't have this case where we move from nfta" [puppet] - ''https://gerrit.wikimedia.org/r/1215284 (https://phabricator.wikimedia.org/T408532) (owner: ''Dzahn)'
2026-01-15 10:23:03 <wikibugs> ('PS3) ''Elukey: WIP: profile::puppetserver::volatile: add hdfs rsync job [puppet] - ''https://gerrit.wikimedia.org/r/1226776 (https://phabricator.wikimedia.org/T402512)'
2026-01-15 10:23:03 <wikibugs> ('PS1) ''Elukey: role::puppetserver: add the profile to fetch the krb keytabs [puppet] - ''https://gerrit.wikimedia.org/r/1227285 (https://phabricator.wikimedia.org/T402512)'
2026-01-15 10:23:49 <wikibugs> ('CR) ''Elukey: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1226776 (https://phabricator.wikimedia.org/T402512) (owner: ''Elukey)'
2026-01-15 10:26:05 <wikibugs> ('CR) ''Muehlenhoff: [C:''+2] Remove profile::puppet::agent::force_puppet7 from serviceops roles [puppet] - ''https://gerrit.wikimedia.org/r/1227261 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-01-15 10:27:36 <wikibugs> ('PS4) ''Elukey: WIP: profile::puppetserver::volatile: add hdfs rsync job [puppet] - ''https://gerrit.wikimedia.org/r/1226776 (https://phabricator.wikimedia.org/T402512)'
2026-01-15 10:27:47 <wikibugs> ('CR) ''Elukey: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1226776 (https://phabricator.wikimedia.org/T402512) (owner: ''Elukey)'
2026-01-15 10:28:24 <wikibugs> ('CR) ''Elukey: [C:''+2] role::puppetserver: add the profile to fetch the krb keytabs [puppet] - ''https://gerrit.wikimedia.org/r/1227285 (https://phabricator.wikimedia.org/T402512) (owner: ''Elukey)'
2026-01-15 10:30:45 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1190.eqiad.wmnet with reason: Maintenance
2026-01-15 10:30:53 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1190 (T413525)', diff saved to https://phabricator.wikimedia.org/P87541 and previous config saved to /var/cache/conftool/dbconfig/20260115-103053-marostegui.json
2026-01-15 10:30:57 <stashbot> T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525
2026-01-15 10:34:40 <wikibugs> ('PS1) ''Elukey: Add fake kerberos keytabs for the Puppetserver hosts [labs/private] - ''https://gerrit.wikimedia.org/r/1227290 (https://phabricator.wikimedia.org/T402512)'
2026-01-15 10:35:01 <wikibugs> ('CR) ''Elukey: [V:''+2 C:''+2] Add fake kerberos keytabs for the Puppetserver hosts [labs/private] - ''https://gerrit.wikimedia.org/r/1227290 (https://phabricator.wikimedia.org/T402512) (owner: ''Elukey)'
2026-01-15 10:35:47 <wikibugs> ('CR) ''Elukey: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1226776 (https://phabricator.wikimedia.org/T402512) (owner: ''Elukey)'
2026-01-15 10:35:59 <wikibugs> ('PS5) ''Elukey: WIP: profile::puppetserver::volatile: add hdfs rsync job [puppet] - ''https://gerrit.wikimedia.org/r/1226776 (https://phabricator.wikimedia.org/T402512)'
2026-01-15 10:38:06 <jinxer-wm> FIRING: CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth1 (Subnet frack-fundraising-codfw in F5) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
2026-01-15 10:38:25 <jinxer-wm> FIRING: [14x] SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-01-15 10:39:11 <wikibugs> ('PS6) ''Elukey: WIP: profile::puppetserver::volatile: add hdfs rsync job [puppet] - ''https://gerrit.wikimedia.org/r/1226776 (https://phabricator.wikimedia.org/T402512)'
2026-01-15 10:39:55 <wikibugs> ('CR) ''Elukey: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1226776 (https://phabricator.wikimedia.org/T402512) (owner: ''Elukey)'
2026-01-15 10:41:49 <wikibugs> ('PS7) ''Elukey: WIP: profile::puppetserver::volatile: add hdfs rsync job [puppet] - ''https://gerrit.wikimedia.org/r/1226776 (https://phabricator.wikimedia.org/T402512)'
2026-01-15 10:42:14 <wikibugs> ('CR) ''Elukey: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1226776 (https://phabricator.wikimedia.org/T402512) (owner: ''Elukey)'
2026-01-15 10:42:21 <wikibugs> 'ops-eqiad, ''SRE, ''DC-Ops, ''Data-Platform-SRE (2026.01.05 - 2026.01.23): hw troubleshooting: PERC1 battery failure for an-worker1148 - https://phabricator.wikimedia.org/T411919#11524338 (''BTullis) The RAID controller firmware is already the latest version. {F71530261} {F71530265} I'm continuing to...'
2026-01-15 10:51:00 <logmsgbot> !log vgutierrez@cumin1003 START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp - haproxy 2.8.18 upgrade (T414318)
2026-01-15 10:51:04 <stashbot> T414318: upgrade to HAProxy 2.8.18 - https://phabricator.wikimedia.org/T414318
2026-01-15 10:51:16 <logmsgbot> !log vgutierrez@cumin1003 START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp - haproxy 2.8.18 upgrade (T414318)
2026-01-15 11:00:05 <jouncebot> Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T1100)
2026-01-15 11:03:25 <jinxer-wm> FIRING: [15x] SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-01-15 11:06:45 <wikibugs> ('Abandoned) ''Giuseppe Lavagetto: Revert "Move status, commit status/history to database" [software/hiddenparma/deploy] - ''https://gerrit.wikimedia.org/r/1226867 (owner: ''Giuseppe Lavagetto)'
2026-01-15 11:10:00 <jynus> !log force dbprov1004 restart
2026-01-15 11:10:01 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-01-15 11:11:49 <wikibugs> ('PS15) ''Daniel Kinzler: charts: add redioscope chart and service [deployment-charts] - ''https://gerrit.wikimedia.org/r/1207256 (https://phabricator.wikimedia.org/T407999)'
2026-01-15 11:11:58 <wikibugs> ('CR) ''CI reject: [V:''-1] charts: add redioscope chart and service [deployment-charts] - ''https://gerrit.wikimedia.org/r/1207256 (https://phabricator.wikimedia.org/T407999) (owner: ''Daniel Kinzler)'
2026-01-15 11:12:02 <wikibugs> ('CR) ''Daniel Kinzler: charts: add redioscope chart and service (''8 comments) [deployment-charts] - ''https://gerrit.wikimedia.org/r/1207256 (https://phabricator.wikimedia.org/T407999) (owner: ''Daniel Kinzler)'
2026-01-15 11:13:24 <wikibugs> ('PS1) ''Muehlenhoff: Remove profile::puppet::agent::force_puppet7 from IF roles [puppet] - ''https://gerrit.wikimedia.org/r/1227292 (https://phabricator.wikimedia.org/T365798)'
2026-01-15 11:15:27 <wikibugs> 'SRE, ''collaboration-services, ''Patch-For-Review, ''PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11524436 (''ATitkov) QA was successful. Some people report needed a refresh for the first visit on https://wikipedia25.org/ or https://w...'
2026-01-15 11:16:56 <logmsgbot> !log btullis@cumin1003 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-worker1200.eqiad.wmnet
2026-01-15 11:21:13 <moritzm> !log installing nginx security updates
2026-01-15 11:21:15 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-01-15 11:25:19 <wikibugs> ('CR) ''Elukey: [C:''+1] Remove profile::puppet::agent::force_puppet7 from IF roles [puppet] - ''https://gerrit.wikimedia.org/r/1227292 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-01-15 11:26:18 <wikibugs> ('PS1) ''Vgutierrez: tcpproxy: Accept connections from the internet [puppet] - ''https://gerrit.wikimedia.org/r/1227294'
2026-01-15 11:26:41 <wikibugs> ('CR) ''Vgutierrez: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1227294 (owner: ''Vgutierrez)'
2026-01-15 11:26:48 <wikibugs> ('CR) ''CI reject: [V:''-1] tcpproxy: Accept connections from the internet [puppet] - ''https://gerrit.wikimedia.org/r/1227294 (owner: ''Vgutierrez)'
2026-01-15 11:26:54 <wikibugs> ('CR) ''Muehlenhoff: [C:''+2] Remove profile::puppet::agent::force_puppet7 from IF roles [puppet] - ''https://gerrit.wikimedia.org/r/1227292 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-01-15 11:29:26 <wikibugs> ('PS2) ''Vgutierrez: tcpproxy: Accept connections from the internet [puppet] - ''https://gerrit.wikimedia.org/r/1227294'
2026-01-15 11:29:39 <logmsgbot> !log vgutierrez@cumin1003 END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp - haproxy 2.8.18 upgrade (T414318)
2026-01-15 11:29:42 <stashbot> T414318: upgrade to HAProxy 2.8.18 - https://phabricator.wikimedia.org/T414318
2026-01-15 11:29:51 <wikibugs> 'SRE, ''collaboration-services, ''Patch-For-Review, ''PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11524474 (''ATitkov) I know it might look too soon, but I want to request either scheduled re-deployments or the ability to deploy myself...'
2026-01-15 11:30:19 <wikibugs> 'ops-eqiad, ''DC-Ops: dbprov1004 lost connectivity, leading to a pause in eqiad database backups - https://phabricator.wikimedia.org/T414668 (''jcrespo) ''NEW'
2026-01-15 11:31:45 <wikibugs> ('CR) ''Vgutierrez: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1227294 (owner: ''Vgutierrez)'
2026-01-15 11:33:26 <wikibugs> ('PS1) ''Gkyziridis: ml-services: Deploy rr-multilingual model using bookworm base image. [deployment-charts] - ''https://gerrit.wikimedia.org/r/1227296 (https://phabricator.wikimedia.org/T411786)'
2026-01-15 11:33:51 <logmsgbot> !log vgutierrez@cumin1003 END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp - haproxy 2.8.18 upgrade (T414318)
2026-01-15 11:35:52 <wikibugs> ('PS2) ''Gkyziridis: ml-services: Deploy rr-multilingual model using bookworm base image. [deployment-charts] - ''https://gerrit.wikimedia.org/r/1227296 (https://phabricator.wikimedia.org/T411786)'
2026-01-15 11:37:00 <wikibugs> 'SRE, ''Observability-Metrics: Change units for "network utilization" on "host overview" dashboard to bits/sec - https://phabricator.wikimedia.org/T414670 (''cmooney) ''NEW p:''Triage''Low'
2026-01-15 11:37:21 <wikibugs> 'SRE, ''Observability-Metrics: Change units for "network utilization" on "host overview" dashboard to bits/sec - https://phabricator.wikimedia.org/T414670#11524521 (''cmooney)'
2026-01-15 11:37:52 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to L3 data access for kimpham (developer name Kim.pham) - https://phabricator.wikimedia.org/T414660#11524522 (''WMDE-leszek) I approve this request on WMDE's end. Thank you'
2026-01-15 11:39:18 <wikibugs> ('CR) ''Kevin Bazira: [C:''+1] ml-services: Deploy rr-multilingual model using bookworm base image. [deployment-charts] - ''https://gerrit.wikimedia.org/r/1227296 (https://phabricator.wikimedia.org/T411786) (owner: ''Gkyziridis)'
2026-01-15 11:40:16 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2206 (T413525)', diff saved to https://phabricator.wikimedia.org/P87542 and previous config saved to /var/cache/conftool/dbconfig/20260115-114015-marostegui.json
2026-01-15 11:40:19 <stashbot> T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525
2026-01-15 11:46:17 <wikibugs> ('CR) ''Gkyziridis: [C:''+2] ml-services: Deploy rr-multilingual model using bookworm base image. [deployment-charts] - ''https://gerrit.wikimedia.org/r/1227296 (https://phabricator.wikimedia.org/T411786) (owner: ''Gkyziridis)'
2026-01-15 11:48:06 <wikibugs> ('Merged) ''jenkins-bot: ml-services: Deploy rr-multilingual model using bookworm base image. [deployment-charts] - ''https://gerrit.wikimedia.org/r/1227296 (https://phabricator.wikimedia.org/T411786) (owner: ''Gkyziridis)'
2026-01-15 11:50:24 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P87543 and previous config saved to /var/cache/conftool/dbconfig/20260115-115023-marostegui.json
2026-01-15 11:51:03 <wikibugs> ('PS2) ''Muehlenhoff: Remove profile::puppet::agent::force_puppet7 from traffic hosts [puppet] - ''https://gerrit.wikimedia.org/r/1225524 (https://phabricator.wikimedia.org/T365798)'
2026-01-15 11:51:40 <wikibugs> 'ops-eqiad, ''DC-Ops: dbprov1004 lost connectivity, leading to a pause in eqiad database backups - https://phabricator.wikimedia.org/T414668#11524548 (''jcrespo) For context, rebooting the host didn't fix the issue.'
2026-01-15 11:52:11 <logmsgbot> !log gkyziridis@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
2026-01-15 11:52:28 <logmsgbot> !log gkyziridis@deploy2002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
2026-01-15 11:52:34 <wikibugs> ('CR) ''Muehlenhoff: "Thanks, these were already removed (hcaptcha via https://gerrit.wikimedia.org/r/c/operations/puppet/+/1227261 and the insetup role via htt" [puppet] - ''https://gerrit.wikimedia.org/r/1225524 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-01-15 12:00:32 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P87544 and previous config saved to /var/cache/conftool/dbconfig/20260115-120032-marostegui.json
2026-01-15 12:00:51 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to SRE/production access for Kim.pham (kimpham in phab) - https://phabricator.wikimedia.org/T414671 (''kimpham) ''NEW'
2026-01-15 12:02:46 <wikibugs> 'SRE, ''Infrastructure-Foundations, ''netops, ''Data-Platform-SRE (2026.01.05 - 2026.01.23): Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11524578 (''cmooney) //dse-k8s-worker1013// seems fairly happy in terms of the original problem since we made the change y...'
2026-01-15 12:10:41 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2206 (T413525)', diff saved to https://phabricator.wikimedia.org/P87545 and previous config saved to /var/cache/conftool/dbconfig/20260115-121040-marostegui.json
2026-01-15 12:10:44 <stashbot> T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525
2026-01-15 12:10:57 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
2026-01-15 12:11:05 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2210 (T413525)', diff saved to https://phabricator.wikimedia.org/P87546 and previous config saved to /var/cache/conftool/dbconfig/20260115-121105-marostegui.json
2026-01-15 12:16:36 <wikibugs> 'SRE, ''Infrastructure-Foundations, ''netops, ''Data-Platform-SRE (2026.01.05 - 2026.01.23): Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11524635 (''BTullis) >>! In T414460#11521367, @CDanis wrote: >>>! In T414460#11521085, @cmooney wrote: >> The k8s host sen...'
2026-01-15 12:22:08 <wikibugs> ('PS1) ''Muehlenhoff: conf/etcd: Remove now obsolete cert [puppet] - ''https://gerrit.wikimedia.org/r/1227307 (https://phabricator.wikimedia.org/T352245)'
2026-01-15 12:23:17 <wikibugs> ('PS1) ''Muehlenhoff: conf/etcd: Remove now obsolete cert [puppet] - ''https://gerrit.wikimedia.org/r/1227309 (https://phabricator.wikimedia.org/T352245)'
2026-01-15 12:23:43 <wikibugs> ('CR) ''Muehlenhoff: [C:''+2] wikidough: Enable Bird 2.18 for all servers [puppet] - ''https://gerrit.wikimedia.org/r/1224708 (https://phabricator.wikimedia.org/T413740) (owner: ''Muehlenhoff)'
2026-01-15 12:24:31 <wikibugs> 'SRE, ''Infrastructure-Foundations: Integrate Bookworm 12.12 point update - https://phabricator.wikimedia.org/T403852#11524649 (''MoritzMuehlenhoff)'
2026-01-15 12:26:14 <wikibugs> ('PS1) ''PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - ''https://gerrit.wikimedia.org/r/1227310'
2026-01-15 12:27:50 <wikibugs> 'SRE, ''serviceops, ''Kubernetes: Fix nginx config and caching for docker registry - https://phabricator.wikimedia.org/T256762#11524650 (''JMeybohm) ''Open''Resolved a:''JMeybohm Since there is clearly no need for optimization here, I'll resolve this now.'
2026-01-15 12:28:34 <wikibugs> ('PS1) ''JMeybohm: httpbb: Remove assertions for X-Cache-Status [puppet] - ''https://gerrit.wikimedia.org/r/1227311 (https://phabricator.wikimedia.org/T414576)'
2026-01-15 12:28:43 <ihurbain> jouncebot: nowandnext
2026-01-15 12:28:43 <jouncebot> No deployments scheduled for the next 0 hour(s) and 31 minute(s)
2026-01-15 12:28:43 <jouncebot> In 0 hour(s) and 31 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T1300)
2026-01-15 12:29:48 <ihurbain> can I deploy a config patch? (https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1226232)
2026-01-15 12:31:52 <claime> ihurbain: no objection from me. That sampling rate definition is confusing af
2026-01-15 12:33:57 <wikibugs> ('PS1) ''Muehlenhoff: Remove profile::puppet::agent::force_puppet7 from Data Platform roles [puppet] - ''https://gerrit.wikimedia.org/r/1227313 (https://phabricator.wikimedia.org/T365798)'
2026-01-15 12:34:48 <ihurbain> claime: the fact that i got confused by it is probably a good sign (but it's also how we apparently sample, and i get that, integers are good, etc)
2026-01-15 12:34:56 <ihurbain> anyway spiderpigging.
2026-01-15 12:35:42 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by ihurbain@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1226232 (https://phabricator.wikimedia.org/T412803) (owner: ''Isabelle Hurbain-Palatin)'
2026-01-15 12:36:30 <wikibugs> ('Merged) ''jenkins-bot: Turn on debugging for unsafe postproc cache entries logging [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1226232 (https://phabricator.wikimedia.org/T412803) (owner: ''Isabelle Hurbain-Palatin)'
2026-01-15 12:37:05 <logmsgbot> !log ihurbain@deploy2002 Started scap sync-world: Backport for [[gerrit:1226232|Turn on debugging for unsafe postproc cache entries logging (T412803)]]
2026-01-15 12:37:09 <stashbot> T412803: Tweak unsafe post-processing cache keys - https://phabricator.wikimedia.org/T412803
2026-01-15 12:39:14 <logmsgbot> !log ihurbain@deploy2002 ihurbain: Backport for [[gerrit:1226232|Turn on debugging for unsafe postproc cache entries logging (T412803)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-01-15 12:39:35 <icinga-wm> RECOVERY - Host an-worker1200 is UP: PING OK - Packet loss = 0%, RTA = 0.20 ms
2026-01-15 12:41:09 <wikibugs> ('PS2) ''Muehlenhoff: Remove profile::puppet::agent::force_puppet7 from Data Platform roles [puppet] - ''https://gerrit.wikimedia.org/r/1227313 (https://phabricator.wikimedia.org/T365798)'
2026-01-15 12:41:24 <logmsgbot> !log ihurbain@deploy2002 ihurbain: Continuing with sync
2026-01-15 12:45:27 <icinga-wm> RECOVERY - Host an-worker1148 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms
2026-01-15 12:45:29 <logmsgbot> !log ihurbain@deploy2002 Finished scap sync-world: Backport for [[gerrit:1226232|Turn on debugging for unsafe postproc cache entries logging (T412803)]] (duration: 08m 24s)
2026-01-15 12:45:33 <stashbot> T412803: Tweak unsafe post-processing cache keys - https://phabricator.wikimedia.org/T412803
2026-01-15 12:45:38 <ihurbain> woot.
2026-01-15 12:46:27 <ihurbain> and yay, i'm seeing my new logs!
2026-01-15 12:48:01 <wikibugs> ('CR) ''Muehlenhoff: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1227307 (https://phabricator.wikimedia.org/T352245) (owner: ''Muehlenhoff)'
2026-01-15 12:49:07 <icinga-wm> RECOVERY - Dell PowerEdge or Supermicro Broadcom RAID Controller on an-worker1200 is OK: communication: 0 OK : controller: 0 OK : physical_disk: 0 OK : virtual_disk: 0 OK : bbu: 0 OK : enclosure: 0 OK https://wikitech.wikimedia.org/wiki/PERCCli%23Monitoring
2026-01-15 12:50:37 <wikibugs> 'SRE-SLO, ''Citoid, ''VisualEditor, ''Editing-team (Tracking): Seperate SLO for requests made from Citoid Extension, possible wmf deployed extension only, vs bots etc. - https://phabricator.wikimedia.org/T345627#11524721 (''Mvolz) So we're running at around 10% error for mediawikijs requests, we're allowe...'
2026-01-15 12:51:23 <wikibugs> ('CR) ''Muehlenhoff: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1227309 (https://phabricator.wikimedia.org/T352245) (owner: ''Muehlenhoff)'
2026-01-15 12:53:42 <topranks> !log draining Arelion transit circuit on cr1-codfw in advance of adding second 10G port to bundle
2026-01-15 12:53:44 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-01-15 12:55:00 <wikibugs> ('PS4) ''Muehlenhoff: etcd: Remove the use_pki_certs flag [puppet] - ''https://gerrit.wikimedia.org/r/978615'
2026-01-15 12:55:28 <wikibugs> 'SRE-SLO, ''Citoid, ''VisualEditor, ''Editing-team (Tracking): Seperate SLO for requests made from Citoid Extension, possible wmf deployed extension only, vs bots etc. - https://phabricator.wikimedia.org/T345627#11524727 (''Mvolz) If you look for https://thanos.wikimedia.org/graph?g0.expr=sum(rate(citoid_...'
2026-01-15 12:57:11 <wikibugs> ('CR) ''Elukey: [C:''+1] httpbb: Remove assertions for X-Cache-Status [puppet] - ''https://gerrit.wikimedia.org/r/1227311 (https://phabricator.wikimedia.org/T414576) (owner: ''JMeybohm)'
2026-01-15 12:59:08 <phuedx> jouncebot: next
2026-01-15 12:59:09 <jouncebot> In 0 hour(s) and 0 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T1300)
2026-01-15 12:59:17 <phuedx> jouncebot: nowandnext
2026-01-15 12:59:17 <jouncebot> No deployments scheduled for the next 0 hour(s) and 0 minute(s)
2026-01-15 12:59:17 <jouncebot> In 0 hour(s) and 0 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T1300)
2026-01-15 13:00:05 <jouncebot> Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T1300)
2026-01-15 13:00:10 <phuedx> You win this time jouncebot
2026-01-15 13:01:36 <wikibugs> ('PS1) ''Majavah: P:toolforge: k8s: haproxy: Handle plain toolforge.org domain [puppet] - ''https://gerrit.wikimedia.org/r/1227321 (https://phabricator.wikimedia.org/T414674)'
2026-01-15 13:01:44 <wikibugs> ('CR) ''Muehlenhoff: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/978615 (owner: ''Muehlenhoff)'
2026-01-15 13:03:25 <jinxer-wm> FIRING: [15x] SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-01-15 13:05:01 <icinga-wm> PROBLEM - Check unit status of statograph_post on alert1002 is CRITICAL: CRITICAL: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
2026-01-15 13:08:10 <wikibugs> ('CR) ''JMeybohm: [C:''+2] httpbb: Remove assertions for X-Cache-Status [puppet] - ''https://gerrit.wikimedia.org/r/1227311 (https://phabricator.wikimedia.org/T414576) (owner: ''JMeybohm)'
2026-01-15 13:09:37 <wikibugs> ('PS1) ''Muehlenhoff: Remove profile::puppet::agent::force_puppet7 for Cloud VPS [puppet] - ''https://gerrit.wikimedia.org/r/1227322 (https://phabricator.wikimedia.org/T365798)'
2026-01-15 13:15:01 <icinga-wm> RECOVERY - Check unit status of statograph_post on alert1002 is OK: OK: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
2026-01-15 13:22:57 <wikibugs> ('CR) ''Elukey: "Left a nit but we are close!" [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/1146891 (https://phabricator.wikimedia.org/T385173) (owner: ''Kevin Bazira)'
2026-01-15 13:23:58 <wikibugs> 'SRE, ''Kubernetes, ''Patch-For-Review, ''ServiceOps new: Failing docker registry httpbb tests - https://phabricator.wikimedia.org/T414576#11524771 (''JMeybohm) a:''DPogorzelski-WMF The X-Cache-Status failures are gone now: ` jayme@cumin1003:~$ sudo httpbb /srv/deployment/httpbb-tests/docker-registry/te...'
2026-01-15 13:24:11 <jinxer-wm> FIRING: KubernetesCalicoDown: ml-serve2004.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlserve&var-instance=ml-serve2004.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
2026-01-15 13:25:51 <logmsgbot> !log jclark@cumin1003 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mwlog1003.eqiad.wmnet with OS bookworm
2026-01-15 13:26:01 <wikibugs> 'ops-eqiad, ''SRE, ''DC-Ops, ''SRE Observability (FY2025/2026-Q3): Q2:rack/setup/install mwlog1003 - https://phabricator.wikimedia.org/T412230#11524779 (''ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host mwlog1003.eqiad.wmnet with OS bookworm executed with erro...'
2026-01-15 13:26:17 <wikibugs> ('PS1) ''Jdrewniak: Bumping portals to master [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227330 (https://phabricator.wikimedia.org/T128546)'
2026-01-15 13:26:24 <wikibugs> ('PS1) ''Muehlenhoff: Record LDAP access for aramilferaxa [puppet] - ''https://gerrit.wikimedia.org/r/1227331'
2026-01-15 13:27:05 <logmsgbot> !log jclark@cumin1003 START - Cookbook sre.hosts.reimage for host mwlog1003.eqiad.wmnet with OS bookworm
2026-01-15 13:27:08 <wikibugs> ('CR) ''CI reject: [V:''-1] Record LDAP access for aramilferaxa [puppet] - ''https://gerrit.wikimedia.org/r/1227331 (owner: ''Muehlenhoff)'
2026-01-15 13:27:16 <wikibugs> 'ops-eqiad, ''SRE, ''DC-Ops, ''SRE Observability (FY2025/2026-Q3): Q2:rack/setup/install mwlog1003 - https://phabricator.wikimedia.org/T412230#11524785 (''ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host mwlog1003.eqiad.wmnet with OS bookworm'
2026-01-15 13:27:33 <wikibugs> ('CR) ''Filippo Giunchedi: [C:''+1] P:toolforge: k8s: haproxy: Handle plain toolforge.org domain [puppet] - ''https://gerrit.wikimedia.org/r/1227321 (https://phabricator.wikimedia.org/T414674) (owner: ''Majavah)'
2026-01-15 13:27:48 <moritzm> !log installing squid security updates
2026-01-15 13:27:50 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-01-15 13:27:57 <wikibugs> ('CR) ''Filippo Giunchedi: [C:''+1] Remove profile::puppet::agent::force_puppet7 for Cloud VPS [puppet] - ''https://gerrit.wikimedia.org/r/1227322 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-01-15 13:28:34 <wikibugs> ('CR) ''Majavah: [V:''+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7899/co" [puppet] - ''https://gerrit.wikimedia.org/r/1227321 (https://phabricator.wikimedia.org/T414674) (owner: ''Majavah)'
2026-01-15 13:29:06 <wikibugs> ('CR) ''Majavah: [V:''+1 C:''+2] P:toolforge: k8s: haproxy: Handle plain toolforge.org domain [puppet] - ''https://gerrit.wikimedia.org/r/1227321 (https://phabricator.wikimedia.org/T414674) (owner: ''Majavah)'
2026-01-15 13:29:31 <hashar> jan_drewniak: we can do https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1227330 I think
2026-01-15 13:29:35 <hashar> jouncebot: nowandnext
2026-01-15 13:29:35 <jouncebot> For the next 0 hour(s) and 30 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T1300)
2026-01-15 13:29:35 <jouncebot> In 0 hour(s) and 30 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T1400)
2026-01-15 13:30:01 <jan_drewniak> hey folks, I'm going to be deploying a portals updates now just ahead of the backport window
2026-01-15 13:30:02 <wikibugs> ('CR) ''Hashar: [C:''+1] Bumping portals to master [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227330 (https://phabricator.wikimedia.org/T128546) (owner: ''Jdrewniak)'
2026-01-15 13:31:32 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by jdrewniak@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227330 (https://phabricator.wikimedia.org/T128546) (owner: ''Jdrewniak)'
2026-01-15 13:32:40 <wikibugs> ('Merged) ''jenkins-bot: Bumping portals to master [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227330 (https://phabricator.wikimedia.org/T128546) (owner: ''Jdrewniak)'
2026-01-15 13:33:12 <logmsgbot> !log jdrewniak@deploy2002 Started scap sync-world: Backport for [[gerrit:1227330|Bumping portals to master (T128546)]]
2026-01-15 13:33:17 <stashbot> T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
2026-01-15 13:34:24 <wikibugs> ('Abandoned) ''Muehlenhoff: Record LDAP access for aramilferaxa [puppet] - ''https://gerrit.wikimedia.org/r/1227331 (owner: ''Muehlenhoff)'
2026-01-15 13:35:18 <wikibugs> ('PS1) ''Filippo Giunchedi: wmcs: remove value from CephSlowOps summary [alerts] - ''https://gerrit.wikimedia.org/r/1227334 (https://phabricator.wikimedia.org/T414669)'
2026-01-15 13:35:26 <logmsgbot> !log jdrewniak@deploy2002 jdrewniak: Backport for [[gerrit:1227330|Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-01-15 13:36:27 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to SRE/production access for Kim.pham (kimpham in phab) - https://phabricator.wikimedia.org/T414671#11524827 (''Novem_Linguae) Are you requesting `deployment` access? > backlog deployment windows Do you mean [[ https://wikitech.wikimedia.org/wiki/Backport_windo...'
2026-01-15 13:37:07 <logmsgbot> !log jdrewniak@deploy2002 jdrewniak: Continuing with sync
2026-01-15 13:37:51 <wikibugs> ('CR) ''Majavah: [C:''-1] "The number is useful to see in some form, so can it be added to the description if it can't be in the summary?" [alerts] - ''https://gerrit.wikimedia.org/r/1227334 (https://phabricator.wikimedia.org/T414669) (owner: ''Filippo Giunchedi)'
2026-01-15 13:40:23 <wikibugs> ('PS2) ''Filippo Giunchedi: wmcs: remove value from CephSlowOps summary [alerts] - ''https://gerrit.wikimedia.org/r/1227334 (https://phabricator.wikimedia.org/T414669)'
2026-01-15 13:40:35 <wikibugs> ('CR) ''Filippo Giunchedi: "Fair point, {{done}}" [alerts] - ''https://gerrit.wikimedia.org/r/1227334 (https://phabricator.wikimedia.org/T414669) (owner: ''Filippo Giunchedi)'
2026-01-15 13:41:10 <logmsgbot> !log jdrewniak@deploy2002 Finished scap sync-world: Backport for [[gerrit:1227330|Bumping portals to master (T128546)]] (duration: 07m 58s)
2026-01-15 13:41:14 <stashbot> T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
2026-01-15 13:42:02 <moritzm> !log upgrade wikidough to Bird 2.18 T413740
2026-01-15 13:42:05 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-01-15 13:42:06 <stashbot> T413740: Backport and test Bird 2.18 - https://phabricator.wikimedia.org/T413740
2026-01-15 13:42:46 <wikibugs> ('CR) ''Majavah: [C:''+1] wmcs: remove value from CephSlowOps summary [alerts] - ''https://gerrit.wikimedia.org/r/1227334 (https://phabricator.wikimedia.org/T414669) (owner: ''Filippo Giunchedi)'
2026-01-15 13:43:11 <wikibugs> ('CR) ''Muehlenhoff: [C:''+2] Remove profile::puppet::agent::force_puppet7 for Cloud VPS [puppet] - ''https://gerrit.wikimedia.org/r/1227322 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-01-15 13:43:59 <jan_drewniak> hashar: I just ran the sync through spiderpig. Now I logged into deploy2002 and run `MEDIAWIKI_STAGING_DIR=/srv/mediawiki-staging | mwscript purgeList.php`
2026-01-15 13:44:43 <wikibugs> ('PS1) ''Filippo Giunchedi: sre: remove value from MaxConntrack summary [alerts] - ''https://gerrit.wikimedia.org/r/1227335 (https://phabricator.wikimedia.org/T414669)'
2026-01-15 13:45:38 <wikibugs> ('CR) ''Filippo Giunchedi: [C:''+2] wmcs: remove value from CephSlowOps summary [alerts] - ''https://gerrit.wikimedia.org/r/1227334 (https://phabricator.wikimedia.org/T414669) (owner: ''Filippo Giunchedi)'
2026-01-15 13:47:23 <jan_drewniak> hashar: ok, deployed and purged successfully!
2026-01-15 13:47:33 <hashar> well done!
2026-01-15 13:48:04 <hashar> I have sent some changes to the docs on https://gerrit.wikimedia.org/r/q/project:wikimedia/portals+is:open+owner:hashar
2026-01-15 13:48:11 <hashar> then I don't know whether they are accurate
2026-01-15 13:57:30 <wikibugs> 'SRE, ''Infrastructure-Foundations: Integrate Bookworm 12.13 point update - https://phabricator.wikimedia.org/T414205#11524895 (''MoritzMuehlenhoff)'
2026-01-15 13:58:19 <wikibugs> 'ops-eqiad, ''SRE, ''DC-Ops: dbprov1004 lost connectivity, leading to a pause in eqiad database backups - https://phabricator.wikimedia.org/T414668#11524898 (''Jclark-ctr) a:''Jclark-ctr'
2026-01-15 13:58:45 <wikibugs> ('PS1) ''Elukey: role::puppetserver: remove kerberos config [puppet] - ''https://gerrit.wikimedia.org/r/1227338 (https://phabricator.wikimedia.org/T402512)'
2026-01-15 14:00:04 <jouncebot> Lucas_WMDE, Urbanecm, and TheresNoTime: May I have your attention please! UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T1400)
2026-01-15 14:00:05 <jouncebot> Seawolf35, JSherman, stephanebisson, and phuedx: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
2026-01-15 14:00:08 <JSherman> o/
2026-01-15 14:00:12 <phuedx> o/
2026-01-15 14:00:14 <Lucas_WMDE> o/
2026-01-15 14:00:18 <Seawolf35> o/
2026-01-15 14:00:19 <stephanebisson> o/
2026-01-15 14:00:26 <Lucas_WMDE> I can deploy!
2026-01-15 14:00:43 <Lucas_WMDE> let’s start with Seawolf35 ^^
2026-01-15 14:00:52 <Seawolf35> Ok
2026-01-15 14:01:04 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1225596 (https://phabricator.wikimedia.org/T414277) (owner: ''Seawolf35gerrit)'
2026-01-15 14:01:18 <wikibugs> ('Abandoned) ''Elukey: WIP: profile::puppetserver::volatile: add hdfs rsync job [puppet] - ''https://gerrit.wikimedia.org/r/1226776 (https://phabricator.wikimedia.org/T402512) (owner: ''Elukey)'
2026-01-15 14:01:51 <wikibugs> ('PS1) ''Cathal Mooney: Remove offload of Comcast traffic from Arelion [homer/public] - ''https://gerrit.wikimedia.org/r/1227341 (https://phabricator.wikimedia.org/T261867)'
2026-01-15 14:02:17 <wikibugs> ('Merged) ''jenkins-bot: ukwiki: Various changes to user rights. [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1225596 (https://phabricator.wikimedia.org/T414277) (owner: ''Seawolf35gerrit)'
2026-01-15 14:02:49 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1225596|ukwiki: Various changes to user rights. (T414277)]]
2026-01-15 14:02:53 <stashbot> T414277: Some changes in user group rights in ukwiki - https://phabricator.wikimedia.org/T414277
2026-01-15 14:05:00 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, seawolf35gerrit: Backport for [[gerrit:1225596|ukwiki: Various changes to user rights. (T414277)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-01-15 14:05:28 <Lucas_WMDE> Seawolf35: please test!
2026-01-15 14:05:49 <Seawolf35> I’m using the debug cookie on my phone fyi
2026-01-15 14:06:18 <Lucas_WMDE> hmm, I still see the movestable right in the autoconfirmed group I think
2026-01-15 14:06:19 <wikibugs> ('PS2) ''Cathal Mooney: Remove offload of Comcast traffic from Arelion [homer/public] - ''https://gerrit.wikimedia.org/r/1227341 (https://phabricator.wikimedia.org/T261867)'
2026-01-15 14:06:25 <icinga-wm> RECOVERY - Host dbprov1004 is UP: PING OK - Packet loss = 0%, RTA = 0.35 ms
2026-01-15 14:06:51 <wikibugs> 'ops-eqiad, ''SRE, ''DC-Ops: dbprov1004 lost connectivity, leading to a pause in eqiad database backups - https://phabricator.wikimedia.org/T414668#11524921 (''Jclark-ctr) @jcrespo Replaced Dac cable link came up.'
2026-01-15 14:07:04 <Lucas_WMDE> same for the confirmed group
2026-01-15 14:07:26 <Seawolf35> Everything else seemed to work.
2026-01-15 14:08:01 <wikibugs> ('CR) ''Ayounsi: [C:''+1] "lgtm" [homer/public] - ''https://gerrit.wikimedia.org/r/1227341 (https://phabricator.wikimedia.org/T261867) (owner: ''Cathal Mooney)'
2026-01-15 14:08:37 <wikibugs> 'ops-eqiad, ''SRE, ''DC-Ops: dbprov1004 lost connectivity, leading to a pause in eqiad database backups - https://phabricator.wikimedia.org/T414668#11524927 (''Jclark-ctr) ''Open''Resolved updated netbox cableid'
2026-01-15 14:09:01 <Lucas_WMDE> looks like the same is also true for ruwikinews, despite its 'autoconfirmed' => [ 'movestable' => false, ]
2026-01-15 14:09:04 <wikibugs> ('CR) ''Elukey: [C:''+2] role::puppetserver: remove kerberos config [puppet] - ''https://gerrit.wikimedia.org/r/1227338 (https://phabricator.wikimedia.org/T402512) (owner: ''Elukey)'
2026-01-15 14:09:22 <Lucas_WMDE> searches phabricator
2026-01-15 14:09:24 <wikibugs> ('CR) ''Muehlenhoff: [C:''+2] proton: Bump image [deployment-charts] - ''https://gerrit.wikimedia.org/r/1226218 (owner: ''Muehlenhoff)'
2026-01-15 14:09:51 <wikibugs> ('PS1) ''Elukey: Revert "Add fake kerberos keytabs for the Puppetserver hosts" [labs/private] - ''https://gerrit.wikimedia.org/r/1227342'
2026-01-15 14:09:56 <wikibugs> ('CR) ''Elukey: [V:''+2 C:''+2] Revert "Add fake kerberos keytabs for the Puppetserver hosts" [labs/private] - ''https://gerrit.wikimedia.org/r/1227342 (owner: ''Elukey)'
2026-01-15 14:10:26 <wikibugs> ('CR) ''Cathal Mooney: [C:''+2] Remove offload of Comcast traffic from Arelion [homer/public] - ''https://gerrit.wikimedia.org/r/1227341 (https://phabricator.wikimedia.org/T261867) (owner: ''Cathal Mooney)'
2026-01-15 14:11:05 <Lucas_WMDE> Seawolf35: I think let’s deploy the config change anyway, but the task should then stay open for further investigation what’s going on with this right
2026-01-15 14:11:07 <Lucas_WMDE> does that sound okay?
2026-01-15 14:11:35 <Seawolf35> Sounds good.
2026-01-15 14:11:47 <wikibugs> ('Merged) ''jenkins-bot: Remove offload of Comcast traffic from Arelion [homer/public] - ''https://gerrit.wikimedia.org/r/1227341 (https://phabricator.wikimedia.org/T261867) (owner: ''Cathal Mooney)'
2026-01-15 14:11:52 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, seawolf35gerrit: Continuing with sync
2026-01-15 14:11:56 <Seawolf35> Everything else like change tags looks good on my end
2026-01-15 14:11:57 <logmsgbot> !log jmm@deploy2002 helmfile [staging] START helmfile.d/services/proton: apply
2026-01-15 14:12:05 <Lucas_WMDE> alright, thanks
2026-01-15 14:13:18 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to analytics-privatedata-users for johannesrichterwmde - https://phabricator.wikimedia.org/T414678 (''Johannes_Richter_WMDE) ''NEW'
2026-01-15 14:13:57 <logmsgbot> !log jmm@deploy2002 helmfile [staging] DONE helmfile.d/services/proton: apply
2026-01-15 14:15:28 <Lucas_WMDE> JSherman: want to self-service once the current deployment is done?
2026-01-15 14:16:02 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1225596|ukwiki: Various changes to user rights. (T414277)]] (duration: 13m 13s)
2026-01-15 14:16:06 <stashbot> T414277: Some changes in user group rights in ukwiki - https://phabricator.wikimedia.org/T414277
2026-01-15 14:16:06 <JSherman> Lucas_WMDE: on it
2026-01-15 14:16:10 <logmsgbot> jclark@cumin1003 reimage (PID 1651082) is awaiting input
2026-01-15 14:16:10 <JSherman> sounds good
2026-01-15 14:16:15 <Lucas_WMDE> ok!
2026-01-15 14:16:19 <logmsgbot> !log jmm@deploy2002 helmfile [codfw] START helmfile.d/services/proton: apply
2026-01-15 14:17:20 <Lucas_WMDE> (my spiderpig finished, you’re good to go)
2026-01-15 14:17:24 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to analytics-privatedata-users for johannesrichterwmde - https://phabricator.wikimedia.org/T414678#11524963 (''Johannes_Richter_WMDE) By the way I noticed {T358578} – is that still common practice @Dzahn? (I'm not in the #wmf-nda group despite signing the NDA in...'
2026-01-15 14:17:31 <logmsgbot> !log jmm@deploy2002 helmfile [codfw] DONE helmfile.d/services/proton: apply
2026-01-15 14:18:00 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by jsn@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1226862 (https://phabricator.wikimedia.org/T403982) (owner: ''Jsn.sherman)'
2026-01-15 14:18:27 <logmsgbot> !log jmm@deploy2002 helmfile [eqiad] START helmfile.d/services/proton: apply
2026-01-15 14:18:37 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to analytics-privatedata-users for johannesrichterwmde - https://phabricator.wikimedia.org/T414678#11524970 (''Tobi_WMDE_SW) @Johannes_Richter_WMDE is part of the WMDE TechWish team, and I endorse this request.'
2026-01-15 14:18:43 <wikibugs> ('Merged) ''jenkins-bot: Deploy PersonalDashboard to testwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1226862 (https://phabricator.wikimedia.org/T403982) (owner: ''Jsn.sherman)'
2026-01-15 14:19:04 <logmsgbot> !log cmooney@cumin1003 START - Cookbook sre.dns.netbox
2026-01-15 14:19:08 <Lucas_WMDE> stephanebisson: do you want to do your deploy afterwards? you could probably start the gate-and-submit builds already
2026-01-15 14:19:14 <logmsgbot> !log jsn@deploy2002 Started scap sync-world: Backport for [[gerrit:1226862|Deploy PersonalDashboard to testwiki (T403982)]]
2026-01-15 14:19:18 <stashbot> T403982: Create and deploy Extension:PersonalDashboard - https://phabricator.wikimedia.org/T403982
2026-01-15 14:19:44 <logmsgbot> !log jmm@deploy2002 helmfile [eqiad] DONE helmfile.d/services/proton: apply
2026-01-15 14:20:08 <stephanebisson> Lucas_WMDE: yes I'll do them, getting started soon
2026-01-15 14:20:37 <Lucas_WMDE> ok!
2026-01-15 14:21:24 <logmsgbot> !log jsn@deploy2002 jsn: Backport for [[gerrit:1226862|Deploy PersonalDashboard to testwiki (T403982)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-01-15 14:22:06 <wikibugs> ('CR) ''CDanis: [C:''+2] tcpproxy: Accept connections from the internet [puppet] - ''https://gerrit.wikimedia.org/r/1227294 (owner: ''Vgutierrez)'
2026-01-15 14:22:22 <vgutierrez> that was a highly motivated review lol
2026-01-15 14:22:38 <logmsgbot> !log jsn@deploy2002 jsn: Continuing with sync
2026-01-15 14:22:43 <logmsgbot> !log cmooney@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update reverse dns entries for arelion link ips - cmooney@cumin1003"
2026-01-15 14:23:27 <logmsgbot> !log cmooney@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update reverse dns entries for arelion link ips - cmooney@cumin1003"
2026-01-15 14:23:27 <logmsgbot> !log cmooney@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2026-01-15 14:25:01 <wikibugs> ('CR) ''Muehlenhoff: [C:''+1] "Looks good" [alerts] - ''https://gerrit.wikimedia.org/r/1227335 (https://phabricator.wikimedia.org/T414669) (owner: ''Filippo Giunchedi)'
2026-01-15 14:25:20 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-worker1159.eqiad.wmnet
2026-01-15 14:25:31 <logmsgbot> !log btullis@cumin1003 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts an-worker1159.eqiad.wmnet
2026-01-15 14:26:26 <wikibugs> 'SRE, ''Infrastructure-Foundations, ''netops, ''Data-Platform-SRE (2026.01.05 - 2026.01.23): Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11525006 (''CDanis) >>! In T414460#11524635, @BTullis wrote: > My assumption is that this is more likely related to the ce...'
2026-01-15 14:27:46 <stephanebisson> Lucas_WMDE can I just +2 the patches manually and start the real deployment later?
2026-01-15 14:28:12 <Lucas_WMDE> stephanebisson: yes
2026-01-15 14:28:19 <wikibugs> ('CR) ''Sbisson: [C:''+2] CX3 Build 1.0.0+20260114 [extensions/ContentTranslation] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1226976 (https://phabricator.wikimedia.org/T413646) (owner: ''Sbisson)'
2026-01-15 14:28:23 <Lucas_WMDE> as long as nobody else is planning to deploy, because then they would pull in your changes ww
2026-01-15 14:28:24 <Lucas_WMDE> * ^^
2026-01-15 14:28:33 <wikibugs> ('PS8) ''CDanis: gerrit/Liberica: expand to drmrs [puppet] - ''https://gerrit.wikimedia.org/r/1215693 (https://phabricator.wikimedia.org/T411895)'
2026-01-15 14:28:42 <stephanebisson> I think I'm next in line
2026-01-15 14:28:45 <Lucas_WMDE> yeah
2026-01-15 14:28:50 <wikibugs> ('CR) ''Sbisson: [C:''+2] Fallback to source title if target title is not provided by cxserver [extensions/ContentTranslation] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1226977 (https://phabricator.wikimedia.org/T414558) (owner: ''Sbisson)'
2026-01-15 14:28:50 <JSherman> we're about 3/4 through syncing prod k8s on mine, so I think you're good to +2
2026-01-15 14:28:56 <logmsgbot> !log jsn@deploy2002 Finished scap sync-world: Backport for [[gerrit:1226862|Deploy PersonalDashboard to testwiki (T403982)]] (duration: 09m 41s)
2026-01-15 14:29:00 <stashbot> T403982: Create and deploy Extension:PersonalDashboard - https://phabricator.wikimedia.org/T403982
2026-01-15 14:29:01 <JSherman> stephanebisson: over to you
2026-01-15 14:29:03 <JSherman> finished!
2026-01-15 14:29:07 <stephanebisson> Thanks!
2026-01-15 14:29:34 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by sbisson@deploy2002 using scap backport" [extensions/ContentTranslation] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1226976 (https://phabricator.wikimedia.org/T413646) (owner: ''Sbisson)'
2026-01-15 14:29:34 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by sbisson@deploy2002 using scap backport" [extensions/ContentTranslation] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1226977 (https://phabricator.wikimedia.org/T414558) (owner: ''Sbisson)'
2026-01-15 14:30:18 <Lucas_WMDE> depending on how long that gate-and-submit will take we could’ve tried to squeeze in phuedx in between
2026-01-15 14:30:22 <wikibugs> ('CR) ''CDanis: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1215693 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 14:30:27 <Lucas_WMDE> but I don’t think it’s necessary, there should be enough time afterwards
2026-01-15 14:32:06 <A_smart_kitten> Lucas_WMDE: fwiw my gut instinct is that the movestable permissions thing might be something to do with FlaggedRevs
2026-01-15 14:33:11 <Lucas_WMDE> ah, our favorite codebase?
2026-01-15 14:33:20 <A_smart_kitten> just the one :D
2026-01-15 14:33:34 <Lucas_WMDE> when in doubt, blame FlaggedRevs
2026-01-15 14:33:49 <Seawolf35> Beyond my pay grade
2026-01-15 14:33:52 <wikibugs> ('PS9) ''CDanis: gerrit/Liberica: expand to drmrs [puppet] - ''https://gerrit.wikimedia.org/r/1215693 (https://phabricator.wikimedia.org/T411895)'
2026-01-15 14:33:54 <A_smart_kitten> maybe some subtasks of T225144 are similar
2026-01-15 14:33:54 <stashbot> T225144: Flagged Revs configuration may be broken - https://phabricator.wikimedia.org/T225144
2026-01-15 14:34:01 <wikibugs> ('CR) ''CDanis: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1215693 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 14:34:02 <Lucas_WMDE> (I found some other Phabricator tasks that sounded related, though not quite the same)
2026-01-15 14:34:21 <Lucas_WMDE> T275370
2026-01-15 14:34:22 <stashbot> T275370: Unable to move pages despite being autoconfirmed on wikis with FlaggedRevs - https://phabricator.wikimedia.org/T275370
2026-01-15 14:35:15 <A_smart_kitten> my gut instinct (untested) would be to move the FlaggedRevs user group-related config that isn't currently working out of core-Permissions.php & add it to the MediaWikiServices hook in flaggedrevs.php
2026-01-15 14:37:41 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to SRE/production access for Kim.pham (kimpham in phab) - https://phabricator.wikimedia.org/T414671#11525053 (''WMDE-leszek) I approve this request on WMDE's end, and take the responsibility for the backlog instead of backport brainfart. @kimpham should not have...'
2026-01-15 14:38:06 <jinxer-wm> FIRING: CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth1 (Subnet frack-fundraising-codfw in F5) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
2026-01-15 14:38:14 <Lucas_WMDE> A_smart_kitten: geeeez
2026-01-15 14:38:18 <wikibugs> ('Merged) ''jenkins-bot: CX3 Build 1.0.0+20260114 [extensions/ContentTranslation] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1226976 (https://phabricator.wikimedia.org/T413646) (owner: ''Sbisson)'
2026-01-15 14:38:27 <wikibugs> ('PS1) ''Jsn.sherman: Deploy PersonalDashboard to testwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227346 (https://phabricator.wikimedia.org/T403982)'
2026-01-15 14:38:30 <Lucas_WMDE> I hadn’t seen that hook before. that’s… something
2026-01-15 14:38:50 <wikibugs> ('Merged) ''jenkins-bot: Fallback to source title if target title is not provided by cxserver [extensions/ContentTranslation] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1226977 (https://phabricator.wikimedia.org/T414558) (owner: ''Sbisson)'
2026-01-15 14:39:00 <Lucas_WMDE> yeah there’s some stuff like $wgGroupPermissions['editor']['autoreview'] = false; there
2026-01-15 14:39:03 <wikibugs> ('CR) ''Dzahn: [C:''+1] tcpproxy: Accept connections from the internet [puppet] - ''https://gerrit.wikimedia.org/r/1227294 (owner: ''Vgutierrez)'
2026-01-15 14:39:11 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to SRE/production access for Kim.pham (kimpham in phab) - https://phabricator.wikimedia.org/T414671#11525059 (''WMDE-leszek)'
2026-01-15 14:39:24 <logmsgbot> !log sbisson@deploy2002 Started scap sync-world: Backport for [[gerrit:1226976|CX3 Build 1.0.0+20260114 (T413646)]], [[gerrit:1226977|Fallback to source title if target title is not provided by cxserver (T414558)]]
2026-01-15 14:39:28 <Lucas_WMDE> I’ll go make a task
2026-01-15 14:39:30 <stashbot> T413646: Content Translation: cannot select an existing target article; section translation is published to a redirect instead of the main article (target language: Russian). - https://phabricator.wikimedia.org/T413646
2026-01-15 14:39:31 <stashbot> T414558: Wikipedia Content Translation Tool displays blank page and never loads - https://phabricator.wikimedia.org/T414558
2026-01-15 14:39:39 <wikibugs> ('PS10) ''CDanis: gerrit/Liberica: expand to drmrs [puppet] - ''https://gerrit.wikimedia.org/r/1215693 (https://phabricator.wikimedia.org/T411895)'
2026-01-15 14:39:42 <wikibugs> ('CR) ''CDanis: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1215693 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 14:40:15 <wikibugs> ('CR) ''Vgutierrez: [C:''-1] "hiera files target eqsin, not drmrs" [puppet] - ''https://gerrit.wikimedia.org/r/1215693 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 14:41:32 <logmsgbot> !log sbisson@deploy2002 sbisson: Backport for [[gerrit:1226976|CX3 Build 1.0.0+20260114 (T413646)]], [[gerrit:1226977|Fallback to source title if target title is not provided by cxserver (T414558)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-01-15 14:42:27 <wikibugs> ('CR) ''Muehlenhoff: [C:''+2] hcaptcha proxy: Enable Bird 2.18 for all servers [puppet] - ''https://gerrit.wikimedia.org/r/1224709 (https://phabricator.wikimedia.org/T413740) (owner: ''Muehlenhoff)'
2026-01-15 14:43:19 <logmsgbot> !log sbisson@deploy2002 sbisson: Continuing with sync
2026-01-15 14:43:35 <JSherman> Lucas_WMDE: just noting that I forgot to add the extension load to common settings to enable personaldashboard on testwiki, making my patch a noop. I just kept it moving and created a new patch to complete the enablement. Will followup in another window.
2026-01-15 14:43:48 <wikibugs> 'ops-eqiad, ''DC-Ops: Power Supply - PS1 Status - issue on clouddb1024:9290 - https://phabricator.wikimedia.org/T414681 (''phaultfinder) ''NEW'
2026-01-15 14:44:08 <moritzm> !log installing net-snmp security updates
2026-01-15 14:44:10 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-01-15 14:44:31 <wikibugs> ('CR) ''Filippo Giunchedi: [C:''+2] sre: remove value from MaxConntrack summary [alerts] - ''https://gerrit.wikimedia.org/r/1227335 (https://phabricator.wikimedia.org/T414669) (owner: ''Filippo Giunchedi)'
2026-01-15 14:45:06 <wikibugs> ('PS11) ''CDanis: gerrit/Liberica: expand to drmrs [puppet] - ''https://gerrit.wikimedia.org/r/1215693 (https://phabricator.wikimedia.org/T411895)'
2026-01-15 14:45:06 <wikibugs> ('PS1) ''CDanis: gerrit/Liberica: eqsin [puppet] - ''https://gerrit.wikimedia.org/r/1227348 (https://phabricator.wikimedia.org/T411895)'
2026-01-15 14:45:22 <wikibugs> ('CR) ''CDanis: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1227348 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 14:45:24 <wikibugs> ('CR) ''CDanis: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1215693 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 14:46:22 <Lucas_WMDE> JSherman: ack
2026-01-15 14:47:00 <wikibugs> 'SRE, ''collaboration-services, ''Patch-For-Review, ''PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11525113 (''Dzahn) I can help with another deployment tomorrow, Friday 16, but not after that until next month. Whether deployment right...'
2026-01-15 14:47:31 <logmsgbot> !log sbisson@deploy2002 Finished scap sync-world: Backport for [[gerrit:1226976|CX3 Build 1.0.0+20260114 (T413646)]], [[gerrit:1226977|Fallback to source title if target title is not provided by cxserver (T414558)]] (duration: 08m 07s)
2026-01-15 14:47:36 <stashbot> T413646: Content Translation: cannot select an existing target article; section translation is published to a redirect instead of the main article (target language: Russian). - https://phabricator.wikimedia.org/T413646
2026-01-15 14:47:37 <stashbot> T414558: Wikipedia Content Translation Tool displays blank page and never loads - https://phabricator.wikimedia.org/T414558
2026-01-15 14:48:30 <Lucas_WMDE> phuedx: over to you, do you also want to self-service?
2026-01-15 14:49:09 <phuedx> I can self service
2026-01-15 14:49:16 <wikibugs> ('CR) ''Vgutierrez: [C:''+1] gerrit/Liberica: expand to drmrs [puppet] - ''https://gerrit.wikimedia.org/r/1215693 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 14:49:16 <Lucas_WMDE> ok, go ahead :)
2026-01-15 14:50:06 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by phuedx@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227004 (https://phabricator.wikimedia.org/T407806) (owner: ''Clare Ming)'
2026-01-15 14:50:56 <wikibugs> ('PS12) ''CDanis: gerrit/Liberica: expand to drmrs [puppet] - ''https://gerrit.wikimedia.org/r/1215693 (https://phabricator.wikimedia.org/T411895)'
2026-01-15 14:50:56 <wikibugs> ('PS2) ''CDanis: gerrit/Liberica: eqsin [puppet] - ''https://gerrit.wikimedia.org/r/1227348 (https://phabricator.wikimedia.org/T411895)'
2026-01-15 14:50:59 <wikibugs> ('Merged) ''jenkins-bot: Enable Test Kitchen on all prod wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227004 (https://phabricator.wikimedia.org/T407806) (owner: ''Clare Ming)'
2026-01-15 14:51:00 <wikibugs> ('CR) ''CDanis: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1215693 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 14:51:06 <wikibugs> ('CR) ''CDanis: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1227348 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 14:51:30 <logmsgbot> !log phuedx@deploy2002 Started scap sync-world: Backport for [[gerrit:1227004|Enable Test Kitchen on all prod wikis (T407806)]]
2026-01-15 14:51:34 <stashbot> T407806: Rename Metrics Platform Extension to Test Kitchen - https://phabricator.wikimedia.org/T407806
2026-01-15 14:51:47 <A_smart_kitten> Lucas_WMDE: aaahhhh the autoconfirmed movestable permission is *overridden* in the flaggedrevs.php MediaWikiServices hook
2026-01-15 14:51:48 <A_smart_kitten> https://gerrit.wikimedia.org/g/operations/mediawiki-config/+/19adfae2241be7a72c651d64dd318dd57f560c59/wmf-config/flaggedrevs.php#207
2026-01-15 14:51:58 <logmsgbot> !log cdanis@cumin1003 conftool action : set/pooled=yes; selector: cluster=tcp-proxy,service=gerrit
2026-01-15 14:53:52 <logmsgbot> !log phuedx@deploy2002 cjming, phuedx: Backport for [[gerrit:1227004|Enable Test Kitchen on all prod wikis (T407806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-01-15 14:53:58 <Lucas_WMDE> A_smart_kitten: created T414684
2026-01-15 14:53:58 <stashbot> T414684: FlaggedRevs-specific group rights from core-Permissions.php get overridden by flaggedrevs.php - https://phabricator.wikimedia.org/T414684
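The override discussed above (T414684) can be pictured as a load-order problem: a right set early gets replaced by a hook that runs later. This is only a minimal sketch under stated assumptions. The function names, the dictionary layout and the True/False values are hypothetical stand-ins, not the actual contents of core-Permissions.php or flaggedrevs.php.

```python
# Hypothetical model of configuration load order. All names and values
# here are illustrative assumptions, not the real wmf-config code.

def apply_core_permissions(perms):
    # Stand-in for a per-wiki group right set early in configuration
    # (the role core-Permissions.php plays in the discussion).
    perms.setdefault("autoconfirmed", {})["movestable"] = True


def flaggedrevs_hook(perms):
    # Stand-in for a hook that runs later and unconditionally
    # reassigns the same right (the role of the flaggedrevs.php hook).
    perms.setdefault("autoconfirmed", {})["movestable"] = False


def load_config(steps):
    # Apply each configuration step in order; later writes win.
    perms = {}
    for step in steps:
        step(perms)
    return perms


result = load_config([apply_core_permissions, flaggedrevs_hook])
print(result["autoconfirmed"]["movestable"])  # prints False: the later hook wins
```

In this model, whichever step runs last decides the final value, which is why moving the FlaggedRevs-specific settings into the later hook (as suggested in the channel) would let them take effect.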
2026-01-15 14:54:30 <phuedx> Looking at the test servers now
2026-01-15 14:54:52 <A_smart_kitten> ty Lucas_WMDE!
2026-01-15 14:56:07 <wikibugs> ('CR) ''Scott French: [C:''+1] conf/etcd: Remove now obsolete cert [puppet] - ''https://gerrit.wikimedia.org/r/1227307 (https://phabricator.wikimedia.org/T352245) (owner: ''Muehlenhoff)'
2026-01-15 14:56:09 <wikibugs> ('CR) ''Scott French: [C:''+1] conf/etcd: Remove now obsolete cert [puppet] - ''https://gerrit.wikimedia.org/r/1227309 (https://phabricator.wikimedia.org/T352245) (owner: ''Muehlenhoff)'
2026-01-15 14:56:42 <phuedx> Configuration is coming through OK. There aren't any instruments or experiments using TestKitchen codepaths so I'm not expecting to see anything in the console
2026-01-15 14:57:28 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1190 (T413525)', diff saved to https://phabricator.wikimedia.org/P87549 and previous config saved to /var/cache/conftool/dbconfig/20260115-145727-marostegui.json
2026-01-15 14:57:31 <stashbot> T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525
2026-01-15 14:57:50 <wikibugs> ('CR) ''Scott French: [C:''+1] "Thanks, Moritz!" [puppet] - ''https://gerrit.wikimedia.org/r/978615 (owner: ''Muehlenhoff)'
2026-01-15 14:59:08 <phuedx> The SDKs are available as expected
2026-01-15 14:59:57 <Lucas_WMDE> I’m going afk, I hope everything goes fine with the rest of the window
2026-01-15 15:00:25 <icinga-wm> RECOVERY - Host an-worker1159 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
2026-01-15 15:00:47 <icinga-wm> PROBLEM - SSH on an-worker1159 is CRITICAL: connect to address 10.64.153.4 and port 22: Connection refused https://wikitech.wikimedia.org/wiki/SSH/monitoring
2026-01-15 15:01:06 <phuedx> Continuing with sync
2026-01-15 15:01:15 <logmsgbot> !log phuedx@deploy2002 cjming, phuedx: Continuing with sync
2026-01-15 15:02:15 <wikibugs> ('CR) ''CDanis: [V:''+1 C:''+2] "https://puppet-compiler.wmflabs.org/output/1215693/5634/" [puppet] - ''https://gerrit.wikimedia.org/r/1215693 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 15:05:16 <logmsgbot> !log phuedx@deploy2002 Finished scap sync-world: Backport for [[gerrit:1227004|Enable Test Kitchen on all prod wikis (T407806)]] (duration: 13m 46s)
2026-01-15 15:05:18 <logmsgbot> !log cdanis@cumin1003 START - Cookbook sre.loadbalancer.admin config_reloading P{lvs6003.drmrs.wmnet} and A:liberica
2026-01-15 15:05:20 <stashbot> T407806: Rename Metrics Platform Extension to Test Kitchen - https://phabricator.wikimedia.org/T407806
2026-01-15 15:05:37 <logmsgbot> !log cdanis@cumin1003 END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs6003.drmrs.wmnet} and A:liberica
2026-01-15 15:06:03 <logmsgbot> !log cdanis@cumin1003 START - Cookbook sre.loadbalancer.admin config_reloading P{lvs6001.drmrs.wmnet} and A:liberica
2026-01-15 15:06:23 <logmsgbot> !log cdanis@cumin1003 END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs6001.drmrs.wmnet} and A:liberica
2026-01-15 15:06:30 <jinxer-wm> FIRING: LibericaStaleConfig: Liberica instance lvs6003 is running a stale configuration - https://wikitech.wikimedia.org/wiki/Liberica#LibericaStaleConfig - https://grafana.wikimedia.org/d/fa4de97a-7114-48c7-a91a-f56089ef554f/liberica?orgId=1&viewPanel=10&var-site=drmrs&var-instance=lvs6003 - https://alerts.wikimedia.org/?q=alertname%3DLibericaStaleConfig
2026-01-15 15:06:40 <cdanis> lol
2026-01-15 15:07:15 <taavi> hey, at least the alerting works!
2026-01-15 15:07:36 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P87551 and previous config saved to /var/cache/conftool/dbconfig/20260115-150735-marostegui.json
2026-01-15 15:09:06 <wikibugs> ('CR) ''Kevin Bazira: Add vLLM image in ML namespace (''1 comment) [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/1146891 (https://phabricator.wikimedia.org/T385173) (owner: ''Kevin Bazira)'
2026-01-15 15:09:11 <jinxer-wm> FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2026-01-15 15:11:30 <jinxer-wm> RESOLVED: LibericaStaleConfig: Liberica instance lvs6003 is running a stale configuration - https://wikitech.wikimedia.org/wiki/Liberica#LibericaStaleConfig - https://grafana.wikimedia.org/d/fa4de97a-7114-48c7-a91a-f56089ef554f/liberica?orgId=1&viewPanel=10&var-site=drmrs&var-instance=lvs6003 - https://alerts.wikimedia.org/?q=alertname%3DLibericaStaleConfig
2026-01-15 15:11:33 <wikibugs> ('PS1) ''DCausse: search: pull wme secrets out of the connections array [deployment-charts] - ''https://gerrit.wikimedia.org/r/1227351'
2026-01-15 15:12:03 <jinxer-wm> FIRING: MediaWikiEditFailures: Elevated MediaWiki edit failures (session_loss) for cluster - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures
2026-01-15 15:13:14 <wikibugs> ('CR) ''CI reject: [V:''-1] search: pull wme secrets out of the connections array [deployment-charts] - ''https://gerrit.wikimedia.org/r/1227351 (owner: ''DCausse)'
2026-01-15 15:13:24 <wikibugs> ('CR) ''Cwhite: [C:''+1] Remove profile::puppet::agent::force_puppet7 from search roles [puppet] - ''https://gerrit.wikimedia.org/r/1227270 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-01-15 15:13:59 <wikibugs> ('CR) ''Cwhite: [C:''+1] Remove profile::puppet::agent::force_puppet7 from Data Platform roles [puppet] - ''https://gerrit.wikimedia.org/r/1227313 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-01-15 15:14:56 <wikibugs> ('CR) ''Elukey: [C:''+1] "LGTM for a test" [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/1146891 (https://phabricator.wikimedia.org/T385173) (owner: ''Kevin Bazira)'
2026-01-15 15:15:01 <wikibugs> ('PS1) ''Ayounsi: Routed ganeti: move v6_prefixes to Hiera [puppet] - ''https://gerrit.wikimedia.org/r/1227352 (https://phabricator.wikimedia.org/T410314)'
2026-01-15 15:16:53 <icinga-wm> PROBLEM - Host an-worker1159 is DOWN: PING CRITICAL - Packet loss = 100%
2026-01-15 15:17:06 <wikibugs> ('CR) ''Muehlenhoff: [C:''+2] Remove profile::puppet::agent::force_puppet7 from search roles [puppet] - ''https://gerrit.wikimedia.org/r/1227270 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-01-15 15:17:40 <wikibugs> ('PS2) ''Ayounsi: Routed ganeti: move v6_prefixes to Hiera [puppet] - ''https://gerrit.wikimedia.org/r/1227352 (https://phabricator.wikimedia.org/T410314)'
2026-01-15 15:17:44 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P87552 and previous config saved to /var/cache/conftool/dbconfig/20260115-151744-marostegui.json
2026-01-15 15:17:50 <wikibugs> ('CR) ''Ayounsi: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1227352 (https://phabricator.wikimedia.org/T410314) (owner: ''Ayounsi)'
2026-01-15 15:21:50 <wikibugs> ('CR) ''Muehlenhoff: [C:''+2] Remove profile::puppet::agent::force_puppet7 from Data Platform roles [puppet] - ''https://gerrit.wikimedia.org/r/1227313 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-01-15 15:21:55 <icinga-wm> RECOVERY - Host an-worker1159 is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms
2026-01-15 15:22:27 <wikibugs> ('PS2) ''DCausse: search: pull wme secrets out of the connections array [deployment-charts] - ''https://gerrit.wikimedia.org/r/1227351'
2026-01-15 15:23:01 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1262 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87553 and previous config saved to /var/cache/conftool/dbconfig/20260115-152301-marostegui.json
2026-01-15 15:23:07 <stashbot> T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163
2026-01-15 15:23:07 <stashbot> T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164
2026-01-15 15:24:10 <wikibugs> ('CR) ''CI reject: [V:''-1] search: pull wme secrets out of the connections array [deployment-charts] - ''https://gerrit.wikimedia.org/r/1227351 (owner: ''DCausse)'
2026-01-15 15:25:59 <wikibugs> ('PS3) ''DCausse: search: pull wme secrets out of the connections array [deployment-charts] - ''https://gerrit.wikimedia.org/r/1227351'
2026-01-15 15:27:53 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1190 (T413525)', diff saved to https://phabricator.wikimedia.org/P87554 and previous config saved to /var/cache/conftool/dbconfig/20260115-152752-marostegui.json
2026-01-15 15:27:57 <stashbot> T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525
2026-01-15 15:28:09 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1199.eqiad.wmnet with reason: Maintenance
2026-01-15 15:28:16 <wikibugs> ('PS1) ''CDanis: Liberica/gerrit: 🌍‼️ 🎊 [puppet] - ''https://gerrit.wikimedia.org/r/1227356 (https://phabricator.wikimedia.org/T411895)'
2026-01-15 15:28:17 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1199 (T413525)', diff saved to https://phabricator.wikimedia.org/P87555 and previous config saved to /var/cache/conftool/dbconfig/20260115-152817-marostegui.json
2026-01-15 15:28:19 <icinga-wm> PROBLEM - Host an-worker1159 is DOWN: PING CRITICAL - Packet loss = 100%
2026-01-15 15:28:53 <wikibugs> ('CR) ''CDanis: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1227356 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 15:29:49 <icinga-wm> RECOVERY - SSH on an-worker1159 is OK: SSH OK - OpenSSH_8.4p1 Debian-5+deb11u5 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
2026-01-15 15:29:51 <icinga-wm> RECOVERY - Host an-worker1159 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
2026-01-15 15:30:05 <jouncebot> Deploy window Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T1530)
2026-01-15 15:32:51 <wikibugs> ('CR) ''Vgutierrez: [C:''+1] Liberica/gerrit: 🌍‼️ 🎊 [puppet] - ''https://gerrit.wikimedia.org/r/1227356 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 15:33:10 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1262', diff saved to https://phabricator.wikimedia.org/P87556 and previous config saved to /var/cache/conftool/dbconfig/20260115-153309-marostegui.json
2026-01-15 15:33:49 <logmsgbot> !log cmooney@cumin1003 START - Cookbook sre.dns.netbox
2026-01-15 15:34:11 <jinxer-wm> RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2026-01-15 15:35:31 <wikibugs> ('PS4) ''Ayounsi: Routed ganeti: move v6_prefixes to Hiera [puppet] - ''https://gerrit.wikimedia.org/r/1227352 (https://phabricator.wikimedia.org/T410314)'
2026-01-15 15:36:05 <wikibugs> ('CR) ''CDanis: [C:''+2] Liberica/gerrit: 🌍‼️ 🎊 [puppet] - ''https://gerrit.wikimedia.org/r/1227356 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 15:36:24 <wikibugs> ('CR) ''CDanis: [C:''+2] gerrit/Liberica: eqsin [puppet] - ''https://gerrit.wikimedia.org/r/1227348 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 15:39:42 <logmsgbot> cmooney@cumin1003 netbox (PID 1669792) is awaiting input
2026-01-15 15:41:50 <logmsgbot> !log cdanis@cumin1003 START - Cookbook sre.loadbalancer.admin config_reloading P{lvs4*} and A:liberica
2026-01-15 15:42:30 <jinxer-wm> FIRING: [6x] LibericaStaleConfig: Liberica instance lvs3008 is running a stale configuration - https://wikitech.wikimedia.org/wiki/Liberica#LibericaStaleConfig - https://alerts.wikimedia.org/?q=alertname%3DLibericaStaleConfig
2026-01-15 15:43:12 <wikibugs> ('PS1) ''Sbisson: Fallback to source title if target title is not provided by cxserver [extensions/ContentTranslation] (wmf/1.46.0-wmf.10) - ''https://gerrit.wikimedia.org/r/1227361 (https://phabricator.wikimedia.org/T414558)'
2026-01-15 15:43:14 <dancy> jouncebot nowandnext
2026-01-15 15:43:14 <jouncebot> For the next 0 hour(s) and 16 minute(s): Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T1530)
2026-01-15 15:43:14 <jouncebot> In 1 hour(s) and 16 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T1700)
2026-01-15 15:43:18 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1262', diff saved to https://phabricator.wikimedia.org/P87557 and previous config saved to /var/cache/conftool/dbconfig/20260115-154317-marostegui.json
2026-01-15 15:43:34 <logmsgbot> !log cdanis@cumin1003 END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs4*} and A:liberica
2026-01-15 15:43:36 <logmsgbot> !log dancy@deploy2002 Installing scap version "4.232.0" for 2 host(s)
2026-01-15 15:43:51 <wikibugs> ('Abandoned) ''Sbisson: Fallback to source title if target title is not provided by cxserver [extensions/ContentTranslation] (wmf/1.46.0-wmf.10) - ''https://gerrit.wikimedia.org/r/1227361 (https://phabricator.wikimedia.org/T414558) (owner: ''Sbisson)'
2026-01-15 15:43:51 <logmsgbot> !log cdanis@cumin1003 START - Cookbook sre.loadbalancer.admin config_reloading P{lvs5*} and A:liberica
2026-01-15 15:44:04 <vgutierrez> cdanis: poor high-traffic2 lvs reloading config for a NOOP ;P
2026-01-15 15:44:34 <cdanis> I love all my liberica children
2026-01-15 15:45:27 <logmsgbot> !log dancy@deploy2002 Installation of scap version "4.232.0" completed for 2 hosts
2026-01-15 15:45:46 <logmsgbot> !log cdanis@cumin1003 START - Cookbook sre.loadbalancer.admin config_reloading P{lvs3*} and A:liberica
2026-01-15 15:45:54 <logmsgbot> !log cdanis@cumin1003 END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs5*} and A:liberica
2026-01-15 15:47:30 <jinxer-wm> FIRING: [6x] LibericaStaleConfig: Liberica instance lvs3008 is running a stale configuration - https://wikitech.wikimedia.org/wiki/Liberica#LibericaStaleConfig - https://alerts.wikimedia.org/?q=alertname%3DLibericaStaleConfig
2026-01-15 15:47:41 <logmsgbot> !log cdanis@cumin1003 END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs3*} and A:liberica
2026-01-15 15:47:53 <vgutierrez> some timing issue :)
2026-01-15 15:48:16 <wikibugs> ('CR) ''Bking: [C:''+2] search: pull wme secrets out of the connections array [deployment-charts] - ''https://gerrit.wikimedia.org/r/1227351 (owner: ''DCausse)'
2026-01-15 15:50:04 <wikibugs> ('Merged) ''jenkins-bot: search: pull wme secrets out of the connections array [deployment-charts] - ''https://gerrit.wikimedia.org/r/1227351 (owner: ''DCausse)'
2026-01-15 15:51:10 <wikibugs> ('PS1) ''CDanis: LVS/gerrit: eqiad [puppet] - ''https://gerrit.wikimedia.org/r/1227363 (https://phabricator.wikimedia.org/T411895)'
2026-01-15 15:51:32 <wikibugs> ('CR) ''CDanis: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1227363 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 15:52:00 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2210 (T413525)', diff saved to https://phabricator.wikimedia.org/P87558 and previous config saved to /var/cache/conftool/dbconfig/20260115-155159-marostegui.json
2026-01-15 15:52:04 <stashbot> T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525
2026-01-15 15:52:20 <wikibugs> ('PS1) ''Trueg: blazegraph: alert on ratio of failed queries increase [alerts] - ''https://gerrit.wikimedia.org/r/1227364 (https://phabricator.wikimedia.org/T414306)'
2026-01-15 15:52:30 <jinxer-wm> RESOLVED: [6x] LibericaStaleConfig: Liberica instance lvs3008 is running a stale configuration - https://wikitech.wikimedia.org/wiki/Liberica#LibericaStaleConfig - https://alerts.wikimedia.org/?q=alertname%3DLibericaStaleConfig
2026-01-15 15:53:27 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1262 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87560 and previous config saved to /var/cache/conftool/dbconfig/20260115-155326-marostegui.json
2026-01-15 15:53:33 <stashbot> T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163
2026-01-15 15:53:34 <stashbot> T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164
2026-01-15 15:53:43 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1263.eqiad.wmnet with reason: Maintenance
2026-01-15 15:53:52 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1263 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87561 and previous config saved to /var/cache/conftool/dbconfig/20260115-155351-marostegui.json
2026-01-15 15:57:23 <wikibugs> ('CR) ''CDanis: [V:''+1] "https://puppet-compiler.wmflabs.org/output/1227363/5639/" [puppet] - ''https://gerrit.wikimedia.org/r/1227363 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 16:01:50 <wikibugs> 'SRE, ''Release-Engineering-Team, ''Scap, ''serviceops, ''Datacenter-Switchover: Add scap lock/unlock steps to sre.switchdc.mediawiki cookbook - https://phabricator.wikimedia.org/T330996#11525496 (''dancy)'
2026-01-15 16:02:08 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P87562 and previous config saved to /var/cache/conftool/dbconfig/20260115-160208-marostegui.json
2026-01-15 16:03:20 <wikibugs> 'SRE, ''Release-Engineering-Team, ''Scap, ''serviceops, ''Datacenter-Switchover: Add scap lock/unlock steps to sre.switchdc.mediawiki cookbook - https://phabricator.wikimedia.org/T330996#11525507 (''dancy) @Blake I've installed a new release of scap on the deploy servers. You can now use `scap lock --a...'
2026-01-15 16:03:49 <wikibugs> ('CR) ''Vgutierrez: [C:''+1] LVS/gerrit: eqiad [puppet] - ''https://gerrit.wikimedia.org/r/1227363 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 16:04:07 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 15 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227346 (https://phabricator.wikimedia.org/T403982) (owner: ''Jsn.sherman)'
2026-01-15 16:06:15 <logmsgbot> !log dcausse@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
2026-01-15 16:07:37 <icinga-wm> PROBLEM - Host titan1002 is DOWN: PING CRITICAL - Packet loss = 100%
2026-01-15 16:07:42 <logmsgbot> !log dcausse@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
2026-01-15 16:09:11 <jinxer-wm> FIRING: [2x] ProbeDown: Service titan1002:443 has failed probes (http_thanos_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#titan1002:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2026-01-15 16:11:31 <icinga-wm> RECOVERY - Host titan1002 is UP: PING OK - Packet loss = 0%, RTA = 10.77 ms
2026-01-15 16:12:03 <jinxer-wm> RESOLVED: MediaWikiEditFailures: Elevated MediaWiki edit failures (session_loss) for cluster - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures
2026-01-15 16:12:17 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P87563 and previous config saved to /var/cache/conftool/dbconfig/20260115-161216-marostegui.json
2026-01-15 16:14:11 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service titan1002:443 has failed probes (http_thanos_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#titan1002:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2026-01-15 16:14:11 <jinxer-wm> FIRING: JobUnavailable: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2026-01-15 16:22:25 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2210 (T413525)', diff saved to https://phabricator.wikimedia.org/P87564 and previous config saved to /var/cache/conftool/dbconfig/20260115-162224-marostegui.json
2026-01-15 16:22:29 <stashbot> T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525
2026-01-15 16:22:41 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
2026-01-15 16:22:48 <wikibugs> ('CR) ''Trueg: "To start the discussion: I think 1.0 is way too high as a threshold." [alerts] - ''https://gerrit.wikimedia.org/r/1227364 (https://phabricator.wikimedia.org/T414306) (owner: ''Trueg)'
2026-01-15 16:22:49 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2219 (T413525)', diff saved to https://phabricator.wikimedia.org/P87565 and previous config saved to /var/cache/conftool/dbconfig/20260115-162249-marostegui.json
2026-01-15 16:24:19 <wikibugs> ('CR) ''Majavah: [C:''-1] "-1 for the prometheus_nodes issue specifically, but in general I'm not a huge fan of this as it relies on the realm global and in general " [puppet] - ''https://gerrit.wikimedia.org/r/1226944 (https://phabricator.wikimedia.org/T411089) (owner: ''JHathaway)'
2026-01-15 16:24:36 <wikibugs> ('CR) ''Ssingh: [C:''+1] "Sorry, my bad." [puppet] - ''https://gerrit.wikimedia.org/r/1225524 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-01-15 16:24:41 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Data-Platform-SRE: Requesting deployment access for AKhatun - https://phabricator.wikimedia.org/T414347#11525622 (''AKhatun_WMF) I also don't have access to `ssh an-launcher1003.eqiad.wmnet`. I get a permission denied. Is this related? Are we waiting for another approval (fro...'
2026-01-15 16:27:24 <wikibugs> ('CR) ''Gmodena: "Nice!" [alerts] - ''https://gerrit.wikimedia.org/r/1227364 (https://phabricator.wikimedia.org/T414306) (owner: ''Trueg)'
2026-01-15 16:33:48 <wikibugs> ('PS1) ''Vgutierrez: varnish: Drop leading commas when X-E-E is present on Vary [puppet] - ''https://gerrit.wikimedia.org/r/1227373'
2026-01-15 16:34:24 <wikibugs> ('CR) ''CI reject: [V:''-1] varnish: Drop leading commas when X-E-E is present on Vary [puppet] - ''https://gerrit.wikimedia.org/r/1227373 (owner: ''Vgutierrez)'
2026-01-15 16:35:15 <wikibugs> ('PS2) ''Vgutierrez: varnish: Drop leading commas when X-E-E is present on Vary [puppet] - ''https://gerrit.wikimedia.org/r/1227373'
2026-01-15 16:42:20 <wikibugs> ('PS3) ''Vgutierrez: varnish: Drop leading commas when X-E-E is present on Vary [puppet] - ''https://gerrit.wikimedia.org/r/1227373'
2026-01-15 16:43:06 <wikibugs> ('CR) ''Vgutierrez: [V:''+1] "VTC is happy: # top TEST /wikimedia/varnish/text/55-vary-xee.vtc passed (3.024)" [puppet] - ''https://gerrit.wikimedia.org/r/1227373 (owner: ''Vgutierrez)'
2026-01-15 16:45:28 <hnowlan> jouncebot: nowandnext
2026-01-15 16:45:28 <jouncebot> No deployments scheduled for the next 0 hour(s) and 14 minute(s)
2026-01-15 16:45:28 <jouncebot> In 0 hour(s) and 14 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T1700)
2026-01-15 16:48:21 <wikibugs> ('CR) ''Hnowlan: [C:''+2] thumbor: reimplement SVG max size feature [deployment-charts] - ''https://gerrit.wikimedia.org/r/1226286 (https://phabricator.wikimedia.org/T411076) (owner: ''Hnowlan)'
2026-01-15 16:48:50 <wikibugs> ('CR) ''Hnowlan: thumbor: reimplement SVG max size feature [deployment-charts] - ''https://gerrit.wikimedia.org/r/1226286 (https://phabricator.wikimedia.org/T411076) (owner: ''Hnowlan)'
2026-01-15 16:51:11 <wikibugs> ('PS2) ''Hnowlan: thumbor: reimplement SVG max size feature [deployment-charts] - ''https://gerrit.wikimedia.org/r/1226286 (https://phabricator.wikimedia.org/T411076)'
2026-01-15 16:52:00 <wikibugs> ('CR) ''CDanis: [C:''+1] varnish: Drop leading commas when X-E-E is present on Vary [puppet] - ''https://gerrit.wikimedia.org/r/1227373 (owner: ''Vgutierrez)'
2026-01-15 16:52:12 <wikibugs> ('CR) ''Trueg: "thresholds are indeed way too high." [alerts] - ''https://gerrit.wikimedia.org/r/1227364 (https://phabricator.wikimedia.org/T414306) (owner: ''Trueg)'
2026-01-15 16:54:05 <wikibugs> ('CR) ''Scott French: [C:''+1] varnish: Drop leading commas when X-E-E is present on Vary [puppet] - ''https://gerrit.wikimedia.org/r/1227373 (owner: ''Vgutierrez)'
2026-01-15 16:54:14 <wikibugs> ('CR) ''Vgutierrez: [V:''+1 C:''+2] varnish: Drop leading commas when X-E-E is present on Vary [puppet] - ''https://gerrit.wikimedia.org/r/1227373 (owner: ''Vgutierrez)'
2026-01-15 16:54:20 <wikibugs> ('PS1) ''Bking: java: create openjdk-21 image [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/1227376 (https://phabricator.wikimedia.org/T414695)'
2026-01-15 16:55:42 <logmsgbot> !log cmooney@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update reverse dns entries for arelion link ips - cmooney@cumin1003"
2026-01-15 16:55:49 <logmsgbot> !log cmooney@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update reverse dns entries for arelion link ips - cmooney@cumin1003"
2026-01-15 16:55:49 <logmsgbot> !log cmooney@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2026-01-15 17:00:05 <jouncebot> jhathaway and rzl: It is that lovely time of the day again! You are hereby commanded to deploy Puppet request window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T1700).
2026-01-15 17:00:05 <jouncebot> No Gerrit patches in the queue for this window AFAICS.
2026-01-15 17:02:10 <cdanis> !log 💙cdanis@cumin1003.eqiad.wmnet ~ 🕛☕ sudo cumin 'A:lvs-eqiad' 'disable-puppet T411895'
2026-01-15 17:02:13 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-01-15 17:02:14 <stashbot> T411895: gerrit behind CDN - https://phabricator.wikimedia.org/T411895
2026-01-15 17:02:52 <wikibugs> ('CR) ''CDanis: [V:''+1 C:''+2] LVS/gerrit: eqiad [puppet] - ''https://gerrit.wikimedia.org/r/1227363 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 17:03:40 <jinxer-wm> FIRING: [14x] SystemdUnitFailed: prometheus-node-textfile-check-nft.service on tcp-proxy1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-01-15 17:04:10 <cdanis> lol
2026-01-15 17:05:03 <mutante> grmbl
2026-01-15 17:06:51 <mutante> wanted to silence/ACK it but already gone?
2026-01-15 17:08:38 <mutante> removing unit file and resetting state in a moment
2026-01-15 17:09:09 <cdanis> !log 💙cdanis@cumin1003.eqiad.wmnet ~ 🕛☕ sudo cumin A:lvs-secondary-eqiad 'systemctl restart pybal.service'
2026-01-15 17:09:10 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-01-15 17:10:35 <mutante> !log [cumin2002:~] $ sudo cumin -b 15 'tcp-proxy*' 'rm /lib/systemd/system/prometheus-node-textfile-check-nft*'
2026-01-15 17:10:37 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-01-15 17:11:07 <mutante> !log [cumin2002:~] $ sudo cumin -b 15 'tcp-proxy*' 'systemctl reset-failed'
2026-01-15 17:11:09 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-01-15 17:13:25 <jinxer-wm> RESOLVED: [14x] SystemdUnitFailed: prometheus-node-textfile-check-nft.service on tcp-proxy1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-01-15 17:17:35 <cdanis> !log 💙cdanis@cumin1003.eqiad.wmnet ~ 🕛☕ sudo cumin A:lvs-high-traffic1-eqiad 'systemctl restart pybal.service'
2026-01-15 17:17:37 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-01-15 17:22:39 <wikibugs> ('PS1) ''CDanis: LVS/gerrit: codfw [puppet] - ''https://gerrit.wikimedia.org/r/1227391 (https://phabricator.wikimedia.org/T411895)'
2026-01-15 17:24:11 <jinxer-wm> FIRING: KubernetesCalicoDown: ml-serve2004.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlserve&var-instance=ml-serve2004.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
2026-01-15 17:33:20 <wikibugs> 'ops-codfw, ''DC-Ops: wikikube-worker2346 DOA - https://phabricator.wikimedia.org/T414708 (''Jhancock.wm) ''NEW'
2026-01-15 17:34:19 <wikibugs> ('CR) ''CDanis: [C:''+2] LVS/gerrit: codfw [puppet] - ''https://gerrit.wikimedia.org/r/1227391 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 17:34:25 <jinxer-wm> FIRING: [2x] SystemdUnitFailed: prometheus-node-textfile-check-nft.service on tcp-proxy4001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-01-15 17:34:38 <cdanis> !log 💙cdanis@cumin1003.eqiad.wmnet ~ 🕧☕ sudo cumin 'A:lvs-codfw' 'disable-puppet T411895'
2026-01-15 17:34:42 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-01-15 17:34:43 <stashbot> T411895: gerrit behind CDN - https://phabricator.wikimedia.org/T411895
2026-01-15 17:36:46 <wikibugs> 'ops-codfw, ''DC-Ops: wikikube-worker2346 DOA - https://phabricator.wikimedia.org/T414708#11525824 (''Jhancock.wm)'
2026-01-15 17:37:16 <wikibugs> 'ops-codfw, ''DC-Ops: wikikube-worker2346 DOA - https://phabricator.wikimedia.org/T414708#11525825 (''Jhancock.wm)'
2026-01-15 17:37:43 <cdanis> !log 💙cdanis@cumin1003.eqiad.wmnet ~ 🕧☕ sudo cumin A:lvs-secondary-codfw 'systemctl restart pybal.service'
2026-01-15 17:37:44 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-01-15 17:39:25 <jinxer-wm> FIRING: [14x] SystemdUnitFailed: prometheus-node-textfile-check-nft.service on tcp-proxy1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-01-15 17:41:53 <cdanis> !log 💙cdanis@cumin1003.eqiad.wmnet ~ 🕧☕ sudo cumin A:lvs-high-traffic1-codfw 'systemctl restart pybal.service'
2026-01-15 17:41:56 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-01-15 17:44:59 <cdanis> !log 💙cdanis@cumin1003.eqiad.wmnet ~ 🕧☕ sudo cumin 'A:lvs-codfw or A:lvs-eqiad' 'enable-puppet T411895'
2026-01-15 17:45:03 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-01-15 17:45:04 <stashbot> T411895: gerrit behind CDN - https://phabricator.wikimedia.org/T411895
2026-01-15 17:56:13 <wikibugs> ('PS1) ''Milimetric: eventgate-analytics: increase instances to 30 [deployment-charts] - ''https://gerrit.wikimedia.org/r/1227392 (https://phabricator.wikimedia.org/T411454)'
2026-01-15 18:00:05 <jouncebot> bd808: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T1800).
2026-01-15 18:00:05 <jouncebot> Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T1800)
2026-01-15 18:00:23 <wikibugs> ('PS1) ''CDanis: tunnelencabulator: Gerrit/CDN 🚀 [debs/wmf-laptop] - ''https://gerrit.wikimedia.org/r/1227395 (https://phabricator.wikimedia.org/T411895)'
2026-01-15 18:05:56 <wikibugs> ('PS2) ''Seawolf35gerrit: ukwiki: Add "changetags" to sysop user group. [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227394'
2026-01-15 18:06:00 <wikibugs> ('CR) ''Ssingh: [C:''+1] "Strictly basing it on the additions to the existing code and modification for gerrit-cdn. I have not tested it so leave it to you :)" [debs/wmf-laptop] - ''https://gerrit.wikimedia.org/r/1227395 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 18:06:25 <jinxer-wm> FIRING: SystemdUnitFailed: dump_proxy_ranges.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-01-15 18:06:25 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 15 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227394 (owner: ''Seawolf35gerrit)'
2026-01-15 18:07:13 <bd808> Nothing to deploy in my window today
2026-01-15 18:07:46 <wikibugs> ('PS3) ''Seawolf35gerrit: ukwiki: Add "changetags" to sysop user group. [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227394 (https://phabricator.wikimedia.org/T414277)'
2026-01-15 18:14:01 <wikibugs> ('CR) ''Btullis: java: create openjdk-21 image (''1 comment) [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/1227376 (https://phabricator.wikimedia.org/T414695) (owner: ''Bking)'
2026-01-15 18:23:11 <wikibugs> ('CR) ''Bking: java: create openjdk-21 image (''1 comment) [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/1227376 (https://phabricator.wikimedia.org/T414695) (owner: ''Bking)'
2026-01-15 18:35:38 <wikibugs> ('PS1) ''Bking: opensearch-ipoid: Add codfw to list of sites [puppet] - ''https://gerrit.wikimedia.org/r/1227406 (https://phabricator.wikimedia.org/T412447)'
2026-01-15 18:38:06 <jinxer-wm> FIRING: CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth1 (Subnet frack-fundraising-codfw in F5) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
2026-01-15 18:41:48 <wikibugs> ('CR) ''Bking: [C:''+2] opensearch-ipoid: Add codfw to list of sites [puppet] - ''https://gerrit.wikimedia.org/r/1227406 (https://phabricator.wikimedia.org/T412447) (owner: ''Bking)'
2026-01-15 18:44:41 <logmsgbot> !log pt1979@cumin2002 START - Cookbook sre.dns.netbox
2026-01-15 18:45:35 <wikibugs> ('CR) ''Muehlenhoff: [C:''+1] "Looks good and verified out of band" [puppet] - ''https://gerrit.wikimedia.org/r/1226922 (https://phabricator.wikimedia.org/T414619) (owner: ''Dduvall)'
2026-01-15 18:45:50 <wikibugs> ('CR) ''Muehlenhoff: [C:''+2] admin: Add new yubikey-ssh-fido keys for dduvall [puppet] - ''https://gerrit.wikimedia.org/r/1226922 (https://phabricator.wikimedia.org/T414619) (owner: ''Dduvall)'
2026-01-15 18:46:56 <logmsgbot> !log pt1979@cumin2002 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
2026-01-15 18:47:33 <wikibugs> ('PS2) ''CDanis: tunnelencabulator: Gerrit/CDN 🚀 [debs/wmf-laptop] - ''https://gerrit.wikimedia.org/r/1227395 (https://phabricator.wikimedia.org/T411895)'
2026-01-15 18:49:16 <wikibugs> ('CR) ''Ssingh: [C:''+1] "Yes, fair enough :) [PS2-PS1 diff]" [debs/wmf-laptop] - ''https://gerrit.wikimedia.org/r/1227395 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 18:49:48 <wikibugs> ('PS3) ''CDanis: tunnelencabulator: Gerrit/CDN 🚀 [debs/wmf-laptop] - ''https://gerrit.wikimedia.org/r/1227395 (https://phabricator.wikimedia.org/T411895)'
2026-01-15 18:50:03 <logmsgbot> !log pt1979@cumin2002 START - Cookbook sre.dns.netbox
2026-01-15 18:53:44 <logmsgbot> !log pt1979@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new wikikube-worker nodes - pt1979@cumin2002"
2026-01-15 18:53:44 <wikibugs> ('CR) ''Ssingh: [C:''+1] tunnelencabulator: Gerrit/CDN 🚀 [debs/wmf-laptop] - ''https://gerrit.wikimedia.org/r/1227395 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 18:53:49 <logmsgbot> !log pt1979@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new wikikube-worker nodes - pt1979@cumin2002"
2026-01-15 18:53:49 <logmsgbot> !log pt1979@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2026-01-15 18:55:48 <wikibugs> ('CR) ''CDanis: [V:''+2 C:''+2] tunnelencabulator: Gerrit/CDN 🚀 [debs/wmf-laptop] - ''https://gerrit.wikimedia.org/r/1227395 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 18:58:52 <wikibugs> ('CR) ''Ssingh: "@vgutierrez@wikimedia.org: We discussed this during the meeting and decided it was fine to merge. Can you stamp this please?" [puppet] - ''https://gerrit.wikimedia.org/r/1218817 (https://phabricator.wikimedia.org/T412863) (owner: ''Milimetric)'
2026-01-15 18:59:58 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Data-Engineering, ''Patch-For-Review: Requesting access to analytics-privatedata-users for kareid - https://phabricator.wikimedia.org/T413364#11526053 (''thcipriani) >>! In T413364#11521115, @JMeybohm wrote: > @thcipriani this needs sign-off from you as the approver for the...'
2026-01-15 19:00:05 <jouncebot> jeena and dduvall: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for MediaWiki train - Utc-7 Version . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T1900).
2026-01-15 19:01:51 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Data-Platform-SRE: Requesting deployment access for AKhatun - https://phabricator.wikimedia.org/T414347#11526060 (''thcipriani) >>! In T414347#11512705, @BTullis wrote: > We will need approval from @Ahoelzl as your manager and from @thcipriani as the approver for the `deployme...'
2026-01-15 19:04:22 <wikibugs> ('PS1) ''TrainBranchBot: group2 to 1.46.0-wmf.11 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227410 (https://phabricator.wikimedia.org/T413802)'
2026-01-15 19:04:25 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Initiated by jhuneidi@deploy2002" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227410 (https://phabricator.wikimedia.org/T413802) (owner: ''TrainBranchBot)'
2026-01-15 19:05:17 <wikibugs> 'ops-codfw, ''SRE, ''Data-Persistence, ''DC-Ops: Q3:rack/setup/install ms-backup200[34] - https://phabricator.wikimedia.org/T414717 (''RobH) ''NEW'
2026-01-15 19:05:21 <wikibugs> ('Merged) ''jenkins-bot: group2 to 1.46.0-wmf.11 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227410 (https://phabricator.wikimedia.org/T413802) (owner: ''TrainBranchBot)'
2026-01-15 19:05:22 <wikibugs> 'ops-eqiad, ''SRE, ''Data-Persistence, ''DC-Ops: Q3:rack/setup/install ms-backup100[34] - https://phabricator.wikimedia.org/T414718 (''RobH) ''NEW'
2026-01-15 19:05:42 <wikibugs> 'ops-codfw, ''SRE, ''Data-Persistence, ''DC-Ops: Q3:rack/setup/install ms-backup200[34] - https://phabricator.wikimedia.org/T414717#11526094 (''RobH)'
2026-01-15 19:08:39 <wikibugs> 'ops-eqiad, ''SRE, ''Data-Persistence, ''DC-Ops: Q3:rack/setup/install ms-backup100[34] - https://phabricator.wikimedia.org/T414718#11526107 (''RobH)'
2026-01-15 19:09:02 <wikibugs> 'ops-codfw, ''SRE, ''Data-Persistence, ''DC-Ops: Q3:rack/setup/install ms-backup200[34] - https://phabricator.wikimedia.org/T414717#11526108 (''RobH) a:''jcrespo Jaime, I made assumptions on the racking details based on the existing ms-backup hosts. Please double-check the racking details in this task...'
2026-01-15 19:09:05 <wikibugs> 'ops-eqiad, ''SRE, ''Data-Persistence, ''DC-Ops: Q3:rack/setup/install ms-backup100[34] - https://phabricator.wikimedia.org/T414718#11526112 (''RobH) a:''jcrespo Jaime, I made assumptions on the racking details based on the existing ms-backup hosts. Please double-check the racking details in this task...'
2026-01-15 19:09:36 <wikibugs> 'ops-codfw, ''SRE, ''Data-Persistence, ''DC-Ops: Q3:rack/setup/install ms-backup200[34] - https://phabricator.wikimedia.org/T414717#11526116 (''RobH)'
2026-01-15 19:09:37 <wikibugs> 'ops-codfw, ''SRE, ''Data-Persistence, ''DC-Ops: Q3:rack/setup/install ms-backup200[34] - https://phabricator.wikimedia.org/T414717#11526117 (''jcrespo) Will do.'
2026-01-15 19:09:44 <wikibugs> 'ops-eqiad, ''SRE, ''Data-Persistence, ''DC-Ops: Q3:rack/setup/install ms-backup100[34] - https://phabricator.wikimedia.org/T414718#11526118 (''RobH)'
2026-01-15 19:11:27 <logmsgbot> !log jhuneidi@deploy2002 rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.11 refs T413802
2026-01-15 19:11:32 <stashbot> T413802: 1.46.0-wmf.11 deployment blockers - https://phabricator.wikimedia.org/T413802
2026-01-15 19:27:59 <wikibugs> ('PS3) ''A smart kitten: ukwiki: Move assignments of FlaggedRevs permissions to flaggedrevs.php [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227385 (https://phabricator.wikimedia.org/T414277)'
2026-01-15 19:28:54 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 15 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227385 (https://phabricator.wikimedia.org/T414277) (owner: ''A smart kitten)'
2026-01-15 19:29:12 <wikibugs> ('CR) ''A smart kitten: "Did some testing locally, this approach seems like it should (hopefully) work :)" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227385 (https://phabricator.wikimedia.org/T414277) (owner: ''A smart kitten)'
2026-01-15 19:30:08 <wikibugs> ('CR) ''A smart kitten: ukwiki: Various changes to user rights. (''1 comment) [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1225596 (https://phabricator.wikimedia.org/T414277) (owner: ''Seawolf35gerrit)'
2026-01-15 19:30:32 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
2026-01-15 19:30:41 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2145 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87566 and previous config saved to /var/cache/conftool/dbconfig/20260115-193040-marostegui.json
2026-01-15 19:30:47 <stashbot> T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163
2026-01-15 19:30:47 <stashbot> T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164
2026-01-15 19:31:04 <wikibugs> 'SRE, ''DNS, ''serviceops, ''Traffic, and 2 others: Set up DNS for abstract.wikipedia.org to be recognised - https://phabricator.wikimedia.org/T411724#11526184 (''ssingh) This is typically done as part of a new wiki creation process, but Traffic is happy to help as required.'
2026-01-15 19:43:36 <wikibugs> ('CR) ''Seawolf35gerrit: ukwiki: Various changes to user rights. (''1 comment) [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1225596 (https://phabricator.wikimedia.org/T414277) (owner: ''Seawolf35gerrit)'
2026-01-15 19:48:52 <wikibugs> 'ops-codfw, ''SRE, ''Data-Persistence-Backup, ''DC-Ops: Q3:rack/setup/install backup2015 - https://phabricator.wikimedia.org/T414724 (''RobH) ''NEW'
2026-01-15 19:49:20 <wikibugs> 'ops-codfw, ''SRE, ''Data-Persistence-Backup, ''DC-Ops: Q3:rack/setup/install backup2015 - https://phabricator.wikimedia.org/T414724#11526283 (''RobH)'
2026-01-15 19:49:31 <logmsgbot> !log jasmine@cumin1003 START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2003-2004,2007-2010,2019-2032,2040,2043,2045,2048].codfw.wmnet
2026-01-15 19:50:21 <logmsgbot> !log jasmine@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2003-2004,2007-2010,2019-2032,2040,2043,2045,2048].codfw.wmnet
2026-01-15 19:51:10 <wikibugs> 'ops-codfw, ''SRE, ''Data-Persistence-Backup, ''DC-Ops: Q3:rack/setup/install backup2015 - https://phabricator.wikimedia.org/T414724#11526289 (''RobH) a:''jcrespo Jaime, I had to split up the expansion and refresh budget lines for backup this quarter, so this racking task (and its parent order task) on...'
2026-01-15 19:51:14 <wikibugs> ('CR) ''Jasmine: [C:''+2] wikikube: decommission worker[2003-2004,2007-2010,2019-2032,2040,2043,2045,2048] [puppet] - ''https://gerrit.wikimedia.org/r/1205225 (https://phabricator.wikimedia.org/T409102) (owner: ''Jasmine)'
2026-01-15 19:51:54 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1199 (T413525)', diff saved to https://phabricator.wikimedia.org/P87567 and previous config saved to /var/cache/conftool/dbconfig/20260115-195153-marostegui.json
2026-01-15 19:51:59 <stashbot> T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525
2026-01-15 19:52:30 <wikibugs> 'ops-codfw, ''SRE, ''Data-Persistence, ''DC-Ops: Q3:rack/setup/install backup2015 - https://phabricator.wikimedia.org/T414724#11526301 (''RobH)'
2026-01-15 19:52:51 <wikibugs> 'ops-eqiad, ''SRE, ''Data-Persistence, ''DC-Ops: Q#:rack/setup/install X - https://phabricator.wikimedia.org/T414725 (''RobH) ''NEW'
2026-01-15 19:53:21 <wikibugs> 'ops-eqiad, ''SRE, ''Data-Persistence, ''DC-Ops: Q#:rack/setup/install X - https://phabricator.wikimedia.org/T414725#11526321 (''RobH) a:''jcrespo Jaime, I had to split up the expansion and refresh budget lines for backup this quarter, so this racking task (and its parent order task) only covers the li...'
2026-01-15 19:53:53 <wikibugs> 'ops-eqiad, ''SRE, ''Data-Persistence, ''DC-Ops: Q#:rack/setup/install X - https://phabricator.wikimedia.org/T414725#11526331 (''RobH)'
2026-01-15 19:54:01 <jasmine_> !log “homer run T409102”
2026-01-15 19:54:05 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-01-15 19:54:05 <stashbot> T409102: decommission wikikube-worker[2003-2004,2007-2010,2019-2032,2040,2043,2045,2048].codfw.wmnet - https://phabricator.wikimedia.org/T409102
2026-01-15 19:56:15 <wikibugs> 'SRE, ''DNS, ''serviceops, ''Traffic, and 2 others: Set up DNS for abstract.wikipedia.org to be recognised - https://phabricator.wikimedia.org/T411724#11526339 (''Jdforrester-WMF) >>! In T411724#11526184, @ssingh wrote: > This is typically done as part of a new wiki creation process, but Traffic is happy...'
2026-01-15 20:02:03 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P87568 and previous config saved to /var/cache/conftool/dbconfig/20260115-200202-marostegui.json
2026-01-15 20:02:52 <wikibugs> ('PS1) ''CDanis: services: gerrit* --> monitoring_setup [puppet] - ''https://gerrit.wikimedia.org/r/1227423 (https://phabricator.wikimedia.org/T411895)'
2026-01-15 20:03:05 <wikibugs> ('CR) ''CDanis: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1227423 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 20:05:17 <wikibugs> 'ops-codfw, ''DC-Ops, ''decommission-hardware, ''serviceops, ''Patch-For-Review: decommission wikikube-worker[2003-2004,2007-2010,2019-2032,2040,2043,2045,2048].codfw.wmnet - https://phabricator.wikimedia.org/T409102#11526351 (''jasmine_)'
2026-01-15 20:07:22 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2219 (T413525)', diff saved to https://phabricator.wikimedia.org/P87569 and previous config saved to /var/cache/conftool/dbconfig/20260115-200721-marostegui.json
2026-01-15 20:07:26 <stashbot> T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525
2026-01-15 20:12:11 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P87570 and previous config saved to /var/cache/conftool/dbconfig/20260115-201210-marostegui.json
2026-01-15 20:12:53 <jinxer-wm> FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
2026-01-15 20:14:11 <jinxer-wm> FIRING: JobUnavailable: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2026-01-15 20:17:30 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P87571 and previous config saved to /var/cache/conftool/dbconfig/20260115-201730-marostegui.json
2026-01-15 20:19:05 <icinga-wm> PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
2026-01-15 20:19:55 <icinga-wm> RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 04 Apr 2026 07:22:16 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
2026-01-15 20:19:57 <wikibugs> ('PS4) ''A smart kitten: ukwiki: Move assignments of FlaggedRevs permissions to flaggedrevs.php [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227385 (https://phabricator.wikimedia.org/T414277)'
2026-01-15 20:20:56 <wikibugs> ('CR) ''A smart kitten: "PS4 is a rebase on top of https://gerrit.wikimedia.org/r/1227394, after I realised the two patches would probably have merge conflicts wit" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227385 (https://phabricator.wikimedia.org/T414277) (owner: ''A smart kitten)'
2026-01-15 20:22:19 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1199 (T413525)', diff saved to https://phabricator.wikimedia.org/P87572 and previous config saved to /var/cache/conftool/dbconfig/20260115-202218-marostegui.json
2026-01-15 20:22:23 <stashbot> T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525
2026-01-15 20:22:36 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
2026-01-15 20:22:53 <jinxer-wm> RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
2026-01-15 20:22:57 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
2026-01-15 20:23:06 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1221 (T413525)', diff saved to https://phabricator.wikimedia.org/P87573 and previous config saved to /var/cache/conftool/dbconfig/20260115-202305-marostegui.json
2026-01-15 20:27:38 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P87574 and previous config saved to /var/cache/conftool/dbconfig/20260115-202738-marostegui.json
2026-01-15 20:31:35 <wikibugs> 'ops-codfw, ''SRE, ''Data-Persistence, ''DC-Ops: Q3:rack/setup/install backup20[16-20] - https://phabricator.wikimedia.org/T414727 (''RobH) ''NEW'
2026-01-15 20:31:52 <wikibugs> 'ops-codfw, ''SRE, ''Data-Persistence, ''DC-Ops: Q3:rack/setup/install backup20[16-20] - https://phabricator.wikimedia.org/T414727#11526457 (''RobH)'
2026-01-15 20:33:07 <wikibugs> 'ops-eqiad, ''SRE, ''Data-Persistence, ''DC-Ops: Q3:rack/setup/install backup10[16-20] - https://phabricator.wikimedia.org/T414728 (''RobH) ''NEW'
2026-01-15 20:33:28 <wikibugs> 'ops-eqiad, ''SRE, ''Data-Persistence, ''DC-Ops: Q3:rack/setup/install backup10[16-20] - https://phabricator.wikimedia.org/T414728#11526474 (''RobH)'
2026-01-15 20:35:46 <wikibugs> 'ops-eqiad, ''SRE, ''Data-Persistence, ''DC-Ops: Q3:rack/setup/install backup1015 - https://phabricator.wikimedia.org/T414725#11526489 (''RobH)'
2026-01-15 20:36:35 <wikibugs> 'ops-codfw, ''SRE, ''Data-Persistence, ''DC-Ops: Q3:rack/setup/install backup20[16-20] - https://phabricator.wikimedia.org/T414727#11526492 (''RobH) a:''jcrespo Jaime, I had to split up the expansion and refresh budget lines for backup this quarter, so this racking task (and its parent order task) only...'
2026-01-15 20:36:39 <wikibugs> 'ops-eqiad, ''SRE, ''Data-Persistence, ''DC-Ops: Q3:rack/setup/install backup10[16-20] - https://phabricator.wikimedia.org/T414728#11526498 (''RobH) a:''jcrespo Jaime, I had to split up the expansion and refresh budget lines for backup this quarter, so this racking task (and its parent order task) only...'
2026-01-15 20:37:47 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2219 (T413525)', diff saved to https://phabricator.wikimedia.org/P87575 and previous config saved to /var/cache/conftool/dbconfig/20260115-203746-marostegui.json
2026-01-15 20:37:51 <stashbot> T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525
2026-01-15 20:37:52 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance
2026-01-15 20:38:00 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2236 (T413525)', diff saved to https://phabricator.wikimedia.org/P87576 and previous config saved to /var/cache/conftool/dbconfig/20260115-203759-marostegui.json
2026-01-15 20:39:07 <wikibugs> ('PS1) ''Jasmine: wikikube: decommission wikikube-worker[2052-2054,2063,2079-2084,2096-2101].codfw.wmnet [puppet] - ''https://gerrit.wikimedia.org/r/1227431 (https://phabricator.wikimedia.org/T409103)'
2026-01-15 20:39:38 <wikibugs> ('CR) ''CI reject: [V:''-1] wikikube: decommission wikikube-worker[2052-2054,2063,2079-2084,2096-2101].codfw.wmnet [puppet] - ''https://gerrit.wikimedia.org/r/1227431 (https://phabricator.wikimedia.org/T409103) (owner: ''Jasmine)'
2026-01-15 20:41:10 <wikibugs> ('PS2) ''Jasmine: wikikube: decommission worker[2052-2054,2063,2079-2084,2096-2101].codfw.wmnet [puppet] - ''https://gerrit.wikimedia.org/r/1227431 (https://phabricator.wikimedia.org/T409103)'
2026-01-15 20:44:47 <wikibugs> ('CR) ''Andrew Bogott: [C:''+2] Revert "wmcs cinder backups: move all backups to 2003 so 2004 can be reimaged" [puppet] - ''https://gerrit.wikimedia.org/r/1226952 (owner: ''Andrew Bogott)'
2026-01-15 20:59:10 <wikibugs> ('PS1) ''Clare Ming: Update experiment code for JS, PHP SDKs testing of TK [extensions/TestKitchen] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1227435 (https://phabricator.wikimedia.org/T414528)'
2026-01-15 20:59:30 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 15 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [extensions/TestKitchen] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1227435 (https://phabricator.wikimedia.org/T414528) (owner: ''Clare Ming)'
2026-01-15 21:00:05 <jouncebot> RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for UTC late backport window . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T2100).
2026-01-15 21:00:05 <jouncebot> xSavitar, katherine_g, Seawolf35, A_smart_kitten, and cjming: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
2026-01-15 21:00:16 <A_smart_kitten> heya, i'm here :)
2026-01-15 21:00:17 <katherine_g> o/
2026-01-15 21:00:20 <xSavitar> o/
2026-01-15 21:00:29 <Seawolf35> o/
2026-01-15 21:01:05 <xSavitar> I can self-service my backports then deployers/others can carry one 🙏🏽
2026-01-15 21:01:21 <xSavitar> *on
2026-01-15 21:01:23 <jeena> I can help with backporting if needed
2026-01-15 21:01:29 <A_smart_kitten> I will need a deployer
2026-01-15 21:01:36 <Seawolf35> Me as well
2026-01-15 21:01:39 <xSavitar> jeena, I'll poke you once I'm done.
2026-01-15 21:01:58 <jeena> xSavitar: 👍 Thank you!
2026-01-15 21:03:12 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by derick@deploy2002 using scap backport" [extensions/OAuth] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1227281 (owner: ''D3r1ck01)'
2026-01-15 21:03:12 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by derick@deploy2002 using scap backport" [extensions/OAuth] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1227282 (https://phabricator.wikimedia.org/T413947) (owner: ''D3r1ck01)'
2026-01-15 21:04:01 <cjming> also happy to deploy if needed - will self-service when it's my turn
2026-01-15 21:05:25 <wikibugs> ('CR) ''Gmodena: blazegraph: alert on ratio of failed queries increase (''1 comment) [alerts] - ''https://gerrit.wikimedia.org/r/1227364 (https://phabricator.wikimedia.org/T414306) (owner: ''Trueg)'
2026-01-15 21:06:18 <wikibugs> ('Merged) ''jenkins-bot: Control: When saving grants, ensure array has no gaps [extensions/OAuth] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1227281 (owner: ''D3r1ck01)'
2026-01-15 21:06:18 <wikibugs> ('Merged) ''jenkins-bot: Control: Keep irrevocable grants when accepting new OAuth 2 consumers [extensions/OAuth] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1227282 (https://phabricator.wikimedia.org/T413947) (owner: ''D3r1ck01)'
2026-01-15 21:06:39 <logmsgbot> !log derick@deploy2002 Started scap sync-world: Backport for [[gerrit:1227281|Control: When saving grants, ensure array has no gaps]], [[gerrit:1227282|Control: Keep irrevocable grants when accepting new OAuth 2 consumers (T413947)]]
2026-01-15 21:06:43 <stashbot> T413947: Updating grants (via Special:OAuthManageMyGrants) of OAuth accepted consumers overrides its grants with empty array - https://phabricator.wikimedia.org/T413947
2026-01-15 21:08:36 <logmsgbot> !log derick@deploy2002 derick, d3r1ck01: Backport for [[gerrit:1227281|Control: When saving grants, ensure array has no gaps]], [[gerrit:1227282|Control: Keep irrevocable grants when accepting new OAuth 2 consumers (T413947)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-01-15 21:09:07 <xSavitar> testing...
2026-01-15 21:09:54 <jinxer-wm> FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
2026-01-15 21:11:58 <xSavitar> Things look good and issues seem to have been resolved. Syncing
2026-01-15 21:12:09 <logmsgbot> !log derick@deploy2002 derick, d3r1ck01: Continuing with sync
2026-01-15 21:14:53 <jinxer-wm> RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
2026-01-15 21:16:20 <logmsgbot> !log derick@deploy2002 Finished scap sync-world: Backport for [[gerrit:1227281|Control: When saving grants, ensure array has no gaps]], [[gerrit:1227282|Control: Keep irrevocable grants when accepting new OAuth 2 consumers (T413947)]] (duration: 09m 41s)
2026-01-15 21:16:24 <stashbot> T413947: Updating grants (via Special:OAuthManageMyGrants) of OAuth accepted consumers overrides its grants with empty array - https://phabricator.wikimedia.org/T413947
2026-01-15 21:16:27 <Seawolf35> I’ll be afk for 5 min or so, I’ll be back in time for my patch.
2026-01-15 21:16:47 <xSavitar> respectfully yours, jeena / cjming, over to you 🙏🏽
2026-01-15 21:17:01 <xSavitar> I'm done!
2026-01-15 21:17:57 <jeena> okay well I was going to see if I could do A_smart_kitten and Seawolf35 's one together
2026-01-15 21:18:21 <jeena> but do you want to go ahead first cjming ?
2026-01-15 21:18:22 <A_smart_kitten> jeena: that's actually what i was thinking myself as well (so long as Seawolf35 is okay with it)
2026-01-15 21:18:30 <cjming> bows to jeena
2026-01-15 21:18:32 <jeena> yeah they just left the channel
2026-01-15 21:18:41 <A_smart_kitten> yeah, as they're AFK right now maybe cjming or katherine_g could go first?
2026-01-15 21:18:50 <cjming> jeena: go ahead - i have to fiddle with something first
2026-01-15 21:19:01 <jeena> okay I'll do yours katherine_g
2026-01-15 21:19:07 <katherine_g> ok, ty
2026-01-15 21:19:47 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by jhuneidi@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227346 (https://phabricator.wikimedia.org/T403982) (owner: ''Jsn.sherman)'
2026-01-15 21:20:40 <wikibugs> ('Merged) ''jenkins-bot: Deploy PersonalDashboard to testwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227346 (https://phabricator.wikimedia.org/T403982) (owner: ''Jsn.sherman)'
2026-01-15 21:20:59 <logmsgbot> !log jhuneidi@deploy2002 Started scap sync-world: Backport for [[gerrit:1227346|Deploy PersonalDashboard to testwiki (T403982)]]
2026-01-15 21:21:04 <stashbot> T403982: Create and deploy Extension:PersonalDashboard - https://phabricator.wikimedia.org/T403982
2026-01-15 21:21:57 <wikibugs> ('Abandoned) ''Arlolra: Support incremental roll out of Parsoid Read Views [extensions/ParserMigration] (wmf/1.46.0-wmf.7) - ''https://gerrit.wikimedia.org/r/1224837 (https://phabricator.wikimedia.org/T391881) (owner: ''Arlolra)'
2026-01-15 21:22:59 <logmsgbot> !log jhuneidi@deploy2002 jhuneidi, jsn: Backport for [[gerrit:1227346|Deploy PersonalDashboard to testwiki (T403982)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-01-15 21:23:06 <Seawolf35> Back
2026-01-15 21:24:11 <jinxer-wm> FIRING: KubernetesCalicoDown: ml-serve2004.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlserve&var-instance=ml-serve2004.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
2026-01-15 21:25:45 <jeena> katherine_g: do you need to check anything on the testservers?
2026-01-15 21:25:50 <katherine_g> alright, tested and it looks good on my end
2026-01-15 21:25:57 <katherine_g> jeena: ty
2026-01-15 21:25:59 <jeena> cool thanks!
2026-01-15 21:26:06 <logmsgbot> !log jhuneidi@deploy2002 jhuneidi, jsn: Continuing with sync
2026-01-15 21:26:09 <wikibugs> ('CR) ''Ssingh: services: gerrit* --> monitoring_setup (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1227423 (https://phabricator.wikimedia.org/T411895) (owner: ''CDanis)'
2026-01-15 21:27:26 <jeena> Seawolf35: we were wondering if your change can be deployed with A_smart_kitten 's together?
2026-01-15 21:27:36 <Seawolf35> Sure
2026-01-15 21:27:45 <A_smart_kitten> :)
2026-01-15 21:30:17 <logmsgbot> !log jhuneidi@deploy2002 Finished scap sync-world: Backport for [[gerrit:1227346|Deploy PersonalDashboard to testwiki (T403982)]] (duration: 09m 18s)
2026-01-15 21:30:21 <stashbot> T403982: Create and deploy Extension:PersonalDashboard - https://phabricator.wikimedia.org/T403982
2026-01-15 21:31:14 <jeena> cjming: I'm going to go ahead and do the remaining two now
2026-01-15 21:31:28 <cjming> jeena: great - thanks!
2026-01-15 21:32:40 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by jhuneidi@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227385 (https://phabricator.wikimedia.org/T414277) (owner: ''A smart kitten)'
2026-01-15 21:32:40 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by jhuneidi@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227394 (https://phabricator.wikimedia.org/T414277) (owner: ''Seawolf35gerrit)'
2026-01-15 21:33:33 <wikibugs> ('Merged) ''jenkins-bot: ukwiki: Add "changetags" to sysop user group. [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227394 (https://phabricator.wikimedia.org/T414277) (owner: ''Seawolf35gerrit)'
2026-01-15 21:33:35 <wikibugs> ('Merged) ''jenkins-bot: ukwiki: Move assignments of FlaggedRevs permissions to flaggedrevs.php [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227385 (https://phabricator.wikimedia.org/T414277) (owner: ''A smart kitten)'
2026-01-15 21:33:53 <logmsgbot> !log jhuneidi@deploy2002 Started scap sync-world: Backport for [[gerrit:1227385|ukwiki: Move assignments of FlaggedRevs permissions to flaggedrevs.php (T414277 T414684)]], [[gerrit:1227394|ukwiki: Add "changetags" to sysop user group. (T414277)]]
2026-01-15 21:33:59 <stashbot> T414277: Some changes in user group rights in ukwiki - https://phabricator.wikimedia.org/T414277
2026-01-15 21:33:59 <stashbot> T414684: FlaggedRevs-specific group rights from core-Permissions.php get overridden by flaggedrevs.php - https://phabricator.wikimedia.org/T414684
2026-01-15 21:35:50 <logmsgbot> !log jhuneidi@deploy2002 asmartkitten, seawolf35gerrit, jhuneidi: Backport for [[gerrit:1227385|ukwiki: Move assignments of FlaggedRevs permissions to flaggedrevs.php (T414277 T414684)]], [[gerrit:1227394|ukwiki: Add "changetags" to sysop user group. (T414277)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-01-15 21:36:09 <A_smart_kitten> looking (cc Seawolf35)
2026-01-15 21:36:12 <Seawolf35> Testing
2026-01-15 21:37:06 <A_smart_kitten> my patch looks good AFAICS :]
2026-01-15 21:37:20 <Seawolf35> Mine lgtm
2026-01-15 21:37:28 <jeena> thanks!
2026-01-15 21:37:36 <logmsgbot> !log jhuneidi@deploy2002 asmartkitten, seawolf35gerrit, jhuneidi: Continuing with sync
2026-01-15 21:39:40 <jinxer-wm> FIRING: [14x] SystemdUnitFailed: prometheus-node-textfile-check-nft.service on tcp-proxy1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-01-15 21:41:39 <logmsgbot> !log jhuneidi@deploy2002 Finished scap sync-world: Backport for [[gerrit:1227385|ukwiki: Move assignments of FlaggedRevs permissions to flaggedrevs.php (T414277 T414684)]], [[gerrit:1227394|ukwiki: Add "changetags" to sysop user group. (T414277)]] (duration: 07m 46s)
2026-01-15 21:41:45 <stashbot> T414277: Some changes in user group rights in ukwiki - https://phabricator.wikimedia.org/T414277
2026-01-15 21:41:45 <stashbot> T414684: FlaggedRevs-specific group rights from core-Permissions.php get overridden by flaggedrevs.php - https://phabricator.wikimedia.org/T414684
2026-01-15 21:42:01 <A_smart_kitten> jeena: thank you for deploying!
2026-01-15 21:42:04 <jeena> cjming: ready for you
2026-01-15 21:42:10 <jeena> A_smart_kitten: yw!
2026-01-15 21:42:10 <cjming> tysm!
2026-01-15 21:42:20 <Seawolf35> jeena ty!
2026-01-15 21:42:27 <jeena> yw!
2026-01-15 21:43:16 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by cjming@deploy2002 using scap backport" [extensions/TestKitchen] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1227435 (https://phabricator.wikimedia.org/T414528) (owner: ''Clare Ming)'
2026-01-15 21:43:37 <icinga-wm> PROBLEM - Host titan1002 is DOWN: PING CRITICAL - Packet loss = 100%
2026-01-15 21:43:55 <icinga-wm> RECOVERY - Host titan1002 is UP: PING WARNING - Packet loss = 33%, RTA = 1653.67 ms
2026-01-15 21:44:11 <jinxer-wm> FIRING: [2x] ProbeDown: Service titan1002:443 has failed probes (http_thanos_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#titan1002:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2026-01-15 21:45:08 <jinxer-wm> FIRING: [2x] ProbeDown: Service titan1002:443 has failed probes (http_thanos_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#titan1002:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2026-01-15 21:47:17 <wikibugs> 'SRE, ''Release Pipeline, ''serviceops, ''Release-Engineering-Team (Seen): Kask functional testing with Cassandra via the Deployment Pipeline - https://phabricator.wikimedia.org/T224041#11526839 (''Eevans)'
2026-01-15 21:49:11 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service titan1002:443 has failed probes (http_thanos_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#titan1002:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2026-01-15 21:52:45 <wikibugs> ('Merged) ''jenkins-bot: Update experiment code for JS, PHP SDKs testing of TK [extensions/TestKitchen] (wmf/1.46.0-wmf.11) - ''https://gerrit.wikimedia.org/r/1227435 (https://phabricator.wikimedia.org/T414528) (owner: ''Clare Ming)'
2026-01-15 21:53:06 <logmsgbot> !log cjming@deploy2002 Started scap sync-world: Backport for [[gerrit:1227435|Update experiment code for JS, PHP SDKs testing of TK (T414528 T414530)]]
2026-01-15 21:53:12 <wikibugs> 'SRE-OnFire, ''Cassandra, ''MediaWiki-Platform-Team (Radar), ''Sustainability (Incident Followup): Provision anonymous session storage - https://phabricator.wikimedia.org/T408935#11526882 (''Eevans)'
2026-01-15 21:53:13 <stashbot> T414528: Run synthetic experiment using Javascript SDK in Test Kitchen - https://phabricator.wikimedia.org/T414528
2026-01-15 21:53:13 <stashbot> T414530: Run synthetic experiment using PHP SDK in Test Kitchen - https://phabricator.wikimedia.org/T414530
2026-01-15 21:55:07 <logmsgbot> !log cjming@deploy2002 cjming: Backport for [[gerrit:1227435|Update experiment code for JS, PHP SDKs testing of TK (T414528 T414530)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-01-15 21:56:07 <cjming> checking
2026-01-15 21:58:18 <cjming> syncing
2026-01-15 21:58:30 <logmsgbot> !log cjming@deploy2002 cjming: Continuing with sync
2026-01-15 22:00:04 <jouncebot> Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260115T2200)
2026-01-15 22:02:29 <logmsgbot> !log cjming@deploy2002 Finished scap sync-world: Backport for [[gerrit:1227435|Update experiment code for JS, PHP SDKs testing of TK (T414528 T414530)]] (duration: 09m 23s)
2026-01-15 22:02:34 <stashbot> T414528: Run synthetic experiment using Javascript SDK in Test Kitchen - https://phabricator.wikimedia.org/T414528
2026-01-15 22:02:35 <stashbot> T414530: Run synthetic experiment using PHP SDK in Test Kitchen - https://phabricator.wikimedia.org/T414530
2026-01-15 22:03:47 <wikibugs> ('CR) ''CI reject: [V:''-1] logos: Add WP25 temporary logo for Hausa Wikipedia (hawiki) [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227443 (https://phabricator.wikimedia.org/T414736) (owner: ''SarthakSingh2904)'
2026-01-15 22:06:40 <jinxer-wm> FIRING: SystemdUnitFailed: dump_proxy_ranges.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-01-15 22:23:45 <wikibugs> ('PS2) ''SarthakSingh2904: logos: Add WP25 temporary logo for Hausa Wikipedia (hawiki) [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227443 (https://phabricator.wikimedia.org/T414736)'
2026-01-15 22:24:47 <wikibugs> ('CR) ''CI reject: [V:''-1] logos: Add WP25 temporary logo for Hausa Wikipedia (hawiki) [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227443 (https://phabricator.wikimedia.org/T414736) (owner: ''SarthakSingh2904)'
2026-01-15 22:25:10 <wikibugs> ('CR) ''Ryan Kemper: [C:''+1] java: create openjdk-21 image [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/1227376 (https://phabricator.wikimedia.org/T414695) (owner: ''Bking)'
2026-01-15 22:25:32 <wikibugs> ('CR) ''Bking: [C:''+2] java: create openjdk-21 image [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/1227376 (https://phabricator.wikimedia.org/T414695) (owner: ''Bking)'
2026-01-15 22:25:43 <wikibugs> ('CR) ''Bking: [V:''+2 C:''+2] java: create openjdk-21 image [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/1227376 (https://phabricator.wikimedia.org/T414695) (owner: ''Bking)'
2026-01-15 22:32:14 <wikibugs> ('Abandoned) ''SarthakSingh2904: logos: Add WP25 temporary logo for Hausa Wikipedia (hawiki) [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1227443 (https://phabricator.wikimedia.org/T414736) (owner: ''SarthakSingh2904)'
2026-01-15 22:38:06 <jinxer-wm> FIRING: CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth1 (Subnet frack-fundraising-codfw in F5) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
2026-01-15 22:59:07 <logmsgbot> !log pt1979@cumin2002 START - Cookbook sre.dns.netbox
2026-01-15 23:02:37 <logmsgbot> !log pt1979@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new wikikube-worker nodes - pt1979@cumin2002"
2026-01-15 23:02:43 <logmsgbot> !log pt1979@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new wikikube-worker nodes - pt1979@cumin2002"
2026-01-15 23:02:43 <logmsgbot> !log pt1979@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2026-01-15 23:04:25 <jinxer-wm> FIRING: [14x] SystemdUnitFailed: prometheus-node-textfile-check-nft.service on tcp-proxy1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-01-15 23:04:27 <wikibugs> 'ops-codfw, ''SRE, ''DC-Ops, ''serviceops: Q2:rack/setup/install wikikube-worker2332-56 - https://phabricator.wikimedia.org/T408757#11527136 (''Papaul)'
2026-01-15 23:09:25 <jinxer-wm> FIRING: [14x] SystemdUnitFailed: prometheus-node-textfile-check-nft.service on tcp-proxy1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-01-15 23:13:13 <wikibugs> 'ops-codfw, ''SRE, ''DC-Ops, ''serviceops: Q2:rack/setup/install wikikube-worker2332-56 - https://phabricator.wikimedia.org/T408757#11527155 (''Papaul)'
2026-01-15 23:13:46 <wikibugs> ('PS1) ''Jasmine: wikikube: decommission wikikube-worker[2116-2123,2216-2241].codfw.wmnet [puppet] - ''https://gerrit.wikimedia.org/r/1227454 (https://phabricator.wikimedia.org/T409104)'
2026-01-15 23:52:01 <icinga-wm> PROBLEM - OSPF status on cr2-eqiad is CRITICAL: OSPFv2: 6/7 UP : OSPFv3: 6/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
2026-01-15 23:52:23 <icinga-wm> PROBLEM - OSPF status on cr1-drmrs is CRITICAL: OSPFv2: 1/2 UP : OSPFv3: 1/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
2026-01-15 23:54:10 <jinxer-wm> FIRING: [2x] BFDdown: BFD session down between cr2-eqiad and 185.15.58.139 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
2026-01-15 23:54:39 <jinxer-wm> FIRING: [4x] CoreBGPDown: Core BGP session down between cr1-drmrs and cr2-eqiad (185.15.58.138) - group Confed_eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown

This page is generated from SQL logs; static txt files are also available for download.