|
2025-10-28 00:00:28
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 00:04:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 00:06:25
|
<logmsgbot>
|
!log dzahn@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host tcp-proxy3002.esams.wmnet with OS trixie
|
|
2025-10-28 00:06:37
|
<wikibugs>
|
SRE, collaboration-services, Infrastructure-Foundations, vm-requests, Patch-For-Review: Site: 14 VMs request for gerrit-ssh-proxy - https://phabricator.wikimedia.org/T408064#11316600 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin2002 for host tcp-proxy3002.es...
|
|
2025-10-28 00:12:17
|
<wikibugs>
|
(PS1) Zabe: Initial configuration for minwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/1199089 (https://phabricator.wikimedia.org/T408317)
|
|
2025-10-28 00:12:58
|
<wikibugs>
|
(PS1) Zabe: Initial configuration for pcmwikiqoute [mediawiki-config] - https://gerrit.wikimedia.org/r/1199090 (https://phabricator.wikimedia.org/T408317)
|
|
2025-10-28 00:13:23
|
<wikibugs>
|
(PS2) Zabe: Initial configuration for pcmwikiqoute [mediawiki-config] - https://gerrit.wikimedia.org/r/1199090 (https://phabricator.wikimedia.org/T408318)
|
|
2025-10-28 00:13:51
|
<zabe>
|
jouncebot: nowandnext
|
|
2025-10-28 00:13:52
|
<jouncebot>
|
No deployments scheduled for the next 1 hour(s) and 46 minute(s)
|
|
2025-10-28 00:13:52
|
<jouncebot>
|
In 1 hour(s) and 46 minute(s): Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous deployment/Train deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T0200)
|
|
2025-10-28 00:13:55
|
<wikibugs>
|
(CR) Zabe: [C:+2] Initial configuration for minwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/1199089 (https://phabricator.wikimedia.org/T408317) (owner: Zabe)
|
|
2025-10-28 00:14:21
|
<wikibugs>
|
(CR) Zabe: [C:+2] Initial configuration for pcmwikiqoute [mediawiki-config] - https://gerrit.wikimedia.org/r/1199090 (https://phabricator.wikimedia.org/T408318) (owner: Zabe)
|
|
2025-10-28 00:14:48
|
<wikibugs>
|
(Merged) jenkins-bot: Initial configuration for minwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/1199089 (https://phabricator.wikimedia.org/T408317) (owner: Zabe)
|
|
2025-10-28 00:15:11
|
<wikibugs>
|
(Merged) jenkins-bot: Initial configuration for pcmwikiqoute [mediawiki-config] - https://gerrit.wikimedia.org/r/1199090 (https://phabricator.wikimedia.org/T408318) (owner: Zabe)
|
|
2025-10-28 00:16:44
|
<logmsgbot>
|
!log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1199090|Initial configuration for pcmwikiqoute (T408318)]], [[gerrit:1199089|Initial configuration for minwikisource (T408317)]]
|
|
2025-10-28 00:16:53
|
<stashbot>
|
T408318: Create Wikiquote Nigerian Pidgin - https://phabricator.wikimedia.org/T408318
|
|
2025-10-28 00:16:54
|
<stashbot>
|
T408317: Create Wikisource Minangkabau - https://phabricator.wikimedia.org/T408317
|
|
2025-10-28 00:20:04
|
<wikibugs>
|
(PS1) Zabe: Activate minwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/1199091 (https://phabricator.wikimedia.org/T408317)
|
|
2025-10-28 00:20:33
|
<wikibugs>
|
(PS1) Zabe: Activate pcmwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/1199092 (https://phabricator.wikimedia.org/T408318)
|
|
2025-10-28 00:24:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 00:25:28
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 00:30:56
|
<wikibugs>
|
(PS6) Scott French: P:cache::varnish::frontend: render known-client rate limit VCL [puppet] - https://gerrit.wikimedia.org/r/1198182 (https://phabricator.wikimedia.org/T403220)
|
|
2025-10-28 00:34:02
|
<wikibugs>
|
(CR) Scott French: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1198182 (https://phabricator.wikimedia.org/T403220) (owner: Scott French)
|
|
2025-10-28 00:37:01
|
<icinga-wm>
|
PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
|
|
2025-10-28 00:39:41
|
<wikibugs>
|
(PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1199093
|
|
2025-10-28 00:39:41
|
<wikibugs>
|
(CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1199093 (owner: TrainBranchBot)
|
|
2025-10-28 00:42:42
|
<logmsgbot>
|
!log zabe@deploy2002 zabe: Backport for [[gerrit:1199090|Initial configuration for pcmwikiqoute (T408318)]], [[gerrit:1199089|Initial configuration for minwikisource (T408317)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
|
|
2025-10-28 00:42:48
|
<stashbot>
|
T408318: Create Wikiquote Nigerian Pidgin - https://phabricator.wikimedia.org/T408318
|
|
2025-10-28 00:42:48
|
<stashbot>
|
T408317: Create Wikisource Minangkabau - https://phabricator.wikimedia.org/T408317
|
|
2025-10-28 00:43:00
|
<logmsgbot>
|
!log zabe@deploy2002 zabe: Continuing with sync
|
|
2025-10-28 00:44:01
|
<icinga-wm>
|
RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30031 bytes in 9.117 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
|
|
2025-10-28 00:52:01
|
<icinga-wm>
|
PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
|
|
2025-10-28 00:53:55
|
<icinga-wm>
|
RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30031 bytes in 3.562 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
|
|
2025-10-28 00:55:28
|
<jinxer-wm>
|
FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
|
|
2025-10-28 00:56:00
|
<wikibugs>
|
(Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1199093 (owner: TrainBranchBot)
|
|
2025-10-28 00:57:20
|
<logmsgbot>
|
!log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1199090|Initial configuration for pcmwikiqoute (T408318)]], [[gerrit:1199089|Initial configuration for minwikisource (T408317)]] (duration: 40m 37s)
|
|
2025-10-28 00:57:26
|
<stashbot>
|
T408318: Create Wikiquote Nigerian Pidgin - https://phabricator.wikimedia.org/T408318
|
|
2025-10-28 00:57:27
|
<stashbot>
|
T408317: Create Wikisource Minangkabau - https://phabricator.wikimedia.org/T408317
|
|
2025-10-28 00:58:47
|
<wikibugs>
|
(CR) Zabe: [C:+2] Activate minwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/1199091 (https://phabricator.wikimedia.org/T408317) (owner: Zabe)
|
|
2025-10-28 00:59:21
|
<wikibugs>
|
(CR) Zabe: [C:+2] Activate pcmwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/1199092 (https://phabricator.wikimedia.org/T408318) (owner: Zabe)
|
|
2025-10-28 00:59:40
|
<wikibugs>
|
(Merged) jenkins-bot: Activate minwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/1199091 (https://phabricator.wikimedia.org/T408317) (owner: Zabe)
|
|
2025-10-28 01:00:09
|
<wikibugs>
|
(Merged) jenkins-bot: Activate pcmwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/1199092 (https://phabricator.wikimedia.org/T408318) (owner: Zabe)
|
|
2025-10-28 01:00:54
|
<logmsgbot>
|
!log mwpresync@deploy2002 Started scap build-images: Publishing wmf/next image
|
|
2025-10-28 01:04:01
|
<wikibugs>
|
(PS1) Zabe: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1199096
|
|
2025-10-28 01:04:01
|
<wikibugs>
|
(CR) Zabe: [C:+2] Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1199096 (owner: Zabe)
|
|
2025-10-28 01:04:55
|
<wikibugs>
|
(Merged) jenkins-bot: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1199096 (owner: Zabe)
|
|
2025-10-28 01:08:13
|
<wikibugs>
|
(PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1199097
|
|
2025-10-28 01:08:13
|
<wikibugs>
|
(CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1199097 (owner: TrainBranchBot)
|
|
2025-10-28 01:14:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 01:14:14
|
<logmsgbot>
|
!log mwpresync@deploy2002 Finished scap build-images: Publishing wmf/next image (duration: 13m 19s)
|
|
2025-10-28 01:14:28
|
<logmsgbot>
|
!log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1199092|Activate pcmwikisource (T408318)]], [[gerrit:1199091|Activate minwikisource (T408317)]], [[gerrit:1199096|Update interwiki cache]]
|
|
2025-10-28 01:14:34
|
<stashbot>
|
T408318: Create Wikiquote Nigerian Pidgin - https://phabricator.wikimedia.org/T408318
|
|
2025-10-28 01:14:34
|
<stashbot>
|
T408317: Create Wikisource Minangkabau - https://phabricator.wikimedia.org/T408317
|
|
2025-10-28 01:15:28
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 01:16:31
|
<wikibugs>
|
ops-eqiad, SRE, DC-Ops: Degraded RAID on an-worker1203 - https://phabricator.wikimedia.org/T408446#11316916 (Jclark-ctr) →Duplicate dup:T408359
|
|
2025-10-28 01:16:32
|
<wikibugs>
|
ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2025.10.17 - 2025.11.07): Degraded RAID on an-worker1203 - https://phabricator.wikimedia.org/T408359#11316918 (Jclark-ctr)
|
|
2025-10-28 01:18:42
|
<logmsgbot>
|
!log zabe@deploy2002 zabe: Backport for [[gerrit:1199092|Activate pcmwikisource (T408318)]], [[gerrit:1199091|Activate minwikisource (T408317)]], [[gerrit:1199096|Update interwiki cache]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
|
|
2025-10-28 01:22:32
|
<logmsgbot>
|
!log zabe@deploy2002 zabe: Continuing with sync
|
|
2025-10-28 01:23:01
|
<icinga-wm>
|
PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
|
|
2025-10-28 01:23:57
|
<icinga-wm>
|
RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30031 bytes in 4.728 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
|
|
2025-10-28 01:30:52
|
<wikibugs>
|
(Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1199097 (owner: TrainBranchBot)
|
|
2025-10-28 01:32:35
|
<logmsgbot>
|
!log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1199092|Activate pcmwikisource (T408318)]], [[gerrit:1199091|Activate minwikisource (T408317)]], [[gerrit:1199096|Update interwiki cache]] (duration: 18m 07s)
|
|
2025-10-28 01:32:41
|
<stashbot>
|
T408318: Create Wikiquote Nigerian Pidgin - https://phabricator.wikimedia.org/T408318
|
|
2025-10-28 01:32:41
|
<stashbot>
|
T408317: Create Wikisource Minangkabau - https://phabricator.wikimedia.org/T408317
|
|
2025-10-28 01:33:39
|
<Jhs>
|
zabe, pcmwikisource??
|
|
2025-10-28 01:33:57
|
<zabe>
|
no worries
|
|
2025-10-28 01:34:03
|
<zabe>
|
I know its pcmwikiquote
|
|
2025-10-28 01:34:04
|
<jinxer-wm>
|
FIRING: [5x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 01:34:11
|
<zabe>
|
its just the commit message that is wrong
|
|
2025-10-28 01:34:13
|
<Jhs>
|
ah, ok, good :)
|
|
2025-10-28 01:39:40
|
<jinxer-wm>
|
FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 01:49:04
|
<jinxer-wm>
|
FIRING: [5x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 01:50:37
|
<wikibugs>
|
(PS1) Andrew Bogott: rabbitmq: rename config file on Trixie [puppet] - https://gerrit.wikimedia.org/r/1199100 (https://phabricator.wikimedia.org/T406516)
|
|
2025-10-28 01:50:47
|
<wikibugs>
|
(CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1199100 (https://phabricator.wikimedia.org/T406516) (owner: Andrew Bogott)
|
|
2025-10-28 01:53:22
|
<wikibugs>
|
(CR) Andrew Bogott: [C:+2] rabbitmq: rename config file on Trixie [puppet] - https://gerrit.wikimedia.org/r/1199100 (https://phabricator.wikimedia.org/T406516) (owner: Andrew Bogott)
|
|
2025-10-28 01:54:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 01:55:28
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 02:00:04
|
<jouncebot>
|
Deploy window Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous deployment/Train deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T0200)
|
|
2025-10-28 02:07:56
|
<wikibugs>
|
(PS1) TrainBranchBot: Branch commit for wmf/1.45.0-wmf.25 [core] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1199103 (https://phabricator.wikimedia.org/T405681)
|
|
2025-10-28 02:07:58
|
<wikibugs>
|
(CR) TrainBranchBot: [C:+2] Branch commit for wmf/1.45.0-wmf.25 [core] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1199103 (https://phabricator.wikimedia.org/T405681) (owner: TrainBranchBot)
|
|
2025-10-28 02:17:59
|
<icinga-wm>
|
PROBLEM - Host cloudrabbit2002-dev is DOWN: PING CRITICAL - Packet loss = 100%
|
|
2025-10-28 02:19:29
|
<icinga-wm>
|
RECOVERY - Host cloudrabbit2002-dev is UP: PING OK - Packet loss = 0%, RTA = 30.39 ms
|
|
2025-10-28 02:20:28
|
<jinxer-wm>
|
FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
|
|
2025-10-28 02:23:43
|
<wikibugs>
|
(Merged) jenkins-bot: Branch commit for wmf/1.45.0-wmf.25 [core] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1199103 (https://phabricator.wikimedia.org/T405681) (owner: TrainBranchBot)
|
|
2025-10-28 02:24:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 02:25:28
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 03:00:05
|
<jouncebot>
|
Deploy window Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see Heterogeneous deployment/Train deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T0300)
|
|
2025-10-28 03:02:39
|
<wikibugs>
|
(PS1) TrainBranchBot: testwikis to 1.45.0-wmf.25 [mediawiki-config] - https://gerrit.wikimedia.org/r/1199109 (https://phabricator.wikimedia.org/T405681)
|
|
2025-10-28 03:02:41
|
<wikibugs>
|
(CR) TrainBranchBot: [C:+2] "Initiated by mwpresync@deploy2002" [mediawiki-config] - https://gerrit.wikimedia.org/r/1199109 (https://phabricator.wikimedia.org/T405681) (owner: TrainBranchBot)
|
|
2025-10-28 03:03:33
|
<wikibugs>
|
(Merged) jenkins-bot: testwikis to 1.45.0-wmf.25 [mediawiki-config] - https://gerrit.wikimedia.org/r/1199109 (https://phabricator.wikimedia.org/T405681) (owner: TrainBranchBot)
|
|
2025-10-28 03:04:01
|
<logmsgbot>
|
!log mwpresync@deploy2002 Started scap sync-world: testwikis to 1.45.0-wmf.25 refs T405681
|
|
2025-10-28 03:04:06
|
<stashbot>
|
T405681: 1.45.0-wmf.25 deployment blockers - https://phabricator.wikimedia.org/T405681
|
|
2025-10-28 03:14:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 03:15:28
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 03:20:57
|
<wikibugs>
|
SRE, collaboration-services, Infrastructure-Foundations, vm-requests, Patch-For-Review: Site: 14 VMs request for gerrit-ssh-proxy - https://phabricator.wikimedia.org/T408064#11317298 (Dzahn)
|
|
2025-10-28 03:24:01
|
<wikibugs>
|
SRE, collaboration-services, Infrastructure-Foundations, vm-requests, Patch-For-Review: Site: 14 VMs request for tcp-proxy (gerrit-ssh-proxy) - https://phabricator.wikimedia.org/T408064#11317299 (Dzahn)
|
|
2025-10-28 03:29:06
|
<wikibugs>
|
(PS1) Arlolra: ExtensionDistributor: Mark 1.45 as beta [mediawiki-config] - https://gerrit.wikimedia.org/r/1199113 (https://phabricator.wikimedia.org/T408466)
|
|
2025-10-28 03:30:28
|
<jinxer-wm>
|
FIRING: KubernetesCalicoDown: ml-serve2001.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlserve&var-instance=ml-serve2001.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
|
|
2025-10-28 03:37:53
|
<wikibugs>
|
(PS1) C. Scott Ananian: Forward-compatibility: allow output flags to be serialized in `OutputFlags` [core] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1199114 (https://phabricator.wikimedia.org/T292868)
|
|
2025-10-28 03:38:26
|
<wikibugs>
|
(CR) C. Scott Ananian: [C:+2] "Backport patch to wmf.25 which just missed the cut." [core] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1199114 (https://phabricator.wikimedia.org/T292868) (owner: C. Scott Ananian)
|
|
2025-10-28 03:39:02
|
<wikibugs>
|
(PS1) C. Scott Ananian: ParserOutput: Add deprecation warnings for ParserOutput::getLanguageLinks() [core] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1199115
|
|
2025-10-28 03:39:12
|
<wikibugs>
|
(CR) C. Scott Ananian: [C:+2] "Backport patch to wmf.25 which just missed the cut." [core] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1199115 (owner: C. Scott Ananian)
|
|
2025-10-28 03:39:45
|
<wikibugs>
|
(PS1) C. Scott Ananian: Implement a DOM version of the DeduplicateStyles pass [core] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1199116 (https://phabricator.wikimedia.org/T405929)
|
|
2025-10-28 03:39:56
|
<wikibugs>
|
(CR) C. Scott Ananian: [C:+2] "Backport patch to wmf.25 which just missed the cut." [core] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1199116 (https://phabricator.wikimedia.org/T405929) (owner: C. Scott Ananian)
|
|
2025-10-28 03:44:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 03:45:28
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 03:51:51
|
<logmsgbot>
|
!log mwpresync@deploy2002 Finished scap sync-world: testwikis to 1.45.0-wmf.25 refs T405681 (duration: 47m 50s)
|
|
2025-10-28 03:51:55
|
<stashbot>
|
T405681: 1.45.0-wmf.25 deployment blockers - https://phabricator.wikimedia.org/T405681
|
|
2025-10-28 03:53:15
|
<wikibugs>
|
(Merged) jenkins-bot: Forward-compatibility: allow output flags to be serialized in `OutputFlags` [core] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1199114 (https://phabricator.wikimedia.org/T292868) (owner: C. Scott Ananian)
|
|
2025-10-28 03:55:43
|
<wikibugs>
|
(Merged) jenkins-bot: ParserOutput: Add deprecation warnings for ParserOutput::getLanguageLinks() [core] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1199115 (owner: C. Scott Ananian)
|
|
2025-10-28 03:55:47
|
<wikibugs>
|
(Merged) jenkins-bot: Implement a DOM version of the DeduplicateStyles pass [core] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1199116 (https://phabricator.wikimedia.org/T405929) (owner: C. Scott Ananian)
|
|
2025-10-28 04:00:04
|
<jouncebot>
|
Deploy window Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T0400)
|
|
2025-10-28 04:02:40
|
<logmsgbot>
|
!log mwpresync@deploy2002 Pruned MediaWiki: 1.45.0-wmf.22 (duration: 02m 38s)
|
|
2025-10-28 04:29:08
|
<wikibugs>
|
(PS1) C. Scott Ananian: ParserOutput: 'ParseUsedOptions' need not be present in serialized form [core] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1199117
|
|
2025-10-28 04:29:49
|
<wikibugs>
|
(CR) C. Scott Ananian: [C:+2] "Pull late patch into the branch cut." [core] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1199117 (owner: C. Scott Ananian)
|
|
2025-10-28 04:30:28
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 04:34:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 04:38:26
|
<wikibugs>
|
(PS1) C. Scott Ananian: Expose the list of behavior switch magic words to Parsoid [core] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1199118 (https://phabricator.wikimedia.org/T407290)
|
|
2025-10-28 04:39:15
|
<wikibugs>
|
(CR) C. Scott Ananian: [C:+2] "Late patch onto the train" [core] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1199118 (https://phabricator.wikimedia.org/T407290) (owner: C. Scott Ananian)
|
|
2025-10-28 04:43:39
|
<wikibugs>
|
(Merged) jenkins-bot: ParserOutput: 'ParseUsedOptions' need not be present in serialized form [core] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1199117 (owner: C. Scott Ananian)
|
|
2025-10-28 04:45:28
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 04:49:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 04:54:38
|
<wikibugs>
|
(Merged) jenkins-bot: Expose the list of behavior switch magic words to Parsoid [core] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1199118 (https://phabricator.wikimedia.org/T407290) (owner: C. Scott Ananian)
|
|
2025-10-28 04:55:28
|
<jinxer-wm>
|
FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
|
|
2025-10-28 04:57:25
|
<jinxer-wm>
|
FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 05:00:28
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 05:04:01
|
<icinga-wm>
|
PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
|
|
2025-10-28 05:04:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 05:05:53
|
<icinga-wm>
|
RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30030 bytes in 0.587 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
|
|
2025-10-28 05:09:04
|
<jinxer-wm>
|
FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
|
|
2025-10-28 05:15:01
|
<icinga-wm>
|
PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
|
|
2025-10-28 05:18:53
|
<icinga-wm>
|
RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30031 bytes in 1.421 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
|
|
2025-10-28 05:34:04
|
<jinxer-wm>
|
RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
|
|
2025-10-28 05:39:40
|
<jinxer-wm>
|
FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 05:50:28
|
<jinxer-wm>
|
FIRING: [5x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 06:00:05
|
<jouncebot>
|
Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T0600)
|
|
2025-10-28 06:00:05
|
<jouncebot>
|
marostegui, Amir1, and federico3: #bothumor My software never has bugs. It just develops random features. Rise for Primary database switchover. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T0600).
|
|
2025-10-28 06:03:42
|
<wikibugs>
|
(CR) Krinkle: ExtensionDistributor: Mark 1.45 as beta (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1199113 (https://phabricator.wikimedia.org/T408466) (owner: Arlolra)
|
|
2025-10-28 06:05:28
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 06:09:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 06:16:48
|
<wikibugs>
|
ops-ulsfo, DC-Ops, Infrastructure-Foundations, netops: ULSFO: switch refresh - https://phabricator.wikimedia.org/T408510 (Papaul) NEW
|
|
2025-10-28 06:20:28
|
<jinxer-wm>
|
FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
|
|
2025-10-28 06:43:12
|
<wikibugs>
|
'ops-ulsfo, ''DC-Ops, ''Infrastructure-Foundations, ''netops: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511 (''Papaul) ''NEW'
|
|
2025-10-28 06:43:42
|
<wikibugs>
|
'ops-ulsfo, ''SRE, ''DC-Ops, ''Infrastructure-Foundations, ''netops: ULSFO: switch refresh - https://phabricator.wikimedia.org/T408510#11317386 (''Papaul) p:''Triage→''Medium'
|
|
2025-10-28 06:43:54
|
<wikibugs>
|
'ops-ulsfo, ''DC-Ops, ''Infrastructure-Foundations, ''netops: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11317387 (''Papaul) p:''Triage→''Medium'
|
|
2025-10-28 06:44:56
|
<logmsgbot>
|
!log marostegui@cumin1003 START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis pcmwikiquote in section s5
|
|
2025-10-28 06:53:41
|
<logmsgbot>
|
!log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis pcmwikiquote in section s5
|
|
2025-10-28 06:54:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 06:54:43
|
<logmsgbot>
|
!log marostegui@cumin1003 START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis minwikisource in section s5
|
|
2025-10-28 06:55:28
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 07:00:05
|
<jouncebot>
|
Amir1, Urbanecm, and awight: OwO what's this, a deployment window?? UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T0700). nyaa~
|
|
2025-10-28 07:00:05
|
<jouncebot>
|
sefehpisikler: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
|
|
2025-10-28 07:01:32
|
<logmsgbot>
|
marostegui@cumin1003 sanitize-wiki (PID 343895) is awaiting input
|
|
2025-10-28 07:10:45
|
<logmsgbot>
|
!log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis minwikisource in section s5
|
|
2025-10-28 07:30:28
|
<jinxer-wm>
|
FIRING: KubernetesCalicoDown: ml-serve2001.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlserve&var-instance=ml-serve2001.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
|
|
2025-10-28 07:43:11
|
<marostegui>
|
!log Deploy schema change on the master x1 T407587
|
|
2025-10-28 07:43:15
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
|
2025-10-28 07:43:15
|
<stashbot>
|
T407587: Apply ce_event_contributions schema changes in production (x1) - https://phabricator.wikimedia.org/T407587
|
|
2025-10-28 07:43:35
|
<wikibugs>
|
('PS1) ''Muehlenhoff: Failover idp.w.o [dns] - ''https://gerrit.wikimedia.org/r/1199225'
|
|
2025-10-28 07:44:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 07:47:29
|
<wikibugs>
|
('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, October 28 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca"; [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199026 (https://phabricator.wikimedia.org/T408428) (owner: ''Kosta Harlan)'
|
|
2025-10-28 07:47:54
|
<kostajh>
|
marostegui: I'd like to create database tables in x1 for two wikis for the above config patch, can you check the command I am going to run?
|
|
2025-10-28 07:49:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 07:50:28
|
<kostajh>
|
jouncebot: nowandnext
|
|
2025-10-28 07:50:28
|
<jouncebot>
|
For the next 0 hour(s) and 9 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T0700)
|
|
2025-10-28 07:50:28
|
<jouncebot>
|
In 2 hour(s) and 9 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T1000)
|
|
2025-10-28 07:50:45
|
<kostajh>
|
also, marostegui are you done deploying?
|
|
2025-10-28 07:51:44
|
<kostajh>
|
I'll take that as a "yes"
|
|
2025-10-28 07:51:49
|
<marostegui>
|
kostajh: Yeah, go for anything
|
|
2025-10-28 07:51:53
|
<marostegui>
|
You need :)
|
|
2025-10-28 07:52:07
|
<marostegui>
|
kostajh: Show me the command
|
|
2025-10-28 07:52:52
|
<kostajh>
|
marostegui: `php maintenance/mysql.php --cluster extension1 --wiki loginwiki ./extensions/CheckUser/schema/mysql/tables-virtual-checkuser-generated.sql`
|
|
2025-10-28 07:53:41
|
<marostegui>
|
kostajh: I guess that is correct; I guess you'd run another one for metawiki
|
|
2025-10-28 07:54:21
|
<kostajh>
|
yeah
|
|
2025-10-28 07:54:26
|
<kostajh>
|
ok, I will try it
|
|
2025-10-28 07:55:28
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 07:56:00
|
<wikibugs>
|
'ops-ulsfo, ''SRE, ''DC-Ops, ''Infrastructure-Foundations, ''netops: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11317482 (''cmooney) @papaul looks good! Nothing jumping out at me as problematic in terms of the connectivity plan. I don't think it makes sense to use 40G tho...'
|
|
2025-10-28 07:56:02
|
<kostajh>
|
marostegui: hm, mwscript sql.php has a `--wiki` and a `--wikidb` flag
|
|
2025-10-28 07:56:12
|
<kostajh>
|
should I specify both as `loginwiki` ?
|
|
2025-10-28 07:56:23
|
<marostegui>
|
kostajh: I am not sure, I am not familiar with this procedure :(
|
|
2025-10-28 07:56:27
|
<kostajh>
|
just reading over `mwscript sql.php --help`
|
|
2025-10-28 07:56:31
|
<marostegui>
|
As we don't use it
|
|
2025-10-28 07:56:39
|
<marostegui>
|
(DBAs do not create tables in prod)
|
|
2025-10-28 07:58:00
|
<kostajh>
|
ok
|
|
2025-10-28 07:58:10
|
<kostajh>
|
it seems to have worked
|
|
2025-10-28 07:58:41
|
<kostajh>
|
I will deploy my config patch now
|
|
2025-10-28 07:58:45
|
<wikibugs>
|
('PS1) ''Brouberol: opensearch-operator: watch the 3 opensearch namespaces in dse-k8s-eqiad [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199226 (https://phabricator.wikimedia.org/T404874)'
|
|
2025-10-28 07:59:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 07:59:12
|
<wikibugs>
|
('CR) ''TrainBranchBot: [C:''+2] "Approved by kharlan@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199026 (https://phabricator.wikimedia.org/T408428) (owner: ''Kosta Harlan)'
|
|
2025-10-28 07:59:20
|
<wikibugs>
|
('PS2) ''Brouberol: opensearch-operator: watch the 3 opensearch namespaces in dse-k8s-eqiad [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199226 (https://phabricator.wikimedia.org/T404874)'
|
|
2025-10-28 08:00:01
|
<wikibugs>
|
('Merged) ''jenkins-bot: CheckUser: Enable SI on metawiki and loginwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199026 (https://phabricator.wikimedia.org/T408428) (owner: ''Kosta Harlan)'
|
|
2025-10-28 08:01:04
|
<wikibugs>
|
('CR) ''Slyngshede: [C:''+1] Failover idp.w.o [dns] - ''https://gerrit.wikimedia.org/r/1199225 (owner: ''Muehlenhoff)'
|
|
2025-10-28 08:02:10
|
<logmsgbot>
|
!log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1199026|CheckUser: Enable SI on metawiki and loginwiki (T408428)]]
|
|
2025-10-28 08:02:15
|
<stashbot>
|
T408428: Suggested investigations: Enable on Metawiki and Loginwiki - https://phabricator.wikimedia.org/T408428
|
|
2025-10-28 08:02:40
|
<wikibugs>
|
('CR) ''Kosta Harlan: "For next time: could you please schedule this as a backport? It was unexpected to see this when I went to deploy a config patch this morni" [core] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1199117 (owner: ''C. Scott Ananian)'
|
|
2025-10-28 08:02:43
|
<jinxer-wm>
|
FIRING: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1019:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
|
|
2025-10-28 08:04:16
|
<wikibugs>
|
('CR) ''Muehlenhoff: [C:''+2] Failover idp.w.o [dns] - ''https://gerrit.wikimedia.org/r/1199225 (owner: ''Muehlenhoff)'
|
|
2025-10-28 08:04:24
|
<logmsgbot>
|
!log jmm@dns1004 START - running authdns-update
|
|
2025-10-28 08:05:11
|
<logmsgbot>
|
!log jmm@dns1004 END - running authdns-update
|
|
2025-10-28 08:07:43
|
<jinxer-wm>
|
RESOLVED: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1019:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
|
|
2025-10-28 08:11:12
|
<logmsgbot>
|
!log elukey@cumin2002 START - Cookbook sre.hosts.powercycle for host ml-serve2001
|
|
2025-10-28 08:11:14
|
<logmsgbot>
|
!log elukey@cumin2002 END (FAIL) - Cookbook sre.hosts.powercycle (exit_code=99) for host ml-serve2001
|
|
2025-10-28 08:13:13
|
<gehel>
|
!log restarting blazegraph on wdqs1019 - free allocator decreasing - `sudo depool; sleep 30; sudo systemctl restart wdqs-blazegraph.service; sleep 30; sudo pool`
|
|
2025-10-28 08:13:16
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
|
2025-10-28 08:14:39
|
<kostajh>
|
waiting on image building, which will probably take ~30 minutes
|
|
2025-10-28 08:17:13
|
<wikibugs>
|
('PS18) ''Jelto: git_ssh_proxy: add role::git_ssh_proxy for Gerrit and GitLab ssh proxies [puppet] - ''https://gerrit.wikimedia.org/r/1198281 (https://phabricator.wikimedia.org/T365259)'
|
|
2025-10-28 08:18:20
|
<logmsgbot>
|
!log elukey@cumin2002 START - Cookbook sre.hosts.powercycle for host ml-serve2001
|
|
2025-10-28 08:18:27
|
<logmsgbot>
|
!log elukey@cumin2002 END (FAIL) - Cookbook sre.hosts.powercycle (exit_code=99) for host ml-serve2001
|
|
2025-10-28 08:19:22
|
<wikibugs>
|
('CR) ''Jelto: [V:''+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7480/co"; [puppet] - ''https://gerrit.wikimedia.org/r/1198281 (https://phabricator.wikimedia.org/T365259) (owner: ''Jelto)'
|
|
2025-10-28 08:21:56
|
<wikibugs>
|
('PS19) ''Jelto: git_ssh_proxy: add role::git_ssh_proxy for Gerrit and GitLab ssh proxies [puppet] - ''https://gerrit.wikimedia.org/r/1198281 (https://phabricator.wikimedia.org/T365259)'
|
|
2025-10-28 08:23:33
|
<wikibugs>
|
('CR) ''Brouberol: [C:''+2] opensearch-operator: watch the 3 opensearch namespaces in dse-k8s-eqiad [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199226 (https://phabricator.wikimedia.org/T404874) (owner: ''Brouberol)'
|
|
2025-10-28 08:23:56
|
<wikibugs>
|
('CR) ''Jelto: git_ssh_proxy: add role::git_ssh_proxy for Gerrit and GitLab ssh proxies (''4 comments) [puppet] - ''https://gerrit.wikimedia.org/r/1198281 (https://phabricator.wikimedia.org/T365259) (owner: ''Jelto)'
|
|
2025-10-28 08:24:55
|
<icinga-wm>
|
RECOVERY - Host ml-serve2001 is UP: PING OK - Packet loss = 0%, RTA = 30.50 ms
|
|
2025-10-28 08:25:54
|
<logmsgbot>
|
!log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
|
|
2025-10-28 08:26:21
|
<logmsgbot>
|
!log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
|
|
2025-10-28 08:27:48
|
<wikibugs>
|
('PS7) ''Elukey: Add the sre.hosts.powercycle cookbook [cookbooks] - ''https://gerrit.wikimedia.org/r/1198928'
|
|
2025-10-28 08:28:07
|
<logmsgbot>
|
!log kharlan@deploy2002 kharlan: Backport for [[gerrit:1199026|CheckUser: Enable SI on metawiki and loginwiki (T408428)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
|
|
2025-10-28 08:28:12
|
<stashbot>
|
T408428: Suggested investigations: Enable on Metawiki and Loginwiki - https://phabricator.wikimedia.org/T408428
|
|
2025-10-28 08:28:38
|
<moritzm>
|
!log installing openjdk-11 security updates
|
|
2025-10-28 08:28:41
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
|
2025-10-28 08:29:04
|
<jinxer-wm>
|
RESOLVED: KubernetesCalicoDown: ml-serve2001.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlserve&var-instance=ml-serve2001.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
|
|
2025-10-28 08:29:38
|
<kostajh>
|
testing
|
|
2025-10-28 08:29:55
|
<logmsgbot>
|
!log elukey@cumin1003 START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve2001.codfw.wmnet
|
|
2025-10-28 08:29:58
|
<logmsgbot>
|
!log elukey@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve2001.codfw.wmnet
|
|
2025-10-28 08:33:09
|
<logmsgbot>
|
!log kharlan@deploy2002 kharlan: Continuing with sync
|
|
2025-10-28 08:34:06
|
<wikibugs>
|
('PS1) ''Santiago Faci: xLab: Deploying v1.1.0 release to staging [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199228 (https://phabricator.wikimedia.org/T406729)'
|
|
2025-10-28 08:34:53
|
<wikibugs>
|
('PS1) ''Brouberol: opensearch-operator: add a separator between tenant role and rolebinding resources [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199230 (https://phabricator.wikimedia.org/T404874)'
|
|
2025-10-28 08:35:30
|
<wikibugs>
|
('PS2) ''Santiago Faci: xLab: Deploying v1.1.0 release to staging [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199228 (https://phabricator.wikimedia.org/T406729)'
|
|
2025-10-28 08:36:31
|
<wikibugs>
|
('PS3) ''Santiago Faci: xLab: Deploying v1.1.0 release to staging [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199228 (https://phabricator.wikimedia.org/T406729)'
|
|
2025-10-28 08:46:15
|
<wikibugs>
|
('PS1) ''Kosta Harlan: hCaptcha: Enable on loginwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199231 (https://phabricator.wikimedia.org/T408428)'
|
|
2025-10-28 08:49:07
|
<logmsgbot>
|
!log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1199026|CheckUser: Enable SI on metawiki and loginwiki (T408428)]] (duration: 46m 57s)
|
|
2025-10-28 08:49:16
|
<stashbot>
|
T408428: Suggested investigations: Enable on Metawiki and Loginwiki - https://phabricator.wikimedia.org/T408428
|
|
2025-10-28 08:49:30
|
<kostajh>
|
I'm going to sync another patch, unless someone else needs to deploy
|
|
2025-10-28 08:49:36
|
<kostajh>
|
jouncebot: nowandnext
|
|
2025-10-28 08:49:36
|
<jouncebot>
|
No deployments scheduled for the next 1 hour(s) and 10 minute(s)
|
|
2025-10-28 08:49:36
|
<jouncebot>
|
In 1 hour(s) and 10 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T1000)
|
|
2025-10-28 08:50:13
|
<wikibugs>
|
('CR) ''Mszwarc: [C:''+1] hCaptcha: Enable on loginwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199231 (https://phabricator.wikimedia.org/T408428) (owner: ''Kosta Harlan)'
|
|
2025-10-28 08:50:41
|
<wikibugs>
|
('CR) ''TrainBranchBot: [C:''+2] "Approved by kharlan@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199231 (https://phabricator.wikimedia.org/T408428) (owner: ''Kosta Harlan)'
|
|
2025-10-28 08:51:21
|
<wikibugs>
|
('PS3) ''Arthur taylor: Enable the MEX / wbui2025 beta feature on testwikidata [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1197613 (https://phabricator.wikimedia.org/T407737)'
|
|
2025-10-28 08:51:33
|
<wikibugs>
|
('PS8) ''Elukey: Add the sre.hosts.powercycle cookbook [cookbooks] - ''https://gerrit.wikimedia.org/r/1198928'
|
|
2025-10-28 08:51:38
|
<wikibugs>
|
('Merged) ''jenkins-bot: hCaptcha: Enable on loginwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199231 (https://phabricator.wikimedia.org/T408428) (owner: ''Kosta Harlan)'
|
|
2025-10-28 08:52:06
|
<logmsgbot>
|
!log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1199231|hCaptcha: Enable on loginwiki (T408428)]]
|
|
2025-10-28 08:53:11
|
<wikibugs>
|
('PS9) ''Elukey: Add the sre.hosts.powercycle cookbook [cookbooks] - ''https://gerrit.wikimedia.org/r/1198928'
|
|
2025-10-28 08:53:38
|
<logmsgbot>
|
!log elukey@cumin2002 START - Cookbook sre.hosts.powercycle for host ml-serve2001
|
|
2025-10-28 08:53:52
|
<logmsgbot>
|
!log elukey@cumin2002 END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host ml-serve2001
|
|
2025-10-28 08:54:47
|
<wikibugs>
|
('CR) ''DCausse: [C:''+1] cirrus: Start near match A/B test (''1 comment) [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199054 (https://phabricator.wikimedia.org/T408154) (owner: ''Ebernhardson)'
|
|
2025-10-28 08:55:27
|
<icinga-wm>
|
PROBLEM - Host ml-serve2001 is DOWN: PING CRITICAL - Packet loss = 100%
|
|
2025-10-28 08:55:28
|
<jinxer-wm>
|
FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
|
|
2025-10-28 08:56:31
|
<logmsgbot>
|
!log kharlan@deploy2002 kharlan: Backport for [[gerrit:1199231|hCaptcha: Enable on loginwiki (T408428)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
|
|
2025-10-28 08:56:50
|
<stashbot>
|
T408428: Suggested investigations: Enable on Metawiki and Loginwiki - https://phabricator.wikimedia.org/T408428
|
|
2025-10-28 08:56:55
|
<icinga-wm>
|
RECOVERY - Host ml-serve2001 is UP: PING OK - Packet loss = 0%, RTA = 30.36 ms
|
|
2025-10-28 08:57:40
|
<jinxer-wm>
|
FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 08:58:26
|
<wikibugs>
|
('CR) ''Brouberol: [C:''+2] opensearch-operator: add a separator between tenant role and rolebinding resources [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199230 (https://phabricator.wikimedia.org/T404874) (owner: ''Brouberol)'
|
|
2025-10-28 08:58:45
|
<logmsgbot>
|
!log kharlan@deploy2002 kharlan: Continuing with sync
|
|
2025-10-28 08:59:55
|
<logmsgbot>
|
!log jmm@cumin2002 START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: OpenJDK security updates - jmm@cumin2002
|
|
2025-10-28 08:59:58
|
<wikibugs>
|
('PS1) ''Gehel: Hadoop: Introduce tmpreaper to cleanup /tmp [puppet] - ''https://gerrit.wikimedia.org/r/1199233 (https://phabricator.wikimedia.org/T396582)'
|
|
2025-10-28 09:02:01
|
<wikibugs>
|
('CR) ''CI reject: [V:''-1] Hadoop: Introduce tmpreaper to cleanup /tmp [puppet] - ''https://gerrit.wikimedia.org/r/1199233 (https://phabricator.wikimedia.org/T396582) (owner: ''Gehel)'
|
|
2025-10-28 09:02:46
|
<logmsgbot>
|
!log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
|
|
2025-10-28 09:05:17
|
<logmsgbot>
|
!log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
|
|
2025-10-28 09:06:59
|
<wikibugs>
|
('CR) ''Clément Goubert: [C:''+1] Route /page/lint(.*) to the gateway on test2wiki [puppet] - ''https://gerrit.wikimedia.org/r/1199032 (https://phabricator.wikimedia.org/T384216) (owner: ''Aaron Schulz)'
|
|
2025-10-28 09:07:15
|
<wikibugs>
|
('CR) ''Filippo Giunchedi: "> > Nice find! Yes I think that ought to work and cater for module unload too. And yes I think there shouldn't be too many modules." [puppet] - ''https://gerrit.wikimedia.org/r/1198155 (https://phabricator.wikimedia.org/T407726) (owner: ''JHathaway)'
|
|
2025-10-28 09:08:40
|
<logmsgbot>
|
!log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1199231|hCaptcha: Enable on loginwiki (T408428)]] (duration: 16m 35s)
|
|
2025-10-28 09:08:45
|
<stashbot>
|
T408428: Suggested investigations: Enable on Metawiki and Loginwiki - https://phabricator.wikimedia.org/T408428
|
|
2025-10-28 09:14:40
|
<wikibugs>
|
('CR) ''Brouberol: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1199233 (https://phabricator.wikimedia.org/T396582) (owner: ''Gehel)'
|
|
2025-10-28 09:14:44
|
<wikibugs>
|
('CR) ''Brouberol: Hadoop: Introduce tmpreaper to cleanup /tmp (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1199233 (https://phabricator.wikimedia.org/T396582) (owner: ''Gehel)'
|
|
2025-10-28 09:15:50
|
<godog>
|
gehel: FYI these days systemd-tmpfiles has replaced tmpreaper, check out e.g. modules/icinga/manifests/init.pp
|
|
2025-10-28 09:20:04
|
<logmsgbot>
|
!log jmm@cumin2002 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: OpenJDK security updates - jmm@cumin2002
|
|
2025-10-28 09:20:28
|
<gehel>
|
godog: Oh, nice! I'm too old school!
|
|
2025-10-28 09:21:56
|
<godog>
|
nice indeed, one line config file and you're done
|
|
2025-10-28 09:22:41
|
<wikibugs>
|
('CR) ''Elukey: [C:''+2] Use Thanos rules for Pyrra error metrics for xLab [puppet] - ''https://gerrit.wikimedia.org/r/1199023 (https://phabricator.wikimedia.org/T398869) (owner: ''Dr0ptp4kt)'
|
|
2025-10-28 09:29:06
|
<wikibugs>
|
('Abandoned) ''Gehel: Hadoop: Introduce tmpreaper to cleanup /tmp [puppet] - ''https://gerrit.wikimedia.org/r/1199233 (https://phabricator.wikimedia.org/T396582) (owner: ''Gehel)'
|
|
2025-10-28 09:30:52
|
<wikibugs>
|
('PS1) ''Majavah: P:toolforge::k8s::haproxy: Use hourly logrotate [puppet] - ''https://gerrit.wikimedia.org/r/1199238 (https://phabricator.wikimedia.org/T408457)'
|
|
2025-10-28 09:30:56
|
<wikibugs>
|
('CR) ''Elukey: LVS: Add druid-public-coordinator to service list (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1198499 (https://phabricator.wikimedia.org/T406222) (owner: ''Stevemunene)'
|
|
2025-10-28 09:31:32
|
<wikibugs>
|
('CR) ''Elukey: LVS: etcd data for druid-public-coordinator (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1198498 (https://phabricator.wikimedia.org/T406222) (owner: ''Stevemunene)'
|
|
2025-10-28 09:34:13
|
<logmsgbot>
|
!log klausman@cumin1003 START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Roll-restart for Java security updates - klausman@cumin1003
|
|
2025-10-28 09:36:43
|
<logmsgbot>
|
!log cgoubert@cumin1003 START - Cookbook sre.dns.netbox
|
|
2025-10-28 09:36:45
|
<wikibugs>
|
'SRE, ''envoy, ''serviceops, ''Patch-For-Review: Upgrade Envoy to v1.29.12 - https://phabricator.wikimedia.org/T403663#11317841 (''LSobanski) Untagging #collaboration-services based on https://phabricator.wikimedia.org/T403663#11196043'
|
|
2025-10-28 09:37:12
|
<wikibugs>
|
('PS1) ''Gehel: Hadoop: cleanup /tmp with systemd::tmpfile [puppet] - ''https://gerrit.wikimedia.org/r/1199239 (https://phabricator.wikimedia.org/T396582)'
|
|
2025-10-28 09:38:07
|
<wikibugs>
|
('CR) ''Stevemunene: LVS: Add druid-public-coordinator to service list (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1198499 (https://phabricator.wikimedia.org/T406222) (owner: ''Stevemunene)'
|
|
2025-10-28 09:38:27
|
<wikibugs>
|
('CR) ''Arthur taylor: Enable the MEX / wbui2025 beta feature on testwikidata [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1197613 (https://phabricator.wikimedia.org/T407737) (owner: ''Arthur taylor)'
|
|
2025-10-28 09:39:32
|
<logmsgbot>
|
!log cgoubert@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
|
|
2025-10-28 09:39:40
|
<jinxer-wm>
|
FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 09:39:47
|
<logmsgbot>
|
!log cgoubert@cumin1003 START - Cookbook sre.dns.netbox
|
|
2025-10-28 09:39:54
|
<wikibugs>
|
('PS2) ''Gehel: Hadoop: cleanup /tmp with systemd::tmpfile [puppet] - ''https://gerrit.wikimedia.org/r/1199239 (https://phabricator.wikimedia.org/T396582)'
|
|
2025-10-28 09:40:07
|
<wikibugs>
|
('CR) ''Gehel: "check-experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1199239 (https://phabricator.wikimedia.org/T396582) (owner: ''Gehel)'
|
|
2025-10-28 09:40:13
|
<wikibugs>
|
('CR) ''Brouberol: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1199239 (https://phabricator.wikimedia.org/T396582) (owner: ''Gehel)'
|
|
2025-10-28 09:41:00
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Traffic, ''Release-Engineering-Team (Radar), ''WMF-NDA: Deploy a TCP proxy across all DCs - https://phabricator.wikimedia.org/T408532 (''LSobanski) ''NEW'
|
|
2025-10-28 09:41:29
|
<wikibugs>
|
('CR) ''FNegri: [C:''+1] P:toolforge::k8s::haproxy: Use hourly logrotate [puppet] - ''https://gerrit.wikimedia.org/r/1199238 (https://phabricator.wikimedia.org/T408457) (owner: ''Majavah)'
|
|
2025-10-28 09:41:49
|
<wikibugs>
|
('PS1) ''Majavah: aptrepo: Retire kubeadm/1.29 components [puppet] - ''https://gerrit.wikimedia.org/r/1199240'
|
|
2025-10-28 09:41:50
|
<wikibugs>
|
('PS1) ''Majavah: aptrepo: Import Kubeadm/1.31 packages [puppet] - ''https://gerrit.wikimedia.org/r/1199241 (https://phabricator.wikimedia.org/T372697)'
|
|
2025-10-28 09:41:58
|
<wikibugs>
|
('CR) ''CI reject: [V:''-1] Hadoop: cleanup /tmp with systemd::tmpfile [puppet] - ''https://gerrit.wikimedia.org/r/1199239 (https://phabricator.wikimedia.org/T396582) (owner: ''Gehel)'
|
|
2025-10-28 09:42:05
|
<wikibugs>
|
('CR) ''Majavah: [C:''+2] P:toolforge::k8s::haproxy: Use hourly logrotate [puppet] - ''https://gerrit.wikimedia.org/r/1199238 (https://phabricator.wikimedia.org/T408457) (owner: ''Majavah)'
|
|
2025-10-28 09:42:32
|
<logmsgbot>
|
!log cgoubert@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
|
|
2025-10-28 09:42:54
|
<wikibugs>
|
('PS3) ''Gehel: Hadoop: cleanup /tmp with systemd::tmpfile [puppet] - ''https://gerrit.wikimedia.org/r/1199239 (https://phabricator.wikimedia.org/T396582)'
|
|
2025-10-28 09:42:58
|
<logmsgbot>
|
!log cgoubert@cumin1003 START - Cookbook sre.hosts.reboot-single for host rdb1014.eqiad.wmnet
|
|
2025-10-28 09:43:07
|
<wikibugs>
|
('CR) ''Gehel: "check-experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1199239 (https://phabricator.wikimedia.org/T396582) (owner: ''Gehel)'
|
|
2025-10-28 09:43:20
|
<wikibugs>
|
('CR) ''Brouberol: Hadoop: cleanup /tmp with systemd::tmpfile (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1199239 (https://phabricator.wikimedia.org/T396582) (owner: ''Gehel)'
|
|
2025-10-28 09:43:35
|
<wikibugs>
|
('CR) ''Brouberol: Hadoop: cleanup /tmp with systemd::tmpfile (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1199239 (https://phabricator.wikimedia.org/T396582) (owner: ''Gehel)'
|
|
2025-10-28 09:43:42
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Traffic, ''Release-Engineering-Team (Radar), ''WMF-NDA: Deploy a TCP proxy across all DCs - https://phabricator.wikimedia.org/T408532#11317892 (''LSobanski) p:''Triage→''High'
|
|
2025-10-28 09:43:59
|
<wikibugs>
|
('CR) ''Brouberol: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1199239 (https://phabricator.wikimedia.org/T396582) (owner: ''Gehel)'
|
|
2025-10-28 09:44:21
|
<wikibugs>
|
('PS1) ''Jelto: aptrepo::staging: add job to clear incoming folder [puppet] - ''https://gerrit.wikimedia.org/r/1199243 (https://phabricator.wikimedia.org/T408527)'
|
|
2025-10-28 09:44:21
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Traffic, ''Release-Engineering-Team (Radar), ''WMF-NDA: Deploy a TCP proxy across all DCs - https://phabricator.wikimedia.org/T408532#11317895 (''LSobanski)'
|
|
2025-10-28 09:44:22
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vm-requests, ''Patch-For-Review: Site: 14 VMs request for tcp-proxy (gerrit-ssh-proxy) - https://phabricator.wikimedia.org/T408064#11317894 (''LSobanski)'
|
|
2025-10-28 09:44:27
|
<wikibugs>
|
('CR) ''Gehel: Hadoop: cleanup /tmp with systemd::tmpfile (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1199239 (https://phabricator.wikimedia.org/T396582) (owner: ''Gehel)'
|
|
2025-10-28 09:45:01
|
<wikibugs>
|
('Abandoned) ''Brouberol: growthbook: remove all traces of mongoDB from the chart [deployment-charts] - ''https://gerrit.wikimedia.org/r/1197589 (https://phabricator.wikimedia.org/T406579) (owner: ''Brouberol)'
|
|
2025-10-28 09:45:30
|
<wikibugs>
|
('CR) ''Muehlenhoff: [C:''+1] "Looks good, two nits inline" [puppet] - ''https://gerrit.wikimedia.org/r/1199239 (https://phabricator.wikimedia.org/T396582) (owner: ''Gehel)'
|
|
2025-10-28 09:45:48
|
<wikibugs>
|
('CR) ''Stevemunene: [C:''+1] Definition of a ferretdb chart [deployment-charts] - ''https://gerrit.wikimedia.org/r/1198977 (https://phabricator.wikimedia.org/T406579) (owner: ''Brouberol)'
|
|
2025-10-28 09:46:25
|
<wikibugs>
|
('CR) ''Stevemunene: [C:''+1] ferretdb-growthbook: define helmfile and values [deployment-charts] - ''https://gerrit.wikimedia.org/r/1198978 (https://phabricator.wikimedia.org/T406579) (owner: ''Brouberol)'
|
|
2025-10-28 09:48:52
|
<logmsgbot>
|
!log cgoubert@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1014.eqiad.wmnet
|
|
2025-10-28 09:49:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 09:49:13
|
<wikibugs>
|
('CR) ''Brouberol: [C:''+2] cloudnative-pg-cluster: allow direct access to the DB when pooling is disabled [deployment-charts] - ''https://gerrit.wikimedia.org/r/1198974 (https://phabricator.wikimedia.org/T406578) (owner: ''Brouberol)'
|
|
2025-10-28 09:49:16
|
<wikibugs>
|
('CR) ''Brouberol: [C:''+2] cloudnative-pg-cluster: set env vars disabling s3 security feature not implemented in radosgw [deployment-charts] - ''https://gerrit.wikimedia.org/r/1198975 (https://phabricator.wikimedia.org/T406578) (owner: ''Brouberol)'
|
|
2025-10-28 09:49:17
|
<wikibugs>
|
('CR) ''Brouberol: [C:''+2] postgresql-growthbook: define a custom PG image, libraries and post init SQL [deployment-charts] - ''https://gerrit.wikimedia.org/r/1198514 (https://phabricator.wikimedia.org/T406578) (owner: ''Brouberol)'
|
|
2025-10-28 09:49:24
|
<logmsgbot>
|
!log cgoubert@cumin1003 START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
|
|
2025-10-28 09:49:25
|
<wikibugs>
|
('CR) ''Brouberol: [C:''+2] Definition of a ferretdb chart [deployment-charts] - ''https://gerrit.wikimedia.org/r/1198977 (https://phabricator.wikimedia.org/T406579) (owner: ''Brouberol)'
|
|
2025-10-28 09:49:27
|
<wikibugs>
|
('CR) ''Brouberol: [C:''+2] ferretdb-growthbook: define helmfile and values [deployment-charts] - ''https://gerrit.wikimedia.org/r/1198978 (https://phabricator.wikimedia.org/T406579) (owner: ''Brouberol)'
|
|
2025-10-28 09:50:11
|
<wikibugs>
|
('PS4) ''Gehel: Hadoop: cleanup /tmp with systemd::tmpfile [puppet] - ''https://gerrit.wikimedia.org/r/1199239 (https://phabricator.wikimedia.org/T396582)'
|
|
2025-10-28 09:50:18
|
<wikibugs>
|
('CR) ''Gehel: Hadoop: cleanup /tmp with systemd::tmpfile (''2 comments) [puppet] - ''https://gerrit.wikimedia.org/r/1199239 (https://phabricator.wikimedia.org/T396582) (owner: ''Gehel)'
|
|
2025-10-28 09:50:28
|
<jinxer-wm>
|
FIRING: [5x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 09:51:14
|
<wikibugs>
|
('Merged) ''jenkins-bot: cloudnative-pg-cluster: allow direct access to the DB when pooling is disabled [deployment-charts] - ''https://gerrit.wikimedia.org/r/1198974 (https://phabricator.wikimedia.org/T406578) (owner: ''Brouberol)'
|
|
2025-10-28 09:51:28
|
<wikibugs>
|
('Merged) ''jenkins-bot: cloudnative-pg-cluster: set env vars disabling s3 security feature not implemented in radosgw [deployment-charts] - ''https://gerrit.wikimedia.org/r/1198975 (https://phabricator.wikimedia.org/T406578) (owner: ''Brouberol)'
|
|
2025-10-28 09:51:42
|
<wikibugs>
|
('Merged) ''jenkins-bot: postgresql-growthbook: define a custom PG image, libraries and post init SQL [deployment-charts] - ''https://gerrit.wikimedia.org/r/1198514 (https://phabricator.wikimedia.org/T406578) (owner: ''Brouberol)'
|
|
2025-10-28 09:51:52
|
<wikibugs>
|
('Merged) ''jenkins-bot: Definition of a ferretdb chart [deployment-charts] - ''https://gerrit.wikimedia.org/r/1198977 (https://phabricator.wikimedia.org/T406579) (owner: ''Brouberol)'
|
|
2025-10-28 09:51:54
|
<wikibugs>
|
('Merged) ''jenkins-bot: ferretdb-growthbook: define helmfile and values [deployment-charts] - ''https://gerrit.wikimedia.org/r/1198978 (https://phabricator.wikimedia.org/T406579) (owner: ''Brouberol)'
|
|
2025-10-28 09:51:57
|
<logmsgbot>
|
!log klausman@cumin1003 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Roll-restart for Java security updates - klausman@cumin1003
|
|
2025-10-28 09:52:15
|
<logmsgbot>
|
!log klausman@cumin1003 START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Roll-restart for Java security updates - klausman@cumin1003
|
|
2025-10-28 09:53:20
|
<wikibugs>
|
('CR) ''Mark Bergsma: [C:''+1] admin: add dpogorzelski to ops-limited [puppet] - ''https://gerrit.wikimedia.org/r/1198343 (https://phabricator.wikimedia.org/T407955) (owner: ''Kamila Součková)'
|
|
2025-10-28 09:54:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 09:54:05
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to ops-limited for dpogorzelski - https://phabricator.wikimedia.org/T407955#11317933 (''mark) Approved in Gerrit!'
|
|
2025-10-28 09:54:07
|
<wikibugs>
|
('PS2) ''Tiziano Fogli: nrpe2nodexp: use service description as alertname [puppet] - ''https://gerrit.wikimedia.org/r/1199242 (https://phabricator.wikimedia.org/T395446)'
|
|
2025-10-28 09:54:18
|
<klausman>
|
looking at that alert
|
|
2025-10-28 09:55:27
|
<logmsgbot>
|
!log cgoubert@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
|
|
2025-10-28 09:55:59
|
<wikibugs>
|
('CR) ''Brouberol: [C:''+1] Hadoop: cleanup /tmp with systemd::tmpfile [puppet] - ''https://gerrit.wikimedia.org/r/1199239 (https://phabricator.wikimedia.org/T396582) (owner: ''Gehel)'
|
|
2025-10-28 09:59:57
|
<wikibugs>
|
('CR) ''Elukey: LVS: Add druid-public-coordinator to service list (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1198499 (https://phabricator.wikimedia.org/T406222) (owner: ''Stevemunene)'
|
|
2025-10-28 10:00:05
|
<jouncebot>
|
Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T1000)
|
|
2025-10-28 10:01:34
|
<wikibugs>
|
('CR) ''Stevemunene: LVS: etcd data for druid-public-coordinator (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1198498 (https://phabricator.wikimedia.org/T406222) (owner: ''Stevemunene)'
|
|
2025-10-28 10:02:53
|
<wikibugs>
|
('CR) ''Clément Goubert: wikikube: Add wikikube-worker2[248-330] (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1181753 (https://phabricator.wikimedia.org/T390859) (owner: ''Jasmine)'
|
|
2025-10-28 10:03:44
|
<wikibugs>
|
('PS2) ''Jelto: aptrepo::staging: add job to clear incoming folder [puppet] - ''https://gerrit.wikimedia.org/r/1199243 (https://phabricator.wikimedia.org/T408527)'
|
|
2025-10-28 10:03:53
|
<wikibugs>
|
('CR) ''Clément Goubert: [C:''+2] taskgen: Update calico IPPool check [puppet] - ''https://gerrit.wikimedia.org/r/1191671 (https://phabricator.wikimedia.org/T375845) (owner: ''Clément Goubert)'
|
|
2025-10-28 10:05:20
|
<wikibugs>
|
('CR) ''Jelto: [V:''+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7482/co" [puppet] - ''https://gerrit.wikimedia.org/r/1199243 (https://phabricator.wikimedia.org/T408527) (owner: ''Jelto)'
|
|
2025-10-28 10:05:28
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 10:05:32
|
<wikibugs>
|
('PS2) ''Daniel Kinzler: rest-gateway: Create metrics mapping for ratelimit service [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199008 (https://phabricator.wikimedia.org/T408183)'
|
|
2025-10-28 10:09:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 10:09:22
|
<wikibugs>
|
('PS1) ''JavierMonton: Disable default user-agent collection. [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199246 (https://phabricator.wikimedia.org/T384964)'
|
|
2025-10-28 10:09:37
|
<jinxer-wm>
|
FIRING: Failing Rate (Dashboard - Desktop & Mobile): <no value> - https://alerts.wikimedia.org/?q=alertname%3DFailing+Rate+%28Dashboard+-+Desktop+%26+Mobile%29
|
|
2025-10-28 10:10:00
|
<logmsgbot>
|
!log klausman@cumin1003 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Roll-restart for Java security updates - klausman@cumin1003
|
|
2025-10-28 10:10:32
|
<wikibugs>
|
('PS1) ''Fabfur: P:cache:haproxy: introduce ua classes [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060)'
|
|
2025-10-28 10:13:06
|
<wikibugs>
|
('PS1) ''Huei Tan: alertmanager: route Language and Product Localization team alerts [puppet] - ''https://gerrit.wikimedia.org/r/1199248 (https://phabricator.wikimedia.org/T376535)'
|
|
2025-10-28 10:14:14
|
<wikibugs>
|
('PS2) ''Huei Tan: alertmanager: route Language and Product Localization team alerts [puppet] - ''https://gerrit.wikimedia.org/r/1199248 (https://phabricator.wikimedia.org/T376535)'
|
|
2025-10-28 10:14:21
|
<wikibugs>
|
('PS3) ''Huei Tan: alertmanager: route Language and Product Localization team alerts [puppet] - ''https://gerrit.wikimedia.org/r/1199248 (https://phabricator.wikimedia.org/T376535)'
|
|
2025-10-28 10:14:25
|
<wikibugs>
|
'sre-alert-triage, ''Infrastructure-Foundations, ''netops: Alert in need of triage: PeeringBGPDown (instance cr3-eqsin:9804) - https://phabricator.wikimedia.org/T407833#11318022 (''cmooney) ''Open→''Resolved I removed these additional sessions last week but got distracted and didn't come back to edi...'
|
|
2025-10-28 10:20:28
|
<jinxer-wm>
|
FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
|
|
2025-10-28 10:22:05
|
<wikibugs>
|
('CR) ''Klausman: [C:''+1] admin: add dpogorzelski to ops-limited [puppet] - ''https://gerrit.wikimedia.org/r/1198343 (https://phabricator.wikimedia.org/T407955) (owner: ''Kamila Součková)'
|
|
2025-10-28 10:26:59
|
<wikibugs>
|
('CR) ''Elukey: LVS: etcd data for druid-public-coordinator (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1198498 (https://phabricator.wikimedia.org/T406222) (owner: ''Stevemunene)'
|
|
2025-10-28 10:28:47
|
<wikibugs>
|
('CR) ''Hnowlan: [C:''+1] Route /page/lint(.*) to the gateway on test2wiki [puppet] - ''https://gerrit.wikimedia.org/r/1199032 (https://phabricator.wikimedia.org/T384216) (owner: ''Aaron Schulz)'
|
|
2025-10-28 10:29:37
|
<jinxer-wm>
|
RESOLVED: Failing Rate (Dashboard - Desktop & Mobile): <no value> - https://alerts.wikimedia.org/?q=alertname%3DFailing+Rate+%28Dashboard+-+Desktop+%26+Mobile%29
|
|
2025-10-28 10:29:41
|
<wikibugs>
|
('CR) ''Hnowlan: [C:''+1] trafficserver: action api to rest-gateway group0 10% [puppet] - ''https://gerrit.wikimedia.org/r/1198929 (https://phabricator.wikimedia.org/T408223) (owner: ''Clément Goubert)'
|
|
2025-10-28 10:30:23
|
<wikibugs>
|
('CR) ''Stevemunene: LVS: etcd data for druid-public-coordinator (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1198498 (https://phabricator.wikimedia.org/T406222) (owner: ''Stevemunene)'
|
|
2025-10-28 10:30:51
|
<wikibugs>
|
('CR) ''Clément Goubert: [C:''+2] Route /page/lint(.*) to the gateway on test2wiki [puppet] - ''https://gerrit.wikimedia.org/r/1199032 (https://phabricator.wikimedia.org/T384216) (owner: ''Aaron Schulz)'
|
|
2025-10-28 10:32:14
|
<wikibugs>
|
('CR) ''Fabfur: "as @Elukey correctly pointed out, the procedure needs to be followed here, happy to review it again later" [puppet] - ''https://gerrit.wikimedia.org/r/1198498 (https://phabricator.wikimedia.org/T406222) (owner: ''Stevemunene)'
|
|
2025-10-28 10:34:27
|
<wikibugs>
|
('CR) ''Muehlenhoff: [C:''+1] "Looks good" [puppet] - ''https://gerrit.wikimedia.org/r/1199239 (https://phabricator.wikimedia.org/T396582) (owner: ''Gehel)'
|
|
2025-10-28 10:37:02
|
<wikibugs>
|
'SRE-SLO, ''Experimentation Lab (Experiment Platform Sprint 14), ''OKR-Work: Create Pyrra SLOs for xLab - https://phabricator.wikimedia.org/T398869#11318126 (''elukey)'
|
|
2025-10-28 10:37:46
|
<wikibugs>
|
('CR) ''Dpogorzelski: [C:''+1] admin: add dpogorzelski to ops-limited [puppet] - ''https://gerrit.wikimedia.org/r/1198343 (https://phabricator.wikimedia.org/T407955) (owner: ''Kamila Součková)'
|
|
2025-10-28 10:38:01
|
<wikibugs>
|
'SRE-SLO, ''Experimentation Lab (Experiment Platform Sprint 14), ''OKR-Work: Create Pyrra SLOs for xLab - https://phabricator.wikimedia.org/T398869#11318132 (''elukey) We finally have all three SLO published in Pyrra: https://slo.wikimedia.org/?search=xlab Let's wait a couple of weeks to observe the new SL...'
|
|
2025-10-28 10:41:58
|
<wikibugs>
|
('CR) ''Clément Goubert: [C:''+2] trafficserver: action api to rest-gateway group0 10% [puppet] - ''https://gerrit.wikimedia.org/r/1198929 (https://phabricator.wikimedia.org/T408223) (owner: ''Clément Goubert)'
|
|
2025-10-28 10:43:27
|
<wikibugs>
|
('CR) ''Muehlenhoff: "That would work, alternative proposal inline (which doesn't interfere with people working late in the American timezones)." [puppet] - ''https://gerrit.wikimedia.org/r/1199243 (https://phabricator.wikimedia.org/T408527) (owner: ''Jelto)'
|
|
2025-10-28 10:44:32
|
<wikibugs>
|
('PS1) ''Fabfur: P:cache:haproxy: don't repeat contact validation regex [puppet] - ''https://gerrit.wikimedia.org/r/1199251 (https://phabricator.wikimedia.org/T408060)'
|
|
2025-10-28 10:44:52
|
<wikibugs>
|
('CR) ''Fabfur: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060) (owner: ''Fabfur)'
|
|
2025-10-28 10:45:33
|
<wikibugs>
|
('CR) ''Hnowlan: [C:''+1] trafficserver: action api to rest-gateway group0 100% [puppet] - ''https://gerrit.wikimedia.org/r/1198931 (https://phabricator.wikimedia.org/T408223) (owner: ''Clément Goubert)'
|
|
2025-10-28 10:45:57
|
<wikibugs>
|
('CR) ''Hnowlan: [C:''+1] trafficserver: action api to rest-gateway group1 10% [puppet] - ''https://gerrit.wikimedia.org/r/1198932 (https://phabricator.wikimedia.org/T408223) (owner: ''Clément Goubert)'
|
|
2025-10-28 10:46:11
|
<wikibugs>
|
('CR) ''Hnowlan: [C:''+1] trafficserver: action api to rest-gateway group1 50% [puppet] - ''https://gerrit.wikimedia.org/r/1198933 (https://phabricator.wikimedia.org/T408223) (owner: ''Clément Goubert)'
|
|
2025-10-28 10:46:22
|
<wikibugs>
|
('CR) ''Hnowlan: [C:''+1] trafficserver: action api to rest-gateway group1 100% [puppet] - ''https://gerrit.wikimedia.org/r/1198934 (https://phabricator.wikimedia.org/T408223) (owner: ''Clément Goubert)'
|
|
2025-10-28 10:46:47
|
<wikibugs>
|
('CR) ''Hnowlan: [C:''+1] trafficserver: action api to rest-gateway group2 10% [puppet] - ''https://gerrit.wikimedia.org/r/1198935 (https://phabricator.wikimedia.org/T408223) (owner: ''Clément Goubert)'
|
|
2025-10-28 10:47:02
|
<wikibugs>
|
('CR) ''Hnowlan: [C:''+1] trafficserver: action api to rest-gateway group2 50% [puppet] - ''https://gerrit.wikimedia.org/r/1198936 (https://phabricator.wikimedia.org/T408223) (owner: ''Clément Goubert)'
|
|
2025-10-28 10:47:11
|
<wikibugs>
|
('CR) ''Hnowlan: [C:''+1] trafficserver: action api to rest-gateway group2 100% [puppet] - ''https://gerrit.wikimedia.org/r/1198937 (https://phabricator.wikimedia.org/T408223) (owner: ''Clément Goubert)'
|
|
2025-10-28 10:47:24
|
<wikibugs>
|
('CR) ''Hnowlan: [C:''+1] trafficserver: action api to rest-gateway enwiki 10% [puppet] - ''https://gerrit.wikimedia.org/r/1198938 (https://phabricator.wikimedia.org/T408223) (owner: ''Clément Goubert)'
|
|
2025-10-28 10:50:03
|
<wikibugs>
|
('PS2) ''Clément Goubert: trafficserver: action api to rest-gateway group0 50% [puppet] - ''https://gerrit.wikimedia.org/r/1198930 (https://phabricator.wikimedia.org/T408223)'
|
|
2025-10-28 10:50:37
|
<moritzm>
|
!log installing openjdk-17 security updates
|
|
2025-10-28 10:50:40
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
|
2025-10-28 10:51:07
|
<wikibugs>
|
('CR) ''Hnowlan: [C:''+1] trafficserver: action api to rest-gateway enwiki 50% [puppet] - ''https://gerrit.wikimedia.org/r/1198939 (https://phabricator.wikimedia.org/T408223) (owner: ''Clément Goubert)'
|
|
2025-10-28 10:51:17
|
<wikibugs>
|
('CR) ''Hnowlan: [C:''+1] trafficserver: action api to rest-gateway enwiki 100% [puppet] - ''https://gerrit.wikimedia.org/r/1198940 (https://phabricator.wikimedia.org/T408223) (owner: ''Clément Goubert)'
|
|
2025-10-28 10:51:35
|
<wikibugs>
|
('CR) ''Hnowlan: [C:''+1] trafficserver: action api to rest-gateway cleanup [puppet] - ''https://gerrit.wikimedia.org/r/1198941 (https://phabricator.wikimedia.org/T408223) (owner: ''Clément Goubert)'
|
|
2025-10-28 10:57:25
|
<jinxer-wm>
|
RESOLVED: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 10:58:50
|
<logmsgbot>
|
!log zabe@deploy2002 helmfile [codfw] START helmfile.d/services/mw-experimental: apply
|
|
2025-10-28 11:00:03
|
<logmsgbot>
|
!log zabe@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
|
|
2025-10-28 11:11:50
|
<wikibugs>
|
('PS1) ''Stevemunene: druid: add druid-coordinator to druid public worker role [puppet] - ''https://gerrit.wikimedia.org/r/1199256 (https://phabricator.wikimedia.org/T406222)'
|
|
2025-10-28 11:14:51
|
<wikibugs>
|
('CR) ''Mahmoud-abdelsattar: [C:''+1] Enable the MEX / wbui2025 beta feature on testwikidata [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1197613 (https://phabricator.wikimedia.org/T407737) (owner: ''Arthur taylor)'
|
|
2025-10-28 11:14:54
|
<wikibugs>
|
('PS2) ''Stevemunene: druid: add druid-coordinator to druid public worker role [puppet] - ''https://gerrit.wikimedia.org/r/1199256 (https://phabricator.wikimedia.org/T406222)'
|
|
2025-10-28 11:20:08
|
<wikibugs>
|
('PS3) ''Stevemunene: LVS: etcd data for druid-public-coordinator [puppet] - ''https://gerrit.wikimedia.org/r/1198498 (https://phabricator.wikimedia.org/T406222)'
|
|
2025-10-28 11:20:12
|
<wikibugs>
|
('PS4) ''Stevemunene: LVS: Add druid-public-coordinator to service list [puppet] - ''https://gerrit.wikimedia.org/r/1198499 (https://phabricator.wikimedia.org/T406222)'
|
|
2025-10-28 11:21:24
|
<wikibugs>
|
('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, November 05 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1197613 (https://phabricator.wikimedia.org/T407737) (owner: ''Arthur taylor)'
|
|
2025-10-28 11:24:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 11:25:28
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 11:27:48
|
<wikibugs>
|
('PS1) ''Muehlenhoff: osm: Remove obsolete spec files [puppet] - ''https://gerrit.wikimedia.org/r/1199260 (https://phabricator.wikimedia.org/T381565)'
|
|
2025-10-28 11:29:06
|
<wikibugs>
|
('CR) ''Muehlenhoff: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1199260 (https://phabricator.wikimedia.org/T381565) (owner: ''Muehlenhoff)'
|
|
2025-10-28 11:29:26
|
<wikibugs>
|
('PS10) ''Elukey: Add the sre.hosts.powercycle cookbook [cookbooks] - ''https://gerrit.wikimedia.org/r/1198928'
|
|
2025-10-28 11:30:32
|
<logmsgbot>
|
!log elukey@cumin2002 START - Cookbook sre.hosts.powercycle for host ml-serve2001
|
|
2025-10-28 11:31:39
|
<icinga-wm>
|
PROBLEM - Host ml-serve2001 is DOWN: PING CRITICAL - Packet loss = 100%
|
|
2025-10-28 11:31:48
|
<Msz2001>
|
I'm going to do a deployment to private code, related to Suggested Investigations
|
|
2025-10-28 11:32:03
|
<wikibugs>
|
('CR) ''Elukey: [C:''+1] osm: Remove obsolete spec files [puppet] - ''https://gerrit.wikimedia.org/r/1199260 (https://phabricator.wikimedia.org/T381565) (owner: ''Muehlenhoff)'
|
|
2025-10-28 11:33:55
|
<icinga-wm>
|
RECOVERY - Host ml-serve2001 is UP: PING OK - Packet loss = 0%, RTA = 30.43 ms
|
|
2025-10-28 11:35:59
|
<wikibugs>
|
('CR) ''Muehlenhoff: [C:''+2] osm: Remove obsolete spec files [puppet] - ''https://gerrit.wikimedia.org/r/1199260 (https://phabricator.wikimedia.org/T381565) (owner: ''Muehlenhoff)'
|
|
2025-10-28 11:37:33
|
<wikibugs>
|
('PS1) ''Brouberol: cloudnative-pg-cluster: allow release values to override the pg_hba field [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199261 (https://phabricator.wikimedia.org/T406578)'
|
|
2025-10-28 11:37:56
|
<wikibugs>
|
('PS1) ''Brouberol: postgresql-growthbook: allow IPv4/6 remote TCP connections for the app user/db [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199262 (https://phabricator.wikimedia.org/T406578)'
|
|
2025-10-28 11:40:35
|
<logmsgbot>
|
!log elukey@cumin2002 END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host ml-serve2001
|
|
2025-10-28 11:41:07
|
<logmsgbot>
|
!log elukey@cumin2002 START - Cookbook sre.hosts.powercycle for host sretest2010
|
|
2025-10-28 11:42:12
|
<wikibugs>
|
'ops-eqiad, ''SRE, ''DC-Ops: Unresponsive management for ms-be1090.mgmt:22 - https://phabricator.wikimedia.org/T408478#11318289 (''Jclark-ctr)'
|
|
2025-10-28 11:42:13
|
<wikibugs>
|
'ops-eqiad, ''SRE, ''SRE-swift-storage, ''DC-Ops: Install new disk controllers to SM swift backends (eqiad) - https://phabricator.wikimedia.org/T400877#11318292 (''Jclark-ctr) →''Duplicate dup:''T408478'
|
|
2025-10-28 11:42:50
|
<wikibugs>
|
('PS1) ''Mvolz: Update Zotero to node22 [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199263 (https://phabricator.wikimedia.org/T393434)'
|
|
2025-10-28 11:42:53
|
<logmsgbot>
|
!log fceratto@cumin1003 START - Cookbook sre.hosts.decommission for hosts es2026.codfw.wmnet
|
|
2025-10-28 11:42:53
|
<logmsgbot>
|
!log elukey@cumin2002 END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010
|
|
2025-10-28 11:43:31
|
<wikibugs>
|
'ops-eqiad, ''SRE, ''SRE-swift-storage, ''DC-Ops: Install new disk controllers to SM swift backends (eqiad) - https://phabricator.wikimedia.org/T400877#11318295 (''Jclark-ctr) ''Duplicate→''Open Closed by mistake'
|
|
2025-10-28 11:44:07
|
<wikibugs>
|
'ops-eqiad, ''SRE, ''DC-Ops: Unresponsive management for ms-be1090.mgmt:22 - https://phabricator.wikimedia.org/T408478#11318299 (''Jclark-ctr) ''Open→''Resolved a:''Jclark-ctr Down due to work with card install T400877'
|
|
2025-10-28 11:44:34
|
<logmsgbot>
|
!log mvernon@cumin2002 START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on A:swift-fe-codfw
|
|
2025-10-28 11:45:40
|
<wikibugs>
|
('CR) ''Slyngshede: [C:''+1] admin: add dpogorzelski to ops-limited [puppet] - ''https://gerrit.wikimedia.org/r/1198343 (https://phabricator.wikimedia.org/T407955) (owner: ''Kamila Součková)'
|
|
2025-10-28 11:47:44
|
<wikibugs>
|
('PS1) ''Muehlenhoff: osm_sync_lag.sh: Fix default to current directory [puppet] - ''https://gerrit.wikimedia.org/r/1199265 (https://phabricator.wikimedia.org/T381565)'
|
|
2025-10-28 11:47:57
|
<wikibugs>
|
('CR) ''Stevemunene: [C:''+1] cloudnative-pg-cluster: allow release values to override the pg_hba field [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199261 (https://phabricator.wikimedia.org/T406578) (owner: ''Brouberol)'
|
|
2025-10-28 11:48:04
|
<wikibugs>
|
('CR) ''Stevemunene: [C:''+1] postgresql-growthbook: allow IPv4/6 remote TCP connections for the app user/db [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199262 (https://phabricator.wikimedia.org/T406578) (owner: ''Brouberol)'
|
|
2025-10-28 11:48:52
|
<logmsgbot>
|
!log fceratto@cumin1003 START - Cookbook sre.dns.netbox
|
|
2025-10-28 11:49:06
|
<wikibugs>
|
('CR) ''Brouberol: [C:''+2] cloudnative-pg-cluster: allow release values to override the pg_hba field [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199261 (https://phabricator.wikimedia.org/T406578) (owner: ''Brouberol)'
|
|
2025-10-28 11:49:08
|
<wikibugs>
|
('CR) ''Brouberol: [C:''+2] postgresql-growthbook: allow IPv4/6 remote TCP connections for the app user/db [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199262 (https://phabricator.wikimedia.org/T406578) (owner: ''Brouberol)'
|
|
2025-10-28 11:49:19
|
<wikibugs>
|
('PS2) ''Brouberol: postgresql-growthbook: allow IPv4/6 remote TCP connections for the app user/db [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199262 (https://phabricator.wikimedia.org/T406578)'
|
|
2025-10-28 11:50:43
|
<wikibugs>
|
('CR) ''Brouberol: [V:''+2 C:''+2] postgresql-growthbook: allow IPv4/6 remote TCP connections for the app user/db [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199262 (https://phabricator.wikimedia.org/T406578) (owner: ''Brouberol)'
|
|
2025-10-28 11:50:47
|
<wikibugs>
|
('CR) ''Brouberol: [V:''+2 C:''+2] cloudnative-pg-cluster: allow release values to override the pg_hba field [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199261 (https://phabricator.wikimedia.org/T406578) (owner: ''Brouberol)'
|
|
2025-10-28 11:54:33
|
<wikibugs>
|
('PS2) ''Fabfur: P:cache:haproxy: introduce ua classes [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060)'
|
|
2025-10-28 11:54:35
|
<wikibugs>
|
('CR) ''Fabfur: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060) (owner: ''Fabfur)'
|
|
2025-10-28 11:54:36
|
<logmsgbot>
|
fceratto@cumin1003 decommission (PID 372416) is awaiting input
|
|
2025-10-28 11:59:27
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to 'restricted' for neslihanturan - https://phabricator.wikimedia.org/T406590#11318342 (''Neslihan_Turan_WMDE) Hi, sorry for the delay. I had a problem accessing Slack but now I managed to sent my public key to Amir. My public key is already...'
|
|
2025-10-28 12:00:04
|
<jouncebot>
|
Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T1200)
|
|
2025-10-28 12:00:36
|
<Msz2001>
|
Noting that I'll finish my deployment to private code in 2-3 minutes
|
|
2025-10-28 12:01:16
|
<wikibugs>
|
'ops-eqiad, ''SRE, ''DC-Ops: Eqiad: row C/D switch refresh cabling task - https://phabricator.wikimedia.org/T396065#11318344 (''Jclark-ctr) @VRiley-WMF Hey, just a heads up — the fiber was installed with RX-to-RX and TX-to-TX, so the polarity wasn’t verified. Make sure to check polarity next time to avoid c...'
|
|
2025-10-28 12:04:38
|
<Msz2001>
|
!log Deployed changes to Suggested Investigations
|
|
2025-10-28 12:04:41
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
|
2025-10-28 12:04:44
|
<Msz2001>
|
I'm finished with deploying
|
|
2025-10-28 12:08:08
|
<wikibugs>
|
'ops-eqiad, ''SRE, ''DC-Ops: Eqiad: row C/D switch refresh cabling task - https://phabricator.wikimedia.org/T396065#11318379 (''cmooney) >>! In T396065#11318344, @Jclark-ctr wrote: > @cmooney link is up Ok great yep BGP looking good I've added it now. ` cmooney@ssw1-e1-eqiad> show bgp summary group core |...'
|
|
2025-10-28 12:08:51
|
<wikibugs>
|
('PS1) ''Muehlenhoff: maps: Stop installing osm2pgsql and osmborder [puppet] - ''https://gerrit.wikimedia.org/r/1199271 (https://phabricator.wikimedia.org/T381565)'
|
|
2025-10-28 12:09:14
|
<wikibugs>
|
('PS1) ''Cathal Mooney: ssw1-e1-eqiad: Add BGP peering to ssw1-d8-eqiad [homer/public] - ''https://gerrit.wikimedia.org/r/1199272 (https://phabricator.wikimedia.org/T396065)'
|
|
2025-10-28 12:12:05
|
<wikibugs>
|
('CR) ''Vgutierrez: [C:''-1] P:cache:haproxy: introduce ua classes (''4 comments) [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060) (owner: ''Fabfur)'
|
|
2025-10-28 12:16:35
|
<wikibugs>
|
('CR) ''Dpogorzelski: [C:''+1] "Done" [puppet] - ''https://gerrit.wikimedia.org/r/1198343 (https://phabricator.wikimedia.org/T407955) (owner: ''Kamila Součková)'
|
|
2025-10-28 12:19:43
|
<wikibugs>
|
('CR) ''Hnowlan: [C:''+1] trafficserver: action api to rest-gateway group0 50% [puppet] - ''https://gerrit.wikimedia.org/r/1198930 (https://phabricator.wikimedia.org/T408223) (owner: ''Clément Goubert)'
|
|
2025-10-28 12:19:57
|
<wikibugs>
|
('CR) ''Cathal Mooney: [C:''+2] ssw1-e1-eqiad: Add BGP peering to ssw1-d8-eqiad [homer/public] - ''https://gerrit.wikimedia.org/r/1199272 (https://phabricator.wikimedia.org/T396065) (owner: ''Cathal Mooney)'
|
|
2025-10-28 12:21:15
|
<wikibugs>
|
('Merged) ''jenkins-bot: ssw1-e1-eqiad: Add BGP peering to ssw1-d8-eqiad [homer/public] - ''https://gerrit.wikimedia.org/r/1199272 (https://phabricator.wikimedia.org/T396065) (owner: ''Cathal Mooney)'
|
|
2025-10-28 12:24:09
|
<logmsgbot>
|
!log fceratto@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es2026.codfw.wmnet decommissioned, removing all IPs except the asset tag one - fceratto@cumin1003"
|
|
2025-10-28 12:26:28
|
<kostajh>
|
Msz2001: is deploying a follow up
|
|
2025-10-28 12:27:14
|
<logmsgbot>
|
fceratto@cumin1003 decommission (PID 372416) is awaiting input
|
|
2025-10-28 12:27:27
|
<kostajh>
|
these issues appeared after the previous deploy https://logstash.wikimedia.org/goto/d13b6c9cd8e42929d855b4c081e43484
|
|
2025-10-28 12:35:20
|
<Msz2001>
|
Deployed
|
|
2025-10-28 12:44:45
|
<wikibugs>
|
('PS1) ''Stevemunene: druid: Increase the size of the Druid broker cache size to 4GB [puppet] - ''https://gerrit.wikimedia.org/r/1199280 (https://phabricator.wikimedia.org/T408189)'
|
|
2025-10-28 12:45:22
|
<logmsgbot>
|
!log sukhe@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2011.codfw.wmnet with reason: reboot
|
|
2025-10-28 12:46:03
|
<logmsgbot>
|
!log sukhe@cumin1003 START - Cookbook sre.hosts.reboot-single for host pybal-test2003.codfw.wmnet
|
|
2025-10-28 12:49:18
|
<wikibugs>
|
'ops-eqiad, ''SRE, ''DC-Ops: Audit Eqiad Patch panels for variance from Netbox - https://phabricator.wikimedia.org/T408197#11318475 (''Jclark-ctr) a:''Jclark-ctr→''None'
|
|
2025-10-28 12:49:48
|
<logmsgbot>
|
!log sukhe@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2003.codfw.wmnet
|
|
2025-10-28 12:53:07
|
<logmsgbot>
|
!log fceratto@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es2026.codfw.wmnet decommissioned, removing all IPs except the asset tag one - fceratto@cumin1003"
|
|
2025-10-28 12:53:07
|
<logmsgbot>
|
!log fceratto@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
|
|
2025-10-28 12:53:08
|
<logmsgbot>
|
!log fceratto@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts es2026.codfw.wmnet
|
|
2025-10-28 12:55:28
|
<jinxer-wm>
|
FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
|
|
2025-10-28 13:00:05
|
<jouncebot>
|
Urbanecm and TheresNoTime: How many deployers does it take to do UTC afternoon backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T1300).
|
|
2025-10-28 13:00:06
|
<jouncebot>
|
Bunnypranav and MatmaRex: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
|
|
2025-10-28 13:00:53
|
<logmsgbot>
|
!log mvernon@cumin2002 END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling reboot on A:swift-fe-codfw
|
|
2025-10-28 13:01:15
|
<MatmaRex>
|
hi
|
|
2025-10-28 13:03:07
|
<MatmaRex>
|
anyone deploying?
|
|
2025-10-28 13:04:25
|
<jinxer-wm>
|
RESOLVED: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 13:06:09
|
<logmsgbot>
|
!log sukhe@cumin1003 START - Cookbook sre.hosts.reboot-single for host lvs2011.codfw.wmnet
|
|
2025-10-28 13:06:13
|
<wikibugs>
|
('PS5) ''Gehel: Hadoop: cleanup /tmp with systemd::tmpfile [puppet] - ''https://gerrit.wikimedia.org/r/1199239 (https://phabricator.wikimedia.org/T396582)'
|
|
2025-10-28 13:07:15
|
<wikibugs>
|
('PS2) ''Muehlenhoff: Shift tile eqiad invalidation to the bookworm master [puppet] - ''https://gerrit.wikimedia.org/r/1195717 (https://phabricator.wikimedia.org/T381565)'
|
|
2025-10-28 13:08:08
|
<wikibugs>
|
('CR) ''CDanis: git_ssh_proxy: add role::git_ssh_proxy for Gerrit and GitLab ssh proxies (''5 comments) [puppet] - ''https://gerrit.wikimedia.org/r/1198281 (https://phabricator.wikimedia.org/T365259) (owner: ''Jelto)'
|
|
2025-10-28 13:08:23
|
<wikibugs>
|
('CR) ''Gehel: [C:''+2] Hadoop: cleanup /tmp with systemd::tmpfile [puppet] - ''https://gerrit.wikimedia.org/r/1199239 (https://phabricator.wikimedia.org/T396582) (owner: ''Gehel)'
|
|
2025-10-28 13:10:29
|
<wikibugs>
|
('Abandoned) ''Muehlenhoff: Shift tile eqiad invalidation to the bookworm master [puppet] - ''https://gerrit.wikimedia.org/r/1195717 (https://phabricator.wikimedia.org/T381565) (owner: ''Muehlenhoff)'
|
|
2025-10-28 13:11:13
|
<wikibugs>
|
('CR) ''Muehlenhoff: "The mwdebug servers are gone" [puppet] - ''https://gerrit.wikimedia.org/r/1178528 (https://phabricator.wikimedia.org/T360636) (owner: ''Muehlenhoff)'
|
|
2025-10-28 13:11:20
|
<wikibugs>
|
('PS2) ''Muehlenhoff: Remove obsolete appserver cergen certs [puppet] - ''https://gerrit.wikimedia.org/r/1178528 (https://phabricator.wikimedia.org/T360636)'
|
|
2025-10-28 13:14:04
|
<logmsgbot>
|
!log brouberol@deploy2002 helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
|
|
2025-10-28 13:14:54
|
<logmsgbot>
|
!log brouberol@deploy2002 helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
|
|
2025-10-28 13:17:38
|
<xSavitar>
|
MatmaRex, I can help if you'll assist with testing :)
|
|
2025-10-28 13:17:46
|
<logmsgbot>
|
!log sukhe@cumin1003 END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host lvs2011.codfw.wmnet
|
|
2025-10-28 13:17:50
|
<xSavitar>
|
Are you still around?
|
|
2025-10-28 13:17:58
|
<MatmaRex>
|
hi :) thanks
|
|
2025-10-28 13:18:28
|
<wikibugs>
|
'ops-codfw, ''DC-Ops, ''Traffic: lvs2011 hardware issue after reboot - https://phabricator.wikimedia.org/T408549 (''ssingh) ''NEW'
|
|
2025-10-28 13:18:29
|
<xSavitar>
|
Seems like Bunnypranav is not around
|
|
2025-10-28 13:18:36
|
<wikibugs>
|
'ops-codfw, ''DC-Ops, ''Traffic: lvs2011 hardware issue after reboot - https://phabricator.wikimedia.org/T408549#11318574 (''ssingh) p:''Triage→''High'
|
|
2025-10-28 13:18:37
|
<xSavitar>
|
So I'll just quickly do MatmaRex's
|
|
2025-10-28 13:18:50
|
<bunnypranav>
|
Hi!
|
|
2025-10-28 13:19:07
|
<bunnypranav>
|
Bit late, apologies. I'm fine with waiting
|
|
2025-10-28 13:19:46
|
<wikibugs>
|
('CR) ''TrainBranchBot: [C:''+2] "Approved by derick@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199074 (https://phabricator.wikimedia.org/T408447) (owner: ''Bartosz Dziewoński)'
|
|
2025-10-28 13:20:03
|
<xSavitar>
|
bunnypranav, okay! Will signal you once I'm done, thanks!
|
|
2025-10-28 13:20:13
|
<bunnypranav>
|
Sure :)
|
|
2025-10-28 13:20:39
|
<wikibugs>
|
('Merged) ''jenkins-bot: Make wgVectorMaxWidthOptions specify Special:Userlogin correctly [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199074 (https://phabricator.wikimedia.org/T408447) (owner: ''Bartosz Dziewoński)'
|
|
2025-10-28 13:21:13
|
<logmsgbot>
|
!log derick@deploy2002 Started scap sync-world: Backport for [[gerrit:1199074|Make wgVectorMaxWidthOptions specify Special:Userlogin correctly (T408447)]]
|
|
2025-10-28 13:21:19
|
<stashbot>
|
T408447: Under Vector 2022 on Wikimedia wikis, page width is different between Special:UserLogin and Special:CreateAccount - https://phabricator.wikimedia.org/T408447
|
|
2025-10-28 13:23:23
|
<wikibugs>
|
('PS1) ''Mszwarc: Remove hCaptcha site key from private/readme.php [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199291'
|
|
2025-10-28 13:23:50
|
<wikibugs>
|
('CR) ''Kosta Harlan: [C:''+1] Remove hCaptcha site key from private/readme.php [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199291 (owner: ''Mszwarc)'
|
|
2025-10-28 13:24:14
|
<kostajh>
|
xSavitar MatmaRex we need to sync the above patch ^
|
|
2025-10-28 13:24:15
|
<wikibugs>
|
('PS14) ''Pmiazga: api-gateway: rest gw should call ratelimit only when x-wmf-user-class header is present [deployment-charts] - ''https://gerrit.wikimedia.org/r/1191318 (https://phabricator.wikimedia.org/T405574)'
|
|
2025-10-28 13:25:04
|
<kostajh>
|
are either of you able to sync that? it should be a no-op. if not, either me or Msz2001 can do it
|
|
2025-10-28 13:25:08
|
<logmsgbot>
|
!log derick@deploy2002 derick, matmarex: Backport for [[gerrit:1199074|Make wgVectorMaxWidthOptions specify Special:Userlogin correctly (T408447)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
|
|
2025-10-28 13:25:12
|
<xSavitar>
|
kostajh, sure! After bunnypranav or now?
|
|
2025-10-28 13:25:25
|
<xSavitar>
|
MatmaRex, you can test
|
|
2025-10-28 13:25:26
|
<kostajh>
|
as soon as possible, I'd say
|
|
2025-10-28 13:25:49
|
<MatmaRex>
|
my change looks good
|
|
2025-10-28 13:25:53
|
<xSavitar>
|
Okay, once MatmaRex is done testing, maybe you can take over before bunnypranav (just an idea). That is if bunnypranav is up for it.
|
|
2025-10-28 13:26:05
|
<xSavitar>
|
MatmaRex, okay will sync now.
|
|
2025-10-28 13:26:06
|
<bunnypranav>
|
I'm fine, can wait if needed.
|
|
2025-10-28 13:26:12
|
<logmsgbot>
|
!log derick@deploy2002 derick, matmarex: Continuing with sync
|
|
2025-10-28 13:26:38
|
<xSavitar>
|
kostajh, okay bunnypranav agrees. I'll poke you once MatmaRex's patch is done syncing.
|
|
2025-10-28 13:27:39
|
<xSavitar>
|
kostajh, I can also help in doing it.
|
|
2025-10-28 13:28:22
|
<wikibugs>
|
('CR) ''Ottomata: Disable default user-agent collection. (''1 comment) [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199246 (https://phabricator.wikimedia.org/T384964) (owner: ''JavierMonton)'
|
|
2025-10-28 13:29:02
|
<kostajh>
|
thank you!
|
|
2025-10-28 13:29:17
|
<logmsgbot>
|
!log bking@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
|
|
2025-10-28 13:29:30
|
<logmsgbot>
|
!log bking@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
|
|
2025-10-28 13:29:39
|
<logmsgbot>
|
!log bking@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
|
|
2025-10-28 13:29:46
|
<logmsgbot>
|
!log bking@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
|
|
2025-10-28 13:29:49
|
<wikibugs>
|
('PS15) ''Pmiazga: api-gateway: rest gw should call ratelimit only when x-wmf-user-class header is present [deployment-charts] - ''https://gerrit.wikimedia.org/r/1191318 (https://phabricator.wikimedia.org/T405574)'
|
|
2025-10-28 13:29:49
|
<wikibugs>
|
('CR) ''Pmiazga: api-gateway: rest gw should call ratelimit only when x-wmf-user-class header is present (''3 comments) [deployment-charts] - ''https://gerrit.wikimedia.org/r/1191318 (https://phabricator.wikimedia.org/T405574) (owner: ''Pmiazga)'
|
|
2025-10-28 13:32:10
|
<logmsgbot>
|
!log derick@deploy2002 Finished scap sync-world: Backport for [[gerrit:1199074|Make wgVectorMaxWidthOptions specify Special:Userlogin correctly (T408447)]] (duration: 10m 56s)
|
|
2025-10-28 13:32:14
|
<stashbot>
|
T408447: Under Vector 2022 on Wikimedia wikis, page width is different between Special:UserLogin and Special:CreateAccount - https://phabricator.wikimedia.org/T408447
|
|
2025-10-28 13:33:05
|
<wikibugs>
|
('CR) ''Muehlenhoff: "Looks good to me!" [software/transferpy] - ''https://gerrit.wikimedia.org/r/1180570 (https://phabricator.wikimedia.org/T393692) (owner: ''Muehlenhoff)'
|
|
2025-10-28 13:33:19
|
<wikibugs>
|
('CR) ''TrainBranchBot: [C:''+2] "Approved by derick@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199291 (owner: ''Mszwarc)'
|
|
2025-10-28 13:33:36
|
<xSavitar>
|
kostajh, so nothing to test I suppose?
|
|
2025-10-28 13:33:45
|
<kostajh>
|
xSavitar: nothing to test
|
|
2025-10-28 13:33:57
|
<xSavitar>
|
Ack! Will just sync it when it's time then, thanks~
|
|
2025-10-28 13:34:01
|
<xSavitar>
|
*!
|
|
2025-10-28 13:34:16
|
<wikibugs>
|
('Merged) ''jenkins-bot: Remove hCaptcha site key from private/readme.php [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199291 (owner: ''Mszwarc)'
|
|
2025-10-28 13:34:48
|
<logmsgbot>
|
!log derick@deploy2002 Started scap sync-world: Backport for [[gerrit:1199291|Remove hCaptcha site key from private/readme.php]]
|
|
2025-10-28 13:35:35
|
<MatmaRex>
|
thanks for deploying xSavitar
|
|
2025-10-28 13:35:59
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Traffic, ''Release-Engineering-Team (Radar): Deploy a TCP proxy across all DCs - https://phabricator.wikimedia.org/T408532#11318699 (''LSobanski)'
|
|
2025-10-28 13:36:22
|
<xSavitar>
|
MatmaRex, thank you :)
|
|
2025-10-28 13:38:53
|
<logmsgbot>
|
!log derick@deploy2002 mszwarc, derick: Backport for [[gerrit:1199291|Remove hCaptcha site key from private/readme.php]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
|
|
2025-10-28 13:39:16
|
<logmsgbot>
|
!log derick@deploy2002 mszwarc, derick: Continuing with sync
|
|
2025-10-28 13:39:42
|
<wikibugs>
|
'ops-ulsfo, ''SRE, ''DC-Ops, ''Infrastructure-Foundations, ''netops: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11318700 (''Papaul) @cmooney thanks for the feedback, I will upgrade the diagram to match the 100G links between the core routers and the switches and the type of...'
|
|
2025-10-28 13:42:43
|
<xSavitar>
|
bunnypranav, 64% done, will hand over to you in a few mins.
|
|
2025-10-28 13:42:56
|
<bunnypranav>
|
sure!
|
|
2025-10-28 13:43:46
|
<logmsgbot>
|
!log derick@deploy2002 Finished scap sync-world: Backport for [[gerrit:1199291|Remove hCaptcha site key from private/readme.php]] (duration: 08m 58s)
|
|
2025-10-28 13:43:55
|
<xSavitar>
|
bunnypranav over to you.
|
|
2025-10-28 13:44:18
|
<xSavitar>
|
and thank you for your patience. 🙏🏽
|
|
2025-10-28 13:44:27
|
<bunnypranav>
|
No worries
|
|
2025-10-28 13:45:21
|
<bunnypranav>
|
I need some help from you as well: the patch creates a namespace; do we need to run any maintenance scripts?
|
|
2025-10-28 13:46:17
|
<bunnypranav>
|
btw, the namespace is "R:", and they already use that prefix, technically in the mainspace, so i assume the former.
|
|
2025-10-28 13:46:25
|
<bunnypranav>
|
xSavitar: ^^^
|
|
2025-10-28 13:46:38
|
<anzx>
|
bunnypranav: run namespacedupes
|
|
2025-10-28 13:46:49
|
<xSavitar>
|
anzx beat me to it.
|
|
2025-10-28 13:47:23
|
<bunnypranav>
|
I assume the pages won't be lost, right?
|
|
2025-10-28 13:49:30
|
<wikibugs>
|
('PS2) ''JavierMonton: Disable default user-agent collection. [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199246 (https://phabricator.wikimedia.org/T384964)'
|
|
2025-10-28 13:49:32
|
<xSavitar>
|
bunnypranav, I think everything should be fine.
|
|
2025-10-28 13:49:36
|
<anzx>
|
bunnypranav: https://www.mediawiki.org/wiki/Manual:NamespaceDupes.php add a prefix to check whether any pages that were lost, left unmoved, or need to be moved manually can be retrieved
|
|
2025-10-28 13:49:53
|
<xSavitar>
|
Are there any pages that are already in that namespace? In the past?
|
|
2025-10-28 13:50:12
|
<xSavitar>
|
I guess I shouldn't say namespace but prefixed by R:
|
|
2025-10-28 13:50:28
|
<jinxer-wm>
|
FIRING: [5x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 13:50:37
|
<xSavitar>
|
After running that script, everything should work correctly and they should be part of the R: and R_talk: namespace I suppose.
|
|
2025-10-28 13:51:14
|
<bunnypranav>
|
Okay!
|
|
2025-10-28 13:51:19
|
<xSavitar>
|
runs for a meeting...
|
|
2025-10-28 13:51:28
|
<bunnypranav>
|
xSavitar: BTW I need you to deploy it for me, I am just a volunteer.
|
|
2025-10-28 13:51:57
|
<wikibugs>
|
('CR) ''Giuseppe Lavagetto: "I think the patch goes in the right direction, but is overcomplicated and misses a couple things:" [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060) (owner: ''Fabfur)'
|
|
2025-10-28 13:52:12
|
<xSavitar>
|
bunnypranav, oh, I could do that, but I have a meeting now. Would you be fine with the next backport window? That is, if another deployer isn't around to help.
|
|
2025-10-28 13:52:15
|
<wikibugs>
|
('CR) ''JavierMonton: Disable default user-agent collection. (''1 comment) [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199246 (https://phabricator.wikimedia.org/T384964) (owner: ''JavierMonton)'
|
|
2025-10-28 13:52:27
|
<xSavitar>
|
I thought you would be the one deploying, apologies, I would have asked.
|
|
2025-10-28 13:52:31
|
<bunnypranav>
|
The next window is 1:30 am for me
|
|
2025-10-28 13:52:49
|
<bunnypranav>
|
It's fine
|
|
2025-10-28 13:53:22
|
<xSavitar>
|
Oops :(, I'll ping you here in a few hours (later this evening). If there is an open window, we can deploy your patch.
|
|
2025-10-28 13:53:39
|
<xSavitar>
|
Otherwise, we can do it tomorrow afternoon (that's when I'll be available).
|
|
2025-10-28 13:53:54
|
<xSavitar>
|
Is that okay by you?
|
|
2025-10-28 13:54:21
|
<wikibugs>
|
('CR) ''Clément Goubert: api-gateway: rest gw should call ratelimit only when x-wmf-user-class header is present (''2 comments) [deployment-charts] - ''https://gerrit.wikimedia.org/r/1191318 (https://phabricator.wikimedia.org/T405574) (owner: ''Pmiazga)'
|
|
2025-10-28 13:54:28
|
<bunnypranav>
|
Fine, I'll see if I am available tomorrow.
|
|
2025-10-28 13:54:46
|
<bunnypranav>
|
These deploy windows are pretty tough for Asian timezones
|
|
2025-10-28 13:55:10
|
<xSavitar>
|
bunnypranav, FYI - these are the docs for adding a new namespace: https://wikitech.wikimedia.org/wiki/Adding_namespaces
|
|
2025-10-28 13:55:15
|
<xSavitar>
|
I hope it's still up to date.
|
|
2025-10-28 13:55:19
|
<bunnypranav>
|
Can I ping you in a few hours once I am available as well?
|
|
2025-10-28 13:55:34
|
<xSavitar>
|
bunnypranav, yes ping me please. I want to help.
|
|
2025-10-28 13:55:48
|
<bunnypranav>
|
Thank you so much!
|
|
2025-10-28 13:56:01
|
<xSavitar>
|
bunnypranav, no thank you for all the work. 🙏🏽
|
|
2025-10-28 13:56:12
|
<bunnypranav>
|
:D
|
|
2025-10-28 13:56:31
|
<xSavitar>
|
Re TZ friendliness, maybe you can ask on #wikimedia-releng about it.
|
|
2025-10-28 13:56:52
|
<xSavitar>
|
But we have several of these windows per day, so I'm pretty sure one of them is friendly to your TZ
|
|
2025-10-28 13:57:11
|
<xSavitar>
|
goes AFK to attend a meeting.
|
|
2025-10-28 13:57:28
|
<bunnypranav>
|
Checked the wikitech page earlier, commit is fine; just needed confirmation on the maintenance scripts
|
|
2025-10-28 13:58:07
|
<bunnypranav>
|
yeah, the afternoon one was fine, today I was busy for the morning one, so couldn't schedule for it.
|
|
2025-10-28 14:00:05
|
<jouncebot>
|
Deploy window Metrics Platform Experimentation Lab Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T1400)
|
|
2025-10-28 14:01:51
|
<wikibugs>
|
('CR) ''Elukey: [C:''+1] osm_sync_lag.sh: Fix default to current directory [puppet] - ''https://gerrit.wikimedia.org/r/1199265 (https://phabricator.wikimedia.org/T381565) (owner: ''Muehlenhoff)'
|
|
2025-10-28 14:02:12
|
<wikibugs>
|
('CR) ''Elukey: [C:''+1] maps: Stop installing osm2pgsql and osmborder [puppet] - ''https://gerrit.wikimedia.org/r/1199271 (https://phabricator.wikimedia.org/T381565) (owner: ''Muehlenhoff)'
|
|
2025-10-28 14:02:41
|
<wikibugs>
|
('CR) ''Elukey: [C:''+1] LVS: etcd data for druid-public-coordinator [puppet] - ''https://gerrit.wikimedia.org/r/1198498 (https://phabricator.wikimedia.org/T406222) (owner: ''Stevemunene)'
|
|
2025-10-28 14:02:58
|
<wikibugs>
|
('CR) ''Elukey: [C:''+1] LVS: Add druid-public-coordinator to service list (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1198499 (https://phabricator.wikimedia.org/T406222) (owner: ''Stevemunene)'
|
|
2025-10-28 14:03:12
|
<wikibugs>
|
('CR) ''Elukey: [C:''+1] druid: add druid-coordinator to druid public worker role [puppet] - ''https://gerrit.wikimedia.org/r/1199256 (https://phabricator.wikimedia.org/T406222) (owner: ''Stevemunene)'
|
|
2025-10-28 14:05:32
|
<wikibugs>
|
('PS16) ''Pmiazga: api-gateway: rest gw should call ratelimit only when x-wmf-user-class header is present [deployment-charts] - ''https://gerrit.wikimedia.org/r/1191318 (https://phabricator.wikimedia.org/T405574)'
|
|
2025-10-28 14:05:51
|
<wikibugs>
|
('PS1) ''Brouberol: global_config: add an urldownloader external service [puppet] - ''https://gerrit.wikimedia.org/r/1199297 (https://phabricator.wikimedia.org/T408012)'
|
|
2025-10-28 14:09:58
|
<wikibugs>
|
('CR) ''Brouberol: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1199297 (https://phabricator.wikimedia.org/T408012) (owner: ''Brouberol)'
|
|
2025-10-28 14:10:46
|
<wikibugs>
|
('PS5) ''Daniel Kinzler: api-gateway: make cookie name configurable for testing [deployment-charts] - ''https://gerrit.wikimedia.org/r/1198385 (https://phabricator.wikimedia.org/T408128)'
|
|
2025-10-28 14:10:58
|
<wikibugs>
|
('CR) ''CI reject: [V:''-1] api-gateway: make cookie name configurable for testing [deployment-charts] - ''https://gerrit.wikimedia.org/r/1198385 (https://phabricator.wikimedia.org/T408128) (owner: ''Daniel Kinzler)'
|
|
2025-10-28 14:13:00
|
<wikibugs>
|
('PS1) ''Federico Ceratto: sanitize-wiki: log into phabricator [cookbooks] - ''https://gerrit.wikimedia.org/r/1199301 (https://phabricator.wikimedia.org/T408512)'
|
|
2025-10-28 14:14:10
|
<wikibugs>
|
('PS1) ''Muehlenhoff: Update account meta data for khantstop [puppet] - ''https://gerrit.wikimedia.org/r/1199302'
|
|
2025-10-28 14:14:48
|
<wikibugs>
|
('CR) ''Ottomata: [C:''+1] "I didn't look very deep to check each config, but LGTM!" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199246 (https://phabricator.wikimedia.org/T384964) (owner: ''JavierMonton)'
|
|
2025-10-28 14:17:08
|
<wikibugs>
|
('PS17) ''Clément Goubert: api-gateway: rest gw should call ratelimit only when x-wmf-user-class header is present [deployment-charts] - ''https://gerrit.wikimedia.org/r/1191318 (https://phabricator.wikimedia.org/T405574) (owner: ''Pmiazga)'
|
|
2025-10-28 14:19:50
|
<wikibugs>
|
('PS18) ''Clément Goubert: api-gateway: rest gw should call ratelimit only when x-wmf-user-class header is present [deployment-charts] - ''https://gerrit.wikimedia.org/r/1191318 (https://phabricator.wikimedia.org/T405574) (owner: ''Pmiazga)'
|
|
2025-10-28 14:19:50
|
<wikibugs>
|
('PS7) ''Clément Goubert: api-gateway: support per-route rate limit groups for rest gateway [deployment-charts] - ''https://gerrit.wikimedia.org/r/1192879 (owner: ''Daniel Kinzler)'
|
|
2025-10-28 14:20:06
|
<wikibugs>
|
('PS8) ''Jasmine: wikikube: Add wikikube-worker2[248-330] [puppet] - ''https://gerrit.wikimedia.org/r/1181753 (https://phabricator.wikimedia.org/T390859)'
|
|
2025-10-28 14:20:28
|
<jinxer-wm>
|
FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
|
|
2025-10-28 14:21:24
|
<wikibugs>
|
('CR) ''Kamila Součková: [C:''+2] admin: add dpogorzelski to ops-limited [puppet] - ''https://gerrit.wikimedia.org/r/1198343 (https://phabricator.wikimedia.org/T407955) (owner: ''Kamila Součková)'
|
|
2025-10-28 14:23:12
|
<wikibugs>
|
('PS7) ''Clément Goubert: api-gateway: make cookie name configurable for testing [deployment-charts] - ''https://gerrit.wikimedia.org/r/1198385 (https://phabricator.wikimedia.org/T408128) (owner: ''Daniel Kinzler)'
|
|
2025-10-28 14:23:15
|
<wikibugs>
|
('CR) ''Clare Ming: [C:''+2] xLab: Deploying v1.1.0 release to staging [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199228 (https://phabricator.wikimedia.org/T406729) (owner: ''Santiago Faci)'
|
|
2025-10-28 14:24:26
|
<wikibugs>
|
('CR) ''Jasmine: wikikube: Add wikikube-worker2[248-330] (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1181753 (https://phabricator.wikimedia.org/T390859) (owner: ''Jasmine)'
|
|
2025-10-28 14:24:53
|
<wikibugs>
|
('Merged) ''jenkins-bot: xLab: Deploying v1.1.0 release to staging [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199228 (https://phabricator.wikimedia.org/T406729) (owner: ''Santiago Faci)'
|
|
2025-10-28 14:26:09
|
<wikibugs>
|
('PS1) ''Majavah: toolforge::toolviews: Output proper Prometheus metrics [puppet] - ''https://gerrit.wikimedia.org/r/1199305 (https://phabricator.wikimedia.org/T408457)'
|
|
2025-10-28 14:26:39
|
<wikibugs>
|
('CR) ''Andrew Bogott: [C:''+1] clean-stale-puppet-certs: Remove nodes from PuppetDB where enabled [puppet] - ''https://gerrit.wikimedia.org/r/1198299 (owner: ''Majavah)'
|
|
2025-10-28 14:27:21
|
<wikibugs>
|
'ops-codfw, ''SRE, ''DC-Ops, ''Traffic: lvs2011 hardware issue after reboot - https://phabricator.wikimedia.org/T408549#11318894 (''Jhancock.wm) logged into idrac and found following error. ` A critical diagnostic event occurred in the memory device at B2. Contact your service provider for assistance in...'
|
|
2025-10-28 14:27:56
|
<wikibugs>
|
('CR) ''Kamila Součková: [C:''+1] "LGTM :-)" [puppet] - ''https://gerrit.wikimedia.org/r/1181753 (https://phabricator.wikimedia.org/T390859) (owner: ''Jasmine)'
|
|
2025-10-28 14:28:19
|
<wikibugs>
|
('CR) ''CI reject: [V:''-1] toolforge::toolviews: Output proper Prometheus metrics [puppet] - ''https://gerrit.wikimedia.org/r/1199305 (https://phabricator.wikimedia.org/T408457) (owner: ''Majavah)'
|
|
2025-10-28 14:28:33
|
<wikibugs>
|
('PS20) ''Jelto: git_ssh_proxy: add role::git_ssh_proxy for Gerrit and GitLab ssh proxies [puppet] - ''https://gerrit.wikimedia.org/r/1198281 (https://phabricator.wikimedia.org/T365259)'
|
|
2025-10-28 14:29:23
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to ops-limited for dpogorzelski - https://phabricator.wikimedia.org/T407955#11318896 (''Raine)'
|
|
2025-10-28 14:29:35
|
<wikibugs>
|
('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, October 29 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199246 (https://phabricator.wikimedia.org/T384964) (owner: ''JavierMonton)'
|
|
2025-10-28 14:30:05
|
<jouncebot>
|
Deploy window xLab Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T1430)
|
|
2025-10-28 14:30:18
|
<wikibugs>
|
('PS19) ''Clément Goubert: api-gateway: rest gw should call ratelimit only when x-wmf-user-class header is present [deployment-charts] - ''https://gerrit.wikimedia.org/r/1191318 (https://phabricator.wikimedia.org/T405574) (owner: ''Pmiazga)'
|
|
2025-10-28 14:30:58
|
<wikibugs>
|
'SRE, ''Infrastructure-Foundations, ''netops, ''Observability-Alerting: Nokia OSPF alerts not working - https://phabricator.wikimedia.org/T408378#11318918 (''tappof) I saw the alerts on the ALERTS metric: https://w.wiki/FqSi . I think there was a silence rule in place, so you didn't get any notifications....'
|
|
2025-10-28 14:31:46
|
<wikibugs>
|
'ops-codfw, ''SRE, ''DC-Ops, ''Traffic: lvs2011 hardware issue after reboot - https://phabricator.wikimedia.org/T408549#11318932 (''ssingh) ''Open→''Resolved a:''ssingh Thanks for the help @Jhancock.wm. Marking this as resolved for now.'
|
|
2025-10-28 14:32:33
|
<wikibugs>
|
('PS9) ''Clément Goubert: api-gateway: support per-route rate limit groups for rest gateway [deployment-charts] - ''https://gerrit.wikimedia.org/r/1192879 (owner: ''Daniel Kinzler)'
|
|
2025-10-28 14:33:26
|
<wikibugs>
|
'SRE-SLO, ''Experimentation Lab (Experiment Platform Sprint 14), ''OKR-Work: Create Pyrra SLOs for xLab - https://phabricator.wikimedia.org/T398869#11318939 (''dr0ptp4kt) >>! In T398869#11318126, @elukey wrote: > We finally have all three SLO published in Pyrra: https://slo.wikimedia.org/?search=xlab Thank...'
|
|
2025-10-28 14:33:50
|
<wikibugs>
|
('PS9) ''Clément Goubert: api-gateway: make cookie name configurable for testing [deployment-charts] - ''https://gerrit.wikimedia.org/r/1198385 (https://phabricator.wikimedia.org/T408128) (owner: ''Daniel Kinzler)'
|
|
2025-10-28 14:35:17
|
<wikibugs>
|
('CR) ''CI reject: [V:''-1] api-gateway: make cookie name configurable for testing [deployment-charts] - ''https://gerrit.wikimedia.org/r/1198385 (https://phabricator.wikimedia.org/T408128) (owner: ''Daniel Kinzler)'
|
|
2025-10-28 14:36:02
|
<icinga-wm>
|
PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
|
|
2025-10-28 14:37:38
|
<wikibugs>
|
'SRE, ''SRE-Unowned, ''Maps, ''Patch-For-Review: Move maps servers to Bookworm - https://phabricator.wikimedia.org/T381565#11318965 (''elukey) Ran the diff testing tool between eqiad and codfw: ` | | ssim | |-----:|---------:| | 0.05 | 0.974994 | | 0.1 | 0.990161 | | 0.2 | 0.998943 | | 0.25 |...'
|
|
2025-10-28 14:37:46
|
<wikibugs>
|
('PS1) ''Brouberol: growthbook: deploy a more modern version against ferretdb [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199310 (https://phabricator.wikimedia.org/T408397)'
|
|
2025-10-28 14:39:48
|
<wikibugs>
|
('PS1) ''Federico Ceratto: site.pp, es2026.yaml: Decommission es2026 [puppet] - ''https://gerrit.wikimedia.org/r/1199311 (https://phabricator.wikimedia.org/T408385)'
|
|
2025-10-28 14:40:48
|
<hashar>
|
jouncebot: nowandnext
|
|
2025-10-28 14:40:48
|
<jouncebot>
|
For the next 0 hour(s) and 19 minute(s): xLab Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T1430)
|
|
2025-10-28 14:40:48
|
<jouncebot>
|
In 0 hour(s) and 19 minute(s): SRE Collaboration Services office hours (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T1500)
|
|
2025-10-28 14:41:36
|
<hashar>
|
I am restarting both CI Jenkins and Gerrit
|
|
2025-10-28 14:42:07
|
<hashar>
|
!log Restarting Gerrit
|
|
2025-10-28 14:42:10
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
|
2025-10-28 14:44:46
|
<wikibugs>
|
('CR) ''Jelto: git_ssh_proxy: add role::git_ssh_proxy for Gerrit and GitLab ssh proxies (''4 comments) [puppet] - ''https://gerrit.wikimedia.org/r/1198281 (https://phabricator.wikimedia.org/T365259) (owner: ''Jelto)'
|
|
2025-10-28 14:45:08
|
<hashar>
|
!log Restarted CI Jenkins
|
|
2025-10-28 14:45:11
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
|
2025-10-28 14:45:44
|
<wikibugs>
|
('CR) ''Majavah: [C:''+2] clean-stale-puppet-certs: Remove nodes from PuppetDB where enabled [puppet] - ''https://gerrit.wikimedia.org/r/1198299 (owner: ''Majavah)'
|
|
2025-10-28 14:45:45
|
<hashar>
|
Gerrit/Jenkins/Zuul are all up and running
|
|
2025-10-28 14:46:02
|
<icinga-wm>
|
RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30030 bytes in 9.007 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
|
|
2025-10-28 14:46:34
|
<wikibugs>
|
('CR) ''Andrea Denisse: [C:''+1] "lgtm, thank you!" [puppet] - ''https://gerrit.wikimedia.org/r/1199248 (https://phabricator.wikimedia.org/T376535) (owner: ''Huei Tan)'
|
|
2025-10-28 14:46:59
|
<wikibugs>
|
'SRE, ''Infrastructure-Foundations, ''netops, ''Observability-Alerting: Nokia OSPF alerts not working - https://phabricator.wikimedia.org/T408378#11319051 (''cmooney) >>! In T408378#11318918, @tappof wrote: > I saw the alerts on the ALERTS metric: https://w.wiki/FqSi . Ok thanks for that! That is a good...'
|
|
2025-10-28 14:47:21
|
<wikibugs>
|
'ops-eqiad, ''SRE, ''DC-Ops, ''Data-Platform-SRE (2025.10.17 - 2025.11.07), ''Essential-Work: Degraded RAID on an-presto1013 - https://phabricator.wikimedia.org/T408065#11319065 (''RobH)'
|
|
2025-10-28 14:47:42
|
<wikibugs>
|
('PS1) ''Clément Goubert: api-gateway: Release patch for ratelimit test [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199331 (https://phabricator.wikimedia.org/T408128)'
|
|
2025-10-28 14:48:29
|
<wikibugs>
|
('PS2) ''Clément Goubert: api-gateway: Release patch for ratelimit test [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199331 (https://phabricator.wikimedia.org/T408128)'
|
|
2025-10-28 14:49:22
|
<wikibugs>
|
('CR) ''Clément Goubert: "Due to rebasing issues, I've squashed all the patch stack for the next phase of testing in one, plus renaming group to policy." [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199331 (https://phabricator.wikimedia.org/T408128) (owner: ''Clément Goubert)'
|
|
2025-10-28 14:49:56
|
<wikibugs>
|
('CR) ''CI reject: [V:''-1] api-gateway: Release patch for ratelimit test [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199331 (https://phabricator.wikimedia.org/T408128) (owner: ''Clément Goubert)'
|
|
2025-10-28 14:50:02
|
<icinga-wm>
|
PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
|
|
2025-10-28 14:50:54
|
<icinga-wm>
|
RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30036 bytes in 0.463 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
|
|
2025-10-28 14:51:04
|
<wikibugs>
|
('PS3) ''Clément Goubert: api-gateway: Release patch for ratelimit test [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199331 (https://phabricator.wikimedia.org/T408128)'
|
|
2025-10-28 14:51:36
|
<wikibugs>
|
('PS1) ''Cathal Mooney: team-netops: ospf alert: add pint disable promql/series [alerts] - ''https://gerrit.wikimedia.org/r/1199332 (https://phabricator.wikimedia.org/T408378)'
|
|
2025-10-28 14:52:06
|
<wikibugs>
|
('CR) ''Pmiazga: api-gateway: Release patch for ratelimit test (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199331 (https://phabricator.wikimedia.org/T408128) (owner: ''Clément Goubert)'
|
|
2025-10-28 14:52:32
|
<wikibugs>
|
('CR) ''CI reject: [V:''-1] api-gateway: Release patch for ratelimit test [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199331 (https://phabricator.wikimedia.org/T408128) (owner: ''Clément Goubert)'
|
|
2025-10-28 14:52:33
|
<logmsgbot>
|
!log elukey@puppetserver1001 conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
|
|
2025-10-28 14:52:37
|
<wikibugs>
|
('PS2) ''Majavah: toolforge::toolviews: Output proper Prometheus metrics [puppet] - ''https://gerrit.wikimedia.org/r/1199305 (https://phabricator.wikimedia.org/T408457)'
|
|
2025-10-28 14:52:37
|
<wikibugs>
|
('PS1) ''Majavah: toolforge::toolviews: Fix footgun with default values [puppet] - ''https://gerrit.wikimedia.org/r/1199333'
|
|
2025-10-28 14:54:03
|
<wikibugs>
|
('PS1) ''Gehel: hadoop: cleanup /tmp from directories as well as files [puppet] - ''https://gerrit.wikimedia.org/r/1199334 (https://phabricator.wikimedia.org/T396582)'
|
|
2025-10-28 14:55:01
|
<wikibugs>
|
('PS3) ''Cwhite: site: initial setup for new logging-sd hosts [puppet] - ''https://gerrit.wikimedia.org/r/1199062 (https://phabricator.wikimedia.org/T406796)'
|
|
2025-10-28 14:55:07
|
<wikibugs>
|
('CR) ''CI reject: [V:''-1] toolforge::toolviews: Output proper Prometheus metrics [puppet] - ''https://gerrit.wikimedia.org/r/1199305 (https://phabricator.wikimedia.org/T408457) (owner: ''Majavah)'
|
|
2025-10-28 14:56:33
|
<wikibugs>
|
('PS4) ''Clément Goubert: api-gateway: Release patch for ratelimit test [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199331 (https://phabricator.wikimedia.org/T408128)'
|
|
2025-10-28 14:57:38
|
<logmsgbot>
|
!log dancy@deploy2002 Installing scap version "4.218.0" for 2 host(s)
|
|
2025-10-28 14:57:57
|
<wikibugs>
|
('CR) ''Clément Goubert: api-gateway: Release patch for ratelimit test (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199331 (https://phabricator.wikimedia.org/T408128) (owner: ''Clément Goubert)'
|
|
2025-10-28 14:57:58
|
<wikibugs>
|
('CR) ''CI reject: [V:''-1] api-gateway: Release patch for ratelimit test [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199331 (https://phabricator.wikimedia.org/T408128) (owner: ''Clément Goubert)'
|
|
2025-10-28 14:58:11
|
<wikibugs>
|
('CR) ''FNegri: [C:''+1] toolforge::toolviews: Fix footgun with default values [puppet] - ''https://gerrit.wikimedia.org/r/1199333 (owner: ''Majavah)'
|
|
2025-10-28 14:59:11
|
<wikibugs>
|
('PS2) ''Majavah: toolforge::toolviews: Fix footgun with default values [puppet] - ''https://gerrit.wikimedia.org/r/1199333'
|
|
2025-10-28 14:59:24
|
<logmsgbot>
|
!log dancy@deploy2002 Installation of scap version "4.218.0" completed for 2 hosts
|
|
2025-10-28 14:59:59
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to ops-limited for dpogorzelski - https://phabricator.wikimedia.org/T407955#11319159 (''Raine) ''Open→''Resolved Done, ping me in case of trouble :-)'
|
|
2025-10-28 15:00:05
|
<jouncebot>
|
jelto, arnoldokoth, and mutante: It is that lovely time of the day again! You are hereby commanded to deploy SRE Collaboration Services office hours. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T1500).
|
|
2025-10-28 15:00:31
|
<jelto>
|
no my calendar says it's in one hour
|
|
2025-10-28 15:00:41
|
<taavi>
|
daylight confusion time
|
|
2025-10-28 15:01:05
|
<wikibugs>
|
('PS21) ''Jelto: git_ssh_proxy: add role::git_ssh_proxy for Gerrit and GitLab ssh proxies [puppet] - ''https://gerrit.wikimedia.org/r/1198281 (https://phabricator.wikimedia.org/T365259)'
|
|
2025-10-28 15:01:46
|
<wikibugs>
|
('CR) ''Majavah: [C:''+2] toolforge::toolviews: Fix footgun with default values [puppet] - ''https://gerrit.wikimedia.org/r/1199333 (owner: ''Majavah)'
|
|
2025-10-28 15:02:31
|
<wikibugs>
|
('PS3) ''Majavah: toolforge::toolviews: Output proper Prometheus metrics [puppet] - ''https://gerrit.wikimedia.org/r/1199305 (https://phabricator.wikimedia.org/T408457)'
|
|
2025-10-28 15:04:14
|
<wikibugs>
|
'SRE, ''Traffic, ''FY2025-26 WE3.3 Engaging core audiences, ''Reader Experience Team (REx Sprint 8 [Q2 Oct 21-Nov 3]): [Reading Lists] Monitor potential performance impact of Reading Lists for Web - https://phabricator.wikimedia.org/T397526#11319191 (''Jdrewniak) When I talked to #traffic about this topic...'
|
|
2025-10-28 15:04:35
|
<wikibugs>
|
('CR) ''CI reject: [V:''-1] toolforge::toolviews: Output proper Prometheus metrics [puppet] - ''https://gerrit.wikimedia.org/r/1199305 (https://phabricator.wikimedia.org/T408457) (owner: ''Majavah)'
|
|
2025-10-28 15:05:42
|
<logmsgbot>
|
!log dzahn@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab1004.eqiad.wmnet with reason: reboot for kernel
|
|
2025-10-28 15:06:00
|
<wikibugs>
|
('PS4) ''Majavah: toolforge::toolviews: Output proper Prometheus metrics [puppet] - ''https://gerrit.wikimedia.org/r/1199305 (https://phabricator.wikimedia.org/T408457)'
|
|
2025-10-28 15:06:19
|
<logmsgbot>
|
!log dzahn@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab2002.codfw.wmnet with reason: reboot for kernel
|
|
2025-10-28 15:06:34
|
<wikibugs>
|
('PS22) ''Jelto: git_ssh_proxy: add role::git_ssh_proxy for Gerrit and GitLab ssh proxies [puppet] - ''https://gerrit.wikimedia.org/r/1198281 (https://phabricator.wikimedia.org/T365259)'
|
|
2025-10-28 15:07:20
|
<wikibugs>
|
'SRE, ''SRE-Unowned, ''Maps, ''Patch-For-Review: Move maps servers to Bookworm - https://phabricator.wikimedia.org/T381565#11319213 (''elukey) @TheDJ Hi! As FYI we now have eqiad and codfw on the new stack, both eqiad and codfw are pooled :)'
|
|
2025-10-28 15:07:23
|
<wikibugs>
|
('CR) ''Jelto: [V:''+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7486/co" [puppet] - ''https://gerrit.wikimedia.org/r/1198281 (https://phabricator.wikimedia.org/T365259) (owner: ''Jelto)'
|
|
2025-10-28 15:09:04
|
<jinxer-wm>
|
FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
|
|
2025-10-28 15:09:07
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-cron: apply
|
|
2025-10-28 15:09:11
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
|
|
2025-10-28 15:09:19
|
<logmsgbot>
|
!log brennen@deploy2002 Started deploy [phabricator/deployment@5fbb350]: deploy phab1004 for T408575
|
|
2025-10-28 15:09:21
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-cron: apply
|
|
2025-10-28 15:09:24
|
<stashbot>
|
T408575: Deploy Phabricator/Phorge 2025-10-28 - https://phabricator.wikimedia.org/T408575
|
|
2025-10-28 15:09:25
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
|
|
2025-10-28 15:09:53
|
<logmsgbot>
|
!log brennen@deploy2002 Finished deploy [phabricator/deployment@5fbb350]: deploy phab1004 for T408575 (duration: 00m 34s)
|
|
2025-10-28 15:10:12
|
<logmsgbot>
|
!log brennen@deploy2002 Started deploy [phabricator/deployment@5fbb350]: deploy phab1004 for T408575
|
|
2025-10-28 15:11:37
|
<wikibugs>
|
'ops-codfw, ''DC-Ops, ''Machine-Learning-Team: DIMM_A2 errors for ml-serve2001 - https://phabricator.wikimedia.org/T408516#11319244 (''elukey)'
|
|
2025-10-28 15:11:41
|
<swfrench-wmf>
|
!log applied mediawiki-common network policy updates in mw-script / mw-cron - T309738
|
|
2025-10-28 15:11:48
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
|
2025-10-28 15:11:52
|
<stashbot>
|
T309738: Move MediaWiki QueryPages computation to Hadoop - https://phabricator.wikimedia.org/T309738
|
|
2025-10-28 15:12:13
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to 'restricted' for neslihanturan - https://phabricator.wikimedia.org/T406590#11319246 (''Ladsgroup) >>! In T406590#11318342, @Neslihan_Turan_WMDE wrote: > Hi, sorry for the delay. I had a problem accessing Slack but now I managed to sent my...'
|
|
2025-10-28 15:12:22
|
<wikibugs>
|
'ops-codfw, ''DC-Ops, ''Machine-Learning-Team: DIMM_A2 errors for ml-serve2001 - https://phabricator.wikimedia.org/T408516#11319258 (''elukey) The host is up after a powercycle, but it is still not serving any traffic. Adding dcops if they want to investigate it further, giving the numerous occurrences of t...'
|
|
2025-10-28 15:13:24
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to 'restricted' for neslihanturan - https://phabricator.wikimedia.org/T406590#11319262 (''Ladsgroup) I confirmed the key out of band.'
|
|
2025-10-28 15:13:38
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to 'restricted' for neslihanturan - https://phabricator.wikimedia.org/T406590#11319266 (''Ladsgroup)'
|
|
2025-10-28 15:14:01
|
<wikibugs>
|
('PS1) ''Ottomata: AQS edit-analytics - deploy new edits/per_editor endpoint [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199337 (https://phabricator.wikimedia.org/T405041)'
|
|
2025-10-28 15:16:21
|
<logmsgbot>
|
!log brennen@deploy2002 Finished deploy [phabricator/deployment@5fbb350]: deploy phab1004 for T408575 (duration: 06m 09s)
|
|
2025-10-28 15:16:33
|
<stashbot>
|
T408575: Deploy Phabricator/Phorge 2025-10-28 - https://phabricator.wikimedia.org/T408575
|
|
2025-10-28 15:16:56
|
<wikibugs>
|
('PS5) ''Clément Goubert: api-gateway: Release patch for ratelimit test [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199331 (https://phabricator.wikimedia.org/T408128)'
|
|
2025-10-28 15:19:26
|
<wikibugs>
|
('CR) ''Elukey: [C:''+1] Nokia: always set system cpm packet filter on devices [homer/public] - ''https://gerrit.wikimedia.org/r/1199056 (https://phabricator.wikimedia.org/T402577) (owner: ''Cathal Mooney)'
|
|
2025-10-28 15:20:05
|
<wikibugs>
|
('CR) ''Brouberol: [C:''+1] druid: Increase the size of the Druid broker cache size to 4GB [puppet] - ''https://gerrit.wikimedia.org/r/1199280 (https://phabricator.wikimedia.org/T408189) (owner: ''Stevemunene)'
|
|
2025-10-28 15:21:39
|
<wikibugs>
|
'ops-codfw, ''DC-Ops, ''Machine-Learning-Team: DIMM_A2 errors for ml-serve2001 - https://phabricator.wikimedia.org/T408516#11319327 (''Jhancock.wm) @elukey is it depooled? i wanna check some things out that might require some reboots.'
|
|
2025-10-28 15:23:36
|
<swfrench-wmf>
|
!log disable-puppet on A:cp hosts for haproxy config change
|
|
2025-10-28 15:23:39
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
|
2025-10-28 15:24:02
|
<icinga-wm>
|
PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
|
|
2025-10-28 15:24:02
|
<wikibugs>
|
('CR) ''Stevemunene: [C:''+1] "LGTM!" [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199310 (https://phabricator.wikimedia.org/T408397) (owner: ''Brouberol)'
|
|
2025-10-28 15:24:16
|
<wikibugs>
|
'ops-codfw, ''DC-Ops, ''Machine-Learning-Team: DIMM_A2 errors for ml-serve2001 - https://phabricator.wikimedia.org/T408516#11319338 (''elukey) @Jhancock.wm yep you can go ahead! Thanks :)'
|
|
2025-10-28 15:24:33
|
<wikibugs>
|
('CR) ''Scott French: "Thanks for the review!" [puppet] - ''https://gerrit.wikimedia.org/r/1193276 (https://phabricator.wikimedia.org/T403220) (owner: ''Scott French)'
|
|
2025-10-28 15:24:36
|
<wikibugs>
|
('CR) ''Scott French: [C:''+2] P:cache::haproxy: move x_requestctl setup into listen section [puppet] - ''https://gerrit.wikimedia.org/r/1193276 (https://phabricator.wikimedia.org/T403220) (owner: ''Scott French)'
|
|
2025-10-28 15:24:56
|
<icinga-wm>
|
RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30037 bytes in 2.732 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
|
|
2025-10-28 15:25:28
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 15:27:27
|
<wikibugs>
|
'SRE, ''Infrastructure-Foundations: Integrate Bookworm 12.12 point update - https://phabricator.wikimedia.org/T403852#11319349 (''MoritzMuehlenhoff)'
|
|
2025-10-28 15:27:29
|
<wikibugs>
|
('PS6) ''Clément Goubert: api-gateway: Release patch for ratelimit test [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199331 (https://phabricator.wikimedia.org/T408128)'
|
|
2025-10-28 15:27:55
|
<wikibugs>
|
('Abandoned) ''Clément Goubert: api-gateway: rest gw should call ratelimit only when x-wmf-user-class header is present [deployment-charts] - ''https://gerrit.wikimedia.org/r/1191318 (https://phabricator.wikimedia.org/T405574) (owner: ''Pmiazga)'
|
|
2025-10-28 15:28:05
|
<wikibugs>
|
('Abandoned) ''Clément Goubert: api-gateway: support per-route rate limit groups for rest gateway [deployment-charts] - ''https://gerrit.wikimedia.org/r/1192879 (owner: ''Daniel Kinzler)'
|
|
2025-10-28 15:28:11
|
<wikibugs>
|
('Abandoned) ''Clément Goubert: api-gateway: make cookie name configurable for testing [deployment-charts] - ''https://gerrit.wikimedia.org/r/1198385 (https://phabricator.wikimedia.org/T408128) (owner: ''Daniel Kinzler)'
|
|
2025-10-28 15:29:37
|
<wikibugs>
|
('CR) ''CDanis: [C:''+1] "+1 from me! Although I don't think it's strictly necessary to make the same change on the public druid IMO" [puppet] - ''https://gerrit.wikimedia.org/r/1199280 (https://phabricator.wikimedia.org/T408189) (owner: ''Stevemunene)'
|
|
2025-10-28 15:34:04
|
<jinxer-wm>
|
RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
|
|
2025-10-28 15:34:52
|
<logmsgbot>
|
!log jhancock@cumin1003 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ml-serve2001']
|
|
2025-10-28 15:34:53
|
<wikibugs>
|
('PS2) ''Arlolra: ExtensionDistributor: Mark 1.45 as beta [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199113 (https://phabricator.wikimedia.org/T408466)'
|
|
2025-10-28 15:35:13
|
<wikibugs>
|
('CR) ''Herron: [C:''+1] alertmanager: Add support for team mentions on the Slack template [puppet] - ''https://gerrit.wikimedia.org/r/1194321 (https://phabricator.wikimedia.org/T408145) (owner: ''Andrea Denisse)'
|
|
2025-10-28 15:36:36
|
<wikibugs>
|
('CR) ''Ottomata: [C:''+2] "Main patch has been reviewed, merging for deployment." [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199337 (https://phabricator.wikimedia.org/T405041) (owner: ''Ottomata)'
|
|
2025-10-28 15:36:49
|
<wikibugs>
|
('CR) ''Herron: [C:''+1] nrpe2nodexp: use service description as alertname [puppet] - ''https://gerrit.wikimedia.org/r/1199242 (https://phabricator.wikimedia.org/T395446) (owner: ''Tiziano Fogli)'
|
|
2025-10-28 15:37:56
|
<wikibugs>
|
('CR) ''Arlolra: ExtensionDistributor: Mark 1.45 as beta (''1 comment) [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199113 (https://phabricator.wikimedia.org/T408466) (owner: ''Arlolra)'
|
|
2025-10-28 15:38:23
|
<wikibugs>
|
('Merged) ''jenkins-bot: AQS edit-analytics - deploy new edits/per_editor endpoint [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199337 (https://phabricator.wikimedia.org/T405041) (owner: ''Ottomata)'
|
|
2025-10-28 15:41:52
|
<logmsgbot>
|
!log otto@deploy2002 helmfile [staging] START helmfile.d/services/edit-analytics: apply
|
|
2025-10-28 15:43:49
|
<swfrench-wmf>
|
!log rolling run-puppet-agent on A:cp hosts for haproxy config change
|
|
2025-10-28 15:43:52
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
|
2025-10-28 15:44:52
|
<wikibugs>
|
('PS1) ''Kamila Součková: benthos-cache-invalidator: clean up releases [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199340'
|
|
2025-10-28 15:44:55
|
<logmsgbot>
|
!log jhancock@cumin1003 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ml-serve2001']
|
|
2025-10-28 15:46:22
|
<wikibugs>
|
'ops-eqiad, ''DC-Ops: Unresponsive management for ms-be1090.mgmt:22 - https://phabricator.wikimedia.org/T408585 (''phaultfinder) ''NEW'
|
|
2025-10-28 15:46:39
|
<wikibugs>
|
('CR) ''CI reject: [V:''-1] benthos-cache-invalidator: clean up releases [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199340 (owner: ''Kamila Součková)'
|
|
2025-10-28 15:49:29
|
<wikibugs>
|
('PS1) ''PipelineBot: citoid: pipeline bot promote [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199403'
|
|
2025-10-28 15:51:12
|
<wikibugs>
|
('PS3) ''Ebernhardson: cirrus: Start near match A/B test [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199054 (https://phabricator.wikimedia.org/T408154)'
|
|
2025-10-28 15:51:12
|
<wikibugs>
|
('CR) ''Ebernhardson: cirrus: Start near match A/B test (''1 comment) [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199054 (https://phabricator.wikimedia.org/T408154) (owner: ''Ebernhardson)'
|
|
2025-10-28 15:51:58
|
<wikibugs>
|
('CR) ''CI reject: [V:''-1] cirrus: Start near match A/B test [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199054 (https://phabricator.wikimedia.org/T408154) (owner: ''Ebernhardson)'
|
|
2025-10-28 15:54:09
|
<jinxer-wm>
|
FIRING: HelmReleaseBadStatus: Helm release edit-analytics/main on k8s-staging@eqiad in state pending-upgrade - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-staging&var-namespace=edit-analytics - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
|
|
2025-10-28 15:54:31
|
<wikibugs>
|
('PS23) ''Jelto: git_ssh_proxy: add role::git_ssh_proxy for Gerrit and GitLab ssh proxies [puppet] - ''https://gerrit.wikimedia.org/r/1198281 (https://phabricator.wikimedia.org/T365259)'
|
|
2025-10-28 15:58:29
|
<wikibugs>
|
('CR) ''Cathal Mooney: [C:''+2] Nokia: always set system cpm packet filter on devices [homer/public] - ''https://gerrit.wikimedia.org/r/1199056 (https://phabricator.wikimedia.org/T402577) (owner: ''Cathal Mooney)'
|
|
2025-10-28 15:59:04
|
<jinxer-wm>
|
FIRING: MediaWikiElevatedUnknownLogins: Elevated number of login successes (source unknown) via mw-web - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins
|
|
2025-10-28 15:59:17
|
<wikibugs>
|
('PS4) ''Ebernhardson: cirrus: Start near match A/B test [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199054 (https://phabricator.wikimedia.org/T408154)'
|
|
2025-10-28 16:00:00
|
<wikibugs>
|
('Merged) ''jenkins-bot: Nokia: always set system cpm packet filter on devices [homer/public] - ''https://gerrit.wikimedia.org/r/1199056 (https://phabricator.wikimedia.org/T402577) (owner: ''Cathal Mooney)'
|
|
2025-10-28 16:00:04
|
<jouncebot>
|
jhathaway and moritzm: Time to snap out of that daydream and deploy Puppet request window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T1600).
|
|
2025-10-28 16:00:05
|
<jouncebot>
|
No Gerrit patches in the queue for this window AFAICS.
|
|
2025-10-28 16:04:04
|
<jinxer-wm>
|
RESOLVED: MediaWikiElevatedUnknownLogins: Elevated number of login successes (source unknown) via mw-web - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins
|
|
2025-10-28 16:05:20
|
<wikibugs>
|
'ops-magru, ''SRE, ''DC-Ops: MAGRU power maint - CHG0262056 - October 29-30, 2025 - https://phabricator.wikimedia.org/T408589 (''RobH) ''NEW p:''Triage→''Low'
|
|
2025-10-28 16:05:50
|
<wikibugs>
|
'ops-magru, ''SRE, ''DC-Ops: MAGRU power maint - CHG0262056 - October 29-30, 2025 - https://phabricator.wikimedia.org/T408589#11319581 (''RobH) Please note the email required we give consent for the work so I did so via the email.'
|
|
2025-10-28 16:06:52
|
<wikibugs>
|
'ops-magru, ''SRE, ''DC-Ops, ''Infrastructure-Foundations, and 2 others: MAGRU power maint - CHG0262056 - October 29-30, 2025 - https://phabricator.wikimedia.org/T408589#11319592 (''RobH) @netops & #traffic: I don't expect any impact from this according to the notification but just FYI!'
|
|
2025-10-28 16:13:53
|
<wikibugs>
|
'SRE, ''PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592 (''Jdrewniak) ''NEW'
|
|
2025-10-28 16:14:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 16:14:23
|
<wikibugs>
|
'SRE, ''PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11319644 (''Jdrewniak)'
|
|
2025-10-28 16:15:28
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 16:24:08
|
<wikibugs>
|
('CR) ''Marostegui: sanitize-wiki: log into phabricator (''1 comment) [cookbooks] - ''https://gerrit.wikimedia.org/r/1199301 (https://phabricator.wikimedia.org/T408512) (owner: ''Federico Ceratto)'
|
|
2025-10-28 16:30:56
|
<wikibugs>
|
('PS1) ''Marostegui: instances.yaml: Remove es1031 [puppet] - ''https://gerrit.wikimedia.org/r/1199462 (https://phabricator.wikimedia.org/T408600)'
|
|
2025-10-28 16:31:37
|
<wikibugs>
|
('CR) ''Marostegui: [C:''+2] instances.yaml: Remove es1031 [puppet] - ''https://gerrit.wikimedia.org/r/1199462 (https://phabricator.wikimedia.org/T408600) (owner: ''Marostegui)'
|
|
2025-10-28 16:32:53
|
<logmsgbot>
|
!log marostegui@cumin1003 dbctl commit (dc=all): 'Remove es1031 from dbctl T408600', diff saved to https://phabricator.wikimedia.org/P84315 and previous config saved to /var/cache/conftool/dbconfig/20251028-163252-marostegui.json
|
|
2025-10-28 16:32:59
|
<stashbot>
|
T408600: decommission es1031.eqiad.wmnet - https://phabricator.wikimedia.org/T408600
|
|
2025-10-28 16:34:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 16:34:18
|
<wikibugs>
|
('PS1) ''Marostegui: mariadb: Decommission es1031 [puppet] - ''https://gerrit.wikimedia.org/r/1199463 (https://phabricator.wikimedia.org/T408600)'
|
|
2025-10-28 16:34:49
|
<logmsgbot>
|
!log marostegui@cumin1003 START - Cookbook sre.hosts.decommission for hosts es1031.eqiad.wmnet
|
|
2025-10-28 16:35:08
|
<wikibugs>
|
('PS1) ''Ottomata: edit-analytics - bump to build on bookworm [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199464 (https://phabricator.wikimedia.org/T405041)'
|
|
2025-10-28 16:35:14
|
<wikibugs>
|
('PS1) ''Elukey: prometheus-amd-rocm: fix exporter for ROCm 7.0.2 [puppet] - ''https://gerrit.wikimedia.org/r/1199465 (https://phabricator.wikimedia.org/T403697)'
|
|
2025-10-28 16:35:28
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 16:35:34
|
<wikibugs>
|
('CR) ''Marostegui: [C:''+2] mariadb: Decommission es1031 [puppet] - ''https://gerrit.wikimedia.org/r/1199463 (https://phabricator.wikimedia.org/T408600) (owner: ''Marostegui)'
|
|
2025-10-28 16:36:00
|
<wikibugs>
|
('CR) ''Marostegui: "is it already removed from dbctl?" [puppet] - ''https://gerrit.wikimedia.org/r/1199311 (https://phabricator.wikimedia.org/T408385) (owner: ''Federico Ceratto)'
|
|
2025-10-28 16:36:06
|
<wikibugs>
|
('CR) ''Ottomata: [C:''+2] edit-analytics - bump to build on bookworm [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199464 (https://phabricator.wikimedia.org/T405041) (owner: ''Ottomata)'
|
|
2025-10-28 16:36:08
|
<wikibugs>
|
('PS1) ''Mszwarc: hCaptcha: Store risk score in cache, so that jobs can use it [extensions/ConfirmEdit] (wmf/1.45.0-wmf.24) - ''https://gerrit.wikimedia.org/r/1199466 (https://phabricator.wikimedia.org/T408542)'
|
|
2025-10-28 16:36:27
|
<wikibugs>
|
('PS1) ''Mszwarc: hCaptcha: Store risk score in cache, so that jobs can use it [extensions/ConfirmEdit] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1199467 (https://phabricator.wikimedia.org/T408542)'
|
|
2025-10-28 16:36:29
|
<wikibugs>
|
('PS2) ''Elukey: prometheus-amd-rocm: fix exporter for ROCm 7.0.2 [puppet] - ''https://gerrit.wikimedia.org/r/1199465 (https://phabricator.wikimedia.org/T403697)'
|
|
2025-10-28 16:37:45
|
<wikibugs>
|
('Merged) ''jenkins-bot: edit-analytics - bump to build on bookworm [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199464 (https://phabricator.wikimedia.org/T405041) (owner: ''Ottomata)'
|
|
2025-10-28 16:38:29
|
<logmsgbot>
|
!log otto@deploy2002 helmfile [staging] START helmfile.d/services/edit-analytics: apply
|
|
2025-10-28 16:38:34
|
<logmsgbot>
|
!log otto@deploy2002 helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
|
|
2025-10-28 16:40:43
|
<logmsgbot>
|
!log marostegui@cumin1003 START - Cookbook sre.dns.netbox
|
|
2025-10-28 16:40:58
|
<jinxer-wm>
|
FIRING: NELHigh: Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh
|
|
2025-10-28 16:41:03
|
<logmsgbot>
|
!log otto@deploy2002 helmfile [staging] START helmfile.d/services/edit-analytics: apply
|
|
2025-10-28 16:41:18
|
<sukhe>
|
!incidents
|
|
2025-10-28 16:41:18
|
<sirenbot>
|
6905 (UNACKED) NELHigh sre (thanos-rule@main tcp.timed_out)
|
|
2025-10-28 16:41:24
|
<logmsgbot>
|
!log otto@deploy2002 helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
|
|
2025-10-28 16:41:24
|
<_joe_>
|
sukhe: hi
|
|
2025-10-28 16:41:29
|
<sukhe>
|
!ack 6905
|
|
2025-10-28 16:41:30
|
<_joe_>
|
!ack 6905
|
|
2025-10-28 16:41:30
|
<Raine>
|
!ack 6905
|
|
2025-10-28 16:41:32
|
<sirenbot>
|
6905 (ACKED) NELHigh sre (thanos-rule@main tcp.timed_out)
|
|
2025-10-28 16:41:33
|
<sirenbot>
|
6905 (ACKED) NELHigh sre (thanos-rule@main tcp.timed_out)
|
|
2025-10-28 16:41:33
|
<sirenbot>
|
6905 (ACKED) NELHigh sre (thanos-rule@main tcp.timed_out)
|
|
2025-10-28 16:44:09
|
<jinxer-wm>
|
RESOLVED: HelmReleaseBadStatus: Helm release edit-analytics/main on k8s-staging@eqiad in state pending-upgrade - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-staging&var-namespace=edit-analytics - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
|
|
2025-10-28 16:44:17
|
<logmsgbot>
|
!log marostegui@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1031.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
|
|
2025-10-28 16:44:36
|
<logmsgbot>
|
!log marostegui@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1031.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
|
|
2025-10-28 16:44:36
|
<logmsgbot>
|
!log marostegui@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
|
|
2025-10-28 16:44:39
|
<logmsgbot>
|
!log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts es1031.eqiad.wmnet
|
|
2025-10-28 16:44:44
|
<wikibugs>
|
('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, October 28 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [extensions/ConfirmEdit] (wmf/1.45.0-wmf.24) - ''https://gerrit.wikimedia.org/r/1199466 (https://phabricator.wikimedia.org/T408542) (owner: ''Mszwarc)'
|
|
2025-10-28 16:45:20
|
<wikibugs>
|
('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, October 28 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [extensions/ConfirmEdit] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1199467 (https://phabricator.wikimedia.org/T408542) (owner: ''Mszwarc)'
|
|
2025-10-28 16:45:58
|
<jinxer-wm>
|
RESOLVED: NELHigh: Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh
|
|
2025-10-28 16:49:36
|
<wikibugs>
|
('CR) ''Brouberol: [C:''+2] growthbook: deploy a more modern version against ferretdb [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199310 (https://phabricator.wikimedia.org/T408397) (owner: ''Brouberol)'
|
|
2025-10-28 16:50:39
|
<wikibugs>
|
'ops-eqiad, ''DBA, ''DC-Ops, ''decommission-hardware: decommission es1031.eqiad.wmnet - https://phabricator.wikimedia.org/T408600#11320015 (''Marostegui)'
|
|
2025-10-28 16:50:50
|
<wikibugs>
|
'ops-eqiad, ''DBA, ''DC-Ops, ''decommission-hardware: decommission es1031.eqiad.wmnet - https://phabricator.wikimedia.org/T408600#11320042 (''Marostegui) This is ready for #dc-ops'
|
|
2025-10-28 16:51:10
|
<logmsgbot>
|
!log otto@deploy2002 helmfile [codfw] START helmfile.d/services/edit-analytics: apply
|
|
2025-10-28 16:51:22
|
<logmsgbot>
|
!log otto@deploy2002 helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
|
|
2025-10-28 16:51:33
|
<logmsgbot>
|
!log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-growthbook: apply
|
|
2025-10-28 16:51:39
|
<logmsgbot>
|
!log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-growthbook: apply
|
|
2025-10-28 16:51:57
|
<logmsgbot>
|
!log otto@deploy2002 helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
|
|
2025-10-28 16:52:17
|
<logmsgbot>
|
!log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
|
|
2025-10-28 16:52:17
|
<logmsgbot>
|
!log otto@deploy2002 helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
|
|
2025-10-28 16:52:37
|
<logmsgbot>
|
!log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
|
|
2025-10-28 16:52:39
|
<wikibugs>
|
('PS1) ''Pppery: Update translation [phabricator/translations] (wmf/stable) - ''https://gerrit.wikimedia.org/r/1199469'
|
|
2025-10-28 16:53:11
|
<wikibugs>
|
('PS2) ''Pppery: Update translations [phabricator/translations] (wmf/stable) - ''https://gerrit.wikimedia.org/r/1199469'
|
|
2025-10-28 16:53:42
|
<wikibugs>
|
('PS3) ''Pppery: Update translations [phabricator/translations] (wmf/stable) - ''https://gerrit.wikimedia.org/r/1199469'
|
|
2025-10-28 16:55:28
|
<jinxer-wm>
|
FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
|
|
2025-10-28 17:00:05
|
<jouncebot>
|
swfrench-wmf: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for MediaWiki infrastructure (UTC late) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T1700).
|
|
2025-10-28 17:00:13
|
<swfrench-wmf>
|
o/
|
|
2025-10-28 17:00:26
|
<_joe_>
|
jouncebot: cringe
|
|
2025-10-28 17:00:28
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 17:00:40
|
<swfrench-wmf>
|
lol
|
|
2025-10-28 17:01:33
|
<wikibugs>
|
('CR) ''TrainBranchBot: [C:''+2] "Approved by swfrench@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199048 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
|
|
2025-10-28 17:01:35
|
<wikibugs>
|
('PS4) ''JHathaway: sysctls: add optional module param to sysctl::parameters [puppet] - ''https://gerrit.wikimedia.org/r/1198155 (https://phabricator.wikimedia.org/T407726)'
|
|
2025-10-28 17:02:23
|
<wikibugs>
|
('Merged) ''jenkins-bot: Enroll 10% of client sessions in PHP 8.3 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199048 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
|
|
2025-10-28 17:02:56
|
<logmsgbot>
|
!log swfrench@deploy2002 Started scap sync-world: Backport for [[gerrit:1199048|Enroll 10% of client sessions in PHP 8.3 (T405955)]]
|
|
2025-10-28 17:03:06
|
<stashbot>
|
T405955: MediaWiki on PHP 8.3 production workload migration - https://phabricator.wikimedia.org/T405955
|
|
2025-10-28 17:05:09
|
<wikibugs>
|
('CR) ''JHathaway: "I wasn't aware of ConditionKernelModuleLoaded. I tried it on a qemu sid box, but I couldn't get it to work properly. I think this is becau" [puppet] - ''https://gerrit.wikimedia.org/r/1198155 (https://phabricator.wikimedia.org/T407726) (owner: ''JHathaway)'
|
|
2025-10-28 17:05:13
|
<wikibugs>
|
('CR) ''JHathaway: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1198155 (https://phabricator.wikimedia.org/T407726) (owner: ''JHathaway)'
|
|
2025-10-28 17:05:23
|
<logmsgbot>
|
!log swfrench@deploy2002 swfrench: Backport for [[gerrit:1199048|Enroll 10% of client sessions in PHP 8.3 (T405955)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
|
|
2025-10-28 17:05:47
|
<wikibugs>
|
('CR) ''Klausman: [C:''+1] prometheus-amd-rocm: fix exporter for ROCm 7.0.2 [puppet] - ''https://gerrit.wikimedia.org/r/1199465 (https://phabricator.wikimedia.org/T403697) (owner: ''Elukey)'
|
|
2025-10-28 17:07:09
|
<logmsgbot>
|
!log swfrench@deploy2002 swfrench: Continuing with sync
|
|
2025-10-28 17:08:16
|
<icinga-wm>
|
PROBLEM - Check unit status of httpbb_kubernetes_mw-api-int_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-int_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
|
|
2025-10-28 17:08:40
|
<logmsgbot>
|
!log marostegui@cumin1003 dbctl commit (dc=all): 'Depool sretest2003 T407352', diff saved to https://phabricator.wikimedia.org/P84316 and previous config saved to /var/cache/conftool/dbconfig/20251028-170840-marostegui.json
|
|
2025-10-28 17:08:46
|
<stashbot>
|
T407352: Test config H 1P in external store - https://phabricator.wikimedia.org/T407352
|
|
2025-10-28 17:09:04
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 17:09:04
|
<jinxer-wm>
|
FIRING: [6x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 17:09:59
|
<logmsgbot>
|
!log marostegui@cumin1003 dbctl commit (dc=all): 'Depool es2040 to clone sretest2003 T407352', diff saved to https://phabricator.wikimedia.org/P84317 and previous config saved to /var/cache/conftool/dbconfig/20251028-170958-marostegui.json
|
|
2025-10-28 17:11:20
|
<logmsgbot>
|
!log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es2040.codfw.wmnet,sretest2003.codfw.wmnet with reason: Cloning sretest2003 from es2040
|
|
2025-10-28 17:11:27
|
<logmsgbot>
|
!log swfrench@deploy2002 Finished scap sync-world: Backport for [[gerrit:1199048|Enroll 10% of client sessions in PHP 8.3 (T405955)]] (duration: 08m 30s)
|
|
2025-10-28 17:11:31
|
<wikibugs>
|
('PS1) ''Marostegui: sretest2003: Move it to es7 [puppet] - ''https://gerrit.wikimedia.org/r/1199472 (https://phabricator.wikimedia.org/T407352)'
|
|
2025-10-28 17:11:32
|
<stashbot>
|
T405955: MediaWiki on PHP 8.3 production workload migration - https://phabricator.wikimedia.org/T405955
|
|
2025-10-28 17:12:47
|
<logmsgbot>
|
!log marostegui@cumin1003 START - Cookbook sre.mysql.clone of es2040.codfw.wmnet onto sretest2003.codfw.wmnet
|
|
2025-10-28 17:13:01
|
<wikibugs>
|
'SRE, ''Traffic, ''FY2025-26 WE3.3 Engaging core audiences, ''Reader Experience Team (REx Sprint 8 [Q2 Oct 21-Nov 3]): [Reading Lists] Monitor potential performance impact of Reading Lists for Web - https://phabricator.wikimedia.org/T397526#11320228 (''CDanis) Sounds good to me @Jdrewniak ! Thanks :)'
|
|
2025-10-28 17:13:11
|
<wikibugs>
|
('CR) ''Marostegui: [C:''+2] sretest2003: Move it to es7 [puppet] - ''https://gerrit.wikimedia.org/r/1199472 (https://phabricator.wikimedia.org/T407352) (owner: ''Marostegui)'
|
|
2025-10-28 17:13:29
|
<swfrench-wmf>
|
part #1 of the infra window done. part #2 coming soon.
|
|
2025-10-28 17:13:44
|
<wikibugs>
|
('PS3) ''Fabfur: P:cache:haproxy: introduce ua classes [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060)'
|
|
2025-10-28 17:13:54
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to 'restricted' for neslihanturan - https://phabricator.wikimedia.org/T406590#11320235 (''Dzahn) a:''Neslihan_Turan_WMDE→''None Thank you for taking care of that, Ladsgroup!'
|
|
2025-10-28 17:14:02
|
<wikibugs>
|
('CR) ''Fabfur: [C:''-1] "still addressing the comments" [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060) (owner: ''Fabfur)'
|
|
2025-10-28 17:14:06
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to 'restricted' for neslihanturan - https://phabricator.wikimedia.org/T406590#11320237 (''Dzahn) ''Stalled→''In progress'
|
|
2025-10-28 17:14:51
|
<wikibugs>
|
('CR) ''Andrea Denisse: [C:''+2] alertmanager: Add support for team mentions on the Slack template [puppet] - ''https://gerrit.wikimedia.org/r/1194321 (https://phabricator.wikimedia.org/T408145) (owner: ''Andrea Denisse)'
|
|
2025-10-28 17:18:54
|
<wikibugs>
|
('CR) ''Scott French: [C:''+2] mw-(api-int|jobrunner): Serve 5% of traffic on PHP 8.3 [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199047 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
|
|
2025-10-28 17:20:45
|
<wikibugs>
|
('Merged) ''jenkins-bot: mw-(api-int|jobrunner): Serve 5% of traffic on PHP 8.3 [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199047 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
|
|
2025-10-28 17:22:53
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
|
|
2025-10-28 17:23:08
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
|
|
2025-10-28 17:23:29
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
|
|
2025-10-28 17:23:38
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
|
|
2025-10-28 17:25:27
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
|
|
2025-10-28 17:25:40
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
|
|
2025-10-28 17:25:47
|
<wikibugs>
|
'ops-eqiad, ''SRE, ''DC-Ops, ''Infrastructure-Foundations, ''netops: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11320304 (''RobH) [[ https://docs.google.com/spreadsheets/d/13ow4JxrsQdz8KSsdBBNwvlrAuGKo8OHWcnR4RhXTYc0/edit?usp=sharing | Google Sheet listing of all affect...'
|
|
2025-10-28 17:26:03
|
<wikibugs>
|
('PS3) ''Elukey: prometheus-amd-rocm: fix exporter for ROCm 7.0.2 [puppet] - ''https://gerrit.wikimedia.org/r/1199465 (https://phabricator.wikimedia.org/T403697)'
|
|
2025-10-28 17:26:23
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
|
|
2025-10-28 17:26:31
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
|
|
2025-10-28 17:27:25
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-api-int: apply
|
|
2025-10-28 17:27:37
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
|
|
2025-10-28 17:27:58
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-api-int: apply
|
|
2025-10-28 17:28:04
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
|
|
2025-10-28 17:28:37
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
|
|
2025-10-28 17:28:46
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
|
|
2025-10-28 17:28:56
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
|
|
2025-10-28 17:29:01
|
<logmsgbot>
|
!log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
|
|
2025-10-28 17:31:42
|
<wikibugs>
|
('PS5) ''BCornwall: varnish: Promote new m-dot redirect from 302/307 to 301/308 [puppet] - ''https://gerrit.wikimedia.org/r/1198429 (https://phabricator.wikimedia.org/T405931) (owner: ''Krinkle)'
|
|
2025-10-28 17:32:30
|
<wikibugs>
|
('CR) ''BCornwall: "I took the liberty to update two more tests to use 301s instead of 302s. varnishtests now pass. Mind giving that a lookover?" [puppet] - ''https://gerrit.wikimedia.org/r/1198429 (https://phabricator.wikimedia.org/T405931) (owner: ''Krinkle)'
|
|
2025-10-28 17:33:49
|
<logmsgbot>
|
!log fceratto@cumin1003 dbctl commit (dc=all): 'Depool es2027 T408406', diff saved to https://phabricator.wikimedia.org/P84318 and previous config saved to /var/cache/conftool/dbconfig/20251028-173348-fceratto.json
|
|
2025-10-28 17:33:53
|
<stashbot>
|
T408406: decommission es2027 - https://phabricator.wikimedia.org/T408406
|
|
2025-10-28 17:38:27
|
<wikibugs>
|
'ops-eqiad, ''SRE, ''DC-Ops, ''Infrastructure-Foundations, ''netops: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11320374 (''RobH)'
|
|
2025-10-28 17:38:46
|
<wikibugs>
|
'ops-eqiad, ''SRE, ''DC-Ops, ''Infrastructure-Foundations, ''netops: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11320386 (''RobH)'
|
|
2025-10-28 17:40:28
|
<wikibugs>
|
('CR) ''BCornwall: "Marking unresolved" [puppet] - ''https://gerrit.wikimedia.org/r/1198429 (https://phabricator.wikimedia.org/T405931) (owner: ''Krinkle)'
|
|
2025-10-28 17:46:12
|
<wikibugs>
|
('PS2) ''Federico Ceratto: site.pp, es2026.yaml: Decommission es2026 [puppet] - ''https://gerrit.wikimedia.org/r/1199311 (https://phabricator.wikimedia.org/T408385)'
|
|
2025-10-28 17:46:12
|
<wikibugs>
|
('PS1) ''Federico Ceratto: instances.yaml: remove es2027 from dbctl [puppet] - ''https://gerrit.wikimedia.org/r/1199476 (https://phabricator.wikimedia.org/T408406)'
|
|
2025-10-28 17:52:41
|
<wikibugs>
|
'SRE, ''PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11320455 (''Aklapper) > - The ability to access this page via a custom domain/subdomain (TBD) Wasn't that {T407156} instead of TBD?'
|
|
2025-10-28 17:57:24
|
<wikibugs>
|
('PS1) ''Ottomata: edit-analytics - image bump to fix path route [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199479 (https://phabricator.wikimedia.org/T405041)'
|
|
2025-10-28 17:57:36
|
<wikibugs>
|
('CR) ''Scott French: [C:''+1] {api,rest}-gateway: Update to Envoy 1.32.12 in staging [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199085 (https://phabricator.wikimedia.org/T405808) (owner: ''RLazarus)'
|
|
2025-10-28 17:57:38
|
<wikibugs>
|
('CR) ''Ottomata: [C:''+2] edit-analytics - image bump to fix path route [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199479 (https://phabricator.wikimedia.org/T405041) (owner: ''Ottomata)'
|
|
2025-10-28 17:59:19
|
<wikibugs>
|
('Merged) ''jenkins-bot: edit-analytics - image bump to fix path route [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199479 (https://phabricator.wikimedia.org/T405041) (owner: ''Ottomata)'
|
|
2025-10-28 18:00:05
|
<jouncebot>
|
dduvall and dancy: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for MediaWiki train - Utc-7 Version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T1800).
|
|
2025-10-28 18:01:28
|
<logmsgbot>
|
!log otto@deploy2002 helmfile [staging] START helmfile.d/services/edit-analytics: apply
|
|
2025-10-28 18:01:46
|
<logmsgbot>
|
!log otto@deploy2002 helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
|
|
2025-10-28 18:02:06
|
<logmsgbot>
|
!log otto@deploy2002 helmfile [codfw] START helmfile.d/services/edit-analytics: apply
|
|
2025-10-28 18:02:21
|
<logmsgbot>
|
!log otto@deploy2002 helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
|
|
2025-10-28 18:02:31
|
<logmsgbot>
|
!log otto@deploy2002 helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
|
|
2025-10-28 18:03:02
|
<logmsgbot>
|
!log otto@deploy2002 helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
|
|
2025-10-28 18:04:24
|
<jinxer-wm>
|
FIRING: [6x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 18:06:45
|
<wikibugs>
|
('PS1) ''TrainBranchBot: group0 to 1.45.0-wmf.25 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199481 (https://phabricator.wikimedia.org/T405681)'
|
|
2025-10-28 18:06:52
|
<wikibugs>
|
('CR) ''TrainBranchBot: [C:''+2] "Initiated by dduvall@deploy2002" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199481 (https://phabricator.wikimedia.org/T405681) (owner: ''TrainBranchBot)'
|
|
2025-10-28 18:07:42
|
<wikibugs>
|
('Merged) ''jenkins-bot: group0 to 1.45.0-wmf.25 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199481 (https://phabricator.wikimedia.org/T405681) (owner: ''TrainBranchBot)'
|
|
2025-10-28 18:08:15
|
<icinga-wm>
|
RECOVERY - Check unit status of httpbb_kubernetes_mw-api-int_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-int_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
|
|
2025-10-28 18:10:54
|
<wikibugs>
|
('PS1) ''Jdlrobson: Update QuickSurvey platforms [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199482'
|
|
2025-10-28 18:13:44
|
<wikibugs>
|
('PS2) ''Federico Ceratto: sanitize-wiki: log into phabricator [cookbooks] - ''https://gerrit.wikimedia.org/r/1199301 (https://phabricator.wikimedia.org/T408512)'
|
|
2025-10-28 18:14:43
|
<logmsgbot>
|
!log dduvall@deploy2002 rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.25 refs T405681
|
|
2025-10-28 18:14:48
|
<stashbot>
|
T405681: 1.45.0-wmf.25 deployment blockers - https://phabricator.wikimedia.org/T405681
|
|
2025-10-28 18:17:44
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to 'restricted' for neslihanturan - https://phabricator.wikimedia.org/T406590#11320544 (''Dzahn)'
|
|
2025-10-28 18:21:36
|
<wikibugs>
|
('PS1) ''Dzahn: admin: add SSH key and restricted group membership for neslihanturan [puppet] - ''https://gerrit.wikimedia.org/r/1199484 (https://phabricator.wikimedia.org/T406590)'
|
|
2025-10-28 18:23:19
|
<jinxer-wm>
|
FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
|
|
2025-10-28 18:24:24
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 18:27:33
|
<wikibugs>
|
'SRE, ''collaboration-services, ''PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11320568 (''Dzahn)'
|
|
2025-10-28 18:27:56
|
<wikibugs>
|
'SRE, ''collaboration-services, ''PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11320570 (''Dzahn) added tag for the SRE subteam that owns microsites hosted on "miscweb" / kubernetes'
|
|
2025-10-28 18:28:19
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 18:29:48
|
<wikibugs>
|
'SRE, ''collaboration-services, ''PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11320577 (''Dzahn) This is certainly possible (hosting on kubernetes 'miscweb' alongside other microsites) and deployment via deployment servers, but does require...'
|
|
2025-10-28 18:32:33
|
<wikibugs>
|
('CR) ''Dzahn: aptrepo::staging: add job to clear incoming folder (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1199243 (https://phabricator.wikimedia.org/T408527) (owner: ''Jelto)'
|
|
2025-10-28 18:33:04
|
<wikibugs>
|
('CR) ''Krinkle: [C:''+1] "Thanks. LGTM." [puppet] - ''https://gerrit.wikimedia.org/r/1198429 (https://phabricator.wikimedia.org/T405931) (owner: ''Krinkle)'
|
|
2025-10-28 18:33:19
|
<wikibugs>
|
('PS8) ''Krinkle: varnish: Remove temporary enable_m_redir flag [puppet] - ''https://gerrit.wikimedia.org/r/1198430 (https://phabricator.wikimedia.org/T405931)'
|
|
2025-10-28 18:35:09
|
<logmsgbot>
|
!log andrew@cumin2002 START - Cookbook sre.hosts.reimage for host cloudnet2005-dev.codfw.wmnet with OS trixie
|
|
2025-10-28 18:37:39
|
<wikibugs>
|
('PS1) ''Dzahn: add discovery records for gerrit as CNAMEs to public names [dns] - ''https://gerrit.wikimedia.org/r/1199486 (https://phabricator.wikimedia.org/T365259)'
|
|
2025-10-28 18:39:09
|
<wikibugs>
|
('CR) ''Dzahn: "Is this what you meant?" [dns] - ''https://gerrit.wikimedia.org/r/1199486 (https://phabricator.wikimedia.org/T365259) (owner: ''Dzahn)'
|
|
2025-10-28 18:43:56
|
<wikibugs>
|
('PS2) ''Dzahn: add discovery records for gerrit as CNAMEs to public names [dns] - ''https://gerrit.wikimedia.org/r/1199486 (https://phabricator.wikimedia.org/T365259)'
|
|
2025-10-28 18:49:51
|
<wikibugs>
|
('CR) ''Kamila Součková: [C:''+1] admin: add SSH key and restricted group membership for neslihanturan [puppet] - ''https://gerrit.wikimedia.org/r/1199484 (https://phabricator.wikimedia.org/T406590) (owner: ''Dzahn)'
|
|
2025-10-28 18:50:28
|
<wikibugs>
|
('CR) ''Pmiazga: [C:''+1] "LGTM" [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199331 (https://phabricator.wikimedia.org/T408128) (owner: ''Clément Goubert)'
|
|
2025-10-28 18:51:36
|
<wikibugs>
|
('CR) ''Dzahn: [C:''+2] admin: add SSH key and restricted group membership for neslihanturan [puppet] - ''https://gerrit.wikimedia.org/r/1199484 (https://phabricator.wikimedia.org/T406590) (owner: ''Dzahn)'
|
|
2025-10-28 18:51:55
|
<logmsgbot>
|
!log andrew@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2005-dev.codfw.wmnet with reason: host reimage
|
|
2025-10-28 18:58:03
|
<wikibugs>
|
('CR) ''Muehlenhoff: site: initial setup for new logging-sd hosts (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1199062 (https://phabricator.wikimedia.org/T406796) (owner: ''Cwhite)'
|
|
2025-10-28 19:00:02
|
<logmsgbot>
|
!log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2005-dev.codfw.wmnet with reason: host reimage
|
|
2025-10-28 19:15:13
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to 'restricted' for neslihanturan - https://phabricator.wikimedia.org/T406590#11320789 (''Dzahn) @Neslihan_Turan_WMDE Your user has just been created on the deployment server now. You have the access. Do you need any other info how to config...'
|
|
2025-10-28 19:15:32
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to 'restricted' for neslihanturan - https://phabricator.wikimedia.org/T406590#11320792 (''Dzahn) ''In progress→''Resolved a:''Dzahn'
|
|
2025-10-28 19:16:32
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to 'restricted' for neslihanturan - https://phabricator.wikimedia.org/T406590#11320807 (''Dzahn) ` deploy1003:~] $ id neslihanturan uid=17901(neslihanturan) gid=500(wikidev) groups=500(wikidev),706(restricted),714(airflow-deployers) `'
|
|
2025-10-28 19:23:31
|
<logmsgbot>
|
!log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2005-dev.codfw.wmnet with OS trixie
|
|
2025-10-28 19:24:16
|
<logmsgbot>
|
!log andrew@cumin2002 START - Cookbook sre.hosts.reimage for host cloudnet2006-dev.codfw.wmnet with OS trixie
|
|
2025-10-28 19:26:41
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.ganeti.makevm for new host tcp-proxy7001.magru.wmnet
|
|
2025-10-28 19:26:43
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.dns.netbox
|
|
2025-10-28 19:28:48
|
<logmsgbot>
|
!log andrew@cumin2002 START - Cookbook sre.hosts.reimage for host cloudservices2004-dev.codfw.wmnet with OS trixie
|
|
2025-10-28 19:29:22
|
<wikibugs>
|
('CR) ''JHathaway: Add the sre.hosts.powercycle cookbook (''1 comment) [cookbooks] - ''https://gerrit.wikimedia.org/r/1198928 (owner: ''Elukey)'
|
|
2025-10-28 19:30:28
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy7001.magru.wmnet - dzahn@cumin2002"
|
|
2025-10-28 19:30:32
|
<logmsgbot>
|
!log dzahn@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy7001.magru.wmnet - dzahn@cumin2002"
|
|
2025-10-28 19:30:33
|
<logmsgbot>
|
!log dzahn@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
|
|
2025-10-28 19:30:33
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.dns.wipe-cache tcp-proxy7001.magru.wmnet on all recursors
|
|
2025-10-28 19:30:36
|
<logmsgbot>
|
!log dzahn@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy7001.magru.wmnet on all recursors
|
|
2025-10-28 19:31:10
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy7001.magru.wmnet - dzahn@cumin2002"
|
|
2025-10-28 19:31:16
|
<logmsgbot>
|
!log dzahn@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy7001.magru.wmnet - dzahn@cumin2002"
|
|
2025-10-28 19:31:28
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.hosts.reimage for host tcp-proxy7001.magru.wmnet with OS trixie
|
|
2025-10-28 19:31:41
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vm-requests, ''Patch-For-Review: Site: 14 VMs request for tcp-proxy (gerrit-ssh-proxy) - https://phabricator.wikimedia.org/T408064#11320900 (''ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin2002 for host...'
|
|
2025-10-28 19:31:59
|
<icinga-wm>
|
PROBLEM - BFD status on cloudsw1-b1-codfw.mgmt is CRITICAL: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
|
|
2025-10-28 19:32:39
|
<jinxer-wm>
|
FIRING: [2x] CoreBGPDown: Core BGP session down between cloudsw1-b1-codfw and cloudservices2004-dev (172.20.5.8) - group cloud_host - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
|
|
2025-10-28 19:37:00
|
<wikibugs>
|
('CR) ''JHathaway: [C:''+2] dmarc: add dmarc monitoring records to more domains [dns] - ''https://gerrit.wikimedia.org/r/1198598 (https://phabricator.wikimedia.org/T404884) (owner: ''JHathaway)'
|
|
2025-10-28 19:37:57
|
<logmsgbot>
|
!log jhathaway@dns1004 START - running authdns-update
|
|
2025-10-28 19:38:19
|
<jinxer-wm>
|
FIRING: [2x] JobUnavailable: Reduced availability for job cloud_dev_pdns in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
|
|
2025-10-28 19:39:23
|
<logmsgbot>
|
!log jhathaway@dns1004 END - running authdns-update
|
|
2025-10-28 19:40:28
|
<logmsgbot>
|
!log andrew@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
|
|
2025-10-28 19:44:32
|
<logmsgbot>
|
!log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
|
|
2025-10-28 19:45:09
|
<logmsgbot>
|
!log andrew@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices2004-dev.codfw.wmnet with reason: host reimage
|
|
2025-10-28 19:48:59
|
<logmsgbot>
|
!log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices2004-dev.codfw.wmnet with reason: host reimage
|
|
2025-10-28 19:51:41
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.ganeti.makevm for new host tcp-proxy7002.magru.wmnet
|
|
2025-10-28 19:51:43
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.dns.netbox
|
|
2025-10-28 19:54:56
|
<wikibugs>
|
'ops-eqiad, ''SRE, ''SRE-swift-storage, ''DC-Ops: Install new disk controllers to SM swift backends (eqiad) - https://phabricator.wikimedia.org/T400877#11320930 (''VRiley-WMF) Attempted to swap the unit and it wouldn't power back on. Swapped it back out with the old one, and it still won't power on. Check...'
|
|
2025-10-28 19:57:08
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy7002.magru.wmnet - dzahn@cumin2002"
|
|
2025-10-28 19:57:34
|
<logmsgbot>
|
!log dzahn@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy7002.magru.wmnet - dzahn@cumin2002"
|
|
2025-10-28 19:57:35
|
<logmsgbot>
|
!log dzahn@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
|
|
2025-10-28 19:57:35
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.dns.wipe-cache tcp-proxy7002.magru.wmnet on all recursors
|
|
2025-10-28 19:57:39
|
<logmsgbot>
|
!log dzahn@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy7002.magru.wmnet on all recursors
|
|
2025-10-28 19:58:11
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy7002.magru.wmnet - dzahn@cumin2002"
|
|
2025-10-28 19:58:19
|
<logmsgbot>
|
!log dzahn@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy7002.magru.wmnet - dzahn@cumin2002"
|
|
2025-10-28 19:58:50
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.hosts.reimage for host tcp-proxy7002.magru.wmnet with OS trixie
|
|
2025-10-28 19:59:04
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vm-requests, ''Patch-For-Review: Site: 14 VMs request for tcp-proxy (gerrit-ssh-proxy) - https://phabricator.wikimedia.org/T408064#11320950 (''ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin2002 for host...'
|
|
2025-10-28 20:00:05
|
<jouncebot>
|
RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: How many deployers does it take to do UTC late backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T2000).
|
|
2025-10-28 20:00:05
|
<jouncebot>
|
Msz2001: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
|
|
2025-10-28 20:00:41
|
<wikibugs>
|
('CR) ''BCornwall: [C:''+2] DNSRepository: Automated MarkMonitor domain sync [dns] - ''https://gerrit.wikimedia.org/r/1196775 (owner: ''Ncmonitor)'
|
|
2025-10-28 20:00:50
|
<Msz2001>
|
I'm going to deploy
|
|
2025-10-28 20:00:55
|
<logmsgbot>
|
!log brett@dns1004 START - running authdns-update
|
|
2025-10-28 20:01:44
|
<logmsgbot>
|
!log brett@dns1004 END - running authdns-update
|
|
2025-10-28 20:02:14
|
<wikibugs>
|
('CR) ''TrainBranchBot: [C:''+2] "Approved by mszwarc@deploy2002 using scap backport" [extensions/ConfirmEdit] (wmf/1.45.0-wmf.24) - ''https://gerrit.wikimedia.org/r/1199466 (https://phabricator.wikimedia.org/T408542) (owner: ''Mszwarc)'
|
|
2025-10-28 20:02:14
|
<wikibugs>
|
('CR) ''TrainBranchBot: [C:''+2] "Approved by mszwarc@deploy2002 using scap backport" [extensions/ConfirmEdit] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1199467 (https://phabricator.wikimedia.org/T408542) (owner: ''Mszwarc)'
|
|
2025-10-28 20:03:29
|
<wikibugs>
|
('Merged) ''jenkins-bot: hCaptcha: Store risk score in cache, so that jobs can use it [extensions/ConfirmEdit] (wmf/1.45.0-wmf.24) - ''https://gerrit.wikimedia.org/r/1199466 (https://phabricator.wikimedia.org/T408542) (owner: ''Mszwarc)'
|
|
2025-10-28 20:04:04
|
<wikibugs>
|
('Merged) ''jenkins-bot: hCaptcha: Store risk score in cache, so that jobs can use it [extensions/ConfirmEdit] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1199467 (https://phabricator.wikimedia.org/T408542) (owner: ''Mszwarc)'
|
|
2025-10-28 20:04:41
|
<logmsgbot>
|
!log mszwarc@deploy2002 Started scap sync-world: Backport for [[gerrit:1199466|hCaptcha: Store risk score in cache, so that jobs can use it (T408542)]], [[gerrit:1199467|hCaptcha: Store risk score in cache, so that jobs can use it (T408542)]]
|
|
2025-10-28 20:04:53
|
<stashbot>
|
T408542: hCaptcha: Store risk score in global memcache key - https://phabricator.wikimedia.org/T408542
|
|
2025-10-28 20:06:58
|
<logmsgbot>
|
!log mszwarc@deploy2002 mszwarc: Backport for [[gerrit:1199466|hCaptcha: Store risk score in cache, so that jobs can use it (T408542)]], [[gerrit:1199467|hCaptcha: Store risk score in cache, so that jobs can use it (T408542)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
|
|
2025-10-28 20:07:36
|
<logmsgbot>
|
!log mszwarc@deploy2002 mszwarc: Continuing with sync
|
|
2025-10-28 20:08:40
|
<logmsgbot>
|
!log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2006-dev.codfw.wmnet with OS trixie
|
|
2025-10-28 20:12:08
|
<logmsgbot>
|
!log mszwarc@deploy2002 Finished scap sync-world: Backport for [[gerrit:1199466|hCaptcha: Store risk score in cache, so that jobs can use it (T408542)]], [[gerrit:1199467|hCaptcha: Store risk score in cache, so that jobs can use it (T408542)]] (duration: 07m 27s)
|
|
2025-10-28 20:12:16
|
<stashbot>
|
T408542: hCaptcha: Store risk score in global memcache key - https://phabricator.wikimedia.org/T408542
|
|
2025-10-28 20:13:41
|
<wikibugs>
|
('CR) ''BCornwall: [V:''+2 C:''+2] varnish: Promote new m-dot redirect from 302/307 to 301/308 [puppet] - ''https://gerrit.wikimedia.org/r/1198429 (https://phabricator.wikimedia.org/T405931) (owner: ''Krinkle)'
|
|
2025-10-28 20:14:49
|
<wikibugs>
|
('CR) ''BCornwall: [C:''+2] varnishtest: Remove logfile support [puppet] - ''https://gerrit.wikimedia.org/r/1199068 (https://phabricator.wikimedia.org/T408202) (owner: ''BCornwall)'
|
|
2025-10-28 20:14:55
|
<wikibugs>
|
('CR) ''BCornwall: varnishtest: Remove logfile support [puppet] - ''https://gerrit.wikimedia.org/r/1199068 (https://phabricator.wikimedia.org/T408202) (owner: ''BCornwall)'
|
|
2025-10-28 20:17:26
|
<wikibugs>
|
'SRE, ''vrts, ''Znuny: VRTS is spammed with bounce e-mails and is going to break - https://phabricator.wikimedia.org/T408632#11321005 (''Peachey88)'
|
|
2025-10-28 20:20:38
|
<wikibugs>
|
'SRE, ''vrts, ''Znuny: VRTS is spammed with bounce e-mails and is going to break - https://phabricator.wikimedia.org/T408632#11321044 (''jhathaway) @Krd thanks, I'm investigating, not sure of the cause either.'
|
|
2025-10-28 20:20:57
|
<wikibugs>
|
'SRE, ''vrts, ''Znuny: VRTS is spammed with bounce e-mails and is going to break - https://phabricator.wikimedia.org/T408632#11321045 (''jhathaway) p:''Triage→''High'
|
|
2025-10-28 20:23:19
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 20:23:48
|
<wikibugs>
|
'SRE, ''vrts, ''Znuny: VRTS is spammed with bounce e-mails and is going to break - https://phabricator.wikimedia.org/T408632#11321064 (''Krd) Non-representative example: From MAILER-DAEMON Tue Oct 28 20:21:46 2025 Received: from mx-in1001.wikimedia.org ([2620:0:861:4:208:80:155:102]:55514) by vrts1003.eq...'
|
|
2025-10-28 20:24:24
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 20:25:02
|
<logmsgbot>
|
!log dzahn@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host tcp-proxy7001.magru.wmnet with OS trixie
|
|
2025-10-28 20:25:03
|
<logmsgbot>
|
!log dzahn@cumin2002 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host tcp-proxy7001.magru.wmnet
|
|
2025-10-28 20:25:12
|
<wikibugs>
|
'SRE, ''vrts, ''Znuny: VRTS is spammed with bounce e-mails and is going to break - https://phabricator.wikimedia.org/T408632#11321067 (''Krd) It appears to me that we are accepting bounces from phishing e-mails sent with fake sender info@wikipedia.org.'
|
|
2025-10-28 20:25:20
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vm-requests, ''Patch-For-Review: Site: 14 VMs request for tcp-proxy (gerrit-ssh-proxy) - https://phabricator.wikimedia.org/T408064#11321069 (''ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin2002 for host tcp-...'
|
|
2025-10-28 20:26:24
|
<wikibugs>
|
'SRE, ''vrts, ''Znuny: VRTS is spammed with bounce e-mails and is going to break - https://phabricator.wikimedia.org/T408632#11321073 (''Krd) The 219.240.37.89 looks like a common factor. Can we block this source IP for SMTP as a first measure?'
|
|
2025-10-28 20:29:20
|
<Msz2001>
|
!log Deployed change to private Suggested Investigations code
|
|
2025-10-28 20:29:23
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
|
2025-10-28 20:29:34
|
<Msz2001>
|
Freeing the window, I deployed all that I planned
|
|
2025-10-28 20:33:03
|
<wikibugs>
|
('CR) ''Herron: [C:''+1] "LGTM once the ferm/nftables bit is sorted out!" [puppet] - ''https://gerrit.wikimedia.org/r/1199062 (https://phabricator.wikimedia.org/T406796) (owner: ''Cwhite)'
|
|
2025-10-28 20:33:19
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 20:34:24
|
<jinxer-wm>
|
FIRING: [2x] SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 20:38:41
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy7002.magru.wmnet with reason: host reimage
|
|
2025-10-28 20:44:42
|
<logmsgbot>
|
!log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy7002.magru.wmnet with reason: host reimage
|
|
2025-10-28 20:48:40
|
<wikibugs>
|
'SRE, ''envoy, ''serviceops, ''Patch-For-Review: Envoy config updates from v1.29 - https://phabricator.wikimedia.org/T404036#11321177 (''RLazarus) ''Open→''Resolved'
|
|
2025-10-28 20:49:22
|
<wikibugs>
|
'SRE, ''vrts, ''Znuny: VRTS is spammed with bounce e-mails and is going to break - https://phabricator.wikimedia.org/T408632#11321182 (''jhathaway) >>! In T408632#11321073, @Krd wrote: > The 219.240.37.89 looks like a common factor. Can we block this source IP for SMTP as a first measure? done, though a pr...'
|
|
2025-10-28 20:50:25
|
<apine>
|
Hello, all! The Abstract Wikipedia team needs to do a semi-urgent deployment of backend services. I notice that the Web Team deployment window is coming up in ten minutes, but is rarely used.
|
|
2025-10-28 20:50:36
|
<apine>
|
Will the Web Team be using that window today, or can I grab it?
|
|
2025-10-28 20:52:10
|
<logmsgbot>
|
marostegui@cumin1003 clone (PID 543428) is awaiting input
|
|
2025-10-28 20:57:20
|
<wikibugs>
|
'SRE-Access-Requests, ''LDAP-Access-Requests: Grant Access to wmf LDAP and analytics-privatedata-users shell group for SherryYang-WMF - https://phabricator.wikimedia.org/T408639 (''SherryYang-WMF) ''NEW'
|
|
2025-10-28 20:58:19
|
<jinxer-wm>
|
FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
|
|
2025-10-28 21:00:05
|
<jouncebot>
|
Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251028T2100)
|
|
2025-10-28 21:01:37
|
<logmsgbot>
|
!log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy7002.magru.wmnet with OS trixie
|
|
2025-10-28 21:01:37
|
<logmsgbot>
|
!log dzahn@cumin2002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host tcp-proxy7002.magru.wmnet
|
|
2025-10-28 21:01:48
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vm-requests, ''Patch-For-Review: Site: 14 VMs request for tcp-proxy (gerrit-ssh-proxy) - https://phabricator.wikimedia.org/T408064#11321209 (''ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin2002 for host tcp-...'
|
|
2025-10-28 21:16:08
|
<wikibugs>
|
'ops-eqiad, ''SRE, ''DC-Ops: Unresponsive management for ms-be1090.mgmt:22 - https://phabricator.wikimedia.org/T408585#11321251 (''wiki_willy) a:''VRiley-WMF'
|
|
2025-10-28 21:17:55
|
<wikibugs>
|
'ops-eqiad, ''SRE, ''DBA, ''DC-Ops, ''decommission-hardware: decommission es1031.eqiad.wmnet - https://phabricator.wikimedia.org/T408600#11321257 (''wiki_willy) a:''VRiley-WMF'
|
|
2025-10-28 21:18:16
|
<wikibugs>
|
('PS4) ''Cwhite: site: initial setup for new logging-sd hosts [puppet] - ''https://gerrit.wikimedia.org/r/1199062 (https://phabricator.wikimedia.org/T406796)'
|
|
2025-10-28 21:19:15
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vm-requests, ''Patch-For-Review: Site: 14 VMs request for tcp-proxy (gerrit-ssh-proxy) - https://phabricator.wikimedia.org/T408064#11321260 (''Dzahn)'
|
|
2025-10-28 21:21:58
|
<wikibugs>
|
('PS1) ''Cory Massaro: Wikifunctions: Upgrade orchestrator from 2025-10-22-011302 to 2025-10-28-205854. [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199504 (https://phabricator.wikimedia.org/T406540)'
|
|
2025-10-28 21:24:08
|
<wikibugs>
|
('CR) ''Muehlenhoff: [C:''+1] "LGTM" [puppet] - ''https://gerrit.wikimedia.org/r/1199062 (https://phabricator.wikimedia.org/T406796) (owner: ''Cwhite)'
|
|
2025-10-28 21:26:20
|
<wikibugs>
|
('PS1) ''Cory Massaro: Update function-evaluators from 2025-10-21-143846 to 2025-10-28-150053. [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199505 (https://phabricator.wikimedia.org/T407718)'
|
|
2025-10-28 21:26:44
|
<wikibugs>
|
('PS2) ''Cory Massaro: Wikifunctions: Update function-evaluators from 2025-10-21-143846 to 2025-10-28-150053. [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199505 (https://phabricator.wikimedia.org/T407718)'
|
|
2025-10-28 21:27:17
|
<wikibugs>
|
('CR) ''Bking: [C:''+1] hadoop: cleanup /tmp from directories as well as files [puppet] - ''https://gerrit.wikimedia.org/r/1199334 (https://phabricator.wikimedia.org/T396582) (owner: ''Gehel)'
|
|
2025-10-28 21:27:58
|
<wikibugs>
|
'SRE, ''Data-Engineering: stat1011: cannot create directory ‘/srv/published/datasets/one-off’: Permission denied - https://phabricator.wikimedia.org/T408641 (''Addshore) ''NEW'
|
|
2025-10-28 21:28:14
|
<logmsgbot>
|
!log sfaci@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
|
|
2025-10-28 21:28:42
|
<logmsgbot>
|
!log sfaci@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
|
|
2025-10-28 21:37:27
|
<wikibugs>
|
('CR) ''Cwhite: [C:''+2] site: initial setup for new logging-sd hosts (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1199062 (https://phabricator.wikimedia.org/T406796) (owner: ''Cwhite)'
|
|
2025-10-28 21:44:07
|
<wikibugs>
|
('PS1) ''JHathaway: postfix: add rspamd network discard map [puppet] - ''https://gerrit.wikimedia.org/r/1199507 (https://phabricator.wikimedia.org/T408632)'
|
|
2025-10-28 21:44:27
|
<wikibugs>
|
('CR) ''JHathaway: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1199507 (https://phabricator.wikimedia.org/T408632) (owner: ''JHathaway)'
|
|
2025-10-28 21:50:04
|
<wikibugs>
|
('CR) ''JHathaway: [C:''+2] postfix: add rspamd network discard map [puppet] - ''https://gerrit.wikimedia.org/r/1199507 (https://phabricator.wikimedia.org/T408632) (owner: ''JHathaway)'
|
|
2025-10-28 22:04:24
|
<jinxer-wm>
|
FIRING: [5x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
|
|
2025-10-28 22:21:10
|
<wikibugs>
|
('PS1) ''Andrew Bogott: cloudservices2004-dev.yaml: use new, yaml-style pdns-recursor config [puppet] - ''https://gerrit.wikimedia.org/r/1199512'
|
|
2025-10-28 22:22:08
|
<wikibugs>
|
'SRE, ''vrts, ''Znuny, ''Patch-For-Review: VRTS is spammed with bounce e-mails and is going to break - https://phabricator.wikimedia.org/T408632#11321474 (''jhathaway) @Krd how else can I help?'
|
|
2025-10-28 22:22:55
|
<wikibugs>
|
('PS2) ''Andrew Bogott: cloudservices2004-dev.yaml: use new, yaml-style pdns-recursor config [puppet] - ''https://gerrit.wikimedia.org/r/1199512'
|
|
2025-10-28 22:23:02
|
<wikibugs>
|
('CR) ''Andrew Bogott: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1199512 (owner: ''Andrew Bogott)'
|
|
2025-10-28 22:23:27
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to 'restricted' for seanleong-wmde - https://phabricator.wikimedia.org/T406592#11321486 (''Dzahn) @thcipriani Turns out this ticket might change from "restricted" to a full deployment access request. How about your approval if that was the case?'
|
|
2025-10-28 22:24:24
|
<jinxer-wm>
|
FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
|
|
2025-10-28 22:25:02
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to 'deployment' for seanleong-wmde - https://phabricator.wikimedia.org/T406592#11321488 (''Dzahn)'
|
|
2025-10-28 22:26:38
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to 'deployment' for seanleong-wmde - https://phabricator.wikimedia.org/T406592#11321493 (''Dzahn) edited ticket to change request from "restricted" to "deployment" after talking to Sean. We will redo the approvals for that but reuse the ticket.'
|
|
2025-10-28 22:28:08
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to 'deployment' for seanleong-wmde - https://phabricator.wikimedia.org/T406592#11321498 (''Dzahn) a:''Dzahn→''thcipriani @seanleong-WMDE Could you add some context re: the request for deployment? @thcipriani for your consideration one more time'
|
|
2025-10-28 22:28:48
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to 'deployment' for seanleong-wmde - https://phabricator.wikimedia.org/T406592#11321503 (''Dzahn)'
|
|
2025-10-28 22:30:57
|
<wikibugs>
|
('PS3) ''Andrew Bogott: cloudservices2004-dev.yaml: use new, yaml-style pdns-recursor config [puppet] - ''https://gerrit.wikimedia.org/r/1199512'
|
|
2025-10-28 22:32:16
|
<wikibugs>
|
('CR) ''Andrew Bogott: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1199512 (owner: ''Andrew Bogott)'
|
|
2025-10-28 22:33:06
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.hosts.reimage for host tcp-proxy7001.magru.wmnet with OS trixie
|
|
2025-10-28 22:33:16
|
<wikibugs>
|
('CR) ''RLazarus: [C:''+2] {api,rest}-gateway: Update to Envoy 1.32.12 in staging [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199085 (https://phabricator.wikimedia.org/T405808) (owner: ''RLazarus)'
|
|
2025-10-28 22:33:20
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vm-requests, ''Patch-For-Review: Site: 14 VMs request for tcp-proxy (gerrit-ssh-proxy) - https://phabricator.wikimedia.org/T408064#11321519 (''ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin2002 for host...'
|
|
2025-10-28 22:35:01
|
<wikibugs>
|
('Merged) ''jenkins-bot: {api,rest}-gateway: Update to Envoy 1.32.12 in staging [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199085 (https://phabricator.wikimedia.org/T405808) (owner: ''RLazarus)'
|
|
2025-10-28 22:37:50
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.ganeti.makevm for new host tcp-proxy2002.codfw.wmnet
|
|
2025-10-28 22:37:52
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.dns.netbox
|
|
2025-10-28 22:38:26
|
<wikibugs>
|
('PS4) ''Andrew Bogott: cloudservices2004-dev.yaml: use new, yaml-style pdns-recursor config [puppet] - ''https://gerrit.wikimedia.org/r/1199512'
|
|
2025-10-28 22:38:39
|
<wikibugs>
|
('CR) ''Andrew Bogott: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1199512 (owner: ''Andrew Bogott)'
|
|
2025-10-28 22:38:48
|
<logmsgbot>
|
!log rzl@deploy1003 helmfile [staging] START helmfile.d/services/api-gateway: apply
|
|
2025-10-28 22:39:00
|
<logmsgbot>
|
!log rzl@deploy1003 helmfile [staging] DONE helmfile.d/services/api-gateway: apply
|
|
2025-10-28 22:41:21
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy2002.codfw.wmnet - dzahn@cumin2002"
|
|
2025-10-28 22:41:56
|
<logmsgbot>
|
!log dzahn@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy2002.codfw.wmnet - dzahn@cumin2002"
|
|
2025-10-28 22:41:56
|
<logmsgbot>
|
!log dzahn@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
|
|
2025-10-28 22:41:57
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.dns.wipe-cache tcp-proxy2002.codfw.wmnet on all recursors
|
|
2025-10-28 22:42:00
|
<logmsgbot>
|
!log dzahn@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy2002.codfw.wmnet on all recursors
|
|
2025-10-28 22:42:13
|
<logmsgbot>
|
!log rzl@deploy1003 helmfile [staging] START helmfile.d/services/rest-gateway: apply
|
|
2025-10-28 22:42:21
|
<logmsgbot>
|
!log rzl@deploy1003 helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
|
|
2025-10-28 22:42:32
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy2002.codfw.wmnet - dzahn@cumin2002"
|
|
2025-10-28 22:42:38
|
<logmsgbot>
|
!log dzahn@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy2002.codfw.wmnet - dzahn@cumin2002"
|
|
2025-10-28 22:42:58
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.hosts.reimage for host tcp-proxy2002.codfw.wmnet with OS trixie
|
|
2025-10-28 22:43:11
|
<wikibugs>
|
('PS5) ''Andrew Bogott: cloudservices2004-dev.yaml: use new, yaml-style pdns-recursor config [puppet] - ''https://gerrit.wikimedia.org/r/1199512'
|
|
2025-10-28 22:43:14
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vm-requests, ''Patch-For-Review: Site: 14 VMs request for tcp-proxy (gerrit-ssh-proxy) - https://phabricator.wikimedia.org/T408064#11321544 (''ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin2002 for host...'
|
|
2025-10-28 22:43:16
|
<logmsgbot>
|
dzahn@cumin2002 reimage (PID 1675734) is awaiting input
|
|
2025-10-28 22:43:22
|
<wikibugs>
|
('CR) ''Andrew Bogott: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1199512 (owner: ''Andrew Bogott)'
|
|
2025-10-28 22:43:33
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.hosts.reimage for host tcp-proxy3002.esams.wmnet with OS trixie
|
|
2025-10-28 22:43:52
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vm-requests, ''Patch-For-Review: Site: 14 VMs request for tcp-proxy (gerrit-ssh-proxy) - https://phabricator.wikimedia.org/T408064#11321546 (''ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin2002 for host...'
|
|
2025-10-28 22:45:30
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vm-requests, ''Patch-For-Review: Site: 14 VMs request for tcp-proxy (gerrit-ssh-proxy) - https://phabricator.wikimedia.org/T408064#11321550 (''Dzahn)'
|
|
2025-10-28 22:46:27
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vm-requests, ''Patch-For-Review: Site: 14 VMs request for tcp-proxy (gerrit-ssh-proxy) - https://phabricator.wikimedia.org/T408064#11321551 (''Dzahn) All VMs exist now. --> https://netbox.wikimedia.org/search/?q=tcp-proxy some still need t...'
|
|
2025-10-28 22:57:32
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to 'deployment' for seanleong-wmde - https://phabricator.wikimedia.org/T406592#11321576 (''seanleong-WMDE) Hi, thanks @Dzahn. The ticket has been changed from "restricted" to "deployment", as this is part of the requirements to be a deployer, and "restricted" is...'
|
|
2025-10-28 22:58:51
|
<wikibugs>
|
('PS1) ''Scott French: mw-(api-ext|web): scale next releases to 20% of main [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199513 (https://phabricator.wikimedia.org/T405955)'
|
|
2025-10-28 22:58:52
|
<wikibugs>
|
('PS1) ''Scott French: mw-(api-int|jobrunner): serve 10% of traffic on PHP 8.3 [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199514 (https://phabricator.wikimedia.org/T405955)'
|
|
2025-10-28 22:58:55
|
<wikibugs>
|
('PS1) ''Scott French: Enroll 25% of client sessions in PHP 8.3 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199515 (https://phabricator.wikimedia.org/T405955)'
|
|
2025-10-28 22:59:25
|
<wikibugs>
|
('CR) ''RLazarus: [C:''+1] mw-(api-ext|web): scale next releases to 20% of main [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199513 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
|
|
2025-10-28 22:59:29
|
<wikibugs>
|
('CR) ''RLazarus: [C:''+1] Enroll 25% of client sessions in PHP 8.3 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199515 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
|
|
2025-10-28 22:59:32
|
<wikibugs>
|
('CR) ''RLazarus: [C:''+1] mw-(api-int|jobrunner): serve 10% of traffic on PHP 8.3 [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199514 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
|
|
2025-10-28 23:00:52
|
<wikibugs>
|
'SRE: puppetdb import job on netbox fails - Cannot retrieve PuppetDB 'networking' facts about tcp-proxy3002 - https://phabricator.wikimedia.org/T408646 (''Dzahn) ''NEW'
|
|
2025-10-28 23:01:09
|
<wikibugs>
|
'SRE: puppetdb import job on netbox fails - Cannot retrieve PuppetDB 'networking' facts for new VMs - https://phabricator.wikimedia.org/T408646#11321593 (''Dzahn)'
|
|
2025-10-28 23:03:06
|
<wikibugs>
|
'SRE: puppetdb import job on netbox fails - Cannot retrieve PuppetDB 'networking' facts for new VMs - https://phabricator.wikimedia.org/T408646#11321597 (''Dzahn)'
|
|
2025-10-28 23:03:07
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to 'deployment' for seanleong-wmde - https://phabricator.wikimedia.org/T406592#11321598 (''seanleong-WMDE)'
|
|
2025-10-28 23:03:13
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy2002.codfw.wmnet with reason: host reimage
|
|
2025-10-28 23:06:46
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to 'deployment' for seanleong-wmde - https://phabricator.wikimedia.org/T406592#11321600 (''seanleong-WMDE)'
|
|
2025-10-28 23:09:37
|
<logmsgbot>
|
!log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy2002.codfw.wmnet with reason: host reimage
|
|
2025-10-28 23:12:26
|
<jinxer-wm>
|
FIRING: [2x] ProbeDown: Service wdqs1015:443 has failed probes (http_query_wikidata_org_ldf_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1015:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
|
|
2025-10-28 23:14:00
|
<wikibugs>
|
'SRE, ''LDAP-Access-Requests: Grant Access to wmf group for jpchev - https://phabricator.wikimedia.org/T408636#11321623 (''Dzahn) @Jpchev Hi there, are you a Wikimedia Foundation employee or contractor? Or are you asking for access as a volunteer? Any specific systems you have in mind?'
|
|
2025-10-28 23:14:33
|
<wikibugs>
|
('PS1) ''RLazarus: mw-*: Upgrade to Envoy 1.32.12 in the MW canary releases and mw-debug [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199519 (https://phabricator.wikimedia.org/T405808)'
|
|
2025-10-28 23:16:43
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to 'deployment' for seanleong-wmde - https://phabricator.wikimedia.org/T406592#11321629 (''seanleong-WMDE)'
|
|
2025-10-28 23:17:26
|
<jinxer-wm>
|
RESOLVED: [2x] ProbeDown: Service wdqs1015:443 has failed probes (http_query_wikidata_org_ldf_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1015:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
|
|
2025-10-28 23:21:21
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Infrastructure-Foundations: puppetdb import job on netbox fails - Cannot retrieve PuppetDB 'networking' facts for new VMs - https://phabricator.wikimedia.org/T408646#11321640 (''Dzahn)'
|
|
2025-10-28 23:25:53
|
<wikibugs>
|
('CR) ''Scott French: [C:''+1] mw-*: Upgrade to Envoy 1.32.12 in the MW canary releases and mw-debug [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199519 (https://phabricator.wikimedia.org/T405808) (owner: ''RLazarus)'
|
|
2025-10-28 23:26:33
|
<logmsgbot>
|
!log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy2002.codfw.wmnet with OS trixie
|
|
2025-10-28 23:26:35
|
<logmsgbot>
|
!log dzahn@cumin2002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host tcp-proxy2002.codfw.wmnet
|
|
2025-10-28 23:26:41
|
<logmsgbot>
|
!log dzahn@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host tcp-proxy7001.magru.wmnet with OS trixie
|
|
2025-10-28 23:26:53
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vm-requests, ''Patch-For-Review: Site: 14 VMs request for tcp-proxy (gerrit-ssh-proxy) - https://phabricator.wikimedia.org/T408064#11321656 (''ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin2002 for host tcp-...'
|
|
2025-10-28 23:26:57
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vm-requests, ''Patch-For-Review: Site: 14 VMs request for tcp-proxy (gerrit-ssh-proxy) - https://phabricator.wikimedia.org/T408064#11321657 (''ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin2002 for host tcp-...'
|
|
2025-10-28 23:28:17
|
<wikibugs>
|
('CR) ''RLazarus: [C:''+2] mw-*: Upgrade to Envoy 1.32.12 in the MW canary releases and mw-debug [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199519 (https://phabricator.wikimedia.org/T405808) (owner: ''RLazarus)'
|
|
2025-10-28 23:28:24
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''LDAP-Access-Requests: Grant Access to wmf LDAP and analytics-privatedata-users shell group for SherryYang-WMF - https://phabricator.wikimedia.org/T408639#11321663 (''Dzahn) Hello @SherryYang-WMF, re: the "wmf" LDAP group Please take a look here: https://wikitech.wikimedia....'
|
|
2025-10-28 23:28:35
|
<rzl>
|
jouncebot: nowandnext
|
|
2025-10-28 23:28:35
|
<jouncebot>
|
No deployments scheduled for the next 0 hour(s) and 31 minute(s)
|
|
2025-10-28 23:28:35
|
<jouncebot>
|
In 0 hour(s) and 31 minute(s): Abstract Wikipedia emergency deploy window (one-off) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251029T0000)
|
|
2025-10-28 23:28:58
|
<rzl>
|
I'll deploy an envoy upgrade to mw-debug and the canaries
|
|
2025-10-28 23:30:17
|
<wikibugs>
|
('Merged) ''jenkins-bot: mw-*: Upgrade to Envoy 1.32.12 in the MW canary releases and mw-debug [deployment-charts] - ''https://gerrit.wikimedia.org/r/1199519 (https://phabricator.wikimedia.org/T405808) (owner: ''RLazarus)'
|
|
2025-10-28 23:32:44
|
<logmsgbot>
|
!log rzl@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
|
|
2025-10-28 23:32:54
|
<jinxer-wm>
|
FIRING: [2x] CoreBGPDown: Core BGP session down between cloudsw1-b1-codfw and cloudservices2004-dev (172.20.5.8) - group cloud_host - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
|
|
2025-10-28 23:33:12
|
<logmsgbot>
|
!log rzl@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
|
|
2025-10-28 23:33:40
|
<wikibugs>
|
('CR) ''Atieno: [C:''+1] ExtensionDistributor: Mark 1.45 as beta [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199113 (https://phabricator.wikimedia.org/T408466) (owner: ''Arlolra)'
|
|
2025-10-28 23:35:23
|
<logmsgbot>
|
!log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
|
|
2025-10-28 23:35:42
|
<logmsgbot>
|
!log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
|
|
2025-10-28 23:37:05
|
<logmsgbot>
|
!log dzahn@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host tcp-proxy3002.esams.wmnet with OS trixie
|
|
2025-10-28 23:37:26
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vm-requests, ''Patch-For-Review: Site: 14 VMs request for tcp-proxy (gerrit-ssh-proxy) - https://phabricator.wikimedia.org/T408064#11321678 (''ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin2002 for host tcp-...'
|
|
2025-10-28 23:38:01
|
<logmsgbot>
|
!log rzl@deploy2002 Started scap sync-world: https://gerrit.wikimedia.org/r/1199519 T405808
|
|
2025-10-28 23:38:07
|
<stashbot>
|
T405808: Upgrade Envoy to v1.32.12 - https://phabricator.wikimedia.org/T405808
|
|
2025-10-28 23:39:24
|
<jinxer-wm>
|
FIRING: [2x] JobUnavailable: Reduced availability for job cloud_dev_pdns in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
|
|
2025-10-28 23:39:38
|
<wikibugs>
|
'SRE, ''Data-Platform-SRE: Make the shell group analytics-privatedata-users less confusing - https://phabricator.wikimedia.org/T405517#11321680 (''Dzahn) The link above is common example. The user asks for `analytics-privatedata-users` (or is told to ask for it as part of some onboarding docs). But that is...'
|
|
2025-10-28 23:40:41
|
<logmsgbot>
|
!log rzl@deploy2002 Finished scap sync-world: https://gerrit.wikimedia.org/r/1199519 T405808 (duration: 03m 34s)
|
|
2025-10-28 23:43:20
|
<wikibugs>
|
'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vm-requests, ''Patch-For-Review: Site: 14 VMs request for tcp-proxy (gerrit-ssh-proxy) - https://phabricator.wikimedia.org/T408064#11321712 (''Dzahn)'
|
|
2025-10-28 23:44:02
|
<wikibugs>
|
('PS1) ''Zabe: Using Hadoop for MostTranscludedPages on enwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199522 (https://phabricator.wikimedia.org/T309738)'
|
|
2025-10-28 23:44:06
|
<logmsgbot>
|
!log dzahn@cumin2002 START - Cookbook sre.dns.netbox
|
|
2025-10-28 23:46:42
|
<logmsgbot>
|
!log dzahn@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
|
|
2025-10-28 23:48:57
|
<wikibugs>
|
'ops-ulsfo, ''SRE, ''DC-Ops, ''Infrastructure-Foundations, ''netops: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11321730 (''Papaul)'
|
|
2025-10-28 23:59:47
|
<wikibugs>
|
('PS1) ''Santiago Faci: Metrics Platform PHP client library: set performer_registration_dt as null when the user is anon [extensions/EventLogging] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1199524 (https://phabricator.wikimedia.org/T408547)'
|