[00:01:01] !log cjming@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
[00:08:12] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1165184
[00:08:12] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1165184 (owner: TrainBranchBot)
[00:09:28] (PS1) Ncmonitor: DNSRepository: Automated MarkMonitor domain sync [dns] - https://gerrit.wikimedia.org/r/1165186
[00:09:31] (PS1) Ncmonitor: ACMEChiefConfig: Automated MarkMonitor domain sync [puppet] - https://gerrit.wikimedia.org/r/1165187
[00:13:22] (PS1) Clare Ming: xLab: Deploy v0.7.6 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1165189
[00:13:31] FIRING: [3x] SLOMetricAbsent: citoid-latency codfw - https://slo.wikimedia.org/?search=citoid-latency - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[00:14:33] (PS1) Clare Ming: xLab: Deploy v0.7.6 release to production [deployment-charts] - https://gerrit.wikimedia.org/r/1165190
[00:15:44] (CR) Santiago Faci: [C:+2] xLab: Deploy v0.7.6 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1165189 (owner: Clare Ming)
[00:16:06] (CR) Santiago Faci: [C:+2] xLab: Deploy v0.7.6 release to production [deployment-charts] - https://gerrit.wikimedia.org/r/1165190 (owner: Clare Ming)
[00:17:15] (Merged) jenkins-bot: xLab: Deploy v0.7.6 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1165189 (owner: Clare Ming)
[00:17:42] (Merged) jenkins-bot: xLab: Deploy v0.7.6 release to production [deployment-charts] - https://gerrit.wikimedia.org/r/1165190 (owner: Clare Ming)
[00:18:18] !log cjming@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
[00:19:20] !log cjming@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
[00:19:48] !log cjming@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
[00:20:15] !log cjming@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
[00:29:57] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1165184 (owner: TrainBranchBot)
[01:02:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on wdqs2022:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:08:10] (PS1) TrainBranchBot: Branch commit for wmf/1.45.0-wmf.8 [core] (wmf/1.45.0-wmf.8) - https://gerrit.wikimedia.org/r/1165198 (https://phabricator.wikimedia.org/T392178)
[01:08:11] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/1.45.0-wmf.8 [core] (wmf/1.45.0-wmf.8) - https://gerrit.wikimedia.org/r/1165198 (https://phabricator.wikimedia.org/T392178) (owner: TrainBranchBot)
[01:17:48] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on wdqs2022:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:19:24] (Merged) jenkins-bot: Branch commit for wmf/1.45.0-wmf.8 [core] (wmf/1.45.0-wmf.8) - https://gerrit.wikimedia.org/r/1165198 (https://phabricator.wikimedia.org/T392178) (owner: TrainBranchBot)
[02:00:05] Deploy window xLab Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250630T1430)
[02:00:05] Deploy window Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250701T0200)
[02:28:32] FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:40:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps2009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:41:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:51:48] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:57:42] FIRING: [6x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[03:00:05] Deploy window xLab Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250630T1430)
[03:00:05] Deploy window Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250701T0300)
[03:01:42] (PS1) TrainBranchBot: testwikis to 1.45.0-wmf.8 [mediawiki-config] - https://gerrit.wikimedia.org/r/1165209 (https://phabricator.wikimedia.org/T392178)
[03:01:43] (CR) TrainBranchBot: [C:+2] testwikis to 1.45.0-wmf.8 [mediawiki-config] - https://gerrit.wikimedia.org/r/1165209 (https://phabricator.wikimedia.org/T392178) (owner: TrainBranchBot)
[03:02:36] (Merged) jenkins-bot: testwikis to 1.45.0-wmf.8 [mediawiki-config] - https://gerrit.wikimedia.org/r/1165209 (https://phabricator.wikimedia.org/T392178) (owner: TrainBranchBot)
[03:03:01] !log mwpresync@deploy1003 Started scap sync-world: testwikis to 1.45.0-wmf.8 refs T392178
[03:03:07] T392178: 1.45.0-wmf.8 deployment blockers - https://phabricator.wikimedia.org/T392178
[03:43:32] FIRING: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-api-ext-next_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:45:42] FIRING: [4x] SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[03:46:20] PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[03:53:32] FIRING: SystemdUnitFailed: wdqs-updater.service on wdqs1022:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:53:32] FIRING: PuppetConstantChange: Puppet performing a change on every puppet run on wdqs1022:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[03:58:49] !log mwpresync@deploy1003 Finished scap sync-world: testwikis to 1.45.0-wmf.8 refs T392178 (duration: 55m 48s)
[03:58:56] T392178: 1.45.0-wmf.8 deployment blockers - https://phabricator.wikimedia.org/T392178
[04:00:05] Deploy window xLab Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250630T1430)
[04:00:05] Deploy window Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250701T0400)
[04:01:44] !log mwpresync@deploy1003 Pruned MediaWiki: 1.45.0-wmf.5 (duration: 01m 38s)
[04:13:31] FIRING: [3x] SLOMetricAbsent: citoid-latency codfw - https://slo.wikimedia.org/?search=citoid-latency - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[04:24:57] ops-eqiad, SRE, DC-Ops: Alert for device ps1-b7-eqiad.mgmt.eqiad.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T397983#10961832 (phaultfinder)
[04:36:20] RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[04:38:32] FIRING: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-api-ext-next_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:54:28] PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:55:20] RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8997 bytes in 0.234 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:06:42] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable