[00:07:08] FIRING: SLOBudgetBurn: Standalone event system success rate is below 99.9% target - https://alerts.wikimedia.org/?q=alertname%3DSLOBudgetBurn [00:13:39] PROBLEM - SSH on stat1010 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring [00:15:33] RECOVERY - SSH on stat1010 is OK: SSH OK - OpenSSH_8.4p1 Debian-5+deb11u6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [00:17:15] FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished [00:19:39] PROBLEM - SSH on stat1010 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring [00:20:29] RECOVERY - SSH on stat1010 is OK: SSH OK - OpenSSH_8.4p1 Debian-5+deb11u6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [00:24:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 21.8% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [00:34:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 23.62% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [00:40:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 20.52% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [00:45:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 23.83% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [01:12:50] (03PS1) 10TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1304647 [01:12:50] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1304647 (owner: 10TrainBranchBot) [01:21:30] (03Merged) 10jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1304647 (owner: 10TrainBranchBot) [01:22:39] PROBLEM - SSH on stat1010 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring [01:28:29] RECOVERY - SSH on stat1010 is OK: SSH OK - OpenSSH_8.4p1 Debian-5+deb11u6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [02:00:30] !log mwpresync@deploy1003 Started scap build-images: Publishing wmf/next image [02:07:08] !log mwpresync@deploy1003 Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) [02:09:39] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:14:39] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:19:39] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:34:39] FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:39:31] 07Puppet, 06Release-Engineering-Team: registry-homepage-builder.py doesn't sort images as expected - https://phabricator.wikimedia.org/T388287#12037972 (10Krinkle) Whenever I update Fresh, I tend to wrongly update at least one of the various commands incorrectly. I then catch myself when I'm reviewing the diff... [02:39:39] RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [03:44:39] FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-drmrs:et-0/0/0 (Transport: Arelion (IC-398708) {#20260601}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-drmrs:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [04:07:08] FIRING: SLOBudgetBurn: Standalone event system success rate is below 99.9% target - https://alerts.wikimedia.org/?q=alertname%3DSLOBudgetBurn [04:17:15] FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished