[00:08:45] (03PS1) 10TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1162130 [00:08:45] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1162130 (owner: 10TrainBranchBot) [00:25:35] FIRING: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [00:25:39] FIRING: [2x] TransitBGPDown: Transit BGP session down between cr3-eqsin and Hurricane Electric (2001:de8:4::6939:1) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown [00:29:37] (03Merged) 10jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1162130 (owner: 10TrainBranchBot) [00:30:37] PROBLEM - puppetboard.wikimedia.org requires authentication on puppetboard1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration [00:31:27] RECOVERY - puppetboard.wikimedia.org requires authentication on puppetboard1003 is OK: HTTP OK: Status line output matched HTTP/1.1 302 - 554 bytes in 0.057 second response time https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration [00:31:45] PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [00:32:35] RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 54083 bytes in 0.111 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [00:46:41] PROBLEM - Disk space on releases1003 is CRITICAL: DISK CRITICAL - /srv/docker/overlay2/d0eace50d56d748aa4d4615502313e8270e2a14ae71c1899659fc8108ccfe543/merged is not accessible: Permission denied https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops [00:50:35] FIRING: [2x] ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [00:51:26] (03PS1) 10Andrew Bogott: Cinder: reduce rpc_response_timeout even more [puppet] - 10https://gerrit.wikimedia.org/r/1162145 (https://phabricator.wikimedia.org/T397517) [00:53:04] (03CR) 10Andrew Bogott: [C:03+2] Cinder: reduce rpc_response_timeout even more [puppet] - 10https://gerrit.wikimedia.org/r/1162145 (https://phabricator.wikimedia.org/T397517) (owner: 10Andrew Bogott) [00:58:29] FIRING: [10x] SystemdUnitFailed: docker-reporter-k8s-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:00:35] FIRING: [2x] ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [01:06:41] RECOVERY - Disk space on releases1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops [01:11:48] FIRING: PuppetFailure: Puppet has failed on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [01:25:35] RESOLVED: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [01:51:48] RESOLVED: PuppetFailure: Puppet has failed on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [01:55:01] (03PS1) 10EggRoll97: Assign oathauth-verify-user to default bureaucrat [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1162158 (https://phabricator.wikimedia.org/T265726) [02:00:21] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 23 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1162158 (https://phabricator.wikimedia.org/T265726) (owner: 10EggRoll97) [02:30:40] FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:45:05] FIRING: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [03:00:05] RESOLVED: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [03:22:09] (03PS1) 10Andrew Bogott: neutron: update policy.yaml [puppet] - 10https://gerrit.wikimedia.org/r/1162178 [03:23:53] (03CR) 10Andrew Bogott: [C:03+2] neutron: update policy.yaml [puppet] - 10https://gerrit.wikimedia.org/r/1162178 (owner: 10Andrew Bogott) [03:45:40] FIRING: [4x] SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [04:25:39] FIRING: [2x] TransitBGPDown: Transit BGP session down between cr3-eqsin and Hurricane Electric (2001:de8:4::6939:1) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown [04:30:35] FIRING: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [04:55:35] FIRING: [2x] ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [04:58:29] FIRING: [10x] SystemdUnitFailed: docker-reporter-k8s-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:05:35] FIRING: [2x] ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [05:06:42] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [05:16:42] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [05:25:35] RESOLVED: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [06:13:48] FIRING: PuppetFailure: Puppet has failed on wdqs2023:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [06:23:48] RESOLVED: PuppetFailure: Puppet has failed on wdqs2023:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [06:30:40] FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:48:05] FIRING: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [07:03:05] FIRING: [2x] ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [07:08:05] FIRING: [2x] ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [07:12:23] (03PS1) 10PipelineBot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1162238 [07:28:05] RESOLVED: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [07:45:40] FIRING: [4x] SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [08:25:54] FIRING: [2x] TransitBGPDown: Transit BGP session down between cr3-eqsin and Hurricane Electric (2001:de8:4::6939:1) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown [08:58:29] FIRING: [10x] SystemdUnitFailed: docker-reporter-k8s-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:08:05] FIRING: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [09:28:05] RESOLVED: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [09:51:33] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST certificaterequests) on k8s-mlstaging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-mlstaging&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [09:56:33] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST certificaterequests) on k8s-mlstaging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-mlstaging&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [10:08:51] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [10:30:40] FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [11:03:48] FIRING: PuppetFailure: Puppet has failed on wdqs2022:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [11:13:48] RESOLVED: PuppetFailure: Puppet has failed on wdqs2022:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [11:28:05] FIRING: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [11:29:27] FIRING: [2x] ProbeDown: Service wdqs2014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2014:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [11:38:05] RESOLVED: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [11:45:40] FIRING: [4x] SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [12:06:15] FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.45s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [12:11:15] RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.794s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [12:25:54] FIRING: [2x] TransitBGPDown: Transit BGP session down between cr3-eqsin and Hurricane Electric (2001:de8:4::6939:1) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown [12:58:29] FIRING: [10x] SystemdUnitFailed: docker-reporter-k8s-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:09:06] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [14:15:00] (03CR) 10D3r1ck01: [C:03+1] "Ah! Is Echo making this mistake too: https://codesearch.wmcloud.org/search/?q=InitializeSettings.php&files=&excludeFiles=&repos=" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1154108 (owner: 10Krinkle) [14:30:40] FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [15:06:42] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:16:42] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:29:42] FIRING: [2x] ProbeDown: Service wdqs2014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2014:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [15:45:40] FIRING: [4x] SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [16:25:54] FIRING: [2x] TransitBGPDown: Transit BGP session down between cr3-eqsin and Hurricane Electric (2001:de8:4::6939:1) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown [16:33:29] FIRING: [11x] SystemdUnitFailed: docker-reporter-k8s-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [16:58:29] FIRING: [11x] SystemdUnitFailed: docker-reporter-k8s-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [17:16:05] FIRING: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [17:31:05] FIRING: [2x] ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [18:03:51] RESOLVED: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [18:06:05] RESOLVED: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [18:30:40] FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:53:35] FIRING: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [19:08:35] RESOLVED: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [19:29:42] FIRING: [2x] ProbeDown: Service wdqs2014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2014:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [19:30:35] FIRING: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [19:35:35] RESOLVED: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [19:37:47] 10ops-eqiad, 06SRE, 06DC-Ops, 06serviceops: Q4:rack/setup/install aux-k8s-worker100[6-9].eqiad.wmnet - https://phabricator.wikimedia.org/T393053#10936354 (10Jclark-ctr) aux-k8s-worker1008 Is failing I assume possibly bad dac cable will test more Monday while on site. [19:40:35] FIRING: [2x] ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [19:45:40] FIRING: [4x] SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [20:00:35] RESOLVED: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [20:03:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on wdqs2022:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [20:13:48] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on wdqs2022:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [20:25:54] FIRING: [2x] TransitBGPDown: Transit BGP session down between cr3-eqsin and Hurricane Electric (2001:de8:4::6939:1) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown [20:58:30] FIRING: [10x] SystemdUnitFailed: docker-reporter-k8s-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [21:33:05] FIRING: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [21:48:05] FIRING: [2x] ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [21:53:05] FIRING: [2x] ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [22:18:05] RESOLVED: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [22:30:05] FIRING: [2x] ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [22:30:40] FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [22:35:05] RESOLVED: [2x] ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [22:40:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [22:55:48] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [23:04:05] FIRING: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [23:19:05] RESOLVED: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [23:29:42] FIRING: [2x] ProbeDown: Service wdqs2014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2014:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [23:38:16] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1162512 [23:38:17] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1162512 (owner: 10TrainBranchBot) [23:42:35] FIRING: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [23:45:40] FIRING: [4x] SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [23:50:15] (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1162512 (owner: 10TrainBranchBot) [23:52:35] FIRING: [2x] ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn