[00:03:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[00:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:05:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[00:06:20] Grid-Engine-to-K8s-Migration: Migrate convert from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319646 (bd808) Stalled→Open https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service#Installing_Apt_packages should unblock moving this webservice to Kubernetes.
[00:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:24:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:25:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[00:29:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:30:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[00:35:30] (OpenstackAPIResponse) firing: Openstack API average response time is too high.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[00:44:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:45:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[00:47:23] Grid-Engine-to-K8s-Migration: Migrate wscontest from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320189 (Samwilson) Open→Resolved This was done a while ago, but I forgot to close this task.
[00:49:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:50:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[01:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[01:05:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[01:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra -
https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[01:14:33] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[01:15:22] Grid-Engine-to-K8s-Migration: Migrate ws-search from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320188 (Samwilson) Open→Resolved Done.
[01:24:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[01:25:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[01:29:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[01:30:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[01:50:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[02:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[02:05:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1
on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[02:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[02:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[02:39:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[02:46:56] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883 (MBH) Several months ago I transferred almost all my tools to Kubernetes. Now I'm using Grid for three purposes: # One of my bots needs 16G of memory (or less, but more than...
[03:03:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[03:20:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[03:25:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[03:29:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[03:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[03:54:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[03:59:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[04:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[04:05:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra -
https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[04:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[04:10:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[04:14:33] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[04:25:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[04:29:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[04:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[04:35:30] (OpenstackAPIResponse) firing: Openstack API average response time is too high.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:38:32] Grid-Engine-to-K8s-Migration, User-revi: Migrate tc-rc from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320079 (revi) My circumstances (at present) do not allow me to do much non-mobile work (there's a long story but I do not wish to be logged perpetually in phab log...
[04:50:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[04:51:47] Toolforge (Quota-requests): Request increased quota for Legoktm's Rust Toolforge tools - https://phabricator.wikimedia.org/T351604 (Legoktm) Open→Declined Ah cool, I'm happy to wait a few weeks for the new quotas! >>! In T351604#9343239, @taavi wrote: > The old default quotas had a K8s `LimitRange`...
[04:54:16] Toolforge (Quota-requests): Request increased quota for Legoktm's Rust Toolforge tools - https://phabricator.wikimedia.org/T351604 (Legoktm) I've added a note to the docs that the default quotas are changing soon: https://wikitech.wikimedia.org/w/index.php?diff=2128934&oldid=2128314&title=Help:Toolforge/Kube...
[05:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[05:05:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[05:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[05:10:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[05:50:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[06:00:19] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[06:00:23] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883 (komla) @MBH for point 2, you can use the jobs framework cli. see: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework#Creating_one-off_jobs Please request a q...
[06:03:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[06:05:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[06:24:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[06:25:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[06:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[06:44:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[06:45:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[06:49:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[06:50:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance
metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[06:54:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[07:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[07:14:33] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[07:15:00] VPS-Projects, WMDE-TechWish-Maintenance-2023, WMDE-TechWish-Sprint-2023-11-22: Scraper: destroy Cloud VPS runner instance - https://phabricator.wikimedia.org/T345411 (Tobi_WMDE_SW)
[07:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[07:35:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:39:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[07:40:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:10:41] Toolforge (Quota-requests): Request
increased quota for anchor-corrector Toolforge tool - https://phabricator.wikimedia.org/T350484 (Kanashimi) @taavi Will we accept longer IDs? Or do I have to shorten it?
[08:20:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:24:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[08:26:31] Cloud-VPS, SRE, observability, Patch-For-Review: ossl rsyslog errors post-migration - https://phabricator.wikimedia.org/T351710 (fgiunchedi) >>! In T351710#9349895, @Vgutierrez wrote: > nice, but please set a sane TLS configuration :) ideally nothing lower than TLSv1.2 and solid ciphersuites Tra...
[08:29:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[08:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[08:35:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:35:30] (OpenstackAPIResponse) firing: Openstack API average response time is too high.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[08:39:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[09:03:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[09:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[09:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[09:22:22] Toolforge Jobs framework: Better validate job names - https://phabricator.wikimedia.org/T351705 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/47 job: Validate job name length
[09:27:49] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[09:28:00] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[09:28:50] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[09:29:03] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[09:29:21] Toolforge Jobs framework: Better validate job
names - https://phabricator.wikimedia.org/T351705 (taavi) Open→Resolved
[09:30:05] Tools, WMDE-TechWish-Maintenance, WMDE-TechWish-Maintenance-2023, WMDE-TechWish-Sprint-2023-11-22: Check technischewuensche tool code and publish in a public repo - https://phabricator.wikimedia.org/T350352 (Tobi_WMDE_SW)
[09:50:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:14:33] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[10:23:01] Toolforge (Toolforge iteration 02), cloud-services-team, Patch-For-Review: Automatically apply quota changes to existing tools - https://phabricator.wikimedia.org/T350873 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/8 Automatical...
[10:25:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:29:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[10:29:27] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[10:29:29] Toolforge (Toolforge iteration 02), cloud-services-team, Patch-For-Review: Automatically apply quota changes to existing tools - https://phabricator.wikimedia.org/T350873 (taavi) just in case: ` [taavi@toolsbeta-bastion-6 ~/quota] $ kubectl get quota -A -o json > backup-quota.json [taavi@toolsbeta-ba...
[10:29:38] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[10:30:51] Cloud-VPS, SRE, observability, Patch-For-Review: ossl rsyslog errors post-migration - https://phabricator.wikimedia.org/T351710 (fgiunchedi) I tested a revert to `gtls` for centrallog hosts (the receiver part only), rsyslog now stays silent on centrallog though I still see the (re) connections fr...
[10:31:33] (SystemdUnitDown) firing: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1033.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1033 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[10:35:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:39:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[10:44:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[10:45:03] (PuppetAgentNoResources) resolved: No Puppet resources found on instance metricsinfra-puppetmaster-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:54:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[10:57:30] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers (T350873)
[10:57:33] T350873: Automatically apply quota changes to existing tools - https://phabricator.wikimedia.org/T350873
[10:57:42] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers (T350873)
[11:01:47] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[11:01:59] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[11:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[11:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[11:14:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[11:19:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[11:26:28] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[11:26:33] (SystemdUnitDown) resolved: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1033.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1033 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:26:39] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[11:26:46] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[11:26:59] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[11:30:03] (PuppetAgentNoResources) firing: No Puppet resources found on instance metricsinfra-puppetmaster-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:31:45] Toolforge (Toolforge iteration 02), cloud-services-team: Re-visit Toolforge Kubernetes default quotas (April 2023) - https://phabricator.wikimedia.org/T333979 (taavi)
[11:31:50] Toolforge (Toolforge iteration 02), cloud-services-team: track and apply Toolforge quota changes via a Git repository - https://phabricator.wikimedia.org/T324558 (taavi)
[11:31:52] Toolforge (Toolforge iteration 02), cloud-services-team: track and apply Toolforge quota changes via a Git repository - https://phabricator.wikimedia.org/T324558 (taavi) In progress→Resolved
[11:32:21] Toolforge (Toolforge iteration 02), cloud-services-team: Automatically apply quota changes to existing tools - https://phabricator.wikimedia.org/T350873 (taavi) Open→Resolved ` finished run, wrote 0 new accounts, disabled 0 accounts, cleaned up 0 accounts, renewed 9 accounts, updated 3201 quotas `
[11:40:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:50:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[12:03:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[12:24:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[12:25:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[12:31:19] (HAProxyServiceUnavailable) firing: (16) HAProxy service wikireplica-db-analytics-s1 has no available backends on cloudlb1002:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable
[12:31:24] cloud-services-team: HAProxyServiceUnavailable cloudlb1002:9900 - https://phabricator.wikimedia.org/T350127 (phaultfinder)
[12:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[12:35:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[12:35:46] (OpenstackAPIResponse) firing: Openstack API average response time is too high.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[12:36:19] (HAProxyServiceUnavailable) firing: (16) HAProxy service wikireplica-db-analytics-s1 has no available backends on cloudlb1002:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable
[12:39:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[12:46:20] (HAProxyServiceUnavailable) resolved: (16) HAProxy service wikireplica-db-analytics-s1 has no available backends on cloudlb1002:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable
[12:50:20] (HAProxyBackendUnavailable) firing: (2) HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[12:53:40] (NeutronAgentDown) firing: Neutron neutron-linuxbridge-agent on cloudvirt1064 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[12:53:40] (NeutronAgentDown) firing: Neutron neutron-linuxbridge-agent on cloudvirt1043 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[12:53:40] (NeutronAgentDown) firing: Neutron neutron-linuxbridge-agent on cloudvirt1039 is down -
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [12:53:45] (NeutronAgentDown) firing: Neutron neutron-linuxbridge-agent on cloudvirt1041 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [12:54:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [12:55:19] (HAProxyBackendUnavailable) resolved: (2) HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [13:00:19] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack [13:02:06] !log taavi@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99) [13:02:17] cloud-services-team: HAProxyServiceUnavailable cloudlb1002:9900 - https://phabricator.wikimedia.org/T350127 (taavi) Open→Resolved a:taavi [13:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [13:04:43] (PS1) Majavah: openstack: restart_openstack: ignore errors with single nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/976702 [13:08:08] (CR) CI reject: [V: -1] openstack: restart_openstack: ignore errors with single nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/976702 (owner: Majavah) [13:11:13] 
PAWS: jupyterlab to 4.0.9 - https://phabricator.wikimedia.org/T351726 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/351 [13:11:37] vivian-rook opened https://github.com/toolforge/paws/pull/351 [13:12:14] (PS2) Majavah: openstack: restart_openstack: ignore errors with single nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/976702 [13:13:40] (NeutronAgentDown) resolved: Neutron neutron-linuxbridge-agent on cloudvirt1064 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [13:13:40] (NeutronAgentDown) resolved: Neutron neutron-linuxbridge-agent on cloudvirt1039 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [13:13:40] (NeutronAgentDown) resolved: Neutron neutron-linuxbridge-agent on cloudvirt1041 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [13:13:45] (NeutronAgentDown) resolved: Neutron neutron-linuxbridge-agent on cloudvirt1043 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [13:14:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [13:14:33] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the 
resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [13:15:03] (PuppetAgentNoResources) resolved: No Puppet resources found on instance metricsinfra-puppetmaster-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [13:16:41] (CR) Majavah: [C: +2] openstack: restart_openstack: ignore errors with single nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/976702 (owner: Majavah) [13:18:03] (PuppetAgentNoResources) firing: No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [13:20:05] (Merged) jenkins-bot: openstack: restart_openstack: ignore errors with single nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/976702 (owner: Majavah) [13:21:52] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack [13:24:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [13:28:07] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) [13:31:19] (HAProxyServiceUnavailable) firing: HAProxy service wikireplica-db-analytics-s1 has no available backends on cloudlb1002:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable [13:31:19] (HAProxyServiceUnavailable) firing: HAProxy service wikireplica-db-analytics-s1 has no available backends on cloudlb1001:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable [13:31:24] cloud-services-team: 
HAProxyServiceUnavailable cloudlb1001:9900 HAProxy service wikireplica-db-analytics-s1 has no available backends on cloudlb1001:9900 - https://phabricator.wikimedia.org/T351813 (phaultfinder) [13:31:26] cloud-services-team: HAProxyServiceUnavailable cloudlb1002:9900 HAProxy service wikireplica-db-analytics-s1 has no available backends on cloudlb1002:9900 - https://phabricator.wikimedia.org/T351814 (phaultfinder) [13:32:21] cloud-services-team: HAProxyServiceUnavailable cloudlb1001:9900 HAProxy service wikireplica-db-analytics-s1 has no available backends on cloudlb1001:9900 - https://phabricator.wikimedia.org/T351813 (taavi) Open→Resolved a:taavi [13:32:24] cloud-services-team: HAProxyServiceUnavailable cloudlb1002:9900 HAProxy service wikireplica-db-analytics-s1 has no available backends on cloudlb1002:9900 - https://phabricator.wikimedia.org/T351814 (taavi) Open→Resolved a:taavi [13:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [13:36:19] (HAProxyServiceUnavailable) resolved: HAProxy service wikireplica-db-analytics-s1 has no available backends on cloudlb1002:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable [13:36:19] (HAProxyServiceUnavailable) resolved: HAProxy service wikireplica-db-analytics-s1 has no available backends on cloudlb1001:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable [14:03:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [14:03:41] (ProbeDown) firing: Service toolserver-proxy-01:443 has failed probes 
(http_toolserver_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolserver-proxy-01:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [14:08:41] (ProbeDown) resolved: Service toolserver-proxy-01:443 has failed probes (http_toolserver_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolserver-proxy-01:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [14:13:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [14:55:30] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [15:03:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [15:03:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [15:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [15:07:05] Data-Services, cloud-services-team, Data-Engineering, Data-Platform-SRE: Some wikibase tables not available in 
commonswiki_p - https://phabricator.wikimedia.org/T298452 (Ladsgroup) The new term tables in commons (wbt_*) should be empty to my knowledge. Is there a reason to make them visible? Or d... [15:08:53] Cloud-VPS, SRE, observability, Patch-For-Review, SRE Observability (FY2023/2024-Q2): ossl rsyslog errors post-migration - https://phabricator.wikimedia.org/T351710 (lmata) [15:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [15:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [15:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [15:39:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [15:54:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [15:56:39] Data-Services, cloud-services-team, Data-Engineering: Surface Temporary user information to Cloud Wiki Replicas - https://phabricator.wikimedia.org/T346679 (taavi) Open→Resolved [15:58:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [15:58:43] (NeutronAgentDown) firing: Neutron neutron-linuxbridge-agent on cloudvirt1026 is down - 
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [16:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [16:14:33] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [16:18:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [16:24:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [16:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [16:37:55] Tools, Wikidata Dev Team, wmde-wikidata-tech: [GENERAL] Deprecate connecting senses prototype - https://phabricator.wikimedia.org/T351829 (Lucas_Werkmeister_WMDE) [17:13:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [17:24:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [17:29:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [17:44:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [17:49:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [17:54:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [17:58:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [18:03:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [18:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [18:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [18:43:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on 
project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [18:48:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [19:04:19] RECOVERY - Check unit status of remove_dangling_cinder_snapshots on cloudbackup2001 is OK: OK: Status of the systemd unit remove_dangling_cinder_snapshots https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [19:14:33] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [19:15:37] (CephSlowOps) firing: Ceph cluster in eqiad has 9 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps [19:16:05] PROBLEM - Check unit status of remove_dangling_cinder_snapshots on cloudbackup2001 is CRITICAL: CRITICAL: Status of the systemd unit remove_dangling_cinder_snapshots https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [19:20:37] (CephSlowOps) resolved: Ceph cluster in eqiad has 30 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps [19:28:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [19:29:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet 
run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [19:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [19:37:00] Grid-Engine-to-K8s-Migration: Migrate locktool from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319863 (DatGuy) Open→Resolved [20:13:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [20:14:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [20:19:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [20:24:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [20:27:19] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [20:29:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [20:32:19] 
(HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [20:33:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [20:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [20:39:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [20:54:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [20:59:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [21:03:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [21:18:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:33:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance 
metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [21:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [21:39:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [22:14:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [22:14:34] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [22:18:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [22:19:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [22:28:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [22:29:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours 
ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [22:33:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [22:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [22:39:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:12:55] Toolforge (Software install/update): Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (Hawkeye7) @komla Sorry, I was away and did not see your comment until now. A build pack will not work for me they way you are trying. I have multiple C# projects in the one git... 
[23:13:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [23:14:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:18:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [23:19:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:20:34] (SystemdUnitDown) firing: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1048. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1048 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:24:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:28:33] (SystemdUnitDown) firing: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1062. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1062 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:29:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:33:33] (SystemdUnitDown) firing: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1053. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1053 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:33:33] (SystemdUnitDown) firing: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1064. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1064 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:37:08] PROBLEM - Check systemd state on clouddb1015 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [23:37:33] (SystemdUnitDown) firing: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1034. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1034 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:37:34] (SystemdUnitDown) firing: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1043. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1043 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:37:39] (SystemdUnitDown) firing: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1036. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1036 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:37:44] (SystemdUnitDown) firing: The service unit export_smart_data_dump.service is in failed status on host cloudvirt-wdqs1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt-wdqs1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:38:33] (SystemdUnitDown) firing: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1038. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1038 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:40:33] (SystemdUnitDown) firing: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1037. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1037 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:54:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:58:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources