[00:03:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [00:03:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [00:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [00:09:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [00:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [00:15:34] (SystemdUnitDown) resolved: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1048. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1048 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [00:23:33] (SystemdUnitDown) resolved: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1062. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1062 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [00:28:33] (SystemdUnitDown) resolved: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1053. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1053 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [00:28:33] (SystemdUnitDown) resolved: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1064. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1064 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [00:32:33] (SystemdUnitDown) resolved: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1043. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1043 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [00:32:33] (SystemdUnitDown) resolved: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1034. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1034 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [00:32:39] (SystemdUnitDown) resolved: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1036. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1036 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [00:32:44] (SystemdUnitDown) resolved: The service unit export_smart_data_dump.service is in failed status on host cloudvirt-wdqs1003. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt-wdqs1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [00:33:12] RECOVERY - Check systemd state on clouddb1015 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:33:33] (SystemdUnitDown) resolved: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1038. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1038 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [00:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [00:35:33] (SystemdUnitDown) resolved: The service unit export_smart_data_dump.service is in failed status on host cloudvirt1037. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1037 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [00:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [00:39:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [00:43:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [01:14:34] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [01:18:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [01:24:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [01:28:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [01:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [01:54:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [02:03:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [02:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [02:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [02:43:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [02:44:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [02:48:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [02:49:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [03:03:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - 
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [03:09:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [03:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [04:13:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [04:14:34] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [04:18:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [04:54:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [04:59:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [05:03:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [05:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in 
project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [05:08:37] (CephSlowOps) firing: Ceph cluster in eqiad has 34 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps [05:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [05:09:05] cloud-services-team: CephSlowOps Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T349502 (phaultfinder) [05:13:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [05:13:37] (CephSlowOps) resolved: Ceph cluster in eqiad has 11 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps [05:18:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [05:29:23] Grid-Engine-to-K8s-Migration: Migrate commons-android-app from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319638 (whym) Update: I intend to migrate it but it might take some more time. (I was not the original maintainer and still don't know how the code is organized well.)
[05:44:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [05:48:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [05:54:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [06:03:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [06:09:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [06:24:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [06:29:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [06:33:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [06:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [06:52:19] (HAProxyBackendUnavailable) firing: HAProxy 
service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [06:57:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [07:03:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [07:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [07:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [07:14:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [07:14:34] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [07:19:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [07:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance 
metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [07:39:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [07:44:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [07:49:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [08:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [08:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [08:14:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [08:19:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [08:48:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [08:58:03] (PuppetAgentNoResources) 
firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [08:59:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [09:03:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [09:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [09:09:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [09:13:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [09:18:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [09:24:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [09:33:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [09:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run 
was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [09:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [09:39:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [10:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [10:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [10:13:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [10:14:34] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [10:34:53] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api [10:35:06] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api [10:45:33] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api [10:45:46] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook 
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api [10:48:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [10:54:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [10:58:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [11:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [11:33:30] (PS2) Majavah: secret: dkim: move wmcs dkim keys to correct location [labs/private] - https://gerrit.wikimedia.org/r/969690 [11:33:36] (PS2) Majavah: hieradata: fix cloudinfra webproxy password location [labs/private] - https://gerrit.wikimedia.org/r/969689 [11:33:42] (PS2) Majavah: hieradata: add fake metricsinfra grafana password [labs/private] - https://gerrit.wikimedia.org/r/969691 [11:33:48] (PS1) Majavah: secret: add the project-proxy acme-chief account [labs/private] - https://gerrit.wikimedia.org/r/977047 [11:40:03] (CR) Majavah: [V: +2 C: +2] secret: add the project-proxy acme-chief account [labs/private] - https://gerrit.wikimedia.org/r/977047 (owner: Majavah) [11:43:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [11:48:03] (PuppetSyncFailure) firing: Failed to update Puppet repository 
/var/lib/git/labs/private on instance project-proxy-puppetmaster-01 in project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure [11:53:03] (PuppetSyncFailure) resolved: Failed to update Puppet repository /var/lib/git/labs/private on instance project-proxy-puppetmaster-01 in project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure [11:54:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [11:58:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [12:03:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [12:03:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [12:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [12:09:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [12:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [12:24:04] (PuppetAgentStaleLastRun) 
firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [12:29:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [12:32:01] (CR) CI reject: [V: -1] Localisation updates from https://translatewiki.net. [labs/tools/watch-translations] - https://gerrit.wikimedia.org/r/977070 (owner: L10n-bot) [12:32:03] (CR) CI reject: [V: -1] Localisation updates from https://translatewiki.net. [labs/tools/weapon-of-mass-description] - https://gerrit.wikimedia.org/r/977071 (owner: L10n-bot) [12:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [12:41:54] Grid-Engine-to-K8s-Migration: Migrate checkpersondata from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319624 (tidoni_t) I have a Shell Script, that reads data from the database (using the mysql command) and than calls a perl script. Is there a way to run an image, that... 
[12:43:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [12:48:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [12:54:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [12:58:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [13:03:47] (CR) Nikerabbit: [V: +2] Localisation updates from https://translatewiki.net. [labs/tools/watch-translations] - https://gerrit.wikimedia.org/r/977070 (owner: L10n-bot) [13:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [13:04:08] (CR) Nikerabbit: [V: +2] Localisation updates from https://translatewiki.net. 
[labs/tools/weapon-of-mass-description] - https://gerrit.wikimedia.org/r/977071 (owner: L10n-bot) [13:14:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [13:14:34] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [13:19:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [13:43:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [14:03:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [14:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [14:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [14:24:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [14:28:03] (PuppetAgentNoResources) 
firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [14:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [14:48:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [15:03:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [15:09:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [15:13:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [15:34:19] Cloud-VPS, SRE, observability, Patch-For-Review, and 2 others: ossl rsyslog errors post-migration - https://phabricator.wikimedia.org/T351710 (fgiunchedi) [15:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [15:43:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [15:48:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [15:54:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [15:58:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [16:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [16:14:34] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [16:24:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [16:33:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [16:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [16:39:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [16:44:32] Cloud-VPS, Toolforge, SRE: Some of my tools (eg 
wikidata-todo) just start throwing 504 errors - https://phabricator.wikimedia.org/T346126 (M2k_dewiki) Resolved→Open Hello, https://templatetransclusioncheck.toolforge.org/ https://templatetransclusioncheck.toolforge.org/?lang=de&name=Vorlage:... [17:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [17:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [17:13:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [17:24:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [17:28:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [17:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [17:39:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [17:44:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [17:48:03] (PuppetAgentNoResources) resolved: No Puppet resources found on instance metricsinfra-puppetmaster-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [17:49:19] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [17:54:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [17:54:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [18:03:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [18:09:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [18:10:03] (PuppetAgentNoResources) firing: No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [18:20:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [18:30:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found 
on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [18:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [18:44:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [18:45:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [18:46:03] (InstanceDown) resolved: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [18:49:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [18:50:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [18:54:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [18:55:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [18:59:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [19:00:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [19:01:58] RECOVERY - Check unit status of remove_dangling_cinder_snapshots on cloudbackup2001 is OK: OK: Status of the systemd unit remove_dangling_cinder_snapshots https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [19:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [19:05:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [19:08:34] Grid-Engine-to-K8s-Migration: Migrate steve-adder from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320062 (Izno) https://steve-adder.toolforge.org/ seems to have gone down, I'd guess as a result of this task. :) [19:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [19:14:34] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [19:22:27] (OpenstackAPIResponse) firing: Openstack API average response time is too high. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [19:24:18] PROBLEM - Check unit status of remove_dangling_cinder_snapshots on cloudbackup2001 is CRITICAL: CRITICAL: Status of the systemd unit remove_dangling_cinder_snapshots https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [19:50:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [19:55:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [19:59:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [20:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [20:05:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [20:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [20:14:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [20:15:03] (PuppetAgentNoResources) resolved: No Puppet resources found on instance metricsinfra-puppetmaster-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [20:15:33] (PuppetAgentNoResources) firing: No Puppet resources found on instance metricsinfra-puppetmaster-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [20:19:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [20:40:33] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [20:44:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [20:49:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [20:50:33] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:00:33] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:03:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - 
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [21:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [21:09:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [21:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [21:24:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [21:25:33] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [21:39:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [22:10:33] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [22:14:34] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the 
resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [22:20:33] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [22:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [22:35:33] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [22:39:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:04:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:09:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:22:27] (OpenstackAPIResponse) firing: Openstack API average response time is too high. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [23:27:53] Grid-Engine-to-K8s-Migration: Migrate bldrwnsch from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319593 (simon04) Stalled→Resolved https://github.com/simon04/bldrwnsch/commit/5dec9f1c9f57c4578a61b503f3e978fe0ecdf0cf https://wm-bot.wmcloud.org/logs/%23wikimedia-c... [23:34:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:39:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:44:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:45:33] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [23:49:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:50:33] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [23:54:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance 
metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:55:33] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [23:59:04] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun