[00:09:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:19:03] (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[01:27:03] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance toolsbeta-harbor-1 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[02:19:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[02:29:28] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:27:03] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance toolsbeta-harbor-1 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[05:19:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[06:29:29] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[07:27:03] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance toolsbeta-harbor-1 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[08:19:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[09:15:37] (CephSlowOps) firing: Ceph cluster in has 6 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[09:15:42] cloud-services-team: CephSlowOps Ceph cluster in has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T349425 (phaultfinder)
[09:20:37] (CephSlowOps) resolved: Ceph cluster in has 6 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[09:28:37] (CephSlowOps) firing: Ceph cluster in has 910 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[09:28:45] (ProbeDown) firing: (3) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:29:00] cloud-services-team: CephSlowOps Ceph cluster in has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T349425 (phaultfinder)
[09:33:37] (CephSlowOps) resolved: Ceph cluster in has 910 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[09:33:45] (ProbeDown) resolved: (3) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:05:03] (PuppetAgentFailure) firing: Puppet agent failure detected on instance tools-sgeweblight-10-24 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[10:27:03] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance toolsbeta-harbor-1 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[10:29:30] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[10:50:03] (PuppetAgentFailure) firing: (2) Puppet agent failure detected on instance tools-sgeweblight-10-14 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[11:19:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[12:44:38] Tool-Pageviews, Data-Engineering, Data Products (Sprint 02), Patch-For-Review: Mediarequests returning "file not found" for filenames with specific characters - https://phabricator.wikimedia.org/T347899 (Sfaci) @Ladsgroup Keep in mind that these tests have been run locally to test the fix before...
[13:27:03] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance toolsbeta-harbor-1 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[13:50:03] (PuppetAgentFailure) firing: (2) Puppet agent failure detected on instance tools-sgeweblight-10-14 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[14:19:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[14:33:39] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[16:27:03] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance toolsbeta-harbor-1 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[16:50:03] (PuppetAgentFailure) firing: (2) Puppet agent failure detected on instance tools-sgeweblight-10-14 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[17:19:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[18:34:29] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[19:27:03] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance toolsbeta-harbor-1 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[19:50:03] (PuppetAgentFailure) firing: (2) Puppet agent failure detected on instance tools-sgeweblight-10-14 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[20:19:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[22:27:03] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance toolsbeta-harbor-1 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[22:34:29] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[22:50:03] (PuppetAgentFailure) firing: (2) Puppet agent failure detected on instance tools-sgeweblight-10-14 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[23:19:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed