[00:01:03] <wmcs-alerts>	 (InstanceDown) firing: (2) Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:10:03] <wmcs-alerts>	 (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:11:03] <wmcs-alerts>	 (InstanceDown) firing: (2) Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:15:03] <wmcs-alerts>	 (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[01:09:24] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[03:06:03] <wmcs-alerts>	 (InstanceDown) firing: (2) Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[03:11:03] <wmcs-alerts>	 (InstanceDown) firing: (2) Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[03:39:41] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[03:43:59] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (348643)
[03:47:44] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.drain_node (exit_code=97) (348643)
[03:48:05] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (348643)
[04:09:24] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[05:06:03] <wmcs-alerts>	 (InstanceDown) firing: (2) Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[05:16:03] <wmcs-alerts>	 (InstanceDown) firing: (2) Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[05:34:27] <jinxer-wm>	 (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[05:38:14] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0) (348643)
[06:14:43] <wikibugs>	 Grid-Engine-to-K8s-Migration: Migrate spi-table-bot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320054 (Mz7) Acknowledging this ticket... I see it's been pending on me for a long time. I will see if I can find some time this week to get this done.
[06:52:18] <wikibugs>	 Data-Services, DBA: Prepare and check storage layer for zghwiki - https://phabricator.wikimedia.org/T350240 (Marostegui) p:Triage→Medium Let us know when the wiki is created so we can sanitize it
[06:52:35] <wikibugs>	 Data-Services, DBA: Prepare and check storage layer for dgawiki - https://phabricator.wikimedia.org/T350228 (Marostegui) p:Triage→Medium Let us know when the wiki is created so we can sanitize it
[06:52:55] <wikibugs>	 Data-Services, DBA: Prepare and check storage layer for bjnwikiquote - https://phabricator.wikimedia.org/T350234 (Marostegui) p:Triage→Medium Let us know when the wiki is created so we can sanitize it
[07:01:03] <wmcs-alerts>	 (InstanceDown) firing: (2) Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[07:09:24] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[07:11:03] <wmcs-alerts>	 (InstanceDown) firing: (2) Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[07:27:36] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[07:28:00] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[08:09:37] <jinxer-wm>	 (CephSlowOps) firing: Ceph cluster in eqiad has 27 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[08:09:42] <wikibugs>	 cloud-services-team: CephSlowOps  Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T349502 (phaultfinder)
[08:14:37] <jinxer-wm>	 (CephSlowOps) resolved: Ceph cluster in eqiad has 25 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[08:16:03] <wmcs-alerts>	 (InstanceDown) firing: (2) Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[08:26:03] <wmcs-alerts>	 (InstanceDown) firing: (2) Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:34:42] <jinxer-wm>	 (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[10:06:10] <wikibugs>	 Tools, WMDE-TechWish-Maintenance, WMDE-TechWish-Maintenance-2023: Check technischewuensche tool code and publish in a public repo - https://phabricator.wikimedia.org/T350352 (WMDE-Fisch)
[10:08:01] <wikibugs>	 Tools, WMDE-TechWish-Maintenance, WMDE-TechWish-Maintenance-2023: Delete technischewuensche tool code repository in Diffusion - https://phabricator.wikimedia.org/T349847 (WMDE-Fisch) We're taking care of it. I created a subticket and when that's done we can delete the deprecated source. Thanks again...
[10:08:40] <wikibugs>	 Tools, WMDE-TechWish-Maintenance, WMDE-TechWish-Maintenance-2023: Check technischewuensche tool code and publish in a public repo - https://phabricator.wikimedia.org/T350352 (WMDE-Fisch) a:Aklapper→WMDE-Fisch
[10:09:24] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[10:11:03] <wmcs-alerts>	 (InstanceDown) firing: (2) Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:16:03] <wmcs-alerts>	 (InstanceDown) firing: (2) Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:23:10] <wikibugs>	 Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Fix ceph-common version in Bookworm - https://phabricator.wikimedia.org/T350188 (fnegri) We pull the Ceph packages from https://mirror.croit.io/debian-octopus but that repo only includes packages for buster and bullseye, not for bookworm.  The...
[10:26:57] <wikibugs>	 Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fnegri@cumin1001 for host cloudcontrol1006.eqiad.wmnet with OS bookworm
[10:29:19] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (5) HAProxy service glance-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[10:29:40] <jinxer-wm>	 (GaleraClusterSizeMismatch) firing: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[10:30:19] <jinxer-wm>	 (HAProxyServiceUnavailable) firing: HAProxy service neutron-api_backend has no available backends on cloudlb1002:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable
[10:30:24] <wikibugs>	 cloud-services-team: HAProxyServiceUnavailable cloudlb1002:9900 HAProxy service neutron-api_backend has no available backends on cloudlb1002:9900 - https://phabricator.wikimedia.org/T350358 (phaultfinder)
[10:34:19] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (15) HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[10:35:19] <jinxer-wm>	 (HAProxyServiceUnavailable) resolved: HAProxy service neutron-api_backend has no available backends on cloudlb1002:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable
[10:46:03] <wmcs-alerts>	 (PuppetSyncFailure) firing: Failed to update Puppet repository /var/lib/git/operations/puppet on instance toolsbeta-puppetmaster-04 in project toolsbeta   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[10:51:03] <wmcs-alerts>	 (PuppetSyncFailure) resolved: Failed to update Puppet repository /var/lib/git/operations/puppet on instance toolsbeta-puppetmaster-04 in project toolsbeta   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[11:13:03] <wikibugs>	 Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fnegri@cumin1001 for host cloudcontrol1006.eqiad.wmnet with OS bookworm executed with errors: - clo...
[11:18:33] <wikibugs>	 Grid-Engine-to-K8s-Migration: Migrate cewbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319622 (Kanashimi) May I increase the number of Kubernetes pods running at the same time? Actually, the old 16 is not enough, so I split it into 4+1 tools...
[11:20:09] <wikibugs>	 Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (fnegri) The reimage failed, logging into the mgmt interface I see this message: `No root file system is defined. Please correct this from the partitioning menu.`
[11:27:20] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (348643)
[11:33:32] <wikibugs>	 Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fnegri@cumin1001 for host cloudcontrol1006.eqiad.wmnet with OS bookworm
[11:34:46] <wikibugs>	 Grid-Engine-to-K8s-Migration: Migrate cewbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319622 (Kanashimi) @nskaggs @komla @Aklapper And I also need to use more memory, maybe 16*8GiB per tool...
[11:41:03] <wmcs-alerts>	 (InstanceDown) firing: (2) Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:44:15] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0) (348643)
[11:47:24] <wikibugs>	 Grid-Engine-to-K8s-Migration: Migrate cewbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319622 (Aklapper) Please see https://phabricator.wikimedia.org/project/view/4834/
[11:51:03] <wmcs-alerts>	 (InstanceDown) firing: (2) Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[12:13:19] <wikibugs>	 Data-Services, DBA: Prepare and check storage layer for bbcwiki - https://phabricator.wikimedia.org/T350372 (Marostegui) p:Triage→Medium Let us know when the wiki is created so we can sanitize it
[12:14:19] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (13) HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[12:19:19] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (13) HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[12:19:40] <jinxer-wm>	 (GaleraClusterSizeMismatch) resolved: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[12:20:08] <wikibugs>	 Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fnegri@cumin1001 for host cloudcontrol1006.eqiad.wmnet with OS bookworm completed: - cloudcontrol10...
[12:24:20] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (13) HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[12:46:03] <wmcs-alerts>	 (InstanceDown) firing: (2) Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:01:03] <wmcs-alerts>	 (InstanceDown) firing: (2) Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:09:24] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[13:16:03] <wmcs-alerts>	 (InstanceDown) firing: (2) Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:34:42] <jinxer-wm>	 (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[13:41:03] <wmcs-alerts>	 (InstanceDown) resolved: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:47:42] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (348643)
[13:48:01] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0) (348643)
[13:48:03] <wmcs-alerts>	 (InstanceDown) firing: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:48:36] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (348643)
[13:48:43] <wikibugs>	 Cloud-VPS, cloud-services-team (FY2023/2024-Q1), DC-Ops, SRE, ops-eqiad: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643 (Andrew)
[13:58:26] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[14:26:30] <wikibugs>	 VPS-project-Wikistats: Add bbcwiki to wikistats - https://phabricator.wikimedia.org/T350377 (Dzahn) a:Dzahn
[14:38:03] <wmcs-alerts>	 (InstanceDown) resolved: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:45:03] <wmcs-alerts>	 (InstanceDown) firing: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:52:13] <icinga-wm>	 PROBLEM - Host cloudcephosd1029 is DOWN: PING CRITICAL - Packet loss = 100%
[14:56:07] <icinga-wm>	 PROBLEM - Host cloudcephosd1030 is DOWN: PING CRITICAL - Packet loss = 100%
[14:57:33] <icinga-wm>	 PROBLEM - Check unit status of purge_vm_backup on cloudbackup1004 is CRITICAL: CRITICAL: Status of the systemd unit purge_vm_backup https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[14:58:43] <icinga-wm>	 RECOVERY - Host cloudcephosd1029 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms
[15:01:34] <jinxer-wm>	 (SystemdUnitDown) firing: The service unit purge_vm_backup.service is in failed status on host cloudbackup1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:02:41] <icinga-wm>	 RECOVERY - Host cloudcephosd1030 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[15:04:16] <wikibugs>	 Cloud-VPS, cloud-services-team (FY2023/2024-Q1), DC-Ops, SRE, ops-eqiad: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643 (Jclark-ctr)
[15:05:03] <wmcs-alerts>	 (InstanceDown) resolved: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:12:03] <wmcs-alerts>	 (InstanceDown) firing: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:12:08] <wikibugs>	 Grid-Engine-to-K8s-Migration: Migrate cewbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319622 (nskaggs) >>! In T319622#9301216, @Kanashimi wrote: > @nskaggs @komla @Aklapper And I also need to use more memory, maybe 16*8GiB per tool... It would be nice to have the...
[15:18:02] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q3:rack/setup/install cloudcephosd10(3[5-9]|40) - https://phabricator.wikimedia.org/T324998 (Jclark-ctr) Servers have been boxed up and shipped out
[15:22:22] <wikibugs>	 Toolforge (Toolforge iteration 02), Patch-For-Review: Add `toolforge envvars quota` - https://phabricator.wikimedia.org/T341087 (Raymond_Ndibe) In progress→Resolved
[15:22:59] <jinxer-wm>	 (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[15:23:25] <wikibugs>	 Toolforge (Toolforge iteration 02), Patch-For-Review: Add `toolforge envvars quota` - https://phabricator.wikimedia.org/T341087 (Raymond_Ndibe) Resolved→Open
[15:23:28] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudcontrol100[8-10]-dev cloudnet100[7-8]-dev - https://phabricator.wikimedia.org/T342455 (Jclark-ctr)
[15:24:55] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudcontrol100[8-10]-dev cloudnet100[7-8]-dev - https://phabricator.wikimedia.org/T342455 (Jclark-ctr) @Andrew @cmooney   dc ops is finished with our side
[15:26:28] <wikibugs>	 VPS-project-Wikistats: New wikistats interface takes minutes to load the mediawikis list - https://phabricator.wikimedia.org/T167066 (Dzahn) I am glad to see this resolved - though I have no idea how it was solved :)
[15:26:45] <wikibugs>	 Toolforge (Toolforge iteration 02), Patch-For-Review: Add `toolforge envvars quota` - https://phabricator.wikimedia.org/T341087 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/124  envvars-api: bump to 0.0.32-20231101134104-2436443d
[15:30:59] <wikibugs>	 Toolforge (Toolforge iteration 02): [tools,harbor] Cleanup old production images - https://phabricator.wikimedia.org/T348538 (Raymond_Ndibe) Open→In progress
[15:41:33] <wikibugs>	 Grid-Engine-to-K8s-Migration, MediaWiki-Engineering: Migrate ruprecht from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320021 (daniel) >>! In T320021#9298181, @taavi wrote: > The `ruprecht` tool is still running on the grid engine. If it's no longer used, please stop...
[16:00:55] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0) (348643)
[16:09:24] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[16:12:22] <wikibugs>	 Toolforge (Toolforge iteration 02), User-Raymond_Ndibe: move from single script to multi-script approach in maintain-harbor - https://phabricator.wikimedia.org/T350410 (Raymond_Ndibe)
[16:17:03] <wmcs-alerts>	 (InstanceDown) resolved: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[16:21:46] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (348643)
[16:23:01] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node (348643)
[16:23:41] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0) (348643)
[16:24:34] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (2) HAProxy service keystone-admin-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:30:03] <wmcs-alerts>	 (InstanceDown) firing: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[16:34:20] <jinxer-wm>	 (HAProxyBackendUnavailable) resolved: (2) HAProxy service keystone-admin-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:38:35] <wikibugs>	 VPS-project-Wikistats, User-RhinosF1: remove referata table? - https://phabricator.wikimedia.org/T262148 (Dzahn) @RhinosF1 So referata is a dead project? It still claims "temporary" technical issues but I guess that has been shown for a long time now.
[16:39:27] <wikibugs>	 VPS-project-Wikistats, User-RhinosF1: remove referata table? - https://phabricator.wikimedia.org/T262148 (Dzahn) @RhinosF1 I think we have to remove puppetized systemd timers too?
[16:46:51] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.drain_node (exit_code=97) (348643)
[16:56:33] <jinxer-wm>	 (SystemdUnitDownForLong) firing: The systemd unit purge_vm_backup.service on node cloudbackup1004 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDownForLong - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDownForLong
[16:56:38] <wikibugs>	 cloud-services-team: SystemdUnitDownForLong cloudbackup1004:9100 Unit purge_vm_backup.service on node cloudbackup1004 has been down for long. - https://phabricator.wikimedia.org/T350415 (phaultfinder)
[16:57:44] <wikibugs>	 Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fnegri@cumin1001 for host cloudcontrol1005.eqiad.wmnet with OS bookworm
[17:00:20] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (13) HAProxy service cinder-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:00:40] <jinxer-wm>	 (GaleraClusterSizeMismatch) firing: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[17:01:19] <jinxer-wm>	 (HAProxyServiceUnavailable) firing: (2) HAProxy service neutron-api_backend has no available backends on cloudlb1002:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable
[17:01:19] <jinxer-wm>	 (HAProxyServiceUnavailable) firing: (2) HAProxy service neutron-api_backend has no available backends on cloudlb1001:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable
[17:05:08] <wikibugs>	 Quarry: Deploy magnum cluster for quarry - https://phabricator.wikimedia.org/T349032 (rook) Looks like the web pod had some db connection issues a little after it started. Restarting seems to have cleared it, though let's see if it comes back. ` [2023-11-01 12:20:32 +0000] [1] [INFO] Starting gunicorn 21.2.0...
[17:05:19] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (17) HAProxy service cinder-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:06:19] <jinxer-wm>	 (HAProxyServiceUnavailable) resolved: (2) HAProxy service neutron-api_backend has no available backends on cloudlb1002:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable
[17:06:19] <jinxer-wm>	 (HAProxyServiceUnavailable) resolved: (2) HAProxy service neutron-api_backend has no available backends on cloudlb1001:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable
[17:10:08] <wikibugs>	 VPS-project-Wikistats, User-RhinosF1: remove referata table? - https://phabricator.wikimedia.org/T262148 (RhinosF1) >>! In T262148#9302525, @Dzahn wrote: > @RhinosF1 So referata is a dead project? It still claims "temporary" technical issues but I guess that has been shown for a long time now. Temporary...
[17:10:33] <jinxer-wm>	 (SystemdUnitDown) firing: The service unit nova-fullstack.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[17:16:18] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (15) HAProxy service cinder-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:20:03] <wmcs-alerts>	 (InstanceDown) resolved: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:21:19] <jinxer-wm>	 (SystemdUnitDown) resolved: The service unit nova-fullstack.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[17:32:03] <wmcs-alerts>	 (InstanceDown) firing: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:34:42] <jinxer-wm>	 (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[17:35:19] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (13) HAProxy service cinder-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:40:19] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (13) HAProxy service cinder-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:45:19] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (13) HAProxy service cinder-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:45:22] <wikibugs>	 Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fnegri@cumin1001 for host cloudcontrol1005.eqiad.wmnet with OS bookworm completed: - cloudcontrol10...
[17:45:40] <jinxer-wm>	 (GaleraClusterSizeMismatch) resolved: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[17:55:20] <jinxer-wm>	 (HAProxyBackendUnavailable) resolved: (13) HAProxy service cinder-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:58:27] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[18:14:41] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (348643)
[18:38:12] <wikibugs>	 Toolforge, Fix-Suggester-Bot: File system access is very slow - https://phabricator.wikimedia.org/T350432 (kostajh)
[18:39:05] <wikibugs>	 Toolforge, Fix-Suggester-Bot: File system access is very slow - https://phabricator.wikimedia.org/T350432 (kostajh)
[18:47:03] <wmcs-alerts>	 (InstanceDown) resolved: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[18:56:03] <wmcs-alerts>	 (InstanceDown) firing: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[19:01:48] <jinxer-wm>	 (SystemdUnitDown) firing: The service unit purge_vm_backup.service is in failed status on host cloudbackup1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[19:09:24] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[19:22:59] <jinxer-wm>	 (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[19:51:26] <wikibugs>	 VPS-project-Wikistats, User-RhinosF1: remove referata table? - https://phabricator.wikimedia.org/T262148 (Dzahn) >>! In T262148#9302708, @RhinosF1 wrote: > Temporary seems to have become indefinite.  ACK, thought so! thanks for confirming.  >> @RhinosF1 I think we have to remove puppetized systemd timers...
[20:01:03] <wmcs-alerts>	 (InstanceDown) resolved: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:06:03] <wmcs-alerts>	 (InstanceDown) firing: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:07:45] <wmcs-alerts>	 (ProbeDown) firing: Service tools-k8s-haproxy-4:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:12:45] <wmcs-alerts>	 (ProbeDown) resolved: Service tools-k8s-haproxy-4:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:37:41] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0) (348643)
[20:56:48] <jinxer-wm>	 (SystemdUnitDownForLong) firing: The systemd unit purge_vm_backup.service on node cloudbackup1004 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDownForLong - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDownForLong
[20:58:35] <wikibugs>	 (CR) Krinkle: [C: +2] Add homer public repository [labs/codesearch] - https://gerrit.wikimedia.org/r/970852 (owner: Majavah)
[20:58:51] <wikibugs>	 (PS2) Krinkle: write_config: Add homer public repository [labs/codesearch] - https://gerrit.wikimedia.org/r/970852 (owner: Majavah)
[20:58:55] <wikibugs>	 (CR) Krinkle: write_config: Add homer public repository [labs/codesearch] - https://gerrit.wikimedia.org/r/970852 (owner: Majavah)
[20:58:58] <wikibugs>	 (CR) Krinkle: [C: +2] write_config: Add homer public repository [labs/codesearch] - https://gerrit.wikimedia.org/r/970852 (owner: Majavah)
[21:00:10] <wikibugs>	 (Merged) jenkins-bot: write_config: Add homer public repository [labs/codesearch] - https://gerrit.wikimedia.org/r/970852 (owner: Majavah)
[21:21:03] <wmcs-alerts>	 (InstanceDown) resolved: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[21:27:03] <wmcs-alerts>	 (InstanceDown) firing: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[21:34:42] <jinxer-wm>	 (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[21:58:27] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[22:07:03] <wmcs-alerts>	 (InstanceDown) resolved: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[22:09:24] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[22:15:03] <wmcs-alerts>	 (InstanceDown) firing: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[23:01:48] <jinxer-wm>	 (SystemdUnitDown) firing: The service unit purge_vm_backup.service is in failed status on host cloudbackup1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[23:07:55] <wikibugs>	 VPS-project-Wikistats: Automate wikistats commands - https://phabricator.wikimedia.org/T345235 (Dzahn) Is this really automating it though? I mean, sure, putting the commands into a script will make it a little easier but end of the day you are still manually running a (single) command and you have to react...
[23:23:14] <jinxer-wm>	 (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[23:44:28] <jinxer-wm>	 (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[23:55:03] <wmcs-alerts>	 (InstanceDown) resolved: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown