[00:12:03] <wmcs-alerts>	 (TfInfraTestDestroyFailed) resolved: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[00:15:03] <wmcs-alerts>	 (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:20:03] <wmcs-alerts>	 (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:35:37] <jinxer-wm>	 (CephSlowOps) firing: Ceph cluster in eqiad has 27 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[00:35:46] <wikibugs>	 10cloud-services-team: CephSlowOps  Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T349502 (10phaultfinder)
[00:39:38] <jinxer-wm>	 (CephClusterInWarning) firing: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[00:40:37] <jinxer-wm>	 (CephSlowOps) resolved: Ceph cluster in eqiad has 1 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[00:44:37] <jinxer-wm>	 (CephClusterInWarning) resolved: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[00:53:21] <icinga-wm>	 PROBLEM - Check unit status of remove_dangling_cinder_snapshots on cloudbackup2001 is CRITICAL: CRITICAL: Status of the systemd unit remove_dangling_cinder_snapshots https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[01:14:34] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[03:10:10] <wikibugs>	 10Grid-Engine-to-K8s-Migration: Migrate shorturls from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320036 (10Legoktm) a:05Ladsgroup→03Legoktm
[03:48:51] <wikibugs>	 10Grid-Engine-to-K8s-Migration: Migrate shorturls from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320036 (10Legoktm) 05Open→03Resolved ` tools.shorturls@tools-sgebastion-10:~$ toolforge jobs list Job name:    Job type:         Status: -----------  ----------------  ----...
[04:14:34] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[05:01:57] <icinga-wm>	 PROBLEM - Check unit status of backup_vms on cloudbackup1003 is CRITICAL: CRITICAL: Status of the systemd unit backup_vms https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_unit_status_of_backup_vms
[05:06:34] <jinxer-wm>	 (SystemdUnitDown) firing: The service unit backup_vms.service is in failed status on host cloudbackup1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[07:01:34] <jinxer-wm>	 (SystemdUnitDownForLong) firing: The systemd unit backup_vms.service on node cloudbackup1003 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDownForLong - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDownForLong
[07:04:13] <icinga-wm>	 RECOVERY - Check unit status of remove_dangling_cinder_snapshots on cloudbackup2002 is OK: OK: Status of the systemd unit remove_dangling_cinder_snapshots https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[07:14:34] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[07:33:31] <icinga-wm>	 PROBLEM - Check unit status of backup_cinder_volumes on cloudbackup2002 is CRITICAL: CRITICAL: Status of the systemd unit backup_cinder_volumes https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[09:06:48] <jinxer-wm>	 (SystemdUnitDown) firing: The service unit backup_vms.service is in failed status on host cloudbackup1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:14:34] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[10:40:10] <jinxer-wm>	 (NeutronAgentDown) firing: Neutron neutron-linuxbridge-agent on cloudvirt1029 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[10:40:10] <jinxer-wm>	 (NeutronAgentDown) firing: Neutron neutron-linuxbridge-agent on cloudvirt1030 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[10:45:26] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[11:01:49] <jinxer-wm>	 (SystemdUnitDownForLong) firing: The systemd unit backup_vms.service on node cloudbackup1003 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDownForLong - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDownForLong
[13:06:33] <jinxer-wm>	 (SystemdUnitDown) firing: (2) The service unit backup_glance_images.service is in failed status on host cloudbackup1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[13:08:31] <icinga-wm>	 PROBLEM - Check unit status of backup_glance_images on cloudbackup1003 is CRITICAL: CRITICAL: Status of the systemd unit backup_glance_images https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[13:14:34] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[14:45:42] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[14:46:38] <wikibugs>	 (03CR) 10Merlijn van Deen: [C: 03+2] Handle Un-WIP as if it was a new changeset [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/975400 (https://phabricator.wikimedia.org/T350778) (owner: 10Merlijn van Deen)
[14:47:51] <wikibugs>	 (03Merged) 10jenkins-bot: Handle Un-WIP as if it was a new changeset [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/975400 (https://phabricator.wikimedia.org/T350778) (owner: 10Merlijn van Deen)
[14:55:29] <wikibugs>	 (03PS1) 10Merlijn van Deen: Add basic readme [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/977293
[14:56:33] <wikibugs>	 (03CR) 10Merlijn van Deen: [C: 03+2] Add basic readme [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/977293 (owner: 10Merlijn van Deen)
[14:56:33] <wikibugs>	 10Wikibugs, 10Patch-For-Review: Better message than "This change is ready for review" when patch stops being WIP - https://phabricator.wikimedia.org/T350778 (10valhallasw) Tested with https://gerrit.wikimedia.org/r/c/labs/tools/wikibugs2/+/977293, seems to work as expected:  {F41538082}
[14:56:41] <wikibugs>	 10Wikibugs, 10Patch-For-Review: Better message than "This change is ready for review" when patch stops being WIP - https://phabricator.wikimedia.org/T350778 (10valhallasw) 05Open→03Resolved a:03valhallasw
[14:57:41] <wikibugs>	 (03Merged) 10jenkins-bot: Add basic readme [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/977293 (owner: 10Merlijn van Deen)
[15:01:34] <jinxer-wm>	 (SystemdUnitDownForLong) firing: (2) The systemd unit backup_glance_images.service on node cloudbackup1003 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDownForLong - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDownForLong
[15:01:39] <wikibugs>	 10cloud-services-team: SystemdUnitDownForLong cloudbackup1003:9100 - https://phabricator.wikimedia.org/T351979 (10phaultfinder)
[15:13:37] <wikibugs>	 10Tools, 10WMDE-TechWish-Maintenance, 10WMDE-TechWish-Maintenance-2023: Delete technischewuensche tool code repository in Diffusion - https://phabricator.wikimedia.org/T349847 (10Aklapper) 05Open→03Stalled
[16:14:34] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[17:04:45] <icinga-wm>	 PROBLEM - Check unit status of backup_vms on cloudbackup1004 is CRITICAL: CRITICAL: Status of the systemd unit backup_vms https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_unit_status_of_backup_vms
[17:06:33] <jinxer-wm>	 (SystemdUnitDown) firing: The service unit backup_vms.service is in failed status on host cloudbackup1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[17:06:49] <jinxer-wm>	 (SystemdUnitDown) firing: (2) The service unit backup_glance_images.service is in failed status on host cloudbackup1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:50:27] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[19:00:45] <icinga-wm>	 RECOVERY - Check unit status of remove_dangling_cinder_snapshots on cloudbackup2001 is OK: OK: Status of the systemd unit remove_dangling_cinder_snapshots https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[19:01:33] <jinxer-wm>	 (SystemdUnitDownForLong) firing: The systemd unit backup_vms.service on node cloudbackup1004 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDownForLong - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDownForLong
[19:01:49] <jinxer-wm>	 (SystemdUnitDownForLong) firing: (2) The systemd unit backup_glance_images.service on node cloudbackup1003 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDownForLong - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDownForLong
[19:14:34] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[19:16:37] <jinxer-wm>	 (CephSlowOps) firing: Ceph cluster in eqiad has 103 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[19:16:43] <wikibugs>	 10cloud-services-team: CephSlowOps  Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T349502 (10phaultfinder)
[19:21:37] <jinxer-wm>	 (CephSlowOps) resolved: Ceph cluster in eqiad has 6 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[21:06:34] <jinxer-wm>	 (SystemdUnitDown) firing: The service unit backup_vms.service is in failed status on host cloudbackup1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[21:06:49] <jinxer-wm>	 (SystemdUnitDown) firing: (2) The service unit backup_glance_images.service is in failed status on host cloudbackup1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[22:14:34] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[22:25:27] <jinxer-wm>	 (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse