[00:01:19] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: HAProxy service nova-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[00:04:18] <jinxer-wm>	 (DiskSpace) resolved: Disk space cloudcontrol1006:9100:/ 0.6712% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[00:04:33] <jinxer-wm>	 (SystemdUnitDown) resolved: (2) The service unit nova-fullstack.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[00:06:19] <jinxer-wm>	 (HAProxyBackendUnavailable) resolved: HAProxy service nova-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[00:09:48] <jinxer-wm>	 (SystemdUnitDown) firing: (3) The service unit logrotate.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[00:12:40] <jinxer-wm>	 (GaleraClusterSizeMismatch) firing: (2) Galera in  has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[00:12:49] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (13) HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[00:13:03] <wmcs-alerts>	 (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:14:49] <jinxer-wm>	 (SystemdUnitDown) firing: The service unit purge_vm_backup.service is in failed status on host cloudbackup1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[00:15:25] <icinga-wm>	 RECOVERY - Disk space on cloudcontrol1006 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=cloudcontrol1006&var-datasource=eqiad+prometheus/ops
[00:17:40] <jinxer-wm>	 (GaleraClusterSizeMismatch) resolved: (2) Galera in  has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[00:17:50] <jinxer-wm>	 (HAProxyBackendUnavailable) resolved: (13) HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[00:19:49] <jinxer-wm>	 (SystemdUnitDown) firing: (4) The service unit purge_vm_backup.service is in failed status on host cloudbackup1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[00:24:42] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[02:09:34] <jinxer-wm>	 (SystemdUnitDown) firing: The systemd unit purge_vm_backup.service on node cloudbackup1003 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[02:09:39] <wikibugs>	 cloud-services-team: SystemdUnitDown  Unit purge_vm_backup.service on node cloudbackup1003 has been down for long. - https://phabricator.wikimedia.org/T352625 (phaultfinder)
[03:18:03] <wmcs-alerts>	 (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[03:37:13] <jinxer-wm>	 (DiskSpace) firing: Disk space cloudcontrol1006:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[03:42:03] <wikibugs>	 Grid-Engine-to-K8s-Migration: Migrate php-security-checker from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319966 (Legoktm) Open→Resolved https://gitlab.wikimedia.org/toolforge-repos/php-security-checker/-/commit/c9a4bbd497de86f791cb1d2c05673cf76de6fb1e  ` tools...
[03:44:49] <jinxer-wm>	 (SystemdUnitDown) firing: (2) The service unit nova-fullstack.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[03:49:48] <jinxer-wm>	 (SystemdUnitDown) firing: (3) The service unit nova-fullstack.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[03:51:05] <icinga-wm>	 PROBLEM - Disk space on cloudcontrol1006 is CRITICAL: DISK CRITICAL - free space: / 0MiB (0% inode=97%): /tmp 0MiB (0% inode=97%): /var/tmp 0MiB (0% inode=97%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=cloudcontrol1006&var-datasource=eqiad+prometheus/ops
[03:54:49] <jinxer-wm>	 (SystemdUnitDown) firing: (3) The service unit nova-fullstack.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:04:48] <jinxer-wm>	 (SystemdUnitDown) firing: (3) The service unit nova-fullstack.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:14:11] <wikibugs>	 Grid-Engine-to-K8s-Migration: Migrate dbreps from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319665 (Legoktm) Open→Resolved Should be done now: https://github.com/mzmcbride/database-reports/commit/e78577b7f6bd19c584d78748befaf091c5a50071
[04:16:18] <wikibugs>	 Wikibugs, Phabricator, NewFunctionality-Worktype: Create conduit method to query the feed and return records with relevant details populated instead of just a bunch of phids - https://phabricator.wikimedia.org/T123417 (Aklapper) I stumbled upon rPHAB586aaa547ade5bf97fa02e2c8e11511b0387b737 which refe...
[04:24:43] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:44:48] <jinxer-wm>	 (SystemdUnitDown) firing: (4) The service unit man-db.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:54:49] <jinxer-wm>	 (SystemdUnitDown) firing: (4) The service unit man-db.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[05:02:14] <jinxer-wm>	 (DiskSpace) resolved: Disk space cloudcontrol1006:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[05:03:07] <icinga-wm>	 PROBLEM - Host cloudcontrol1006 is DOWN: PING CRITICAL - Packet loss = 100%
[05:03:19] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (13) HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[05:03:40] <jinxer-wm>	 (GaleraClusterSizeMismatch) firing: (2) Galera in  has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[05:04:01] <icinga-wm>	 RECOVERY - Host cloudcontrol1006 is UP: PING OK - Packet loss = 0%, RTA = 28.01 ms
[05:05:10] <jinxer-wm>	 (SystemdUnitDown) resolved: (4) The service unit man-db.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[05:08:20] <jinxer-wm>	 (HAProxyBackendUnavailable) resolved: (13) HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[05:08:40] <jinxer-wm>	 (GaleraClusterSizeMismatch) resolved: (2) Galera in  has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[05:12:55] <icinga-wm>	 RECOVERY - Disk space on cloudcontrol1006 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=cloudcontrol1006&var-datasource=eqiad+prometheus/ops
[05:14:49] <jinxer-wm>	 (SystemdUnitDown) firing: (4) The service unit man-db.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[05:24:49] <jinxer-wm>	 (SystemdUnitDown) resolved: The service unit nova-fullstack.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[06:09:49] <jinxer-wm>	 (SystemdUnitDown) firing: The systemd unit purge_vm_backup.service on node cloudbackup1003 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[06:18:03] <wmcs-alerts>	 (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[08:24:43] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[08:32:13] <jinxer-wm>	 (DiskSpace) firing: Disk space cloudcontrol1007:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[08:38:19] <icinga-wm>	 PROBLEM - Disk space on cloudcontrol1007 is CRITICAL: DISK CRITICAL - free space: / 0MiB (0% inode=97%): /tmp 0MiB (0% inode=97%): /var/tmp 0MiB (0% inode=97%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=cloudcontrol1007&var-datasource=eqiad+prometheus/ops
[08:39:49] <jinxer-wm>	 (SystemdUnitDown) firing: (2) The service unit nova-fullstack.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[08:48:47] <wikibugs>	 cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Unplanned, User-dcaro: [openstack] cloudcontrols getting out of space due to nova-api.log message 'XXX lineno: 104, opcode: 120' - https://phabricator.wikimedia.org/T352635 (dcaro) p:Triage→High
[08:50:04] <jinxer-wm>	 (SystemdUnitDown) firing: (2) The service unit nova-fullstack.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[08:50:15] <wikibugs>	 cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Unplanned, User-dcaro: [openstack] cloudcontrols getting out of space due to nova-api.log message 'XXX lineno: 104, opcode: 120' - https://phabricator.wikimedia.org/T352635 (dcaro) Truncated the log: ` e...
[08:52:13] <jinxer-wm>	 (DiskSpace) resolved: Disk space cloudcontrol1007:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[08:54:23] <wikibugs>	 cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Unplanned, User-dcaro: [openstack] cloudcontrols getting out of space due to nova-api.log message 'XXX lineno: 104, opcode: 120' - https://phabricator.wikimedia.org/T352635 (dcaro) Rebooting cloudcontrol...
[08:54:49] <jinxer-wm>	 (SystemdUnitDown) resolved: (2) The service unit nova-fullstack.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[08:58:47] <icinga-wm>	 RECOVERY - Disk space on cloudcontrol1007 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=cloudcontrol1007&var-datasource=eqiad+prometheus/ops
[09:02:05] <wikibugs>	 Toolforge (Toolforge iteration 02), Patch-For-Review: [builds-api] Use admin user credentials for Harbor API auth in dev - https://phabricator.wikimedia.org/T352022 (CodeReviewBot) sstefanova opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/141  builds-api: bump...
[09:03:42] <logmsgbot_cloud>	 !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
[09:03:56] <logmsgbot_cloud>	 !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
[09:11:49] <wm-bot2>	 !log tf-infra-test dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console
[09:11:50] <wm-bot2>	 !log tf-infra-test dcaro@urcuchillay END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=97)
[09:11:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tf-infra-test/SAL
[09:11:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tf-infra-test/SAL
[09:15:25] <logmsgbot_cloud>	 !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
[09:15:40] <logmsgbot_cloud>	 !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
[09:19:54] <wm-bot2>	 !log tf-infra-test dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console
[09:19:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tf-infra-test/SAL
[09:23:03] <wmcs-alerts>	 (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:23:27] <wikibugs>	 Toolforge (Toolforge iteration 02), Patch-For-Review: [builds-api] Use admin user credentials for Harbor API auth in dev - https://phabricator.wikimedia.org/T352022 (CodeReviewBot) sstefanova merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/141  builds-api: bump...
[09:41:38] <wm-bot2>	 !log etytree dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console
[09:41:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Etytree/SAL
[09:41:49] <wm-bot2>	 !log etytree dcaro@urcuchillay END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
[09:41:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Etytree/SAL
[10:09:49] <jinxer-wm>	 (SystemdUnitDown) firing: The systemd unit purge_vm_backup.service on node cloudbackup1003 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:26:02] <wm-bot2>	 !log tf-infra-test dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
[10:26:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tf-infra-test/SAL
[10:28:03] <wmcs-alerts>	 (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:40:06] <wikibugs>	 (PS1) Muehlenhoff: Remove ganeti RAPI dummy certs [labs/private] - https://gerrit.wikimedia.org/r/979901 (https://phabricator.wikimedia.org/T350686)
[10:45:16] <wikibugs>	 (CR) Muehlenhoff: [V: +2 C: +2] Remove ganeti RAPI dummy certs [labs/private] - https://gerrit.wikimedia.org/r/979901 (https://phabricator.wikimedia.org/T350686) (owner: Muehlenhoff)
[10:46:19] <wikibugs>	 (PS1) Muehlenhoff: Remove obsolete dummy cert [labs/private] - https://gerrit.wikimedia.org/r/979905
[10:56:48] <wikibugs>	 (CR) Elukey: [V: +2 C: +2] Remove obsolete dummy cert [labs/private] - https://gerrit.wikimedia.org/r/979905 (owner: Muehlenhoff)
[11:28:42] <wikibugs>	 (PS1) Klausman: hiera: clean up more ORES leftovers [labs/private] - https://gerrit.wikimedia.org/r/979915 (https://phabricator.wikimedia.org/T347278)
[11:30:44] <wikibugs>	 (CR) Elukey: [C: +1] hiera: clean up more ORES leftovers [labs/private] - https://gerrit.wikimedia.org/r/979915 (https://phabricator.wikimedia.org/T347278) (owner: Klausman)
[12:24:43] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[13:49:27] <jinxer-wm>	 (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[14:09:49] <jinxer-wm>	 (SystemdUnitDown) firing: The systemd unit purge_vm_backup.service on node cloudbackup1003 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[14:16:11] <wikibugs>	 cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Unplanned, User-dcaro: [openstack] cloudcontrols getting out of space due to nova-api.log message 'XXX lineno: 104, opcode: 120' - https://phabricator.wikimedia.org/T352635 (Andrew) I saved a logfile fro...
[14:22:07] <wikibugs>	 Toolforge (Toolforge iteration 02): [maintain-harbor] Manage project quotas via maintain-harbor - https://phabricator.wikimedia.org/T352417 (Slst2020) note: lowering a project's quota below the amount of storage it currently uses does not break anything
[14:22:17] <wikibugs>	 Toolforge (Toolforge iteration 02): [maintain-harbor] Manage project quotas via maintain-harbor - https://phabricator.wikimedia.org/T352417 (Slst2020)
[14:23:54] <wikibugs>	 cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Unplanned, User-dcaro: [openstack] cloudcontrols getting out of space due to nova-api.log message 'XXX lineno: 104, opcode: 120' - https://phabricator.wikimedia.org/T352635 (Andrew) Greenlet 3.0 release...
[14:32:01] <wikibugs>	 (CR) Klausman: [V: +2 C: +2] hiera: clean up more ORES leftovers [labs/private] - https://gerrit.wikimedia.org/r/979915 (https://phabricator.wikimedia.org/T347278) (owner: Klausman)
[14:35:17] <wikibugs>	 Toolforge (Toolforge iteration 02): [builds-cli,builds-api] Allow build service to cleanup images to free quota - https://phabricator.wikimedia.org/T341067 (Slst2020) Reminder to add a confirmation prompt and a warning message that all builds will be wiped and the user will need to start a new build
[15:23:35] <wikibugs>	 (PS2) David Caro: ceph: add missing cumin_params [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/969321
[15:23:42] <wikibugs>	 (PS2) David Caro: some fixes, to sort out [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/970414
[15:27:13] <wikibugs>	 (CR) CI reject: [V: -1] ceph: add missing cumin_params [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/969321 (owner: David Caro)
[15:27:21] <wikibugs>	 (CR) CI reject: [V: -1] some fixes, to sort out [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/970414 (owner: David Caro)
[15:34:25] <wikibugs>	 Cloud-VPS, SRE, observability, Patch-For-Review, and 2 others: ossl rsyslog errors post-migration - https://phabricator.wikimedia.org/T351710 (fgiunchedi) Current situation:  * We have a separate `rsyslog-receiver` unit/instance with only the receiver bits on centrallog hosts * The fleet is runni...
[15:36:03] <wmcs-alerts>	 (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:50:19] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[15:55:19] <jinxer-wm>	 (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:19:59] <wikibugs>	 Cloud-VPS, cloud-services-team (FY2023/2024-Q1-Q2), Goal: Support 'unmanaged' projects in cloud-vps - https://phabricator.wikimedia.org/T326818 (Andrew) Notes:  'no puppet, no ldap, no cumin' is really just 'no puppet' since puppet sets up the other things.   This can be implemented by a new keystone...
[16:51:29] <wikibugs>	 (PS1) BryanDavis: dev: Bump GitLab container to v16.6.1 [labs/striker] - https://gerrit.wikimedia.org/r/980001
[17:13:57] <wikibugs>	 Grid-Engine-to-K8s-Migration: Migrate superyetkin from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320070 (Superyetkin) I am still working on this. It may take a few weeks for me to get my scripts working with the new job engine.
[17:32:41] <wikibugs>	 (CR) BryanDavis: [C: +2] dev: Bump GitLab container to v16.6.1 [labs/striker] - https://gerrit.wikimedia.org/r/980001 (owner: BryanDavis)
[17:35:50] <wikibugs>	 (Merged) jenkins-bot: dev: Bump GitLab container to v16.6.1 [labs/striker] - https://gerrit.wikimedia.org/r/980001 (owner: BryanDavis)
[18:14:34] <jinxer-wm>	 (SystemdUnitDown) firing: The systemd unit purge_vm_backup.service on node cloudbackup1003 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:19:15] <wikibugs>	 (PS1) Jforrester: releases: Bump Vue from 3.2.37 to 3.3.9, drop compat [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/980017 (https://phabricator.wikimedia.org/T340590)
[18:20:27] <wikibugs>	 (CR) Jforrester: [C: +2] releases: Bump Vue from 3.2.37 to 3.3.9, drop compat [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/980017 (https://phabricator.wikimedia.org/T340590) (owner: Jforrester)
[18:21:00] <wikibugs>	 (Merged) jenkins-bot: releases: Bump Vue from 3.2.37 to 3.3.9, drop compat [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/980017 (https://phabricator.wikimedia.org/T340590) (owner: Jforrester)
[18:36:03] <wmcs-alerts>	 (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[18:42:17] <wikibugs>	 Tool-Pageviews, Data-Engineering-Icebox: Allow users to query mediarequests using a file page link - https://phabricator.wikimedia.org/T244712 (mforns) @Dominicbm, hi! We Data Products team are reviewing this task now to see what we can do. We realized that there might be some overlap between this task's...
[19:11:00] <wikibugs>	 Quarry: Allow search within SQL - https://phabricator.wikimedia.org/T352212 (Aklapper)
[20:21:21] <wikibugs>	 Cloud-VPS: cannot create/update a variety of DNS records - https://phabricator.wikimedia.org/T352713 (jsn.sherman)
[20:21:47] <wikibugs>	 Cloud-VPS: cannot create/update a variety of DNS records - https://phabricator.wikimedia.org/T352713 (jsn.sherman)
[20:24:17] <wikibugs>	 (PS1) Andrew Bogott: WMF hacks: replace key and metadata panels for VM creation [openstack/horizon/horizon] (2023.1) - https://gerrit.wikimedia.org/r/980035 (https://phabricator.wikimedia.org/T326818)
[20:27:47] <wikibugs>	 Cloud-VPS: cannot create/update a variety of DNS records - https://phabricator.wikimedia.org/T352713 (jsn.sherman)
[20:31:25] <wikibugs>	 Cloud-VPS: cannot create/update a variety of DNS records - https://phabricator.wikimedia.org/T352713 (jsn.sherman)
[20:32:58] <wikibugs>	 Cloud-VPS, cloud-services-team (FY2023/2024-Q1-Q2), Goal, Patch-For-Review: Support 'unmanaged' projects in cloud-vps - https://phabricator.wikimedia.org/T326818 (Andrew) > Implementing puppetfree VMs can be done by having the cloud-init script skip all the puppet bits based on a metadata flag....
[21:18:17] <wikibugs>	 Cloud-VPS: cannot create/update a variety of DNS records - https://phabricator.wikimedia.org/T352713 (jsn.sherman)
[21:36:03] <wmcs-alerts>	 (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[22:14:49] <jinxer-wm>	 (SystemdUnitDown) firing: The systemd unit purge_vm_backup.service on node cloudbackup1003 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[22:32:03] <wmcs-alerts>	 (InstanceDown) firing: Project tools instance tools-prometheus-7 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[23:06:15] <wikibugs>	 Toolforge (Software install/update): Please install hugin-tools and pillow again - https://phabricator.wikimedia.org/T347446 (tstarling) I filed this task because it was suggested at [[https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation#Requires_a_system_library_or_tool_to_be_present|w...
[23:24:05] <wikibugs>	 Cloud-VPS (Quota-requests): Please delete meet and chat VPS projects - https://phabricator.wikimedia.org/T352727 (Ladsgroup)
[23:28:50] <wikibugs>	 Toolforge: Python virtual environment does not seem to get properly activated by a job using the new Jobs framework - https://phabricator.wikimedia.org/T309309 (Huji) Coming back to this just to memorialize how things finally got to work.  ##### Step 0: set up the desired directory structure  I ended up sett...
[23:28:59] <wikibugs>	 Toolforge: Python virtual environment does not seem to get properly activated by a job using the new Jobs framework - https://phabricator.wikimedia.org/T309309 (Huji) Open→Resolved a:Huji
[23:29:09] <wikibugs>	 Grid-Engine-to-K8s-Migration, User-Huji: Migrate huji from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319800 (Huji)
[23:29:12] <wikibugs>	 Toolforge: Python virtual environment does not seem to get properly activated by a job using the new Jobs framework - https://phabricator.wikimedia.org/T309309 (Huji)
[23:32:35] <wikibugs>	 cloud-services-team: Shinken is unavailable (404 - no proxy is configured) - https://phabricator.wikimedia.org/T352594 (valerio.bozzolan) Interesting  OK I think we can just drop the link from the documentation. Done! :3
[23:34:46] <wikibugs>	 Cloud-VPS (Quota-requests): Please delete meet and chat VPS projects - https://phabricator.wikimedia.org/T352727 (Aklapper) If that is done should also update https://meta.wikimedia.org/wiki/Discourse#Alternative_chat and https://meta.wikimedia.org/wiki/Wikimedia_Chat and https://meta.wikimedia.org/wiki/Wiki...
[23:35:03] <wikibugs>	 cloud-services-team: Shinken is unavailable (404 - no proxy is configured) - https://phabricator.wikimedia.org/T352594 (valerio.bozzolan) Open→Resolved
[23:36:56] <wikibugs>	 Grid-Engine-to-K8s-Migration: Migrate isprangefinder from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319820 (SQL) Sorry - missed your email, and with the holidays this has slipped my mind. The offending crontab entry has been commented out.
[23:39:12] <wikibugs>	 Grid-Engine-to-K8s-Migration: Migrate ipcheck from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319814 (SQL) In progress→Resolved The offending crontab entries have been disabled.
[23:41:41] <wikibugs>	 Grid-Engine-to-K8s-Migration: Migrate isprangefinder from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319820 (SQL) In progress→Resolved