[00:04:00] <jinxer-wm>	 FIRING: [3x] OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[00:04:22] <jinxer-wm>	 RESOLVED: [3x] HAProxyBackendUnavailable: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[00:05:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 8 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:06:28] <wmcs-alerts>	 FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:08:09] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[00:10:21] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[00:14:12] <wikibugs>	 Cloud-Services: Prepare "What's new with Wikimedia Cloud Services" presentation for WikiConNA 2024 - https://phabricator.wikimedia.org/T373159 (bd808) NEW The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and...
[00:17:00] <jinxer-wm>	 RESOLVED: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[00:18:15] <wikibugs>	 Cloud-Services: Prepare "What's new with Wikimedia Cloud Services" presentation for WikiConNA 2024 - https://phabricator.wikimedia.org/T373159#10086719 (bd808) p:Triage→Medium a:bd808
[00:18:28] <wmcs-alerts>	 RESOLVED: InstanceDown: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:20:19] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[00:21:50] <wmcs-alerts>	 FIRING: TfInfraTestApplyFailed: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[00:23:49] <wmcs-alerts>	 FIRING: TfInfraTestDestroyFailed: Terraform failed to destroy the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[00:26:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The service unit nova-fullstack.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[00:30:27] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[00:32:43] <wikibugs>	 Cloud-Services: Prepare "What's new with Wikimedia Cloud Services" presentation for WikiConNA 2024 - https://phabricator.wikimedia.org/T373159#10086732 (bd808) @Andrew, @dcaro, @Slst2020, @komla, @taavi: ideas for topics to cover are very welcome. At a high level I have been thinking about telling the story...
[00:39:55] <jinxer-wm>	 FIRING: MaxConntrack: Max conntrack at 80.1% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:44:55] <jinxer-wm>	 RESOLVED: MaxConntrack: Max conntrack at 80.1% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:48:26] <jinxer-wm>	 RESOLVED: SystemdUnitDown: The service unit nova-fullstack.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[01:08:28] <wmcs-alerts>	 FIRING: InstanceDown: Project tools instance tools-prometheus-7 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[01:13:28] <wmcs-alerts>	 RESOLVED: InstanceDown: Project tools instance tools-prometheus-7 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[02:20:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[02:30:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:08:28] <wmcs-alerts>	 FIRING: InstanceDown: Project tools instance tools-prometheus-7 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[03:13:28] <wmcs-alerts>	 FIRING: [2x] InstanceDown: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[03:50:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:00:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:43:28] <wmcs-alerts>	 FIRING: [2x] InstanceDown: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[05:20:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:23:28] <wmcs-alerts>	 RESOLVED: InstanceDown: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[05:30:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:29:56] <wikibugs>	 (CR) Jean-Frédéric: [C:+1] "I think ./toolbox should also be considered MIT? It was the continuation of the ./api. But we can also deal with it later." [labs/tools/heritage] - https://gerrit.wikimedia.org/r/1064471 (https://phabricator.wikimedia.org/T174633) (owner: Lokal Profil)
[07:30:39] <wikibugs>	 (CR) Jean-Frédéric: [C:+2] Update documentation on localhost address [labs/tools/heritage] - https://gerrit.wikimedia.org/r/1064474 (owner: Lokal Profil)
[07:32:27] <wikibugs>	 (CR) Jean-Frédéric: [C:+2] ">  With patch 669b549 the localhost addresses changed from localhost:8000:80" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/1064474 (owner: Lokal Profil)
[07:32:56] <wikibugs>	 (Merged) jenkins-bot: Update documentation on localhost address [labs/tools/heritage] - https://gerrit.wikimedia.org/r/1064474 (owner: Lokal Profil)
[08:08:41] <jinxer-wm>	 FIRING: PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[08:09:03] <wikibugs>	 (PS1) Jean-Frédéric: Use toolforge-jobs to install requirements in deployment process [labs/tools/heritage] - https://gerrit.wikimedia.org/r/1065124
[08:09:03] <wikibugs>	 (PS1) Jean-Frédéric: Remove `composer update` step from build-php script [labs/tools/heritage] - https://gerrit.wikimedia.org/r/1065125
[08:09:28] <wikibugs>	 (PS2) Jean-Frédéric: Use toolforge-jobs to install requirements during deployment [labs/tools/heritage] - https://gerrit.wikimedia.org/r/1065124
[08:09:28] <wikibugs>	 (PS2) Jean-Frédéric: Remove `composer update` step from build-php script [labs/tools/heritage] - https://gerrit.wikimedia.org/r/1065125
[08:33:41] <jinxer-wm>	 RESOLVED: PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[09:05:57] <wikibugs>	 Toolforge: ChieBot: Intermittent connection reset by peer errors - https://phabricator.wikimedia.org/T356163#10087095 (Leloiandudu) I haven't seen these for a few months but started getting them every couple of hours today. The error message is slightly different now: `7:16:58 AM Got 'Resource temporarily un...
[09:08:28] <wmcs-alerts>	 FIRING: InstanceDown: Project tools instance tools-prometheus-7 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:13:28] <wmcs-alerts>	 FIRING: [2x] InstanceDown: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:17:14] <wmcs-alerts>	 FIRING: HarborProbeUnknown: Harbor might be down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborProbeUnknown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborProbeUnknown
[09:17:14] <wmcs-alerts>	 FIRING: HarborComponentDown: No data about Harbor components found. #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborComponentDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborComponentDown
[09:18:28] <wmcs-alerts>	 FIRING: [2x] InstanceDown: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:22:14] <wmcs-alerts>	 RESOLVED: HarborProbeUnknown: Harbor might be down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborProbeUnknown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborProbeUnknown
[09:22:14] <wmcs-alerts>	 RESOLVED: HarborComponentDown: No data about Harbor components found. #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborComponentDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborComponentDown
[09:28:28] <wmcs-alerts>	 FIRING: [2x] InstanceDown: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:40:00] <wmcs-alerts>	 FIRING: HarborProbeUnknown: Harbor might be down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborProbeUnknown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborProbeUnknown
[09:43:28] <wmcs-alerts>	 FIRING: [2x] InstanceDown: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:45:00] <wmcs-alerts>	 RESOLVED: HarborProbeUnknown: Harbor might be down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborProbeUnknown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborProbeUnknown
[10:20:09] <wikibugs>	 Toolforge-standards-committee: Adoption request for Yapperbot - https://phabricator.wikimedia.org/T361426#10087333 (DavidTornheim) >>! In T361426#10086510, @bd808 wrote: >>>! In T361426#10086509, @bd808 wrote: >> Someone wrote to the page that causes the bot to halt. The edit looks like a vandal: https://en....
[10:38:28] <wmcs-alerts>	 RESOLVED: InstanceDown: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:11:28] <wmcs-alerts>	 RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[11:18:28] <wmcs-alerts>	 FIRING: InstanceDown: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:18:59] <wikibugs>	 Quarry: Update cluster to 1.26 - https://phabricator.wikimedia.org/T373093#10087457 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/quarry/pull/67
[11:19:03] <notefromgithub>	 vivian-rook opened https://github.com/toolforge/quarry/pull/67
[11:28:28] <wmcs-alerts>	 RESOLVED: InstanceDown: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[12:36:13] <wikibugs>	 (CR) Jforrester: [C:+2] build: Updating composer dependencies [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1061184 (owner: Libraryupgrader)
[13:48:49] <wmcs-alerts>	 RESOLVED: TfInfraTestDestroyFailed: Terraform failed to destroy the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[14:08:59] <wikibugs>	 Toolforge: Toolforge buildservice logs error - https://phabricator.wikimedia.org/T373201 (Bawolff) NEW
[14:42:13] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1062.eqiad.wmnet' (T369044)
[14:42:19] <stashbot>	 T369044: Upgrade cloud-vps openstack to version 'Caracal' - https://phabricator.wikimedia.org/T369044
[14:44:53] <wikibugs>	 Cloud-VPS (Debian Buster Deprecation), Wikispore: Rebuild Wikispore Vagrant boxes on Bullseye or Bookworm - https://phabricator.wikimedia.org/T365934#10088082 (Andrew) *bump*
[14:47:45] <wikibugs>	 Cloud-VPS (Debian Buster Deprecation): Cloud VPS "dumps" project Buster deprecation - https://phabricator.wikimedia.org/T367528#10088093 (Andrew)
[14:48:27] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1062 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:48:42] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1062.eqiad.wmnet' (T369044)
[14:48:52] <stashbot>	 T369044: Upgrade cloud-vps openstack to version 'Caracal' - https://phabricator.wikimedia.org/T369044
[14:49:27] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1062 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:50:06] <wikibugs>	 Cloud-VPS (Debian Buster Deprecation): Cloud VPS "dumps" project Buster deprecation - https://phabricator.wikimedia.org/T367528#10088096 (Andrew) Open→Resolved
[14:53:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The service unit neutron-openvswitch-agent.service is in failed status on host cloudvirt1062. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1062 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:08:28] <wmcs-alerts>	 FIRING: InstanceDown: Project tools instance tools-prometheus-7 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:13:28] <wmcs-alerts>	 RESOLVED: InstanceDown: Project tools instance tools-prometheus-7 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:20:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 9 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[15:20:56] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Puppet-Infrastructure, Patch-For-Review: Ownership confusion on cloud-local puppet servers - https://phabricator.wikimedia.org/T364492#10088196 (Andrew) Open→Resolved I think this is a little better after the last round of fixes
[15:55:57] <wikibugs>	 VPS-Projects: magnum clusters not deploying in eqiad1 - https://phabricator.wikimedia.org/T373207 (rook) NEW
[16:00:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 15 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:37:11] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance: [tf-infra-tests] Failing to destroy - volumes stuck - https://phabricator.wikimedia.org/T352895#10088407 (rook) This appears to have been repaired somewhere along the line. Tofu seems to be...
[16:37:12] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance: [tf-infra-tests] Failing to destroy - volumes stuck - https://phabricator.wikimedia.org/T352895#10088408 (rook) In progress→Resolved
[16:48:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The systemd unit neutron-openvswitch-agent.service on node cloudvirt1062 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1062 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[16:49:03] <wikibugs>	 cloud-services-team: SystemdUnitDown  Unit neutron-openvswitch-agent.service on node cloudvirt1062 has been down for long. - https://phabricator.wikimedia.org/T373214 (phaultfinder) NEW
[17:09:29] <wmcs-alerts>	 FIRING: InstanceDown: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:14:29] <wmcs-alerts>	 FIRING: [2x] InstanceDown: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:19:29] <wmcs-alerts>	 RESOLVED: [2x] InstanceDown: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:26:58] <wmcs-alerts>	 FIRING: [2x] InstanceDown: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[18:27:41] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Toolforge, Goal: [harbor] Create backups and/or replication - https://phabricator.wikimedia.org/T336668#10088664 (Raymond_Ndibe)
[18:27:41] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Toolforge, Goal: [harbor] Deploy with Helm - https://phabricator.wikimedia.org/T356301#10088665 (Raymond_Ndibe)
[18:28:40] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Toolforge, Goal: [harbor] Create backups and/or replication - https://phabricator.wikimedia.org/T336668#10088667 (Raymond_Ndibe)
[18:28:41] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Toolforge, Goal: [harbor] Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687#10088668 (Raymond_Ndibe)
[18:30:17] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Toolforge, Goal: [harbor] Create backups and/or replication - https://phabricator.wikimedia.org/T336668#10088675 (Raymond_Ndibe)
[18:30:18] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Toolforge, Goal: [harbor] Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687#10088676 (Raymond_Ndibe)
[19:01:58] <wmcs-alerts>	 RESOLVED: InstanceDown: Project tools instance tools-prometheus-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[19:04:43] <wikibugs>	 cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, Patch-For-Review: Remove or replace deployment-restbase04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370460#10088786 (Eevans) Ok, it seems to be working n...
[19:19:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:23:11] <wikibugs>	 cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, Patch-For-Review: Remove or replace deployment-restbase04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370460#10088831 (Eevans) I think we can mark this clo...
[19:34:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:38:18] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Toolforge, Goal: [harbor] Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687#10088905 (Raymond_Ndibe) == Possible Steps ==  **Toolsbeta:** [x] create `harborstorage` object storage on horizon [] figure out authentication for...
[20:19:17] <wikibugs>	 cloud-services-team, Cloud-VPS, Beta-Cluster-Infrastructure: Provisioning of Kubernetes cluster via Magnum stopped working around time time of OpenStack upgrade - https://phabricator.wikimedia.org/T373227 (bd808) NEW
[20:19:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:20:22] <jinxer-wm>	 FIRING: [2x] HAProxyBackendUnavailable: HAProxy service heat-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[20:20:45] <wikibugs>	 cloud-services-team, Cloud-VPS, Beta-Cluster-Infrastructure: Provisioning of Kubernetes cluster via Magnum stopped working around time time of OpenStack upgrade - https://phabricator.wikimedia.org/T373227#10089005 (bd808) @Andrew, any idea about where I should start looking for hints about what might...
[20:23:56] <jinxer-wm>	 FIRING: [2x] SystemdUnitDown: The service unit heat-api.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[20:34:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:49:11] <jinxer-wm>	 FIRING: SystemdUnitDown: The systemd unit neutron-openvswitch-agent.service on node cloudvirt1062 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1062 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[21:09:28] <wmcs-alerts>	 FIRING: InstanceDown: Project tools instance tools-prometheus-7 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[21:23:56] <jinxer-wm>	 FIRING: [2x] SystemdUnitDown: The service unit heat-api.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[21:25:22] <jinxer-wm>	 FIRING: [2x] HAProxyBackendUnavailable: HAProxy service heat-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[21:28:56] <jinxer-wm>	 RESOLVED: [2x] SystemdUnitDown: The service unit heat-api.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[21:30:22] <jinxer-wm>	 RESOLVED: [2x] HAProxyBackendUnavailable: HAProxy service heat-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[21:30:41] <wmcs-alerts>	 FIRING: ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down:  - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[21:31:11] <wmcs-alerts>	 FIRING: HarborProbeUnknown: Harbor might be down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborProbeUnknown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborProbeUnknown
[21:31:11] <wmcs-alerts>	 FIRING: HarborComponentDown: No data about Harbor components found. #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborComponentDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborComponentDown
[21:34:04] <wikibugs>	 cloud-services-team, Cloud-VPS, Beta-Cluster-Infrastructure: Provisioning of Kubernetes cluster via Magnum stopped working around time time of OpenStack upgrade - https://phabricator.wikimedia.org/T373227#10089087 (bd808) https://docs.openstack.org/magnum/2024.1/admin/troubleshooting-guide.html#heat-...
[21:34:20] <wikibugs>	 cloud-services-team, Cloud-VPS, Beta-Cluster-Infrastructure: Provisioning of Kubernetes cluster via Magnum stopped working around time of OpenStack upgrade - https://phabricator.wikimedia.org/T373227#10089088 (bd808)
[21:34:28] <wmcs-alerts>	 RESOLVED: InstanceDown: Project tools instance tools-prometheus-7 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[21:35:41] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down:  - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[21:36:11] <wmcs-alerts>	 RESOLVED: HarborProbeUnknown: Harbor might be down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborProbeUnknown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborProbeUnknown
[21:36:11] <wmcs-alerts>	 RESOLVED: HarborComponentDown: No data about Harbor components found. #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborComponentDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborComponentDown
[21:49:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[21:55:10] <wikibugs>	 cloud-services-team, Cloud-VPS, Beta-Cluster-Infrastructure: Provisioning of Kubernetes cluster via Magnum stopped working around time of OpenStack upgrade - https://phabricator.wikimedia.org/T373227#10089118 (bd808)
[22:04:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[23:34:28] <wmcs-alerts>	 RESOLVED: InstanceDown: Project tools instance tools-prometheus-7 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[23:34:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[23:40:28] <wmcs-alerts>	 FIRING: InstanceDown: Project tools instance tools-prometheus-7 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown