[00:06:01] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[00:11:49] (TfInfraTestApplyFailed) resolved: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[00:30:44] Cloud-VPS: Update Vagrant puppet role to work on Bookworm. - https://phabricator.wikimedia.org/T356551 (Jdlrobson)
[03:26:54] Openstack-Magnum, cloud-services-team: Hide fedora images from human Horizon users - https://phabricator.wikimedia.org/T356547 (Andrew) A lot of what you're seeing is because of having the admin flag, I think. When I log in as 'Andrew bogott mortal' and look at a project I'm a member of, this is what the...
[03:40:22] VPS-project-Codesearch: Add a copy to clipboard button after file names on the codesearch result page - https://phabricator.wikimedia.org/T356557 (Tgr)
[04:01:57] (SystemdUnitDown) resolved: The service unit purge_vm_rbd_images.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:06:16] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:06:22] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[04:11:22] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[04:11:26] (SystemdUnitDown) resolved: The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[08:06:16] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[08:18:46] Toolforge (Toolforge iteration 04), User-aborrero: [toolforge] several tools get periods of connection refused (104) when connecting to wikis - https://phabricator.wikimedia.org/T356164 (Leloiandudu) last time this happened for me was: 03 Feb 2024 03:15 UTC
[09:41:01] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[10:24:14] Cloud-VPS: libup-db02 is in error state - https://phabricator.wikimedia.org/T356435 (taavi) So the instance seems to be in Active/Healthy again. The main thing consuming disk is the `logs` table. ` root@libup-db02:/var/lib/mysql/data/libup# du -sh * | sort -hr 9.1G logs.ibd ` I wonder if we really need to st...
[10:24:31] Cloud-VPS (Quota-requests): Increase trove quota for library-upgrader - https://phabricator.wikimedia.org/T356560 (taavi)
[10:31:48] Cloud-VPS (Quota-requests): Increase trove quota for library-upgrader - https://phabricator.wikimedia.org/T356560 (taavi)
[10:33:48] Cloud-VPS (Quota-requests): Increase trove quota for library-upgrader - https://phabricator.wikimedia.org/T356560 (dcaro) +1
[11:11:37] Cloud-VPS (Quota-requests): Increase trove quota for library-upgrader - https://phabricator.wikimedia.org/T356560 (taavi) Open→Resolved a:taavi `lang=shell-session taavi@cloudcontrol1005 ~ $ os database quota update library-upgrader volumes 30 +---------+-------+ | Field | Value | +---------+--...
[11:12:12] Cloud-VPS: libup-db02 is in error state - https://phabricator.wikimedia.org/T356435 (taavi) Open→Resolved a:taavi
[11:12:28] Cloud-VPS: libup-db02 is in error state - https://phabricator.wikimedia.org/T356435 (taavi) ` Filesystem Size Used Avail Use% Mounted on /dev/sdb 30G 9.3G 19G 33% /var/lib/mysql `
[11:16:27] Grid-Engine-to-K8s-Migration: Migrate spellcheck from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320053 (-jem-) Open→Resolved Sorry for the silence... this migration is completed; the only needed change involved the daily crontab execution which updates the wron...
[11:34:27] Toolforge: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T356562 (LibUp-bot)
[11:37:13] Toolforge: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T356562 (taavi) Open→Resolved a:taavi
[12:28:23] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[12:33:22] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[12:55:56] (ProbeDown) firing: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:00:56] (ProbeDown) resolved: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:09:51] (PS1) AgnesAbah: modify README.md [labs/tools/Isa] - https://gerrit.wikimedia.org/r/995348
[13:16:02] (CR) AgnesAbah: "I just edited the README.ml file as a test" [labs/tools/Isa] - https://gerrit.wikimedia.org/r/995348 (owner: AgnesAbah)
[13:41:01] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[13:53:56] Grid-Engine-to-K8s-Migration: Migrate commons-android-app from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319638 (whym) I'm pretty sure the cron task is migrated in the first part of the last 7 days. The remaining number will go away soon, if we just wait more. I'm st...
[14:36:06] (PS1) AgnesAbah: modified README.md [labs/tools/Isa] - https://gerrit.wikimedia.org/r/995349
[14:38:00] (CR) AgnesAbah: "I modified README.md" [labs/tools/Isa] - https://gerrit.wikimedia.org/r/995349 (owner: AgnesAbah)
[15:54:48] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[16:05:52] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q3:rack/setup/install cloudcephosd10(3[5-9]|40) - https://phabricator.wikimedia.org/T324998 (Volans) There are pending DNS changes in Netbox not committed to the auto-generated DNS repository related to those hosts since yesterday: ` Fri 22...
[17:41:16] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[19:12:22] (HAProxyBackendUnavailable) firing: HAProxy service nova-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[19:17:22] (HAProxyBackendUnavailable) resolved: HAProxy service nova-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[19:23:20] (PS1) AgnesAbah: modified isa/templates/main/layout.html [labs/tools/Isa] - https://gerrit.wikimedia.org/r/995350
[19:43:00] (CR) AgnesAbah: "I added report bug tap" [labs/tools/Isa] - https://gerrit.wikimedia.org/r/995350 (owner: AgnesAbah)
[19:55:03] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[21:41:16] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[22:24:48] (PuppetConstantChange) resolved: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[22:56:12] (PS1) Daimona Eaytoy: Bump mediawiki/mediawiki-phan-config to 0.14.0 [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/995725
[23:00:29] Toolforge Jobs framework: Facilitate one-off execution of scheduled jobs - https://phabricator.wikimedia.org/T356580 (Huji)
[23:10:23] Toolforge Jobs framework: Add a new output format for toolforge jobs list command which returns the input command for scheduled jobs - https://phabricator.wikimedia.org/T356581 (Huji)
[23:18:09] Toolforge: Do not deprecate python versions on the toolforge jobs framework that are the default version on toolforge - https://phabricator.wikimedia.org/T356582 (Huji)
[23:45:29] (PS1) Jforrester: Bump mediawiki/mediawiki-phan-config to 0.14.0 [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/995736
[23:45:34] (CR) Jforrester: [C: +2] Bump mediawiki/mediawiki-phan-config to 0.14.0 [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/995736 (owner: Jforrester)
[23:46:35] (Merged) jenkins-bot: Bump mediawiki/mediawiki-phan-config to 0.14.0 [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/995736 (owner: Jforrester)