[00:00:35] ACKNOWLEDGEMENT - Disk space on cloudbackup2001 is CRITICAL: DISK CRITICAL - free space: /srv/cinder-backups 2932140 MB (3% inode=98%): Andrew Bogott This should clear the next time the backup jobs run. I removed the biggest volume from the jobs. https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=cloudbackup2001&var-datasource=codfw+prometheus/ops [00:05:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [00:05:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [00:10:56] 10Grid-Engine-to-K8s-Migration, 10Pywikibot: Migrate pywikibot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319981 (10Ladsgroup) Yeah, I looked at it (and several others did too), I don't know any docker image that would have git installed. Is jlocal getting shut down... [00:23:37] 10Toolforge Build Service, 10Upstream: Python buildpack does not detect requirements from pyproject.toml - https://phabricator.wikimedia.org/T353762 (10bd808) I tagged this as #upstream because I think it fundamentally is an upstream issue. We could try to find a different Python buildpack (or make our own) if... [00:31:57] 10Grid-Engine-to-K8s-Migration, 10Pywikibot: Migrate pywikibot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319981 (10JJMC89) >>! In T319981#9417352, @Ladsgroup wrote: > I don't know any docker image that would have git installed. Don't all of the Toolforge images have... 
[00:46:19] 10Toolforge Build Service, 10Upstream: Python buildpack does not detect requirements from pyproject.toml - https://phabricator.wikimedia.org/T353762 (10bd808) For my use case in https://gitlab.wikimedia.org/toolforge-repos/gitlab-account-approval, it turns out that adding this requirements.txt was all that was... [00:51:04] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [01:12:38] 10Grid-Engine-to-K8s-Migration, 10Pywikibot: Migrate pywikibot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319981 (10Ladsgroup) Maybe it has changed recently but I got the exact same thing when I tried it: T319981#8372518 [01:30:19] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [01:35:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [02:21:27] 10Grid-Engine-to-K8s-Migration, 10Pywikibot: Migrate pywikibot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319981 (10JJMC89) After removing the python2 bits and adding a venv, I get to the point of zipping before the script fails with `zip: command not found`. No issu... 
[03:05:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [03:05:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [03:51:04] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [04:51:19] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [04:56:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [05:10:12] 10Toolforge (Software install/update): Add zip to Kuberenetes base images - https://phabricator.wikimedia.org/T353769 (10bd808) [05:13:32] 10Grid-Engine-to-K8s-Migration, 10Pywikibot: Migrate pywikibot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319981 (10bd808) >>! In T319981#9417468, @JJMC89 wrote: > After removing the python2 bits and adding a venv, I get to the point of zipping before the script fail... 
[05:41:12] 10Grid-Engine-to-K8s-Migration: Migrate every-other-wiki-has from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319733 (10komla) 05Open→03Invalid [05:41:29] 10Grid-Engine-to-K8s-Migration: Migrate every-other-wiki-has from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319733 (10komla) removed [06:05:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [06:05:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [06:42:12] 10Cloud-VPS (Quota-requests), 10Community-Tech, 10WS Export: Increase wikisource project quota in order to upgrade WS Export - https://phabricator.wikimedia.org/T353770 (10Samwilson) [06:43:15] 10Cloud-VPS (Quota-requests), 10Community-Tech, 10WS Export: Increase wikisource project quota in order to upgrade WS Export - https://phabricator.wikimedia.org/T353770 (10Samwilson) [06:51:04] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [09:05:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [09:05:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [09:51:04] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [10:02:32] 10Grid-Engine-to-K8s-Migration: Migrate anchor-corrector from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319555 (10Kanashimi) 05Open→03Resolved [11:22:10] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeexec-10-14, tools-sgeexec-10-15, tools-sgeweblight-10-18, tools-sgeweblight-10-24 [11:30:03] (InstanceDown) firing: Project tools instance tools-sgeexec-10-15 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [11:35:03] (InstanceDown) resolved: Project tools instance tools-sgeexec-10-15 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [11:35:19] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [11:37:36] 10Cloud-VPS, 10cloud-services-team: Neutron API backends are flapping - https://phabricator.wikimedia.org/T353796 (10taavi) [11:37:43] 10Cloud-VPS, 10cloud-services-team: Hide + disable 'key pair' tab when creating puppetized VMs - https://phabricator.wikimedia.org/T353331 (10taavi) [11:37:49] 10Cloud-VPS, 10cloud-services-team: Hide VM puppet tab for unpuppetized VMs - https://phabricator.wikimedia.org/T353332 (10taavi) [11:40:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [11:45:28] 10Toolforge (Software install/update): Add zip to Kubernetes base images - 
https://phabricator.wikimedia.org/T353769 (10Aklapper) [12:05:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [12:05:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [12:23:01] 10Toolforge (Software install/update): Can't pip install mysqlclient on Toolforge - https://phabricator.wikimedia.org/T349341 (10taavi) Sigh. `mysqlclient` does ship Windows wheels but not `manylinux` ones. Installing `pkgconf` is probably fine, but using [[ https://pypi.org/project/pymysql/ | pymysql ]] or an a... [12:39:02] 10Tools, 10WMDE-TechWish-Maintenance, 10Release-Engineering-Team (Quid Pro Crow 🦃): Delete technischewuensche tool code repository in Diffusion - https://phabricator.wikimedia.org/T349847 (10WMDE-Fisch) [12:39:36] 10Tools, 10WMDE-TechWish-Maintenance, 10WMDE-TechWish-Sprint-2023-12-06: Check technischewuensche tool code and publish in a public repo - https://phabricator.wikimedia.org/T350352 (10WMDE-Fisch) 05Open→03Resolved a:05WMDE-Fisch→03None [12:51:04] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [12:59:57] (03CR) 10VolkerE: [C: 03+1] stylelint: Run auto-fixer, it's quite good nowadays [labs/libraryupgrader] - 10https://gerrit.wikimedia.org/r/897375 (owner: 10Jforrester) [13:01:06] (03PS1) 10VolkerE: releases: Bump Codex to 1.2.0 [labs/libraryupgrader/config] - 10https://gerrit.wikimedia.org/r/984526 [13:29:28] 10cloud-services-team, 10Infrastructure-Foundations, 10SRE, 10netbox, 10Patch-For-Review: Netbox: Add support for our 
complex host network setups in provision script - https://phabricator.wikimedia.org/T346428 (10ayounsi) Big and needed change, thanks ! Looking at the doc at https://wikitech.wikimedia.o... [14:30:13] 10cloud-services-team, 10Infrastructure-Foundations, 10SRE, 10netbox, 10Patch-For-Review: Netbox: Add support for our complex host network setups in provision script - https://phabricator.wikimedia.org/T346428 (10cmooney) @ayounsi thanks for the feedback! >>! In T346428#9418490, @ayounsi wrote: > Lookin... [15:05:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [15:05:51] 10cloud-services-team, 10Infrastructure-Foundations, 10SRE, 10netbox, 10Patch-For-Review: Netbox: Add support for our complex host network setups in provision script - https://phabricator.wikimedia.org/T346428 (10ayounsi) To follow up only on the Cassandra usecase, my proposal here is to actually remove... 
[15:10:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [15:15:37] (CephSlowOps) firing: Ceph cluster in eqiad has 3 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps [15:15:53] 10cloud-services-team: CephSlowOps Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T352570 (10phaultfinder) [15:20:37] (CephSlowOps) resolved: Ceph cluster in eqiad has 4 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps [15:31:06] (03CR) 10Catrope: [C: 03+2] releases: Bump Codex to 1.2.0 [labs/libraryupgrader/config] - 10https://gerrit.wikimedia.org/r/984526 (owner: 10VolkerE) [15:31:46] (03Merged) 10jenkins-bot: releases: Bump Codex to 1.2.0 [labs/libraryupgrader/config] - 10https://gerrit.wikimedia.org/r/984526 (owner: 10VolkerE) [15:51:04] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [16:01:06] 10cloud-services-team, 10Infrastructure-Foundations, 10SRE, 10netbox, 10Patch-For-Review: Netbox: Add support for our complex host network setups in provision script - https://phabricator.wikimedia.org/T346428 (10cmooney) >>! In T346428#9418800, @ayounsi wrote: > To follow up only on the Cassandra usecas... 
[16:07:42] 10Cloud-VPS (Quota-requests), 10Community-Tech, 10WS Export: Increase wikisource project quota in order to upgrade WS Export - https://phabricator.wikimedia.org/T353770 (10bd808) +1 [16:08:10] 10Cloud-VPS (Quota-requests), 10Community-Tech, 10WS Export: Increase wikisource project quota in order to upgrade WS Export - https://phabricator.wikimedia.org/T353770 (10nskaggs) +1 [16:10:03] 10Toolforge (Quota-requests): Request increased quota for cewbot, toc, signature-checker, mgp-cewbot Toolforge tool - https://phabricator.wikimedia.org/T353104 (10taavi) 05Open→03Resolved This is complete, please re-open or file a new task if you face any issues. [16:48:10] 10Cloud-VPS, 10cloud-services-team (FY2023/2024-Q1-Q2), 10DC-Ops, 10SRE, 10ops-eqiad: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643 (10Jclark-ctr) i have been back and forth with dell no answer yet of what is causing this I still believe it is stil... [16:52:37] 10Cloud-VPS (Quota-requests), 10Tool-spacemedia, 10cloud-services-team: disk quota increase (+200 GB) for spacemedia Cloud VPS project - https://phabricator.wikimedia.org/T353670 (10fnegri) a:03fnegri [16:52:44] 10Cloud-VPS (Project-requests), 10cloud-services-team, 10Adiutor: Request creation of Adiutor VPS project - https://phabricator.wikimedia.org/T353421 (10fnegri) a:03fnegri [16:55:29] 10Cloud-VPS (Quota-requests), 10Community-Tech, 10WS Export: Increase wikisource project quota in order to upgrade WS Export - https://phabricator.wikimedia.org/T353770 (10fnegri) a:03fnegri [17:04:16] 10Cloud-VPS (Quota-requests), 10Tool-spacemedia, 10cloud-services-team: disk quota increase (+200 GB) for spacemedia Cloud VPS project - https://phabricator.wikimedia.org/T353670 (10fnegri) 05Open→03Resolved Trove quotas are [managed separately](https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/... 
[17:15:23] !log fnegri@cloudcumin1001 adiutor START - Cookbook wmcs.vps.create_project for project adiutor in eqiad1 (T353421) [17:15:24] fnegri@cloudcumin1001: Unknown project "adiutor" [17:15:24] T353421: Request creation of Adiutor VPS project - https://phabricator.wikimedia.org/T353421 [17:17:25] !log fnegri@cloudcumin1001 adiutor END (FAIL) - Cookbook wmcs.vps.create_project (exit_code=99) for project adiutor in eqiad1 (T353421) [17:17:25] fnegri@cloudcumin1001: Unknown project "adiutor" [17:27:52] 10Cloud-VPS (Project-requests), 10cloud-services-team, 10Adiutor: Request creation of Adiutor VPS project - https://phabricator.wikimedia.org/T353421 (10fnegri) 05Open→03In progress [17:38:44] 10Cloud-VPS (Project-requests), 10cloud-services-team, 10Adiutor: Request creation of Adiutor VPS project - https://phabricator.wikimedia.org/T353421 (10fnegri) I've started creating the project but the cookbook failed, so I'm not sure if it will work correctly. @Andrew is looking into it. Cookbook command:... 
[17:40:14] 10Grid-Engine-to-K8s-Migration: Migrate exportpdf from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319734 (10komla) 05Open→03Invalid deleted tool [17:57:49] !log fnegri@cloudcumin1001 adiutor START - Cookbook wmcs.vps.add_user_to_project for user 'vikipolimer' in role 'member' (T353421) [17:57:53] T353421: Request creation of Adiutor VPS project - https://phabricator.wikimedia.org/T353421 [17:58:35] !log fnegri@cloudcumin1001 adiutor END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'vikipolimer' in role 'member' (T353421) [17:58:47] !log fnegri@cloudcumin1001 adiutor START - Cookbook wmcs.vps.add_user_to_project for user 'tgr' in role 'member' (T353421) [17:58:54] !log fnegri@cloudcumin1001 adiutor END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'tgr' in role 'member' (T353421) [18:01:32] 10Cloud-VPS (Project-requests), 10cloud-services-team, 10Adiutor: Request creation of Adiutor VPS project - https://phabricator.wikimedia.org/T353421 (10fnegri) @Vikipolimer @Tgr the project seems to be fine and I've added both of you as members (admins). Let us know if anything does not work as expected. P... 
[18:01:38] 10Cloud-VPS (Project-requests), 10cloud-services-team, 10Adiutor: Request creation of Adiutor VPS project - https://phabricator.wikimedia.org/T353421 (10fnegri) 05In progress→03Resolved [18:03:42] !log fnegri@cloudcumin1001 dhinustestproject START - Cookbook wmcs.vps.create_project for project dhinustestproject in eqiad1 [18:03:43] fnegri@cloudcumin1001: Unknown project "dhinustestproject" [18:05:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [18:05:45] !log fnegri@cloudcumin1001 dhinustestproject END (FAIL) - Cookbook wmcs.vps.create_project (exit_code=99) for project dhinustestproject in eqiad1 [18:05:45] fnegri@cloudcumin1001: Unknown project "dhinustestproject" [18:10:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [18:13:13] 10Cloud-VPS: [openstack] Creating a new project returns Gateway Timeout (HTTP 504) - https://phabricator.wikimedia.org/T353829 (10fnegri) [18:18:20] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [18:23:11] 10Cloud-VPS: [openstack] Creating a new project returns Gateway Timeout (HTTP 504) - https://phabricator.wikimedia.org/T353829 (10fnegri) I first spotted this error while creating this project: {T353421} [18:23:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - 
https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [18:24:37] 10Cloud-VPS (Quota-requests), 10Community-Tech, 10WS Export: Increase wikisource project quota in order to upgrade WS Export - https://phabricator.wikimedia.org/T353770 (10fnegri) 05Open→03In progress [18:25:53] !log fnegri@cloudcumin1001 wikisource START - Cookbook wmcs.openstack.quota_increase (T353770) [18:25:56] !log fnegri@cloudcumin1001 wikisource END (FAIL) - Cookbook wmcs.openstack.quota_increase (exit_code=99) (T353770) [18:25:57] T353770: Increase wikisource project quota in order to upgrade WS Export - https://phabricator.wikimedia.org/T353770 [18:27:16] !log fnegri@cloudcumin1001 wikisource START - Cookbook wmcs.openstack.quota_increase (T353770) [18:27:19] !log fnegri@cloudcumin1001 wikisource END (FAIL) - Cookbook wmcs.openstack.quota_increase (exit_code=99) (T353770) [18:33:46] 10Cloud-VPS (Quota-requests), 10Community-Tech, 10WS Export: Increase wikisource project quota in order to upgrade WS Export - https://phabricator.wikimedia.org/T353770 (10fnegri) 05In progress→03Resolved The cookbook failed, but I managed to increase the quota using the openstack CLI: ` fnegri@cloudcon... 
[18:47:28] !log toolsbeta fran@wmf3169 START - Cookbook wmcs.openstack.quota_increase [18:47:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [18:47:32] !log toolsbeta fran@wmf3169 END (FAIL) - Cookbook wmcs.openstack.quota_increase (exit_code=99) [18:47:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [18:47:46] !log toolsbeta fran@wmf3169 START - Cookbook wmcs.openstack.quota_increase [18:47:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [18:47:49] !log toolsbeta fran@wmf3169 END (FAIL) - Cookbook wmcs.openstack.quota_increase (exit_code=99) [18:47:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [18:47:52] !log toolsbeta fran@wmf3169 START - Cookbook wmcs.openstack.quota_increase [18:47:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [18:47:55] !log toolsbeta fran@wmf3169 END (FAIL) - Cookbook wmcs.openstack.quota_increase (exit_code=99) [18:47:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [18:50:01] 10Tool-masto-collab: Add i18n support - https://phabricator.wikimedia.org/T353831 (10Poslovitch) [18:51:04] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [18:51:07] !log toolsbeta fran@wmf3169 START - Cookbook wmcs.openstack.quota_increase [18:51:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [18:51:11] !log toolsbeta fran@wmf3169 END (FAIL) - Cookbook wmcs.openstack.quota_increase (exit_code=99) [18:51:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [18:58:17] 10Tool-masto-collab: Provide "redactional tips" in the post proposal form - https://phabricator.wikimedia.org/T353832 (10Poslovitch) [19:03:37] 10Cloud-VPS: [wmcs-cookbooks] 
quota_show fails to parse openstack CLI output - https://phabricator.wikimedia.org/T353833 (10fnegri) [19:08:23] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack [19:13:37] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) [19:34:21] PROBLEM - Check systemd state on cloudservices1006 is CRITICAL: CRITICAL - degraded: The following units failed: labs-ip-alias-dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:34:43] PROBLEM - Check systemd state on cloudservices1005 is CRITICAL: CRITICAL - degraded: The following units failed: labs-ip-alias-dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:36:33] (SystemdUnitDown) firing: The service unit labs-ip-alias-dump.service is in failed status on host cloudservices1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudservices1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [19:37:12] 10Cloud-VPS: [openstack] Creating a new project returns Gateway Timeout (HTTP 504) - https://phabricator.wikimedia.org/T353829 (10Andrew) This same error appears when creating projects without the cookbook: ` root@cloudcontrol1005:~# openstack project create T353829test3 Gateway Timeout (HTTP 504) ` And yet,... 
[19:50:20] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [19:55:09] RECOVERY - Check systemd state on cloudservices1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:55:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [19:57:01] RECOVERY - Check systemd state on cloudservices1005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [20:07:05] (SystemdUnitDown) resolved: (2) The service unit labs-ip-alias-dump.service is in failed status on host cloudservices1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [20:18:47] 10Cloud-VPS: [openstack] Creating a new project returns Gateway Timeout (HTTP 504) - https://phabricator.wikimedia.org/T353829 (10Andrew) It's timing out after 120 seconds. [20:55:35] 10Cloud-VPS (Quota-requests), 10Tool-spacemedia, 10cloud-services-team: disk quota increase (+200 GB) for spacemedia Cloud VPS project - https://phabricator.wikimedia.org/T353670 (10Don-vip) It works, thank you! I was a bit worried because it took about 50 minutes to build the instance, but it's healthy, I c... 
[21:05:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [21:06:43] (DiskSpace) resolved: Disk space cloudbackup2001:9100:/srv/cinder-backups 5.983% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup2001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [21:10:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [21:24:25] RECOVERY - Disk space on cloudbackup2001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=cloudbackup2001&var-datasource=codfw+prometheus/ops [21:51:04] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [23:06:25] 10Toolforge Build Service: apt buildpack (Aptfile support) doesn’t really work - https://phabricator.wikimedia.org/T353847 (10LucasWerkmeister) [23:10:46] 10Toolforge Build Service: apt buildpack (Aptfile support) doesn’t really work - https://phabricator.wikimedia.org/T353847 (10LucasWerkmeister) (The examples in the task description are all from tools-harbor.wmcloud.org/tool-lucaswerkmeister-test/tool-lucaswerkmeister-test:latest, as built from [this wd-shex-inf... 
[23:17:17] 10Grid-Engine-to-K8s-Migration: Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320140 (10LucasWerkmeister) T353698 solved building the image (thanks!); now I’m stuck on T353847.