[00:45:19] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [00:50:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [01:32:56] (ToolsToolsDBReplicationLagIsTooHigh) firing: ToolsDB replication on tools-db-2 is lagging behind the primary, the current lag is 3702 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh [02:05:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [02:05:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [02:30:10] 10Striker, 10GitLab (Integrations), 10User-bd808: GitLab users with only provider=cas3 identies are not found when Striker attempts to create GitLab repostories - https://phabricator.wikimedia.org/T353176 (10Hawkeye7) I tried logging in to gitlab as Ross Mallett at https://idp.wikimedia.org/login The messa... 
[02:41:33] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-dcaro: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 (10Hawkeye7) Normally, a Dotnet project has the solution file (.sln) at the t... [02:52:40] 10Grid-Engine-to-K8s-Migration, 10Event Metrics, 10Community-Tech (CommTech-Kanban): Migrate grantmetrics from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319779 (10MusikAnimal) 05Open→03Declined Declining in favor of {T353217}. I have commented out the grid engine c... [03:15:37] (CephSlowOps) firing: Ceph cluster in eqiad has 1 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps [03:15:42] 10cloud-services-team: CephSlowOps Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T352570 (10phaultfinder) [03:19:37] (CephClusterInWarning) firing: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [03:20:37] (CephSlowOps) resolved: Ceph cluster in eqiad has 66 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps [03:24:37] (CephClusterInWarning) resolved: Ceph cluster in eqiad is in warning status - 
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [03:36:13] 10Grid-Engine-to-K8s-Migration, 10Event Metrics, 10Community-Tech (CommTech-Kanban): Migrate grantmetrics from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319779 (10MusikAnimal) 05Declined→03Resolved Actually, I managed to figure it out :) T353217 is still valid, but... [04:05:05] 10Toolforge (Software install/update), 10User-bd808: mysqldump is not present in Kubernetes container images - https://phabricator.wikimedia.org/T254636 (10MusikAnimal) > Surely the mysqldump command needs a `--host` argument? The error message indicates it's trying to connect to the local mysql socket, but th... [04:09:14] 10Grid-Engine-to-K8s-Migration, 10Commons Deletion Notification bot, 10Community-Tech (CommTech-Kanban): Migrate commtech-commons from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319642 (10MusikAnimal) 05Open→03Resolved The bot has been down since June (T339145), so... [04:14:17] 10Grid-Engine-to-K8s-Migration: Migrate musikbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319914 (10MusikAnimal) 05Stalled→03Resolved a:03MusikAnimal >>! In T319914#9385424, @MusikAnimal wrote: >> @MusikAnimal have you taken another look at this T254636 was res... [04:31:50] 10Cloud-VPS, 10cloud-services-team: Rebuild (or upgrade the kernel on) mint.language.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T353185 (10santhosh) Rebooted. and ` $ uname -r 6.1.0-15-cloud-amd64 ` This is ok, right? 
[04:37:56] (ToolsToolsDBReplicationLagIsTooHigh) firing: ToolsDB replication on tools-db-2 is lagging behind the primary, the current lag is 14705 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh [05:05:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [05:05:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [05:38:54] 10Grid-Engine-to-K8s-Migration: Migrate potd from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319974 (10Legoktm) I've moved the Git repo to Wikimedia GitLab; the main technical change we need to make for this is to switch to using SMTP, I've done so in https://gitlab.wikimed... [06:28:55] 10Grid-Engine-to-K8s-Migration: Migrate dexbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319674 (10Ladsgroup) FWIW, the tool has been mostly migrated for years now. The ones left are a very small portion of the tool's job. [06:37:29] 10Cloud-VPS, 10cloud-services-team: Rebuild (or upgrade the kernel on) mint.language.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T353185 (10KartikMistry) >>! In T353185#9398619, @santhosh wrote: > Rebooted. and > > ` > $ uname -r > 6.1.0-15-cloud-amd64 > ` > > This is ok, right? We are good! 
[06:41:38] 10Cloud-VPS, 10cloud-services-team, 10Language-Team (Language-2023-October-December): Rebuild (or upgrade the kernel on) mint.language.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T353185 (10KartikMistry) p:05Triage→03High [07:02:56] (ToolsToolsDBReplicationLagIsTooHigh) resolved: ToolsDB replication on tools-db-2 is lagging behind the primary, the current lag is 4181 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh [07:05:49] 10Grid-Engine-to-K8s-Migration: Migrate video-cut-tool from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320123 (10Aklapper) 05Open→03Resolved [07:28:19] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [07:33:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [08:05:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [08:05:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [08:47:34] 10Tools, 10WMDE-TechWish-Maintenance: Delete technischewuensche tool code repository in Diffusion - 
https://phabricator.wikimedia.org/T349847 (10WMDE-Fisch) 05Stalled→03Open This can be done now. Please go forward with it. The relevant code is backed up and moved to different repos. Feel free to skip some... [08:48:38] 10Tools, 10WMDE-TechWish-Maintenance, 10WMDE-TechWish-Sprint-2023-11-22, 10WMDE-TechWish-Sprint-2023-12-06: Check technischewuensche tool code and publish in a public repo - https://phabricator.wikimedia.org/T350352 (10WMDE-Fisch) >>! In T350352#9384162, @Aklapper wrote: > Once you manage to tick off "Back... [09:34:33] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-dcaro: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 (10CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolf... [09:59:47] 10Cloud Services Proposals, 10Toolforge (Toolforge iteration 02), 10cloud-services-team, 10Cloud-Services-Origin-Team, and 3 others: [toolforge-envvars.api,toolforge-build.api] Support using custom environment variables at build time - https://phabricator.wikimedia.org/T338142 (10CodeReviewBot) dcaro opene... 
[10:46:14] VPS-project-Codesearch, Special:NewLexeme revival, wmde-wikidata-tech: Please add wmde/new-lexeme-special-page to codesearch index - https://phabricator.wikimedia.org/T351938 (Ladsgroup) Open→Resolved a:Ladsgroup
[10:59:48] (CR) Samtar: [C: +2] Ignore canary events in SULWatcher [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/982147 (owner: AntiCompositeNumber)
[11:00:22] (Merged) jenkins-bot: Ignore canary events in SULWatcher [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/982147 (owner: AntiCompositeNumber)
[11:05:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[11:05:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[12:05:15] Toolforge (Toolforge iteration 02), Patch-For-Review, User-dcaro: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolf...
[12:11:19] Toolforge (Toolforge iteration 02), Patch-For-Review, User-dcaro: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolf...
[12:12:31] !log toolsbeta dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api (T352774)
[12:12:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:12:35] T352774: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774
[12:13:03] !log toolsbeta dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api (T352774)
[12:13:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:16:32] !log tools dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api (T352774)
[12:16:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:17:06] !log tools dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api (T352774)
[12:17:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:23:00] Toolforge (Toolforge iteration 02), Patch-For-Review, User-dcaro: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolf...
[12:55:02] (03PS3) 10Nikerabbit: Remove trailing whitespace [labs/tools/wikinity] - 10https://gerrit.wikimedia.org/r/982233 (https://phabricator.wikimedia.org/T310688) [12:55:51] (03CR) 10CI reject: [V: 04-1] Remove trailing whitespace [labs/tools/wikinity] - 10https://gerrit.wikimedia.org/r/982233 (https://phabricator.wikimedia.org/T310688) (owner: 10Nikerabbit) [13:14:04] 10Toolforge (Software install/update): Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (10dcaro) [13:14:10] 10Toolforge (Software install/update): Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (10dcaro) [13:15:32] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-dcaro: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 (10dcaro) 05In progress→03Stalled [13:15:41] 10Toolforge (Toolforge iteration 02): [tbs] Improve Harbor quota handling and docs - https://phabricator.wikimedia.org/T351092 (10dcaro) [13:15:45] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-dcaro: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 (10dcaro) 05Stalled→03In progress [13:15:49] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api [13:15:49] 10Toolforge (Toolforge iteration 02): [builds-cli,builds-api] Allow build service to cleanup images to free quota - https://phabricator.wikimedia.org/T341067 (10dcaro) a:03dcaro [13:15:51] 10Toolforge (Toolforge iteration 02): [builds-cli,builds-api] Allow build service to cleanup images to free quota - https://phabricator.wikimedia.org/T341067 (10dcaro) 05Open→03In progress [13:16:04] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy 
(exit_code=0) for component jobs-api [13:19:41] 10Toolforge Jobs framework: toolforge jobs restart sometimes times out - https://phabricator.wikimedia.org/T352874 (10taavi) 05Open→03Resolved I believe this is fixed for any new jobs. Please re-open if not. [14:03:59] 10Grid-Engine-to-K8s-Migration: Migrate mrmetadata from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319911 (10Ladsgroup) Hi, if grid engine goes down, it breaks the tool but the thing is that the tool is not that important, it's not used often and it's fine if it stays broke... [14:10:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [14:10:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [14:12:10] 10Grid-Engine-to-K8s-Migration: Migrate jarallah-ii from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319825 (10taavi) 05Open→03Declined [14:29:34] 10Cloud-VPS, 10Moderator-Tools-Team (Kanban): enable lists.wikimedia.org or wikimedia.org email addresses to receive dmarc reports for *.wmflabs.org - https://phabricator.wikimedia.org/T352902 (10taavi) No, that's not configured either. 
[14:52:59] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2001-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[14:57:59] (PuppetConstantChange) firing: (2) Puppet performing a change on every puppet run on cloudcontrol2001-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[15:07:55] cloud-services-team (FY2023/2024-Q1-Q2), Observability-Alerting, Goal: Move WMCS off of Icinga and introduce alertmanager - https://phabricator.wikimedia.org/T328502 (dcaro)
[15:09:12] cloud-services-team (FY2023/2024-Q1-Q2), Observability-Alerting, Goal: Move WMCS off of Icinga and introduce alertmanager - https://phabricator.wikimedia.org/T328502 (dcaro)
[15:11:42] Cloud-VPS, cloud-services-team (FY2023/2024-Q1-Q2): Move Cloud VPS control plane alerting to alertmanager - https://phabricator.wikimedia.org/T345294 (taavi)
[15:12:59] (PuppetConstantChange) firing: (3) Puppet performing a change on every puppet run on cloudcontrol2001-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[15:29:39] Grid-Engine-to-K8s-Migration: Migrate mrmetadata from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319911 (gpaumier) Open→Resolved This tool is no longer needed and I have disabled it.
[15:30:52] Grid-Engine-to-K8s-Migration: Migrate archaeo from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319563 (gpaumier) Open→Resolved This tool is no longer needed and I have disabled it.
[15:30:57] Grid-Engine-to-K8s-Migration: Migrate copywhat from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319648 (gpaumier) Open→Resolved This tool is no longer needed and I have disabled it.
[15:31:38] Grid-Engine-to-K8s-Migration: Migrate copywhat from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319648 (komla) Okay, thanks.
[15:33:58] cloud-services-team (FY2023/2024-Q1-Q2), Observability-Alerting, Goal: Move WMCS off of Icinga and introduce alertmanager - https://phabricator.wikimedia.org/T328502 (fgiunchedi)
[15:34:22] Cloud-VPS, Data-Services, cloud-services-team, Observability-Alerting, User-dcaro: Migrate labstore prometheus alerts from Icinga to Alertmanager - https://phabricator.wikimedia.org/T309011 (fgiunchedi)
[15:36:14] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
[15:44:45] Grid-Engine-to-K8s-Migration: Migrate deltaquad-bots from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319668 (nskaggs) @AmandaNP Thanks for making contact. I've removed your tool from the list. Best wishes on the migration. If you need additional support, please reach out.
[15:47:59] (PuppetConstantChange) resolved: (3) Puppet performing a change on every puppet run on cloudcontrol2001-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [15:49:01] !log taavi@cloudcumin1001 admin Added a new k8s worker tools-k8s-worker-98.tools.eqiad1.wikimedia.cloud to the cluster [15:49:01] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster [15:50:16] 10Grid-Engine-to-K8s-Migration: Migrate dexbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319674 (10komla) >>! In T319674#9398686, @Ladsgroup wrote: > FWIW, the tool has been mostly migrated for years now. The ones left are a very small portion of the tool's job. Tha... [15:51:01] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster [15:51:09] 10Toolforge, 10cloud-services-team, 10Patch-For-Review: Replace Toolschecker alerts with Prometheus based ones - https://phabricator.wikimedia.org/T313030 (10CodeReviewBot) taavi opened https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/5 kubernetes: Add node ready alerts [15:51:51] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster [15:54:23] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeweblight-10-14, tools-sgeexec-10-8 [15:58:01] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster [16:02:03] (InstanceDown) firing: Project tools instance tools-sgeweblight-10-14 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [16:06:05] 10Toolforge: Monitoring and alerting is needed for Kubernetes cluster capacity - 
https://phabricator.wikimedia.org/T352581 (10taavi) a:03taavi [16:07:03] (InstanceDown) resolved: Project tools instance tools-sgeweblight-10-14 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [16:11:43] !log taavi@cloudcumin1001 admin Added a new k8s worker tools-k8s-worker-99.tools.eqiad1.wikimedia.cloud to the cluster [16:11:44] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster [16:17:13] 10Toolforge, 10cloud-services-team, 10Patch-For-Review: Replace Toolschecker alerts with Prometheus based ones - https://phabricator.wikimedia.org/T313030 (10CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/5 kubernetes: Add node ready alerts [16:24:26] 10cloud-services-team (Hardware), 10DC-Ops, 10SRE, 10ops-eqiad: Q1:rack/setup/install cloudcontrol100[8-10]-dev cloudnet100[7-8]-dev - https://phabricator.wikimedia.org/T342455 (10Jclark-ctr) 05Open→03Resolved [16:34:21] 10Grid-Engine-to-K8s-Migration: Migrate isa from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319818 (10komla) >>! In T319818#9395678, @Sebastian_Berlin-WMSE wrote: > What is running on GridEngine? https://grid-deprecation.toolforge.org/t/isa shows nothing. `qstat` prints not... [16:38:14] 10Grid-Engine-to-K8s-Migration: Migrate wiki-irc from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320148 (10komla) 05Open→03Resolved [16:38:41] 10Grid-Engine-to-K8s-Migration: Migrate isa from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319818 (10komla) 05Open→03Resolved [16:47:59] 10Grid-Engine-to-K8s-Migration, 10Toolforge: Cannot stop ahechtbot webservice on gridengine, stuck in "dr" state. 
- https://phabricator.wikimedia.org/T353112 (10taavi) 05Open→03Resolved a:03taavi `lang=shell-session taavi@tools-sgegrid-master:~ $ sudo qdel -f 3652274 root forced the deletion of job 36522... [16:49:30] 10Grid-Engine-to-K8s-Migration: Migrate bracketbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319608 (10komla) >>! In T319608#9394877, @Jonesey95 wrote: > This bot is long-dead. I helped with getting it set up on the user specification side. > > See https://en.wikipe... [16:51:36] 10cloud-services-team: Implement 2022 Feedback/Comments for 2023 Cloud Survey - https://phabricator.wikimedia.org/T334818 (10komla) Thanks! [16:51:50] 10cloud-services-team: Implement 2022 Feedback/Comments for 2023 Cloud Survey - https://phabricator.wikimedia.org/T334818 (10komla) 05Open→03Resolved [16:52:13] 10Cloud-VPS, 10cloud-services-team (FY2023/2024-Q1-Q2), 10DC-Ops, 10SRE, 10ops-eqiad: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643 (10Andrew) P54340 [16:55:49] 10Cloud-VPS, 10Moderator-Tools-Team (Kanban): enable lists.wikimedia.org or wikimedia.org email addresses to receive dmarc reports for *.wmflabs.org - https://phabricator.wikimedia.org/T352902 (10jsn.sherman) hmm; I see that exim is configured to use `root@wmcloud.org`, which is what I see when I test with the... [16:56:20] 10Grid-Engine-to-K8s-Migration: Migrate wmds-archive from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320179 (10taavi) >>! In T320179#9395379, @Tgr wrote: > There is apparently no `lighttpd-plain` type in Kubernetes. I guess I just pick a programming language at random? PHP... [16:59:03] 10Grid-Engine-to-K8s-Migration: Migrate abbe98tools from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319469 (10komla) @Abbe98 Thanks for the feedback. I have removed all your tools from the list. 
[17:00:52] Grid-Engine-to-K8s-Migration: Migrate articles-by-lat-lon-without-images from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319573 (komla) I have responded on the Abbe98tools ticket T319469
[17:05:15] Grid-Engine-to-K8s-Migration: Migrate map-search from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319875 (komla) I have responded on the Abbe98tools ticket T319469
[17:05:37] Grid-Engine-to-K8s-Migration: Migrate wmf-sitematrix from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320180 (komla) I have responded on the Abbe98tools ticket T319469
[17:08:04] Striker, GitLab (Integrations), User-bd808: GitLab users with only provider=cas3 identies are not found when Striker attempts to create GitLab repostories - https://phabricator.wikimedia.org/T353176 (bd808) >>! In T353176#9398537, @Hawkeye7 wrote: > I tried logging in to gitlab as Ross Mallett at htt...
[17:08:15] Grid-Engine-to-K8s-Migration: Migrate pibot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319967 (komla) >>! In T319967#9394795, @Mike_Peel wrote: > I'm the operator of Pi bot, I'll check into this as soon as I can. Noted!
[17:09:06] Tools, WMDE-TechWish-Maintenance: Delete technischewuensche tool code repository in Diffusion - https://phabricator.wikimedia.org/T349847 (Dzahn) thanks for that, WMDE-Fisch
[17:10:00] Grid-Engine-to-K8s-Migration: Migrate bub from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319610 (komla) @Soda it shows it's running. If it is no longer in use, kindly disable it and mark this as closed.
[17:15:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [17:15:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [17:16:35] 10Grid-Engine-to-K8s-Migration: Migrate bracketbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319608 (10Jonesey95) I guess I should have said that this is probably a zombie process, since the bot has not edited (on en.WP) in seven years. [17:20:51] 10Grid-Engine-to-K8s-Migration: Migrate dplbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319701 (10komla) See the usage of the pywikibot image [[ https://wikitech.wikimedia.org/wiki/Help:Toolforge/Running_Pywikibot_scripts | here ]] Is your repo, public? [17:22:07] 10Grid-Engine-to-K8s-Migration: Migrate commons-maintenance-bot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319641 (10komla) Should this be disabled? If someone can confirm. [17:25:12] 10Cloud-VPS, 10cloud-services-team (FY2023/2024-Q1-Q2), 10DC-Ops, 10SRE, 10ops-eqiad: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643 (10Andrew) Here's a new diff. This compares outputs from Nov 17 with today. The < is from the 17th, the < is today.... [17:25:50] 10Grid-Engine-to-K8s-Migration: Migrate fpcstats from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319752 (10komla) @bjh21 thanks. @mdaniels5757 kindly update. 
[17:26:20] Striker, GitLab (Integrations), Patch-For-Review, User-bd808: GitLab users with only provider=cas3 identies are not found when Striker attempts to create GitLab repostories - https://phabricator.wikimedia.org/T353176 (CodeReviewBot) dancy opened https://gitlab.wikimedia.org/repos/releng/gitlab-se...
[17:26:39] Grid-Engine-to-K8s-Migration: Migrate eatchabot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319713 (komla) @bjh21 thanks. @mdaniels5757 kindly update.
[17:27:27] Grid-Engine-to-K8s-Migration: Migrate vrb from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320129 (komla) @bjh21 thanks. @mdaniels5757 kindly update.
[17:27:41] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeexec-10-19
[17:29:54] Grid-Engine-to-K8s-Migration: Migrate unpatrollededitstats from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320105 (komla) @bjh21 thanks. @Stang kindly give an update.
[17:29:58] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeexec-10-18
[17:30:03] vivian-rook opened https://github.com/toolforge/superset-deploy/pull/12
[17:31:48] superset.wmcloud.org: sql backup to rotate after successful backup - https://phabricator.wikimedia.org/T352766 (rook) a:rook
[17:31:48] vivian-rook closed https://github.com/toolforge/superset-deploy/pull/12
[17:32:03] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
[17:35:03] (InstanceDown) firing: Project tools instance tools-sgeexec-10-19 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:40:03] (InstanceDown) resolved: Project tools instance tools-sgeexec-10-19 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:40:07] Cloud-VPS, cloud-services-team (FY2023/2024-Q1-Q2), DC-Ops, SRE, ops-eqiad: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643 (Andrew) >>! In T348643#9397062, @Jclark-ctr wrote: > @Andrew Dell is requesting smartctl output showing what dr...
[17:45:19] !log taavi@cloudcumin1001 admin Added a new k8s worker tools-k8s-worker-100.tools.eqiad1.wikimedia.cloud to the cluster
[17:45:19] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster
[18:19:59] Grid-Engine-to-K8s-Migration: Migrate dplbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319701 (dcaro) >>! In T319701#9394732, @russblau wrote: > I do have some questions. For background, this tool consists of a PHP webserver that relies on a Toolsdb database, with...
[18:52:37] Striker, GitLab (Integrations), Patch-For-Review, User-bd808: GitLab users with only provider=cas3 identies are not found when Striker attempts to create GitLab repostories - https://phabricator.wikimedia.org/T353176 (CodeReviewBot) dancy merged https://gitlab.wikimedia.org/repos/releng/gitlab-se...
[18:55:10] Striker, GitLab (Integrations), Patch-For-Review, User-bd808: GitLab users with only provider=cas3 identies are not found when Striker attempts to create GitLab repostories - https://phabricator.wikimedia.org/T353176 (dancy) I ran the `fix-auth-provider` script and updated about 667 accounts.
[19:03:16] (PS1) Andrew Bogott: launch-instance workflow: allow keypair panel for all launches [openstack/horizon/horizon] (2023.1) - https://gerrit.wikimedia.org/r/982453 (https://phabricator.wikimedia.org/T326818)
[19:08:02] (PS1) Andrew Bogott: launch-instance: add caption to keypair panel about puppet [openstack/horizon/horizon] (2023.1) - https://gerrit.wikimedia.org/r/982454 (https://phabricator.wikimedia.org/T326818)
[19:10:40] Toolforge Build Service (Beta release): [buildservice] Feature request - Indicate when long envvars are cutoff when listing - https://phabricator.wikimedia.org/T353287 (Amorymeltzer)
[19:16:21] Toolforge Build Service: [buildservice] Feature request - Indicate when long envvars are cutoff when listing - https://phabricator.wikimedia.org/T353287 (taavi)
[19:16:28] Toolforge Build Service: [buildservice] Cache .m2 folder (local maven repository) between builds - https://phabricator.wikimedia.org/T350307 (taavi)
[19:16:34] Toolforge Build Service: Add Rust buildpack to Toolforge build service - https://phabricator.wikimedia.org/T337066 (taavi)
[19:17:52] Toolforge Build Service, cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 2 others: [tbs.beta] Create a toolforge build service beta release -
https://phabricator.wikimedia.org/T267374 (taavi) In progress→Resolved
[19:17:56] Cloud Services Proposals, Toolforge Build Service, cloud-services-team, Cloud-Services-Origin-Team, and 3 others: [Epic] Make Toolforge a proper platform as a service with push-to-deploy and build packs - https://phabricator.wikimedia.org/T194332 (taavi)
[19:19:18] Toolforge, cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [builds-api] Add triggering support - https://phabricator.wikimedia.org/T334587 (taavi)
[19:20:09] Toolforge Jobs framework: Replace already completed one-off jobs when starting a new one - https://phabricator.wikimedia.org/T352989 (taavi)
[19:20:43] Grid-Engine-to-K8s-Migration, WMCZ-General: Make it possible to run pandoc in Toolforge's jobs framework - https://phabricator.wikimedia.org/T345029 (taavi)
[19:21:00] Toolforge Jobs framework: toolforge-jobs --wait will only wait 5 minutes - https://phabricator.wikimedia.org/T352945 (taavi)
[19:21:18] Toolforge Jobs framework: Show a job status when a job is being deleted - https://phabricator.wikimedia.org/T348242 (taavi)
[19:21:36] Toolforge Jobs framework: Add health check support to toolforge-jobs - https://phabricator.wikimedia.org/T348512 (taavi)
[19:57:53] Tool-Pageviews, Data-Engineering-Icebox: Allow users to query mediarequests using a file page link - https://phabricator.wikimedia.org/T244712 (Dominicbm) Hi @mforns, sorry for late reply. I think I am not sure how the Commons Impact Metrics project is going to affect the existing AQS APIs. For my own pa...
[20:03:35] Grid-Engine-to-K8s-Migration: Migrate welcomebots-bn from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320145 (Ahmad_Kanik) Open→Resolved a:Ahmad_Kanik Script of this bot which was running on GridEngine is not working. I've shut down those jobs. We will try to...
[20:17:58] (PS2) Andrew Bogott: launch-instance: add caption to keypair panel about puppet [openstack/horizon/horizon] (2023.1) - https://gerrit.wikimedia.org/r/982454 (https://phabricator.wikimedia.org/T326818)
[20:18:00] (PS2) Andrew Bogott: launch-instance workflow: allow keypair panel for all launches [openstack/horizon/horizon] (2023.1) - https://gerrit.wikimedia.org/r/982453 (https://phabricator.wikimedia.org/T326818)
[20:20:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[20:20:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[20:27:39] (CR) Andrew Bogott: [V: +2 C: +2] launch-instance: add caption to keypair panel about puppet [openstack/horizon/horizon] (2023.1) - https://gerrit.wikimedia.org/r/982454 (https://phabricator.wikimedia.org/T326818) (owner: Andrew Bogott)
[20:27:46] (CR) Andrew Bogott: [V: +2 C: +2] launch-instance workflow: allow keypair panel for all launches [openstack/horizon/horizon] (2023.1) - https://gerrit.wikimedia.org/r/982453 (https://phabricator.wikimedia.org/T326818) (owner: Andrew Bogott)
[20:55:26] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[21:01:45] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[21:03:21] Striker, GitLab (Integrations), User-bd808: GitLab users with only provider=cas3 identies are not found when Striker attempts to create GitLab repostories - https://phabricator.wikimedia.org/T353176 (bd808) >>!
In T353176#9401092, @dancy wrote: > I ran the `fix-auth-provider` script and updated about...
[21:21:02] Striker, GitLab (Integrations), User-bd808: GitLab users with only provider=cas3 identies are not found when Striker attempts to create GitLab repostories - https://phabricator.wikimedia.org/T353176 (dancy) I just tried adding the openid_connect provider identity with extern_uid `echidnalives` to the...
[21:47:05] Striker, GitLab (Integrations), User-bd808: GitLab users with only provider=cas3 identies are not found when Striker attempts to create GitLab repostories - https://phabricator.wikimedia.org/T353176 (bd808) >>! In T353176#9401490, @dancy wrote: > I just tried adding the openid_connect provider identi...
[21:53:19] Striker, GitLab (Integrations), User-bd808: GitLab users with only provider=cas3 identies are not found when Striker attempts to create GitLab repostories - https://phabricator.wikimedia.org/T353176 (bd808)
[22:00:07] Striker, GitLab (Integrations), User-bd808: GitLab users with only provider=cas3 identies are not found when Striker attempts to create GitLab repostories - https://phabricator.wikimedia.org/T353176 (bd808) In progress→Resolved I am going to call this task {{Done}} per "My check from T353176#...
[22:44:10] (PS1) Andrew Bogott: Mild reformatting of the keypair panel [openstack/horizon/horizon] (2023.1) - https://gerrit.wikimedia.org/r/982487
[22:44:36] (CR) Andrew Bogott: [V: +2 C: +2] Mild reformatting of the keypair panel [openstack/horizon/horizon] (2023.1) - https://gerrit.wikimedia.org/r/982487 (owner: Andrew Bogott)
[22:45:56] (ToolsToolsDBReplicationLagIsTooHigh) firing: ToolsDB replication on tools-db-2 is lagging behind the primary, the current lag is 3697 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[23:03:14] Striker, GitLab (Integrations), User-bd808: GitLab users with only provider=cas3 identies are not found when Striker attempts to create GitLab repostories - https://phabricator.wikimedia.org/T353176 (Hawkeye7) @bd808 I can confirm that it is working now. Thanks for your assistance in resolving this p...
[23:09:39] Grid-Engine-to-K8s-Migration: Migrate bracketbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319608 (komla) I have disabled the tool
[23:09:57] Grid-Engine-to-K8s-Migration: Migrate bracketbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319608 (komla) Open→Resolved
[23:20:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[23:22:45] Toolforge, Patch-Needs-Improvement: Introduce static HTML webservice type on Toolforge - https://phabricator.wikimedia.org/T241817 (Pppery)
[23:25:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion -
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[23:25:56] (ToolsToolsDBReplicationLagIsTooHigh) resolved: ToolsDB replication on tools-db-2 is lagging behind the primary, the current lag is 3936 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[23:43:56] (ToolsGridQueueProblem) firing: Grid queue webgrid-lighttpd@tools-sgeweblight-10-21.tools.eqiad1.wikimedia.cloud is in state E - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsGridQueueProblem - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsGridQueueProblem