[00:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [01:19:42] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [01:30:36] 10Toolforge (Quota-requests): Request increased quota for cewbot, toc, signature-checker, mgp-cewbot Toolforge tool - https://phabricator.wikimedia.org/T353104 (10Kanashimi) [01:55:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [02:00:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [02:01:27] 10Toolforge (Quota-requests): Request increased quota for cewbot, toc, signature-checker, mgp-cewbot Toolforge tool - https://phabricator.wikimedia.org/T353104 (10Kanashimi) [03:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [03:54:00] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [04:55:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [05:00:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [05:19:43] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [06:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [06:59:01] (PuppetConstantChange) resolved: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [07:55:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [08:00:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [08:24:28] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [08:44:27] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [08:50:24] 10Grid-Engine-to-K8s-Migration: Migrate video-cut-tool from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320123 (10Gopavasanth) Hey @Aklapper, I already deleted these tools at toolforge many days back: https://toolsadmin.wikimedia.org/tools/?q=video let me know if I still nee... [08:51:01] 10Grid-Engine-to-K8s-Migration: Migrate video-cut-tool-back-end from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320124 (10Gopavasanth) This tool is no more exist as per: https://toolsadmin.wikimedia.org/tools/?q=video I have deleted it many days back. [08:57:37] (CephSlowOps) firing: Ceph cluster in eqiad has 64 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps [08:57:42] 10cloud-services-team: CephSlowOps Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T352570 (10phaultfinder) [09:02:37] (CephSlowOps) resolved: Ceph cluster in eqiad has 64 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps [09:21:02] 10Tool-pm20-search: Angabe der Dokumentenanzahl in der Ergebnisliste - https://phabricator.wikimedia.org/T353108 (10Jneubert) [09:21:56] 10Tool-pm20-search: Sortierung der Suchergebnisse - https://phabricator.wikimedia.org/T353107 (10Jneubert) [09:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [10:55:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [11:00:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [11:06:31] 10Toolforge (Software install/update), 10User-bd808: mysqldump is not present in Kubernetes container images - https://phabricator.wikimedia.org/T254636 (10SD0001) @MusikAnimal Surely the mysqldump command needs a `--host` argument? The error message indicates it's trying to connect to the local mysql socket,... [11:25:19] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [11:30:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [12:11:45] 10Grid-Engine-to-K8s-Migration: Migrate phetools from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319965 (10Soda) @Xover Could you add me as one of the maintainers (mainly to be able to see the current code). Also, do we have any statistics on how much the OCR endpoint is b... [12:29:30] 10Grid-Engine-to-K8s-Migration: Migrate phetools from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319965 (10Xover) >>! In T319965#9395054, @Soda wrote: > @Xover Could you add me as one of the maintainers (mainly to be able to see the current code). Done. > Also, do we have... [12:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [12:42:02] 10Grid-Engine-to-K8s-Migration: Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320140 (10LucasWerkmeister) Okay, I think the general plan is: - Build an image that contains //all// the needed software. Java, Node, Python 3, whatever else I forgot. -... [13:55:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [14:00:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [14:09:12] 10Tool-Global-user-contributions: GUC unreachable - https://phabricator.wikimedia.org/T351194 (10Huajie2020) p:05Triage→03Unbreak! a:03------ [14:46:19] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [14:51:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [15:41:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [16:06:38] 10Grid-Engine-to-K8s-Migration: Migrate afdstats2 from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319487 (10Ahecht) 05Open→03Resolved [16:06:56] 10Grid-Engine-to-K8s-Migration: Migrate croptool-test from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319654 (10Ahecht) 05Open→03Resolved [16:12:44] 10Tool-Global-user-contributions: GUC unreachable - https://phabricator.wikimedia.org/T351194 (10Anoop) p:05Unbreak!→03Triage a:05------→03None [16:15:10] 10Grid-Engine-to-K8s-Migration, 10Toolforge: Cannot stop ahechtbot webservice on gridengine, stuck in "dr" state. - https://phabricator.wikimedia.org/T353112 (10Ahecht) [16:55:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [17:00:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [18:09:19] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [18:14:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [18:25:11] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack [18:30:29] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) [18:33:36] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack [18:37:21] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) [18:41:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [19:52:30] 10Tool-ldap: 500 Internal Server Error when visiting https://ldap.toolforge.org/group/tools.php (or any group that doesn't exist) - https://phabricator.wikimedia.org/T329739 (10Legoktm) a:03Legoktm [19:55:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [20:00:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [20:48:50] 10Tool-ldap: Display LDAP group members alphabetically - https://phabricator.wikimedia.org/T334898 (10Legoktm) 05Open→03Resolved Done. [20:48:56] 10Tool-ldap: 500 Internal Server Error when visiting https://ldap.toolforge.org/group/tools.php (or any group that doesn't exist) - https://phabricator.wikimedia.org/T329739 (10Legoktm) 05Open→03Resolved Will now return a 404 for groups that don't exist. [21:46:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [22:11:10] (03CR) 10Andrew Bogott: [V: 03+2 C: 03+2] WMF hacks: replace key and metadata panels for VM creation [openstack/horizon/horizon] (2023.1) - 10https://gerrit.wikimedia.org/r/980035 (https://phabricator.wikimedia.org/T326818) (owner: 10Andrew Bogott) [22:32:27] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [22:55:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [23:00:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed