[00:06:01] Grid-Engine-to-K8s-Migration: Migrate telegram-wikilinksbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320082 (jhsoby) Thanks for the reminder, @komla! I've successfully (thanks to Lucas Werkmeister) been able to change this from using `jstart` to `toolforge jobs`...
[00:09:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:12:03] (TfInfraTestDestroyFailed) resolved: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[00:19:03] (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:27:38] PROBLEM - Check unit status of backup_cinder_volumes on cloudbackup2001 is CRITICAL: CRITICAL: Status of the systemd unit backup_cinder_volumes https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[00:57:19] Grid-Engine-to-K8s-Migration: Migrate governance-timeline from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319778 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need...
[00:57:23] Grid-Engine-to-K8s-Migration: Migrate gnubotmarcoo from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319776 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to mig...
[00:57:26] Grid-Engine-to-K8s-Migration: Migrate geophotoreq from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319766 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migr...
[00:57:28] Grid-Engine-to-K8s-Migration: Migrate gendergapdashboard from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319764 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need...
[00:57:30] Grid-Engine-to-K8s-Migration: Migrate ganreportbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319763 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to mig...
[00:57:32] Grid-Engine-to-K8s-Migration: Migrate fvcbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319760 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migrate t...
[00:57:34] Grid-Engine-to-K8s-Migration: Migrate furutani from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319759 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migrate...
[00:57:37] Grid-Engine-to-K8s-Migration: Migrate fshbibbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319757 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migrat...
[00:57:39] Grid-Engine-to-K8s-Migration: Migrate fscbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319756 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migrate t...
[00:57:41] Grid-Engine-to-K8s-Migration: Migrate fr-wikiversity-ns from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319753 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need t...
[00:57:43] Grid-Engine-to-K8s-Migration: Migrate footygen from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319749 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migrate...
[00:57:45] Grid-Engine-to-K8s-Migration: Migrate flossbrowser from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319747 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to mig...
[00:57:47] Grid-Engine-to-K8s-Migration: Migrate family from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319739 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migrate t...
[00:57:49] Grid-Engine-to-K8s-Migration: Migrate ext-lnk-discover from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319736 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to...
[00:57:51] Grid-Engine-to-K8s-Migration: Migrate energybot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319722 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migrat...
[00:57:53] Grid-Engine-to-K8s-Migration: Migrate enboten from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319721 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migrate...
[00:57:56] Grid-Engine-to-K8s-Migration: Migrate embeddedincount from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319720 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to...
[00:57:58] Grid-Engine-to-K8s-Migration: Migrate edgars from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319717 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migrate t...
[00:58:00] Grid-Engine-to-K8s-Migration: Migrate edcounter from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319716 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migrat...
[00:58:02] Grid-Engine-to-K8s-Migration: Migrate dykstats from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319712 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migrate...
[00:58:04] Grid-Engine-to-K8s-Migration: Migrate dykmoverbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319711 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migr...
[00:58:06] Grid-Engine-to-K8s-Migration: Migrate dutchbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319708 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migrate...
[00:58:08] Grid-Engine-to-K8s-Migration: Migrate dplbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319701 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migrate t...
[00:58:10] Grid-Engine-to-K8s-Migration: Migrate dow from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319700 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migrate to T...
[00:58:12] Grid-Engine-to-K8s-Migration: Migrate dimastbkbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319679 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migr...
[00:58:14] Grid-Engine-to-K8s-Migration: Migrate dibot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319676 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migrate to...
[00:58:16] Grid-Engine-to-K8s-Migration: Migrate dexibotnet from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319675 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migra...
[00:58:18] Grid-Engine-to-K8s-Migration: Migrate derivative from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319671 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migra...
[00:58:24] Grid-Engine-to-K8s-Migration: Migrate denisa from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319669 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migrate t...
[00:58:28] Grid-Engine-to-K8s-Migration: Migrate deltaquad-bots from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319668 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to m...
[00:58:32] Grid-Engine-to-K8s-Migration: Migrate dbreps from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319665 (komla) This is a reminder that the tool for which this ticket is created is still running on the Grid. The grid is deprecated and all remaining tools need to migrate t...
[01:02:09] Grid-Engine-to-K8s-Migration: Migrate telegram-wikilinksbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320082 (komla) >>! In T320082#9343451, @jhsoby wrote: > Thanks for the reminder, @komla! > > I've successfully (thanks to Lucas Werkmeister) been able to change...
[01:14:33] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[01:39:43] Grid-Engine-to-K8s-Migration: Migrate telegram-wikilinksbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320082 (jhsoby) Open→Resolved Thanks!
[01:55:01] Grid-Engine-to-K8s-Migration: Migrate jhstools from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319829 (jhsoby) Open→Resolved
[02:17:03] (PuppetCertificateAboutToExpire) firing: Puppet CA certificate tools-sgegrid-shadow.tools.eqiad.wmflabs is about to expire in 27d 20h 58m 14s - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[02:34:33] (PuppetAgentFailure) firing: Puppet agent failure detected on instance toolsbeta-bastion-6 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[03:40:44] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:14:33] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[04:46:05] Tool-nppbrowser: Create a new page for popular unreviewed articles - https://phabricator.wikimedia.org/T344584 (MPGuy2824) Open→Resolved a:MPGuy2824 Report: https://nppbrowser.toolforge.org/popular-unreviewed.php Commit: https://gitlab.wikimedia.org/toolforge-repos/nppbrowser/-/commit/fe0b3a4a538...
[05:17:03] (PuppetCertificateAboutToExpire) firing: Puppet CA certificate tools-sgegrid-shadow.tools.eqiad.wmflabs is about to expire in 27d 17h 58m 14s - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[05:34:33] (PuppetAgentFailure) firing: Puppet agent failure detected on instance toolsbeta-bastion-6 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[06:43:19] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[06:48:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[07:14:33] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[07:18:19] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[07:23:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[07:40:44] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[08:17:03] (PuppetCertificateAboutToExpire) firing: Puppet CA certificate tools-sgegrid-shadow.tools.eqiad.wmflabs is about to expire in 27d 14h 58m 14s - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[08:34:33] (PuppetAgentFailure) firing: Puppet agent failure detected on instance toolsbeta-bastion-6 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[09:35:00] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[09:35:45] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[09:49:33] (PuppetAgentFailure) resolved: Puppet agent failure detected on instance toolsbeta-bastion-6 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[09:54:56] Toolforge (Toolforge iteration 02), cloud-services-team, Patch-For-Review: track and apply Toolforge quota changes via a Git repository - https://phabricator.wikimedia.org/T324558 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/6 Al...
[10:02:05] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'toolforge-cli' version '0.3.5'
[10:02:21] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'toolforge-cli' version '0.3.5'
[10:14:33] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[10:27:03] (PuppetCertificateAboutToExpire) resolved: Puppet CA certificate tools-sgegrid-shadow.tools.eqiad.wmflabs is about to expire in 27d 12h 53m 14s - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[11:18:19] (HAProxyBackendUnavailable) firing: HAProxy service nova-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[11:40:45] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[12:04:07] Cloud-VPS (Quota-requests): Additional storage for procbot - https://phabricator.wikimedia.org/T351608 (rook) Open→In progress a:rook
[12:10:11] Cloud-VPS (Quota-requests): Additional storage for procbot - https://phabricator.wikimedia.org/T351608 (rook) All set! ` root@cloudcontrol1005:~# openstack quota set --gigabytes 120 procbot `
[12:10:51] Cloud-VPS (Quota-requests): Additional storage for procbot - https://phabricator.wikimedia.org/T351608 (rook) In progress→Resolved
[12:17:19] Toolforge (Toolforge iteration 02), cloud-services-team, Patch-For-Review: Automatically apply quota changes to existing tools - https://phabricator.wikimedia.org/T350873 (CodeReviewBot) taavi opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/132 maintain-ku...
[12:17:26] Toolforge (Toolforge iteration 02), cloud-services-team, Patch-For-Review: track and apply Toolforge quota changes via a Git repository - https://phabricator.wikimedia.org/T324558 (CodeReviewBot) taavi opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/132 ma...
[12:33:19] (HAProxyBackendUnavailable) firing: (2) HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[12:36:54] Cloud-VPS, Data-Services, cloud-services-team (FY2023/2024-Q1-Q2), Data-Persistence (work done): Support maintain-dbusers on the new network layout - https://phabricator.wikimedia.org/T347381 (taavi) Open→Resolved
[12:36:56] cloud-services-team (FY2023/2024-Q1-Q2), SRE, ops-eqiad: cloudcontrol1006: move to new network setup - https://phabricator.wikimedia.org/T346891 (taavi)
[12:37:00] Cloud-VPS, cloud-services-team, SRE, observability, and 3 others: Switch rsyslog from gtls to ossl - https://phabricator.wikimedia.org/T324623 (jbond) Reading the task it seems like the last blocker was to "wait out buster" (T324623#8449852). however as we have now deployed this to buster (T32462...
[12:37:18] Quarry: Find somewhere else (not NFS) to store Quarry's resultsets - https://phabricator.wikimedia.org/T178520 (taavi)
[12:37:30] Cloud-VPS, Data-Services, cloud-services-team (Kanban), Patch-For-Review, User-Marostegui: [Feature request] Database as a Service (Trove) for Cloud VPS projects - https://phabricator.wikimedia.org/T212595 (taavi)
[12:37:47] Cloud-VPS, Data-Services, cloud-services-team, User-Marostegui: Support Openstack Swift APIs via the radosgw - https://phabricator.wikimedia.org/T276961 (taavi) Open→Resolved a:taavi
[12:38:32] Data-Services: add proper dry-run/diff mode to maintain-views - https://phabricator.wikimedia.org/T351637 (taavi)
[12:47:01] Toolforge (Quota-requests): Request increased quota for Legoktm's Rust Toolforge tools - https://phabricator.wikimedia.org/T351604 (rook) >>! In T351604#9343239, @taavi wrote: > The old default quotas had a K8s `LimitRange` that restricted pods to 1 CPU. The new quotas incremented in {T333979} increase them,...
[13:03:41] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[13:03:55] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[13:04:03] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[13:04:16] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[13:14:33] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[13:34:17] (Abandoned) Nikerabbit: Localisation updates from https://translatewiki.net. [labs/tools/watch-translations] - https://gerrit.wikimedia.org/r/975795 (owner: L10n-bot)
[13:34:36] (Abandoned) Nikerabbit: Localisation updates from https://translatewiki.net. [labs/tools/massmailer] - https://gerrit.wikimedia.org/r/975794 (owner: L10n-bot)
[13:34:44] (Abandoned) Nikerabbit: Localisation updates from https://translatewiki.net. [labs/tools/guc] - https://gerrit.wikimedia.org/r/975793 (owner: L10n-bot)
[13:41:51] Toolforge (Toolforge iteration 02), Documentation: [tbs] Create a tutorial on how to deploy a ruby on rails tool using build service - https://phabricator.wikimedia.org/T347402 (Slst2020)
[13:56:47] Toolforge (Toolforge iteration 02), Documentation, Kubernetes: Add a easy way to run a ruby webservice on tools - https://phabricator.wikimedia.org/T141388 (Slst2020)
[13:57:01] Toolforge (Toolforge iteration 02), Documentation: [tbs] Create a tutorial on how to deploy a ruby on rails tool using build service - https://phabricator.wikimedia.org/T347402 (Slst2020) In progress→Resolved
[14:07:47] Toolforge (Toolforge iteration 02), Documentation, Kubernetes: Add a easy way to run a ruby webservice on tools - https://phabricator.wikimedia.org/T141388 (Slst2020) As @dcaro mentioned, Toolforge Build Service supports Ruby applications, including Rails and Rack and the ability to serve node.js ass...
[14:08:58] Toolforge (Toolforge iteration 02), Documentation, Kubernetes: Add a easy way to run a ruby webservice on tools - https://phabricator.wikimedia.org/T141388 (Slst2020) a:Slst2020
[14:09:38] Toolforge (Toolforge iteration 02), Documentation, Kubernetes: Add a easy way to run a ruby webservice on tools - https://phabricator.wikimedia.org/T141388 (Slst2020) Open→Resolved
[14:09:43] Cloud-VPS (Project-requests), WLM-Italy, WLM-Italy-Finder: Request creation of wlmitaly VPS project - https://phabricator.wikimedia.org/T250118 (Slst2020)
[14:45:29] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[14:47:00] Toolforge (Toolforge iteration 02), cloud-services-team, Patch-For-Review: Automatically apply quota changes to existing tools - https://phabricator.wikimedia.org/T350873 (CodeReviewBot) taavi opened https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/8 Automatical...
[14:47:36] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[14:47:47] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[14:47:51] Toolforge (Toolforge iteration 02), cloud-services-team, Patch-For-Review: track and apply Toolforge quota changes via a Git repository - https://phabricator.wikimedia.org/T324558 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/132 ma...
[14:47:56] 10Toolforge (Toolforge iteration 02), 10cloud-services-team, 10Patch-For-Review: Automatically apply quota changes to existing tools - https://phabricator.wikimedia.org/T350873 (10CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/132 maintain-ku... [14:48:23] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers [14:48:34] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers [15:00:31] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers [15:00:43] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers [15:00:55] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers [15:01:09] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers [15:13:27] 10Tool-refill: Upgrade to Python 3.9.2 - https://phabricator.wikimedia.org/T351648 (10Curb_Safe_Charmer) [15:19:03] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance metricsinfra-puppetmaster-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [15:29:03] (PuppetAgentStaleLastRun) firing: (2) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [15:39:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [15:44:03] 
(PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [15:49:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [15:59:03] (PuppetAgentNoResources) firing: No Puppet resources found on instance metricsinfra-puppetmaster-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [16:06:33] 10Grid-Engine-to-K8s-Migration, 10User-bd808: Migrate officewikibot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319934 (10bd808) Interactive debugging of things in the buildpack created image can be done using `webservice shell`. This can be made a bit easier by first... [16:09:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [16:09:37] 10Cloud-VPS, 10cloud-services-team: [Cloud VPS alert][puppet-dev] Puppet failure on pdev-pdb.puppet-dev.eqiad1.wikimedia.cloud (172.16.6.86) - https://phabricator.wikimedia.org/T287751 (10jbond) 05Open→03Resolved this has since been fixed [16:14:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [16:14:33] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [16:19:03] (PuppetAgentStaleLastRun) 
firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [16:19:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [16:33:34] (HAProxyBackendUnavailable) firing: HAProxy service nova-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [16:34:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [16:34:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [16:39:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [16:59:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [16:59:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [17:04:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [17:19:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [17:24:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [17:34:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [17:34:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-haproxy-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [17:39:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [17:43:19] (HAProxyBackendUnavailable) firing: (2) HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [17:54:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [17:54:22] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review: maintain-harbor: code refactor for readability and quality - https://phabricator.wikimedia.org/T351277 (10CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/19 
[maintain-harbor] minor... [17:59:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [18:14:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [18:19:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [18:23:00] Grid-Engine-to-K8s-Migration, User-bd808: Migrate officewikibot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319934 (CodeReviewBot) bd808 merged https://gitlab.wikimedia.org/toolforge-repos/officewikibot-pywikibot/-/merge_requests/1 Make image work with BotPassw... [18:24:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [18:29:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [18:45:45] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [18:59:21] Grid-Engine-to-K8s-Migration, User-bd808: Migrate officewikibot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319934 (bd808) In progress→Resolved Let's see how this goes: ` $ toolforge jobs show fix-double-redirects +-------------+-------------------------... [19:00:55] RECOVERY - Check unit status of backup_cinder_volumes on cloudbackup2001 is OK: OK: Status of the systemd unit backup_cinder_volumes https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [19:02:57] RECOVERY - Check unit status of remove_dangling_cinder_snapshots on cloudbackup2001 is OK: OK: Status of the systemd unit remove_dangling_cinder_snapshots https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [19:04:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [19:04:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [19:09:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [19:14:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [19:14:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project 
metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [19:14:33] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [19:14:41] PROBLEM - Check unit status of remove_dangling_cinder_snapshots on cloudbackup2001 is CRITICAL: CRITICAL: Status of the systemd unit remove_dangling_cinder_snapshots https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [19:19:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [19:19:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [19:34:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [19:39:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [19:54:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [19:54:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [19:55:29] (OpenstackAPIResponse) 
firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [20:04:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [20:09:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [20:44:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [20:44:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [20:49:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [20:49:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:04:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [21:04:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:09:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [21:24:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [21:29:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [21:43:20] (HAProxyBackendUnavailable) firing: HAProxy service nova-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [21:44:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [21:44:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:49:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [21:49:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:54:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 
24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [21:54:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:59:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [22:14:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [22:14:33] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [22:19:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [22:39:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [22:44:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [22:48:30] Toolforge (Quota-requests): Request increased quota for anchor-corrector Toolforge tool - https://phabricator.wikimedia.org/T350484 (Kanashimi) @taavi Is this problem due to CPU limit? 
[22:49:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [22:53:20] (HAProxyBackendUnavailable) firing: (3) HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [23:04:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:04:03] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance metricsinfra-alertmanager-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [23:09:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:14:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:19:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:24:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:24:03] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance metricsinfra-alertmanager-1 on project 
metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [23:34:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:39:03] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance metricsinfra-alertmanager-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [23:55:45] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse