[01:34:50] <wikibugs>	 (03update) 10ahecht: Draft: Cache database queries [toolforge-repos/afdstats] - 10https://gitlab.wikimedia.org/toolforge-repos/afdstats/-/merge_requests/3
[01:44:35] <wikibugs>	 (03update) 10ahecht: Draft: Cache database queries [toolforge-repos/afdstats] - 10https://gitlab.wikimedia.org/toolforge-repos/afdstats/-/merge_requests/3
[02:01:48] <wikibugs>	 (03PS1) 10Jacob4code: Cannot read properties of undefined fixed and minor changes. [labs/tools/WdTmCollab] - 10https://gerrit.wikimedia.org/r/1163075
[06:58:53] <wikibugs>	 10cloud-services-team (FY2024/2025-Q3-Q4), 10Cloud-VPS, 06collaboration-services, 10GitLab (Infrastructure): Volume is stuck to deleted instance in devtools project - https://phabricator.wikimedia.org/T396739#10940830 (10Jelto) >>! In T396739#10934468, @Andrew wrote: > I'm still wrestling with gitlab-prod-...
[07:00:28] <wmcs-alerts>	 FIRING: InstanceDown: Project tools instance tools-prometheus-9 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[07:05:30] <wikibugs>	 (03update) 10dcaro: components-api: deploy also on tools [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/785
[07:18:34] <wikibugs>	 (03PS1) 10Muehlenhoff: Add dummy secrets for debmonitor_dev [labs/private] - 10https://gerrit.wikimedia.org/r/1163211
[07:26:20] <wmcs-alerts>	 FIRING: ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down:  - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[07:26:20] <wmcs-alerts>	 FIRING: ToolforgeKubernetesHAproxyUnknown: Toolforge HAproxy has unknown state. HAproxy might be down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyUnknown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyUnknown
[07:26:20] <wmcs-alerts>	 FIRING: HarborComponentDown: No data about Harbor components found. #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborComponentDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborComponentDown
[07:30:28] <wmcs-alerts>	 RESOLVED: InstanceDown: Project tools instance tools-prometheus-9 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[07:31:20] <wmcs-alerts>	 RESOLVED: HarborComponentDown: No data about Harbor components found. #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborComponentDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborComponentDown
[07:31:20] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down:  - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[07:31:25] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesHAproxyUnknown: Toolforge HAproxy has unknown state. HAproxy might be down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyUnknown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyUnknown
[07:34:55] <wikibugs>	 (03open) 10dcaro: components: add test for the generate feature [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/822
[07:39:29] <wikibugs>	 (03update) 10dcaro: components: add test for the generate feature [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/822
[07:41:04] <wikibugs>	 (03update) 10dcaro: components: add test for the generate feature [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/822
[07:42:16] <wikibugs>	 (03update) 10dcaro: components: add test for the generate feature [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/822
[07:44:01] <wikibugs>	 06cloud-services-team, 10Domains, 06Traffic, 07IPv6: Add IPv6 glue records for WMCS Designate-hosted domains - https://phabricator.wikimedia.org/T397185#10940955 (10taavi) 05Open→03Resolved a:03ssingh Thanks! Everything looks fine from my end so closing.
[07:45:40] <wikibugs>	 (03update) 10dcaro: components: add test for the generate feature [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/822
[07:46:09] <wikibugs>	 (03CR) 10Elukey: [C:03+1] Add dummy secrets for debmonitor_dev [labs/private] - 10https://gerrit.wikimedia.org/r/1163211 (owner: 10Muehlenhoff)
[07:56:20] <wikibugs>	 (03update) 10dcaro: components: add test for the generate feature [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/822
[08:35:07] <wikibugs>	 06cloud-services-team, 10Cloud-VPS: Create OpenStack role that allows object storage access only - https://phabricator.wikimedia.org/T396594#10941074 (10taavi) 05Resolved→03Open Minor problem: this role doesn't have access to create ec2 creds: `lang=shell-session taavi@cloudcontrol1007 ~ $  export OS_PASSW...
[08:36:05] <wikibugs>	 (03update) 10dcaro: components: add test for the generate feature [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/822
[08:37:51] <wikibugs>	 (03CR) 10Muehlenhoff: [V:03+2 C:03+2] Add dummy secrets for debmonitor_dev [labs/private] - 10https://gerrit.wikimedia.org/r/1163211 (owner: 10Muehlenhoff)
[08:57:32] <wikibugs>	 (03update) 10dcaro: components: add test for the generate feature [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/822
[09:36:17] <wikibugs>	 06cloud-services-team, 10Toolforge, 07Documentation, 07Kubernetes: Figure out and document how to call the Kubernetes API as your tool user from inside a pod - https://phabricator.wikimedia.org/T321919#10941318 (10Addshore) >>! In T321919#10939110, @dcaro wrote: > Can that be split from the cli?  Yup. It's...
[09:45:35] <wikibugs>	 (03update) 10taavi: logging: Add values to deploy to toolsbeta [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/821 (https://phabricator.wikimedia.org/T386480)
[09:56:11] <wikibugs>	 (03approved) 10dcaro: logging: Add values to deploy to toolsbeta [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/821 (https://phabricator.wikimedia.org/T386480) (owner: 10taavi)
[09:56:45] <wikibugs>	 (03merge) 10taavi: logging: Add values to deploy to toolsbeta [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/821 (https://phabricator.wikimedia.org/T386480)
[09:57:13] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component logging
[09:57:21] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logging
[09:57:55] <wikibugs>	 10Toolforge (Toolforge iteration 21): [infra] 2025-06-21 Several correlated potentially network issues during the night - https://phabricator.wikimedia.org/T397566#10941411 (10dcaro) The issue moved to -9; it had a blip this morning, triggering a page due to missing data.
[09:58:29] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component logging
[09:58:36] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logging
[10:00:45] <wikibugs>	 (03open) 10taavi: logging: Fix path to get_secret.sh [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/823 (https://phabricator.wikimedia.org/T386480)
[10:00:48] <wikibugs>	 (03update) 10taavi: logging: Fix path to get_secret.sh [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/823 (https://phabricator.wikimedia.org/T386480)
[10:01:29] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component logging
[10:03:55] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 toolsbeta END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component logging
[10:04:18] <wikibugs>	 (03update) 10taavi: logging: Fix path to get_secret.sh [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/823 (https://phabricator.wikimedia.org/T386480)
[10:04:19] <wikibugs>	 (03update) 10taavi: logging: loki: Add missing emptyDir mounts in toolsbeta [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/824 (https://phabricator.wikimedia.org/T386480)
[10:04:21] <wikibugs>	 (03open) 10taavi: logging: loki: Add missing emptyDir mounts in toolsbeta [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/824 (https://phabricator.wikimedia.org/T386480)
[10:04:25] <wikibugs>	 (03update) 10taavi: logging: loki: Add missing emptyDir mounts in toolsbeta [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/824 (https://phabricator.wikimedia.org/T386480)
[10:04:29] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component logging
[10:04:42] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logging
[10:10:38] <wikibugs>	 (03update) 10taavi: logging: loki: Add missing emptyDir mounts in toolsbeta [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/824 (https://phabricator.wikimedia.org/T386480)
[10:16:55] <wikibugs>	 (03update) 10taavi: logging: Fix path to get_secret.sh [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/823 (https://phabricator.wikimedia.org/T386480)
[10:16:55] <wikibugs>	 (03update) 10taavi: logging: loki: Add missing emptyDir mounts in toolsbeta [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/824 (https://phabricator.wikimedia.org/T386480)
[10:16:55] <wikibugs>	 (03open) 10taavi: logging: loki: Set nameOverride [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/825 (https://phabricator.wikimedia.org/T386480)
[10:16:56] <wikibugs>	 (03update) 10taavi: logging: loki: Set nameOverride [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/825 (https://phabricator.wikimedia.org/T386480)
[10:16:57] <wikibugs>	 (03open) 10taavi: logging: alloy: Fix loki write service name [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/826
[10:16:58] <wikibugs>	 (03update) 10taavi: logging: alloy: Fix loki write service name [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/826
[10:17:09] <wikibugs>	 (03update) 10taavi: logging: loki: Set nameOverride [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/825 (https://phabricator.wikimedia.org/T386480)
[10:17:13] <wikibugs>	 (03update) 10taavi: logging: alloy: Fix loki write service name [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/826
[10:26:51] <wikibugs>	 (03update) 10taavi: logging: Fix path to get_secret.sh [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/823 (https://phabricator.wikimedia.org/T386480)
[10:26:51] <wikibugs>	 (03update) 10taavi: logging: loki: Add missing emptyDir mounts in toolsbeta [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/824 (https://phabricator.wikimedia.org/T386480)
[10:26:52] <wikibugs>	 (03update) 10taavi: logging: loki: Set nameOverride [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/825 (https://phabricator.wikimedia.org/T386480)
[10:26:53] <wikibugs>	 (03open) 10taavi: logging: loki: Add network policy rule for object storage access [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/827 (https://phabricator.wikimedia.org/T386480)
[10:26:54] <wikibugs>	 (03update) 10taavi: logging: loki: Add network policy rule for object storage access [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/827 (https://phabricator.wikimedia.org/T386480)
[10:26:55] <wikibugs>	 (03update) 10taavi: logging: alloy: Fix loki write service name [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/826 (https://phabricator.wikimedia.org/T386480)
[10:27:01] <wikibugs>	 (03update) 10taavi: logging: loki: Add network policy rule for object storage access [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/827 (https://phabricator.wikimedia.org/T386480)
[10:27:04] <wikibugs>	 (03update) 10taavi: logging: alloy: Fix loki write service name [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/826 (https://phabricator.wikimedia.org/T386480)
[10:30:37] <wikibugs>	 (03update) 10taavi: logging: loki: Add network policy rule for object storage access [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/827 (https://phabricator.wikimedia.org/T386480)
[10:45:17] <wikibugs>	 06cloud-services-team: openstack: mirror cloudrabbit setup from eqiad1 to codfw1dev - https://phabricator.wikimedia.org/T377934#10941722 (10Aklapper) Setting project tag to #cloud-services-team for reeval as this open task has no other //active// project tags otherwise
[10:49:27] <wikibugs>	 06cloud-services-team, 10Cloud-VPS: openstack: mirror cloudrabbit setup from eqiad1 to codfw1dev - https://phabricator.wikimedia.org/T377934#10941740 (10fnegri) Thanks @Aklapper, adding #cloud-vps as well.
[10:51:33] <wikibugs>	 06cloud-services-team, 10Cloud-VPS: openstack: mirror cloudrabbit setup from eqiad1 to codfw1dev - https://phabricator.wikimedia.org/T377934#10941743 (10fnegri) @Andrew is this actually completed? If yes, please resolve this task.
[10:53:56] <wikibugs>	 06cloud-services-team, 10Cloud-VPS: tofu-infra: implement some state backup mechanism - https://phabricator.wikimedia.org/T389964#10941746 (10fnegri) a:05aborrero→03None
[10:53:57] <wikibugs>	 06cloud-services-team, 10Cloud-VPS, 07Epic: Cloud VPS: extend tofu-infra coverage - https://phabricator.wikimedia.org/T370037#10941747 (10fnegri) a:05aborrero→03None
[10:53:58] <wikibugs>	 06cloud-services-team, 10Toolforge: lima-kilo: container image caching - https://phabricator.wikimedia.org/T362967#10941748 (10fnegri) a:05aborrero→03None
[10:54:00] <wikibugs>	 06cloud-services-team, 10Cloud-VPS, 07Epic: tofu-infra: opentofu-created flavors may be disabled by default - https://phabricator.wikimedia.org/T391252#10941749 (10fnegri) a:05aborrero→03None
[10:54:03] <wikibugs>	 06cloud-services-team, 10Toolforge: [k8s,kyverno]: explore change from per-namespace policy resource to a single ClusterPolicy resource - https://phabricator.wikimedia.org/T368135#10941750 (10fnegri) a:05aborrero→03None
[11:07:13] <wikibugs>	 (03update) 10dcaro: components: add test for the generate feature [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/822
[11:09:36] <wikibugs>	 (03PS1) 10Arendpieter: Remove support for SUL 'realname' field. [labs/striker] - 10https://gerrit.wikimedia.org/r/1163331 (https://phabricator.wikimedia.org/T384206)
[11:10:16] <wikibugs>	 (03update) 10dcaro: runtime: create runtime module to handle actions [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/88
[11:15:12] <wikibugs>	 (03update) 10dcaro: runtime: create runtime module to handle actions [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/88
[11:15:24] <wikibugs>	 (03update) 10dcaro: runtime: create runtime module to handle actions [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/88
[11:20:04] <wikibugs>	 06cloud-services-team, 10Striker: Use IDP for authentication in Striker - https://phabricator.wikimedia.org/T359554#10941815 (10Arendpieter)
[11:21:26] <wikibugs>	 (03update) 10dcaro: runtime: create runtime module to handle actions [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/88
[11:22:08] <wikibugs>	 (03update) 10dcaro: config: add endpoint to generate sample config [repos/cloud/toolforge/components-api] (create_runtime) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/90 (https://phabricator.wikimedia.org/T394753)
[11:26:48] <wikibugs>	 (03update) 10dcaro: deploy: add all the missing options for continuous job [repos/cloud/toolforge/components-api] (generate_config) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/93 (https://phabricator.wikimedia.org/T395070)
[11:30:29] <wikibugs>	 (03update) 10dcaro: scheduled: add scheduled component support [repos/cloud/toolforge/components-api] (add_all_continuous_options) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/94 (https://phabricator.wikimedia.org/T395071)
[11:31:56] <wikibugs>	 (03open) 10taavi: Stop setting project ID when not needed [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/54
[11:32:00] <wikibugs>	 (03update) 10taavi: Stop setting project ID when not needed [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/54
[11:33:26] <wikibugs>	 10Tools: Improving the New-Q5 web application - https://phabricator.wikimedia.org/T337005#10941838 (10Aklapper) 05Open→03Resolved Closing per last comment
[11:33:45] <wikibugs>	 (03update) 10taavi: Stop setting project ID when not needed [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/54
[11:43:25] <wikibugs>	 (03CR) 10Majavah: "recheck" [labs/striker] - 10https://gerrit.wikimedia.org/r/1163331 (https://phabricator.wikimedia.org/T384206) (owner: 10Arendpieter)
[11:45:09] <wikibugs>	 (03approved) 10dcaro: Stop setting project ID when not needed [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/54 (owner: 10taavi)
[11:45:59] <wikibugs>	 (03update) 10taavi: Stop setting project ID when not needed [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/54
[11:46:05] <wikibugs>	 (03merge) 10taavi: Stop setting project ID when not needed [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/54
[11:53:31] <wikibugs>	 06cloud-services-team, 10Toolforge: [toolforge,infra] Centralized logging for Toolforge infrastructure logs - https://phabricator.wikimedia.org/T97861#10941876 (10taavi) a:03taavi
[11:53:36] <wikibugs>	 (03open) 10taavi: logging: Deploy remaining Loki buckets [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/55 (https://phabricator.wikimedia.org/T386480 https://phabricator.wikimedia.org/T97861)
[11:53:39] <wikibugs>	 (03update) 10taavi: logging: Deploy remaining Loki buckets [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/55 (https://phabricator.wikimedia.org/T386480 https://phabricator.wikimedia.org/T97861)
[11:53:57] <wikibugs>	 (03update) 10taavi: logging: Deploy remaining Loki buckets [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/55 (https://phabricator.wikimedia.org/T386480 https://phabricator.wikimedia.org/T97861)
[12:03:37] <wikibugs>	 (03merge) 10taavi: toolforge: Install real `become` from misctools [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/248
[12:03:51] <wikibugs>	 06cloud-services-team, 10Cloud-VPS, 06DC-Ops, 10ops-eqiad, 06SRE: Move cloudsw2-d5-eqiad servers to cloudsw1-d5-eqiad - https://phabricator.wikimedia.org/T334644#10941960 (10Aklapper) a:05Jclark-ctr→03None @Jclark-ctr Removing task assignee as this open task has been assigned for more than two years...
[12:05:37] <wikibugs>	 10Tool-refill: Toolforge: refill doesn't work on Wikipedia language versions other than English - https://phabricator.wikimedia.org/T295327#10942015 (10Aklapper) a:05Curb_Safe_Charmer→03None @Curb_Safe_Charmer Removing task assignee as this open task has been assigned for more than two years - See the email...
[12:06:23] <wikibugs>	 06cloud-services-team, 10Toolforge: Store state information for the disable tool process outside NFS - https://phabricator.wikimedia.org/T332514#10942040 (10Aklapper) a:05Andrew→03None @Andrew Removing task assignee as this open task has been assigned for more than two years - See the email sent on 2025-05...
[12:08:02] <wikibugs>	 06cloud-services-team, 10Toolforge: [jobs-cli,jobs-api] make API and CLI key/values coherent - https://phabricator.wikimedia.org/T327280#10942087 (10Aklapper) a:05Raymond_Ndibe→03None @Raymond_Ndibe Removing task assignee as this open task has been assigned for more than two years - See the email sent on 2...
[12:08:52] <wikibugs>	 10Toolforge (Toolforge iteration 21), 13Patch-For-Review: [components-api] Add all missing options for scheduled components - https://phabricator.wikimedia.org/T395071#10942107 (10dcaro) 05Open→03In progress
[12:08:58] <wikibugs>	 10Toolforge (Toolforge iteration 21): [components-api] Add support for scheduled components - https://phabricator.wikimedia.org/T395065#10942109 (10dcaro) a:03dcaro
[12:09:01] <wikibugs>	 10Toolforge (Toolforge iteration 21): [components-api] Add support for scheduled components - https://phabricator.wikimedia.org/T395065#10942111 (10dcaro) 05Open→03In progress
[12:11:25] <wikibugs>	 (03update) 10dcaro: scheduled: add scheduled component support [repos/cloud/toolforge/components-api] (add_all_continuous_options) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/94 (https://phabricator.wikimedia.org/T395071)
[12:11:39] <wikibugs>	 (03update) 10dcaro: components: add test for the generate feature [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/822
[12:20:20] <wikibugs>	 (03approved) 10taavi: components-api: deploy also on tools [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/785 (owner: 10dcaro)
[12:21:22] <wikibugs>	 (03update) 10dcaro: components-api: deploy also on tools [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/785
[12:21:52] <wikibugs>	 10VPS-Projects: Cleanup memberships of maps project - https://phabricator.wikimedia.org/T323412#10942174 (10Aklapper) a:05TheDJ→03None @TheDJ: Removing task assignee as this open task has been assigned for more than two years - See the email sent on 2025-05-22. Please assign this task to yourself again if yo...
[12:22:37] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-api
[12:22:41] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api
[12:22:57] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-api
[12:23:16] <wikibugs>	 10Tool-masto-collab: masto-collab: Support embedding Commons media - https://phabricator.wikimedia.org/T336121#10942232 (10Aklapper) a:05Legoktm→03None @Legoktm: Removing task assignee as this open task has been assigned for more than two years - See the email sent on 2025-05-22. Please assign this task to y...
[12:23:26] <wikibugs>	 (03merge) 10dcaro: components-api: deploy also on tools [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/785
[12:25:07] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[12:29:13] <wikibugs>	 06cloud-services-team, 10Toolforge: [components-api] Deployment token should not be a GET param - https://phabricator.wikimedia.org/T397712 (10taavi) 03NEW
[12:32:20] <wikibugs>	 06cloud-services-team, 10Cloud-VPS, 06DC-Ops, 10ops-eqiad, 06SRE: Move cloudsw2-d5-eqiad servers to cloudsw1-d5-eqiad - https://phabricator.wikimedia.org/T334644#10942550 (10Jclark-ctr) @Aklapper @ayounsi I hadn’t commented earlier because we needed to verify onsite that we still had enough available por...
[12:34:16] <wikibugs>	 (03open) 10dcaro: api-gateway: enable components-api [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/828
[12:38:40] <wikibugs>	 (03update) 10dcaro: api-gateway: enable components-api [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/828
[12:44:55] <wikibugs>	 (03approved) 10dcaro: api-gateway: enable components-api [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/828
[12:44:57] <wikibugs>	 (03merge) 10dcaro: api-gateway: enable components-api [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/828
[12:45:32] <wikibugs>	 (03open) 10ladsgroup: Update ES switchover script [toolforge-repos/switchmaster] - 10https://gitlab.wikimedia.org/toolforge-repos/switchmaster/-/merge_requests/11 (https://phabricator.wikimedia.org/T397628)
[12:45:47] <wikibugs>	 (03update) 10ladsgroup: Update ES switchover script [toolforge-repos/switchmaster] - 10https://gitlab.wikimedia.org/toolforge-repos/switchmaster/-/merge_requests/11 (https://phabricator.wikimedia.org/T397628)
[12:56:48] <wikibugs>	 (03open) 10dcaro: components-api: use internal api endpoint to talk to toolforge [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/829
[13:01:53] <wikibugs>	 (03approved) 10dcaro: components-api: use internal api endpoint to talk to toolforge [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/829
[13:01:55] <wikibugs>	 (03merge) 10dcaro: components-api: use internal api endpoint to talk to toolforge [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/829
[13:02:37] <wikibugs>	 (03open) 10dcaro: api-gateway: add components-api as superuser [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/830
[13:02:50] <wikibugs>	 (03approved) 10dcaro: api-gateway: add components-api as superuser [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/830
[13:02:51] <wikibugs>	 (03merge) 10dcaro: api-gateway: add components-api as superuser [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/830
[13:08:44] <wikibugs>	 (03open) 10dcaro: CI: add deployment to tools [toolforge-repos/sample-complex-app-backend] - 10https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/5
[13:11:55] <wikibugs>	 (03open) 10dcaro: schemas: delete not needed type-ignore [toolforge-repos/sample-complex-app-backend] - 10https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/6
[13:13:28] <wikibugs>	 (03approved) 10dcaro: schemas: delete not needed type-ignore [toolforge-repos/sample-complex-app-backend] - 10https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/6
[13:13:30] <wikibugs>	 (03merge) 10dcaro: schemas: delete not needed type-ignore [toolforge-repos/sample-complex-app-backend] - 10https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/6
[13:13:44] <wikibugs>	 (03update) 10dcaro: CI: add deployment to tools [toolforge-repos/sample-complex-app-backend] - 10https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/5
[13:22:19] <wikibugs>	 (03approved) 10dcaro: CI: add deployment to tools [toolforge-repos/sample-complex-app-backend] - 10https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/5
[13:22:25] <wikibugs>	 (03merge) 10dcaro: CI: add deployment to tools [toolforge-repos/sample-complex-app-backend] - 10https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/5
[13:23:18] <wikibugs>	 10Cloud-VPS (Quota-requests): Quota increase required - https://phabricator.wikimedia.org/T397716 (10jnuche) 03NEW
[13:27:36] <wikibugs>	 10Toolforge (Toolforge iteration 21): [components-api] deploy on tools - https://phabricator.wikimedia.org/T394337#10942829 (10dcaro) 05In progress→03Resolved
[13:29:04] <wikibugs>	 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 21): [components-cli] Deploy to tools - https://phabricator.wikimedia.org/T397718 (10taavi) 03NEW
[13:29:24] <wikibugs>	 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 21): [components-cli] Deploy to tools - https://phabricator.wikimedia.org/T397718#10942849 (10taavi)
[13:29:29] <wikibugs>	 10Toolforge (Toolforge iteration 21): [components-api] deploy on tools - https://phabricator.wikimedia.org/T394337#10942850 (10taavi)
[13:50:16] <wikibugs>	 (03update) 10dcaro: runtime: create runtime module to handle actions [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/88
[13:53:18] <wikibugs>	 (03update) 10fnegri: runtime: create runtime module to handle actions [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/88 (owner: 10dcaro)
[13:53:29] <wikibugs>	 (03approved) 10fnegri: runtime: create runtime module to handle actions [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/88 (owner: 10dcaro)
[13:53:41] <wikibugs>	 (03merge) 10dcaro: runtime: create runtime module to handle actions [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/88
[13:53:45] <wikibugs>	 (03update) 10dcaro: config: add endpoint to generate sample config [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/90 (https://phabricator.wikimedia.org/T394753)
[13:55:51] <wikibugs>	 (03update) 10dcaro: config: add endpoint to generate sample config [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/90 (https://phabricator.wikimedia.org/T394753)
[13:56:02] <wikibugs>	 (03update) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.121-20250624135356-3eb4ef22 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/831
[13:56:05] <wikibugs>	 (03open) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.121-20250624135356-3eb4ef22 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/831
[13:59:22] <jinxer-wm>	 FIRING: HAProxyBackendUnavailable: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[14:04:22] <jinxer-wm>	 RESOLVED: HAProxyBackendUnavailable: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[14:15:07] <wikibugs>	 10Toolforge (Toolforge iteration 21), 07good first task: [components-cli] bash autocomplete does not autocomplete file name when creating config - https://phabricator.wikimedia.org/T395077#10943069 (10Chuckonwumelu) a:03Chuckonwumelu
[14:15:11] <wikibugs>	 10Toolforge (Toolforge iteration 21), 07good first task: [components-cli] bash autocomplete does not autocomplete file name when creating config - https://phabricator.wikimedia.org/T395077#10943071 (10Chuckonwumelu) 05Open→03In progress
[14:20:20] <wikibugs>	 06cloud-services-team, 10Toolforge: [components-api] Provide a standalone version of tool config schema - https://phabricator.wikimedia.org/T397724 (10taavi) 03NEW
[14:22:43] <wikibugs>	 06cloud-services-team, 10Toolforge: [components-api] Provide a standalone version of tool config schema - https://phabricator.wikimedia.org/T397724#10943102 (10dcaro)
[14:22:46] <wikibugs>	 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 21), 05Cloud-Services-Origin-Team, 07Cloud-Services-Worktype-Project, and 2 others: [Hypothesis] WE6.3.10 start a beta for the push-to-deploy features - https://phabricator.wikimedia.org/T393564#10943103 (10dcaro)
[14:38:59] <wikibugs>	 10Toolforge (Toolforge iteration 21): [infra] 2025-06-21 tools-prometheus-8 stopped responding for a bit - https://phabricator.wikimedia.org/T397563#10943132 (10fnegri) Related: {T397566}
[14:44:17] <wikibugs>	 10Toolforge (Toolforge iteration 21): [components-cli,toolforge-cli] add shortcuts to top-level cli for deploy/config - https://phabricator.wikimedia.org/T397725 (10dcaro) 03NEW
[14:51:52] <wikibugs>	 06cloud-services-team, 10Cloud-VPS: openstack: mirror cloudrabbit setup from eqiad1 to codfw1dev - https://phabricator.wikimedia.org/T377934#10943197 (10Andrew) 05Open→03Resolved a:03Andrew This is done!
[14:55:44] <wikibugs>	 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 21), 13Patch-For-Review: [components-cli] Deploy to tools - https://phabricator.wikimedia.org/T397718#10943224 (10taavi) 05Open→03Resolved
[14:55:57] <wikibugs>	 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 21), 05Cloud-Services-Origin-Team, 07Cloud-Services-Worktype-Project, and 2 others: [Hypothesis] WE6.3.10 start a beta for the push-to-deploy features - https://phabricator.wikimedia.org/T393564#10943228 (10taavi)
[14:57:09] <wikibugs>	 (03update) 10dcaro: components-api: bump to 0.0.121-20250624135356-3eb4ef22 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/831 (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[14:57:30] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[15:01:30] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[15:02:21] <logmsgbot_cloud>	 !log komla@cloudcumin1001 mwoffliner START - Cookbook wmcs.openstack.quota_increase (T396840)
[15:02:24] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-api
[15:02:25] <stashbot>	 T396840: Increase RAM quota of mwoffliner project - https://phabricator.wikimedia.org/T396840
[15:02:28] <logmsgbot_cloud>	 !log komla@cloudcumin1001 mwoffliner END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T396840)
[15:03:20] <wikibugs>	 10Cloud-VPS (Quota-requests), 07affects-Kiwix-and-openZIM: Increase RAM quota of mwoffliner project - https://phabricator.wikimedia.org/T396840#10943291 (10komla) This has been done:  ` 100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'sudo -i wmcs-ope...-cloud novaadmin'. 100.0% (1/1) success...
[15:03:22] <wikibugs>	 10Cloud-VPS (Quota-requests), 07affects-Kiwix-and-openZIM: Increase RAM quota of mwoffliner project - https://phabricator.wikimedia.org/T396840#10943292 (10komla) 05Open→03Resolved
[15:05:10] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[15:05:44] <wikibugs>	 (03approved) 10dcaro: components-api: bump to 0.0.121-20250624135356-3eb4ef22 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/831 (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[15:05:46] <wikibugs>	 (03merge) 10dcaro: components-api: bump to 0.0.121-20250624135356-3eb4ef22 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/831 (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[15:06:26] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-33
[15:10:05] <wikibugs>	 (03open) 10dcaro: components-api: enable deploy tests is tools [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/832
[15:10:42] <wikibugs>	 (03approved) 10dcaro: components-api: enable deploy tests is tools [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/832
[15:10:44] <wikibugs>	 (03merge) 10dcaro: components-api: enable deploy tests is tools [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/832
[15:12:29] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-33
[15:14:04] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-33 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[15:15:52] <wikibugs>	 10Tool-centralnotice-banner-editor: Learn Vue - https://phabricator.wikimedia.org/T397729 (10MHorsey-WMF) 03NEW
[15:23:32] <wikibugs>	 (03update) 10dcaro: deploy: add all the missing options for continuous job [repos/cloud/toolforge/components-api] (generate_config) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/93 (https://phabricator.wikimedia.org/T395070)
[15:24:10] <wikibugs>	 10Wikibugs: Wikibugs not reporting Phabricator activity to #wikimedia-zuul as hoped - https://phabricator.wikimedia.org/T396387#10943385 (10bd808) Working now apparently?  https://wm-bot.wmcloud.org/logs/%23wikimedia-zuul/20250620.txt `lang=irc [20:27]  < wikibugs> Continuous-Integration-Infrastructure (Zuul upg...
[15:29:21] <wikibugs>	 (03update) 10dcaro: scheduled: add scheduled component support [repos/cloud/toolforge/components-api] (add_all_continuous_options) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/94 (https://phabricator.wikimedia.org/T395071)
[15:30:34] <wikibugs>	 10Toolforge (Toolforge iteration 21): [infra] 2025-06-21 tools-prometheus-8 stopped responding for a bit - https://phabricator.wikimedia.org/T397563#10943410 (10fnegri) This happened a few times over the past two weeks, always on the active node (the active node was flipped from -8 to -9 yesterday):  {F62445238}
[15:40:32] <wikibugs>	 (03update) 10dcaro: build: fail if ref failed to resolve [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/96
[15:50:10] <wikibugs>	 (03update) 10dcaro: deploy_task: store error when build fails [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/92
[16:01:37] <wikibugs>	 (03approved) 10fnegri: config: add endpoint to generate sample config [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/90 (https://phabricator.wikimedia.org/T394753) (owner: 10dcaro)
[16:07:11] <wikibugs>	 (03update) 10dcaro: config: add endpoint to generate sample config [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/90 (https://phabricator.wikimedia.org/T394753)
[16:07:12] <wikibugs>	 (03update) 10dcaro: config: add endpoint to generate sample config [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/90 (https://phabricator.wikimedia.org/T394753)
[16:08:57] <wikibugs>	 (03merge) 10dcaro: config: add endpoint to generate sample config [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/90 (https://phabricator.wikimedia.org/T394753)
[16:08:59] <wikibugs>	 (03update) 10dcaro: deploy: add all the missing options for continuous job [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/93 (https://phabricator.wikimedia.org/T395070)
[16:11:17] <wikibugs>	 (03update) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.122-20250624160905-00d6b4c5 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/833 (https://phabricator.wikimedia.org/T394753)
[16:11:21] <wikibugs>	 (03open) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.122-20250624160905-00d6b4c5 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/833 (https://phabricator.wikimedia.org/T394753)
[16:14:56] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[16:19:14] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[16:19:32] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-api
[16:23:10] <wikibugs>	 06cloud-services-team, 10Cloud-VPS: Un-attachable volume in account-creation-assistance, 'app-www' - https://phabricator.wikimedia.org/T397517#10943702 (10Andrew) This is looking like it might be upstream bug https://bugs.launchpad.net/ubuntu/+source/nova/+bug/2020111  aka  https://bugs.launchpad.net/charm-nov...
[16:23:53] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[16:25:42] <wikibugs>	 (03approved) 10dcaro: components-api: bump to 0.0.122-20250624160905-00d6b4c5 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/833 (https://phabricator.wikimedia.org/T394753) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[16:25:46] <wikibugs>	 (03merge) 10dcaro: components-api: bump to 0.0.122-20250624160905-00d6b4c5 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/833 (https://phabricator.wikimedia.org/T394753) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:00:33] <wikibugs>	 10Cloud-VPS (Quota-requests): Quota increase required for Catalyst - https://phabricator.wikimedia.org/T397716#10943949 (10Aklapper)
[17:01:49] <wikibugs>	 (03approved) 10fnegri: generate: add new subcommand [repos/cloud/toolforge/components-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/38 (owner: 10dcaro)
[17:02:20] <wikibugs>	 (03update) 10dcaro: generate: add new subcommand [repos/cloud/toolforge/components-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/38
[19:04:10] <wikibugs>	 06cloud-services-team, 10Cloud-VPS: Un-attachable volume in account-creation-assistance, 'app-www' - https://phabricator.wikimedia.org/T397517#10944408 (10Andrew) I've become convinced that this was caused by openstack sometimes failing and leaving an RBD lock that's subsequently invisible to openstack. And, in...
[19:34:18] <wikibugs>	 10Cloud-VPS (Project-requests): Request creation of  lemmy VPS project - https://phabricator.wikimedia.org/T396948#10944507 (10komla) @Gryllida any updates on this?
[20:13:03] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all services
[20:13:50] <jinxer-wm>	 FIRING: [44x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudnet1005 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[20:17:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[20:19:42] <icinga-wm>	 PROBLEM - Host cloudrabbit1002 is DOWN: PING CRITICAL - Packet loss = 100%
[20:20:34] <icinga-wm>	 RECOVERY - Host cloudrabbit1002 is UP: PING OK - Packet loss = 0%, RTA = 0.37 ms
[20:25:31] <wmcs-alerts>	 FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance cvn-app10 in project cvn   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[20:27:45] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for all services
[20:28:26] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all services
[20:31:47] <wmcs-alerts>	 FIRING: [3x] ProbeDown: Service api.svc.toolforge.org:443 has failed probes (http_api_svc_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:35:28] <wmcs-alerts>	 FIRING: [3x] InstanceDown: Project cvn instance cvn-apache11 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:35:28] <wmcs-alerts>	 FIRING: WidespreadInstanceDown: Widespread instances down in project cvn   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown
[20:35:28] <wmcs-alerts>	 FIRING: InstanceDown: Project tools instance tools-proxy-9 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:35:32] <wmcs-alerts>	 FIRING: InstanceDown: Project toolsbeta instance toolsbeta-proxy-8 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:35:35] <wmcs-alerts>	 FIRING: TargetDown: Job frontproxy-nginx is unreachable in project toolsbeta instance toolsbeta-proxy-8   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[20:37:10] <wmcs-alerts>	 FIRING: ProjectProxyMainProxyInstanceDown: Proxy on proxy-6 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/MainProxyInstanceDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProjectProxyMainProxyInstanceDown
[20:38:22] <jinxer-wm>	 FIRING: HAProxyBackendUnavailable: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[20:38:50] <jinxer-wm>	 FIRING: [2x] NeutronAgentDown: Neutron neutron-l3-agent on cloudnet1006 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[20:39:28] <wmcs-alerts>	 FIRING: TargetDown: Job main-nginx is unreachable in project project-proxy instance proxy-6   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[20:40:28] <wmcs-alerts>	 FIRING: InstanceDown: Project project-proxy instance proxy-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:40:28] <wmcs-alerts>	 FIRING: [2x] InstanceDown: Project toolsbeta instance toolsbeta-prometheus-2 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:40:28] <wmcs-alerts>	 FIRING: [2x] InstanceDown: Project tools instance tools-prometheus-8 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:41:47] <wmcs-alerts>	 RESOLVED: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:42:28] <wmcs-alerts>	 FIRING: InstanceDown: Project metricsinfra instance metricsinfra-grafana-2 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:44:45] <wmcs-alerts>	 FIRING: ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down:  - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[20:44:45] <wmcs-alerts>	 FIRING: ToolforgeKubernetesHAproxyUnknown: Toolforge HAproxy has unknown state. HAproxy might be down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyUnknown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyUnknown
[20:44:52] <wmcs-alerts>	 FIRING: MaintainKubeusersDown: maintain-kubeusers is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainKubeusersDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DMaintainKubeusersDown
[20:44:53] <wmcs-alerts>	 FIRING: ProbeDown: Service toolsbeta-static-2:80 has failed probes (http_toolsbeta_static_wmcloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-static-2:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:45:10] <wmcs-alerts>	 FIRING: JobsEmailerDown: JobsEmailer is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/JobsEmailerDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DJobsEmailerDown
[20:45:10] <wmcs-alerts>	 FIRING: HarborComponentDown: No data about Harbor components found. #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborComponentDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborComponentDown
[20:45:27] <wmcs-alerts>	 FIRING: EnvvarsApiDown: EnvvarsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/EnvvarsApiDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DEnvvarsApiDown
[20:45:28] <wmcs-alerts>	 FIRING: TektonDown: Tekton is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/TektonDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTektonDown
[20:45:28] <wmcs-alerts>	 RESOLVED: InstanceDown: Project tools instance tools-proxy-10 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:45:28] <wmcs-alerts>	 RESOLVED: WidespreadInstanceDown: Widespread instances down in project cvn   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown
[20:45:31] <wmcs-alerts>	 FIRING: [3x] InstanceDown: Project cvn instance cvn-apache11 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:45:35] <wmcs-alerts>	 RESOLVED: InstanceDown: Project toolsbeta instance toolsbeta-proxy-7 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:45:39] <wmcs-alerts>	 RESOLVED: TargetDown: Job frontproxy-nginx is unreachable in project toolsbeta instance toolsbeta-proxy-7   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[20:45:43] <wmcs-alerts>	 FIRING: MaintainKubeusersDown: maintain-kubeusers is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainKubeusersDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DMaintainKubeusersDown
[20:45:50] <wmcs-alerts>	 FIRING: ToolforgeKubernetesNodeNotReady: Multiple Kubernetes nodes are not ready #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady
[20:45:54] <wmcs-alerts>	 FIRING: BuildsApiDown: BuildsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/BuildsApiDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DBuildsApiDown
[20:45:58] <wmcs-alerts>	 FIRING: ComponentsApiDown: ComponentsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ComponentsApiDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DComponentsApiDown
[20:46:02] <wmcs-alerts>	 FIRING: EnvvarsAdmissionDown: EnvvarsAdmission is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/EnvvarsAdmissionDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DEnvvarsAdmissionDown
[20:46:47] <wmcs-alerts>	 FIRING: [14x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:49:45] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesHAproxyUnknown: Toolforge HAproxy has unknown state. HAproxy might be down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyUnknown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyUnknown
[20:49:45] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down:  - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[20:49:52] <wmcs-alerts>	 RESOLVED: MaintainKubeusersDown: maintain-kubeusers is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainKubeusersDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DMaintainKubeusersDown
[20:49:53] <wmcs-alerts>	 RESOLVED: [6x] ProbeDown: Service api.svc.beta.toolforge.org:443 has failed probes (http_api_svc_beta_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:50:10] <wmcs-alerts>	 RESOLVED: JobsEmailerDown: JobsEmailer is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/JobsEmailerDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DJobsEmailerDown
[20:50:10] <wmcs-alerts>	 RESOLVED: HarborComponentDown: No data about Harbor components found. #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborComponentDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborComponentDown
[20:50:27] <wmcs-alerts>	 RESOLVED: EnvvarsApiDown: EnvvarsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/EnvvarsApiDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DEnvvarsApiDown
[20:50:28] <wmcs-alerts>	 RESOLVED: TektonDown: Tekton is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/TektonDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTektonDown
[20:50:28] <wmcs-alerts>	 RESOLVED: [3x] InstanceDown: Project cvn instance cvn-apache11 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:50:30] <wmcs-alerts>	 RESOLVED: MaintainKubeusersDown: maintain-kubeusers is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainKubeusersDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DMaintainKubeusersDown
[20:50:38] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesNodeNotReady: Multiple Kubernetes nodes are not ready #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady
[20:50:42] <wmcs-alerts>	 RESOLVED: BuildsApiDown: BuildsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/BuildsApiDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DBuildsApiDown
[20:50:46] <wmcs-alerts>	 RESOLVED: ComponentsApiDown: ComponentsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ComponentsApiDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DComponentsApiDown
[20:50:51] <wmcs-alerts>	 RESOLVED: EnvvarsAdmissionDown: EnvvarsAdmission is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/EnvvarsAdmissionDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DEnvvarsAdmissionDown
[20:50:51] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.openstack.restart_openstack (exit_code=97) on deployment eqiad1 for all services
[20:55:26] <jinxer-wm>	 FIRING: SystemdUnitDown: The service unit rabbitmq_detect_partition.service is in failed status on host cloudrabbit1002. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudrabbit1002 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[20:56:47] <wmcs-alerts>	 FIRING: [18x] ProbeDown: Service api.svc.toolforge.org:443 has failed probes (http_api_svc_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:56:58] <wmcs-alerts>	 RESOLVED: InstanceDown: Project project-proxy instance proxy-6 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:58:40] <wmcs-alerts>	 FIRING: [2x] ProjectProxyMainProxyInstanceDown: Proxy on proxy-5 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/MainProxyInstanceDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProjectProxyMainProxyInstanceDown
[20:59:08] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1053 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:59:58] <wmcs-alerts>	 RESOLVED: InstanceDown: Project metricsinfra instance metricsinfra-grafana-2 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:59:58] <wmcs-alerts>	 FIRING: [4x] InstanceDown: Project project-proxy instance maps-proxy-5 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:59:58] <wmcs-alerts>	 FIRING: [3x] TargetDown: Job main-nginx is unreachable in project project-proxy instance proxy-6   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[21:00:08] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1053 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:00:26] <jinxer-wm>	 FIRING: [3x] SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[21:00:28] <wmcs-alerts>	 FIRING: WidespreadInstanceDown: Widespread instances down in project project-proxy   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown
[21:00:43] <wmcs-alerts>	 FIRING: [4x] InstanceDown: Project tools instance tools-legacy-redirector-3 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[21:00:59] <jinxer-wm>	 FIRING: [2x] MetricsinfraAlertmanagerDown: Metricsinfra alertmanager is unreachable #page - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/MetricsinfraAlertmanagerDown - TODO - https://alerts.wikimedia.org/?q=alertname%3DMetricsinfraAlertmanagerDown
[21:01:08] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1063 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:01:08] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1045 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:01:09] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1046 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:01:09] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1072 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:01:18] <wikibugs>	 cloud-services-team: MetricsinfraAlertmanagerDown Metricsinfra alertmanager is unreachable # page - https://phabricator.wikimedia.org/T397782 (phaultfinder) NEW
[21:01:18] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:01:18] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1069 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:01:20] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1043 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:01:20] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1071 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:01:20] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1068 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:01:21] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1074 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:01:24] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1066 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:01:24] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1064 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:01:34] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1070 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:01:40] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1044 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:01:41] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1075 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:01:41] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1041 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:01:41] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1048 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:01:42] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1076 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:01:43] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1073 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:01:44] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1040 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:02:18] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:02:20] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1074 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:03:20] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1074 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:03:40] <wmcs-alerts>	 RESOLVED: [2x] ProjectProxyMainProxyInstanceDown: Proxy on proxy-5 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/MainProxyInstanceDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProjectProxyMainProxyInstanceDown
[21:04:20] <jinxer-wm>	 FIRING: [44x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudnet1005 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[21:04:40] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1044 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:04:58] <wmcs-alerts>	 RESOLVED: [3x] InstanceDown: Project project-proxy instance maps-proxy-5 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[21:04:58] <wmcs-alerts>	 RESOLVED: [5x] TargetDown: Job main-nginx is unreachable in project project-proxy instance proxy-5   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[21:05:08] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all services
[21:05:08] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1063 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:05:08] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1046 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:05:09] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1045 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:05:09] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1072 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:05:18] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1069 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:05:20] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1043 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:05:20] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1071 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:05:21] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1068 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:05:21] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1074 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:05:24] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1066 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:05:24] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1064 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:05:26] <jinxer-wm>	 FIRING: [2x] SystemdUnitDown: The service unit drain_rabbitmq_notification_error.service is in failed status on host cloudrabbit1001. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudrabbit1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[21:05:28] <wmcs-alerts>	 RESOLVED: WidespreadInstanceDown: Widespread instances down in project project-proxy   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown
[21:05:34] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1070 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:05:40] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1075 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:05:41] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1041 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:05:41] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1048 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:05:41] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1076 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:05:42] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1073 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:05:43] <wmcs-alerts>	 RESOLVED: InstanceDown: Project tools instance tools-legacy-redirector-3 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[21:05:44] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1040 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:06:47] <wmcs-alerts>	 RESOLVED: [8x] ProbeDown: Service tools-legacy-redirector-3:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-3:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:09:07] <jinxer-wm>	 RESOLVED: [7x] HAProxyBackendUnavailable: HAProxy service glance-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[21:09:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[21:13:49] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for all services
[21:17:02] <wmcs-alerts>	 FIRING: [9x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:17:29] <jinxer-wm>	 RESOLVED: [2x] MetricsinfraAlertmanagerDown: Metricsinfra alertmanager is unreachable #page - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/MetricsinfraAlertmanagerDown - TODO - https://alerts.wikimedia.org/?q=alertname%3DMetricsinfraAlertmanagerDown
[21:18:10] <wmcs-alerts>	 RESOLVED: [9x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:20:26] <jinxer-wm>	 FIRING: [4x] SystemdUnitDown: The service unit drain_rabbitmq_notification_error.service is in failed status on host cloudrabbit1001. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[21:23:34] <wikibugs>	 cloud-services-team, Cloud-VPS: Rabbitmq, neutron-openvswitch-agent, and network outages - https://phabricator.wikimedia.org/T397783 (Andrew) NEW
[21:23:44] <wikibugs>	 cloud-services-team, Cloud-VPS: Rabbitmq, neutron-openvswitch-agent, and network outages - https://phabricator.wikimedia.org/T397783#10945024 (Andrew) p:Triage→High
[21:24:20] <jinxer-wm>	 RESOLVED: [44x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudnet1005 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[21:25:26] <jinxer-wm>	 RESOLVED: SystemdUnitDown: The service unit rabbitmq_detect_partition.service is in failed status on host cloudrabbit1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudrabbit1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[22:01:11] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:06:27] <wikibugs>	 cloud-services-team, Cloud-VPS: Rabbitmq, neutron-openvswitch-agent, and network outages - https://phabricator.wikimedia.org/T397783#10945164 (bd808) Something I noticed linked from https://wikitech.wikimedia.org/wiki/Incidents/2024-11-26_WMCS_network_problems when I searched Wikitech for notes on neutro...
[22:22:04] <wmcs-alerts>	 FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-61 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[22:27:04] <wmcs-alerts>	 FIRING: [5x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-17 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[22:32:04] <wmcs-alerts>	 FIRING: [5x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-17 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[23:12:04] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-14 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[23:20:07] <wikibugs>	 Cloud-VPS (Quota-requests): Pixel project "disk40" flavor, and perhaps a few more cores? - https://phabricator.wikimedia.org/T395837#10945343 (Mhurd)
[23:20:30] <wikibugs>	 Cloud-VPS (Quota-requests): Increase Pixel project disk quota to 160 GB - https://phabricator.wikimedia.org/T397266#10945345 (Mhurd)
[23:29:41] <wikibugs>	 cloud-services-team, Toolforge: [jobs-api] logs internal datetime error - https://phabricator.wikimedia.org/T362521#10945388 (derenrich) I think it's being caused by programs that print in weird ways (e.g. using terminal escapes). I understand the desire to not just blindly ignore these exceptions this b...