[01:52:15] cloud-services-team: PuppetFailure Puppet has failed on cloudcontrol2010-dev:9100 - https://phabricator.wikimedia.org/T406290 (phaultfinder) NEW
[01:56:48] FIRING: PuppetFailure: Puppet has failed on cloudcontrol2010-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[03:01:48] RESOLVED: PuppetFailure: Puppet has failed on cloudcontrol2010-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[05:33:47] (Abandoned) Abijeet Patro: Localisation updates from https://translatewiki.net. [labs/tools/massmailer] - https://gerrit.wikimedia.org/r/1192116 (owner: L10n-bot)
[05:50:40] (CR) Abijeet Patro: [V:+2] Localisation updates from https://translatewiki.net. [labs/tools/massmailer] - https://gerrit.wikimedia.org/r/1193094 (owner: L10n-bot)
[06:47:03] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/18215013747 (https://github.com/cluebotng/component-configs/commits/7ab2bbe022e2513dc81a13a7055c4c7736e5f876)
[06:47:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL
[07:09:43] (merge) taavi: tools: Point k8s DNS name to the new VIP [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/90 (https://phabricator.wikimedia.org/T405078)
[07:17:47] RESOLVED: ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: toolsbeta-test-k8s-ingress-12.toolsbeta.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[07:24:44] (PS1) Stevemunene: Add dummy keytabs for analytics-research on stat servers [labs/private] - https://gerrit.wikimedia.org/r/1193314 (https://phabricator.wikimedia.org/T403207)
[07:38:33] (update) dcaro: global: update generated toolforge models [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/140
[07:38:53] (update) dcaro: global: update generated toolforge models [repos/cloud/toolforge/components-api] (fetch_minimal_jobs) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/140
[07:53:42] Cloud-VPS (Project-requests), Wikidata, Wikidata-Platform, Data-Platform-SRE (2025.09.26 - 2025.10.17): Request creation of query-service (blazegraph alternatives) VPS project - https://phabricator.wikimedia.org/T406240#11239603 (Gehel)
[08:06:32] (CR) Brouberol: [C:+1] Add dummy keytabs for analytics-research on stat servers [labs/private] - https://gerrit.wikimedia.org/r/1193314 (https://phabricator.wikimedia.org/T403207) (owner: Stevemunene)
[08:07:15] (CR) Stevemunene: [V:+2 C:+2] Add dummy keytabs for analytics-research on stat servers [labs/private] - https://gerrit.wikimedia.org/r/1193314 (https://phabricator.wikimedia.org/T403207) (owner: Stevemunene)
[08:15:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance tools-k8s-haproxy-5 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:20:28] FIRING: [2x] PuppetAgentNoResources: No Puppet resources found on instance tools-k8s-haproxy-5 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:23:24] (approved) eugene233: Add tool stats page [toolforge-repos/isa] - https://gitlab.wikimedia.org/toolforge-repos/isa/-/merge_requests/19 (owner: swayamagrahari)
[08:23:30] (merge) eugene233: Add tool stats page [toolforge-repos/isa] - https://gitlab.wikimedia.org/toolforge-repos/isa/-/merge_requests/19 (owner: swayamagrahari)
[08:26:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has some processes stuck on NFS - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[08:30:28] FIRING: [2x] PuppetAgentNoResources: No Puppet resources found on instance tools-k8s-haproxy-5 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:35:28] RESOLVED: [2x] PuppetAgentNoResources: No Puppet resources found on instance tools-k8s-haproxy-5 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:43:58] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node tools-k8s-haproxy-5
[09:45:04] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node tools-k8s-haproxy-5
[09:45:27] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node toolsbeta-test-k8s-haproxy-5
[09:46:29] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node toolsbeta-test-k8s-haproxy-5
[09:47:46] FIRING: [2x] ProbeDown: Service tools-k8s-haproxy-5:443 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:49:39] FIRING: [3x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:443 has failed probes (http_admin_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:52:46] FIRING: [3x] ProbeDown: Service tools-k8s-haproxy-5:443 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:57:46] FIRING: [3x] ProbeDown: Service tools-k8s-haproxy-5:443 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:58:01] FIRING: [3x] ProbeDown: Service tools-k8s-haproxy-5:443 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:02:46] RESOLVED: [3x] ProbeDown: Service tools-k8s-haproxy-5:443 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:04:21] cloud-services-team, Toolforge: Cleanup default security groups - https://phabricator.wikimedia.org/T406312 (taavi) NEW p:Triage→Medium
[10:09:39] RESOLVED: [3x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:443 has failed probes (http_admin_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:42:13] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node toolsbeta-test-k8s-haproxy-6
[12:43:15] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node toolsbeta-test-k8s-haproxy-6
[12:45:24] !log filippo@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-48
[12:46:39] FIRING: [3x] ProbeDown: Service toolsbeta-test-k8s-haproxy-6:443 has failed probes (http_admin_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:49:02] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node tools-k8s-haproxy-6
[12:50:07] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node tools-k8s-haproxy-6
[12:51:18] !log filippo@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-48
[12:51:39] RESOLVED: [3x] ProbeDown: Service toolsbeta-test-k8s-haproxy-6:443 has failed probes (http_admin_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:52:56] FIRING: ProbeDown: Service tools-k8s-haproxy-6:443 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:56:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has some processes stuck on NFS - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[12:57:56] FIRING: [3x] ProbeDown: Service tools-k8s-haproxy-6:443 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:58:11] FIRING: [3x] ProbeDown: Service tools-k8s-haproxy-6:443 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:02:56] FIRING: [3x] ProbeDown: Service tools-k8s-haproxy-6:443 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:03:11] FIRING: [3x] ProbeDown: Service tools-k8s-haproxy-6:443 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:07:56] RESOLVED: [3x] ProbeDown: Service tools-k8s-haproxy-6:443 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:40:50] FIRING: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:45:50] RESOLVED: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:06:14] Toolforge, tools-infrastructure-team, Patch-For-Review: Rebuild Toolforge HAProxies to support IPv6 - https://phabricator.wikimedia.org/T405078#11241502 (taavi) Open→Resolved
[20:35:21] (update) dcaro: global: update generated toolforge models [repos/cloud/toolforge/components-api] (fetch_minimal_jobs) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/140
[20:35:40] (update) dcaro: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (save_business_models_to_db) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (owner: raymond-ndibe)