[01:29:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-42 has some processes stuck on NFS - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[01:39:44] cloud-services-team (FY2025/26-Q1), PAWS: [Bug] PAWS server not starting - https://phabricator.wikimedia.org/T400542#11235496 (Octahedron80) It happens again: Server requested 2025-10-02T01:29:58.105910Z [Normal] Successfully assigned prod/jupyter--4fctra-42ot to paws-127c-uwce57bvcgrt-node-4 2025-1...
[01:41:06] cloud-services-team (FY2025/26-Q1), PAWS: [Bug] PAWS server not starting - https://phabricator.wikimedia.org/T400542#11235497 (Octahedron80) Resolved→Open
[03:25:44] cloud-services-team (FY2025/26-Q1), PAWS: [Bug] PAWS server not starting - https://phabricator.wikimedia.org/T400542#11235566 (Pppery) Open→Resolved Please create a new task for the unrelated issue rather than reusing a months-old one.
[03:46:15] cloud-services-team, PAWS: PAWS server not starting - https://phabricator.wikimedia.org/T406191#11235598 (Octahedron80)
[04:19:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-42 has some processes stuck on NFS - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[04:29:03] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-12 has some processes stuck on NFS - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[05:17:33] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[05:18:23] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30039 bytes in 0.212 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[08:15:12] cloud-services-team, Openstack-Magnum: ssh to cloud-vps 'utility' nodes (magnum, trove, octavia) - https://phabricator.wikimedia.org/T402317#11235974 (fgiunchedi) Plan SGTM!
[08:49:31] cloud-services-team, PAWS: PAWS server not starting - https://phabricator.wikimedia.org/T406191#11236035 (Johanbenjamin) Same issue here.
[08:49:50] FIRING: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:52:37] !log dcaro@acme tools START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-42, tools-k8s-worker-nfs-55
[08:52:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:54:50] RESOLVED: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:58:52] cloud-services-team, PAWS: PAWS server not starting - https://phabricator.wikimedia.org/T406191#11236061 (dcaro) Open→In progress p:Triage→High a:dcaro
[09:00:44] cloud-services-team, PAWS: PAWS server not starting - https://phabricator.wikimedia.org/T406191#11236067 (dcaro) I can reproduce, I can see many events like: ` prod 56m Warning FailedMount pod/jupyter--4collovand Unable to attach or mount volumes: unmounted...
[09:07:19] cloud-services-team, PAWS: PAWS server not starting - https://phabricator.wikimedia.org/T406191#11236105 (dcaro) After cordoning node-4, things seem to be ok, I'll monitor for a bit and investigate the issue. So far I can see node-4 has a lot of processes stuck on nfs mounts, all from the `paws-nfs.svc....
[09:10:30] cloud-services-team, PAWS: PAWS server not starting - https://phabricator.wikimedia.org/T406191#11236111 (Johanbenjamin) Working now, thank you!
[09:11:56] cloud-services-team, PAWS: PAWS server not starting - https://phabricator.wikimedia.org/T406191#11236116 (dcaro) some of them have been stuck since yesterday: ` [root@paws-127c-uwce57bvcgrt-node-4 ~]# ps aux | grep ' D ' | head root 3371070 0.0 0.0 5860 2944 ? D Oct01 0:00 /sbin/umou...
[09:12:10] !log dcaro@acme tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-42, tools-k8s-worker-nfs-55
[09:12:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:19:33] RESOLVED: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-12 has some processes stuck on NFS - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:23:07] cloud-services-team, PAWS: PAWS server not starting - https://phabricator.wikimedia.org/T406191#11236126 (dcaro) I'm not seeing much in the logs :/, there was a couple OOM events happening yesterday: ` [Wed Oct 1 16:31:06 2025] qemu-system-x86 invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|_...
[09:27:48] !log dcaro@acme paws START - Cookbook wmcs.vps.instance.force_reboot vm paws-127c-uwce57bvcgrt-node-4 (cluster eqiad1, project paws)
[09:27:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[09:27:53] !log dcaro@acme paws END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm paws-127c-uwce57bvcgrt-node-4 (cluster eqiad1, project paws)
[09:27:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[09:41:51] FIRING: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:46:50] RESOLVED: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:49:27] cloud-services-team, Toolforge (Toolforge iteration 24): [jobs-api] Allow customizing time to request Loki logs for - https://phabricator.wikimedia.org/T400917#11236161 (Count_Count) This issue blocks my migration away from file logs. One hour is very low.
[09:59:33] cloud-services-team, PAWS: PAWS server not starting - https://phabricator.wikimedia.org/T406191#11236242 (dcaro) In progress→Resolved I tried creating a process that writes to the nfs mount (`journalctl -f | tee -a outfile`), and kill it while it's writing but did not cause the issue. I'll c...
[10:01:30] cloud-services-team, Toolforge (Toolforge iteration 24), Regression: Validation error for CommonJob: file logging is only available with --mount=all - https://phabricator.wikimedia.org/T405828#11236260 (dcaro) Looking, it seems the cronjob has filelog enabled, but missing the mount labels: ` tools.it...
[10:06:13] cloud-services-team, Toolforge (Toolforge iteration 24), Regression: Validation error for CommonJob: file logging is only available with --mount=all - https://phabricator.wikimedia.org/T405828#11236268 (dcaro) I manually edited the cronjob to add the missing label, though would be good to understand...
[10:08:07] cloud-services-team, Toolforge (Toolforge iteration 24), Regression: Validation error for CommonJob: file logging is only available with --mount=all - https://phabricator.wikimedia.org/T405828#11236271 (dcaro) Note that I tried creating the cronjobs manually, so new jobs should not have this problem,...
[10:08:14] cloud-services-team, Toolforge (Toolforge iteration 24), Regression: Validation error for CommonJob: file logging is only available with --mount=all - https://phabricator.wikimedia.org/T405828#11236272 (dcaro) p:High→Medium
[10:10:18] cloud-services-team, Cloud-VPS: unable to "apt install helmfile" on CloudVPS debian 13 vm - https://phabricator.wikimedia.org/T405970#11236292 (JMeybohm) Let me try to shed some light: The package `helm` is provided by `thirdparty/kubeadm-*` components which are probably WMCS related and not used in...
[10:18:39] cloud-services-team, Toolforge (Toolforge iteration 24), Regression: Validation error for CommonJob: file logging is only available with --mount=all - https://phabricator.wikimedia.org/T405828#11236328 (dcaro) Open→In progress p:Medium→High a:dcaro There's many cronjobs that don't...
[10:20:00] cloud-services-team, Toolforge (Toolforge iteration 24), Regression: Validation error for CommonJob: file logging is only available with --mount=all - https://phabricator.wikimedia.org/T405828#11236335 (dcaro) All of them are quite old, so my guess is that we did not set that label before all the tim...
[10:20:17] cloud-services-team, Toolforge (Toolforge iteration 24), Regression: Validation error for CommonJob: file logging is only available with --mount=all - https://phabricator.wikimedia.org/T405828#11236336 (dcaro) Hmmm... I wonder how the volume admission catches those, looking
[10:22:42] cloud-services-team, Toolforge (Toolforge iteration 24), Regression: Validation error for CommonJob: file logging is only available with --mount=all - https://phabricator.wikimedia.org/T405828#11236391 (dcaro) >>! In T405828#11236336, @dcaro wrote: > Hmmm... I wonder how the volume admission catches...
[10:26:50] FIRING: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:36:50] RESOLVED: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:02:01] cloud-services-team, Toolforge (Toolforge iteration 24): toolforge logs appears to suffer from intermittent latency - https://phabricator.wikimedia.org/T402736#11236492 (DamianZaremba) > the rate limiting prevents us from sending more than 100 logs at a time While perhaps not great for real time troubles...
[11:06:00] cloud-services-team, Toolforge: [components-api] Intermittent internal API failures / retry internal requests - https://phabricator.wikimedia.org/T403175#11236498 (DamianZaremba) This morning a cluebotng-review deploy (auto triggered) failed to start 1 job due to the api gateway timing out and thus went...
[11:19:51] FIRING: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:29:50] RESOLVED: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:35:57] cloud-services-team, Toolforge (Toolforge iteration 24), Regression: Validation error for CommonJob: file logging is only available with --mount=all - https://phabricator.wikimedia.org/T405828#11236532 (dcaro) p:High→Medium I patched all of them with the script: ` dcaro@tools-bastion-15:~$ ca...
[12:12:50] FIRING: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:22:50] RESOLVED: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:31:34] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/27
[12:32:58] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. [labs/tools/massmailer] - https://gerrit.wikimedia.org/r/1193094 (owner: L10n-bot)
[12:40:17] FIRING: [3x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: toolsbeta-test-k8s-ingress-12.toolsbeta.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[12:44:13] Cloud-Services, Wikidata, Wikidata-Query-Service, Data-Platform-SRE (2025.09.26 - 2025.10.17): DPE SRE work to enable testing of Blazegraph alternatives - https://phabricator.wikimedia.org/T405395#11236684 (Gehel) The #Cloud-Services project tag is not intended to have any tasks. Please check the...
[12:51:03] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=0)
[12:55:17] FIRING: [6x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: toolsbeta-test-k8s-ingress-12.toolsbeta.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[13:06:46] (open) damian: delete_job_if_exists -> delete_job [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/138 (https://phabricator.wikimedia.org/T403175)
[13:09:50] FIRING: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:14:50] RESOLVED: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:16:34] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_haproxy_node
[13:17:35] (update) taavi: toolsbeta: Point k8s DNS name to the new VIP [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/89 (https://phabricator.wikimedia.org/T405078)
[13:17:36] (open) taavi: service: Move outputs to a dedicated file [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/88
[13:17:36] (update) taavi: service: Move outputs to a dedicated file [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/88
[13:17:39] (open) taavi: toolsbeta: Point k8s DNS name to the new VIP [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/89 (https://phabricator.wikimedia.org/T405078)
[13:17:40] (update) taavi: service: Move outputs to a dedicated file [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/88
[13:17:47] (update) taavi: toolsbeta: Point k8s DNS name to the new VIP [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/89 (https://phabricator.wikimedia.org/T405078)
[13:21:31] (update) taavi: toolsbeta: Point k8s DNS name to the new VIP [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/89 (https://phabricator.wikimedia.org/T405078)
[13:21:47] (update) taavi: service: Move outputs to a dedicated file [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/88
[13:21:50] (update) taavi: service: Move outputs to a dedicated file [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/88
[13:23:10] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=99)
[13:23:51] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-haproxy-7
[13:24:43] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-haproxy-7
[13:26:20] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_haproxy_node
[13:26:35] (update) taavi: toolsbeta: Point k8s DNS name to the new VIP [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/89 (https://phabricator.wikimedia.org/T405078)
[13:27:35] (update) taavi: toolsbeta: Point k8s DNS name to the new VIP [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/89 (https://phabricator.wikimedia.org/T405078)
[13:34:20] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=0)
[13:37:07] cloud-services-team, Cloud-VPS, Wikidata, Wikidata-Query-Service, Data-Platform-SRE (2025.09.26 - 2025.10.17): DPE SRE work to enable testing of Blazegraph alternatives - https://phabricator.wikimedia.org/T405395#11236922 (Gehel)
[13:38:02] (open) damian: _do_run - retry runtime errors [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T403175)
[13:38:23] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_haproxy_node
[13:38:54] cloud-services-team, Cloud-VPS, Wikidata, Wikidata-Query-Service, Data-Platform-SRE (2025.09.26 - 2025.10.17): DPE SRE work to enable testing of Blazegraph alternatives - https://phabricator.wikimedia.org/T405395#11236935 (Gehel) From a quick chat with @taavi : * repurposing existing hardwar...
[13:39:28] cloud-services-team, Toolforge, Patch-For-Review: [components-api] Intermittent internal API failures / retry internal requests - https://phabricator.wikimedia.org/T403175#11236938 (DamianZaremba) https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/139 this is the simpl...
[13:39:46] (close) damian: delete_job_if_exists -> delete_job [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/138 (https://phabricator.wikimedia.org/T403175)
[13:45:44] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=0)
[13:47:32] (update) taavi: service: Move outputs to a dedicated file [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/88
[13:47:33] (update) taavi: toolsbeta: Point k8s DNS name to the new VIP [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/89 (https://phabricator.wikimedia.org/T405078)
[13:47:33] (update) taavi: tools: Point k8s DNS name to the new VIP [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/90 (https://phabricator.wikimedia.org/T405078)
[13:47:34] (open) taavi: tools: Point k8s DNS name to the new VIP [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/90 (https://phabricator.wikimedia.org/T405078)
[13:47:37] (update) taavi: tools: Point k8s DNS name to the new VIP [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/90 (https://phabricator.wikimedia.org/T405078)
[13:50:51] (update) taavi: tools: Point k8s DNS name to the new VIP [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/90 (https://phabricator.wikimedia.org/T405078)
[13:58:20] (approved) fnegri: service: Move outputs to a dedicated file [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/88 (owner: taavi)
[13:59:30] (merge) taavi: service: Move outputs to a dedicated file [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/88
[13:59:33] (update) taavi: toolsbeta: Point k8s DNS name to the new VIP [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/89 (https://phabricator.wikimedia.org/T405078)
[14:01:39] (approved) fnegri: toolsbeta: Point k8s DNS name to the new VIP [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/89 (https://phabricator.wikimedia.org/T405078) (owner: taavi)
[14:01:50] FIRING: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:02:20] (approved) fnegri: tools: Point k8s DNS name to the new VIP [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/90 (https://phabricator.wikimedia.org/T405078) (owner: taavi)
[14:06:50] RESOLVED: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:12:42] (merge) taavi: toolsbeta: Point k8s DNS name to the new VIP [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/89 (https://phabricator.wikimedia.org/T405078)
[14:12:42] (update) taavi: tools: Point k8s DNS name to the new VIP [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/90 (https://phabricator.wikimedia.org/T405078)
[14:15:18] VPS-project-Phabricator, collaboration-services: Phabricator test project requires email verification but can't send email - https://phabricator.wikimedia.org/T388022#11237054 (A_smart_kitten) >>! In T388022#11234012, @taavi wrote: > Should those be changed? IMO it would be a shame if these were changed...
[14:56:32] VPS-project-devtools, GitLab: Puppet failure on gitlab-1002.devtools.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T406234 (dancy) NEW
[14:58:59] VPS-project-devtools, GitLab: Puppet failure on gitlab-1002.devtools.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T406234#11237246 (dancy) @Jelto @Arnoldokoth Bringing this to your attention.
[14:59:55] VPS-project-devtools, collaboration-services, GitLab: Puppet failure on gitlab-1002.devtools.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T406234#11237250 (Jelto) Thank you @dancy , I'll take a look!
[15:05:56] cloud-services-team, Toolforge: [envvars] - kubernetes error, length limit? - https://phabricator.wikimedia.org/T406236 (DamianZaremba) NEW
[15:34:24] (open) dcaro: global: update generated toolforge models [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/140
[15:41:37] Cloud-VPS (Project-requests), Data-Platform-SRE, Wikidata-Query-Service: Request creation of query-service (blazegraph alternatives) VPS project - https://phabricator.wikimedia.org/T406240 (bking) NEW
[15:42:10] Cloud-VPS (Project-requests), Data-Platform-SRE, Wikidata-Query-Service: Request creation of query-service (blazegraph alternatives) VPS project - https://phabricator.wikimedia.org/T406240#11237459 (bking)
[15:43:51] Cloud-VPS (Project-requests), Data-Platform-SRE, Wikidata-Query-Service: Request creation of query-service (blazegraph alternatives) VPS project - https://phabricator.wikimedia.org/T406240#11237469 (bking)
[15:52:49] (open) dcaro: openapi: add missing include_unset parameter [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/225
[15:53:27] (update) dcaro: openapi: add missing include_unset parameter [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/225
[15:58:10] (open) dcaro: fetch minimal jobs [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/141
[16:02:35] (update) dcaro: global: update generated toolforge models [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/140
[17:04:30] Cloud-VPS (Project-requests), Data-Platform-SRE, Wikidata, Wikidata-Query-Service: Request creation of query-service (blazegraph alternatives) VPS project - https://phabricator.wikimedia.org/T406240#11237860 (Andrew) What sort of lifespan are you expecting for these tests? Will you be using all t...
[17:13:19] (update) dcaro: global: update generated toolforge models [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/140
[17:32:28] (update) dcaro: global: update generated toolforge models [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/140
[18:04:05] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/27 (owner: l10n-bot)
[18:04:07] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/27 (owner: l10n-bot)
[18:08:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:18:41] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:46:37] Tool-wsindex, Developer-Outreach, Wikisource Reader App, Outreachy (Round 31): Outreachy 31: Improve the Wikisource Reader App - https://phabricator.wikimedia.org/T405593#11238278 (Bodhisattwa)
[18:49:57] (update) dcaro: global: update generated toolforge models [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/140
[19:11:28] Cloud-VPS (Project-requests): Swift storage for cluebotng-trainer - https://phabricator.wikimedia.org/T405836#11238420 (Andrew) Hello! Is there any reason for this to be decoupled with the storage for the similar T405835 ?
[19:35:28] Tool-wsindex: Support works from wikisource.org - https://phabricator.wikimedia.org/T406265 (Bodhisattwa) NEW
[19:40:34] Tool-wsindex: Books with question mark (?) in title are not producing thumbnails - https://phabricator.wikimedia.org/T406266 (Bodhisattwa) NEW
[19:43:27] (update) dcaro: global: update generated toolforge models [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/140
[19:45:18] Cloud-VPS (Project-requests): Swift storage for cluebotng-trainer - https://phabricator.wikimedia.org/T405836#11238504 (DamianZaremba) Hi, They are different tools that might in theory have different maintainers (they don't today). The actual file serving app is the same, but with a different config (no wr...
[19:50:53] Cloud-VPS (Project-requests), Data-Platform-SRE, Wikidata, Wikidata-Platform: Request creation of query-service (blazegraph alternatives) VPS project - https://phabricator.wikimedia.org/T406240#11238516 (dcaro) +1
[20:00:17] !log andrew@cloudcumin1001 query-service START - Cookbook wmcs.vps.create_project for project query-service in eqiad1 (T406240)
[20:00:18] andrew@cloudcumin1001: Unknown project "query-service"
[20:00:18] T406240: Request creation of query-service (blazegraph alternatives) VPS project - https://phabricator.wikimedia.org/T406240
[20:01:01] (open) group_199_bot_333a6c67971a471aeb1cf0b14ccf9f49: projects: added project query-service [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/270 (https://phabricator.wikimedia.org/T406240)
[20:02:39] (merge) andrew: projects: added project query-service [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/270 (https://phabricator.wikimedia.org/T406240) (owner: group_199_bot_333a6c67971a471aeb1cf0b14ccf9f49)
[20:03:12] Cloud-VPS (Project-requests), Data-Platform-SRE, Wikidata, Wikidata-Platform, Patch-For-Review: Request creation of query-service (blazegraph alternatives) VPS project - https://phabricator.wikimedia.org/T406240#11238564 (bking) @gmodena is the lead software engineer for the project, so he ma...
[20:03:58] !log andrew@cloudcumin1001 query-service END (PASS) - Cookbook wmcs.vps.create_project (exit_code=0) for project query-service in eqiad1 (T406240)
[20:03:59] andrew@cloudcumin1001: Unknown project "query-service"
[20:05:58] Cloud-VPS (Project-requests), Data-Platform-SRE, Wikidata, Wikidata-Platform, Patch-For-Review: Request creation of query-service (blazegraph alternatives) VPS project - https://phabricator.wikimedia.org/T406240#11238582 (Andrew) Open→Resolved a:Andrew I've created this projec...
[20:59:56] (update) dcaro: global: update generated toolforge models [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/140
[21:00:02] cloud-services-team, Toolforge: [envvars] - kubernetes error, length limit? - https://phabricator.wikimedia.org/T406236#11238744 (DamianZaremba)
[21:18:15] Cloud-VPS (Quota-requests), Release-Engineering-Team (Radar): Grant gitlab-runners-staging access to fast-iops volume type and a 4xiops instance flavor - https://phabricator.wikimedia.org/T406271 (dduvall) NEW
[21:18:45] Cloud-VPS (Quota-requests), Release-Engineering-Team (Radar): Grant gitlab-runners-staging access to fast-iops volume type and a 4xiops instance flavor - https://phabricator.wikimedia.org/T406271#11238778 (dduvall)
[21:46:48] FIRING: PuppetFailure: Puppet has failed on cloudcontrol1007:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[21:47:02] cloud-services-team: PuppetFailure Puppet has failed on cloudcontrol1007:9100 - https://phabricator.wikimedia.org/T406274 (phaultfinder) NEW
[21:51:48] FIRING: [2x] PuppetFailure: Puppet has failed on cloudcontrol1007:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[21:51:56] cloud-services-team: PuppetFailure - https://phabricator.wikimedia.org/T406275 (phaultfinder) NEW
[21:56:48] FIRING: [2x] PuppetFailure: Puppet has failed on cloudcontrol1007:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure