[00:07:10] (open) don-vip: Define creationDate at FileMetadata level [toolforge-repos/spacemedia] - https://gitlab.wikimedia.org/toolforge-repos/spacemedia/-/merge_requests/7
[00:10:09] (update) don-vip: Define creationDate at FileMetadata level [toolforge-repos/spacemedia] - https://gitlab.wikimedia.org/toolforge-repos/spacemedia/-/merge_requests/7
[00:27:28] (update) don-vip: Define creationDate at FileMetadata level [toolforge-repos/spacemedia] - https://gitlab.wikimedia.org/toolforge-repos/spacemedia/-/merge_requests/7
[00:36:19] (merge) don-vip: Define creationDate at FileMetadata level [toolforge-repos/spacemedia] - https://gitlab.wikimedia.org/toolforge-repos/spacemedia/-/merge_requests/7
[02:52:31] Tool-nfp: Tool is returning 500 - https://phabricator.wikimedia.org/T405848 (Frood) NEW
[06:58:43] FIRING: ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeWebHighErrorRate - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/infra-k8s-haproxy?var-frontend=k8s-ingress-https - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeWebHighErrorRate
[07:35:22] Tool-nfp: Tool is returning 500 - https://phabricator.wikimedia.org/T405848#11222626 (Ladsgroup) I take a look ASAP.
[07:51:09] cloud-services-team, Toolforge (Toolforge iteration 24): 2025-09-28 ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services - https://phabricator.wikimedia.org/T405850 (fnegri) NEW
[07:51:58] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 24): 2025-09-28 ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services - https://phabricator.wikimedia.org/T405850#11222644 (fnegri) Open→In progress p:Triage→Unbreak! a:fnegri
[07:53:06] Tool-nfp: Tool is returning 500 - https://phabricator.wikimedia.org/T405848#11222651 (fnegri) Probably related to {T405850}
[08:05:25] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 24): 2025-09-28 ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services - https://phabricator.wikimedia.org/T405850#11222657 (fnegri) The tool `geohack` shows a big drop in requests at the ingress layer: {F66706671} My hypoth...
[08:07:42] !log dcaro@acme tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[08:07:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:07:52] !log dcaro@acme tools END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
[08:07:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:08:00] !log dcaro@acme tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[08:08:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:08:03] !log dcaro@acme tools END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
[08:08:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:08:09] !log dcaro@acme tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[08:08:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:08:12] !log dcaro@acme tools END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
[08:08:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:08:16] !log dcaro@acme tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[08:08:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:10:11] !log dcaro@acme tools END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
[08:10:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:10:17] FIRING: JobUnavailable: Reduced availability for job openstack in cloud@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[08:10:26] cloud-services-team: JobUnavailable Reduced availability for job openstack in cloud@eqiad - https://phabricator.wikimedia.org/T405851 (phaultfinder) NEW
[08:10:46] !log dcaro@acme tools START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-1 (T405850)
[08:10:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:10:52] T405850: 2025-09-28 ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services - https://phabricator.wikimedia.org/T405850
[08:12:53] !log dcaro@acme tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-1 (T405850)
[08:13:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:13:29] !log dcaro@acme tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 (T405850)
[08:13:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:15:12] !log dcaro@acme tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9 (T405850)
[08:15:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:15:17] RESOLVED: JobUnavailable: Reduced availability for job openstack in cloud@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[08:18:43] RESOLVED: ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeWebHighErrorRate - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/infra-k8s-haproxy?var-frontend=k8s-ingress-https - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeWebHighErrorRate
[08:22:42] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 24): 2025-09-28 ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services - https://phabricator.wikimedia.org/T405850#11222669 (fnegri)
[08:24:39] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 24): 2025-09-28 ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services - https://phabricator.wikimedia.org/T405850#11222670 (fnegri)
[08:27:43] FIRING: ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeWebHighErrorRate - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/infra-k8s-haproxy?var-frontend=k8s-ingress-https - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeWebHighErrorRate
[08:31:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-1 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[08:32:43] RESOLVED: ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeWebHighErrorRate - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/infra-k8s-haproxy?var-frontend=k8s-ingress-https - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeWebHighErrorRate
[08:33:13] FIRING: ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeWebHighErrorRate - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/infra-k8s-haproxy?var-frontend=k8s-ingress-https - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeWebHighErrorRate
[08:35:38] !log dcaro@acme tools START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-23, tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-7, tools-k8s-worker-nfs-9 (T405850)
[08:35:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:35:44] T405850: 2025-09-28 ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services - https://phabricator.wikimedia.org/T405850
[08:38:13] RESOLVED: ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeWebHighErrorRate - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/infra-k8s-haproxy?var-frontend=k8s-ingress-https - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeWebHighErrorRate
[08:46:40] Tool-nfp: Tool is returning 500 - https://phabricator.wikimedia.org/T405848#11222691 (Ladsgroup) It had some issues that I fixed but still gives 500 a lot for both: 1- The API call is being blocked by edge (it returns non-json). I added UA but somehow it's not working 2- I'm also getting what fnegri has ment...
[08:54:42] !log dcaro@acme tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-23, tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-7, tools-k8s-worker-nfs-9 (T405850)
[08:54:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:54:49] T405850: 2025-09-28 ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services - https://phabricator.wikimedia.org/T405850
[08:55:20] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 24): 2025-09-28 ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services - https://phabricator.wikimedia.org/T405850#11222693 (fnegri) p:Unbreak!→High a:fnegri→None Lowering to high as most tools seem to be worki...
[08:57:16] Tool-nfp: Tool is returning 500 - https://phabricator.wikimedia.org/T405848#11222696 (fnegri) The Toolforge-wide issues (T405850) are mostly resolved, so if you're still seeing errors they're probably tool-related now.
[09:01:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-1 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:01:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-1 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:03:28] FIRING: PuppetAgentFailure: Puppet agent failure detected on instance tools-k8s-worker-nfs-9 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[09:04:16] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 24): 2025-09-28 ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services - https://phabricator.wikimedia.org/T405850#11222700 (fnegri) Current hypothesis is that this was a combination of: * haproxy correctly applying rate limit...
[09:06:18] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-1 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:31:51] Tool-nfp: Tool is returning 500 - https://phabricator.wikimedia.org/T405848#11222703 (Ladsgroup) Open→Resolved a:Ladsgroup The block has been expired and now it's accessible
[09:38:28] FIRING: [3x] PuppetAgentFailure: Puppet agent failure detected on instance tools-k8s-worker-nfs-19 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[09:43:28] FIRING: [4x] PuppetAgentFailure: Puppet agent failure detected on instance tools-k8s-worker-nfs-19 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[12:27:49] cloud-services-team, Toolforge: [envvars] scope to jobs/components - https://phabricator.wikimedia.org/T405022#11222761 (Edgars2007) see also T376849
[13:46:08] (open) r1f4t: My feature [toolforge-repos/ztools] - https://gitlab.wikimedia.org/toolforge-repos/ztools/-/merge_requests/2
[13:46:12] (merge) r1f4t: My feature [toolforge-repos/ztools] - https://gitlab.wikimedia.org/toolforge-repos/ztools/-/merge_requests/2
[14:19:12] cloud-services-team, Toolforge: When a job runs out of memory, no message in the error file can be seen - https://phabricator.wikimedia.org/T405854 (Wurgl) NEW
[23:30:11] Tool-gawa: [Code] Display adjustment for mobile screens - https://phabricator.wikimedia.org/T405863 (poro26) NEW