[06:01:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-74 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [06:15:44] cloud-services-team, Toolforge: bootstrap Toolforge IaC automation - https://phabricator.wikimedia.org/T390057#10692583 (Chuckonwumelu) * We can use a single repo for this project, splitting Tools and Tools beta into separate folders as well as having a shared modules folder that either environment can p... [06:38:49] Striker: 500 error on toolsadmin after successfully adding a maintainer - https://phabricator.wikimedia.org/T390516 (Tamzin) NEW [06:57:01] cloud-services-team, Toolforge: bootstrap Toolforge IaC automation - https://phabricator.wikimedia.org/T390057#10692627 (aborrero) I have created a gitlab repository: https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning [06:58:53] Striker: 500 error on toolsadmin after successfully adding a maintainer - https://phabricator.wikimedia.org/T390516#10692629 (taavi) `lines=15 Traceback (most recent call last): File "/opt/lib/poetry/striker-2uZo5AhP-py3.11/lib/python3.11/site-packages/django/core/handlers/exception.py", line 34, in inner...
[07:26:57] (CR) Thiemo Kreuz (WMDE): Remove unwanted debug print statement micorservice (1 comment) [labs/tools/wdaudiolex-be] - https://gerrit.wikimedia.org/r/1132120 (https://phabricator.wikimedia.org/T386328) (owner: Juniorbesong) [07:27:53] cloud-services-team, Toolforge: bootstrap Toolforge IaC automation - https://phabricator.wikimedia.org/T390057#10692682 (aborrero) Created page https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/tofu-provisioning to host the supporting documentation [07:31:38] (CR) Thiemo Kreuz (WMDE): [C:-1] T386329 = Remove app pycache files from git (1 comment) [labs/tools/wdaudiolex-be] - https://gerrit.wikimedia.org/r/1132165 (owner: Bovimacoco) [07:32:52] (CR) Thiemo Kreuz (WMDE): [C:-1] "I can't tell what happened here. But what the patch does is removing the table of contents from the readme. It doesn't look like this is i" [labs/tools/wdaudiolex-be] - https://gerrit.wikimedia.org/r/1132166 (owner: Bovimacoco) [07:36:15] (CR) Thiemo Kreuz (WMDE): [C:-1] "I think what you wanted to do is to add this to the previous patch but not upload it as a new, separate patch. Unfortunately the workflow " [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1132186 (owner: Ennyfav) [08:09:29] Toolforge (Quota-requests): Increase RAM quota for mbh tool - https://phabricator.wikimedia.org/T389733#10692739 (dcaro) >>! In T389733#10686326, @MBH wrote: > If my job reqiures high RAM limit in jobs file, but actually uses less RAM, will it work? Or process manager can run my job only if it can alloca... [08:33:34] Toolforge (Quota-requests): Increase RAM quota for mbh tool - https://phabricator.wikimedia.org/T389733#10692828 (dcaro) >>! In T389733#10686326, @MBH wrote: > If my job reqiures high RAM limit in jobs file, but actually uses less RAM, will it work? Or process manager can run my job only if it can alloca...
[08:54:12] cloud-services-team, Cloud-VPS: openstack: fix missing prometheus metrics - https://phabricator.wikimedia.org/T373878#10692930 (fnegri) Open→Resolved The missing metric magically reappeared from 2025-03-27: {F58951294} [08:59:33] !log fnegri@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-74 [09:04:56] !log fnegri@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-74 [09:16:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-74 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [10:03:45] (PS1) Creative Gurus: added a design function to get lexeme forms- T388192 [labs/tools/wdaudiolex-be] - https://gerrit.wikimedia.org/r/1132567 [10:51:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-76 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [11:00:54] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 19): Upgrade "toolsbeta" cluster to k8s 1.29 - https://phabricator.wikimedia.org/T390212#10693174 (fnegri) Open→In progress p:Triage→High [11:02:44] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 19): Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212#10693180 (fnegri) [11:03:57] !log fnegri@cloudcumin1001 
toolsbeta START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster toolsbeta upgrade from 1.28.14 to 1.29.15 (T390212) [11:03:57] !log fnegri@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=99) for cluster toolsbeta upgrade from 1.28.14 to 1.29.15 (T390212) [11:04:01] T390212: Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212 [11:09:37] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster toolsbeta upgrade from 1.28.14 to 1.29.15 (T390212) [11:09:42] T390212: Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212 [11:10:06] !log fnegri@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster toolsbeta upgrade from 1.28.14 to 1.29.15 (T390212) [11:10:45] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 19): Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212#10693219 (fnegri) [11:13:46] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-control-10 from 1.28.14 to 1.29.15 (T390212) [11:21:02] (PS1) Bovimacoco: Removing the duplicate app.__pycache__ from gitignore leaving only 1 [labs/tools/wdaudiolex-be] - https://gerrit.wikimedia.org/r/1132593 [11:36:41] FIRING: CloudVPSDesignateLeaks: Detected 6 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [11:42:20] !log fnegri@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-control-10 from 1.28.14 to 1.29.15 (T390212) [11:42:24] T390212: Upgrade "toolsbeta" cluster to k8s 1.29.15 - 
https://phabricator.wikimedia.org/T390212 [11:42:56] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-control-11 from 1.28.14 to 1.29.15 (T390212) [11:42:57] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 19): Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212#10693291 (fnegri) [11:51:35] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 19): Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212#10693317 (fnegri) [11:53:09] !log fnegri@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-control-11 from 1.28.14 to 1.29.15 (T390212) [11:53:14] T390212: Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212 [11:53:25] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-control-12 from 1.28.14 to 1.29.15 (T390212) [11:58:32] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-76 [12:03:54] !log root@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-76 [12:08:54] !log fnegri@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-control-12 from 1.28.14 to 1.29.15 (T390212) [12:08:57] T390212: Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212 [12:09:17] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 19): Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212#10693376 (fnegri) [12:10:12] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 19): Upgrade "toolsbeta" cluster to k8s 1.29.15 - 
https://phabricator.wikimedia.org/T390212#10693389 (fnegri) [12:21:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [12:22:26] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/13 [12:36:33] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess [12:42:51] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72 [12:46:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [12:47:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - 
https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [12:48:14] !log root@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72 [12:52:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [12:53:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [12:58:33] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess [13:02:54] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-12 from 1.28.14 to 1.29.15 (T390212) [13:02:58] T390212: Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212 [13:03:53] !log fnegri@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade 
(exit_code=0) for node toolsbeta-test-k8s-worker-12 from 1.28.14 to 1.29.15 (T390212) [13:04:01] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-13 from 1.28.14 to 1.29.15 (T390212) [13:05:11] !log fnegri@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-worker-13 from 1.28.14 to 1.29.15 (T390212) [13:05:37] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-10 from 1.28.14 to 1.29.15 (T390212) [13:06:39] !log fnegri@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-worker-nfs-10 from 1.28.14 to 1.29.15 (T390212) [13:07:02] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-5.toolsbeta.eqiad1.wikimedia.cloud from 1.28.14 to 1.29.15 (T390212) [13:07:03] !log fnegri@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node toolsbeta-test-k8s-worker-nfs-5.toolsbeta.eqiad1.wikimedia.cloud from 1.28.14 to 1.29.15 (T390212) [13:07:04] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-7.toolsbeta.eqiad1.wikimedia.cloud from 1.28.14 to 1.29.15 (T390212) [13:07:05] !log fnegri@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node toolsbeta-test-k8s-worker-nfs-7.toolsbeta.eqiad1.wikimedia.cloud from 1.28.14 to 1.29.15 (T390212) [13:07:58] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-5.toolsbeta.eqiad1.wikimedia.cloud from 1.28.14 to 1.29.15 (T390212) [13:07:59] !log fnegri@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade 
(exit_code=99) for node toolsbeta-test-k8s-worker-nfs-5.toolsbeta.eqiad1.wikimedia.cloud from 1.28.14 to 1.29.15 (T390212) [13:08:00] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-7.toolsbeta.eqiad1.wikimedia.cloud from 1.28.14 to 1.29.15 (T390212) [13:08:01] !log fnegri@cloudcumin1001 toolsbeta END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=97) for node toolsbeta-test-k8s-worker-nfs-7.toolsbeta.eqiad1.wikimedia.cloud from 1.28.14 to 1.29.15 (T390212) [13:08:02] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-8.toolsbeta.eqiad1.wikimedia.cloud from 1.28.14 to 1.29.15 (T390212) [13:08:02] T390212: Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212 [13:08:02] !log fnegri@cloudcumin1001 toolsbeta END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=97) for node toolsbeta-test-k8s-worker-nfs-8.toolsbeta.eqiad1.wikimedia.cloud from 1.28.14 to 1.29.15 (T390212) [13:08:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [13:10:14] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-5.toolsbeta.eqiad1.wikimedia.cloud from 1.28.14 to 1.29.15 (T390212) [13:10:16] !log fnegri@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node toolsbeta-test-k8s-worker-nfs-5.toolsbeta.eqiad1.wikimedia.cloud from 1.28.14 to 1.29.15 (T390212) [13:10:17] !log 
fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-7.toolsbeta.eqiad1.wikimedia.cloud from 1.28.14 to 1.29.15 (T390212) [13:10:17] !log fnegri@cloudcumin1001 toolsbeta END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=97) for node toolsbeta-test-k8s-worker-nfs-7.toolsbeta.eqiad1.wikimedia.cloud from 1.28.14 to 1.29.15 (T390212) [13:10:43] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-5.toolsbeta.eqiad1.wikimedia.cloud from 1.28.14 to 1.29.15 (T390212) [13:10:45] !log fnegri@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node toolsbeta-test-k8s-worker-nfs-5.toolsbeta.eqiad1.wikimedia.cloud from 1.28.14 to 1.29.15 (T390212) [13:11:23] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-5 from 1.28.14 to 1.29.15 (T390212) [13:12:23] !log fnegri@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-worker-nfs-5 from 1.28.14 to 1.29.15 (T390212) [13:12:24] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-7 from 1.28.14 to 1.29.15 (T390212) [13:13:24] !log fnegri@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-worker-nfs-7 from 1.28.14 to 1.29.15 (T390212) [13:13:25] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-8 from 1.28.14 to 1.29.15 (T390212) [13:13:28] T390212: Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212 [13:14:24] !log fnegri@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node 
toolsbeta-test-k8s-worker-nfs-8 from 1.28.14 to 1.29.15 (T390212) [13:14:25] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-9 from 1.28.14 to 1.29.15 (T390212) [13:18:26] RESOLVED: CloudVPSDesignateLeaks: Detected 6 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [13:20:15] !log fnegri@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node toolsbeta-test-k8s-worker-nfs-9 from 1.28.14 to 1.29.15 (T390212) [13:20:19] T390212: Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212 [13:24:42] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-9 from 1.28.14 to 1.29.15 (T390212) [13:25:26] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 19): Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212#10693609 (fnegri) [13:30:20] !log fnegri@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node toolsbeta-test-k8s-worker-nfs-9 from 1.28.14 to 1.29.15 (T390212) [13:30:24] T390212: Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212 [13:31:28] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-9 from 1.28.14 to 1.29.15 (T390212) [13:36:43] !log fnegri@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-worker-nfs-9 from 1.28.14 to 1.29.15 (T390212) [13:36:47] T390212: Upgrade "toolsbeta" cluster to k8s 1.29.15 - 
https://phabricator.wikimedia.org/T390212 [13:40:26] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 19): Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212#10693684 (fnegri) [13:57:29] (PS1) Jelto: ceph: add gitlab dummy credentials [labs/private] - https://gerrit.wikimedia.org/r/1132643 (https://phabricator.wikimedia.org/T378922) [13:59:42] (CR) Arnaudb: [C:+1] ceph: add gitlab dummy credentials [labs/private] - https://gerrit.wikimedia.org/r/1132643 (https://phabricator.wikimedia.org/T378922) (owner: Jelto) [14:00:00] cloud-services-team, Horizon, Cloud-Services-Origin-User, Upstream: Horizon: network topology panel ignores user policy, suggests deleting networks and instances - https://phabricator.wikimedia.org/T389965#10693960 (Andrew) p:Triage→Low Waiting on Horizon upgrade in eqiad and upstream review [14:01:53] cloud-services-team, Toolforge, Tools: Flickr blocking image requests from Toolforge k8s, breaking multiple tools - https://phabricator.wikimedia.org/T384468#10693982 (Andrew) Sent: > Reports from my users are various. One, at least, is still getting throttled, see log below. > > Can you tell me... [14:02:15] cloud-services-team, Toolforge, Tools: Flickr blocking image requests from Toolforge k8s, breaking multiple tools - https://phabricator.wikimedia.org/T384468#10693991 (Andrew) I'm not sure what that last response means but this feels like a dead end to me. I'm open to suggestions! [14:03:35] cloud-services-team, Toolforge: [harbor] Update HarborDown runbook with the incident debugging details - https://phabricator.wikimedia.org/T354739#10694005 (Andrew) a:Andrew→dcaro I'm not sure why I assigned this to myself :/ David, want to have a go, or close?
[14:11:30] cloud-services-team, Cloud-VPS: Volumes stuck in "Reserved" state - https://phabricator.wikimedia.org/T322448#10694036 (Andrew) Open→Resolved //...not quite a year later...// I got these volumes unstuck with ` root@cloudcontrol1005:~# openstack volume attachment list --os-volume-api-versi... [14:13:05] cloud-services-team, Cloud-VPS: 'backy2 cleanup' fails on cloudbackup1004 - https://phabricator.wikimedia.org/T381548#10694038 (Andrew) Open→Resolved [14:43:07] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-ingress-10 from 1.28.14 to 1.29.15 (T390212) [14:43:11] T390212: Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212 [14:44:01] !log fnegri@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-ingress-10 from 1.28.14 to 1.29.15 (T390212) [14:45:20] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-ingress-11 from 1.28.14 to 1.29.15 (T390212) [14:46:18] !log fnegri@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-ingress-11 from 1.28.14 to 1.29.15 (T390212) [14:49:01] !log fnegri@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-ingress-9 from 1.28.14 to 1.29.15 (T390212) [14:49:05] T390212: Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212 [14:49:59] !log fnegri@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-ingress-9 from 1.28.14 to 1.29.15 (T390212) [14:51:03] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 19): Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212#10694186 (fnegri) [14:55:10] 
cloud-services-team, Toolforge: [toolforge] increase worker sizes in tools - https://phabricator.wikimedia.org/T390228#10694201 (aborrero) [15:03:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [15:53:28] !log dcaro@cloudcumin1001 admin START - Cookbook wmcs.ceph.set_cluster_in_maintenance [15:53:32] !log dcaro@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.set_cluster_in_maintenance (exit_code=0) [15:53:39] (PS1) Volans: wmcs.common: update wrap_with_sudo_icinga [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1132676 [15:54:34] !log dcaro@cloudcumin1001 admin START - Cookbook wmcs.ceph.unset_cluster_maintenance [15:54:34] !log dcaro@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.unset_cluster_maintenance (exit_code=0) [15:54:46] !log dcaro@cloudcumin1001 admin START - Cookbook wmcs.ceph.unset_cluster_maintenance [15:54:46] !log dcaro@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.unset_cluster_maintenance (exit_code=0) [15:55:30] (CR) Volans: "CI is passing locally. @David if you could give it a try with spicerack v10.0.0 (available now in pypi) that would be great to confirm if " [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1132676 (owner: Volans) [15:57:58] Striker: 500 error on toolsadmin after successfully adding a maintainer - https://phabricator.wikimedia.org/T390516#10694486 (bd808) I just tried adding the Theleekycauldron user to the tool and things worked as expected. * https://toolsadmin.wikimedia.org/tools/id/n-ninety-five * https://ldap.toolforge.or...
[16:21:51] (CR) David Caro: "Did not test it, though LGTM" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1132676 (owner: Volans) [16:27:58] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 19): Upgrade "tools" cluster to k8s 1.29 - https://phabricator.wikimedia.org/T390214#10694681 (dcaro) p:Triage→High [16:28:03] Toolforge (Toolforge iteration 19): [jobs-api] Split the core layer and create the core models - https://phabricator.wikimedia.org/T390135#10694692 (dcaro) p:Triage→High [16:28:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [16:28:24] Toolforge (Toolforge iteration 19): [jobs-api] Split the core layer and create the core models - https://phabricator.wikimedia.org/T390135#10694695 (dcaro) a:dcaro→Raymond_Ndibe [16:29:01] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 19), Epic: [KR] WE6.3 Introduce a sustainability scoring system for the Toolforge platform - https://phabricator.wikimedia.org/T368600#10694697 (dcaro) [16:30:03] Toolforge (Toolforge iteration 19), Patch-For-Review: [jobs-api] Split the API, core, and storage and runtime models - https://phabricator.wikimedia.org/T359808#10694698 (dcaro) a:dcaro→Raymond_Ndibe [16:33:54] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 19): Upgrade "toolsbeta" cluster to k8s 1.29.15 - https://phabricator.wikimedia.org/T390212#10694733 (fnegri) [16:43:50] cloud-services-team, Cloud-VPS: openstack: fix missing prometheus metrics - https://phabricator.wikimedia.org/T373878#10694769 (Andrew) 
The magic was https://gerrit.wikimedia.org/r/c/operations/puppet/+/1131806 [16:50:32] wikitech.wikimedia.org, serviceops-radar, SRE, Patch-For-Review, SRE-Unowned: Redesign wikitech-static - https://phabricator.wikimedia.org/T376400#10694816 (Volans) @Andrew thanks for setting this up. I did a quick tour and found some issues: 1. The first page is very very slow to load, I th... [16:51:28] cloud-services-team, Toolforge, Tools: Flickr blocking image requests from Toolforge k8s, breaking multiple tools - https://phabricator.wikimedia.org/T384468#10694824 (TheDJ) >>! In T384468#10515894, @Don-vip wrote: > Is there a specific user-agent to use? I'm not using a particular one. Anything th... [16:52:48] cloud-services-team, Toolforge: [harbor] Update HarborDown runbook with the incident debugging details - https://phabricator.wikimedia.org/T354739#10694839 (dcaro) This one was quite tricky, it was in the end because the cleanup processes of harbor were not working correctly, @Raymond_Ndibe worked a lot...
[16:56:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [16:59:22] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Ceph, DC-Ops, and 2 others: [cloudceph] test the new DELL hard drives throughput - https://phabricator.wikimedia.org/T390134#10694888 (Andrew) a:dcaro→Andrew [17:02:25] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Ceph, DC-Ops, and 2 others: [cloudceph] test the new DELL hard drives throughput - https://phabricator.wikimedia.org/T390134#10694917 (Andrew) During dcaro's PTO he wants me to get the host back up and confirm that the drive appears to the OS. H... [17:24:50] (CR) Volans: "reply inline" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1132676 (owner: Volans) [17:35:30] VPS-project-Wikistats, affects-Miraheze: wikistats - import miraheze timer/service failed - https://phabricator.wikimedia.org/T390593 (Dzahn) NEW [17:36:13] VPS-project-Wikistats, affects-Miraheze: wikistats - import miraheze timer/service failed - https://phabricator.wikimedia.org/T390593#10695070 (Dzahn) I am impressed by the "affects-Miraheze" automatic herald rule. [17:47:41] cloud-services-team, Toolforge: [toolforge] increase worker sizes in tools - https://phabricator.wikimedia.org/T390228#10695112 (Andrew) When making new flavors I suggest increasing the CPU:RAM ratio; our hypvervisors have a surplus of ram these days. Right now most workers have 2 GB of ram allocated pe...
[17:52:29] cloud-services-team, Toolforge: [toolforge] increase worker sizes in tools - https://phabricator.wikimedia.org/T390228#10695131 (taavi) Yeah, looking at `kubectl sudo top node` the average worker RAM utilization is much higher than CPU. [17:54:59] (CR) David Caro: wmcs.common: update wrap_with_sudo_icinga (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1132676 (owner: Volans) [17:57:49] VPS-project-Wikistats, affects-Miraheze: wikistats - import miraheze timer/service failed - https://phabricator.wikimedia.org/T390593#10695142 (Paladox) This is fixed now. You can re-run your script. [18:09:59] VPS-project-Wikistats, affects-Miraheze: wikistats - import miraheze timer/service failed - https://phabricator.wikimedia.org/T390593#10695170 (Paladox) FYI the api is changing for WikiDiscover - https://github.com/miraheze/WikiDiscover/pull/88 [18:51:18] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [18:55:39] (open) rutsavi09: Rut [toolforge-repos/miss-search] - https://gitlab.wikimedia.org/toolforge-repos/miss-search/-/merge_requests/1 [18:56:38] (approved) naorleizer: Rut [toolforge-repos/miss-search] - https://gitlab.wikimedia.org/toolforge-repos/miss-search/-/merge_requests/1 (owner: rutsavi09) [18:56:47] (merge) naorleizer: Rut [toolforge-repos/miss-search] - https://gitlab.wikimedia.org/toolforge-repos/miss-search/-/merge_requests/1 (owner: rutsavi09) [18:57:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has at least 12 procs in D state, and may be having NFS/IO issues - 
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [18:57:48] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [18:59:48] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [19:44:13] Tool-suggestbotbn: Unable to login - https://phabricator.wikimedia.org/T390614 (ShohagS) NEW [19:52:12] cloud-services-team, Tool-suggestbotbn, Toolforge: Ssh login to `login.toolforge.org` failing for uid=shohag - https://phabricator.wikimedia.org/T390614#10695581 (bd808) [19:57:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [19:59:36] cloud-services-team, Tool-suggestbotbn, Toolforge: Ssh login to `login.toolforge.org` 
failing for uid=shohag - https://phabricator.wikimedia.org/T390614#10695593 (taavi) Can you run ssh with the `-v` flag included and paste the output here? [19:59:51] cloud-services-team, Tool-suggestbotbn, Toolforge: Ssh login to `login.toolforge.org` failing for uid=shohag - https://phabricator.wikimedia.org/T390614#10695595 (bd808) sshd is apparently crashing: `lang=shell-session # grep 2055577 auth.log 2025-03-31T19:45:48.802991+00:00 tools-bastion-13 sshd[205... [20:10:57] VPS-project-Wikistats, affects-Miraheze: wikistats - import miraheze timer/service failed - https://phabricator.wikimedia.org/T390593#10695656 (Dzahn) Thank you! The service runs again. regarding the API change: Maybe it could be reconsidered if renaming `siteprop` to `prop` (not sure it says why) is w... [20:53:19] cloud-services-team, Toolforge, Wikimedia-Hackathon-2025: [Session] Introducing and exploring Toolforge UI with prospective users - https://phabricator.wikimedia.org/T383149#10695828 (debt) [20:53:52] cloud-services-team, Toolforge, Wikimedia-Hackathon-2025: [Session] Introducing and exploring Toolforge UI with prospective users - https://phabricator.wikimedia.org/T383149#10695846 (debt) This sounds like a great session! We'll be publishing the schedule soon and will ping you here to get you sched... [21:41:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [22:00:51] wikitech.wikimedia.org: wikitech: Current deployments sidebar link uses a new line on new Vector - https://phabricator.wikimedia.org/T389923#10696127 (Krinkle) Fixed. 
https://wikitech.wikimedia.org/w/index.php?title=MediaWiki:Gadget-site-deploycal.css&oldid=2288603 {F58954752 height=300} [22:00:57] wikitech.wikimedia.org: wikitech: Current deployments sidebar link uses a new line on new Vector - https://phabricator.wikimedia.org/T389923#10696129 (Krinkle) Open→Resolved [22:23:57] (PS1) Krinkle: Main: Rename "isIp" to "isSingleIp" for clarity [labs/tools/guc] - https://gerrit.wikimedia.org/r/1132774 [22:45:37] cloud-services-team, Tool-suggestbotbn, Toolforge: Ssh login to `login.toolforge.org` failing for uid=shohag - https://phabricator.wikimedia.org/T390614#10696272 (bd808) The `pam_env(sshd:session): deprecated reading of user environment enabled` lines in the auth.log are unrelated noise from https://... [23:11:11] cloud-services-team, Tool-suggestbotbn, Toolforge: Ssh login to `login.toolforge.org` failing for uid=shohag - https://phabricator.wikimedia.org/T390614#10696391 (bd808) A comment on https://serverfault.com/questions/813542/ssh-fork-of-unprivileged-child-failed-on-connection about `fatal: fork of unp... [23:46:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-48 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses