[00:10:22] (update) raymond-ndibe: Draft: [maintain-harbor] add tests and configurations for new maintain-harbor jobs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[00:18:05] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-36 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[00:19:24] (update) raymond-ndibe: Draft: [maintain-harbor] add tests and configurations for new maintain-harbor jobs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[00:21:27] (update) raymond-ndibe: Draft: [maintain-harbor] add tests and configurations for new maintain-harbor jobs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[00:26:47] Cloud-VPS (Quota-requests), XTools: Request increased quota for xtools Cloud VPS project - https://phabricator.wikimedia.org/T400853 (MusikAnimal) NEW
[00:28:04] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-36 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[00:36:55] (update) raymond-ndibe: Draft:
[maintain-harbor] add tests and configurations for new maintain-harbor jobs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[00:38:44] (update) raymond-ndibe: Draft: [maintain-harbor] add tests and configurations for new maintain-harbor jobs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[00:49:30] (update) raymond-ndibe: Draft: [maintain-harbor] add tests and configurations for new maintain-harbor jobs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[01:01:33] (update) raymond-ndibe: Draft: [maintain-harbor] add tests and configurations for new maintain-harbor jobs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[01:18:05] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-36 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[01:18:16] (update) raymond-ndibe: Draft: [maintain-harbor] add tests and configurations for new maintain-harbor jobs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[01:23:53] (update) raymond-ndibe: Draft: [maintain-harbor] add tests and configurations for new
maintain-harbor jobs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[01:27:33] (update) raymond-ndibe: Draft: [maintain-harbor] add tests and configurations for new maintain-harbor jobs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[01:28:04] RESOLVED: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-36 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProce
[01:32:30] (update) raymond-ndibe: Draft: [maintain-harbor] add tests and configurations for new maintain-harbor jobs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[01:38:13] (update) raymond-ndibe: Draft: [maintain-harbor] add tests and configurations for new maintain-harbor jobs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[01:41:32] (update) raymond-ndibe: Draft: [maintain-harbor] add tests and configurations for new maintain-harbor jobs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[01:52:06] (update) raymond-ndibe: Draft: [maintain-harbor] add tests and configurations for new maintain-harbor jobs
[repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[01:55:47] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11048799 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host clouddb1025.eqiad.wmnet with OS bookworm
[02:03:19] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11048802 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host clouddb1024.eqiad.wmnet with OS bookworm
[02:23:28] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11048849 (VRiley-WMF)
[02:26:20] (update) raymond-ndibe: Draft: [maintain-harbor] add tests and configurations for new maintain-harbor jobs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[02:26:25] (update) raymond-ndibe: [maintain-harbor] add tests and configurations for new maintain-harbor jobs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[02:26:36] (update) raymond-ndibe: [maintain-harbor] add tests and configurations for new maintain-harbor jobs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[02:29:07] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install clouddb102[2-5] -
https://phabricator.wikimedia.org/T393733#11048852 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host clouddb1025.eqiad.wmnet with OS bookworm completed: - c...
[02:29:07] (update) raymond-ndibe: [maintain-harbor] add tests and configurations for new maintain-harbor jobs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[02:34:03] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11048866 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host clouddb1024.eqiad.wmnet with OS bookworm completed: - c...
[02:34:41] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11048867 (VRiley-WMF) Open→Resolved This has been completed
[03:10:41] (update) raymond-ndibe: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/52 (https://phabricator.wikimedia.org/T400616)
[03:10:42] (approved) raymond-ndibe: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/52 (https://phabricator.wikimedia.org/T400616)
[03:10:49] (merge) raymond-ndibe: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/52 (https://phabricator.wikimedia.org/T400616)
[03:11:15] (update) raymond-ndibe: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/113
(https://phabricator.wikimedia.org/T400616)
[03:11:20] (update) raymond-ndibe: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/113 (https://phabricator.wikimedia.org/T400616)
[03:18:22] (update) raymond-ndibe: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/113 (https://phabricator.wikimedia.org/T400616)
[03:18:37] (approved) raymond-ndibe: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/113 (https://phabricator.wikimedia.org/T400616)
[03:19:42] (merge) raymond-ndibe: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/113 (https://phabricator.wikimedia.org/T400616)
[03:23:59] (update) raymond-ndibe: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/toolforge-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-cli/-/merge_requests/46 (https://phabricator.wikimedia.org/T400616)
[03:25:43] (update) raymond-ndibe: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/toolforge-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-cli/-/merge_requests/46 (https://phabricator.wikimedia.org/T400616)
[03:27:49] (update) raymond-ndibe: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/85 (https://phabricator.wikimedia.org/T400616)
[03:28:12] (update) raymond-ndibe: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/85
(https://phabricator.wikimedia.org/T400616)
[03:28:13] (approved) raymond-ndibe: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/85 (https://phabricator.wikimedia.org/T400616)
[03:29:03] (merge) raymond-ndibe: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/85 (https://phabricator.wikimedia.org/T400616)
[03:38:01] (open) raymond-ndibe: d/changelog: bump to 0.0.14 [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/86 (https://phabricator.wikimedia.org/T363544 https://phabricator.wikimedia.org/T400616)
[03:38:27] (open) raymond-ndibe: d/changelog: bump to 0.0.22 [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/114
[03:39:08] (open) raymond-ndibe: d/changelog: bump to 0.0.13 [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/53
[03:39:54] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-cli
[03:41:03] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli
[03:41:20] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-cli
[03:41:51] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli
[03:42:24] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli
[03:42:33] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component builds-cli
[03:43:04] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli
[03:43:30] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-cli
[03:43:48] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-cli
[03:44:16] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-cli
[03:44:32] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-cli
[03:44:34] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli
[03:45:32] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-cli
[03:45:49] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-cli
[03:45:53] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli
[03:46:25] !log raymond-ndibe@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-cli
[03:47:09] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-cli
[03:47:25] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-cli
[03:47:49] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli
[03:48:10] !log raymond-ndibe@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-cli
[03:48:15] !log
raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component builds-cli
[03:48:33] !log raymond-ndibe@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-cli
[03:49:08] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component builds-cli
[03:49:27] !log raymond-ndibe@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-cli
[03:56:31] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-cli
[03:56:49] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-cli
[04:03:39] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli
[04:04:29] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli
[04:04:45] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component builds-cli
[04:05:03] !log raymond-ndibe@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-cli
[04:05:30] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component builds-cli
[04:07:10] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli
[04:07:37] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-cli
[04:07:54] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-cli
[04:09:59] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook
wmcs.toolforge.component.deploy for component components-cli
[04:13:52] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
[04:14:19] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-cli
[04:17:58] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
[04:26:32] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-cli
[04:30:03] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
[04:30:17] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-cli
[04:33:49] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
[04:34:00] Toolforge (Toolforge iteration 22), Patch-For-Review: [toolforge-deploy.tests] account for warning messages printed to stderr - https://phabricator.wikimedia.org/T400390#11048923 (Raymond_Ndibe) Open→In progress
[04:34:01] Toolforge (Toolforge iteration 22), Patch-For-Review: [jobs-cli,builds-cli,toolforge-cli,components-cli,envvars-cli] move the packaging scripts to bookworm - https://phabricator.wikimedia.org/T400616#11048925 (Raymond_Ndibe) Open→In progress
[06:31:29] FIRING: NfsAlmostFull: The NFS drive is over 85% capacity (currently 87.26%) at host paws-nfs-1 in project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DNfsAlmostFull
[06:39:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-69 has at least 12 procs in D state, and may be having NFS/IO issues -
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[06:45:05] (open) raymond-ndibe: [config] allow reading from stdin [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/54 (https://phabricator.wikimedia.org/T398424)
[06:45:08] (update) raymond-ndibe: [config] allow reading from stdin [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/54 (https://phabricator.wikimedia.org/T398424)
[06:48:01] (update) raymond-ndibe: [config] allow reading from stdin [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/54 (https://phabricator.wikimedia.org/T398424)
[06:48:40] (update) raymond-ndibe: [config] allow reading from stdin [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/54 (https://phabricator.wikimedia.org/T398424)
[06:49:09] Toolforge (Toolforge iteration 22), Patch-For-Review: [components-cli] Allow reading tool configuration from stdin - https://phabricator.wikimedia.org/T398424#11048995 (Raymond_Ndibe) a:dcaro→Raymond_Ndibe
[06:49:22] Toolforge (Toolforge iteration 22), Patch-For-Review: [components-cli] Allow reading tool configuration from stdin - https://phabricator.wikimedia.org/T398424#11048997 (Raymond_Ndibe) Open→In progress
[07:15:57] Tool-itwiki: If the section "Voci correlate" becomes empty, remove that section - https://phabricator.wikimedia.org/T338084#11049020 (valerio.bozzolan) a:valerio.bozzolan→None Uhm. Let's flag as open. I've not done anything in 2 years here.
[07:29:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-69 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[08:12:05] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-69 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[08:16:41] (update) dcaro: kyverno: upgrade to 3.3.9 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/889
[08:17:43] (update) dcaro: kyverno: upgrade to 3.3.9 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/889
[08:57:04] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-69 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:00:27] (open) eliza189: Eliza new labels (fix for sql) [toolforge-repos/miss-search] (update-cycle-toolforge-testing) - https://gitlab.wikimedia.org/toolforge-repos/miss-search/-/merge_requests/10
[09:07:51] (update) dcaro: config: allow passing source_url
[repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/95
[09:08:46] (merge) eliza189: Eliza new labels (fix for sql) [toolforge-repos/miss-search] (update-cycle-toolforge-testing) - https://gitlab.wikimedia.org/toolforge-repos/miss-search/-/merge_requests/10
[09:10:21] (update) dcaro: config: allow passing source_url [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/95
[09:14:43] (update) dcaro: config: add use_latest_versions to the source build [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/72 (https://phabricator.wikimedia.org/T380127)
[09:16:22] (update) dcaro: config: add use_latest_versions to the source build [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/72 (https://phabricator.wikimedia.org/T380127)
[10:08:14] cloud-services-team, Cloud-VPS: Rebuild all cloud-vps acme-chief hosts - https://phabricator.wikimedia.org/T400163#11049274 (taavi) a:taavi
[10:17:56] cloud-services-team, Cloud-VPS: Rebuild all cloud-vps acme-chief hosts - https://phabricator.wikimedia.org/T400163#11049288 (taavi) Open→Resolved I applied the workaround to all the acme-chief instances.
[10:19:24] cloud-services-team, Cloud-VPS: cloudinfra mx certificate fails to renew - https://phabricator.wikimedia.org/T400873 (taavi) NEW p:Triage→High
[10:24:53] cloud-services-team, Cloud-VPS, Patch-For-Review: cloudinfra mx certificate fails to renew - https://phabricator.wikimedia.org/T400873#11049317 (taavi) With the above applied: ` RuntimeError: Did not find zone for domain '_acme-challenge.mx-out.wmflabs.org.' ` The list of domain names for this certif...
[10:25:48] Tool-Global-user-contributions: Global contributions show wrong namespace in page names - https://phabricator.wikimedia.org/T400874 (Ennomien) NEW
[10:28:39] (update) dcaro: config: allow passing source_url [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/95
[10:36:34] cloud-services-team, Cloud-VPS, Patch-For-Review: cloudinfra mx certificate fails to renew - https://phabricator.wikimedia.org/T400873#11049353 (taavi) Open→Resolved Removed domains that do not resolve from that list of SNIs.
[10:36:47] PROBLEM - Disk space on cloudbackup1004 is CRITICAL: DISK CRITICAL - free space: /srv 634645MiB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=cloudbackup1004&var-datasource=eqiad+prometheus/ops
[10:50:01] Tool-Global-user-contributions: Global contributions show wrong namespace in page names - https://phabricator.wikimedia.org/T400874#11049387 (Johannnes89) →Duplicate dup:T380903
[11:24:23] (update) dcaro: config: allow passing source_url [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/95
[11:32:07] (update) dcaro: config: allow passing source_url [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/95
[11:40:44] cloud-services-team, Cloud-VPS: Update OpenStack policy files - https://phabricator.wikimedia.org/T247795#11049625 (taavi) @andrew anything left to do here?
[11:41:28] Tool-openstack-browser: openstack-browser: add multi-region support - https://phabricator.wikimedia.org/T242975#11049627 (taavi)
[11:42:48] cloud-services-team, Cloud-VPS: Make a monitoring solution for acme-chief issued certs inside Cloud VPS - https://phabricator.wikimedia.org/T262292#11049630 (taavi) Open→Resolved a:taavi Metricsinfra allows this, and has already spotted issues like {T400163}.
[11:44:01] cloud-services-team, Cloud-VPS: Redesign for wmcs custom puppet settings - https://phabricator.wikimedia.org/T235708#11049636 (taavi) Stalled→Resolved I think I'm happy with the current solution.
[11:44:22] Cloud-Services, cloud-services-team (Kanban), SRE, Patch-For-Review, Puppet: Puppet tab in Horizon unusably slow - https://phabricator.wikimedia.org/T149589#11049640 (taavi) The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wik...
[11:45:09] cloud-services-team (Kanban), Cloud-VPS, SRE, Patch-For-Review, Puppet: Puppet tab in Horizon unusably slow - https://phabricator.wikimedia.org/T149589#11049643 (taavi)
[11:45:50] cloud-services-team, Cloud-VPS: Provide a way for OpenStack admins to manage projects they don't belong to in Horizon - https://phabricator.wikimedia.org/T196200#11049645 (taavi) Open→Resolved
[11:46:40] cloud-services-team, Cloud-VPS, Sustainability (Incident Followup): cloudvirts: ensure we're running the latest raid controller firmware - https://phabricator.wikimedia.org/T216733#11049652 (taavi) Stalled→Resolved Assuming this is no longer relevant with none of the hardware mentioned in...
[11:47:07] cloud-services-team, Cloud-VPS: Create nova service account for openstack - https://phabricator.wikimedia.org/T167467#11049658 (taavi) @andrew I think this is done already?
[11:49:28] cloud-services-team, Cloud-VPS: Some systemd services appear to be broken on all VMs - https://phabricator.wikimedia.org/T287309#11049677 (taavi) Open→Resolved
[11:51:46] cloud-services-team, Cloud-VPS, Patch-For-Review, patch-welcome, Python3-Porting: Upgrade various Cloud VPS Python 2 scripts to Python 3 - https://phabricator.wikimedia.org/T218426#11049696 (taavi)
[11:54:25] cloud-services-team, Cloud-VPS, patch-welcome: Print SSH host key fingerprints to the log during boot - https://phabricator.wikimedia.org/T340828#11049725 (taavi) p:Triage→Low
[11:59:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-44 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[11:59:13] cloud-services-team, Cloud-VPS, Epic, Python3-Porting: WMCS: migrate python2 scripts to python3 - https://phabricator.wikimedia.org/T229920#11049734 (taavi) →Duplicate dup:T218426
[11:59:24] cloud-services-team, Cloud-VPS, Patch-For-Review, patch-welcome, Python3-Porting: Upgrade various Cloud VPS Python 2 scripts to Python 3 - https://phabricator.wikimedia.org/T218426#11049736 (taavi)
[12:00:54] cloud-services-team, Cloud-VPS: tofu-infra: consider extending to support nova hosts aggregates - https://phabricator.wikimedia.org/T380981#11049744 (taavi) I could see an argument for listing which aggregates exist in tofu-infra, but I don't see the benefit of maintaining the list of members in `mainten...
[12:03:04] cloud-services-team, Cloud-VPS: Enable use of web proxy for wikipeoplestats.org domain - https://phabricator.wikimedia.org/T390800#11049752 (taavi) Open→Declined Closing due to lack of response to T390800#10706906. Please re-open if this is still wanted.
[12:04:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-44 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[12:05:28] (close) taavi: tofu-infra: introduce gitlab CI/CD workflow [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/236 (https://phabricator.wikimedia.org/T370652) (owner: aborrero)
[12:06:27] cloud-services-team, Cloud-VPS, Epic, Patch-For-Review: tofu-infra: introduce additional gitlab-ci automation - https://phabricator.wikimedia.org/T370652#11049787 (taavi) Open→Declined I don't think we're interested in pursuing this now.
[12:09:33] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-24 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[12:12:38] cloud-services-team, Cloud-VPS, WikiWho: Enable use of web proxy for wikiwho.net domain - https://phabricator.wikimedia.org/T376637#11049830 (taavi) Coming back to this - I see `www.wikiwho.net` has been pointed to the proxy service but `wikiwho.net` has not.
Would you like me to set this up for www.... [12:13:42] 06cloud-services-team, 10Cloud-VPS, 06Infrastructure-Foundations, 10netops, 06SRE: Use vlan trunking instead of multiple physical interfaces - https://phabricator.wikimedia.org/T316114#11049843 (10taavi) 05Open→03Resolved I /think/ this is done for cloudvirts and ceph nodes are tracked separately... [12:19:33] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-2 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesse [12:21:58] 06cloud-services-team, 10Cloud-VPS, 10Toolforge: Add tracing to understand Toolforge and CloudVPS usage and dependencies - https://phabricator.wikimedia.org/T399313#11049858 (10dcaro) p:05Triage→03High [12:23:42] (03update) 10dcaro: cancel: add the missing autocomplete [repos/cloud/toolforge/components-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/51 [12:23:52] (03update) 10dcaro: cancel: add the missing autocomplete [repos/cloud/toolforge/components-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/51 [12:24:39] 10Cloud-VPS (Quota-requests), 10XTools: Request increased quota for xtools Cloud VPS project - https://phabricator.wikimedia.org/T400853#11049860 (10taavi) +1 [12:25:35] !log dcaro@cloudcumin1001 xtools START - Cookbook wmcs.openstack.quota_increase (T400853) [12:25:39] T400853: Request increased quota for xtools Cloud VPS project - https://phabricator.wikimedia.org/T400853 [12:25:42] !log dcaro@cloudcumin1001 xtools END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T400853) [12:26:36] 06cloud-services-team, 10Cloud-VPS: [cloudvirt] Enable and test jumbo 
frames to ceph osds - https://phabricator.wikimedia.org/T273792#11049886 (10taavi) →14Duplicate dup:03T273596 [12:26:44] 06cloud-services-team, 10Cloud-VPS: Investigate and enable jumbo frames in cloudvirt nodes - https://phabricator.wikimedia.org/T273596#11049888 (10taavi) [12:26:57] 10Cloud-VPS (Quota-requests), 10XTools: Request increased quota for xtools Cloud VPS project - https://phabricator.wikimedia.org/T400853#11049891 (10dcaro) 05Open→03Resolved a:03dcaro Done! enjoy :) I'll be interested to know how it goes with anubis when you have it running. [12:31:55] (03update) 10l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - 10https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/42 [12:32:12] (03open) 10l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - 10https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/7 [13:11:32] 06cloud-services-team, 10Cloud-VPS: tofu-infra: consider extending to support nova hosts aggregates - https://phabricator.wikimedia.org/T380981#11050095 (10fnegri) +1 for maintaining the list of aggregates in tofu (as they are quite static), but not the members (that are quite dynamic). [13:21:46] 06cloud-services-team, 10Cloud-VPS: Create nova service account for openstack - https://phabricator.wikimedia.org/T167467#11050134 (10Andrew) 05Open→03Resolved a:03Andrew Yep, we have 'novaservice' now. 
[13:36:47] RECOVERY - Disk space on cloudbackup1004 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=cloudbackup1004&var-datasource=eqiad+prometheus/ops [13:40:36] 06cloud-services-team, 10Toolforge (Toolforge iteration 22), 13Patch-For-Review: Support for TCP health checking - https://phabricator.wikimedia.org/T400025#11050200 (10DamianZaremba) This appears to be working as expected for my TCP: Before: ` $ kubectl get pods core-55457674d-f6px8 -ojson | jq '.spec.cont... [13:53:56] 10Toolforge (Toolforge iteration 22), 13Patch-For-Review: [jobs-cli,builds-cli,toolforge-cli,components-cli,envvars-cli] move the packaging scripts to bookworm - https://phabricator.wikimedia.org/T400616#11050258 (10dcaro) p:05Triage→03High [14:28:56] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [14:31:29] FIRING: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on toolsbeta-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [14:31:38] (03update) 10dcaro: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/toolforge-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-cli/-/merge_requests/46 (https://phabricator.wikimedia.org/T400616) (owner: 10raymond-ndibe) [14:33:41] (03update) 10dcaro: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/toolforge-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-cli/-/merge_requests/46 (https://phabricator.wikimedia.org/T400616) 
(owner: 10raymond-ndibe) [14:33:56] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [14:34:40] (03approved) 10dcaro: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/toolforge-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-cli/-/merge_requests/46 (https://phabricator.wikimedia.org/T400616) (owner: 10raymond-ndibe) [14:44:56] 06cloud-services-team, 10Cloud-VPS: Update OpenStack policy files - https://phabricator.wikimedia.org/T247795#11050403 (10Andrew) [14:48:55] 06cloud-services-team, 10Cloud-VPS: Update OpenStack policy files - https://phabricator.wikimedia.org/T247795#11050417 (10Andrew) 05Stalled→03Resolved [14:49:00] 06cloud-services-team, 10Cloud-VPS: Investigate any discrepancies between Horizon permissions and real permissions - https://phabricator.wikimedia.org/T247575#11050420 (10Andrew) [14:54:58] (03approved) 10dcaro: loki_logs: Raise user-friendly error when exceeding line limit [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/188 (https://phabricator.wikimedia.org/T400795) (owner: 10taavi) [14:55:00] (03update) 10dcaro: loki_logs: Raise user-friendly error when exceeding line limit [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/188 (https://phabricator.wikimedia.org/T400795) (owner: 10taavi) [14:58:03] (03merge) 10taavi: loki_logs: Raise user-friendly error when exceeding line limit [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/188 (https://phabricator.wikimedia.org/T400795) [14:59:51] 06cloud-services-team, 
10Toolforge (Toolforge iteration 22), 13Patch-For-Review: Support for TCP health checking - https://phabricator.wikimedia.org/T400025#11050484 (10dcaro) 05In progress→03Resolved Yay \o/, I think we can close this one then, the UDP support is being done in the other task [15:00:57] (03update) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.389-20250731145816-0cabba30 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/890 (https://phabricator.wikimedia.org/T400795) [15:01:02] (03open) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.389-20250731145816-0cabba30 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/890 (https://phabricator.wikimedia.org/T400795) [15:01:17] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api [15:07:18] 06cloud-services-team, 10Toolforge (Toolforge iteration 22), 13Patch-For-Review: [jobs-api] 400 bad request when trying to load a large number of logs from Loki - https://phabricator.wikimedia.org/T400795#11050517 (10taavi) 05In progress→03Resolved [15:11:50] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api [15:12:03] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api [15:21:27] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api [15:22:58] (03merge) 10taavi: jobs-api: bump to 0.0.389-20250731145816-0cabba30 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/890 (https://phabricator.wikimedia.org/T400795) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [15:39:49] (03open) 10taavi: jobs-api: tools: 
Add Loki URL setting [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/891 (https://phabricator.wikimedia.org/T398645) [15:40:11] (03update) 10taavi: jobs-api: tools: Add Loki URL setting [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/891 (https://phabricator.wikimedia.org/T398645) [15:43:45] 06cloud-services-team, 10Toolforge: [jobs-api] Allow querying logs for non-existent jobs - https://phabricator.wikimedia.org/T400913 (10taavi) 03NEW [15:43:52] 06cloud-services-team, 10Toolforge: [jobs-api] Allow querying logs for non-existent jobs - https://phabricator.wikimedia.org/T400913#11050820 (10taavi) [15:44:03] 06cloud-services-team, 10Toolforge (Toolforge iteration 22), 13Patch-For-Review: [jobs-api] Jobs API should query logs from Loki - https://phabricator.wikimedia.org/T398645#11050821 (10taavi) [15:51:34] 06cloud-services-team, 10Toolforge: [jobs-api] Allow querying logs for non-existent jobs - https://phabricator.wikimedia.org/T400913#11050869 (10taavi) p:05Triage→03Medium [15:51:55] 06cloud-services-team, 10Toolforge: [jobs-api] Allow querying logs for non-existent jobs - https://phabricator.wikimedia.org/T400913#11050870 (10taavi) [15:51:58] 06cloud-services-team, 10Toolforge, 07Epic: [toolforge,jobs-api,webservice,storage] Provide modern, non-NFS log solution for Toolforge tools - https://phabricator.wikimedia.org/T127367#11050871 (10taavi) [15:52:23] 06cloud-services-team, 10Toolforge: [jobs-api] Remove file logging support - https://phabricator.wikimedia.org/T400914 (10taavi) 03NEW [15:52:41] 06cloud-services-team, 10Toolforge: [jobs-api] Remove file logging support - https://phabricator.wikimedia.org/T400914#11050885 (10taavi) p:05Triage→03Medium [15:53:02] 06cloud-services-team, 10Toolforge: [jobs-api] Disable file logging by default - https://phabricator.wikimedia.org/T400915 (10taavi) 03NEW 
[15:53:17] 06cloud-services-team, 10Toolforge: [jobs-api] Disable file logging by default - https://phabricator.wikimedia.org/T400915#11050900 (10taavi) p:05Triage→03Medium [15:53:38] 06cloud-services-team, 10Toolforge: [jobs-api] Disable file logging by default - https://phabricator.wikimedia.org/T400915#11050902 (10taavi) [15:53:42] 06cloud-services-team, 10Toolforge (Toolforge iteration 22), 13Patch-For-Review: [jobs-api] Jobs API should query logs from Loki - https://phabricator.wikimedia.org/T398645#11050903 (10taavi) [15:55:01] 06cloud-services-team, 10Toolforge: [jobs-api] Support following logs from Loki - https://phabricator.wikimedia.org/T400916 (10taavi) 03NEW [15:55:10] 06cloud-services-team, 10Toolforge: [jobs-api] Support following logs from Loki - https://phabricator.wikimedia.org/T400916#11050918 (10taavi) p:05Triage→03Medium [15:55:23] 06cloud-services-team, 10Toolforge: [jobs-api] Support following logs from Loki - https://phabricator.wikimedia.org/T400916#11050920 (10taavi) [15:55:26] 06cloud-services-team, 10Toolforge (Toolforge iteration 22), 13Patch-For-Review: [jobs-api] Jobs API should query logs from Loki - https://phabricator.wikimedia.org/T398645#11050919 (10taavi) [16:00:25] 06cloud-services-team, 10Toolforge: [jobs-api] Allow customizing time to request Loki logs for - https://phabricator.wikimedia.org/T400917#11050969 (10taavi) p:05Triage→03Medium [16:01:13] 06cloud-services-team, 10Toolforge (Toolforge iteration 22), 13Patch-For-Review: [jobs-api] Jobs API should query logs from Loki - https://phabricator.wikimedia.org/T398645#11050982 (10taavi) 05In progress→03Resolved This is live and I have filed a bunch of tasks to improve the basic implementation. 
[16:02:13] (03merge) 10taavi: jobs-api: tools: Add Loki URL setting [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/891 (https://phabricator.wikimedia.org/T398645) [16:02:42] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component logging [16:12:55] 06cloud-services-team, 10Tool-quickcategories, 10Toolforge, 13Patch-For-Review: Relax restrictions on toolforge envvar names - https://phabricator.wikimedia.org/T374780#11051060 (10dcaro) Yep, I think that at this point we might want to introduce a 'storage' for configured envvars, instead of relying on th... [16:14:21] (03update) 10dcaro: config: allow passing source_url [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/95 [16:18:11] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logging [16:22:14] (03update) 10dcaro: config: add use_latest_versions to the source build [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/72 (https://phabricator.wikimedia.org/T380127) [16:26:30] 06cloud-services-team, 10Toolforge (Toolforge iteration 22): [jobs-api] Allow querying logs for non-existent jobs - https://phabricator.wikimedia.org/T400913#11051134 (10taavi) [16:32:24] (03open) 10taavi: api: Allow querying logs for non-existent jobs [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/189 (https://phabricator.wikimedia.org/T400913) [16:36:46] (03update) 10dcaro: config: add use_latest_versions to the source build [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/72 (https://phabricator.wikimedia.org/T380127) [16:37:21] 06cloud-services-team, 10Toolforge 
(Toolforge iteration 22), 13Patch-For-Review: [jobs-api] Allow querying logs for non-existent jobs - https://phabricator.wikimedia.org/T400913#11051179 (10taavi) a:03taavi [16:50:25] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: 10raymond-ndibe) [17:22:28] (03approved) 10lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - 10https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/7 (owner: 10l10n-bot) [17:22:34] (03merge) 10lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - 10https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/7 (owner: 10l10n-bot) [17:28:02] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: 10raymond-ndibe) [18:04:51] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: 10raymond-ndibe) [18:06:57] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: 10raymond-ndibe) [18:07:28] (03update) 10dcaro: [jobs-api] split job models to oneoff, 
scheduled and continuous [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: 10raymond-ndibe) [18:07:33] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: 10raymond-ndibe) [21:01:15] 06cloud-services-team, 10Toolforge (Toolforge iteration 22), 13Patch-For-Review: Support for TCP health checking - https://phabricator.wikimedia.org/T400025#11052023 (10DamianZaremba) It did cause my UDP service to get constantly killed.. but I will just hack the probe config out of the deployment as wel... [21:10:42] 06cloud-services-team, 10Toolforge: Allow limiting exposed port access - https://phabricator.wikimedia.org/T400940 (10DamianZaremba) 03NEW [22:49:33] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-2 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesse [23:09:33] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-2 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesse [23:27:25] 06cloud-services-team, 10Toolforge: Job not restarting 
despite liveness probe failures - https://phabricator.wikimedia.org/T400957 (10Sakretsu) 03NEW