[00:08:56] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:11:12] cloud-services-team, Cloud-VPS: grafana.wmcloud.org does not show trixie instances - https://phabricator.wikimedia.org/T401876 (JJMC89) NEW
[01:08:01] (update) raymond-ndibe: api: allow protocol to be specified for ports [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/186 (owner: dcaro)
[01:08:05] (update) raymond-ndibe: api: allow protocol to be specified for ports [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/186 (owner: dcaro)
[01:14:33] (update) raymond-ndibe: global: first commit [repos/cloud/toolforge/logs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/logs-api/-/merge_requests/1 (https://phabricator.wikimedia.org/T127367) (owner: dcaro)
[01:22:46] (open) raymond-ndibe: Draft: Test for launcher in buildpack image command [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/922 (https://phabricator.wikimedia.org/T401846)
[01:25:17] (close) raymond-ndibe: Draft: Test for launcher in buildpack image command [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/922 (https://phabricator.wikimedia.org/T401846)
[01:27:44] (approved) raymond-ndibe: runtime: don't overwite command [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/195 (owner: dcaro)
[01:27:51] (merge) raymond-ndibe: runtime: don't overwite command [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/195 (owner: dcaro)
[01:32:35] (update) raymond-ndibe: logs-api: add new component [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/911 (owner: dcaro)
[01:33:12] (update) samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398)
[01:34:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[01:40:19] FIRING: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[01:41:10] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.398-20250814013804-be4bd73d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/923 (https://phabricator.wikimedia.org/T357112 https://phabricator.wikimedia.org/T401846)
[01:46:48] (update) samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398)
[01:49:46] cloud-services-team, Cloud-VPS: grafana.wmcloud.org does not show trixie instances - https://phabricator.wikimedia.org/T401876#11084441 (Andrew) p:Triage→Medium
[01:51:06] cloud-services-team, Cloud-VPS (Quota-requests): Increase "mediawiki-quickstart" project disk quota to 160 GB - https://phabricator.wikimedia.org/T401864#11084443 (Andrew)
[01:51:28] cloud-services-team, Cloud-VPS (Quota-requests): Increase "mediawiki-quickstart" project disk quota to 160 GB - https://phabricator.wikimedia.org/T401864#11084445 (Andrew) +1 seems fine to me
[01:52:11] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation), IPv6, Patch-For-Review: Refresh Cloud VPS NTP servers to run on Trixie and enable IPv6 - https://phabricator.wikimedia.org/T401848#11084446 (Andrew) p:Triage→Medium
[01:53:27] (update) samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398)
[01:53:48] cloud-services-team, Cloud-VPS: Enable SSL in Trove MariaDB - Trixie MariaDB client requires SSL but SSL is not enabled in the Trove server - https://phabricator.wikimedia.org/T401861#11084449 (Andrew) p:Triage→Medium I suspect that enabling this automatically will be a big project, but we can at...
[01:54:18] cloud-services-team, Cloud-VPS: TLS support for OpenStack trove - https://phabricator.wikimedia.org/T294118#11084452 (Andrew) →Duplicate dup:T401861
[01:54:21] cloud-services-team, Cloud-VPS: Enable SSL in Trove MariaDB - Trixie MariaDB client requires SSL but SSL is not enabled in the Trove server - https://phabricator.wikimedia.org/T401861#11084454 (Andrew)
[01:54:42] (open) raymond-ndibe: [jobs-api, jobs-cli] test for job logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/924 (https://phabricator.wikimedia.org/T127367)
[01:54:48] (update) raymond-ndibe: [jobs-api, jobs-cli] test for job logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/924 (https://phabricator.wikimedia.org/T127367)
[01:56:22] cloud-services-team: ProbeDown - https://phabricator.wikimedia.org/T401783#11084459 (Andrew) Open→Resolved a:Andrew
[01:56:26] cloud-services-team: PuppetFailure - https://phabricator.wikimedia.org/T401736#11084461 (Andrew) Open→Resolved a:Andrew
[01:56:33] cloud-services-team: PuppetFailure Puppet has failed on cloudcontrol1011:9100 - https://phabricator.wikimedia.org/T401735#11084463 (Andrew) Open→Resolved a:Andrew
[01:57:12] cloud-services-team, Cloud-VPS: Create debian 13.0 Trixie base images in cloud-vps - https://phabricator.wikimedia.org/T401584#11084465 (Andrew) Open→Resolved a:Andrew
[01:58:31] (update) samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398)
[01:59:09] (update) raymond-ndibe: [jobs-api, jobs-cli] test for job logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/924 (https://phabricator.wikimedia.org/T127367)
[01:59:52] (update) raymond-ndibe: [jobs-api, jobs-cli] test for job logs [repos/cloud/toolforge/toolforge-deploy] (add_logs_api) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/924 (https://phabricator.wikimedia.org/T127367)
[02:02:39] (update) raymond-ndibe: [jobs-api, jobs-cli] test for job logs [repos/cloud/toolforge/toolforge-deploy] (add_logs_api) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/924 (https://phabricator.wikimedia.org/T127367)
[02:02:43] (update) raymond-ndibe: [jobs-api, jobs-cli] test for job logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/924 (https://phabricator.wikimedia.org/T127367)
[02:05:51] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[02:17:47] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[02:19:26] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[02:20:19] RESOLVED: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[02:24:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[02:29:29] (update) raymond-ndibe: [jobs-api, jobs-cli] test for job logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/924 (https://phabricator.wikimedia.org/T127367)
[02:29:54] (update) samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398)
[02:30:59] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[02:32:31] (update) raymond-ndibe: jobs-api: bump to 0.0.398-20250814013804-be4bd73d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/923 (https://phabricator.wikimedia.org/T357112 https://phabricator.wikimedia.org/T401846) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[02:32:33] (approved) raymond-ndibe: jobs-api: bump to 0.0.398-20250814013804-be4bd73d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/923 (https://phabricator.wikimedia.org/T357112 https://phabricator.wikimedia.org/T401846) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[02:32:39] (merge) raymond-ndibe: jobs-api: bump to 0.0.398-20250814013804-be4bd73d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/923 (https://phabricator.wikimedia.org/T357112 https://phabricator.wikimedia.org/T401846) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[02:32:59] (update) raymond-ndibe: [jobs-api, jobs-cli] test for job logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/924 (https://phabricator.wikimedia.org/T127367)
[02:49:36] (update) samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398)
[02:50:07] (update) samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398)
[02:54:32] (update) samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398)
[03:15:12] (update) raymond-ndibe: [jobs-api, jobs-cli] test for job logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/924 (https://phabricator.wikimedia.org/T127367)
[03:23:43] (update) raymond-ndibe: [jobs-api, jobs-cli] test for job logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/924 (https://phabricator.wikimedia.org/T127367)
[03:32:04] Tool-translatetagger: Proper processing of different colons in text - https://phabricator.wikimedia.org/T393260#11084525 (Super_nabla) Hi! @Ata and the others :) Sorry for the delay. I just gave a quick look and I couldn't reproduce the error. If I go to https://translatetagger.toolforge.org/convert and ins...
[03:34:08] Tool-translatetagger: Proper processing of different colons in text - https://phabricator.wikimedia.org/T393260#11084527 (Super_nabla) Could you provide the exact input that generated the wrong output? @Ata
[03:48:11] (update) samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398)
[03:51:08] (update) raymond-ndibe: global: first commit [repos/cloud/toolforge/logs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/logs-api/-/merge_requests/1 (https://phabricator.wikimedia.org/T127367) (owner: dcaro)
[03:52:46] (update) samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398)
[03:53:00] (update) raymond-ndibe: [jobs-api, jobs-cli] test for job logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/924 (https://phabricator.wikimedia.org/T127367)
[03:53:44] (update) raymond-ndibe: [jobs-api, jobs-cli] test for job logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/924 (https://phabricator.wikimedia.org/T127367)
[03:56:03] Tool-translatetagger: Proper processing of different colons in text - https://phabricator.wikimedia.org/T393260#11084542 (Super_nabla) a:Super_nabla
[03:57:41] (update) samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398)
[03:58:07] (update) samwilson: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398)
[03:59:12] cloud-services-team, Toolforge: Investigate daily disconnections of IRC bots hosted in Toolforge - https://phabricator.wikimedia.org/T400223#11084543 (Danilo) Open→Resolved a:fgiunchedi It is fixed! None of the bots that disconnected daily are disconnecting anymore. @fgiunchedi: I update...
[04:08:56] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:48:41] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:18:28] (update) samwilson: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398)
[05:22:53] (update) samwilson: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398)
[05:31:56] (update) samwilson: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398)
[05:39:44] (CR) Eugene233: [C:+2] Search results for actors include description [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1168284 (owner: Jacob4code)
[05:40:33] (Merged) jenkins-bot: Search results for actors include description [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1168284 (owner: Jacob4code)
[05:48:23] (open) samwilson: Add job queue database and processing [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/2 (https://phabricator.wikimedia.org/T385138)
[05:51:56] (update) samwilson: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398)
[05:57:09] (update) samwilson: Draft: Add job queue database and processing [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/2 (https://phabricator.wikimedia.org/T385138)
[05:57:13] (update) samwilson: Draft: Add job queue database and processing [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/2 (https://phabricator.wikimedia.org/T385138)
[05:59:31] (update) samwilson: Migrate CI from GitHub to GitLab [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398)
[06:01:32] (CR) Eugene233: "recheck" [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1178628 (owner: Jacob4code)
[06:01:38] (CR) CI reject: [V:-1] Only display "No co-actors found" when search result is empty [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1178628 (owner: Jacob4code)
[06:36:31] cloud-services-team: Ensure unique machine-id across Cloud VPS VMs - https://phabricator.wikimedia.org/T401880 (fgiunchedi) NEW
[06:37:35] cloud-services-team, Toolforge: Investigate daily disconnections of IRC bots hosted in Toolforge - https://phabricator.wikimedia.org/T400223#11084626 (fgiunchedi) I haven't seen any disconnections since yesterday, I'm confident to call this done. The followup to fix all VMs is at {T401880}
[06:38:30] cloud-services-team, Toolforge: Investigate daily disconnections of IRC bots hosted in Toolforge - https://phabricator.wikimedia.org/T400223#11084628 (fgiunchedi) >>! In T400223#11084544, @Danilo wrote: > It is fixed! None of the bots that disconnected daily are disconnecting anymore. > > @fgiunched...
[06:54:08] cloud-services-team, Cloud-VPS: Ensure unique machine-id across Cloud VPS VMs - https://phabricator.wikimedia.org/T401880#11084637 (taavi)
[07:40:57] cloud-services-team, Cloud-VPS: grafana.wmcloud.org does not show trixie instances - https://phabricator.wikimedia.org/T401876#11084787 (taavi) a:taavi
[07:45:05] (PS1) Majavah: maintain-projects: Delete before adding [cloud/metricsinfra/prometheus-manager] - https://gerrit.wikimedia.org/r/1178725
[07:45:25] (CR) Majavah: [C:+2] maintain-projects: Delete before adding [cloud/metricsinfra/prometheus-manager] - https://gerrit.wikimedia.org/r/1178725 (owner: Majavah)
[07:46:01] (Merged) jenkins-bot: maintain-projects: Delete before adding [cloud/metricsinfra/prometheus-manager] - https://gerrit.wikimedia.org/r/1178725 (owner: Majavah)
[07:48:03] (PS1) Majavah: maintain-projects: Fix typo in variable access [cloud/metricsinfra/prometheus-manager] - https://gerrit.wikimedia.org/r/1178726
[07:48:13] (CR) Majavah: [C:+2] maintain-projects: Fix typo in variable access [cloud/metricsinfra/prometheus-manager] - https://gerrit.wikimedia.org/r/1178726 (owner: Majavah)
[07:48:49] (Merged) jenkins-bot: maintain-projects: Fix typo in variable access [cloud/metricsinfra/prometheus-manager] - https://gerrit.wikimedia.org/r/1178726 (owner: Majavah)
[07:56:34] cloud-services-team, Cloud-VPS: grafana.wmcloud.org does not show trixie instances - https://phabricator.wikimedia.org/T401876#11084798 (taavi) The issue was with the maintain-projects script failing in a situation where a project had been deleted and then re-created with the same name. I fixed that...
[07:56:39] cloud-services-team, Cloud-VPS: grafana.wmcloud.org does not show trixie instances - https://phabricator.wikimedia.org/T401876#11084799 (taavi) Open→Resolved
[07:58:30] (update) dcaro: api: allow protocol to be specified for ports [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/186
[08:07:12] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation): Migrate cloudinfra project off of Debian Bullseye - https://phabricator.wikimedia.org/T401811#11084827 (taavi)
[08:07:17] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation), IPv6, Patch-For-Review: Refresh Cloud VPS NTP servers to run on Trixie and enable IPv6 - https://phabricator.wikimedia.org/T401848#11084826 (taavi)
[08:12:52] FIRING: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:15:05] cloud-services-team, Cloud-VPS: Ensure unique machine-id across Cloud VPS VMs - https://phabricator.wikimedia.org/T401880#11084835 (fgiunchedi) The initial audit for VMs with the same machine id is at P81346, it was generated with: ` root@cloudcumin1001:~# cumin 'O{*}' 'cat /etc/machine-id' --output jso...
[08:20:18] (update) samwilson: Migrate CI from GitHub to GitLab [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398)
[08:21:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[08:27:52] RESOLVED: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:28:11] (update) dcaro: [jobs-api, jobs-cli] test for job logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/924 (https://phabricator.wikimedia.org/T127367) (owner: raymond-ndibe)
[08:28:54] cloud-services-team, Cloud-VPS: monitoring and swift project instances not permitting access from cloud-cumin-01 - https://phabricator.wikimedia.org/T254041#11084867 (fgiunchedi) Open→Invalid `monitoring` doesn't exist anymore, and all `swift` VMs are accessible now: ` root@cloudcumin1001:~#...
[08:30:00] (update) dcaro: [maintain-harbor] fix delete_stale_toolforge_artifacts bug [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/921 (owner: raymond-ndibe)
[08:30:57] cloud-services-team, Cloud-VPS: Investigate instances not allowing access from cloud-cumin - https://phabricator.wikimedia.org/T247198#11084895 (fgiunchedi) Open→Invalid `monitoring` doesn't exist anymore, nor any of the VMs listed here ` root@cloudcumin1001:~# cumin 'D{snuggle-enwiki-02.snu...
[08:31:35] cloud-services-team, Cloud-VPS: Investigate instances not allowing access from cloud-cumin - https://phabricator.wikimedia.org/T247198#11084903 (fgiunchedi) The problem as a whole (audit and react to unaccessible VMs) still remains though!
[09:05:11] cloud-services-team, Cloud-VPS: Ensure unique machine-id across Cloud VPS VMs - https://phabricator.wikimedia.org/T401880#11084997 (dcaro) On the NFS side, I checked the dbus ids (`/var/lib/dbus/machine-id`) and are all different, and the nfs-client ids are empty, so it should be using the default ("Linu...
[09:05:58] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 23), Cloud-Services-Origin-User, Cloud-Services-Worktype-Unplanned: [jobs-api] buildservice-based jobs stopped prefixing the command with launcher - https://phabricator.wikimedia.org/T401846#11085000 (dcaro) Open→Resolv...
[09:12:29] Toolforge (Toolforge iteration 23): [components-api,beta] Image should only be build once when re-used in components - https://phabricator.wikimedia.org/T401851#11085036 (dcaro) > I'm actually not sure what would happen if I set different refs for the same repo here... I assume the last build wins and updat...
[09:16:04] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:20:21] cloud-services-team, Toolforge: [jobs-cli,components-api] Provide YAML schema file for toolforge-jobs definition files - https://phabricator.wikimedia.org/T314729#11085050 (dcaro) fyi. The config schema for tool configuration was created in {T397724}
[09:21:04] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:26:04] RESOLVED: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesse
[09:31:25] Toolforge (Toolforge iteration 23): [components-api,beta] Image should only be build once when re-used in components - https://phabricator.wikimedia.org/T401851#11085091 (dcaro) Two ideas come right away to me: * Allow reusing images from other components (this forces the components to use the same build, t...
[09:42:47] Toolforge (Toolforge iteration 23): [components-api] Allow reusing another component build - https://phabricator.wikimedia.org/T401893 (dcaro) NEW
[09:43:57] Toolforge (Toolforge iteration 23): [builds-api] Allow queuing builds - https://phabricator.wikimedia.org/T401894 (dcaro) NEW
[09:44:39] cloud-services-team, Toolforge (Toolforge iteration 23): [components-api,beta] Config not updated from remote source - https://phabricator.wikimedia.org/T401868#11085169 (dcaro) Open→In progress p:Triage→High a:dcaro
[09:45:09] cloud-services-team, Toolforge (Toolforge iteration 23): [kyverno] Upgrade to `3.3.9` chart (`1.13` app) for k8s 1.30 support - https://phabricator.wikimedia.org/T394787#11085177 (dcaro) In progress→Resolved
[09:47:21] cloud-services-team: Upgrade remaining WMCS hardware and Ganeti VMs on Bullseye - https://phabricator.wikimedia.org/T401896 (taavi) NEW
[09:55:52] (update) dcaro: [maintain-harbor] fix delete_stale_toolforge_artifacts bug [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/921 (owner: raymond-ndibe)
[09:56:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:59:31] cloud-services-team: Upgrade remaining WMCS hardware and Ganeti VMs on Bullseye - https://phabricator.wikimedia.org/T401896#11085269 (taavi) →Duplicate dup:T375217
[09:59:36] cloud-services-team, Cloud-VPS: Complete upgrading WMCS bare metal hosts from Bullseye to Bookworm - https://phabricator.wikimedia.org/T375217#11085271 (taavi)
[10:01:09] cloud-services-team, Horizon, Striker, wikitech.wikimedia.org: Reimage cloudweb hosts to trixie - https://phabricator.wikimedia.org/T376277#11085286 (taavi)
[10:01:43] cloud-services-team, Cloud-VPS: Reimage cloudgw hosts to Trixie - https://phabricator.wikimedia.org/T401899 (taavi) NEW
[10:02:03] cloud-services-team, Cloud-VPS: Complete upgrading WMCS bare metal hosts from Bullseye to Bookworm - https://phabricator.wikimedia.org/T375217#11085314 (taavi) p:Low→Medium
[10:02:31] cloud-services-team, Cloud-VPS: Reimage cloudgw hosts to Trixie - https://phabricator.wikimedia.org/T401899#11085320 (taavi) p:Triage→Medium
[10:03:55] (CR) NkwadaNora: [C:+1] elimininate shared productions duplicates [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1169783 (owner: Jacob4code)
[10:04:44] (CR) NkwadaNora: [C:+1] Search results for actors include description [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1168284 (owner: Jacob4code)
[10:25:09] (update) dcaro: [maintain-harbor] fix delete_stale_toolforge_artifacts bug [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/921 (owner: raymond-ndibe)
[10:43:05] Toolforge (Toolforge iteration 23): [components-api,beta] Image should only be build once when re-used in components - https://phabricator.wikimedia.org/T401851#11085629 (DamianZaremba) >>! In T401851#11085036, @dcaro wrote: >> I'm actually not sure what would happen if I set different refs for the same rep...
[10:44:40] Toolforge (Toolforge iteration 23): [components-api,beta] Image should only be build once when re-used in components - https://phabricator.wikimedia.org/T401851#11085640 (DamianZaremba) >>! In T401851#11085091, @dcaro wrote: > Two ideas come right away to me: > > * Allow reusing images from other components...
[10:46:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[10:46:06] (update) dcaro: [maintain-harbor] fix delete_stale_toolforge_artifacts bug [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/921 (owner: raymond-ndibe)
[11:15:05] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[11:21:58] (approved) dcaro: [maintain-harbor] fix delete_stale_toolforge_artifacts bug [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/921 (owner: raymond-ndibe)
[11:22:12] (merge) dcaro: [maintain-harbor] fix delete_stale_toolforge_artifacts bug [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/921 (owner: raymond-ndibe)
[11:22:56] !log taavi@cloudcumin1001 metricsinfra START - Cookbook wmcs.vps.refresh_puppet_certs on metricsinfra-thanos-fe-2.metricsinfra.eqiad1.wikimedia.cloud
[11:22:57] (update) dcaro: [jobs-api, jobs-cli] test for job logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/924 (https://phabricator.wikimedia.org/T127367) (owner: raymond-ndibe)
[11:24:19] !log taavi@cloudcumin1001 metricsinfra END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on metricsinfra-thanos-fe-2.metricsinfra.eqiad1.wikimedia.cloud
[11:26:28] FIRING: TargetDown: Job thanos-rule is unreachable in project metricsinfra instance metricsinfra-thanos-fe-2 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[11:31:28] RESOLVED: TargetDown: Job thanos-rule is unreachable in project metricsinfra instance metricsinfra-thanos-fe-2 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[11:32:06] cloud-services-team, Cloud-VPS: trixie puppet 8->7 downgrade code does not work - https://phabricator.wikimedia.org/T401913 (taavi) NEW
[11:33:27] !log dcaro@acme tools START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers no stuck workers found
[11:33:27] !log dcaro@acme tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) no stuck workers found
[11:33:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:33:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:35:05] (PS4) David Caro: reboot_stuck_workers: add net cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177960
[11:36:30] (approved) dcaro: [jobs-api, jobs-cli] test for job logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/924 (https://phabricator.wikimedia.org/T127367) (owner: raymond-ndibe)
[11:36:38] (merge) dcaro: [jobs-api, jobs-cli] test for job logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/924 (https://phabricator.wikimedia.org/T127367) (owner: raymond-ndibe)
[11:38:24] !log dcaro@acme tools START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-107, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-3, tools-k8s-worker-nfs-41
[11:38:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:38:29] !log dcaro@acme tools END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=99) for tools-k8s-worker-107, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-3, tools-k8s-worker-nfs-41
[11:38:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:38:56] (CR) CI reject: [V:-1] reboot_stuck_workers: add net cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177960 (owner: David Caro)
[11:39:54] (CR) David Caro: "Now it will ask unles `--yes-i-know-what-im-doing` is passed (had to tweak the limit D processes to test right now):" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177960 (owner: David Caro)
[11:41:16] (PS5) David Caro: reboot_stuck_workers: add net cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177960
[11:45:22] (CR) CI reject: [V:-1] reboot_stuck_workers: add net cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177960 (owner: David Caro)
[11:53:18] (update) dcaro: logs: use logs-api for logs [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/121
[11:53:24] (update) dcaro: logs: use logs-api for logs [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/121
[11:55:38] (update) dcaro: global: first commit [repos/cloud/toolforge/logs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/logs-api/-/merge_requests/1 (https://phabricator.wikimedia.org/T127367)
[11:56:01] (update) dcaro: logs: use logs-api for logs [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/121
[11:56:10] (03update) 10dcaro: logs-api: add new component [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/911 [11:57:31] (03PS6) 10David Caro: reboot_stuck_workers: add net cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1177960 [11:59:05] (03PS7) 10David Caro: reboot_stuck_workers: add net cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1177960 [12:08:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [12:15:05] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [12:20:52] 06cloud-services-team, 10Cloud-VPS: trixie puppet 8->7 downgrade code does not work - https://phabricator.wikimedia.org/T401913#11086054 (10MoritzMuehlenhoff) puppet is just a transition package towards puppet-agent, for trixie installs in cloud it's probably best to simply only install puppet-agent [12:27:17] (03update) 10l10n-bot: Localisation updates from https://translatewiki.net. 
[toolforge-repos/wd-image-positions] - 10https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/42 [12:27:33] (03open) 10damian: Draft: Allow re-using builds across components [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/118 (https://phabricator.wikimedia.org/T401893) [12:27:33] (03open) 10l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - 10https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/9 [12:50:19] 06cloud-services-team, 10Toolforge: [build service] failure due to transient issue - https://phabricator.wikimedia.org/T401917 (10DamianZaremba) 03NEW [13:31:48] (03update) 10raymond-ndibe: [cli] Change port type to allow protocol suffix [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024) [13:41:36] 06cloud-services-team, 10Toolforge: [loki] persist build logs for each tool on their loki namespace - https://phabricator.wikimedia.org/T401830#11086283 (10taavi) @dcaro Given a specific pod in the `image-build` namespace, is there some label that is guaranteed to link it to a particular `tool-$NAME` namespace? [13:45:24] 06cloud-services-team, 10Toolforge: [loki] persist build logs for each tool on their loki namespace - https://phabricator.wikimedia.org/T401830#11086291 (10dcaro) >>! In T401830#11086283, @taavi wrote: > @dcaro Given a specific pod in the `image-build` namespace, is there some label that is guaranteed to link... 
[13:56:03] 06cloud-services-team, 10Toolforge (Toolforge iteration 23): [jobs-api,jobs-cli] when creating a filelog based job, filelog-stderr gets populated with *.out file - https://phabricator.wikimedia.org/T401922 (10dcaro) 03NEW [13:56:08] 06cloud-services-team, 10Toolforge (Toolforge iteration 23): [jobs-api,jobs-cli] when creating a filelog based job, filelog-stderr gets populated with *.out file - https://phabricator.wikimedia.org/T401922#11086352 (10dcaro) [13:59:26] (03approved) 10dcaro: api: allow protocol to be specified for ports [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/186 [13:59:41] (03approved) 10dcaro: [cli] Change port type to allow protocol suffix [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024) (owner: 10raymond-ndibe) [14:05:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [14:15:38] (03update) 10raymond-ndibe: [cli] Change port type to allow protocol suffix [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024) [14:15:39] (03approved) 10raymond-ndibe: [cli] Change port type to allow protocol suffix [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024) [14:20:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and 
may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [14:29:35] (03update) 10raymond-ndibe: api: allow protocol to be specified for ports [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/186 (owner: 10dcaro) [14:29:37] (03approved) 10raymond-ndibe: api: allow protocol to be specified for ports [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/186 (owner: 10dcaro) [14:30:48] (03merge) 10raymond-ndibe: api: allow protocol to be specified for ports [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/186 (owner: 10dcaro) [14:30:52] (03update) 10raymond-ndibe: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) [14:31:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [14:35:50] (03open) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.399-20250814143101-d4c07a8c [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/925 
(https://phabricator.wikimedia.org/T400024) [14:38:25] !log taavi@cloudcumin1001 mediawiki-quickstart START - Cookbook wmcs.openstack.quota_increase (T401864) [14:38:29] T401864: Increase "mediawiki-quickstart" project disk quota to 160 GB - https://phabricator.wikimedia.org/T401864 [14:38:32] !log taavi@cloudcumin1001 mediawiki-quickstart END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T401864) [14:39:19] 06cloud-services-team, 10Cloud-VPS (Quota-requests): Increase "mediawiki-quickstart" project disk quota to 160 GB - https://phabricator.wikimedia.org/T401864#11086510 (10taavi) 05Open→03Resolved a:03taavi [14:40:05] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api [14:47:34] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api [14:47:46] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api [14:57:05] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: 10raymond-ndibe) [15:02:38] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api [15:03:22] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api [15:13:54] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: 10raymond-ndibe) [15:15:27] !log raymond-ndibe@cloudcumin1001 
tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api [15:15:47] (03update) 10raymond-ndibe: jobs-api: bump to 0.0.399-20250814143101-d4c07a8c [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/925 (https://phabricator.wikimedia.org/T400024) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [15:15:48] (03approved) 10raymond-ndibe: jobs-api: bump to 0.0.399-20250814143101-d4c07a8c [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/925 (https://phabricator.wikimedia.org/T400024) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [15:15:52] (03merge) 10raymond-ndibe: jobs-api: bump to 0.0.399-20250814143101-d4c07a8c [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/925 (https://phabricator.wikimedia.org/T400024) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [15:16:11] (03merge) 10raymond-ndibe: [cli] Change port type to allow protocol suffix [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024) [15:18:34] (03open) 10raymond-ndibe: d/changelog: bump to 16.1.18 [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/122 (https://phabricator.wikimedia.org/T400024) [15:32:08] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli [15:34:22] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (owner: 10raymond-ndibe) [15:37:18] (03close) 10dcaro: [T400024] Allow protocol to be specified for ports 
[repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/183 (owner: 10damian) [15:41:17] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli [15:44:21] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli [15:50:25] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (owner: 10raymond-ndibe) [15:53:33] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli [16:08:07] (03update) 10raymond-ndibe: d/changelog: bump to 16.1.18 [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/122 (https://phabricator.wikimedia.org/T400024) [16:08:09] (03approved) 10raymond-ndibe: d/changelog: bump to 16.1.18 [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/122 (https://phabricator.wikimedia.org/T400024) [16:08:56] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [16:10:10] (03merge) 10raymond-ndibe: d/changelog: bump to 16.1.18 [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/122 (https://phabricator.wikimedia.org/T400024) [16:16:56] 06cloud-services-team, 10Toolforge (Toolforge iteration 23), 13Patch-For-Review: Support for UDP ports in jobs - https://phabricator.wikimedia.org/T400024#11086900 (10DamianZaremba) FYI this is 
working for me on 2 tool accounts, example of the re-created service: ` $ date && kubectl get service irc-relay -oj... [16:23:41] (03approved) 10lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - 10https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/9 (owner: 10l10n-bot) [16:23:44] (03merge) 10lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - 10https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/9 (owner: 10l10n-bot) [16:24:06] (03open) 10vriaa: Draft: code generation feature [toolforge-repos/centralnotice-banner-editor] - 10https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/8 [16:46:05] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [16:47:31] 06cloud-services-team, 10Toolforge (Toolforge iteration 23), 13Patch-For-Review: Support for UDP ports in jobs - https://phabricator.wikimedia.org/T400024#11087033 (10Raymond_Ndibe) >>! In T400024#11086900, @DamianZaremba wrote: > FYI this is working for me on 2 tool accounts, example of the re-created servi... 
[16:47:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [16:47:48] 06cloud-services-team, 10Toolforge (Toolforge iteration 23), 13Patch-For-Review: Support for UDP ports in jobs - https://phabricator.wikimedia.org/T400024#11087035 (10Raymond_Ndibe) 05In progress→03Resolved [16:53:56] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (owner: 10raymond-ndibe) [17:03:07] 10Tool-gawa: [Code]Conception de la page de statistiques - https://phabricator.wikimedia.org/T401767#11087078 (10PenScribe) `Intégration d'une page statistiques Gawa V6 avec Flask, SQLModel, UI responsive et mode sombre` **//C'EST UN MODELE//** Résumé du travail effectué 1. Backend • Mise en place d’une appli... [17:06:39] (03open) 10dcaro: buildservice: strip `launcher` when returning the job [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/196 [17:06:44] (03update) 10dcaro: buildservice: strip `launcher` when returning the job [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/196 [17:09:52] 06cloud-services-team, 10Toolforge: [Build service] latest builder has old PHP - https://phabricator.wikimedia.org/T401875#11087098 (10bd808) >>! In T401875#11084327, @DamianZaremba wrote: > This could be a feature request, but I'll put it as a bug since "use latest version" This "latest version" in this sens... 
[17:10:03] 06cloud-services-team, 10Toolforge: [Build service] latest builder has old PHP - https://phabricator.wikimedia.org/T401875#11087099 (10bd808) [17:10:08] 06cloud-services-team, 10Toolforge (Toolforge iteration 23), 13Patch-For-Review: [builds-builder] Add support for Heroku's "24" builder stack based on Ubuntu 2024.04 noble - https://phabricator.wikimedia.org/T380127#11087100 (10bd808) [17:13:46] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (owner: 10raymond-ndibe) [17:14:20] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] (dont_return_launcher) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (owner: 10raymond-ndibe) [17:15:12] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] (dont_return_launcher) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (owner: 10raymond-ndibe) [17:21:52] (03open) 10dcaro: builds-api,jobs-api: when checking for launcher, use k8s [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/926 [17:22:25] (03update) 10dcaro: buildservice: strip `launcher` when returning the job [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/196 [17:31:02] (03update) 10dcaro: builds-api,jobs-api: when checking for launcher, use k8s [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/926 [17:52:56] FIRING: SystemdUnitDown: The service unit kiwix-mirror-update.service is in failed status on host clouddumps1001. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [17:53:12] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] (dont_return_launcher) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (owner: 10raymond-ndibe) [18:22:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [18:23:34] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [18:28:34] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [18:29:34] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - 
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [18:39:34] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [19:21:55] 10Cloud-VPS (Project-requests): Request creation of eseaphub VPS project - https://phabricator.wikimedia.org/T401957 (10Robertsky) 03NEW [19:22:28] FIRING: [2x] InstanceDown: Project cloudinfra instance ntp-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [19:22:50] 10Cloud-VPS (Project-requests): Request creation of eseap VPS project - https://phabricator.wikimedia.org/T401957#11087564 (10Chlod) [19:23:20] 10Cloud-VPS (Project-requests): Request creation of eseap VPS project - https://phabricator.wikimedia.org/T401957#11087565 (10Chlod) [19:24:03] (03open) 10andrew: secgroups: open ipv6 access to codfw1dev ntp server [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/261 [19:25:10] 10Cloud-VPS (Project-requests): Request deletion of wikimania-mautic VPS project - https://phabricator.wikimedia.org/T401958 (10Robertsky) 03NEW [19:27:28] RESOLVED: [2x] InstanceDown: Project cloudinfra instance ntp-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [19:37:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - 
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [19:47:56] FIRING: SystemdUnitDown: The systemd unit kiwix-mirror-update.service on node clouddumps1001 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [19:48:08] 06cloud-services-team: SystemdUnitDown The systemd unit kiwix-mirror-update.service on node clouddumps1001 has been failing for more than two hours. - https://phabricator.wikimedia.org/T401959 (10phaultfinder) 03NEW [20:10:33] (03open) 10bd808: dev: Update dependencies with `cargo update` [toolforge-repos/ircservserv] - 10https://gitlab.wikimedia.org/toolforge-repos/ircservserv/-/merge_requests/12 [20:17:04] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [20:17:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [20:17:56] RESOLVED: SystemdUnitDown: The service 
unit kiwix-mirror-update.service is in failed status on host clouddumps1001. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [20:17:56] RESOLVED: SystemdUnitDown: The systemd unit kiwix-mirror-update.service on node clouddumps1001 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [20:18:15] (03update) 10bd808: dev: Update dependencies with `cargo update` [toolforge-repos/ircservserv] - 10https://gitlab.wikimedia.org/toolforge-repos/ircservserv/-/merge_requests/12 [20:21:17] (03merge) 10bd808: dev: Update dependencies with `cargo update` [toolforge-repos/ircservserv] - 10https://gitlab.wikimedia.org/toolforge-repos/ircservserv/-/merge_requests/12 [20:22:03] (03update) 10bd808: Fix cut-off URL in IRC real name [toolforge-repos/ircservserv] - 10https://gitlab.wikimedia.org/toolforge-repos/ircservserv/-/merge_requests/11 (owner: 10krinkle) [20:26:32] (03merge) 10bd808: Fix cut-off URL in IRC real name [toolforge-repos/ircservserv] - 10https://gitlab.wikimedia.org/toolforge-repos/ircservserv/-/merge_requests/11 (owner: 10krinkle) [21:06:45] 06cloud-services-team, 10Toolforge: !log automated deployments so that a tool’s SAL records system changes - https://phabricator.wikimedia.org/T401963 (10bd808) 03NEW [21:27:07] 06cloud-services-team, 06DC-Ops, 10ops-eqiad, 06SRE: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693#11087944 (10Andrew) [21:59:29] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tools-harbor-2 in project tools - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [22:22:38] 10Cloud-VPS (Project-requests): Request creation of wikimania-mautic VPS project - https://phabricator.wikimedia.org/T340439#11088070 (10Robertsky) @Andrew apologies for the belated reply, I have requested for it to be deleted: T401958. [22:32:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [22:33:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [22:38:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [22:39:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [22:44:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [22:45:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [22:55:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [23:30:02] (03PS2) 10Jacob4code: Only display "No co-actors found" when search result is empty [labs/tools/WdTmCollab] - 10https://gerrit.wikimedia.org/r/1178628 [23:57:38] 10Toolforge (Toolforge iteration 23), 13Patch-For-Review: [jobs-api] check for diff in services when running diff_with_running_job - https://phabricator.wikimedia.org/T392717#11088186 (10Raymond_Ndibe) 05In progress→03Stalled