[00:14:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [00:29:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [02:23:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [02:26:38] 10Tool-gawa: Conception de la page de statistiques - https://phabricator.wikimedia.org/T401767 (10PenScribe) 03NEW [02:30:18] 10Tool-gawa: Conception de la page de statistiques - https://phabricator.wikimedia.org/T401767#11080929 (10PenScribe) 05Open→03In progress The design is in progress. 
[02:54:40] 10Tool-gawa: Conception de la page de statistiques - https://phabricator.wikimedia.org/T401767#11080964 (10PenScribe) a:03PenScribe [02:58:04] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [02:58:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [03:03:35] FIRING: NetworkOutSaturated: Outgoing network saturation detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DNetworkOutSaturated [03:12:19] FIRING: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling [03:17:19] RESOLVED: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. 
- https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling [03:18:53] 10Cloud Services Proposals, 06cloud-services-team, 10Toolforge (Toolforge iteration 23): Decision request - Reuse toolforge user tools central logging for toolforge infrastructure logging - https://phabricator.wikimedia.org/T398285#11080987 (10Andrew) Given finite Taavi availability, A seems like the best op... [03:58:35] RESOLVED: NetworkOutSaturated: Outgoing network saturation detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DNetworkOutSaturated [04:13:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [04:29:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [04:57:31] 10Toolforge (Toolforge iteration 23), 07good first task, 13Patch-For-Review: [components-api] use the `build.params.image_name` to compare with the `component` - https://phabricator.wikimedia.org/T395076#11081056 (10Raymond_Ndibe) 05In 
progress→03Resolved [04:57:32] 10Toolforge (Toolforge iteration 23), 13Patch-For-Review: [jobs-cli,builds-cli,toolforge-cli,components-cli,envvars-cli,webservice-cli] move the packaging scripts to bookworm - https://phabricator.wikimedia.org/T400616#11081058 (10Raymond_Ndibe) 05In progress→03Resolved [05:18:10] 06cloud-services-team, 10Toolforge: [jobs-api] rename variable/parameter type to job_type - https://phabricator.wikimedia.org/T387727#11081105 (10Raymond_Ndibe) a:05Raymond_Ndibe→03None [05:49:05] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [06:19:20] (03CR) 10Muehlenhoff: [V:03+2 C:03+2] Remove dummy keytab for sretest1001 (decommed) [labs/private] - 10https://gerrit.wikimedia.org/r/1169040 (owner: 10Muehlenhoff) [06:45:29] (03open) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [06:48:45] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [06:57:18] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [06:59:43] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 
(https://phabricator.wikimedia.org/T395398) [07:01:12] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [07:02:41] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [07:03:44] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [07:06:45] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [07:38:45] (03update) 10raymond-ndibe: global: first commit [repos/cloud/toolforge/logs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/logs-api/-/merge_requests/1 (https://phabricator.wikimedia.org/T127367) (owner: 10dcaro) [07:57:19] FIRING: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. 
- https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling [07:58:14] FIRING: [2x] ProbeDown: Service clouddumps1002:443 has failed probes (http_dumps_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Dumps/SQL-XML_Dumps#NFS_share_and/or_web_server_issues - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [07:58:19] 06cloud-services-team: ProbeDown - https://phabricator.wikimedia.org/T401783 (10phaultfinder) 03NEW [07:59:05] 06cloud-services-team, 10Toolforge: Investigate daily disconnections of IRC bots hosted in Toolforge - https://phabricator.wikimedia.org/T400223#11081374 (10fgiunchedi) The fix is `rm /etc/machine-id && systemd-machine-id-setup && systemctl restart systemd-networkd` . Or alternatively `rm /etc/machine-id` and... [08:03:14] RESOLVED: [2x] ProbeDown: Service clouddumps1002:443 has failed probes (http_dumps_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Dumps/SQL-XML_Dumps#NFS_share_and/or_web_server_issues - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [08:13:26] RESOLVED: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling [08:13:35] RESOLVED: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. 
- https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling [08:43:26] (03update) 10dcaro: global: first commit [repos/cloud/toolforge/logs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/logs-api/-/merge_requests/1 (https://phabricator.wikimedia.org/T127367) [08:56:17] (03update) 10raymond-ndibe: [jobs-api] refactor quota models [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/164 (https://phabricator.wikimedia.org/T389118) [09:09:23] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [09:11:22] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [09:24:29] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [09:27:36] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [09:31:53] 06cloud-services-team, 10Toolforge: Investigate daily disconnections of IRC bots hosted in Toolforge - https://phabricator.wikimedia.org/T400223#11081795 (10fgiunchedi) Earlier today I refreshed `/etc/machine-id` on these hosts `tools-k8s-worker-[102-103,105-112].tools.eqiad1.wikimedia.cloud,tools-k8s-worker-n... 
[09:48:58] 06cloud-services-team, 10Cloud-VPS (Debian Bullseye Deprecation): Cloud VPS Debian Bullseye deprecation - https://phabricator.wikimedia.org/T401804 (10taavi) 03NEW [09:49:50] 06cloud-services-team, 10Cloud-VPS (Debian Bullseye Deprecation): Disable creation of new Bullseye instances - https://phabricator.wikimedia.org/T401805 (10taavi) 03NEW [09:50:21] 06cloud-services-team, 10Cloud-VPS (Debian Bullseye Deprecation): Update os-deprecation tool to track Bullseye - https://phabricator.wikimedia.org/T401806 (10taavi) 03NEW [09:53:36] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] (support-port-protocol-selection) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: 10raymond-ndibe) [09:54:53] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] (support-port-protocol-selection) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: 10raymond-ndibe) [09:57:12] 06cloud-services-team, 10Cloud-VPS (Debian Bullseye Deprecation): Update os-deprecation tool to track Bullseye - https://phabricator.wikimedia.org/T401806#11081886 (10taavi) p:05Triage→03Medium [10:14:12] 06cloud-services-team, 10Cloud-VPS (Debian Bullseye Deprecation): Update os-deprecation tool to track Bullseye - https://phabricator.wikimedia.org/T401806#11081975 (10taavi) 05Open→03Resolved [10:15:51] 06cloud-services-team, 10Cloud-VPS (Debian Bullseye Deprecation), 10Toolforge: [infra] Toolforge: migrate to Debian Bookworm or later - https://phabricator.wikimedia.org/T387005#11081980 (10taavi) [10:16:29] 06cloud-services-team, 10Cloud-VPS (Debian Bullseye Deprecation): Replace all codfw1dev Bullseye VMs - 
https://phabricator.wikimedia.org/T401810 (10taavi) 03NEW [10:17:38] 06cloud-services-team, 10Cloud-VPS (Debian Bullseye Deprecation), 07IPv6: Refresh Cloud VPS bastions to run on Trixie and enable IPv6 - https://phabricator.wikimedia.org/T392689#11081993 (10taavi) [10:18:31] 06cloud-services-team, 10Cloud-VPS (Debian Bullseye Deprecation): Migrate cloudinfra project off of Debian Bullseye - https://phabricator.wikimedia.org/T401811 (10taavi) 03NEW [10:18:53] 06cloud-services-team, 10Cloud-VPS (Debian Bullseye Deprecation): Migrate WMCS-managed NFS servers off of Bullseye - https://phabricator.wikimedia.org/T401812 (10taavi) 03NEW [10:19:52] 06cloud-services-team, 10Cloud-VPS (Debian Bullseye Deprecation): Migrate metricsinfra project off of Bullseye - https://phabricator.wikimedia.org/T401813 (10taavi) 03NEW [10:28:19] 10Cloud-VPS (Debian Bullseye Deprecation): Replace tf-registry-2.terraform with new Trixie instance in tofu project - https://phabricator.wikimedia.org/T401814 (10taavi) 03NEW p:05Triage→03Medium [10:28:34] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [10:30:28] 06cloud-services-team, 10Cloud-VPS (Debian Bullseye Deprecation): Replace tf-registry-2.terraform with new Trixie instance in tofu project - https://phabricator.wikimedia.org/T401814#11082094 (10taavi) [10:30:30] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [10:35:30] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [10:38:06] 06cloud-services-team, 10Toolforge: [jobs-api] make 
job status an enum, with clearly defined states - https://phabricator.wikimedia.org/T401172#11082109 (10Raymond_Ndibe) a:03Raymond_Ndibe [10:38:37] 06cloud-services-team, 10Toolforge (Toolforge iteration 23): [jobs-api] make job status an enum, with clearly defined states - https://phabricator.wikimedia.org/T401172#11082111 (10Raymond_Ndibe) [10:39:57] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [10:47:01] (03update) 10raymond-ndibe: global: first commit [repos/cloud/toolforge/logs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/logs-api/-/merge_requests/1 (https://phabricator.wikimedia.org/T127367) (owner: 10dcaro) [10:56:13] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [11:08:02] 06cloud-services-team, 10Cloud-VPS (Debian Bullseye Deprecation), 13Patch-For-Review: Replace tf-registry-2.terraform with new Trixie instance in tofu project - https://phabricator.wikimedia.org/T401814#11082156 (10taavi) This is almost done, except that the `terraform.wmcloud.org` proxy cannot be moved to t... 
[11:09:29] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [11:15:11] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [11:15:14] 06cloud-services-team, 10Toolforge: Update Toolforge Cumin nodes off of Bullseye - https://phabricator.wikimedia.org/T401817 (10taavi) 03NEW [11:16:50] 06cloud-services-team, 10Toolforge: Upgrade Toolforge (Elastic|Open)Search cluster off of Bullseye - https://phabricator.wikimedia.org/T401818 (10taavi) 03NEW [11:17:28] 06cloud-services-team, 10Cloud-VPS (Debian Bullseye Deprecation), 10Toolforge: [infra] Toolforge: migrate to Debian Bookworm or later - https://phabricator.wikimedia.org/T387005#11082207 (10taavi) [11:17:29] 06cloud-services-team, 10Cloud-VPS (Debian Bullseye Deprecation): Migrate WMCS-managed NFS servers off of Bullseye - https://phabricator.wikimedia.org/T401812#11082208 (10taavi) [11:18:10] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [11:19:33] 06cloud-services-team, 10Toolforge: Upgrade or retire tools-package-builder-04 - https://phabricator.wikimedia.org/T401819 (10taavi) 03NEW [11:22:50] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [11:24:58] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 
(https://phabricator.wikimedia.org/T395398) [11:27:13] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [11:31:08] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [11:32:22] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [11:40:34] (03open) 10taavi: Update tofu registry domain [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/64 (https://phabricator.wikimedia.org/T401814) [11:40:37] (03update) 10taavi: Update tofu registry domain [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/64 (https://phabricator.wikimedia.org/T401814) [11:40:50] (03update) 10taavi: Update tofu registry domain [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/64 (https://phabricator.wikimedia.org/T401814) [11:41:50] 10Tool-gawa: [Code]Conception de la page de statistiques - https://phabricator.wikimedia.org/T401767#11082305 (10PenScribe) [11:42:50] (03open) 10taavi: Update canonical address [repos/cloud/cloud-vps/terraform-cloudvps] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/9 (https://phabricator.wikimedia.org/T401814) [11:45:14] (03update) 10dcaro: global: first commit [repos/cloud/toolforge/logs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/logs-api/-/merge_requests/1 
(https://phabricator.wikimedia.org/T127367) [11:47:11] (03update) 10dcaro: logs: use logs-api for logs [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/121 [11:48:12] 06cloud-services-team, 10Toolforge (Toolforge iteration 23): [jobs-api] make job status an enum, with clearly defined states - https://phabricator.wikimedia.org/T401172#11082326 (10dcaro) Ideally this would be sorted by having system logs, as in logging the system events (ex. job restarted, job stopped, job de... [11:49:16] 06cloud-services-team, 10Toolforge (Toolforge iteration 23): [jobs-api] make job status an enum, with clearly defined states - https://phabricator.wikimedia.org/T401172#11082327 (10dcaro) @Raymond_Ndibe the list of status here is just a proposal, to be discussed/refined, so that's the first part of the task. [12:08:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [12:15:35] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [12:17:37] (03update) 10dcaro: logs_api: add the option to enable logs-api [repos/cloud/toolforge/api-gateway] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/75 [12:24:17] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [12:40:02] 06cloud-services-team, 10Cloud-VPS (Project-requests): Trove for cluebotng-review? 
- https://phabricator.wikimedia.org/T401347#11082399 (10DamianZaremba) FYI I migrated the schema over to the trove instance yesterday. Only issue I found was the "root access" described doesn't work because I don't have a v... [12:54:41] 06cloud-services-team, 10Toolforge (Toolforge iteration 23): [jobs-api] make job status an enum, with clearly defined states - https://phabricator.wikimedia.org/T401172#11082431 (10Raymond_Ndibe) >>! In T401172#11082326, @dcaro wrote: > Ideally this would be sorted by having system logs, as in logging the syst... [12:56:44] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [12:57:13] (03open) 10eliza189: Added comments, optimized imports and reformatted the files [toolforge-repos/miss-search] (update-cycle-toolforge-testing) - 10https://gitlab.wikimedia.org/toolforge-repos/miss-search/-/merge_requests/13 [12:59:47] (03close) 10eliza189: Eliza views bugs [toolforge-repos/miss-search] (update-cycle-toolforge-testing) - 10https://gitlab.wikimedia.org/toolforge-repos/miss-search/-/merge_requests/9 [13:00:43] (03close) 10eliza189: Ilanmerge [toolforge-repos/miss-search] (update-cycle) - 10https://gitlab.wikimedia.org/toolforge-repos/miss-search/-/merge_requests/8 (owner: 10ilanen1) [13:01:30] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [13:03:11] (03update) 10dcaro: logs: use logs-api for logs [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/121 [13:10:45] (03update) 10dcaro: logs-api: add new component [repos/cloud/toolforge/toolforge-deploy] - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/911 [13:14:51] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [13:15:28] 06cloud-services-team, 10Cloud-VPS (Debian Bullseye Deprecation): Disable creation of new Bullseye instances - https://phabricator.wikimedia.org/T401805#11082542 (10Andrew) ` openstack image set --project testlabs --shared 94b776c5-72f9-4d72-9b0d-4043d8eee421 ` it's now available in testlabs only (and can b... [13:18:13] (03update) 10dcaro: global: first commit [repos/cloud/toolforge/logs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/logs-api/-/merge_requests/1 (https://phabricator.wikimedia.org/T127367) [13:23:13] (03update) 10samwilson: Draft: Start migrating CI from GitHub to GitLab [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/1 (https://phabricator.wikimedia.org/T395398) [13:55:27] 10Toolforge (Toolforge iteration 23): [loki] persist build logs for each tool on their loki namespace - https://phabricator.wikimedia.org/T401830 (10dcaro) 03NEW [13:56:20] 10Cloud Services Proposals, 06cloud-services-team, 10Toolforge (Toolforge iteration 23): Decision request - Reuse toolforge user tools central logging for toolforge infrastructure logging - https://phabricator.wikimedia.org/T398285#11082687 (10taavi) 05Open→03Resolved The result from the decision mee... 
[13:57:07] 06cloud-services-team, 10Cloud-VPS (Debian Bullseye Deprecation): Disable creation of new Bullseye instances - https://phabricator.wikimedia.org/T401805#11082694 (10taavi) 05Open→03Resolved a:03Andrew [14:03:19] (03merge) 10galrach600: Added comments, optimized imports and reformatted the files [toolforge-repos/miss-search] (update-cycle-toolforge-testing) - 10https://gitlab.wikimedia.org/toolforge-repos/miss-search/-/merge_requests/13 (owner: 10eliza189) [14:03:59] 06cloud-services-team, 10Data-Services, 06Data-Persistence, 06Data-Platform-SRE: Decide how to use the new clouddb hosts (clouddb102[2-5]) - https://phabricator.wikimedia.org/T401295#11082732 (10Ottomata) [14:05:49] (03open) 10vriaa: fix: ColorPicker component [toolforge-repos/centralnotice-banner-editor] - 10https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/4 [14:11:58] (03open) 10galrach600: merge request for final submission [toolforge-repos/miss-search] - 10https://gitlab.wikimedia.org/toolforge-repos/miss-search/-/merge_requests/14 [14:16:36] 06cloud-services-team, 10Toolforge (Toolforge iteration 23): [jobs-api] make job status an enum, with clearly defined states - https://phabricator.wikimedia.org/T401172#11082770 (10dcaro) > unsure what you meant by `system logs` in this context. Is the plan to do this in `logs-api` instead? if that's the case... 
[14:31:25] (open) vriaa: feat: Add icon buttons for selecting font style [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/5
[14:34:11] (open) vriaa: Close button editing feature [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/6
[14:35:09] (update) vriaa: Text editing feature [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/5
[14:37:08] (update) vriaa: Text editing feature [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/5
[14:43:40] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 23), Goal: [harbor] Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687#11082867 (Raymond_Ndibe)
[14:58:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[15:10:18] Tool-gawa: [Code]Conception de la page de statistiques - https://phabricator.wikimedia.org/T401767#11082959 (poro26)
[15:11:37] cloud-services-team, Toolforge: Upgrade Toolforge (Elastic|Open)Search cluster off of Bullseye - https://phabricator.wikimedia.org/T401818#11082965 (bd808)
[15:21:33] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 23), Goal: [harbor] Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687#11083008 (Raymond_Ndibe)
[15:27:31] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 23), Goal: [harbor] Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687#11083022 (Raymond_Ndibe)
[15:28:22] Cloud-VPS (Debian Bullseye Deprecation), Beta-Cluster-Infrastructure: Migrate deployment-prep away from Debian Bullseye to Bookworm/Trixie - https://phabricator.wikimedia.org/T401839 (taavi) NEW
[15:53:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[15:58:23] (open) vriaa: Banner editing feature [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/7
[15:59:17] Cloud-VPS (Debian Bullseye Deprecation), Beta-Cluster-Infrastructure, Release-Engineering-Team (Priority Backlog 📥): Migrate deployment-prep away from Debian Bullseye to Bookworm/Trixie - https://phabricator.wikimedia.org/T401839#11083190 (bd808)
[15:59:32] (update) vriaa: fix: ColorPicker component [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/4
[16:01:23] (update) vriaa: fix: ColorPicker component [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/4
[16:01:41] (update) vriaa: fix: ColorPicker component [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/4
[16:08:56] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:18:35] cloud-services-team (FY2025/26-Q1), Cloud-Services-Origin-User, Cloud-Services-Worktype-Unplanned: [jobs-api] buildservice-based jobs stopped prefixing the command with launcher - https://phabricator.wikimedia.org/T401846 (dcaro) NEW
[16:18:41] RESOLVED: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:19:05] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 23), Cloud-Services-Origin-User, Cloud-Services-Worktype-Unplanned: [jobs-api] buildservice-based jobs stopped prefixing the command with launcher - https://phabricator.wikimedia.org/T401846#11083263 (dcaro) p:Triage→High
[16:19:08] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 23), Cloud-Services-Origin-User, Cloud-Services-Worktype-Unplanned: [jobs-api] buildservice-based jobs stopped prefixing the command with launcher - https://phabricator.wikimedia.org/T401846#11083265 (dcaro)
[16:20:12] (open) dcaro: Revert "jobs-api: bump to 0.0.397-20250808001841-d9ce682d" [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/919 (https://phabricator.wikimedia.org/T401846)
[16:22:24] (approved) dcaro: Revert "jobs-api: bump to 0.0.397-20250808001841-d9ce682d" [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/919 (https://phabricator.wikimedia.org/T401846)
[16:22:34] (merge) dcaro: Revert "jobs-api: bump to 0.0.397-20250808001841-d9ce682d" [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/919 (https://phabricator.wikimedia.org/T401846)
[16:27:36] cloud-services-team, Data-Services, Data-Engineering, Data-Engineering-Radar, and 2 others: Create wiki replicas views for globaljsonlinks tables - https://phabricator.wikimedia.org/T387419#11083290 (Ottomata)
[16:42:53] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation), IPv6: Refresh Cloud VPS NTP servers to run on Trixie and enable IPv6 - https://phabricator.wikimedia.org/T401848 (taavi) NEW
[16:43:04] (open) dcaro: Draft: DONOTMERGE [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/195
[16:47:03] (update) dcaro: Draft: DONOTMERGE [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/195
[16:59:38] (update) dcaro: Draft: DONOTMERGE [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/195
[17:00:50] (update) dcaro: runtime: don't overwite command [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/195
[17:05:07] (update) naorleizer: merge request for final submission [toolforge-repos/miss-search] - https://gitlab.wikimedia.org/toolforge-repos/miss-search/-/merge_requests/14 (owner: galrach600)
[17:07:16] (merge) naorleizer: merge request for final submission [toolforge-repos/miss-search] - https://gitlab.wikimedia.org/toolforge-repos/miss-search/-/merge_requests/14 (owner: galrach600)
[17:21:49] cloud-services-team, Toolforge: [jobs-api,logs-api] When listing logs without --follow, the logs are sorted first by pod, then by timestamp - https://phabricator.wikimedia.org/T401850 (dcaro) NEW
[17:24:05] Toolforge (Toolforge iteration 23): [components-api,beta] Image should only be build once when re-used in components - https://phabricator.wikimedia.org/T401851 (DamianZaremba) NEW
[17:26:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[17:27:57] cloud-services-team, Toolforge: [jobs-api,logs-api] When listing logs without --follow, the logs are sorted first by pod, then by timestamp - https://phabricator.wikimedia.org/T401850#11083451 (taavi) →Duplicate dup:T401552
[17:27:58] cloud-services-team, Toolforge: `toolforge jobs logs` has inconsistent ordering - https://phabricator.wikimedia.org/T401552#11083453 (taavi)
[17:28:45] cloud-services-team, Toolforge: [loki] persist build logs for each tool on their loki namespace - https://phabricator.wikimedia.org/T401830#11083457 (taavi)
[17:33:15] (open) raymond-ndibe: [jobs-api,builds-api] test for launcher str in buildpack image commands [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/920 (https://phabricator.wikimedia.org/T401846)
[17:33:20] (update) raymond-ndibe: [jobs-api,builds-api] test for launcher str in buildpack image commands [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/920 (https://phabricator.wikimedia.org/T401846)
[17:35:50] (open) raymond-ndibe: [maintain-harbor] fix delete_stale_toolforge_artifacts bug [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/921
[17:36:23] (approved) dcaro: [jobs-api,builds-api] test for launcher str in buildpack image commands [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/920 (https://phabricator.wikimedia.org/T401846) (owner: raymond-ndibe)
[17:36:57] (merge) dcaro: [jobs-api,builds-api] test for launcher str in buildpack image commands [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/920 (https://phabricator.wikimedia.org/T401846) (owner: raymond-ndibe)
[18:16:01] (update) raymond-ndibe: runtime: don't overwite command [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/195 (owner: dcaro)
[18:17:28] (update) raymond-ndibe: [maintain-harbor] fix delete_stale_toolforge_artifacts bug [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/921
[18:20:59] (update) raymond-ndibe: [cli] Change port type to allow protocol suffix [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024)
[18:23:31] (update) raymond-ndibe: [cli] Change port type to allow protocol suffix [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024)
[18:23:49] (update) raymond-ndibe: [cli] Change port type to allow protocol suffix [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024)
[18:29:11] (CR) Eugene233: [C:+2] elimininate shared productions duplicates [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1169783 (owner: Jacob4code)
[18:29:57] (Merged) jenkins-bot: elimininate shared productions duplicates [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1169783 (owner: Jacob4code)
[18:30:11] (CR) Eugene233: [C:+1] Search results for actors include description [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1168284 (owner: Jacob4code)
[18:30:19] Toolforge (Toolforge iteration 23): [components-api,beta] Image should only be build once when re-used in components - https://phabricator.wikimedia.org/T401851#11083654 (DamianZaremba) This is sort of related to T401388 in the context of the comment regarding which component <> ref; I'm actually not sure wh...
[18:33:45] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 23), Cloud-Services-Origin-User, Cloud-Services-Worktype-Unplanned: [jobs-api] buildservice-based jobs stopped prefixing the command with launcher - https://phabricator.wikimedia.org/T401846#11083689 (DamianZaremba) That would exp...
[18:39:24] cloud-services-team, Toolforge: Support installing packages from non-upstream repo and/or build pack for C/C++code - https://phabricator.wikimedia.org/T401075#11083702 (DamianZaremba) As a workaround to get this running in a pack/container until we have a proper build pack, I'm pretending to be a Python...
[18:43:46] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.upgrade_osds
[18:44:31] VPS-project-Wikistats: upgrade wikistats cloud VPS to trixie - https://phabricator.wikimedia.org/T401859 (Dzahn) NEW
[18:45:01] VPS-project-Wikistats: upgrade wikistats cloud VPS to trixie - https://phabricator.wikimedia.org/T401859#11083720 (Dzahn) Open→In progress created instance `wikistats-trixie.wikistats.eqiad1.wikimedia.cloud`
[18:45:17] VPS-project-Wikistats: upgrade wikistats cloud VPS project to trixie - https://phabricator.wikimedia.org/T401859#11083722 (Dzahn)
[18:45:56] VPS-project-Wikistats: upgrade wikistats cloud VPS project to trixie - https://phabricator.wikimedia.org/T401859#11083723 (Dzahn)
[18:50:41] cloud-services-team, Cloud-VPS: Fix Puppet version/legacy fact issues with Cloud VPS Trixie image - https://phabricator.wikimedia.org/T401586#11083740 (taavi) Open→Resolved a:taavi
[18:50:46] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.upgrade_osds (exit_code=0)
[19:04:07] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.upgrade_osds
[19:04:58] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.upgrade_osds (exit_code=99)
[19:06:04] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[19:11:37] (update) raymond-ndibe: api: allow protocol to be specified for ports [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/186 (owner: dcaro)
[19:11:43] cloud-services-team, Cloud-VPS: Enable SSL in Trove eMariaDB - Trixie MariaDB client requires SSL but SSL is not enabled in the Trove server - https://phabricator.wikimedia.org/T401861 (JJMC89) NEW
[19:14:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[19:20:09] cloud-services-team, Cloud-VPS: Request to enable XFF headers for test XTools hostnames - https://phabricator.wikimedia.org/T400964#11083835 (MusikAnimal) Open→Invalid Finally got back to working on this, and realized I can simply temporarily point `xtools-dev.wmcoud.org` to the new prod serv...
[19:31:17] cloud-services-team, Cloud-VPS: Increase "mediawiki-quickstart" project disk quota to 160 GB - https://phabricator.wikimedia.org/T401864 (Mhurd) NEW
[19:36:46] cloud-services-team, Cloud-VPS: Increase "mediawiki-quickstart" project disk quota to 160 GB - https://phabricator.wikimedia.org/T401864#11083913 (Mhurd)
[19:41:07] cloud-services-team, Cloud-VPS: Increase "mediawiki-quickstart" project disk quota to 160 GB - https://phabricator.wikimedia.org/T401864#11083922 (Mhurd)
[19:42:48] cloud-services-team, Cloud-VPS: Increase "mediawiki-quickstart" project disk quota to 160 GB - https://phabricator.wikimedia.org/T401864#11083926 (Mhurd)
[19:45:47] cloud-services-team, Cloud-VPS: Increase "mediawiki-quickstart" project disk quota to 160 GB - https://phabricator.wikimedia.org/T401864#11083947 (Mhurd)
[19:51:21] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.upgrade_osds
[19:51:23] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.upgrade_osds (exit_code=99)
[19:51:49] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.upgrade_osds
[19:51:51] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.upgrade_osds (exit_code=99)
[19:52:49] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.upgrade_osds
[19:52:50] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.upgrade_osds (exit_code=99)
[19:53:27] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.upgrade_osds
[19:53:28] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.upgrade_osds (exit_code=99)
[20:05:06] cloud-services-team, Cloud-VPS: Increase "mediawiki-quickstart" project disk quota to 160 GB - https://phabricator.wikimedia.org/T401864#11083982 (Mhurd)
[20:08:06] cloud-services-team, Cloud-VPS: Enable SSL in Trove MariaDB - Trixie MariaDB client requires SSL but SSL is not enabled in the Trove server - https://phabricator.wikimedia.org/T401861#11083991 (JJMC89)
[20:08:07] cloud-services-team, Cloud-VPS: Increase "mediawiki-quickstart" project disk quota to 160 GB - https://phabricator.wikimedia.org/T401864#11083992 (Mhurd)
[20:08:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:18:50] cloud-services-team, Cloud-VPS: [tofu-cloudvps] cloudvps_puppet_prefix.hiera settings show dirty diffs based on YAML canonicalization - https://phabricator.wikimedia.org/T398643#11084014 (bd808) I tried to get `delve` setup so I could set a breakpoint at [[https://github.com/opentofu/opentofu/blob/9d4763...
[20:29:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[20:51:38] Toolforge (Toolforge iteration 23): [components-api,beta] Config not updated from remote source - https://phabricator.wikimedia.org/T401868 (DamianZaremba) NEW
[20:52:09] Toolforge (Toolforge iteration 23): [components-api,beta] Config not updated from remote source - https://phabricator.wikimedia.org/T401868#11084084 (DamianZaremba)
[21:21:19] cloud-services-team, Toolforge: [components-api,beta] Config not updated from remote source - https://phabricator.wikimedia.org/T401868#11084137 (JJMC89)
[21:27:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[21:52:07] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[21:52:07] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.reactivate (exit_code=99)
[21:52:30] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[21:52:37] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.reactivate (exit_code=0)
[21:54:21] cloud-services-team, Cloud-VPS: Enable SSL in Trove MariaDB - Trixie MariaDB client requires SSL but SSL is not enabled in the Trove server - https://phabricator.wikimedia.org/T401861#11084202 (JJMC89) Since my application works without any changes, I've modified my configuration files to disable SSL whe...
[21:59:28] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tools-harbor-2 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[21:59:30] cloud-services-team, Toolforge: [components-api,beta] Config not updated from remote source - https://phabricator.wikimedia.org/T401868#11084207 (DamianZaremba) @jjmc89 just FYI the link https://w.wiki/EYoq on https://wikitech.wikimedia.org/wiki/Help:Toolforge/Deploy_your_tool added the current iteration...
[22:01:17] (CR) Essa237: [C:+1] "ok" [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1168284 (owner: Jacob4code)
[22:32:37] (open) krinkle: channels.yaml: Remove wikimedia-perf-bots [toolforge-repos/wikibugs2] - https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/60
[22:35:29] (merge) krinkle: channels.yaml: Remove wikimedia-perf-bots [toolforge-repos/wikibugs2] - https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/60
[22:37:18] cloud-services-team, Toolforge: Loki usage - https://phabricator.wikimedia.org/T401151#11084273 (DamianZaremba) Realistic historical usage on staging. ` 11G botng-20250723.log 11G botng-20250724.log 11G botng-20250725.log 11G botng-20250726.log 11G botng-20250727.log 11G botng-20250728.log 11G botng-2025...
[22:37:58] (open) krinkle: Fix cut-off URL in IRC real name [toolforge-repos/ircservserv] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv/-/merge_requests/11
[22:39:40] (update) krinkle: Fix cut-off URL in IRC real name [toolforge-repos/ircservserv] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv/-/merge_requests/11
[22:43:40] (PS1) Jacob4code: Only display "No co-actors found" when search result is empty [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1178628
[22:52:10] (open) krinkle: Remove wikimedia-perf-bots [toolforge-repos/ircservserv-config] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv-config/-/merge_requests/29
[22:58:06] (merge) jjmc89: Remove wikimedia-perf-bots [toolforge-repos/ircservserv-config] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv-config/-/merge_requests/29 (owner: krinkle)
[22:58:14] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[22:58:15] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.reactivate (exit_code=99)
[22:58:45] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[23:03:30] cloud-services-team, Toolforge: [Build service] latest builder has old PHP - https://phabricator.wikimedia.org/T401875 (DamianZaremba) NEW
[23:05:07] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.reactivate (exit_code=99)
[23:08:31] cloud-services-team, Toolforge: [Build service] latest builder has old PHP - https://phabricator.wikimedia.org/T401875#11084327 (DamianZaremba) This could be a feature request, but I'll put it as a bug since "use latest version"
[23:08:34] VPS-project-Codesearch, collaboration-services: Graduate codesearch to production - https://phabricator.wikimedia.org/T268199#11084328 (Dzahn) for postgresql maybe we can use https://docker-registry.wikimedia.org/repos/data-engineering/postgresql-kubernetes/postgresql/tags/
[23:09:19] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[23:09:47] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.reactivate (exit_code=99)
[23:15:06] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[23:22:02] cloud-services-team, Toolforge: [Build service] latest builder has old PHP - https://phabricator.wikimedia.org/T401875#11084329 (DamianZaremba) Outside of 8.4 support 8.3.13 was released oct 2024, 8.3.24 was released less than 2 weeks ago. Between the 2 are numerous security fixes.
[23:24:35] andrew@cloudcumin1001 reactivate (PID 2425239) is awaiting input
[23:27:05] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-107 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[23:37:00] cloud-services-team, Toolforge, Documentation, Kubernetes: Figure out and document how to call the Kubernetes API as your tool user from inside a pod - https://phabricator.wikimedia.org/T321919#11084349 (DamianZaremba) I hit the k8s api RBAC today with https://github.com/cluebotng/external-grafan...
[23:43:57] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.reactivate (exit_code=0)