[00:21:36] <wikibugs>	 (update) samwilson: Use HTTP client object from API, with User-Agent set [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/4 (https://phabricator.wikimedia.org/T403435)
[00:31:55] <wmcs-alerts>	 FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[00:48:16] <wikibugs>	 (update) samwilson: Use HTTP client object from API, with User-Agent set [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/4 (https://phabricator.wikimedia.org/T403435)
[01:16:55] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 close to running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[01:21:55] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 close to running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[03:08:55] <wikibugs>	 (PS1) Andrew Bogott: Remove some imports that were removed upstream. [openstack/horizon/trove-dashboard] - https://gerrit.wikimedia.org/r/1188495
[03:09:14] <wikibugs>	 (CR) Andrew Bogott: [V:+2 C:+2] Remove some imports that were removed upstream. [openstack/horizon/trove-dashboard] - https://gerrit.wikimedia.org/r/1188495 (owner: Andrew Bogott)
[04:01:41] <wikibugs>	 (PS1) Andrew Bogott: Further attempt to get that merge conflict resolved properly [openstack/horizon/trove-dashboard] - https://gerrit.wikimedia.org/r/1188500
[04:02:03] <wikibugs>	 (CR) Andrew Bogott: [V:+2 C:+2] Further attempt to get that merge conflict resolved properly [openstack/horizon/trove-dashboard] - https://gerrit.wikimedia.org/r/1188500 (owner: Andrew Bogott)
[05:01:55] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[05:31:55] <wmcs-alerts>	 FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[06:01:55] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[06:36:03] <wmcs-alerts>	 FIRING: [4x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-43 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[06:51:15] <logmsgbot_cloud>	 !log filippo@cloudcumin1001 tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[06:56:59] <logmsgbot_cloud>	 !log filippo@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
[06:57:24] <logmsgbot_cloud>	 !log filippo@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-71, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-75
[07:11:44] <logmsgbot_cloud>	 !log filippo@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-71, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-75
[07:16:03] <wmcs-alerts>	 FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-43 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[07:41:03] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-43 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[07:41:18] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-43 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[07:46:03] <wmcs-alerts>	 RESOLVED: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-43 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProce
[07:51:33] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-43 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[07:56:18] <wmcs-alerts>	 RESOLVED: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-43 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProce
[07:57:05] <wikibugs>	 (update) dcaro: loki.alloy: decrease frequency for fetching logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/962
[08:00:47] <wikibugs>	 (update) dcaro: logs: use logs-api for logs [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/121
[08:04:57] <wikibugs>	 (update) dcaro: logs: use logs-api for logs [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/121
[08:13:49] <wikibugs>	 cloud-services-team, Toolforge (Toolforge iteration 24): Address tools NFS getting stuck with processes in D state - https://phabricator.wikimedia.org/T404584#11183743 (fgiunchedi) p:Triage→High
[08:15:17] <wikibugs>	 cloud-services-team, Toolforge (Toolforge iteration 24): Address tools NFS getting stuck with processes in D state - https://phabricator.wikimedia.org/T404584#11183744 (fgiunchedi) re: nfs server update I'm reading https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Runbooks/Create_an_NFS_serv...
[08:41:51] <wikibugs>	 cloud-services-team, Toolforge (Toolforge iteration 24): Address tools NFS getting stuck with processes in D state - https://phabricator.wikimedia.org/T404584#11183806 (fgiunchedi) Also as pointed out by @taavi we're looking at changing the VIP address, as opposed to VIP failover, because the new servers...
[09:08:02] <logmsgbot_cloud>	 !log filippo@cloudcumin1001 testlabs START - Cookbook wmcs.openstack.cloudvirt.vm_console
[09:14:54] <logmsgbot_cloud>	 !log filippo@cloudcumin1001 testlabs END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
[09:15:00] <logmsgbot_cloud>	 !log filippo@cloudcumin1001 testlabs START - Cookbook wmcs.openstack.cloudvirt.vm_console
[09:15:44] <logmsgbot_cloud>	 !log filippo@cloudcumin1001 testlabs END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
[09:18:04] <logmsgbot_cloud>	 !log filippo@cloudcumin1001 testlabs START - Cookbook wmcs.openstack.cloudvirt.vm_console
[09:30:55] <logmsgbot_cloud>	 !log filippo@cloudcumin1001 testlabs END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
[09:35:29] <logmsgbot_cloud>	 !log filippo@cloudcumin1001 testlabs START - Cookbook wmcs.nfs.add_server
[09:45:00] <logmsgbot_cloud>	 !log filippo@cloudcumin1001 testlabs END (PASS) - Cookbook wmcs.nfs.add_server (exit_code=0)
[10:11:53] <wikibugs>	 (update) dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[10:34:46] <wikibugs>	 cloud-services-team, Toolforge (Toolforge iteration 24): Address tools NFS getting stuck with processes in D state - https://phabricator.wikimedia.org/T404584#11184317 (fgiunchedi) I did some tests in `testlabs` today:  1. Created a `nfs-client-2` instance with Trixie for client testing. Mounts are prese...
[10:51:02] <wikibugs>	 VPS-project-Codesearch, m3api: Index m3api repositories in Codesearch - https://phabricator.wikimedia.org/T404517#11184389 (Ladsgroup) There is a `wmf_gitlab_group_projects` which takes a group and adds all of the projects using https://gitlab.wikimedia.org/groups/{group}/-/children.json (here https://gi...
[11:27:37] <wikibugs>	 cloud-services-team, Toolforge: [components-api] rebuilds un-changed images - https://phabricator.wikimedia.org/T403167#11184588 (DamianZaremba) Hi @Raymond_Ndibe,  Essentially what you describe is how you get into this state.  I included it as an example along the lines of perhaps builds-api should be t...
[11:37:18] <wikibugs>	 (update) dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[11:45:07] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[11:46:57] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
[11:59:37] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[12:00:56] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
[12:15:25] <wikibugs>	 (update) dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[12:24:02] <wikibugs>	 Tool-global-search: Export as HTML table - https://phabricator.wikimedia.org/T404713 (Reedy) NEW
[12:24:40] <wikibugs>	 Tool-global-search: Export as markdown table - https://phabricator.wikimedia.org/T404714 (Reedy) NEW
[12:24:56] <wikibugs>	 Tool-global-search: Export as HTML table - https://phabricator.wikimedia.org/T404713#11184750 (Reedy)
[12:25:35] <wikibugs>	 Tool-global-search: Export as markdown table - https://phabricator.wikimedia.org/T404714#11184753 (Reedy) Open→Invalid Apparently I can't spot the option (maybe because of the sorting order?)...
[12:25:36] <wikibugs>	 Tool-global-search: Export as HTML table - https://phabricator.wikimedia.org/T404713#11184755 (Reedy) p:Triage→Low
[12:34:21] <wikibugs>	 (update) dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[12:44:54] <wikibugs>	 (update) dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[12:50:48] <wikibugs>	 Tool-archive-externa-links: Dashboard creation - https://phabricator.wikimedia.org/T399889#11184817 (poro26) Open→In progress
[12:54:23] <wikibugs>	 Tool-archive-externa-links: [Documentation] Production of a new video clip for installing the ArchiveExternaLinks user script - https://phabricator.wikimedia.org/T404193#11184849 (poro26) Open→Resolved Link to the finished video: https://w.wiki/FJK$
[12:56:03] <wikibugs>	 (update) dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[12:57:06] <wikibugs>	 Tool-archive-externa-links: Dashboard creation - https://phabricator.wikimedia.org/T399889#11184859 (poro26) In progress→Resolved
[12:57:42] <wikibugs>	 (merge) dcaro: package: upgrade all deps [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/64
[12:58:09] <wikibugs>	 (merge) dcaro: pre-commit: add check for openapi spec version bump [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/116
[13:02:00] <wikibugs>	 (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.156-20250916125822-74722783 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/968 (https://phabricator.wikimedia.org/T401388)
[13:03:55] <wikibugs>	 (update) dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[13:04:23] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[13:04:46] <wikibugs>	 (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: envvars-api: bump to 0.0.75-20250916125754-a88de155 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/969 (https://phabricator.wikimedia.org/T362869)
[13:06:22] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
[13:08:19] <wmcs-alerts>	 FIRING: TektonDown: Tekton is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/TektonDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTektonDown
[13:08:38] <wmcs-alerts>	 FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:13:38] <wmcs-alerts>	 RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:20:52] <wm-bot2>	 !log dcaro@acme toolsbeta START - Cookbook wmcs.toolforge.k8s.reboot for tools-test-k8s-worker-nfs-5
[13:20:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:21:06] <wm-bot2>	 !log dcaro@acme toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-test-k8s-worker-nfs-5
[13:21:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:21:29] <wm-bot2>	 !log dcaro@acme toolsbeta START - Cookbook wmcs.toolforge.k8s.reboot for toolsbeta-test-k8s-worker-nfs-5
[13:21:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:22:38] <wm-bot2>	 !log dcaro@acme toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for toolsbeta-test-k8s-worker-nfs-5
[13:22:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:23:19] <wmcs-alerts>	 RESOLVED: TektonDown: Tekton is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/TektonDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTektonDown
[13:25:55] <wikibugs>	 cloud-services-team, Cloud-VPS, Patch-For-Review, Security: Move cloud-wide root keys to the main puppet repo - https://phabricator.wikimedia.org/T317362#11184952 (fgiunchedi)
[13:26:22] <wikibugs>	 cloud-services-team, Cloud-VPS, Patch-For-Review, Security: Move cloud-wide root keys to the main puppet repo - https://phabricator.wikimedia.org/T317362#11184959 (fgiunchedi) Open→Resolved a:fgiunchedi This is done -- root-authorized-keys for cloud vps now lives in puppet.git
[13:27:34] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] inventory: Remove Bookworm based bastions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1187762 (https://phabricator.wikimedia.org/T392510) (owner: Majavah)
[13:27:53] <wikibugs>	 (CR) Majavah: [C:+2] inventory: Remove Bookworm based bastions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1187762 (https://phabricator.wikimedia.org/T392510) (owner: Majavah)
[13:28:14] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.remove_instance for instance tools-bastion-12
[13:29:03] <wikibugs>	 (update) dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[13:29:09] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-bastion-12
[13:31:43] <wikibugs>	 (Merged) jenkins-bot: inventory: Remove Bookworm based bastions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1187762 (https://phabricator.wikimedia.org/T392510) (owner: Majavah)
[13:31:52] <wm-bot2>	 !log dcaro@acme toolsbeta START - Cookbook wmcs.vps.instance.force_reboot vm toolsbeta-test-k8s-worker-nfs-5 (cluster eqiad1, project toolsbeta)
[13:31:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:31:57] <wm-bot2>	 !log dcaro@acme toolsbeta END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm toolsbeta-test-k8s-worker-nfs-5 (cluster eqiad1, project toolsbeta)
[13:31:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:32:10] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 bastion START - Cookbook wmcs.vps.remove_instance for instance bastion-eqiad1-03
[13:32:25] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 bastion END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance bastion-eqiad1-03
[13:38:35] <wm-bot2>	 !log dcaro@acme toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.depool_and_remove_node for host toolsbeta-test-k8s-worker-nfs-5
[13:38:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:40:15] <wm-bot2>	 !log dcaro@acme toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.depool_and_remove_node (exit_code=0) for host toolsbeta-test-k8s-worker-nfs-5
[13:40:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:40:51] <wikibugs>	 Toolforge (Toolforge iteration 24): [infra,k8s,toolsbeta] k8s worker node toolsbeta-test-k8s-worker-nfs-5 is failing to tail pods - https://phabricator.wikimedia.org/T404721 (dcaro) NEW
[13:40:55] <wikibugs>	 Toolforge (Toolforge iteration 24): [infra,k8s,toolsbeta] k8s worker node toolsbeta-test-k8s-worker-nfs-5 is failing to tail pods - https://phabricator.wikimedia.org/T404721#11185071 (dcaro) p:Triage→High
[13:41:01] <wikibugs>	 Toolforge (Toolforge iteration 24): [infra,k8s,toolsbeta] k8s worker node toolsbeta-test-k8s-worker-nfs-5 is failing to tail pods - https://phabricator.wikimedia.org/T404721#11185073 (dcaro) Open→In progress
[13:41:42] <wm-bot2>	 !log dcaro@acme toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the toolsbeta cluster (T404721)
[13:41:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:41:46] <stashbot>	 T404721: [infra,k8s,toolsbeta] k8s worker node toolsbeta-test-k8s-worker-nfs-5 is failing to tail pods - https://phabricator.wikimedia.org/T404721
[13:42:16] <wikibugs>	 Toolforge (Toolforge iteration 24): [infra,k8s,toolsbeta] k8s worker node toolsbeta-test-k8s-worker-nfs-5 is failing to tail pods - https://phabricator.wikimedia.org/T404721#11185076 (dcaro) Deleted with: ` dcaro@acme$ wmcs-cookbooks wmcs.toolforge.k8s.worker.depool_and_remove_node --hostname-to-remove tools...
[13:44:48] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[13:46:28] <wmcs-alerts>	 FIRING: InstanceDown: Project toolsbeta instance toolsbeta-test-k8s-worker-nfs-5 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:48:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[13:50:29] <wikibugs>	 cloud-services-team, Cloud-VPS: wmf-auto-restart can get wedged on nfs4 mounts even when the filesystem is excluded - https://phabricator.wikimedia.org/T404322#11185126 (fgiunchedi) Open→Invalid Will address as part of {T404584}
[13:51:28] <wmcs-alerts>	 RESOLVED: InstanceDown: Project toolsbeta instance toolsbeta-test-k8s-worker-nfs-5 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:53:19] <wm-bot2>	 !log dcaro@acme toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the toolsbeta cluster
[13:53:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:55:11] <wikibugs>	 Toolforge (Toolforge iteration 24): [infra,k8s,toolsbeta] k8s worker node toolsbeta-test-k8s-worker-nfs-5 is failing to tail pods - https://phabricator.wikimedia.org/T404721#11185162 (dcaro) It failed adding the new node with preflight checks: ` ----- OUTPUT of 'sudo -i kubeadm ...16f541ca6dd18704' -----...
[13:55:18] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[13:59:47] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[14:00:16] <wikibugs>	 cloud-services-team, Toolforge (Toolforge iteration 24): Address tools NFS getting stuck with processes in D state - https://phabricator.wikimedia.org/T404584#11185182 (Andrew) That plan looks good to me. I haven't tested the add_server cookbook in a long time so I'm glad it's still working.  This is def...
[14:00:53] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
[14:01:49] <wikibugs>	 cloud-services-team, Data-Services, Data-Persistence, Data-Platform-SRE: Decide how to use the new clouddb hosts (clouddb102[2-5]) - https://phabricator.wikimedia.org/T401295#11185210 (akosiaris) Open→Stalled Setting to stalled, while we figure out the exact details of this one.
[14:02:49] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[14:08:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:09:48] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.remove_instance for instance tools-bastion-13
[14:09:58] <wm-bot2>	 !log dcaro@acme toolsbeta START - Cookbook wmcs.vps.remove_instance for instance toolsbeta-test-k8s-worker-nfs-11 (T404721)
[14:10:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:10:02] <stashbot>	 T404721: [infra,k8s,toolsbeta] k8s worker node toolsbeta-test-k8s-worker-nfs-5 is failing to tail pods - https://phabricator.wikimedia.org/T404721
[14:10:43] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-bastion-13
[14:10:58] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 bastion START - Cookbook wmcs.vps.remove_instance for instance bastion-eqiad1-04
[14:11:11] <wm-bot2>	 !log dcaro@acme toolsbeta END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance toolsbeta-test-k8s-worker-nfs-11 (T404721)
[14:11:13] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 bastion END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance bastion-eqiad1-04
[14:11:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:11:21] <wikibugs>	 cloud-services-team, Toolforge, IPv6, Patch-For-Review: Upgrade Toolforge bastions to Trixie and enable IPv6 - https://phabricator.wikimedia.org/T392510#11185270 (taavi) Open→Resolved
[14:11:33] <wm-bot2>	 !log dcaro@acme toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the toolsbeta cluster (T404721)
[14:11:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:11:37] <wikibugs>	 cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation), IPv6, Patch-For-Review: Refresh Cloud VPS bastions to run on Trixie and enable IPv6 - https://phabricator.wikimedia.org/T392689#11185274 (taavi) Open→Resolved
[14:12:12] <wikibugs>	 (merge) taavi: volume-admission: bump to 0.0.72-20250915164649-3238fa82 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/966 (https://phabricator.wikimedia.org/T404438) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[14:12:28] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
[14:15:27] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
[14:15:34] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
[14:15:51] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[14:16:54] <wikibugs>	 cloud-services-team, Toolforge, Patch-For-Review: Mount /etc/openstack/clouds.yaml in mount-enabled containers - https://phabricator.wikimedia.org/T404438#11185293 (taavi) Open→Resolved
[14:18:33] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
[14:18:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:20:19] <wikibugs>	 (approved) dcaro: jobs-api: bump to 0.0.414-20250915172125-3b82d2c2 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/967 (https://phabricator.wikimedia.org/T404176) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[14:20:24] <wikibugs>	 (update) dcaro: jobs-api: bump to 0.0.414-20250915172125-3b82d2c2 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/967 (https://phabricator.wikimedia.org/T404176) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[14:21:15] <wikibugs>	 (merge) dcaro: jobs-api: bump to 0.0.414-20250915172125-3b82d2c2 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/967 (https://phabricator.wikimedia.org/T404176) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[14:21:20] <wikibugs>	 (merge) dcaro: package: upgrade deps [repos/cloud/toolforge/volume-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/volume-admission/-/merge_requests/35
[14:21:29] <wikibugs>	 (merge) dcaro: package: upgrade dependencies [repos/cloud/toolforge/registry-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/registry-admission/-/merge_requests/29
[14:21:33] <wikibugs>	 (merge) dcaro: pacakage: bump dependencies [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/141
[14:21:39] <wikibugs>	 (approved) dcaro: toolforge_deploy_mr: also wait when pipeline is creating [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/272
[14:21:45] <wikibugs>	 (merge) dcaro: toolforge_deploy_mr: also wait when pipeline is creating [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/272
[14:24:41] <wm-bot2>	 !log dcaro@acme toolsbeta Added a new k8s worker-nfs toolsbeta-test-k8s-worker-nfs-11.toolsbeta.eqiad1.wikimedia.cloud to the cluster
[14:24:42] <wm-bot2>	 !log dcaro@acme toolsbeta END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the toolsbeta cluster
[14:24:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:24:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:25:11] <wikibugs>	 (03open) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: volume-admission: bump to 0.0.73-20250916142135-79fa734c [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/970 (https://phabricator.wikimedia.org/T362869)
[14:26:41] <wikibugs>	 06cloud-services-team, 10Cloud-VPS, 05Cloud-Services-Origin-Team, 07Cloud-Services-Worktype-Maintenance: cloud: review lldp setup on hypervisors and VMs - https://phabricator.wikimedia.org/T304504#11185320 (10fgiunchedi) p:05High→03Low
[14:27:14] <wikibugs>	 (03open) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: registry-admission: bump to 0.0.66-20250916142141-810024bf [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/971 (https://phabricator.wikimedia.org/T362869)
[14:27:36] <wikibugs>	 10Toolforge (Toolforge iteration 24): [infra,k8s,toolsbeta] k8s worker node toolsbeta-test-k8s-worker-nfs-5 is failing to tail pods - https://phabricator.wikimedia.org/T404721#11185324 (10dcaro) 05In progress→03Resolved
[14:29:44] <wikibugs>	 10Toolforge (Toolforge iteration 24): [tools,infra,k8s] scale up the cluster, specifically CPU - https://phabricator.wikimedia.org/T404726 (10dcaro) 03NEW
[14:32:14] <wikibugs>	 (03open) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: builds-api: bump to 0.0.199-20250916142147-5e8adc0f [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/972 (https://phabricator.wikimedia.org/T362869)
[14:32:52] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[14:32:53] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.reactivate (exit_code=99)
[14:33:08] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[14:33:09] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.reactivate (exit_code=99)
[14:33:24] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[14:36:48] <icinga-wm>	 PROBLEM - Host cloudcephosd1017 is DOWN: PING CRITICAL - Packet loss = 100%
[14:38:26] <icinga-wm>	 RECOVERY - Host cloudcephosd1017 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[14:38:51] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[14:41:56] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.reactivate (exit_code=99)
[14:42:49] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[14:45:53] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[14:47:00] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.reactivate (exit_code=0)
[15:07:17] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[15:08:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[15:10:33] <wikibugs>	 (03PS1) 10Brouberol: kubernetes: add service secrets for dse-k8s-eqiad [labs/private] - 10https://gerrit.wikimedia.org/r/1188811
[15:10:49] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] kubernetes: add service secrets for dse-k8s-eqiad [labs/private] - 10https://gerrit.wikimedia.org/r/1188811 (owner: 10Brouberol)
[15:10:51] <wikibugs>	 (03CR) 10Brouberol: [V:03+2 C:03+2] kubernetes: add service secrets for dse-k8s-eqiad [labs/private] - 10https://gerrit.wikimedia.org/r/1188811 (owner: 10Brouberol)
[15:13:44] <wikibugs>	 06cloud-services-team, 10Toolforge: [components-api] Intermittent internal API failures / retry internal requests - https://phabricator.wikimedia.org/T403175#11185536 (10DamianZaremba) Another example in production ` {     "deploy_id": "20250916-145825-hmaalsrpe6",     "creation_time": "20250916-145825",     "...
[15:20:40] <wikibugs>	 (03PS1) 10Brouberol: kubernetes: add service secrets for airflow-dev/dse-k8s-eqiad [labs/private] - 10https://gerrit.wikimedia.org/r/1188816
[15:20:54] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] kubernetes: add service secrets for airflow-dev/dse-k8s-eqiad [labs/private] - 10https://gerrit.wikimedia.org/r/1188816 (owner: 10Brouberol)
[15:21:00] <wikibugs>	 (03CR) 10Brouberol: [V:03+2 C:03+2] kubernetes: add service secrets for airflow-dev/dse-k8s-eqiad [labs/private] - 10https://gerrit.wikimedia.org/r/1188816 (owner: 10Brouberol)
[15:23:58] <wm-bot2>	 !log dcaro@acme toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the toolsbeta cluster (T404721)
[15:24:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[15:24:03] <stashbot>	 T404721: [infra,k8s,toolsbeta] k8s worker node toolsbeta-test-k8s-worker-nfs-5 is failing to tail pods - https://phabricator.wikimedia.org/T404721
[15:24:09] <wikibugs>	 10cloud-services-team (FY2025/26-Q1), 10Toolforge (Toolforge iteration 24), 13Patch-For-Review: Toolforge: Replace all bastion with grid-less bookworm based bastion hosts - https://phabricator.wikimedia.org/T314665#11185565 (10taavi) a:05dcaro→03taavi
[15:24:11] <wikibugs>	 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS (Debian Buster Deprecation), 10Toolforge (Toolforge iteration 24), 07Epic, 05Goal: [infra] Toolforge: migrate to Debian Bullseye or later - https://phabricator.wikimedia.org/T311897#11185567 (10taavi) a:05dcaro→03taavi
[15:25:28] <wikibugs>	 10cloud-services-team (FY2025/26-Q1), 10Toolforge (Toolforge iteration 24), 05Goal, 13Patch-For-Review: [infra] Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#11185576 (10taavi) a:05dcaro→03taavi
[15:27:23] <wikibugs>	 06cloud-services-team, 10Toolforge, 13Patch-For-Review: Missing Perl packages on dev.toolforge.org for anomiebot workflows - https://phabricator.wikimedia.org/T360488#11185597 (10taavi) 05Open→03Resolved Thanks. In that case I'm moving forward with retiring the ancient grid bastion VM.
[15:27:28] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[15:28:25] <wikibugs>	 10cloud-services-team (FY2025/26-Q1), 10Toolforge (Toolforge iteration 24), 13Patch-For-Review: Toolforge: Replace all bastion with grid-less bookworm based bastion hosts - https://phabricator.wikimedia.org/T314665#11185603 (10taavi) I've shut down the bastion, will delete in a few days unless anything urgen...
[15:29:58] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[15:35:44] <wm-bot2>	 !log dcaro@acme toolsbeta Added a new k8s ingress toolsbeta-test-k8s-ingress-12.toolsbeta.eqiad1.wikimedia.cloud to the cluster
[15:35:45] <wm-bot2>	 !log dcaro@acme toolsbeta END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a ingress role in the toolsbeta cluster
[15:35:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[15:35:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[15:42:08] <wm-bot2>	 !log dcaro@acme toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.depool_and_remove_node for host toolsbeta-test-k8s-ingress-10 (T404721)
[15:42:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[15:42:15] <stashbot>	 T404721: [infra,k8s,toolsbeta] k8s worker node toolsbeta-test-k8s-worker-nfs-5 is failing to tail pods - https://phabricator.wikimedia.org/T404721
[15:42:21] <wikibugs>	 10Toolforge (Toolforge iteration 24): [infra,k8s,toolsbeta] k8s worker node toolsbeta-test-k8s-worker-nfs-5 is failing to tail pods - https://phabricator.wikimedia.org/T404721#11185692 (10dcaro) 05Resolved→03In progress
[15:43:29] <wm-bot2>	 !log dcaro@acme toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.depool_and_remove_node (exit_code=0) for host toolsbeta-test-k8s-ingress-10 (T404721)
[15:43:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[15:43:54] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[15:44:25] <wikibugs>	 10cloud-services-team (FY2025/26-Q1), 10Toolforge (Toolforge iteration 24), 05Goal, 13Patch-For-Review: [infra] Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#11185700 (10taavi) 05Stalled→03Open
[15:44:29] <wikibugs>	 (03PS1) 10Majavah: inventory: Remove tools-sgebastion-10 [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1188831 (https://phabricator.wikimedia.org/T314665)
[15:45:14] <wmcs-alerts>	 FIRING: [4x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: toolsbeta-test-k8s-ingress-10.toolsbeta.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[15:46:30] <wikibugs>	 (03CR) 10David Caro: [C:03+1] "🎉" [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1188831 (https://phabricator.wikimedia.org/T314665) (owner: 10Majavah)
[15:46:44] <wikibugs>	 (03CR) 10Majavah: [C:03+2] inventory: Remove tools-sgebastion-10 [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1188831 (https://phabricator.wikimedia.org/T314665) (owner: 10Majavah)
[15:48:06] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
[15:48:24] <wikibugs>	 06cloud-services-team, 10Toolforge: Update Toolforge client packages to build on Trixie only - https://phabricator.wikimedia.org/T404733 (10taavi) 03NEW
[15:49:16] <wikibugs>	 (03PS1) 10Majavah: aptly: Stop updating pre-Trixie repositories [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1188832 (https://phabricator.wikimedia.org/T404733)
[15:49:56] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[15:50:13] <wikibugs>	 10Toolforge (Toolforge iteration 24): [infra,k8s,toolsbeta] k8s worker node toolsbeta-test-k8s-worker-nfs-5 is failing to tail pods - https://phabricator.wikimedia.org/T404721#11185766 (10dcaro) 05In progress→03Resolved
[15:50:35] <wikibugs>	 10Toolforge (Toolforge iteration 24): [infra,k8s,toolsbeta] k8s worker node toolsbeta-test-k8s-worker-nfs-5 is failing to tail pods - https://phabricator.wikimedia.org/T404721#11185768 (10dcaro) ended up also scrubbing toolsbeta-test-k8s-ingress-10
[15:50:46] <wikibugs>	 (03Merged) 10jenkins-bot: inventory: Remove tools-sgebastion-10 [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1188831 (https://phabricator.wikimedia.org/T314665) (owner: 10Majavah)
[15:54:21] <wikibugs>	 (03CR) 10CI reject: [V:04-1] aptly: Stop updating pre-Trixie repositories [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1188832 (https://phabricator.wikimedia.org/T404733) (owner: 10Majavah)
[15:54:34] <wikibugs>	 10cloud-services-team (FY2025/26-Q1), 10Toolforge (Toolforge iteration 24), 05Goal, 13Patch-For-Review: [infra] Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#11185793 (10taavi)
[15:55:10] <wikibugs>	 (03PS2) 10Majavah: aptly: Stop updating pre-Trixie repositories [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1188832 (https://phabricator.wikimedia.org/T404733)
[15:55:50] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
[15:59:21] <wikibugs>	 (03CR) 10CI reject: [V:04-1] aptly: Stop updating pre-Trixie repositories [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1188832 (https://phabricator.wikimedia.org/T404733) (owner: 10Majavah)
[15:59:26] <wikibugs>	 (03approved) 10dcaro: builds-api: bump to 0.0.199-20250916142147-5e8adc0f [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/972 (https://phabricator.wikimedia.org/T362869) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[15:59:29] <wikibugs>	 (03merge) 10dcaro: builds-api: bump to 0.0.199-20250916142147-5e8adc0f [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/972 (https://phabricator.wikimedia.org/T362869) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[15:59:36] <wikibugs>	 10cloud-services-team (FY2025/26-Q1), 10Toolforge (Toolforge iteration 24), 05Goal, 13Patch-For-Review: [infra] Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#11185806 (10taavi)
[15:59:39] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
[16:00:14] <wmcs-alerts>	 RESOLVED: [2x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: toolsbeta-test-k8s-ingress-10.toolsbeta.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[16:03:21] <wikibugs>	 (03open) 10taavi: Retire login-buster address [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/80 (https://phabricator.wikimedia.org/T314665)
[16:03:26] <wikibugs>	 (03update) 10taavi: Retire login-buster address [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/80 (https://phabricator.wikimedia.org/T314665)
[16:04:06] <wikibugs>	 10Cloud-VPS (Debian Bullseye Deprecation), 06Moderator-Tools-Team, 06The-Wikipedia-Library: wikilink: Replace deprecated Bullseye VM in Cloud VPS - https://phabricator.wikimedia.org/T402055#11185844 (10Samwalton9-WMF)
[16:05:14] <wmcs-alerts>	 FIRING: [4x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: toolsbeta-test-k8s-ingress-10.toolsbeta.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[16:08:59] <wikibugs>	 (03update) 10taavi: Retire login-buster address [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/80 (https://phabricator.wikimedia.org/T314665)
[16:09:20] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
[16:10:37] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
[16:12:47] <wikibugs>	 (03open) 10don-vip: Update to OpenJDK 25 [toolforge-repos/spacemedia] - 10https://gitlab.wikimedia.org/toolforge-repos/spacemedia/-/merge_requests/5
[16:13:01] <wikibugs>	 10VPS-project-Phabricator, 06collaboration-services, 10Release-Engineering-Team (Doing 😎): 'Fulltext' searches fail on the test Phabricator instance (PhutilAggregateException: All Fulltext Search hosts failed / CURLE_COULDNT_CONNECT) - https://phabricator.wikimedia.org/T403948#11185883 (10Dzahn) reverts do n...
[16:14:10] <wikibugs>	 06cloud-services-team, 10Toolforge: [jobs-api] use `launcher` also for health-check script commands - https://phabricator.wikimedia.org/T403735#11185905 (10DamianZaremba) I started looking at this and the current logic isn't super clear.  There are essentially 3 parts;  1. `_get_k8s_podtemplate` - This actuall...
[16:15:14] <wmcs-alerts>	 RESOLVED: [4x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: toolsbeta-test-k8s-ingress-10.toolsbeta.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[16:17:39] <wikibugs>	 06cloud-services-team, 10Toolforge, 13Patch-For-Review: [components-api] reuse_from components are not explicitly re-created in jobs-api - https://phabricator.wikimedia.org/T403285#11185912 (10DamianZaremba) Any chance of getting https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_reque...
[16:20:36] <wikibugs>	 (03update) 10dcaro: Ensure reuse_from components are re-run [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/125 (https://phabricator.wikimedia.org/T403285) (owner: 10damian)
[16:21:03] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission
[16:22:00] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
[16:32:13] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
[16:51:31] <wikibugs>	 (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: 10raymond-ndibe)
[16:51:37] <wikibugs>	 (03approved) 10dcaro: registry-admission: bump to 0.0.66-20250916142141-810024bf [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/971 (https://phabricator.wikimedia.org/T362869) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[16:51:41] <wikibugs>	 (03update) 10dcaro: registry-admission: bump to 0.0.66-20250916142141-810024bf [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/971 (https://phabricator.wikimedia.org/T362869) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[16:52:06] <wikibugs>	 (03merge) 10dcaro: registry-admission: bump to 0.0.66-20250916142141-810024bf [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/971 (https://phabricator.wikimedia.org/T362869) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[16:53:07] <wikibugs>	 (03update) 10dcaro: Ensure reuse_from components are re-run [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/125 (https://phabricator.wikimedia.org/T403285) (owner: 10damian)
[17:02:45] <Guest204>	 !log dcaro@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission
[17:02:47] <stashbot>	 Guest204: Unknown project "dcaro@cloudcumin1001"
[17:03:57] <Guest204>	 !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
[17:03:57] <stashbot>	 Guest204: Unknown project "dcaro@cloudcumin1001"
[17:06:10] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[17:06:10] <Guest204>	 !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
[17:06:20] <stashbot>	 Guest204: Unknown project "dcaro@cloudcumin1001"
[17:06:43] <wikibugs>	 (03approved) 10dcaro: volume-admission: bump to 0.0.73-20250916142135-79fa734c [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/970 (https://phabricator.wikimedia.org/T362869) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:06:47] <wikibugs>	 (03update) 10dcaro: volume-admission: bump to 0.0.73-20250916142135-79fa734c [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/970 (https://phabricator.wikimedia.org/T362869) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:07:12] <Guest204>	 !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
[17:07:51] <wikibugs>	 (03merge) 10dcaro: volume-admission: bump to 0.0.73-20250916142135-79fa734c [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/970 (https://phabricator.wikimedia.org/T362869) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:08:08] <Guest204>	 !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api
[17:12:25] <wikibugs>	 (03update) 10raymond-ndibe: [tool-config] handle unset and default arguments consistently [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/123 (https://phabricator.wikimedia.org/T401648 https://phabricator.wikimedia.org/T402572)
[17:14:58] <Guest204>	 !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
[17:16:09] <Guest204>	 !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api
[17:16:11] <stashbot>	 Guest204: Unknown project "dcaro@cloudcumin1001"
[17:18:03] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-34 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[17:18:36] <wikibugs>	 (03approved) 10dcaro: envvars-api: bump to 0.0.75-20250916125754-a88de155 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/969 (https://phabricator.wikimedia.org/T362869) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:18:40] <wikibugs>	 (03update) 10dcaro: envvars-api: bump to 0.0.75-20250916125754-a88de155 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/969 (https://phabricator.wikimedia.org/T362869) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:18:51] <Guest204>	 !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[17:18:52] <stashbot>	 Guest204: Unknown project "dcaro@cloudcumin1001"
[17:19:01] <wikibugs>	 (03merge) 10dcaro: envvars-api: bump to 0.0.75-20250916125754-a88de155 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/969 (https://phabricator.wikimedia.org/T362869) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:20:30] <wikibugs>	 (03open) 10dcaro: cli: ignore replicas if not sent back from API [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/129
[17:23:44] <Guest204>	 !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[17:23:46] <stashbot>	 Guest204: Unknown project "dcaro@cloudcumin1001"
[17:25:28] <wikibugs>	 (03update) 10dcaro: cli: ignore replicas if not sent back from API [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/129
[17:25:58] <wikibugs>	 (03update) 10raymond-ndibe: [tool-config] handle unset and default arguments consistently [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/123 (https://phabricator.wikimedia.org/T401648 https://phabricator.wikimedia.org/T402572)
[17:27:11] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:27:33] <Guest204>	 !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-api
[17:27:33] <stashbot>	 Guest204: Unknown project "dcaro@cloudcumin1001"
[17:29:47] <wikibugs>	 (03approved) 10dcaro: Retire login-buster address [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/80 (https://phabricator.wikimedia.org/T314665) (owner: 10taavi)
[17:30:15] <wikibugs>	 (03update) 10dcaro: [tool home dir] revert change in dir permission [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/271 (https://phabricator.wikimedia.org/T403513) (owner: 10raymond-ndibe)
[17:30:52] <wikibugs>	 (03update) 10dcaro: build: Upgrade Poetry dependencies [repos/cloud/toolforge/components-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/60 (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:32:36] <Guest204>	 !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[17:32:37] <stashbot>	 Guest204: Unknown project "dcaro@cloudcumin1001"
[17:37:45] <wikibugs>	 (03approved) 10dcaro: components-api: bump to 0.0.156-20250916125822-74722783 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/968 (https://phabricator.wikimedia.org/T401388) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:37:49] <wikibugs>	 (03update) 10dcaro: components-api: bump to 0.0.156-20250916125822-74722783 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/968 (https://phabricator.wikimedia.org/T401388) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:38:03] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-10 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[17:38:42] <wikibugs>	 (03merge) 10dcaro: components-api: bump to 0.0.156-20250916125822-74722783 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/968 (https://phabricator.wikimedia.org/T401388) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:49:18] <jinxer-wm>	 FIRING: KernelErrors: Server cloudcephosd1052 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcephosd1052 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[17:49:23] <wikibugs>	 06cloud-services-team: KernelErrors Server cloudcephosd1052 logged kernel errors - https://phabricator.wikimedia.org/T404745 (10phaultfinder) 03NEW
[17:56:08] <wikibugs>	 10VPS-project-Codesearch: T371191 - https://phabricator.wikimedia.org/T404746 (10ALFAN_SOFARI) 03NEW
[17:56:56] <wikibugs>	 06cloud-services-team, 10Cloud-VPS, 10Ceph: Review RAM allocation for cloudceph OSDs - https://phabricator.wikimedia.org/T404747 (10Andrew) 03NEW
[19:08:22] <Guest204>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[19:08:24] <stashbot>	 Guest204: Unknown project "andrew@cloudcumin1001"
[19:09:45] <Guest204>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.reactivate (exit_code=0)
[19:09:45] <stashbot>	 Guest204: Unknown project "andrew@cloudcumin1001"
[19:32:39] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[19:48:56] <wmcs-alerts>	 FIRING: PawsJupyterHubDown: PAWS JupyterHub is down https://wikitech.wikimedia.org/wiki/PAWS/Admin   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPawsJupyterHubDown
[19:49:28] <wmcs-alerts>	 FIRING: TargetDown: Job jupyterhub is unreachable in project paws instance hub-paws.wmcloud.org:443   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[19:53:56] <wmcs-alerts>	 RESOLVED: PawsJupyterHubDown: PAWS JupyterHub is down https://wikitech.wikimedia.org/wiki/PAWS/Admin   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPawsJupyterHubDown
[19:54:28] <wmcs-alerts>	 RESOLVED: TargetDown: Job jupyterhub is unreachable in project paws instance hub-paws.wmcloud.org:443   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[20:20:04] <wikibugs>	 cloud-services-team, Cloud-VPS, Ceph: Review RAM allocation for cloudceph OSDs - https://phabricator.wikimedia.org/T404747#11187076 (Andrew) Before:  cloudcephosd2004-dev: total use 16GB, 8 OSDs total, 64GB RAM total  After   ` ceph config set osd osd_memory_target_autotune true `  cloudcephosd2004-d...
[20:23:03] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-10 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[20:31:15] <wikibugs>	 VPS-project-Codesearch, m3api: Index m3api repositories in Codesearch - https://phabricator.wikimedia.org/T404517#11187130 (LucasWerkmeister) https://gitlab.wikimedia.org/groups/repos/m3api/-/children.json works (extra `repos/`), I think that would be okay! (I just scheduled the `tmp-*` repositories for...
[21:14:33] <wikibugs>	 VPS-project-Phabricator, collaboration-services, Release-Engineering-Team (Doing 😎): 'Fulltext' searches fail on test Phab instance due to ElasticSearch default config (PhutilAggregateException: All Fulltext Search hosts failed / CURLE_COULDNT_CONNECT) - https://phabricator.wikimedia.org/T403948#11187376...
[21:17:39] <wikibugs>	 VPS-project-Phabricator, collaboration-services, Release-Engineering-Team (Radar): 'Fulltext' searches fail on test Phab instance due to ElasticSearch default config (PhutilAggregateException: All Fulltext Search hosts failed / CURLE_COULDNT_CONNECT) - https://phabricator.wikimedia.org/T403948#11187381 (...
[22:03:19] <wikibugs>	 (close) raymond-ndibe: [tool home dir] revert change in dir permission [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/271 (https://phabricator.wikimedia.org/T403513)
[22:03:55] <wmcs-alerts>	 FIRING: PawsJupyterHubDown: PAWS JupyterHub is down https://wikitech.wikimedia.org/wiki/PAWS/Admin   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPawsJupyterHubDown
[22:04:28] <wmcs-alerts>	 FIRING: TargetDown: Job jupyterhub is unreachable in project paws instance hub-paws.wmcloud.org:443   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[22:04:48] <wikibugs>	 (update) raymond-ndibe: [build] run pipeline cleanup per repo [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/142 (https://phabricator.wikimedia.org/T404157)
[22:08:21] <wmcs-alerts>	 FIRING: MaintainKubeusersHang: maintain-kubeusers last finished run is 29.3M minutes old - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainKubeusersDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DMaintainKubeusersHang
[22:08:56] <wmcs-alerts>	 RESOLVED: PawsJupyterHubDown: PAWS JupyterHub is down https://wikitech.wikimedia.org/wiki/PAWS/Admin   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPawsJupyterHubDown
[22:09:28] <wmcs-alerts>	 RESOLVED: TargetDown: Job jupyterhub is unreachable in project paws instance hub-paws.wmcloud.org:443   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[22:20:21] <wikibugs>	 VPS-project-Codesearch, m3api: Index m3api repositories in Codesearch - https://phabricator.wikimedia.org/T404517#11187608 (Ladsgroup) If you can make the patch to write_config.py I'd appreciate it. Otherwise, I try to do it when I find some free time.
[22:22:43] <wikibugs>	 (update) raymond-ndibe: [helm image publish]: publish to reggie repo if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[22:24:35] <wikibugs>	 (update) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/213
[22:31:43] <wikibugs>	 VPS-project-Codesearch, m3api: Index m3api repositories in Codesearch - https://phabricator.wikimedia.org/T404517#11187666 (LucasWerkmeister) Hm, I guess we need to pick a group first, I didn’t think about that yet 😅  I guess it could fall under CI & Development? Or a new group, like Pywikibot. But I’ll...
[22:33:20] <wikibugs>	 (PS1) Lucas Werkmeister: devtools: add repos/m3api group [labs/codesearch] - https://gerrit.wikimedia.org/r/1188896 (https://phabricator.wikimedia.org/T404517)
[22:36:05] <wikibugs>	 (update) raymond-ndibe: [helm image publish]: publish to reggie repo if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[22:36:40] <wikibugs>	 (update) raymond-ndibe: [helm image publish]: publish to reggie repo if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[22:37:03] <wikibugs>	 VPS-project-Codesearch, m3api, Patch-For-Review: Index m3api repositories in Codesearch - https://phabricator.wikimedia.org/T404517#11187673 (LucasWerkmeister) I also moved all the to-be-deleted `tmp-*` repositories to the `lucaswerkmeister/` namespace, to get them out of the `children.json` list imm...
[22:37:11] <wikibugs>	 (CR) Lucas Werkmeister: "Disclaimer: I haven’t tested this whatsoever." [labs/codesearch] - https://gerrit.wikimedia.org/r/1188896 (https://phabricator.wikimedia.org/T404517) (owner: Lucas Werkmeister)
[22:38:03] <wikibugs>	 (update) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/213
[22:38:36] <wikibugs>	 (CR) Ladsgroup: [C:+2] "I test it in production 😊" [labs/codesearch] - https://gerrit.wikimedia.org/r/1188896 (https://phabricator.wikimedia.org/T404517) (owner: Lucas Werkmeister)
[22:39:46] <wikibugs>	 (Merged) jenkins-bot: devtools: add repos/m3api group [labs/codesearch] - https://gerrit.wikimedia.org/r/1188896 (https://phabricator.wikimedia.org/T404517) (owner: Lucas Werkmeister)
[22:44:50] <wikibugs>	 (CR) Lucas Werkmeister: "https://bash.toolforge.org/quip/-RazVJkBffdvpiTrlWJk :P" [labs/codesearch] - https://gerrit.wikimedia.org/r/1188896 (https://phabricator.wikimedia.org/T404517) (owner: Lucas Werkmeister)
[22:54:43] <wikibugs>	 (update) raymond-ndibe: [helm image publish]: publish to reggie repo if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[22:55:47] <wikibugs>	 (update) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/213
[23:05:31] <wikibugs>	 (update) don-vip: Update to OpenJDK 25 [toolforge-repos/spacemedia] - https://gitlab.wikimedia.org/toolforge-repos/spacemedia/-/merge_requests/5
[23:09:36] <wikibugs>	 VPS-project-Phabricator, collaboration-services, Release-Engineering-Team (Radar): 'Fulltext' searches fail on test Phab instance due to ElasticSearch default config (PhutilAggregateException: All Fulltext Search hosts failed / CURLE_COULDNT_CONNECT) - https://phabricator.wikimedia.org/T403948#11187713 (...
[23:12:28] <wikibugs>	 VPS-project-Codesearch, m3api, Patch-For-Review: Index m3api repositories in Codesearch - https://phabricator.wikimedia.org/T404517#11187719 (Ladsgroup) Open→Resolved a:LucasWerkmeister https://codesearch.wmcloud.org/search/?q=m3api&files=&excludeFiles=&repos=
[23:13:23] <wikibugs>	 VPS-project-Codesearch, m3api, Patch-For-Review: Index m3api repositories in Codesearch - https://phabricator.wikimedia.org/T404517#11187723 (LucasWerkmeister) \o/ thanks!
[23:14:08] <wikibugs>	 VPS-project-Phabricator, collaboration-services, Release-Engineering-Team (Radar): 'Fulltext' searches fail on test Phab instance due to ElasticSearch default config (PhutilAggregateException: All Fulltext Search hosts failed / CURLE_COULDNT_CONNECT) - https://phabricator.wikimedia.org/T403948#11187729 (...
[23:18:03] <wmcs-alerts>	 FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-10 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[23:18:28] <wikibugs>	 VPS-project-Phabricator, collaboration-services, Release-Engineering-Team (Radar): 'Fulltext' searches fail on test Phab instance due to ElasticSearch default config (PhutilAggregateException: All Fulltext Search hosts failed / CURLE_COULDNT_CONNECT) - https://phabricator.wikimedia.org/T403948#11187733 (...
[23:26:58] <wikibugs>	 VPS-project-Phabricator, collaboration-services, Release-Engineering-Team (Radar): 'Fulltext' searches fail on test Phab instance due to ElasticSearch default config (PhutilAggregateException: All Fulltext Search hosts failed / CURLE_COULDNT_CONNECT) - https://phabricator.wikimedia.org/T403948#11187746 (...
[23:31:12] <wikibugs>	 VPS-project-Phabricator, collaboration-services, Release-Engineering-Team (Radar): 'Fulltext' searches fail on test Phab instance due to ElasticSearch default config (PhutilAggregateException: All Fulltext Search hosts failed / CURLE_COULDNT_CONNECT) - https://phabricator.wikimedia.org/T403948#11187755 (...
[23:32:27] <wikibugs>	 VPS-project-Phabricator, collaboration-services, Release-Engineering-Team (Radar): 'Fulltext' searches fail on test Phab instance due to ElasticSearch default config (PhutilAggregateException: All Fulltext Search hosts failed / CURLE_COULDNT_CONNECT) - https://phabricator.wikimedia.org/T403948#11187773 (...
[23:34:45] <wikibugs>	 VPS-project-Phabricator, collaboration-services, Release-Engineering-Team (Radar): 'Fulltext' searches fail on test Phab instance due to ElasticSearch default config (PhutilAggregateException: All Fulltext Search hosts failed / CURLE_COULDNT_CONNECT) - https://phabricator.wikimedia.org/T403948#11187774 (...
[23:38:32] <wikibugs>	 VPS-project-Phabricator, collaboration-services, Release-Engineering-Team (Radar): 'Fulltext' searches fail on test Phab instance due to ElasticSearch default config (PhutilAggregateException: All Fulltext Search hosts failed / CURLE_COULDNT_CONNECT) - https://phabricator.wikimedia.org/T403948#11187791 (...
[23:42:07] <wikibugs>	 (update) raymond-ndibe: [helm image publish]: publish to reggie repo if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[23:43:03] <wmcs-alerts>	 FIRING: [4x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-10 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[23:43:08] <wikibugs>	 (update) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/213
[23:46:44] <wikibugs>	 (update) raymond-ndibe: [helm image publish]: publish to reggie repo if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[23:48:11] <wikibugs>	 (update) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/213
[23:54:44] <wikibugs>	 (update) raymond-ndibe: [helm image publish]: publish to reggie repo if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[23:55:18] <wikibugs>	 (update) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/213