[00:13:59] (open) raymond-ndibe: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/misctools-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/misctools-cli/-/merge_requests/7 (https://phabricator.wikimedia.org/T400616)
[00:16:12] (update) raymond-ndibe: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/misctools-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/misctools-cli/-/merge_requests/7 (https://phabricator.wikimedia.org/T400616)
[00:21:29] (close) raymond-ndibe: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/misctools-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/misctools-cli/-/merge_requests/7 (https://phabricator.wikimedia.org/T400616)
[00:23:28] RESOLVED: NfsAlmostFull: The NFS drive is over 85% capacity (currently 85.56%) at host paws-nfs-1 in project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DNfsAlmostFull
[00:28:19] (open) raymond-ndibe: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/toolforge-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-cli/-/merge_requests/46 (https://phabricator.wikimedia.org/T400616)
[00:32:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[00:35:03] FIRING: PuppetFailure: Puppet has failed on cloudbackup1002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[00:37:12] (open) raymond-ndibe: [cicd] remove py3.9-bullseye-tox-pypi-debian [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/68 (https://phabricator.wikimedia.org/T400616)
[00:38:14] RECOVERY - Disk space on cloudbackup1002-dev is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=cloudbackup1002-dev&var-datasource=eqiad+prometheus/ops
[00:38:35] RESOLVED: DiskSpace: Disk space cloudbackup1002-dev:9100:/srv/cinder-backups 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[00:39:37] (update) raymond-ndibe: [cicd] remove py3.9-bullseye-tox-pypi-debian [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/68 (https://phabricator.wikimedia.org/T400616)
[00:40:28] FIRING: NfsAlmostFull: The NFS drive is over 85% capacity (currently 85.06%) at host paws-nfs-1 in project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DNfsAlmostFull
[00:44:48] RESOLVED: PuppetFailure: Puppet has failed on cloudbackup1002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[01:02:52] (update) raymond-ndibe: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/toolforge-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-cli/-/merge_requests/46 (https://phabricator.wikimedia.org/T400616)
[01:04:11] FIRING: SystemdUnitDown: The systemd unit backup_cinder_volumes.service on node cloudbackup1002-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[01:22:55] FIRING: [2x] ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 close to running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[01:27:55] FIRING: [2x] ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 close to running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[01:28:34] FIRING: [4x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[01:28:42] (update) raymond-ndibe: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509)
[01:53:34] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[02:02:36] (update) raymond-ndibe: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509)
[02:11:41] (update) raymond-ndibe: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509)
[02:25:28] RESOLVED: NfsAlmostFull: The NFS drive is over 85% capacity (currently 85.39%) at host paws-nfs-1 in project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DNfsAlmostFull
[02:26:08] (update) raymond-ndibe: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509)
[02:42:22] (update) raymond-ndibe: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509)
[02:43:34] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[02:44:32] (update) raymond-ndibe: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509)
[03:02:20] (update) raymond-ndibe: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509)
[03:22:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[03:23:34] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[03:43:34] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[03:45:56] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[03:55:56] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[04:08:34] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[04:23:56] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[04:28:56] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[04:31:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[04:58:34] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[05:01:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[05:04:11] FIRING: SystemdUnitDown: The systemd unit backup_cinder_volumes.service on node cloudbackup1002-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[05:06:11] (update) raymond-ndibe: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509)
[05:08:34] FIRING: [4x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[05:09:40] (update) raymond-ndibe: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509)
[05:13:34] FIRING: [4x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[05:31:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[05:36:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[05:48:34] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[06:23:34] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[06:31:28] FIRING: NfsAlmostFull: The NFS drive is over 85% capacity (currently 85.12%) at host paws-nfs-1 in project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DNfsAlmostFull
[06:53:34] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[07:59:49] (update) taavi: cloudinfra: Cleanup Puppetserver security group [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/253
[08:04:26] (update) taavi: cloudinfra: Cleanup Puppetserver security group [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/253
[08:04:31] (merge) taavi: cloudinfra: Cleanup Puppetserver security group [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/253
[08:04:59] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[08:08:48] !log taavi@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[08:14:48] !log dcaro@acme tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
[08:14:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:15:04] !log dcaro@acme tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster
[08:15:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:15:30] !log dcaro@acme tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
[08:15:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:15:48] !log dcaro@acme tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster
[08:15:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:16:17] (approved) dcaro: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/52 (https://phabricator.wikimedia.org/T400616) (owner: raymond-ndibe)
[08:20:51] (approved) dcaro: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/85 (https://phabricator.wikimedia.org/T400616) (owner: raymond-ndibe)
[08:26:24] (update) dcaro: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/toolforge-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-cli/-/merge_requests/46 (https://phabricator.wikimedia.org/T400616) (owner: raymond-ndibe)
[08:26:37] (approved) dcaro: [cicd] remove py3.9-bullseye-tox-pypi-debian [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/68 (https://phabricator.wikimedia.org/T400616) (owner: raymond-ndibe)
[08:49:17] (PS1) David Caro: server_create: use the right name for the dualstack network [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1173890
[08:49:21] !log dcaro@acme tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
[08:49:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:49:34] !log dcaro@acme tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster
[08:49:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:49:54] !log dcaro@acme tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
[08:49:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:57:04] (approved) dcaro: [cicd] replace bullseye with bookworm [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/113 (https://phabricator.wikimedia.org/T400616) (owner: raymond-ndibe)
[08:59:17] !log dcaro@acme tools Added a new k8s worker tools-k8s-worker-112.tools.eqiad1.wikimedia.cloud to the cluster
[08:59:18] !log dcaro@acme tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster
[08:59:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:59:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:01:49] !log dcaro@acme tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[09:01:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:02:10] !log dcaro@acme tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
[09:02:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:02:12] !log dcaro@acme tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[09:02:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:04:11] FIRING: SystemdUnitDown: The systemd unit backup_cinder_volumes.service on node cloudbackup1002-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:06:38] PAWS: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T394614#11042132 (taavi) Open→Resolved a:RhinosF1
[09:07:26] !log dcaro@acme tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
[09:07:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:10:30] (merge) taavi: Use logging multi-pod fix moved to toolforge-weld [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/179 (https://phabricator.wikimedia.org/T398647)
[09:10:31] (update) taavi: Query logs from Loki [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/180 (https://phabricator.wikimedia.org/T398645)
[09:11:28] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tools-k8s-worker-nfs-80 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[09:13:22] (update) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.386-20250729091039-7e54536b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/884 (https://phabricator.wikimedia.org/T398647)
[09:13:26] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.386-20250729091039-7e54536b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/884 (https://phabricator.wikimedia.org/T398647)
[09:14:00] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[09:23:32] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[09:25:22] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[09:27:52] (PS1) David Caro: add_k8s_node: select the first empty device [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1173897
[09:28:40] !log dcaro@acme tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[09:28:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:28:54] !log dcaro@acme tools END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99)
[09:28:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:29:00] !log dcaro@acme tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[09:29:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:29:17] !log dcaro@acme tools END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
[09:29:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:29:37] !log dcaro@acme tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[09:29:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:32:04] (CR) CI reject: [V:-1] add_k8s_node: select the first empty device [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1173897 (owner: David Caro)
[09:33:34] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[09:35:03] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[09:37:24] (merge) taavi: jobs-api: bump to 0.0.386-20250729091039-7e54536b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/884 (https://phabricator.wikimedia.org/T398647) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[09:38:34] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[09:39:05] (Abandoned) David Caro: add_k8s_node: select the first empty device [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1173897 (owner: David Caro)
[09:40:18] cloud-services-team, Toolforge (Toolforge iteration 22), Patch-For-Review: Move Kubernetes log source multi-pod handling from jobs-api to toolforge-weld - https://phabricator.wikimedia.org/T398647#11042213 (taavi) Open→Resolved
[09:41:28] RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tools-k8s-worker-nfs-80 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[09:49:02] (CR) FNegri: [C:+2] create_project: add option to skip tofu apply [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1172350 (owner: FNegri)
[09:53:19] (Merged) jenkins-bot: create_project: add option to skip tofu apply [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1172350 (owner: FNegri)
[09:54:08] !log dcaro@acme tools END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=97)
[09:54:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:54:25] !log dcaro@acme tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[09:54:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:54:33] !log dcaro@acme tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
[09:54:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:54:36] !log dcaro@acme tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[09:54:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:54:52] cloud-services-team, Cloud-VPS, Toolforge: Add tracing to understand Toolforge and CloudVPS usage and dependencies - https://phabricator.wikimedia.org/T399313#11042261 (taavi)
[10:00:37] !log dcaro@acme tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
[10:00:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:12:34] cloud-services-team, Cloud-VPS, Toolforge: Add tracing to understand Toolforge and CloudVPS usage and dependencies - https://phabricator.wikimedia.org/T399313#11042306 (dcaro) For reference, there was a POC created several years back that was able to extract the network connections (to redis, wikis,...
[10:15:11] dhinus closed https://github.com/toolforge/paws/pull/495
[10:46:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance tools-k8s-worker-nfs-80 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:56:14] Cloud-VPS (Project-requests): Request creation of Clipi VPS project - https://phabricator.wikimedia.org/T399237#11042419 (Aklapper) Stalled→Declined Unfortunately closing this Phabricator task as no further information has been provided. @IhsaanKhan: After you have provided all the information r...
[11:01:44] VPS-project-Codesearch, VPS-project-devtools, VPS-project-Extdist, VPS-project-icinga2, and 8 others: Seed my codesearch to get work done and clean choosing rights for resolved. - https://phabricator.wikimedia.org/T268199#11042430 (Yankees199) Open→In progress a:Dzahn→Yankees199
[11:04:04] VPS-project-Codesearch, collaboration-services: Graduate codesearch to production - https://phabricator.wikimedia.org/T268199#11042438 (Peachey88) In progress→Open a:Yankees199→Dzahn
[12:16:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance tools-k8s-worker-nfs-80 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[12:18:23] !log dcaro@acme tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[12:18:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:22:03] !log dcaro@acme tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
[12:22:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:22:06] !log dcaro@acme tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[12:22:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:22:13] !log dcaro@acme tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
[12:22:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:22:16] !log dcaro@acme tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[12:22:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:31:51] !log dcaro@acme tools Added a new k8s worker-nfs tools-k8s-worker-nfs-80.tools.eqiad1.wikimedia.cloud to the cluster
[12:31:51] !log dcaro@acme tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[12:31:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:31:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:38:17] !log dcaro@acme tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[12:38:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:38:49] !log dcaro@acme tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
[12:38:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:38:52] !log dcaro@acme tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[12:38:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:39:11] !log dcaro@acme tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
[12:39:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:40:12] !log dcaro@acme tools START - Cookbook wmcs.openstack.quota_increase
[12:40:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:40:18] !log dcaro@acme tools END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0)
[12:40:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:40:25] !log dcaro@acme tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[12:40:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:43:34] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[12:46:16] !log dcaro@acme tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
[12:46:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:48:47] dhinus opened https://github.com/toolforge/paws/pull/496
[12:49:28] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tools-k8s-worker-nfs-81 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[12:53:27] !log dcaro@acme tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[12:53:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:53:34] !log dcaro@acme tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
[12:53:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:53:36] !log dcaro@acme tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[12:53:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:54:28] RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tools-k8s-worker-nfs-81 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[13:04:11] FIRING: SystemdUnitDown: The systemd unit backup_cinder_volumes.service on node cloudbackup1002-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[13:05:03] dhinus closed https://github.com/toolforge/paws/pull/496
[13:05:07] !log dcaro@acme tools Added a new k8s worker-nfs tools-k8s-worker-nfs-81.tools.eqiad1.wikimedia.cloud to the cluster
[13:05:07] !log dcaro@acme tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[13:05:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:05:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:06:03] !log dcaro@acme tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[13:06:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:06:13] !log dcaro@acme tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
[13:06:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:06:17] !log dcaro@acme tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[13:06:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:13:34] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[13:16:33] !log dcaro@acme tools Added a new k8s worker-nfs tools-k8s-worker-nfs-82.tools.eqiad1.wikimedia.cloud to the cluster
[13:16:33] !log dcaro@acme tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster [13:16:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:16:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:23:34] FIRING: [4x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess [14:01:56] (03CR) 10Krinkle: [C:03+2] build: Install MediaWiki codesniffer and make pass [labs/tools/wikiinfo] - 10https://gerrit.wikimedia.org/r/1173390 (owner: 10Jforrester) [14:04:09] (03update) 10raymond-ndibe: api: fix default probe [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/185 (owner: 10dcaro) [14:04:18] (03update) 10raymond-ndibe: api: fix default probe [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/185 (owner: 10dcaro) [14:04:19] (03approved) 10raymond-ndibe: api: fix default probe [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/185 (owner: 10dcaro) [14:05:01] (03PS1) 10Krinkle: Upgrade to krinkle/toollabs-base and upgrade deployment to PHP 8.2 [labs/tools/wikiinfo] - 10https://gerrit.wikimedia.org/r/1173944 [14:08:33] (03Merged) 10jenkins-bot: build: Install MediaWiki codesniffer and make pass [labs/tools/wikiinfo] - 10https://gerrit.wikimedia.org/r/1173390 (owner: 10Jforrester) [14:20:48] (03CR) 10Krinkle: "check experimental" [labs/tools/wikiinfo] - 10https://gerrit.wikimedia.org/r/1173944 (owner: 
10Krinkle) [14:24:09] (03PS1) 10Krinkle: Import 'getwikiapi' messages from https://gerrit.wikimedia.org/g/labs/tools/intuition [labs/tools/wikiinfo] - 10https://gerrit.wikimedia.org/r/1173948 [14:24:31] (03CR) 10Krinkle: [C:03+2] Upgrade to krinkle/toollabs-base and upgrade deployment to PHP 8.2 [labs/tools/wikiinfo] - 10https://gerrit.wikimedia.org/r/1173944 (owner: 10Krinkle) [14:31:28] FIRING: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on toolsbeta-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [14:42:46] (03Merged) 10jenkins-bot: Upgrade to krinkle/toollabs-base and upgrade deployment to PHP 8.2 [labs/tools/wikiinfo] - 10https://gerrit.wikimedia.org/r/1173944 (owner: 10Krinkle) [14:43:34] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess [14:51:01] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli [14:58:05] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli [14:58:21] (03update) 10dcaro: shell: wrap the shell in a launcher for buildservices [repos/cloud/toolforge/webservice-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/61 (https://phabricator.wikimedia.org/T360488) [15:00:23] (03PS1) 10Krinkle: Remove 'getwikiapi' messages [labs/tools/intuition] - 10https://gerrit.wikimedia.org/r/1173957 [15:03:12] (03CR) 10Jforrester: 
[C:03+1] "Looks to be the same as https://gerrit.wikimedia.org/g/labs/tools/intuition/+/refs/heads/master/language/messages/getwikiapi/" [labs/tools/wikiinfo] - 10https://gerrit.wikimedia.org/r/1173948 (owner: 10Krinkle) [15:03:50] (03CR) 10Jforrester: [C:03+1] Remove 'getwikiapi' messages [labs/tools/intuition] - 10https://gerrit.wikimedia.org/r/1173957 (owner: 10Krinkle) [15:17:05] (03update) 10dcaro: shell: wrap the shell in a launcher for buildservices [repos/cloud/toolforge/webservice-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/61 (https://phabricator.wikimedia.org/T360488) [15:20:32] (03CR) 10Majavah: [C:03+1] server_create: use the right name for the dualstack network [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1173890 (owner: 10David Caro) [15:22:21] (03CR) 10David Caro: [C:03+2] server_create: use the right name for the dualstack network [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1173890 (owner: 10David Caro) [15:26:20] (03Merged) 10jenkins-bot: server_create: use the right name for the dualstack network [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1173890 (owner: 10David Caro) [15:27:10] (03update) 10dcaro: shell: wrap the shell in a launcher for buildservices [repos/cloud/toolforge/webservice-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/61 (https://phabricator.wikimedia.org/T360488) [15:27:50] (03update) 10dcaro: [cli] Change port type to allow protocol suffix [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024) (owner: 10raymond-ndibe) [15:32:48] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-58, tools-k8s-worker-nfs-32 [15:42:48] 10Tool-extjsonuploader: Split Lua data module used by extjsonuploader - https://phabricator.wikimedia.org/T315923#11043818 
(10Bawolff) I think we have reached the point where we just have to do this. I think maybe it's best just to make a separate subpage for each extension. [15:43:41] (03update) 10taavi: Query logs from Loki [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/180 (https://phabricator.wikimedia.org/T398645) [15:44:48] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-58, tools-k8s-worker-nfs-32 [15:46:50] (03PS2) 10Krinkle: Import 'getwikiapi' messages from https://gerrit.wikimedia.org/g/labs/tools/intuition [labs/tools/wikiinfo] - 10https://gerrit.wikimedia.org/r/1173948 [15:46:55] (03CR) 10Krinkle: [C:03+2] Import 'getwikiapi' messages from https://gerrit.wikimedia.org/g/labs/tools/intuition [labs/tools/wikiinfo] - 10https://gerrit.wikimedia.org/r/1173948 (owner: 10Krinkle) [15:47:22] (03Merged) 10jenkins-bot: Import 'getwikiapi' messages from https://gerrit.wikimedia.org/g/labs/tools/intuition [labs/tools/wikiinfo] - 10https://gerrit.wikimedia.org/r/1173948 (owner: 10Krinkle) [15:51:36] (03merge) 10dcaro: api: fix default probe [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/185 [15:53:34] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess [15:54:44] (03open) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.387-20250729155150-ceb45a54 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/885 
(https://phabricator.wikimedia.org/T400025) [15:56:14] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-74 [16:02:11] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-74 [16:02:24] (03PS1) 10Brian Wolff: Split into separate pages for each extension. [labs/tools/extjsonuploader] - 10https://gerrit.wikimedia.org/r/1173970 (https://phabricator.wikimedia.org/T315923) [16:02:42] (03CR) 10CI reject: [V:04-1] Split into separate pages for each extension. [labs/tools/extjsonuploader] - 10https://gerrit.wikimedia.org/r/1173970 (https://phabricator.wikimedia.org/T315923) (owner: 10Brian Wolff) [16:03:36] (03PS2) 10Brian Wolff: Split into separate pages for each extension. [labs/tools/extjsonuploader] - 10https://gerrit.wikimedia.org/r/1173970 (https://phabricator.wikimedia.org/T315923) [16:04:29] (03update) 10dcaro: shell: wrap the shell in a launcher for buildservices [repos/cloud/toolforge/webservice-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/61 (https://phabricator.wikimedia.org/T360488) [16:05:49] (03approved) 10dcaro: shell: wrap the shell in a launcher for buildservices [repos/cloud/toolforge/webservice-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/61 (https://phabricator.wikimedia.org/T360488) [16:05:56] (03merge) 10dcaro: shell: wrap the shell in a launcher for buildservices [repos/cloud/toolforge/webservice-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/61 (https://phabricator.wikimedia.org/T360488) [16:07:15] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api [16:08:34] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - 
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess [16:10:25] 10Cloud-VPS (Quota-requests), 10Continuous-Integration-Infrastructure (Zuul upgrade): Large quota increase for zuul Cloud VPS project - https://phabricator.wikimedia.org/T400305#11043919 (10dcaro) a:03dcaro [16:12:34] !log dcaro@acme zuul START - Cookbook wmcs.openstack.quota_increase (T400305) [16:12:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Zuul/SAL [16:12:38] T400305: Large quota increase for zuul Cloud VPS project - https://phabricator.wikimedia.org/T400305 [16:12:43] !log dcaro@acme zuul END (FAIL) - Cookbook wmcs.openstack.quota_increase (exit_code=99) (T400305) [16:12:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Zuul/SAL [16:12:57] !log dcaro@acme zuul START - Cookbook wmcs.openstack.quota_increase (T400305) [16:12:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Zuul/SAL [16:13:04] !log dcaro@acme zuul END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T400305) [16:13:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Zuul/SAL [16:13:34] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess [16:15:03] 10Cloud-VPS (Quota-requests), 10Continuous-Integration-Infrastructure (Zuul upgrade): Large quota increase for zuul Cloud VPS project - 
https://phabricator.wikimedia.org/T400305#11043959 (10dcaro) 05Open→03Resolved p:05Triage→03High I think I got all the quotas :) {F65689252} Enjoy! [16:16:30] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api [16:18:34] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess [16:20:49] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api [16:23:34] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess [16:23:56] RESOLVED: SystemdUnitDown: The systemd unit backup_cinder_volumes.service on node cloudbackup1002-dev has been failing for more than two hours. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [16:26:41] 06cloud-services-team, 10Toolforge: Investigate daily disconnections of IRC bots hosted in Toolforge - https://phabricator.wikimedia.org/T400223#11044009 (10taavi) `lang=shell-session taavi@tools-bastion-12:~ $ k logs -n kube-system kube-proxy-tmlz2 --previous I0704 14:50:59.010024 1 server_others.go:72... [16:27:49] (03CR) 10Krinkle: [C:03+2] Remove 'getwikiapi' messages [labs/tools/intuition] - 10https://gerrit.wikimedia.org/r/1173957 (owner: 10Krinkle) [16:28:18] (03Merged) 10jenkins-bot: Remove 'getwikiapi' messages [labs/tools/intuition] - 10https://gerrit.wikimedia.org/r/1173957 (owner: 10Krinkle) [16:28:23] 06cloud-services-team, 10Toolforge: Investigate daily disconnections of IRC bots hosted in Toolforge - https://phabricator.wikimedia.org/T400223#11044011 (10taavi) Generally kube-proxy and calico being restarted at the same time just means that the worker was rebooted at that point. 
[16:30:33] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api [16:33:34] RESOLVED: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProce [16:34:32] (03approved) 10dcaro: jobs-api: bump to 0.0.387-20250729155150-ceb45a54 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/885 (https://phabricator.wikimedia.org/T400025) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [16:34:34] (03merge) 10dcaro: jobs-api: bump to 0.0.387-20250729155150-ceb45a54 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/885 (https://phabricator.wikimedia.org/T400025) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [16:35:10] (03PS3) 10Brian Wolff: Split into separate pages for each extension. 
[labs/tools/extjsonuploader] - 10https://gerrit.wikimedia.org/r/1173970 (https://phabricator.wikimedia.org/T315923) [16:37:37] (03open) 10dcaro: openapi: Allow lowercase ASCII letters too [repos/cloud/toolforge/envvars-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/60 (https://phabricator.wikimedia.org/T374780) [16:43:43] (03update) 10dcaro: cli: only send fields that are set [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/112 [16:51:43] 06cloud-services-team, 10PAWS: Grand membership in cloud-vps project 'PAWS' to vivian rook for volunteer work - https://phabricator.wikimedia.org/T400733 (10Andrew) 03NEW [16:51:50] (03PS4) 10Brian Wolff: Split into separate pages for each extension. [labs/tools/extjsonuploader] - 10https://gerrit.wikimedia.org/r/1173970 (https://phabricator.wikimedia.org/T315923) [16:52:03] (03update) 10dcaro: openapi: Allow lowercase ASCII letters too [repos/cloud/toolforge/envvars-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/60 (https://phabricator.wikimedia.org/T374780) [16:55:34] 06cloud-services-team, 10PAWS: Grand membership in cloud-vps project 'PAWS' to vivian rook for volunteer work - https://phabricator.wikimedia.org/T400733#11044105 (10Corvid4444) Neat! I'm not sure if the rook account is attached to my wmf account or if it can be enabled again. Otherwise this account is the one... 
[17:00:58] (03update) 10dcaro: openapi: Allow lowercase ASCII letters too [repos/cloud/toolforge/envvars-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/60 (https://phabricator.wikimedia.org/T374780) [17:07:13] 06cloud-services-team, 10PAWS: Grand membership in cloud-vps project 'PAWS' to vivian rook for volunteer work - https://phabricator.wikimedia.org/T400733#11044144 (10Andrew) Emailed all current PAWS cloud-vps members: ` Former staff member Vivian Rook would like to resume contributing to (and patrolling) the PA... [17:17:29] 06cloud-services-team, 10Cloud-VPS, 10VPS-Projects, 10Catalyst: metricsinfra: send alerts for the catalyst project to catalyst@w.o email - https://phabricator.wikimedia.org/T386416#11044189 (10thcipriani) 05Open→03Resolved a:03thcipriani Optimistically closing this since I updated senders permiss... [17:17:51] 06cloud-services-team, 10Cloud-VPS, 10VPS-Projects, 10Catalyst: metricsinfra: send alerts for the catalyst project to catalyst@w.o email - https://phabricator.wikimedia.org/T386416#11044192 (10thcipriani) a:05thcipriani→03dcaro [17:25:11] 06cloud-services-team, 10Tool-quickcategories, 10Toolforge, 13Patch-For-Review: Relax restrictions on toolforge envvar names - https://phabricator.wikimedia.org/T374780#11044221 (10LucasWerkmeister) Seems like it’s not so easy after all :( see this [GitLab conversation](https://gitlab.wikimedia.org/repos/c... [17:25:53] 06cloud-services-team, 10Tool-quickcategories, 10Toolforge, 13Patch-For-Review: Relax restrictions on toolforge envvar names - https://phabricator.wikimedia.org/T374780#11044223 (10LucasWerkmeister) [17:29:28] 06cloud-services-team, 10PAWS: Grand membership in cloud-vps project 'PAWS' to vivian rook for volunteer work - https://phabricator.wikimedia.org/T400733#11044232 (10bd808) >>! In T400733#11044105, @Corvid4444 wrote: > Neat! I'm not sure if the rook account is attached to my wmf account or if it can be enabled... 
[17:29:39] 06cloud-services-team, 10PAWS: Grant membership in cloud-vps project 'PAWS' to vivian rook for volunteer work - https://phabricator.wikimedia.org/T400733#11044234 (10bd808) [17:36:14] (03update) 10raymond-ndibe: d/changelog: bump to 16.1.16 [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/117 (https://phabricator.wikimedia.org/T400616) [17:36:18] (03approved) 10raymond-ndibe: d/changelog: bump to 16.1.16 [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/117 (https://phabricator.wikimedia.org/T400616) [17:36:24] (03merge) 10raymond-ndibe: d/changelog: bump to 16.1.16 [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/117 (https://phabricator.wikimedia.org/T400616) [17:41:39] (03update) 10raymond-ndibe: [cli] Change port type to allow protocol suffix [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024) [17:43:50] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE, 13Patch-For-Review: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11044306 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host clouddb1022.eqiad.wmnet with... [17:47:42] (03CR) 10Brian Wolff: [C:03+2] Split into separate pages for each extension. [labs/tools/extjsonuploader] - 10https://gerrit.wikimedia.org/r/1173970 (https://phabricator.wikimedia.org/T315923) (owner: 10Brian Wolff) [17:48:23] (03Merged) 10jenkins-bot: Split into separate pages for each extension. 
[labs/tools/extjsonuploader] - 10https://gerrit.wikimedia.org/r/1173970 (https://phabricator.wikimedia.org/T315923) (owner: 10Brian Wolff) [17:49:21] 10Tool-extjsonuploader, 13Patch-For-Review: Split Lua data module used by extjsonuploader - https://phabricator.wikimedia.org/T315923#11044316 (10Bawolff) 05Open→03Resolved a:03Bawolff Everything is now Module:ExtensionJson/ExtensionName.json [18:01:23] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE, 13Patch-For-Review: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11044388 (10VRiley-WMF) [18:02:30] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE, 13Patch-For-Review: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11044393 (10VRiley-WMF) Tried to run reimage on clouddb1022, to no avail. Running through clouddb1023 to see if there is a difference. [18:03:31] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE, 13Patch-For-Review: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11044395 (10VRiley-WMF) [18:08:04] 10Tool-wosretbot: [Wosretbot] Current logic for checking and replacing daily headers may delete content - https://phabricator.wikimedia.org/T400742 (10Tkarcher) 03NEW [18:50:49] (03update) 10raymond-ndibe: [tests] account for warning messages printed to stderr [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/883 (https://phabricator.wikimedia.org/T400390) [18:59:05] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE, 13Patch-For-Review: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11044655 (10VRiley-WMF) [19:14:25] (03update) 10raymond-ndibe: [tests] account for warning messages printed to stderr [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/883 
(https://phabricator.wikimedia.org/T400390) [19:15:13] (03update) 10raymond-ndibe: [tests] account for warning messages printed to stderr [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/883 (https://phabricator.wikimedia.org/T400390) [19:23:39] 10Tool-paulina: Minor improvements in the copyright status of the work - https://phabricator.wikimedia.org/T400748 (10Pepe_piton) 03NEW [19:25:03] 10Tool-paulina: Minor improvements in the copyright status of the work - https://phabricator.wikimedia.org/T400748#11044769 (10Pepe_piton) [19:29:01] 10Tool-paulina: Minor improvements in the copyright status of the work - https://phabricator.wikimedia.org/T400748#11044810 (10Pepe_piton) a:03Pepe_piton [19:31:03] 10Tool-paulina: Minor improvements in the copyright status of the work - https://phabricator.wikimedia.org/T400748#11044831 (10Pepe_piton) p:05Triage→03Medium [19:31:33] 10Tool-paulina: Results pagination - https://phabricator.wikimedia.org/T399974#11044841 (10Pepe_piton) p:05Triage→03Medium [19:39:55] (03update) 10raymond-ndibe: [tests] account for warning messages printed to stderr [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/883 (https://phabricator.wikimedia.org/T400390) [19:40:03] (03update) 10raymond-ndibe: [tests] account for warning messages printed to stderr [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/883 (https://phabricator.wikimedia.org/T400390) [19:40:51] (03update) 10raymond-ndibe: [cli] Change port type to allow protocol suffix [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024) [19:41:06] (03update) 10raymond-ndibe: [cli] Change port type to allow protocol suffix [repos/cloud/toolforge/jobs-cli] - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024) [19:45:46] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE, 13Patch-For-Review: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11044874 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host clouddb1022.eqiad.wmnet with OS b... [20:05:08] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE, 13Patch-For-Review: Q4:rack/setup/install cloudcephosd10[48-51] - https://phabricator.wikimedia.org/T394333#11044975 (10Andrew) hostname should be cloudcephosd1052. Attached patch sets up initial puppet and partman. [20:08:23] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE, 13Patch-For-Review: Q4:rack/setup/install cloudcephosd10[48-51] - https://phabricator.wikimedia.org/T394333#11044990 (10Andrew) @Jclark-ctr are we waiting on more DACs before we can move ahead with these? [20:13:00] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE, 13Patch-For-Review: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11045008 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host clouddb1023.eqiad.wmnet with... 
[20:17:36] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE, 13Patch-For-Review: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11045018 (10VRiley-WMF) I have the same issue with the same error on clouddb1023 [20:43:58] (03update) 10raymond-ndibe: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509) [20:57:58] (03update) 10raymond-ndibe: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509) [21:13:43] 06cloud-services-team, 06DC-Ops, 06SRE, 13Patch-For-Review: cloudcephosd10[48-51] service implementation - https://phabricator.wikimedia.org/T395910#11045171 (10wiki_willy) [21:26:05] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-36 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [21:36:27] (03update) 10raymond-ndibe: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509) [21:44:53] (03update) 10raymond-ndibe: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509) [22:01:05] FIRING: [2x] 
ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-36 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess [22:46:05] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-69 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [23:27:05] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-69 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses