[00:09:32] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [00:31:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [00:46:04] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [00:46:34] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [00:51:34] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [00:52:34] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [00:57:34] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [00:58:34] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [01:18:34] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [02:08:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - 
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [02:33:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [03:06:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [03:09:56] FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [03:14:56] RESOLVED: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [04:09:32] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - 
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [04:46:04] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [04:47:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [04:52:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [04:53:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [04:58:33] 
RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [04:59:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [05:49:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [06:13:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [06:18:04] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - 
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [06:56:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [06:56:24] VPS-project-Codesearch, collaboration-services: Graduate codesearch to production - https://phabricator.wikimedia.org/T268199#11088497 (A_smart_kitten) >>! In T268199#10404078, @Dzahn wrote: > We have been evaluating software for a refreshed codesearch and https://www.sourcebot.dev/ seems like a viable c...
[07:56:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [07:56:34] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [08:01:35] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [08:02:34] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [08:09:32] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - 
https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [08:22:34] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [08:25:20] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation), IPv6, Patch-For-Review: Refresh Cloud VPS NTP servers to run on Trixie and enable IPv6 - https://phabricator.wikimedia.org/T401848#11088571 (taavi) a:taavi [08:25:39] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation), IPv6: Refresh Cloud VPS Puppet ENC servers to run on Trixie and enable IPv6 - https://phabricator.wikimedia.org/T401986 (taavi) NEW [08:25:50] cloud-services-team, Cloud-VPS, IPv6: Enable IPv6 on Cloud VPS infrastructure services - https://phabricator.wikimedia.org/T392688#11088584 (taavi) [08:25:51] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation), IPv6: Refresh Cloud VPS Puppet ENC servers to run on Trixie and enable IPv6 - https://phabricator.wikimedia.org/T401986#11088583 (taavi) [08:26:04] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation), IPv6: Refresh Cloud VPS Puppet ENC servers to run on Trixie and enable IPv6 - https://phabricator.wikimedia.org/T401986#11088585 (taavi) p:Triage→Medium [08:28:16] !log taavi@cloudcumin1001 cloudinfra START - Cookbook wmcs.vps.refresh_puppet_certs on enc-3.cloudinfra.eqiad1.wikimedia.cloud [08:30:49] !log taavi@cloudcumin1001 cloudinfra END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on enc-3.cloudinfra.eqiad1.wikimedia.cloud [08:36:01] !log taavi@cloudcumin1001 cloudinfra START - Cookbook wmcs.vps.refresh_puppet_certs on
enc-3.cloudinfra.eqiad1.wikimedia.cloud [08:38:21] !log taavi@cloudcumin1001 cloudinfra END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on enc-3.cloudinfra.eqiad1.wikimedia.cloud [08:39:07] !log taavi@cloudcumin1001 cloudinfra START - Cookbook wmcs.vps.refresh_puppet_certs on enc-3.cloudinfra.eqiad1.wikimedia.cloud [08:43:47] !log taavi@cloudcumin1001 cloudinfra END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on enc-3.cloudinfra.eqiad1.wikimedia.cloud [08:47:47] !log taavi@cloudcumin1001 cloudinfra START - Cookbook wmcs.vps.refresh_puppet_certs on enc-4.cloudinfra.eqiad1.wikimedia.cloud [08:49:34] (CR) Eugene233: "recheck" [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1178628 (owner: Jacob4code) [08:51:14] !log taavi@cloudcumin1001 cloudinfra END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on enc-4.cloudinfra.eqiad1.wikimedia.cloud [08:51:24] !log taavi@cloudcumin1001 cloudinfra START - Cookbook wmcs.vps.refresh_puppet_certs on enc-4.cloudinfra.eqiad1.wikimedia.cloud [08:53:05] !log taavi@cloudcumin1001 cloudinfra END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on enc-4.cloudinfra.eqiad1.wikimedia.cloud [09:02:04] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation), IPv6, Patch-For-Review: Refresh Cloud VPS Puppet ENC servers to run on Trixie and enable IPv6 - https://phabricator.wikimedia.org/T401986#11088651 (taavi) The ENC app is failing to start: ` Traceback (most recent call last): File "/u...
[09:05:14] (update) vriaa: Draft: code generation feature [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/8 [09:18:18] cloud-services-team, Toolforge: !log automated deployments so that a tool’s SAL records system changes - https://phabricator.wikimedia.org/T401963#11088752 (dcaro) →Duplicate dup:T393169 [09:18:20] cloud-services-team, Toolforge (Toolforge iteration 23): [components-api] optionally log deployments to SAL automatically - https://phabricator.wikimedia.org/T393169#11088754 (dcaro) [09:23:43] !log taavi@cloudcumin1001 wikimania-mautic START - Cookbook wmcs.vps.delete_project for project wikimania-mautic in eqiad1 [09:23:57] !log taavi@cloudcumin1001 wikimania-mautic END (FAIL) - Cookbook wmcs.vps.delete_project (exit_code=99) for project wikimania-mautic in eqiad1 [09:24:14] cloud-services-team, Toolforge: [components-api] Add a "description" field to the deployment - https://phabricator.wikimedia.org/T401993 (dcaro) NEW [09:25:49] Cloud-VPS (Project-requests): Request deletion of wikimania-mautic VPS project - https://phabricator.wikimedia.org/T401958#11088846 (taavi) a:taavi [09:26:13] (open) group_199_bot_333a6c67971a471aeb1cf0b14ccf9f49: projects: delete project wikimania-mautic [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/262 [09:26:23] !log taavi@cloudcumin1001 wikimania-mautic START - Cookbook wmcs.vps.delete_project for project wikimania-mautic in eqiad1 [09:26:43] (close) taavi: projects: delete project wikimania-mautic [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/262 (owner: group_199_bot_333a6c67971a471aeb1cf0b14ccf9f49) [09:26:47] (PS2) Majavah: vps: Add cookbook to delete a project [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1139027
(https://phabricator.wikimedia.org/T391836) [09:26:59] (open) group_199_bot_333a6c67971a471aeb1cf0b14ccf9f49: projects: delete project wikimania-mautic [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/263 [09:27:34] (merge) taavi: projects: delete project wikimania-mautic [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/263 (owner: group_199_bot_333a6c67971a471aeb1cf0b14ccf9f49) [09:28:21] !log taavi@cloudcumin1001 wikimania-mautic END (PASS) - Cookbook wmcs.vps.delete_project (exit_code=0) for project wikimania-mautic in eqiad1 [09:29:00] Cloud-VPS (Project-requests): Request deletion of wikimania-mautic VPS project - https://phabricator.wikimedia.org/T401958#11088866 (taavi) Open→Resolved [09:52:01] Toolforge (Toolforge iteration 23): [components-api] support port protocol in config - https://phabricator.wikimedia.org/T401994 (dcaro) NEW [10:29:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [10:55:18] (update) vriaa: Draft: code generation feature [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/8 [11:13:49] FIRING: PuppetFailure: Puppet has failed on cloudcontrol1011:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [11:14:01] cloud-services-team: PuppetFailure Puppet has failed on cloudcontrol1011:9100 -
https://phabricator.wikimedia.org/T402000 (phaultfinder) NEW [11:18:48] FIRING: [2x] PuppetFailure: Puppet has failed on cloudcontrol1007:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [11:18:55] cloud-services-team: PuppetFailure - https://phabricator.wikimedia.org/T402003 (phaultfinder) NEW [11:40:11] (update) vriaa: Draft: code generation feature [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/8 [12:09:32] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [12:18:37] RESOLVED: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [12:25:42] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation): Upgrade cloudinfra database hosts off of Bullseye - https://phabricator.wikimedia.org/T402005 (taavi) NEW [12:25:59] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation): Upgrade IDP server in cloudinfra project off of Bullseye - https://phabricator.wikimedia.org/T402006 (taavi) NEW [12:28:14] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation), IPv6: Refresh cloudinfra central syslog audit servers to run on Trixie and enable IPv6 - https://phabricator.wikimedia.org/T402007 (taavi) NEW [12:28:52] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation): Upgrade cloudinfra database hosts off of Bullseye -
https://phabricator.wikimedia.org/T402005#11089501 (taavi) @fnegri Is this something of interest to you? [12:31:04] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation), IPv6: Refresh cloudinfra central syslog audit servers to run on Trixie and enable IPv6 - https://phabricator.wikimedia.org/T402007#11089505 (taavi) [12:31:06] cloud-services-team, Cloud-VPS, IPv6: Enable IPv6 on Cloud VPS infrastructure services - https://phabricator.wikimedia.org/T392688#11089506 (taavi) [12:38:45] cloud-services-team, Cloud-VPS, IPv6: Enable IPv6 on Cloud VPS mail servers - https://phabricator.wikimedia.org/T402008 (taavi) NEW [12:54:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [13:01:21] (open) taavi: logging: Disable wide metrics policy [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/927 (https://phabricator.wikimedia.org/T401190) [13:01:22] (update) taavi: logging: Disable wide metrics policy [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/927 (https://phabricator.wikimedia.org/T401190) [13:03:03] (merge) taavi: logging: Disable wide metrics policy [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/927 (https://phabricator.wikimedia.org/T401190) [13:04:11] cloud-services-team, Toolforge, SecTeam-Processed, Security, Vuln-Misconfiguration: Toolforge loki allows unauthenticated access to logs across
namespaces - https://phabricator.wikimedia.org/T401190#11089589 (taavi) Open→Resolved This is resolved with the live hack being committed... [13:04:17] cloud-services-team, Toolforge, SecTeam-Processed, Security, Vuln-Misconfiguration: Toolforge loki allows unauthenticated access to logs across namespaces - https://phabricator.wikimedia.org/T401190#11089591 (taavi) p:Triage→High [13:05:55] cloud-services-team, Toolforge: `toolforge jobs logs` is limited to 5000 lines - https://phabricator.wikimedia.org/T401553#11089599 (taavi) p:Triage→Low This restriction comes from the default query limit from Loki upstream. While we could bump that limit, I'm hoping that {T400917} and possible c... [13:09:35] cloud-services-team, Toolforge: toolforge jobs logs api returns 404 on no log entries - https://phabricator.wikimedia.org/T401420#11089608 (taavi) When there is an endpoint to query logs, to me a 404 seems like a reasonable status code to return when no logs were found. But I do see your point about how...
[13:10:04] cloud-services-team, Toolforge: [TjfCliError] `toolforge jobs logs` breaks on long log lines - https://phabricator.wikimedia.org/T401422#11089611 (taavi) p:Triage→Medium [13:10:27] cloud-services-team, Toolforge: `toolforge jobs logs` has inconsistent ordering - https://phabricator.wikimedia.org/T401552#11089613 (taavi) p:Triage→High [13:11:12] cloud-services-team, Toolforge: Remove or replace "default" web service image - https://phabricator.wikimedia.org/T401715#11089627 (taavi) p:Triage→Medium [13:12:15] cloud-services-team, Cloud-VPS, IPv6: Enable IPv6 on Cloud VPS mail servers - https://phabricator.wikimedia.org/T402008#11089628 (taavi) p:Triage→Low [13:12:53] cloud-services-team, Tool-gawa: Request to be Added as Co-Owner of the GAWA Repository - https://phabricator.wikimedia.org/T401569#11089630 (taavi) Open→Resolved a:Andrew [13:13:04] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation): Upgrade IDP server in cloudinfra project off of Bullseye - https://phabricator.wikimedia.org/T402006#11089632 (taavi) p:Triage→Medium [13:13:09] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation): Upgrade cloudinfra database hosts off of Bullseye - https://phabricator.wikimedia.org/T402005#11089633 (taavi) p:Triage→Medium [13:13:17] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation), IPv6: Refresh cloudinfra central syslog audit servers to run on Trixie and enable IPv6 - https://phabricator.wikimedia.org/T402007#11089634 (taavi) p:Triage→Medium [13:13:47] cloud-services-team, Cloud-VPS: trixie puppet 8->7 downgrade code does not work - https://phabricator.wikimedia.org/T401913#11089635 (taavi) Open→Resolved a:taavi [13:13:58] cloud-services-team, Cloud-VPS: Ensure unique machine-id across Cloud VPS VMs - https://phabricator.wikimedia.org/T401880#11089637 (taavi) p:Triage→High a:fgiunchedi [13:15:06] cloud-services-team,
Toolforge: Set up new Prometheus instance for user-created data - https://phabricator.wikimedia.org/T366923#11089642 (taavi) p:Triage→Low [13:15:53] cloud-services-team, Toolforge: [jobs] Allow configuration of Promethus scraping of a specific endpoint for publication in grafana.wmcloud.org - https://phabricator.wikimedia.org/T362012#11089643 (taavi) p:Triage→Low [13:17:12] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation): Migrate WMCS-managed NFS servers off of Bullseye - https://phabricator.wikimedia.org/T401812#11089644 (taavi) p:Triage→Medium [13:17:16] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation): Migrate cloudinfra project off of Debian Bullseye - https://phabricator.wikimedia.org/T401811#11089645 (taavi) p:Triage→Medium [13:17:25] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation): Replace all codfw1dev Bullseye VMs - https://phabricator.wikimedia.org/T401810#11089646 (taavi) p:Triage→Medium [13:18:43] cloud-services-team, Toolforge: [loki] persist build logs for each tool on their loki namespace - https://phabricator.wikimedia.org/T401830#11089648 (taavi) p:Triage→Medium That's probably fine, although it means we need to append the `tool-` prefix at the ingestion level.
[13:19:21] cloud-services-team, Toolforge: Upgrade or retire tools-package-builder-04 - https://phabricator.wikimedia.org/T401819#11089652 (taavi) p:Triage→Medium [13:19:28] cloud-services-team, Toolforge: Upgrade Toolforge (Elastic|Open)Search cluster off of Bullseye - https://phabricator.wikimedia.org/T401818#11089653 (taavi) p:Triage→Medium [13:19:32] cloud-services-team, Toolforge: Update Toolforge Cumin nodes off of Bullseye - https://phabricator.wikimedia.org/T401817#11089654 (taavi) p:Triage→Medium [13:25:23] cloud-services-team, Cloud-VPS: Issue with project "catalyst-dev" - https://phabricator.wikimedia.org/T402013 (jnuche) NEW [13:32:01] cloud-services-team, Cloud-VPS: Issue with project "catalyst-dev" - https://phabricator.wikimedia.org/T402013#11089700 (taavi) Open→Invalid The proxy name `patchdemo.wmcloud.org` is already in use in the `catalyst` project, thus as the "Can't edit backend of another project" tries to say you... [13:38:16] cloud-services-team, Cloud-VPS: Issue with project "catalyst-dev" - https://phabricator.wikimedia.org/T402013#11089703 (jnuche) >>! In T402013#11089700, @taavi wrote: > The proxy name `patchdemo.wmcloud.org` is already in use in the `catalyst` project, thus as the "Can't edit backend of another proje...
[14:53:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [15:18:48] FIRING: [2x] PuppetFailure: Puppet has failed on cloudcontrol1007:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [15:39:49] (open) xcollazo: Fix broken share link. [toolforge-repos/wikirun-game] - https://gitlab.wikimedia.org/toolforge-repos/wikirun-game/-/merge_requests/3 [15:43:23] Cloud-VPS (Project-requests): Request creation of eseap VPS project - https://phabricator.wikimedia.org/T401957#11089999 (taavi) Hey. I have two concerns in particular here that I'd like your input on: * First, the project scope. As the description of #cloud-vps-project-requests says, we prefer that projects... [15:57:18] cloud-services-team: PuppetFailure Puppet has failed on cloudcontrol1011:9100 - https://phabricator.wikimedia.org/T402000#11090059 (taavi) →Duplicate dup:T402003 [15:57:20] cloud-services-team: PuppetFailure - https://phabricator.wikimedia.org/T402003#11090061 (taavi) [15:58:42] cloud-services-team, Cloud-VPS: PuppetFailure - https://phabricator.wikimedia.org/T402003#11090064 (taavi) a:taavi Let's see if this helps: `lang=shell-session taavi@cloudcontrol1007 /srv/tofu-infra $ sudo git branch * main remotes/origin/HEAD taavi@cloudcontrol1007 /srv/tofu-infra $ sudo git branc...
[15:58:51] cloud-services-team, Toolforge: https://api.svc.toolforge.org endpoint given in OpenAPI spec returns 403 forbidden errors - https://phabricator.wikimedia.org/T402032 (bd808) NEW [16:00:05] VPS-project-Codesearch, collaboration-services: Graduate codesearch to production - https://phabricator.wikimedia.org/T268199#11090083 (Dzahn) So far the plan is to introduce a new codesearch to prod while the existing codesearch stays unchanged. So it would just keep working as it has before. New softw... [16:02:44] Tool-translatetagger: Adding tvars for links - https://phabricator.wikimedia.org/T393258#11090102 (Super_nabla) a:Super_nabla [16:05:18] Tool-translatetagger: Adding tvars for links - https://phabricator.wikimedia.org/T393258#11090107 (Super_nabla) Open→Resolved The fork has been reviewed and merged on GitHub. See https://github.com/indictechcom/translatable-wikitext-converter [16:06:50] cloud-services-team: SystemdUnitDown The systemd unit kiwix-mirror-update.service on node clouddumps1001 has been failing for more than two hours. - https://phabricator.wikimedia.org/T401959#11090111 (taavi) Open→Resolved ` Aug 14 17:46:12 clouddumps1001 bash[3200315]: rsync: [Receiver] failed to...
[16:28:48] RESOLVED: [2x] PuppetFailure: Puppet has failed on cloudcontrol1007:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [16:35:48] (open) taavi: Fix tab completion [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/123 [16:35:52] (update) taavi: Fix tab completion [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/123 [16:46:23] Tools: Help needed to deploy a react tool in Toolforge - https://phabricator.wikimedia.org/T374304#11090249 (Aklapper) Open→Declined No reply; closing. [16:47:59] cloud-services-team, Toolforge: https://api.svc.toolforge.org endpoint given in OpenAPI spec returns 403 forbidden errors - https://phabricator.wikimedia.org/T402032#11090257 (dcaro) Yep, that is the external endpoint, for which certificate -based Auth is not allowed, the other is internal, for which it...
[16:48:53] (update) taavi: Fix tab completion [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/123 [16:49:18] (open) taavi: Fix tab completion [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/118 [16:49:31] (update) taavi: Fix tab completion [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/118 [16:49:36] (update) taavi: Fix tab completion [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/118 [16:58:05] (open) taavi: Fix tab completion [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/89 [16:58:11] (update) taavi: Fix tab completion [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/89 [17:05:11] cloud-services-team, Toolforge: https://api.svc.toolforge.org endpoint given in OpenAPI spec returns 403 forbidden errors - https://phabricator.wikimedia.org/T402032#11090367 (bd808) I don't see anything on https://wikitech.wikimedia.org/wiki/Help:Toolforge/API about OAuth authentication. The OpenAPI sp...
[17:18:51] (update) taavi: Fix tab completion [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/118 [18:33:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [18:48:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [19:20:57] VPS-project-Codesearch, collaboration-services: Graduate codesearch to production - https://phabricator.wikimedia.org/T268199#11090676 (A_smart_kitten) Ah, thank you for the clarification & confirmation!
<3 [19:21:01] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-103 [19:21:57] !log andrew@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-103 [19:22:40] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-103 [19:23:49] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-103 [19:26:54] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693) [19:27:02] T401693: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693 [19:32:57] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T401693) [19:33:05] T401693: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693 [19:34:10] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [19:44:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [19:51:21] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693) [19:51:28] T401693: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693 [19:53:03] Tool-paulina: Add credits to README.md - https://phabricator.wikimedia.org/T402052
(Pepe_piton) NEW [19:56:05] (PS1) Dzahn: add fake profile::zuul::main::nodepool::user_token [labs/private] - https://gerrit.wikimedia.org/r/1179219 (https://phabricator.wikimedia.org/T400850) [19:56:22] (CR) Dzahn: [V:+2 C:+2] add fake profile::zuul::main::nodepool::user_token [labs/private] - https://gerrit.wikimedia.org/r/1179219 (https://phabricator.wikimedia.org/T400850) (owner: Dzahn) [19:57:58] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T401693) [19:58:05] T401693: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693 [19:58:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [19:59:22] Cloud-VPS (Debian Bullseye Deprecation), Moderator-Tools-Team, The-Wikipedia-Library, Epic: Replace deprecated Bullseye VMs in Cloud VPS - https://phabricator.wikimedia.org/T402053 (jsn.sherman) NEW [20:01:19] Cloud-VPS (Debian Bullseye Deprecation), Moderator-Tools-Team, The-Wikipedia-Library: twl: Replace deprecated Bullseye VMs in Cloud VPS - https://phabricator.wikimedia.org/T402054 (jsn.sherman) NEW [20:03:10] Cloud-VPS (Debian Bullseye Deprecation), Moderator-Tools-Team, The-Wikipedia-Library: wikilink: Replace deprecated Bullseye VM in Cloud VPS - https://phabricator.wikimedia.org/T402055 (jsn.sherman) NEW [20:05:15] Cloud-VPS (Debian Bullseye Deprecation), Moderator-Tools-Team, The-Wikipedia-Library, Epic: hashtags: Replace deprecated Bullseye VM in Cloud VPS - https://phabricator.wikimedia.org/T402056 (jsn.sherman) NEW [20:08:03] RESOLVED: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at
least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesse [20:08:45] Tool-wikicordo: Grouped filtering by DR category (e.g., copyright vs. scope) - https://phabricator.wikimedia.org/T402057 (Josve05a) NEW [20:18:03] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693) [20:18:11] T401693: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693 [20:24:28] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T401693) [20:24:35] T401693: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693 [20:28:19] Tool-wikicordo: Grouped filtering by DR category (e.g., copyright vs.
scope) - https://phabricator.wikimedia.org/T402057#11090858 (Josve05a) Open→Invalid Seems like I could just choose the COM:COPYRIGHT tag, lol [20:30:21] Cloud-VPS (Debian Bullseye Deprecation), Moderator-Tools-Team, The-Wikipedia-Library: twl: Replace deprecated Bullseye VMs in Cloud VPS - https://phabricator.wikimedia.org/T402054#11090860 (jsn.sherman) [20:33:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [20:40:28] Cloud-VPS (Debian Bullseye Deprecation), Moderator-Tools-Team, The-Wikipedia-Library, Epic: hashtags: Replace deprecated Bullseye VM in Cloud VPS - https://phabricator.wikimedia.org/T402056#11090874 (jsn.sherman) [20:47:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-111 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [20:49:23] Cloud-VPS (Debian Bullseye Deprecation), Moderator-Tools-Team, The-Wikipedia-Library: wikilink: Replace deprecated Bullseye VM in Cloud VPS - https://phabricator.wikimedia.org/T402055#11090884 (jsn.sherman) [20:52:13] Cloud-VPS (Debian Bullseye Deprecation), Moderator-Tools-Team, The-Wikipedia-Library: twl: Replace deprecated Bullseye VMs in Cloud VPS - https://phabricator.wikimedia.org/T402054#11090889 (jsn.sherman) [20:54:12] Cloud-VPS (Debian Bullseye Deprecation), Moderator-Tools-Team, The-Wikipedia-Library, Epic: hashtags: Replace deprecated Bullseye VM
in Cloud VPS - https://phabricator.wikimedia.org/T402056#11090892 (jsn.sherman) [21:03:20] Cloud-VPS (Debian Bullseye Deprecation), Moderator-Tools-Team, The-Wikipedia-Library: twl: Replace deprecated Bullseye VMs in Cloud VPS - https://phabricator.wikimedia.org/T402054#11090899 (jsn.sherman) [21:35:34] cloud-services-team, Toolforge, Upstream: Python buildpack does not detect requirements from pyproject.toml - https://phabricator.wikimedia.org/T353762#11090988 (bd808) >>! In T353762#9600150, @dcaro wrote: > Waiting for upstream https://github.com/heroku/buildpacks-python/issues/7 >>! In T353762#10... [21:52:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-111 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [21:59:29] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tools-harbor-2 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [22:21:56] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [22:36:56] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown