[00:49:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:54:44] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.depool_and_remove_node for host toolsbeta-test-k8s-worker-nfs-6 (T359641)
[00:54:45] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001"
[00:54:46] T359641: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641
[00:56:57] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.depool_and_remove_node (exit_code=0) for host toolsbeta-test-k8s-worker-nfs-6 (T359641)
[00:56:57] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001"
[00:59:24] RESOLVED: ToolforgeKubernetesNodeNotReady: Kubernetes node toolsbeta-test-k8s-worker-nfs-6 is not ready - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady
[01:04:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:41:28] !log dcaro@urcuchillay admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt2004-dev.codfw.wmnet' (T374467)
[07:41:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:41:34] T374467: 2024-09-10: hardware error on cloudvirt2004-dev - https://phabricator.wikimedia.org/T374467
[07:49:15] cloud-services-team (FY2024/2025-Q1-Q2): 2024-09-10: hardware error on cloudvirt2004-dev - https://phabricator.wikimedia.org/T374467#10136493 (dcaro) Yep, this morning it woke up with more memory corruption errors, I'm draining it waiting for the memory replacement: ` [Wed Sep 11 03:38:30 2024] {3}[Hardware...
[07:49:26] cloud-services-team (FY2024/2025-Q1-Q2): 2024-09-10: hardware error on cloudvirt2004-dev - https://phabricator.wikimedia.org/T374467#10136494 (dcaro)
[07:49:29] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt2004-dev.codfw.wmnet' (T374467)
[07:49:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:49:35] T374467: 2024-09-10: hardware error on cloudvirt2004-dev - https://phabricator.wikimedia.org/T374467
[07:52:55] Cloud-Services: Lint problems for NeutronAgentDownForLong and NeutronAgentDown - https://phabricator.wikimedia.org/T374513 (fgiunchedi) NEW The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it wit...
[07:55:56] FIRING: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[08:00:56] RESOLVED: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[08:34:29] wikitech.wikimedia.org, MW-on-K8s, serviceops, Patch-For-Review: MVP: Privately serve wikitech via mwdebug1001 - https://phabricator.wikimedia.org/T371537#10136529 (jijiki)
[08:45:07] cloud-services-team (FY2024/2025-Q1-Q2): Drain C8 rack - https://phabricator.wikimedia.org/T374043#10136568 (dcaro)
[08:52:30] Toolforge: [jobs-api,infra] upgrade all the existing toolforge jobs to the latest job version - https://phabricator.wikimedia.org/T359649#10136577 (aborrero)
[08:59:31] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the toolsbeta cluster
[08:59:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[09:09:33] !log dcaro@urcuchillay toolsbeta Added a new k8s worker toolsbeta-test-k8s-worker-12.toolsbeta.eqiad1.wikimedia.cloud to the cluster
[09:09:33] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the toolsbeta cluster
[09:09:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[09:09:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[09:14:22] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the toolsbeta cluster
[09:14:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[09:16:58] cloud-services-team (FY2024/2025-Q1-Q2): Lint problems for NeutronAgentDownForLong and NeutronAgentDown - https://phabricator.wikimedia.org/T374513#10136624 (dcaro)
[09:23:06] cloud-services-team (FY2024/2025-Q1-Q2): Lint problems for NeutronAgentDownForLong and NeutronAgentDown - https://phabricator.wikimedia.org/T374513#10136629 (dcaro) Possibly related: {T335943}
[09:24:16] !log dcaro@urcuchillay toolsbeta Added a new k8s worker toolsbeta-test-k8s-worker-13.toolsbeta.eqiad1.wikimedia.cloud to the cluster
[09:24:16] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the toolsbeta cluster
[09:24:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[09:24:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[09:41:39] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node
[09:41:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[09:43:53] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0)
[09:43:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[09:45:47] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node
[09:45:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[09:47:05] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0)
[09:47:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[09:58:10] Toolforge: [builds-cli] No obvious way to delete individual `toolforge build` generated artifacts other than `toolforge clean` - https://phabricator.wikimedia.org/T368317#10136747 (dcaro) This is half-intentional, in the sense that we decided to avoid exposing the concept of 'images' to users, so there's no...
[10:04:36] cloud-services-team (FY2024/2025-Q1-Q2): [ceph] install and put in the cluster the cloudcephmon100[1-3] replacements - https://phabricator.wikimedia.org/T374005#10136774 (dcaro)
[10:04:56] (update) aborrero: tofu-infra: introduce DNS records [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/40 (https://phabricator.wikimedia.org/T374338)
[10:05:19] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/40
[10:05:49] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/40
[10:07:52] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
[10:07:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:10:17] (PS1) David Caro: toolforge.component.deploy: run tests by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1072162
[10:14:32] (CR) CI reject: [V:-1] toolforge.component.deploy: run tests by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1072162 (owner: David Caro)
[10:15:08] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
[10:15:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:19:01] (approved) fnegri: tofu-infra: introduce DNS records [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/40 (https://phabricator.wikimedia.org/T374338) (owner: aborrero)
[10:19:26] (merge) aborrero: tofu-infra: introduce DNS records [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/40 (https://phabricator.wikimedia.org/T374338)
[10:19:49] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[10:20:11] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[10:20:46] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
[10:20:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:21:45] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[10:22:42] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[10:25:09] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the toolsbeta cluster
[10:25:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:27:35] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
[10:27:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:29:17] (approved) dcaro: maintain-kubeusers: bump to 0.0.168-20240910133124-0c3e395c [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/519 (https://phabricator.wikimedia.org/T372720) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:29:21] (merge) dcaro: maintain-kubeusers: bump to 0.0.168-20240910133124-0c3e395c [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/519 (https://phabricator.wikimedia.org/T372720) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:34:51] !log dcaro@urcuchillay toolsbeta Added a new k8s ingress toolsbeta-test-k8s-ingress-9.toolsbeta.eqiad1.wikimedia.cloud to the cluster
[10:34:51] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a ingress role in the toolsbeta cluster
[10:34:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:34:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:17:56] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[11:19:11] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[11:27:05] (open) aborrero: records: add default TTL of 3600 [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/41 (https://phabricator.wikimedia.org/T374338)
[11:27:11] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/41
[11:27:48] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/41
[11:34:38] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the toolsbeta cluster
[11:34:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:41:24] (approved) fnegri: records: add default TTL of 3600 [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/41 (https://phabricator.wikimedia.org/T374338) (owner: aborrero)
[11:44:11] !log dcaro@urcuchillay toolsbeta Added a new k8s ingress toolsbeta-test-k8s-ingress-10.toolsbeta.eqiad1.wikimedia.cloud to the cluster
[11:44:11] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a ingress role in the toolsbeta cluster
[11:44:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:44:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:55:37] (merge) aborrero: records: add default TTL of 3600 [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/41 (https://phabricator.wikimedia.org/T374338)
[11:55:46] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[11:56:22] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[11:57:27] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[11:58:03] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[12:08:31] (open) aborrero: wmcloud.org: update MX records [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/42 (https://phabricator.wikimedia.org/T374278)
[12:08:32] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/42
[12:08:58] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/42
[12:09:06] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/42
[12:09:40] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/42
[12:14:04] (open) aborrero: dns: codfw1dev: track additional records [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/43 (https://phabricator.wikimedia.org/T374338)
[12:14:51] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/43
[12:15:32] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/43
[12:17:02] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the toolsbeta cluster
[12:17:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:18:14] (open) aborrero: dns: records: add a default description to indicate the record is managed [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/44 (https://phabricator.wikimedia.org/T374338)
[12:18:24] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/44
[12:18:56] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/44
[12:26:28] !log dcaro@urcuchillay toolsbeta Added a new k8s ingress toolsbeta-test-k8s-ingress-11.toolsbeta.eqiad1.wikimedia.cloud to the cluster
[12:26:28] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a ingress role in the toolsbeta cluster
[12:26:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:26:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:37:35] (open) aborrero: tofu-infra: update openstack provider from 2.0.0 to 2.1.0 [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/45
[12:51:00] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node
[12:51:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:52:30] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0)
[12:52:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:52:39] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node
[12:52:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:53:58] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0)
[12:54:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:54:14] FIRING: [4x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: toolsbeta-test-k8s-ingress-6.toolsbeta.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[12:55:04] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node
[12:55:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:56:56] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0)
[12:56:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:03:28] FIRING: InstanceDown: Project toolsbeta instance toolsbeta-test-k8s-ingress-8 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:08:28] RESOLVED: InstanceDown: Project toolsbeta instance toolsbeta-test-k8s-ingress-8 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:20:44] RESOLVED: [2x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: toolsbeta-test-k8s-ingress-6.toolsbeta.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[13:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:27:07] (update) raymond-ndibe: [jobs-cli] remove unknown keys from dump [repos/cloud/toolforge/jobs-cli] (update_autocomplete_and_man_files) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/64 (https://phabricator.wikimedia.org/T341066)
[14:32:21] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, DC-Ops, ops-eqiad, SRE: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#10137610 (dcaro) >>! In T348643#10113626, @wiki_willy wrote: > Thanks @dcaro, sounds good. I'll bug them again abo...
[14:40:40] (close) raymond-ndibe: Draft: [envvars-api] DO_NOT_MERGE: schedule all pods on toolforge-worker [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/45 (https://phabricator.wikimedia.org/T358203)
[14:41:02] (close) raymond-ndibe: Draft: [lima-kilo] DO_NOT_MERGE: enable node inclusion policy feature gate [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/183 (https://phabricator.wikimedia.org/T358203)
[14:41:20] (close) raymond-ndibe: Draft: [toolforge-deploy] DO_NOT_MERGE : increase envvars-api replicas in local env [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/481 (https://phabricator.wikimedia.org/T358203)
[14:42:54] Tools: Lexeme-forms on Toolforge returns error - https://phabricator.wikimedia.org/T374344#10137642 (Fnielsen) I haven't seen the error in the last couple of days.
[14:47:56] FIRING: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[14:52:56] FIRING: [2x] SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[14:57:56] RESOLVED: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:07:50] (merge) aborrero: wmcloud.org: update MX records [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/42 (https://phabricator.wikimedia.org/T374278)
[15:07:55] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[15:08:39] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[15:08:48] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[15:09:27] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[15:11:19] cloud-services-team, Patch-For-Review: Update wmcloud.org MX records - https://phabricator.wikimedia.org/T374278#10137713 (aborrero) done! `lang=shell-session $ dig MX wmcloud.org +short 10 mx-in2001.wikimedia.org. 10 mx-in1001.wikimedia.org. `
[15:12:22] cloud-services-team, Patch-For-Review: Update wmcloud.org MX records - https://phabricator.wikimedia.org/T374278#10137728 (aborrero) Open→Resolved
[15:19:14] Tools: Update welcome message in Zulip's goodbot - https://phabricator.wikimedia.org/T310826#10137768 (debt) Hi! I recently found out about the goodbot for Zulip...and it needs some updating! Is there someone that can help or direct me as to how to update the wording of it? Thanks! {F57499690}
[15:28:49] (update) aborrero: dns: codfw1dev: track additional records [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/43 (https://phabricator.wikimedia.org/T374338)
[15:28:59] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/43
[15:29:31] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/43
[15:30:06] (update) aborrero: dns: records: add a default description to indicate the record is managed [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/44 (https://phabricator.wikimedia.org/T374338)
[15:30:11] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/44
[15:30:37] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/44
[15:31:30] (merge) aborrero: dns: records: add a default description to indicate the record is managed [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/44 (https://phabricator.wikimedia.org/T374338)
[15:31:43] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[15:32:20] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[15:32:48] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[15:33:29] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[15:40:01] Quarry: Set query result retention time - https://phabricator.wikimedia.org/T360041#10137875 (rook) Oh some good answers here https://etherpad.wikimedia.org/p/rooks-questions-to-alex
[15:58:14] (update) aborrero: dns: codfw1dev: track additional records [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/43 (https://phabricator.wikimedia.org/T374338)
[15:58:16] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/43
[15:58:53] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/43
[15:59:00] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/43
[15:59:04] (update) aborrero: dns: codfw1dev: track additional records [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/43 (https://phabricator.wikimedia.org/T374338)
[15:59:28] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/43
[16:00:44] (merge) aborrero: dns: codfw1dev: track additional records [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/43 (https://phabricator.wikimedia.org/T374338)
[16:00:46] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[16:01:15] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[16:03:26] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[16:03:56] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[16:04:16] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for main branch
[16:04:34] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for main branch
[16:14:59] wikitech.wikimedia.org: Clean up LQT leftovers on Wikitech - https://phabricator.wikimedia.org/T374553 (Pppery) NEW
[16:21:33] wikitech.wikimedia.org: Clean up LQT leftovers on Wikitech - https://phabricator.wikimedia.org/T374553#10138027 (JJMC89) Open→Resolved a:JJMC89
[16:23:43] wikitech.wikimedia.org: Clean up LQT leftovers on Wikitech - https://phabricator.wikimedia.org/T374553#10138042 (Aklapper)
[16:24:05] wikitech.wikimedia.org: Clean up LQT leftovers on Wikitech - https://phabricator.wikimedia.org/T374553#10138043 (Pppery)
[16:24:35] wikitech.wikimedia.org: Clean up LQT leftovers on Wikitech - https://phabricator.wikimedia.org/T374553#10138045 (Pppery) @Aklapper This has nothing to do with that spike. LQT was undeployed from Wikitech already over a decade ago, this is just unrelated cleanup.
[16:51:57] Tool-video-answer-tool, Future-Audiences, Spike: Investigate options for pulling more relevant images for video - https://phabricator.wikimedia.org/T374557 (Maryana) NEW
[17:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:54:56] FIRING: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[17:59:56] RESOLVED: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:19:06] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:24:06] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown