[02:18:52] Toolforge: Support jdk21 on toolforge - https://phabricator.wikimedia.org/T346477#10120050 (Don-vip) I managed tonight to: - download jdk21 from Adoptium at build time on GitLab CI and cache it using GitLab cache mechanism. It's fast enough for me - use jdk21 with toolforge build service / buildpacks. It's a...
[02:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[02:22:48] Toolforge: [builds-builder] Cache .m2 folder (local maven repository) between builds - https://phabricator.wikimedia.org/T350307#10120054 (Don-vip) @dcaro @Slst2020 have you enabled the cache already? I switched to Toolforge Build Service tonight in order to get Java 21 and the build is really fast! I don't...
[02:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:02:28] (open) raymond-ndibe: Draft: [toolforge-deploy] upgrade cert-manager [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/517 (https://phabricator.wikimedia.org/T359641)
[03:06:52] (update) raymond-ndibe: Draft: [toolforge-deploy] upgrade cert-manager [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/517 (https://phabricator.wikimedia.org/T359641)
[03:07:17] (update) raymond-ndibe: Draft: [toolforge-deploy] upgrade cert-manager [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/517 (https://phabricator.wikimedia.org/T359641)
[03:07:53] (update) raymond-ndibe: Draft: [toolforge-deploy] upgrade cert-manager [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/517 (https://phabricator.wikimedia.org/T359641)
[03:24:56] (update) raymond-ndibe: Draft: [toolforge-deploy] upgrade cert-manager [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/517 (https://phabricator.wikimedia.org/T359641)
[03:38:11] (update) raymond-ndibe: Draft: [toolforge-deploy] upgrade cert-manager [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/517 (https://phabricator.wikimedia.org/T359641)
[04:22:38] (update) raymond-ndibe: Draft: [toolforge-deploy] upgrade cert-manager [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/517 (https://phabricator.wikimedia.org/T359641)
[05:37:58] (update) raymond-ndibe: Draft: [toolforge-deploy] upgrade cert-manager [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/517 (https://phabricator.wikimedia.org/T359641)
[05:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[06:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[06:55:53] !log dcaro@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T373986)
[07:36:44] PAWS, Quarry: PR usually not posting to phabricator - https://phabricator.wikimedia.org/T373134#10120246 (Jelto) >>! In T373134#10101244, @rook wrote: > May be caused by T362401 Yes this is right, we are rate-limiting cloud providers heavily (which includes Azure) because of abuse and scraping we've had...
[07:58:39] cloud-services-team (FY2024/2025-Q1-Q2): Drain C8 rack - https://phabricator.wikimedia.org/T374043#10120276 (dcaro)
[08:32:05] cloud-services-team (FY2024/2025-Q1-Q2): Drain C8 rack - https://phabricator.wikimedia.org/T374043#10120379 (aborrero) >>! In T374043#10119678, @Andrew wrote: > We should drain the osds and cloudvirts. The few other should be fine. yeah, cloudnet, cloudservices, cloudlb, cloudgw, the service they provide sh...
[08:32:26] cloud-services-team (FY2024/2025-Q1-Q2): Drain C8 rack - https://phabricator.wikimedia.org/T374043#10120380 (aborrero)
[08:49:54] cloud-services-team: openstack: eqiad1: designate is maybe not working as expected (2024-09-04) - https://phabricator.wikimedia.org/T374023#10120405 (aborrero) Open→Resolved a:aborrero thanks!
[09:00:27] cloud-services-team: codfw1dev: rabbitmq is not working because some auth failures - https://phabricator.wikimedia.org/T374002#10120463 (aborrero) this is better today.
[09:00:58] cloud-services-team: codfw1dev: rabbitmq is not working because some auth failures - https://phabricator.wikimedia.org/T374002#10120468 (aborrero) p:Medium→Low
[09:03:03] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/30
[09:03:29] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/30
[09:03:47] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/30
[09:03:57] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/30
[09:09:31] (CR) Arturo Borrero Gonzalez: [C:-1] "How are you testing the change?" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1070020 (owner: David Caro)
[09:10:09] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/30
[09:10:33] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/30
[09:12:22] (merge) aborrero: codfw1dev: instrument VXLAN-based flat network [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/30 (https://phabricator.wikimedia.org/T374020)
[09:12:35] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[09:14:26] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[09:15:27] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[09:16:04] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[09:20:23] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/31
[09:20:26] (open) aborrero: imports: drop network-related imports [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/31 (https://phabricator.wikimedia.org/T374020)
[09:20:53] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/31
[09:21:37] (merge) aborrero: imports: drop network-related imports [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/31 (https://phabricator.wikimedia.org/T374020)
[09:22:00] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[09:22:46] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[09:30:50] (open) aborrero: router_interfaces: account for subnetid and portid being mutually exclusive [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/32 (https://phabricator.wikimedia.org/T374020)
[09:31:16] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/32
[09:31:43] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/32
[09:34:28] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/32
[09:34:29] (update) aborrero: router_interfaces: account for subnetid and portid being mutually exclusive [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/32 (https://phabricator.wikimedia.org/T374020)
[09:34:53] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/32
[09:35:51] (merge) aborrero: router_interfaces: account for subnetid and portid being mutually exclusive [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/32 (https://phabricator.wikimedia.org/T374020)
[09:35:57] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[09:36:25] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[09:52:32] (open) aborrero: imports: add import for renamed neutron router port [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/33 (https://phabricator.wikimedia.org/T374020)
[09:52:43] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/33
[09:52:55] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/33
[09:53:37] (merge) aborrero: imports: add import for renamed neutron router port [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/33 (https://phabricator.wikimedia.org/T374020)
[09:55:15] (open) aborrero: Revert "imports: add import for renamed neutron router port" [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/34
[09:55:16] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/34
[09:55:42] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/34
[09:55:52] (merge) aborrero: Revert "imports: add import for renamed neutron router port" [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/34
[09:56:03] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[09:56:28] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[09:56:41] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[09:56:56] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[09:57:04] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[09:59:08] cloud-services-team, Cloud-VPS, Epic: tofu-infra: investigate S3 spurious endpoint errors - https://phabricator.wikimedia.org/T370660#10120678 (aborrero) Had this again today: ` Initializing the backend... Error loading state: operation error S3: ListObjectsV2, exceeded maximum number of attempts, 5...
[09:59:27] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[10:19:28] (open) aborrero: codfw1dev: temporary removal of cloud-flat router interface [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/35 (https://phabricator.wikimedia.org/T374020)
[10:19:35] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/35
[10:20:07] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/35
[10:24:14] cloud-services-team, DBA: Add visitingwatchers to watchlist_count - https://phabricator.wikimedia.org/T150547#10120775 (fnegri) p:Lowest→Low
[10:25:59] cloud-services-team, Toolforge, collaboration-services, GitLab (Infrastructure), Kubernetes: gitlab: enable agent server for kubernetes (KAS) - https://phabricator.wikimedia.org/T320483#10120777 (fnegri) p:Lowest→Low
[10:26:10] cloud-services-team, Toolforge, Toolforge-standards-committee: Keep track of tools without stated default licenses - https://phabricator.wikimedia.org/T190377#10120783 (fnegri) p:Lowest→Low
[10:26:31] cloud-services-team, Cloud-VPS, Documentation: Improve documentation on SSH host fingerprints - https://phabricator.wikimedia.org/T193648#10120785 (fnegri) p:Lowest→Low
[10:26:53] cloud-services-team, Cloud-VPS: Consider different varieties of Cloud VPS instance flavors - https://phabricator.wikimedia.org/T188941#10120781 (fnegri) p:Lowest→Low
[10:27:03] cloud-services-team, Cloud-VPS: cloudnet: consider increasing network neighbour table - https://phabricator.wikimedia.org/T327512#10120779 (fnegri) p:Lowest→Low
[10:27:54] cloud-services-team, Toolforge, docker-pkg: Port operations/docker-images/toollabs-images to use docker-pkg - https://phabricator.wikimedia.org/T200649#10120787 (fnegri) p:Lowest→Low
[10:29:34] cloud-services-team, Data-Services: Allow self-serve database credential and permissions management for Toolforge projects - https://phabricator.wikimedia.org/T136335#10120792 (fnegri) p:Lowest→Low
[10:29:36] cloud-services-team, Toolforge: Automate kubeadm config change deployment - https://phabricator.wikimedia.org/T292945#10120797 (fnegri) p:Lowest→Low
[10:29:41] cloud-services-team, Toolforge, Kubernetes: Extremely high latency over NFS between kubernetes node and bastion host - https://phabricator.wikimedia.org/T256426#10120790 (fnegri) p:Lowest→Low
[10:30:19] cloud-services-team: Rename labs/toollabs components to toolforge/wmcs where appropriate - https://phabricator.wikimedia.org/T208387#10120803 (fnegri) p:Lowest→Low
[10:30:37] cloud-services-team, Tool-toolviews, Toolforge: Provide basic page view metrics for individual tools on toolforge - https://phabricator.wikimedia.org/T87001#10120795 (fnegri) p:Lowest→Low
[10:30:52] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[10:31:22] cloud-services-team, Toolforge: toolforge: rework toollabs debian package (misctools and jobutils) - https://phabricator.wikimedia.org/T207968#10120801 (fnegri) p:Lowest→Low
[10:31:46] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[10:32:28] cloud-services-team, Toolforge, Upstream: Debian Stretch lighttpd does not allow overriding existing mimetype.assign values - https://phabricator.wikimedia.org/T215683#10120805 (fnegri) p:Lowest→Low
[10:33:22] cloud-services-team, Toolforge: toolsbeta.automated-toolforge-tests membership causes "groups: cannot find name for group ID 54872" error message - https://phabricator.wikimedia.org/T301736#10120807 (fnegri) p:Lowest→Low
[10:42:26] (open) aborrero: ports: remove device owner data [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/36 (https://phabricator.wikimedia.org/T374020)
[10:43:12] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/36
[10:43:38] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/36
[10:43:50] (close) aborrero: codfw1dev: temporary removal of cloud-flat router interface [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/35 (https://phabricator.wikimedia.org/T374020)
[10:48:24] (approved) fnegri: ports: remove device owner data [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/36 (https://phabricator.wikimedia.org/T374020) (owner: aborrero)
[10:48:35] (merge) aborrero: ports: remove device owner data [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/36 (https://phabricator.wikimedia.org/T374020)
[10:49:23] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[10:50:05] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[10:57:57] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 14), Epic: [Hypothesis] WE6.3.2 Create "standard" tool (Sample Complex Tool, SCT) to measure the number of steps for a deployment - https://phabricator.wikimedia.org/T368602#10120877 (dcaro)
[11:09:59] cloud-services-team (FY2024/2025-Q1-Q2): cloudsw1-c8-eqiad is unstable - https://phabricator.wikimedia.org/T373986#10120914 (cmooney)
[11:19:23] (approved) dcaro: calico: correct kubeVersion [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/516 (owner: sstefanova)
[11:19:51] (CR) Slyngshede: [V:+2 C:+2] P:idp Add Keystone dummy secret [labs/private] - https://gerrit.wikimedia.org/r/1070588 (owner: Slyngshede)
[11:20:15] (update) sstefanova: calico: correct kubeVersion [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/516
[11:20:16] (approved) sstefanova: calico: correct kubeVersion [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/516
[11:20:22] (merge) sstefanova: calico: correct kubeVersion [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/516
[11:24:24] (CR) David Caro: [C:+2] "> How are you testing the change?" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1070020 (owner: David Caro)
[11:24:34] (CR) David Caro: [C:-1] Revert^2 "openstack.tofu: use run_script instead of reimplementing it" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1070020 (owner: David Caro)
[11:25:10] (CR) CI reject: [V:-1] Revert^2 "openstack.tofu: use run_script instead of reimplementing it" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1070020 (owner: David Caro)
[11:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:18:18] (PS8) David Caro: Revert^2 "openstack.tofu: use run_script instead of reimplementing it" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1070020
[12:18:18] (CR) David Caro: Revert^2 "openstack.tofu: use run_script instead of reimplementing it" (4 comments) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1070020 (owner: David Caro)
[12:31:19] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[12:31:42] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[12:35:44] (open) dcaro: cronjob: add simple cronjob [toolforge-repos/sample-complex-app-backend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/3 (https://phabricator.wikimedia.org/T368602)
[12:36:25] cloud-services-team, Toolforge: toolsbeta.automated-toolforge-tests membership causes "groups: cannot find name for group ID 54872" error message - https://phabricator.wikimedia.org/T301736#10121204 (dcaro) Stalled→Resolved a:dcaro This does not happen anymore: ` dcaro@cloudcumin1001:~$ s...
[12:44:26] (update) dcaro: cronjob: add simple cronjob [toolforge-repos/sample-complex-app-backend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/3 (https://phabricator.wikimedia.org/T368602)
[12:44:52] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: openstack: instrument VXLAN-based flat network - https://phabricator.wikimedia.org/T374020#10121284 (aborrero) With the above patches, I was able to create a VM attached to the new VXLAN-based subnet: `lines=10,lang=shell-session aborrero@cloudcontrol2...
[12:46:25] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: openstack: instrument VXLAN-based flat network - https://phabricator.wikimedia.org/T374020#10121298 (cmooney) nice work! I'll log on when I have some time to familiarise myself.
[12:47:37] !log dcaro@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0) (T373986)
[12:47:43] T373986: cloudsw1-c8-eqiad is unstable - https://phabricator.wikimedia.org/T373986
[12:49:03] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: netbox: allocate CIDRs for openstack VXLAN-based flat networks - https://phabricator.wikimedia.org/T374111 (aborrero) NEW
[12:51:36] !log dcaro@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T373986)
[12:54:04] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: netbox: allocate CIDRs for openstack VXLAN-based flat networks - https://phabricator.wikimedia.org/T374111#10121329 (cmooney) > cloud-flat-eqiad1: either 172.16.8.0/22 (1k addresses) or 172.16.8.0/21 (2k addresses) Even with the ability to more easily...
[12:54:41] Tool-Global-user-contributions, Special:GlobalContributions, Stewards-and-global-tools, Epic, Temporary accounts (Create/update essential tools/anti-abuse management): [Epic] Implement global user contributions feature - https://phabricator.wikimedia.org/T337089#10121331 (kostajh) a:STran
[12:54:51] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: Cloud VPS: design target vxlan setup - https://phabricator.wikimedia.org/T373869#10121333 (aborrero) To keep with the pattern in netbox, we will attach the deployment suffix to the network name: * cloud-flat-eqiad1 * cloud-flat-codfw1dev
[12:55:28] RESOLVED: PuppetAgentDisabled: Puppet agent disabled on instance tools-prometheus-6 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentDisabled
[13:00:53] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: netbox: allocate CIDRs for openstack VXLAN-based flat networks - https://phabricator.wikimedia.org/T374111#10121368 (aborrero) As of this writing we have a `172.16.0.0/21` for eqiad1, with about ~800 VMs running. source: * tofu https://gitlab.wikimedia...
[13:01:49] (open) aborrero: codfw1dev: rename cloud-flat to cloud-flat-codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/37 (https://phabricator.wikimedia.org/T374020)
[13:01:50] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/37
[13:02:16] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/37
[13:03:33] (update) aborrero: codfw1dev: rename cloud-flat to cloud-flat-codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/37 (https://phabricator.wikimedia.org/T374020)
[13:03:51] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/37
[13:04:25] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/37
[13:07:50] cloud-services-team, wikitech.wikimedia.org, MW-on-K8s, serviceops: Review/update wikitech-static syncing after wikitech moves to Kubernetes - https://phabricator.wikimedia.org/T374114 (Andrew) NEW
[13:16:35] cloud-services-team, wikitech.wikimedia.org, MW-on-K8s, serviceops: Review/update wikitech-static syncing after wikitech moves to Kubernetes - https://phabricator.wikimedia.org/T374114#10121414 (Andrew) cc'ing @Dzahn because he's done some wikitech-static maintenance in the past and might be inte...
[13:23:50] !log raymondndibe@wmf3402 tools START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T359641) [13:23:50] !log raymondndibe@wmf3402 tools Updating container image docker-registry.tools.wmflabs.org/cert-manager/controller:v1.15.3 (T359641) [13:23:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:23:54] T359641: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641 [13:23:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:24:27] !log raymondndibe@wmf3402 tools END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) (T359641) [13:24:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:26:07] !log raymondndibe@wmf3402 tools START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T359641) [13:26:07] !log raymondndibe@wmf3402 tools Updating container image docker-registry.tools.wmflabs.org/cert-manager/webhook:v1.15.3 (T359641) [13:26:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:26:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:26:37] !log raymondndibe@wmf3402 tools END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) (T359641) [13:26:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:27:33] !log raymondndibe@wmf3402 tools START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T359641) [13:27:33] !log raymondndibe@wmf3402 tools Updating container image docker-registry.tools.wmflabs.org/cert-manager/cainjector:v1.15.3 (T359641) [13:27:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:27:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:28:03] !log raymondndibe@wmf3402 tools END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry 
(exit_code=0) (T359641) [13:28:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:40:27] !log raymondndibe@wmf3402 tools START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T359641) [13:40:27] !log raymondndibe@wmf3402 tools Updating container image docker-registry.tools.wmflabs.org/cert-manager/startupapicheck:v1.15.3 (T359641) [13:40:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:40:32] T359641: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641 [13:40:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:40:37] !log raymondndibe@wmf3402 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) (T359641) [13:40:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:41:09] !log raymondndibe@wmf3402 tools START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T359641) [13:41:09] !log raymondndibe@wmf3402 tools Updating container image docker-registry.tools.wmflabs.org/cert-manager/startupapicheck:v1.15.3 (T359641) [13:41:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:41:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:41:17] !log raymondndibe@wmf3402 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) (T359641) [13:41:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:45:43] !log raymondndibe@wmf3402 tools START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T359641) [13:45:43] !log raymondndibe@wmf3402 tools Updating container image docker-registry.tools.wmflabs.org/cert-manager/startupapicheck:v1.15.3 (T359641) [13:45:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:45:47] T359641: [infra,k8s] Upgrade Toolforge Kubernetes to 
version 1.27 - https://phabricator.wikimedia.org/T359641 [13:45:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:46:15] !log raymondndibe@wmf3402 tools END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) (T359641) [13:46:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:50:03] !log raymondndibe@wmf3402 tools START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T359641) [13:50:03] !log raymondndibe@wmf3402 tools Updating container image docker-registry.tools.wmflabs.org/cert-manager/stakater-reloader:v1.1.0 (T359641) [13:50:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:50:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:50:35] !log raymondndibe@wmf3402 tools END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) (T359641) [13:50:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:55:34] (03update) 10raymond-ndibe: Draft: [toolforge-deploy] upgrade cert-manager [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/517 (https://phabricator.wikimedia.org/T359641) [13:56:11] (03update) 10raymond-ndibe: [toolforge-deploy] upgrade cert-manager [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/517 (https://phabricator.wikimedia.org/T359641) [13:57:09] 06cloud-services-team, 10wikitech.wikimedia.org, 10MW-on-K8s, 06serviceops: Review/update wikitech-static syncing after wikitech moves to Kubernetes - https://phabricator.wikimedia.org/T374114#10121624 (10jijiki) a:05jijiki→03None [14:07:54] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Toolforge (Toolforge iteration 14), 13Patch-For-Review: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - 
https://phabricator.wikimedia.org/T359641#10121666 (10dcaro) [14:10:03] 06cloud-services-team, 10Toolforge: [infra,k8s] remove deprecated kubelet flags before 1.27 upgrade - https://phabricator.wikimedia.org/T370245#10121664 (10dcaro) Summary of today's deep dive: * `--container-runtime` is not there anymore on our nodes (yay!), it seems kubeadm cleaned it up? * `--container-runti... [14:15:35] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/37 [14:16:02] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/37 [14:17:47] (03merge) 10aborrero: codfw1dev: rename cloud-flat to cloud-flat-codfw1dev [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/37 (https://phabricator.wikimedia.org/T374020) [14:17:52] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch [14:18:56] (03update) 10raymond-ndibe: [toolforge-deploy] upgrade cert-manager [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/517 (https://phabricator.wikimedia.org/T359641) [14:19:52] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch [14:21:22] FIRING: [4x] HAProxyBackendUnavailable: HAProxy service keystone-admin-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [14:22:59] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster toolsbeta upgrade from 1.26.15 to 1.27.16 (T359641) 
[14:23:03] T359641: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641 [14:24:30] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch [14:25:21] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster toolsbeta upgrade from 1.26.15 to 1.27.16 (T359641) [14:25:32] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch [14:26:22] RESOLVED: [4x] HAProxyBackendUnavailable: HAProxy service keystone-admin-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [14:27:26] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-control-7 from 1.26.15 to 1.27.16 (T359641) [14:27:58] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: netbox: allocate CIDRs for openstack VXLAN-based flat networks - https://phabricator.wikimedia.org/T374111#10121756 (10aborrero) I agree going with 172.16.8.0/22 (1k addresses) for now for eqiad1. 
[14:28:20] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: netbox: allocate CIDRs for openstack VXLAN-based flat networks - https://phabricator.wikimedia.org/T374111#10121760 (10aborrero) [14:33:10] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: netbox: allocate CIDRs for openstack VXLAN-based flat networks - https://phabricator.wikimedia.org/T374111#10121782 (10aborrero) Created https://netbox.wikimedia.org/ipam/prefixes/1076/ [14:35:25] 06cloud-services-team: IDP/SSO logout behavior is weird - https://phabricator.wikimedia.org/T374123 (10Andrew) 03NEW [14:35:31] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: netbox: allocate CIDRs for openstack VXLAN-based flat networks - https://phabricator.wikimedia.org/T374111#10121783 (10aborrero) 05Open→03Resolved p:05Triage→03Medium [14:37:51] 10Horizon: Use IDP for authentication in Horizon - https://phabricator.wikimedia.org/T359590#10121809 (10Andrew) 05Stalled→03Resolved It works! Thanks @SLyngshede-WMF for doing all the hard bits. [14:39:46] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: openstack: instrument VXLAN-based flat network - https://phabricator.wikimedia.org/T374020#10121818 (10aborrero) I deleted the previous VM and created 2 new ones, in different hypervisors: * arturo-test-vm 469d4e3a-f222-45ab-a442-3d84ec7043a9 172.16.129.... 
[14:45:04] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-control-7 from 1.26.15 to 1.27.16 (T359641) [14:45:08] T359641: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641 [14:46:25] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-control-8 from 1.26.15 to 1.27.16 (T359641) [14:51:41] FIRING: CloudVPSDesignateLeaks: Detected 6 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [14:53:12] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-control-8 from 1.26.15 to 1.27.16 (T359641) [14:53:15] T359641: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641 [14:55:19] 10Striker: User should be told name of existing Developer account when SUL is already in use - https://phabricator.wikimedia.org/T294767#10121885 (10bd808) [15:08:34] (03merge) 10raymond-ndibe: k8s: upgrade to 1.27.16 [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/188 (https://phabricator.wikimedia.org/T359641) (owner: 10dcaro) [15:10:23] 06cloud-services-team, 06Security-Team, 07SecTeam-Processed, 07Security, 07Vuln-Infoleak: toolforge: k8s-status: prevent it from accessing some information - https://phabricator.wikimedia.org/T346313#10121959 (10sbassett) [15:10:46] 06cloud-services-team: IDP/SSO logout behavior is weird - https://phabricator.wikimedia.org/T374123#10121962 (10bd808) Andrew are you asking for a shared URL on the IDP side that would return a "you are logged out from service X, would you 
like to log in again?" sort of page? This to me feels like a client conce... [15:17:27] 10Cloud-Services, 06cloud-services-team, 13Patch-For-Review: Upgrade Openstack Horizon to Mitaka - https://phabricator.wikimedia.org/T158099#10121982 (10Lucas_Werkmeister_WMDE) The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/pr... [15:17:28] 06cloud-services-team: codfw1dev: rabbitmq is not working because some auth failures - https://phabricator.wikimedia.org/T374002#10121993 (10aborrero) comment by andrew: this is maybe a consequence of rabbit being collocated in cloudcontrols. Consider having them running on separate hardware like in eqiad1. [15:17:41] 06cloud-services-team, 10Horizon: Horizon Mitaka 'remember me' checkbox immune to keyboard focus - https://phabricator.wikimedia.org/T158103#10121978 (10Lucas_Werkmeister_WMDE) 05Open→03Invalid With the deployment of {T359590} I guess this task became invalid today (but I can confirm the issue still ex... [15:25:42] 06cloud-services-team: IDP/SSO logout behavior is weird - https://phabricator.wikimedia.org/T374123#10122013 (10Andrew) >>! In T374123#10121962, @bd808 wrote: > Andrew are you asking for a shared URL on the IDP side that would return a "you are logged out from service X, would you like to log in again?" sort of... [15:35:55] (03merge) 10lucaswerkmeister: shell: drop --wait [repos/cloud/toolforge/tools-webservice] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/55 (https://phabricator.wikimedia.org/T373866) [15:37:18] 06cloud-services-team: IDP/SSO logout behavior is weird - https://phabricator.wikimedia.org/T374123#10122048 (10bd808) >>! In T374123#10122013, @Andrew wrote: > I'm also premising this on @SLyngshede-WMF saying that other dev services have the same weird behavior as Horizon so I'm assuming this is a general prob... 
[15:40:02] 06cloud-services-team, 10Horizon: Upgrade Openstack Horizon to Mitaka - https://phabricator.wikimedia.org/T158099#10122068 (10JJMC89) [16:02:16] 06cloud-services-team: openstack: consider removing labs-ip-aliaser - https://phabricator.wikimedia.org/T374129 (10aborrero) 03NEW [16:02:39] 06cloud-services-team: openstack: consider removing labs-ip-aliaser - https://phabricator.wikimedia.org/T374129#10122159 (10aborrero) p:05Triage→03Low [16:02:47] 06cloud-services-team, 10Cloud-VPS: openstack: consider removing labs-ip-aliaser - https://phabricator.wikimedia.org/T374129#10122160 (10fnegri) [16:12:36] 06cloud-services-team, 10Cloud-VPS: openstack: consider removing labs-ip-aliaser - https://phabricator.wikimedia.org/T374129#10122179 (10Andrew) I would love to get rid of this! [16:21:01] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1031.eqiad.wmnet' (T374043) [16:21:07] T374043: Drain C8 rack - https://phabricator.wikimedia.org/T374043 [16:42:37] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1031.eqiad.wmnet' (T374043) [16:42:46] T374043: Drain C8 rack - https://phabricator.wikimedia.org/T374043 [16:43:30] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1032.eqiad.wmnet' (T374043) [16:43:56] 10cloud-services-team (FY2024/2025-Q1-Q2): Drain C8 rack - https://phabricator.wikimedia.org/T374043#10122299 (10Andrew) [16:48:51] !log dcaro@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0) (T373986) [16:48:57] T373986: cloudsw1-c8-eqiad is unstable - https://phabricator.wikimedia.org/T373986 [17:08:10] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1032.eqiad.wmnet' (T374043) [17:08:18] T374043: Drain C8 rack - https://phabricator.wikimedia.org/T374043 [17:08:34] !log andrew@cloudcumin1001 
admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1033.eqiad.wmnet' (T374043) [17:13:04] 10Cloud-VPS (Debian Buster Deprecation), 10Wikispore: Rebuild Wikispore Vagrant boxes on Bullseye or Bookworm - https://phabricator.wikimedia.org/T365934#10122457 (10Andrew) I have shut down those two VMs due to lack of response [17:14:21] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "striker" project Buster deprecation - https://phabricator.wikimedia.org/T367555#10122463 (10Andrew) 05Open→03Resolved a:03Andrew deleted! [17:15:18] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "wikicommunityhealth" project Buster deprecation - https://phabricator.wikimedia.org/T367560#10122468 (10Andrew) @CristianCantoro can you please update us as to the status of this? Both VMs are still present and shut off. [17:15:47] 10Cloud-VPS (Debian Buster Deprecation), 06Infrastructure-Foundations, 10Puppet CI: Cloud VPS "puppet-diffs" project Buster deprecation - https://phabricator.wikimedia.org/T367547#10122469 (10Andrew) 05Open→03Resolved [17:16:26] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node for host toolsbeta-test-k8s-control-7 [17:16:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [17:16:45] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "schematreerecommender" project Buster deprecation - https://phabricator.wikimedia.org/T367552#10122482 (10Andrew) These VMs are still shut off, there does not seem to have been any progress. @Michaelcochez do you still intend to work on this project? 
[17:18:16] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host toolsbeta-test-k8s-control-7 [17:18:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [17:18:25] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node for host toolsbeta-test-k8s-control-8 [17:18:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [17:18:41] !log dcaro@urcuchillay toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host toolsbeta-test-k8s-control-8 [17:18:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [17:20:10] (03update) 10raymond-ndibe: [toolforge-deploy] upgrade cert-manager [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/517 (https://phabricator.wikimedia.org/T359641) [17:21:25] (03PS1) 10David Caro: toolsbeta: update the control nodes [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1071010 [17:22:02] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node for host toolsbeta-test-k8s-control-8 [17:22:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [17:22:43] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1033.eqiad.wmnet' (T374043) [17:22:48] T374043: Drain C8 rack - https://phabricator.wikimedia.org/T374043 [17:23:26] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host toolsbeta-test-k8s-control-8 [17:23:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [17:28:15] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a control role in the toolsbeta cluster [17:28:18] Logged the message at 
https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [17:32:07] !log dcaro@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T373986) [17:32:12] T373986: cloudsw1-c8-eqiad is unstable - https://phabricator.wikimedia.org/T373986 [17:32:25] 10cloud-services-team (FY2024/2025-Q1-Q2): Drain C8 rack - https://phabricator.wikimedia.org/T374043#10122566 (10dcaro) [17:39:30] !log dcaro@urcuchillay toolsbeta Added a new k8s control toolsbeta-test-k8s-control-12.toolsbeta.eqiad1.wikimedia.cloud to the cluster [17:39:30] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a control role in the toolsbeta cluster [17:39:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [17:39:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [17:39:46] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node for host toolsbeta-test-k8s-control-9 [17:39:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [17:41:00] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host toolsbeta-test-k8s-control-9 [17:41:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [17:41:52] (03PS2) 10David Caro: toolsbeta: update the control nodes [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1071010 [17:53:19] (03CR) 10David Caro: [C:03+2] toolsbeta: update the control nodes [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1071010 (owner: 10David Caro) [17:58:29] (03update) 10dcaro: cronjob: add simple cronjob [toolforge-repos/sample-complex-app-backend] - 10https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/3 (https://phabricator.wikimedia.org/T368602) [18:01:42] RESOLVED: CloudVPSDesignateLeaks: Detected 10 stray dns records - 
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [18:05:54] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1034.eqiad.wmnet' (T374043) [18:06:00] T374043: Drain C8 rack - https://phabricator.wikimedia.org/T374043 [18:06:29] 10cloud-services-team (FY2024/2025-Q1-Q2): Drain C8 rack - https://phabricator.wikimedia.org/T374043#10122699 (10Andrew) [18:07:33] (03Merged) 10jenkins-bot: toolsbeta: update the control nodes [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1071010 (owner: 10David Caro) [18:24:56] 06cloud-services-team, 10Toolforge, 07Upstream: Debian Stretch lighttpd does not allow overriding existing mimetype.assign values - https://phabricator.wikimedia.org/T215683#10122764 (10gstrauss-wiki) Why is this still open? Debian Stretch is EOL. https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web/Ligh... 
[18:29:03] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1034.eqiad.wmnet' (T374043) [18:29:10] T374043: Drain C8 rack - https://phabricator.wikimedia.org/T374043 [18:45:31] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1035.eqiad.wmnet' (T374043) [18:45:38] T374043: Drain C8 rack - https://phabricator.wikimedia.org/T374043 [18:58:44] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1035.eqiad.wmnet' (T374043) [18:58:49] T374043: Drain C8 rack - https://phabricator.wikimedia.org/T374043 [19:07:50] 10cloud-services-team (FY2024/2025-Q1-Q2): Drain C8 rack - https://phabricator.wikimedia.org/T374043#10122937 (10Andrew) [19:23:32] PROBLEM - SSH on cloudbackup2004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring [19:26:22] RECOVERY - SSH on cloudbackup2004 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [19:37:48] 10Tools: QuickCategories background runner sometimes hangs for no apparent reason - https://phabricator.wikimedia.org/T374152 (10LucasWerkmeister) 03NEW [19:38:14] 10Tools: QuickCategories background runner sometimes hangs for no apparent reason - https://phabricator.wikimedia.org/T374152#10123061 (10LucasWerkmeister) I’m leaving the background runner in its current state for a bit in case someone else wants to take a look, but at some point I’ll restart it again to get th... [19:42:26] 10Tools: QuickCategories background runner sometimes hangs for no apparent reason - https://phabricator.wikimedia.org/T374152#10123062 (10LucasWerkmeister) If we can’t figure out the underlying issue, I suppose I could: - convert the background runner to toolforge-jobs - make the code touch some tmp file each t... 
[19:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [20:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [20:36:26] !log raymondndibe@wmf3402 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component cert-manager [20:36:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [20:36:30] !log raymondndibe@wmf3402 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component cert-manager [20:36:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [20:37:15] !log raymondndibe@wmf3402 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component cert-manager [20:37:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [20:37:19] !log raymondndibe@wmf3402 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component cert-manager [20:37:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [20:49:48] FIRING: PuppetFailure: Puppet has failed on cloudbackup2004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [20:50:03] 06cloud-services-team: PuppetFailure Puppet failure on cloudbackup2004:9100 - https://phabricator.wikimedia.org/T374158 (10phaultfinder) 03NEW [20:50:23] !log raymondndibe@wmf3402 toolsbeta START 
- Cookbook wmcs.toolforge.component.deploy for component cert-manager [20:50:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [20:51:55] !log raymondndibe@wmf3402 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component cert-manager [20:51:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [20:54:39] (03PS1) 10JHathaway: geoip: add fake info [labs/private] - 10https://gerrit.wikimedia.org/r/1071034 [20:55:56] (03CR) 10JHathaway: [C:03+2] geoip: add fake info [labs/private] - 10https://gerrit.wikimedia.org/r/1071034 (owner: 10JHathaway) [20:56:02] (03CR) 10JHathaway: [V:03+2 C:03+2] geoip: add fake info [labs/private] - 10https://gerrit.wikimedia.org/r/1071034 (owner: 10JHathaway) [21:09:48] RESOLVED: PuppetFailure: Puppet has failed on cloudbackup2004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [21:32:31] !log dcaro@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0) (T373986) [21:32:36] T373986: cloudsw1-c8-eqiad is unstable - https://phabricator.wikimedia.org/T373986 [22:05:07] 06cloud-services-team, 10Cloud-VPS, 06collaboration-services, 13Patch-For-Review: puppet problems mounting cinder volumes (and suggested fixes) - https://phabricator.wikimedia.org/T371573#10123486 (10Dzahn) a:03Dzahn [22:05:37] 06cloud-services-team, 10Cloud-VPS, 06collaboration-services, 13Patch-For-Review: puppet problems mounting cinder volumes (and suggested fixes) - https://phabricator.wikimedia.org/T371573#10123489 (10Dzahn) Thanks for the merge. let me test before closing it as resolved. 
[22:26:25] 10Tools: QuickCategories background runner sometimes hangs for no apparent reason - https://phabricator.wikimedia.org/T374152#10123532 (10LucasWerkmeister) That `read()` seems to be happily blocking forever, by the way: `lang=shell-session root@tools-k8s-worker-nfs-63:~# timeout 60m strace -p793594 -yy strace:... [23:35:37] (03PS1) 10Raymond Ndibe: [wmcs-cookbook] update toolsbeta-test-k8s-control vms [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1071052 (https://phabricator.wikimedia.org/T359641)