[00:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:26:05] (CR) Krinkle: "1. I suggest citing in the code where the pattern comes from. It's not obvious that the pattern is correct in both directions (matches all" [labs/countervandalism/CVNBot] - https://gerrit.wikimedia.org/r/1084298 (https://phabricator.wikimedia.org/T378530) (owner: AntiCompositeNumber)
[01:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:32:11] Cloud-VPS, Wikispore: Vanity domain for Wikispore - https://phabricator.wikimedia.org/T368236#10298287 (Samwilson)
[01:34:36] Cloud-VPS, Wikispore: Vanity domain for Wikispore - https://phabricator.wikimedia.org/T368236#10298286 (Samwilson) > Wikispore would like to serve some content using the domain name map.wikinyc.org. What content? Is this a separate wiki to the Wikispore one? map.wikinyc.org currently redirects to wikisp...
[02:40:14] FIRING: Kernel warning: Server cloudvirt1063 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudvirt1063 - https://alerts.wikimedia.org/?q=alertname%3DKernel+warning [02:40:14] FIRING: Kernel err priority: Server cloudvirt1063 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudvirt1063 - https://alerts.wikimedia.org/?q=alertname%3DKernel+err+priority [02:44:28] FIRING: [5x] NodeTextfileStale: Stale textfile for cloudvirt1063:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale [03:09:28] RESOLVED: NodeTextfileStale: Stale textfile for cloudvirt1063:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale [03:10:14] RESOLVED: Kernel warning: Server cloudvirt1063 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudvirt1063 - https://alerts.wikimedia.org/?q=alertname%3DKernel+warning [03:10:14] RESOLVED: Kernel err priority: Server cloudvirt1063 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudvirt1063 - 
https://alerts.wikimedia.org/?q=alertname%3DKernel+err+priority [03:10:56] FIRING: SystemdUnitDown: The service unit opentofu-infra-diff.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [03:15:56] RESOLVED: SystemdUnitDown: The service unit opentofu-infra-diff.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [03:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [03:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [07:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [08:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - 
https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [09:12:50] (03update) 10dcaro: token: add created_at field to the token [repos/cloud/toolforge/components-api] (rename_deploy_token) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/31 [09:13:23] (03update) 10raymond-ndibe: [maintain-harbor] do not clean up images currently running in production [repos/cloud/toolforge/maintain-harbor] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/35 (https://phabricator.wikimedia.org/T377854) [09:13:37] (03open) 10dcaro: global: rename deployment token to deploy token [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/35 [09:13:38] (03update) 10raymond-ndibe: [maintain-harbor] do not clean up images currently running in production [repos/cloud/toolforge/maintain-harbor] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/35 (https://phabricator.wikimedia.org/T377854) [09:13:41] (03update) 10dcaro: global: rename deployment token to deploy token [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/35 [09:18:31] 06cloud-services-team, 10Cloud-VPS: tofu-infra: check for leaked resources when deleting projects - https://phabricator.wikimedia.org/T379231 (10aborrero) 03NEW [09:18:37] 06cloud-services-team, 10Cloud-VPS: tofu-infra: check for leaked resources when deleting projects - https://phabricator.wikimedia.org/T379231#10298730 (10aborrero) p:05Triage→03Medium [09:19:10] (03update) 10dcaro: token: add created_at field to the token [repos/cloud/toolforge/components-api] (rename_deploy_token) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/31 [09:19:23] (03update) 10dcaro: add token validation [repos/cloud/toolforge/components-api] (add_creation_date_to_token) - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/32 (https://phabricator.wikimedia.org/T362066) [09:27:25] (03update) 10raymond-ndibe: [lima-kilo] cache disk for caching container images [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/201 (https://phabricator.wikimedia.org/T378180) [09:28:35] 10Tool-lexeme-forms, 06translatewiki.net: translatewiki export for Wikidata Lexeme Forms tries to remove sh-latn translations - https://phabricator.wikimedia.org/T379188#10298763 (10Nikerabbit) p:05Triage→03High [09:39:06] (03update) 10raymond-ndibe: [lima-kilo] cache disk for caching container images [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/201 (https://phabricator.wikimedia.org/T378180) [09:43:55] (03update) 10raymond-ndibe: [lima-kilo] cache disk for caching container images [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/201 (https://phabricator.wikimedia.org/T378180) [09:44:36] (03update) 10raymond-ndibe: [lima-kilo] cache disk for caching container images [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/201 (https://phabricator.wikimedia.org/T378180) [09:45:05] (03update) 10raymond-ndibe: [lima-kilo] cache disk for caching container images [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/201 (https://phabricator.wikimedia.org/T378180) [09:55:16] (03approved) 10sstefanova: token: add created_at field to the token [repos/cloud/toolforge/components-api] (rename_deploy_token) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/31 (owner: 10dcaro) [09:55:26] (03update) 10sstefanova: token: add created_at field to the token [repos/cloud/toolforge/components-api] 
(rename_deploy_token) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/31 (owner: 10dcaro) [09:57:25] (03approved) 10sstefanova: global: rename deployment token to deploy token [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/35 (owner: 10dcaro) [09:58:55] (03merge) 10dcaro: global: rename deployment token to deploy token [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/35 [09:58:57] (03update) 10dcaro: token: add created_at field to the token [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/31 [10:01:04] (03update) 10project_1317_bot_df3177307bed93c3f34e421e26c86e38: DONOTMERGE components-api: bump to 0.0.29-20241002095441-cd2060f1 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/544 (https://phabricator.wikimedia.org/T356261) [10:09:20] (03update) 10aborrero: WIP: tofu-infra: add code to validate no leaking VMs exist [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/118 (https://phabricator.wikimedia.org/T379231) [10:11:01] (03update) 10aborrero: WIP: tofu-infra: add code to validate no leaking VMs exist [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/118 (https://phabricator.wikimedia.org/T379231) [10:11:34] (03open) 10dcaro: api_client: add the async version of the API client [repos/cloud/toolforge/toolforge-weld] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/65 (https://phabricator.wikimedia.org/T379053) [10:14:32] (03update) 10dcaro: api_client: add the async version of the API client [repos/cloud/toolforge/toolforge-weld] - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/65 (https://phabricator.wikimedia.org/T379053) [10:17:33] (03update) 10dcaro: api_client: add the async version of the API client [repos/cloud/toolforge/toolforge-weld] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/65 (https://phabricator.wikimedia.org/T379053) [10:41:23] (03update) 10aborrero: WIP: tofu-infra: add code to validate no leaking VMs exist [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/118 (https://phabricator.wikimedia.org/T379231) [10:42:36] (03update) 10aborrero: WIP: tofu-infra: add code to validate no leaking VMs exist [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/118 (https://phabricator.wikimedia.org/T379231) [10:43:13] (03update) 10aborrero: WIP: tofu-infra: add code to validate no leaking VMs exist [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/118 (https://phabricator.wikimedia.org/T379231) [10:44:06] (03update) 10aborrero: WIP: tofu-infra: add code to validate no leaking VMs exist [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/118 (https://phabricator.wikimedia.org/T379231) [10:46:22] (03update) 10aborrero: WIP: tofu-infra: add code to validate no leaking VMs exist [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/118 (https://phabricator.wikimedia.org/T379231) [10:48:50] (03update) 10aborrero: WIP: tofu-infra: add code to validate no leaking VMs exist [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/118 (https://phabricator.wikimedia.org/T379231) [10:48:51] (03update) 10aborrero: tofu-infra: add code to 
validate no leaking VMs exist [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/118 (https://phabricator.wikimedia.org/T379231) [10:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [10:50:57] 06cloud-services-team, 10Cloud-VPS: tofu-infra: check for leaked resources when deleting projects - https://phabricator.wikimedia.org/T379231#10298892 (10aborrero) see : https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/118 [10:57:30] (03open) 10dcaro: use async toolforge cli [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/36 (https://phabricator.wikimedia.org/T379053) [10:57:36] (03merge) 10dcaro: token: add created_at field to the token [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/31 [10:57:38] (03update) 10dcaro: add token validation [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/32 (https://phabricator.wikimedia.org/T362066) [10:58:23] (03update) 10dcaro: use async toolforge cli [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/36 (https://phabricator.wikimedia.org/T379053) [10:58:31] (03update) 10dcaro: use async toolforge cli [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/36 (https://phabricator.wikimedia.org/T379053) [10:58:58] (03update) 10dcaro: use async toolforge cli [repos/cloud/toolforge/components-api] (add_token_validation) - 
https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/36 (https://phabricator.wikimedia.org/T379053)
[10:59:34] (update) project_1317_bot_df3177307bed93c3f34e421e26c86e38: DONOTMERGE components-api: bump to 0.0.29-20241002095441-cd2060f1 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/544 (https://phabricator.wikimedia.org/T356261)
[11:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:14:16] cloud-services-team, Toolforge: [builds-builder] Cache .m2 folder (local maven repository) between builds - https://phabricator.wikimedia.org/T350307#10298939 (Slst2020) @Don-vip is this something you'd still want, or could we close this task?
[11:18:58] Toolforge (Toolforge iteration 16), Patch-For-Review: [components-api] Use an asynchronous toolforge client to interact with toolforge - https://phabricator.wikimedia.org/T379053#10298955 (dcaro) This might not actually be needed :) Reading https://fastapi.tiangolo.com/async/ and doing some tests to verify
[11:30:51] Toolforge (Toolforge iteration 16), Patch-For-Review: [components-api] Use an asynchronous toolforge client to interact with toolforge - https://phabricator.wikimedia.org/T379053#10298981 (dcaro) Yep, we don't need to use async libraries to avoid the api from getting blocked, it will handle non-async...
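Note on the T379053 comment above: the linked FastAPI page explains that path operations declared with plain `def` are run in a worker thread pool, so a blocking client call does not stall the event loop. The following is a minimal stdlib-only sketch of that idea, not the components-api code. `blocking_call` and `heartbeat` are hypothetical names, and `asyncio.to_thread` (Python 3.9+) is used here as a stand-in for the framework's thread pool.

```python
import asyncio
import time


def blocking_call(delay: float) -> str:
    """Hypothetical stand-in for a synchronous (non-async) client call."""
    time.sleep(delay)
    return "done"


async def heartbeat(ticks: list, interval: float, count: int) -> None:
    """Record a timestamp per tick to show the event loop stays responsive."""
    for _ in range(count):
        await asyncio.sleep(interval)
        ticks.append(time.monotonic())


async def main() -> tuple:
    ticks: list = []
    start = time.monotonic()
    # The blocking call runs in a worker thread; the heartbeat coroutine
    # keeps running on the event loop in the meantime.
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_call, 0.5),
        heartbeat(ticks, 0.02, 3),
    )
    # Return the result, the tick count, and when the last tick happened
    # relative to the start.
    return result, len(ticks), ticks[-1] - start


if __name__ == "__main__":
    print(asyncio.run(main()))
```

If `blocking_call` were invoked directly inside a coroutine instead of through a thread, the heartbeat could not tick until it returned; that is the blocking the task was originally worried about.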
[11:31:20] Toolforge (Toolforge iteration 16), Patch-For-Review: [components-api] Use an asynchronous toolforge client to interact with toolforge - https://phabricator.wikimedia.org/T379053#10298987 (dcaro) Resolved→Invalid
[11:31:33] (close) dcaro: api_client: add the async version of the API client [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/65 (https://phabricator.wikimedia.org/T379053)
[11:31:50] (close) dcaro: use async toolforge cli [repos/cloud/toolforge/components-api] (add_token_validation) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/36 (https://phabricator.wikimedia.org/T379053)
[11:34:29] Toolforge (Toolforge iteration 16), Patch-For-Review: [components-api] Use an asynchronous toolforge client to interact with toolforge - https://phabricator.wikimedia.org/T379053#10298983 (dcaro) In progress→Resolved
[11:37:18] (CR) Kosta Harlan: [C:+1] "The pattern is very unlikely to change. So for fetching the pattern config via Siteinfo API, I would suggest leaving that for a different " [labs/countervandalism/CVNBot] - https://gerrit.wikimedia.org/r/1084298 (https://phabricator.wikimedia.org/T378530) (owner: AntiCompositeNumber)
[11:48:44] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: 2024-09-21 NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375223#10299054 (fnegri) Stalled→In progress
[12:05:55] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: openstack: vxlan: potential changes to cloudvirt MTU to enable jumbo frames - https://phabricator.wikimedia.org/T379154#10299113 (aborrero) Open→Declined We have decided not to perform any changes to the MTU settings for now. We will re-eval...
[12:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:34:46] wikitech.wikimedia.org, serviceops-radar, SRE, SRE-Unowned: Redesign wikitech-static - https://phabricator.wikimedia.org/T376400#10299227 (jijiki)
[12:36:47] cloud-services-team, wikitech.wikimedia.org, MW-on-K8s, serviceops: Review/update wikitech-static syncing after wikitech moves to Kubernetes - https://phabricator.wikimedia.org/T374114#10299203 (jijiki) In progress→Resolved I am closing this in favour of T376400. If any other issues...
[12:43:45] Cloud-VPS (Project-requests), WMDE-TechWish-Survey, Unplanned-Sprint-Work, WMDE-TechWish-Sprint-2024-10-16: Request creation of wmde-techwishes-survey VPS project - https://phabricator.wikimedia.org/T378975#10299293 (awight) We won't need object storage, so perhaps the dashes in the project name...
[12:48:33] Cloud-VPS (Project-requests), WMDE-TechWish-Survey, Unplanned-Sprint-Work, WMDE-TechWish-Sprint-2024-10-16: Request creation of wmde-techwishes-survey VPS project - https://phabricator.wikimedia.org/T378975#10299298 (aborrero) looks good to me, +1
[12:53:28] cloud-services-team, Toolforge: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T378676#10299328 (Slst2020) In progress→Resolved
[12:53:31] cloud-services-team, Infrastructure-Foundations, Puppet CI: puppet catalog compiler (pcc) failing with internal error - https://phabricator.wikimedia.org/T347358#10299324 (aborrero) Open→Resolved a:aborrero
[12:57:13] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 16): [components-api] Add endpoint to delete a deployment - https://phabricator.wikimedia.org/T379093#10299333 (Slst2020) Open→In progress
[13:02:49] (open) aborrero: eqiad1: create wmde-techwishes-survey project [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/119 (https://phabricator.wikimedia.org/T378975)
[13:03:04] wikitech.wikimedia.org, serviceops-radar, SRE, SRE-Unowned: Redesign wikitech-static - https://phabricator.wikimedia.org/T376400#10299374 (fnegri) While this is being discussed, who is taking care of maintaining the current installation of wikitech-static? This alert has been firing for more tha...
[13:04:46] PAWS, Quarry: PR usually not posting to phabricator - https://phabricator.wikimedia.org/T373134#10299398 (rook) Open→Declined
[13:05:17] (close) aborrero: tofu-infra: add code to validate no leaking VMs exist [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/118 (https://phabricator.wikimedia.org/T379231)
[13:15:19] Cloud-VPS (Project-requests), WMDE-TechWish-Survey, Patch-For-Review, Unplanned-Sprint-Work, WMDE-TechWish-Sprint-2024-10-16: Request creation of wmde-techwishes-survey VPS project - https://phabricator.wikimedia.org/T378975#10299474 (dcaro) +1
[13:23:11] (merge) aborrero: eqiad1: create wmde-techwishes-survey project [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/119 (https://phabricator.wikimedia.org/T378975)
[13:23:19] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[13:24:19] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[13:30:23] (update) dcaro: DONOTMERGE components-api: bump to 0.0.29-20241002095441-cd2060f1 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/544 (https://phabricator.wikimedia.org/T356261) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[13:31:21] (update) dcaro: DONOTMERGE components-api: bump to 0.0.29-20241002095441-cd2060f1 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/544 (https://phabricator.wikimedia.org/T356261) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[13:32:45] (update) dcaro: components-api: bump to 0.0.29-20241002095441-cd2060f1 [repos/cloud/toolforge/toolforge-deploy] -
10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/544 (https://phabricator.wikimedia.org/T356261) (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [13:33:08] (03update) 10dcaro: components-api: bump to 0.0.42-20241015121530-8b9350de [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/544 (https://phabricator.wikimedia.org/T356261) (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [13:33:21] (03approved) 10sstefanova: components-api: bump to 0.0.42-20241015121530-8b9350de [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/544 (https://phabricator.wikimedia.org/T356261) (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [13:41:07] (03update) 10dcaro: components-api: bump to 0.0.42-20241015121530-8b9350de [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/544 (https://phabricator.wikimedia.org/T356261) (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [13:41:38] (03update) 10dcaro: components-api: bump to 0.0.21-20241107105745-795f4143 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/544 (https://phabricator.wikimedia.org/T356261) (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [13:43:27] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api [13:48:00] (03open) 10dcaro: enable components in gateway [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/583 (https://phabricator.wikimedia.org/T362051) [13:48:34] (03update) 10dcaro: enable components in gateway [repos/cloud/toolforge/toolforge-deploy] (bump_components-api) - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/583 (https://phabricator.wikimedia.org/T362051) [13:50:14] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api [13:51:17] (03merge) 10dcaro: components-api: bump to 0.0.21-20241107105745-795f4143 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/544 (https://phabricator.wikimedia.org/T356261) (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [13:51:18] (03update) 10project_1317_bot_df3177307bed93c3f34e421e26c86e38: enable components in gateway [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/583 (https://phabricator.wikimedia.org/T362051) (owner: 10dcaro) [13:52:40] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate deployment-poolcounter06.deployment-prep.eqiad.wmflabs is about to expire in 20d 23h 58m 30s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire [14:02:34] (03update) 10raymond-ndibe: [lima-kilo] configure high-availability [repos/cloud/toolforge/lima-kilo] (add_cache_disk) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/189 (https://phabricator.wikimedia.org/T374585) [14:07:59] (03update) 10raymond-ndibe: [lima-kilo] configure high-availability [repos/cloud/toolforge/lima-kilo] (add_cache_disk) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/189 (https://phabricator.wikimedia.org/T374585) [14:21:34] (03approved) 10sstefanova: enable components in gateway [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/583 
(https://phabricator.wikimedia.org/T362051) (owner: 10dcaro) [14:22:41] 10Toolforge (Toolforge iteration 16): [components-api] Try to make show up the token auth for the deployment in the openapi spec/swagger UI - https://phabricator.wikimedia.org/T379257 (10dcaro) 03NEW [14:25:51] (03update) 10sstefanova: add token validation [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/32 (https://phabricator.wikimedia.org/T362066) (owner: 10dcaro) [14:26:16] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api [14:30:34] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api [14:30:42] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component api-gateway [14:33:56] FIRING: SystemdUnitDown: The service unit wmf_auto_restart_virtlogd.service is in failed status on host cloudvirt1063. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1063 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [14:36:03] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway [14:36:03] (03update) 10raymond-ndibe: [lima-kilo] configure high-availability [repos/cloud/toolforge/lima-kilo] (add_cache_disk) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/189 (https://phabricator.wikimedia.org/T374585) [14:45:55] (03update) 10raymond-ndibe: [lima-kilo] configure high-availability [repos/cloud/toolforge/lima-kilo] (add_cache_disk) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/189 (https://phabricator.wikimedia.org/T374585) [14:46:11] (03approved) 10sstefanova: add token validation [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/32 (https://phabricator.wikimedia.org/T362066) (owner: 10dcaro) [14:51:40] (03update) 10raymond-ndibe: [lima-kilo] test k8s 1.28 upgrade [repos/cloud/toolforge/lima-kilo] (configure_high_availability) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/193 (https://phabricator.wikimedia.org/T362867) [14:56:23] 10Quarry: unused dns proxies? 
- https://phabricator.wikimedia.org/T373528#10299898 (10rook) 05Open→03Resolved a:03rook [14:57:25] (03update) 10dcaro: add token validation [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/32 (https://phabricator.wikimedia.org/T362066) [14:57:34] (03approved) 10dcaro: enable components in gateway [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/583 (https://phabricator.wikimedia.org/T362051) [15:01:00] vivian-rook opened https://github.com/toolforge/quarry/pull/71 [15:03:41] 10Toolforge (Toolforge iteration 16): [components-api] Try to make show up the token auth for the deployment in the openapi spec/swagger UI - https://phabricator.wikimedia.org/T379257#10299923 (10dcaro) p:05Triage→03Low [15:04:56] (03merge) 10dcaro: enable components in gateway [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/583 (https://phabricator.wikimedia.org/T362051) [15:05:04] 10Toolforge (Toolforge iteration 16): [components-api] Try to make show up the token auth for the deployment in the openapi spec/swagger UI - https://phabricator.wikimedia.org/T379257#10299922 (10dcaro) It does show up :) And when set, it's used only by the endpoints that need it (that is just the create de... 
[15:05:20] (03update) 10dcaro: add token validation [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/32 (https://phabricator.wikimedia.org/T362066) [15:05:26] (03update) 10dcaro: add token validation [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/32 (https://phabricator.wikimedia.org/T362066) [15:05:37] (03merge) 10dcaro: add token validation [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/32 (https://phabricator.wikimedia.org/T362066) [15:05:43] 10Toolforge (Toolforge iteration 16): [components-api] Try to make show up the token auth for the deployment in the openapi spec/swagger UI - https://phabricator.wikimedia.org/T379257#10299930 (10dcaro) 05Open→03Resolved a:03dcaro [15:07:37] (03open) 10project_1317_bot_df3177307bed93c3f34e421e26c86e38: components-api: bump to 0.0.48-20241010123512-f12ab9d2 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/584 (https://phabricator.wikimedia.org/T356261 https://phabricator.wikimedia.org/T362066 https://phabricator.wikimedia.org/T362069) [15:09:28] vivian-rook closed https://github.com/toolforge/quarry/pull/71 [15:10:05] 10Quarry: update build-and-push - https://phabricator.wikimedia.org/T378978#10299955 (10rook) https://github.com/toolforge/quarry/pull/71 [15:11:16] 10Quarry: update build-and-push - https://phabricator.wikimedia.org/T378978#10299956 (10rook) 05Open→03Resolved a:03rook [15:17:05] (03update) 10project_1317_bot_df3177307bed93c3f34e421e26c86e38: components-api: bump to 0.0.48-20241010123512-f12ab9d2 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/584 (https://phabricator.wikimedia.org/T356261 
https://phabricator.wikimedia.org/T362066 https://phabricator.wikimedia.org/T362069) [15:28:56] RESOLVED: SystemdUnitDown: The service unit wmf_auto_restart_virtlogd.service is in failed status on host cloudvirt1063. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1063 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [15:29:11] (03update) 10dcaro: components-api: bump to 0.0.64-20241107151530-f4a19dfa [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/584 (https://phabricator.wikimedia.org/T356261 https://phabricator.wikimedia.org/T362066 https://phabricator.wikimedia.org/T362069) (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [15:30:44] 06cloud-services-team, 10Cloud-VPS, 13Patch-For-Review: Frequent radosgw 500 errors with Object Storage - https://phabricator.wikimedia.org/T360626#10300112 (10Raymond_Ndibe) 05Open→03Resolved [15:31:14] 10wikitech.wikimedia.org, 10Wikimedia-Site-requests: fold contentadmin group to sysop in Wikitech - https://phabricator.wikimedia.org/T375950#10300119 (10taavi) a:03taavi [15:35:58] 06cloud-services-team: SystemdUnitDown cloudcontrol1007:9100 The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. - https://phabricator.wikimedia.org/T379133#10300164 (10fnegri) [15:36:00] 06cloud-services-team, 10Cloud-VPS: Remove tf-infra-test project - https://phabricator.wikimedia.org/T379076#10300165 (10fnegri) [15:36:13] 06cloud-services-team: SystemdUnitDown cloudcontrol1007:9100 The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. - https://phabricator.wikimedia.org/T379133#10300168 (10fnegri) 05Open→03Resolved a:03fnegri This was caused by {T379076} and is n... 
[15:38:14] FIRING: Kernel warning: Server cloudvirt1063 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudvirt1063 - https://alerts.wikimedia.org/?q=alertname%3DKernel+warning [15:38:14] FIRING: Kernel err priority: Server cloudvirt1063 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudvirt1063 - https://alerts.wikimedia.org/?q=alertname%3DKernel+err+priority [15:38:52] 06cloud-services-team: NodeDownForLong cloudvirt1063:9100 The node cloudvirt1063 has been unreachable for more than two hours. - https://phabricator.wikimedia.org/T378642#10300177 (10fnegri) 05Open→03Resolved a:03fnegri This was raised by a manual shutdown of the host to test {T375479}. The host is cur... [15:46:48] 10Tool-video-answer-tool, 06Future-Audiences: [Spike] Experiment with nightcore video mode - https://phabricator.wikimedia.org/T378639#10300193 (10etz) a:03etz [15:51:12] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api [15:51:21] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: Improve WMCS NodeDown alerts - https://phabricator.wikimedia.org/T375479#10300206 (10fnegri) 05In progress→03Resolved > It should now trigger 1 alert, and open 1 phab task, with the host name in the task title. I merged all the patches and tested... 
[15:56:22] 10Cloud-VPS (Quota-requests): Increase Object Storage quota for Toolsbeta project - https://phabricator.wikimedia.org/T379270 (10Raymond_Ndibe) 03NEW [15:56:38] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api [15:56:38] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: 2024-09-21 NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375223#10300263 (10fnegri) Mainboard was replaced. I'm gonna reimage the host before putting it back into service. [15:58:29] 10Cloud-VPS (Quota-requests): Increase Object Storage quota for Tools project - https://phabricator.wikimedia.org/T379271 (10Raymond_Ndibe) 03NEW [15:59:09] (03approved) 10dcaro: components-api: bump to 0.0.64-20241107151530-f4a19dfa [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/584 (https://phabricator.wikimedia.org/T356261 https://phabricator.wikimedia.org/T362066 https://phabricator.wikimedia.org/T362069) (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [15:59:15] (03merge) 10dcaro: components-api: bump to 0.0.64-20241107151530-f4a19dfa [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/584 (https://phabricator.wikimedia.org/T356261 https://phabricator.wikimedia.org/T362066 https://phabricator.wikimedia.org/T362069) (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [16:02:11] 10Cloud-VPS (Quota-requests): Increase Object Storage quota for Tools project - https://phabricator.wikimedia.org/T379271#10300294 (10aborrero) LGTM. [16:03:16] 10Cloud-VPS (Quota-requests): Increase Object Storage quota for Toolsbeta project - https://phabricator.wikimedia.org/T379270#10300297 (10aborrero) LGTM. 
[16:12:20] 06cloud-services-team, 10Cloud-VPS: tofu-infra: check for leaked resources when deleting projects - https://phabricator.wikimedia.org/T379231#10300355 (10aborrero) 05Open→03Declined We discussed the tofu-infra approach, but is maybe not very smooth. See https://gitlab.wikimedia.org/repos/cloud/cloud-vp... [16:16:44] RESOLVED: Kernel warning: Server cloudvirt1063 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudvirt1063 - https://alerts.wikimedia.org/?q=alertname%3DKernel+warning [16:17:12] 06cloud-services-team, 10Cloud-VPS: codfw1dev ldap tls certificate names do not match dns used by labtestwikitech - https://phabricator.wikimedia.org/T342185#10300374 (10taavi) 05Open→03Declined Closing as labtestwikitech is gone. [16:20:44] RESOLVED: Kernel err priority: Server cloudvirt1063 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudvirt1063 - https://alerts.wikimedia.org/?q=alertname%3DKernel+err+priority [16:32:44] 06cloud-services-team, 10Cloud-VPS: openstack: wmf sink: extend it to support IPv6 - https://phabricator.wikimedia.org/T378192#10300408 (10aborrero) 05Open→03In progress code was deployed and enabled in codfw1dev. Tomorrow I'll check everything is working as expected before declaring victory. 
[16:38:17] (03open) 10sstefanova: deploy: add delete endpoint [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/37 (https://phabricator.wikimedia.org/T379093) [16:41:42] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: 2024-09-21 NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375223#10300457 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fnegri@cumin1002 for host cloudvirt1063.eqiad.wmnet with OS bookworm [16:42:06] 10Toolforge (Toolforge iteration 16): [sample-complex-app] trigger continuous job deployment on merge - https://phabricator.wikimedia.org/T379277 (10dcaro) 03NEW [16:42:36] 10Toolforge (Toolforge iteration 16): [sample-complex-app] trigger continuous job deployment on merge - https://phabricator.wikimedia.org/T379277#10300470 (10dcaro) p:05Triage→03High [16:42:59] 06cloud-services-team, 10Toolforge: Jobs hang on toolforge - https://phabricator.wikimedia.org/T379132#10300475 (10dcaro) p:05Triage→03High [16:45:57] 06cloud-services-team, 10Toolforge: Add support for replacing a running scheduled job when an overlapping schedule fires (`concurrencyPolicy: Replace`) - https://phabricator.wikimedia.org/T377781#10300489 (10dcaro) p:05Triage→03High [16:47:45] 10Tool-gitlab-account-approval: Approval job can get stuck and prevent subsequent jobs from firing - https://phabricator.wikimedia.org/T379130#10300486 (10dcaro) p:05Triage→03High >>! In T379130#10297198, @bd808 wrote: > This has all happened before: T306391#9436882 Yep, NFS/kernel/not sure what has changed... 
[16:51:11] (03update) 10sstefanova: deploy: add delete endpoint [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/37 (https://phabricator.wikimedia.org/T379093) [16:54:39] 10Cloud-VPS (Quota-requests): Increase Object Storage quota for Tools project - https://phabricator.wikimedia.org/T379271#10300573 (10Raymond_Ndibe) [16:55:37] 06cloud-services-team, 10Cloud-VPS: IPv6 for cloud-realm services - https://phabricator.wikimedia.org/T379282 (10taavi) 03NEW [16:55:53] 10Cloud-VPS (Quota-requests): Increase Object Storage quota for Toolsbeta project - https://phabricator.wikimedia.org/T379270#10300587 (10Raymond_Ndibe) [16:56:05] 06cloud-services-team, 10Cloud-VPS: IPv6 for cloud-realm services - https://phabricator.wikimedia.org/T379282#10300590 (10taavi) [16:58:08] 06cloud-services-team, 10Cloud-VPS: IPv6 support in cloud-private - https://phabricator.wikimedia.org/T379283 (10taavi) 03NEW [16:58:40] 06cloud-services-team, 10Cloud-VPS, 07IPv6: IPv6 for cloud-realm services - https://phabricator.wikimedia.org/T379282#10300607 (10taavi) [16:58:45] 06cloud-services-team, 10Cloud-VPS, 07IPv6: IPv6 support in cloud-private - https://phabricator.wikimedia.org/T379283#10300608 (10taavi) [17:10:09] 06cloud-services-team, 10Cloud-VPS, 07IPv6: IPv6 support in cloud-private - https://phabricator.wikimedia.org/T379283#10300635 (10taavi) Hi @cmooney (and cc @aborrero) - For this and the parent task I need v6 subnets for the following: * per-rack cloud-private subnets for hosts (v4 uses [[ https://netbox.wik... [17:11:15] 06cloud-services-team, 10Toolforge: Provide a simple list of the built container images for a given tool (`toolforge build list` subset) - https://phabricator.wikimedia.org/T362836#10300659 (10dcaro) You can work around that by running `toolforge jobs images`, I know it's kinda hidden in the other command, but... 
[17:15:17] 06cloud-services-team, 10Toolforge (Toolforge iteration 16): Introduce health checks for Toolforge Jobs Framework cronjobs - https://phabricator.wikimedia.org/T377420#10300721 (10dcaro) Note, health checks would not force the pod to be reallocated to another worker, just restart the container, so this would no... [17:15:24] 06cloud-services-team, 10Toolforge: Add --timeout to toolforge jobs - https://phabricator.wikimedia.org/T377782#10300722 (10dcaro) >>! In T377782#10296805, @AntiCompositeNumber wrote: > Duplicate of {T306391}? yep, I'll merge there, thanks! [17:17:00] 10Tool-gitlab-account-approval: Approval job can get stuck and prevent subsequent jobs from firing - https://phabricator.wikimedia.org/T379130#10300734 (10dcaro) >>! In T379130#10296703, @dcaro wrote: > > That makes me wonder, do liveness probe failures move the pod to a different node? Nope :/, so livenessProb... [17:17:54] 06cloud-services-team, 10Toolforge, 07Kubernetes: [jobs-api] Allow Toolforge scheduled jobs to have a maximum runtime - https://phabricator.wikimedia.org/T306391#10300726 (10dcaro) [17:18:41] 06cloud-services-team, 10Toolforge: Jobs hang on toolforge - https://phabricator.wikimedia.org/T379132#10300738 (10dcaro) [17:18:43] 06cloud-services-team, 10Toolforge: Add support for replacing a running scheduled job when an overlapping schedule fires (`concurrencyPolicy: Replace`) - https://phabricator.wikimedia.org/T377781#10300739 (10dcaro) [17:18:48] 06cloud-services-team, 10Toolforge, 07Kubernetes: [jobs-api] Allow Toolforge scheduled jobs to have a maximum runtime - https://phabricator.wikimedia.org/T306391#10300740 (10dcaro) [17:18:57] 06cloud-services-team, 10Toolforge: Add --timeout to toolforge jobs - https://phabricator.wikimedia.org/T377782#10300724 (10dcaro) →14Duplicate dup:03T306391 [17:24:43] 06cloud-services-team, 10Toolforge, 07Kubernetes: [jobs-api] Allow Toolforge scheduled jobs to have a maximum runtime - https://phabricator.wikimedia.org/T306391#10300762 
(10dcaro) >>! In T306391#10132235, @AntiCompositeNumber wrote: > My preferred solution for this is a `concurrencyPolicy` of `Replace`, whi... [17:29:32] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: 2024-09-21 NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375223#10300786 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fnegri@cumin1002 for host cloudvirt1063.eqiad.wmnet with OS bookworm completed: - cloudvirt1063 (... [17:31:52] 06cloud-services-team, 10Toolforge: Add support for replacing a running scheduled job when an overlapping schedule fires (`concurrencyPolicy: Replace`) - https://phabricator.wikimedia.org/T377781#10300807 (10dcaro) For the concurrency configuration, I'm thinking something a bit more explicit than replace, usua... [17:33:04] 06cloud-services-team, 10Toolforge: Add support for replacing a running scheduled job when an overlapping schedule fires (`concurrencyPolicy: Replace`) - https://phabricator.wikimedia.org/T377781#10300810 (10JJMC89) >>! In T377781#10295383, @aborrero wrote: > I would set this unconditionally, document the beha... 
[17:34:58] 06cloud-services-team, 10Toolforge (Toolforge iteration 16): [jobs-api,jobs-cli] Introduce health checks for Toolforge Jobs Framework cronjobs - https://phabricator.wikimedia.org/T377420#10300822 (10dcaro) [17:35:17] 10Tools: chie-bot: Jobs hang on toolforge - https://phabricator.wikimedia.org/T379132#10300825 (10JJMC89) [17:35:18] 06cloud-services-team, 10Toolforge: [jobs-api,jobs-cli] Add support for replacing a running scheduled job when an overlapping schedule fires (`concurrencyPolicy: Replace`) - https://phabricator.wikimedia.org/T377781#10300824 (10dcaro) [17:37:11] 10Toolforge (Toolforge iteration 16), 13Patch-For-Review: [jobs-api,jobs-cli] restarting a continuous jobs causes for some seconds two jobs are running side by side - https://phabricator.wikimedia.org/T375366#10300829 (10dcaro) [17:40:59] 10Tool-gitlab-account-approval: Approval job can get stuck and prevent subsequent jobs from firing - https://phabricator.wikimedia.org/T379130#10300838 (10bd808) p:05High→03Medium > dcaro triaged this task as High priority. Setting this to the normal "medium" priority. Note this task is specific to the #too... [17:41:57] 10Tool-gitlab-account-approval: Approval job can get stuck and prevent subsequent jobs from firing - https://phabricator.wikimedia.org/T379130#10300874 (10dcaro) > Setting this to the normal "medium" priority. Note this task is specific to the Tool-gitlab-account-approval tool and not a general "make things bett... 
[17:42:22] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance [17:42:28] !log fnegri@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=0) [17:43:09] (03update) 10dcaro: [toolforge-deploy] deploy maintain-harbor [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/563 (https://phabricator.wikimedia.org/T358225) (owner: 10raymond-ndibe) [17:44:06] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [17:49:06] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [17:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [17:51:58] RECOVERY - ensure kvm processes are running on cloudvirt1063 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [17:56:58] PROBLEM - ensure kvm processes are running on cloudvirt1063 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting 
[18:00:09] !log fnegri@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1063'] [18:00:31] !log fnegri@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1063'] [18:00:58] RECOVERY - ensure kvm processes are running on cloudvirt1063 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:15:00] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: 2024-09-21 NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375223#10301047 (10fnegri) The host is reimaged and repooled! We noticed a kernel message that is not present in other cloudvirts: ` root@cloudvirt1063:~# journalctl -k -p err Nov 07... [18:26:41] (03open) 10dcaro: openapi: add external url setting [repos/cloud/toolforge/api-gateway] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/52 [18:44:29] 06cloud-services-team, 10Toolforge: [builds-builder] Cache .m2 folder (local maven repository) between builds - https://phabricator.wikimedia.org/T350307#10301227 (10Don-vip) Hi @Slst2020! I'm sorry I didn't see your last reply. The Maven part of the build currently takes less than a minute, and the whole buil... [18:58:27] 10Cloud-VPS, 10Wikispore: Vanity domain for Wikispore - https://phabricator.wikimedia.org/T368236#10301280 (10Tgr) We did want a wikispore vanity domain. T365641 was about a different server though, which runs [[https://www.dsantini.it/owmf.pdf|OWMF]]. 
[19:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [19:43:45] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [19:48:45] RESOLVED: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [20:15:54] 10wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10301557 (10Eevans) >>! In T376267#10297262, @bd808 wrote: >>>! In T376267#10295702, @roti_WMDE wrote: >>>>! In T376267#10293251, @Ladsgroup wrote: >>> Hi, can you try the 2fa value for your... 
[20:43:53] 10Cloud-VPS (Quota-requests): Increase Object Storage quota for Tools project - https://phabricator.wikimedia.org/T379271#10301715 (10Raymond_Ndibe) [20:44:52] 10Cloud-VPS (Quota-requests): Increase Object Storage quota for Toolsbeta project - https://phabricator.wikimedia.org/T379270#10301716 (10Raymond_Ndibe) [20:49:06] 10Cloud-VPS (Quota-requests): Increase Object Storage quota for Toolsbeta project - https://phabricator.wikimedia.org/T379270#10301785 (10Raymond_Ndibe) ` radosgw-admin quota set --quota-scope=user --uid=toolsbeta\$toolsbeta --max-size=50G --max-objects=51107 ` [20:51:00] 10Cloud-VPS (Quota-requests): Increase Object Storage quota for Toolsbeta project - https://phabricator.wikimedia.org/T379270#10301787 (10Raymond_Ndibe) **Before:** ` root@cloudcontrol1005:/home/raymond-ndibe# sudo radosgw-admin user info --uid toolsbeta\$toolsbeta { "user_id": "toolsbeta$toolsbeta", "di... [20:51:53] 10Cloud-VPS (Quota-requests): Increase Object Storage quota for Tools project - https://phabricator.wikimedia.org/T379271#10301788 (10Raymond_Ndibe) ` radosgw-admin quota set --quota-scope=user --uid=tools\$tools --max-size=50G --max-objects=51107 ` [20:54:05] 10Cloud-VPS (Quota-requests): Increase Object Storage quota for Tools project - https://phabricator.wikimedia.org/T379271#10301815 (10Raymond_Ndibe) **Before:** ` root@cloudcontrol1005:/home/raymond-ndibe# sudo radosgw-admin user info --uid tools\$tools { "user_id": "tools$tools", "display_name": "tools"... 
[20:54:46] 10Cloud-VPS (Quota-requests): Increase Object Storage quota for Tools project - https://phabricator.wikimedia.org/T379271#10301817 (10Raymond_Ndibe) 05Open→03Resolved [20:54:48] 10Cloud-VPS (Quota-requests): Increase Object Storage quota for Toolsbeta project - https://phabricator.wikimedia.org/T379270#10301818 (10Raymond_Ndibe) 05Open→03Resolved [21:00:33] 10Tool-video-answer-tool, 06Future-Audiences: Change default audio output to 1.25x - https://phabricator.wikimedia.org/T379307 (10Maryana) 03NEW [21:00:44] 10Tool-video-answer-tool, 06Future-Audiences: Change default audio output to 1.25x - https://phabricator.wikimedia.org/T379307#10301846 (10Maryana) [21:00:46] 10Tool-video-answer-tool, 06Future-Audiences, 07Epic: [Epic] Video tool refinements - https://phabricator.wikimedia.org/T377392#10301847 (10Maryana) [21:02:38] 10Tool-video-answer-tool, 06Future-Audiences: Create static image output option - https://phabricator.wikimedia.org/T379308 (10Maryana) 03NEW [21:02:49] 10Tool-video-answer-tool, 06Future-Audiences: Create static image output option - https://phabricator.wikimedia.org/T379308#10301864 (10Maryana) [21:02:50] 10Tool-video-answer-tool, 06Future-Audiences, 07Epic: [Epic] Video tool refinements - https://phabricator.wikimedia.org/T377392#10301865 (10Maryana) [21:18:13] FIRING: PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-harbor-1 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:28:58] 10wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10302000 (10bd808) >>! In T376267#10301557, @Eevans wrote: >>>! In T376267#10297262, @bd808 wrote: >> Wikitech itself no longer has 2FA installed as it was prior to October 1, 2024. If you ar... 
[23:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [23:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks