[01:19:56] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:46:41] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[01:46:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[01:46:53] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[01:46:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[02:17:23] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[02:17:25] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[02:17:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[02:17:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[02:23:30] cloud-services-team, Cloud-VPS, Data-Services, Goal: Update all trove VMs to a modern guest image - https://phabricator.wikimedia.org/T369723#9988997 (MusikAnimal) Hi! If possible, I'm requesting a date and approximate time be chosen for when you'd like to do this for XTools. The db isn't big or...
[04:05:52] cloud-services-team, Cloud-VPS, Data-Services, Goal: Update all trove VMs to a modern guest image - https://phabricator.wikimedia.org/T369723#9989025 (Audiodude) It would be best for mwoffliner (which runs the WP 1.0 Bot) if the maintenance wasn't between 0:00 - 4:00 UTC, because that's when the...
[04:42:23] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[04:43:16] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[04:43:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[04:43:28] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[04:43:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[04:46:21] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29687 bytes in 7.797 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[05:13:45] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[05:13:47] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[05:13:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[05:13:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[05:19:32] Tool-yearinreview, Indic MediaWiki Developers UG, Indic-TechCom: add support for punjabi in yearinreview tool - https://phabricator.wikimedia.org/T369465#9989064 (Soda) +1 to this, a Wikisource yearinreview will be a good idea
[05:19:56] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:38:22] cloud-services-team, Cloud-VPS (Quota-requests): Request to increase catalyst project: cores and memory (2024-07-16) - https://phabricator.wikimedia.org/T370195#9989087 (Slst2020) Open→In progress a:Slst2020
[05:40:26] !log
sstefanova@cloudcumin1001 catalyst START - Cookbook wmcs.openstack.quota_increase
[05:40:34] !log sstefanova@cloudcumin1001 catalyst END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0)
[05:42:21] cloud-services-team, Cloud-VPS (Quota-requests): Request to increase catalyst project: cores and memory (2024-07-16) - https://phabricator.wikimedia.org/T370195#9989090 (Slst2020) Done! ` sstefanova@cloudcontrol1005:~$ sudo wmcs-openstack quota show catalyst +-----------------------+-------+ | Resource...
[05:42:27] cloud-services-team, Cloud-VPS (Quota-requests): Request to increase catalyst project: cores and memory (2024-07-16) - https://phabricator.wikimedia.org/T370195#9989091 (Slst2020) In progress→Resolved
[07:17:28] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[07:19:24] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29689 bytes in 4.591 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[07:39:31] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[07:39:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:39:43] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[07:39:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:55:37] cloud-services-team, Toolforge: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.26 - https://phabricator.wikimedia.org/T327025#9989184 (Slst2020)
[08:05:31] cloud-services-team, Toolforge: [lima-kilo, k8s] Upgrade Kubernetes in lima-kilo to version 1.26 - https://phabricator.wikimedia.org/T370244 (Slst2020) NEW
[08:06:32] cloud-services-team, Toolforge: [lima-kilo, k8s] Upgrade Kubernetes in lima-kilo to
version 1.26 - https://phabricator.wikimedia.org/T370244#9989205 (Slst2020)
[08:06:33] cloud-services-team, Toolforge: toolforge: upgrade all Kubernetes components to versions supporting Kubernetes 1.26 - https://phabricator.wikimedia.org/T370046#9989206 (Slst2020)
[08:09:59] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[08:10:01] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[08:10:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:10:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:12:46] cloud-services-team, Toolforge: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.26 - https://phabricator.wikimedia.org/T327025#9989211 (Slst2020)
[08:13:27] (update) aborrero: ingress-nginx: scale up deployment [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/422 (https://phabricator.wikimedia.org/T370162)
[08:14:48] !log aborrero@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx
[08:14:59] !log aborrero@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx
[08:18:32] (update) aborrero: ingress-nginx: scale up deployment [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/422 (https://phabricator.wikimedia.org/T370162)
[08:19:24] cloud-services-team, Toolforge: [infra,k8s] review kubelet flags before 1.26 upgrade - https://phabricator.wikimedia.org/T370245 (Slst2020) NEW
[08:20:26] !log aborrero@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx
[08:20:36] !log aborrero@cloudcumin1001 toolsbeta END (PASS) - Cookbook
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx
[08:21:30] cloud-services-team, Toolforge: [infra,k8s] prepare deb packages for k8s 1.26 - https://phabricator.wikimedia.org/T370246 (Slst2020) NEW
[08:21:56] FIRING: SystemdUnitDown: The service unit kiwix-mirror-update.service is in failed status on host clouddumps1002. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1002 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[08:22:44] (update) aborrero: ingress-nginx: scale up deployment [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/422 (https://phabricator.wikimedia.org/T370162)
[08:22:56] !log aborrero@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx
[08:23:06] !log aborrero@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx
[08:26:07] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx
[08:26:18] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx
[08:27:35] (update) aborrero: ingress-nginx: scale up deployment [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/422 (https://phabricator.wikimedia.org/T370162)
[08:28:04] cloud-services-team, Toolforge: [infra,k8s] review k8s API usage by custom components for 1.26 upgrade - https://phabricator.wikimedia.org/T370247 (Slst2020) NEW
[08:29:58] cloud-services-team, Toolforge: [infra,k8s] Upgrade Toolsbeta to k8s 1.26 - https://phabricator.wikimedia.org/T370248 (Slst2020) NEW
[08:30:40]
cloud-services-team, Toolforge: [infra,k8s] Upgrade Toolsbeta to k8s 1.26 - https://phabricator.wikimedia.org/T370248#9989274 (Slst2020)
[08:31:39] (update) aborrero: ingress-nginx: scale up deployment [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/422 (https://phabricator.wikimedia.org/T370162)
[08:32:38] (merge) aborrero: ingress-nginx: scale up deployment [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/422 (https://phabricator.wikimedia.org/T370162)
[08:34:37] cloud-services-team, Toolforge: [infra,k8s] Upgrade Tools to k8s version 1.26 - https://phabricator.wikimedia.org/T370249 (Slst2020) NEW
[08:37:30] cloud-services-team, Toolforge, Patch-For-Review: toolforge: ingress-nginx pods get OOMkilled, consider scaling up - https://phabricator.wikimedia.org/T370162#9989279 (aborrero) Open→Resolved a:aborrero
[08:48:56] (update) dcaro: run_functional_tests: enable running as a different user [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/426
[08:52:41] (update) sstefanova: jobs-api: bump to 0.0.319-20240716153429-ac8e3c99 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/423 (https://phabricator.wikimedia.org/T363346 https://phabricator.wikimedia.org/T367181) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[08:54:18] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[08:54:22] !log sstefanova@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-api
[08:54:47] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for
component jobs-api
[08:54:57] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[09:02:43] (update) sstefanova: jobs-api: bump to 0.0.319-20240716153429-ac8e3c99 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/423 (https://phabricator.wikimedia.org/T363346 https://phabricator.wikimedia.org/T367181) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[09:02:59] Horizon: Horizon does not commit changes to cloud/instance-puppet git repo since June 24th 2024 - https://phabricator.wikimedia.org/T370136#9989327 (dcaro) Open→Resolved a:dcaro I think @Andrew has fixed that already (thanks!), I see the changes already applied
[09:04:41] (open) dcaro: run_functional_tests: ensure only one is running [repos/cloud/toolforge/toolforge-deploy] (allow_running_as_non_tool) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/427
[09:05:30] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[09:05:36] (update) dcaro: run_functional_tests: ensure only one is running [repos/cloud/toolforge/toolforge-deploy] (allow_running_as_non_tool) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/427
[09:06:59] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[09:07:10] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[09:08:24] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29641 bytes in 3.698 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[09:11:58]
Toolforge (Toolforge iteration 12), Patch-For-Review: [jobs-api,builds-api,envvars-api] consolidate api paths - https://phabricator.wikimedia.org/T365014#9989354 (Slst2020)
[09:12:52] (update) sstefanova: remove /api prefix [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/50
[09:14:38] (update) sstefanova: jobs-api: bump to 0.0.319-20240716153429-ac8e3c99 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/423 (https://phabricator.wikimedia.org/T363346 https://phabricator.wikimedia.org/T367181) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[09:14:45] (approved) sstefanova: jobs-api: bump to 0.0.319-20240716153429-ac8e3c99 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/423 (https://phabricator.wikimedia.org/T363346 https://phabricator.wikimedia.org/T367181) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[09:14:52] (merge) sstefanova: jobs-api: bump to 0.0.319-20240716153429-ac8e3c99 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/423 (https://phabricator.wikimedia.org/T363346 https://phabricator.wikimedia.org/T367181) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[09:19:56] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:22:32] (update) sstefanova: run_functional_tests: enable running as a different user [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/426
(owner: dcaro)
[09:26:56] RESOLVED: SystemdUnitDown: The service unit kiwix-mirror-update.service is in failed status on host clouddumps1002. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1002 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:31:06] (update) sstefanova: run_functional_tests: enable running as a different user [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/426 (owner: dcaro)
[09:31:09] (approved) sstefanova: run_functional_tests: enable running as a different user [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/426 (owner: dcaro)
[09:31:31] (update) sstefanova: run_functional_tests: ensure only one is running [repos/cloud/toolforge/toolforge-deploy] (allow_running_as_non_tool) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/427 (owner: dcaro)
[09:33:26] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Unplanned: [infra,k8s] helm packages are not available on new k8s repos - https://phabricator.wikimedia.org/T370252 (dcaro) NEW p:Triage→High
[09:33:39] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 12), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Unplanned: [infra,k8s] helm packages are not available on new k8s repos - https://phabricator.wikimedia.org/T370252#9989396 (dcaro)
[09:34:41] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 12), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Unplanned: [infra,k8s] helm packages are not available on new k8s repos - https://phabricator.wikimedia.org/T370252#9989398 (dcaro) @aborrero you might be able to
help...
[09:41:04] (update) sstefanova: run_functional_tests: ensure only one is running [repos/cloud/toolforge/toolforge-deploy] (allow_running_as_non_tool) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/427 (owner: dcaro)
[09:41:14] (approved) sstefanova: run_functional_tests: ensure only one is running [repos/cloud/toolforge/toolforge-deploy] (allow_running_as_non_tool) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/427 (owner: dcaro)
[09:42:28] (merge) dcaro: run_functional_tests: enable running as a different user [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/426
[09:42:30] (update) dcaro: run_functional_tests: ensure only one is running [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/427
[09:43:09] (approved) dcaro: run_functional_tests: ensure only one is running [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/427
[09:45:15] (merge) dcaro: run_functional_tests: ensure only one is running [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/427
[09:47:57] (update) sstefanova: pre-commit: add openapi version bump check [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/105 (https://phabricator.wikimedia.org/T356972) (owner: dcaro)
[09:48:08] (approved) sstefanova: pre-commit: add openapi version bump check [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/105 (https://phabricator.wikimedia.org/T356972) (owner: dcaro)
[09:48:48] (update) sstefanova: api: rename api resources to plural
[repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/40 (https://phabricator.wikimedia.org/T365014)
[09:49:20] (update) sstefanova: [builds-cli] bug fix [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/79 (owner: raymond-ndibe)
[10:00:21] (CR) David Caro: [C:+2] ceph: fix off-by-one index when draining/undraining in chunks [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1053646 (owner: David Caro)
[10:00:23] (CR) David Caro: [C:+2] depool_and_destroy: also zap the devices [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1054376 (owner: David Caro)
[10:00:25] (CR) David Caro: [C:+2] bootstrap_and_add: skip host if no new devices found [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1054494 (owner: David Caro)
[10:00:28] (CR) David Caro: [C:+2] ceph.checks: add extra logs for easy following [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1054508 (owner: David Caro)
[10:00:31] (CR) David Caro: [C:+2] bootstrap_and_add: Use correct device path [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1054509 (owner: David Caro)
[10:03:22] (merge) dcaro: pre-commit: add openapi version bump check [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/105 (https://phabricator.wikimedia.org/T356972)
[10:03:29] (Merged) jenkins-bot: ceph: fix off-by-one index when draining/undraining in chunks [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1053646 (owner: David Caro)
[10:03:29] (Merged) jenkins-bot: depool_and_destroy: also zap the devices [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1054376 (owner: David Caro)
[10:03:30] (Merged) jenkins-bot: bootstrap_and_add: skip host if no new devices found [cloud/wmcs-cookbooks] -
https://gerrit.wikimedia.org/r/1054494 (owner: David Caro)
[10:03:53] (Merged) jenkins-bot: ceph.checks: add extra logs for easy following [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1054508 (owner: David Caro)
[10:03:53] (Merged) jenkins-bot: bootstrap_and_add: Use correct device path [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1054509 (owner: David Caro)
[10:08:10] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: jobs-api: bump to 0.0.320-20240717100338-661aeaaf [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/428 (https://phabricator.wikimedia.org/T356972)
[10:12:38] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 12), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Unplanned, Patch-For-Review: [infra,k8s] helm packages are not available on new k8s repos - https://phabricator.wikimedia.org/T370252#9989480 (dcaro) Just reuse...
[10:13:28] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
[10:13:36] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
[10:13:38] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 12), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Unplanned, Patch-For-Review: [infra,k8s] helm packages are not available on new k8s repos - https://phabricator.wikimedia.org/T370252#9989482 (dcaro) Open→...
[10:13:40] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
[10:13:51] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
[10:22:17] (approved) dcaro: builds-api: bump to 0.0.164-20240716153428-d1c47de5 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/424 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:22:21] (update) dcaro: builds-api: bump to 0.0.164-20240716153428-d1c47de5 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/424 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:22:46] (merge) dcaro: builds-api: bump to 0.0.164-20240716153428-d1c47de5 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/424 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:23:08] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
[10:23:17] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
[10:24:55] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
[10:25:06] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
[10:26:03] (update) dcaro: envvars-api: bump to 0.0.55-20240716163331-6f3efd6d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/425 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:30:28] (approved) dcaro: envvars-api: bump to 0.0.55-20240716163331-6f3efd6d
[repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/425 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:30:30] (update) dcaro: envvars-api: bump to 0.0.55-20240716163331-6f3efd6d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/425 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:31:09] (update) dcaro: jobs-api: bump to 0.0.320-20240717100338-661aeaaf [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/428 (https://phabricator.wikimedia.org/T356972) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:32:50] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[10:33:01] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[10:34:23] Toolforge (Toolforge iteration 12), Patch-For-Review: [builds-api,envvars-api,jobs-api] bump the version in the openapi definition when bumping the package version - https://phabricator.wikimedia.org/T356972#9989509 (dcaro) In progress→Resolved
[10:36:28] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[10:36:30] (merge) dcaro: envvars-api: bump to 0.0.55-20240716163331-6f3efd6d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/425 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:36:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:36:40] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[10:36:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:44:11]
!log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[10:44:22] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[10:48:17] Toolforge (Toolforge iteration 12): [jobs-api] Remove authentication and use the api-gateway provided headers - https://phabricator.wikimedia.org/T367180#9989546 (dcaro) Open→In progress
[10:48:30] Toolforge (Toolforge iteration 12): [builds-api] Remove authentication and use the api-gateway provided headers - https://phabricator.wikimedia.org/T367182#9989549 (dcaro) Open→In progress
[10:54:17] (approved) dcaro: jobs-api: bump to 0.0.320-20240717100338-661aeaaf [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/428 (https://phabricator.wikimedia.org/T356972) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:54:19] (update) dcaro: jobs-api: bump to 0.0.320-20240717100338-661aeaaf [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/428 (https://phabricator.wikimedia.org/T356972) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:54:27] (update) dcaro: jobs-api: bump to 0.0.320-20240717100338-661aeaaf [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/428 (https://phabricator.wikimedia.org/T356972) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:54:50] (merge) dcaro: jobs-api: bump to 0.0.320-20240717100338-661aeaaf [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/428 (https://phabricator.wikimedia.org/T356972) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[11:01:36] PROBLEM - Wikitech-static main page has content
on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[11:02:30] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29641 bytes in 4.478 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[11:03:37] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "maps-experiments" project Buster deprecation - https://phabricator.wikimedia.org/T367539#9989573 (Jgiannelos) This project is not used any more. Lets delete it.
[11:06:44] (open) dcaro: run_functional_tests: embed the version checking [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/429
[11:07:16] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[11:07:18] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[11:07:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:07:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:12:48] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
[11:12:52] !log dcaro@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-builder
[11:13:06] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
[11:13:17] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
[11:19:20] (open) dcaro: toolforge_get_versions: moved to the toolforge-deploy repo [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/171
[11:22:39] Toolforge (Toolforge iteration 13), Patch-For-Review:
`webservice` requires effective user to be the tool user and listed in NSS passwd data - https://phabricator.wikimedia.org/T369569#9989633 (dcaro)
[11:22:42] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 13), Goal: [infra] Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#9989641 (dcaro)
[11:24:02] cloud-services-team, Toolforge (Toolforge iteration 13): toolforge: Refresh certs that are not controlled by kubeadm (mid 2024 edition) - https://phabricator.wikimedia.org/T309782#9989637 (dcaro)
[11:24:31] cloud-services-team, Toolforge (Toolforge iteration 13), Patch-For-Review: Toolforge: Replace all bastion with grid-less bookworm based bastion hosts - https://phabricator.wikimedia.org/T314665#9989639 (dcaro)
[11:24:47] Toolforge (Toolforge iteration 13), Documentation: [harbor,docs] Improve Harbor quota handling and docs - https://phabricator.wikimedia.org/T351092#9989643 (dcaro)
[11:24:56] Toolforge (Toolforge iteration 13): [toolforge] simplify calling the different toolforge apis from within the containers - https://phabricator.wikimedia.org/T356377#9989649 (dcaro)
[11:25:06] Toolforge (Toolforge iteration 13), Patch-For-Review, Upstream: [maintain-harbor] Manage project quotas via maintain-harbor - https://phabricator.wikimedia.org/T352417#9989651 (dcaro)
[11:25:09] Toolforge (Toolforge iteration 13), Patch-For-Review: [jobs-api, jobs-cli] Prefix all endpoints with `/tool/` - https://phabricator.wikimedia.org/T363346#9989635 (dcaro)
[11:25:12] Toolforge (Toolforge iteration 13), Upstream: [builds-builder] golang based images get infinite nested loops for procfile entries - https://phabricator.wikimedia.org/T363417#9989647 (dcaro)
[11:25:26] (update) dcaro: Draft: ingress-nginx: deploy without fourohfour locally [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/396
[11:25:35] Toolforge (Toolforge iteration 13), Upstream: [builds-builder,jobs-api,upstream] Calling nontrivial Procfile commands with arguments results in confusing error (“no such file or directory”) - https://phabricator.wikimedia.org/T356016#9989645 (dcaro)
[11:26:17] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 13), Epic: [Hypothesis] WE6.3.1 Consulting Toolforge roots/maintainers - https://phabricator.wikimedia.org/T368601#9989660 (dcaro)
[11:26:20] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 13), Epic: [Hypothesis] WE6.3.2 Create "standard" tool (Sample Complex Tool, SCT) to measure the number of steps for a deployment - https://phabricator.wikimedia.org/T368602#9989658 (dcaro)
[11:26:21] Toolforge (Toolforge iteration 13), Patch-For-Review: [builds-api,jobs-api,envvars-api,api-gateway] Figure out and document how to do non-backwards compatible changes - https://phabricator.wikimedia.org/T356974#9989653 (dcaro)
[11:26:27] Toolforge (Toolforge iteration 13): [builds-cli,builds-api] `build quota` fails if tool has no builds - https://phabricator.wikimedia.org/T353701#9989655 (dcaro)
[11:26:29] cloud-services-team, Toolforge (Toolforge iteration 13): [infra,k8s,monitoring] Add an alert to warn when the prometheus k8s cert is about to expire - https://phabricator.wikimedia.org/T366579#9989664 (dcaro)
[11:26:51] Toolforge (Toolforge iteration 13), Patch-For-Review: [envvars-api] Remove authentication and use api-gateway provided headers - https://phabricator.wikimedia.org/T367181#9989662 (dcaro)
[11:27:12] Toolforge (Toolforge iteration 13), Patch-For-Review: [jobs-api] Split the API, business, and k8s models - https://phabricator.wikimedia.org/T359808#9989672 (dcaro)
[11:27:25] Toolforge (Toolforge iteration 13), Patch-For-Review: [toolforge] Investigate authentication - https://phabricator.wikimedia.org/T363983#9989670 (dcaro)
[11:27:54] Toolforge
(Toolforge iteration 13): [toolforge-cli,jobs-cli,builds-cli,envvars-cli] Explore OpenAPI SDK tooling for client consolidation - https://phabricator.wikimedia.org/T356261#9989668 (dcaro)
[11:27:56] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 13): Intermittent redis connection timeouts in Toolforge - https://phabricator.wikimedia.org/T318479#9989666 (dcaro)
[11:28:41] Toolforge (Toolforge iteration 13): [jobs-api] Remove authentication and use the api-gateway provided headers - https://phabricator.wikimedia.org/T367180#9989676 (dcaro)
[11:28:43] Toolforge (Toolforge iteration 13): [builds-api] Remove authentication and use the api-gateway provided headers - https://phabricator.wikimedia.org/T367182#9989674 (dcaro)
[11:29:18] (update) dcaro: Draft: ingress-nginx: deploy without fourohfour locally [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/396
[11:29:37] Toolforge (Toolforge iteration 13): [api-gateway] Move authentication from the APIs - https://phabricator.wikimedia.org/T367179#9989680 (dcaro)
[11:29:40] Toolforge (Toolforge iteration 13): envvars-api 0.0.50 depends on unreleased envvars-cli changes - https://phabricator.wikimedia.org/T367961#9989682 (dcaro)
[11:29:55] (update) dcaro: ingress-nginx: deploy without fourohfour locally [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/396
[11:30:17] Toolforge (Toolforge iteration 13), Patch-For-Review: [k8s] Add node anti-affinity topologySpreadConstraints to infrastructure components where relevant - https://phabricator.wikimedia.org/T358203#9989678 (dcaro)
[11:30:47] Toolforge (Toolforge iteration 13), Patch-For-Review: [jobs-api] move jobs load feature to the backend - https://phabricator.wikimedia.org/T366209#9989686 (dcaro)
[11:30:53] (update) dcaro: ingress-nginx: deploy
without fourohfour locally [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/396
[11:31:02] Toolforge (Toolforge iteration 13): Toolforge Aptfile not producing working copy of `ffmpeg` - https://phabricator.wikimedia.org/T365633#9989688 (dcaro)
[11:31:23] Toolforge (Toolforge iteration 13): [cli] the generic cli swallows the `--` from other commands - https://phabricator.wikimedia.org/T370184#9989696 (dcaro)
[11:31:28] Toolforge (Toolforge iteration 13): [sct.frontend] Create skeleton Vue.js application skeleton - https://phabricator.wikimedia.org/T370178#9989697 (dcaro)
[11:31:31] Toolforge (Toolforge iteration 13): [sct.backend] Create skeleton fastapi API - https://phabricator.wikimedia.org/T370176#9989698 (dcaro)
[11:31:34] Toolforge (Toolforge iteration 13): [jobs-cli] enforce proper validation for load jobs before calculate_changes - https://phabricator.wikimedia.org/T366211#9989692 (dcaro)
[11:31:37] Toolforge (Toolforge iteration 13): [toolforge deploy] direct-api tests fail intermittently on toolsbeta - https://phabricator.wikimedia.org/T369891#9989699 (dcaro)
[11:31:41] Toolforge (Toolforge iteration 13): [toolforge, toolforge-cli] Experiment with PyInstaller to package CLI tools for buildpack images - https://phabricator.wikimedia.org/T369693#9989700 (dcaro)
[11:31:42] cloud-services-team, Toolforge (Toolforge iteration 13): toolforge: integrate fourohfour as a custom component, rather than a normal tool - https://phabricator.wikimedia.org/T369364#9989701 (dcaro)
[11:31:43] Toolforge (Toolforge iteration 13), Patch-For-Review: [jobs-api,builds-api,envvars-api] consolidate api paths - https://phabricator.wikimedia.org/T365014#9989684 (dcaro)
[11:31:45] cloud-services-team, Toolforge (Toolforge iteration 13): toolforge: get a working setup for ingress-nginx and webservices in lima-kilo -
https://phabricator.wikimedia.org/T369363#9989702 (dcaro)
[11:31:49] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS (Debian Buster Deprecation), Toolforge (Toolforge iteration 13), Epic, Goal: Toolforge: migrate to Debian Bullseye or later - https://phabricator.wikimedia.org/T311897#9989704 (dcaro)
[11:31:52] cloud-services-team, Toolforge (Toolforge iteration 13): toolforge: kubernetes can't revoke certificates - https://phabricator.wikimedia.org/T365681#9989705 (dcaro)
[11:31:55] Toolforge (Toolforge iteration 13), Epic: [jobs-cli,builds-cli,toolforge-cli,webservice] Consolidate the Toolforge CLIs - https://phabricator.wikimedia.org/T356262#9989707 (dcaro)
[11:31:58] cloud-services-team, Toolforge (Toolforge iteration 13), Patch-For-Review: [toolforge,storage] Provide per-tool access to cloud-vps object storage - https://phabricator.wikimedia.org/T358496#9989706 (dcaro)
[11:32:02] cloud-services-team, Toolforge (Toolforge iteration 13): [api-gateway] add alert for uptime - https://phabricator.wikimedia.org/T348633#9989708 (dcaro)
[11:32:06] Toolforge (Toolforge iteration 13), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project: [maintain-harbor,docs] Document current setup and admin procedures - https://phabricator.wikimedia.org/T329176#9989709 (dcaro)
[11:32:10] cloud-services-team, Toolforge (Toolforge iteration 13): Upgrade Toolforge (Elastic|Open)Search cluster to Debian Bullseye - https://phabricator.wikimedia.org/T311905#9989710 (dcaro)
[11:33:05] Toolforge (Toolforge iteration 13), Patch-For-Review: [jobs-api] Save business models in a DB - https://phabricator.wikimedia.org/T359650#9989690 (dcaro)
[11:34:13] (open) dcaro: ingress: add the ingress component [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/172
[11:35:04] cloud-services-team, Toolforge (Toolforge iteration 13): toolforge: get a working setup for
ingress-nginx and webservices in lima-kilo - https://phabricator.wikimedia.org/T369363#9989722 (dcaro) a:dcaro
[11:37:28] cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance: Move coludcephmon1001 from B7 to rack F4 - https://phabricator.wikimedia.org/T330733#9989729 (dcaro) Open→Resolved a:dcaro This will be done as a refresh of the host instead {T364870}
[11:52:26] (merge) sstefanova: remove /api prefix [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/50
[12:05:07] (open) sstefanova: d/changelog: bump to 16.0.14 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/52
[12:21:23] (approved) aborrero: ingress-nginx: deploy without fourohfour locally [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/396 (owner: dcaro)
[12:22:48] (merge) sstefanova: d/changelog: bump to 16.0.14 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/52
[12:22:52] (approved) aborrero: run_functional_tests: embed the version checking [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/429 (owner: dcaro)
[12:26:24] (approved) sstefanova: ingress: add the ingress component [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/172 (owner: dcaro)
[12:33:36] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[12:36:23] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev
[repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[12:37:38] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[12:39:32] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29641 bytes in 3.825 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[12:45:39] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[12:50:38] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[12:51:16] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[12:55:41] (merge) dcaro: run_functional_tests: embed the version checking [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/429
[12:55:45] Tool-Global-user-contributions, Stewards-and-global-tools, Epic, Temporary accounts (Create/update essential tools/anti-abuse management): [Epic] Implement global user contributions feature - https://phabricator.wikimedia.org/T337089#9989917 (Niharika) @Bugreporter thanks for explaining that. @T...
[12:55:50] (update) dcaro: ingress-nginx: deploy without fourohfour locally [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/396
[12:56:18] (merge) dcaro: ingress-nginx: deploy without fourohfour locally [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/396
[12:56:30] (merge) dcaro: ingress: add the ingress component [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/172
[12:57:32] cloud-services-team, Toolforge (Toolforge iteration 13): toolforge: get a working setup for ingress-nginx and webservices in lima-kilo - https://phabricator.wikimedia.org/T369363#9989920 (dcaro) We have a simple setup working now, will close this, though the move of fourohfour as a component is still som...
[12:58:35] cloud-services-team, Toolforge (Toolforge iteration 13): toolforge: get a working setup for ingress-nginx and webservices in lima-kilo - https://phabricator.wikimedia.org/T369363#9989922 (dcaro) Open→Resolved
[13:10:13] Toolforge (Toolforge iteration 13), Patch-For-Review: [envvars-api] Remove authentication and use api-gateway provided headers - https://phabricator.wikimedia.org/T367181#9989953 (dcaro) In progress→Resolved
[13:11:15] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[13:13:29] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[13:15:43] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev
[repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[13:15:50] (update) sstefanova: d/changelog: bump to 16.0.14 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/52
[13:17:56] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[13:18:19] cloud-services-team, Cloud-VPS, Data-Services, Goal: Update all trove VMs to a modern guest image - https://phabricator.wikimedia.org/T369723#9989985 (Andrew)
[13:18:22] (open) dcaro: auth: use the header passed by the api gateway [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/106 (https://phabricator.wikimedia.org/T367180)
[13:18:51] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[13:19:56] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:21:07] Toolforge (Toolforge iteration 13), Patch-For-Review: [envvars-api] Remove authentication and use api-gateway provided headers - https://phabricator.wikimedia.org/T367181#9989990 (dcaro) Resolved→In progress
[13:21:40] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] -
https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[13:21:45] cloud-services-team, Cloud-VPS, Data-Services, Goal: Update all trove VMs to a modern guest image - https://phabricator.wikimedia.org/T369723#9989999 (Andrew) >>! In T369723#9988997, @MusikAnimal wrote: > Hi! If possible, I'm requesting a date and approximate time be chosen for when you'd like to...
[13:21:48] Toolforge (Toolforge iteration 13): [builds-api] Remove authentication and use the api-gateway provided headers - https://phabricator.wikimedia.org/T367182#9990001 (dcaro) This was done in https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/102, though the bug tag was in a commit a...
[13:22:09] cloud-services-team, Cloud-VPS, Data-Services, Goal: Update all trove VMs to a modern guest image - https://phabricator.wikimedia.org/T369723#9990008 (Andrew) >>! In T369723#9989025, @Audiodude wrote: > It would be best for mwoffliner (which runs the WP 1.0 Bot) if the maintenance wasn't between...
[13:23:30] Toolforge (Toolforge iteration 13): [api-gateway] Move authentication from the APIs - https://phabricator.wikimedia.org/T367179#9990003 (dcaro) Open→Resolved
[13:24:00] Toolforge (Toolforge iteration 13), Patch-For-Review: [jobs-api] Remove authentication and use the api-gateway provided headers - https://phabricator.wikimedia.org/T367180#9990006 (dcaro) In progress→Stalled
[13:27:14] Toolforge (Toolforge iteration 13), Patch-For-Review: [jobs-api,builds-api,envvars-api] consolidate api paths - https://phabricator.wikimedia.org/T365014#9990012 (Slst2020)
[13:32:50] (update) dcaro: auth: use the header passed by the api gateway [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/106 (https://phabricator.wikimedia.org/T367180)
[13:34:03] cloud-services-team, Epic, Patch-For-Review: Cloud VPS: consider extending tofu-infra coverage - https://phabricator.wikimedia.org/T370037#9990020 (aborrero) Open→In progress p:Triage→Medium a:aborrero
[13:34:53] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[13:34:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:35:05] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[13:35:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:36:59] (update) dcaro: auth: use the header passed by the api gateway [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/106 (https://phabricator.wikimedia.org/T367180)
[13:38:26] (update) dcaro: toolforge_get_versions: moved to the toolforge-deploy repo [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/171
[13:42:34] (update) sstefanova: toolforge_get_versions: moved to the toolforge-deploy repo
[repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/171 (owner: dcaro)
[13:43:28] (approved) sstefanova: toolforge_get_versions: moved to the toolforge-deploy repo [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/171 (owner: dcaro)
[13:43:32] (update) sstefanova: toolforge_get_versions: moved to the toolforge-deploy repo [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/171 (owner: dcaro)
[13:44:33] cloud-services-team, Toolforge, Cumin, Infrastructure-Foundations: Allow interacting with Toolforge PuppetDB from wmcs-cookbooks - https://phabricator.wikimedia.org/T362629#9990078 (fnegri) I just discovered that we do have dedicated Cumin instances for tools and toolsbeta, where `/etc/cumin/conf...
[13:55:10] (merge) dcaro: toolforge_get_versions: moved to the toolforge-deploy repo [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/171
[13:55:51] (open) sstefanova: api: make quota plural [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/107 (https://phabricator.wikimedia.org/T365014)
[13:56:37] (open) sstefanova: cli: use /quotas endpoint [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/53 (https://phabricator.wikimedia.org/T365014)
[13:59:07] (approved) dcaro: cli: use /quotas endpoint [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/53 (https://phabricator.wikimedia.org/T365014) (owner: sstefanova)
[13:59:25] (approved) dcaro: api: make quota plural [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/107
(https://phabricator.wikimedia.org/T365014) (owner: sstefanova)
[14:02:27] (update) sstefanova: api: make quota plural [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/107 (https://phabricator.wikimedia.org/T365014)
[14:05:29] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[14:05:32] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[14:05:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:05:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:07:17] (update) sstefanova: api: make quota plural [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/107 (https://phabricator.wikimedia.org/T365014)
[14:14:03] (update) sstefanova: api: make quota plural [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/107 (https://phabricator.wikimedia.org/T365014)
[14:22:22] cloud-services-team, Cloud-VPS, Cumin, Infrastructure-Foundations: Cumin: create external backend for WMCS Puppet API - https://phabricator.wikimedia.org/T179816#9990459 (fnegri) > to allow to query hosts also by their Puppet classes. A bit of a tangent, but I'm adding some notes of a few things...
[14:23:09] cloud-services-team, Cloud-VPS, Data-Services, Goal: Update all trove VMs to a modern guest image - https://phabricator.wikimedia.org/T369723#9990467 (Andrew)
[14:23:42] Toolforge (Toolforge iteration 13): [sct.backend] Create skeleton fastapi API - https://phabricator.wikimedia.org/T370176#9990473 (dcaro) a:dcaro
[14:24:24] Toolforge (Toolforge iteration 13): [sct.backend] Create skeleton fastapi API - https://phabricator.wikimedia.org/T370176#9990475 (dcaro) Open→In progress
[14:25:11] Toolforge (Toolforge iteration 13), Patch-For-Review: [jobs-api] Remove authentication and use the api-gateway provided headers - https://phabricator.wikimedia.org/T367180#9990471 (dcaro) Stalled→In progress
[14:29:46] Toolforge (Toolforge iteration 13): [sct.backend] Create skeleton fastapi API - https://phabricator.wikimedia.org/T370176#9990493 (Slst2020) if you are feeling lazy, you can probably steal a bunch of stuff from here, including a local dev environment using docker-compose: https://github.com/blancadesal/fasta...
[14:38:29] (update) sstefanova: cli: use /quotas endpoint [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/53 (https://phabricator.wikimedia.org/T365014)
[14:39:41] (merge) sstefanova: api: make quota plural [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/107 (https://phabricator.wikimedia.org/T365014)
[14:39:52] (merge) sstefanova: cli: use /quotas endpoint [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/53 (https://phabricator.wikimedia.org/T365014)
[14:42:41] (update) project_1317_bot_df3177307bed93c3f34e421e26c86e38: jobs-api: bump to 0.0.321-20240717143951-2c8a3296 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/430 (https://phabricator.wikimedia.org/T365014)
[14:42:45] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: jobs-api: bump to 0.0.321-20240717143951-2c8a3296 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/430 (https://phabricator.wikimedia.org/T365014)
[14:45:01] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[14:45:12] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[14:49:07] cloud-services-team, Cloud-VPS, Epic, Patch-For-Review: Cloud VPS: consider extending tofu-infra coverage - https://phabricator.wikimedia.org/T370037#9990562 (fnegri)
[14:50:36] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[14:50:47] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[14:56:55] (update) sstefanova: jobs-api: bump to 0.0.321-20240717143951-2c8a3296 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/430 (https://phabricator.wikimedia.org/T365014) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[14:57:07] (approved) sstefanova: jobs-api: bump to 0.0.321-20240717143951-2c8a3296 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/430 (https://phabricator.wikimedia.org/T365014) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[14:57:12] (merge) sstefanova: jobs-api: bump to 0.0.321-20240717143951-2c8a3296 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/430 (https://phabricator.wikimedia.org/T365014) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[15:02:13] Toolforge (Toolforge iteration 13): [sct.backend] Create skeleton fastapi API - https://phabricator.wikimedia.org/T370176#9990605 (dcaro) >>! In T370176#9990493, @Slst2020 wrote: > if you are feeling lazy, you can probably steal a bunch of stuff from here, including a local dev environment using docker-compo...
[15:14:56] FIRING: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:19:56] RESOLVED: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:25:34] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[15:29:30] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[15:30:05] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[15:31:03] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[15:31:57] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[15:39:08] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[15:52:06] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13
(https://phabricator.wikimedia.org/T370037)
[15:52:57] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[15:53:33] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[15:54:19] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[15:55:00] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[15:55:24] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[15:56:51] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[15:57:34] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[15:58:35] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] -
https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[15:59:37] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[15:59:49] Toolforge (Toolforge iteration 13): [sct.backend] Create skeleton fastapi API - https://phabricator.wikimedia.org/T370176#9990848 (dcaro) Copied a minimal setup (tox+pre-commit), starting small, too much stuff in there xd
[16:05:49] Toolforge (Toolforge iteration 13): [sct.backend] Create skeleton fastapi API - https://phabricator.wikimedia.org/T370176#9990896 (dcaro) Built and running in tools: ` tools.sample-complex-app@tools-bastion-13:~$ toolforge jobs run --command "curl http://backend-api:8000/" --image python3.11 --wait 120 check...
[16:06:10] Toolforge (Toolforge iteration 13): [sct.backend] Create skeleton fastapi API - https://phabricator.wikimedia.org/T370176#9990897 (dcaro) Hmpf... I'm using the wrong port!
xd [16:06:18] (03open) 10sstefanova: d/changelog: bump to 16.0.15 [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/54 (https://phabricator.wikimedia.org/T365014) [16:09:21] (03close) 10sstefanova: d/changelog: bump to 16.0.15 [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/54 (https://phabricator.wikimedia.org/T365014) [16:10:11] (03open) 10sstefanova: d/changelog: bump to 16.0.15 [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/55 (https://phabricator.wikimedia.org/T365014) [16:10:25] 10Toolforge (Toolforge iteration 13): [sct.backend] Create skeleton fastapi API - https://phabricator.wikimedia.org/T370176#9990924 (10dcaro) Fixed: ` tools.sample-complex-app@tools-bastion-13:~$ toolforge jobs run --command "curl http://backend-api:8080/" --image python3.11 --wait 120 check-api INFO: job 'check... 
[16:11:27] 10Toolforge (Toolforge iteration 13): [sct.backend] Create skeleton fastapi API - https://phabricator.wikimedia.org/T370176#9990926 (10dcaro) 05In progress→03Resolved [16:11:31] 10Toolforge (Toolforge iteration 13): [sct.frontend] Create skeleton Vue.js application skeleton - https://phabricator.wikimedia.org/T370178#9990928 (10dcaro) a:03dcaro [16:13:08] 10Toolforge (Toolforge iteration 13): [sct.frontend] Create skeleton Vue.js application skeleton - https://phabricator.wikimedia.org/T370178#9990930 (10dcaro) 05Open→03In progress [16:29:25] (03approved) 10sstefanova: d/changelog: bump to 16.0.15 [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/55 (https://phabricator.wikimedia.org/T365014) [16:29:29] (03merge) 10sstefanova: d/changelog: bump to 16.0.15 [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/55 (https://phabricator.wikimedia.org/T365014) [16:29:42] RESOLVED: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [16:31:59] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0) [16:32:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [16:32:11] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy [16:32:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [16:35:41] 10Cloud-VPS (Quota-requests): Request new flavor for integration project - https://phabricator.wikimedia.org/T370127#9991058 (10hashar) @Slst2020 thank you! 
[16:39:54] 10Toolforge (Toolforge iteration 13), 13Patch-For-Review: [jobs-api,builds-api,envvars-api] consolidate api paths - https://phabricator.wikimedia.org/T365014#9991088 (10Slst2020) [16:43:19] (03open) 10sstefanova: api: drop deprecated endpoints [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/108 (https://phabricator.wikimedia.org/T363346 https://phabricator.wikimedia.org/T365014) [16:47:07] (03PS1) 10Andrew Bogott: Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) [16:47:49] 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team: profile::labs::lvm::srv fails with Invalid argument for --extents: 100%FREE - https://phabricator.wikimedia.org/T370312 (10hashar) 03NEW [16:49:49] (03CR) 10CI reject: [V:04-1] Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) (owner: 10Andrew Bogott) [16:51:45] 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team: profile::labs::lvm::srv fails with Invalid argument for --extents: 100%FREE - https://phabricator.wikimedia.org/T370312#9991209 (10hashar) ` bash -x /usr/local/sbin/make-instance-vol second-local-disk 100%FREE ext4 + name=seco... 
[16:52:01] 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team: profile::labs::lvm::srv fails with Invalid argument for --extents: 100%FREE - https://phabricator.wikimedia.org/T370312#9991212 (10hashar) a:05hashar→03None [16:52:45] (03PS2) 10Andrew Bogott: Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) [16:55:33] (03CR) 10CI reject: [V:04-1] Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) (owner: 10Andrew Bogott) [16:57:59] (03PS3) 10Andrew Bogott: Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) [17:01:04] (03CR) 10CI reject: [V:04-1] Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) (owner: 10Andrew Bogott) [17:02:44] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0) [17:02:46] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add [17:02:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [17:03:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [17:03:34] (03PS4) 10Andrew Bogott: Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) [17:06:23] (03CR) 10CI reject: [V:04-1] Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) (owner: 10Andrew Bogott) [17:07:38] 10Toolforge (Toolforge iteration 13): [sct.frontend] Create skeleton node + Vue.js application - https://phabricator.wikimedia.org/T370178#9991314 (10dcaro) [17:08:33] 10Toolforge (Toolforge 
iteration 13): [sct.frontend] Create skeleton node + Vue.js application - https://phabricator.wikimedia.org/T370178#9991309 (10dcaro) Created, up and running: {F56485564} [17:08:47] 10Toolforge (Toolforge iteration 13): [sct.frontend] Create skeleton node + Vue.js application - https://phabricator.wikimedia.org/T370178#9991311 (10dcaro) 05In progress→03Resolved [17:11:36] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Toolforge (Toolforge iteration 13): [sct.backend] Create trove database - https://phabricator.wikimedia.org/T370317 (10dcaro) 03NEW [17:13:37] (03PS5) 10Andrew Bogott: Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) [17:16:29] (03CR) 10CI reject: [V:04-1] Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) (owner: 10Andrew Bogott) [17:16:42] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Toolforge (Toolforge iteration 13): [sct.backend] Create worker and connect to redis - https://phabricator.wikimedia.org/T370321 (10dcaro) 03NEW [17:20:29] (03PS6) 10Andrew Bogott: Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) [17:20:42] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Toolforge (Toolforge iteration 13): [sct.backend] Transform the "/" API reply to json - https://phabricator.wikimedia.org/T370323 (10dcaro) 03NEW [17:21:54] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Toolforge (Toolforge iteration 13): [sct.frontend] Show the backend status - https://phabricator.wikimedia.org/T370324 (10dcaro) 03NEW [17:24:44] 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team: profile::labs::lvm::srv fails with Invalid argument for --extents: 100%FREE - https://phabricator.wikimedia.org/T370312#9991481 (10hashar) a:03hashar From the manpage on > `-l|--extents 
Number[PERCENT]` > > Specifies the... [17:25:17] !log andrew@cloudcumin1001 quarry START - Cookbook wmcs.openstack.rebuild_dbinstance [17:25:20] !log andrew@cloudcumin1001 quarry END (FAIL) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=99) [17:26:51] (03PS7) 10Andrew Bogott: Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) [17:28:51] !log andrew@cloudcumin1001 quarry START - Cookbook wmcs.openstack.rebuild_dbinstance [17:28:55] !log andrew@cloudcumin1001 quarry END (FAIL) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=99) [17:30:21] 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team: profile::labs::lvm::srv fails with Invalid argument for --extents: 100%FREE - https://phabricator.wikimedia.org/T370312#9991536 (10hashar) Probably due to me passing shellcheck on the script with a2f7e3cf5134eba4788f7da21de54c... [17:33:31] (03PS8) 10Andrew Bogott: Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) [17:33:41] !log andrew@cloudcumin1001 quarry START - Cookbook wmcs.openstack.rebuild_dbinstance [17:33:44] !log andrew@cloudcumin1001 quarry END (FAIL) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=99) [17:34:51] !log andrew@cloudcumin1001 quarry START - Cookbook wmcs.openstack.rebuild_dbinstance [17:34:54] !log andrew@cloudcumin1001 quarry END (FAIL) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=99) [17:36:44] (03CR) 10CI reject: [V:04-1] Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) (owner: 10Andrew Bogott) [17:38:07] 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 13Patch-For-Review: profile::labs::lvm::srv fails with Invalid argument for --extents: 100%FREE - 
https://phabricator.wikimedia.org/T370312#9991642 (10hashar) I have cherry picked https://gerrit.wikimedia.org/r/c/operations/... [17:38:20] (03PS9) 10Andrew Bogott: Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) [17:41:02] 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 13Patch-For-Review: profile::labs::lvm::srv fails with Invalid argument for --extents: 100%FREE - https://phabricator.wikimedia.org/T370312#9991644 (10hashar) 05Open→03In progress [17:41:31] (03PS10) 10Andrew Bogott: Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) [17:42:44] (03PS11) 10Andrew Bogott: Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) [17:42:47] !log andrew@cloudcumin1001 quarry START - Cookbook wmcs.openstack.rebuild_dbinstance [17:42:49] !log andrew@cloudcumin1001 quarry END (FAIL) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=99) [17:43:00] !log andrew@cloudcumin1001 quarry START - Cookbook wmcs.openstack.rebuild_dbinstance [17:43:48] (03PS12) 10Andrew Bogott: Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) [17:44:58] !log andrew@cloudcumin1001 quarry END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) [17:45:25] !log andrew@cloudcumin1001 quarry START - Cookbook wmcs.openstack.rebuild_dbinstance [17:47:14] (03CR) 10CI reject: [V:04-1] Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) (owner: 10Andrew Bogott) [17:47:24] !log andrew@cloudcumin1001 quarry END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) [17:49:01] (03PS13) 10Andrew Bogott: Add 
rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) [17:49:56] 06cloud-services-team, 10Cloud-VPS, 10Data-Services, 05Goal: Update all trove VMs to a modern guest image - https://phabricator.wikimedia.org/T369723#9991717 (10Andrew) [17:56:21] (03CR) 10Andrew Bogott: "This has worked a couple of times" [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) (owner: 10Andrew Bogott) [18:07:10] (03update) 10sstefanova: api: drop deprecated endpoints [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/108 (https://phabricator.wikimedia.org/T363346 https://phabricator.wikimedia.org/T365014) [18:20:28] 06cloud-services-team, 10Cloud-VPS, 10Data-Services, 05Goal: Update all trove VMs to a modern guest image - https://phabricator.wikimedia.org/T369723#9992018 (10Andrew) [18:51:10] FIRING: GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch [18:51:31] PROBLEM - Host cloudcontrol1005 is DOWN: PING CRITICAL - Packet loss = 100% [18:52:22] FIRING: [14x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [18:55:47] FIRING: NodeDown: The node cloudcontrol1005 is unreachable. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown [18:56:06] I'm rebooting ^ to see if it helps. It was running but totally cut off from the network [18:56:10] FIRING: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch [18:58:49] FIRING: [7x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1042 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [19:08:15] RECOVERY - Host cloudcontrol1005 is UP: PING OK - Packet loss = 0%, RTA = 0.20 ms [19:10:47] RESOLVED: NodeDown: The node cloudcontrol1005 is unreachable. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown [19:11:10] RESOLVED: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch [19:12:22] RESOLVED: [6x] HAProxyBackendUnavailable: HAProxy service designate-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [19:13:31] PROBLEM - Host cloudcontrol1005 is DOWN: PING CRITICAL - Packet loss = 100% [19:14:16] ^that's me rebooting for good measure. it should recover shortly. 
[19:14:25] FIRING: GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch [19:15:13] RECOVERY - Host cloudcontrol1005 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [19:16:40] FIRING: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch [19:19:25] RESOLVED: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch [19:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [19:22:22] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "dumps" project Buster deprecation - https://phabricator.wikimedia.org/T367528#9992372 (10Andrew) ...anyone there? Hosts like these are in the way of some upgrades that I'd like to complete. 
[19:23:49] 06cloud-services-team, 10Cloud-VPS, 10Data-Services, 05Goal: Update all trove VMs to a modern guest image - https://phabricator.wikimedia.org/T369723#9992377 (10Andrew) [19:24:18] 06cloud-services-team, 10Cloud-VPS, 10Data-Services, 05Goal: Update all trove VMs to a modern guest image - https://phabricator.wikimedia.org/T369723#9992378 (10Andrew) [19:25:44] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#9992386 (10VRiley-WMF) @Jclark-ctr and @cmooney I have plugged in a 2nd network cable. Here is that information cloudcephosd1035 - CableID 5328 : Port 42... [19:26:19] RESOLVED: [3x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1042 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [19:29:40] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS: Migrate WMCS managed projects to g4 flavors - https://phabricator.wikimedia.org/T367723#9992389 (10Andrew) 05In progress→03Resolved [19:29:46] 06cloud-services-team, 10Toolforge (Toolforge iteration 13): Upgrade Toolforge (Elastic|Open)Search cluster to Debian Bullseye - https://phabricator.wikimedia.org/T311905#9992397 (10Andrew) 05Open→03Resolved [19:29:55] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0) [19:29:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [19:30:07] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy [19:30:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [19:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - 
https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [19:36:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-3 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [20:00:41] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0) [20:00:43] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add [20:00:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [20:00:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [20:01:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-3 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [20:07:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [20:12:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - 
https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [20:49:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [20:59:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [21:04:18] 10Cloud-VPS (Quota-requests): Request: add 80Gb storage to catalyst project quota - https://phabricator.wikimedia.org/T370365 (10SDunlap) 03NEW [21:42:06] 06cloud-services-team, 10Cloud-VPS (Quota-requests): Request: add 80Gb storage to catalyst project quota - https://phabricator.wikimedia.org/T370365#9992875 (10bd808) +1 [22:28:37] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0) [22:28:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [22:28:49] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy [22:28:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [22:40:05] 10Tool-openstack-browser: openstack-browser support for projects where id != name - https://phabricator.wikimedia.org/T366679#9993040 (10Andrew) One minor quibble, I think in the 'instances' section it should use the uuid, since that's what 'hostname -f' returns on the VM. I /think/ that's correct since it's the... 
[22:40:10] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install new cloudcephmon hosts - https://phabricator.wikimedia.org/T364870#9993042 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host cloudcephmon1004.eqiad.wmnet with OS bullseye [22:40:15] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install new cloudcephmon hosts - https://phabricator.wikimedia.org/T364870#9993043 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host cloudcephmon1005.eqiad.wmnet with OS bullseye [22:40:19] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install new cloudcephmon hosts - https://phabricator.wikimedia.org/T364870#9993044 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host cloudcephmon1006.eqiad.wmnet with OS bullseye [22:43:24] 10Data-Services, 10MW-1.43-notes (1.43.0-wmf.15; 2024-07-23), 13Patch-For-Review, 10Wiki-Setup (Create): Create a Wikimedians of United Arab Emirates User Group Wiki - https://phabricator.wikimedia.org/T362529#9993059 (10Zabe) 05Open→03Resolved a:03Zabe Wiki is live [22:45:43] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install new cloudcephmon hosts - https://phabricator.wikimedia.org/T364870#9993068 (10Jclark-ctr) [22:58:44] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0) [22:58:46] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add [22:58:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [22:58:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [23:08:24] 10Tool-openstack-browser: openstack-browser support for projects where id != name - https://phabricator.wikimedia.org/T366679#9993083 (10bd808) >>! 
In T366679#9993080, @bd808 wrote: > I'm actually hoping to find out that we will still be ensuring that project names are unique as well as project ids. If so I can... [23:09:02] 10Tool-openstack-browser: openstack-browser support for projects where id != name - https://phabricator.wikimedia.org/T366679#9993080 (10bd808) I'm actually hoping to find out that we will still be ensuring that project names are unique as well as project ids. If so I can fix the tool to use names as the visible... [23:19:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [23:27:51] 06cloud-services-team, 10Cloud-VPS (Debian Buster Deprecation): Replace or remove Debian Buster VMs in 'appservers' cloud-vps project - https://phabricator.wikimedia.org/T360700#9993124 (10Andrew) 05Open→03Resolved a:03Andrew this seems to be done, as that project is now empty of VMs. [23:29:41] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [23:38:31] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "petscan" project Buster deprecation - https://phabricator.wikimedia.org/T367545#9993134 (10Andrew) 05Open→03Resolved a:03Andrew Thank you! [23:41:24] 10Cloud-VPS (Debian Buster Deprecation), 10Community-Tech (Darwin's Fox (July 15-26, 2024)): Cloud VPS "eventmetrics" project Buster deprecation - https://phabricator.wikimedia.org/T367530#9993139 (10Andrew) End of the month is fine, I'll revisit then. 
[23:43:00] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "etytree" project Buster deprecation - https://phabricator.wikimedia.org/T367529#9993141 (10Andrew) I am shutting down the 'etytree-a' host today. @Epantaleo, if you restart this VM please respond here with your plan for upgrade. [23:45:05] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "image-suggestion-api" project Buster deprecation - https://phabricator.wikimedia.org/T367533#9993146 (10Andrew) @BPirkle, just to double-check, would you like me to delete this project entirely? [23:53:00] FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures [23:54:51] 10Cloud-VPS (Debian Buster Deprecation), 10WMIT-Infrastructure: Cloud VPS "osmit" project Buster deprecation - https://phabricator.wikimedia.org/T367543#9993165 (10Andrew) 05Open→03Resolved a:03Andrew Thank you! [23:56:03] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "wikicommunityhealth" project Buster deprecation - https://phabricator.wikimedia.org/T367560#9993169 (10Andrew) I am shutting these VMs down as there has been no response on this ticket or elsewhere. If this project can be deleted, please respond and let me know! [23:58:20] 10Cloud-VPS (Debian Buster Deprecation), 10Wikispeech: Cloud VPS "wikispeech" project Buster deprecation - https://phabricator.wikimedia.org/T367565#9993176 (10Andrew) 05Open→03Resolved a:03Andrew [23:59:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1006. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown