[00:00:49] (update) bd808: puppet_prefix: Generate YAML with `yamlencode` equivalent [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/10 (https://phabricator.wikimedia.org/T397994 https://phabricator.wikimedia.org/T398643)
[00:10:26] cloud-services-team, Cloud-VPS, Documentation: [tofu-cloudvps] Document using `cloudvps_puppet_project` to manage project-wide and instance specific puppet classes and hiera settings - https://phabricator.wikimedia.org/T397994#11121974 (bd808) Open→In progress a:bd808
[00:55:31] (update) bd808: puppet_prefix: Generate YAML with `yamlencode` equivalent [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/10 (https://phabricator.wikimedia.org/T397994 https://phabricator.wikimedia.org/T398643)
[01:04:12] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99)
[01:04:45] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[01:04:46] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99)
[01:04:59] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[01:38:38] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=97)
[01:39:17] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[03:42:34] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99)
[03:44:59] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[03:48:00] andrew@cloudcumin1001 depool_and_destroy (PID 45043) is awaiting input
[06:47:23] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99)
[08:38:52] (update) dcaro: pre-commit: add check for openapi spec version bump [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/116
[08:47:06] (open) samwilson: Add GitLab CI to test extension installation [toolforge-repos/wikispore-config] - https://gitlab.wikimedia.org/toolforge-repos/wikispore-config/-/merge_requests/2
[08:49:01] (approved) dcaro: [cli] add tool config to deployment object [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/58 (https://phabricator.wikimedia.org/T400064) (owner: raymond-ndibe)
[08:54:48] (update) samwilson: Add GitLab CI to test extension installation [toolforge-repos/wikispore-config] - https://gitlab.wikimedia.org/toolforge-repos/wikispore-config/-/merge_requests/2
[09:21:43] (update) dcaro: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509) (owner: raymond-ndibe)
[09:26:10] cloud-services-team, Toolforge: tool dirs are created with different permissions than maintain-kubeusers - https://phabricator.wikimedia.org/T402688#11122567 (dcaro) Open→Resolved p:Triage→Low a:DamianZaremba
[09:26:46] cloud-services-team, Toolforge: Only restart docker if the config has changed - https://phabricator.wikimedia.org/T402687#11122581 (dcaro) Open→Resolved p:Triage→Low a:DamianZaremba
[09:28:46] cloud-services-team, Toolforge: Only download & setup harbor once - https://phabricator.wikimedia.org/T402685#11122609 (dcaro) Open→Resolved p:Triage→Low a:DamianZaremba We are not deploying harbor inside k8s (chicken and egg problem, might change in the future), so it needs a dif...
[09:29:07] (update) samwilson: Add GitLab CI to test extension installation [toolforge-repos/wikispore-config] - https://gitlab.wikimedia.org/toolforge-repos/wikispore-config/-/merge_requests/2
[09:29:54] cloud-services-team, Toolforge (Toolforge iteration 23): kubectl alias and auto-complete is duplicated - https://phabricator.wikimedia.org/T402683#11122619 (dcaro) Open→Resolved p:Triage→Low a:DamianZaremba Fyi. for future patches, if you add `Bug: TXXXXX` to the MR/commit, it wil...
[09:30:24] cloud-services-team, Toolforge (Toolforge iteration 23): tool dirs are created with different permissions than maintain-kubeusers - https://phabricator.wikimedia.org/T402688#11122626 (dcaro)
[09:30:38] cloud-services-team, Toolforge (Toolforge iteration 23): Only restart docker if the config has changed - https://phabricator.wikimedia.org/T402687#11122627 (dcaro)
[09:30:45] cloud-services-team, Toolforge (Toolforge iteration 23): Only download & setup harbor once - https://phabricator.wikimedia.org/T402685#11122628 (dcaro)
[09:31:59] cloud-services-team, Toolforge (Toolforge iteration 23): toolforge components are reported as changed on every run - https://phabricator.wikimedia.org/T402689#11122632 (dcaro) Open→In progress p:Triage→Low a:DamianZaremba
[09:32:28] (update) samwilson: Add GitLab CI to test extension installation [toolforge-repos/wikispore-config] - https://gitlab.wikimedia.org/toolforge-repos/wikispore-config/-/merge_requests/2
[09:32:52] cloud-services-team, Toolforge (Toolforge iteration 23): Only download artefacts if target binary checksum does not match - https://phabricator.wikimedia.org/T402684#11122639 (dcaro) Open→In progress a:DamianZaremba
[09:33:03] cloud-services-team, Toolforge (Toolforge iteration 23): Only download artefacts if target binary checksum does not match - https://phabricator.wikimedia.org/T402684#11122646 (dcaro) p:Triage→Low
[09:33:30] cloud-services-team, Toolforge (Toolforge iteration 23): [lima-kilo] Only download artefacts if target binary checksum does not match - https://phabricator.wikimedia.org/T402684#11122649 (dcaro)
[09:33:52] cloud-services-team, Toolforge (Toolforge iteration 23): [lima-kilo] toolforge components are reported as changed on every ansible run - https://phabricator.wikimedia.org/T402689#11122650 (dcaro)
[09:34:39] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[09:37:46] (update) dcaro: README - drop --workers [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/122 (owner: damian)
[09:37:56] (update) dcaro: README - drop --workers [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/122 (owner: damian)
[09:38:06] (update) samwilson: Add GitLab CI to test extension installation [toolforge-repos/wikispore-config] - https://gitlab.wikimedia.org/toolforge-repos/wikispore-config/-/merge_requests/2
[09:39:47] Toolforge (Toolforge iteration 23): [builds-service] builds not working due to access issues in tools - https://phabricator.wikimedia.org/T402923#11122680 (dcaro) In progress→Resolved a:dcaro Created a subtask to follow up
[09:39:58] Toolforge (Toolforge iteration 23): [harbor,infra] gather stats about object storage qutoa usage and add an alert when tools is getting out of quota - https://phabricator.wikimedia.org/T402932#11122684 (dcaro) p:Triage→High
[09:40:24] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[09:41:47] (update) samwilson: Add GitLab CI to test extension installation [toolforge-repos/wikispore-config] - https://gitlab.wikimedia.org/toolforge-repos/wikispore-config/-/merge_requests/2
[09:43:00] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 24), Goal: [harbor] Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687#11122687 (dcaro)
[09:43:01] cloud-services-team, Toolforge (Toolforge iteration 24), Patch-For-Review: https://api.svc.toolforge.org endpoint given in OpenAPI spec returns 403 forbidden errors - https://phabricator.wikimedia.org/T402032#11122685 (dcaro)
[09:43:06] Toolforge (Toolforge iteration 24), Patch-For-Review: [components-api] exclude defaults when getting deployment - https://phabricator.wikimedia.org/T401648#11122689 (dcaro)
[09:43:11] cloud-services-team, Toolforge (Toolforge iteration 24), Patch-For-Review: [jobs-api] Split the `*Job` API models into three - https://phabricator.wikimedia.org/T390136#11122691 (dcaro)
[09:43:12] Toolforge (Toolforge iteration 24), Patch-For-Review: [jobs-api] refactor models - https://phabricator.wikimedia.org/T389118#11122693 (dcaro)
[09:43:14] cloud-services-team, Toolforge (Toolforge iteration 24), Patch-For-Review: [harbor,infra] Find a way to manage toolforge project policies with code - https://phabricator.wikimedia.org/T360509#11122695 (dcaro)
[09:43:28] Toolforge (Toolforge iteration 24), Upstream: [builds-builder,jobs-api,upstream] Calling nontrivial Procfile commands with arguments results in confusing error (“no such file or directory”) - https://phabricator.wikimedia.org/T356016#11122711 (dcaro)
[09:43:29] cloud-services-team, Toolforge (Toolforge iteration 24), Epic: [jobs-api,webservice] Run webservices via the jobs framework - https://phabricator.wikimedia.org/T348755#11122709 (dcaro)
[09:43:31] Toolforge (Toolforge iteration 24): [toolforge] simplify calling the different toolforge apis from within the containers - https://phabricator.wikimedia.org/T356377#11122713 (dcaro)
[09:43:34] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 24), Goal, Patch-For-Review: [infra] Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#11122707 (dcaro)
[09:43:37] Toolforge (Toolforge iteration 24), Patch-For-Review: [jobs-api] check for diff in services when running diff_with_running_job - https://phabricator.wikimedia.org/T392717#11122715 (dcaro)
[09:43:38] Toolforge (Toolforge iteration 24), Upstream: [builds-builder] golang based images get infinite nested loops for procfile entries - https://phabricator.wikimedia.org/T363417#11122719 (dcaro)
[09:43:40] Toolforge (Toolforge iteration 24), Patch-For-Review: [jobs-api] Create storage layer, and save business models in persistent storage - https://phabricator.wikimedia.org/T359650#11122717 (dcaro)
[09:43:56] cloud-services-team, Toolforge (Toolforge iteration 24): [lima-kilo] Only download artefacts if target binary checksum does not match - https://phabricator.wikimedia.org/T402684#11122723 (dcaro)
[09:44:00] cloud-services-team, Toolforge (Toolforge iteration 24): [lima-kilo] toolforge components are reported as changed on every ansible run - https://phabricator.wikimedia.org/T402689#11122725 (dcaro)
[09:44:06] cloud-services-team, Toolforge (Toolforge iteration 24): [components-api,beta] Config not updated from remote source - https://phabricator.wikimedia.org/T401868#11122727 (dcaro)
[09:44:11] Toolforge (Toolforge iteration 24): [clis] standardize the package names - https://phabricator.wikimedia.org/T399080#11122735 (dcaro)
[09:44:15] cloud-services-team, Toolforge (Toolforge iteration 24): [k8s,infra] Upgrade toolsbeta to Uwubernetes 1.30 - https://phabricator.wikimedia.org/T402377#11122733 (dcaro)
[09:44:19] Toolforge (Toolforge iteration 24), Patch-For-Review: [components-api] support port protocol in config - https://phabricator.wikimedia.org/T401994#11122729 (dcaro)
[09:44:23] Toolforge (Toolforge iteration 24), Patch-For-Review: [jobs-api] handle non-passed arguments and defaults consistently - https://phabricator.wikimedia.org/T402569#11122731 (dcaro)
[09:44:27] cloud-services-team, Toolforge (Toolforge iteration 24): [builds-builder] Upgrade python buildpack to v0.17.0 or newer for Poetry support - https://phabricator.wikimedia.org/T374056#11122741 (dcaro)
[09:44:31] Cloud Services Proposals, cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 24), Cloud-Services-Origin-Team, and 3 others: [builds-api,components-api,webservice,jobs-api] Make Toolforge a proper platform as a service with push-to-d... - https://phabricator.wikimedia.org/T194332#11122737
[09:44:35] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 24), Epic: [KR] WE6.3 Introduce a sustainability scoring system for the Toolforge platform - https://phabricator.wikimedia.org/T368600#11122743 (dcaro)
[09:44:39] cloud-services-team, Toolforge (Toolforge iteration 24), Patch-For-Review: [k8s,infra] Upgrade Toolforge to Uwubernetes (1.30) - https://phabricator.wikimedia.org/T362869#11122739 (dcaro)
[09:44:43] cloud-services-team, Toolforge (Toolforge iteration 24), Patch-For-Review: Toolforge: Replace all bastion with grid-less bookworm based bastion hosts - https://phabricator.wikimedia.org/T314665#11122745 (dcaro)
[09:44:47] cloud-services-team, Toolforge (Toolforge iteration 24), Patch-For-Review: [jobs-api,infra] upgrade all the existing toolforge jobs to the latest job version - https://phabricator.wikimedia.org/T359649#11122747 (dcaro)
[09:44:51] cloud-services-team, Toolforge (Toolforge iteration 24), Patch-For-Review: [jobs-api] when running a command with wrong quoting, no logs nor useful feedback is given to the user - https://phabricator.wikimedia.org/T356267#11122749 (dcaro)
[09:44:55] cloud-services-team, Toolforge (Toolforge iteration 24), Patch-For-Review: [builds-builder] Add support for Heroku's "24" builder stack based on Ubuntu 2024.04 noble - https://phabricator.wikimedia.org/T380127#11122751 (dcaro)
[09:44:59] Toolforge (Toolforge iteration 23), Patch-For-Review: [components-api] handle non-passed arguments and defaults consistently - https://phabricator.wikimedia.org/T402572#11122753 (dcaro) Open→In progress a:Raymond_Ndibe
[09:45:03] cloud-services-team, Toolforge: [builds-api] Allow queuing builds - https://phabricator.wikimedia.org/T401894#11122756 (dcaro)
[09:45:45] cloud-services-team, Toolforge (Toolforge iteration 23): [jobs-api] make job status an enum, with clearly defined states - https://phabricator.wikimedia.org/T401172#11122760 (dcaro) Open→In progress
[09:46:24] Toolforge (Toolforge iteration 24), Patch-For-Review: [components-api] handle non-passed arguments and defaults consistently - https://phabricator.wikimedia.org/T402572#11122762 (dcaro)
[09:46:27] cloud-services-team, Toolforge (Toolforge iteration 24): [k8s,infra] Upgrade tools to Uwubernetes 1.30 - https://phabricator.wikimedia.org/T402378#11122764 (dcaro)
[09:46:33] cloud-services-team, Toolforge (Toolforge iteration 24): [jobs-api] make job status an enum, with clearly defined states - https://phabricator.wikimedia.org/T401172#11122766 (dcaro)
[09:46:57] Toolforge (Toolforge iteration 24): [harbor,infra] gather stats about object storage qutoa usage and add an alert when tools is getting out of quota - https://phabricator.wikimedia.org/T402932#11122770 (dcaro)
[09:47:04] cloud-services-team, Toolforge (Toolforge iteration 24): [components-api] allow specifying `source_repo`+`ref` for the config - https://phabricator.wikimedia.org/T402764#11122771 (dcaro)
[09:47:09] Toolforge (Toolforge iteration 24): [components-api] Queue builds when the build queue is full - https://phabricator.wikimedia.org/T402568#11122772 (dcaro)
[09:47:19] Toolforge (Toolforge iteration 24), Patch-For-Review: [components-api] Allow reusing another component build - https://phabricator.wikimedia.org/T401893#11122773 (dcaro)
[09:47:24] Toolforge (Toolforge iteration 24): [components-api,beta] Image should only be build once when re-used in components - https://phabricator.wikimedia.org/T401851#11122774 (dcaro)
[09:47:29] Toolforge (Toolforge iteration 24): [components-api] bump the openapi version on every change - https://phabricator.wikimedia.org/T401374#11122775 (dcaro)
[09:47:37] cloud-services-team, Toolforge (Toolforge iteration 24): [builds-builder] review and potentially add cmake buildpack - https://phabricator.wikimedia.org/T401169#11122776 (dcaro)
[09:47:42] Toolforge (Toolforge iteration 24): [foxtrot-ldap] publish image in harbor repos - https://phabricator.wikimedia.org/T400167#11122777 (dcaro)
[09:47:47] Toolforge (Toolforge iteration 24): [docs] enable docs linter in one of the repos - https://phabricator.wikimedia.org/T397949#11122778 (dcaro)
[09:47:54] cloud-services-team, Toolforge (Toolforge iteration 24): [tools-static,infra] NFS issues should not bring tools-static down - https://phabricator.wikimedia.org/T397634#11122779 (dcaro)
[09:48:02] Toolforge (Toolforge iteration 24): [builds-cli] add resolved reference when showing a build - https://phabricator.wikimedia.org/T394300#11122780 (dcaro)
[09:48:08] cloud-services-team, Toolforge (Toolforge iteration 24): [components-api] optionally log deployments to SAL automatically - https://phabricator.wikimedia.org/T393169#11122781 (dcaro)
[09:48:12] (update) dcaro: Add validated type for git urls [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/121 (owner: damian)
[09:48:16] cloud-services-team, Toolforge (Toolforge iteration 24), Epic: [jobs-api] expose jobs-api continuous jobs to the internet via `toolname.toolforge.org`, just like webservice - https://phabricator.wikimedia.org/T388092#11122782 (dcaro)
[09:48:18] (update) dcaro: Add validated type for git urls [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/121 (owner: damian)
[09:48:25] cloud-services-team (FY2025/26-Q1), Cloud-VPS (Debian Buster Deprecation), Toolforge (Toolforge iteration 24), Epic, Goal: [infra] Toolforge: migrate to Debian Bullseye or later - https://phabricator.wikimedia.org/T311897#11122783 (dcaro)
[09:48:33] (update) samwilson: Add GitLab CI to test extension installation [toolforge-repos/wikispore-config] - https://gitlab.wikimedia.org/toolforge-repos/wikispore-config/-/merge_requests/2
[09:48:59] (approved) dcaro: README - drop --workers [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/122 (owner: damian)
[09:49:12] (merge) dcaro: README - drop --workers [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/122 (owner: damian)
[09:49:32] (update) dcaro: Allow re-using builds across components [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/118 (https://phabricator.wikimedia.org/T401893) (owner: damian)
[09:51:54] (update) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.145-20250827094925-95d0b9a8 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/934
[09:51:57] (update) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.145-20250827094925-95d0b9a8 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/934
[09:57:32] (update) samwilson: Add GitLab CI to test extension installation [toolforge-repos/wikispore-config] - https://gitlab.wikimedia.org/toolforge-repos/wikispore-config/-/merge_requests/2
[10:09:40] (PS2) Majavah: build: Support Python 3.13 with Tox [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177371
[10:09:53] (PS3) Majavah: build: Support Python 3.13 with Tox [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177371
[10:11:38] (update) samwilson: Add GitLab CI to test extension installation [toolforge-repos/wikispore-config] - https://gitlab.wikimedia.org/toolforge-repos/wikispore-config/-/merge_requests/2
[10:14:33] cloud-services-team, Toolforge: `toolforge build start` returns success status on build failure - https://phabricator.wikimedia.org/T402648#11122880 (dcaro) Starting the build did not fail, tailing the logs did not fail, so I don't think that the cli should return error in this case. For you to reliably...
[10:15:08] (update) samwilson: Add GitLab CI to test extension installation [toolforge-repos/wikispore-config] - https://gitlab.wikimedia.org/toolforge-repos/wikispore-config/-/merge_requests/2
[10:16:17] cloud-services-team, Toolforge: "toolforge-jobs list" error - "TjfCliError: Unable to find image in the supported list or harbor" - https://phabricator.wikimedia.org/T402724#11122881 (dcaro) p:Triage→Medium
[10:21:01] (merge) taavi: Update canonical address [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/9 (https://phabricator.wikimedia.org/T401814)
[10:21:33] (approved) filippo: Update tofu registry domain [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/64 (https://phabricator.wikimedia.org/T401814) (owner: taavi)
[10:25:48] (merge) taavi: Update tofu registry domain [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/64 (https://phabricator.wikimedia.org/T401814)
[10:26:04] (open) taavi: toolsbeta: Drop toolsbeta-harbor-2 volume [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/65
[10:26:07] (update) taavi: toolsbeta: Drop toolsbeta-harbor-2 volume [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/65
[10:27:33] (approved) fnegri: toolsbeta: Drop toolsbeta-harbor-2 volume [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/65 (owner: taavi)
[10:27:51] cloud-services-team, Toolforge, IPv6: Enable IPv6 on the Toolforge bastion - https://phabricator.wikimedia.org/T392510#11122952 (taavi) a:taavi
[10:31:56] FIRING: SystemdUnitDown: The service unit kiwix-mirror-update.service is in failed status on host clouddumps1001. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:32:54] (merge) taavi: toolsbeta: Drop toolsbeta-harbor-2 volume [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/65
[10:33:16] (update) samwilson: Add GitLab CI to test extension installation [toolforge-repos/wikispore-config] - https://gitlab.wikimedia.org/toolforge-repos/wikispore-config/-/merge_requests/2
[10:35:35] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[10:35:47] (open) taavi: Drop old moved blocks [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/66
[10:35:48] (update) taavi: Drop old moved blocks [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/66
[10:35:48] (update) taavi: shared: Add bastion module [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/67 (https://phabricator.wikimedia.org/T392510)
[10:35:49] (update) taavi: toolsbeta: Provision new Toolsbeta bastion [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/68 (https://phabricator.wikimedia.org/T392510)
[10:35:56] (open) taavi: shared: Add bastion module [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/67 (https://phabricator.wikimedia.org/T392510)
[10:36:04] (open) taavi: toolsbeta: Provision new Toolsbeta bastion [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/68 (https://phabricator.wikimedia.org/T392510)
[10:36:08] (update) taavi: Drop old moved blocks [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/66
[10:36:12] (update) taavi: shared: Add bastion module [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/67 (https://phabricator.wikimedia.org/T392510)
[10:36:16] (update) taavi: toolsbeta: Provision new Toolsbeta bastion [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/68 (https://phabricator.wikimedia.org/T392510)
[10:36:56] FIRING: [2x] SystemdUnitDown: The service unit kiwix-mirror-update.service is in failed status on host clouddumps1001. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:39:59] (update) taavi: toolsbeta: Provision new Toolsbeta bastion [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/68 (https://phabricator.wikimedia.org/T392510)
[10:40:04] (update) taavi: shared: Add bastion module [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/67 (https://phabricator.wikimedia.org/T392510)
[10:40:20] (update) taavi: Drop old moved blocks [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/66
[10:41:18] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[10:43:41] cloud-services-team, Horizon, Upstream: Horizon: Selected server groups do not get cleared after deleting them - https://phabricator.wikimedia.org/T403026 (taavi) NEW
[10:47:47] cloud-services-team, Toolforge: toolforge tofu-provisioning: Cache terraform-provider-openstack binary somewhere - https://phabricator.wikimedia.org/T403028 (taavi) NEW
[10:50:19] (update) taavi: shared: Add bastion module [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/67 (https://phabricator.wikimedia.org/T392510)
[10:50:34] (update) taavi: toolsbeta: Provision new Toolsbeta bastion [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/68 (https://phabricator.wikimedia.org/T392510)
[10:54:30] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence, Patch-For-Review: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11123107 (ops-monitoring-bot) Cookbook cookbooks.sre.wikireplicas.update-views run by ladsgrou...
[11:00:41] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence, Patch-For-Review: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11123163 (ops-monitoring-bot) Cookbook cookbooks.sre.wikireplicas.update-views started by lads...
[11:01:09] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence, Patch-For-Review: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11123168 (ops-monitoring-bot) Cookbook cookbooks.sre.wikireplicas.update-views run by ladsgrou...
[11:03:45] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence, Patch-For-Review: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11123178 (fnegri) @Ladsgroup usual issue with table locks {T300427} The workaround I used in...
[11:04:58] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence, Patch-For-Review: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11123180 (Ladsgroup) If you do it, I'd be grateful. These days we are a bit under-staffed.
[11:06:49] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence, Patch-For-Review: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11123181 (fnegri) a:fnegri No probs, we agreed it's a WMCS responsibility to sync the view...
[11:08:12] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence, Patch-For-Review: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11123184 (ops-monitoring-bot) Cookbook cookbooks.sre.wikireplicas.update-views started by lads...
[11:08:32] (update) taavi: Drop old moved blocks [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/66
[11:08:33] (update) taavi: shared: Add bastion module [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/67 (https://phabricator.wikimedia.org/T392510)
[11:08:33] (update) taavi: toolsbeta: Provision new Toolsbeta bastion [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/68 (https://phabricator.wikimedia.org/T392510)
[11:08:33] (update) taavi: toolsbeta: Allocate public address to the new bastion [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/69 (https://phabricator.wikimedia.org/T392510)
[11:08:34] (open) taavi: toolsbeta: Allocate public address to the new bastion [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/69 (https://phabricator.wikimedia.org/T392510)
[11:08:39] (update) taavi: toolsbeta: Allocate public address to the new bastion [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/69 (https://phabricator.wikimedia.org/T392510)
[11:09:49] (update) taavi: toolsbeta: Allocate public address to the new bastion [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/69 (https://phabricator.wikimedia.org/T392510)
[11:11:52] (update) taavi: toolsbeta: Allocate public address to the new bastion [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/69 (https://phabricator.wikimedia.org/T392510)
[11:12:11] cloud-services-team (FY2025/26-Q1), Data-Services, Data-Engineering, Data-Persistence, Patch-For-Review: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11123218 (fnegri)
[11:13:22] (update) taavi: toolsbeta: Allocate public address to the new bastion [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/69 (https://phabricator.wikimedia.org/T392510)
[11:13:31] (update) samwilson: Add GitLab CI to test extension installation [toolforge-repos/wikispore-config] - https://gitlab.wikimedia.org/toolforge-repos/wikispore-config/-/merge_requests/2
[11:17:25] (update) taavi: toolsbeta: Allocate public address to the new bastion [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/69 (https://phabricator.wikimedia.org/T392510)
[11:19:21] (update) taavi: toolsbeta: Allocate public address to the new bastion [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/69 (https://phabricator.wikimedia.org/T392510)
[11:21:08] (update) taavi: toolsbeta: Allocate public address to the new bastion [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/69 (https://phabricator.wikimedia.org/T392510)
[11:21:55] cloud-services-team (FY2025/26-Q1), Data-Services, Data-Engineering, Data-Persistence, Patch-For-Review: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11123241 (Ladsgroup) (I re-tried it immediately, also failed)
[11:46:14] (approved) dcaro: Drop old moved blocks [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/66 (owner: taavi)
[11:47:15] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-api
[11:53:33] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[11:55:16] (merge) taavi: Drop old moved blocks [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/66
[11:55:23] (update) taavi: shared: Add bastion module [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/67 (https://phabricator.wikimedia.org/T392510)
[11:55:34] (approved) dcaro: toolsbeta: Provision new Toolsbeta bastion [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/68 (https://phabricator.wikimedia.org/T392510) (owner: taavi)
[12:02:03] (approved) dcaro: shared: Add bastion module [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/67 (https://phabricator.wikimedia.org/T392510) (owner: taavi)
[12:09:57] cloud-services-team: [ceph] 2025-08-27 ceph outage when bringing in a big osd host all at once - https://phabricator.wikimedia.org/T403043 (dcaro) NEW
[12:10:42] cloud-services-team: Primary cloud switch inbound port utilisation over 80% Alert for device cloudsw1-f4-eqiad.mgmt.eqiad.wmnet - Primary cloud switch inbound port utilisation over 80%
- https://phabricator.wikimedia.org/T402758#11123416 (10dcaro) [12:10:43] 06cloud-services-team: [ceph] 2025-08-27 ceph outage when bringing in a big osd host all at once - https://phabricator.wikimedia.org/T403043#11123417 (10dcaro) [12:10:46] 06cloud-services-team: Primary cloud switch inbound port utilisation over 80% Alert for device cloudsw1-f4-eqiad.mgmt.eqiad.wmnet - Primary cloud switch inbound port utilisation over 80% - https://phabricator.wikimedia.org/T402758#11123418 (10dcaro) 05Open→03Resolved a:03dcaro [12:10:58] 06cloud-services-team: Primary cloud switch inbound port utilisation over 80% Alert for device cloudsw1-e4-eqiad.mgmt.eqiad.wmnet - Primary cloud switch inbound port utilisation over 80% - https://phabricator.wikimedia.org/T402658#11123421 (10dcaro) [12:11:00] 06cloud-services-team: [ceph] 2025-08-27 ceph outage when bringing in a big osd host all at once - https://phabricator.wikimedia.org/T403043#11123422 (10dcaro) [12:11:07] 06cloud-services-team: Primary cloud switch inbound port utilisation over 80% Alert for device cloudsw1-e4-eqiad.mgmt.eqiad.wmnet - Primary cloud switch inbound port utilisation over 80% - https://phabricator.wikimedia.org/T402658#11123423 (10dcaro) 05Open→03Resolved a:03dcaro [12:11:50] 06cloud-services-team: Primary cloud switch port utilisation over 80% Alert for device cloudsw1-c8-eqiad.mgmt.eqiad.wmnet - Primary cloud switch port utilisation over 80% - https://phabricator.wikimedia.org/T402657#11123430 (10dcaro) [12:11:53] 06cloud-services-team: [ceph] 2025-08-27 ceph outage when bringing in a big osd host all at once - https://phabricator.wikimedia.org/T403043#11123431 (10dcaro) [12:11:59] 06cloud-services-team: Primary cloud switch port utilisation over 80% Alert for device cloudsw1-c8-eqiad.mgmt.eqiad.wmnet - Primary cloud switch port utilisation over 80% - https://phabricator.wikimedia.org/T402657#11123432 (10dcaro) 05Open→03Resolved a:03dcaro [12:12:35] 06cloud-services-team: CephSlowOps Ceph cluster in 
eqiad has 931 slow ops - https://phabricator.wikimedia.org/T402656#11123447 (10dcaro) [12:12:37] 06cloud-services-team: [ceph] 2025-08-27 ceph outage when bringing in a big osd host all at once - https://phabricator.wikimedia.org/T403043#11123448 (10dcaro) [12:12:59] 06cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T402488#11123449 (10dcaro) [12:13:01] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS: [ceph] 2025-08-21 ceph issues bringing new osds up - https://phabricator.wikimedia.org/T402499#11123450 (10dcaro) [12:13:06] 06cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T402488#11123451 (10dcaro) 05Open→03Resolved a:03dcaro [12:13:49] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS: [ceph] 2025-08-21 ceph issues bringing new osds up - https://phabricator.wikimedia.org/T402499#11123454 (10dcaro) [12:16:53] (03CR) 10David Caro: [C:03+1] "LGTM, should we remove 310/312? the cumin hosts have only 39 and 311" [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1177371 (owner: 10Majavah) [12:16:56] FIRING: [2x] SystemdUnitDown: The service unit kiwix-mirror-update.service is in failed status on host clouddumps1001. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [12:19:01] (03approved) 10dcaro: components-api: bump to 0.0.145-20250827094925-95d0b9a8 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/934 (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [12:19:04] (03merge) 10dcaro: components-api: bump to 0.0.145-20250827094925-95d0b9a8 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/934 (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [12:20:47] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 1258 slow ops - https://phabricator.wikimedia.org/T402481#11123507 (10dcaro) [12:20:50] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS: [ceph] 2025-08-21 ceph issues bringing new osds up - https://phabricator.wikimedia.org/T402499#11123508 (10dcaro) [12:20:54] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 1258 slow ops - https://phabricator.wikimedia.org/T402481#11123509 (10dcaro) 05Open→03Resolved a:03dcaro [12:21:14] 06cloud-services-team: [ceph] 2025-08-27 ceph outage when bringing in a big osd host all at once - https://phabricator.wikimedia.org/T403043#11123513 (10dcaro) [12:21:16] 06cloud-services-team, 10Cloud-VPS: CephSlowOps Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T373632#11123512 (10dcaro) [12:21:47] 06cloud-services-team, 10Cloud-VPS: CephSlowOps Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T373632#11123514 (10dcaro) 05Open→03Resolved a:03dcaro [12:22:09] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS: [ceph] 2025-08-21 ceph issues bringing new osds up - https://phabricator.wikimedia.org/T402499#11123517 (10dcaro) Closing in favor of 
{T403043} [12:22:39] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 670 slow ops - https://phabricator.wikimedia.org/T402839#11123518 (10dcaro) [12:22:40] 06cloud-services-team: [ceph] 2025-08-27 ceph outage when bringing in a big osd host all at once - https://phabricator.wikimedia.org/T403043#11123519 (10dcaro) [12:22:41] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 670 slow ops - https://phabricator.wikimedia.org/T402839#11123520 (10dcaro) 05Open→03Resolved a:03dcaro [12:23:20] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 931 slow ops - https://phabricator.wikimedia.org/T402656#11123523 (10dcaro) 05Open→03Resolved a:03dcaro [12:23:30] 06cloud-services-team: [ceph] 2025-08-27 ceph outage when bringing in a big osd host all at once - https://phabricator.wikimedia.org/T403043#11123527 (10dcaro) [12:23:33] 06cloud-services-team, 13Patch-For-Review: MaxConntrack Max conntrack at 100% on cloudcephosd1042:9100 - https://phabricator.wikimedia.org/T402480#11123526 (10dcaro) [12:27:56] FIRING: SystemdUnitDown: The systemd unit kiwix-mirror-update.service on node clouddumps1002 has been failing for more than two hours. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1002 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [12:28:53] 06cloud-services-team, 13Patch-For-Review: MaxConntrack Max conntrack at 100% on cloudcephosd1042:9100 - https://phabricator.wikimedia.org/T402480#11123538 (10dcaro) [12:28:54] 06cloud-services-team: [ceph] 2025-08-27 ceph outage when bringing in a big osd host all at once - https://phabricator.wikimedia.org/T403043#11123539 (10dcaro) [12:28:55] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS: [ceph] 2025-08-21 ceph issues bringing new osds up - https://phabricator.wikimedia.org/T402499#11123540 (10dcaro) [12:29:04] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS: [ceph] 2025-08-21 ceph issues bringing new osds up - https://phabricator.wikimedia.org/T402499#11123541 (10dcaro) 05Open→03Resolved a:03dcaro [12:29:31] 06cloud-services-team: [ceph] 2025-08-27 ceph outage when bringing in a big osd host all at once - https://phabricator.wikimedia.org/T403043#11123544 (10dcaro) [12:31:56] RESOLVED: SystemdUnitDown: The service unit kiwix-mirror-update.service is in failed status on host clouddumps1001. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [12:33:40] (03update) 10dcaro: WIP k8s: upgrade to 1.30 [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/241 (https://phabricator.wikimedia.org/T362869) [12:33:55] (03update) 10dcaro: k8s: upgrade to 1.30 [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/241 (https://phabricator.wikimedia.org/T362869) [12:36:46] (03CR) 10David Caro: "@andrew this would still be nice to have, are you still working on it?" [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/998492 (owner: 10Andrew Bogott) [12:38:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [12:40:05] (03update) 10dcaro: Allow re-using builds across components [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/118 (https://phabricator.wikimedia.org/T401893) (owner: 10damian) [12:40:59] 06cloud-services-team: [ceph] 2025-08-27 ceph outage when bringing in a big osd host all at once (cloudcephosd1048) - https://phabricator.wikimedia.org/T403043#11123591 (10fgiunchedi) [12:45:49] (03update) 10dcaro: Allow re-using builds across components [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/118 (https://phabricator.wikimedia.org/T401893) (owner: 10damian) [12:46:20] 06cloud-services-team: [ceph] 2025-08-27 ceph outage when bringing in a big osd 
host all at once (cloudcephosd1048) - https://phabricator.wikimedia.org/T403043#11123628 (10fgiunchedi) Random notes/thoughts I collected so far: * It appears undrain_node didn't stop rebalance of the cluster prior to undrain? Would... [12:49:04] 06cloud-services-team, 10Toolforge: [lima-kilo] Improve convergence - https://phabricator.wikimedia.org/T402672#11123648 (10dcaro) p:05Triage→03Medium [12:49:19] 10Cloud Services Proposals: DRAFT Decision request - Improving lima-kilo developer experience - https://phabricator.wikimedia.org/T403051 (10dcaro) 03NEW [12:50:06] 06cloud-services-team, 10Cloud-VPS: Newly-added member of deployment-prep is not in bastion project - https://phabricator.wikimedia.org/T403052#11123678 (10Urbanecm_WMF) [12:50:30] 06cloud-services-team, 10Cloud-VPS: Newly-added member of deployment-prep is not in bastion project - https://phabricator.wikimedia.org/T403052#11123684 (10Urbanecm_WMF) [12:50:35] 06cloud-services-team, 10Cloud-VPS: openstack: keystone may be failing to add users to the bastion project - https://phabricator.wikimedia.org/T379550#11123685 (10Urbanecm_WMF) [12:50:47] 06cloud-services-team, 10Cloud-VPS: Newly-added member of deployment-prep is not in bastion project - https://phabricator.wikimedia.org/T403052#11123689 (10Urbanecm_WMF) p:05Triage→03High [12:51:01] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy [12:51:04] 06cloud-services-team, 10Cloud-VPS: Newly-added member of deployment-prep is not in bastion project - https://phabricator.wikimedia.org/T403052#11123694 (10Urbanecm_WMF) Thanks to @RhinosF1 for pointing me towards the issue. 
[12:52:31] Cloud Services Proposals: DRAFT Decision request - Improving lima-kilo developer experience - https://phabricator.wikimedia.org/T403051#11123703 (dcaro)
[12:54:02] andrew@cloudcumin1001 depool_and_destroy (PID 98246) is awaiting input
[12:54:21] Cloud Services Proposals: DRAFT Decision request - Improving lima-kilo developer experience - https://phabricator.wikimedia.org/T403051#11123709 (dcaro)
[12:54:24] Cloud Services Proposals: DRAFT Decision request - Improving lima-kilo developer experience - https://phabricator.wikimedia.org/T403051#11123712 (dcaro) p:Triage→Medium
[12:54:58] cloud-services-team (FY2025/26-Q1), Data-Services, Data-Engineering, Data-Persistence: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11123715 (fnegri) Open→Resolved I did run the following command on all clouddb hosts: `...
[12:55:05] cloud-services-team, Toolforge: [lima-kilo] Improve convergence - https://phabricator.wikimedia.org/T402672#11123717 (dcaro) I started creating a draft of decision request that would potentially address this too.
[12:56:01] cloud-services-team, Patch-For-Review: MaxConntrack Max conntrack at 100% on cloudcephosd1042:9100 - https://phabricator.wikimedia.org/T402480#11123718 (dcaro) p:Triage→Medium
[12:56:07] cloud-services-team (FY2025/26-Q1), Data-Services, Data-Engineering, Data-Persistence: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11123719 (Ladsgroup) Thanks!!!
[12:57:04] cloud-services-team (FY2025/26-Q1), Patch-For-Review: MaxConntrack Max conntrack at 100% on cloudcephosd1042:9100 - https://phabricator.wikimedia.org/T402480#11123721 (dcaro)
[12:57:37] cloud-services-team, decommission-hardware: decommission cloudcephosd1004-10015 - https://phabricator.wikimedia.org/T402881#11123723 (Andrew)
[12:59:05] cloud-services-team: PuppetFailure Puppet has failed on cloudnet1005:9100 - https://phabricator.wikimedia.org/T402561#11123728 (dcaro) This was the timesyncd package issue, already solved: ` Aug 21 15:51:35 cloudnet1005 puppet-agent[693136]: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force...
[12:59:09] cloud-services-team: PuppetFailure Puppet has failed on cloudnet1005:9100 - https://phabricator.wikimedia.org/T402561#11123729 (dcaro) Open→Resolved a:dcaro
[12:59:55] cloud-services-team, Data-Services, Data-Platform-SRE: Automate maintain-views replica depooling - https://phabricator.wikimedia.org/T300427#11123731 (fnegri) Interestingly I noticed that often the session that is holding the lock is in `State=Sleep`, e.g. ` | 13003434 | s52741 | 10.64.151.2:...
[13:00:51] cloud-services-team: SystemdUnitDown The systemd unit kiwix-mirror-update.service on node clouddumps1002 has been failing for more than two hours. - https://phabricator.wikimedia.org/T402708#11123735 (dcaro) It's getting connection refused: ` Aug 27 10:26:00 clouddumps1002 bash[2654220]: rsync: [Receiver] fa...
[13:02:56] RESOLVED: SystemdUnitDown: The systemd unit kiwix-mirror-update.service on node clouddumps1002 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1002 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[13:07:12] cloud-services-team: SystemdUnitDown The systemd unit kiwix-mirror-update.service on node clouddumps1002 has been failing for more than two hours. - https://phabricator.wikimedia.org/T402708#11123765 (dcaro) Open→Resolved a:dcaro Restarted the service and it seems it's running now, I'll close...
[13:07:59] (CR) Andrew Bogott: "The keystone hooks no longer clean up project ldap (https://phabricator.wikimedia.org/T397648) and I'm writing a script to cover that; onc" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1139027 (https://phabricator.wikimedia.org/T391836) (owner: Majavah)
[13:08:01] (update) dcaro: k8s: upgrade to 1.30 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/241 (https://phabricator.wikimedia.org/T362869)
[13:09:06] cloud-services-team, Cloud-VPS, Patch-For-Review: Evaluate higher level signals for nova troubles rather than paging on nova-compute down - https://phabricator.wikimedia.org/T402778#11123787 (dcaro) p:Triage→Medium
[13:09:37] cloud-services-team: [ceph] 2025-08-27 ceph outage when bringing in a big osd host all at once (cloudcephosd1048) - https://phabricator.wikimedia.org/T403043#11123789 (dcaro) p:Triage→High
[13:09:46] cloud-services-team (FY2025/26-Q1): [ceph] 2025-08-27 ceph outage when bringing in a big osd host all at once (cloudcephosd1048) - https://phabricator.wikimedia.org/T403043#11123792 (dcaro)
[13:18:22] (open) dcaro: ldap: setup and populate ldap before the components [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/269
[13:18:33] (update) dcaro: ldap: setup and populate ldap before the components [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/269
[13:26:34] (approved) fnegri: ldap: setup and populate ldap before the components [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/269 (owner: dcaro)
[13:27:25] (merge) dcaro: ldap: setup and populate ldap before the components [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/269
[13:27:37] (update) dcaro: k8s: upgrade to 1.30 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/241 (https://phabricator.wikimedia.org/T362869)
[13:37:43] (approved) dcaro: Setup pytest, add first test [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/4 (owner: fnegri)
[13:40:28] FIRING: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[13:42:01] Toolforge (Toolforge iteration 24): [harbor,infra] gather stats about object storage qutoa usage and add an alert when tools is getting out of quota - https://phabricator.wikimedia.org/T402932#11124007 (dcaro)
[13:55:28] RESOLVED: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[14:24:27] cloud-services-team, Cloud-VPS: Newly-added member of deployment-prep is not in bastion project - https://phabricator.wikimedia.org/T403052#11124250 (dcaro) I don't see the user being shown as bastionles by the script, and the roles look ok: ` root@cloudcontrol1006:~# wmcs-bastionless root@cloudcontrol10...
[14:29:43] (update) damian: Allow re-using builds across components [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/118 (https://phabricator.wikimedia.org/T401893)
[14:29:49] cloud-services-team, Cloud-VPS: Newly-added member of deployment-prep is not in bastion project - https://phabricator.wikimedia.org/T403052#11124302 (Urbanecm_WMF) @dcaro Huh, surprising. The user doesn't seem to be a member of the `project-bastion` LDAP group, and they reported issues SSH'ing in, so I a...
[14:34:53] (update) damian: Allow re-using builds across components [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/118 (https://phabricator.wikimedia.org/T401893)
[14:43:41] cloud-services-team, DC-Ops, ops-eqiad, SRE: KernelErrors Server cloudcephosd1052 logged kernel errors - https://phabricator.wikimedia.org/T402938#11124391 (Jclark-ctr) @wiki_willy The error Returned in Dmesg. The best option might be to purchase a 25G Broadcom NIC to avoid future problems wi...
[14:50:41] cloud-services-team, Cloud-VPS: Newly-added member of deployment-prep is not in bastion project - https://phabricator.wikimedia.org/T403052#11124441 (dcaro) Just had a chat with @Andrew and tried removing and adding the user again, and it added them correctly: ` root@cloudcontrol1006:~# sudo wmcs-opensta...
[14:51:19] cloud-services-team, Toolforge: `toolforge build start` returns success status on build failure - https://phabricator.wikimedia.org/T402648#11124442 (DamianZaremba) > Starting the build did not fail, tailing the logs did not fail, so I don't think that the cli should return error in this case. On the bas...
[14:53:12] (update) damian: Allow re-using builds across components [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/118 (https://phabricator.wikimedia.org/T401893)
[14:54:55] cloud-services-team, Cloud-VPS (Quota-requests), Catalyst: Quota increase request for catalyst-dev - https://phabricator.wikimedia.org/T402521#11124477 (Andrew) This seems fine to me, obviously if you're able to use smaller VMs please do :) +1
[14:58:44] cloud-services-team, DC-Ops, ops-eqiad, SRE: KernelErrors Server cloudcephosd1052 logged kernel errors - https://phabricator.wikimedia.org/T402938#11124506 (wiki_willy) ++ @RobH - can you work with John on getting a 25g Broadcom NIC for this one? >>! In T402938#11124390, @Jclark-ctr wrote: > @wi...
[15:00:32] cloud-services-team, Cloud-VPS: Newly-added member of deployment-prep is not in bastion project - https://phabricator.wikimedia.org/T403052#11124535 (Urbanecm_WMF) Thanks! I see the LDAP membership got fixed now. @hueitan, can you try again, please?