[02:52:32] (update) chuckonwumelu: Start [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/1
[03:15:19] FIRING: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[03:20:19] RESOLVED: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[04:17:19] FIRING: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[04:37:19] RESOLVED: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[08:19:32] Cloud-VPS (Project-requests): Request creation of wmgmc VPS project - https://phabricator.wikimedia.org/T391742#10746561 (dcaro) Hi @XtexChooser! Would you mind if we name it `wmgmc_observability`, `wmgmc_monitoring` or similar? I say because CloudVPS projects should not be "generic", but tied to a specific...
[08:35:37] (open) dcaro: Add vm name to /etc/hosts [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/235
[08:37:39] (update) dcaro: Add vm name to /etc/hosts [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/235
[08:50:55] (approved) aborrero: Add vm name to /etc/hosts [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/235 (owner: dcaro)
[09:00:24] (update) dcaro: Add vm name to /etc/hosts [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/235
[09:00:24] (merge) dcaro: Add vm name to /etc/hosts [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/235
[09:17:51] (open) dcaro: harbor: start on VM reboot [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/236
[09:21:02] (update) dcaro: harbor: start on VM reboot [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/236
[09:46:19] (open) dcaro: webhook: ignore the kyverno namespace [repos/cloud/toolforge/registry-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/registry-admission/-/merge_requests/23
[09:54:40] (update) dcaro: webhook: ignore the kyverno namespace [repos/cloud/toolforge/registry-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/registry-admission/-/merge_requests/23
[10:01:09] (approved) aborrero: webhook: ignore the kyverno namespace [repos/cloud/toolforge/registry-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/registry-admission/-/merge_requests/23 (owner: dcaro)
[10:07:53] !log dcaro@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node
[10:08:35] !log dcaro@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0)
[10:09:19] (CR) David Caro: [C:+2] wmcs.common: update wrap_with_sudo_icinga [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1132676 (owner: Volans)
[10:13:27] (Merged) jenkins-bot: wmcs.common: update wrap_with_sudo_icinga [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1132676 (owner: Volans)
[10:17:55] (update) dcaro: namespace: add comment about the tenancy label usage [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/66
[10:18:12] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services: [wikireplicas] Create views for new wiki nupwiki - https://phabricator.wikimedia.org/T390714#10746774 (fnegri) Open→In progress
[10:18:21] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services: [wikireplicas] Create views for new wiki tlwikisource - https://phabricator.wikimedia.org/T388657#10746776 (fnegri) Open→In progress
[10:27:14] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services: [wikireplicas] Create views for new wiki tlwikisource - https://phabricator.wikimedia.org/T388657#10746825 (fnegri) In progress→Open
[10:27:24] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services: [wikireplicas] Create views for new wiki nupwiki - https://phabricator.wikimedia.org/T390714#10746827 (fnegri) In progress→Open
[10:27:48] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services: [wikireplicas] Create views for new wiki tlwikisource - https://phabricator.wikimedia.org/T388657#10746829 (fnegri) p:Medium→Low
[10:51:01] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services: [wikireplicas] Create views for new wiki nupwiki - https://phabricator.wikimedia.org/T390714#10746934 (fnegri) I did run the cookbook too soon: the database for this new wiki hasn't been created yet: {T390710} Nothing was created by the add-wiki co...
[11:01:27] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "wikicommunityhealth" project Buster deprecation - https://phabricator.wikimedia.org/T367560#10746988 (CristianCantoro) Hello. I would like to know if it is possible to recover these machines? I would need to recover some code from them (I am going though...
[11:12:36] (update) dcaro: [jobs-cli] schedule timeout default None [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/96 (https://phabricator.wikimedia.org/T389118) (owner: raymond-ndibe)
[11:13:08] (update) dcaro: [jobs-cli] only send timeout if it's set by the user [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/96 (https://phabricator.wikimedia.org/T389118) (owner: raymond-ndibe)
[11:13:24] (update) dcaro: [jobs-cli] only send timeout if it's set by the user [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/96 (https://phabricator.wikimedia.org/T389118) (owner: raymond-ndibe)
[11:22:22] cloud-services-team, Toolforge (Toolforge iteration 19), Patch-For-Review: [jobs-api,infra] upgrade all the existing toolforge jobs to the latest job version - https://phabricator.wikimedia.org/T359649#10747056 (Raymond_Ndibe) @dcaro advised we email the tool owners and given them a chance to do this...
[12:00:32] (open) dcaro: jobs-api: add jobs version migration script and docs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/746 (https://phabricator.wikimedia.org/T359649)
[12:00:38] (update) dcaro: jobs-api: add jobs version migration script and docs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/746 (https://phabricator.wikimedia.org/T359649)
[12:00:44] (update) dcaro: jobs-api: add jobs version migration script and docs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/746 (https://phabricator.wikimedia.org/T359649)
[12:01:16] cloud-services-team, Toolforge (Toolforge iteration 19), Patch-For-Review: [jobs-api,infra] upgrade all the existing toolforge jobs to the latest job version - https://phabricator.wikimedia.org/T359649#10747189 (dcaro) Copied the scripts/docs to an MR for easy review and such https://gitlab.wikimedia...
[12:06:36] (close) aborrero: eqiad1: add new VXLAN and IPv6 network settings [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/196
[12:07:17] (update) dcaro: jobs-api: add jobs version migration script and docs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/746 (https://phabricator.wikimedia.org/T359649)
[12:11:30] (update) dcaro: jobs-api: add jobs version migration script and docs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/746 (https://phabricator.wikimedia.org/T359649)
[12:16:57] cloud-services-team, Toolforge (Toolforge iteration 19), Patch-For-Review: [jobs-api,infra] upgrade all the existing toolforge jobs to the latest job version - https://phabricator.wikimedia.org/T359649#10747236 (Raymond_Ndibe) >>! In T359649#10747189, @dcaro wrote: > Copied the scripts/docs to an MR...
[12:18:24] (update) dcaro: jobs-api: add jobs version migration script and docs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/746 (https://phabricator.wikimedia.org/T359649)
[12:18:48] (open) aborrero: eqiad1: introduce VXLAN/IPv4-only settings [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/198 (https://phabricator.wikimedia.org/T380174)
[12:22:11] cloud-services-team, Toolforge (Toolforge iteration 19), Patch-For-Review: [jobs-api,infra] upgrade all the existing toolforge jobs to the latest job version - https://phabricator.wikimedia.org/T359649#10747254 (dcaro) >>! In T359649#10747236, @Raymond_Ndibe wrote: >>>! In T359649#10747189, @dcaro wr...
[12:25:22] (update) aborrero: eqiad1: introduce VXLAN/IPv4-only settings [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/198 (https://phabricator.wikimedia.org/T380174)
[12:26:31] (merge) aborrero: eqiad1: introduce VXLAN/IPv4-only settings [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/198 (https://phabricator.wikimedia.org/T380174)
[12:26:33] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[12:26:53] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[12:31:54] (open) aborrero: eqiad1: network: fix VXLAN segmentation_id for VXLAN/IPv4-only [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/199
[12:33:01] (merge) aborrero: eqiad1: network: fix VXLAN segmentation_id for VXLAN/IPv4-only [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/199
[12:33:04] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[12:34:02] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[12:48:18] (open) aborrero: eqiad1: add support for operations in the deployment [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/15 (https://phabricator.wikimedia.org/T391325)
[13:05:56] (update) dcaro: [jobs-api] delete completed one-off jobs when getting jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/157 (https://phabricator.wikimedia.org/T352989) (owner: raymond-ndibe)
[13:13:02] cloud-services-team, Toolforge (Toolforge iteration 19), Patch-For-Review: [jobs-api,infra] upgrade all the existing toolforge jobs to the latest job version - https://phabricator.wikimedia.org/T359649#10747500 (dcaro) >>! In T359649#10747056, @Raymond_Ndibe wrote: > @dcaro advised we email the tool...
[13:14:08] (approved) dcaro: [envvars-api] fix envvars-api EnvvarName regex bug [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/54 (https://phabricator.wikimedia.org/T391966) (owner: raymond-ndibe)
[13:16:44] (update) dcaro: [jobs-cli] only send timeout if it's set by the user [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/96 (https://phabricator.wikimedia.org/T389118) (owner: raymond-ndibe)
[13:38:50] cloud-services-team, Toolforge (Toolforge iteration 19), Patch-For-Review: [jobs-api,infra] upgrade all the existing toolforge jobs to the latest job version - https://phabricator.wikimedia.org/T359649#10747590 (Raymond_Ndibe) >>! In T359649#10747500, @dcaro wrote: >>>! In T359649#10747056, @Raymond_...
[13:43:27] cloud-services-team, Toolforge (Toolforge iteration 19), Patch-For-Review: [jobs-api,infra] upgrade all the existing toolforge jobs to the latest job version - https://phabricator.wikimedia.org/T359649#10747632 (dcaro) > > I have the same opinion too. You are correct it's almost like moving to a dif...
[13:44:04] cloud-services-team, Cloud-VPS: [cinder] Volume failing to attach/detach - https://phabricator.wikimedia.org/T392089 (fnegri) NEW
[13:45:23] cloud-services-team, Cloud-VPS: [cinder] Volume failing to attach/detach - https://phabricator.wikimedia.org/T392089#10747660 (fnegri)
[13:46:27] cloud-services-team, Toolforge (Toolforge iteration 19), Patch-For-Review: [jobs-api,infra] upgrade all the existing toolforge jobs to the latest job version - https://phabricator.wikimedia.org/T359649#10747664 (Raymond_Ndibe) >>! In T359649#10747632, @dcaro wrote: >> >> I have the same opinion too....
[13:49:43] cloud-services-team, Cloud-VPS, VPS-Project-wikicommunityhealth: [cinder] Volume failing to attach/detach - https://phabricator.wikimedia.org/T392089#10747687 (CristianCantoro)
[13:50:39] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "wikicommunityhealth" project Buster deprecation - https://phabricator.wikimedia.org/T367560#10747702 (Andrew) I'm sorry @CristianCantoro, any backups that we might have run on those VMs have long since been purged. I encourage you to enable email notifica...
[13:57:36] (merge) dcaro: webhook: ignore the kyverno namespace [repos/cloud/toolforge/registry-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/registry-admission/-/merge_requests/23
[13:58:17] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for service: project,cinder
[13:58:25] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for service: project,cinder
[14:01:05] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: registry-admission: bump to 0.0.60-20250416135747-60a94ec2 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/747
[14:05:08] cloud-services-team, Cloud-VPS, VPS-Project-wikicommunityhealth: [cinder] Volume failing to attach/detach - https://phabricator.wikimedia.org/T392089#10747838 (fnegri) p:Triage→High a:Andrew
[14:05:38] cloud-services-team, Cloud-VPS: openstack magnum (or heat) resource leak - https://phabricator.wikimedia.org/T392031#10747842 (fnegri) p:Triage→High
[14:07:16] cloud-services-team: KernelErrors Server cloudcontrol1011 logged kernel errors - https://phabricator.wikimedia.org/T391408#10747850 (aborrero) Open→Resolved a:aborrero on boot the server had: ` aborrero@cloudcontrol1011:~ $ sudo dmesg --level err [ 2.033523] x86/cpu: VMX (outside TXT) dis...
[14:07:24] cloud-services-team: KernelErrors Server cloudcontrol1011 logged kernel errors - https://phabricator.wikimedia.org/T391407#10747854 (aborrero) Open→Resolved a:aborrero on boot the server had: ` aborrero@cloudcontrol1011:~ $ sudo dmesg --level err [ 2.033523] x86/cpu: VMX (outside TXT) dis...
[14:07:35] cloud-services-team, Data-Services: [wikireplicas] Create views for new wiki madwikisource - https://phabricator.wikimedia.org/T391770#10747860 (fnegri) p:Triage→Low
[14:09:26] cloud-services-team: SystemdUnitDown The systemd unit keystone_sync_keys_from_cloudcontrol1005.private.eqiad.wikimedia.cloud.service on node cloudcontrol1007 has been failing for more than two hours. - https://phabricator.wikimedia.org/T391424#10747876 (dcaro) Open→Resolved a:dcaro This happe...
[14:09:56] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T391425#10747881 (dcaro) Open→Resolved a:dcaro This happened when removing an old cloudcontrol and adding a new one, fixed
[14:10:12] cloud-services-team: NovafullstackSustainedFailures Novafullstack tests have been failing for more than 5hours in eqiad - https://phabricator.wikimedia.org/T391428#10747885 (dcaro) Open→Resolved a:dcaro This happened when removing an old cloudcontrol and adding a new one, fixed
[14:10:17] cloud-services-team: HAProxyServiceUnavailable - https://phabricator.wikimedia.org/T391430#10747889 (dcaro) Open→Resolved a:dcaro This happened when removing an old cloudcontrol and adding a new one, fixed
[14:13:47] cloud-services-team, Toolforge: Check for non-libre vscode-server installs/processes on Toolforge bastions - https://phabricator.wikimedia.org/T390885#10747903 (dcaro) p:Triage→Medium
[14:14:47] cloud-services-team, Toolforge: toolforge-cli: Allow configuring description for external subcommands - https://phabricator.wikimedia.org/T336052#10747909 (dcaro) p:Triage→Medium
[14:17:30] Striker: Make it possible to maintain Toolforge tools via an easy-to-use web interface instead of a command-line one - https://phabricator.wikimedia.org/T332480#10747949 (dcaro) Related task {T375914}
[14:20:21] cloud-services-team, Toolforge: Decouple Toolforge API gateway authentication from Kubernetes certificates - https://phabricator.wikimedia.org/T332478#10747971 (dcaro) Somehow I missed this, we re-took this discussion here {T363983}, leaning on using idp/CAS instead of oauth, at least for starters (same...
[14:22:15] cloud-services-team, Toolforge: Decouple Toolforge API gateway authentication from Kubernetes certificates - https://phabricator.wikimedia.org/T332478#10747989 (dcaro) @taavi should I close this as duplicate? Or do you want to refresh/extend the oauth+dedicated auth server specific proposal?
[14:22:31] cloud-services-team, Toolforge: Decouple Toolforge API gateway authentication from Kubernetes certificates - https://phabricator.wikimedia.org/T332478#10747992 (dcaro) p:Triage→High
[14:23:53] cloud-services-team, Toolforge: Toolforge: expose API gateway to the internet - https://phabricator.wikimedia.org/T332476#10748005 (dcaro) Open→Resolved a:dcaro I'll close this for now, it's already exposed, there's though a very limited subset of the API that you can authenticate agains...
[14:24:03] cloud-services-team, Toolforge: Toolforge Terraform support - https://phabricator.wikimedia.org/T329425#10748010 (dcaro)
[14:29:10] Striker: Add a web shell allowing people to perform actions as their tool from striker - https://phabricator.wikimedia.org/T144713#10748056 (dcaro) >>! In T144713#2718431, @bd808 wrote: > from irc chat: > ` > [22:34] so in striker > [22:34] we'll have a 'launch web console' thing >...
[14:31:41] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
[14:34:31] cloud-services-team, Toolforge: toolforge-cli: Allow configuring description for external subcommands - https://phabricator.wikimedia.org/T336052#10748091 (dcaro) p:Medium→Low Should not be hard to implement I think, so if anyone feels like it feel free to tackle :), we might have to change how i...
[14:34:53] cloud-services-team, Cloud-VPS, Patch-For-Review: cloudgw: replace keepalived with BGP - https://phabricator.wikimedia.org/T347687#10748096 (cmooney) a:cmooney→None Happy to advise on how to set up Bird on the cloudgw side, but I'm not gonna start merging patches and over-stepping the mark :)...
[14:38:40] cloud-services-team, Toolforge: Use Let's Encrypt certificates for the Toolforge API gateway - https://phabricator.wikimedia.org/T332479#10748108 (dcaro) Open→Resolved a:dcaro This is done already :) ` dcaro@acme$ openssl s_client -showcerts -connect api.svc.toolforge.org:443 | grep -i '...
[14:39:35] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Data-Services, Patch-For-Review: Move some DNS records from wmcs-wikireplica-dns.py to tofu-infra - https://phabricator.wikimedia.org/T374953#10748127 (fnegri) In progress→Resolved
[14:41:43] cloud-services-team, Toolforge: Toolforge Terraform support - https://phabricator.wikimedia.org/T329425#10748138 (dcaro) This work is happening here {T390056}
[14:41:50] cloud-services-team, Toolforge: Toolforge Terraform support - https://phabricator.wikimedia.org/T329425#10748143 (dcaro) →Duplicate dup:T390056
[14:41:59] cloud-services-team, Toolforge, Epic: toolforge: introduce additional IaC automation - https://phabricator.wikimedia.org/T390056#10748145 (dcaro)
[14:42:37] cloud-services-team, Toolforge: Toolforge Terraform support - https://phabricator.wikimedia.org/T329425#10748147 (dcaro) Duplicate→Open Oh, wait, no, @taavi the idea here is for tool administrator to manage their tools using terraform?
[14:42:56] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
[14:45:56] cloud-services-team, Data-Services, Data-Persistence: wikireplicas: maintain-views should not create _p databases - https://phabricator.wikimedia.org/T392105 (fnegri) NEW
[14:46:10] cloud-services-team, Data-Services, Data-Persistence: wikireplicas: maintain-views should not create _p databases - https://phabricator.wikimedia.org/T392105#10748183 (fnegri) Open→In progress a:fnegri
[14:47:00] cloud-services-team, Cloud-VPS, VPS-Project-wikicommunityhealth: [cinder] Volume failing to attach/detach - https://phabricator.wikimedia.org/T392089#10748196 (Andrew) I manually removed the broken attachment in the database and reset the state of the volume. It's still not behaving quite right but w...
[14:47:43] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
[14:51:57] cloud-services-team, Toolforge: Check for non-libre vscode-server installs/processes on Toolforge bastions - https://phabricator.wikimedia.org/T390885#10748213 (dcaro) I wonder if there's a way to nicely communicate to the user why their vscode stopped working, to avoid frustration and such
[14:54:11] Quarry: Quarry down? - https://phabricator.wikimedia.org/T392107 (Alien333) NEW
[14:55:33] Striker: Make it possible to maintain Toolforge tools via an easy-to-use web interface instead of a command-line one - https://phabricator.wikimedia.org/T332480#10748243 (dcaro) p:Triage→High
[14:56:02] Striker: Make it possible to maintain Toolforge tools via an easy-to-use web interface instead of a command-line one - https://phabricator.wikimedia.org/T332480#10748244 (dcaro)
[14:56:03] cloud-services-team, Toolforge, Epic: [Epic] Toolforge UI: Discovery - https://phabricator.wikimedia.org/T375914#10748245 (dcaro)
[14:56:40] Quarry: Quarry down? - https://phabricator.wikimedia.org/T392107#10748247 (Alien333)
[14:56:41] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: wikireplicas: maintain-views should not create _p databases - https://phabricator.wikimedia.org/T392105#10748246 (fnegri)
[14:57:32] Striker: Add a web shell allowing people to perform actions as their tool from striker - https://phabricator.wikimedia.org/T144713#10748271 (dcaro) I think this is the `terminado` software that is mentioned there (very interesting) https://github.com/yuvipanda/jupyterhub-ssh
[15:00:19] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
[15:01:57] cloud-services-team (FY2024/2025-Q3-Q4), DC-Ops, ops-eqiad: Temperature Inlet Temp issue on clouddumps1001:9290 - https://phabricator.wikimedia.org/T383723#10748293 (fnegri) Thanks @Jclark-ctr, do you think there is a way to disable the sensor so that it will not trigger the alert? We could also sile...
[15:02:38] (approved) dcaro: [builds-api] create harbor project before getting quota [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/128 (https://phabricator.wikimedia.org/T353701) (owner: raymond-ndibe)
[15:04:49] (approved) dcaro: registry-admission: bump to 0.0.60-20250416135747-60a94ec2 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/747 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[15:04:52] (merge) dcaro: registry-admission: bump to 0.0.60-20250416135747-60a94ec2 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/747 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[15:06:36] Quarry: quarry.wmcloud.org: "This web service cannot be reached" - https://phabricator.wikimedia.org/T392107#10748309 (Aklapper)
[15:08:08] Quarry: quarry.wmcloud.org: "This web service cannot be reached" - https://phabricator.wikimedia.org/T392107#10748324 (Alien333)
[15:23:35] cloud-services-team, Cloud-VPS: gitlab ci: validate secrets settings in pipeline for tofu integration - https://phabricator.wikimedia.org/T391467#10748385 (aborrero) data point: a MR from a contributor fork did not have access to the secrets in the pipeline: https://gitlab.wikimedia.org/repos/cloud/toolf...
[15:37:50] (open) fnegri: start-devenv: don't ask if you want to edit config [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/237
[15:38:05] (update) fnegri: start-devenv: don't ask if you want to edit config [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/237
[15:40:23] (update) fnegri: start-devenv: don't ask if you want to edit config [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/237
[15:48:15] cloud-services-team, Horizon: horizon: service account has role in project but it doesn't shows up in horizon - https://phabricator.wikimedia.org/T392116 (aborrero) NEW
[15:49:07] cloud-services-team, Horizon: horizon: service account has role in project but it doesn't shows up in horizon - https://phabricator.wikimedia.org/T392116#10748533 (aborrero)
[15:49:20] cloud-services-team, Horizon: horizon: service account has role in project but it doesn't shows up in horizon - https://phabricator.wikimedia.org/T392116#10748534 (aborrero) p:Triage→Medium
[15:55:29] cloud-services-team, Horizon: horizon: service account has role in project but it doesn't shows up in horizon - https://phabricator.wikimedia.org/T392116#10748562 (aborrero) Open→Resolved @Andrew removed the user from the project, added it again, and it is now showing up as expected.
[16:01:23] (open) aborrero: eqiad1: testlabs: bump floating IP quotas [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/200 (https://phabricator.wikimedia.org/T391325)
[16:01:26] cloud-services-team, Data-Services: Remove the compatibility layer of block schema in wikireplicas - https://phabricator.wikimedia.org/T390767#10748628 (fnegri) @Ladsgroup I can write an email to cloud-announce to inform users of the upcoming change. What is the change exactly? Can you prepare a patch an...
[16:01:34] cloud-services-team, Data-Services: Remove the compatibility layer of block schema in wikireplicas - https://phabricator.wikimedia.org/T390767#10748630 (fnegri) p:Triage→Medium
[16:02:12] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for service: project,cinder
[16:02:20] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for service: project,cinder
[16:03:45] (approved) andrew: eqiad1: testlabs: bump floating IP quotas [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/200 (https://phabricator.wikimedia.org/T391325) (owner: aborrero)
[16:49:15] (merge) aborrero: eqiad1: testlabs: bump floating IP quotas [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/200 (https://phabricator.wikimedia.org/T391325)
[16:49:17] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[16:49:50] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[16:50:04] Cloud Services Proposals, cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Data-Platform-SRE (2025-04-12 - 2025-05-02): Decision request - Who runs wikireplicas cookbooks - https://phabricator.wikimedia.org/T382607#10748874 (fnegri) Stalled→In progress This ti...
[16:50:27] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: wikireplicas: maintain-views should not create _p databases - https://phabricator.wikimedia.org/T392105#10748890 (fnegri) p:Triage→Medium
[16:51:45] cloud-services-team, Data-Services: maintain-views: skip new databases that have not been sanitized yet - https://phabricator.wikimedia.org/T375779#10748895 (fnegri) Related: {T392105}
[16:56:58] cloud-services-team, Toolforge (Toolforge iteration 19): [jobs-api] Introduce deprecation metrics - https://phabricator.wikimedia.org/T390137#10748910 (Raymond_Ndibe)
[16:58:25] cloud-services-team, Toolforge (Toolforge iteration 19), Epic: [jobs-api] expose jobs-api continuous jobs to the internet via `toolname.toolforge.org`, just like webservice - https://phabricator.wikimedia.org/T388092#10748932 (Raymond_Ndibe)
[16:59:14] cloud-services-team, DC-Ops, decommission-hardware, ops-eqiad, SRE: decommission cloudcontrol1005.eqiad.wmnet - https://phabricator.wikimedia.org/T391413#10748937 (VRiley-WMF)
[16:59:44] cloud-services-team, DC-Ops, decommission-hardware, ops-eqiad, SRE: decommission cloudcontrol1005.eqiad.wmnet - https://phabricator.wikimedia.org/T391413#10748943 (VRiley-WMF) Open→Resolved a:VRiley-WMF This is completed
[17:04:22] FIRING: HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:09:22] RESOLVED: HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:14:14] (update) dcaro: [builds-api] create harbor project before getting quota [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/128 (https://phabricator.wikimedia.org/T353701) (owner: raymond-ndibe)
[17:41:49] cloud-services-team, Cloud-VPS, VPS-Project-wikicommunityhealth: [cinder] Volume failing to attach/detach - https://phabricator.wikimedia.org/T392089#10749105 (Andrew) I was not able to get this volume to behave reasonably, but I restored its data to a new volume named 'frontrestore'. That new volum...
[17:53:12] Quarry: quarry.wmcloud.org: "This web service cannot be reached" - https://phabricator.wikimedia.org/T392107#10749174 (SD0001) From the logs: ` redis.exceptions.ResponseError: MISCONF Redis is configured to save RDB snapshots, but it's currently unable to persist to disk. Commands that may modify the data s...
[17:54:10] cloud-services-team, Cloud-VPS, VPS-Project-wikicommunityhealth: [cinder] Volume failing to attach/detach - https://phabricator.wikimedia.org/T392089#10749190 (Andrew) p:High→Medium
[17:55:08] cloud-services-team, Quarry: quarry.wmcloud.org: "This web service cannot be reached" - https://phabricator.wikimedia.org/T392107#10749199 (bd808) p:Triage→High
[18:06:52] cloud-services-team, Quarry: quarry.wmcloud.org: "This web service cannot be reached" - https://phabricator.wikimedia.org/T392107#10749254 (SD0001) Redis RDB persistence is failing as the pod is out of disk space. ` 4134:C 16 Apr 2025 17:53:32.082 # Failed opening the temp RDB file temp-4134.rdb (in ser...
[18:17:08] cloud-services-team, Quarry: No alerting for quarry - https://phabricator.wikimedia.org/T392138 (Andrew) NEW
[18:26:48] !log taavi@cloudcumin1001 quarry START - Cookbook wmcs.vps.add_user_to_project for user 'taavi' in role 'reader'
[18:26:52] !log taavi@cloudcumin1001 quarry END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'taavi' in role 'reader'
[18:34:05] Tool-multitrack-drafting: Add “mul” label on created tracks - https://phabricator.wikimedia.org/T392139 (JeanFred) NEW
[18:49:25] (update) raymond-ndibe: [builds-api] create harbor project before getting quota [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/128 (https://phabricator.wikimedia.org/T353701)
[18:49:45] (update) raymond-ndibe: [builds-api] create harbor project before getting quota [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/128 (https://phabricator.wikimedia.org/T353701)
[18:49:58] (approved) raymond-ndibe: [builds-api] create harbor project before getting quota [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/128 (https://phabricator.wikimedia.org/T353701)
[18:50:07] cloud-services-team, Quarry: Update quarry redis deployment - https://phabricator.wikimedia.org/T392141 (Andrew) NEW
[18:51:12] cloud-services-team, Quarry: Quarry: Why so many web pods? - https://phabricator.wikimedia.org/T392143 (Andrew) NEW
[18:53:02] (PS1) Ssingh: secret: rename ech-durum.pem [labs/private] - https://gerrit.wikimedia.org/r/1137051
[18:53:22] (approved) raymond-ndibe: start-devenv: don't ask if you want to edit config [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/237 (owner: fnegri)
[18:53:23] (update) raymond-ndibe: start-devenv: don't ask if you want to edit config [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/237 (owner: fnegri)
[18:54:36] (CR) Ssingh: [V:+2 C:+2] secret: rename ech-durum.pem [labs/private] - https://gerrit.wikimedia.org/r/1137051 (owner: Ssingh)
[18:55:00] (merge) raymond-ndibe: [builds-api] create harbor project before getting quota [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/128 (https://phabricator.wikimedia.org/T353701)
[18:57:14] cloud-services-team, Quarry: quarry.wmcloud.org: "This web service cannot be reached" - https://phabricator.wikimedia.org/T392107#10749506 (Andrew) Open→Resolved a:Andrew This seems to have been a disk space issue on one of the worker nodes. I rebooted both nodes, and then taavi killed ex...
[18:58:24] cloud-services-team, Toolforge: [k8s,infra] Upgrade Toolforge to Uwubernetes (1.30) - https://phabricator.wikimedia.org/T362869#10749513 (dcaro)
[19:01:21] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: builds-api: bump to 0.0.186-20250416185507-e06551d7 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/748 (https://phabricator.wikimedia.org/T353701)
[19:02:36] (update) raymond-ndibe: [envvars-api] fix envvars-api EnvvarName regex bug [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/54 (https://phabricator.wikimedia.org/T391966)
[19:09:08] (update) raymond-ndibe: [envvars-api] fix envvars-api EnvvarName regex bug [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/54 (https://phabricator.wikimedia.org/T391966)
[19:10:14] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[19:15:15] cloud-services-team, Cloud-VPS: tf-infra-test misbehavior in codfw1dev - https://phabricator.wikimedia.org/T391718#10749604 (Andrew) > [] When it comes time for "tofu destroy -var datacenter=codfw1dev" my ssh session is killed sometime during the process. This happens when the floating IP is disassocia...
[19:21:05] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
[19:22:50] (update) raymond-ndibe: [jobs-api] delete completed one-off jobs when getting jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/157 (https://phabricator.wikimedia.org/T352989)
[19:23:41] (update) raymond-ndibe: [jobs-api] delete completed one-off job of same name in create_job [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/157 (https://phabricator.wikimedia.org/T352989)
[19:27:21] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[19:30:14] !log raymond-ndibe@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[19:33:28] (update) raymond-ndibe: [envvars-api] fix envvars-api EnvvarName regex bug [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/54 (https://phabricator.wikimedia.org/T391966)
[19:33:38] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[19:45:55] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
[19:47:27] (update) raymond-ndibe: builds-api: bump to 0.0.186-20250416185507-e06551d7 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/748 (https://phabricator.wikimedia.org/T353701) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[19:47:29] (approved) raymond-ndibe: builds-api: bump to 0.0.186-20250416185507-e06551d7 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/748 (https://phabricator.wikimedia.org/T353701) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[19:47:36] (merge) raymond-ndibe: builds-api: bump to 0.0.186-20250416185507-e06551d7 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/748 (https://phabricator.wikimedia.org/T353701) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[19:55:16] cloud-services-team, Toolforge (Toolforge iteration 19), Patch-For-Review: [builds-cli,builds-api] `build quota` fails if tool has no builds - https://phabricator.wikimedia.org/T353701#10749712 (Raymond_Ndibe) In progress→Resolved
[20:10:51] PAWS: [bug] - https://phabricator.wikimedia.org/T392150 (Mgagat) NEW
[23:02:50] (update) raymond-ndibe: [envvars-api] fix envvars-api EnvvarName regex bug [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/54 (https://phabricator.wikimedia.org/T391966)
[23:08:18] (update) raymond-ndibe: [envvars-api] fix envvars-api EnvvarName regex bug [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/54 (https://phabricator.wikimedia.org/T391966)
[23:25:25] (update) raymond-ndibe: [envvars-api] fix envvars-api EnvvarName regex bug [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/54 (https://phabricator.wikimedia.org/T391966)
[23:58:42] Quarry: [bug] Quarry queries don't run - https://phabricator.wikimedia.org/T392169 (Liz) NEW