[01:17:31] (PS1) Legoktm: Add .wav as timed media [labs/tools/fileprotectionsync] - https://gerrit.wikimedia.org/r/1071364
[01:20:42] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:26:21] (PS1) Legoktm: Wrap job in a 9m timeout [labs/tools/fileprotectionsync] - https://gerrit.wikimedia.org/r/1071365
[01:44:42] (update) raymond-ndibe: [toolforge-deploy] upgrade cert-manager [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/517 (https://phabricator.wikimedia.org/T359641)
[01:45:41] Cloud-VPS (Debian Buster Deprecation), Wikispore: Rebuild Wikispore Vagrant boxes on Bullseye or Bookworm - https://phabricator.wikimedia.org/T365934#10128531 (Tgr) Thanks @Samwilson! Since we wanted to migrate away from Vagrant eventually anyway, there is no point in trying to get it working on Bul...
[01:46:52] Cloud-VPS (Debian Buster Deprecation), Wikispore: Rebuild Wikispore Vagrant boxes on Bullseye or Bookworm - https://phabricator.wikimedia.org/T365934#10128529 (Samwilson) Open→Declined > (that would be T322991) Ah, thanks! I think this task should be declined then (as we're not going to cont...
[02:01:57] FIRING: SystemdUnitDown: The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[02:52:00] (CR) AntiCompositeNumber: [C:+2] Wrap job in a 9m timeout [labs/tools/fileprotectionsync] - https://gerrit.wikimedia.org/r/1071365 (owner: Legoktm)
[02:52:25] (Merged) jenkins-bot: Wrap job in a 9m timeout [labs/tools/fileprotectionsync] - https://gerrit.wikimedia.org/r/1071365 (owner: Legoktm)
[04:01:57] RESOLVED: SystemdUnitDown: The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[05:20:42] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[06:36:15] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Data-Persistence, Data-Persistence-SRE: [wikireplicas] Update Admin docs - https://phabricator.wikimedia.org/T365717#10128591 (ABran-WMF)
[06:53:33] Tools, Infrastructure-Foundations: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman - https://phabricator.wikimedia.org/T371644#10128610 (SLyngshede-WMF) @Dzahn Sorry, Hal was allowed to keep many of his permissions as he'll still b... 
[07:42:02] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Data-Persistence, Data-Persistence-SRE: [wikireplicas] Update Admin docs - https://phabricator.wikimedia.org/T365717#10128737 (ABran-WMF) >>! In T365717#10126627, @fnegri wrote: > @ABran-WMF I would appreciate if you could do a quick review...
[07:57:12] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 14): toolforge: prometheus server died - https://phabricator.wikimedia.org/T370143#10128798 (dcaro) In progress→Resolved The service has been stable for the last few days, and we are gathering the metrics we wanted :) Cl...
[08:30:31] Data-Services, DBA, Data-Platform-SRE (2024.09.06 - 2024.09.27): Prepare and check storage layer for bdrwiki - https://phabricator.wikimedia.org/T371759#10128906 (ABran-WMF) >>>! In T371759#10125982, @BTullis wrote: > Any views on whether this creation of the `${wiki}_p` database and the associat...
[08:42:39] cloud-services-team, Cloud-VPS, Epic: tofu-infra: the cookbook should use a different git tree copy than the main one - https://phabricator.wikimedia.org/T374022#10128979 (aborrero) p:Triage→Medium
[08:43:34] cloud-services-team, Cloud-VPS: tofu-infra: extend coverage to Designate DNS data - https://phabricator.wikimedia.org/T374338 (aborrero) NEW
[08:46:39] cloud-services-team, Cloud-VPS: tofu-infra: extend coverage to Designate DNS data - https://phabricator.wikimedia.org/T374338#10129005 (aborrero) Open→In progress p:Triage→Medium
[09:15:39] !log dcaro@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T373986)
[09:15:45] T373986: cloudsw1-c8-eqiad is unstable - https://phabricator.wikimedia.org/T373986
[09:20:42] FIRING: CloudVPSDesignateLeaks: Detected 5 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - 
https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:30:53] cloud-services-team: SystemdUnitDown Unit purge_vm_rbd_images.service on node cloudcontrol1005 has been down for long. - https://phabricator.wikimedia.org/T374313#10129130 (dcaro) Open→Resolved a:dcaro Transient error due to image going away while doing the cleanup
[09:35:24] cloud-services-team: SystemdUnitDown Unit opentofu-infra-diff.service on node cloudcontrol1007 has been down for long. - https://phabricator.wikimedia.org/T374295#10129146 (dcaro) Open→Resolved a:dcaro Not failing anymore
[09:37:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[09:37:47] (open) aborrero: tofu-infra: add support for DNS zones and import them [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38 (https://phabricator.wikimedia.org/T374338)
[09:37:53] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[09:38:01] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[09:39:48] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[09:39:49] (update) aborrero: tofu-infra: add support for DNS zones and import them [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38 
(https://phabricator.wikimedia.org/T374338)
[09:40:19] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[09:42:18] (update) aborrero: tofu-infra: add support for DNS zones and import them [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38 (https://phabricator.wikimedia.org/T374338)
[09:42:19] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[09:42:45] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[09:44:08] (update) aborrero: tofu-infra: add support for DNS zones and import them [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38 (https://phabricator.wikimedia.org/T374338)
[09:44:09] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[09:44:25] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[09:49:49] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[09:49:51] (update) aborrero: tofu-infra: add support for DNS zones and import them [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38 (https://phabricator.wikimedia.org/T374338)
[09:49:59] 
!log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[09:51:38] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[09:51:39] (update) aborrero: tofu-infra: add support for DNS zones and import them [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38 (https://phabricator.wikimedia.org/T374338)
[09:51:53] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[09:58:16] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[09:58:27] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[09:59:19] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[09:59:33] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[10:07:25] Cloud-VPS (Debian Buster Deprecation), linkwatcher: Cloud VPS "linkwatcher" project Buster deprecation - https://phabricator.wikimedia.org/T367536#10129246 (fnegri)
[10:09:29] Cloud-VPS (Debian Buster Deprecation), linkwatcher: Cloud VPS "linkwatcher" project Buster deprecation - https://phabricator.wikimedia.org/T367536#10129243 (fnegri) 
Open→Resolved a:Beetstra Thank you @Beetstra! I will mark this as Resolved.
[10:11:40] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[10:11:51] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[10:12:40] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[10:12:41] (update) aborrero: tofu-infra: add support for DNS zones and import them [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38 (https://phabricator.wikimedia.org/T374338)
[10:12:51] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[10:47:55] (CR) David Caro: Revert^2 "openstack.tofu: use run_script instead of reimplementing it" (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1070020 (owner: David Caro)
[10:54:34] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[10:54:35] (update) aborrero: tofu-infra: add support for DNS zones and import them [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38 (https://phabricator.wikimedia.org/T374338)
[10:54:48] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[10:54:56] !log 
aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[10:55:34] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[10:55:57] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[10:56:01] (update) aborrero: tofu-infra: add support for DNS zones and import them [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38 (https://phabricator.wikimedia.org/T374338)
[10:56:24] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[11:10:10] PAWS: upgrade ansible - https://phabricator.wikimedia.org/T374349 (rook) NEW
[11:11:43] vivian-rook opened https://github.com/toolforge/paws/pull/453
[11:16:04] vivian-rook closed https://github.com/toolforge/paws/pull/453
[11:16:24] PAWS: upgrade ansible - https://phabricator.wikimedia.org/T374349#10129438 (rook) Open→Resolved a:rook
[11:20:07] Tools: Lexeme-forms on Toolforge returns error - https://phabricator.wikimedia.org/T374344#10129443 (Bugreporter)
[11:35:10] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[11:35:48] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[11:35:57] (update) aborrero: tofu-infra: add support for DNS zones and import them 
[repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38 (https://phabricator.wikimedia.org/T374338)
[11:36:52] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[11:37:22] (CR) Arturo Borrero Gonzalez: [C:+2] "LGTM." [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1070020 (owner: David Caro)
[11:37:24] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[11:41:31] (Merged) jenkins-bot: Revert^2 "openstack.tofu: use run_script instead of reimplementing it" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1070020 (owner: David Caro)
[11:50:50] Tools: Lexeme-forms on Toolforge returns error - https://phabricator.wikimedia.org/T374344#10129530 (Fnielsen) See also https://github.com/lucaswerkmeister/tool-lexeme-forms/issues/226
[12:22:04] (approved) fnegri: tofu-infra: add support for DNS zones and import them [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38 (https://phabricator.wikimedia.org/T374338) (owner: aborrero)
[12:22:46] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. 
[labs/tools/massmailer] - https://gerrit.wikimedia.org/r/1071597 (owner: L10n-bot)
[12:28:51] (PS1) Arturo Borrero Gonzalez: wmcs.openstack.tofu: don't collapse MR notes with plans if they are small [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1071600 (https://phabricator.wikimedia.org/T370414)
[12:29:05] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[12:29:34] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[12:38:50] (PS2) Arturo Borrero Gonzalez: wmcs.openstack.tofu: don't collapse MR notes with plans if they are small [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1071600 (https://phabricator.wikimedia.org/T370414)
[12:38:51] (update) aborrero: tofu-infra: add support for DNS zones and import them [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38 (https://phabricator.wikimedia.org/T374338)
[12:39:06] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[12:39:39] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[12:40:16] (PS3) Arturo Borrero Gonzalez: wmcs.openstack.tofu: don't collapse MR notes with plans if they are small [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1071600 (https://phabricator.wikimedia.org/T370414)
[12:42:22] (CR) Arturo Borrero Gonzalez: [C:+2] wmcs.openstack.tofu: don't collapse MR notes with plans if they are small [cloud/wmcs-cookbooks] - 
https://gerrit.wikimedia.org/r/1071600 (https://phabricator.wikimedia.org/T370414) (owner: Arturo Borrero Gonzalez)
[12:46:22] (Merged) jenkins-bot: wmcs.openstack.tofu: don't collapse MR notes with plans if they are small [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1071600 (https://phabricator.wikimedia.org/T370414) (owner: Arturo Borrero Gonzalez)
[13:20:42] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:23:27] FIRING: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[13:37:03] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 14), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 2 others: [maintain-dbusers] Generate prometheus metrics - https://phabricator.wikimedia.org/T332955#10129864 (dcaro) Done, dashboard is in https://grafana... 
[13:38:24] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 14), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 2 others: [maintain-dbusers] Generate prometheus metrics - https://phabricator.wikimedia.org/T332955#10129888 (dcaro)
[13:40:35] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 14), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 2 others: [maintain-dbusers] Generate prometheus metrics - https://phabricator.wikimedia.org/T332955#10129859 (dcaro)
[13:40:51] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 14), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 2 others: [maintain-dbusers] Generate prometheus metrics - https://phabricator.wikimedia.org/T332955#10129894 (dcaro) a:dcaro
[13:44:34] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 14), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 2 others: [maintain-dbusers] Generate prometheus metrics - https://phabricator.wikimedia.org/T332955#10129896 (dcaro) Open→Resolved
[13:54:12] cloud-services-team (FY2024/2025-Q1-Q2): Drain C8 rack - https://phabricator.wikimedia.org/T374043#10129965 (dcaro)
[14:11:06] Quarry: Set query result retention time - https://phabricator.wikimedia.org/T360041#10130050 (rook) https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/JA4F2K4EBEC3CMS54JDTJBMRAPKND2NN/
[14:12:52] (update) aborrero: tofu-infra: add support for DNS zones and import them [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38 (https://phabricator.wikimedia.org/T374338)
[14:12:59] (merge) aborrero: tofu-infra: add support for DNS zones and import them [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38 
(https://phabricator.wikimedia.org/T374338)
[14:13:20] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[14:14:24] (PS1) Arturo Borrero Gonzalez: wmcs.openstack.tofu: fix missing quote [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1071622
[14:14:55] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[14:15:50] (CR) Arturo Borrero Gonzalez: [C:+2] wmcs.openstack.tofu: fix missing quote [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1071622 (owner: Arturo Borrero Gonzalez)
[14:17:16] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Data-Persistence, Data-Persistence-SRE: [wikireplicas] Update Admin docs - https://phabricator.wikimedia.org/T365717#10130071 (fnegri) @ABran-WMF thank you for reviewing! > I've found that part that seems a bit old What exactly is incorrec...
[14:19:19] (Merged) jenkins-bot: wmcs.openstack.tofu: fix missing quote [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1071622 (owner: Arturo Borrero Gonzalez)
[14:20:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[14:26:26] cloud-services-team, Data-Engineering, Cloud-Services-Origin-User: WMCS-roots paging responsibilities - https://phabricator.wikimedia.org/T344608#10130139 (fnegri) p:Triage→Medium
[14:26:47] cloud-services-team, Data-Engineering, Cloud-Services-Origin-User: WMCS-roots paging responsibilities - https://phabricator.wikimedia.org/T344608#10130140 (fnegri)
[14:26:49] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Infrastructure Security: 
wikireplicas root access - https://phabricator.wikimedia.org/T344599#10130141 (fnegri)
[14:29:00] !log raymondndibe@wmf3402 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-control-11 from 1.26.15 to 1.27.16 (T359641)
[14:29:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:29:06] T359641: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641
[14:29:38] !log raymondndibe@wmf3402 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node toolsbeta-test-k8s-control-11 from 1.26.15 to 1.27.16 (T359641)
[14:29:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:31:41] (open) aborrero: data: organize data into directories [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/39 (https://phabricator.wikimedia.org/T374338)
[14:44:25] (PS2) Raymond Ndibe: [wmcs-cookbook] update toolsbeta-test-k8s-control vms [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1071052 (https://phabricator.wikimedia.org/T359641)
[14:44:31] Quarry: upgrade ansible - https://phabricator.wikimedia.org/T374362 (rook) NEW
[14:46:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[14:49:53] Quarry: upgrade ansible - https://phabricator.wikimedia.org/T374362#10130274 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/quarry/pull/69
[14:50:10] vivian-rook opened https://github.com/toolforge/quarry/pull/69
[14:54:46] vivian-rook closed https://github.com/toolforge/quarry/pull/69
[14:55:26] 
Quarry: upgrade ansible - https://phabricator.wikimedia.org/T374362#10130306 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/quarry/pull/69
[14:56:41] Quarry: upgrade ansible - https://phabricator.wikimedia.org/T374362#10130299 (rook) Open→Resolved a:rook
[15:02:23] Quarry: Upgrade to Ansible 10.3.0 - https://phabricator.wikimedia.org/T374362#10130341 (Aklapper)
[15:04:20] !log dcaro@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0) (T373986)
[15:04:25] T373986: cloudsw1-c8-eqiad is unstable - https://phabricator.wikimedia.org/T373986
[15:05:10] (CR) David Caro: "Already done in I6cb4f5bc555acd860b37161e81dc8a46d9c08cd7" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1071052 (https://phabricator.wikimedia.org/T359641) (owner: Raymond Ndibe)
[15:05:49] !log dcaro@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T373986)
[15:06:10] cloud-services-team (FY2024/2025-Q1-Q2): Drain C8 rack - https://phabricator.wikimedia.org/T374043#10130367 (dcaro)
[15:11:35] (CR) Raymond Ndibe: "ok. declining then" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1071052 (https://phabricator.wikimedia.org/T359641) (owner: Raymond Ndibe)
[15:11:47] Quarry: Set query result retention time - https://phabricator.wikimedia.org/T360041#10130397 (Nemo_bis) > The query itself will remain, so getting fresh results should be nothing more than a submit query away. That's not quite accurate when the purpose of the query is to get trends, for example in the numbe... 
[15:11:56] (Abandoned) Raymond Ndibe: [wmcs-cookbook] update toolsbeta-test-k8s-control vms [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1071052 (https://phabricator.wikimedia.org/T359641) (owner: Raymond Ndibe)
[15:12:39] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[15:14:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[15:18:39] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-control-11 from 1.26.15 to 1.27.16 (T359641)
[15:18:41] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001"
[15:18:41] T359641: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641
[15:19:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[15:22:53] PAWS: Upgrade to Ansible 10.3.0 - https://phabricator.wikimedia.org/T374349#10130458 (rook)
[15:23:32] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/39
[15:24:06] !log 
aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/39
[15:24:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[15:24:23] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node toolsbeta-test-k8s-control-11 from 1.26.15 to 1.27.16 (T359641)
[15:24:23] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001"
[15:24:24] T359641: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641
[15:25:09] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-control-11 from 1.26.15 to 1.27.16 (T359641)
[15:25:09] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001"
[15:25:10] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-control-11 from 1.26.15 to 1.27.16 (T359641)
[15:25:10] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001"
[15:25:18] (merge) aborrero: data: organize data into directories [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/39 (https://phabricator.wikimedia.org/T374338)
[15:25:58] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-control-10 from 1.26.15 to 1.27.16 (T359641)
[15:25:58] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001"
[15:26:24] Quarry: Set query result 
retention time - https://phabricator.wikimedia.org/T360041#10130460 (rook) >>! In T360041#10130397, @Nemo_bis wrote: >> The query itself will remain, so getting fresh results should be nothing more than a submit query away. > > That's not quite accurate when the purpose of the query...
[15:28:25] Tool-video-answer-tool, Future-Audiences: Improve rendering of images - https://phabricator.wikimedia.org/T374367 (Maryana) NEW
[15:31:11] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-control-10 from 1.26.15 to 1.27.16 (T359641)
[15:31:12] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001"
[15:31:13] T359641: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641
[15:33:19] Tool-video-answer-tool, Future-Audiences, Spike: Research TTS options - https://phabricator.wikimedia.org/T374368 (Maryana) NEW
[15:33:48] Tool-video-answer-tool, Future-Audiences, Spike: Improve rendering of images - https://phabricator.wikimedia.org/T374367#10130546 (Maryana)
[15:37:12] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[15:38:32] Quarry: Set query result retention time - https://phabricator.wikimedia.org/T360041#10130581 (SD0001) Is there a benefit to doing this? According to T178520, disk usage was 112 GB in 2017. I seem to recall it being around 195G last time I checked. Although now, it appears to have mysteriously shrunk to 100G:... 
[15:44:42] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch [15:44:45] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch [15:45:10] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch [15:51:01] 10Quarry: Set query result retention time - https://phabricator.wikimedia.org/T360041#10130639 (10rook) The issue is not one of size, or suspicion that people may think the data is fresh, but the data itself. Periodically there are tickets opened regarding data that has been removed from the wikis but remains in... [15:55:12] (03open) 10aborrero: tofu-infra: introduce DNS records [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/40 (https://phabricator.wikimedia.org/T374338) [16:08:11] RESOLVED: CloudVPSDesignateLeaks: Detected 5 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [16:09:52] !log raymondndibe@wmf3402 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component cert-manager [16:09:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [16:11:18] 10wikitech.wikimedia.org, 10MW-on-K8s, 06serviceops, 13Patch-For-Review: MVP: Privately serve wikitech via mwdebug1001 - https://phabricator.wikimedia.org/T371537#10130750 (10jijiki) [16:11:20] 10Tools, 06Infrastructure-Foundations: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman - https://phabricator.wikimedia.org/T371644#10130754 (10Dzahn) @SLyngshede-WMF I see. Thanks for the update. 
I was just curious how the process works.... [16:14:33] 10wikitech.wikimedia.org, 10MW-on-K8s, 06serviceops, 13Patch-For-Review: MVP: Privately serve wikitech via mwdebug1001 - https://phabricator.wikimedia.org/T371537#10130738 (10jijiki) @Ladsgroup and I tested wikitech.wikimedia.org on mwdebug1001 The following tests worked as expected: * Reading and articl... [16:14:53] !log raymondndibe@wmf3402 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component cert-manager [16:14:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [16:16:45] !log raymondndibe@wmf3402 tools START - Cookbook wmcs.toolforge.component.deploy for component cert-manager [16:16:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [16:23:33] !log raymondndibe@wmf3402 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component cert-manager [16:23:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [16:23:47] 10wikitech.wikimedia.org, 10MW-on-K8s, 06serviceops, 13Patch-For-Review: MVP: Privately serve wikitech via mwdebug1001 - https://phabricator.wikimedia.org/T371537#10130812 (10Ladsgroup) Regarding jobs, since we don't have an easy control where it'll be consumed, it will use the default wikitech config whic... 
[16:25:03] (03update) 10raymond-ndibe: [toolforge-deploy] upgrade cert-manager [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/517 (https://phabricator.wikimedia.org/T359641) [16:25:09] (03approved) 10raymond-ndibe: [toolforge-deploy] upgrade cert-manager [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/517 (https://phabricator.wikimedia.org/T359641) [16:25:13] (03merge) 10raymond-ndibe: [toolforge-deploy] upgrade cert-manager [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/517 (https://phabricator.wikimedia.org/T359641) [16:26:32] 10Cloud Services Proposals, 06cloud-services-team, 10Toolforge: Decision Request: To strictly enforce semantic versioning rules for toolforge services' APIs or not - https://phabricator.wikimedia.org/T373072#10130837 (10dcaro) > https://github.com/Tufin/oasdiff (Apache License 2.0) . This can detect brea... 
[16:28:28] 10Tool-video-answer-tool, 06Future-Audiences, 07Spike: Research getting image attribution via API - https://phabricator.wikimedia.org/T374375 (10Maryana) 03NEW [16:29:57] 10Tool-video-answer-tool, 06Future-Audiences, 07Spike: Research getting image attribution via API - https://phabricator.wikimedia.org/T374375#10130865 (10Maryana) [16:30:26] 10Tool-video-answer-tool, 06Future-Audiences: Implement attribution requirements - https://phabricator.wikimedia.org/T374376 (10Maryana) 03NEW [16:36:32] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-1 from 1.26.15 to 1.27.16 (T359641) [16:36:33] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [16:36:34] T359641: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641 [16:37:22] Raymond_Ndibe: I see you are going ahead with the upgrade \o/ [16:37:38] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-worker-nfs-1 from 1.26.15 to 1.27.16 (T359641) [16:37:38] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [16:38:48] uuuhhh, interesting error [16:39:02] (the irc logging one) [16:39:40] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-2 from 1.26.15 to 1.27.16 (T359641) [16:39:40] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [16:40:42] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-worker-nfs-2 from 1.26.15 to 1.27.16 (T359641) [16:40:42] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [16:41:10] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-3 from 1.26.15 to 1.27.16 
(T359641) [16:41:10] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [16:42:14] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node toolsbeta-test-k8s-worker-nfs-3 from 1.26.15 to 1.27.16 (T359641) [16:42:15] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [16:42:16] T359641: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641 [16:47:35] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-3 from 1.26.15 to 1.27.16 (T359641) [16:47:36] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [16:47:36] T359641: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641 [16:48:39] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node toolsbeta-test-k8s-worker-nfs-3 from 1.26.15 to 1.27.16 (T359641) [16:48:39] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [16:50:04] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-3 from 1.26.15 to 1.27.16 (T359641) [16:50:04] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [16:50:45] !log raymond-ndibe@cloudcumin1001 toolsbeta END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=97) for node toolsbeta-test-k8s-worker-nfs-3 from 1.26.15 to 1.27.16 (T359641) [16:50:45] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [16:52:37] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-3 from 1.26.15 to 1.26.15 (T359641) [16:52:39] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [16:52:39] T359641: [infra,k8s] Upgrade Toolforge Kubernetes to version 
1.27 - https://phabricator.wikimedia.org/T359641 [16:53:04] !log raymond-ndibe@cloudcumin1001 toolsbeta END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=97) for node toolsbeta-test-k8s-worker-nfs-3 from 1.26.15 to 1.26.15 (T359641) [16:53:05] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:03:51] (03update) 10dcaro: cronjob: add simple cronjob [toolforge-repos/sample-complex-app-backend] - 10https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/3 (https://phabricator.wikimedia.org/T368602) [17:05:33] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-3 from 1.26.15 to 1.27.16 (T359641) [17:05:34] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:05:35] T359641: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641 [17:06:31] (03open) 10dcaro: ChecksDashboard: add scheduled job check [toolforge-repos/sample-complex-app-frontend] (fix_color_scheme) - 10https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-frontend/-/merge_requests/6 [17:06:38] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node toolsbeta-test-k8s-worker-nfs-3 from 1.26.15 to 1.27.16 (T359641) [17:06:38] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:06:48] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-4 from 1.26.15 to 1.27.16 (T359641) [17:06:49] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:07:22] !log raymond-ndibe@cloudcumin1001 toolsbeta END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=97) for node toolsbeta-test-k8s-worker-nfs-4 from 1.26.15 to 1.27.16 (T359641) [17:07:22] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:08:08] 
(03update) 10dcaro: cronjob: add simple cronjob [toolforge-repos/sample-complex-app-backend] - 10https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/3 (https://phabricator.wikimedia.org/T368602) [17:08:21] 10Tool-video-answer-tool, 06Future-Audiences: Implement attribution requirements - https://phabricator.wikimedia.org/T374376#10131054 (10Maryana) Waiting for recommendations, coming next week [17:09:56] 10Tool-video-answer-tool, 06Future-Audiences, 07Spike: Research getting image attribution via API - https://phabricator.wikimedia.org/T374375#10131057 (10Maryana) Some design decisions: currently rendering license as string that appears on Commons (e.g. includes PD details; simplify CC?) Might be additional... [17:10:20] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component kyverno [17:10:20] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:10:34] 10Tool-video-answer-tool, 06Future-Audiences, 07Spike: Research getting image attribution via API - https://phabricator.wikimedia.org/T374375#10131059 (10Maryana) Styling is coming as part of bigger attribution guidelines. 
[17:12:05] (03update) 10dcaro: ChecksDashboard: add scheduled job check [toolforge-repos/sample-complex-app-frontend] (fix_color_scheme) - 10https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-frontend/-/merge_requests/6 [17:15:24] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component kyverno [17:15:25] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:16:02] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-3 from 1.26.15 to 1.27.16 (T359641) [17:16:02] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:16:02] T359641: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641 [17:17:02] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-worker-nfs-3 from 1.26.15 to 1.27.16 (T359641) [17:17:02] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:17:31] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-4 from 1.26.15 to 1.27.16 (T359641) [17:17:32] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:18:42] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-worker-nfs-4 from 1.26.15 to 1.27.16 (T359641) [17:18:42] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:19:48] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-ingress-8 from 1.26.15 to 1.27.16 (T359641) [17:19:48] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:20:45] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook 
wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-ingress-8 from 1.26.15 to 1.27.16 (T359641) [17:20:46] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:20:56] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-ingress-10 from 1.26.15 to 1.27.16 (T359641) [17:20:57] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:20:58] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node toolsbeta-test-k8s-ingress-10 from 1.26.15 to 1.27.16 (T359641) [17:20:58] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:21:23] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-ingress-7 from 1.26.15 to 1.27.16 (T359641) [17:21:24] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:21:24] T359641: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641 [17:22:23] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-ingress-7 from 1.26.15 to 1.27.16 (T359641) [17:22:24] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:22:38] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-ingress-6 from 1.26.15 to 1.27.16 (T359641) [17:22:38] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:22:57] that looks like a bug in the bot, actually [17:23:11] the part that it adds user@host at the start [17:23:34] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-ingress-6 from 1.26.15 to 1.27.16 (T359641) [17:23:34] logmsgbot_cloud: Unknown project 
"raymond-ndibe@cloudcumin1001" [17:23:36] needs to flip around the order, first "toolsbeta" and then that and it would work [17:23:59] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-10 from 1.26.15 to 1.27.16 (T359641) [17:23:59] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:24:47] Raymond_Ndibe: the logs won't be logged because the stashbot expects the log message to start with the project name, followed by everything else [17:24:57] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-worker-10 from 1.26.15 to 1.27.16 (T359641) [17:24:58] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:25:10] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-11 from 1.26.15 to 1.27.16 (T359641) [17:25:10] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:26:09] 10Tool-video-answer-tool, 06Future-Audiences, 07Spike: Improve rendering of images - https://phabricator.wikimedia.org/T374367#10131123 (10Maryana) Found a Ken Burns improvement but might make render time slower but that's probably okay. Couple days of work to implement this approach. [17:26:13] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-worker-11 from 1.26.15 to 1.27.16 (T359641) [17:26:14] logmsgbot_cloud: Unknown project "raymond-ndibe@cloudcumin1001" [17:29:52] 10Tool-video-answer-tool, 06Future-Audiences, 07Spike: Research TTS options - https://phabricator.wikimedia.org/T374368#10131159 (10Maryana) ElevenLabs free API tier rate limits us, but has better features than OpenAI - faster subtitling, bigger diversity of voices. Open source solutions investigated haven'... 
[17:47:22] 10Tool-video-answer-tool, 06Future-Audiences, 07Spike: Improve rendering of images - https://phabricator.wikimedia.org/T374367#10131235 (10Maryana) p:05Triage→03High [18:03:08] 10Tool-video-answer-tool, 06Future-Audiences, 07Spike: Improve rendering of images - https://phabricator.wikimedia.org/T374367#10131286 (10Maryana) a:03derenrich [18:06:15] 10Tool-video-answer-tool, 06Future-Audiences, 07Spike: Research TTS options - https://phabricator.wikimedia.org/T374368#10131288 (10Maryana) p:05Triage→03High a:03derenrich [18:06:33] 10Tool-video-answer-tool, 06Future-Audiences, 07Spike: Research getting image attribution via API - https://phabricator.wikimedia.org/T374375#10131292 (10Maryana) a:03derenrich [18:18:49] (03PS1) 10Urbanecm: app: Fix unprivileged access [labs/tools/massmailer] - 10https://gerrit.wikimedia.org/r/1071676 [19:24:23] FIRING: ToolforgeKubernetesNodeNotReady: Kubernetes node tools-k8s-worker-nfs-62 is not ready - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady [19:29:23] RESOLVED: ToolforgeKubernetesNodeNotReady: Kubernetes node tools-k8s-worker-nfs-62 is not ready - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady [19:44:27] (03update) 10bd808: Make image useful for Brad [toolforge-repos/bd808-buildpack-perl-bastion] - 10https://gitlab.wikimedia.org/toolforge-repos/bd808-buildpack-perl-bastion/-/merge_requests/1 [19:44:56] (03update) 10bd808: Make image useful for Brad [toolforge-repos/bd808-buildpack-perl-bastion] - 10https://gitlab.wikimedia.org/toolforge-repos/bd808-buildpack-perl-bastion/-/merge_requests/1 
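Note on the `Unknown project "raymond-ndibe@cloudcumin1001"` replies discussed between 17:22:57 and 17:24:47: the explanation given in the channel is that the bot expects the project name to be the first word after `!log`, with everything else following it. The following is a minimal sketch of that rule only, not stashbot's actual code; `KNOWN_PROJECTS` and `parse_log_command` are hypothetical names invented for illustration. The real logic is evidently more nuanced, since the `!log raymondndibe@wmf3402 toolsbeta ...` messages at 16:09:52 were logged successfully.

```python
# Hypothetical sketch of the "project name comes first" rule described in the
# channel. This is NOT the real stashbot implementation.

# Assumed set of valid project names, for illustration only.
KNOWN_PROJECTS = {"admin", "tools", "toolsbeta"}


def parse_log_command(text, known_projects=KNOWN_PROJECTS):
    """Split '!log <project> <message>' into (project, message).

    Returns (None, error_text) when the first word after '!log' is not a
    known project, mirroring the 'Unknown project' replies in the log.
    """
    prefix = "!log "
    if not text.startswith(prefix):
        raise ValueError("not a !log command")
    rest = text[len(prefix):].strip()
    # The first whitespace-separated token is treated as the project name.
    project, _, message = rest.partition(" ")
    if project not in known_projects:
        return None, f'Unknown project "{project}"'
    return project, message


# Order used by the cookbook in this log: user@host first, project second.
as_sent = "!log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook"
# Order suggested in the channel: project first, then everything else.
flipped = "!log toolsbeta raymond-ndibe@cloudcumin1001 START - Cookbook"

print(parse_log_command(as_sent))
print(parse_log_command(flipped))
```

Under this simplified rule the first call is rejected and the second is accepted, which matches the suggestion to "flip around the order".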
[20:21:12] 10wikitech.wikimedia.org, 10Local-Wiki-Template-And-Gadget-Issues, 07Mobile: Improve mobile experience on Wikitech (as navbox classes in templates are intentionally not displayed) - https://phabricator.wikimedia.org/T242931#10131846 (10TBurmeister) 05Open→03Resolved a:03TBurmeister I have just chec... [20:37:56] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Toolforge (Toolforge iteration 14): [infra,k8s,kyverno] Toolforge Kyverno low policy resources tools - https://phabricator.wikimedia.org/T373972#10131907 (10dcaro) 05In progress→03Resolved [21:02:31] 10Quarry: Set query result retention time - https://phabricator.wikimedia.org/T360041#10131990 (10Krinkle) Is there a recommended place to paste Quarry results in a way that 1) doesn't automatically expire, 2) is human-readable, and 3) has CSV/JSON export? If we don't recommend such a place, I assume we go from... [21:17:43] (03PS2) 10Krinkle: Add .wav as timed media [labs/tools/fileprotectionsync] - 10https://gerrit.wikimedia.org/r/1071364 (owner: 10Legoktm) [21:18:05] (03PS3) 10Krinkle: Add .wav as timed media [labs/tools/fileprotectionsync] - 10https://gerrit.wikimedia.org/r/1071364 (owner: 10Legoktm) [21:18:17] (03PS4) 10Krinkle: Add .wav as timed media [labs/tools/fileprotectionsync] - 10https://gerrit.wikimedia.org/r/1071364 (owner: 10Legoktm) [21:18:29] (03PS5) 10Krinkle: Add .wav as timed media [labs/tools/fileprotectionsync] - 10https://gerrit.wikimedia.org/r/1071364 (owner: 10Legoktm) [21:23:25] (03CR) 10Krinkle: [C:03+2] Add .wav as timed media [labs/tools/fileprotectionsync] - 10https://gerrit.wikimedia.org/r/1071364 (owner: 10Legoktm) [21:23:53] (03Merged) 10jenkins-bot: Add .wav as timed media [labs/tools/fileprotectionsync] - 10https://gerrit.wikimedia.org/r/1071364 (owner: 10Legoktm) [21:30:51] 10Tools: Lexeme-forms on Toolforge returns error - https://phabricator.wikimedia.org/T374344#10132077 (10LucasWerkmeister) Well, apparently there were some random Wikimedia errors: `name=uwsgi.log 
raise ValueError("Could not decode as JSON:\n{0}" ValueError: Could not decode as JSON: !log dcaro@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99) (T373986) [21:32:14] T373986: cloudsw1-c8-eqiad is unstable - https://phabricator.wikimedia.org/T373986 [21:33:42] 10Toolforge, 07Kubernetes: [jobs-api] Allow Toolforge scheduled jobs to have a maximum runtime - https://phabricator.wikimedia.org/T306391#10132079 (10Krinkle) Every once in a while I get cron jobs stuck in toolforge-jobs/Kubernetes for mysterious reasons. In this state, the job is stuck for multiple days cont... [21:45:27] 10Tools: Lexeme-forms on Toolforge returns error - https://phabricator.wikimedia.org/T374344#10132099 (10LucasWerkmeister) Possible improvement to the error page: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1071715 [23:15:38] (03CR) 10Urbanecm: [C:03+2] app: Fix unprivileged access [labs/tools/massmailer] - 10https://gerrit.wikimedia.org/r/1071676 (owner: 10Urbanecm) [23:15:58] (03Merged) 10jenkins-bot: app: Fix unprivileged access [labs/tools/massmailer] - 10https://gerrit.wikimedia.org/r/1071676 (owner: 10Urbanecm) [23:24:05] 10Toolforge, 07Kubernetes: [jobs-api] Allow Toolforge scheduled jobs to have a maximum runtime - https://phabricator.wikimedia.org/T306391#10132235 (10AntiCompositeNumber) My preferred solution for this is a `concurrencyPolicy` of `Replace`, which translates to "no matter what happened with k8s or the script o... [23:40:03] 10Quarry: Set query result retention time - https://phabricator.wikimedia.org/T360041#10132239 (10rook) I apologize I have yet to understand the interest in old data. The above seems to be suggesting that if the data is retained for 90 days, it would be copied all over the web to be read later. Results that are...
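Note on the `concurrencyPolicy` of `Replace` mentioned at 23:24:05 for T306391: as a rough illustration, the sketch below builds a plain Kubernetes `batch/v1` CronJob manifest as a Python dict with that policy set. It is an assumption-laden example, not how the Toolforge jobs API generates its objects; the function name, job name, image, command and the `activeDeadlineSeconds` value are all hypothetical, and the deadline field is included only because the task is about a maximum runtime.

```python
import json


def cronjob_manifest(name, schedule, image, command, max_runtime_seconds):
    """Build a batch/v1 CronJob manifest (hypothetical helper, illustration only)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            # Replace: when a new run is due and the previous one is still
            # running, the running job is replaced by the new one.
            "concurrencyPolicy": "Replace",
            "jobTemplate": {
                "spec": {
                    # Assumed value: upper bound on how long one run may stay active.
                    "activeDeadlineSeconds": max_runtime_seconds,
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [
                                {"name": name, "image": image, "command": command}
                            ],
                        }
                    },
                }
            },
        },
    }


# All argument values below are placeholders, not taken from the log.
manifest = cronjob_manifest(
    name="example-job",
    schedule="*/10 * * * *",
    image="example.invalid/my-tool:latest",
    command=["./run.sh"],
    max_runtime_seconds=540,
)
print(json.dumps(manifest, indent=2))
```

With `Replace`, a run that is stuck no longer blocks later scheduled runs, which is the failure mode described at 21:33:42.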