[00:04:55] (update) raymond-ndibe: run: mark as skipped if the deploy failed [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/66 (owner: dcaro)
[00:09:47] (approved) raymond-ndibe: run: mark as skipped if the deploy failed [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/66 (owner: dcaro)
[00:10:29] (update) raymond-ndibe: run: mark as skipped if the deploy failed [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/66 (owner: dcaro)
[00:10:34] VPS-project-Phabricator, collaboration-services: Phabricator test project requires email verification but can't send email - https://phabricator.wikimedia.org/T388022#10719791 (Dzahn) @JJMC89 Thank you! The part "//in most cases you can set the SMTP host to localhost and port to 25//" is perfect because...
[00:16:00] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[00:28:39] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[00:56:03] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[00:56:23] (update) raymond-ndibe: components-api: bump to 0.0.99-20250407225903-bc59462a [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/741 (https://phabricator.wikimedia.org/T388830) (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[00:56:31] (approved) raymond-ndibe: components-api: bump to 0.0.99-20250407225903-bc59462a [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/741 (https://phabricator.wikimedia.org/T388830) (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[00:56:39] (merge) raymond-ndibe: components-api: bump to 0.0.99-20250407225903-bc59462a [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/741 (https://phabricator.wikimedia.org/T388830) (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[01:02:01] (update) raymond-ndibe: [jobs-api] move core logic to seperate core module [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/91 (https://phabricator.wikimedia.org/T359804 https://phabricator.wikimedia.org/T359808)
[01:08:25] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
[01:36:50] (update) raymond-ndibe: [jobs-api] move core logic to seperate core module [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/153 (https://phabricator.wikimedia.org/T359804)
[01:41:27] (update) raymond-ndibe: [jobs-api] move core logic to seperate core module [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/153 (https://phabricator.wikimedia.org/T359804)
[02:00:30] (update) raymond-ndibe: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (save_business_models_to_db) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650)
[02:00:45] (update) raymond-ndibe: [jobs-api] custom resource definition deployment templates [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/101 (https://phabricator.wikimedia.org/T359650)
[02:06:27] (update) raymond-ndibe: [jobs-api] move custom validations out of api models [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/150 (https://phabricator.wikimedia.org/T389118)
[02:07:11] (update) raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] (move_most_custom_validations_out_of_api_models) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T359804)
[02:16:22] (update) raymond-ndibe: [jobs-api] move custom validations out of api models [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/150 (https://phabricator.wikimedia.org/T389118)
[02:17:43] (update) raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] (move_most_custom_validations_out_of_api_models) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T359804)
[02:18:11] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[02:18:48] (update) raymond-ndibe: [jobs-api] move custom validations out of api models [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/150 (https://phabricator.wikimedia.org/T389118)
[02:23:52] (update) raymond-ndibe: [jobs-api] stream logs from all containers in all pods [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/148 (https://phabricator.wikimedia.org/T388274)
[02:30:57] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
[02:35:56] (update) raymond-ndibe: builds-api: bump to 0.0.184-20250407201912-10dbacc5 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/740 (https://phabricator.wikimedia.org/T388706 https://phabricator.wikimedia.org/T389954) (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[02:36:05] (update) raymond-ndibe: builds-api: bump to 0.0.184-20250407201912-10dbacc5 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/740 (https://phabricator.wikimedia.org/T388706 https://phabricator.wikimedia.org/T389954) (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[02:36:11] (approved) raymond-ndibe: builds-api: bump to 0.0.184-20250407201912-10dbacc5 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/740 (https://phabricator.wikimedia.org/T388706 https://phabricator.wikimedia.org/T389954) (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[02:36:22] (merge) raymond-ndibe: builds-api: bump to 0.0.184-20250407201912-10dbacc5 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/740 (https://phabricator.wikimedia.org/T388706 https://phabricator.wikimedia.org/T389954) (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[02:42:52] (update) raymond-ndibe: [jobs-api] move custom validations out of api models [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/150 (https://phabricator.wikimedia.org/T389118)
[02:48:19] (update) raymond-ndibe: [jobs-api] move custom validations out of api models [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/150 (https://phabricator.wikimedia.org/T389118)
[02:49:11] (update) raymond-ndibe: [jobs-api] move custom validations out of api models [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/150 (https://phabricator.wikimedia.org/T389118)
[02:49:49] (update) raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] (move_most_custom_validations_out_of_api_models) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T359804)
[02:50:48] (update) raymond-ndibe: [jobs-api] move custom validations out of api models [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/150 (https://phabricator.wikimedia.org/T389118)
[02:51:31] (update) raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] (move_most_custom_validations_out_of_api_models) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T359804)
[03:05:42] (update) raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] (move_most_custom_validations_out_of_api_models) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T359804)
[03:08:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-65 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:17:58] (update) raymond-ndibe: [jobs-api] move core logic to seperate core module [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/153 (https://phabricator.wikimedia.org/T359804 https://phabricator.wikimedia.org/T390135)
[03:18:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-65 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:18:33] (update) raymond-ndibe: [jobs-api] custom resource definition deployment templates [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/101 (https://phabricator.wikimedia.org/T359650)
[03:19:13] (update) raymond-ndibe: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (save_business_models_to_db) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650)
[03:20:27] Toolforge (Toolforge iteration 19): [builds-api] Limit the amount of running builds - https://phabricator.wikimedia.org/T388706#10720042 (Raymond_Ndibe) In progress→Resolved
[03:22:36] (update) raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] (move_most_custom_validations_out_of_api_models) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T359804 https://phabricator.wikimedia.org/T389118)
[03:24:29] tool-wscontest: Message param is null - https://phabricator.wikimedia.org/T391310 (Samwilson) NEW
[03:25:23] Toolforge (Toolforge iteration 19), Patch-For-Review: [jobs-api] Split the API, core, and storage and runtime modules - https://phabricator.wikimedia.org/T359808#10720060 (Raymond_Ndibe)
[03:28:21] Toolforge (Toolforge iteration 19): [jobs-api] Split the core layer and create the core models - https://phabricator.wikimedia.org/T390135#10720062 (Raymond_Ndibe) Open→In progress
[03:31:57] (update) raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] (move_most_custom_validations_out_of_api_models) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T359804 https://phabricator.wikimedia.org/T389118)
[03:32:42] (update) raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] (move_most_custom_validations_out_of_api_models) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T389118)
[03:34:29] (update) raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] (move_most_custom_validations_out_of_api_models) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T389118)
[03:34:34] (update) raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] (move_most_custom_validations_out_of_api_models) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T389118)
[03:35:31] (update) raymond-ndibe: [jobs-api] move custom validations out of api models [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/150 (https://phabricator.wikimedia.org/T389118)
[03:35:45] (update) raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] (move_most_custom_validations_out_of_api_models) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T389118)
[03:38:34] (update) raymond-ndibe: [jobs-api] move custom validations out of api models [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/150 (https://phabricator.wikimedia.org/T389118)
[03:39:34] (update) raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] (move_most_custom_validations_out_of_api_models) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T389118)
[03:48:40] tool-wscontest: Message param is null - https://phabricator.wikimedia.org/T391310#10720066 (Samwilson) Open→Resolved a:Samwilson PR: https://github.com/wikisource/wscontest/pull/90 Merged and released in 2.8.2.
[04:12:13] (update) raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] (move_most_custom_validations_out_of_api_models) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T389118)
[07:02:17] Striker, Patch-For-Review: Add Bitu container to Striker development environment - https://phabricator.wikimedia.org/T362318#10720294 (SLyngshede-WMF) @Arendpieter there is a patch for review that adds the container. It staled on my end, but the patch was updated in March. See: https://gerrit.wikimedia.o...
[07:06:27] Tools, Wikidata, Security: Blocked Wikidata user sockpuppets are doing automated misconduct with QuickStatements - https://phabricator.wikimedia.org/T386978#10720296 (Magnus) FWIW I added a check on every ~20s batch edit (to not overload the Wikidata API) if the user is blocked, which should then blo...
[08:33:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:43:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:05:57] cloud-services-team, Cloud-VPS, IPv6: openstack: improve networktests for newer network setup - https://phabricator.wikimedia.org/T391325 (aborrero) NEW
[09:06:18] cloud-services-team, Cloud-VPS, IPv6: openstack: improve networktests for newer network setup - https://phabricator.wikimedia.org/T391325#10720529 (aborrero) Open→In progress p:Triage→Medium
[09:25:55] Tools, Wikidata, Security: Blocked Wikidata user sockpuppets are doing automated misconduct with QuickStatements - https://phabricator.wikimedia.org/T386978#10720642 (Aklapper) @Magnus: Could you please provide a link to that changeset/patch? Thanks.
[09:31:24] (open) aborrero: networktests-tofu-provisioning: bootstrap opentofu code [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/1 (https://phabricator.wikimedia.org/T391325)
[09:48:12] (PS2) Arendpieter: Switch username validation to Bitu API [labs/striker] - https://gerrit.wikimedia.org/r/1134724 (https://phabricator.wikimedia.org/T364605)
[09:50:31] (update) aborrero: networktests-tofu-provisioning: bootstrap opentofu code [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/1 (https://phabricator.wikimedia.org/T391325)
[09:52:00] (update) aborrero: networktests-tofu-provisioning: bootstrap opentofu code [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/1 (https://phabricator.wikimedia.org/T391325)
[09:52:55] (PS3) Arendpieter: Switch username validation to Bitu API [labs/striker] - https://gerrit.wikimedia.org/r/1134724 (https://phabricator.wikimedia.org/T364605)
[09:54:40] (update) aborrero: networktests-tofu-provisioning: bootstrap opentofu code [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/1 (https://phabricator.wikimedia.org/T391325)
[10:00:19] (update) aborrero: networktests-tofu-provisioning: bootstrap opentofu code [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/1 (https://phabricator.wikimedia.org/T391325)
[10:03:35] (PS4) Arendpieter: Switch username validation to Bitu API [labs/striker] - https://gerrit.wikimedia.org/r/1134724 (https://phabricator.wikimedia.org/T364605)
[10:07:36] (update) aborrero: networktests-tofu-provisioning: bootstrap opentofu code [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/1 (https://phabricator.wikimedia.org/T391325)
[10:10:23] (update) aborrero: networktests-tofu-provisioning: bootstrap opentofu code [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/1 (https://phabricator.wikimedia.org/T391325)
[10:12:15] (update) aborrero: networktests-tofu-provisioning: bootstrap opentofu code [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/1 (https://phabricator.wikimedia.org/T391325)
[10:17:13] (update) aborrero: networktests-tofu-provisioning: bootstrap opentofu code [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/1 (https://phabricator.wikimedia.org/T391325)
[10:41:42] cloud-services-team, Cloud-VPS, IPv6, Patch-For-Review: IPv6 for cloud-realm services - https://phabricator.wikimedia.org/T379282#10720857 (cmooney) >>! In T379282#10717839, @gerritbot wrote: > Change #1134699 had a related patch set uploaded (by Majavah; author: Majavah): > %%%[operations/puppet...
[10:49:23] cloud-services-team, Cloud-VPS, IPv6, Patch-For-Review: IPv6 for cloud-realm services - https://phabricator.wikimedia.org/T379282#10720866 (taavi) a:taavi >>! In T379282#10720857, @cmooney wrote: >>>! In T379282#10717839, @gerritbot wrote: >> Change #1134699 had a related patch set uploaded (...
[10:50:14] (update) aborrero: networktests-tofu-provisioning: bootstrap opentofu code [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/1 (https://phabricator.wikimedia.org/T391325)
[10:51:21] (update) aborrero: networktests-tofu-provisioning: bootstrap opentofu code [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/1 (https://phabricator.wikimedia.org/T391325)
[11:11:02] (update) aborrero: networktests-tofu-provisioning: bootstrap opentofu code [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/1 (https://phabricator.wikimedia.org/T391325)
[11:12:08] (update) aborrero: networktests-tofu-provisioning: bootstrap opentofu code [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/1 (https://phabricator.wikimedia.org/T391325)
[11:13:47] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for main branch
[11:14:13] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for main branch
[12:08:42] (update) aborrero: networktests-tofu-provisioning: bootstrap opentofu code [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/1 (https://phabricator.wikimedia.org/T391325)
[12:10:40] (merge) aborrero: networktests-tofu-provisioning: bootstrap opentofu code [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/1 (https://phabricator.wikimedia.org/T391325)
[12:13:02] Tool-bulkuserinfo: Optimize Backend - https://phabricator.wikimedia.org/T382655#10721193 (Gnoeee)
[12:14:52] Tool-bulkuserinfo: Modify frontend compatile with backend - https://phabricator.wikimedia.org/T382656#10721202 (Gnoeee) Open→Resolved a:Gnoeee
[12:16:39] (PS2) Eugene233: Denylist for depict items that shouldn't be used [labs/tools/Isa] (m2c) - https://gerrit.wikimedia.org/r/950013 (https://phabricator.wikimedia.org/T318843)
[12:17:09] (PS2) Eugene233: Submit reviews back to MachineVision service [labs/tools/Isa] (m2c) - https://gerrit.wikimedia.org/r/950012 (https://phabricator.wikimedia.org/T315668)
[12:51:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-65 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[13:06:19] (PS2) Jelto: ceph: add gitlab dummy credentials [labs/private] - https://gerrit.wikimedia.org/r/1132643 (https://phabricator.wikimedia.org/T378922)
[13:08:43] (CR) Jelto: "I uploaded a new patchset which uses the new `Ceph::S3::Credential` structure from Id8979165b96d737addc676f3abf3f088a48eda48." [labs/private] - https://gerrit.wikimedia.org/r/1132643 (https://phabricator.wikimedia.org/T378922) (owner: Jelto)
[13:11:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-65 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[13:13:24] (CR) MVernon: [C:+1] "LGTM, thanks! I added a suggested comment." [labs/private] - https://gerrit.wikimedia.org/r/1132643 (https://phabricator.wikimedia.org/T378922) (owner: Jelto)
[13:13:44] (CR) MVernon: [C:+1] "Done" [labs/private] - https://gerrit.wikimedia.org/r/1132643 (https://phabricator.wikimedia.org/T378922) (owner: Jelto)
[13:14:23] (PS3) Jelto: ceph: add gitlab dummy credentials [labs/private] - https://gerrit.wikimedia.org/r/1132643 (https://phabricator.wikimedia.org/T378922)
[13:14:30] (CR) Jelto: ceph: add gitlab dummy credentials (1 comment) [labs/private] - https://gerrit.wikimedia.org/r/1132643 (https://phabricator.wikimedia.org/T378922) (owner: Jelto)
[13:18:59] (CR) MVernon: [C:+1] "I feel gerrit shouldn't remove the +1 when you apply my suggestion, but there we are :-)" [labs/private] - https://gerrit.wikimedia.org/r/1132643 (https://phabricator.wikimedia.org/T378922) (owner: Jelto)
[13:37:56] (CR) Arnaudb: [C:+1] ceph: add gitlab dummy credentials [labs/private] - https://gerrit.wikimedia.org/r/1132643 (https://phabricator.wikimedia.org/T378922) (owner: Jelto)
[13:50:34] Tool-bulkuserinfo: Optimize Backend - https://phabricator.wikimedia.org/T382655#10721562 (Athulvis) Open→Resolved
[14:13:46] PROBLEM - Host clouddumps1001 is DOWN: PING CRITICAL - Packet loss = 100%
[14:19:01] FIRING: NodeDown: Node clouddumps1001 is down. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[14:19:29] cloud-services-team (FY2024/2025-Q3-Q4), DC-Ops, ops-eqiad: Temperature Inlet Temp issue on clouddumps1001:9290 - https://phabricator.wikimedia.org/T383723#10721810 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=591bcb32-8025-4bce-af2c-49d023d1b4ca) set by fnegri@cumin1002 for 1 da...
[14:20:32] Toolforge (Toolforge iteration 19): [components-api] use the component name for the image instead of the default tool - https://phabricator.wikimedia.org/T388830#10721815 (Raymond_Ndibe) In progress→Resolved
[14:20:35] cloud-services-team, Toolforge (Toolforge iteration 19), Patch-For-Review: [components-api] Add "runs" section to the deployment structure - https://phabricator.wikimedia.org/T389339#10721818 (Raymond_Ndibe) In progress→Resolved
[14:20:56] Toolforge (Toolforge iteration 19), Patch-For-Review: [builds-api] don't cleanup pending builds - https://phabricator.wikimedia.org/T389954#10721820 (Raymond_Ndibe) In progress→Resolved
[14:20:57] Toolforge (Toolforge iteration 19), Patch-For-Review: [jobs-api] Create storage layer, and save business models in persistent storage - https://phabricator.wikimedia.org/T359650#10721822 (Raymond_Ndibe) In progress→Resolved
[14:21:01] Toolforge (Toolforge iteration 19), Patch-For-Review: [jobs-api] Create storage layer, and save business models in persistent storage - https://phabricator.wikimedia.org/T359650#10721825 (Raymond_Ndibe) Resolved→In progress
[14:31:17] (CR) Jelto: [V:+2 C:+2] ceph: add gitlab dummy credentials [labs/private] - https://gerrit.wikimedia.org/r/1132643 (https://phabricator.wikimedia.org/T378922) (owner: Jelto)
[14:36:20] (open) aborrero: networktests-infra: create VMs [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/2 (https://phabricator.wikimedia.org/T391325)
[14:37:27] (update) aborrero: networktests-infra: create VMs [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/2 (https://phabricator.wikimedia.org/T391325)
[14:37:57] (update) aborrero: networktests-infra: create VMs [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/2 (https://phabricator.wikimedia.org/T391325)
[14:38:28] FIRING: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[14:45:52] Tool-paulina: Update Wikidata copyright status as a creator (P7763) in Wikidata for people dead over 100 years ago for your country or region - https://phabricator.wikimedia.org/T388576#10721981 (Piracalamina)
[14:48:02] Tools, Wikidata, Security: Blocked Wikidata user sockpuppets are doing automated misconduct with QuickStatements - https://phabricator.wikimedia.org/T386978#10721992 (Magnus) @Aklapper https://github.com/magnusmanske/quickstatements_rs/commit/3e50c3a9356e1ddc34a0de7cfd345a4a78b51be5
[14:51:45] (update) aborrero: networktests-infra: create VMs [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/2 (https://phabricator.wikimedia.org/T391325)
[15:01:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[15:02:28] FIRING: [3x] PuppetAgentFailure: Puppet agent failure detected on instance tools-k8s-worker-nfs-27 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[15:06:22] (update) aborrero: networktests-infra: create VMs [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/2 (https://phabricator.wikimedia.org/T391325)
[15:07:00] Tool-paulina: Update Wikidata copyright status as a creator (P7763) in Wikidata for people dead over 100 years ago for your country or region - https://phabricator.wikimedia.org/T388576#10722110 (Piracalamina)
[15:07:28] FIRING: [16x] PuppetAgentFailure: Puppet agent failure detected on instance tools-bastion-12 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[15:07:28] FIRING: [3x] PuppetAgentFailure: Puppet agent failure detected on instance toolsbeta-bastion-6 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[15:09:27] (update) aborrero: networktests-infra: create VMs [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/2 (https://phabricator.wikimedia.org/T391325)
[15:09:41] (update) aborrero: networktests-infra: create VMs [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/2 (https://phabricator.wikimedia.org/T391325)
[15:12:28] FIRING: [2x] PuppetAgentFailure: Puppet agent failure detected on instance toolsbeta-test-k8s-worker-nfs-10 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[15:12:28] FIRING: [7x] PuppetAgentFailure: Puppet agent failure detected on instance tools-k8s-worker-nfs-19 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[15:14:14] RECOVERY - Host clouddumps1001 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms
[15:16:05] Data-Services, Data-Engineering, Data-Platform-SRE: Create wiki replicas views for globaljsonlinks tables - https://phabricator.wikimedia.org/T387419#10722200 (Gehel) @Ahoelzl : could you validate if those tables should or should not be exposed? Is redaction needed?
[15:16:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[15:17:03] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
[15:17:27] !log aborrero@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster
[15:17:28] FIRING: [14x] PuppetAgentFailure: Puppet agent failure detected on instance tools-k8s-worker-nfs-19 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[15:21:02] Data-Services, Data-Engineering, Data-Platform-SRE: Create wiki replicas views for globaljsonlinks tables - https://phabricator.wikimedia.org/T387419#10722274 (Ahoelzl) @Bugreporter thanks for filing. Can you elaborate on the use cases? And also on priority?
[15:22:28] FIRING: [4x] PuppetAgentFailure: Puppet agent failure detected on instance toolsbeta-static-2 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure [15:22:28] FIRING: [26x] PuppetAgentFailure: Puppet agent failure detected on instance tools-k8s-worker-nfs-1 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure [15:22:31] Tool-paulina: Update Wikidata copyright status as a creator (P7763) in Wikidata for people dead over 100 years ago for your country or region - https://phabricator.wikimedia.org/T388576#10722296 (Piracalamina) [15:27:28] FIRING: [30x] PuppetAgentFailure: Puppet agent failure detected on instance tools-k8s-worker-nfs-1 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure [15:27:28] FIRING: [7x] PuppetAgentFailure: Puppet agent failure detected on instance toolsbeta-bastion-6 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure [15:29:42] Tool-paulina: Update Wikidata copyright status as a creator (P7763) in Wikidata for people dead over 100 years ago for your country or region - https://phabricator.wikimedia.org/T388576#10722313 (Piracalamina) [15:32:00] Tool-paulina: Update Wikidata copyright status as a creator (P7763) in Wikidata for people dead over 100 years ago for your country or region - https://phabricator.wikimedia.org/T388576#10722328 (Piracalamina) [15:32:28] FIRING: [34x] PuppetAgentFailure: Puppet agent failure detected on instance tools-k8s-worker-nfs-1 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure [15:32:28] FIRING: [8x] PuppetAgentFailure: Puppet agent failure detected on instance toolsbeta-bastion-6 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure [15:32:59] (update) aborrero: networktests-infra: create VMs [repos/cloud/cloud-vps/networktests-tofu-provisioning] 
- https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/2 (https://phabricator.wikimedia.org/T391325) [15:36:15] (merge) aborrero: networktests-infra: create VMs [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/2 (https://phabricator.wikimedia.org/T391325) [15:37:28] FIRING: [49x] PuppetAgentFailure: Puppet agent failure detected on instance tools-bastion-13 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure [15:37:28] FIRING: [5x] PuppetAgentFailure: Puppet agent failure detected on instance toolsbeta-static-2 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure [15:38:10] Tool-paulina: Update Wikidata copyright status as a creator (P7763) in Wikidata for people dead over 100 years ago for your country or region - https://phabricator.wikimedia.org/T388576#10722373 (Piracalamina) [15:41:16] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Toolforge: If the inactive clouddumps host goes down, it causes a ripple effect on Cloud VPS and Toolforge - https://phabricator.wikimedia.org/T391369 (fnegri) NEW [15:42:13] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Toolforge: If the inactive clouddumps host goes down, it causes a ripple effect on Cloud VPS and Toolforge - https://phabricator.wikimedia.org/T391369#10722400 (fnegri) p:Triage→High a:fnegri [15:42:28] RESOLVED: [23x] PuppetAgentFailure: Puppet agent failure detected on instance tools-k8s-worker-nfs-1 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure [15:42:28] RESOLVED: [3x] PuppetAgentFailure: Puppet agent failure detected on instance toolsbeta-static-2 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure [15:44:55] cloud-services-team, Toolforge, 
Epic: [jobs-cli,builds-cli,toolforge-cli,webservice] Consolidate the Toolforge CLIs - https://phabricator.wikimedia.org/T356262#10722410 (Addshore) Generally speaking, https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-gen-cli is a consolidation of CLIs, curr... [15:45:58] RESOLVED: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure [15:49:11] Tool-paulina: New design for author page - https://phabricator.wikimedia.org/T391370 (Pepe_piton) NEW [15:49:19] (PS1) Arturo Borrero Gonzalez: wmcs.toolforge.add_k8s_node: add smarter default image [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1135067 [15:49:59] Tool-paulina: New design for author page - https://phabricator.wikimedia.org/T391370#10722434 (Pepe_piton) a:Pepe_piton [15:50:09] cloud-services-team (FY2024/2025-Q3-Q4), DC-Ops, ops-eqiad: Temperature Inlet Temp issue on clouddumps1001:9290 - https://phabricator.wikimedia.org/T383723#10722436 (fnegri) > I'm gonna shut down the server tomorrow for about 1 hour, to check if there's any unexpected impact, then take it back online... 
[15:50:22] Tool-paulina: New design for author page - https://phabricator.wikimedia.org/T391370#10722438 (Pepe_piton) p:Triage→Medium [15:50:38] (PS2) Arturo Borrero Gonzalez: wmcs.toolforge.add_k8s_node: add smarter default image [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1135067 [15:51:26] (CR) FNegri: [C:+1] "LGTM" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1135067 (owner: Arturo Borrero Gonzalez) [15:53:58] (CR) Arturo Borrero Gonzalez: [C:+2] wmcs.toolforge.add_k8s_node: add smarter default image [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1135067 (owner: Arturo Borrero Gonzalez) [15:54:57] (CR) CI reject: [V:-1] wmcs.toolforge.add_k8s_node: add smarter default image [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1135067 (owner: Arturo Borrero Gonzalez) [15:55:19] (PS3) Arturo Borrero Gonzalez: wmcs.toolforge.add_k8s_node: add smarter default image [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1135067 [15:58:20] (CR) FNegri: [C:+1] wmcs.toolforge.add_k8s_node: add smarter default image [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1135067 (owner: Arturo Borrero Gonzalez) [15:58:40] (CR) Arturo Borrero Gonzalez: [C:+2] wmcs.toolforge.add_k8s_node: add smarter default image [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1135067 (owner: Arturo Borrero Gonzalez) [15:59:03] (CR) CI reject: [V:-1] wmcs.toolforge.add_k8s_node: add smarter default image [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1135067 (owner: Arturo Borrero Gonzalez) [16:04:20] (CR) Andrew Bogott: [C:+2] setup.py: pin spicerack version [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1133908 (owner: Andrew Bogott) [16:08:20] (Merged) jenkins-bot: setup.py: pin spicerack version [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1133908 (owner: Andrew Bogott) [16:08:36] (PS5) Andrew Bogott: 
upgrade_openstack_node: don't lock tables when backing up [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1133432 [16:15:17] cloud-services-team, Toolforge, Patch-For-Review: [toolforge-cli,jobs-cli,builds-cli,envvars-cli] Explore OpenAPI SDK tooling for client consolidation - https://phabricator.wikimedia.org/T356261#10722562 (Addshore) https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-gen-cli makes use of http... [16:20:41] Tool-paulina: New design for work page - https://phabricator.wikimedia.org/T391377 (Pepe_piton) NEW [16:21:14] Tool-paulina: New design for work page - https://phabricator.wikimedia.org/T391377#10722613 (Pepe_piton) p:Triage→Medium a:marfossatti [16:25:48] cloud-services-team, Toolforge, Patch-For-Review: [toolforge-cli,jobs-cli,builds-cli,envvars-cli] Explore OpenAPI SDK tooling for client consolidation - https://phabricator.wikimedia.org/T356261#10722634 (Addshore) In https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-gen-cli currently the... [16:31:25] cloud-services-team, Toolforge, Patch-For-Review: [jobs-cli,toolforge-cli] Add tfj as a shortcut for toolforge-jobs command - https://phabricator.wikimedia.org/T309308#10722662 (Addshore) Partially inspired by this ticket, I added single letter aliases to the top level commands that are currently in... 
[16:54:28] FIRING: [10x] PuppetAgentNoResources: No Puppet resources found on instance runner-1021 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [17:19:12] Tool-paulina: Update Wikidata copyright status as a creator (P7763) in Wikidata for people dead over 100 years ago for your country or region - https://phabricator.wikimedia.org/T388576#10722754 (Piracalamina) [18:38:08] Toolforge (Toolforge iteration 19): [toolforge] simplify calling the different toolforge apis from within the containers - https://phabricator.wikimedia.org/T356377#10723204 (Addshore) I spent a little time thinking about this in the context of the new combined cli for toolforge commands. As of https://gitl... [19:16:32] Tool-paulina: Update Wikidata copyright status as a creator (P7763) in Wikidata for people dead over 100 years ago for your country or region - https://phabricator.wikimedia.org/T388576#10723335 (Piracalamina) [19:18:04] Tool-paulina: Update Wikidata copyright status as a creator (P7763) in Wikidata for people dead over 100 years ago for your country or region - https://phabricator.wikimedia.org/T388576#10723351 (Piracalamina) [19:21:33] Tool-paulina: Update Wikidata copyright status as a creator (P7763) in Wikidata for people dead over 100 years ago for your country or region - https://phabricator.wikimedia.org/T388576#10723361 (Piracalamina) [19:23:49] Tool-paulina, Wikimedia-Hackathon-2025: Update Wikidata copyright status as a creator (P7763) in Wikidata for people dead over 100 years ago for your country or region - https://phabricator.wikimedia.org/T388576#10723371 (Piracalamina) [21:34:28] FIRING: [10x] PuppetAgentNoResources: No Puppet resources found on instance runner-1021 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:39:28] FIRING: [10x] PuppetAgentNoResources: No Puppet resources found on instance runner-1021 on project 
gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:42:22] FIRING: [14x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1011.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [21:43:56] FIRING: SystemdUnitDown: The service unit designate-producer.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [21:49:17] FIRING: KernelErrors: Server cloudcontrol1011 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcontrol1011 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors [21:49:18] FIRING: KernelErrors: Server cloudcontrol1011 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcontrol1011 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors [21:49:23] cloud-services-team: KernelErrors Server cloudcontrol1011 logged kernel errors - https://phabricator.wikimedia.org/T391408 (phaultfinder) NEW [21:49:24] cloud-services-team: KernelErrors Server cloudcontrol1011 logged kernel errors - https://phabricator.wikimedia.org/T391407 (phaultfinder) NEW [21:49:28] FIRING: [10x] PuppetAgentNoResources: No Puppet resources found on instance runner-1021 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:54:28] FIRING: [9x] 
PuppetAgentNoResources: No Puppet resources found on instance runner-1021 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:58:38] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Ceph, DC-Ops, and 2 others: [cloudceph] test the new DELL hard drives throughput - https://phabricator.wikimedia.org/T390134#10724011 (Jclark-ctr) [21:58:56] FIRING: [3x] SystemdUnitDown: The service unit designate-producer.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [21:59:06] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Ceph, DC-Ops, and 2 others: [cloudceph] test the new DELL hard drives throughput - https://phabricator.wikimedia.org/T390134#10724027 (Jclark-ctr) @Andrew @dcaro installed 8tb ssd drive [21:59:28] FIRING: [8x] PuppetAgentNoResources: No Puppet resources found on instance runner-1021 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [22:00:24] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol1011.eqiad.wmnet' [22:00:25] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=99) on host 'cloudcontrol1011.eqiad.wmnet' [22:04:28] RESOLVED: [7x] PuppetAgentNoResources: No Puppet resources found on instance runner-1021 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [22:07:34] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol1011.eqiad.wmnet' [22:13:56] RESOLVED: SystemdUnitDown: The service unit designate-producer.service is in failed status on host cloudcontrol1005. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [22:16:06] PROBLEM - Host cloudcontrol1011 is DOWN: PING CRITICAL - Packet loss = 100% [22:18:11] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol1011.eqiad.wmnet' [22:18:34] RECOVERY - Host cloudcontrol1011 is UP: PING OK - Packet loss = 0%, RTA = 0.42 ms [22:23:22] cloud-services-team, decommission-hardware: decommission cloudcontrol1005.eqiad.wmnet - https://phabricator.wikimedia.org/T391413 (Andrew) NEW [22:24:01] cloud-services-team, decommission-hardware: decommission cloudcontrol1005.eqiad.wmnet - https://phabricator.wikimedia.org/T391413#10724093 (Andrew) [22:24:03] cloud-services-team (Hardware), Patch-For-Review: cloudcontrol1011 service implementation tracking - https://phabricator.wikimedia.org/T391300#10724094 (Andrew) [22:32:07] RESOLVED: [14x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1011.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [22:53:46] PROBLEM - Memcached on cloudcontrol1005 is CRITICAL: connect to address 10.64.151.3 and port 11211: Connection refused https://wikitech.wikimedia.org/wiki/Memcached [22:56:46] RECOVERY - Memcached on cloudcontrol1005 is OK: TCP OK - 0.000 second response time on 10.64.151.3 port 11211 https://wikitech.wikimedia.org/wiki/Memcached [22:56:56] FIRING: SystemdUnitDown: The service unit designate-api.service is in failed status on host cloudcontrol1005. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:01:56] FIRING: [3x] SystemdUnitDown: The service unit cinder-api.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:06:56] RESOLVED: SystemdUnitDown: The service unit designate-api.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:10:02] (update) chuckonwumelu: Draft: Start [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/1 [23:11:56] FIRING: [2x] SystemdUnitDown: The service unit designate-api.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:15:07] (CR) Andrew Bogott: upgrade_openstack_node: don't lock tables when backing up (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1133432 (owner: Andrew Bogott) [23:16:56] RESOLVED: SystemdUnitDown: The service unit designate-api.service is in failed status on host cloudcontrol1005. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown