[00:27:35] Toolforge-standards-committee: Check Community CRM for any known conflicts with committee nominees - https://phabricator.wikimedia.org/T411440 (bd808) NEW
[00:28:16] Toolforge-standards-committee: Check Community CRM for any known conflicts with committee nominees - https://phabricator.wikimedia.org/T411440#11422065 (bd808) p:Triage→High
[01:05:27] Toolforge (Quota-requests): Elasticsearch credential request for gutensearch - https://phabricator.wikimedia.org/T411445 (Ijon) NEW
[01:06:38] Toolforge (Quota-requests): Elasticsearch credential request for gutensearch - https://phabricator.wikimedia.org/T411445#11422166 (Ijon)
[04:33:07] (CR) Abijeet Patro: [V:+2] Localisation updates from https://translatewiki.net. [labs/tools/commons-mass-description] - https://gerrit.wikimedia.org/r/1213447 (owner: L10n-bot)
[08:30:52] !log volans@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry for Alloy 1.11.3 (T399313)
[08:30:52] !log volans@cloudcumin1001 tools Updating container image docker-registry.svc.toolforge.org/grafana/alloy:v1.11.3 (T399313)
[08:30:57] T399313: Add tracing to understand Toolforge and CloudVPS usage and dependencies - https://phabricator.wikimedia.org/T399313
[08:31:46] !log volans@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry (exit_code=0) for Alloy 1.11.3 (T399313)
[10:19:15] cloud-services-team, Horizon: Page on cloudweb/horizon down - https://phabricator.wikimedia.org/T411470 (fgiunchedi) NEW
[10:36:09] cloud-services-team, Horizon: Page on cloudweb/horizon down - https://phabricator.wikimedia.org/T411470#11422973 (fgiunchedi) Availability as seen by network probes: {F70817181}
[12:13:33] (open) taavi: cli: Validate buildservice image names [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/92
[12:13:37] (update) taavi: cli: Validate buildservice image names [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/92
[12:14:47] (update) taavi: cli: Validate buildservice image names [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/92
[12:15:28] (approved) fnegri: cli: Validate buildservice image names [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/92 (owner: taavi)
[12:21:20] (merge) taavi: build: Upgrade pre-commit dependencies [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/91 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[12:21:38] (update) taavi: cli: Validate buildservice image names [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/92
[12:24:02] (merge) taavi: cli: Validate buildservice image names [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/92
[12:55:27] cloud-services-team, Toolforge: Java missing from PATH in Clojure buildpack on Toolforge - https://phabricator.wikimedia.org/T411486 (KBach) NEW
[13:47:06] cloud-services-team, Cloud-VPS: Upgrade cloud-vps hosts to Debian Trixie - https://phabricator.wikimedia.org/T409579#11423701 (cmooney) >>! In T409579#11419519, @Andrew wrote: >>>! In T409579#11418062, @Andrew wrote: >> Just now I ran into this error during reimage: >> >> >>> >>> RuntimeError: Host is...
[14:22:00] FIRING: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[14:46:46] (merge) pepepiton: T397554: Add Wikidata Query Service link to works list [toolforge-repos/paulina] - https://gitlab.wikimedia.org/toolforge-repos/paulina/-/merge_requests/66 (owner: dipanshu1223)
[14:50:02] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11423981 (fnegri) Apologies for the delay, I have just created the alerts and you can see them at https://prometheus.wmcloud.org/rules#vi...
[14:54:27] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11424009 (Reputation22) >>! In T409668#11423981, @fnegri wrote: > Apologies for the delay, I have just created the alerts and you can see...
[14:57:00] RESOLVED: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[14:57:30] FIRING: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[15:00:32] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm a07ac179-366e-49a4-9499-bee721949963 (cluster eqiad1)
[15:01:01] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm a07ac179-366e-49a4-9499-bee721949963 (cluster eqiad1)
[15:02:30] RESOLVED: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[15:02:58] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 1b6df367-1d3d-4e48-8333-7a4f79a49a2a (cluster eqiad1)
[15:03:48] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 1b6df367-1d3d-4e48-8333-7a4f79a49a2a (cluster eqiad1)
[15:03:49] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 9f04369c-d074-44e2-a3b2-b0545accd0e0 (cluster eqiad1)
[15:04:26] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 9f04369c-d074-44e2-a3b2-b0545accd0e0 (cluster eqiad1)
[15:04:27] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 2a87ab18-417b-42a6-9ee1-a273a2379e62 (cluster eqiad1)
[15:04:54] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 2a87ab18-417b-42a6-9ee1-a273a2379e62 (cluster eqiad1)
[15:04:56] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm d4f8ac4d-3059-499a-961c-505f6ff89675 (cluster eqiad1)
[15:05:47] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm d4f8ac4d-3059-499a-961c-505f6ff89675 (cluster eqiad1)
[15:05:48] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 9b6c52a7-527d-4513-b996-160af646c5fb (cluster eqiad1)
[15:06:16] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 9b6c52a7-527d-4513-b996-160af646c5fb (cluster eqiad1)
[15:06:17] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 7e7b75d2-0f13-4973-ad18-3dc7a52b0781 (cluster eqiad1)
[15:06:53] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 7e7b75d2-0f13-4973-ad18-3dc7a52b0781 (cluster eqiad1)
[15:06:54] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm dedb5d73-e8f8-47d3-8598-2900be096236 (cluster eqiad1)
[15:07:23] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm dedb5d73-e8f8-47d3-8598-2900be096236 (cluster eqiad1)
[15:07:24] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 9c7abfa9-a848-4ed5-8abb-dda2cb842b04 (cluster eqiad1)
[15:08:31] cloud-services-team (FY2025/26-Q1-Q2), Toolforge, Sustainability (Incident Followup): [toolsdb] crash recovery can fail because of insufficient innodb_log_file_size - https://phabricator.wikimedia.org/T409922#11424082 (fnegri) In progress→Resolved A week later, the "Crash recovery is brok...
[15:09:32] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 9c7abfa9-a848-4ed5-8abb-dda2cb842b04 (cluster eqiad1)
[15:09:33] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm b5fcdad2-d7d9-4ba2-903e-188231b05f71 (cluster eqiad1)
[15:11:41] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm b5fcdad2-d7d9-4ba2-903e-188231b05f71 (cluster eqiad1)
[15:11:42] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 143f7189-22b2-4708-9a33-91d368a5c8eb (cluster eqiad1)
[15:12:10] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 143f7189-22b2-4708-9a33-91d368a5c8eb (cluster eqiad1)
[15:12:11] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm bbe5835d-5e8d-4778-b80f-5c6424928a88 (cluster eqiad1)
[15:12:39] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm bbe5835d-5e8d-4778-b80f-5c6424928a88 (cluster eqiad1)
[15:12:40] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 7cafb806-ecc1-459d-b6b3-4213511f1257 (cluster eqiad1)
[15:13:08] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 7cafb806-ecc1-459d-b6b3-4213511f1257 (cluster eqiad1)
[15:13:10] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 06ea2eca-b4e5-42fa-afc3-684b3c2b87a3 (cluster eqiad1)
[15:13:38] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 06ea2eca-b4e5-42fa-afc3-684b3c2b87a3 (cluster eqiad1)
[15:13:39] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm d3812466-439a-4355-901c-b1097a033d0b (cluster eqiad1)
[15:14:15] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm d3812466-439a-4355-901c-b1097a033d0b (cluster eqiad1)
[15:14:16] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 62cf19eb-ebf9-49d9-baa7-fba2cd6942d6 (cluster eqiad1)
[15:14:22] (update) oluwatumininu: Added SPARQL query link to author's works list page (T397554) [toolforge-repos/paulina] - https://gitlab.wikimedia.org/toolforge-repos/paulina/-/merge_requests/71
[15:15:07] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 62cf19eb-ebf9-49d9-baa7-fba2cd6942d6 (cluster eqiad1)
[15:15:08] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 55b557f8-5817-47f6-adb4-abccac2b2997 (cluster eqiad1)
[15:15:45] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 55b557f8-5817-47f6-adb4-abccac2b2997 (cluster eqiad1)
[15:15:46] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 2b573025-a221-482a-b6b7-fd7e1b1308f7 (cluster eqiad1)
[15:16:14] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 2b573025-a221-482a-b6b7-fd7e1b1308f7 (cluster eqiad1)
[15:16:16] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm def4772e-c01f-4e5e-8e70-d62a546ebc2a (cluster eqiad1)
[15:16:44] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm def4772e-c01f-4e5e-8e70-d62a546ebc2a (cluster eqiad1)
[15:16:45] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm b4b2d7cd-3492-4d04-86ce-4c0b8344ddc3 (cluster eqiad1)
[15:17:22] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm b4b2d7cd-3492-4d04-86ce-4c0b8344ddc3 (cluster eqiad1)
[15:17:25] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm a8269e6b-e09d-4b6b-909e-5e4014165440 (cluster eqiad1)
[15:17:55] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm a8269e6b-e09d-4b6b-909e-5e4014165440 (cluster eqiad1)
[15:17:56] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm b89a2d14-2bc3-481b-baf6-496edadbe242 (cluster eqiad1)
[15:18:32] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm b89a2d14-2bc3-481b-baf6-496edadbe242 (cluster eqiad1)
[15:18:33] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 94f8be5c-3cdf-47cb-80b2-43c44da01789 (cluster eqiad1)
[15:19:10] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 94f8be5c-3cdf-47cb-80b2-43c44da01789 (cluster eqiad1)
[15:19:11] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 7dc8757e-8b8d-4cc9-ac8e-a2f925639f0b (cluster eqiad1)
[15:19:32] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm None (cluster eqiad1)
[15:19:32] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.vps.instance.stop_start (exit_code=99) vm None (cluster eqiad1)
[15:19:49] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm ci2.mediawiki-quickstart.eqiad1.wikimedia.cloud (cluster eqiad1)
[15:19:51] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.vps.instance.stop_start (exit_code=99) vm ci2.mediawiki-quickstart.eqiad1.wikimedia.cloud (cluster eqiad1)
[15:20:28] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 7dc8757e-8b8d-4cc9-ac8e-a2f925639f0b (cluster eqiad1)
[15:20:29] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm d409f39a-e24a-462e-b588-6f5f6557e26b (cluster eqiad1)
[15:20:57] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm d409f39a-e24a-462e-b588-6f5f6557e26b (cluster eqiad1)
[15:20:59] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 995817db-0966-485d-aca5-e5377c77a005 (cluster eqiad1)
[15:21:01] cloud-services-team (FY2025/26-Q1-Q2), Toolforge, Wiki-Loves-Monuments-Database, Sustainability (Incident Followup): [toolsdb] ibdata1 growing on primary - https://phabricator.wikimedia.org/T409716#11424150 (fnegri) In progress→Resolved The transaction history length is still having s...
[15:21:35] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 995817db-0966-485d-aca5-e5377c77a005 (cluster eqiad1)
[15:21:36] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 6fa9b0be-219d-4b10-962e-fa3a71f6740c (cluster eqiad1)
[15:21:37] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 5e5c3bad-f1c7-49e5-b846-edaf111af83c (cluster eqiad1)
[15:22:28] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 5e5c3bad-f1c7-49e5-b846-edaf111af83c (cluster eqiad1)
[15:22:28] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 6fa9b0be-219d-4b10-962e-fa3a71f6740c (cluster eqiad1)
[15:22:30] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 7e3011e8-aed8-4bed-8e18-f75afe3ec3a2 (cluster eqiad1)
[15:22:39] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm e6796dbd-2511-4bf6-bdee-4a14a7414d5f (cluster eqiad1)
[15:22:56] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11424166 (fnegri) They are not accessible via Horizon but they are accessible from the following public links: * https://prometheus.wmclo...
[15:23:22] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 7e3011e8-aed8-4bed-8e18-f75afe3ec3a2 (cluster eqiad1)
[15:23:23] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm f8e08f70-f87e-413d-acad-080126ad5b1a (cluster eqiad1)
[15:23:30] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm e6796dbd-2511-4bf6-bdee-4a14a7414d5f (cluster eqiad1)
[15:23:44] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 9c66be4e-6787-4844-a9ff-a65295ac5aac (cluster eqiad1)
[15:23:51] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm f8e08f70-f87e-413d-acad-080126ad5b1a (cluster eqiad1)
[15:23:52] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm a4892d89-0981-412e-9f00-8882416948a1 (cluster eqiad1)
[15:24:20] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm a4892d89-0981-412e-9f00-8882416948a1 (cluster eqiad1)
[15:24:20] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 9c66be4e-6787-4844-a9ff-a65295ac5aac (cluster eqiad1)
[15:24:21] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 63f68215-d302-4684-a91e-58f5272486a5 (cluster eqiad1)
[15:25:11] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 9a7cf939-c634-4aa1-9fd2-dbc14b18d70e (cluster eqiad1)
[15:25:12] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 63f68215-d302-4684-a91e-58f5272486a5 (cluster eqiad1)
[15:25:13] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 63a229be-765f-4b48-b8d9-24ee39243604 (cluster eqiad1)
[15:25:39] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 9a7cf939-c634-4aa1-9fd2-dbc14b18d70e (cluster eqiad1)
[15:25:41] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 63a229be-765f-4b48-b8d9-24ee39243604 (cluster eqiad1)
[15:25:42] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm d73171e3-49ef-4d40-8008-a900781ea102 (cluster eqiad1)
[15:26:07] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 966b6ab3-b561-4e46-bfcd-1681ce9e91ac (cluster eqiad1)
[15:26:18] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm d73171e3-49ef-4d40-8008-a900781ea102 (cluster eqiad1)
[15:26:19] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm f7e8f001-e9c0-4fe9-8887-32289702b804 (cluster eqiad1)
[15:26:47] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm f7e8f001-e9c0-4fe9-8887-32289702b804 (cluster eqiad1)
[15:26:48] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 0b5b7c51-dc42-4bec-90f2-161807a385f7 (cluster eqiad1)
[15:26:58] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 966b6ab3-b561-4e46-bfcd-1681ce9e91ac (cluster eqiad1)
[15:27:12] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm a5705913-72f2-4abd-84e6-3e084bfbd98d (cluster eqiad1)
[15:27:16] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 0b5b7c51-dc42-4bec-90f2-161807a385f7 (cluster eqiad1)
[15:27:17] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 31bf05f8-122b-4558-8932-7ac4b8375ed5 (cluster eqiad1)
[15:27:55] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 31bf05f8-122b-4558-8932-7ac4b8375ed5 (cluster eqiad1)
[15:27:56] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 5037f71d-bcbf-4ed7-809b-052ca6026219 (cluster eqiad1)
[15:28:32] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 5037f71d-bcbf-4ed7-809b-052ca6026219 (cluster eqiad1)
[15:28:33] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 2a4b9dfd-7006-4b5b-8c95-7883709e5b2d (cluster eqiad1)
[15:29:09] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 2a4b9dfd-7006-4b5b-8c95-7883709e5b2d (cluster eqiad1)
[15:29:11] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 702a55e1-e176-45f1-af81-569013f91be3 (cluster eqiad1)
[15:29:19] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm a5705913-72f2-4abd-84e6-3e084bfbd98d (cluster eqiad1)
[15:29:47] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 702a55e1-e176-45f1-af81-569013f91be3 (cluster eqiad1)
[15:29:48] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm a5cb6818-f3ac-4ba9-afb5-5c657cf65f9a (cluster eqiad1)
[15:30:17] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm a5cb6818-f3ac-4ba9-afb5-5c657cf65f9a (cluster eqiad1)
[15:30:18] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 736c7c6d-319d-43e0-b2b1-efdd84b4736a (cluster eqiad1)
[15:30:20] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11424183 (fnegri) The SQL command I pasted above was only as a reference, to show the alert definitions in the DB, not something that any...
[15:30:54] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 736c7c6d-319d-43e0-b2b1-efdd84b4736a (cluster eqiad1)
[15:30:56] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm be716e27-6b34-4cb0-a498-b300937edc4c (cluster eqiad1)
[15:30:58] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 01042fb6-b2e4-4690-88fb-3840c98b01aa (cluster eqiad1)
[15:31:24] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm be716e27-6b34-4cb0-a498-b300937edc4c (cluster eqiad1)
[15:31:25] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 8e6f5b87-57b8-4ba5-b9e6-8feb4e413f3d (cluster eqiad1)
[15:31:49] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 01042fb6-b2e4-4690-88fb-3840c98b01aa (cluster eqiad1)
[15:32:13] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 35433ec4-9fd5-49f8-ac51-c05ecb433a4d (cluster eqiad1)
[15:32:16] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 8e6f5b87-57b8-4ba5-b9e6-8feb4e413f3d (cluster eqiad1)
[15:32:17] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 24e5b10a-80df-4bbc-807c-97d4e935d1f4 (cluster eqiad1)
[15:32:42] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 35433ec4-9fd5-49f8-ac51-c05ecb433a4d (cluster eqiad1)
[15:32:48] Cloud-VPS, tools-infrastructure-team: Improve how virt networks are configured in cloudgw - https://phabricator.wikimedia.org/T411081#11424206 (taavi) >>! In T411081#11412295, @fgiunchedi wrote: > Something I wanted to add: I'm not very familiar with that part of the puppet codebase though I was wond...
[15:32:49] Cloud-VPS, tools-infrastructure-team: Improve how virt networks are configured in cloudgw - https://phabricator.wikimedia.org/T411081#11424207 (taavi) Open→Resolved
[15:32:54] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 24e5b10a-80df-4bbc-807c-97d4e935d1f4 (cluster eqiad1)
[15:32:55] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 23c93ff7-f301-41e5-9ea5-9d4b2da1bf22 (cluster eqiad1)
[15:33:46] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 23c93ff7-f301-41e5-9ea5-9d4b2da1bf22 (cluster eqiad1)
[15:33:47] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 2074763a-97af-4b3d-a3b5-7d5cf43b9ecd (cluster eqiad1)
[15:34:16] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 2074763a-97af-4b3d-a3b5-7d5cf43b9ecd (cluster eqiad1)
[15:34:17] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 03d3daa4-c46e-4152-a4dd-c02a872f7edd (cluster eqiad1)
[15:34:53] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 03d3daa4-c46e-4152-a4dd-c02a872f7edd (cluster eqiad1)
[15:34:54] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm a80c58d9-fcce-4739-9f83-204cff354959 (cluster eqiad1)
[15:35:25] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm a80c58d9-fcce-4739-9f83-204cff354959 (cluster eqiad1)
[15:35:26] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm b875763a-d70f-4cab-92ce-60a523161799 (cluster eqiad1)
[15:35:52] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 43462c80-0923-4494-a5db-a8df39d71cdd (cluster eqiad1)
[15:36:20] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 43462c80-0923-4494-a5db-a8df39d71cdd (cluster eqiad1)
[15:36:30] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm abf4e1e6-1bd6-41f2-ad1c-345e940b0b8b (cluster eqiad1)
[15:37:20] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm abf4e1e6-1bd6-41f2-ad1c-345e940b0b8b (cluster eqiad1)
[15:37:33] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm b875763a-d70f-4cab-92ce-60a523161799 (cluster eqiad1)
[15:37:35] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 65283e58-53e0-4545-b201-dab88a8ae7e5 (cluster eqiad1)
[15:37:44] cloud-services-team, Cloud-VPS: Octavia network public access inconsistency - https://phabricator.wikimedia.org/T411509 (taavi) NEW
[15:38:24] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm d761eca2-4a21-4522-8d95-584bf639e6c0 (cluster eqiad1)
[15:38:52] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 65283e58-53e0-4545-b201-dab88a8ae7e5 (cluster eqiad1)
[15:38:53] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 6c263c50-71da-40ee-b1e0-00d40ba108e7 (cluster eqiad1)
[15:39:21] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 6c263c50-71da-40ee-b1e0-00d40ba108e7 (cluster eqiad1)
[15:39:22] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 82b22752-6752-4814-90c5-2aebd3825e95 (cluster eqiad1)
[15:39:42] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm d761eca2-4a21-4522-8d95-584bf639e6c0 (cluster eqiad1)
[15:39:48] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 2f7b7cfa-12ed-41d5-977d-1e11e8335cf4 (cluster eqiad1)
[15:39:58] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 82b22752-6752-4814-90c5-2aebd3825e95 (cluster eqiad1)
[15:40:00] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm c4c0ffc0-ebd2-4133-9912-585af2725bfd (cluster eqiad1)
[15:40:12] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11424243 (Reputation22) >>! In T409668#11424166, @fnegri wrote: > They are not accessible via Horizon but they are accessible from the fo...
[15:40:26] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 2f7b7cfa-12ed-41d5-977d-1e11e8335cf4 (cluster eqiad1)
[15:40:37] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm c4c0ffc0-ebd2-4133-9912-585af2725bfd (cluster eqiad1)
[15:40:38] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm bc10309b-5227-4ca2-b74c-440e2fdc116e (cluster eqiad1)
[15:41:56] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm bc10309b-5227-4ca2-b74c-440e2fdc116e (cluster eqiad1)
[15:42:01] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 90b1b0c3-14fb-47f6-9c50-f952f55bcfea (cluster eqiad1)
[15:43:19] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 90b1b0c3-14fb-47f6-9c50-f952f55bcfea (cluster eqiad1)
[15:43:20] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 627735e5-57c7-4714-855b-b7311fc527c6 (cluster eqiad1)
[15:43:48] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 627735e5-57c7-4714-855b-b7311fc527c6 (cluster eqiad1)
[15:43:50] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 797c51db-bc81-4363-922e-a52c3fc3eeea (cluster eqiad1)
[15:44:26] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 797c51db-bc81-4363-922e-a52c3fc3eeea (cluster eqiad1)
[15:44:27] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 36b6590c-eac2-40a0-ac30-7cf79ff12ce3 (cluster eqiad1)
[15:44:56] FIRING: [4x] ProbeDown: Service tools-k8s-haproxy-7:443 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:45:03] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 36b6590c-eac2-40a0-ac30-7cf79ff12ce3 (cluster eqiad1)
[15:45:04] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 278b5002-e9db-4506-a40f-167b52b9515f (cluster eqiad1)
[15:45:33] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 278b5002-e9db-4506-a40f-167b52b9515f (cluster eqiad1)
[15:45:34] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 4945aa99-aeff-4198-9aaa-7391c9a84c55 (cluster eqiad1)
[15:46:02] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 4945aa99-aeff-4198-9aaa-7391c9a84c55 (cluster eqiad1)
[15:46:03] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm d9e4c884-82f1-4c2e-8b35-70bfeb5292cf (cluster eqiad1)
[15:46:28] FIRING: TargetDown: Job toolsdb-mariadb is unreachable in project tools instance tools-db-7 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[15:46:40] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm d9e4c884-82f1-4c2e-8b35-70bfeb5292cf (cluster eqiad1)
[15:46:41] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 106a0f58-3276-4754-93cd-a7ae20fddc75 (cluster eqiad1)
[15:47:09] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 106a0f58-3276-4754-93cd-a7ae20fddc75 (cluster eqiad1)
[15:47:10] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 02bf16d5-5e10-470b-b05d-341673a284de (cluster eqiad1)
[15:47:31] FIRING: ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-6 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing
[15:47:47] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 02bf16d5-5e10-470b-b05d-341673a284de (cluster eqiad1)
[15:47:48] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 9ccf684d-c6ea-45ee-83db-ee3af5de3dfe (cluster eqiad1)
[15:48:17] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 9ccf684d-c6ea-45ee-83db-ee3af5de3dfe (cluster eqiad1)
[15:48:19] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 8030caca-e1e8-4f1d-bce1-04afd22adb3a (cluster eqiad1)
[15:48:47] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 8030caca-e1e8-4f1d-bce1-04afd22adb3a (cluster eqiad1)
[15:49:56] RESOLVED: [4x] ProbeDown: Service tools-k8s-haproxy-7:443 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:58:58] Tool-paulina: At an author's list of works, show a link to the SPARQL query at the Wikidata Query Service - https://phabricator.wikimedia.org/T397554#11424365 (Pepe_piton) Open→Resolved Merged @Dipanshu1223's solution. Thanks everyone for your work!
[16:06:28] RESOLVED: TargetDown: Job toolsdb-mariadb is unreachable in project tools instance tools-db-7 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[16:07:31] RESOLVED: ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-6 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing
[16:18:45] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11424478 (fnegri) > no i meant the emails for sending alerts can be taken out via horizon/vps portal .etc? Ah sorry, I completely misund...
[16:31:30] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 55ed5d49-43db-4f62-8c40-5cb0431dfce2 (cluster eqiad1)
[16:31:52] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 1d74fc9a-0ddd-41d6-a0fd-5bba5e455c32 (cluster eqiad1)
[16:32:20] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 55ed5d49-43db-4f62-8c40-5cb0431dfce2 (cluster eqiad1)
[16:32:20] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11424566 (Reputation22) >>! In T409668#11424478, @fnegri wrote: >> no i meant the emails for sending alerts can be taken out via horizon/...
[16:32:44] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 1d74fc9a-0ddd-41d6-a0fd-5bba5e455c32 (cluster eqiad1)
[16:33:12] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 5e5c3bad-f1c7-49e5-b846-edaf111af83c (cluster eqiad1)
[16:33:32] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm e6796dbd-2511-4bf6-bdee-4a14a7414d5f (cluster eqiad1)
[16:33:44] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11424571 (Reputation22) (ideally all this should be configurable by maintainers/owners, will help in reducing redundant work, what do you...
[16:33:48] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 5e5c3bad-f1c7-49e5-b846-edaf111af83c (cluster eqiad1)
[16:34:23] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm e6796dbd-2511-4bf6-bdee-4a14a7414d5f (cluster eqiad1)
[16:40:15] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1040.eqiad.wmnet}'
[16:40:48] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11424641 (bd808) >>! In T409668#11424571, @Reputation22 wrote: > (ideally all this should be configurable by maintainers/owners, will hel...
[16:44:05] PROBLEM - SSH on cloudvirt1040 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[16:44:15] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 028c29da-adcb-4239-bcb4-6e80516e6fbb (cluster eqiad1)
[16:45:07] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 028c29da-adcb-4239-bcb4-6e80516e6fbb (cluster eqiad1)
[16:45:08] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 60d1c96e-0c3c-47e1-86d6-cd30527d5066 (cluster eqiad1)
[16:45:53] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1040.eqiad.wmnet}'
[16:45:55] RECOVERY - SSH on cloudvirt1040 is OK: SSH OK - OpenSSH_10.0p2 Debian-7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[16:45:58] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 60d1c96e-0c3c-47e1-86d6-cd30527d5066 (cluster eqiad1)
[16:46:00] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm c7b1311c-ee8b-4118-b907-ad0382644350 (cluster eqiad1)
[16:46:40] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11424673 (Reputation22) >>! In T409668#11424641, @bd808 wrote: >>>! In T409668#11424571, @Reputation22 wrote: >> (ideally all this should...
[16:46:51] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm c7b1311c-ee8b-4118-b907-ad0382644350 (cluster eqiad1)
[16:46:52] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 8b546fd2-137d-4b91-86f3-b50fa515c98c (cluster eqiad1)
[16:47:42] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 8b546fd2-137d-4b91-86f3-b50fa515c98c (cluster eqiad1)
[16:47:44] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm afb538cb-a128-450b-a02f-4fee25183588 (cluster eqiad1)
[16:48:35] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm afb538cb-a128-450b-a02f-4fee25183588 (cluster eqiad1)
[16:48:36] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 59f99bab-8a86-4701-a142-3a15a1c18d48 (cluster eqiad1)
[16:49:27] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 59f99bab-8a86-4701-a142-3a15a1c18d48 (cluster eqiad1)
[16:49:28] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 9491619f-43b5-4612-b976-00862dcd901d (cluster eqiad1)
[16:50:18] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 9491619f-43b5-4612-b976-00862dcd901d (cluster eqiad1)
[16:50:20] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 848de230-1687-40c5-b954-f8c2a3b7a443 (cluster eqiad1)
[16:50:29] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1041.eqiad.wmnet}'
[16:51:10] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 848de230-1687-40c5-b954-f8c2a3b7a443 (cluster eqiad1)
[16:51:11] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 3f0dc3e0-f5e8-43a4-86dc-523ad08e90e6 (cluster eqiad1)
[16:51:44] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11424703 (fnegri) @Reputation22 yes the current system is very manual and definitely sub-optimal, implementing the tasks mentioned by @bd...
[16:52:02] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 3f0dc3e0-f5e8-43a4-86dc-523ad08e90e6 (cluster eqiad1)
[16:52:03] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 1da6f8f7-db35-4f33-92f9-29a6516bf47c (cluster eqiad1)
[16:52:29] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1042.eqiad.wmnet}'
[16:52:55] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 1da6f8f7-db35-4f33-92f9-29a6516bf47c (cluster eqiad1)
[16:56:20] cloud-services-team, Toolforge (Quota-requests): Elasticsearch credential request for gutensearch - https://phabricator.wikimedia.org/T411445#11424751 (bd808) The largest single index in the current cluster is 1.8GB (https://bd808-test.toolforge.org/elastic7.php), but it looks like there should be space...
[16:57:05] PROBLEM - Host cloudvirt1042 is DOWN: PING CRITICAL - Packet loss = 100%
[16:58:02] cloud-services-team, Toolforge (Quota-requests): Elasticsearch credential request for gutensearch - https://phabricator.wikimedia.org/T411445#11424808 (taavi) a:taavi
[16:58:15] ACKNOWLEDGEMENT - SSH on cloudvirt1042 is CRITICAL: CRITICAL - Socket timeout after 10 seconds Andrew Bogott rebooting for 410846 https://wikitech.wikimedia.org/wiki/SSH/monitoring
[16:58:16] ACKNOWLEDGEMENT - Host cloudvirt1042 is DOWN: PING CRITICAL - Packet loss = 100% Andrew Bogott rebooting for 410846
[16:58:30] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1042.eqiad.wmnet}'
[16:58:31] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1042.eqiad.wmnet}'
[16:58:35] RECOVERY - Host cloudvirt1042 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms
[16:59:38] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11424830 (fnegri) Done: `lang=mysql MariaDB [prometheusconfig]> SELECT * FROM contact_group_members WHERE contact_group_id=7; +----+----...
[17:01:07] PROBLEM - Host cloudvirt1042 is DOWN: PING CRITICAL - Packet loss = 100%
[17:01:32] cloud-services-team, Toolforge (Quota-requests): Elasticsearch credential request for gutensearch - https://phabricator.wikimedia.org/T411445#11424851 (taavi) Open→Resolved Your credentials will be usable in about half an hour when Puppet runs on the full Elastic cluster.
[17:02:35] RECOVERY - Host cloudvirt1042 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[17:02:37] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1042.eqiad.wmnet}'
[17:03:27] cloud-services-team, Cloud-VPS: wmcs Trixie kernel reboots - https://phabricator.wikimedia.org/T410846#11424872 (Andrew) Open→Resolved
[17:04:14] cloud-services-team, Horizon, Striker, wikitech.wikimedia.org: Reimage cloudweb hosts to trixie - https://phabricator.wikimedia.org/T376277#11424879 (Andrew) Open→Resolved a:Andrew
[17:05:11] cloud-services-team, Wikimedia Enterprise, Wikimedia Enterprise Volunteer Request: Toolforge no longer has IP-based access to Wikimedia Enterprise - https://phabricator.wikimedia.org/T410994#11424888 (HShaikh) Please check again and let us know if the access is not resotored. We have updated an ip w...
[17:05:30] cloud-services-team, Cloud-VPS: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217#11424892 (Andrew) a:Andrew
[17:05:57] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1041.eqiad.wmnet}'
[17:09:47] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11424918 (Reputation22) >>! In T409668#11424830, @fnegri wrote: > Done: > > `lang=mysql > MariaDB [prometheusconfig]> SELECT * FROM cont...
[17:12:32] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11424932 (Reputation22) >>! In T409668#11424703, @fnegri wrote: > @Reputation22 yes the current system is very manual and definitely sub-...
[17:13:26] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11424940 (Reputation22) >>! In T409668#11424918, @Reputation22 wrote: >>>! In T409668#11424830, @fnegri wrote: >> Done: >> >> `lang=mysq...
[17:15:57] cloud-services-team, Wikimedia Enterprise, Wikimedia Enterprise Volunteer Request: Toolforge no longer has IP-based access to Wikimedia Enterprise - https://phabricator.wikimedia.org/T410994#11424976 (bd808) `lang=shell-session bd808@tools-bastion-14.tools.eqiad1:~$ curl -s https://api.enterprise.wik...
[17:16:50] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11424979 (fnegri) > also on a side note.. can i add a custom message in these alerts? it would be great if i can attach a runbook here on...
[17:19:56] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11425012 (Reputation22) >>! In T409668#11424979, @fnegri wrote: >> also on a side note.. can i add a custom message in these alerts? it w...
[17:20:35] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11425013 (Reputation22) >>! In T409668#11424979, @fnegri wrote: >> also on a side note.. can i add a custom message in these alerts? it w...
[17:28:48] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11425061 (Reputation22) for alert runbook, lets use this link for all alerts https://commons.wikimedia.org/w/index.php?title=Commons%3AV...
[17:33:23] cloud-services-team, Horizon: Prevent creating web proxies on ports with no matching security group rule to permit the traffic - https://phabricator.wikimedia.org/T411531 (taavi) NEW
[17:34:19] cloud-services-team, Horizon: Prevent creating web proxies on ports with no matching security group rule to permit the traffic - https://phabricator.wikimedia.org/T411531#11425099 (taavi) p:Triage→Low
[17:39:52] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11425122 (fnegri) In progress→Resolved Added both links, you can see them [here](https://prometheus.wmcloud.org/alerts) and t...
[17:40:56] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11425138 (Soda) >>! In T409668#11425061, @Reputation22 wrote: > for alert runbook, lets use this link for all alerts > > https://com...
[17:45:57] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11425168 (fnegri) > Make it a page on wikitech; you are generally not supposed to put this on content wikis. You could maybe use htt...
[18:24:02] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11425405 (Reputation22) >>! In T409668#11425138, @Soda wrote: >>>! In T409668#11425061, @Reputation22 wrote: >> for alert runbook, le...
[18:41:55] cloud-services-team, Horizon, Puppet: Allow providing a commit message for hieradata changes - https://phabricator.wikimedia.org/T250623#11425504 (taavi) p:Triage→Low
[18:44:48] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11425524 (Soda) Yepp!
[18:46:57] cloud-services-team, Horizon, TestMe, Upstream: Horizon - Not possible to remove A record from Record Set - https://phabricator.wikimedia.org/T219079#11425544 (taavi)
[18:48:31] cloud-services-team, Horizon, Openstack-Magnum, Upstream: magnum dashboard shows clusters across all projects - https://phabricator.wikimedia.org/T392384#11425553 (taavi)
[18:48:44] cloud-services-team, Horizon, Openstack-Magnum, Upstream: magnum dashboard shows clusters across all projects - https://phabricator.wikimedia.org/T392384#11425554 (taavi) p:Triage→Medium
[19:13:04] cloud-services-team (FY2025/26-Q1-Q2), Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11425642 (Reputation22) >>! In T409668#11425405, @Reputation22 wrote: >>>! In T409668#11425138, @Soda wrote: >>>>! In T409668#1142...
[19:26:03] cloud-services-team: Update make-toolforge-user-list.py - https://phabricator.wikimedia.org/T411545 (Andrew) NEW
[19:31:56] FIRING: SystemdUnitDown: The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1001-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1001-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[19:52:03] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1046.eqiad.wmnet'
[19:55:10] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1046.eqiad.wmnet'
[19:56:44] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1046.eqiad.wmnet'
[19:57:26] RESOLVED: SystemdUnitDown: The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1001-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1001-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[19:58:54] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1046.eqiad.wmnet'
[20:01:12] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1046.eqiad.wmnet'
[20:03:09] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1046.eqiad.wmnet'
[20:06:21] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1046.eqiad.wmnet'
[20:07:23] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1046.eqiad.wmnet'
[20:09:57] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1050.eqiad.wmnet'
[20:10:56] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1050.eqiad.wmnet'
[20:13:13] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1054.eqiad.wmnet'
[20:16:18] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1054.eqiad.wmnet'
[20:17:16] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1054.eqiad.wmnet'
[20:18:38] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1054.eqiad.wmnet'
[20:23:03] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm c6645048-8447-4553-bf13-8122f959e4a8 (cluster eqiad1)
[20:23:39] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm c6645048-8447-4553-bf13-8122f959e4a8 (cluster eqiad1)
[20:25:17] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1054.eqiad.wmnet'
[20:26:20] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1054.eqiad.wmnet'
[20:29:04] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1061.eqiad.wmnet'
[20:32:41] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1061.eqiad.wmnet'
[20:44:46] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1061.eqiad.wmnet'
[20:47:55] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1061.eqiad.wmnet'
[20:50:27] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 592754fe-dd64-463f-ab33-d51a4108cec0 (cluster eqiad1)
[20:51:18] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 592754fe-dd64-463f-ab33-d51a4108cec0 (cluster eqiad1)
[20:51:39] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 1797e3a3-f04a-4cb0-9102-79f1d0079d57 (cluster eqiad1)
[20:51:48] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1061.eqiad.wmnet'
[20:52:16] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 1797e3a3-f04a-4cb0-9102-79f1d0079d57 (cluster eqiad1)
[20:53:28] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1061.eqiad.wmnet'
[21:22:45] (open) bd808: Upgrade to PHP 8.4 [toolforge-repos/mwdemo] - https://gitlab.wikimedia.org/toolforge-repos/mwdemo/-/merge_requests/6
[21:29:35] cloud-services-team: Update make-toolforge-user-list.py - https://phabricator.wikimedia.org/T411545#11426107 (komla) The opt-out flag is for the sendBulkEmail php script. Bryan's script above only extracts the list of users into a file that is passed to the php script.
[21:50:44] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[21:50:44] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[21:50:53] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[21:50:54] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[21:51:08] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[21:51:09] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[21:54:01] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[21:54:01] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[21:56:33] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[21:56:34] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[21:56:37] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[21:56:38] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[21:56:48] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[21:56:49] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[21:58:32] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[21:58:33] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[22:00:13] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[22:00:14] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[22:07:50] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[22:07:51] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[22:10:49] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[22:10:50] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[22:15:06] (merge) bd808: Upgrade to PHP 8.4 [toolforge-repos/mwdemo] - https://gitlab.wikimedia.org/toolforge-repos/mwdemo/-/merge_requests/6
[22:19:02] (open) bd808: help: Update for PHP 8.4 version bump [toolforge-repos/mwdemo] - https://gitlab.wikimedia.org/toolforge-repos/mwdemo/-/merge_requests/7
[22:28:17] FIRING: JobUnavailable: Reduced availability for job rabbitmq in cloud@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[22:49:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[22:51:22] FIRING: HAProxyBackendUnavailable: HAProxy service designate-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[22:53:50] FIRING: [47x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudnet1005 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[22:54:22] FIRING: [2x] HAProxyServiceUnavailable: HAProxy service designate-api_backend has no available backends on cloudlb1001:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable
[22:56:06] cloud-services-team: Update make-toolforge-user-list.py - https://phabricator.wikimedia.org/T411545#11426333 (bd808) The `labswiki` database that used to be on m5-master is now in the `s6` cluster. Developer accounts are completely disconnected from Wikitech following the SUL migration starting last October,...
[23:03:43] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all services
[23:03:51] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99) on deployment eqiad1 for all services
[23:04:37] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all services
[23:04:43] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.openstack.restart_openstack (exit_code=97) on deployment eqiad1 for all services
[23:05:53] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all services
[23:07:52] FIRING: [7x] HAProxyBackendUnavailable: HAProxy service glance-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[23:08:30] (merge) bd808: help: Update for PHP 8.4 version bump [toolforge-repos/mwdemo] - https://gitlab.wikimedia.org/toolforge-repos/mwdemo/-/merge_requests/7
[23:12:52] RESOLVED: [7x] HAProxyBackendUnavailable: HAProxy service glance-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[23:14:56] FIRING: SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[23:19:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[23:21:20] RESOLVED: [48x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudnet1005 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[23:21:33] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for all services
[23:22:52] RESOLVED: [2x] HAProxyServiceUnavailable: HAProxy service designate-api_backend has no available backends on cloudlb1001:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable
[23:33:47] RESOLVED: JobUnavailable: Reduced availability for job openstack in cloud@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable