[00:16:28] FIRING: InstanceDown: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:21:28] RESOLVED: InstanceDown: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:49:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:55:56] FIRING: [2x] SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[00:59:42] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:10:56] RESOLVED: [2x] SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[02:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[02:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[06:19:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[06:29:42] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[06:33:54] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/47
[06:33:57] (update) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/54
[06:34:08] (update) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/registry-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/registry-admission/-/merge_requests/6
[06:34:23] (update) group_203_bot_4866fc124f4b41659f667468a6115cf3: poetry: Autoupdate [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/76
[06:34:26] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/54
[06:34:31] (update) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/ingress-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-admission/-/merge_requests/5
[06:34:43] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: poetry: Autoupdate [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/76
[06:34:49] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/registry-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/registry-admission/-/merge_requests/6
[06:34:57] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/77
[06:35:02] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/ingress-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-admission/-/merge_requests/5
[06:35:03] (update) group_203_bot_4866fc124f4b41659f667468a6115cf3: poetry: Autoupdate [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/99
[06:35:05] (update) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/77
[06:35:08] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/tools-webservice] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/48
[06:35:10] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: poetry: Autoupdate [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/99
[06:35:11] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: poetry: Autoupdate [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/48
[06:35:26] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/100
[06:35:26] (update) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/tools-webservice] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/48
[06:35:30] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: poetry: Autoupdate [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/25
[06:35:30] (update) group_203_bot_4866fc124f4b41659f667468a6115cf3: poetry: Autoupdate [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/48
[06:35:31] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/49
[06:35:39] (update) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/100
[06:35:41] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/envvars-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-admission/-/merge_requests/5
[06:35:42] (update) group_203_bot_4866fc124f4b41659f667468a6115cf3: poetry: Autoupdate [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/25
[06:35:46] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: poetry: Autoupdate [repos/cloud/toolforge/toolforge-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-cli/-/merge_requests/21
[06:35:50] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/volume-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/volume-admission/-/merge_requests/11
[06:35:54] (update) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/49
[06:36:05] (update) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/envvars-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-admission/-/merge_requests/5
[06:36:13] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/99
[06:36:21] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/26
[06:36:25] (update) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/volume-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/volume-admission/-/merge_requests/11
[06:36:29] (update) group_203_bot_4866fc124f4b41659f667468a6115cf3: poetry: Autoupdate [repos/cloud/toolforge/toolforge-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-cli/-/merge_requests/21
[06:36:33] (update) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/99
[06:36:37] (update) group_203_bot_4866fc124f4b41659f667468a6115cf3: pre-commit: Autoupdate [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/26
[06:41:39] Cloud-VPS (Debian Buster Deprecation), Wikispeech: Cloud VPS "wikispeech" project Buster deprecation - https://phabricator.wikimedia.org/T367565#9938238 (Sebastian_Berlin-WMSE) See {T360787}.
[09:12:08] Cloud-VPS (Debian Buster Deprecation), Wikispore: Rebuild Wikispore Vagrant boxes on Bullseye or Bookworm - https://phabricator.wikimedia.org/T365934#9938671 (Tgr) This will be harder than I thought as there is no Vagrant base box for Bullseye + amd64 + LXC. We'll either have to build our own per https:/...
[09:13:11] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "wikispore" project Buster deprecation - https://phabricator.wikimedia.org/T367566#9938674 (Tgr) >>! In T367566#9893303, @Tgr wrote: > We expect this to be done on time. Or maybe not: T365934#9938671
[09:17:44] Data-Services: [wikireplicas] Automated tests for views - https://phabricator.wikimedia.org/T368050#9938705 (fnegri) p:Triage→Medium
[09:18:51] Data-Services: maintain-replica-indexes --help fails - https://phabricator.wikimedia.org/T361948#9938717 (fnegri) p:Triage→High
[09:20:31] Data-Services: add proper dry-run/diff mode to maintain-views - https://phabricator.wikimedia.org/T351637#9938724 (fnegri) p:Triage→Low
[09:27:14] Data-Services: expose entityschema_id_counter table to cloud replica - https://phabricator.wikimedia.org/T345089#9938773 (fnegri) p:Triage→Medium
[09:44:15] Data-Services, Data-Engineering-Icebox: Discuss labsdb visibility of rev_text_id and ar_comment - https://phabricator.wikimedia.org/T158166#9938830 (fnegri) p:Triage→Low
[09:54:58] cloud-services-team, Data-Services, Stewards-and-global-tools: Add some columns of `renameuser_queue` to the replica - https://phabricator.wikimedia.org/T310341#9938854 (fnegri) @TheresNoTime I'm going through the wikireplicas backlog and found this task. It looks like there are no objections from a...
[09:57:29] Data-Services: Denormalize user_groups to contain actor information - https://phabricator.wikimedia.org/T238497#9938886 (fnegri) p:Triage→Low
[09:59:06] Data-Services: [wikireplicas] Automated tests for views - https://phabricator.wikimedia.org/T368050#9938892 (Marostegui) That could work for sections without many objects, but s3 has 80k views, so probably you want to use `information_schema` and `show warnings` You don't necessarily need to query all the...
[10:01:05] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services, Data-Platform-SRE: Automate maintain-views replica depooling - https://phabricator.wikimedia.org/T300427#9938896 (fnegri) p:Triage→Medium a:fnegri
[10:05:17] cloud-services-team, Data-Services, Stewards-and-global-tools: Add some columns of `renameuser_queue` to the replica - https://phabricator.wikimedia.org/T310341#9938923 (Marostegui) For what is worth, this table is present on the hosts (empty), so there would not be any importing needed or anything....
[11:28:14] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services, DBA: Prepare and check storage layer for btmwiki - https://phabricator.wikimedia.org/T368066#9939321 (ops-monitoring-bot) Cookbook cookbooks.sre.wikireplicas.update-views run by btullis: Started updating wiki replica views
[11:29:24] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services, DBA: Prepare and check storage layer for btmwiki - https://phabricator.wikimedia.org/T368066#9939322 (ops-monitoring-bot) Cookbook cookbooks.sre.wikireplicas.update-views started by btullis executed with errors: - an-redacteddb1001.eqiad.wmn...
[11:32:41] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services, DBA: Prepare and check storage layer for btmwiki - https://phabricator.wikimedia.org/T368066#9939333 (BTullis) We experienced a failure relating to the sqooping of this new wiki into HDFS, so I'm just investigating why this might be. Apologi...
[11:45:13] cloud-services-team, Data-Services, Infrastructure Security: wikireplicas root access - https://phabricator.wikimedia.org/T344599#9939381 (Marostegui) >>! In T344599#9935341, @fnegri wrote: >> cloud-services-team define precisely what permissions are required but missing from the wikireplica hosts >...
[11:56:57] cloud-services-team, Data-Services, Infrastructure Security: wikireplicas root access - https://phabricator.wikimedia.org/T344599#9939399 (jcrespo) If I may @fnegri, the issue is that those hosts are in a way special, because they are pieces (data) of production (meaning here mediawiki) on cloud real...
[12:03:34] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#9939418 (dcaro)
[12:03:47] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy (T309789)
[12:03:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:03:53] T309789: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789
[12:13:55] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, DC-Ops, ops-eqiad, SRE: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#9939480 (dcaro) With the current data, you can start observing that `cloudcephosd1034-sdh` (the new drive that has...
[12:14:13] (approved) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/tools-webservice] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/48 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[12:14:16] (merge) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/tools-webservice] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/48 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[12:14:21] (update) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/tools-webservice] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/48 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[12:14:32] (approved) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/77 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[12:14:33] (update) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/77 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[12:14:36] (merge) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/77 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[12:14:55] (update) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/ingress-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-admission/-/merge_requests/5 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[12:15:00] (approved) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/ingress-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-admission/-/merge_requests/5 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[12:15:04] (merge) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/ingress-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-admission/-/merge_requests/5 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[12:15:49] (update) dcaro: poetry: Autoupdate [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/76 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[12:17:23] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99) (T309789)
[12:17:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:17:28] T309789: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789
[12:18:20] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: ingress-admission: bump to 0.0.44-20240701121521-ce317cdc [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/367
[12:20:42] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:24:02] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/13
[12:36:48] Toolforge (Toolforge iteration 11): [toolforge,replica_cnf] Use tool-prefixed urls for envvars - https://phabricator.wikimedia.org/T368909 (dcaro) NEW
[12:37:05] (update) dcaro: api: remove unprefixed endpoints [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/33 (owner: sstefanova)
[12:37:31] (update) dcaro: api: remove unprefixed endpoints [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/33 (https://phabricator.wikimedia.org/T363808) (owner: sstefanova)
[12:38:45] (update) dcaro: api: auth and proxy requests to the backend APIs [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/23 (https://phabricator.wikimedia.org/T363983)
[12:38:56] (approved) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/47 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[12:38:59] (merge) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/47 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[12:39:07] (approved) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/54 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[12:39:13] (merge) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/54 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[12:39:31] (approved) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/registry-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/registry-admission/-/merge_requests/6 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[12:39:34] (merge) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/registry-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/registry-admission/-/merge_requests/6 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[12:41:51] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: maintain-kubeusers: bump to 0.0.161-20240701123925-3f71c85d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/368
[12:42:14] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: registry-admission: bump to 0.0.43-20240701123945-624fca18 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/369
[12:56:13] Toolforge (Toolforge iteration 11): [toolforge,replica_cnf] Use tool-prefixed urls for envvars - https://phabricator.wikimedia.org/T368909#9939577 (dcaro)
[12:56:15] Toolforge (Toolforge iteration 11), Patch-For-Review: [envvars-api] version 0.0.50 introduces breaking changes that need adapting for replica_cnf service - https://phabricator.wikimedia.org/T368516#9939578 (dcaro)
[12:57:01] Toolforge (Toolforge iteration 11), Patch-For-Review: [envvars-api] version 0.0.50 introduces breaking changes that need adapting for replica_cnf service - https://phabricator.wikimedia.org/T368516#9939579 (dcaro)
[13:02:05] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission
[13:02:16] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission
[13:06:48] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission
[13:06:59] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission
[13:10:29] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb hosts - https://phabricator.wikimedia.org/T367778#9939610 (fnegri) During the weekend, the replication lag in clouddb1019 went down to zero, but then it started increasing again: {F56123032} I'm sti...
[13:14:23] (update) dcaro: registry-admission: bump to 0.0.43-20240701123945-624fca18 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/369 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[13:14:24] (approved) dcaro: registry-admission: bump to 0.0.43-20240701123945-624fca18 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/369 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[13:14:27] (merge) dcaro: registry-admission: bump to 0.0.43-20240701123945-624fca18 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/369 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[13:14:47] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission
[13:14:57] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission
[13:21:19] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission
[13:21:30] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission
[13:24:33] cloud-services-team, Cloud-VPS, Epic, IPv6: Enable IPv6 on CloudVPS - https://phabricator.wikimedia.org/T37947#9939666 (aborrero)
[13:26:09] cloud-services-team, Release-Engineering-Team (Priority Backlog 📥): Experiment with WMCS as a k8s provider for gitlab-cloud-runner cluster - https://phabricator.wikimedia.org/T353356#9939673 (fnegri)
[13:35:42] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Goal, Patch-For-Review, Puppet (Puppet 7.0): Migrate Cloud VPS puppet infrastructure to Puppet 7 - https://phabricator.wikimedia.org/T351450#9939698 (fnegri) Resolved→In progress
[13:36:59] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Goal, Patch-For-Review, Puppet (Puppet 7.0): Migrate Cloud VPS puppet infrastructure to Puppet 7 - https://phabricator.wikimedia.org/T351450#9939702 (fnegri) Ceph hosts are still missing, blocked by {T309789}
[13:40:04] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Goal, Patch-For-Review, Puppet (Puppet 7.0): Migrate Cloud VPS puppet infrastructure to Puppet 7 - https://phabricator.wikimedia.org/T351450#9939714 (taavi) In progress→Resolved which is tracked in {T349619} and not here.
[13:42:57] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#9939721 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host cloudcephosd1040.eqiad.wmnet with OS bullseye
[13:43:36] (update) dcaro: ingress-admission: bump to 0.0.44-20240701121521-ce317cdc [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/367 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[13:43:51] (update) dcaro: ingress-admission: bump to 0.0.44-20240701121521-ce317cdc [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/367 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[13:43:52] (approved) dcaro: ingress-admission: bump to 0.0.44-20240701121521-ce317cdc [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/367 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[13:44:13] (merge) dcaro: ingress-admission: bump to 0.0.44-20240701121521-ce317cdc [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/367 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[13:44:43] (update) dcaro: maintain-kubeusers: bump to 0.0.161-20240701123925-3f71c85d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/368 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[13:47:50] Toolforge: [toolforge,infra] Fix deprecated Kubelet flags - https://phabricator.wikimedia.org/T355881#9939733 (aborrero)
[13:47:54] cloud-services-team, Toolforge: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.25 - https://phabricator.wikimedia.org/T316107#9939734 (aborrero)
[13:51:12] cloud-services-team, Toolforge (Toolforge iteration 11), Patch-For-Review: Toolforge: Replace all bastion with grid-less bookworm based bastion hosts - https://phabricator.wikimedia.org/T314665#9939761 (dcaro)
[13:51:14] Toolforge (Toolforge iteration 11): [toolforge] simplify calling the different toolforge apis from within the containers - https://phabricator.wikimedia.org/T356377#9939762 (dcaro)
[13:53:46] cloud-services-team, Toolforge: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.25 - https://phabricator.wikimedia.org/T316107#9939777 (aborrero)
[13:58:09] cloud-services-team (FY2023/2024-Q3-Q4), Infrastructure-Foundations: Remove wmcs-admin access from production cumin hosts - https://phabricator.wikimedia.org/T347979#9939806 (Raymond_Ndibe) Stalled→Open
[13:59:38] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge, Epic: [Hypothesis] WE6.3.2 Create "standard" tool to measure the number of steps for a deployment - https://phabricator.wikimedia.org/T368602#9939810 (dcaro)
[13:59:47] Cloud Services Proposals, cloud-services-team (FY2023/2024-Q3-Q4), Toolforge, Cloud-Services-Origin-Team, and 3 others: [Epic,builds-api,components-api,webservice,jobs-api] Make Toolforge a proper platform as a service with push-to-deploy and build ... - https://phabricator.wikimedia.org/T194332#9939811
[14:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:03:40] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#9939826 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host cloudcephosd1040.eqiad.wmnet with OS bullseye exec...
[14:10:51] cloud-services-team, Toolforge, Sustainability (Incident Followup): Incident: 2024-06-12 toolforge k8s control plane - https://phabricator.wikimedia.org/T367348#9939858 (aborrero) see https://wikitech.wikimedia.org/wiki/Incidents/2024-06-12_WMCS_toolforge_k8s_control_plane
[14:10:54] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#9939859 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host cloudcephosd1040.eqiad.wmnet with OS bullseye
[14:24:33] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy (T309789)
[14:24:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:24:39] T309789: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789
[14:28:02] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0) (T309789)
[14:28:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:39:47] (update) aborrero: jobs-api: bump to 0.0.311-20240628093550-c6df8783 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/362 (https://phabricator.wikimedia.org/T368142) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[14:40:20] !log aborrero@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[14:40:31] !log aborrero@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[14:41:50] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[14:42:01] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[14:42:17] (merge) aborrero: jobs-api: bump to 0.0.311-20240628093550-c6df8783 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/362 (https://phabricator.wikimedia.org/T368142) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[14:42:42] (update) aborrero: tests/fixtures: drop PSP reference [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/47 (https://phabricator.wikimedia.org/T368142)
[14:43:57] (merge) aborrero: deployment: drop PSP [repos/cloud/toolforge/volume-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/volume-admission/-/merge_requests/10 (https://phabricator.wikimedia.org/T368142)
[14:44:20] (merge) aborrero: tests/fixtures: drop PSP reference [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/47 (https://phabricator.wikimedia.org/T368142)
[14:46:18] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: volume-admission: bump to 0.0.48-20240701144407-0003a769 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/370 (https://phabricator.wikimedia.org/T368142)
[14:47:39] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:52:39] RESOLVED: [2x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:54:39] FIRING: [2x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:56:53] !log aborrero@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission
[14:57:04] !log aborrero@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission
[14:59:18] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission
[14:59:19] Data-Services, Data-Persistence, Data-Platform-SRE (2024.06.17 - 2024.07.07), Patch-For-Review: Modify db-mysql to connect to an-redacteddb1001 from cumin hosts - https://phabricator.wikimedia.org/T368354#9940104 (Marostegui) As I told @ABran-WMF a quick way to check if this was fixed is: ` root...
[14:59:29] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission [14:59:34] (03merge) 10aborrero: volume-admission: bump to 0.0.48-20240701144407-0003a769 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/370 (https://phabricator.wikimedia.org/T368142) (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [15:04:29] 10cloud-services-team (FY2023/2024-Q3-Q4), 06Infrastructure-Foundations: Remove wmcs-admin access from production cumin hosts - https://phabricator.wikimedia.org/T347979#9940122 (10fnegri) This is currently blocked by {T347977} Until that task is done, members of the wmcs-admin group cannot use cloudcumin hos... [15:14:39] RESOLVED: [2x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [15:15:30] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#9940169 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host cloudcephosd1040.eqiad.wmnet with OS bullseye comp... 
[15:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [15:27:39] FIRING: [2x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [15:28:00] (03update) 10dcaro: maintain-kubeusers: bump to 0.0.161-20240701123925-3f71c85d [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/368 (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [15:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [15:31:29] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers [15:31:40] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers [15:36:17] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers [15:36:28] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers [15:37:45] (03update) 10dcaro: maintain-kubeusers: bump to 0.0.161-20240701123925-3f71c85d [repos/cloud/toolforge/toolforge-deploy] - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/368 (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [15:37:46] (03approved) 10dcaro: maintain-kubeusers: bump to 0.0.161-20240701123925-3f71c85d [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/368 (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [15:37:50] (03merge) 10dcaro: maintain-kubeusers: bump to 0.0.161-20240701123925-3f71c85d [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/368 (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [15:39:25] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api [15:39:35] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api [15:56:42] 06cloud-services-team, 10Data-Services, 10Infrastructure Security: wikireplicas root access - https://phabricator.wikimedia.org/T344599#9940511 (10fnegri) > If I may @fnegri, the issue is that those hosts are in a way special @jcrespo you absolutely may :) I'm starting to think that clouddb* hosts will rema... 
[15:57:39] RESOLVED: [2x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [16:14:39] FIRING: [2x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [16:17:08] 10Data-Services: [wikireplicas] Views flaggedpage_pending and flaggedtemplates are broken - https://phabricator.wikimedia.org/T368939 (10fnegri) 03NEW [16:17:10] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#9940600 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host cloudcephosd1041.eqiad.wmnet with OS bullseye [16:18:26] 10Data-Services: [wikireplicas] Automated tests for views - https://phabricator.wikimedia.org/T368050#9940603 (10fnegri) @Marostegui thanks that's very useful! I can try implementing this as a Python script called by a systemd timer on each clouddb host. Then we can publish the results of the script to prometheu... 
[16:19:39] RESOLVED: [2x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [16:20:39] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [16:20:43] 10Data-Services: [wikireplicas] Views flaggedpage_pending and flaggedtemplates are broken - https://phabricator.wikimedia.org/T368939#9940653 (10fnegri) p:05Triage→03Medium [16:22:37] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#9940658 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host cloudcephosd1039.eqiad.wmnet with OS bullseye [16:24:29] FIRING: InstanceDown: Project toolsbeta instance toolsbeta-redis-3 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [16:25:39] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [16:25:58] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Toolforge (Toolforge iteration 11): Intermittent redis connection timeouts in Toolforge - https://phabricator.wikimedia.org/T318479#9940669 (10fnegri) p:05Medium→03High 
@RoySmith escalating to high, I'll spend some more time trying to understand what is causing... [16:29:29] RESOLVED: InstanceDown: Project toolsbeta instance toolsbeta-redis-3 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [16:29:42] 10Data-Services: Denormalize user_groups to contain actor information - https://phabricator.wikimedia.org/T238497#9940713 (10fnegri) I've set this to "Low" priority as there was no activity on this task since 2019. Happy to increase the priority if more users would find it useful. [16:45:21] 06cloud-services-team, 10Data-Services, 06Stewards-and-global-tools: Add some columns of `renameuser_queue` to the replica - https://phabricator.wikimedia.org/T310341#9940801 (10fnegri) p:05Triage→03Low [16:49:57] 06cloud-services-team, 10Technical-blog-posts: Tech blog post: "Wikimedia Toolforge: migrating Kubernetes from PodSecurityPolicy to kyverno" - https://phabricator.wikimedia.org/T368948 (10Andrew) 03NEW [16:53:20] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "dumps" project Buster deprecation - https://phabricator.wikimedia.org/T367528#9940889 (10Nemo_bis) @Hydriz Can I upgrade the VMs to Debian 11 one of these weekends? The only reason not to that I can think of is some scripts may require Python2, but [that's stil... [16:53:54] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "dumps" project Buster deprecation - https://phabricator.wikimedia.org/T367528#9940903 (10Nemo_bis) a:03Nemo_bis [16:57:30] 10Data-Services: CONVERT_TZ fails because named time zones have not been loaded - https://phabricator.wikimedia.org/T323183#9940908 (10fnegri) p:05Triage→03Low To fix this, we would need to run [mysql_tzinfo_to_sql](https://mariadb.com/kb/en/mysql_tzinfo_to_sql/) on all wikireplica hosts, and make sure we ru... 
[17:00:15] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Toolforge (Toolforge iteration 11): Intermittent redis connection timeouts in Toolforge - https://phabricator.wikimedia.org/T318479#9940924 (10RoySmith) Thanks, In theory, I think BD's solution would work. I am only using redis as a cache and data loss, while anno... [17:07:10] 10Data-Services, 06Data-Engineering-Icebox: Discuss labsdb visibility of rev_text_id and ar_comment - https://phabricator.wikimedia.org/T158166#9940977 (10fnegri) I've marked this as "Low" priority as there was no activity on the task since 2019. If someone is still interested in having those fields in replica... [17:08:58] (03approved) 10lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - 10https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/13 (owner: 10l10n-bot) [17:09:05] (03merge) 10lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - 10https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/13 (owner: 10l10n-bot) [17:27:36] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#9941106 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host cloudcephosd1039.eqiad.wmnet with OS bullseye comp... [17:38:34] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#9941207 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host cloudcephosd1041.eqiad.wmnet with OS bullseye exec... 
[17:56:16] 10Data-Services: SQL function to recover the normal hostname, to install on Wiki Replica instances - https://phabricator.wikimedia.org/T344877#9941335 (10fnegri) p:05Triage→03Medium > `SELECT CONCAT(domain_index_to_domain(el_to_domain_index), el_to_path) from externallinks` is the sort of thing that folks we... [18:01:35] 06cloud-services-team, 10Data-Services, 06Data-Platform, 13Patch-For-Review: Add global_edit_count to wikireplicas - https://phabricator.wikimedia.org/T344108#9941365 (10fnegri) @joanna_borun could you help with defining the review process for this change, and similar ones in the future? [19:08:34] FIRING: DiskSpace: Disk space cloudbackup1002-dev:9100:/srv/cinder-backups 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [19:09:56] FIRING: SystemdUnitDown: The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1002-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [19:14:56] FIRING: [2x] SystemdUnitDown: The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1002-dev. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [19:16:43] PROBLEM - Disk space on cloudbackup1002-dev is CRITICAL: DISK CRITICAL - free space: /srv/cinder-backups 0MiB (0% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=cloudbackup1002-dev&var-datasource=eqiad+prometheus/ops [19:19:08] 06cloud-services-team, 10Cloud-VPS (Debian Buster Deprecation): Replace all codfw1dev Buster VMs - https://phabricator.wikimedia.org/T368341#9941803 (10Andrew) 05Open→03Resolved [19:33:34] RESOLVED: DiskSpace: Disk space cloudbackup1002-dev:9100:/srv/cinder-backups 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [19:34:56] FIRING: [2x] SystemdUnitDown: The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1002-dev. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [19:36:43] RECOVERY - Disk space on cloudbackup1002-dev is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=cloudbackup1002-dev&var-datasource=eqiad+prometheus/ops [20:19:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [20:27:29] FIRING: InstanceDown: Project tools instance tools-elastic-4 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [20:29:42] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [20:32:28] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tools-elastic-4 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [20:32:29] RESOLVED: InstanceDown: Project tools instance tools-elastic-4 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [20:42:30] 06cloud-services-team, 10Technical-blog-posts: Tech blog post: "Wikimedia Toolforge: migrating Kubernetes from PodSecurityPolicy to kyverno" - https://phabricator.wikimedia.org/T368948#9942220 (10debt) I've reviewed the document and awaiting a few responses :) [20:42:50] 06cloud-services-team, 10Technical-blog-posts: Tech blog post: "Wikimedia Toolforge: 
migrating Kubernetes from PodSecurityPolicy to kyverno" - https://phabricator.wikimedia.org/T368948#9942225 (10debt) [20:57:28] RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tools-elastic-4 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [21:04:56] FIRING: SystemdUnitDown: The systemd unit backup_cinder_volumes.service on node cloudbackup1002-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [21:05:08] 06cloud-services-team: SystemdUnitDown Unit backup_cinder_volumes.service on node cloudbackup1002-dev has been down for long. - https://phabricator.wikimedia.org/T368986 (10phaultfinder) 03NEW [21:19:42] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [21:56:08] (03open) 10cjming: Add test stream configs for dogfooding [toolforge-repos/mwdemo] - 10https://gitlab.wikimedia.org/toolforge-repos/mwdemo/-/merge_requests/1 (https://phabricator.wikimedia.org/T366949) [21:59:58] (03update) 10cjming: Add test stream configs for dogfooding [toolforge-repos/mwdemo] - 10https://gitlab.wikimedia.org/toolforge-repos/mwdemo/-/merge_requests/1 [22:07:01] (03update) 10cjming: Add test stream configs for dogfooding [toolforge-repos/mwdemo] - 10https://gitlab.wikimedia.org/toolforge-repos/mwdemo/-/merge_requests/1 [22:26:29] FIRING: [2x] PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tools-elastic-5 in project tools - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [22:41:29] FIRING: [2x] PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tools-elastic-5 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [22:46:29] RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tools-elastic-6 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [22:48:08] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#9942634 (10Jclark-ctr) [22:49:43] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#9942635 (10Jclark-ctr) 05Open→03Resolved [23:02:54] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#9942655 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host cloudcephosd1035.eqiad.wmnet with OS bullseye [23:05:36] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#9942656 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host cloudcephosd1036.eqiad.wmnet with OS bullseye [23:11:43] 06cloud-services-team, 10Toolforge: Upgrade Toolforge (Elastic|Open)Search cluster to Debian Bullseye - https://phabricator.wikimedia.org/T311905#9942665 (10Andrew) [23:11:47] 06cloud-services-team, 10Toolforge: Upgrade Toolforge (Elastic|Open)Search cluster to Debian Bullseye - https://phabricator.wikimedia.org/T311905#9942666 (10Andrew) My plan to jump to bookworm was premature; sticking with elasticsearch and moving to Bullseye is the easier choice for now. 
[23:25:53] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#9942690 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host cloudcephosd1038.eqiad.wmnet with OS bullseye [23:25:54] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#9942691 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host cloudcephosd1037.eqiad.wmnet with OS bullseye [23:38:23] 06cloud-services-team, 10Toolforge, 13Patch-For-Review: Upgrade Toolforge (Elastic|Open)Search cluster to Debian Bullseye - https://phabricator.wikimedia.org/T311905#9942742 (10Andrew) There are now three Bullseye nodes in the cluster, with working ES but non-working haproxy. This seems to be a reasonable g... [23:41:08] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#9942744 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host cloudcephosd1035.eqiad.wmnet with OS bullseye comp... [23:43:46] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#9942746 (10Jclark-ctr) [23:55:28] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#9942761 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host cloudcephosd1036.eqiad.wmnet with OS bullseye comp...