[00:04:56] RESOLVED: SystemdUnitDown: The service unit security_group_ssh-from-restricted-bastion_to_project_zuul.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[00:19:56] FIRING: SystemdUnitDown: The service unit security_group_ssh-from-restricted-bastion_to_project_zuul.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[00:34:56] RESOLVED: SystemdUnitDown: The service unit security_group_ssh-from-restricted-bastion_to_project_zuul.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[00:49:56] FIRING: SystemdUnitDown: The service unit security_group_ssh-from-restricted-bastion_to_project_zuul.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[01:04:56] RESOLVED: SystemdUnitDown: The service unit security_group_ssh-from-restricted-bastion_to_project_zuul.service is in failed status on host cloudcontrol1006.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[01:20:56] FIRING: SystemdUnitDown: The service unit security_group_ssh-from-restricted-bastion_to_project_zuul.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[01:35:56] RESOLVED: SystemdUnitDown: The service unit security_group_ssh-from-restricted-bastion_to_project_zuul.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[01:49:56] FIRING: SystemdUnitDown: The service unit security_group_ssh-from-restricted-bastion_to_project_zuul.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[02:04:56] RESOLVED: SystemdUnitDown: The service unit security_group_ssh-from-restricted-bastion_to_project_zuul.service is in failed status on host cloudcontrol1006.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[02:09:24] Tool-bullseye: Update maxmind GeoLite2 mmdb files - https://phabricator.wikimedia.org/T425049#12032213 (AntiCompositeNumber) Open→Resolved a:AntiCompositeNumber Before: ` tools.bullseye@tools-bastion-15:~/geolite$ ls -l total 78512 -rw-r--r-- 1 tools.bullseye tools.bullseye 73069110 Aug 31 2...
[02:13:30] Tool-bullseye: Bullseye geolocation is missing OSM, Wikimedia Maps attribution - https://phabricator.wikimedia.org/T429560 (AntiCompositeNumber) NEW
[02:18:56] FIRING: SystemdUnitDown: The service unit security_group_ssh-from-restricted-bastion_to_project_zuul.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[02:33:56] RESOLVED: SystemdUnitDown: The service unit security_group_ssh-from-restricted-bastion_to_project_zuul.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[02:50:56] FIRING: SystemdUnitDown: The service unit security_group_ssh-from-restricted-bastion_to_project_zuul.service is in failed status on host cloudcontrol1006.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[03:05:56] RESOLVED: SystemdUnitDown: The service unit security_group_ssh-from-restricted-bastion_to_project_zuul.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[03:20:56] FIRING: SystemdUnitDown: The service unit security_group_ssh-from-restricted-bastion_to_project_zuul.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[03:35:56] RESOLVED: SystemdUnitDown: The service unit security_group_ssh-from-restricted-bastion_to_project_zuul.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:39:04] cloud-services-team, Data-Services, VPS-Projects: Requesting access to NFS mount /public/dumps for language Cloud VPS project - https://phabricator.wikimedia.org/T429433#12032255 (santhosh) Thanks.
This is resolved
[04:49:55] Cloud-VPS (Debian Bullseye Deprecation), tools-platform-team: move math-nfs-1 to new instance - https://phabricator.wikimedia.org/T429544#12032257 (Physikerwelt) The documentation at https://wikitech.wikimedia.org/wiki/Help:Shared_storage#/data/project does not mention that an NFS server needs to be conf...
[05:55:37] Tool-wikimedia-attribution, MediaWiki-REST-API, Data-Engineering (Q4 FS25/26 April 1st - June 30st), MW-Interfaces-Team (MWI-Sprint-36 (2026-06-16 to 2026-06-30)): Check Editor Counts - https://phabricator.wikimedia.org/T427548#12032328 (KineticPelagic)
[06:15:47] FIRING: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[06:20:44] RESOLVED: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[06:45:29] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS, tools-infrastructure-team: Put cloudvirt10[77-80] in service - https://phabricator.wikimedia.org/T429563 (fgiunchedi) NEW
[07:05:56] (update) dcaro: _update_storage_job_status_from_runtime: exclude image.exists/status [repos/cloud/toolforge/jobs-api] (fix_aliases) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/312
[07:06:17] (update) dcaro: _update_storage_job_status_from_runtime: exclude image.exists/status [repos/cloud/toolforge/jobs-api] (fix_aliases) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/312
[07:06:32]
(update) dcaro: _update_storage_job_status_from_runtime: exclude image.exists/status [repos/cloud/toolforge/jobs-api] (fix_aliases) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/312
[07:06:32] (update) dcaro: _update_storage_job_status_from_runtime: exclude image.exists/status [repos/cloud/toolforge/jobs-api] (fix_aliases) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/312
[07:17:54] (CR) David Caro: [C:+1] "LGTM, I think though that it might be good to understand where does that undersized pg come from, as it might be pointing to a bigger issu" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1293792 (https://phabricator.wikimedia.org/T427295) (owner: Andrew Bogott)
[07:20:36] (CR) David Caro: [C:+1] roll_reboot_osds: set maintenance per-osd rather than for the whole run (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1293792 (https://phabricator.wikimedia.org/T427295) (owner: Andrew Bogott)
[07:37:57] (PS1) Giuseppe Lavagetto: Add session secret key [labs/private] - https://gerrit.wikimedia.org/r/1303915
[07:38:30] (CR) Giuseppe Lavagetto: [V:+2 C:+2] Add session secret key [labs/private] - https://gerrit.wikimedia.org/r/1303915 (owner: Giuseppe Lavagetto)
[08:12:26] Cloud-VPS (Debian Bullseye Deprecation): move math-nfs-1 to new instance - https://phabricator.wikimedia.org/T429544#12032540 (aputhin)
[08:35:10] (update) fnegri: _update_storage_job_status_from_runtime: exclude image.exists/status [repos/cloud/toolforge/jobs-api] (fix_aliases) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/312 (owner: dcaro)
[08:35:15] (approved) fnegri: _update_storage_job_status_from_runtime: exclude image.exists/status [repos/cloud/toolforge/jobs-api] (fix_aliases) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/312 (owner: dcaro)
[08:40:43] (merge)
dcaro: core.images._get_harbor_images: fix wrong stale cache check [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/310
[08:40:46] (update) dcaro: core.images.from_short_name_or_url: add aliases for unknown image [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/311
[08:44:26] (update) group_203_bot_3c0afd0d9fd9529f3b7bc7e69a4a3bce: jobs-api: bump to 0.0.509-20260618084056-aca01951 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1302 (https://phabricator.wikimedia.org/T429231)
[08:44:28] (open) group_203_bot_3c0afd0d9fd9529f3b7bc7e69a4a3bce: jobs-api: bump to 0.0.509-20260618084056-aca01951 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1302 (https://phabricator.wikimedia.org/T429231)
[09:25:58] tools-platform-team: [infa] unable to access web.archive.org from within pods - https://phabricator.wikimedia.org/T429577 (dcaro) NEW
[09:32:01] tools-platform-team: [infa] unable to access web.archive.org from within pods - https://phabricator.wikimedia.org/T429577#12032849 (dcaro) Found a node with issues (I think), looking: ` tools.wm-lol@tools-bastion-15:~$ kubectl get pods -l 'app.kubernetes.io/name=testcurl' -o json | grep worker...
[09:34:02] cloud-services-team, PAWS: PAWS disk space is running out - https://phabricator.wikimedia.org/T429578 (fnegri) NEW
[09:34:46] tools-platform-team: [infa] unable to access web.archive.org from within pods - https://phabricator.wikimedia.org/T429577#12032866 (dcaro) I can curl directly from the worker: ` root@tools-k8s-worker-nfs-57:~# curl https://web.archive.org `
[09:36:39] tools-platform-team: [infa] unable to access web.archive.org from within pods - https://phabricator.wikimedia.org/T429577#12032870 (dcaro)
[09:41:52] tools-platform-team: [infa] unable to access web.archive.org from within pods - https://phabricator.wikimedia.org/T429577#12032887 (dcaro)
[09:44:08] cloud-services-team, Toolforge: Sporadic "exim paniclog" on tools-bastion host - https://phabricator.wikimedia.org/T429579 (fnegri) NEW
[09:49:17] tools-platform-team: [infa] unable to access web.archive.org from within pods - https://phabricator.wikimedia.org/T429577#12032918 (dcaro) Other external websites seem to work, maybe we got blocked? Oh, now it also works for web archive it seems
[09:53:25] tools-platform-team: [infa] unable to access web.archive.org from within pods - https://phabricator.wikimedia.org/T429577#12032928 (dcaro) Worker 80 was failing to curl too, then right after curling google.com, it started working again: ` dcaro@tools-bastion-15:~$ kubectl-sudo run test-pod --image=docker...
[10:10:14] (merge) dcaro: core.images.from_short_name_or_url: add aliases for unknown image [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/311
[10:10:18] (update) dcaro: _update_storage_job_status_from_runtime: exclude image.exists/status [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/312
[10:13:30] (update) group_203_bot_3c0afd0d9fd9529f3b7bc7e69a4a3bce: jobs-api: bump to 0.0.510-20260618101026-852e5c92 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1302 (https://phabricator.wikimedia.org/T429231)
[10:13:39] (update) group_203_bot_3c0afd0d9fd9529f3b7bc7e69a4a3bce: jobs-api: bump to 0.0.510-20260618101026-852e5c92 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1302 (https://phabricator.wikimedia.org/T429231)
[10:15:40] (merge) dcaro: _update_storage_job_status_from_runtime: exclude image.exists/status [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/312
[10:18:47] (update) group_203_bot_3c0afd0d9fd9529f3b7bc7e69a4a3bce: jobs-api: bump to 0.0.511-20260618101556-dbbba8af [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1302 (https://phabricator.wikimedia.org/T429231)
[10:18:51] (update) group_203_bot_3c0afd0d9fd9529f3b7bc7e69a4a3bce: jobs-api: bump to 0.0.511-20260618101556-dbbba8af [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1302 (https://phabricator.wikimedia.org/T429231)
[10:33:32] cloud-services-team: cloudceph HEALTH_WARN, multiple OSD(s) experiencing slow operations in BlueStore -
https://phabricator.wikimedia.org/T429387#12033071 (Volans) p:Triage→High
[11:08:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:48:41] Tool-wmf-openapi-linter, Test Platform, [MWI] FY2025-26 Q4, Continuous-Integration-Config, and 2 others: [5.2.5b Epic] Prepare the linter for use in CI - https://phabricator.wikimedia.org/T422918#12033296 (hashar)
[11:54:59] Toolforge (Push-to-Deploy), tools-platform-team, Epic, Patch-For-Review: allow exposing continuous jobs to the internet via `toolname.toolforge.org`, just like webservice - https://phabricator.wikimedia.org/T388092#12033307 (aputhin) Duplicate→Resolved
[11:55:54] tools-platform-team: [infa] unable to access web.archive.org from within pods - https://phabricator.wikimedia.org/T429577#12033314 (dcaro)
[12:22:40] Cloud-VPS (Debian Bullseye Deprecation): move math-nfs-1 to new instance - https://phabricator.wikimedia.org/T429544#12033376 (fgiunchedi) You need to update the dns record for math-nfs.svc.math.eqiad1.wikimedia.cloud to the new instance, via https://horizon.wikimedia.org/ngdetails/OS::Designate::Zone/8fb8cf...
[12:23:18] tools-platform-team: [infa] unable to access web.archive.org from within pods - https://phabricator.wikimedia.org/T429577#12033379 (dcaro) @fgiunchedi pointed to check the ips that we are using to get out: * bastions use ip6 by default, with the ip4 address being 185.15.56.99 ` dcaro@tools-bastion-15:~$ cur...
[12:23:31] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/44
[12:23:31] (open) l10n-bot: Localisation updates from https://translatewiki.net.
[toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/40
[12:24:06] tools-platform-team: [infa] unable to access web.archive.org from within pods - https://phabricator.wikimedia.org/T429577#12033380 (dcaro) The workers themselves use ip4 and that same ip too, let me try to confirm if they get any errors: ` root@tools-k8s-control-9:~# curl ifconfig.me 185.15.56.1 `
[12:24:53] tools-platform-team: [infa] unable to access web.archive.org from within pods - https://phabricator.wikimedia.org/T429577#12033381 (dcaro) Yep: ` root@tools-k8s-control-9:~# curl -v https://web.archive.org/ -o /dev/null % Total % Received % Xferd Average Speed Time Time Time Current...
[12:24:54] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1304032 (owner: L10n-bot)
[12:30:26] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-ap
[12:30:26] !log dcaro@cloudcumin1001 toolsbeta END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-ap
[12:30:28] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[12:42:14] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[12:49:55] Cloud-VPS (Debian Bullseye Deprecation): move math-nfs-1 to new instance - https://phabricator.wikimedia.org/T429544#12033488 (fgiunchedi) I'm not sure you'd be able to change the zone yourself, if that's not the case then please let us know (cfr https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin...
[12:50:30] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[12:56:43] cloud-services-team: cloudceph HEALTH_WARN, multiple OSD(s) experiencing slow operations in BlueStore - https://phabricator.wikimedia.org/T429387#12033506 (Volans) The current status is: ` health: HEALTH_WARN 80 OSD(s) experiencing slow operations in BlueStore ` BUT, we have to keep in mind that from...
[12:58:21] !log dcaro@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
[13:03:17] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS, Toolforge, Observability-Alerting, and 3 others: Move WMCS off of Icinga and introduce alertmanager - https://phabricator.wikimedia.org/T328502#12033537 (fgiunchedi)
[13:03:19] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[13:16:01] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS, tools-infrastructure-team, Infrastructure-Foundations, netops: Establish a blackbox network probe vantage point into cloud realm - https://phabricator.wikimedia.org/T429451#12033583 (fgiunchedi) >>! In T429451#12030754, @cmooney wrote: > @fg...
[13:16:16] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[13:17:36] (approved) dcaro: jobs-api: bump to 0.0.511-20260618101556-dbbba8af [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1302 (https://phabricator.wikimedia.org/T429231) (owner: group_203_bot_3c0afd0d9fd9529f3b7bc7e69a4a3bce)
[13:17:43] (merge) dcaro: jobs-api: bump to 0.0.511-20260618101556-dbbba8af [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1302 (https://phabricator.wikimedia.org/T429231) (owner: group_203_bot_3c0afd0d9fd9529f3b7bc7e69a4a3bce)
[13:18:41] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:23:36] Cloud-VPS (Debian Bullseye Deprecation): move math-nfs-1 to new instance - https://phabricator.wikimedia.org/T429544#12033610 (Physikerwelt) Open→Resolved a:Physikerwelt @fgiunchedi thank you so much I made it. It worked almost without problems. Only I had to manually edit /etc/fstab to moun...
[13:29:37] cloud-services-team, Cloud-VPS: Monitoring/metrics for trove instances - https://phabricator.wikimedia.org/T402738#12033647 (fgiunchedi) wrt metrics, I'm wondering how easy it would be to bundle node-exporter in the trove images. Meaning we would get vm observability (most importantly disk space) like st...
[13:29:43] cloud-services-team: cloudceph HEALTH_WARN, multiple OSD(s) experiencing slow operations in BlueStore - https://phabricator.wikimedia.org/T429387#12033659 (Andrew) From https://www.mail-archive.com/ceph-users@ceph.io/msg29873.html > This is not an issue.
It's a new warning that can be adjusted or muted. Ad...
[13:32:13] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS, tools-infrastructure-team, Infrastructure-Foundations, netops: Establish a blackbox network probe vantage point into cloud realm - https://phabricator.wikimedia.org/T429451#12033677 (cmooney) >>! In T429451#12033583, @fgiunchedi wrote: > I'm...
[13:36:30] cloud-services-team, Cloud-VPS: Monitoring/metrics for trove instances - https://phabricator.wikimedia.org/T402738#12033703 (Andrew) >>! In T402738#12033646, @fgiunchedi wrote: > wrt metrics, I'm wondering how easy it would be to bundle node-exporter in the trove images. Meaning we would get vm observabi...
[13:40:43] Tool-paulina: Migrate SPARQL queries to other data access methods - https://phabricator.wikimedia.org/T426391#12033731 (Pepe_piton)
[13:41:06] Toolforge, tools-platform-team: Specifying --filelog-stdout or --filelog-stderr requires --filelog - https://phabricator.wikimedia.org/T428354#12033746 (Wbm1058) I got the `ERROR: Specifying --filelog-stdout or --filelog-stderr requires --filelog` yesterday and did not at first understand what that was t...
[13:46:50] cloud-services-team, Cloud-VPS, DC-Ops, ops-codfw: Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T429608 (Andrew) NEW
[13:46:58] cloud-services-team, Cloud-VPS, DC-Ops, ops-codfw: Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T429608#12033802 (Andrew) p:Triage→High
[13:48:58] Toolforge, tools-platform-team: Specifying --filelog-stdout or --filelog-stderr requires --filelog - https://phabricator.wikimedia.org/T428354#12033805 (aputhin) >>! In T428354#12033746, @Wbm1058 wrote: > >>>! In T428354#12005573, @aputhin wrote: >> We'll update the documentation to reflect the fact that...
[13:51:54] cloud-services-team, tools-infrastructure-team: PowerSupplyFailure for cloudbackup2003 did not open a task even though it had severity: task - https://phabricator.wikimedia.org/T429609 (fgiunchedi) NEW
[13:52:23] cloud-services-team, tools-infrastructure-team: PowerSupplyFailure for cloudbackup2003 did not open a task even though it had severity: task - https://phabricator.wikimedia.org/T429609#12033833 (fgiunchedi)
[14:04:32] cloud-services-team, tools-infrastructure-team: PowerSupplyFailure for cloudbackup2003 did not open a task even though it had severity: task - https://phabricator.wikimedia.org/T429609#12033933 (fnegri) The config that used to open a task from WMCS alerts was removed in https://gerrit.wikimedia.org/r/c/o...
[14:08:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:41:30] cloud-services-team, Data-Services: [wikireplicas] Create views for new wiki magwiki - https://phabricator.wikimedia.org/T428282#12034234 (fnegri)
[14:42:33] cloud-services-team, Data-Services, tools-platform-team: [wikireplicas] Create views for new wiki magwiki - https://phabricator.wikimedia.org/T428282#12034243 (fnegri) a:fnegri
[14:45:23] cloud-services-team, Toolforge, tools-platform-team: [toolsdb] Transaction History Length growing too much - https://phabricator.wikimedia.org/T428139#12034266 (fnegri) Still growing: {F89318029} Current long transactions: `lang=mysql MariaDB [(none)]> SELECT trx_id, trx_started, TIMESTAMPDIFF(SECON...
[14:48:04] cloud-services-team, Cloud-VPS, DC-Ops, ops-codfw, SRE: Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T429608#12034300 (Jhancock.wm) pulled the cable and reseated psu1 to get the alert to clear. should be okay now.
[14:48:41] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[15:08:41] Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12034374 (dcaro) This has been deployed, I'll close the task, but feel free to reopen if you see the issue again.
[15:08:43] Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12034376 (dcaro) In progress→Resolved
[15:08:46] tools-platform-team: [infa] unable to access web.archive.org from within pods - https://phabricator.wikimedia.org/T429577#12034378 (dcaro) p:Triage→Low
[15:38:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[15:48:41] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[15:49:03] Tool-paulina: "On this day" page - https://phabricator.wikimedia.org/T425032#12034606 (Pepe_piton) Open→Resolved a:Pepe_piton
Implemented from a contribution by @Ademola: https://gitlab.wikimedia.org/toolforge-repos/paulina/-/commit/0076e743255dd0f8c324f996d48ecb5bbe76c11a The solution is ba...
[15:54:54] cloud-services-team, Cloud-VPS, DC-Ops, ops-codfw, SRE: Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T429608#12034649 (Andrew) Open→Resolved a:Andrew Indeed, the alert seems to have cleared. Thank you!
[15:56:27] Cloud-VPS (Debian Bullseye Deprecation), tools-platform-team: Create Cloud VPS VM Migration Guide - https://phabricator.wikimedia.org/T429635 (komla) NEW
[15:57:07] Cloud-VPS (Debian Bullseye Deprecation), tools-platform-team: Create Cloud VPS VM Migration Guide - https://phabricator.wikimedia.org/T429635#12034678 (komla) Draft guide [[ https://wikitech.wikimedia.org/wiki/Help:Cloud_VPS_VM_Migration_Guide | here ]]:
[16:02:50] tools-platform-team: [infra] unable to access web.archive.org from within pods - https://phabricator.wikimedia.org/T429577#12034725 (aputhin)
[16:08:07] cloud-services-team, Toolforge: [infra] unable to access web.archive.org from within pods - https://phabricator.wikimedia.org/T429577#12034741 (aputhin)
[16:13:27] (open) dcaro: k8s: Use custom errors for k8s issues [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/313
[16:34:05] Tool-wmf-openapi-linter, MW-Interfaces-Team, Tech-Docs-Team, OKR-Work, Patch-For-Review: Fix summary issues in the MediaWiki REST API OAD - https://phabricator.wikimedia.org/T428150#12034807 (TBurmeister) As of now, the linter output for https://www.mediawiki.org/w/rest.php/specs/v0/module/-...
[16:39:07] (update) dcaro: k8s: Use custom errors for k8s issues [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/313
[17:37:13] cloud-services-team, Toolforge: [infra] rate limiting by web.archive.org from within pods - https://phabricator.wikimedia.org/T429577#12034973 (dcaro)
[17:43:34] (open) dcaro: core: don't use runtime jobs as source of truth [repos/cloud/toolforge/jobs-api] (use_own_exceptions) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/314
[18:22:33] cloud-services-team: cloudceph HEALTH_WARN, multiple OSD(s) experiencing slow operations in BlueStore - https://phabricator.wikimedia.org/T429387#12035049 (Volans) I've setup an iostat for ~1.5h earlier and we got a slow event during it, this is what I found. Iostat setup: ` iostat -xt 1 > /tmp/io.$(hostnam...
[18:24:50] tools-platform-team, Toolhub, Patch-For-Review: Replace Bullseye base image with Trixie before August 2026 EOL deadline - https://phabricator.wikimedia.org/T425303#12035052 (fnegri) Open→In progress
[19:11:17] Cloud-VPS (Debian Bullseye Deprecation), tools-platform-team: Reach out to Cloud VPS project maintainers about Debian Bullseye deprecation - https://phabricator.wikimedia.org/T428196#12035177 (Don-vip) Hi @Andrew , I received this e-mail as maintainer of the video CloudVPS project. The only remaining...
[20:17:57] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/40 (owner: l10n-bot)
[20:18:00] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/40 (owner: l10n-bot)
[20:18:49] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net.
[toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/44 (owner: l10n-bot)
[20:18:53] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/44 (owner: l10n-bot)
[20:24:43] cloud-services-team, Toolforge: [infra] rate limiting by web.archive.org from within pods - https://phabricator.wikimedia.org/T429577#12035411 (R1F4T) Seems rate limit is the main culprit. now i used a 15s delay between each request and it seems to work but still 2 failed attempt out of 56 not bad.