[02:24:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[05:09:05] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[05:09:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[05:30:41] Cloud-VPS (Project-requests): Request creation of eseap VPS project - https://phabricator.wikimedia.org/T401957#11092764 (Robertsky) @Aklapper: Tentatively: about us, projects we are working on, team(s) composition, contact us form. Admittedly the content can mostly be done on meta or the main wikifarm, but...
[05:59:12] Cloud-Services, DC-Ops, ops-eqiad: Outsdanding diff on cloudsw1-d5-eqiad - https://phabricator.wikimedia.org/T402157 (ayounsi) NEW p:Triage→High The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/83...
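The ToolforgeKubernetesWorkerTooManyDProcesses alert above fires when a worker has at least 12 processes in uninterruptible sleep (state D), which usually means they are blocked on I/O such as a hung NFS mount. As a rough illustration only, and not the exporter or Prometheus rule behind the alert, the Python sketch below counts D-state processes on a Linux host by reading /proc/<pid>/stat. The function names and the reuse of 12 as a local threshold are assumptions made for the example.

```python
import os

# Hypothetical local threshold, copied from the alert text ("at least 12 procs
# in D state"); the real alert is evaluated in Prometheus, not by this script.
D_STATE_THRESHOLD = 12


def parse_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line.

    Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    or ')', so split on the last ')' and take the first field after it.
    """
    _, _, rest = stat_line.rpartition(")")
    return rest.split()[0]


def count_d_state(proc_root: str = "/proc") -> int:
    """Count processes in uninterruptible sleep (state 'D') under proc_root."""
    try:
        entries = os.listdir(proc_root)
    except FileNotFoundError:
        return 0  # no procfs here (e.g. not a Linux host)
    count = 0
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-PID entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                if parse_state(f.read()) == "D":
                    count += 1
        except OSError:
            continue  # process exited between listdir() and open()
    return count


if __name__ == "__main__":
    n = count_d_state()
    label = "ALERT" if n >= D_STATE_THRESHOLD else "ok"
    print(f"{label}: {n} processes in D state")
```

Splitting on the last ")" rather than on whitespace matters because a process name such as "(nfs worker)" would otherwise shift every field.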
[06:00:08] cloud-services-team, Data-Services, DC-Ops, ops-eqiad: Outsdanding diff on cloudsw1-d5-eqiad - https://phabricator.wikimedia.org/T402157#11092797 (ayounsi)
[06:22:47] cloud-services-team, Cloud-VPS, DC-Ops, ops-eqiad: Outsdanding diff on cloudsw1-d5-eqiad - https://phabricator.wikimedia.org/T402157#11092819 (taavi)
[06:29:21] cloud-services-team, Cloud-VPS: PuppetFailure - https://phabricator.wikimedia.org/T402003#11092820 (taavi) Open→Resolved
[06:30:57] cloud-services-team, Data-Services, BetaFeatures, DBA: Create view for betafeatures_user_counts table in wiki replicas - https://phabricator.wikimedia.org/T402145#11092824 (taavi) This [[ https://gerrit.wikimedia.org/g/operations/puppet/+/production/modules/mediawiki/files/mariadb/tables-catalog....
[08:14:44] cloud-services-team, Cloud-VPS: Use cloud-private network and cfssl certs for instance live migrations - https://phabricator.wikimedia.org/T355145#11093002 (fgiunchedi)
[08:27:35] (open) dcaro: worker_stuck: only alert for nfs-based worker [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/37
[08:28:55] (approved) filippo: worker_stuck: only alert for nfs-based worker [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/37 (owner: dcaro)
[08:31:02] (merge) dcaro: worker_stuck: only alert for nfs-based worker [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/37
[08:32:22] (PS8) David Caro: reboot_stuck_workers: add net cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177960
[08:34:19] !log dcaro@acme tools START - Cookbook wmcs.vps.refresh_puppet_certs on tools-harbor-2.tools.eqiad1.wikimedia.cloud
[08:34:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:34:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[08:35:33] !log dcaro@acme tools END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-harbor-2.tools.eqiad1.wikimedia.cloud
[08:35:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:35:51] cloud-services-team, Data-Services, BetaFeatures, DBA: Create view for betafeatures_user_counts table in wiki replicas - https://phabricator.wikimedia.org/T402145#11093075 (SD0001) Seems like it was previously available in wiki replicas (added in 2014, see T59491).
[09:04:53] cloud-services-team, Toolforge: https://api.svc.toolforge.org endpoint given in OpenAPI spec returns 403 forbidden errors - https://phabricator.wikimedia.org/T402032#11093298 (dcaro) > I don't see anything on https://wikitech.wikimedia.org/wiki/Help:Toolforge/API about OAuth authentication. Yep, there's...
[09:05:59] (approved) dcaro: Fix tab completion [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/118 (owner: taavi)
[09:07:07] (approved) dcaro: Fix tab completion [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/89 (owner: taavi)
[09:08:05] (approved) dcaro: Fix tab completion [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/123 (owner: taavi)
[09:08:19] (merge) dcaro: Fix tab completion [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/123 (owner: taavi)
[09:08:26] (merge) dcaro: Fix tab completion [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/118 (owner: taavi)
[09:08:34] (merge) dcaro: Fix tab completion [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/89 (owner: taavi)
[09:12:27] (merge) mhorsey: Banner editing feature [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/7 (owner: vriaa)
[09:14:06] (merge) mhorsey: fix: ColorPicker component [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/4 (owner: vriaa)
[09:14:37] FIRING: [2x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_admin_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:19:37] FIRING: [3x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_admin_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:24:37] FIRING: [4x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_admin_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:29:37] FIRING: [4x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_admin_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:34:18] cloud-services-team, Toolforge: [build service] failure due to transient issue - https://phabricator.wikimedia.org/T401917#11093374 (dcaro) Talking out loud a bit here :) We can add some retries to wget where we control it, but there's many places where we don't (ex. inside buildpacks, pip, go get, git...
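dcaro's comment on T401917 suggests adding retries where the build scripts call wget directly. The Python sketch below shows that idea as a generic retry-with-exponential-backoff helper; it is independent of the actual build-service code, and the function name, defaults, and the choice to retry on OSError (which covers ConnectionError and TimeoutError) are assumptions for illustration.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff.

    Hypothetical helper, not part of any Toolforge codebase. The delay before
    retry k is base_delay * 2**k plus up to 10% random jitter, so parallel
    builds hitting the same flaky mirror do not retry in lockstep. The last
    exception is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")  # loop always returns or raises
```

Injecting `sleep` keeps the helper testable without real delays. As the comment itself points out, this only helps where the download is under our control; fetches inside buildpacks, pip, or go get would need their own retry settings or, per the follow-up comment, a caching proxy.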
[09:34:37] RESOLVED: [3x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_admin_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:35:07] cloud-services-team, Cloud-VPS: Ensure unique machine-id across Cloud VPS VMs - https://phabricator.wikimedia.org/T401880#11093384 (fgiunchedi) Current status: * All 414 reachable hosts with duplicate `/etc/machine-id` have been fixed. * Of those hosts, 190 are running `systemd-networkd` which has been r...
[09:35:40] cloud-services-team, Toolforge: [build service] failure due to transient issue - https://phabricator.wikimedia.org/T401917#11093386 (dcaro) This might also be alleviated by having a caching proxy of sorts to avoid always hitting external services (that would also speed up some processes).
[09:36:15] (update) dcaro: logs: use logs-api for logs [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/121
[09:37:11] (update) vriaa: Text editing feature [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/5
[09:42:17] cloud-services-team, Cloud-VPS: Ensure unique machine-id across Cloud VPS VMs - https://phabricator.wikimedia.org/T401880#11093389 (fgiunchedi) Also these hosts share `/var/lib/dbus/machine-id`: ` (5) deployment-kafka-jumbo-[5,8-9].deployment-prep.eqiad1.wikimedia.cloud,deployment-kafka-main-[5-6].deplo...
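The machine-id audit on T401880 comes down to grouping hosts by the contents of their machine-id file and flagging any value shared by more than one host. How the values were collected is not shown in this log, so the Python sketch below simply assumes they are already in a dict keyed by hostname; the function name and the hostnames in the usage note are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicate_machine_ids(ids_by_host: Dict[str, str]) -> Dict[str, List[str]]:
    """Return {machine_id: [hosts]} for every id shared by two or more hosts.

    Hypothetical helper. ids_by_host maps a hostname to the raw contents of
    its /etc/machine-id (or /var/lib/dbus/machine-id); surrounding whitespace,
    such as the trailing newline in the file, is ignored.
    """
    hosts_by_id: Dict[str, List[str]] = defaultdict(list)
    for host, machine_id in ids_by_host.items():
        hosts_by_id[machine_id.strip()].append(host)
    # Keep only ids that appear on more than one host, with a stable host order.
    return {
        mid: sorted(hosts)
        for mid, hosts in hosts_by_id.items()
        if len(hosts) > 1
    }
```

For example, `find_duplicate_machine_ids({"vm-a": "abc\n", "vm-b": "abc\n", "vm-c": "def\n"})` returns `{"abc": ["vm-a", "vm-b"]}`. Running the same grouping over both files would catch the second case reported above, where hosts share `/var/lib/dbus/machine-id`.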
[09:45:16] (update) vriaa: Close button editing feature [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/6
[10:03:05] cloud-services-team, Cloud-VPS: Ensure unique machine-id across Cloud VPS VMs - https://phabricator.wikimedia.org/T401880#11093438 (fgiunchedi) >>! In T401880#11093389, @fgiunchedi wrote: > Also these hosts share `/var/lib/dbus/machine-id`: > > ` > (5) deployment-kafka-jumbo-[5,8-9].deployment-prep.eqia...
[10:04:39] cloud-services-team, Cloud-VPS: Ensure unique machine-id across Cloud VPS VMs - https://phabricator.wikimedia.org/T401880#11093441 (fgiunchedi)
[10:09:08] cloud-services-team, Cloud-VPS: Ensure unique machine-id across Cloud VPS VMs - https://phabricator.wikimedia.org/T401880#11093468 (fgiunchedi) I'm optimistically call this done. There are a minority unreachable / unauditable hosts in P81423, some expected (trove, magnum VMs) and some unexpected. Will tr...
[10:35:40] (update) vriaa: Draft: code generation feature [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/8
[10:37:18] cloud-services-team, Cloud-VPS (Project-requests): Trove for cluebotng-review? - https://phabricator.wikimedia.org/T401347#11093532 (fnegri) @DamienZaremba good point about root access, if you find yourself needing it, please open a Phab task about it and we'll find a way to make it work.
[10:39:34] cloud-services-team, Cloud-VPS, DC-Ops, ops-eqiad, SRE: Outsdanding diff on cloudsw1-d5-eqiad - https://phabricator.wikimedia.org/T402157#11093541 (VRiley-WMF) Yes, we were testing some of the ports because we were troubleshooting some of these issues to find out wht was going on with some of...
[10:39:48] Tool-link-dispenser: Not finished after 5 hours running - https://phabricator.wikimedia.org/T402178 (Chidgk1) NEW
[10:45:21] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation): Upgrade cloudinfra database hosts off of Bullseye - https://phabricator.wikimedia.org/T402005#11093565 (fnegri) @taavi yes, I can add it to my backlog. I have never played with those dbs until now, do we have any docs about them?
[10:46:17] cloud-services-team (FY2025/26-Q1), Cloud-VPS (Debian Bullseye Deprecation): Upgrade cloudinfra database hosts off of Bullseye - https://phabricator.wikimedia.org/T402005#11093566 (fnegri) a:fnegri
[10:48:42] cloud-services-team, Cloud-VPS, DC-Ops, ops-eqiad, SRE: Outsdanding diff on cloudsw1-d5-eqiad - https://phabricator.wikimedia.org/T402157#11093573 (ayounsi) We can do anything with that port. The most important part is that Netbox reflects what's going on exactly in the DC. Can you make sure...
[10:57:47] cloud-services-team, Cloud-VPS, DC-Ops, ops-eqiad, SRE: Outsdanding diff on cloudsw1-d5-eqiad - https://phabricator.wikimedia.org/T402157#11093614 (VRiley-WMF) Okay, my apologies. I thought this was one of the cables that I was connected to one of the trouble servers. Currently, port 42 on D5...
[11:18:00] cloud-services-team, Data-Services, BetaFeatures, DBA: Create view for betafeatures_user_counts table in wiki replicas - https://phabricator.wikimedia.org/T402145#11093693 (Ladsgroup) I don't think this was exposed in wikireplica recently otherwise it would have been shown in maintain-views (in f...
[12:25:42] (update) dcaro: buildservice: strip `launcher` when returning the job [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/196
[12:25:52] (update) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/42
[12:26:04] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/10
[12:26:59] cloud-services-team (FY2025/26-Q1), Cloud-VPS (Debian Bullseye Deprecation): Upgrade cloudinfra database hosts off of Bullseye - https://phabricator.wikimedia.org/T402005#11093840 (taavi) >>! In T402005#11093565, @fnegri wrote: > I have never played with those dbs until now, do we have any docs about the...
[12:41:20] cloud-services-team, Cloud-VPS: Audit and potentially fix VMs not reachable by cloudcumin root key - https://phabricator.wikimedia.org/T402185 (fgiunchedi) NEW
[12:41:39] (open) dcaro: runtime.k8s.diff_with_running_job: use internal get_job [repos/cloud/toolforge/jobs-api] (dont_return_launcher) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/197
[12:43:28] (update) dcaro: runtime.k8s.diff_with_running_job: use internal get_job [repos/cloud/toolforge/jobs-api] (dont_return_launcher) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/197
[12:44:31] (update) dcaro: builds-api,jobs-api: when checking for launcher, use k8s [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/926
[13:47:43] (PS1) Hashar: Add missing passwords::mysql::phabricator::phd_user [labs/private] - https://gerrit.wikimedia.org/r/1179693
[13:47:58] (CR) Hashar: [V:+2 C:+2] Add missing passwords::mysql::phabricator::phd_user [labs/private] - https://gerrit.wikimedia.org/r/1179693 (owner: Hashar)
[14:00:54] (update) vriaa: Draft: code generation feature [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/8
[14:17:14] (update) vriaa: Draft: code generation feature [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/8
[14:41:32] (update) dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] (dont_return_launcher) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[14:41:58] (update) dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] (use_get_job_for_diff) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[14:42:13] (update) dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] (use_get_job_for_diff) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[14:43:27] (merge) mhorsey: Text editing feature [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/5 (owner: vriaa)
[15:01:17] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.upgrade_osds (T402190)
[15:01:23] T402190: [ceph,eqiad1] upgrade from pacific->quincy - https://phabricator.wikimedia.org/T402190
[15:01:46] (open) dcaro: remove futures [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/198
[15:02:31] (update) dcaro: remove futures [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/198
[15:04:01] PROBLEM - Host cloudcephosd1004 is DOWN: PING CRITICAL - Packet loss = 100%
[15:05:29] RECOVERY - Host cloudcephosd1004 is UP: PING OK - Packet loss = 0%, RTA = 0.35 ms
[15:06:40] (update) dcaro: buildservice: strip `launcher` when returning the job [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/196
[15:07:29] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.upgrade_osds (exit_code=0)
[15:08:09] (approved) filippo: remove futures [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/198 (owner: dcaro)
[15:08:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[15:08:56] (update) dcaro: buildservice: strip `launcher` when returning the job [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/196
[15:13:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[15:15:49] (merge) dcaro: remove futures [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/198
[15:16:38] (update) dcaro: runtime.k8s.diff_with_running_job: use internal get_job [repos/cloud/toolforge/jobs-api] (dont_return_launcher) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/197
[15:17:04] (update) dcaro: buildservice: strip `launcher` when returning the job [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/196
[15:17:25] (update) vriaa: Close button editing feature [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/6
[15:19:18] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.401-20250818151559-cea084cb [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/928
[15:23:01] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.upgrade_osds (T402190)
[15:23:08] T402190: [ceph,eqiad1] upgrade from pacific->quincy - https://phabricator.wikimedia.org/T402190
[15:25:49] PROBLEM - Host cloudcephosd1004 is DOWN: PING CRITICAL - Packet loss = 100%
[15:27:47] RECOVERY - Host cloudcephosd1004 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms
[15:29:26] (update) dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] (use_get_job_for_diff) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[15:30:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[15:33:51] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.upgrade_osds (exit_code=99)
[15:33:56] (update) vriaa: feat: Add banner code generation feature [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/8
[15:38:19] PROBLEM - Host cloudcephosd1004 is DOWN: PING CRITICAL - Packet loss = 100%
[15:38:56] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:39:49] RECOVERY - Host cloudcephosd1004 is UP: PING OK - Packet loss = 0%, RTA = 0.38 ms
[15:42:35] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.upgrade_mons (T402190)
[15:42:43] T402190: [ceph,eqiad1] upgrade from pacific->quincy - https://phabricator.wikimedia.org/T402190
[15:43:56] FIRING: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:45:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[15:46:13] PROBLEM - Host cloudcephmon1004 is DOWN: PING CRITICAL - Packet loss = 100%
[15:47:47] RECOVERY - Host cloudcephmon1004 is UP: PING OK - Packet loss = 0%, RTA = 0.43 ms
[15:49:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[15:51:15] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 23), Goal: [harbor] Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687#11094767 (Raymond_Ndibe)
[15:52:11] PROBLEM - Host cloudcephmon1005 is DOWN: PING CRITICAL - Packet loss = 100%
[15:53:35] RECOVERY - Host cloudcephmon1005 is UP: PING OK - Packet loss = 0%, RTA = 0.43 ms
[15:55:18] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 23), Goal: [harbor] Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687#11094787 (Raymond_Ndibe)
[15:58:33] PROBLEM - Host cloudcephmon1006 is DOWN: PING CRITICAL - Packet loss = 100%
[15:58:56] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:00:03] RECOVERY - Host cloudcephmon1006 is UP: PING OK - Packet loss = 0%, RTA = 0.46 ms
[16:01:09] cloud-services-team, Toolforge: toolforge jobs load does not update jobs when image is changed - https://phabricator.wikimedia.org/T402194#11094821 (JJMC89)
[16:01:20] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.upgrade_mons (exit_code=0)
[16:02:56] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:04:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[16:04:41] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:07:11] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:07:56] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:08:51] (update) dcaro: buildservice: strip `launcher` when returning the job [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/196
[16:10:56] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:16:58] (update) dcaro: buildservice: strip `launcher` when returning the job [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/196
[16:19:38] (open) fnegri: CI: use trusted runner for build_ci_deb [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/3 (https://phabricator.wikimedia.org/T395266)
[16:19:56] (update) fnegri: CI: use trusted runner for build_ci_deb [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/3 (https://phabricator.wikimedia.org/T395266)
[16:20:39] (update) fnegri: CI: use trusted runner for build_ci_deb [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/3 (https://phabricator.wikimedia.org/T395266)
[16:20:56] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:29:04] cloud-services-team, Toolforge (Toolforge iteration 23): [components-api,beta] Config not updated from remote source - https://phabricator.wikimedia.org/T401868#11094944 (dcaro) This is weird, as both calls are to the same exact endpoint, so it's not likely a change in behavior between calls. Has this h...
[16:30:04] (CR) FNegri: reboot_stuck_workers: add net cookbook (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177960 (owner: David Caro)
[16:32:28] FIRING: InstanceDown: Project tools instance tools-harbor-2 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[16:33:17] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[16:35:33] (PS9) David Caro: reboot_stuck_workers: add new cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177960
[16:37:28] RESOLVED: InstanceDown: Project tools instance tools-harbor-2 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[16:41:00] (CR) FNegri: [C:+1] reboot_stuck_workers: add new cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177960 (owner: David Caro)
[16:41:11] (CR) FNegri: [C:+1] reboot_stuck_workers: add new cookbook (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177960 (owner: David Caro)
[16:41:58] RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tools-harbor-2 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[16:43:08] cloud-services-team, Toolforge (Toolforge iteration 23): toolforge jobs load does not update jobs when image is changed - https://phabricator.wikimedia.org/T402194#11095051 (dcaro) Open→In progress a:dcaro
[16:44:33] (open) dcaro: diff_with_running_job: fix bug when comparing the images [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/199
[16:45:48] (CR) David Caro: [C:+2] reboot_stuck_workers: add new cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177960 (owner: David Caro)
[16:47:14] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[16:47:14] (approved) dcaro: CI: use trusted runner for build_ci_deb [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/3 (https://phabricator.wikimedia.org/T395266) (owner: fnegri)
[16:48:06] cloud-services-team, Toolforge (Toolforge iteration 23): toolforge jobs load does not update jobs when image is changed - https://phabricator.wikimedia.org/T402194#11095094 (dcaro) p:Triage→Medium
[16:48:52] cloud-services-team, Toolforge (Toolforge iteration 23): [jobs-api,jobs-cli] when creating a filelog based job, filelog-stderr gets populated with *.out file - https://phabricator.wikimedia.org/T401922#11095100 (dcaro) a:dcaro
[16:48:58] (approved) fnegri: diff_with_running_job: fix bug when comparing the images [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/199 (owner: dcaro)
[16:49:11] cloud-services-team, Toolforge (Toolforge iteration 23): [jobs-api,jobs-cli] when creating a filelog based job, filelog-stderr gets populated with *.out file - https://phabricator.wikimedia.org/T401922#11095104 (dcaro) p:Triage→Medium
[16:49:25] (Merged) jenkins-bot: reboot_stuck_workers: add new cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177960 (owner: David Caro)
[16:50:25] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[16:53:26] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.vps.refresh_puppet_certs on tools-harbor-2.tools.eqiad1.wikimedia.cloud (T350687)
[16:53:31] T350687: [harbor] Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687
[16:54:32] (open) dcaro: models: use *.err for stderr output logs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/200
[16:54:39] (update) dcaro: models: use *.err for stderr output logs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/200
[16:54:51] (approved) fnegri: builds-api,jobs-api: when checking for launcher, use k8s [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/926 (owner: dcaro)
[16:54:58] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-harbor-2.tools.eqiad1.wikimedia.cloud (T350687)
[16:56:34] (approved) fnegri: models: use *.err for stderr output logs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/200 (owner: dcaro)
[17:02:34] !log dcaro@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
[17:02:49] (open) vriaa: fix: Move Close button out of banner link [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/9
[17:03:59] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
[17:07:49] cloud-services-team, Toolforge (Toolforge iteration 23), Patch-For-Review: [k8s,infra] Upgrade Toolforge to Uwubernetes (1.30) - https://phabricator.wikimedia.org/T362869#11095168 (dcaro)
[17:11:35] (update) vriaa: fix: Move Close button out of banner link [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/9
[17:12:16] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
[17:13:53] (approved) dcaro: jobs-api: bump to 0.0.401-20250818151559-cea084cb [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/928 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:13:56] (merge) dcaro: jobs-api: bump to 0.0.401-20250818151559-cea084cb [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/928 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:42:15] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 23), Goal: [harbor] Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687#11095279 (Raymond_Ndibe)
[17:44:14] cloud-services-team (FY2025/26-Q1), Toolforge (Toolforge iteration 23), Goal: [harbor] Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687#11095292 (Raymond_Ndibe)
[17:48:14] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T362869)
[17:48:15] !log dcaro@cloudcumin1001 tools Updating container image docker-registry.svc.toolforge.org/metrics-server:v0.7.2 (T362869)
[17:48:19] T362869: [k8s,infra] Upgrade Toolforge to Uwubernetes (1.30) - https://phabricator.wikimedia.org/T362869
[17:48:21] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) (T362869)
[17:49:19] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T362869)
[17:49:20] !log dcaro@cloudcumin1001 tools Updating container image docker-registry.svc.toolforge.org/kube-state-metrics:v2.16.0 (T362869)
[17:49:35] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) (T362869)
[17:50:30] (open) dcaro: wmcs-k8s-metrics: update to support k8s v1.30 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/929
[17:51:42] (update) dcaro: wmcs-k8s-metrics: update to
support k8s v1.30 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/929 [17:53:34] 06cloud-services-team, 10Toolforge (Toolforge iteration 23): [components-api,beta] Config not updated from remote source - https://phabricator.wikimedia.org/T401868#11095314 (10DamianZaremba) I haven't noticed it again, but also I haven't really been looking. There are a number of deploys over the last week th... [17:54:28] (03approved) 10lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - 10https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/10 (owner: 10l10n-bot) [17:54:31] (03merge) 10lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - 10https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/10 (owner: 10l10n-bot) [18:27:21] (03approved) 10raymond-ndibe: models: use *.err for stderr output logs [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/200 (owner: 10dcaro) [18:27:23] (03update) 10raymond-ndibe: models: use *.err for stderr output logs [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/200 (owner: 10dcaro) [18:27:32] (03merge) 10raymond-ndibe: models: use *.err for stderr output logs [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/200 (owner: 10dcaro) [18:31:02] (03open) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.402-20250818182747-4d89d9df [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/930 (https://phabricator.wikimedia.org/T401922) [18:32:55] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component 
jobs-api [18:40:34] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api [19:03:31] !log andrew@cloudcumin1001 magnum START - Cookbook wmcs.vps.create_project for project magnum in eqiad1 [19:03:37] !log andrew@cloudcumin1001 magnum END (ERROR) - Cookbook wmcs.vps.create_project (exit_code=97) for project magnum in eqiad1 [19:19:28] (03approved) 10fnegri: buildservice: strip `launcher` when returning the job [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/196 (owner: 10dcaro) [19:38:56] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [19:43:56] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [19:50:28] FIRING: InstanceDown: Project cloudinfra instance enc-2 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [19:51:19] 10cloud-services-team (FY2025/26-Q1), 10Toolforge (Toolforge iteration 23), 05Goal: [harbor] Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687#11095787 (10Raymond_Ndibe) [19:54:21] 10cloud-services-team (FY2025/26-Q1), 10Toolforge (Toolforge iteration 23), 05Goal: [harbor] Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687#11095811 (10Raymond_Ndibe) [19:55:28] RESOLVED: InstanceDown: Project cloudinfra 
instance enc-2 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [19:57:05] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api [20:05:57] 10cloud-services-team (FY2025/26-Q1), 10Toolforge (Toolforge iteration 23), 05Goal: [harbor] Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687#11095875 (10Raymond_Ndibe) [20:09:10] 10cloud-services-team (FY2025/26-Q1), 10Toolforge (Toolforge iteration 23), 05Goal: [harbor] Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687#11095907 (10Raymond_Ndibe) [20:10:15] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api [21:03:55] 06cloud-services-team, 10Cloud-VPS: [upstream] [openstack] Fix capi-helm magnum driver to support more template options - https://phabricator.wikimedia.org/T402232 (10Andrew) 03NEW [21:07:26] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api [21:09:47] (03open) 10bd808: kubectl: install via hatch_build.py [toolforge-repos/bd808-buildpack-perl-bastion] - 10https://gitlab.wikimedia.org/toolforge-repos/bd808-buildpack-perl-bastion/-/merge_requests/3 [21:20:18] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api [21:40:23] (03update) 10bd808: kubectl: install via hatch_build.py [toolforge-repos/bd808-buildpack-perl-bastion] - 10https://gitlab.wikimedia.org/toolforge-repos/bd808-buildpack-perl-bastion/-/merge_requests/3 [22:06:38] (03update) 10raymond-ndibe: jobs-api: bump to 0.0.402-20250818182747-4d89d9df [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/930 (https://phabricator.wikimedia.org/T401922) (owner: 
10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [22:06:39] (03approved) 10raymond-ndibe: jobs-api: bump to 0.0.402-20250818182747-4d89d9df [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/930 (https://phabricator.wikimedia.org/T401922) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [22:06:54] (03merge) 10raymond-ndibe: jobs-api: bump to 0.0.402-20250818182747-4d89d9df [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/930 (https://phabricator.wikimedia.org/T401922) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [22:14:35] (03update) 10bd808: kubectl: install via hatch_build.py [toolforge-repos/bd808-buildpack-perl-bastion] - 10https://gitlab.wikimedia.org/toolforge-repos/bd808-buildpack-perl-bastion/-/merge_requests/3 [22:18:55] (03merge) 10bd808: kubectl: install via hatch_build.py [toolforge-repos/bd808-buildpack-perl-bastion] - 10https://gitlab.wikimedia.org/toolforge-repos/bd808-buildpack-perl-bastion/-/merge_requests/3 [22:32:02] 06cloud-services-team, 10Toolforge: [build service] failure due to transient issue - https://phabricator.wikimedia.org/T401917#11096346 (10DamianZaremba) Also just shower thoughts; Regarding github rate limiting is this using a github app or other token, or relying on the public ip limits (which I guess a sin... [23:00:36] 10Tools, 07Privacy: wmtran Tool Loads External Resource - https://phabricator.wikimedia.org/T333894#11096428 (10Aklapper) >>! In T333894#10566156, @Gryllida wrote: > Can I use jQuery if I save a copy locally in toolforge? See https://phabricator.wikimedia.org/phame/post/view/65/toolforge_provides_proxied_mirr...