[00:15:28] FIRING: InstanceDown: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:20:28] RESOLVED: InstanceDown: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[01:46:56] FIRING: [3x] CloudVPSDesignateLeaks: Detected 14 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[02:09:45] (update) raymond-ndibe: [jobs-cli] remove to_delete from calculate_changes [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/33 (https://phabricator.wikimedia.org/T364204)
[02:11:52] (update) raymond-ndibe: [jobs-api] support services in jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/71 (https://phabricator.wikimedia.org/T348758)
[02:12:49] (update) raymond-ndibe: [jobs-api] support services in jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/71 (https://phabricator.wikimedia.org/T348758)
[02:15:12] (merge) raymond-ndibe: [jobs-api] support services in jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/71 (https://phabricator.wikimedia.org/T348758)
[02:15:24] (merge) raymond-ndibe: [jobs-cli] support services in jobs [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/18 (https://phabricator.wikimedia.org/T348758)
[02:16:15] (update) raymond-ndibe: [jobs-cli] remove to_delete from calculate_changes [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/33 (https://phabricator.wikimedia.org/T364204)
[02:17:31] (merge) raymond-ndibe: [jobs-cli] remove to_delete from calculate_changes [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/33 (https://phabricator.wikimedia.org/T364204)
[02:17:41] (update) project_1317_bot_df3177307bed93c3f34e421e26c86e38: jobs-api: bump to 0.0.302-20240528104119-80e50d7b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/289
[02:17:46] (update) project_1317_bot_df3177307bed93c3f34e421e26c86e38: jobs-api: bump to 0.0.302-20240528104119-80e50d7b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/289 (https://phabricator.wikimedia.org/T348758)
[02:22:01] (approved) raymond-ndibe: jobs-api: bump to 0.0.302-20240528104119-80e50d7b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/289 (https://phabricator.wikimedia.org/T348758) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[02:48:02] (open) raymond-ndibe: d/changelog: bump to 16.0.9 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/34
[02:48:24] (approved) raymond-ndibe: d/changelog: bump to 16.0.9 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/34
[02:48:25] (update) raymond-ndibe: d/changelog: bump to 16.0.9 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/34
[02:48:28] (merge) raymond-ndibe: d/changelog: bump to 16.0.9 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/34
[02:59:21] !log raymond@ubuntu tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
[02:59:23] !log raymond@ubuntu tools END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component envvars-api
[02:59:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[02:59:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[03:00:04] !log raymond@ubuntu toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[03:00:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[03:00:57] !log raymond@ubuntu toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[03:00:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[05:46:56] FIRING: [3x] CloudVPSDesignateLeaks: Detected 14 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:45:09] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[07:45:19] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[07:57:57] (update) dcaro: jobs-api: bump to 0.0.302-20240528104119-80e50d7b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/289 (https://phabricator.wikimedia.org/T348758) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[07:58:02] (merge) dcaro: jobs-api: bump to 0.0.302-20240528104119-80e50d7b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/289 (https://phabricator.wikimedia.org/T348758) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[07:59:31] (open) dcaro: Revert "jobs-api: bump to 0.0.303-20240529021520-1bdba302" [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/290
[07:59:35] (approved) dcaro: Revert "jobs-api: bump to 0.0.303-20240529021520-1bdba302" [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/290
[07:59:40] (merge) dcaro: Revert "jobs-api: bump to 0.0.303-20240529021520-1bdba302" [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/290
[08:02:51] (open) dcaro: jobs-api: bump to 0.0.303-20240529021520-1bdba302 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/291
[08:08:45] (update) dcaro: jobs-api: bump to 0.0.303-20240529021520-1bdba302 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/291 (https://phabricator.wikimedia.org/T348758)
[08:09:23] (update) dcaro: jobs-api: bump to 0.0.303-20240529021520-1bdba302 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/291 (https://phabricator.wikimedia.org/T348758)
[08:45:03] (update) dcaro: jobs-api: add some basic functional tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/288 (https://phabricator.wikimedia.org/T357977)
[08:51:13] (update) dcaro: jobs-api: add some basic functional tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/288 (https://phabricator.wikimedia.org/T357977)
[08:59:40] Toolforge (Toolforge iteration 10): [builds-api, builds-cli] Prefix all endpoints with `/tool/` - https://phabricator.wikimedia.org/T363808#9841098 (Slst2020)
[09:05:14] (update) dcaro: jobs-api: add some basic functional tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/288 (https://phabricator.wikimedia.org/T357977)
[09:07:01] (update) dcaro: jobs-api: add some basic functional tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/288 (https://phabricator.wikimedia.org/T357977)
[09:08:03] (open) sstefanova: Slavina/use prefixed endpoints [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/71 (https://phabricator.wikimedia.org/T363808)
[09:08:45] (update) dcaro: jobs-api: add some basic functional tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/288 (https://phabricator.wikimedia.org/T357977)
[09:15:32] (update) sstefanova: cli: use prefixed endpoints [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/71 (https://phabricator.wikimedia.org/T363808)
[09:15:51] (update) dcaro: jobs-api: add some basic functional tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/288 (https://phabricator.wikimedia.org/T357977)
[09:20:41] (update) sstefanova: cli: use prefixed endpoints [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/71 (https://phabricator.wikimedia.org/T363808)
[09:20:52] (update) dcaro: jobs-api: add some basic functional tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/288 (https://phabricator.wikimedia.org/T357977)
[09:46:31] Toolforge (Toolforge iteration 10): [builds-api] Fix issue with log streaming timing out - https://phabricator.wikimedia.org/T366147 (Slst2020) NEW
[09:46:57] FIRING: [3x] CloudVPSDesignateLeaks: Detected 14 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:49:33] Toolforge (Toolforge iteration 10): [builds-api] Fix issue with log streaming timing out - https://phabricator.wikimedia.org/T366147#9841230 (Slst2020)
[10:09:06] Data-Services, DBA: Prepare and check storage layer for dtpwiki - https://phabricator.wikimedia.org/T365229#9841289 (ABran-WMF) Database sanitized Database `_p` created Grants assigned to `labsdbuser` Wiki ready for views creation
[10:20:22] Grid-Engine-to-K8s-Migration: Migrate wikisaurusbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320164#9841321 (dcaro) > @dcaro could you fix this? What do I need to do, for example in file https://github.com/wikisaurus/wikisaurusbot/blob/master/facenapalmscript...
[10:22:21] Tool-toolwatch, Indic-MediaWiki-Developers: Enhance Tool Watch to Track Tool Liveliness and Display Graphs - https://phabricator.wikimedia.org/T365857#9841323 (Hrideshmg) Hello @Gopavasanth, thank you for the prompt response. I've resolved the bug with the dates and added a month/year selector, also i wo...
[10:23:26] Grid-Engine-to-K8s-Migration: Migrate wikisaurusbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320164#9841324 (dcaro) >>! In T320164#9839361, @MBH wrote: > @dcaro A script `sandbox.py` generates an error `Skipped '/workspace/user-config.py': owned by someone el...
[10:26:20] Grid-Engine-to-K8s-Migration: Migrate wikisaurusbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320164#9841334 (MBH) This script is runned from this fragment of `jobs.yaml`: `- name: sandbox command: sandbox image: tool-wikisaurusbot/tool-wikisaurusbot:lates...
[10:46:53] Grid-Engine-to-K8s-Migration: Migrate wikisaurusbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320164#9841462 (dcaro) >>! In T320164#9841334, @MBH wrote: > This script is runned from this fragment of `jobs.yaml`: Awesome, that helps, looking (I see it also ha...
[10:54:18] cloud-services-team, Cloud-VPS, DC-Ops, ops-eqiad, and 2 others: Degraded RAID on cloudcephosd1031 - https://phabricator.wikimedia.org/T364060#9841481 (dcaro) Resolved→In progress Thank @Jclark-ctr, I don't see the drive on the host (sda) though: ` root@cloudcephosd1031:~# ls -la /dev/sd?...
[11:20:17] (update) sstefanova: cli: use prefixed endpoints [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/71 (https://phabricator.wikimedia.org/T363808)
[11:25:04] PAWS: ansible to version 9.6.0 - https://phabricator.wikimedia.org/T366163 (rook) NEW
[11:27:43] PAWS: ansible to version 9.6.0 - https://phabricator.wikimedia.org/T366163#9841581 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/428
[11:27:51] vivian-rook opened https://github.com/toolforge/paws/pull/428
[11:31:21] PAWS: ansible to version 9.6.0 - https://phabricator.wikimedia.org/T366163#9841597 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/428
[11:31:27] vivian-rook closed https://github.com/toolforge/paws/pull/428
[11:31:48] PAWS: ansible to version 9.6.0 - https://phabricator.wikimedia.org/T366163#9841599 (rook) Open→Resolved
[11:35:20] (open) dcaro: Draft: donotmerge timeout log [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/94
[11:37:56] (update) dcaro: Draft: donotmerge timeout log [repos/cloud/toolforge/builds-api] (slavina/prefix-endpoints) - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/94
[11:53:55] (update) sstefanova: cli: use prefixed endpoints [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/71 (https://phabricator.wikimedia.org/T363808)
[12:47:47] Toolforge (Toolforge iteration 10): [builds-api] Nginx times out when tailing logs - https://phabricator.wikimedia.org/T366173 (dcaro) NEW
[12:47:56] Toolforge (Toolforge iteration 10): [builds-api] Nginx times out when tailing logs - https://phabricator.wikimedia.org/T366173#9841942 (dcaro) p:Triage→Medium
[12:48:28] Toolforge (Toolforge iteration 10): [builds-api] Fix issue with log streaming timing out - https://phabricator.wikimedia.org/T366147#9841946 (dcaro)
[12:48:37] Toolforge (Toolforge iteration 10): [builds-api] Fix issue with log streaming timing out - https://phabricator.wikimedia.org/T366147#9841947 (dcaro) p:Triage→Medium
[12:49:09] Toolforge (Toolforge iteration 10): [builds-api] Nginx times out when tailing logs - https://phabricator.wikimedia.org/T366173#9841944 (dcaro) →Duplicate dup:T366147
[12:51:07] Toolforge (Toolforge iteration 10): [builds-api] Fix issue with log streaming timing out - https://phabricator.wikimedia.org/T366147#9841965 (dcaro) The issue here might be simpler than the heartbeat (that should not hurt in any case). The nginx instance that is timing out is the one deployed along builds-a...
[13:05:23] PAWS: ingress-nginx not idempotent - https://phabricator.wikimedia.org/T366121#9841999 (rook) This would appear to be a limitation of the helm ansible module. It seems to be seeing the set_values as a difference: ` set_values: - value: controller.service.type=NodePort value_type...
[13:08:48] (update) aborrero: maintain_kubeusers: introduce resource abstraction [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/23 (https://phabricator.wikimedia.org/T364312)
[13:10:54] PAWS: ingress-nginx not idempotent - https://phabricator.wikimedia.org/T366121#9842017 (rook)
[13:11:41] PAWS: ingress-nginx and prometheus not idempotent - https://phabricator.wikimedia.org/T366121#9842020 (rook)
[13:11:42] PAWS: prometheus not idempotent - https://phabricator.wikimedia.org/T366122#9842015 (rook) →Duplicate dup:T366121
[13:12:14] PAWS: ingress-nginx and prometheus not idempotent - https://phabricator.wikimedia.org/T366121#9842021 (rook) This appears to be the same issue with prometheus. Removing: ` set_values: - value: prometheus.retention=30d value_type: string ` gets it working as expected.
[13:16:24] Toolforge (Toolforge iteration 10): [builds-api] Fix issue with log streaming timing out - https://phabricator.wikimedia.org/T366147#9842048 (dcaro) Nope, it's the path matching expression, now the paths start with `/v1/tool/...` not `/v1/build/...` to be fixed on https://gitlab.wikimedia.org/repos/cloud/too...
[13:29:02] Toolforge (Toolforge iteration 10): [builds-api] Fix issue with log streaming timing out - https://phabricator.wikimedia.org/T366147#9842076 (Slst2020) >>! In T366147#9842048, @dcaro wrote: > Nope, it's the path matching expression, now the paths start with `/v1/tool/...` not `/v1/build/...` to be fixed on h...
[13:48:42] FIRING: [3x] CloudVPSDesignateLeaks: Detected 14 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:04:56] cloud-services-team, Cloud-VPS, DC-Ops, ops-eqiad, and 2 others: Degraded RAID on cloudcephosd1031 - https://phabricator.wikimedia.org/T364060#9842272 (Jclark-ctr) @dcaro the drive was listed as ready in idrac Converted to non-raid should be visible now
[14:06:51] vivian-rook opened https://github.com/toolforge/paws/pull/429
[14:08:34] PAWS: update prometheus - https://phabricator.wikimedia.org/T366182 (rook) NEW
[14:10:55] PAWS: jupyterhub helm deploy show changed in ansible - https://phabricator.wikimedia.org/T366183 (rook) NEW
[14:40:45] Toolforge (Toolforge iteration 10): [builds-api] Fix issue with log streaming timing out - https://phabricator.wikimedia.org/T366147#9842480 (dcaro) >>! In T366147#9842076, @Slst2020 wrote: >>>! In T366147#9842048, @dcaro wrote: >> Nope, it's the path matching expression, now the paths start with `/v1/tool/....
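(Editor's illustration for the T366147 comment at 13:16:24.) The path-matching problem described there can be sketched in a few lines of Python. Both regular expressions and both request paths below are hypothetical, since the actual nginx configuration is not shown in this log; the sketch only illustrates how a pattern tied to the old `/v1/build/` prefix stops matching once endpoints move under `/v1/tool/`, while a prefix-agnostic pattern keeps matching.

```python
import re

# Hypothetical patterns: the real nginx location expression is not shown in the log.
OLD_PREFIX_PATTERN = re.compile(r"^/v1/build/.*/logs")  # assumes the old /v1/build/ prefix
PREFIX_AGNOSTIC_PATTERN = re.compile(r"/.*/logs")       # does not depend on the prefix


def matches(pattern: re.Pattern, path: str) -> bool:
    """Return True if the pattern matches anywhere in the request path."""
    return pattern.search(path) is not None


# Hypothetical request paths before and after the endpoint prefix change.
old_path = "/v1/build/example-build/logs"
new_path = "/v1/tool/example-tool/builds/example-build/logs"

old_pattern_on_old_path = matches(OLD_PREFIX_PATTERN, old_path)
old_pattern_on_new_path = matches(OLD_PREFIX_PATTERN, new_path)
agnostic_on_new_path = matches(PREFIX_AGNOSTIC_PATTERN, new_path)

print(old_pattern_on_old_path, old_pattern_on_new_path, agnostic_on_new_path)
```

Under these assumptions, a log-streaming rule keyed on the old prefix would silently stop applying to the new paths, which is consistent with the timeout symptom reported in the task.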
[14:41:28] Toolforge: `toolforge jobs logs -f` crashes after a while with internal k8s api errors - https://phabricator.wikimedia.org/T359953#9842481 (dcaro) We should add something like we do to builds-api: ` │ location ~ /.*/logs {...
[14:55:59] !log raymond@ubuntu toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[14:56:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:56:52] !log raymond@ubuntu toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[14:56:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[15:32:45] (update) aborrero: maintain_kubeusers: introduce resource abstraction [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/23 (https://phabricator.wikimedia.org/T364312)
[15:55:51] cloud-services-team, DC-Ops, ops-eqiad, SRE: cloudvirt1041: can't boot after reimage - https://phabricator.wikimedia.org/T364984#9842872 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt1041.eqiad.wmnet with OS bookworm
[15:59:16] PAWS: ingress-nginx and prometheus not idempotent - https://phabricator.wikimedia.org/T366121#9842882 (rook) https://github.com/ansible-collections/kubernetes.core/issues/732
[15:59:31] (update) aborrero: maintain_kubeusers: introduce resource abstraction [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/23 (https://phabricator.wikimedia.org/T364312)
[16:13:32] !log raymond@ubuntu tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[16:13:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:14:32] !log raymond@ubuntu tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[16:14:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:24:43] PAWS: jupyterhub helm deploy show changed in ansible - https://phabricator.wikimedia.org/T366183#9843049 (RohithReddy1234) To address the issue of JupyterHub Helm deploys showing changes in Ansible due to regenerated elements like checksums and secrets, you can use specific annotations and strategies to igno...
[16:27:40] PAWS: jupyterhub helm deploy show changed in ansible - https://phabricator.wikimedia.org/T366183#9843078 (rook) I agree, that the checksums and secrets could be stored as annotations. Though the image puller job would be more difficult to manage in a similar fashion. And in this case, the thing being deploye...
[16:32:37] (approved) raymond-ndibe: jobs-api: bump to 0.0.303-20240529021520-1bdba302 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/291 (https://phabricator.wikimedia.org/T348758) (owner: dcaro)
[16:32:42] (merge) raymond-ndibe: jobs-api: bump to 0.0.303-20240529021520-1bdba302 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/291 (https://phabricator.wikimedia.org/T348758) (owner: dcaro)
[17:08:36] cloud-services-team, DC-Ops, ops-eqiad, SRE: cloudvirt1041: can't boot after reimage - https://phabricator.wikimedia.org/T364984#9843300 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt1041.eqiad.wmnet with OS bookworm
[17:13:18] Toolforge (Toolforge iteration 10): [webservice-cli] `webservice logs -f` should expect KeyboardInterrupt - https://phabricator.wikimedia.org/T361437#9843302 (bd808) a:dcaro→dancy
[17:22:06] PAWS: jupyterhub helm deploy show changed in ansible - https://phabricator.wikimedia.org/T366183#9843341 (RohithReddy1234) To manage JupyterHub Helm deploys with Ansible, where certain elements always show as changed (like image puller job and secret annotations), here’s an effective strategy: Accept Routin...
[17:50:43] FIRING: [3x] CloudVPSDesignateLeaks: Detected 14 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:32:24] !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1041']
[18:32:46] !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1041']
[18:32:54] cloud-services-team, DC-Ops, ops-eqiad, SRE: cloudvirt1041: can't boot after reimage - https://phabricator.wikimedia.org/T364984#9843563 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt1041.eqiad.wmnet with OS bookworm completed: - cloudvirt...
[18:42:12] cloud-services-team, DC-Ops, ops-eqiad, SRE: cloudvirt1041: can't boot after reimage - https://phabricator.wikimedia.org/T364984#9843591 (Andrew) a:Jclark-ctr→None After a nic firmware upgrade things seem to be working. It took a couple of tries (suspicious!) but now the host is imaged an...
[18:53:56] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (T364984)
[18:54:00] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=99) (T364984)
[18:54:02] T364984: cloudvirt1041: can't boot after reimage - https://phabricator.wikimedia.org/T364984
[18:54:21] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (T364984)
[18:54:25] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=99) (T364984)
[18:54:30] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance
[18:54:33] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=99)
[19:10:52] cloud-services-team, DC-Ops, ops-eqiad, SRE: cloudvirt1041: can't boot after reimage - https://phabricator.wikimedia.org/T364984#9843740 (Andrew) a:aborrero This host is up and seems stable, but VMs running on it cannot reach the internet. Since this host was being moved from a 2-nic to 1-ni...
[20:13:05] Tool-bub2, Patch-For-Review: Add persistance to queue page on refresh - https://phabricator.wikimedia.org/T357236#9843998 (theprotonade) Open→In progress [[ https://github.com/coderwassananmol/BUB2/pull/245 | GitHub PR ]]
[20:15:01] Toolforge (Toolforge iteration 10): [jobs-api] api endpoint that returns all the default values of a job from the backend - https://phabricator.wikimedia.org/T366209 (Raymond_Ndibe) NEW
[20:22:04] Toolforge (Toolforge iteration 10): [jobs-api] api endpoint that returns all the default values of a job from the backend - https://phabricator.wikimedia.org/T366209#9844054 (JJMC89)
[20:51:15] Toolforge (Toolforge iteration 10): [jobs-cli] enforce proper validation for load jobs before calculate_changes - https://phabricator.wikimedia.org/T366211 (Raymond_Ndibe) NEW
[20:53:54] Toolforge (Toolforge iteration 10): [jobs-cli] enforce proper validation for load jobs before calculate_changes - https://phabricator.wikimedia.org/T366211#9844173 (JJMC89)
[21:42:59] Wikibugs: Wikibugs should ignore GitLab draft merge request activity - https://phabricator.wikimedia.org/T366218 (bd808) NEW
[21:50:43] FIRING: [3x] CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:15:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance project-proxy-puppetserver-1 on project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:16:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance paws-puppetserver-1 on project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:17:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance clouddb-services-puppetserver-1 on project clouddb-services - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:20:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-internal-puppetserver-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:23:42] RESOLVED: [3x] CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:25:28] FIRING: [3x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-internal-puppetserver-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:26:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance cvn-nfs-1 on project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:34:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance gitlab-runners-puppetserver-01 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:36:28] FIRING: [2x] PuppetAgentNoResources: No Puppet resources found on instance cvn-app10 on project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:37:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance extdist-06 on project extdist - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:37:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance tf-bastion on project tf-infra-test - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:40:28] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-internal-puppetserver-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:40:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance metricsinfra-puppetserver-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:41:28] FIRING: [2x] PuppetAgentNoResources: No Puppet resources found on instance bastion on project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:41:28] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance cvn-apache10 on project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:51:28] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance cvn-apache10 on project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:52:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance extdist-06 on project extdist - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:52:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance tf-bastion on project tf-infra-test - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:54:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance gitlab-runners-puppetserver-01 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:55:28] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-internal-puppetserver-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:55:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance metricsinfra-puppetserver-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:56:28] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance cvn-apache10 on project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:56:28] FIRING: [2x] PuppetAgentNoResources: No Puppet resources found on instance bastion on project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[23:00:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance project-proxy-puppetserver-1 on project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[23:02:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance clouddb-services-puppetserver-1 on project clouddb-services - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[23:06:28] RESOLVED: [2x] PuppetAgentNoResources: No Puppet resources found on instance bastion on project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[23:10:28] RESOLVED: [4x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-internal-puppetserver-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[23:11:28] RESOLVED: [3x] PuppetAgentNoResources: No Puppet resources found on instance cvn-apache10 on project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources