[00:04:08] (open) bd808: gitlab: Switch to keyset based pagination [toolforge-repos/gitlab-account-approval] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-account-approval/-/merge_requests/18 (https://phabricator.wikimedia.org/T368761)
[00:04:24] (update) raymond-ndibe: [jobs-api] convert all quotas to appropriate units [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T361120)
[00:06:52] (merge) bd808: gitlab: Switch to keyset based pagination [toolforge-repos/gitlab-account-approval] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-account-approval/-/merge_requests/18 (https://phabricator.wikimedia.org/T368761)
[00:14:37] Tool-gitlab-account-approval, Patch-For-Review: Switch to keyset based pagination - https://phabricator.wikimedia.org/T368761#10478086 (bd808) Open→Resolved a:bd808
[00:23:03] (update) raymond-ndibe: [jobs-api] convert all quotas to appropriate units [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T361120)
[00:43:23] (update) raymond-ndibe: [jobs-api] convert all quotas to appropriate units [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T361120)
[00:59:25] (update) raymond-ndibe: [jobs-api] convert all quotas to appropriate units [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T361120)
[01:02:22] cloud-services-team, Cloud-VPS: VM nova records attached to incorrect cloudcephmon IPs - https://phabricator.wikimedia.org/T383583#10478098 (Andrew) I doubt that the video2commons issue is related to this task; the only symptom I've seen for this task is a VM being stuck in a 'rebooting' or 'shutoff' sta...
[01:03:00] Tool-gitlab-account-approval: Reject users that have not been approved after N days - https://phabricator.wikimedia.org/T384264 (bd808) NEW
[01:03:45] Tool-gitlab-account-approval: Reject users that have not been approved after N days - https://phabricator.wikimedia.org/T384264#10478111 (bd808) Open→In progress p:Triage→Medium a:bd808
[01:04:01] cloud-services-team, Cloud-VPS: CloudVPSDesignateLeaks alert is flapping - https://phabricator.wikimedia.org/T384118#10478115 (Andrew) I've been seeing the flap too, and I don't understand why it resolves itself; maybe someone added an automatic cleanup job?
[01:10:52] (update) raymond-ndibe: [jobs-api] convert all quotas to appropriate units [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T361120)
[01:10:54] (update) raymond-ndibe: [jobs-api] convert all quotas to appropriate units [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T361120)
[01:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[02:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:21:39] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[03:26:39] RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[04:56:10] (open) bd808: feature: Reject users not approved after N days [toolforge-repos/gitlab-account-approval] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-account-approval/-/merge_requests/19 (https://phabricator.wikimedia.org/T384264)
[04:58:08] (update) bd808: feature: Reject users not approved after N days [toolforge-repos/gitlab-account-approval] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-account-approval/-/merge_requests/19 (https://phabricator.wikimedia.org/T384264)
[05:04:53] (merge) bd808: feature: Reject users not approved after N days [toolforge-repos/gitlab-account-approval] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-account-approval/-/merge_requests/19 (https://phabricator.wikimedia.org/T384264)
[05:12:16] Tool-gitlab-account-approval, Patch-For-Review: Reject users that have not been approved after N days - https://phabricator.wikimedia.org/T384264#10478409 (bd808) In progress→Resolved The bot is now rejecting pending accounts after 90 days: https://wikitech.wikimedia.org/w/index.php?title=Too...
[05:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:20:00] Tool-ranker, translatewiki.net, LPL Essential (LPL Essential 2024 Nov-Dec), Unplanned-Sprint-Work: Add Ranker to translatewiki.net - https://phabricator.wikimedia.org/T384061#10478507 (Wangombe)
[07:55:05] cloud-services-team, wikitech.wikimedia.org, Infrastructure-Foundations, Epic: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859#10478521 (Ladsgroup) Sounds good to me. One thing we can also do: > The script ignores accounts where the SUL name and the Developer name are the sa...
[07:55:07] Tool-ranker, translatewiki.net, LPL Essential (LPL Essential 2024 Nov-Dec), Unplanned-Sprint-Work: Add Ranker to translatewiki.net - https://phabricator.wikimedia.org/T384061#10478522 (Wangombe)
[08:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:35:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:10:28] cloud-services-team, Toolforge (Toolforge iteration 17): [jobs-api,jobs-cli] increased exit code 137 rate since 2024-12-14 - https://phabricator.wikimedia.org/T382865#10478581 (dcaro) a:dcaro
[09:20:53] (update) dcaro: [jobs-api] support http health check [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/136 (https://phabricator.wikimedia.org/T362621) (owner: raymond-ndibe)
[09:34:41] (CR) Eugene233: "Looks good. Nonetheless, The messages file have to be updated in the case where a message is updated. Instructions could be found in the R" [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1053046 (https://phabricator.wikimedia.org/T358396) (owner: Hridyesh_Gupta)
[09:38:11] (CR) Eugene233: "Seems there are some failing tests. Please kindly recheck" [labs/tools/Isa] - https://gerrit.wikimedia.org/r/998432 (owner: Juniorbesong)
[09:40:24] (CR) Eugene233: "recheck" [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1009402 (owner: AgnesAbah)
[09:40:45] (CR) CI reject: [V:-1] Bug:T343438 has been corrected to up to six caption languages and one depicts language in routes.py [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1009402 (owner: AgnesAbah)
[09:41:10] (CR) Eugene233: "Looks like there are some failing tests. Kindly recheck" [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1009401 (owner: AgnesAbah)
[10:10:34] (approved) lucaswerkmeister: Make tool translatable [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/1 (https://phabricator.wikimedia.org/T384061)
[10:10:37] (merge) lucaswerkmeister: Make tool translatable [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/1 (https://phabricator.wikimedia.org/T384061)
[10:14:22] Tool-ranker, translatewiki.net, LPL Essential (LPL Essential 2024 Nov-Dec), Unplanned-Sprint-Work: Add Ranker to translatewiki.net - https://phabricator.wikimedia.org/T384061#10478749 (LucasWerkmeister) >>! In T384061#10472258, @LucasWerkmeister wrote: > Currently, the code to make the tool trans...
[10:15:23] (approved) dcaro: [jobs-api] convert all quotas to appropriate units [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T361120) (owner: raymond-ndibe)
[10:15:29] (update) dcaro: [jobs-api] convert all quotas to appropriate units [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T361120) (owner: raymond-ndibe)
[10:32:45] (update) dcaro: packages: install components-cli by default [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/222 (https://phabricator.wikimedia.org/T384203)
[10:39:52] (CR) FNegri: [C:+1] "LGTM!" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1112754 (owner: David Caro)
[10:42:09] (approved) fnegri: packages: install components-cli by default [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/222 (https://phabricator.wikimedia.org/T384203) (owner: dcaro)
[10:43:28] (merge) dcaro: packages: install components-cli by default [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/222 (https://phabricator.wikimedia.org/T384203)
[10:44:18] Toolforge (Toolforge iteration 17), Patch-For-Review: [components-cli,lima-kilo] deploy compontents-cli on lima-kilo by default - https://phabricator.wikimedia.org/T384203#10478822 (dcaro) In progress→Resolved
[10:45:13] Tool-ranker, translatewiki.net, LPL Essential (LPL Essential 2024 Nov-Dec), Unplanned-Sprint-Work: Add Ranker to translatewiki.net - https://phabricator.wikimedia.org/T384061#10478828 (abi_) @Wangombe Few notes: - Please take a look at some other Wikidata tools such as Wikidata Mismatch Finder, W...
[10:48:00] (CR) David Caro: [C:+2] roll_restart_osd_daemons: allow oking all the rest of osds [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1112754 (owner: David Caro)
[10:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:50:47] (open) dcaro: packaging: copy scripts and minor fixes [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/11
[10:52:11] (Merged) jenkins-bot: roll_restart_osd_daemons: allow oking all the rest of osds [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1112754 (owner: David Caro)
[11:00:58] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 17), Patch-For-Review: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.28 - https://phabricator.wikimedia.org/T362867#10478909 (fnegri)
[11:02:18] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 17), Patch-For-Review: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.28 - https://phabricator.wikimedia.org/T362867#10478916 (fnegri) What's left before this task can be resolved? Only {T370245} or also something else?
[11:28:00] (open) dcaro: kubernetes: add check for disk left on root partition [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/25 (https://phabricator.wikimedia.org/T384250)
[11:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:40:56] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Patch-For-Review: tofu-infra: refactor repo structure - https://phabricator.wikimedia.org/T375283#10479066 (fnegri) In progress→Stalled Setting to Stalled until @aborrero is back.
[11:56:59] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: [wmcs-cookbooks] Add owner property - https://phabricator.wikimedia.org/T384293 (fnegri) NEW
[11:57:21] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: [wmcs-cookbooks] Add owner property - https://phabricator.wikimedia.org/T384293#10479132 (fnegri) Open→In progress p:Triage→Medium
[12:02:06] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:07:06] RESOLVED: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:07:35] cloud-services-team, Cloud-VPS: [wmcs-cookbooks] Remove redundant SAL logging - https://phabricator.wikimedia.org/T384296 (fnegri) NEW
[12:08:21] cloud-services-team, Cloud-VPS: [wmcs-cookbooks] Remove redundant SAL logging - https://phabricator.wikimedia.org/T384296#10479187 (fnegri) p:Triage→Low
[12:38:25] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.reboot for toolsbeta-test-k8s-worker-nfs-10
[12:39:38] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-6:30000 has failed probes (http_admin_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:42:47] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for toolsbeta-test-k8s-worker-nfs-10
[12:44:08] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.reboot for toolsbeta-test-k8s-worker-nfs-5
[12:44:27] FIRING: ToolsbetaNFSDown: No toolsbeta nfs services running found - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNFSDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsbetaNFSDown
[12:44:38] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-6:30000 has failed probes (http_admin_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:48:07] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for toolsbeta-test-k8s-worker-nfs-5
[12:48:09] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.reboot for toolsbeta-test-k8s-worker-nfs-7
[12:52:06] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for toolsbeta-test-k8s-worker-nfs-7
[12:52:09] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.reboot for toolsbeta-test-k8s-worker-nfs-8
[12:56:02] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for toolsbeta-test-k8s-worker-nfs-8
[12:56:03] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.reboot for toolsbeta-test-k8s-worker-nfs-9
[12:59:55] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for toolsbeta-test-k8s-worker-nfs-9
[13:04:53] Tool-ranker, translatewiki.net, LPL Essential (LPL Essential 2024 Nov-Jan), Unplanned-Sprint-Work: Add Ranker to translatewiki.net - https://phabricator.wikimedia.org/T384061#10479366 (Wangombe)
[13:09:28] FIRING: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[13:12:47] Tool-ranker, translatewiki.net, LPL Essential (LPL Essential 2024 Nov-Jan), Unplanned-Sprint-Work: Add Ranker to translatewiki.net - https://phabricator.wikimedia.org/T384061#10479402 (Wangombe)
[13:19:28] RESOLVED: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[13:21:06] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:22:03] Tool-ranker, translatewiki.net, LPL Essential (LPL Essential 2024 Nov-Jan), Unplanned-Sprint-Work: Add Ranker to translatewiki.net - https://phabricator.wikimedia.org/T384061#10479435 (LucasWerkmeister) >>! In T384061#10478828, @abi_ wrote: > @LucasWerkmeister - Can you please add https://gitlab....
[13:26:06] FIRING: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:28:18] tool-wscontest: I can't edit or view scores of a contest despite being made an administrator by the creator - https://phabricator.wikimedia.org/T384310 (Ninovolador) NEW
[13:31:06] RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:20:02] cloud-services-team, Toolforge (Toolforge iteration 17), Patch-For-Review: [jobs-api,jobs-cli] Introduce a way to stop stuck cronjobs - https://phabricator.wikimedia.org/T377420#10479764 (dcaro)
[14:20:21] cloud-services-team, Toolforge (Toolforge iteration 17), Patch-For-Review: [jobs-api,jobs-cli] Introduce a way to stop stuck cronjobs - https://phabricator.wikimedia.org/T377420#10479766 (dcaro) a:Raymond_Ndibe→dcaro
[14:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:26:29] Toolforge (Toolforge iteration 17), Documentation: [harbor,docs] Improve Harbor quota handling and docs - https://phabricator.wikimedia.org/T351092#10479798 (dcaro) Stalled→In progress
[14:30:15] cloud-services-team, Toolforge (Toolforge iteration 17), Patch-For-Review: Toolforge: Replace all bastion with grid-less bookworm based bastion hosts - https://phabricator.wikimedia.org/T314665#10479815 (dcaro) Stalled→In progress
[14:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:31:08] Toolforge (Toolforge iteration 17): [infra,harbor] upgrade to latest - https://phabricator.wikimedia.org/T384327 (dcaro) NEW
[14:31:41] Toolforge (Toolforge iteration 17): [infra,harbor] upgrade to latest - https://phabricator.wikimedia.org/T384327#10479840 (dcaro) p:Triage→Medium
[14:37:14] Toolforge (Toolforge iteration 17): [builds-cli,builds-api] `build quota` fails if tool has no builds - https://phabricator.wikimedia.org/T353701#10479859 (dcaro) Let's create the project on the fly if it does not exist, or just return a message saying "Unable to get quota, did you run any builds?" or similar.
[14:39:20] cloud-services-team, Toolforge (Toolforge iteration 17): [infra,k8s] remove deprecated kubelet flags before 1.28 upgrade (we might be able to remove all custom ones) - https://phabricator.wikimedia.org/T370245#10479874 (dcaro) a:Raymond_Ndibe
[14:50:09] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the toolsbeta cluster (T370245)
[14:50:14] T370245: [infra,k8s] remove deprecated kubelet flags before 1.28 upgrade (we might be able to remove all custom ones) - https://phabricator.wikimedia.org/T370245
[14:50:34] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the toolsbeta cluster (T370245)
[14:51:13] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the toolsbeta cluster (T370245)
[14:51:38] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the toolsbeta cluster (T370245)
[14:56:25] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the toolsbeta cluster (T370245)
[14:56:31] T370245: [infra,k8s] remove deprecated kubelet flags before 1.28 upgrade (we might be able to remove all custom ones) - https://phabricator.wikimedia.org/T370245
[14:56:51] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the toolsbeta cluster (T370245)
[14:58:05] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the toolsbeta cluster (T370245)
[15:05:36] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the toolsbeta cluster
[15:05:37] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the toolsbeta cluster (T370245)
[15:05:43] T370245: [infra,k8s] remove deprecated kubelet flags before 1.28 upgrade (we might be able to remove all custom ones) - https://phabricator.wikimedia.org/T370245
[15:06:01] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the toolsbeta cluster (T370245)
[15:06:28] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance toolsbeta-test-k8s-worker-14 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[15:07:06] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:11:25] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node for host toolsbeta-test-k8s-worker-14
[15:11:48] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host toolsbeta-test-k8s-worker-14
[15:12:06] RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:21:58] RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance toolsbeta-test-k8s-worker-14 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[15:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:06:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-58 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[16:11:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-50 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[16:11:05] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the toolsbeta cluster (T370245)
[16:11:12] T370245: [infra,k8s] remove deprecated kubelet flags before 1.28 upgrade (we might be able to remove all custom ones) - https://phabricator.wikimedia.org/T370245
[16:17:37] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the toolsbeta cluster
[16:18:21] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node for host toolsbeta-test-k8s-worker-14
[16:18:42] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host toolsbeta-test-k8s-worker-14
[16:20:08] FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate pontoon-puppet-01.monitoring.eqiad.wmflabs is about to expire in 24d 23h 56m 27s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[16:21:01] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the toolsbeta cluster (T370245)
[16:21:05] T370245: [infra,k8s] remove deprecated kubelet flags before 1.28 upgrade (we might be able to remove all custom ones) - https://phabricator.wikimedia.org/T370245
[16:28:41] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the toolsbeta cluster
[16:29:10] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node for host toolsbeta-test-k8s-worker-14
[16:29:31] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host toolsbeta-test-k8s-worker-14
[16:46:28] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance toolsbeta-test-k8s-worker-14 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[16:51:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-50 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[17:07:58] RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance toolsbeta-test-k8s-worker-14 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[17:09:52] cloud-services-team, Toolforge (Toolforge iteration 17), Patch-For-Review: [infra,k8s] Workers are not rotating the /var/log/acctount/pacct log and it's growing - https://phabricator.wikimedia.org/T384250#10480701 (dcaro) Open→Resolved p:Triage→Medium a:dcaro
[17:12:55] cloud-services-team, Toolforge (Toolforge iteration 17): [jobs-api,jobs-cli] increased exit code 137 rate since 2024-12-14 - https://phabricator.wikimedia.org/T382865#10480712 (dcaro) @JJMC89 I have cleanup some space on all the workers (also added alerts + automation to avoid it from getting there), can...
[17:14:48] (update) dcaro: kubernetes: add check for disk left on root partition [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/25 (https://phabricator.wikimedia.org/T384250)
[17:17:12] (update) dcaro: kubernetes: add check for disk left on root partition [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/25 (https://phabricator.wikimedia.org/T384250)
[17:17:54] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10480739 (Devnull) |**Wikitech account/LDAP:**| devnull | |**SUL account**| MarkRosenbaum | |**Account linked on [[ https://idm.wikimedia.org/ | IDM ]]** | Y | |**I have visited [[ https://...
[17:20:38] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, ops-eqiad, SRE: Relocate cloudnet1007-dev and cloudnet1008-dev to new racks and rename - https://phabricator.wikimedia.org/T382412#10480754 (fnegri) @RobH do you think that this can be done in the next one/two weeks? We need these servers to... [17:21:06] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [17:26:06] RESOLVED: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [17:29:38] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, ops-eqiad, SRE: Relocate cloudnet1007-dev and cloudnet1008-dev to new racks and rename - https://phabricator.wikimedia.org/T382412#10480813 (RobH) >>! In T382412#10480754, @fnegri wrote: > @RobH do you think that this can be done in the next... [17:31:04] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10480823 (Ladsgroup) Renamed the wikitech account to MarkRosenbaum and force attached it. You should be able to access it now. [17:38:12] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10480846 (Devnull) Thanks! Works now.
[17:42:16] (open) dcaro: tox: allow having `~` in your `PATH` [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/26 [17:43:57] (update) dcaro: tox: allow having `~` in your `PATH` [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/26 [17:46:06] (update) dcaro: tox: allow having `~` in your `PATH` [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/26 [18:01:14] cloud-services-team, Toolforge (Toolforge iteration 17), Patch-For-Review: [infra,k8s] Workers are not rotating the /var/log/acctount/pacct log and it's growing - https://phabricator.wikimedia.org/T384250#10480913 (dcaro) Added an alert and installed cron on all the toolforge nodes, that will get... [18:12:05] (update) dcaro: kubernetes: add check for disk left on root partition [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/25 (https://phabricator.wikimedia.org/T384250) [18:14:09] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T370245) [18:14:09] !log raymond-ndibe@cloudcumin1001 tools Updating container image docker-registry.tools.wmflabs.org/pause:3.9 (T370245) [18:14:15] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) (T370245) [18:14:20] (approved) dcaro: [jobs-api] support http health check [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/136 (https://phabricator.wikimedia.org/T362621) (owner: raymond-ndibe) [18:14:37] (approved) dcaro: [jobs-cli] support http healthcheck for continuous jobs [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/81 (https://phabricator.wikimedia.org/T362621)
(owner: raymond-ndibe) [18:38:11] (merge) dcaro: cli: Improve deploy-token command UX and safety [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/6 (https://phabricator.wikimedia.org/T380706) (owner: sstefanova) [19:00:26] (open) bd808: phabricator: fix JSONDecodeError logging [toolforge-repos/gitlab-account-approval] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-account-approval/-/merge_requests/20 [19:02:45] (merge) bd808: phabricator: fix JSONDecodeError logging [toolforge-repos/gitlab-account-approval] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-account-approval/-/merge_requests/20 [19:04:25] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), and 2 others: Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup failures, often for ou... - https://phabricator.wikimedia.org/T374830#10481364 [19:20:43] cloud-services-team, Cloud-VPS: VM nova records attached to incorrect cloudcephmon IPs - https://phabricator.wikimedia.org/T383583#10481470 (Andrew) All affected VMs are now corrected. This leaves the followup of understanding how to prevent this the next time we get new cloudcephmons.
[19:51:51] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Toolforge, Observability-Alerting, and 2 others: Move WMCS off of Icinga and introduce alertmanager - https://phabricator.wikimedia.org/T328502#10481599 (andrea.denisse) [20:11:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-50 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [20:11:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-50 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [20:16:48] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-50 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [20:17:48] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-50 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[20:46:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-50 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [22:07:20] cloud-services-team, Release-Engineering-Team: Kokkuri feature request: pipeline-configurable repo credentials - https://phabricator.wikimedia.org/T384396 (Andrew) NEW [23:35:06] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [23:40:06] RESOLVED: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [23:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks