[00:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [00:37:10] 06cloud-services-team, 10wikitech.wikimedia.org, 06Infrastructure-Foundations, 07Epic: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859#10482456 (10bd808) >>! In T161859#10478521, @Ladsgroup wrote: > Sounds good to me. One thing we can also do: >> The script ignores accounts where the... [00:44:02] 06cloud-services-team, 06Release-Engineering-Team, 10GitLab (CI & Job Runners): Kokkuri feature request: pipeline-configurable repo credentials - https://phabricator.wikimedia.org/T384396#10482463 (10bd808) [01:48:39] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [01:53:39] RESOLVED: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [02:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [02:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - 
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [02:42:39] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [02:47:39] FIRING: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [02:52:39] RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [03:12:19] FIRING: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. 
- https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling [03:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [03:22:19] RESOLVED: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling [03:35:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [04:47:19] FIRING: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling [04:57:19] RESOLVED: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. 
- https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling [07:21:57] 10Tool-ranker, 06translatewiki.net, 10LPL Essential (LPL Essential 2024 Nov-Jan), 13Patch-For-Review, 07Unplanned-Sprint-Work: Add Ranker to translatewiki.net - https://phabricator.wikimedia.org/T384061#10482791 (10Wangombe) [07:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [08:00:11] 10Toolforge (Toolforge iteration 17), 13Patch-For-Review: Persist important toolforge k8s components logs - https://phabricator.wikimedia.org/T383081#10482845 (10Raymond_Ndibe) [08:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [09:08:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-idp-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [10:15:01] (03approved) 10volans: tox: allow having `~` in your `PATH` [repos/cloud/toolforge/alerts] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/26 (owner: 10dcaro) [10:15:07] (03update) 10volans: tox: allow having `~` in your `PATH` [repos/cloud/toolforge/alerts] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/26 (owner: 10dcaro) [10:16:36] (03merge) 10dcaro: tox: allow having `~` in your `PATH` [repos/cloud/toolforge/alerts] - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/26 [10:17:03] (03update) 10dcaro: packaging: copy scripts and minor fixes [repos/cloud/toolforge/components-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/11 [10:27:48] (03close) 10raymond-ndibe: [toolviews] add tools on-wiki edits to toolviews [toolforge-repos/toolviews] (major_refactor) - 10https://gitlab.wikimedia.org/toolforge-repos/toolviews/-/merge_requests/10 (https://phabricator.wikimedia.org/T317953) [10:27:59] (03close) 10raymond-ndibe: [toolviews] refactor in preparation for new features [toolforge-repos/toolviews] - 10https://gitlab.wikimedia.org/toolforge-repos/toolviews/-/merge_requests/11 (https://phabricator.wikimedia.org/T317953) [10:29:32] (03approved) 10volans: kubernetes: add check for disk left on root partition [repos/cloud/toolforge/alerts] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/25 (https://phabricator.wikimedia.org/T384250) (owner: 10dcaro) [10:29:35] (03update) 10volans: kubernetes: add check for disk left on root partition [repos/cloud/toolforge/alerts] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/25 (https://phabricator.wikimedia.org/T384250) (owner: 10dcaro) [10:33:48] (03update) 10dcaro: kubernetes: add check for disk left on root partition [repos/cloud/toolforge/alerts] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/25 (https://phabricator.wikimedia.org/T384250) [10:33:54] (03update) 10dcaro: kubernetes: add check for disk left on root partition [repos/cloud/toolforge/alerts] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/25 (https://phabricator.wikimedia.org/T384250) [10:36:30] (03merge) 10dcaro: kubernetes: add check for disk left on root partition [repos/cloud/toolforge/alerts] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/25 
(https://phabricator.wikimedia.org/T384250) [10:42:49] 06cloud-services-team, 10Toolforge: add on-wiki edits of toolforge tools to toolstats report - https://phabricator.wikimedia.org/T317953#10483291 (10Raymond_Ndibe) [11:13:21] (03open) 10raymond-ndibe: [toolstats] add database to config [toolforge-repos/toolstats] - 10https://gitlab.wikimedia.org/toolforge-repos/toolstats/-/merge_requests/1 (https://phabricator.wikimedia.org/T317953) [11:13:22] (03approved) 10raymond-ndibe: [toolstats] add database to config [toolforge-repos/toolstats] - 10https://gitlab.wikimedia.org/toolforge-repos/toolstats/-/merge_requests/1 (https://phabricator.wikimedia.org/T317953) [11:13:50] (03merge) 10raymond-ndibe: [toolstats] add database to config [toolforge-repos/toolstats] - 10https://gitlab.wikimedia.org/toolforge-repos/toolstats/-/merge_requests/1 (https://phabricator.wikimedia.org/T317953) [11:15:17] FIRING: AlertLintProblem: Linting problems found for ToolforgeKubernetesWorkerDiskFull - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem [11:20:18] FIRING: AlertLintProblem: Linting problems found for ToolforgeKubernetesWorkerDiskFull - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem [11:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [11:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - 
https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [11:51:16] 10wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10483646 (10Neslihan_Turan_WMDE) Hi, I am unable to login to Wikitech. When I reset password I get the email, but the temporary password in the email doesn't work. My SUL account is as belov... [12:11:02] (03approved) 10raymond-ndibe: [toolstats] fix bug in schema.sql [toolforge-repos/toolstats] - 10https://gitlab.wikimedia.org/toolforge-repos/toolstats/-/merge_requests/2 (https://phabricator.wikimedia.org/T317953) [12:11:03] (03open) 10raymond-ndibe: [toolstats] fix bug in schema.sql [toolforge-repos/toolstats] - 10https://gitlab.wikimedia.org/toolforge-repos/toolstats/-/merge_requests/2 (https://phabricator.wikimedia.org/T317953) [12:11:35] (03merge) 10raymond-ndibe: [toolstats] fix bug in schema.sql [toolforge-repos/toolstats] - 10https://gitlab.wikimedia.org/toolforge-repos/toolstats/-/merge_requests/2 (https://phabricator.wikimedia.org/T317953) [12:17:39] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [12:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [12:22:39] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - 
https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [12:24:50] 10wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10483734 (10Ladsgroup) I force created your account, you should be able to use it now. [12:29:07] (03open) 10raymond-ndibe: fixed a bug [toolforge-repos/toolstats] - 10https://gitlab.wikimedia.org/toolforge-repos/toolstats/-/merge_requests/3 [12:29:10] (03approved) 10raymond-ndibe: fixed a bug [toolforge-repos/toolstats] - 10https://gitlab.wikimedia.org/toolforge-repos/toolstats/-/merge_requests/3 [12:29:12] (03merge) 10raymond-ndibe: fixed a bug [toolforge-repos/toolstats] - 10https://gitlab.wikimedia.org/toolforge-repos/toolstats/-/merge_requests/3 [12:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [12:36:14] 10Tools: PetScan returns "This web service cannot be reached" - https://phabricator.wikimedia.org/T384464#10483783 (10Aklapper) 05Open→03Invalid @M2k_dewiki: Please see T363073#9759346 [12:43:15] (03approved) 10dcaro: packaging: copy scripts and minor fixes [repos/cloud/toolforge/components-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/11 [12:43:22] (03update) 10dcaro: packaging: copy scripts and minor fixes [repos/cloud/toolforge/components-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/11 [12:43:23] (03merge) 10dcaro: packaging: copy scripts and minor fixes [repos/cloud/toolforge/components-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/11 [12:45:19] (03open) 10dcaro: 
d/changelog: bump to 0.0.2 [repos/cloud/toolforge/components-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/12 [12:54:40] (03open) 10dcaro: worker_out_of_space: remove project from query [repos/cloud/toolforge/alerts] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/27 [12:54:53] (03approved) 10dcaro: worker_out_of_space: remove project from query [repos/cloud/toolforge/alerts] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/27 [12:54:57] (03merge) 10dcaro: worker_out_of_space: remove project from query [repos/cloud/toolforge/alerts] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/27 [13:22:47] RESOLVED: AlertLintProblem: Linting problems found for ToolforgeKubernetesWorkerDiskFull - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem [13:23:43] 06cloud-services-team, 10Cloud-VPS, 10InternetArchiveBot: Block crawlers on cyberbot project (iabot.wmcloud.org) - https://phabricator.wikimedia.org/T383592#10483881 (10Andrew) @Cyberpower678 reviewing this ticket I'm not clear on if you concluded that you do or don't really have a crawler problem. Were you... 
[13:31:48] RESOLVED: AlertLintProblem: Linting problems found for ToolforgeKubernetesWorkerDiskFull - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem [13:34:29] 06cloud-services-team, 10Tool-flickr2commons-ng, 10Toolforge: Flickr blocking image requests from Toolforge k8s, breaking multiple tools - https://phabricator.wikimedia.org/T384468 (10AntiCompositeNumber) 03NEW [13:38:40] 06cloud-services-team, 10Toolforge, 10Tools: Flickr blocking image requests from Toolforge k8s, breaking multiple tools - https://phabricator.wikimedia.org/T384468#10483930 (10AntiCompositeNumber) [13:55:41] 06cloud-services-team, 10Cloud-VPS, 13Patch-For-Review: 'backy2 cleanup' fails on cloudbackup1004 - https://phabricator.wikimedia.org/T381548#10484000 (10Andrew) update: I'm waiting for this issue to recur so I can consider fnegri's suggestions on the patch. [14:13:17] 10Tool-ranker, 06translatewiki.net, 10LPL Essential (LPL Essential 2024 Nov-Jan), 13Patch-For-Review, 07Unplanned-Sprint-Work: Add Ranker to translatewiki.net - https://phabricator.wikimedia.org/T384061#10484056 (10Wangombe) [14:18:58] 06cloud-services-team, 10Cloud-VPS, 13Patch-For-Review: Drop support for VMs with .wmflabs FQDNs - https://phabricator.wikimedia.org/T380679#10484106 (10Andrew) I think the last vm with a .wmflabs A record is gone. ` root@cloudcontrol1007:~# openstack recordset list --sudo-project-id noauth-project 114f13... 
[14:46:38] (03update) 10dcaro: emailer: run webserver in a different thread [repos/cloud/toolforge/jobs-emailer] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/9 (https://phabricator.wikimedia.org/T379924) (owner: 10aborrero) [14:48:37] (03update) 10dcaro: add prometheus stats [repos/cloud/toolforge/jobs-emailer] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/10 (https://phabricator.wikimedia.org/T320284 https://phabricator.wikimedia.org/T379924) [14:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [14:51:36] 06cloud-services-team, 10wikitech.wikimedia.org, 06Infrastructure-Foundations, 07Epic: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859#10484235 (10Ladsgroup) Thank you! When I get the list, I will force attach them. [14:52:03] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Cloud-VPS: [wmcs-cookbooks] Add owner property - https://phabricator.wikimedia.org/T384293#10484250 (10fnegri) 05In progress→03Resolved I forgot that I had already added `owner_team = WMCS` to cookbooks in the `wmcs-cookbooks` repo in [this patch](https://ge... 
[14:56:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-23 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [15:02:29] 06cloud-services-team, 06Release-Engineering-Team, 10GitLab (CI & Job Runners): Kokkuri feature request: pipeline-configurable repo credentials - https://phabricator.wikimedia.org/T384396#10484333 (10Andrew) p:05Triage→03Medium [15:04:47] 06cloud-services-team, 10Toolforge, 10Tools: Flickr blocking image requests from Toolforge k8s, breaking multiple tools - https://phabricator.wikimedia.org/T384468#10484350 (10Andrew) p:05Triage→03High a:03Andrew [15:05:41] 06cloud-services-team, 10Toolforge (Toolforge iteration 17): [jobs-emailer] If the pod is in error status, try to get the status.message field in the email, otherwise just 'error' is not that useful - https://phabricator.wikimedia.org/T384252#10484359 (10joanna_borun) p:05Triage→03Medium [15:06:06] 06cloud-services-team, 10Toolforge (Toolforge iteration 17): [jobs-cli] If the pod exists and it has no logs, read the message status from it and output that - https://phabricator.wikimedia.org/T384251#10484374 (10joanna_borun) p:05Triage→03Medium [15:06:56] 06cloud-services-team, 10Toolforge: maintain-kubeusers should manage tool observer access - https://phabricator.wikimedia.org/T384126#10484391 (10joanna_borun) p:05Triage→03Low [15:07:12] 10Tool-ranker, 06translatewiki.net, 10LPL Essential (LPL Essential 2024 Nov-Jan), 13Patch-For-Review, 07Unplanned-Sprint-Work: Add Ranker to translatewiki.net - https://phabricator.wikimedia.org/T384061#10484392 (10Wangombe) [15:07:39] 10Tool-ranker, 06translatewiki.net, 10LPL Essential (LPL Essential 2024 Nov-Jan), 13Patch-For-Review, 
07Unplanned-Sprint-Work: Add Ranker to translatewiki.net - https://phabricator.wikimedia.org/T384061#10484397 (10Wangombe) This project is now deployed on Translatewiki. Awaiting exports before closing. [15:11:07] 06cloud-services-team, 10Data-Services (Quota-requests): User has exceeded the 'max_user_connections' (10) on Toolforge DB replicas - https://phabricator.wikimedia.org/T384119#10484417 (10joanna_borun) p:05Triage→03Medium [15:11:44] 06cloud-services-team, 10Cloud-VPS: [wmcs-cookbooks] Use OpenStack APIs instead of using the CLIs as novaadmin - https://phabricator.wikimedia.org/T383517#10484426 (10dcaro) p:05Triage→03Medium [15:12:51] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Temperature Inlet Temp issue on clouddumps1001:9290 - https://phabricator.wikimedia.org/T383723#10484428 (10joanna_borun) p:05Triage→03High [15:14:42] 06cloud-services-team, 10Toolforge: Upgrade ingress-nginx to v1.12.0+ - https://phabricator.wikimedia.org/T383516#10484432 (10joanna_borun) p:05Triage→03Medium [15:14:51] 06cloud-services-team, 10Toolforge: [infra,k8s] Upgrade ingress-nginx to v1.12.0+ - https://phabricator.wikimedia.org/T383516#10484433 (10dcaro) p:05Medium→03Triage [15:15:30] 06cloud-services-team, 10Toolforge: [jobs-api] Indicate when a job is too big to be scheduled - https://phabricator.wikimedia.org/T383515#10484435 (10joanna_borun) p:05Triage→03Medium [15:15:34] 06cloud-services-team, 10Toolforge: [jobs-api] Indicate when a job is too big to be scheduled - https://phabricator.wikimedia.org/T383515#10484436 (10dcaro) p:05Medium→03Triage [15:15:50] 06cloud-services-team, 10Toolforge: [jobs-api] Indicate when a job is too big to be scheduled - https://phabricator.wikimedia.org/T383515#10484438 (10dcaro) p:05Triage→03Medium [15:16:14] 06cloud-services-team, 10Toolforge, 03Wikimedia-Hackathon-2025: Introducing and exploring Toolforge UI with prospective users - https://phabricator.wikimedia.org/T383149#10484441 (10joanna_borun) p:05Triage→03Medium 
[15:16:59] 06cloud-services-team, 10Cloud-VPS, 10Library-Card-Platform, 06Moderator-Tools-Team: The Wikipedia Library emails aren't being received by @wikimedia.org email inboxes - https://phabricator.wikimedia.org/T382314#10484442 (10joanna_borun) p:05Triage→03High [15:18:18] 06cloud-services-team, 10Cloud-VPS, 10Library-Card-Platform, 06Moderator-Tools-Team: The Wikipedia Library emails aren't being received by @wikimedia.org email inboxes - https://phabricator.wikimedia.org/T382314#10484445 (10fnegri) Maybe related to {T380901} [15:19:04] 10wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10484449 (10ajhalili2006) |**Wikitech account/LDAP:**| Ajhalili2006 (old Wikitech username was `AndreiJirohOnDevsCentral`, see https://phabricator.wikimedia.org/T340099 for context)| |**SUL a... [15:19:43] 10wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10484451 (10Neslihan_Turan_WMDE) Thank you so much @Ladsgroup , it works now!:) [15:20:32] 06cloud-services-team, 10Toolforge: [builds-builder] Support using custom buildpacks - https://phabricator.wikimedia.org/T363033#10484455 (10dcaro) 05Open→03Declined I'm not sure about this one, this means that your buildpack would have to implement and keep up with the API https://buildpacks.io/docs/r... 
[15:20:55] 06cloud-services-team, 10Toolforge, 06Design-Research, 07Design: Toolforge UI: Publish newcomer experience and recruitment survey - https://phabricator.wikimedia.org/T381266#10484458 (10joanna_borun) p:05Triage→03Medium [15:21:16] 06cloud-services-team, 10Cloud-VPS, 07IPv6: dns: add PTR support for 2a02:ec80:a000:: - https://phabricator.wikimedia.org/T380746#10484459 (10joanna_borun) p:05Triage→03Medium [15:21:26] 06cloud-services-team, 10Toolforge, 07IPv6, 07Kubernetes: Support IPv6 in Toolforge Kubernetes - https://phabricator.wikimedia.org/T380060#10484460 (10joanna_borun) p:05Triage→03Medium [15:23:57] 06cloud-services-team, 10Toolforge: [jobs-cli,toolforge-weld] `toolforge jobs ...` should use named loggers and always show timestamps and logger names - https://phabricator.wikimedia.org/T359963#10484506 (10joanna_borun) p:05Triage→03Medium [15:26:05] 06cloud-services-team, 10Toolforge: [builds-builder] Support adding repositories for Apt buildpack - https://phabricator.wikimedia.org/T363027#10484512 (10dcaro) p:05Triage→03Low I'll set this as low for now, until we have more use cases, there's more than just adding the toolforge repos to using the clis... 
[15:26:40] 06cloud-services-team, 10Toolforge: [maintain-kubeusers] should manage tool observer access - https://phabricator.wikimedia.org/T384126#10484517 (10dcaro) [15:40:09] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate pontoon-puppet-01.monitoring.eqiad.wmflabs is about to expire in 24d 0h 36m 27s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire [15:41:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-23 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [15:41:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-23 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [15:46:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-23 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [15:47:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-23 has at least 12 procs in D state, and may be having NFS/IO issues - 
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [15:59:33] 06cloud-services-team, 10Cloud-VPS, 10Library-Card-Platform, 06Moderator-Tools-Team: The Wikipedia Library emails aren't being received by @wikimedia.org email inboxes - https://phabricator.wikimedia.org/T382314#10484876 (10jsn.sherman) >>! In T382314#10441400, @fnegri wrote: > Which SMTP server are you us... [16:07:51] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Cloud-VPS, 05Cloud-Services-Origin-Team, 07Cloud-Services-Worktype-Maintenance, 05Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10484951 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumi... [16:08:46] 06cloud-services-team, 10Cloud-VPS: prometheus wmcloud alerts stopped sending emails - https://phabricator.wikimedia.org/T380901#10484959 (10taavi) I applied a fix (see `dfb7761757706ae3b59e9ed9a199bccfa4433b19` + https://sal.toolforge.org/log/uBzFjpQBffdvpiTrN8sy), could someone with a Wikimedia email confirm... [16:14:28] 06cloud-services-team, 10Cloud-VPS: prometheus wmcloud alerts stopped sending emails - https://phabricator.wikimedia.org/T380901#10484972 (10taavi) 05Open→03Resolved a:03taavi Checked that that works with a dummy email to cloud-admin-feed@. Closing. [16:15:22] 06cloud-services-team, 10Cloud-VPS, 10Library-Card-Platform, 06Moderator-Tools-Team: The Wikipedia Library emails aren't being received by @wikimedia.org email inboxes - https://phabricator.wikimedia.org/T382314#10484981 (10taavi) >>! In T382314#10484445, @fnegri wrote: > Maybe related to {T380901} Indeed... 
[16:20:13] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Cloud-VPS, 05Cloud-Services-Origin-Team, 07Cloud-Services-Worktype-Maintenance, 05Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10485032 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin100... [16:20:43] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Cloud-VPS, 05Cloud-Services-Origin-Team, 07Cloud-Services-Worktype-Maintenance, 05Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10485037 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumi... [16:49:23] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Cloud-VPS, 05Cloud-Services-Origin-Team, 07Cloud-Services-Worktype-Maintenance, 05Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10485259 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin100... [16:49:49] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Cloud-VPS, 05Cloud-Services-Origin-Team, 07Cloud-Services-Worktype-Maintenance, 05Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10485264 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumi... [17:10:43] 10Tools: PetScan returns "This web service cannot be reached" - https://phabricator.wikimedia.org/T384464#10485373 (10M2k_dewiki) Also see * https://github.com/magnusmanske/petscan_rs/issues/187 Thanks a lot! [17:23:27] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Cloud-VPS, 05Cloud-Services-Origin-Team, 07Cloud-Services-Worktype-Maintenance, 05Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10485445 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin100... 
[17:23:28] 06cloud-services-team, 10Cloud-VPS, 10Library-Card-Platform, 06Moderator-Tools-Team: The Wikipedia Library emails aren't being received by @wikimedia.org email inboxes - https://phabricator.wikimedia.org/T382314#10485442 (10jsn.sherman) 05Open→03Resolved a:03jsn.sherman >>! In T382314#10484981, @... [17:23:49] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Cloud-VPS, 05Cloud-Services-Origin-Team, 07Cloud-Services-Worktype-Maintenance, 05Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10485448 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumi... [17:36:29] !log fnegri@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-23 [17:41:50] !log fnegri@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-23 [17:53:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-23 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [17:54:48] 06cloud-services-team, 10Cloud-VPS, 10Library-Card-Platform, 06Moderator-Tools-Team: The Wikipedia Library emails aren't being received by @wikimedia.org email inboxes - https://phabricator.wikimedia.org/T382314#10485719 (10taavi) a:05jsn.sherman→03taavi [17:55:41] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Cloud-VPS, 05Cloud-Services-Origin-Team, 07Cloud-Services-Worktype-Maintenance, 05Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10485721 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin100... 
[17:56:04] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10485722 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumi...
[18:01:56] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10485751 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin100...
[18:02:42] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10485752 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumi...
[18:07:21] (update) dcaro: add prometheus stats [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/10 (https://phabricator.wikimedia.org/T320284 https://phabricator.wikimedia.org/T379924)
[18:12:16] cloud-services-team, Toolforge: [builds-builder] Support adding repositories for Apt buildpack - https://phabricator.wikimedia.org/T363027#10485820 (bd808) {T380108} is another concrete use case where adding a PPA would enable installing newer software versions than are in the currently available apt repos.
[18:16:10] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10485849 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin100...
[18:16:23] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10485850 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumi...
[18:18:15] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-cli
[18:18:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[18:18:19] FIRING: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[18:22:37] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10485864 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin100...
[18:22:49] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10485865 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumi...
[18:23:08] cloud-services-team, Toolforge: [builds-builder] Support adding repositories for Apt buildpack - https://phabricator.wikimedia.org/T363027#10485867 (dcaro) >>! In T363027#10485820, @bd808 wrote: > {T380108} is another concrete use case where adding a PPA would enable installing newer software versions th...
[18:26:14] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
[18:26:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[18:26:18] cloud-services-team, Toolforge: [builds-builder] Support using custom buildpacks - https://phabricator.wikimedia.org/T363033#10485884 (bd808) >>! In T363033#10484455, @dcaro wrote: > I'm not sure about this one, this means that your buildpack would have to implement and keep up with the API https://b...
[18:26:41] (approved) dcaro: d/changelog: bump to 0.0.2 [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/12
[18:26:44] (update) dcaro: d/changelog: bump to 0.0.2 [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/12
[18:26:44] (merge) dcaro: d/changelog: bump to 0.0.2 [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/12
[18:28:34] (approved) dcaro: functional tests: add components-api tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/631 (https://phabricator.wikimedia.org/T379092) (owner: sstefanova)
[18:28:42] (merge) dcaro: functional tests: add components-api tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/631 (https://phabricator.wikimedia.org/T379092) (owner: sstefanova)
[18:30:14] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 17), Patch-For-Review: [components-api] Add functional tests for the components api - https://phabricator.wikimedia.org/T379092#10485904 (dcaro) In progress→Resolved
[18:35:22] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10485925 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin100...
[18:37:05] cloud-services-team, Toolforge: [builds-builder] Support adding repositories for Apt buildpack - https://phabricator.wikimedia.org/T363027#10485928 (bd808) >>! In T363027#10485867, @dcaro wrote: >>>! In T363027#10485820, @bd808 wrote: >> {T380108} is another concrete use case where adding a PPA would ena...
[18:38:19] RESOLVED: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[18:38:38] (open) dcaro: bump components cli [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/13
[18:38:55] (approved) dcaro: bump components cli [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/13
[18:39:37] (merge) dcaro: bump components cli [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/13
[18:50:16] cloud-services-team, Toolforge: [builds-builder] Support using custom buildpacks - https://phabricator.wikimedia.org/T363033#10485986 (dcaro) >>! In T363033#10485884, @bd808 wrote: >>>! In T363033#10484455, @dcaro wrote: >> I'm not sure about this one, this means that your buildpack would have to imp...
[18:50:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:57:26] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, ops-eqiad, and 2 others: Relocate cloudnet1007-dev and cloudnet1008-dev to new racks and rename - https://phabricator.wikimedia.org/T382412#10486011 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cl...
[18:59:26] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, ops-eqiad, and 2 others: Relocate cloudnet1007-dev and cloudnet1008-dev to new racks and rename - https://phabricator.wikimedia.org/T382412#10486026 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cl...
[19:04:06] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10486038 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumi...
[19:08:43] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, ops-eqiad, and 2 others: Relocate cloudnet1007-dev and cloudnet1008-dev to new racks and rename - https://phabricator.wikimedia.org/T382412#10486077 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cl...
[19:11:34] cloud-services-team, Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: partman vs cloudcephosd1012 - https://phabricator.wikimedia.org/T383817#10486082 (Andrew) With the above patch, the partman script is now recognizing the OS drives correctly (I think). Howe...
[19:13:56] cloud-services-team, Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: partman vs cloudcephosd1012 - https://phabricator.wikimedia.org/T383817#10486089 (Andrew) Either there's a second pass that's garbling things or there's some serious quoting issue with the c...
[19:17:37] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, ops-eqiad, and 2 others: Relocate cloudnet1007-dev and cloudnet1008-dev to new racks and rename - https://phabricator.wikimedia.org/T382412#10486105 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cl...
[19:18:55] cloud-services-team, Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: partman vs cloudcephosd1012 - https://phabricator.wikimedia.org/T383817#10486111 (Andrew) I've confirmed that in the debian shell console the script generates the same incoherent /tmp/dynami...
[19:20:57] cloud-services-team, Cloud-VPS, IPv6, Patch-For-Review: dns: add PTR support for 2a02:ec80:a000:: - https://phabricator.wikimedia.org/T380746#10486116 (cmooney) a:cmooney→None I think most of the work here is already done. 2a02:ec80:a000::/48 comes out of our RIPE allocation of 2a02:ec80...
[19:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:37:11] cloud-services-team, Toolforge, Design-Research, Design: Toolforge UI: Publish newcomer experience and recruitment survey - https://phabricator.wikimedia.org/T381266#10486165 (Sarai-WMF)
[19:37:56] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10486177 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin100...
[19:38:15] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10486189 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumi...
[19:44:20] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, ops-eqiad, SRE: Relocate cloudnet1007-dev and cloudnet1008-dev to new racks and rename - https://phabricator.wikimedia.org/T382412#10486200 (VRiley-WMF) a:VRiley-WMF
[19:52:13] cloud-services-team, Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: partman vs cloudcephosd1012 - https://phabricator.wikimedia.org/T383817#10486231 (Andrew) Now it looks right to me ` d-i partman-auto/disk string /dev/sda /dev/sdb d-i grub-installer/boo...
[19:55:03] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10486252 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin100...
[19:55:24] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10486253 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumi...
[19:56:43] wikitech.wikimedia.org: Temporarily suppress SUL migration banner from Help:Toolforge pages on Wikitech - https://phabricator.wikimedia.org/T384534 (Sarai-WMF) NEW
[20:15:36] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10486336 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin100...
[20:15:52] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10486337 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumi...
[20:18:18] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10486340 (Ladsgroup) I renamed your wikitech account and force attached it. you should be able to use wikitech now.
[21:12:22] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10486518 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin100...
[21:13:00] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, ops-eqiad, SRE: Relocate cloudnet1007-dev and cloudnet1008-dev to new racks and rename - https://phabricator.wikimedia.org/T382412#10486522 (VRiley-WMF) Ran through decomission on both servers and moved them to the corrosponding locations cl...
[21:26:28] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10486568 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumi...
[21:34:59] cloud-services-team, Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: partman vs cloudcephosd1012 - https://phabricator.wikimedia.org/T383817#10486594 (Andrew) Still fails. ` Error while setting up RAID ` <3 to spend another day with partman
[21:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[21:50:59] cloud-services-team, wikitech.wikimedia.org, Infrastructure-Foundations, Epic: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859#10486660 (bd808) >>! In T161859#10484235, @Ladsgroup wrote: > Thank you! When I get the list, I will force attach them. See `/data/project/wikitech...
[22:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:02:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-75 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[22:07:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-75 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[22:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[23:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[23:26:30] Tool-techcontribs: I break the tool :( - https://phabricator.wikimedia.org/T384554 (Reedy) NEW
[23:28:47] Tool-techcontribs: I break the tool :( - https://phabricator.wikimedia.org/T384554#10487018 (Reedy) p:Triage→Low
[23:57:46] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10487035 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin100...
[23:58:24] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10487036 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumi...