[00:25:28] FIRING: [2x] TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[00:25:39] FIRING: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[00:30:28] RESOLVED: [2x] TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[00:30:39] RESOLVED: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[02:33:08] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS (Debian Buster Deprecation), Toolforge (Toolforge iteration 21), Epic, Goal: [infra] Toolforge: migrate to Debian Bullseye or later - https://phabricator.wikimedia.org/T311897#10933434 (Ykhwong) `tedbot` is now on Debian Bookworm.
[08:48:43] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS (Debian Buster Deprecation), Toolforge (Toolforge iteration 21), Epic, Goal: [infra] Toolforge: migrate to Debian Bullseye or later - https://phabricator.wikimedia.org/T311897#10933795 (taavi)
[08:49:18] cloud-services-team, Toolforge: Lock down tools-sgebastion-10 (login-buster.toolforge.org) to only members of tools with known dependencies on it - https://phabricator.wikimedia.org/T397459#10933798 (taavi)
[08:58:17] Quarry: quarry: Add a robots.txt - https://phabricator.wikimedia.org/T397502 (taavi) NEW
[11:07:49] cloud-services-team, Toolforge: `become` command not working properly on login-buster.toolforge.org - https://phabricator.wikimedia.org/T391538#10934223 (dcaro) Awesome, thanks a lot!
[11:16:31] (update) dcaro: scheduled: add scheduled component support [repos/cloud/toolforge/components-api] (add_all_continuous_options) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/94 (https://phabricator.wikimedia.org/T395071)
[11:33:17] cloud-services-team, Toolforge, Patch-For-Review: [toolforge] Investigate authentication - https://phabricator.wikimedia.org/T363983#10934261 (dcaro) Some notes on CAS, and client authentication. By default CAS does not enable `proxy` tokens (https://apereo.github.io/cas/7.2.x/services/Configuring-S...
[12:52:57] (approved) fnegri: components-api: bump to 0.0.120-20250619182909-09ea62ae [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/820 (https://phabricator.wikimedia.org/T394990) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[12:53:06] cloud-services-team, Toolforge: [infra] Reports of slow connectivity from APAC - https://phabricator.wikimedia.org/T395135#10934414 (cmooney) I am going to remove the netops tags from this task. Unfortunately there is not a whole lot we can really do here. The fact that you get good transfer speeds to...
[12:55:48] cloud-services-team, Cloud-VPS: Un-attachable volume in account-creation-assistance, 'app-www' - https://phabricator.wikimedia.org/T397517 (Andrew) NEW
[13:04:42] cloud-services-team, Cloud-VPS: Un-attachable volume in account-creation-assistance, 'app-www' - https://phabricator.wikimedia.org/T397517#10934456 (Andrew) I restored the contents of app-www to a new volume named app-www-1. Hopefully that will move us away from whatever curse is associated with app-www-...
[13:09:43] cloud-services-team, Cloud-VPS: Un-attachable volume in account-creation-assistance, 'app-www' - https://phabricator.wikimedia.org/T397517#10934462 (Andrew) p:Triage→Medium The restore is working so this issue is still a concerning mystery but it's not causing an outage anymore.
[13:18:57] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, collaboration-services, GitLab (Infrastructure): Volume is stuck to deleted instance in devtools project - https://phabricator.wikimedia.org/T396739#10934468 (Andrew) I'm still wrestling with gitlab-prod-backup but in the meantime I have duplica...
[13:21:19] cloud-services-team: HAProxyServiceUnavailable - https://phabricator.wikimedia.org/T397390#10934471 (Andrew) >>! In T397390#10930714, @dcaro wrote: > @Andrew This seems fixed now, though it happened during your working hours I think and I see maybe it's related to https://gerrit.wikimedia.org/r/c/operations/...
[13:21:41] cloud-services-team: HAProxyServiceUnavailable - https://phabricator.wikimedia.org/T397390#10934472 (Andrew) Open→Resolved a:Andrew
[14:11:07] (update) dcaro: scheduled: add scheduled component support [repos/cloud/toolforge/components-api] (add_all_continuous_options) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/94 (https://phabricator.wikimedia.org/T395071)
[14:21:43] cloud-services-team, Cloud-VPS, SRE: [cloudsw] enable 25G network - https://phabricator.wikimedia.org/T393676#10934635 (cmooney) Open→Resolved a:cmooney Closing this one @dcaro I believe all understand the current situation, and we can connect at 25G where it is possible. Please re-o...
[15:04:29] (open) dcaro: config: allow passing source_url [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/95
[15:15:53] (PS1) Giuseppe Lavagetto: HIDDENPARMA: Add root stub api token [labs/private] - https://gerrit.wikimedia.org/r/1162016
[15:16:11] (CR) Giuseppe Lavagetto: [V:+2 C:+2] HIDDENPARMA: Add root stub api token [labs/private] - https://gerrit.wikimedia.org/r/1162016 (owner: Giuseppe Lavagetto)
[15:41:07] Quarry: quarry: Add a robots.txt - https://phabricator.wikimedia.org/T397502#10934915 (bd808) The utility of a public web search returning cached SQL query results from Quarry seems pretty low. I suppose an argument could be made for public web search making finding an existing query to fork easier, but the...
[15:42:20] (update) dcaro: config: allow passing source_url [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/95
[15:53:32] cloud-services-team, Toolforge, Patch-For-Review: [toolforge] Investigate authentication - https://phabricator.wikimedia.org/T363983#10934943 (fnegri) This looks great! I also checked the POC code and it looks good. Do you plan on testing the OIDC protocol as well, in addition to the CAS protocol? O...
[16:19:00] cloud-services-team, Data-Services: [maintain-views] --table acts as a wildcard - https://phabricator.wikimedia.org/T397533 (fnegri) NEW
[16:19:07] cloud-services-team, Data-Services: [maintain-views] --table acts as a wildcard - https://phabricator.wikimedia.org/T397533#10935057 (fnegri) p:Triage→Low
[17:19:59] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'P{O:wmcs::openstack::eqiad1::virt_ceph}'
[17:24:20] andrew@cloudcumin1001 safe_reboot (PID 3795026) is awaiting input
[17:28:07] PROBLEM - Host cloudvirt1040 is DOWN: PING CRITICAL - Packet loss = 100%
[17:30:03] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: [trove] Disk full for DBapp instance in glamwikidashboard project - https://phabricator.wikimedia.org/T396724#10935221 (fnegri) @YochayCO I did keep an eye on the disk usage in the past few days. `wal_archive` kept on growing quite fast, but as I expect...
[17:30:37] RECOVERY - Host cloudvirt1040 is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms
[17:34:50] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1040 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[17:42:33] PROBLEM - Host cloudvirt1041 is DOWN: PING CRITICAL - Packet loss = 100%
[17:45:29] RECOVERY - Host cloudvirt1041 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms
[17:49:49] FIRING: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1040 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[17:54:49] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1040 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[18:12:07] andrew@cloudcumin1001 safe_reboot (PID 3795026) is awaiting input
[18:31:17] andrew@cloudcumin1001 safe_reboot (PID 3795026) is awaiting input
[18:34:35] PROBLEM - Host cloudvirt1042 is DOWN: PING CRITICAL - Packet loss = 100%
[18:37:50] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1042 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[18:37:59] RECOVERY - Host cloudvirt1042 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms
[18:51:30] (PS1) Andrew Bogott: Added stand-in passwords for nova service user [labs/private] - https://gerrit.wikimedia.org/r/1162060 (https://phabricator.wikimedia.org/T330759)
[18:56:59] PROBLEM - Host cloudvirt1043 is DOWN: PING CRITICAL - Packet loss = 100%
[18:57:49] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1042 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[19:00:07] RECOVERY - Host cloudvirt1043 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[19:01:19] FIRING: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1042 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[19:06:19] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1042 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[19:08:05] (CR) Andrew Bogott: [V:+2 C:+2] Added stand-in passwords for nova service user [labs/private] - https://gerrit.wikimedia.org/r/1162060 (https://phabricator.wikimedia.org/T330759) (owner: Andrew Bogott)
[19:24:05] PROBLEM - nova-compute proc minimum on cloudvirt1073 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:25:05] RECOVERY - nova-compute proc minimum on cloudvirt1073 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:26:05] PROBLEM - nova-compute proc minimum on cloudvirt1045 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:26:31] PROBLEM - nova-compute proc minimum on cloudvirt1041 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:26:57] andrew@cloudcumin1001 safe_reboot (PID 3795026) is awaiting input
[19:27:05] RECOVERY - nova-compute proc minimum on cloudvirt1045 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:27:29] PROBLEM - nova-compute proc minimum on cloudvirtlocal1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:27:31] RECOVERY - nova-compute proc minimum on cloudvirt1041 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:28:09] PROBLEM - nova-compute proc minimum on cloudvirt1064 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:28:29] RECOVERY - nova-compute proc minimum on cloudvirtlocal1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:28:31] PROBLEM - nova-compute proc minimum on cloudvirt1041 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:28:57] PROBLEM - nova-compute proc minimum on cloudvirt1062 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:29:09] RECOVERY - nova-compute proc minimum on cloudvirt1064 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:29:31] RECOVERY - nova-compute proc minimum on cloudvirt1041 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:29:57] RECOVERY - nova-compute proc minimum on cloudvirt1062 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:30:51] PROBLEM - nova-compute proc minimum on cloudvirt1069 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:31:05] PROBLEM - nova-compute proc minimum on cloudvirt1045 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:31:27] PROBLEM - nova-compute proc minimum on cloudvirt1061 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:31:51] RECOVERY - nova-compute proc minimum on cloudvirt1069 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:32:05] RECOVERY - nova-compute proc minimum on cloudvirt1045 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:32:27] RECOVERY - nova-compute proc minimum on cloudvirt1061 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:35:27] PROBLEM - nova-compute proc minimum on cloudvirt1061 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:36:09] PROBLEM - nova-compute proc minimum on cloudvirt1064 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:36:27] RECOVERY - nova-compute proc minimum on cloudvirt1061 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:37:09] RECOVERY - nova-compute proc minimum on cloudvirt1064 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:40:09] PROBLEM - nova-compute proc minimum on cloudvirt1064 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:41:09] RECOVERY - nova-compute proc minimum on cloudvirt1064 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:41:51] PROBLEM - nova-compute proc minimum on cloudvirt1069 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:42:51] RECOVERY - nova-compute proc minimum on cloudvirt1069 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:44:09] andrew@cloudcumin1001 safe_reboot (PID 3795026) is awaiting input
[20:02:39] PROBLEM - Host cloudvirt1044 is DOWN: PING CRITICAL - Packet loss = 100%
[20:05:07] RECOVERY - Host cloudvirt1044 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[20:05:50] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1044 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[20:22:11] PROBLEM - Host cloudvirt1045 is DOWN: PING CRITICAL - Packet loss = 100%
[20:24:41] RECOVERY - Host cloudvirt1045 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[20:25:29] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance cvn-app10 in project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[20:25:49] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1045 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[20:40:49] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1044 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[20:51:24] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for service: project,nova,cinder,neutron
[20:52:52] andrew@cloudcumin1001 safe_reboot (PID 3795026) is awaiting input
[20:58:46] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for service: project,nova,cinder,neutron
[21:02:56] FIRING: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:08:50] andrew@cloudcumin1001 safe_reboot (PID 3795026) is awaiting input
[21:12:57] RESOLVED: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:20:32] andrew@cloudcumin1001 safe_reboot (PID 3795026) is awaiting input
[22:01:47] Tool-paulina: Move translations to translatewiki - https://phabricator.wikimedia.org/T397553 (Pepe_piton) NEW
[22:01:50] andrew@cloudcumin1001 safe_reboot (PID 3795026) is awaiting input
[22:02:25] Tool-paulina: Move translations to translatewiki - https://phabricator.wikimedia.org/T397553#10935815 (Pepe_piton) p:Triage→Medium a:Pepe_piton
[22:05:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-58 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[22:10:04] RESOLVED: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-50 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProce
[22:16:27] Tool-paulina: Show SPARQL queries - https://phabricator.wikimedia.org/T397554 (Pepe_piton) NEW
[22:16:43] Tool-paulina: Show SPARQL queries - https://phabricator.wikimedia.org/T397554#10935834 (Pepe_piton) p:Triage→Medium a:Pepe_piton
[22:22:00] cloud-services-team, Bitu, Infrastructure-Foundations, LDAP: Allocate more available UNIX UIDs for human users - https://phabricator.wikimedia.org/T355663#10935837 (bd808) >>! In T355663#10931634, @SLyngshede-WMF wrote: > I still think that we should stop allocating uidNumbers to users who don't...
[22:56:04] (PS1) Krinkle: write_config: Index operations/docker-images/docker-pkg repo [labs/codesearch] - https://gerrit.wikimedia.org/r/1162107
[22:56:48] (CR) Krinkle: [C:+2] write_config: Index operations/docker-images/docker-pkg repo [labs/codesearch] - https://gerrit.wikimedia.org/r/1162107 (owner: Krinkle)
[22:57:41] (Merged) jenkins-bot: write_config: Index operations/docker-images/docker-pkg repo [labs/codesearch] - https://gerrit.wikimedia.org/r/1162107 (owner: Krinkle)
[23:07:10] (PS7) Krinkle: write_config: Add various missing operations/software/* repos [labs/codesearch] - https://gerrit.wikimedia.org/r/773779 (https://phabricator.wikimedia.org/T303434) (owner: Jbond)
[23:07:42] (CR) Krinkle: [C:+2] write_config: Add various missing operations/software/* repos [labs/codesearch] - https://gerrit.wikimedia.org/r/773779 (https://phabricator.wikimedia.org/T303434) (owner: Jbond)
[23:07:58] (CR) CI reject: [V:-1] write_config: Add various missing operations/software/* repos [labs/codesearch] - https://gerrit.wikimedia.org/r/773779 (https://phabricator.wikimedia.org/T303434) (owner: Jbond)
[23:08:38] (PS8) Krinkle: write_config: Add various missing operations/software/* repos [labs/codesearch] - https://gerrit.wikimedia.org/r/773779 (https://phabricator.wikimedia.org/T303434) (owner: Jbond)
[23:08:41] (CR) Krinkle: [C:+2] write_config: Add various missing operations/software/* repos [labs/codesearch] - https://gerrit.wikimedia.org/r/773779 (https://phabricator.wikimedia.org/T303434) (owner: Jbond)
[23:09:34] (Merged) jenkins-bot: write_config: Add various missing operations/software/* repos [labs/codesearch] - https://gerrit.wikimedia.org/r/773779 (https://phabricator.wikimedia.org/T303434) (owner: Jbond)
[23:46:56] cloud-services-team, Cloud-VPS: Un-attachable volume in account-creation-assistance, 'app-www' - https://phabricator.wikimedia.org/T397517#10935885 (stwalkerster) From the project point of view, we've transitioned off the `accounts-appserver6` instance entirely now to a newly-provisioned `accounts-appser...