[00:01:02] (update) raymond-ndibe: Draft: [maintain-harbor] Move to become a toolforge component [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/34 (https://phabricator.wikimedia.org/T358225)
[00:21:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-74 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[00:41:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-23 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[00:46:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-23 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[01:06:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-23 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[01:06:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-23 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[01:21:30] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:26:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-23 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[05:21:30] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:42:30] Cloud-VPS (Project-requests), Data-Platform-SRE, Wikidata, Wikidata-Query-Service: Request creation of wikiqlever VPS project - https://phabricator.wikimedia.org/T377655#10249076 (Physikerwelt) @bking thank you. That sound all right https://wiki.bitplan.com/index.php/Wikidata_Import_2024-10-17 su...
[06:03:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-23 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[06:13:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-23 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[06:51:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance metricsinfra-grafana-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[06:57:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance maps-proxy-03 on project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:05:57] VPS-project-Wikistats: Merge Gamepedia table with Wikia table (and perhaps rename Wikia table to Fandom as well?) - https://phabricator.wikimedia.org/T377549#10249105 (RhinosF1) That we can do automatically. The stats pull can be told as a one off to follow redirects.
[07:06:28] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance metricsinfra-controller-2 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:07:28] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance maps-proxy-03 on project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:10:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance syslog-server-audit01 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:11:28] FIRING: [7x] PuppetAgentNoResources: No Puppet resources found on instance metricsinfra-alertmanager-2 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:16:28] FIRING: [8x] PuppetAgentNoResources: No Puppet resources found on instance metricsinfra-alertmanager-2 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:20:28] FIRING: [7x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-acme-chief-02 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:21:28] FIRING: [9x] PuppetAgentNoResources: No Puppet resources found on instance metricsinfra-alertmanager-2 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:25:28] FIRING: [8x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-acme-chief-02 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:27:28] FIRING: [5x] PuppetAgentNoResources: No Puppet resources found on instance maps-proxy-03 on project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:35:28] FIRING: [8x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-acme-chief-02 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:39:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-23 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[07:40:28] RESOLVED: [8x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-acme-chief-02 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:41:28] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance tools-cumin-1 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:43:09] FIRING: [3x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-harbor-1 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:44:29] FIRING: PuppetAgentNoResources: No Puppet resources found on instance paws-nfs-1 on project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:46:28] FIRING: [21x] PuppetAgentNoResources: No Puppet resources found on instance tools-cumin-1 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:47:28] FIRING: [6x] PuppetAgentNoResources: No Puppet resources found on instance maps-proxy-03 on project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:48:09] FIRING: [10x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-docker-imagebuilder-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:49:29] FIRING: [2x] PuppetAgentNoResources: No Puppet resources found on instance paws-nfs-1 on project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:50:58] FIRING: [7x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-acme-chief-02 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:51:28] FIRING: [34x] PuppetAgentNoResources: No Puppet resources found on instance tools-acme-chief-3 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:53:09] FIRING: [13x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-cumin-1 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:55:58] FIRING: [3x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-internal-puppetserver-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:56:13] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-internal-puppetserver-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:56:28] FIRING: [46x] PuppetAgentNoResources: No Puppet resources found on instance tools-acme-chief-3 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:56:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance cvn-nfs-1 on project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[07:58:09] FIRING: [19x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-cumin-1 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:01:28] FIRING: [69x] PuppetAgentNoResources: No Puppet resources found on instance tools-acme-chief-3 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:03:09] FIRING: [30x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:05:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance gitlab-runners-puppetserver-01 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:06:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance extdist-06 on project extdist - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:06:28] FIRING: [89x] PuppetAgentNoResources: No Puppet resources found on instance tools-acme-chief-3 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:06:28] FIRING: [2x] PuppetAgentNoResources: No Puppet resources found on instance cvn-app10 on project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:08:09] FIRING: [34x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:09:29] FIRING: [3x] PuppetAgentNoResources: No Puppet resources found on instance bastion on project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:10:58] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-internal-puppetserver-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:11:28] FIRING: [10x] PuppetAgentNoResources: No Puppet resources found on instance metricsinfra-alertmanager-2 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:11:28] FIRING: [104x] PuppetAgentNoResources: No Puppet resources found on instance tools-acme-chief-3 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:11:28] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance cvn-apache10 on project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:13:09] FIRING: [35x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:18:30] (update) raymond-ndibe: Draft: [maintain-harbor] Move to become a toolforge component [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/34 (https://phabricator.wikimedia.org/T358225)
[08:19:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-23 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[08:27:39] cloud-services-team, Toolforge: Add support for replacing a running scheduled job when an overlapping schedule fires (`concurrencyPolicy: Replace`) - https://phabricator.wikimedia.org/T377781#10249179 (Raymond_Ndibe)
[08:31:58] (update) raymond-ndibe: Draft: [maintain-harbor] Move to become a toolforge component [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/34 (https://phabricator.wikimedia.org/T358225)
[08:46:35] (update) raymond-ndibe: Draft: [maintain-harbor] Move to become a toolforge component [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/34 (https://phabricator.wikimedia.org/T358225)
[08:46:42] cloud-services-team: Cloud VPS: cloud-wide puppet problem related to puppet-enc 2024-10-22 - https://phabricator.wikimedia.org/T377803 (aborrero) NEW
[08:47:51] cloud-services-team: Cloud VPS: cloud-wide puppet problem related to puppet-enc 2024-10-22 - https://phabricator.wikimedia.org/T377803#10249241 (aborrero) Open→In progress p:Triage→Unbreak!
[09:01:01] (update) raymond-ndibe: Draft: [toolforge-deploy] deploy maintain-harbor [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/563 (https://phabricator.wikimedia.org/T358225)
[09:01:14] (update) raymond-ndibe: Draft: [maintain-harbor] Move to become a toolforge component [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/34 (https://phabricator.wikimedia.org/T358225)
[09:03:26] cloud-services-team: Cloud VPS: cloud-wide puppet problem related to puppet-enc 2024-10-22 - https://phabricator.wikimedia.org/T377803#10249291 (aborrero) the puppet-enc API seems to be up and running: `lang=shell-session aborrero@cloudinfra-cloudvps-puppetserver-1:~$ curl https://puppet-enc.cloudinfra.wmc...
[09:04:29] FIRING: [3x] PuppetAgentNoResources: No Puppet resources found on instance bastion on project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:05:00] cloud-services-team: Cloud VPS: cloud-wide puppet problem related to puppet-enc 2024-10-22 - https://phabricator.wikimedia.org/T377803#10249293 (aborrero) The puppetserver service seems to have some errors: `lang=shell-session aborrero@cloudinfra-cloudvps-puppetserver-1:~$ sudo journalctl -u puppetserver -...
[09:05:58] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-internal-puppetserver-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:08:12] cloud-services-team: Cloud VPS: cloud-wide puppet problem related to puppet-enc 2024-10-22 - https://phabricator.wikimedia.org/T377803#10249297 (aborrero) p:Unbreak!→High restarting the `puppetserver.service` unit in the corresponding puppetserver VM seems to fix the problem. Lowering priority.
[09:10:58] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-internal-puppetserver-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:11:12] cloud-services-team: Cloud VPS: cloud-wide puppet problem related to puppet-enc 2024-10-22 - https://phabricator.wikimedia.org/T377803#10249303 (aborrero) There was an unattended java upgrade today: `lang=shell-session aborrero@cloudinfra-cloudvps-puppetserver-1:~$ sudo tail /var/log/apt/history.log [...]...
[09:11:18] (update) raymond-ndibe: Draft: [toolforge-deploy] deploy maintain-harbor [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/563 (https://phabricator.wikimedia.org/T358225)
[09:11:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance tools-k8s-worker-nfs-74 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:11:28] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance cvn-apache10 on project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:16:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-23 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:20:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance gitlab-runners-puppetserver-01 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:21:28] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance cvn-apache10 on project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:21:30] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:22:51] cloud-services-team: Cloud VPS: 2024-10-22 cloud-wide puppet problem related to java update - https://phabricator.wikimedia.org/T377803#10249328 (aborrero)
[09:25:58] RESOLVED: [2x] PuppetAgentNoResources: No Puppet resources found on instance etcd-discovery-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:26:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance extdist-06 on project extdist - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:26:28] RESOLVED: [4x] PuppetAgentNoResources: No Puppet resources found on instance cvn-apache10 on project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:29:25] cloud-services-team: Cloud VPS: 2024-10-22 cloud-wide puppet problem related to java update - https://phabricator.wikimedia.org/T377803#10249331 (aborrero) chat on IRC `#wikimedia-sre` channel: `lang=irc 11:25 arturo: o/ yes we are aware of this issue, when openjdk is installed we need to immediat...
[09:29:29] FIRING: [2x] PuppetAgentNoResources: No Puppet resources found on instance bastion on project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:32:28] FIRING: [6x] PuppetAgentNoResources: No Puppet resources found on instance maps-proxy-03 on project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:36:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-23 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[09:42:28] FIRING: [6x] PuppetAgentNoResources: No Puppet resources found on instance maps-proxy-03 on project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:43:09] FIRING: [35x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:48:09] FIRING: [35x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:52:28] RESOLVED: [5x] PuppetAgentNoResources: No Puppet resources found on instance maps-proxy-03 on project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:53:09] RESOLVED: [18x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:04:29] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance paws-nfs-1 on project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:16:58] RESOLVED: [3x] PuppetAgentNoResources: No Puppet resources found on instance metricsinfra-alertmanager-3 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:01:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-23 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[11:08:08] (update) raymond-ndibe: Draft: [toolforge-deploy] deploy maintain-harbor [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/563 (https://phabricator.wikimedia.org/T358225)
[11:09:58] (update) raymond-ndibe: Draft: [maintain-harbor] Move to become a toolforge component [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/34 (https://phabricator.wikimedia.org/T358225)
[11:17:44] Cloud Services Proposals, cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: Decision Request - How to do the Cloud VPS VXLAN/IPv6 migration - https://phabricator.wikimedia.org/T377467#10249701 (aborrero) As the days pass, and I keep reflecting on this ticket, I think I clearly see option 2 as the...
[11:39:44] !log akosiaris@cloudcumin1001 maps START - Cookbook wmcs.vps.remove_user_from_project for user 'faidon'
[11:39:53] !log akosiaris@cloudcumin1001 maps END (PASS) - Cookbook wmcs.vps.remove_user_from_project (exit_code=0) for user 'faidon'
[11:40:53] !log akosiaris@cloudcumin1001 visualeditor START - Cookbook wmcs.vps.remove_user_from_project for user 'faidon'
[11:41:02] !log akosiaris@cloudcumin1001 visualeditor END (PASS) - Cookbook wmcs.vps.remove_user_from_project (exit_code=0) for user 'faidon'
[11:41:55] !log akosiaris@cloudcumin1001 swift START - Cookbook wmcs.vps.remove_user_from_project for user 'faidon'
[11:42:03] !log akosiaris@cloudcumin1001 swift END (PASS) - Cookbook wmcs.vps.remove_user_from_project (exit_code=0) for user 'faidon'
[11:42:09] !log akosiaris@cloudcumin1001 testlabs START - Cookbook wmcs.vps.remove_user_from_project for user 'faidon'
[11:42:17] !log akosiaris@cloudcumin1001 testlabs END (PASS) - Cookbook wmcs.vps.remove_user_from_project (exit_code=0) for user 'faidon'
[11:47:14] (update) raymond-ndibe: [maintain-harbor] Move to become a toolforge component [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/34 (https://phabricator.wikimedia.org/T358225)
[11:47:25] (update) raymond-ndibe: [maintain-harbor] Move to become a toolforge component [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/34 (https://phabricator.wikimedia.org/T358225)
[11:56:36] (update) raymond-ndibe: [maintain-harbor] Move to become a toolforge component [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/34 (https://phabricator.wikimedia.org/T358225)
[12:04:18] (update) raymond-ndibe: [maintain-harbor] Move to become a toolforge component [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/34 (https://phabricator.wikimedia.org/T358225)
[12:17:11] (update) raymond-ndibe: [toolforge-deploy] deploy maintain-harbor [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/563 (https://phabricator.wikimedia.org/T358225)
[12:39:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-idp-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[12:52:58] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-33, tools-k8s-woker-nfs-23
[12:58:31] !log aborrero@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-33, tools-k8s-woker-nfs-23
[13:00:29] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-23
[13:04:30] cloud-services-team, Patch-For-Review: Cloud VPS: 2024-10-22 cloud-wide puppet problem related to java update - https://phabricator.wikimedia.org/T377803#10250093 (aborrero)
[13:05:58] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-23
[13:13:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[13:21:30] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:34:52] Cloud-VPS (Project-requests), Data-Platform-SRE, Wikidata, Wikidata-Query-Service: Request creation of wikiqlever VPS project - https://phabricator.wikimedia.org/T377655#10250203 (bking) > Do you know if the latest wibase dumps are available via nfs? Yes, everything on `dumps.wikimedia.org` is a...
[13:41:23] (approved) sstefanova: Migrate to pathlib [repos/cloud/toolforge/disable-tool] - https://gitlab.wikimedia.org/repos/cloud/toolforge/disable-tool/-/merge_requests/21 (owner: taavi)
[13:44:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-33 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[13:57:29] FIRING: PuppetAgentNoResources: No Puppet resources found on instance tools-elastic-6 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[14:02:29] FIRING: [3x] PuppetAgentNoResources: No Puppet resources found on instance tools-elastic-6 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[14:12:31] Cloud-VPS (Project-requests), Data-Platform-SRE, Wikidata, Wikidata-Query-Service: Request creation of wikiqlever VPS project - https://phabricator.wikimedia.org/T377655#10250388 (aborrero) I think we need to clearly specify again the required disk and RAM quotas.
[14:17:29] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance tools-elastic-5 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[14:27:29] FIRING: [5x] PuppetAgentNoResources: No Puppet resources found on instance tools-elastic-4 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[14:33:51] (open) sstefanova: start-devenv.sh: don't hardcode the VM name [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/203
[14:38:51] (approved) aborrero: start-devenv.sh: don't hardcode the VM name [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/203 (owner: sstefanova)
[14:39:33] (merge) sstefanova: start-devenv.sh: don't hardcode the VM name [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/203
[14:39:36] Cloud-VPS (Project-requests), Data-Platform-SRE, Wikidata, Wikidata-Query-Service: Request creation of wikiqlever VPS project - https://phabricator.wikimedia.org/T377655#10250552 (bking) @aborrero sorry for the confusion. I believe we are talking about a single server, as opposed to a project-wid...
[14:39:50] (update) sstefanova: start-devenv.sh: don't hardcode the VM name [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/203
[14:43:22] cloud-services-team, Toolforge: [harbor] Do not clean up images currently running in production - https://phabricator.wikimedia.org/T377854 (fnegri) NEW
[14:43:44] Cloud-VPS (Project-requests), Data-Platform-SRE, Wikidata, Wikidata-Query-Service: Request creation of wikiqlever VPS project - https://phabricator.wikimedia.org/T377655#10250598 (aborrero) >>! In T377655#10250552, @bking wrote: > @aborrero sorry for the confusion. I believe we are talking about...
[14:43:45] (open) raymond-ndibe: components-api: bump to 0.0.42-20241015121530-8b9350de [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/564
[14:44:27] (merge) taavi: Migrate to pathlib [repos/cloud/toolforge/disable-tool] - https://gitlab.wikimedia.org/repos/cloud/toolforge/disable-tool/-/merge_requests/21
[14:45:10] cloud-services-team, Toolforge (Toolforge iteration 16): [harbor] Do not clean up images currently running in production - https://phabricator.wikimedia.org/T377854#10250603 (aborrero)
[14:51:35] (update) sstefanova: [lima-kilo] minor project refactor [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/198 (owner: raymond-ndibe)
[14:54:16] (update) raymond-ndibe: components-api: bump to 0.0.42-20241015121530-8b9350de [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/564
[14:54:35] (update) raymond-ndibe: components-api: bump to 0.0.42-20241015121530-8b9350de for local and toolsbeta only [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/564
[14:54:58] (approved) fnegri: components-api: bump to 0.0.42-20241015121530-8b9350de for local and toolsbeta only [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/564 (owner: raymond-ndibe)
[14:59:17] wikitech.wikimedia.org: Requesting content administrator access for Ameisenigel - https://phabricator.wikimedia.org/T339841#10250703 (bd808) [[https://wikitech.wikimedia.org/w/index.php?title=Special:Log&logid=974886|Rights revoked]] per [[https://wikitech.wikimedia.org/w/index.php?oldid=2237777#Remove_c...
[15:03:27] cloud-services-team, wikitech.wikimedia.org: Labslogbot needs new SUL OAuth credentials after Wikitech authn changes - https://phabricator.wikimedia.org/T376220#10250733 (taavi)
[15:06:07] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, Tech-Docs-Team, Documentation: WMCS: Document different types of root and admin privileges - https://phabricator.wikimedia.org/T375113#10250739 (TBurmeister) Suggestions! (mostly small things to help with findability and usability): * Move the s...
[15:07:22] (update) sstefanova: [lima-kilo] cache disk for caching container images [repos/cloud/toolforge/lima-kilo] (refactor_in_preparation_for_cache) - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/201 (owner: raymond-ndibe)
[15:11:53] cloud-services-team, wikitech.wikimedia.org: Labslogbot needs new SUL OAuth credentials after Wikitech authn changes - https://phabricator.wikimedia.org/T376220#10250768 (taavi)
[15:12:54] cloud-services-team, wikitech.wikimedia.org: Labslogbot needs new SUL OAuth credentials after Wikitech authn changes - https://phabricator.wikimedia.org/T376220#10250775 (taavi) Done. The new password for the bot is in `cloudvps-log-bot` in pwstore.
[15:13:45] cloud-services-team, wikitech.wikimedia.org: Labslogbot needs new SUL OAuth credentials after Wikitech authn changes - https://phabricator.wikimedia.org/T376220#10250769 (taavi) Open→Resolved a:taavi
[15:30:41] (update) sstefanova: [lima-kilo] minor project refactor [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/198 (owner: raymond-ndibe)
[15:31:28] Cloud Services Proposals, cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: Decision Request - How to do the Cloud VPS VXLAN/IPv6 migration - https://phabricator.wikimedia.org/T377467#10250959 (aborrero)
[15:34:22] (update) sstefanova: [lima-kilo] minor project refactor [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/198 (owner: raymond-ndibe)
[15:37:01] Cloud Services Proposals, cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: Decision Request - How to do the Cloud VPS VXLAN/IPv6 migration - https://phabricator.wikimedia.org/T377467#10251003 (taavi) (Automatically) renumbering VMs is already scary. Giving them v6 addresses is even scarier. Two...
[15:39:30] Cloud Services Proposals, cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: Decision Request - How to do the Cloud VPS VXLAN/IPv6 migration - https://phabricator.wikimedia.org/T377467#10251037 (bd808) How did we handle renumbering when we did the nova network to neutron migration? I have a vague m...
[15:40:39] (update) sstefanova: [lima-kilo] cache disk for caching container images [repos/cloud/toolforge/lima-kilo] (refactor_in_preparation_for_cache) - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/201 (owner: raymond-ndibe)
[15:41:22] cloud-services-team, VPS-project-Codesearch, Security-Team, Patch-For-Review, and 3 others: XSS - codesearch.wmcloud.org - https://phabricator.wikimedia.org/T377168#10251041 (sbassett) Hall of fame update deployed: [[ https://sal.toolforge.org/log/ItThtJIBKFqumxvthXL0 | codfw ]], [[ https://s...
[15:53:12] Cloud Services Proposals, cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: Decision Request - How to do the Cloud VPS VXLAN/IPv6 migration - https://phabricator.wikimedia.org/T377467#10251091 (aborrero) >>! In T377467#10251003, @taavi wrote: > (Automatically) renumbering VMs is already scary. Giv...
[15:57:58] RESOLVED: [4x] PuppetAgentNoResources: No Puppet resources found on instance tools-elastic-4 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[16:00:44] Cloud Services Proposals, cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: Decision Request - How to do the Cloud VPS VXLAN/IPv6 migration - https://phabricator.wikimedia.org/T377467#10251120 (aborrero) >>! In T377467#10251037, @bd808 wrote: > How did we handle renumbering when we did the nova ne...
[16:12:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-27 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[16:27:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-27 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[16:32:42] (merge) raymond-ndibe: components-api: bump to 0.0.42-20241015121530-8b9350de for local and toolsbeta only [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/564
[16:38:13] (update) raymond-ndibe: [toolforge-deploy] deploy maintain-harbor [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/563 (https://phabricator.wikimedia.org/T358225)
[16:59:54] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Temporary accounts: Verify if Temporary Accounts require any changes to Wiki Replicas - https://phabricator.wikimedia.org/T377879 (fnegri) NEW
[17:01:48] Cloud Services Proposals, cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: Decision Request - How to do the Cloud VPS VXLAN/IPv6 migration - https://phabricator.wikimedia.org/T377467#10251408 (bd808) >>! In T377467#10251120, @aborrero wrote: >>>! In T377467#10251037, @bd808 wrote: >> How did we h...
[17:12:03] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Temporary accounts: Verify if Temporary Accounts require any changes to Wiki Replicas - https://phabricator.wikimedia.org/T377879#10251455 (fnegri) @kostajh do you have any suggestions/concerns about this?
[17:12:11] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: 2024-09-21 NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375223#10251458 (fnegri) In progress→Stalled
[17:12:17] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, Tech-Docs-Team, Documentation: WMCS: Document different types of root and admin privileges - https://phabricator.wikimedia.org/T375113#10251459 (fnegri) Open→In progress
[17:13:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[17:14:39] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Temporary accounts: Verify if Temporary Accounts require any changes to Wiki Replicas - https://phabricator.wikimedia.org/T377879#10251478 (kostajh) @fnegri is there a list of views/filters with a description of their purpose, that we could review?
[17:15:26] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Temporary accounts: Verify if Temporary Accounts require any changes to Wiki Replicas - https://phabricator.wikimedia.org/T377879#10251484 (kostajh) Also noting that temporary accounts are already live on testwiki and test2wiki (since July), so...
[17:15:47] cloud-services-team, Cloud-VPS, Documentation: Clean up Cloud VPS doc content and sequence for account / project / instance setup and access - https://phabricator.wikimedia.org/T347637#10251461 (fnegri)
[17:17:01] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Temporary accounts: Verify if Temporary Accounts require any changes to Wiki Replicas - https://phabricator.wikimedia.org/T377879#10251490 (kostajh) My suspicion, though, is that there's not going to be much to update. Temporary accounts will lo...
[17:17:04] VPS-Projects, fundraising-tech-ops, Puppet (Puppet 7.0): Update puppet civicrm-prototype puppetmaster - https://phabricator.wikimedia.org/T361595#10251491 (taavi) Ping.
[17:19:34] cloud-services-team, Data-Services: labstore: Re-evaluate traffic shaping settings - https://phabricator.wikimedia.org/T218338#10251501 (taavi) Open→Invalid Marking as invalid since this hardware is long gone.
[17:19:47] cloud-services-team, Data-Services, Toolforge: Move replica_cnf_api out of the Puppet repo - https://phabricator.wikimedia.org/T340754#10251505 (taavi)
[17:21:30] FIRING: CloudVPSDesignateLeaks: Detected 5 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:21:41] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Temporary accounts: Verify if Temporary Accounts require any changes to Wiki Replicas - https://phabricator.wikimedia.org/T377879#10251512 (fnegri) @kostajh thanks! As far as I know, there is no list with descriptions, but there are raw yaml lis...
[17:21:47] cloud-services-team, Data-Services, Datasets-Archiving, Datasets-General-or-Unknown: Adjust bandwidth/connection limits, memory settings on clouddumps as appropriate - https://phabricator.wikimedia.org/T191491#10251514 (taavi)
[17:21:58] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Temporary accounts: Verify if Temporary Accounts require any changes to Wiki Replicas - https://phabricator.wikimedia.org/T377879#10251520 (fnegri) Also, is there any non-sensitive new table/column that was added by Temporary accounts and might...
[17:22:15] cloud-services-team, Data-Services: Find a better way to notify tool maintainers of schema and API changes - https://phabricator.wikimedia.org/T199234#10251510 (taavi)
[17:22:21] cloud-services-team, Data-Services: Add script_path to meta_p.wiki database - https://phabricator.wikimedia.org/T93483#10251518 (taavi)
[17:26:07] cloud-services-team, Data-Services, Data-Engineering-Icebox: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users - https://phabricator.wikimedia.org/T204950#10251522 (taavi)
[17:26:14] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Temporary accounts: Verify if Temporary Accounts require any changes to Wiki Replicas - https://phabricator.wikimedia.org/T377879#10251531 (fnegri) > there are raw yaml lists: Apologies, one link was wrong, I fixed it now.
[17:27:08] cloud-services-team, Data-Services, Data-Engineering-Icebox: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users - https://phabricator.wikimedia.org/T204950#10251525 (taavi) Sorry to poke an many years old ticket.. but what still needs to happen here?...
[17:27:20] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Temporary accounts: Verify if Temporary Accounts require any changes to Wiki Replicas - https://phabricator.wikimedia.org/T377879#10251546 (fnegri) Open→In progress
[17:27:42] cloud-services-team, Data-Services, Projects-Cleanup: Archive the operations/debs/bdsync repository - https://phabricator.wikimedia.org/T377882 (taavi) NEW
[17:29:06] cloud-services-team, Data-Services, Cloud-Services-Origin-User, Cloud-Services-Worktype-Unplanned: [cloudvps] Find and cleanup any mounts to labstore1006/1007 - https://phabricator.wikimedia.org/T320425#10251526 (taavi) Open→Resolved Let's boldly say this is done or otherwise no longe...
[17:30:59] cloud-services-team, Cloud-VPS, Data-Services: toolforge and misc NFS share backups log errors when reading old snapshots - https://phabricator.wikimedia.org/T188500#10251540 (taavi) Open→Invalid Assuming this is no longer a problem since we no longer use bdsync.
[17:33:18] cloud-services-team, Cloud-VPS, Patch-For-Review: Cloud VPS: 2024-10-22 cloud-wide puppet problem related to java update - https://phabricator.wikimedia.org/T377803#10251585 (taavi)
[17:34:26] cloud-services-team, Cloud-VPS: Complete upgrading WMCS bare metal hosts from Bullseye to Bookworm - https://phabricator.wikimedia.org/T375217#10251590 (taavi)
[17:35:46] cloud-services-team (Hardware), Cloud-VPS: wmcs codfw hardware changes proposal - https://phabricator.wikimedia.org/T377568#10251596 (taavi)
[17:35:59] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project: [cookbooks.ceph] create a script to get the list of rbd images affected by stuck/inactive PGs - https://phabricator.wikimedia.org/T331636#10251597 (taavi)
[17:36:09] cloud-services-team (Hardware), Cloud-VPS, Goal: eqiad1: procure 1 additional cloudlb server - https://phabricator.wikimedia.org/T341062#10251600 (taavi)
[17:36:15] cloud-services-team (Hardware), Cloud-VPS: cloudcontrol2006-dev struggling with memory - https://phabricator.wikimedia.org/T370401#10251602 (taavi)
[17:36:25] cloud-services-team (Hardware), Cloud-VPS, Patch-For-Review: replace cloudlb2001-dev with cloudlb2004-dev - https://phabricator.wikimedia.org/T377126#10251605 (taavi)
[17:38:12] cloud-services-team, Data-Services, Data-Engineering-Icebox: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users - https://phabricator.wikimedia.org/T204950#10251589 (Ottomata) @taavi many tickets were declined for complexity reasons, but we have new w...
[19:13:33] (open) sstefanova: lima-vm: fix hostname [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/204
[19:15:03] (approved) sstefanova: lima-vm: fix hostname [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/204
[19:17:09] (merge) sstefanova: lima-vm: fix hostname [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/204
[19:18:39] (update) sstefanova: [lima-kilo] minor project refactor [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/198 (owner: raymond-ndibe)
[20:49:37] Toolforge-standards-committee: Reset members and owners for toolforge-standards-committee@lists.wikimedia.org - https://phabricator.wikimedia.org/T375134#10252120 (bd808)
[21:13:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[21:21:30] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:47:36] FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate mwv-builder-03.mediawiki-vagrant.eqiad.wmflabs is about to expire in 2d 23h 58m 34s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire