[00:15:40] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[00:18:56] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[00:19:28] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[00:20:54] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[00:27:30] Wikibugs: Wikibugs' gitlab connector stops working without a strong sign of why - https://phabricator.wikimedia.org/T364490#9792795 (bd808) It looks like the gitlab-webhooks server is seeing connections closed at least sometimes at the same time that the wikibugs client sees a premature end of response: `lan...
[00:32:03] Wikibugs: Wikibugs' gitlab connector stops working without a strong sign of why - https://phabricator.wikimedia.org/T364490#9792803 (bd808) `lang=shell-session $ kubectl logs gitlab-webhooks-5b4d9cddd6-bb2z8 | grep "Disconnect from" 2024-05-13T20:03:18Z glwebhooks.sinks.sse INFO: Disconnect from 877880594621...
[00:42:41] FIRING: [2x] CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:52:41] FIRING: [2x] CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:55:12] Wikibugs: Wikibugs' gitlab connector stops working without a strong sign of why - https://phabricator.wikimedia.org/T364490#9792837 (bd808) I wonder what would happen if instead of connecting via the Toolforge proxy and ingress nginx wikibugs reached across namespaces to connect directly to the gitlab-webhoo...
[00:57:41] RESOLVED: [2x] CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:23:45] Tools: Tool:Panoviewer - Grid Engine web service cannot be reached. - https://phabricator.wikimedia.org/T354949#9792952 (tstarling) >>! In T354949#9790843, @Ligliotoi wrote: > Hallo, > > https://panoviewer.toolforge.org/ does not work. Error 403 "Forbidden". Can somebody fixed it maybe? index.html was...
[04:44:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:54:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:00:56] FIRING: SystemdUnitDown: The service unit rsync_nginxlogs.service is in failed status on host clouddumps1001. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[05:05:56] FIRING: [2x] SystemdUnitDown: The service unit rsync_nginxlogs.service is in failed status on host clouddumps1001. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[06:14:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[06:29:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[06:50:24] Tools, translatewiki.net, Language-Team (Language-2024-April-June), Localization Infrastructure FY2023-24, Unplanned-Sprint-Work: Make Wikidata Image Positions tool translatable on translatewiki.net - https://phabricator.wikimedia.org/T363626#9793233 (abi_) a:siebrand
[06:53:01] Tools, translatewiki.net, Language-Team (Language-2024-April-June), Localization Infrastructure FY2023-24, Unplanned-Sprint-Work: Make Wikidata Image Positions tool translatable on translatewiki.net - https://phabricator.wikimedia.org/T363626#9793242 (abi_) I see exports are happening as expe...
[06:55:56] FIRING: SystemdUnitDown: The systemd unit rsync_nginxlogs.service on node clouddumps1001 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[06:56:09] cloud-services-team: SystemdUnitDown Unit rsync_nginxlogs.service on node clouddumps1001 has been down for long. - https://phabricator.wikimedia.org/T364819 (phaultfinder) NEW
[07:00:56] FIRING: [2x] SystemdUnitDown: The systemd unit rsync_nginxlogs.service on node clouddumps1001 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[07:01:11] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T364820 (phaultfinder) NEW
[07:04:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-9 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[07:05:10] Cloud Services Proposals: Decision request - kubernetes upgrade workgroup - https://phabricator.wikimedia.org/T363683#9793264 (Slst2020)
[07:12:55] Tools: Tool:Panoviewer - Grid Engine web service cannot be reached. - https://phabricator.wikimedia.org/T354949#9793273 (Ligliotoi) >>! In T354949#9792952, @tstarling wrote: >>>! In T354949#9790843, @Ligliotoi wrote: >> Hallo, >> >> https://panoviewer.toolforge.org/ does not work. Error 403 "Forbidden"....
[07:25:36] cloud-services-team, Toolforge: [toolforge,storage] Provide per-tool access to cloud-vps object storage - https://phabricator.wikimedia.org/T358496#9793278 (Slst2020) Here, we are only talking about allowing tools to set up and use s3-style buckets, correct? Is there any intersection/dependency between t...
[07:34:41] cloud-services-team, Toolforge: [toolforge,storage] Provide per-tool access to cloud-vps object storage - https://phabricator.wikimedia.org/T358496#9793292 (dcaro) >>! In T358496#9793278, @Slst2020 wrote: > Here, we are only talking about allowing tools to set up and use s3-style buckets, correct? Is the...
[07:36:44] Tools, translatewiki.net, Language-Team (Language-2024-April-June), Localization Infrastructure FY2023-24, Unplanned-Sprint-Work: Make Wikidata Image Positions tool translatable on translatewiki.net - https://phabricator.wikimedia.org/T363626#9793293 (LucasWerkmeister) Yes, there are still so...
[07:38:20] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T364820#9793294 (dcaro) a:dcaro
[07:42:15] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T364820#9793297 (dcaro) The error on clouddumps1002 is: ` May 14 04:55:00 clouddumps1002 rsync[4096653]: @ERROR: chroot failed May 14 04:55:00 clouddumps1002 rsync[4096653]: rsync error: error starting client-server protocol (code 5) at...
[07:42:41] FIRING: [2x] CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:47:41] FIRING: [3x] CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:48:09] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-9
[07:48:27] !log dcaro@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-9
[07:50:08] Toolforge: [cookbook,infra] wmcs.toolforge.k8s.worker.drain failed to finish with `KeyError` on one node - https://phabricator.wikimedia.org/T364821 (dcaro) NEW
[07:50:29] Toolforge: [cookbook,infra] wmcs.toolforge.k8s.worker.drain failed to finish with `KeyError` on one node - https://phabricator.wikimedia.org/T364821#9793315 (dcaro) p:Triage→Medium
[07:52:41] FIRING: [3x] CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:54:13] Toolforge (Toolforge iteration 09): [infra] NFS hangs in some workers until the worker is rebooted (2024-05-14) - https://phabricator.wikimedia.org/T364822 (dcaro) NEW
[07:54:25] Toolforge (Toolforge iteration 09): [infra] NFS hangs in some workers until the worker is rebooted (2024-05-14) - https://phabricator.wikimedia.org/T364822#9793337 (dcaro) p:Triage→High
[07:54:30] Toolforge (Toolforge iteration 09): [infra] NFS hangs in some workers until the worker is rebooted (2024-05-14) - https://phabricator.wikimedia.org/T364822#9793338 (dcaro) p:High→Triage
[07:54:46] Toolforge (Toolforge iteration 09): [infra] NFS hangs in some workers until the worker is rebooted (2024-05-14) - https://phabricator.wikimedia.org/T364822#9793339 (dcaro) p:Triage→High
[07:54:51] Toolforge (Toolforge iteration 09): [infra] NFS hangs in some workers until the worker is rebooted (2024-05-14) - https://phabricator.wikimedia.org/T364822#9793335 (dcaro)
[07:57:31] Toolforge (Toolforge iteration 09): [infra] NFS hangs in some workers until the worker is rebooted (2024-05-14) - https://phabricator.wikimedia.org/T364822#9793340 (dcaro)
[07:57:41] RESOLVED: [3x] CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:57:46] Toolforge (Toolforge iteration 09): [infra] NFS hangs in some workers until the worker is rebooted (2024-05-14) - https://phabricator.wikimedia.org/T364822#9793343 (dcaro) nfs-9 has been drained without issues, so I can start debugging on it. Looking into nfs-52...
[08:02:09] Tools, Gerrit: Gerrit reviewer bot should have an option to add people as CC instead of reviewers - https://phabricator.wikimedia.org/T363290#9793349 (hashar) Thus it looks like we can mark this as a duplicate of T334118? The resolution would be to decline it since the reviewer bot is in maintenance mode.
[08:10:16] cloud-services-team, Toolforge: Request for access for user dr0ptp4kt for 'admin' tool - https://phabricator.wikimedia.org/T364761#9793357 (dcaro) +1 from me
[08:14:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-52 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[08:16:46] Toolforge (Toolforge iteration 09): [infra] NFS hangs in some workers until the worker is rebooted (2024-05-14) - https://phabricator.wikimedia.org/T364822#9793395 (dcaro) The load on nfs-52 seems to come from osm4wiki processes, that has been killed several times for going over the memory limit. NFS is resp...
[08:19:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-52 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[08:30:16] Toolforge (Toolforge iteration 09): [infra] NFS hangs in some workers until the worker is rebooted (2024-05-14) - https://phabricator.wikimedia.org/T364822#9793435 (dcaro) on nfs-9, there's 3 errors that repeat over and over in the logs, until right the moment when the number of D processes start to raise: `...
[08:32:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-52 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[08:34:28] FIRING: PuppetSyncFailure: Failed to update Puppet repository /srv/git/operations/puppet on instance metricsinfra-puppetserver-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[08:37:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-52 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[08:39:28] RESOLVED: PuppetSyncFailure: Failed to update Puppet repository /srv/git/operations/puppet on instance metricsinfra-puppetserver-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[09:08:00] FIRING: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[09:13:00] FIRING: [3x] OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[09:14:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:24:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:00:56] RESOLVED: SystemdUnitDown: The service unit rsync_nginxlogs.service is in failed status on host clouddumps1002. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1002 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:20:40] Quarry: quarry.wmcloud.org POST request to /api/query/stop does not work - https://phabricator.wikimedia.org/T364835 (Oudedutchman) NEW
[10:21:41] Quarry: quarry.wmcloud.org POST request to /api/query/stop does not work with queued queries - https://phabricator.wikimedia.org/T364835#9793876 (Oudedutchman)
[10:22:26] RESOLVED: SystemdUnitDown: The systemd unit rsync_nginxlogs.service on node clouddumps1001 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:23:46] (update) dcaro: [oapi-spec] add oapi-server to gateway [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/17 (https://phabricator.wikimedia.org/T362299) (owner: sstefanova)
[10:24:39] Quarry: quarry.wmcloud.org POST request to /api/query/stop does not work with queued queries - https://phabricator.wikimedia.org/T364835#9793884 (Oudedutchman) →Duplicate dup:T362213
[10:25:06] Quarry: Error 500 when clicking "stop query" - https://phabricator.wikimedia.org/T362213#9793882 (Oudedutchman)
[10:26:00] (update) dcaro: [oapi-spec] add oapi-server to gateway [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/17 (https://phabricator.wikimedia.org/T362299) (owner: sstefanova)
[10:35:24] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: Migrate eqiad1 cloudnets to Neutron OVS agent - https://phabricator.wikimedia.org/T364459#9793907 (dcaro) I have some questions: * What tests will be run on each checkpoint? (linuxbridge active-ovs inactive, ovs active-linuxbridge inactive, ovs activ...
[10:55:15] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: Migrate eqiad1 cloudnets to Neutron OVS agent - https://phabricator.wikimedia.org/T364459#9793966 (taavi) >>! In T364459#9793907, @dcaro wrote: > I have some questions: > > * What tests will be run on each checkpoint? (linuxbridge active-ovs inactive,...
[11:10:18] (update) aborrero: Draft: maintain_kubeusers: introduce resource abstraction [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/23 (https://phabricator.wikimedia.org/T279110)
[11:14:23] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: Migrate eqiad1 cloudnets to Neutron OVS agent - https://phabricator.wikimedia.org/T364459#9794012 (cmooney) Thanks for the task @taavi. Looks well put together let me know the exact time you're starting and if feel free to ping me if there is anything...
[12:12:22] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: Migrate eqiad1 cloudnets to Neutron OVS agent - https://phabricator.wikimedia.org/T364459#9794197 (taavi) This operation is scheduled for 2024-05-21 starting at around 14:00 UTC: https://lists.wikimedia.org/hyperkitty/list/cloud@lists.wikimedia.org/thre...
[12:21:26] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: Migrate eqiad1 cloudnets to Neutron OVS agent - https://phabricator.wikimedia.org/T364459#9794229 (taavi) Open→In progress
[12:48:38] (update) dcaro: [oapi-spec] add oapi-server to gateway [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/17 (https://phabricator.wikimedia.org/T362299) (owner: sstefanova)
[12:50:09] (update) dcaro: [oapi-spec] add oapi-server to gateway [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/17 (https://phabricator.wikimedia.org/T362299) (owner: sstefanova)
[12:50:58] (update) dcaro: [oapi-spec] add oapi-server to gateway [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/17 (https://phabricator.wikimedia.org/T362299) (owner: sstefanova)
[12:51:49] (approved) dcaro: [oapi-spec] add oapi-server to gateway [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/17 (https://phabricator.wikimedia.org/T362299) (owner: sstefanova)
[12:51:56] (merge) dcaro: [oapi-spec] add oapi-server to gateway [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/17 (https://phabricator.wikimedia.org/T362299) (owner: sstefanova)
[12:52:07] Toolforge (Toolforge iteration 09), Patch-For-Review: [api-gateway] Add a python server to serve consolidated openapi docs - https://phabricator.wikimedia.org/T362299#9794322 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/17 [oapi-spec] add o...
[12:53:50] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: api-gateway: bump to 0.0.21-20240514125202-02c6ab3b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/281
[12:56:27] (update) dcaro: api-gateway: bump to 0.0.21-20240514125202-02c6ab3b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/281 (https://phabricator.wikimedia.org/T362299) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[13:04:29] (update) dcaro: api-gateway: bump to 0.0.21-20240514125202-02c6ab3b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/281 (https://phabricator.wikimedia.org/T362299) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[13:04:32] (update) dcaro: api-gateway: bump to 0.0.21-20240514125202-02c6ab3b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/281 (https://phabricator.wikimedia.org/T362299) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[13:08:16] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
[13:08:27] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
[13:08:49] (approved) aborrero: api-gateway: bump to 0.0.21-20240514125202-02c6ab3b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/281 (https://phabricator.wikimedia.org/T362299) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[13:09:29] Toolforge (Toolforge iteration 09): [maintain-kubeusers] Increment default services quota - https://phabricator.wikimedia.org/T362520#9794428 (taavi)
[13:10:11] Toolforge (Toolforge iteration 09): increase quota for services - https://phabricator.wikimedia.org/T364780#9794426 (taavi) →Duplicate dup:T362520
[13:13:00] FIRING: [3x] OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[13:15:13] (open) dcaro: api: use APP_ as the prefix for environment variables [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/19
[13:17:26] (approved) sstefanova: api: use APP_ as the prefix for environment variables [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/19 (owner: dcaro)
[13:19:56] (approved) dcaro: api: use APP_ as the prefix for environment variables [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/19
[13:19:58] (update) dcaro: api: use APP_ as the prefix for environment variables [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/19
[13:20:00] (merge) dcaro: api: use APP_ as the prefix for environment variables [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/19
[13:21:40] (update) project_1317_bot_df3177307bed93c3f34e421e26c86e38: api-gateway: bump to 0.0.21-20240514125202-02c6ab3b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/281
[13:22:33] (update) dcaro: api-gateway: bump to 0.0.21-20240514125202-02c6ab3b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/281 (https://phabricator.wikimedia.org/T362299) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[13:23:45] (update) dcaro: api-gateway: bump to 0.0.21-20240514125202-02c6ab3b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/281 (https://phabricator.wikimedia.org/T362299) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[13:24:56] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
[13:25:07] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
[13:28:23] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
[13:28:36] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
[13:45:51] Cloud Services Proposals, cloud-services-team, Toolforge: Decision Request - Toolforge policy agent enforcement model - https://phabricator.wikimedia.org/T362872#9794680 (dcaro) > Kyverno has native support for mutating existing resources: https://kyverno.io/docs/writing-policies/mutate/#mutate-exist...
[13:54:12] Toolforge (Toolforge iteration 09): [infra] NFS hangs in some workers until the worker is rebooted (2024-05-14) - https://phabricator.wikimedia.org/T364822#9794727 (dcaro) Open→In progress
[13:55:21] Toolforge (Toolforge iteration 09): increase quota for services - https://phabricator.wikimedia.org/T364780#9794729 (dcaro) Duplicate→Resolved
[13:56:07] Toolforge (Toolforge iteration 09): increase quota for services - https://phabricator.wikimedia.org/T364780#9794743 (dcaro) Resolved→Invalid moving it to the done column changed the status :/
[14:11:51] (CR) Krinkle: [C:+2] "@Ebrahim: Per https://fa.wikipedia.org/wiki/Special:CentralAuth/Ebrahim I see you are a sysop there, so I assume this either has formal co" [labs/tools/fileprotectionsync] - https://gerrit.wikimedia.org/r/1028824 (owner: Ebrahim)
[14:12:18] (Merged) jenkins-bot: Add Persian Wikipedia to configs [labs/tools/fileprotectionsync] - https://gerrit.wikimedia.org/r/1028824 (owner: Ebrahim)
[14:12:41] FIRING: [2x] CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:22:41] RESOLVED: [2x] CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:47:44] Cloud Services Proposals, cloud-services-team, Toolforge: Decision Request - Toolforge policy agent enforcement model - https://phabricator.wikimedia.org/T362872#9794999 (aborrero) Open→Resolved We had a decision meeting today, and voting went like this: * option 1: 1 vote * option 2: 3...
[14:50:24] (update) dcaro: api-gateway: bump to 0.0.21-20240514125202-02c6ab3b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/281 (https://phabricator.wikimedia.org/T362299) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[14:50:25] (approved) dcaro: api-gateway: bump to 0.0.21-20240514125202-02c6ab3b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/281 (https://phabricator.wikimedia.org/T362299) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[14:50:29] (merge) dcaro: api-gateway: bump to 0.0.21-20240514125202-02c6ab3b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/281 (https://phabricator.wikimedia.org/T362299) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[14:50:41] Toolforge (Toolforge iteration 09): [api-gateway] Add a python server to serve consolidated openapi docs - https://phabricator.wikimedia.org/T362299#9795025 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/281 api-gateway: bump to 0.0.21-20240...
[15:01:44] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad: Q4:rack/setup/install new cloudcephmon hosts - https://phabricator.wikimedia.org/T364870 (10RobH) 03NEW p:05Triage→03High [15:02:29] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad: Q4:rack/setup/install new cloudcephmon hosts - https://phabricator.wikimedia.org/T364870#9795094 (10RobH) [15:07:50] FIRING: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [15:12:50] RESOLVED: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [16:02:17] (03update) 10aborrero: Draft: maintain_kubeusers: introduce resource abstraction [repos/cloud/toolforge/maintain-kubeusers] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/23 (https://phabricator.wikimedia.org/T279110) [16:18:09] 10Toolforge: Running out of "secrets" quota (envvars) produces unhelpful error message from `toolforge envvars create` - https://phabricator.wikimedia.org/T364878 (10bd808) 03NEW [16:18:57] 10Cloud-VPS (Debian Buster Deprecation), 06The-Wikipedia-Library, 10Moderator-Tools-Team (Kanban): Replace deprecated Buster VMs in Cloud VPS - https://phabricator.wikimedia.org/T364399#9795675 (10jsn.sherman) p:05Triage→03Medium [16:29:33] 10Toolforge: Running out of "secrets" quota (envvars) produces unhelpful error message from `toolforge envvars create` - https://phabricator.wikimedia.org/T364878#9795914 (10bd808) Related: {T333976} [16:37:22] (03open) 
dcaro: secrets: use 60 as default quota [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/24 (https://phabricator.wikimedia.org/T364878)
[16:37:59] cloud-services-team, Cloud-VPS, Trust and Safety Product Team: make sure Anti Harassment Tools are aware of cloud public IPv4 ranges - https://phabricator.wikimedia.org/T273731#9796053 (TAdeleye_WMF)
[16:38:46] (update) dcaro: secrets: use 60 as default quota [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/24 (https://phabricator.wikimedia.org/T364878)
[16:40:06] Toolforge, Patch-For-Review: Running out of "secrets" quota (envvars) produces unhelpful error message from `toolforge envvars create` - https://phabricator.wikimedia.org/T364878#9796102 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/24...
[16:41:09] Toolforge (Toolforge iteration 09): [maintain-kubeusers] Increment default services quota - https://phabricator.wikimedia.org/T362520#9796110 (bd808) > to some TBD higher value. How about 16 to match the number of pods in the default quota? We can certainly pick any other arbitrary number >1 as well, but th...
[16:41:37] Toolforge: [maintain-kubeusers] Increment default Secrets (envvars) quota - https://phabricator.wikimedia.org/T364883 (bd808) NEW
[16:47:55] cloud-services-team, Toolforge: [toolforge,storage] Provide per-tool access to cloud-vps object storage - https://phabricator.wikimedia.org/T358496#9796170 (Andrew) >>! In T358496#9790877, @dcaro wrote: >> This all seems correct, although I reiterate that the interesting part is the scope creation or man...
[16:50:48] cloud-services-team, Toolforge: [toolforge,storage] Provide per-tool access to cloud-vps object storage - https://phabricator.wikimedia.org/T358496#9796177 (Andrew) >>!
In T358496#9793278, @Slst2020 wrote: > Here, we are only talking about allowing tools to set up and use s3-style buckets, correct? That...
[17:12:41] FIRING: [2x] CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:13:15] FIRING: [3x] OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[17:17:41] FIRING: [3x] CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:22:41] FIRING: [3x] CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:27:41] RESOLVED: [3x] CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:30:04] Data-Services, VPS-Projects: Some PetScan queries do not return any results anymore for some days now - https://phabricator.wikimedia.org/T363073#9796375 (M2k_dewiki) Also see *
https://de.wikipedia.org/wiki/Wikipedia:Technische_W%C3%BCnsche/Reparaturhilfe#PetScan:_Suche_nach_Wikidata-Objekten_ohne_e...
[18:01:07] (update) dcaro: secrets: use 60 as default quota [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/24 (https://phabricator.wikimedia.org/T364878)
[18:02:15] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[18:02:19] Toolforge (Toolforge iteration 09): [maintain-kubeusers] Increment default Secrets (envvars) quota - https://phabricator.wikimedia.org/T364883#9796541 (dcaro) p:Triage→Medium
[18:02:20] Toolforge (Toolforge iteration 09): [maintain-kubeusers] Increment default Secrets (envvars) quota - https://phabricator.wikimedia.org/T364883#9796546 (dcaro) a:dcaro
[18:04:21] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[18:14:08] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[18:14:18] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[18:42:41] FIRING: [2x] CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:47:41] FIRING: [3x] CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:52:41] FIRING: [3x] CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks -
https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:57:41] RESOLVED: [3x] CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:24:28] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[19:26:20] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[20:55:33] superset.wmcloud.org: Upgrade to 4.0.0 - https://phabricator.wikimedia.org/T364022#9797398 (rook) This error still appears in an upgrade from superset 3.1.1 (helm chart 0.12.7) to 4.0.1 (helm chart 0.12.11). Noted on github.
[21:13:15] FIRING: [3x] OpenstackAPIResponse: Openstack API average response time is too high.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[22:44:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:59:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[23:29:16] (open) bd808: gitlab: handle "last_commit: null" properly [toolforge-repos/wikibugs2] - https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/43