[00:10:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[00:24:57] Grid-Engine-to-K8s-Migration: Migrate yapperbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320195#9565904 (Legoktm) Open→Resolved a:Naypta→Legoktm Seems like it fixed itself after I went to sleep. I'm going to call this resolved, we're working on a...
[00:34:20] Wikibugs: bd808's big pile of refactoring ideas - https://phabricator.wikimedia.org/T357851#9565944 (Legoktm) All sounds good to me, agreed that there are better ways to do automatic deploys without doing git pulls and all the internal detection logic. I am not sure whether this is in your scope or not, but...
[00:36:48] (PuppetZeroResources) resolved: Puppet has failed generate resources on cloudcephosd1012:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:49:17] Wikibugs: bd808's big pile of refactoring ideas - https://phabricator.wikimedia.org/T357851#9565965 (bd808) >>! In T357851#9565944, @Legoktm wrote: > I am not sure whether this is in your scope or not, but from what I recall with your usage of ZNC to front other IRC bots, that would be nice to have for wikib...
[00:54:45] (WidespreadPuppetFailure) firing: Puppet has failed on wmcs cluster - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=3&var-cluster=wmcs - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[01:12:48] (PuppetZeroResources) firing: Puppet has failed generate resources on cloudcephosd1008:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:22:48] (PuppetZeroResources) resolved: Puppet has failed generate resources on cloudcephosd1008:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:27:04] Toolforge (Toolforge iteration 06), Toolforge Build Service, Patch-For-Review: [maintain-harbor] Improvements to subcommands and config validation - https://phabricator.wikimedia.org/T353059#9566083 (CodeReviewBot) raymond-ndibe opened https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbo...
[01:41:18] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on cloudcephmon1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:41:23] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudcephmon1003:9100 - https://phabricator.wikimedia.org/T358165#9566087 (phaultfinder)
[01:51:18] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on cloudcephmon1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:51:23] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9566104 (phaultfinder)
[01:56:18] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on cloudcephmon1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:01:19] (PuppetZeroResources) resolved: (2) Puppet has failed generate resources on cloudcephmon1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:04:09] Tool-global-search: 500: Internal Server Error on (Gadgets-definition|.*\.(js|css|json)) - https://phabricator.wikimedia.org/T358061#9566122 (Samwilson) Looks like it's not just for JS pages, e.g. https://global-search.toolforge.org/?q=WSexport results in > The server said: cURL error 6: Could not resolve h...
[02:13:14] Tool-global-search: 500: Internal Server Error on (Gadgets-definition|.*\.(js|css|json)) - https://phabricator.wikimedia.org/T358061#9566141 (Bugreporter) >cloudelastic1004.wikimedia.org See {T358046}
[02:19:49] (PuppetZeroResources) firing: Puppet has failed generate resources on cloudcephosd1034:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:19:53] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudcephosd1034:9100 - https://phabricator.wikimedia.org/T358169#9566154 (phaultfinder)
[02:23:01] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[02:24:49] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on cloudcephmon1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:24:55] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9566162 (phaultfinder)
[02:34:49] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on cloudcephmon1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:39:49] (PuppetZeroResources) resolved: (2) Puppet has failed generate resources on cloudcephmon1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:45:18] (PuppetZeroResources) firing: (3) Puppet has failed generate resources on cloudcephmon1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:45:24] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudcephosd1030:9100 - https://phabricator.wikimedia.org/T358172#9566186 (phaultfinder)
[02:50:18] (PuppetZeroResources) firing: (4) Puppet has failed generate resources on cloudcephmon1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:50:24] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9566198 (phaultfinder)
[03:00:33] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on cloudcephosd1006:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:15:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[03:20:18] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on cloudcephosd1025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:20:22] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9566220 (phaultfinder)
[03:25:18] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on cloudcephosd1025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:30:18] (PuppetZeroResources) resolved: (2) Puppet has failed generate resources on cloudcephosd1025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:46:43] (CloudVPSDesignateLeaks) firing: (2) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:06:48] (PuppetZeroResources) firing: Puppet has failed generate resources on cloudcephosd1017:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:06:53] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudcephosd1017:9100 - https://phabricator.wikimedia.org/T358174#9566258 (phaultfinder)
[04:15:05] Toolforge Jobs framework: dbreps job pending to start for 2d16h - https://phabricator.wikimedia.org/T358175#9566279 (Legoktm)
[04:15:18] Toolforge Jobs framework: dbreps job pending to start for 2d16h on Toolforge - https://phabricator.wikimedia.org/T358175#9566289 (Legoktm)
[04:21:49] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on cloudcephosd1017:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:21:54] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudcephosd1022:9100 - https://phabricator.wikimedia.org/T358176#9566290 (phaultfinder)
[04:34:45] (WidespreadPuppetFailure) resolved: Puppet has failed on wmcs cluster - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=3&var-cluster=wmcs - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[04:41:48] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on cloudcephmon1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:41:53] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9566299 (phaultfinder)
[05:01:49] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on cloudcephmon1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:36:49] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on cloudbackup1004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:36:53] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9566308 (phaultfinder)
[05:46:50] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on cloudbackup1004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:51:49] (PuppetZeroResources) resolved: (2) Puppet has failed generate resources on cloudbackup1004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:05:19] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on cloudcephmon1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:05:24] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudcephmon1002:9100 - https://phabricator.wikimedia.org/T358177#9566317 (phaultfinder)
[06:05:33] (PuppetZeroResources) resolved: Puppet has failed generate resources on cloudcephmon1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:06:45] (WidespreadPuppetFailure) firing: Puppet has failed on wmcs cluster - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=3&var-cluster=wmcs - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[06:10:19] (PuppetZeroResources) firing: (3) Puppet has failed generate resources on cloudbackup1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:10:23] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9566324 (phaultfinder)
[06:15:19] (PuppetZeroResources) firing: (3) Puppet has failed generate resources on cloudbackup1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:15:23] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9566325 (phaultfinder)
[06:15:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[06:20:19] (PuppetZeroResources) firing: (4) Puppet has failed generate resources on cloudbackup1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:20:23] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9566326 (phaultfinder)
[06:23:15] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[06:30:19] (PuppetZeroResources) firing: (6) Puppet has failed generate resources on cloudbackup1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:30:23] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9566328 (phaultfinder)
[06:40:19] (PuppetZeroResources) firing: (6) Puppet has failed generate resources on cloudbackup1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:40:33] (PuppetZeroResources) firing: (6) Puppet has failed generate resources on cloudbackup1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:45:19] (PuppetZeroResources) firing: (6) Puppet has failed generate resources on cloudbackup1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:50:19] (PuppetZeroResources) firing: (5) Puppet has failed generate resources on cloudbackup1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:51:48] tool-wdlocator, translatewiki.net, Language-Team (Language-2024-January-March), Localization Infrastructure FY2023-24, Unplanned-Sprint-Work: Add wdlocator to translatewiki.net - https://phabricator.wikimedia.org/T357495#9566360 (Wangombe) Open→In progress a:Wangombe
[07:15:19] (PuppetZeroResources) firing: (3) Puppet has failed generate resources on cloudbackup1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:15:23] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9566367 (phaultfinder)
[07:46:56] (CloudVPSDesignateLeaks) firing: (2) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:55:19] (PuppetZeroResources) firing: (3) Puppet has failed generate resources on cloudbackup1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[08:02:30] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-25
[08:02:46] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-25
[08:03:16] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-38
[08:03:30] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-38
[08:03:52] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-51
[08:04:08] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-51
[08:05:34] Toolforge (Toolforge iteration 06): dbreps job pending to start for 2d16h on Toolforge - https://phabricator.wikimedia.org/T358175#9566408 (taavi) a:taavi ` Feb 19 12:14:11 tools-k8s-worker-nfs-38 kubelet[3990]: E0219 12:14:11.504588 3990 pod_workers.go:965] "Error syncing pod, skipping" err="failed t...
[08:10:49] Toolforge: wmcs.toolforge.add_k8s_node occasionally fails to setup custom Puppetmaster - https://phabricator.wikimedia.org/T358179#9566424 (taavi)
[08:11:28] (InstanceDown) firing: Project tools instance tools-k8s-worker-nfs-25 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[08:13:06] Toolforge (Toolforge iteration 06): dbreps job pending to start for 2d16h on Toolforge - https://phabricator.wikimedia.org/T358175#9566279 (taavi) p:Triage→High
[08:13:20] Toolforge (Toolforge iteration 06): dbreps job pending to start for 2d16h on Toolforge - https://phabricator.wikimedia.org/T358175#9566279 (taavi) So the CNI path there is wrong, and our containerd config Puppetization is supposed to change that. That node was affected by {T358179}, so I think the reason why...
[08:14:07] Toolforge (Toolforge iteration 06): dbreps job pending to start for 2d16h on Toolforge - https://phabricator.wikimedia.org/T358175#9566444 (taavi) Open→In progress
[08:16:28] (InstanceDown) resolved: Project tools instance tools-k8s-worker-nfs-25 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[08:20:19] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on cloudbackup1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[08:21:15] Toolforge (Toolforge iteration 06): dbreps job pending to start for 2d16h on Toolforge - https://phabricator.wikimedia.org/T358175#9566491 (taavi) >>! In T358175#9566434, @taavi wrote: > I'll see if we can alert on pods stuck in Pending for a while. Not easily, the same Pending status as reported by kube-st...
[08:21:45] (WidespreadPuppetFailure) resolved: Puppet has failed on wmcs cluster - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=3&var-cluster=wmcs - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[08:25:19] (PuppetZeroResources) resolved: (2) Puppet has failed generate resources on cloudbackup1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[08:27:12] ToolforgeBundle, SVG Translate Tool, Community-Tech (CommTech-Kanban), Patch-Needs-Improvement: Git tag/version fetching times out - https://phabricator.wikimedia.org/T334454#9566495 (Samwilson) I've tagged and released version 1.2.4… and something seems to be broken! The preview doesn't work now...
[08:31:20] Tool-Wikidata-Periodic-Table, Wikidata, Documentation, Patch-For-Review, Wikimedia-Hackathon-2024: Improve documentation of Wikidata periodic table - https://phabricator.wikimedia.org/T99847#9566530 (Lydia_Pintscher) p:Triage→Low
[08:53:11] cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, User-dcaro: [ceph] export number of bad sectors per-disk - https://phabricator.wikimedia.org/T348716#9566584 (dcaro)
[08:53:22] Cloud-VPS, cloud-services-team (FY2023/2024-Q3-Q4), DC-Ops, SRE, ops-eqiad: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#9566585 (dcaro)
[09:00:38] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[09:00:48] (PuppetZeroResources) firing: Puppet has failed generate resources on cloudcephosd1021:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[09:01:00] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudcephosd1021:9100 - https://phabricator.wikimedia.org/T358186#9566601 (phaultfinder)
[09:01:05] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[09:01:43] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudcephosd1017:9100 - https://phabricator.wikimedia.org/T358174#9566611 (taavi) Open→Resolved a:taavi
[09:01:57] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudcephmon1002:9100 - https://phabricator.wikimedia.org/T358177#9566607 (taavi) Open→Resolved a:taavi
[09:01:59] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudcephosd1022:9100 - https://phabricator.wikimedia.org/T358176#9566609 (taavi) Open→Resolved a:taavi
[09:02:01] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudcephosd1030:9100 - https://phabricator.wikimedia.org/T358172#9566613 (taavi) Open→Resolved a:taavi
[09:02:05] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudcephosd1034:9100 - https://phabricator.wikimedia.org/T358169#9566615 (taavi) Open→Resolved a:taavi
[09:02:17] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudcephmon1003:9100 - https://phabricator.wikimedia.org/T358165#9566617 (taavi) Open→Resolved a:taavi
[09:02:19] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudcephosd1021:9100 - https://phabricator.wikimedia.org/T358186#9566623 (taavi) Open→Resolved a:taavi
[09:02:21] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudcephosd1008:9100 - https://phabricator.wikimedia.org/T358156#9566619 (taavi) Open→Resolved a:taavi
[09:02:28] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9566621 (taavi) Open→Resolved a:taavi
[09:15:49] (PuppetZeroResources) resolved: Puppet has failed generate resources on cloudcephosd1021:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[09:15:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[09:17:19] (PS1) Arturo Borrero Gonzalez: wmcs.toolforge.add_k8s_node: add default network [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1005711
[09:19:39] (CR) Majavah: "Can we not rely on the current detection mechanism?" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1005711 (owner: Arturo Borrero Gonzalez)
[09:29:38] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a control role in the tools cluster (T284656)
[09:29:42] T284656: Toolforge k8s: Migrate workers to Containerd and Bookworm - https://phabricator.wikimedia.org/T284656
[09:31:05] Toolforge (Toolforge iteration 06), cloud-services-team, Kubernetes, Patch-For-Review, User-aborrero: Toolforge k8s: Migrate workers to Containerd and Bookworm - https://phabricator.wikimedia.org/T284656#9566679 (aborrero)
[09:32:38] Toolforge (Toolforge iteration 06), cloud-services-team, Kubernetes, Patch-For-Review, User-aborrero: Toolforge k8s: Migrate workers to Containerd and Bookworm - https://phabricator.wikimedia.org/T284656#9566677 (aborrero) Running cookbook: `lang=shell-session aborrero@cloudcumin1001:~ $ sud...
[09:38:59] !log aborrero@cloudcumin1001 tools Added a new k8s control tools-k8s-control-8.tools.eqiad1.wikimedia.cloud to the cluster
[09:38:59] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a control role in the tools cluster
[09:51:42] (CloudVPSDesignateLeaks) firing: (2) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:56:42] (CloudVPSDesignateLeaks) resolved: (2) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:17:58] wikitech.wikimedia.org, Content-Transform-Team-WIP, DiscussionTools, Parsoid-Read-Views (Phase 1 - DiscussionTools support): Use Parsoid for DiscussionTools on wikitech - https://phabricator.wikimedia.org/T355374#9566807 (ihurbain)
[10:22:45] Toolforge (Toolforge iteration 06): [jobs-api] Getting errors when listing jobs - https://phabricator.wikimedia.org/T358194#9566823 (dcaro)
[10:22:51] Toolforge (Toolforge iteration 06): [jobs-api] Getting errors when listing jobs - https://phabricator.wikimedia.org/T358194#9566835 (dcaro) p:Triage→High
[10:23:15] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[10:28:46] Toolforge (Toolforge iteration 06): [jobs-api] Getting errors when listing jobs - https://phabricator.wikimedia.org/T358194#9566864 (Magnus) This happens ~50% of the time, re-running the exact same command often works. This error is new today, didn't happen yesterday.
[10:47:24] Toolforge (Toolforge iteration 06), Patch-For-Review: [jobs-api] Getting errors when listing jobs - https://phabricator.wikimedia.org/T358194#9566924 (CodeReviewBot) taavi opened https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/62 deployment: Pin jobs-api pod to NFS-enabled w...
[10:48:00] Toolforge: Create a pool of NFS-less Toolforge Kubernetes workers - https://phabricator.wikimedia.org/T355883#9566951 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/62 deployment: Pin jobs-api pod to NFS-enabled workers
[10:48:05] Toolforge (Toolforge iteration 06), Patch-For-Review: [jobs-api] Getting errors when listing jobs - https://phabricator.wikimedia.org/T358194#9566952 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/62 deployment: Pin jobs-api pod to NFS-enabled w...
[10:48:46] Toolforge (Toolforge iteration 06): Create a pool of NFS-less Toolforge Kubernetes workers - https://phabricator.wikimedia.org/T355883#9566954 (taavi) a:taavi
[10:50:30] Toolforge (Toolforge iteration 06), Patch-For-Review: [jobs-api] Getting errors when listing jobs - https://phabricator.wikimedia.org/T358194#9566979 (CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_reques...
[10:50:56] Toolforge (Toolforge iteration 06), Patch-For-Review: Create a pool of NFS-less Toolforge Kubernetes workers - https://phabricator.wikimedia.org/T355883#9566981 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/206 jobs-api: bump to 0.0.263...
[10:50:58] Toolforge (Toolforge iteration 06), Patch-For-Review: [jobs-api] Getting errors when listing jobs - https://phabricator.wikimedia.org/T358194#9566982 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/206 jobs-api: bump to 0.0.263-2024022210...
[10:51:08] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[10:51:14] Toolforge (Toolforge iteration 06), Patch-For-Review: [jobs-api] Getting errors when listing jobs - https://phabricator.wikimedia.org/T358194#9566983 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/42 kubernetes.logs: use a default date if...
[10:51:17] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[10:51:24] Toolforge (Toolforge iteration 06), Patch-For-Review: Create a pool of NFS-less Toolforge Kubernetes workers - https://phabricator.wikimedia.org/T355883#9566977 (CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/m...
[10:51:51] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[10:52:04] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[10:58:16] Toolforge (Toolforge iteration 06), Patch-For-Review: [jobs-api] Getting errors when listing jobs - https://phabricator.wikimedia.org/T358194#9567001 (CodeReviewBot) dcaro closed https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/42 kubernetes.logs: use a default date if...
[11:03:26] Toolforge (Toolforge iteration 06), Patch-For-Review: [jobs-api] Getting errors when listing jobs - https://phabricator.wikimedia.org/T358194#9567006 (taavi) Open→Resolved Adding the missing `nodeSelector` seems to have fixed it. So {T355883} broke this, since I thought I'd added that everywhere...
[11:06:53] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
[11:15:19] !log taavi@cloudcumin1001 tools Added a new k8s worker tools-k8s-worker-104.tools.eqiad1.wikimedia.cloud to the cluster
[11:15:19] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster
[11:23:02] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node (T284656)
[11:23:07] T284656: Toolforge k8s: Migrate workers to Containerd and Bookworm - https://phabricator.wikimedia.org/T284656
[11:23:50] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) (T284656)
[11:35:31] !log aborrero@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary
[11:35:43] !log aborrero@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0)
[11:35:56] !log aborrero@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary
[11:36:06] !log aborrero@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0)
[11:36:09] Toolforge (Toolforge iteration 06), Patch-For-Review: Create a pool of NFS-less Toolforge Kubernetes workers - https://phabricator.wikimedia.org/T355883#9567171 (taavi) In progress→Resolved So I added three non-NFS workers, `tools-k8s-worker-102` to 104. So far they're being used by various infra...
[11:36:16] !log aborrero@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary
[11:36:20] !log aborrero@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0)
[11:36:48] !log aborrero@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary
[11:36:51] !log aborrero@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0)
[11:37:09] !log aborrero@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary
[11:37:31] !log aborrero@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0)
[11:37:43] !log aborrero@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary
[11:38:06] !log aborrero@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0)
[11:38:22] !log aborrero@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary
[11:38:59] !log aborrero@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0)
[11:54:53] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance
[11:55:00] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=0)
[11:57:08] cloud-services-team, User-aborrero: wmcs.openstack.cloudvirt.lib.ensure_canary cookbook creates multiple canary VMs - https://phabricator.wikimedia.org/T357970#9567260 (aborrero) Open→Resolved a:aborrero
[12:03:03] Cloud-VPS, cloud-services-team, User-aborrero: nova-compute: error running local ceph command - https://phabricator.wikimedia.org/T358101#9567275 (aborrero) p:Triage→Low This was a race condition with puppet, nova-compute started before ceph was fully installed. Confirmed via: `lang=shell-se...
[12:11:42] (CloudVPSDesignateLeaks) firing: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:15:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[12:16:42] (CloudVPSDesignateLeaks) firing: (2) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:32:37] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1034.eqiad.wmnet' (T319184)
[12:32:42] T319184: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184
[12:45:55] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] expose all backend APIs OpenAPI specs - https://phabricator.wikimedia.org/T358100#9567395 (CodeReviewBot) sstefanova merged https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/23 api: exp...
[12:53:12] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1034.eqiad.wmnet' (T319184)
[12:53:18] T319184: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184
[12:57:57] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1034.eqiad.wmnet' (T319184)
[12:58:41] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1034.eqiad.wmnet' (T319184)
[12:58:46] T319184: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184
[13:00:17] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] expose all backend APIs OpenAPI specs - https://phabricator.wikimedia.org/T358100#9567445 (CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/to...
[13:01:51] cloud-services-team, Infrastructure-Foundations, SRE, netops, and 2 others: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9567450 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1034.eqiad.wmnet with OS...
[13:09:48] Cloud-VPS (Project-requests), Wikimedia-Medicine: Request creation of mdwiki-offline VPS project - https://phabricator.wikimedia.org/T358023#9567486 (Tim-moody) I don't think this fits the usual pattern for Toolforge. First, it is not a service that would only get spun up from time to time; it must be pe...
[13:42:23] cloud-services-team, Infrastructure-Foundations, SRE, netops, and 2 others: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9567605 (aborrero)
[13:45:19] Cloud-VPS (Project-requests), Wikimedia-Medicine: Request creation of mdwiki-offline VPS project - https://phabricator.wikimedia.org/T358023#9567624 (dcaro) Hi @Tim-moody! It's perfectly fine if you prefer using a CloudVPS project, we are just trying to find the best option for you :) Toolforge is able...
[13:45:22] !log aborrero@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1034']
[13:45:29] !log aborrero@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1034']
[13:47:10] cloud-services-team, Infrastructure-Foundations, SRE, netops, and 2 others: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9567631 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1034.eqiad.wmnet with OS book...
[13:47:18] Toolforge (Toolforge iteration 06): Add node anti-affinity topologySpreadConstraints to infrastructure components where relevant - https://phabricator.wikimedia.org/T358203#9567632 (taavi) a:taavi→None
[13:58:13] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance
[13:58:19] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=0)
[13:58:57] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
[13:59:07] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
[14:03:11] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
[14:03:15] !log sstefanova@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component envvars-api
[14:04:59] Toolforge, cloud-services-team: Elasticsearch credential request for capacity-exchange - https://phabricator.wikimedia.org/T357227#9567723 (taavi) `lang=shell-session tools.capacity-exchange@tools-sgebastion-11:~$ toolforge envvars show TOOL_ELASTICSEARCH_PASSWORD name value TOOL_...
[14:07:08] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
[14:07:11] !log sstefanova@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component envvars-api
[14:09:49] (PS1) Arturo Borrero Gonzalez: inventory: refresh tools k8s control nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1005766 (https://phabricator.wikimedia.org/T284656)
[14:11:30] (CR) Majavah: [C: +1] inventory: refresh tools k8s control nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1005766 (https://phabricator.wikimedia.org/T284656) (owner: Arturo Borrero Gonzalez)
[14:13:22] (CR) Slavina Stefanova: [C: +1] inventory: refresh tools k8s control nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1005766 (https://phabricator.wikimedia.org/T284656) (owner: Arturo Borrero Gonzalez)
[14:13:53] (CR) Arturo Borrero Gonzalez: [C: +2] inventory: refresh tools k8s control nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1005766 (https://phabricator.wikimedia.org/T284656) (owner: Arturo Borrero Gonzalez)
[14:16:06] Toolforge, cloud-services-team: Elasticsearch credential request for capacity-exchange - https://phabricator.wikimedia.org/T357227#9567747 (Slst2020) >>! In T357227#9567723, @taavi wrote: > That's a password hash, not a password... > > I regenerated the credentials and copied the correct password to the...
[14:17:27] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
[14:17:30] !log sstefanova@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api
[14:23:15] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[14:24:21] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
[14:24:26] !log sstefanova@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api
[14:26:41] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
[14:26:55] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
[14:29:42] Toolforge: [maintain-harbor] Move to become a toolforge component - https://phabricator.wikimedia.org/T358225#9567809 (dcaro)
[14:29:46] Toolforge: [maintain-harbor] Move to become a toolforge component - https://phabricator.wikimedia.org/T358225#9567821 (dcaro) p:Triage→Medium
[14:31:38] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] expose all backend APIs OpenAPI specs - https://phabricator.wikimedia.org/T358100#9567832 (CodeReviewBot) sstefanova merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/207 en...
[14:36:04] Cloud-VPS (Project-requests), Wikimedia-Medicine: Request creation of mdwiki-offline VPS project - https://phabricator.wikimedia.org/T358023#9567872 (Tim-moody) I admit that familiarity makes me inclined towards VPS. I have used docker containers from others, but not built anything serious. I noted the b...
[14:54:14] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] expose all backend APIs OpenAPI specs - https://phabricator.wikimedia.org/T358100#9567935 (Slst2020)
[14:57:40] Toolforge: wmcs.toolforge.add_k8s_node occasionally fails to setup custom Puppetmaster - https://phabricator.wikimedia.org/T358179#9567968 (dcaro) p:Triage→Medium
[14:58:01] Toolforge: [wmcs-cookbooks] wmcs.toolforge.add_k8s_node occasionally fails to setup custom Puppetmaster - https://phabricator.wikimedia.org/T358179#9567975 (dcaro)
[14:58:22] Toolforge: [toolforge-cd] gitlab-ci refactor - https://phabricator.wikimedia.org/T353514#9567978 (dcaro) In progress→Open
[15:02:59] Toolforge, cloud-services-team: [maintain-dbusers] When creating accounts, the script bails out processing other accounts if one of them fails in an unexpected way - https://phabricator.wikimedia.org/T332798#9567999 (dcaro)
[15:03:02] Toolforge, cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 3 others: [maintain-dbusers] Generate prometheus metrics - https://phabricator.wikimedia.org/T332955#9567997 (dcaro) In progress→Open a:dcaro→None
[15:03:07] Toolforge, cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, User-dcaro: [wmcs-cookbooks,toolforge,nfs] automate cleanup of D state webservices by deleting the stuck pod - https://phabricator.wikimedia.org/T348662#9568003 (dcaro)
[15:03:11] Cloud-VPS, cloud-services-team (FY2023/2024-Q1-Q2), Patch-For-Review: Split maintain-dbusers.py into two parts, one to run on cloudcontrol nodes and one to run on an NFS server VM - https://phabricator.wikimedia.org/T303663#9568000 (dcaro)
[15:03:15] Toolforge, cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, User-dcaro: [wmcs-cookbooks,toolforge,nfs] automate cleanup of D state webservices by deleting the stuck pod - https://phabricator.wikimedia.org/T348662#9243341 (dcaro) p:Hi...
[15:03:18] Toolforge, cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, User-dcaro: [wmcs-cookbooks,toolforge,nfs] automate cleanup of D state webservices by deleting the stuck pod - https://phabricator.wikimedia.org/T348662#9243341 (dcaro) p:Me...
[15:15:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[15:26:39] Striker: Add option to hide unwanted tool accounts from Striker UI - https://phabricator.wikimedia.org/T192225#9568116 (taavi) Open→Declined Tool accounts can be deleted now.
[15:55:17] Toolforge (Toolforge iteration 06): [Toolforge CLI consolidation] Explore OpenAPI SDK tooling - https://phabricator.wikimedia.org/T356261#9568292 (Slst2020)
[15:58:47] PAWS: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T356453#9568308 (rook) Appears done ` @PAWS:~$ pwb.py --version Pywikibot: [https] r-pywikibot-core.git (3f413eb, g1, 2023/12/05, 15:44:04, OUTDATED) Release version: 8.6.0 `
[15:59:15] PAWS: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T356453#9568325 (rook) Open→Invalid
[16:00:04] Toolforge, cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, and 2 others: [wmcs-cookbooks,toolforge,nfs] automate cleanup of D state webservices by deleting the stuck pod - https://phabricator.wikimedia.org/T348662#9568330 (aborrero)
[16:07:29] Toolforge, User-aborrero: Fix deprecated Kubelet flags - https://phabricator.wikimedia.org/T355881#9568388 (aborrero)
[16:11:01] Grid-Engine-to-K8s-Migration, Growth-Team: Migrate ERANBOT project off of Grid Engine - https://phabricator.wikimedia.org/T306888#9568411 (bd808) >>! In T306888#9535963, @MusikAnimal wrote: >>>! In T306888#9535033, @MusikAnimal wrote: >> It was my intention to migrate to k8s (via toolforge-jobs), but T30...
[16:12:31] Toolforge: [toolforge,storage,swift,s3] Object store? - https://phabricator.wikimedia.org/T225190#9568414 (dcaro) p:Triage→High
[16:13:51] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudnet2008-dev:9100 - https://phabricator.wikimedia.org/T357887#9568433 (taavi) Open→Resolved a:taavi
[16:14:02] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudvirt2004-dev:9100 - https://phabricator.wikimedia.org/T357886#9568435 (taavi) Open→Resolved a:taavi
[16:14:31] cloud-services-team: SystemdUnitDown Unit wmf_auto_restart_virtlogd.service on node cloudvirt1032 has been down for long. - https://phabricator.wikimedia.org/T357963#9568446 (taavi) Open→Resolved a:taavi
[16:16:42] (CloudVPSDesignateLeaks) firing: (2) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:21:19] Data-Services: [toolsdb] Replica is frequently lagging behind the primary - https://phabricator.wikimedia.org/T357624#9568522 (fnegri)
[16:21:42] (CloudVPSDesignateLeaks) firing: (2) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:21:42] Data-Services, cloud-services-team (FY2023/2024-Q3-Q4): [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2024-02-20 - https://phabricator.wikimedia.org/T357979#9568519 (fnegri) Replication lag is back to zero: {F42044924}
[16:21:56] Data-Services, cloud-services-team (FY2023/2024-Q3-Q4): [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2024-02-20 - https://phabricator.wikimedia.org/T357979#9568521 (fnegri) Stalled→Resolved
[16:26:42] (CloudVPSDesignateLeaks) resolved: (2) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:34:52] (PS1) AntiCompositeNumber: Ignore canary events in StewardBot [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/1005797
[16:41:38] PAWS, Malayalam-Sites: Indic font in PAWS Terminal - https://phabricator.wikimedia.org/T355998#9568638 (rook) `curl --silent "https://en.wikipedia.org/w/index.php?title=Malayalam_Wikipedia&oldid=1195587528" 2>&1 | grep "മലയാളം"` in the terminal of PAWS Further shows this to be the case. Compare to https:...
[16:43:09] Cloud Services Proposals, Toolforge, User-aborrero: Decision request - Toolforge external infrastructure domain usage - https://phabricator.wikimedia.org/T306039#9568663 (aborrero)
[16:46:30] (CR) Operator873: [C: +2] "Safe change. +2 added due to minor change and straightforward approach. Syntax is good." [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/1005797 (owner: AntiCompositeNumber)
[16:47:27] (Merged) jenkins-bot: Ignore canary events in StewardBot [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/1005797 (owner: AntiCompositeNumber)
[16:56:26] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] expose all backend APIs OpenAPI specs - https://phabricator.wikimedia.org/T358100#9568743 (dcaro) p:Triage→High
[16:57:12] Cloud Services Proposals, Toolforge, User-aborrero: Decision request - Toolforge external infrastructure domain usage - https://phabricator.wikimedia.org/T306039#9568750 (dcaro) Deadline for in-task decision is 12th of March
[16:58:38] Toolforge, User-aborrero: [toolforge,infra] Fix deprecated Kubelet flags - https://phabricator.wikimedia.org/T355881#9568757 (dcaro) p:Triage→Medium
[17:02:57] Cloud Services Proposals, Toolforge, User-aborrero: Decision request - Toolforge external infrastructure domain usage - https://phabricator.wikimedia.org/T306039#9568800 (taavi)
[17:06:53] cloud-services-team (FY2023/2024-Q3-Q4): Test using phabricator-maintenance-bot to sync wmcs-related boards - https://phabricator.wikimedia.org/T358251#9568811 (fnegri)
[17:06:55] PAWS, Malayalam-Sites: Indic font in PAWS Terminal - https://phabricator.wikimedia.org/T355998#9568822 (rook) https://github.com/jupyterlab/jupyterlab/issues/15856
[17:11:36] cloud-services-team (FY2023/2024-Q3-Q4): Test using phabricator-maintenance-bot to sync wmcs-related boards - https://phabricator.wikimedia.org/T358251#9568847 (fnegri) p:Triage→Low a:fnegri
[17:12:02] PAWS: Remove paws-dev from codfw1dev - https://phabricator.wikimedia.org/T355954#9568849 (rook) ` root@cloudcontrol2001-dev:~# openstack project delete paws-dev `
[17:12:46] PAWS: Remove paws-dev from codfw1dev - https://phabricator.wikimedia.org/T355954#9568850 (rook) Open→Resolved a:rook
[17:13:12] cloud-services-team (FY2023/2024-Q3-Q4): Test using phabricator-maintenance-bot to sync wmcs-related boards - https://phabricator.wikimedia.org/T358251#9568811 (fnegri)
[17:14:43] PAWS: Add nbextensions to PAWS - https://phabricator.wikimedia.org/T287078#9568868 (rook) Open→Declined
[17:19:48] PAWS, cloud-services-team: PAWS not allowing admins to impersonate users - https://phabricator.wikimedia.org/T265467#9568919 (rook) Open→Resolved
[17:20:39] PAWS, cloud-services-team: PAWS not allowing admins to impersonate users - https://phabricator.wikimedia.org/T265467#9568918 (rook) This appears to now function.
[17:31:59] Cloud Services Proposals, Toolforge, User-aborrero: Decision request - Toolforge external infrastructure domain usage - https://phabricator.wikimedia.org/T306039#9569025 (Andrew) I'm perfectly happy with *.internal.toolforge.org or *.infra.toolforge.org, which seems to be what Taavi prefers as well :)
[17:38:01] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[18:15:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[18:22:01] (CR) Jforrester: releases: Bump Code to 1.3.3 (1 comment) [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/1005174 (owner: VolkerE)
[19:24:48] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[20:42:01] (PS1) Ketulucas: Merge branch 'main' of ssh://gerrit.wikimedia.org:29418/labs/tools/Isa into fix-readme-instruction [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1005838
[20:42:03] (PS1) Ketulucas: Bug: T225798. Swipe left right to change participant image. [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1005839 (https://phabricator.wikimedia.org/T225798)
[21:15:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[21:25:38] Toolforge (Toolforge iteration 06), Toolforge Build Service, Patch-For-Review: [tbs] cleanup robot account related code - https://phabricator.wikimedia.org/T352763#9570135 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/34 [buil...
[21:25:59] Toolforge (Toolforge iteration 06), Toolforge Build Service, Patch-For-Review: [tbs] cleanup robot account related code - https://phabricator.wikimedia.org/T352763#9570138 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/78 [builds-a...
[21:27:04] Toolforge (Toolforge iteration 06), Toolforge Build Service, Patch-For-Review: [tbs] cleanup robot account related code - https://phabricator.wikimedia.org/T352763#9570139 (CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolf...
[21:31:57] Toolforge (Toolforge iteration 06), Toolforge Build Service, Patch-For-Review: [tbs] cleanup robot account related code - https://phabricator.wikimedia.org/T352763#9570160 (CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolf...
[21:39:32] Toolforge (Toolforge iteration 06), Toolforge Build Service, Patch-For-Review: [tbs] cleanup robot account related code - https://phabricator.wikimedia.org/T352763#9570182 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/196 [t...
[21:41:32] Toolforge (Toolforge iteration 06), Toolforge Build Service, Patch-For-Review: [tbs] cleanup robot account related code - https://phabricator.wikimedia.org/T352763#9570184 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/208 bu...
[21:42:42] Toolforge (Toolforge iteration 06), Toolforge Build Service, Patch-For-Review: [tbs] cleanup robot account related code - https://phabricator.wikimedia.org/T352763#9570209 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/209 bu...
[21:46:41] Toolforge (Toolforge iteration 06), Toolforge Build Service, Patch-For-Review: [tbs] cleanup robot account related code - https://phabricator.wikimedia.org/T352763#9570216 (Raymond_Ndibe) Open→Resolved
[22:24:50] (PuppetConstantChange) resolved: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange