[00:10:50] FIRING: TfInfraTestApplyFailed: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[00:49:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:59:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:59:57] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:00:11] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:15:56] FIRING: MaxConntrack: Max conntrack at 80.26% on cloudvirt1040:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[01:20:56] RESOLVED: MaxConntrack: Max conntrack at 80.17% on cloudvirt1040:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[01:55:22] (open) samwilson: Update submodules to get recent fixes for Translate and core [toolforge-repos/wishlist-test] - https://gitlab.wikimedia.org/toolforge-repos/wishlist-test/-/merge_requests/1 (https://phabricator.wikimedia.org/T365558)
[01:57:56] (merge) samwilson: Update submodules to get recent fixes for Translate and core [toolforge-repos/wishlist-test] - https://gitlab.wikimedia.org/toolforge-repos/wishlist-test/-/merge_requests/1 (https://phabricator.wikimedia.org/T365558)
[02:51:48] (open) samwilson: Grant pagelang to sysop group [toolforge-repos/wishlist-test] - https://gitlab.wikimedia.org/toolforge-repos/wishlist-test/-/merge_requests/2 (https://phabricator.wikimedia.org/T365558)
[02:52:49] (merge) samwilson: Grant pagelang to sysop group [toolforge-repos/wishlist-test] - https://gitlab.wikimedia.org/toolforge-repos/wishlist-test/-/merge_requests/2 (https://phabricator.wikimedia.org/T365558)
[03:01:35] (open) samwilson: Add $wgPageLanguageUseDB = true [toolforge-repos/wishlist-test] - https://gitlab.wikimedia.org/toolforge-repos/wishlist-test/-/merge_requests/3 (https://phabricator.wikimedia.org/T368578)
[03:04:30] (merge) samwilson: Add $wgPageLanguageUseDB = true [toolforge-repos/wishlist-test] - https://gitlab.wikimedia.org/toolforge-repos/wishlist-test/-/merge_requests/3 (https://phabricator.wikimedia.org/T368578)
[05:19:42] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:59:42] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:55:38] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb hosts - https://phabricator.wikimedia.org/T367778#9933487 (fnegri) clouddb1015 is back in sync. I will try repooling it. {F55953616}
[08:15:33] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2024-06-21 - https://phabricator.wikimedia.org/T368250#9933503 (fnegri) In progress→Resolved Replication is finally back in sync. {F55954107} I am resolving this task. I have added some n...
[08:48:55] (Abandoned) Slavina Stefanova: go-cli: initial commit [cloud/toolforge/toolforge-cli] - https://gerrit.wikimedia.org/r/809986 (https://phabricator.wikimedia.org/T308748) (owner: Slavina Stefanova)
[09:06:59] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: eqiad1: fix PTR delegations for 185.15.56.0/24 - https://phabricator.wikimedia.org/T341338#9933557 (taavi) a:taavi→None What's left is removing the old `56.15.185.in-addr.arpa.` zone from Designate (while being careful not to remove `0-25.56.15....
[09:14:34] cloud-services-team, Cloud-VPS, Toolforge: Taavi knowledge transfer: cloud-vps monitoring - https://phabricator.wikimedia.org/T362452#9933577 (taavi) Open→Resolved
[09:14:42] cloud-services-team, Cloud-VPS, Toolforge: Taavi knowledge transfer: python-flask-keystone, novaproxy, enc api - https://phabricator.wikimedia.org/T362449#9933582 (taavi) Open→Resolved
[09:15:24] cloud-services-team, Cloud-VPS, Toolforge: Taavi knowledge transfer: Toolforge misc services (e.g. mail server) - https://phabricator.wikimedia.org/T362447#9933590 (taavi) Open→Resolved a:taavi
[09:15:35] cloud-services-team, Cloud-VPS, Toolforge: Learn how to do what Taavi does - https://phabricator.wikimedia.org/T362443#9933593 (taavi) Open→Resolved
[09:15:37] cloud-services-team, Cloud-VPS, Toolforge: Taavi knowledge transfer: Cloud VPS OpenTofu provider - https://phabricator.wikimedia.org/T362450#9933579 (taavi) Open→Resolved a:taavi
[09:15:48] cloud-services-team, Cloud-VPS, Toolforge: Taavi knowledge transfer: rebuild toolforge docker images - https://phabricator.wikimedia.org/T362448#9933584 (taavi) Open→Resolved a:taavi
[09:16:54] cloud-services-team, Cloud-VPS, Toolforge: Taavi knowledge transfer: toolforge job investigation - https://phabricator.wikimedia.org/T362446#9933587 (taavi) Open→Resolved a:taavi
[09:17:53] cloud-services-team, Cloud-VPS, Toolforge: Taavi knowledge transfer: maintain-kubeusers - https://phabricator.wikimedia.org/T362444#9933599 (taavi) Open→Resolved a:taavi
[09:18:00] cloud-services-team, Cloud-VPS, Toolforge: Taavi knowledge transfer: Toolforge k8s upgrades - https://phabricator.wikimedia.org/T362445#9933596 (taavi) Open→Resolved a:taavi
[09:19:06] (merge) aborrero: deployment: drop PSP [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/24 (https://phabricator.wikimedia.org/T368142)
[09:19:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:20:55] (update) aborrero: builds-api: bump to 0.0.156-20240625082108-71537e14 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/346 (https://phabricator.wikimedia.org/T368142) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[09:21:12] (update) project_1317_bot_df3177307bed93c3f34e421e26c86e38: api-gateway: bump to 0.0.25-20240628091913-285fb180 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/361 (https://phabricator.wikimedia.org/T368142)
[09:21:18] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: api-gateway: bump to 0.0.25-20240628091913-285fb180 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/361 (https://phabricator.wikimedia.org/T368142)
[09:23:20] (update) sstefanova: builds-api: bump to 0.0.156-20240625082108-71537e14 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/346 (https://phabricator.wikimedia.org/T368142) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[09:24:37] !log aborrero@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
[09:24:48] !log aborrero@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
[09:28:18] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
[09:28:30] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
[09:29:42] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:30:18] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
[09:30:28] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
[09:35:40] (merge) aborrero: deployment: drop PSP reference [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/98 (https://phabricator.wikimedia.org/T368142)
[09:37:19] (open) taavi: logs: Fix mypy return value error [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/48
[09:37:58] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
[09:38:10] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
[09:38:23] (approved) aborrero: logs: Fix mypy return value error [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/48 (owner: taavi)
[09:38:29] (update) taavi: logs: Fix mypy return value error [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/48
[09:39:05] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: jobs-api: bump to 0.0.311-20240628093550-c6df8783 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/362 (https://phabricator.wikimedia.org/T368142)
[09:39:45] (open) aborrero: kyverno: drop PSP [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/363 (https://phabricator.wikimedia.org/T368142)
[09:40:18] !log aborrero@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno
[09:40:30] (merge) taavi: logs: Fix mypy return value error [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/48
[09:40:30] !log aborrero@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno
[09:41:12] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno
[09:41:24] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno
[09:42:09] (merge) aborrero: kyverno: drop PSP [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/363 (https://phabricator.wikimedia.org/T368142)
[09:46:17] (open) aborrero: wmcs-k8s-metrics: drop PSP [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/364 (https://phabricator.wikimedia.org/T368142)
[09:49:27] !log aborrero@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics
[09:49:39] !log aborrero@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics
[09:50:06] (update) sstefanova: builds-api: bump to 0.0.156-20240625082108-71537e14 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/346 (https://phabricator.wikimedia.org/T368142) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[09:50:37] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics
[09:50:48] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics
[09:50:49] (update) sstefanova: builds-api: bump to 0.0.156-20240625082108-71537e14 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/346 (https://phabricator.wikimedia.org/T368142) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[09:51:41] (approved) sstefanova: builds-api: bump to 0.0.156-20240625082108-71537e14 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/346 (https://phabricator.wikimedia.org/T368142) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[09:51:46] (merge) sstefanova: builds-api: bump to 0.0.156-20240625082108-71537e14 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/346 (https://phabricator.wikimedia.org/T368142) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[09:52:29] (update) aborrero: wmcs-k8s-metrics: drop PSP [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/364 (https://phabricator.wikimedia.org/T368142)
[09:53:01] (merge) aborrero: wmcs-k8s-metrics: drop PSP [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/364 (https://phabricator.wikimedia.org/T368142)
[10:14:13] (open) aborrero: cadvisor: drop PSP [repos/cloud/toolforge/wmcs-k8s-metrics] - https://gitlab.wikimedia.org/repos/cloud/toolforge/wmcs-k8s-metrics/-/merge_requests/9 (https://phabricator.wikimedia.org/T368142)
[10:14:54] (merge) aborrero: cadvisor: drop PSP [repos/cloud/toolforge/wmcs-k8s-metrics] - https://gitlab.wikimedia.org/repos/cloud/toolforge/wmcs-k8s-metrics/-/merge_requests/9 (https://phabricator.wikimedia.org/T368142)
[10:19:42] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:29:42] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:54:10] (open) aborrero: utils/update_component.sh: don't fail if there are value files without cartVersion [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/365
[10:57:35] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9933870 (taavi)
[10:57:46] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: Migrate WMCS managed projects to g4 flavors - https://phabricator.wikimedia.org/T367723#9933871 (taavi)
[11:00:36] (merge) aborrero: utils/update_component.sh: don't fail if there are value files without cartVersion [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/365
[11:06:41] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: wmcs-k8s-metrics: bump to 0.0.20-20240628101504-9ed20c1f [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/366 (https://phabricator.wikimedia.org/T368142)
[11:13:06] !log aborrero@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics
[11:13:19] !log aborrero@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics
[11:13:41] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics
[11:13:54] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics
[11:14:35] (merge) aborrero: wmcs-k8s-metrics: bump to 0.0.20-20240628101504-9ed20c1f [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/366 (https://phabricator.wikimedia.org/T368142) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[11:20:56] FIRING: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:25:56] RESOLVED: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[12:34:29] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: [trove] cannot create mariadb instances - https://phabricator.wikimedia.org/T368725 (fnegri) NEW
[12:36:38] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: [trove] cannot create mariadb instances - https://phabricator.wikimedia.org/T368725#9934212 (fnegri) Open→In progress p:Triage→High a:fnegri
[12:38:12] (open) sstefanova: Draft: Testing error generation for envvars-api [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/36 (https://phabricator.wikimedia.org/T366697)
[12:43:34] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: Migrate WMCS managed projects to g4 flavors - https://phabricator.wikimedia.org/T367723#9934235 (fnegri) a:taavi→Andrew
[12:44:36] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: Migrate WMCS managed projects to g4 flavors - https://phabricator.wikimedia.org/T367723#9934231 (fnegri) Open→In progress p:Triage→Medium
[12:52:32] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Patch-For-Review: Migrate Cloud VPS to Neutron Open vSwitch agent - https://phabricator.wikimedia.org/T326373#9934260 (fnegri) p:Triage→Medium
[12:53:50] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Patch-For-Review: Migrate Cloud VPS to Neutron Open vSwitch agent - https://phabricator.wikimedia.org/T326373#9934257 (fnegri) Open→In progress
[12:54:23] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Patch-For-Review: Migrate Cloud VPS to Neutron Open vSwitch agent - https://phabricator.wikimedia.org/T326373#9934270 (fnegri) a:taavi→Andrew
[12:55:16] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 11), Goal: [infra] Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#9934289 (fnegri) a:taavi→None
[12:56:29] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9934280 (fnegri) Open→In progress
[12:56:51] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9934287 (fnegri) a:taavi→Andrew
[13:22:12] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb hosts - https://phabricator.wikimedia.org/T367778#9934339 (fnegri) clouddb1015 is looking good after being repooled (only some small spikes). clouddb1019 lag is continuing to grow and needs more inve...
[13:48:04] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: [trove] cannot create mariadb instances - https://phabricator.wikimedia.org/T368725#9934456 (fnegri) In progress→Resolved I truncated the table and tried creating an instance with the smallest flavor (`g4.cores1.ram1.disk20`). It seems to b...
[13:49:42] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:59:42] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:15:50] RESOLVED: TfInfraTestApplyFailed: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[14:41:50] Cloud-Services, DBA, SRE, Tracking-Neverending: Database replication problems - production and labs (tracking) - https://phabricator.wikimedia.org/T50930#9934676 (sguebo_WMF) The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikime...
[14:44:44] Tool-phab-ban: Phab-ban returns a 500 Internal server error - https://phabricator.wikimedia.org/T368735 (Mainframe98) NEW
[14:55:21] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 MediaWiki configuration Error - string Wikitech not found on https://wikitech-static.wikimedia.org:443/wiki/Main_Page?debug=true - 1659 bytes in 0.100 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[15:23:23] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29660 bytes in 0.194 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[15:25:49] Data-Services, DBA, SRE, Tracking-Neverending: Database replication problems - production and labs (tracking) - https://phabricator.wikimedia.org/T50930#9934899 (JJMC89)
[15:29:05] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: eqiad1: fix PTR delegations for 185.15.56.0/24 - https://phabricator.wikimedia.org/T341338#9934918 (Andrew) a:Andrew
[15:29:29] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance tools-opensearch-1 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[15:31:28] FIRING: InstanceDown: Project tools instance tools-opensearch-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:36:28] RESOLVED: InstanceDown: Project tools instance tools-opensearch-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:42:18] cloud-services-team, Data-Services, Infrastructure Security: [wikireplicas] Review grants and views - https://phabricator.wikimedia.org/T368748 (fnegri) NEW
[15:49:42] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[15:59:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:09:20] Tool-phab-ban: Phab-ban returns a 500 Internal server error - https://phabricator.wikimedia.org/T368735#9935122 (bd808) Open→Resolved a:bd808 Looks like an NFS hiccup may have taken it down? The uwsgi.log said `--- no python application found, check your startup logs for errors ---`. `webserv...
[16:30:41] cloud-services-team, Data-Services, SRE: [wikireplicas] Make sure there is no sensitive data in clouddb hosts - https://phabricator.wikimedia.org/T368136#9935249 (fnegri) @bd808 @Ladsgroup thanks for your replies! I will reiterate that the general goal is to make root access to clouddb* hosts as saf...
[16:37:59] Tool-gitlab-account-approval: Switch to keyset based pagination - https://phabricator.wikimedia.org/T368761 (bd808) NEW
[16:53:22] cloud-services-team, Data-Services, Infrastructure Security: wikireplicas root access - https://phabricator.wikimedia.org/T344599#9935341 (fnegri) > cloud-services-team define precisely what permissions are required but missing from the wikireplica hosts > If the need is just to run maintain-views,...
[16:59:25] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 11): Intermittent redis connection timeouts in Toolforge - https://phabricator.wikimedia.org/T318479#9935354 (SD0001) I have been having this issue in `sdzerobot` tool. After many months of trouble with the [[https://www.npmjs.com/pa...
[17:15:17] Cloud-VPS (Debian Buster Deprecation), VideoCutTool: Cloud VPS "videocuttool" project Buster deprecation - https://phabricator.wikimedia.org/T367558#9935414 (Gopavasanth)
[17:16:05] Cloud-VPS (Debian Buster Deprecation), VideoCutTool: Cloud VPS "videocuttool" project Buster deprecation - https://phabricator.wikimedia.org/T367558#9935415 (Gopavasanth) Thanks @SODA for your help on this migration of prod instance ^^ :)
[17:16:56] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 11): Intermittent redis connection timeouts in Toolforge - https://phabricator.wikimedia.org/T318479#9935417 (RoySmith) I'm still seeing it too. They seem to come in clusters. There were 14 on 2024-06-11 and 5 on 2024-06-17. Consi...
[18:05:45] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 11): Intermittent redis connection timeouts in Toolforge - https://phabricator.wikimedia.org/T318479#9935497 (bd808) https://wikitech.wikimedia.org/wiki/Help:Toolforge/Redis_for_Toolforge#Redis_containers may help some folks who are...
[19:19:42] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:29:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:29:56] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:30:11] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:55:56] (update) ebomani: Draft: Testing error generation for envvars-api [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/35 (https://phabricator.wikimedia.org/T360147 https://phabricator.wikimedia.org/T366697)
[20:33:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-28 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[20:53:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-28 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[21:07:37] cloud-services-team, Cloud-VPS (Debian Buster Deprecation): Cloud-vps Buster deprecation - https://phabricator.wikimedia.org/T331738#9936190 (Andrew) a:Andrew→None
[21:08:29] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install new cloudcephmon hosts - https://phabricator.wikimedia.org/T364870#9936203 (Andrew) a:Andrew→None
[21:09:24] Horizon: Improve UI text and content for "Launch [database] instance" dialogue box in Horizon UI - https://phabricator.wikimedia.org/T325774#9936204 (Andrew) Open→Resolved
[21:27:16] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#9936269 (Jclark-ctr) cloudcephosd1039 2nd cable serial#20220008 port 1 cloudcephosd1040 2nd cable serial#20220043 port 5 cloudcephosd1041 2nd cable seria...
[21:41:52] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#9936298 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host cloudcephosd1039.eqiad.wmnet with OS bullseye
[21:41:54] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#9936299 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host cloudcephosd1041.eqiad.wmnet with OS bullseye
[21:42:37] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#9936301 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host cloudcephosd1040.eqiad.wmnet with OS bullseye
[22:09:29] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance toolsbeta-redis-4 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[22:16:45] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#9936398 (Jclark-ctr)
[22:17:38] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#9936403 (Jclark-ctr) a:Jclark-ctr
[22:24:29] RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance toolsbeta-redis-4 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[22:28:39] FIRING: [2x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[22:33:39] RESOLVED: [2x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[22:52:21] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#9936434 (Papaul)
[23:00:08] Toolforge: [envvars-api, envvars-cli] Create envvar name error message is not user friendly - https://phabricator.wikimedia.org/T360147#9936435 (EBomani) Made [[ https://gitlab.wikimedia.org/ebomani/envvars-api/-/commit/28fb6c8e4f523f2fc6b20284e9b09c634d09a205 | some changes ]] but was unfortunately unable t...
[23:00:46] Toolforge: [envvars-api, envvars-cli] Create envvar name error message is not user friendly - https://phabricator.wikimedia.org/T360147#9936436 (EBomani) a:EBomani→None
[23:27:42] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "toolhub" project Buster deprecation - https://phabricator.wikimedia.org/T367556#9936469 (bd808) In progress→Resolved