[00:11:26] FIRING: TfInfraTestApplyFailed: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[00:15:28] FIRING: InstanceDown: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:18:16] Quarry: [bug] Quarry queries not completing - https://phabricator.wikimedia.org/T367464#9905650 (Liz) I detected the problem whenever I opened this bug report. I remember filing a different one for Quarry about two weeks ago but this kind of incident, with queries running endlessly and never finishing, only...
[00:20:28] RESOLVED: InstanceDown: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:21:20] Quarry: [bug] Quarry queries not completing - https://phabricator.wikimedia.org/T367464#9905654 (Liz) But hey, a few of my queries just issued reports! I don't know what happened since I posted this message but something has changed for the better. Surprised me.
[01:01:13] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.migrate_server_to_ovs for server 02bb9b5a-cadf-4bee-9b63-519b1e9b485b
[01:02:20] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server 02bb9b5a-cadf-4bee-9b63-519b1e9b485b
[01:17:51] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.migrate_server_to_ovs for server 02bb9b5a-cadf-4bee-9b63-519b1e9b485b
[01:17:55] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server 02bb9b5a-cadf-4bee-9b63-519b1e9b485b
[01:18:56] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:23:54] (PS1) Andrew Bogott: migrate_server_to_ovs.py: support more source flavors [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1047210 (https://phabricator.wikimedia.org/T364457)
[01:26:34] (CR) CI reject: [V:-1] migrate_server_to_ovs.py: support more source flavors [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1047210 (https://phabricator.wikimedia.org/T364457) (owner: Andrew Bogott)
[01:29:44] Cloud-VPS: dwl reboot coordination request - https://phabricator.wikimedia.org/T367797#9905780 (Andrew) Oh, btw, I noticed that in a somewhat-arbitrary attempt to standardize this process your VMs were squashed down from 36G hosts to 32G hosts. If 32G turns out to not be enough ram it's pretty easy (if m...
[01:31:45] (PS2) Andrew Bogott: migrate_server_to_ovs.py: support more source flavors [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1047210 (https://phabricator.wikimedia.org/T364457)
[01:42:06] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.migrate_server_to_ovs for server a7725cd2-6162-41a3-8add-4dc0668b233b
[01:43:28] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server a7725cd2-6162-41a3-8add-4dc0668b233b
[01:43:45] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.migrate_server_to_ovs for server 71e4296d-039d-4452-9d92-69b9f8eb3aba
[01:45:04] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server 71e4296d-039d-4452-9d92-69b9f8eb3aba
[01:45:19] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.migrate_server_to_ovs for server f667d3c2-379a-48d7-ad44-4f3933bdb871
[01:46:36] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server f667d3c2-379a-48d7-ad44-4f3933bdb871
[01:49:26] !log andrew@cloudcumin1001 superset START - Cookbook wmcs.openstack.migrate_project_to_ovs
[01:55:42] !log andrew@cloudcumin1001 superset END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0)
[01:58:52] (update) raymond-ndibe: [jobs-api] move simple job validations to pydantic [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/89 (https://phabricator.wikimedia.org/T366209)
[01:59:37] (update) raymond-ndibe: [jobs-api] move simple job validations to pydantic [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/89 (https://phabricator.wikimedia.org/T366209)
[02:44:43] (update) raymond-ndibe: [jobs-api] move simple job validations to pydantic [repos/cloud/toolforge/jobs-api] - 
https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/89 (https://phabricator.wikimedia.org/T366209)
[03:09:42] (update) raymond-ndibe: [jobs-api] move simple job validations to pydantic [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/89 (https://phabricator.wikimedia.org/T366209)
[03:16:56] FIRING: SystemdUnitDown: The service unit opentofu-infra-diff.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[03:35:13] (update) raymond-ndibe: [jobs-api] move simple job validations to pydantic [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/89 (https://phabricator.wikimedia.org/T366209)
[03:43:57] (update) raymond-ndibe: [jobs-api] move simple job validations to pydantic [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/89 (https://phabricator.wikimedia.org/T366209)
[03:55:41] (merge) raymond-ndibe: [envvars-cli] remove unused code [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/41
[03:55:54] (merge) raymond-ndibe: [envvar-api] remove unused code [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/30
[03:56:24] (merge) raymond-ndibe: [jobs-api] move simple job validations to pydantic [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/89 (https://phabricator.wikimedia.org/T366209)
[03:58:41] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: jobs-api: bump to 
0.0.309-20240619035632-dcbef566 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/338 (https://phabricator.wikimedia.org/T366209)
[04:01:30] (update) project_1317_bot_df3177307bed93c3f34e421e26c86e38: envvars-api: bump to 0.0.50-20240619035607-42829b67 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/339
[04:01:32] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: envvars-api: bump to 0.0.50-20240619035607-42829b67 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/339
[04:14:24] !log raymond@ubuntu toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
[04:14:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[04:15:19] !log raymond@ubuntu toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
[04:15:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[04:15:27] !log raymond@ubuntu tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
[04:15:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:16:18] !log raymond@ubuntu tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
[04:16:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:16:39] !log raymond@ubuntu toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[04:16:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[04:17:26] !log raymond@ubuntu toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[04:17:27] Logged the message 
at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[04:17:36] !log raymond@ubuntu tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[04:17:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:18:25] !log raymond@ubuntu tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[04:18:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:18:42] (approved) raymond-ndibe: envvars-api: bump to 0.0.50-20240619035607-42829b67 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/339 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[04:18:45] (merge) raymond-ndibe: envvars-api: bump to 0.0.50-20240619035607-42829b67 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/339 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[04:19:13] (update) raymond-ndibe: jobs-api: bump to 0.0.309-20240619035632-dcbef566 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/338 (https://phabricator.wikimedia.org/T366209) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[05:05:39] (approved) raymond-ndibe: jobs-api: bump to 0.0.309-20240619035632-dcbef566 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/338 (https://phabricator.wikimedia.org/T366209) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[05:05:43] (merge) raymond-ndibe: jobs-api: bump to 0.0.309-20240619035632-dcbef566 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/338 (https://phabricator.wikimedia.org/T366209) 
(owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[05:11:56] FIRING: SystemdUnitDown: The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[05:18:56] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:50:59] (CR) Majavah: migrate_server_to_ovs.py: support more source flavors (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1047210 (https://phabricator.wikimedia.org/T364457) (owner: Andrew Bogott)
[05:55:46] cloud-services-team, MediaWiki-extensions-OpenStackManager, wikitech.wikimedia.org: Remove OpenStackManager from Wikitech - https://phabricator.wikimedia.org/T161553#9905890 (taavi) Open→Resolved Yes.
[06:45:11] (update) raymond-ndibe: [jobs-api] create seperate api.py and move flask things there [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/91 (https://phabricator.wikimedia.org/T359804)
[06:46:15] (update) raymond-ndibe: [jobs-api] move jobs load to backend [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/93 (https://phabricator.wikimedia.org/T366209)
[07:16:42] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb1017 (s1) - https://phabricator.wikimedia.org/T367778#9905956 (Marostegui) >>! 
In T367778#9904187, @fnegri wrote: > As suggested by @taavi I tried depooling `s1` on `clouddb1017`, so that all `s1` wiki...
[07:30:57] (update) sstefanova: api: remove unprefixed endpoints [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/94 (https://phabricator.wikimedia.org/T363346)
[07:34:18] (open) raymond-ndibe: [jobs-api] fix issues in openapi schema [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/96
[07:37:35] (update) sstefanova: api: remove unprefixed endpoints [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/94 (https://phabricator.wikimedia.org/T363346)
[07:52:38] (update) sstefanova: api: remove unprefixed endpoints [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/33
[08:02:16] (approved) lucaswerkmeister: Bridge #wikimedia-rust IRC and Matrix [toolforge-repos/bridgebot] - https://gitlab.wikimedia.org/toolforge-repos/bridgebot/-/merge_requests/5 (https://phabricator.wikimedia.org/T366767) (owner: legoktm)
[08:02:25] (merge) lucaswerkmeister: Bridge #wikimedia-rust IRC and Matrix [toolforge-repos/bridgebot] - https://gitlab.wikimedia.org/toolforge-repos/bridgebot/-/merge_requests/5 (https://phabricator.wikimedia.org/T366767) (owner: legoktm)
[08:30:35] superset.wmcloud.org: superset.wmcloud.org down - https://phabricator.wikimedia.org/T367945 (Zache) NEW
[08:36:04] Tool-schedule-deployment: Link diff - https://phabricator.wikimedia.org/T367948 (jhsoby) NEW
[08:39:24] Tool-schedule-deployment, WikimediaDebug: Integrate schedule-deployment with WikimediaDebug - https://phabricator.wikimedia.org/T367213#9906179 (jhsoby) >>! 
In T367213#9881386, @bd808 wrote: > I believe this would need to work by adding something to the WikimediaDebug extension that changes what is rende...
[08:42:12] superset.wmcloud.org: superset.wmcloud.org down - https://phabricator.wikimedia.org/T367945#9906190 (Xqt)
[08:47:15] cloud-services-team, Toolforge: Decision Request - Toolforge pod security via custom admission webhook - https://phabricator.wikimedia.org/T367950 (aborrero) NEW
[08:52:44] (update) sstefanova: api: remove unprefixed endpoints [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/33
[08:59:54] cloud-services-team, Toolforge, Patch-For-Review, Sustainability (Incident Followup): [k8s,infra] kyverno has a track record of overloading the cluster, maybe on new ways - https://phabricator.wikimedia.org/T367386#9906251 (aborrero) In progress→Resolved a:aborrero My theory of wh...
[08:59:55] cloud-services-team, Toolforge: Decision Request - Toolforge pod security via custom admission webhook - https://phabricator.wikimedia.org/T367950#9906264 (aborrero)
[08:59:58] cloud-services-team, Toolforge: [k8s,infra] track PSP migration plan - https://phabricator.wikimedia.org/T364297#9906265 (aborrero)
[09:00:07] cloud-services-team, Toolforge, Patch-For-Review: [infra] Replace PodSecurityPolicy in Toolforge Kubernetes - https://phabricator.wikimedia.org/T279110#9906266 (aborrero)
[09:05:53] cloud-services-team, Toolforge: toolforge: drop kyverno - https://phabricator.wikimedia.org/T367952 (aborrero) NEW
[09:12:11] FIRING: SystemdUnitDown: The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:18:56] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:27:41] Data-Services, Data-Persistence, Data-Platform-SRE (2024.06.17 - 2024.07.07): Bring an-redacteddb1001 into service to replace clouddb1021 - https://phabricator.wikimedia.org/T365453#9906327 (BTullis) >>! In T365453#9902189, @Marostegui wrote: > We also need to include this host in zarcillo (I will do...
[09:32:34] (open) aborrero: resources: delete kyverno_pod_policy [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/44 (https://phabricator.wikimedia.org/T367952)
[09:39:26] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), VPS-project-Codesearch: Replace or remove Debian Buster VMs in 'codesearch' cloud-vps project - https://phabricator.wikimedia.org/T367479#9906369 (Ladsgroup) Removed the old VM and the corresponding volumes
[09:40:07] Cloud-VPS (Quota-requests), VPS-project-Codesearch: Extra 80GB volume to allow migration of buster VM to bullseye - https://phabricator.wikimedia.org/T367878#9906374 (Ladsgroup) Hi, I'm now done with the migration, please shrink it back to 80GB (from 160GB).
[09:40:10] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), VPS-project-Codesearch: Replace or remove Debian Buster VMs in 'codesearch' cloud-vps project - https://phabricator.wikimedia.org/T367479#9906370 (Ladsgroup) Open→Resolved a:Ladsgroup
[09:40:28] (update) aborrero: resources: delete kyverno_pod_policy [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/44 (https://phabricator.wikimedia.org/T367952)
[09:45:25] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb1017 (s1) - https://phabricator.wikimedia.org/T367778#9906394 (fnegri) > As suggested by @taavi I tried depooling s1 on clouddb1017, so that all s1 wikireplica traffic will go to the other host (cloudd...
[09:47:37] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb1017 (s1) - https://phabricator.wikimedia.org/T367778#9906403 (Marostegui) >>! In T367778#9906394, @fnegri wrote: >> As suggested by @taavi I tried depooling s1 on clouddb1017, so that all s1 wikirepli...
[09:49:16] cloud-services-team: haproxy: install some command line interface - https://phabricator.wikimedia.org/T367956 (aborrero) NEW
[09:50:20] cloud-services-team: haproxy: install some command line interface - https://phabricator.wikimedia.org/T367956#9906420 (taavi) fwiw, I tend to just port-forward the stats interface on port 8404 to my laptop.
[09:52:39] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:52:59] Data-Services, Data-Persistence, Data-Platform-SRE (2024.06.17 - 2024.07.07): Bring an-redacteddb1001 into service to replace clouddb1021 - https://phabricator.wikimedia.org/T365453#9906432 (Marostegui) Open→Resolved This has been done @BTullis can you let me know when clouddb1021 is decommis...
[09:53:05] Data-Services, Data-Persistence, Data-Platform-SRE (2024.06.17 - 2024.07.07): Bring an-redacteddb1001 into service to replace clouddb1021 - https://phabricator.wikimedia.org/T365453#9906435 (Marostegui) Resolved→Open Sorry wrong task
[09:57:39] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:02:20] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb1017 (s1) - https://phabricator.wikimedia.org/T367778#9906471 (fnegri) To verify if my theory is correct, I repooled clouddb1017, let's see if the lag starts increasing again. I tried executing the qu...
[10:05:45] !log taavi@cloudcumin1001 incubator START - Cookbook wmcs.openstack.migrate_project_to_ovs
[10:06:22] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb1017 (s1) - https://phabricator.wikimedia.org/T367778#9906482 (fnegri) Query plans on clouddb1013: ` root@clouddb1013:s1[enwiki_p]> EXPLAIN SELECT p1.page_title FROM page AS p1 WHERE p1.page_nam...
[10:07:03] !log taavi@cloudcumin1001 incubator END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0)
[10:07:14] !log taavi@cloudcumin1001 isa START - Cookbook wmcs.openstack.migrate_project_to_ovs
[10:08:26] !log taavi@cloudcumin1001 isa END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0)
[10:08:57] !log taavi@cloudcumin1001 k8splay START - Cookbook wmcs.openstack.migrate_project_to_ovs
[10:11:28] !log taavi@cloudcumin1001 k8splay END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0)
[10:12:58] !log taavi@cloudcumin1001 language START - Cookbook wmcs.openstack.migrate_project_to_ovs
[10:22:17] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb1017 (s1) - https://phabricator.wikimedia.org/T367778#9906498 (Marostegui) It makes sense it has more load and hence the queries can take longer, as anayltics hosts have larger queries in general, whic...
[10:23:17] !log taavi@cloudcumin1001 language END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0)
[10:24:18] !log taavi@cloudcumin1001 ldap-dev START - Cookbook wmcs.openstack.migrate_project_to_ovs
[10:26:23] !log taavi@cloudcumin1001 ldap-dev END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0)
[10:31:37] (open) taavi: Revert "envvars-api: bump to 0.0.50-20240619035607-42829b67" [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/340
[10:31:51] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
[10:32:01] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
[10:36:40] Toolforge (Toolforge iteration 11): envvars-api 0.0.50 depends on unreleased envvars-cli changes - https://phabricator.wikimedia.org/T367961 (taavi) NEW
[10:36:56] Toolforge (Toolforge iteration 11): envvars-api 0.0.50 depends on unreleased envvars-cli changes - https://phabricator.wikimedia.org/T367961#9906558 (taavi) p:Triage→High
[10:39:03] !log taavi@cloudcumin1001 library-upgrader START - Cookbook wmcs.openstack.migrate_project_to_ovs
[10:41:36] !log taavi@cloudcumin1001 library-upgrader END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0)
[11:04:48] cloud-services-team, Toolforge, Sustainability (Incident Followup): [k8s,infra,alerting] improve HAproxy and k8s apiserver interaction - https://phabricator.wikimedia.org/T367389#9906620 (aborrero) p:High→Medium
[11:06:56] RESOLVED: SystemdUnitDown: The service unit opentofu-infra-diff.service is in failed status on host cloudcontrol1007. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:06:56] RESOLVED: SystemdUnitDown: The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:22:34] (update) sstefanova: dev: add pre-commit [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/147
[11:22:51] cloud-services-team, Bitu, Infrastructure-Foundations, LDAP: Allocate more available UNIX UIDs for human users - https://phabricator.wikimedia.org/T355663#9906635 (taavi) Currently the highest number in use is 47058. So that's 1081 accounts in the 148 days since I created this task, or about 7.3...
[11:23:03] (update) sstefanova: dev: add pre-commit [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/147
[11:24:15] (update) sstefanova: dev: add pre-commit [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/147
[11:26:56] (update) sstefanova: dev: add pre-commit [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/147
[11:27:04] Tool-bridgebot: Bridge #wikimedia-rust on libera.chat and #wikimedia-rust:matrix.org - https://phabricator.wikimedia.org/T366767#9906640 (LucasWerkmeister) Should be deployed now.
[11:30:34] (merge) sstefanova: dev: add pre-commit [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/147
[11:30:37] (update) sstefanova: dev: add pre-commit [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/147
[11:33:08] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1042.eqiad.wmnet'
[11:49:25] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1042.eqiad.wmnet'
[11:51:17] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1043.eqiad.wmnet'
[11:53:46] Toolforge (Toolforge iteration 11): Provision more non-NFS k8s workers - https://phabricator.wikimedia.org/T367964 (taavi) NEW p:Triage→Medium
[11:55:47] Toolforge: Provision more non-NFS k8s workers - https://phabricator.wikimedia.org/T367964#9906699 (taavi)
[12:10:26] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1043.eqiad.wmnet'
[12:10:36] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1044.eqiad.wmnet'
[12:25:10] wikitech.wikimedia.org: Requesting content administrator access for Kamila Součková - https://phabricator.wikimedia.org/T367967 (kamila) NEW
[12:27:41] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1044.eqiad.wmnet'
[12:30:23] wikitech.wikimedia.org: Requesting content administrator access for Kamila Součková - https://phabricator.wikimedia.org/T367967#9906822 (taavi) Open→Resolved a:taavi
[12:34:55] (open) aborrero: resources: fix state configmap key deletion [repos/cloud/toolforge/maintain-kubeusers] - 
https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/45
[12:37:08] Tool-bridgebot: lint:golang CI job times out - https://phabricator.wikimedia.org/T367969 (LucasWerkmeister) NEW
[12:37:54] wikitech.wikimedia.org, SRE, SRE-Access-Requests: Update "WMDE group" approvers on Wikitech - https://phabricator.wikimedia.org/T367914#9906851 (kamila) Open→Resolved a:kamila
[12:38:36] (update) aborrero: resources: delete kyverno_pod_policy [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/44 (https://phabricator.wikimedia.org/T367952)
[12:52:06] cloud-services-team (Hardware), DC-Ops: hw troubleshooting: cloudvirt1042 fails to boot after a reimage - https://phabricator.wikimedia.org/T367971 (taavi) NEW
[12:53:40] cloud-services-team (Hardware), DC-Ops: hw troubleshooting: cloudvirt1042, cloudvirt1043 fails to boot after a reimage - https://phabricator.wikimedia.org/T367971#9906901 (taavi)
[12:55:24] cloud-services-team (Hardware), DC-Ops: hw troubleshooting: cloudvirt1042, cloudvirt1043 fails to boot after a reimage - https://phabricator.wikimedia.org/T367971#9906902 (taavi) cloudvirt1043 seems to be having the same issue too. So this may be an issue for the entire batch.
[12:55:30] cloud-services-team (Hardware), DC-Ops, ops-eqiad: hw troubleshooting: cloudvirt1042, cloudvirt1043 fails to boot after a reimage - https://phabricator.wikimedia.org/T367971#9906903 (taavi)
[13:21:26] RESOLVED: TfInfraTestApplyFailed: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[13:50:07] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: hw troubleshooting: cloudvirt1042, cloudvirt1043 fails to boot after a reimage - https://phabricator.wikimedia.org/T367971#9907023 (taavi) As suggested by volans I tried running the firmware-upgrade cookbook on the other cumin server which h...
[14:00:19] !log taavi@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate True, for hosts list: ['cloudvirt1044']
[14:00:39] !log taavi@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate True, for hosts list: ['cloudvirt1044']
[14:34:40] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: hw troubleshooting: cloudvirt1042, cloudvirt1043 fails to boot after a reimage - https://phabricator.wikimedia.org/T367971#9907209 (taavi) Open→Resolved a:Jclark-ctr→taavi The reimages finished succesfully after a firmwar...
[14:35:05] !log taavi@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate True, for hosts list: ['cloudvirt1043']
[14:35:30] !log taavi@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate True, for hosts list: ['cloudvirt1042']
[14:35:49] !log taavi@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate True, for hosts list: ['cloudvirt1042']
[14:43:26] Toolforge (Toolforge iteration 11): envvars-api 0.0.50 depends on unreleased envvars-cli changes - https://phabricator.wikimedia.org/T367961#9907237 (Raymond_Ndibe) Hello @taavi , thanks for helping reverse this. This was a oversight on my path. I thought the cli changes has already been deployed
[14:48:24] (open) andrew: cloudvps_flavors: add a few more flavors to support existing g3 VMs [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/6
[14:51:19] !log taavi@cloudcumin1001 cloudvirt-canary END (FAIL) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=99) on eqiad1, with recreate True, for hosts list: ['cloudvirt1043']
[14:51:28] superset.wmcloud.org: update superset to 4.0.1 - https://phabricator.wikimedia.org/T367983 (rook) NEW
[14:53:16] (update) andrew: cloudvps_flavors: add a few more flavors to support existing g3 VMs [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/6
[14:53:46] Cloud-VPS (Debian Buster Deprecation), collaboration-services: Cloud VPS "packaging" project Buster deprecation - https://phabricator.wikimedia.org/T367544#9907271 (JMeybohm)
[14:59:46] (merge) taavi: cloudvps_flavors: add a few more flavors to support existing g3 VMs [repos/cloud/cloud-vps/tofu-infra] - 
https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/6 (owner: andrew) [15:05:16] (update) aborrero: kubernetes: add some basic HAproxy alerts [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/15 (https://phabricator.wikimedia.org/T367389) [15:10:15] cloud-services-team, Toolforge, Patch-For-Review: toolforge: drop kyverno - https://phabricator.wikimedia.org/T367952#9907313 (aborrero) Open→In progress p:Triage→High [15:10:31] cloud-services-team, Toolforge: toolforge: create a new custom admission webhook to handle pod security settings - https://phabricator.wikimedia.org/T367985 (aborrero) NEW [15:15:12] cloud-services-team, Toolforge: [k8s,infra] track PSP migration plan - https://phabricator.wikimedia.org/T364297#9907355 (aborrero) [15:15:45] superset.wmcloud.org: superset.wmcloud.org down - https://phabricator.wikimedia.org/T367945#9907363 (rook) This went down as part of a shift to g4 vm flavors. It's back, but it looks like 4.0.0 was deployed, so the query history doesn't work. I'm unsure if that will cause other problems, and we may have to w... [15:17:11] !log taavi@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate True, for hosts list: ['cloudvirt1043'] [15:17:30] !log taavi@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate True, for hosts list: ['cloudvirt1043'] [15:24:01] Quarry: [bug] Quarry queries not completing - https://phabricator.wikimedia.org/T367464#9907379 (fnegri) > But hey, a few of my queries just issued reports! I don't know what happened since I posted this message but something has changed for the better. This is likely related to the issues described in T367...
[15:29:52] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance [15:30:02] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=0) [15:30:32] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance [15:30:42] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=0) [15:30:53] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance [15:31:02] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=0) [15:31:16] !log taavi@cloudcumin1001 logging START - Cookbook wmcs.openstack.migrate_project_to_ovs [15:34:14] !log taavi@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate True, for hosts list: ['cloudvirt1042'] [15:34:36] !log taavi@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate True, for hosts list: ['cloudvirt1042'] [15:37:38] cloud-services-team, Toolforge, Patch-For-Review, Sustainability (Incident Followup): [k8s,infra] kyverno has a track record of overloading the cluster, maybe on new ways - https://phabricator.wikimedia.org/T367386#9907425 (aborrero) I sent additional information to upstream, in particular I...
[15:39:56] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [15:40:26] !log taavi@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99) [15:40:39] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1044.eqiad.wmnet' [15:43:09] !log taavi@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=97) on host 'cloudvirt1044.eqiad.wmnet' [15:48:51] !log taavi@cloudcumin1001 logging END (FAIL) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=1) [16:21:23] Tool-bridgebot: Bridge #wikimedia-rust on libera.chat and #wikimedia-rust:matrix.org - https://phabricator.wikimedia.org/T366767#9907495 (Legoktm) Open→Resolved a:Legoktm Thank you! `lang=irc 12:12:50 [matrix] ooh, I think the Matrix bridging is working now 12:12:58 Tool-bridgebot: Bridge #wikimedia-rust on libera.chat and #wikimedia-rust:matrix.org - https://phabricator.wikimedia.org/T366767#9907503 (LucasWerkmeister) Hm, I don’t see any old relayed message on the IRC side at least? {F55472940} [16:26:24] Tool-bridgebot: Bridge #wikimedia-rust on libera.chat and #wikimedia-rust:matrix.org - https://phabricator.wikimedia.org/T366767#9907510 (Legoktm) `lang=irc 04:15:17 --> wm-bb (~wm-bridge@wikimedia/bot/wm-bridgebot) has joined #wikimedia-rust 04:16:05 [matrix] I filed https://phabricato... [16:26:31] Tool-bridgebot: Bridge #wikimedia-rust on libera.chat and #wikimedia-rust:matrix.org - https://phabricator.wikimedia.org/T366767#9907511 (LucasWerkmeister) (Or is that just because I joined afterwards?)
[16:27:13] Tool-bridgebot: Bridge #wikimedia-rust on libera.chat and #wikimedia-rust:matrix.org - https://phabricator.wikimedia.org/T366767#9907513 (LucasWerkmeister) Okay, then I just forgot what happened in which order, sorry ^^ [16:27:41] Tool-bridgebot: Bridge #wikimedia-rust on libera.chat and #wikimedia-rust:matrix.org - https://phabricator.wikimedia.org/T366767#9907518 (Legoktm) https://github.com/42wim/matterbridge/issues/2033 is the upstream bug report. [16:28:10] Tool-bridgebot: lint:golang CI job times out - https://phabricator.wikimedia.org/T367969#9907515 (LucasWerkmeister) Open→Resolved a:LucasWerkmeister It worked with another retry now 🤷 [16:59:54] superset.wmcloud.org: update superset to 4.0.1 - https://phabricator.wikimedia.org/T367983#9907608 (rook) Open→Declined [17:00:49] superset.wmcloud.org: superset.wmcloud.org down - https://phabricator.wikimedia.org/T367945#9907610 (rook) Ok, it's back online on version 3.1.1 with a restored db from June 18.
[17:03:38] superset.wmcloud.org: update superset to 4.0.1 - https://phabricator.wikimedia.org/T367983#9907611 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/superset-deploy/pull/24 [17:03:46] vivian-rook opened https://github.com/toolforge/superset-deploy/pull/24 [17:08:37] superset.wmcloud.org: update superset to 4.0.1 - https://phabricator.wikimedia.org/T367983#9907615 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/superset-deploy/pull/24 [17:08:44] vivian-rook closed https://github.com/toolforge/superset-deploy/pull/24 [17:18:57] FIRING: CloudVPSDesignateLeaks: Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [17:53:48] FIRING: PuppetConstantChange: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [19:41:08] Quarry: [bug] Quarry queries not completing - https://phabricator.wikimedia.org/T367464#9907774 (Liz) Well, the queries were running and finishing up for a few hours last night (at least it was night where I was at) but now they are back to not completing at all. But for a few hours, everything was in sync.
[19:44:50] FIRING: NeutronAgentDown: Neutron neutron-linuxbridge-agent on cloudvirt1044 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [19:48:50] FIRING: NeutronAgentDownForLong: Neutron neutron-linuxbridge-agent on cloudvirt1044 has been down for more than 2h - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDownForLong [19:48:55] cloud-services-team: NeutronAgentDownForLong A Neutron agent has been down for more than 2h, VMs will have connectivity issues - https://phabricator.wikimedia.org/T365461#9907779 (phaultfinder) [21:18:57] FIRING: CloudVPSDesignateLeaks: Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [21:27:36] Cloud-VPS (Debian Buster Deprecation), Research: Cloud VPS "research-collaborations-api" project Buster deprecation - https://phabricator.wikimedia.org/T367551#9907937 (XiaoXiao-WMF) Open→In progress a:MunizaA [21:54:03] FIRING: PuppetConstantChange: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [22:29:22] Tool-inteGraality: Create pages linked using `grouping_link` - https://phabricator.wikimedia.org/T368001 (JeanFred) NEW [22:30:33] Tool-inteGraality: Create pages linked using `grouping_link` - https://phabricator.wikimedia.org/T368001#9908046 (JeanFred)
[23:45:05] FIRING: NeutronAgentDown: Neutron neutron-linuxbridge-agent on cloudvirt1044 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [23:49:05] FIRING: NeutronAgentDownForLong: Neutron neutron-linuxbridge-agent on cloudvirt1044 has been down for more than 2h - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDownForLong