[00:02:29] Cloud-VPS (Debian Buster Deprecation), Machine-Learning-Team, Wikilabels: Cloud VPS "wikilabels" project Buster deprecation - https://phabricator.wikimedia.org/T367562#9993181 (Andrew) I'm shutting down the Buster VMs today since they appear abandoned. If anyone restarts them, please follow up on thi...
[00:03:32] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "wikipathways" project Buster deprecation - https://phabricator.wikimedia.org/T367563#9993184 (Andrew) This VM is now shut down (although I didn't shut it down.) Can it be deleted?
[00:03:40] Cloud-VPS (Debian Buster Deprecation), Community-Tech, IA Upload, Wikimedia OCR: Cloud VPS "wikisource" project Buster deprecation - https://phabricator.wikimedia.org/T367564#9993188 (Andrew) The only remaining buster host in this project is ia-upload-prod.wikisource.eqiad1.wikimedia.cloud, which...
[00:06:03] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "wm-bot" project Buster deprecation - https://phabricator.wikimedia.org/T367567#9993205 (Andrew) @MacFan4000 or I can delete them but please confirm!
[00:12:41] Cloud-VPS (Debian Buster Deprecation), Community-Tech, IA Upload, Wikimedia OCR: Cloud VPS "wikisource" project Buster deprecation - https://phabricator.wikimedia.org/T367564#9993224 (Samwilson) Yep, nearly. As noted in T369881 I'm just waiting another day or so before deleting it. I had a couple...
[00:14:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[00:49:41] FIRING: CloudVPSDesignateLeaks: Detected 8 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:25:12] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[01:25:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[01:25:24] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[01:25:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[01:55:31] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[01:55:33] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[01:55:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[01:55:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[02:02:10] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[02:07:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[02:28:39] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install new cloudcephmon hosts - https://phabricator.wikimedia.org/T364870#9993333 (Papaul) @Jclark-ctr please update task with the error you are getting and what is on the console.
[03:32:43] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "k8splay" project Buster deprecation - https://phabricator.wikimedia.org/T367535#9993351 (Andrew) Open→Resolved
[03:47:42] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[03:48:58] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[03:53:15] FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[03:58:00] RESOLVED: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[04:06:56] FIRING: SystemdUnitDown: The service unit purge_vm_rbd_images.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:21:19] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[04:21:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[04:21:31] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[04:21:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[04:21:56] FIRING: [2x] SystemdUnitDown: The service unit purge_vm_rbd_images.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:29:41] RESOLVED: CloudVPSDesignateLeaks: Detected 10 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:51:24] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[04:51:25] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[04:51:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[04:51:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[05:40:23] FIRING: OOM: OOM killer active on cloudcontrol2006-dev:9100 - TODO - https://grafana.wikimedia.org/d/-OcleDKIz/oom-kill - https://alerts.wikimedia.org/?q=alertname%3DOOM
[05:45:23] RESOLVED: OOM: OOM killer active on cloudcontrol2006-dev:9100 - TODO - https://grafana.wikimedia.org/d/-OcleDKIz/oom-kill - https://alerts.wikimedia.org/?q=alertname%3DOOM
[06:01:56] FIRING: SystemdUnitDown: The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[06:16:57] FIRING: [2x] SystemdUnitDown: The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[06:17:08] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T370383 (phaultfinder) NEW
[06:36:39] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:41:39] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[07:04:49] cloud-services-team, Cloud-VPS (Quota-requests): Request: add 80Gb storage to catalyst project quota - https://phabricator.wikimedia.org/T370365#9993473 (Slst2020) a:Slst2020
[07:08:37] !log sstefanova@cloudcumin1001 catalyst START - Cookbook wmcs.openstack.quota_increase
[07:08:45] !log sstefanova@cloudcumin1001 catalyst END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0)
[07:10:25] cloud-services-team, Cloud-VPS (Quota-requests): Request: add 80Gb storage to catalyst project quota - https://phabricator.wikimedia.org/T370365#9993478 (Slst2020) Done! Happy logging :) ` sstefanova@cloudcontrol1005:~$ sudo wmcs-openstack quota show catalyst +-----------------------+-------+ | Resource...
[07:10:32] cloud-services-team, Cloud-VPS (Quota-requests): Request: add 80Gb storage to catalyst project quota - https://phabricator.wikimedia.org/T370365#9993479 (Slst2020) Open→Resolved
[07:16:53] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[07:16:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:17:05] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[07:17:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:17:45] (update) sstefanova: api: rename api resources to plural [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/40 (https://phabricator.wikimedia.org/T365014)
[07:40:49] (merge) sstefanova: api: rename api resources to plural [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/40 (https://phabricator.wikimedia.org/T365014)
[07:45:15] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: envvars-api: bump to 0.0.56-20240718074100-89430f97 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/431 (https://phabricator.wikimedia.org/T365014)
[07:47:13] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[07:47:15] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[07:47:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:47:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:48:01] Cloud-VPS, Continuous-Integration-Infrastructure, Release-Engineering-Team: profile::labs::lvm::srv fails with Invalid argument for --extents: 100%FREE - https://phabricator.wikimedia.org/T370312#9993559 (hashar) In progress→Resolved I love shooting myself in the foot one year apart and...
[07:52:52] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "schematreerecommender" project Buster deprecation - https://phabricator.wikimedia.org/T367552#9993573 (Michaelcochez) @Andrew I will be working on these in the coming week. The VMs can be moved to a newer version of Debian without issues. We have in the meant...
[08:10:39] Toolforge (Toolforge iteration 13): [builds-api] Remove authentication and use the api-gateway provided headers - https://phabricator.wikimedia.org/T367182#9993600 (dcaro) In progress→Resolved
[08:44:06] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
[08:44:16] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
[08:45:39] (open) dcaro: cli: add `jobs-cli` to the user agent [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/56
[08:49:16] (open) dcaro: cli: add host and namespaces to the user agent [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/80
[08:49:36] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
[08:49:41] FIRING: CloudVPSDesignateLeaks: Detected 6 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:49:48] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
[08:51:34] (update) dcaro: cli: add host and namespaces to the user agent [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/80
[08:51:41] (update) dcaro: cli: add host and namespaces to the user agent [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/80
[08:52:33] (update) sstefanova: envvars-api: bump to 0.0.56-20240718074100-89430f97 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/431 (https://phabricator.wikimedia.org/T365014) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[08:52:34] (approved) sstefanova: envvars-api: bump to 0.0.56-20240718074100-89430f97 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/431 (https://phabricator.wikimedia.org/T365014) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[08:52:38] (merge) sstefanova: envvars-api: bump to 0.0.56-20240718074100-89430f97 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/431 (https://phabricator.wikimedia.org/T365014) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[08:52:45] Toolforge (Toolforge iteration 13): [jobs-cli,builds-cli,envvars-cli] consolidate user agent - https://phabricator.wikimedia.org/T370393 (dcaro) NEW
[08:52:46] Toolforge (Toolforge iteration 13): [jobs-cli,builds-cli,envvars-cli] consolidate user agent - https://phabricator.wikimedia.org/T370393#9993743 (dcaro) p:Triage→Medium
[08:53:20] (approved) sstefanova: cli: add `jobs-cli` to the user agent [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/56 (owner: dcaro)
[08:53:26] (update) sstefanova: cli: add `jobs-cli` to the user agent [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/56 (owner: dcaro)
[08:54:12] Toolforge (Toolforge iteration 13): [jobs-cli,builds-cli,envvars-cli] consolidate user agent - https://phabricator.wikimedia.org/T370393#9993745 (dcaro) Open→In progress
[08:54:14] (update) dcaro: cli: add host and namespaces to the user agent [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/80
[08:54:28] (update) dcaro: cli: add `jobs-cli` to the user agent [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/56
[08:57:22] (open) dcaro: cli: add namespaces and host to the user agent [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/52
[08:57:56] (approved) sstefanova: cli: add host and namespaces to the user agent [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/80 (owner: dcaro)
[08:57:57] (update) sstefanova: cli: add host and namespaces to the user agent [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/80 (owner: dcaro)
[08:59:42] (update) sstefanova: cli: add namespaces and host to the user agent [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/52 (owner: dcaro)
[08:59:43] (approved) sstefanova: cli: add namespaces and host to the user agent [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/52 (owner: dcaro)
[09:00:01] (approved) dcaro: api: drop deprecated endpoints [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/108 (https://phabricator.wikimedia.org/T363346 https://phabricator.wikimedia.org/T365014) (owner: sstefanova)
[09:08:23] FIRING: OOM: OOM killer active on cloudcontrol2006-dev:9100 - TODO - https://grafana.wikimedia.org/d/-OcleDKIz/oom-kill - https://alerts.wikimedia.org/?q=alertname%3DOOM
[09:13:23] RESOLVED: OOM: OOM killer active on cloudcontrol2006-dev:9100 - TODO - https://grafana.wikimedia.org/d/-OcleDKIz/oom-kill - https://alerts.wikimedia.org/?q=alertname%3DOOM
[10:01:48] cloud-services-team (Hardware): cloudcontrol2006-dev struggling with memory - https://phabricator.wikimedia.org/T370401 (aborrero) NEW
[10:14:16] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[10:14:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:14:28] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[10:14:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:17:11] FIRING: [2x] SystemdUnitDown: The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:35:24] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[10:41:23] FIRING: OOM: OOM killer active on cloudcontrol2006-dev:9100 - TODO - https://grafana.wikimedia.org/d/-OcleDKIz/oom-kill - https://alerts.wikimedia.org/?q=alertname%3DOOM
[10:42:06] (PS1) Rahul44895: Corrected participants' spelling [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1055168
[10:44:58] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[10:45:00] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[10:45:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:45:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:46:23] RESOLVED: OOM: OOM killer active on cloudcontrol2006-dev:9100 - TODO - https://grafana.wikimedia.org/d/-OcleDKIz/oom-kill - https://alerts.wikimedia.org/?q=alertname%3DOOM
[10:49:58] cloud-services-team (Hardware): cloudcontrol2006-dev struggling with memory - https://phabricator.wikimedia.org/T370401#9994180 (aborrero) the alert triggered again.
[10:56:59] (update) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[10:57:07] (merge) aborrero: tofu-infra: introduce Cloud VPS networks for codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13 (https://phabricator.wikimedia.org/T370037)
[11:09:23] FIRING: OOM: OOM killer active on cloudcontrol2006-dev:9100 - TODO - https://grafana.wikimedia.org/d/-OcleDKIz/oom-kill - https://alerts.wikimedia.org/?q=alertname%3DOOM
[11:14:23] RESOLVED: OOM: OOM killer active on cloudcontrol2006-dev:9100 - TODO - https://grafana.wikimedia.org/d/-OcleDKIz/oom-kill - https://alerts.wikimedia.org/?q=alertname%3DOOM
[11:38:22] (open) aborrero: networks: refactor to remove _set indirection and cloudvps keyword [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/16 (https://phabricator.wikimedia.org/T370037)
[11:57:00] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[11:59:56] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29689 bytes in 5.136 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[12:13:02] (merge) aborrero: networks: refactor to remove _set indirection and cloudvps keyword [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/16 (https://phabricator.wikimedia.org/T370037)
[12:15:25] (open) sstefanova: api: rename api resources to plural [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/105 (https://phabricator.wikimedia.org/T365014)
[12:16:04] (open) aborrero: tofu-infra: import codfw1dev subnets [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/17 (https://phabricator.wikimedia.org/T370037)
[12:18:55] RESOLVED: SystemdUnitDown: The service unit kiwix-mirror-update.service is in failed status on host clouddumps1001. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[12:18:59] FIRING: [2x] SystemdUnitDown: The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[12:19:42] (open) aborrero: main: mark refactored resources as moved [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/18
[12:22:10] (update) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/15
[12:25:13] (close) aborrero: main: mark refactored resources as moved [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/18
[12:28:56] (merge) sstefanova: api endpoints: use plural paths [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/51 (https://phabricator.wikimedia.org/T365014)
[12:31:34] (open) aborrero: data/networks: have eqiad1-r placeholders [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/19
[12:32:48] (open) sstefanova: api endpoints: use plural paths [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/81 (https://phabricator.wikimedia.org/T365014)
[12:33:28] (update) aborrero: data/networks: have eqiad1-r placeholders [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/19
[12:35:14] (update) aborrero: data/networks: have eqiad1-r placeholders [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/19
[12:37:05] (update) aborrero: data/networks: have eqiad1-r placeholders [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/19
[12:38:02] (merge) aborrero: data/networks: have eqiad1-r placeholders [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/19
[12:40:48] FIRING: PuppetFailure: Puppet has failed on cloudcontrol1006:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[12:41:00] cloud-services-team: PuppetFailure Puppet failure on cloudcontrol1006:9100 - https://phabricator.wikimedia.org/T370411 (phaultfinder) NEW
[12:42:11] (update) sstefanova: api endpoints: use plural paths [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/81 (https://phabricator.wikimedia.org/T365014)
[12:45:48] FIRING: [2x] PuppetFailure: Puppet has failed on cloudcontrol1006:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[12:46:22] cloud-services-team: PuppetFailure - https://phabricator.wikimedia.org/T370412 (phaultfinder) NEW
[12:48:53] (update) sstefanova: api endpoints: use plural paths [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/81 (https://phabricator.wikimedia.org/T365014)
[12:48:55] (update) sstefanova: api: rename api resources to plural [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/105 (https://phabricator.wikimedia.org/T365014)
[12:49:56] FIRING: CloudVPSDesignateLeaks: Detected 8 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:54:21] (open) sstefanova: api: remove deprecated endpoints [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/42 (https://phabricator.wikimedia.org/T365014)
[12:54:31] cloud-services-team, Cloud-VPS, Epic: tofu-infra: create a cookbook automation to run tofu - https://phabricator.wikimedia.org/T370414 (aborrero) NEW
[12:55:23] cloud-services-team, Cloud-VPS, Epic: tofu-infra: create a cookbook automation to run tofu - https://phabricator.wikimedia.org/T370414#9994538 (aborrero)
[12:55:52] cloud-services-team, Cloud-VPS, Epic: tofu-infra: create a cookbook automation to run tofu - https://phabricator.wikimedia.org/T370414#9994541 (fnegri) I would maybe force `--branch main` if you select `--apply` (or at least require `--force` to apply a different branch)
[12:57:27] cloud-services-team, Cloud-VPS, Epic: tofu-infra: create a cookbook automation to run tofu - https://phabricator.wikimedia.org/T370414#9994549 (fnegri) We should also make use of Spicerack's locking functionality to prevent two people from running apply at the same time: https://doc.wikimedia.org/spi...
[12:58:12] (update) aborrero: tofu-infra: import codfw1dev subnets [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/17 (https://phabricator.wikimedia.org/T370037)
[12:59:14] (update) aborrero: tofu-infra: import codfw1dev subnets [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/17 (https://phabricator.wikimedia.org/T370037)
[13:00:35] cloud-services-team, Cloud-VPS, Epic: tofu-infra: create a cookbook automation to run tofu - https://phabricator.wikimedia.org/T370414#9994565 (aborrero) maybe we can only allow `plan` for non-default branch.
[13:02:48] (open) sstefanova: d/changelog: bump to 0.0.9 [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/53
[13:08:57] (update) sstefanova: api endpoints: use plural paths [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/81 (https://phabricator.wikimedia.org/T365014)
[13:09:04] (merge) sstefanova: api endpoints: use plural paths [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/81 (https://phabricator.wikimedia.org/T365014)
[13:13:00] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[13:13:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:13:12] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[13:13:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:14:04] (update) sstefanova: d/changelog: bump to 0.0.9 [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/53
[13:14:08] (approved) sstefanova: d/changelog: bump to 0.0.9 [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/53
[13:14:11] (merge) sstefanova: d/changelog: bump to 0.0.9 [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/53
[13:22:09] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[13:24:06] cloud-services-team, Cloud-VPS, Epic, Patch-For-Review: Cloud VPS: extend tofu-infra coverage - https://phabricator.wikimedia.org/T370037#9994667 (fnegri)
[13:25:03] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29637 bytes in 3.653 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[13:34:05] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install new cloudcephmon hosts - https://phabricator.wikimedia.org/T364870#9994697 (Jclark-ctr) @Papaul they to fail start pxe I have downgraded firmware on nic and set correct ports for pxe. but still continue to fail t...
[13:44:00] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[13:44:02] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[13:44:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:44:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:50:48] FIRING: [3x] PuppetFailure: Puppet has failed on cloudcontrol1006:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[13:50:53] cloud-services-team: PuppetFailure - https://phabricator.wikimedia.org/T370412#9994771 (phaultfinder)
[13:57:23] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install new cloudcephmon hosts - https://phabricator.wikimedia.org/T364870#9994794 (Papaul) @Jclark-ctr that are some helpful informations I will take a look at it once on site.
[14:06:00] (approved) dcaro: api: rename api resources to plural [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/105 (https://phabricator.wikimedia.org/T365014) (owner: sstefanova)
[14:06:07] (update) dcaro: api: rename api resources to plural [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/105 (https://phabricator.wikimedia.org/T365014) (owner: sstefanova)
[14:08:26] (update) sstefanova: api: rename api resources to plural [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/105 (https://phabricator.wikimedia.org/T365014)
[14:08:34] (merge) sstefanova: api: rename api resources to plural [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/105 (https://phabricator.wikimedia.org/T365014)
[14:14:30] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: builds-api: bump to 0.0.165-20240718140844-131a3480 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/432 (https://phabricator.wikimedia.org/T365014)
[14:15:58] (CR) Andrew Bogott: [C:+2] Add rebuild_dbinstance cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1054912 (https://phabricator.wikimedia.org/T355721) (owner: Andrew Bogott)
[14:21:27] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "schematreerecommender" project Buster deprecation - https://phabricator.wikimedia.org/T367552#9994891 (Andrew) Thanks for the response! I encourage you to subscribe to the cloud-announce list and add a spam trap exception so things like this don't take you by...
[14:22:21] cloud-services-team, Cloud-VPS, Data-Services, Patch-For-Review: Fix 'openstack database instance rebuild' - https://phabricator.wikimedia.org/T355721#9994880 (Andrew) Open→Resolved There's now a cookbook for this which so far seems to be working.
[14:30:48] FIRING: [3x] PuppetFailure: Puppet has failed on cloudcontrol1006:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[14:35:03] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
[14:35:13] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
[14:39:33] (open) dcaro: metrics: initialize all the stats to 0 [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/109
[14:39:47] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
[14:39:58] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
[14:42:58] (open) fnegri: Add CI validation (fmt and validate) [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/20
[14:44:31] (update) sstefanova: builds-api: bump to 0.0.165-20240718140844-131a3480 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/432 (https://phabricator.wikimedia.org/T365014) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[14:44:33] (approved) sstefanova: builds-api: bump to 0.0.165-20240718140844-131a3480 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/432 (https://phabricator.wikimedia.org/T365014) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[14:44:39] (merge) sstefanova: builds-api: bump to 0.0.165-20240718140844-131a3480 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/432 (https://phabricator.wikimedia.org/T365014) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[14:45:29] (approved) aborrero: Add CI validation (fmt and validate) [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/20 (owner: fnegri)
[14:46:03] (merge) fnegri: Add CI validation (fmt and validate) [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/20
[14:50:44] (update) dcaro: metrics: initialize all the stats to 0 [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/109
[14:50:48] RESOLVED: PuppetFailure: Puppet has failed on cloudcontrol1006:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[14:59:41] RESOLVED: CloudVPSDesignateLeaks: Detected 8 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[15:00:24] (update) dcaro: metrics: initialize all the stats to 0 [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/109
[15:00:55] (update) aborrero: tofu-infra: import codfw1dev subnets [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/17 (https://phabricator.wikimedia.org/T370037)
[15:04:39] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:09:39] RESOLVED: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:10:18] cloud-services-team, Cloud-VPS, Epic, Patch-For-Review: Cloud VPS: extend tofu-infra coverage - https://phabricator.wikimedia.org/T370037#9995200 (aborrero)
[15:10:21] cloud-services-team, Cloud-VPS: Migrate Cloud VPS instances to VXLAN based networks - https://phabricator.wikimedia.org/T364725#9995201 (aborrero)
[15:57:02] Toolforge-standards-committee: Adoption request for AdminStatsBot and HBC AIV helperbot - https://phabricator.wikimedia.org/T370433 (Robertsky) NEW
[16:10:13] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[16:10:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:10:25] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[16:10:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:14:32] Toolforge-standards-committee: Adoption request for AdminStatsBot and HBC AIV helperbot - https://phabricator.wikimedia.org/T370433#9995562 (Mdann52) As mentioned on en:WP:BOTN, I am happy to be added as a co-maintainer to ensure we do not have 1 maintainer linked, as long as Robertsky does not object!
[16:16:22] FIRING: [2x] HAProxyBackendUnavailable: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:17:11] FIRING: SystemdUnitDown: The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[16:18:47] Toolforge-standards-committee: Adoption request for AdminStatsBot and HBC AIV helperbot - https://phabricator.wikimedia.org/T370433#9995573 (Robertsky) No objections to @Mdann52 joining in. I having dislike bus factor of 1.
[16:20:07] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), video2commons: Replace or remove Debian Buster VMs in 'video' cloud-vps project - https://phabricator.wikimedia.org/T360711#9995576 (Andrew) Hello @Don-vip! first of all: yes, I will maintain the nfs server. Is there anything I can do to ke...
[16:20:17] cloud-services-team, Toolforge: Missing Perl packages on dev.toolforge.org for anomiebot workflows - https://phabricator.wikimedia.org/T360488#9995580 (fnegri)
[16:20:19] cloud-services-team, Toolforge (Toolforge iteration 13), Patch-For-Review: Toolforge: Replace all bastion with grid-less bookworm based bastion hosts - https://phabricator.wikimedia.org/T314665#9995579 (fnegri)
[16:21:22] RESOLVED: [2x] HAProxyBackendUnavailable: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:22:52] VPS-project-devtools, collaboration-services, Release-Engineering-Team (Priority Backlog 📥): Puppet failure on deploy-1006.devtools.eqiad1.wikimedia.cloud - Not authorized to call search on /file_metadata/volatile/GeoIP - https://phabricator.wikimedia.org/T370436 (brennen) NEW
[16:23:54] cloud-services-team, Toolforge (Toolforge iteration 13), Patch-For-Review: Toolforge: Replace all bastion with grid-less bookworm based bastion hosts - https://phabricator.wikimedia.org/T314665#9995613 (fnegri) I've added {T360488} as a subtask to remember that anomiebot is currently relying on the `...
[16:40:55] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[16:40:57] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[16:40:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:41:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:45:54] cloud-services-team, Toolforge: [infra,k8s] Upgrade Tools to k8s version 1.26 - https://phabricator.wikimedia.org/T370249#9995720 (dcaro) p:Triage→High
[16:46:07] Toolforge: [replica_cnf,functional-tests] Run replica_cnf functional tests in lima-kilo with the rest of functional tests - https://phabricator.wikimedia.org/T369800#9995723 (dcaro) p:Triage→Medium
[16:46:19] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 13): [sct.frontend] Show the backend status - https://phabricator.wikimedia.org/T370324#9995724 (dcaro) p:Triage→High
[16:46:24] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 13): [sct.backend] Transform the "/" API reply to json - https://phabricator.wikimedia.org/T370323#9995726 (dcaro) p:Triage→High
[16:48:42] cloud-services-team, Cloud-VPS (Debian Buster Deprecation): Replace or remove Debian Buster VMs in 'wikidata-dev' cloud-vps project - https://phabricator.wikimedia.org/T360713#9995733 (Andrew) Now everything in this project is shut down except for wikidata-icinga.wikidata-dev.eqiad1.wikimedia.cloud
[16:51:47] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 13), Toolforge Build Service, Toolforge Jobs framework: [sct.backend] Create trove database - https://phabricator.wikimedia.org/T370317#9995736 (dcaro)
[16:52:42] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 13): [sct.backend] Create trove database - https://phabricator.wikimedia.org/T370317#9995737 (dcaro)
[16:54:01] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 13), Toolforge Jobs framework: [sct.frontend] Show the backend status - https://phabricator.wikimedia.org/T370324#9995738 (dcaro)
[16:55:00] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 13): [sct.frontend] Show the backend status - https://phabricator.wikimedia.org/T370324#9995741 (dcaro)
[16:55:56] cloud-services-team, Cloud-VPS (Debian Buster Deprecation): Replace or remove Debian Buster VMs in 'wikidata-dev' cloud-vps project - https://phabricator.wikimedia.org/T360713#9995746 (Andrew)
[16:56:36] cloud-services-team, Toolforge: [infra,k8s] Upgrade Toolsbeta to k8s 1.26 - https://phabricator.wikimedia.org/T370248#9995749 (dcaro) p:Triage→High
[16:56:49] cloud-services-team, Toolforge: [infra,k8s] review k8s API usage by custom components for 1.26 upgrade - https://phabricator.wikimedia.org/T370247#9995751 (dcaro) p:Triage→High
[16:57:21] cloud-services-team, Toolforge (Toolforge iteration 13): [infra,k8s] review k8s API usage by custom components for 1.26 upgrade - https://phabricator.wikimedia.org/T370247#9995753 (dcaro)
[16:57:24] cloud-services-team, Cloud-VPS (Debian Buster Deprecation): Replace or remove Debian Buster VMs in 'wikidata-dev' cloud-vps project - https://phabricator.wikimedia.org/T360713#9995744 (Andrew) @WMDE-leszek Can you please catch me up on the state of this project?
[16:57:26] cloud-services-team, Toolforge: [infra,k8s] prepare deb packages for k8s 1.26 - https://phabricator.wikimedia.org/T370246#9995757 (dcaro)
[16:57:33] cloud-services-team, Toolforge (Toolforge iteration 13): [infra,k8s] prepare deb packages for k8s 1.26 - https://phabricator.wikimedia.org/T370246#9995758 (dcaro) p:Triage→High
[16:57:58] cloud-services-team, Toolforge: [infra,k8s] review kubelet flags before 1.26 upgrade - https://phabricator.wikimedia.org/T370245#9995764 (dcaro) p:Triage→High
[16:58:05] cloud-services-team, Toolforge: [lima-kilo, k8s] Upgrade Kubernetes in lima-kilo to version 1.26 - https://phabricator.wikimedia.org/T370244#9995766 (dcaro) p:Triage→High
[16:58:16] cloud-services-team, Toolforge: [infra,k8s] prepare deb packages for k8s 1.26 - https://phabricator.wikimedia.org/T370246#9995760 (dcaro)
[16:58:19] cloud-services-team, Toolforge: toolforge: upgrade all Kubernetes components to versions supporting Kubernetes 1.26 - https://phabricator.wikimedia.org/T370046#9995768 (dcaro) p:Triage→High
[16:58:21] cloud-services-team, Toolforge: [infra,k8s] prepare deb packages for k8s 1.26 - https://phabricator.wikimedia.org/T370246#9995762 (dcaro)
[16:58:21] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[16:58:34] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "shiny-r" project Buster deprecation - https://phabricator.wikimedia.org/T367553#9995771 (Gehel) Not used by me or my team's. Unless @mpopov has an objection this can be deleted
[16:58:37] cloud-services-team, Toolforge (Toolforge iteration 13): toolforge: puppetserver got OOMkilled - https://phabricator.wikimedia.org/T369797#9995770 (dcaro) p:Triage→Medium
[16:59:11] cloud-services-team, Toolforge (Toolforge iteration 13): toolforge: puppetserver got OOMkilled - https://phabricator.wikimedia.org/T369797#9995773 (dcaro)
[17:00:21] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29695 bytes in 9.781 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[17:06:22] Tool-spacemedia, Toolforge, video2commons: Enable SonarCloud usage for GitHub Toolforge projects - https://phabricator.wikimedia.org/T369267#9995788 (dcaro) I'm not very familiar with the configuration, but first check, on the github org page, sonarcloud github app has enabled access to all the repos...
[17:06:37] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "shiny-r" project Buster deprecation - https://phabricator.wikimedia.org/T367553#9995791 (Andrew) OK, I'm going to shut down the VM for now and will delete it if I don't hear otherwise. Thanks for the response.
[17:08:35] Tool-spacemedia, Toolforge, video2commons: Enable SonarCloud usage for GitHub Toolforge projects - https://phabricator.wikimedia.org/T369267#9995796 (dcaro) When trying to re-import I get an error {F56507579}
[17:08:43] Tool-spacemedia, Toolforge, video2commons: Enable SonarCloud usage for GitHub Toolforge projects - https://phabricator.wikimedia.org/T369267#9995806 (dcaro) just changed the settings to allow only one repo, then changed to allow all repos again, it still shows empty (maybe it take a bit to refresh)
[17:15:41] Horizon: Cannot view database instance logs in Horizon - https://phabricator.wikimedia.org/T353010#9995823 (JJMC89) Open→Resolved Not sure when or how, but its no longer an issue.
[17:16:05] VPS-project-devtools, collaboration-services, Release-Engineering-Team (Priority Backlog 📥): Puppet failure on deploy-1006.devtools.eqiad1.wikimedia.cloud - Not authorized to call search on /file_metadata/volatile/GeoIP - https://phabricator.wikimedia.org/T370436#9995839 (Dzahn) I have started invest...
[17:16:08] Tool-spacemedia, Toolforge, video2commons: Enable SonarCloud usage for GitHub Toolforge projects - https://phabricator.wikimedia.org/T369267#9995847 (dcaro) Hmm... I think it might be related to gitlab vs github (the fact that it did not work to send the report)
[17:18:14] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb hosts - https://phabricator.wikimedia.org/T367778#9995859 (fnegri) Back from my holidays, here's a glance of what happened since my last comment: {F56505488} Things are looking a bit better and the...
[17:18:59] Horizon: Cannot view database instance logs in Horizon - https://phabricator.wikimedia.org/T353010#9995862 (Andrew) great! I hope this remains true after next week's updates.
[17:19:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:22:07] Tool-spacemedia, Toolforge, video2commons: Enable SonarCloud usage for GitHub Toolforge projects - https://phabricator.wikimedia.org/T369267#9995890 (dcaro) hmpf... I can't login into sonarcloud.io with wikimedia gitlab, as sso is only enabled for enterprise, that might make things complicated
[17:23:01] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "wm-bot" project Buster deprecation - https://phabricator.wikimedia.org/T367567#9995891 (MacFan4000) Open→Resolved a:MacFan4000
[17:37:27] Tool-spacemedia, Toolforge, video2commons: Enable SonarCloud usage for GitHub Toolforge projects - https://phabricator.wikimedia.org/T369267#9995945 (dcaro) I created a new organization manually (can't import) called toolforge-repos, and added video2commons project, not sure if we can integrate bette...
[17:42:27] Tool-spacemedia, Toolforge (Toolforge iteration 13), video2commons: Enable SonarCloud usage for GitHub Toolforge projects - https://phabricator.wikimedia.org/T369267#9995975 (dcaro)
[17:42:46] Tool-spacemedia, Toolforge (Toolforge iteration 13), video2commons: Enable SonarCloud usage for GitHub Toolforge projects - https://phabricator.wikimedia.org/T369267#9995980 (dcaro) p:Triage→Medium
[17:43:46] Tool-spacemedia, Toolforge (Toolforge iteration 13), video2commons: Enable SonarCloud usage for GitHub Toolforge projects - https://phabricator.wikimedia.org/T369267#9995973 (dcaro) Open→In progress a:dcaro
[17:53:39] cloud-services-team, Toolforge: Request for access for user dr0ptp4kt for 'admin' tool - https://phabricator.wikimedia.org/T364761#9996014 (dr0ptp4kt) Thanks, the nginx logs are accessible on tools-proxy-7 and tools-proxy-8 for me now.
[18:01:28] Toolforge-standards-committee: Adoption request for AdminStatsBot and HBC AIV helperbot - https://phabricator.wikimedia.org/T370433#9996064 (bd808) #### Review of https://admin.toolforge.org/tool/adminstats #### * /data/project/adminstats/settings.py contained a [[https://www.mediawiki.org/wiki/Manual:Bot_p...
[18:25:03] Toolforge-standards-committee: Adoption request for AdminStatsBot and HBC AIV helperbot - https://phabricator.wikimedia.org/T370433#9996181 (bd808) #### Review of https://toolsadmin.wikimedia.org/tools/id/aivhelperbot #### * /data/project/aivhelperbot/.password contained a password for [[https://en.wikipedi...
[18:25:32] Toolforge-standards-committee: Adoption request for AdminStatsBot and HBC AIV helperbot - https://phabricator.wikimedia.org/T370433#9996182 (bd808)
[18:32:20] Toolforge-standards-committee: Adoption request for AdminStatsBot and HBC AIV helperbot - https://phabricator.wikimedia.org/T370433#9996240 (bd808) Open→Resolved a:bd808 After removing stored passwords from both tools I have added both @Robertsky and @Mdann52 as co-maintainers of the adminsta...
[18:42:40] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "shiny-r" project Buster deprecation - https://phabricator.wikimedia.org/T367553#9996298 (mpopov) Thanks for checking in! No objection from me. You can actually just delete the [[ https://openstack-browser.toolforge.org/project/shiny-r | shiny-r project ]] at...
[18:49:38] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "shiny-r" project Buster deprecation - https://phabricator.wikimedia.org/T367553#9996343 (Andrew) Open→Resolved a:Andrew I have deleted the project. Thanks for the followup!
[18:54:29] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[18:55:27] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29687 bytes in 8.750 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[18:56:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[19:04:36] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace poolcounter06.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370458 (Andrew) NEW
[19:04:42] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-push-notifications01.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370459 (Andrew) NEW
[19:04:51] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-restbase04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370460 (Andrew) NEW
[19:05:06] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-sessionstore04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370461 (Andrew) NEW
[19:05:23] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-shellbox.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370462 (Andrew) NEW
[19:06:43] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-snapshot03.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370465 (Andrew) NEW
[19:06:44] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-urldownloader03.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370466 (Andrew) NEW
[19:06:48] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-xhgui03.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370467 (Andrew) NEW
[19:08:30] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[19:08:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[19:08:42] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[19:08:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[19:10:35] Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, Patch-For-Review: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm - https://phabricator.wikimedia.org/T327742#9996518 (Andrew)
[19:16:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[19:30:28] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-snapshot03.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370465#9996597 (Andrew)
[19:30:57] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-xhgui03.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370467#9996612 (Andrew)
[19:31:08] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-urldownloader03.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370466#9996614 (Andrew)
[19:31:58] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-sessionstore04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370461#9996621 (Andrew)
[19:32:04] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-restbase04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370460#9996625 (Andrew)
[19:32:11] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-push-notifications01.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370459#9996628 (Andrew)
[19:32:21] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace poolcounter06.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370458#9996629 (Andrew)
[19:38:59] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[19:39:01] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[19:39:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[19:39:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[19:40:06] Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm - https://phabricator.wikimedia.org/T327742#9996645 (Andrew)
[19:59:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:01:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[20:03:11] VPS-project-devtools, collaboration-services, Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥): Puppet failure on deploy-1006.devtools.eqiad1.wikimedia.cloud - Not authorized to call search on /file_metadata/volatile/GeoIP - https://phabricator.wikimedia.org/T370436#9996722 (Dzahn)...
[20:04:32] VPS-project-devtools, collaboration-services, Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥): Puppet failure on deploy-1006.devtools.eqiad1.wikimedia.cloud - Not authorized to call search on /file_metadata/volatile/GeoIP - https://phabricator.wikimedia.org/T370436#9996724 (Dzahn)...
[20:11:54] PROBLEM - Check DNS auth via TCP of tools-puppetserver-01.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:12:00] PROBLEM - Check DNS auth via UDP of tools-puppetserver-01.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:12:14] PROBLEM - Check DNS auth via TCP of k8s.svc.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:12:22] FIRING: [28x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[20:12:24] PROBLEM - Check DNS auth via UDP of www.wmcloud.org on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:12:28] PROBLEM - Check DNS auth via UDP of login.toolforge.org on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:12:36] PROBLEM - toolschecker: Test LDAP for query on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/ldap - 248 bytes in 0.013 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[20:12:38] PROBLEM - Check DNS auth via UDP of k8s.svc.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:12:42] PROBLEM - Check DNS auth via TCP of login.toolforge.org on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:12:44] PROBLEM - Check DNS auth via TCP of www.wmcloud.org on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:14:06] RECOVERY - Check DNS auth via TCP of k8s.svc.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is OK: DNS OK - 0.028 seconds response time (k8s.svc.tools.eqiad1.wikimedia.cloud. 300 IN A 172.16.6.113) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:14:16] RECOVERY - Check DNS auth via UDP of www.wmcloud.org on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is OK: DNS OK - 0.025 seconds response time (www.wmcloud.org. 3600 IN CNAME wmcloud.org.) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:14:18] RECOVERY - Check DNS auth via UDP of login.toolforge.org on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is OK: DNS OK - 0.027 seconds response time (login.toolforge.org. 3600 IN CNAME bastion.toolforge.org.) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:14:30] RECOVERY - Check DNS auth via UDP of k8s.svc.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is OK: DNS OK - 0.022 seconds response time (k8s.svc.tools.eqiad1.wikimedia.cloud. 300 IN A 172.16.6.113) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:14:32] RECOVERY - Check DNS auth via TCP of login.toolforge.org on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is OK: DNS OK - 0.050 seconds response time (login.toolforge.org. 3600 IN CNAME bastion.toolforge.org.) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:14:34] RECOVERY - Check DNS auth via TCP of www.wmcloud.org on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is OK: DNS OK - 0.033 seconds response time (www.wmcloud.org. 3600 IN CNAME wmcloud.org.) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:14:46] RECOVERY - toolschecker: Test LDAP for query on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 0.248 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[20:15:44] RECOVERY - Check DNS auth via TCP of tools-puppetserver-01.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is OK: DNS OK - 0.022 seconds response time (tools-puppetserver-01.tools.eqiad1.wikimedia.cloud. 60 IN A 172.16.3.13) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:15:52] RECOVERY - Check DNS auth via UDP of tools-puppetserver-01.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is OK: DNS OK - 0.049 seconds response time (tools-puppetserver-01.tools.eqiad1.wikimedia.cloud. 60 IN A 172.16.3.13) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:17:22] RESOLVED: [28x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[20:20:24] Cloud-VPS (Debian Buster Deprecation), collaboration-services, Patch-For-Review: replace buster machines in devtools project - https://phabricator.wikimedia.org/T360964#9996759 (Dzahn) This latest merge today finally fixed the puppet runs on `deploy-1006` (T370436)
[20:21:18] PROBLEM - Check DNS auth via TCP of k8s.svc.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:21:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[20:22:08] RECOVERY - Check DNS auth via TCP of k8s.svc.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is OK: DNS OK - 0.076 seconds response time (k8s.svc.tools.eqiad1.wikimedia.cloud.
300 IN A 172.16.6.113) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [20:23:36] 06Toolforge-standards-committee: Refresh membership of Toolforge standards committee - https://phabricator.wikimedia.org/T370474 (10bd808) 03NEW [20:26:57] 06Toolforge-standards-committee: Refresh membership of Toolforge standards committee - https://phabricator.wikimedia.org/T370474#9996782 (10bd808) p:05Triage→03Medium a:03bd808 `lang=irc [16:03] < bd808> JJMC89: blame me for inventing a committee but then just hoping that they would fill in the process... [20:29:20] 06Toolforge-standards-committee: Refresh membership of Toolforge standards committee - https://phabricator.wikimedia.org/T370474#9996800 (10bd808) 05Open→03In progress I will work on the details for where we will collect nominations (probably on wikitech again) and make the announcement of the open process s... [20:37:49] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1050 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [20:43:27] FIRING: ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [20:43:37] 06cloud-services-team: ProbeDown virt.cloudgw.eqiad1.wikimediacloud.org:0 failed when probed by icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4 from codfw. Availability is 0%. 
- https://phabricator.wikimedia.org/T370477 (10phaultfinder) 03NEW [20:44:56] FIRING: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [20:46:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [20:46:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [20:48:27] RESOLVED: ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [20:49:56] RESOLVED: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [20:52:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - 
https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [20:54:57] FIRING: ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [20:57:49] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1050 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [21:07:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [21:14:57] RESOLVED: ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [21:15:27] 06cloud-services-team, 10Cloud-VPS (Debian Buster Deprecation), 10Beta-Cluster-Infrastructure: Remove or replace deployment-xhgui03.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370467#9996910 (10andrea.denisse) The VM can be safely removed as XHGui is now s... 
[21:21:16] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install new cloudcephmon hosts - https://phabricator.wikimedia.org/T364870#9996920 (Papaul) @Jclark-ctr I checked 1004 PXE boot was set on both the 1G and 10G I disable it on the 1G you should be good now. You can check the oth...
[21:23:38] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-xhgui03.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370467#9996923 (andrea.denisse) I tried to delete the instance but I was unabl...
[21:30:48] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-xhgui03.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370467#9996940 (Andrew) Open→Resolved a:Andrew You're right! S...
[21:43:41] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Replace deployment-ircd02 with a Bullseye or Bookworm host - https://phabricator.wikimedia.org/T369919#9996975 (Southparkfan) a:Southparkfan
[21:44:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-12 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[21:49:03] RESOLVED: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-12 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[22:08:05] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[22:08:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[22:08:17] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[22:08:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[22:19:57] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[22:20:48] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Replace deployment-ircd02 with a Bullseye or Bookworm host - https://phabricator.wikimedia.org/T369919#9997107 (Southparkfan) `role::mw_rc_irc` seems to work fine on a Bullseye box, except for a loop. Puppet run...
[22:22:06] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[22:26:00] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Replace deployment-ircd02 with a Bullseye or Bookworm host - https://phabricator.wikimedia.org/T369919#9997111 (Southparkfan) Loop fixed by setting `profile::base::remove_python2_on_bullseye: false` on prefix lev...
[22:38:24] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[22:38:26] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[22:38:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[22:38:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[23:19:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[23:21:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-3 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[23:22:23] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Rebuild or delete deployment-docker-changeprop01 - https://phabricator.wikimedia.org/T369913#9997194 (Southparkfan) Instance is offline, seems to be superseded by `deployment-changeprop-1.deployment-prep.eqiad1.w...
[23:27:20] Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-jobrunner04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370487 (Southparkfan) NEW
[23:27:25] Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-jobrunner04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370487#9997223 (Southparkfan) a:Southparkfan
[23:29:02] Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm - https://phabricator.wikimedia.org/T327742#9997226 (Southparkfan)
[23:36:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-3 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[23:40:23] FIRING: OOM: OOM killer active on cloudcontrol2006-dev:9100 - TODO - https://grafana.wikimedia.org/d/-OcleDKIz/oom-kill - https://alerts.wikimedia.org/?q=alertname%3DOOM
[23:45:23] RESOLVED: OOM: OOM killer active on cloudcontrol2006-dev:9100 - TODO - https://grafana.wikimedia.org/d/-OcleDKIz/oom-kill - https://alerts.wikimedia.org/?q=alertname%3DOOM