[00:01:21] RESOLVED: PrometheusK8sCertExpirySoon: Prometheus k8s certificate is about to expire - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/PrometheusK8sCertExpirySoon - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPrometheusK8sCertExpirySoon
[04:45:01] Tool-extjsonuploader, GitLab (Integrations): Support gitlab.wikimedia.org in extjsonuploader - https://phabricator.wikimedia.org/T352831#10861988 (Samwilson) As Gerrit is no longer going to be retired, all extensions should probably be there instead of on GitLab. The above CommunityConfiguration one has...
[04:51:28] FIRING: InstanceDown: Project cvn instance cvn-app13 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[06:12:58] Data-Services, DBA, Patch-For-Review: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10862082 (Marostegui) db2186 being pooled
[08:23:24] (PS1) MVernon: apus: add docker-registry user with dummy credentials [labs/private] - https://gerrit.wikimedia.org/r/1151605 (https://phabricator.wikimedia.org/T394476)
[08:29:24] (PS1) Majavah: Add fake metricsinfra Phabricator credentials [labs/private] - https://gerrit.wikimedia.org/r/1151607 (https://phabricator.wikimedia.org/T394446)
[08:33:37] (CR) Majavah: [V:+2 C:+2] Add fake metricsinfra Phabricator credentials [labs/private] - https://gerrit.wikimedia.org/r/1151607 (https://phabricator.wikimedia.org/T394446) (owner: Majavah)
[08:40:04] Tool-extjsonuploader, GitLab (Integrations): Support gitlab.wikimedia.org in extjsonuploader - https://phabricator.wikimedia.org/T352831#10862559 (Tgr) Open→Declined
[08:41:00] (CR) Elukey: [C:+1] "<3" [labs/private] - https://gerrit.wikimedia.org/r/1151605 (https://phabricator.wikimedia.org/T394476) (owner: MVernon)
[08:46:52] Tool-extjsonuploader, GitLab (Integrations): Support gitlab.wikimedia.org in extjsonuploader - https://phabricator.wikimedia.org/T352831#10862589 (Peachey88) There isn't any requirement that they be on Gerrit, the same way many are on GitHub which extjsonuploader is setup to support.
[08:46:58] (CR) MVernon: [V:+2 C:+2] apus: add docker-registry user with dummy credentials [labs/private] - https://gerrit.wikimedia.org/r/1151605 (https://phabricator.wikimedia.org/T394476) (owner: MVernon)
[09:01:09] (PS1) Majavah: Add PHAB contact group member type [cloud/metricsinfra/prometheus-manager] - https://gerrit.wikimedia.org/r/1151617 (https://phabricator.wikimedia.org/T394446)
[09:08:23] (PS1) Majavah: alertmanager: Support PHAB contact group members [cloud/metricsinfra/prometheus-configurator] - https://gerrit.wikimedia.org/r/1151618 (https://phabricator.wikimedia.org/T394446)
[09:10:32] Tool-extjsonuploader, GitLab (Integrations): Support gitlab.wikimedia.org in extjsonuploader - https://phabricator.wikimedia.org/T352831#10862715 (Samwilson) True, people can have extensions on Wikimedia GitLab if they like, but we'd probably //encourage// them to use Gerrit wouldn't we? There's no s...
[09:23:26] cloud-services-team, Cloud-VPS, Beta-Cluster-Infrastructure, Patch-For-Review: Consider setting up an https://github.com/knyar/phalerts instance in metricsinfra - https://phabricator.wikimedia.org/T394446#10862751 (taavi) a:taavi
[09:23:55] cloud-services-team, Toolforge, Toolhub: Require tools to host a valid toolinfo.json file (e.g. while upgrading from one Debian version to another) - https://phabricator.wikimedia.org/T271712#10862753 (taavi) p:Triage→Low
[09:29:31] cloud-services-team, Cloud-VPS: Add some monitoring/non-paging alerts to codfw1dev - https://phabricator.wikimedia.org/T344440#10862759 (taavi)
[09:29:33] cloud-services-team, Observability-Metrics: Deploy 'cloud' Prometheus instance to codfw - https://phabricator.wikimedia.org/T350010#10862760 (taavi)
[09:29:40] Data-Services, Data-Engineering: Create a view for existencelinks table - https://phabricator.wikimedia.org/T394898#10862762 (fnegri) @Milimetric this is not urgent, at least from the #cloud-services-team side. @Bugreporter do you have a specific use case for having this view in wiki replicas, or is it...
[10:02:47] Data-Services, DBA, Patch-For-Review: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10862830 (ops-monitoring-bot) Finished cloning db2191.codfw.wmnet to db2186.codfw.wmnet - marostegui@cumin1002
[10:15:33] Data-Services, DBA, Patch-For-Review: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10862845 (ops-monitoring-bot) Started cloning db2242.codfw.wmnet to db2187.codfw.wmnet - marostegui@cumin1002
[10:15:56] Data-Services, DBA, Patch-For-Review: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10862848 (ops-monitoring-bot) Completed depool of db2242 - Depool db2242.codfw.wmnet to then clone it to db2187.codfw.wmnet - marostegui@cumin1002 - marostegui@cumin1002
[10:39:48] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment codfw1dev for service: project,designate
[10:40:50] !log taavi@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99) on deployment codfw1dev for service: project,designate
[11:02:14] Cloud-VPS (Quota-requests): Temporary quota increase for 'cvn' - https://phabricator.wikimedia.org/T395274#10862976 (taavi) Per https://libera.chat/guides/connect there's a hostname (`irc.ipv6.libera.chat`) to force an IPv6 connection. If you have interest in trying out that then that's great, and if not we...
[11:04:56] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence: [wikireplicas] Remove unused maintainviews and maintainindexes users - https://phabricator.wikimedia.org/T395432 (fnegri) NEW
[11:11:49] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence: [wikireplicas] Remove unused maintainviews and maintainindexes users - https://phabricator.wikimedia.org/T395432#10863009 (Marostegui) Maybe you can delete it from a host and leave it for a week or two to see what breaks.
[11:20:39] cloud-services-team, Cloud-VPS: Upgrade cloudlb hosts to bookworm - https://phabricator.wikimedia.org/T375082#10863034 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by taavi@cumin1002 for host cloudlb1001.eqiad.wmnet with OS bookworm
[11:32:13] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: [wikireplicas] Remove unused maintainviews and maintainindexes users - https://phabricator.wikimedia.org/T395432#10863090 (fnegri) > Maybe you can delete it from a host and leave it for a week or two to se...
[11:36:31] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: [wikireplicas] Remove unused maintainviews and maintainindexes users - https://phabricator.wikimedia.org/T395432#10863100 (fnegri) Open→In progress p:Triage→Medium
[12:23:06] Data-Services, DBA, Patch-For-Review: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10863217 (Marostegui) db2187 has been cloned into s8
[12:26:22] FIRING: [58x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[12:29:43] ^ side effect of cloudlb1001 reimage
[12:31:22] RESOLVED: [58x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[12:32:22] FIRING: [58x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[12:36:13] cloud-services-team, Cloud-VPS: Upgrade cloudlb hosts to bookworm - https://phabricator.wikimedia.org/T375082#10863271 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by taavi@cumin1002 for host cloudlb1001.eqiad.wmnet with OS bookworm completed: - cloudlb1001 (**PASS**) - Downtimed o...
[12:36:33] cloud-services-team, Cloud-VPS: Upgrade cloudlb hosts to bookworm - https://phabricator.wikimedia.org/T375082#10863272 (taavi)
[12:44:45] Data-Services, DBA: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10863327 (ops-monitoring-bot) Start pool of db2242 gradually with 4 steps - Pool db2242.codfw.wmnet in after cloning - marostegui@cumin1002
[12:54:03] (update) chuckonwumelu: [api] Adding warning message for beta [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/78 (https://phabricator.wikimedia.org/T394277)
[13:06:06] (update) raymond-ndibe: [deploy] skip build if refs are same [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/77 (https://phabricator.wikimedia.org/T389044)
[13:08:21] (update) raymond-ndibe: [deploy] skip build if refs are same [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/77 (https://phabricator.wikimedia.org/T389044)
[13:08:26] (update) raymond-ndibe: [deploy] skip build if refs are same [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/77 (https://phabricator.wikimedia.org/T389044)
[13:13:19] (update) raymond-ndibe: [deploy] add force-build and force-run query params [repos/cloud/toolforge/components-api] (skip_build_if_refs_are_same) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/80 (https://phabricator.wikimedia.org/T389044)
[13:19:00] (update) raymond-ndibe: [deploy] support health-checks and port [repos/cloud/toolforge/components-api] (update_toolforge_models) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/75 (https://phabricator.wikimedia.org/T362072)
[13:30:19] Data-Services, DBA, Patch-For-Review: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10863600 (ops-monitoring-bot) Completed pool of db2242 gradually with 4 steps - Pool db2242.codfw.wmnet in after cloning - marostegui@cumin1002
[14:03:04] cloud-services-team, Cloud-VPS, Patch-For-Review: Upgrade cloudlb hosts to bookworm - https://phabricator.wikimedia.org/T375082#10863771 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by taavi@cumin1002 for host cloudlb1002.eqiad.wmnet with OS bookworm
[14:07:22] RESOLVED: [58x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[14:14:16] cloud-services-team, Toolforge: [infra] Reports of slow connectivity from APAC - https://phabricator.wikimedia.org/T395135#10863837 (Andrew) @SGupta-WMF since you're in India would you be willing to run some of the above tests so that we can get a second data point? We'd like to know who widespread this...
[14:15:48] cloud-services-team, Toolforge: [infra] Reports of slow connectivity from APAC - https://phabricator.wikimedia.org/T395135#10863842 (SGupta-WMF) Sure @Andrew . Any directions or steps to help me ?
[14:20:40] cloud-services-team, Toolforge: [infra] Reports of slow connectivity from APAC - https://phabricator.wikimedia.org/T395135#10863880 (Andrew) >>! In T395135#10863842, @SGupta-WMF wrote: > Sure @Andrew . Any directions or steps to help me ? You can just run the same steps that Nokib_Sarkar did, and paste...
[14:22:55] (update) chuckonwumelu: [api] Adding warning message for beta [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/78 (https://phabricator.wikimedia.org/T394277)
[14:23:34] (update) chuckonwumelu: [api] Adding warning message for beta [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/78 (https://phabricator.wikimedia.org/T394277)
[14:31:54] (update) raymond-ndibe: [deploy] support health-checks and port [repos/cloud/toolforge/components-api] (update_toolforge_models) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/75 (https://phabricator.wikimedia.org/T362072)
[14:33:24] cloud-services-team, Toolforge: [infra] Reports of slow connectivity from APAC - https://phabricator.wikimedia.org/T395135#10863959 (SGupta-WMF) **What is the output of http://test-ipv6.com/helpdesk/ ?** Your IPv4 address on the public Internet appears to be 49.36.238.255 Your IPv6 address on the publ...
[14:33:54] cloud-services-team, Toolforge: [infra] Reports of slow connectivity from APAC - https://phabricator.wikimedia.org/T395135#10863961 (SGupta-WMF) @Andrew I added the results . Let me know if anything else is required.
[14:34:56] Data-Services, DBA: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10863965 (ops-monitoring-bot) Finished cloning db2242.codfw.wmnet to db2187.codfw.wmnet - marostegui@cumin1002
[14:44:50] (update) raymond-ndibe: [deploy] support health-checks and port [repos/cloud/toolforge/components-api] (update_toolforge_models) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/75 (https://phabricator.wikimedia.org/T362072)
[14:49:43] cloud-services-team, Data-Services, DBA, Wikidata, and 2 others: Set up x3 replication to wikireplicas - https://phabricator.wikimedia.org/T390954#10864054 (Ladsgroup)
[15:07:15] cloud-services-team, Cloud-VPS, Upstream: codfw1dev has seen neutron metadata agents down since epoxy upgrade - https://phabricator.wikimedia.org/T395255#10864159 (Andrew) After discussion we are moving ahead with the Epoxy upgrade, but this is still of interest!
[15:14:46] cloud-services-team, Cloud-VPS: Upgrade cloudlb hosts to bookworm - https://phabricator.wikimedia.org/T375082#10864202 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by taavi@cumin1002 for host cloudlb1002.eqiad.wmnet with OS bookworm completed: - cloudlb1002 (**PASS**) - Downtimed o...
[15:15:09] cloud-services-team, Cloud-VPS: Upgrade cloudlb hosts to bookworm - https://phabricator.wikimedia.org/T375082#10864209 (taavi) Open→Resolved
[15:15:22] cloud-services-team, Cloud-VPS: Upgrade cloudlb hosts to bookworm - https://phabricator.wikimedia.org/T375082#10864213 (taavi)
[15:19:20] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudservices1005-dev.codfw.wmnet' (T390914)
[15:19:27] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914
[15:19:53] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=99) on host 'cloudservices1005-dev.codfw.wmnet' (T390914)
[15:20:51] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudservices1005-dev.eqiad.wmnet' (T390914)
[15:21:24] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=99) on host 'cloudservices1005-dev.eqiad.wmnet' (T390914)
[15:22:04] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudservices1005.eqiad.wmnet' (T390914)
[15:27:17] cloud-services-team, Striker, Bitu, Infrastructure-Foundations, Patch-For-Review: Move Striker to Bitu username validation API - https://phabricator.wikimedia.org/T364605#10864327 (Arendpieter)
[15:27:18] cloud-services-team, Striker, Patch-For-Review: Add Bitu container to Striker development environment - https://phabricator.wikimedia.org/T362318#10864326 (Arendpieter)
[15:28:08] PROBLEM - Host cloudservices1005 is DOWN: PING CRITICAL - Packet loss = 100%
[15:29:37] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudservices1005.eqiad.wmnet' (T390914)
[15:29:41] RECOVERY - Host cloudservices1005 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms
[15:29:44] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914
[15:31:37] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudservices1005-dev.eqiad.wmnet' (T390914)
[15:32:10] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=99) on host 'cloudservices1005-dev.eqiad.wmnet' (T390914)
[15:32:46] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudservices1005.eqiad.wmnet' (T390914)
[15:39:44] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudservices1005.eqiad.wmnet' (T390914)
[15:39:51] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914
[15:40:15] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudservices1006.eqiad.wmnet' (T390914)
[15:48:48] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudservices1006.eqiad.wmnet' (T390914)
[15:48:54] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914
[15:52:28] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol1011.eqiad.wmnet' (T390914)
[15:56:28] RESOLVED: InstanceDown: Project cvn instance cvn-app13 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:57:35] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=99) on host 'cloudcontrol1011.eqiad.wmnet' (T390914)
[15:57:42] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914
[16:04:46] (open) raymond-ndibe: [components-smoke-test] reduce testcase dep on each other [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/798 (https://phabricator.wikimedia.org/T389044)
[16:05:38] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol1011.eqiad.wmnet' (T390914)
[16:05:42] cloud-services-team, Data-Services, DBA, Wikidata, and 2 others: Set up x3 replication to wikireplicas - https://phabricator.wikimedia.org/T390954#10864657 (fnegri) Sent an update to cloud-announce: https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/2IEY34ZAUT3V...
[16:05:44] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914
[16:18:10] FIRING: GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[16:18:22] FIRING: [14x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1011.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:19:52] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol1011.eqiad.wmnet' (T390914)
[16:19:58] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914
[16:20:16] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol1007.eqiad.wmnet' (T390914)
[16:20:21] (PS1) Martindevelops: added endpoints.ts [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1151745
[16:23:10] RESOLVED: GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[16:23:22] FIRING: [14x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1011.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:28:22] RESOLVED: [14x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1011.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:28:54] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: [wikireplicas] Remove unused maintainviews and maintainindexes users - https://phabricator.wikimedia.org/T395432#10864836 (fnegri) Predictably I was wrong, and `maintain-views` is indeed using that user:...
[16:37:37] FIRING: [14x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:39:52] RESOLVED: [14x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1011.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:42:03] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol1007.eqiad.wmnet' (T390914)
[16:42:09] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914
[16:42:09] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol1006.eqiad.wmnet' (T390914)
[16:59:10] FIRING: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[17:00:07] FIRING: [28x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:00:48] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol1006.eqiad.wmnet' (T390914)
[17:00:52] RESOLVED: [14x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:00:54] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914
[17:02:02] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudrabbot1003.eqiad.wmnet' (T390914)
[17:02:36] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=99) on host 'cloudrabbot1003.eqiad.wmnet' (T390914)
[17:03:59] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudrabbit1003.eqiad.wmnet' (T390914)
[17:12:06] PROBLEM - Host cloudrabbit1003 is DOWN: PING CRITICAL - Packet loss = 100%
[17:12:27] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudrabbit1003.eqiad.wmnet' (T390914)
[17:12:33] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914
[17:12:38] RECOVERY - Host cloudrabbit1003 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[17:13:01] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudrabbit1002.eqiad.wmnet' (T390914)
[17:16:40] RESOLVED: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[17:17:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-bastionless.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[17:21:24] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudrabbit1002.eqiad.wmnet' (T390914)
[17:21:31] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914
[17:21:49] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: [wikireplicas] Remove unused maintainviews and maintainindexes users - https://phabricator.wikimedia.org/T395432#10865086 (fnegri) If I remove `user` and `pass` from the `pymysql.connect()` parameters, it...
[17:21:52] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudrabbit1001.eqiad.wmnet' (T390914)
[17:26:18] (update) raymond-ndibe: Draft: [components-api] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[17:28:00] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[17:29:14] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[17:29:27] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[17:29:32] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: [wikireplicas] Remove maintainviews and maintainindexes users - https://phabricator.wikimedia.org/T395432#10865104 (fnegri)
[17:30:22] FIRING: HAProxyBackendUnavailable: HAProxy service mysql backend cloudcontrol1007.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:30:30] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudrabbit1001.eqiad.wmnet' (T390914)
[17:30:37] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914
[17:31:10] FIRING: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[17:33:06] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudnet1005.eqiad.wmnet' (T390914)
[17:35:22] FIRING: [14x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:36:10] RESOLVED: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[17:36:18] cloud-services-team, Toolforge: [infra] Reports of slow connectivity from APAC - https://phabricator.wikimedia.org/T395135#10865121 (cmooney) Thanks for the reports. Regarding the trace in [[ https://phabricator.wikimedia.org/T395135#10859582 | this comment ]], the RTT gets quite high, at over 400ms tot...
[17:40:22] RESOLVED: [14x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:40:28] FIRING: InstanceDown: Project cvn instance cvn-app13 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:41:46] PROBLEM - Host cloudnet1005 is DOWN: PING CRITICAL - Packet loss = 100%
[17:42:22] RECOVERY - Host cloudnet1005 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms
[17:42:36] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudnet1005.eqiad.wmnet' (T390914)
[17:42:43] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914
[17:43:33] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudnet1006.eqiad.wmnet' (T390914)
[17:44:26] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-bastionless.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:03:55] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.openstack.restart_openstack (exit_code=97) on deployment eqiad1 for all services
[18:04:22] RESOLVED: [3x] HAProxyBackendUnavailable: HAProxy service designate-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[18:04:46] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1040.eqiad.wmnet' (T390914)
[18:04:53] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914
[18:11:22] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1040.eqiad.wmnet' (T390914)
[18:11:29] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914
[18:12:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-bastionless.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:13:26] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-bastionless.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:15:24] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1041.eqiad.wmnet' (T390914)
[18:15:39] (update) raymond-ndibe: [components-smoke-test] reduce testcase dep on each other [repos/cloud/toolforge/toolforge-deploy] (run_specific_tests_on_deploy) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/798 (https://phabricator.wikimedia.org/T389044)
[18:16:02] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[18:19:13] (update) raymond-ndibe: [components-smoke-test] reduce testcase dep on each other [repos/cloud/toolforge/toolforge-deploy] (run_specific_tests_on_deploy) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/798 (https://phabricator.wikimedia.org/T389044)
[18:21:10] (update) raymond-ndibe: [components-smoke-test] reduce testcase dep on each other [repos/cloud/toolforge/toolforge-deploy] (run_specific_tests_on_deploy) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/798 (https://phabricator.wikimedia.org/T389044)
[18:22:03] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1041.eqiad.wmnet' (T390914)
[18:22:10] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914
[18:24:03] VPS-project-Wikistats: Add minwikibooks to wikistats -
https://phabricator.wikimedia.org/T395504#10865375 (10Dzahn) 05Open→03Stalled Should be a subtask of T395452 directly. [18:24:20] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1042.eqiad.wmnet' (T390914) [18:25:50] (03update) 10raymond-ndibe: [components-smoke-test] reduce testcase dep on each other [repos/cloud/toolforge/toolforge-deploy] (run_specific_tests_on_deploy) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/798 (https://phabricator.wikimedia.org/T389044) [18:27:17] FIRING: [2x] KernelErrors: Server cloudcontrol1011 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcontrol1011 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors [18:27:23] FIRING: OOM: OOM killer active on cloudcontrol1011:9100 - TODO - https://grafana.wikimedia.org/d/-OcleDKIz/oom-kill - https://alerts.wikimedia.org/?q=alertname%3DOOM [18:27:23] 06cloud-services-team: KernelErrors Server cloudcontrol1011 logged kernel errors - https://phabricator.wikimedia.org/T395509 (10phaultfinder) 03NEW [18:28:38] (03update) 10raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044) [18:30:28] RESOLVED: InstanceDown: Project cvn instance cvn-app13 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [18:31:19] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1042.eqiad.wmnet' (T390914) [18:31:20] !log andrew@cloudcumin1001 admin START - Cookbook 
wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1043.eqiad.wmnet' (T390914) [18:31:25] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [18:31:33] (03update) 10raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044) [18:32:10] FIRING: GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch [18:32:22] FIRING: [14x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1011.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [18:32:23] RESOLVED: OOM: OOM killer active on cloudcontrol1011:9100 - TODO - https://grafana.wikimedia.org/d/-OcleDKIz/oom-kill - https://alerts.wikimedia.org/?q=alertname%3DOOM [18:33:11] FIRING: SystemdUnitDown: The service unit nova-fullstack.service is in failed status on host cloudcontrol1007. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [18:34:01] (03update) 10raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044) [18:37:10] RESOLVED: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch [18:37:22] RESOLVED: [9x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1011.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [18:37:26] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirtlocal1001.eqiad.wmnet' (T390914) [18:37:32] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [18:37:33] (03update) 10raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044) [18:38:00] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1043.eqiad.wmnet' (T390914) [18:38:01] !log 
andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1044.eqiad.wmnet' (T390914) [18:38:11] FIRING: [2x] SystemdUnitDown: The service unit labs-ip-alias-dump.service is in failed status on host cloudservices1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [18:39:56] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Data-Services, 06Data-Persistence, 13Patch-For-Review: [wikireplicas] Remove maintainviews and maintainindexes users - https://phabricator.wikimedia.org/T395432#10865449 (10fnegri) While we decide if we want to remove these users or not, I manually recreated the... [18:43:38] (03update) 10raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044) [18:44:36] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirtlocal1001.eqiad.wmnet' (T390914) [18:44:42] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [18:44:50] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1044.eqiad.wmnet' (T390914) [18:44:51] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1045.eqiad.wmnet' (T390914) [18:49:21] PROBLEM - nova-compute proc minimum on cloudvirt1049 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:49:29] PROBLEM - nova-compute proc minimum on cloudvirt1061 is 
CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:49:33] PROBLEM - nova-compute proc minimum on cloudvirt1043 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:49:33] PROBLEM - nova-compute proc minimum on cloudvirt1047 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:49:34] PROBLEM - nova-compute proc minimum on cloudvirt1056 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:49:35] PROBLEM - nova-compute proc minimum on cloudvirt1058 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:49:35] PROBLEM - nova-compute proc minimum on cloudvirt1050 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:49:39] PROBLEM - nova-compute proc minimum on cloudvirt1059 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:49:49] PROBLEM - nova-compute proc minimum on cloudvirt1053 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:49:49] PROBLEM - nova-compute proc minimum on cloudvirt1052 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* 
/usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:49:50] PROBLEM - nova-compute proc minimum on cloudvirt1048 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:49:51] PROBLEM - nova-compute proc minimum on cloudvirt1046 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:49:51] PROBLEM - nova-compute proc minimum on cloudvirt1067 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:50:03] PROBLEM - nova-compute proc minimum on cloudvirtlocal1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:50:03] PROBLEM - nova-compute proc minimum on cloudvirt1051 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:50:04] PROBLEM - nova-compute proc minimum on cloudvirt1057 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:50:04] PROBLEM - nova-compute proc minimum on cloudvirt1041 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:50:07] PROBLEM - nova-compute proc minimum on cloudvirtlocal1003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute 
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:50:09] PROBLEM - nova-compute proc minimum on cloudvirtlocal1002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:50:09] PROBLEM - nova-compute proc minimum on cloudvirt1044 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:50:09] PROBLEM - nova-compute proc minimum on cloudvirt1040 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:50:10] PROBLEM - nova-compute proc minimum on cloudvirt1054 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:50:11] PROBLEM - nova-compute proc minimum on cloudvirt1042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:50:12] PROBLEM - nova-compute proc minimum on cloudvirt1065 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:50:13] PROBLEM - nova-compute proc minimum on cloudvirt1062 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:50:24] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirtlocal1002.eqiad.wmnet' (T390914) [18:50:31] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - 
https://phabricator.wikimedia.org/T390914 [18:51:05] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1045.eqiad.wmnet' (T390914) [18:51:06] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1046.eqiad.wmnet' (T390914) [18:51:46] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Data-Services, 06Data-Persistence, 13Patch-For-Review: [wikireplicas] Remove maintainviews and maintainindexes users - https://phabricator.wikimedia.org/T395432#10865473 (10fnegri) [18:52:09] RECOVERY - nova-compute proc minimum on cloudvirtlocal1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:52:51] RECOVERY - nova-compute proc minimum on cloudvirt1046 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:53:09] RECOVERY - nova-compute proc minimum on cloudvirt1040 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:53:11] FIRING: [3x] SystemdUnitDown: The service unit nova-fullstack.service is in failed status on host cloudcontrol1007. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [18:53:23] PROBLEM - nova-compute proc maximum on cloudvirt1059 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:53:29] PROBLEM - nova-compute proc maximum on cloudvirt1041 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:53:29] PROBLEM - nova-compute proc maximum on cloudvirt1054 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:53:30] PROBLEM - nova-compute proc maximum on cloudvirt1067 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:53:31] PROBLEM - nova-compute proc maximum on cloudvirtlocal1001 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:53:33] PROBLEM - nova-compute proc maximum on cloudvirt1053 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:53:33] RECOVERY - nova-compute proc minimum on cloudvirt1043 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:53:34] PROBLEM - nova-compute proc maximum on cloudvirt1056 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex 
args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:53:49] PROBLEM - nova-compute proc maximum on cloudvirtlocal1003 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:53:49] PROBLEM - nova-compute proc maximum on cloudvirt1044 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:54:01] PROBLEM - nova-compute proc maximum on cloudvirt1052 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:54:03] PROBLEM - nova-compute proc maximum on cloudvirt1048 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:54:04] PROBLEM - nova-compute proc maximum on cloudvirt1047 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:54:04] PROBLEM - nova-compute proc maximum on cloudvirt1051 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:54:04] PROBLEM - nova-compute proc maximum on cloudvirt1050 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:54:09] PROBLEM - nova-compute proc maximum on cloudvirt1061 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex 
args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:54:09] PROBLEM - nova-compute proc maximum on cloudvirt1057 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:54:10] PROBLEM - nova-compute proc maximum on cloudvirt1049 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:54:10] PROBLEM - nova-compute proc maximum on cloudvirt1058 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:54:11] PROBLEM - nova-compute proc maximum on cloudvirt1065 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:54:12] PROBLEM - nova-compute proc maximum on cloudvirt1062 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:54:17] PROBLEM - nova-compute proc maximum on cloudvirt1042 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:56:51] PROBLEM - nova-compute proc minimum on cloudvirt1046 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:57:35] RECOVERY - nova-compute proc minimum on cloudvirt1050 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* 
/usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:57:36] RECOVERY - nova-compute proc minimum on cloudvirt1058 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:57:38] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1046.eqiad.wmnet' (T390914) [18:57:39] RECOVERY - nova-compute proc minimum on cloudvirt1059 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:57:40] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1047.eqiad.wmnet' (T390914) [18:57:45] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [18:57:49] RECOVERY - nova-compute proc minimum on cloudvirt1053 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:57:49] RECOVERY - nova-compute proc maximum on cloudvirt1044 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:57:49] RECOVERY - nova-compute proc minimum on cloudvirt1048 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:57:50] RECOVERY - nova-compute proc minimum on cloudvirt1052 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:57:51] RECOVERY - nova-compute proc minimum on cloudvirt1046 is OK: PROCS OK: 1 process with 
regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:57:52] RECOVERY - nova-compute proc minimum on cloudvirt1067 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:57:54] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirtlocal1002.eqiad.wmnet' (T390914) [18:58:01] RECOVERY - nova-compute proc maximum on cloudvirt1052 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:03] RECOVERY - nova-compute proc maximum on cloudvirt1047 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:03] RECOVERY - nova-compute proc maximum on cloudvirt1051 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:04] RECOVERY - nova-compute proc minimum on cloudvirt1041 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:04] RECOVERY - nova-compute proc maximum on cloudvirt1048 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:05] RECOVERY - nova-compute proc minimum on cloudvirt1051 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:06] RECOVERY - nova-compute proc minimum on cloudvirt1057 is 
OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:07] RECOVERY - nova-compute proc maximum on cloudvirt1050 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:09] RECOVERY - nova-compute proc minimum on cloudvirt1042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:09] RECOVERY - nova-compute proc minimum on cloudvirt1044 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:10] RECOVERY - nova-compute proc maximum on cloudvirt1057 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:11] RECOVERY - nova-compute proc maximum on cloudvirt1061 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:12] RECOVERY - nova-compute proc minimum on cloudvirt1054 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:13] RECOVERY - nova-compute proc maximum on cloudvirt1049 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:14] RECOVERY - nova-compute proc maximum on cloudvirt1058 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute 
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:15] RECOVERY - nova-compute proc minimum on cloudvirt1065 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:16] RECOVERY - nova-compute proc maximum on cloudvirt1062 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:17] RECOVERY - nova-compute proc maximum on cloudvirt1065 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:18] RECOVERY - nova-compute proc minimum on cloudvirt1062 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:19] RECOVERY - nova-compute proc maximum on cloudvirt1042 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:21] RECOVERY - nova-compute proc minimum on cloudvirt1049 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:23] RECOVERY - nova-compute proc maximum on cloudvirt1059 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:29] RECOVERY - nova-compute proc maximum on cloudvirt1041 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:29] RECOVERY - nova-compute proc minimum on cloudvirt1061 
is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:29] RECOVERY - nova-compute proc maximum on cloudvirt1054 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:30] RECOVERY - nova-compute proc maximum on cloudvirt1067 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:33] RECOVERY - nova-compute proc maximum on cloudvirt1053 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:34] RECOVERY - nova-compute proc minimum on cloudvirt1047 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:34] RECOVERY - nova-compute proc minimum on cloudvirt1056 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:58:34] RECOVERY - nova-compute proc maximum on cloudvirt1056 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:00:09] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirtlocal1003.eqiad.wmnet' (T390914) [19:01:41] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Data-Services, 06Data-Persistence, 13Patch-For-Review: [wikireplicas] Remove maintainviews and maintainindexes users - https://phabricator.wikimedia.org/T395432#10865521 (10fnegri) > I'm confused by the fact that you can 
(optionally) specify a user/password when... [19:02:08] RECOVERY - nova-compute proc minimum on cloudvirtlocal1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:02:32] RECOVERY - nova-compute proc maximum on cloudvirtlocal1001 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:02:48] RECOVERY - nova-compute proc maximum on cloudvirtlocal1003 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:03:04] RECOVERY - nova-compute proc minimum on cloudvirtlocal1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:03:53] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1047.eqiad.wmnet' (T390914) [19:03:54] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1048.eqiad.wmnet' (T390914) [19:03:59] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [19:06:49] RESOLVED: [2x] NeutronAgentDown: Neutron neutron-metadata-agent on cloudnet1005 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [19:07:12] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirtlocal1003.eqiad.wmnet' (T390914) [19:08:54] !log andrew@cloudcumin1001 admin START - Cookbook 
wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudbackup1003.eqiad.wmnet' (T390914) [19:09:48] PROBLEM - nova-compute proc minimum on cloudvirt1048 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:10:05] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1048.eqiad.wmnet' (T390914) [19:10:06] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1049.eqiad.wmnet' (T390914) [19:10:11] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [19:10:48] RECOVERY - nova-compute proc minimum on cloudvirt1048 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:11:51] (close) chuckonwumelu: [cli] Adding warning message for beta [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/31 (https://phabricator.wikimedia.org/T394277) [19:15:34] FIRING: DiskSpace: Disk space cloudcontrol1011:9100:/ 5.806% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudcontrol1011 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [19:17:01] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1049.eqiad.wmnet' (T390914) [19:17:03] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1050.eqiad.wmnet' (T390914) [19:17:08] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914
[19:18:23] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudbackup1003.eqiad.wmnet' (T390914) [19:18:52] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudbackup1004.eqiad.wmnet' (T390914) [19:20:34] RESOLVED: DiskSpace: Disk space cloudcontrol1011:9100:/ 5.801% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudcontrol1011 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [19:23:28] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1050.eqiad.wmnet' (T390914) [19:23:29] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1051.eqiad.wmnet' (T390914) [19:23:34] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [19:28:30] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudbackup1004.eqiad.wmnet' (T390914) [19:28:36] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [19:28:45] (update) don-vip: Draft: Migrate many media to a common media table [toolforge-repos/spacemedia] - https://gitlab.wikimedia.org/toolforge-repos/spacemedia/-/merge_requests/2 [19:29:15] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudbackup1001-dev.eqiad.wmnet' (T390914) [19:30:17] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1051.eqiad.wmnet' (T390914) [19:30:18] !log andrew@cloudcumin1001 admin START - Cookbook
wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1052.eqiad.wmnet' (T390914) [19:33:11] RESOLVED: [2x] SystemdUnitDown: The service unit labs-ip-alias-dump.service is in failed status on host cloudservices1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [19:35:27] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudbackup1001-dev.eqiad.wmnet' (T390914) [19:35:34] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [19:36:50] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1052.eqiad.wmnet' (T390914) [19:36:51] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1053.eqiad.wmnet' (T390914) [19:37:36] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudbackup1002-dev.eqiad.wmnet' (T390914) [19:42:56] FIRING: SystemdUnitDown: The systemd unit prometheus-node-textfile-wmcs-bastionless.service on node cloudcontrol1007 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [19:43:02] cloud-services-team: SystemdUnitDown The systemd unit prometheus-node-textfile-wmcs-bastionless.service on node cloudcontrol1007 has been failing for more than two hours.
- https://phabricator.wikimedia.org/T395515 (phaultfinder) NEW [19:43:05] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1053.eqiad.wmnet' (T390914) [19:43:07] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1054.eqiad.wmnet' (T390914) [19:43:13] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [19:43:32] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudbackup1002-dev.eqiad.wmnet' (T390914) [19:49:14] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1054.eqiad.wmnet' (T390914) [19:49:15] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1055.eqiad.wmnet' (T390914) [19:49:20] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [19:55:21] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1055.eqiad.wmnet' (T390914) [19:55:22] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1056.eqiad.wmnet' (T390914) [19:55:28] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [20:01:10] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all services [20:02:58] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1056.eqiad.wmnet' (T390914) [20:02:59] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host
'cloudvirt1057.eqiad.wmnet' (T390914) [20:03:03] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [20:08:57] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [20:09:48] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1057.eqiad.wmnet' (T390914) [20:09:49] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1058.eqiad.wmnet' (T390914) [20:09:55] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [20:13:07] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for all services [20:13:57] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [20:16:08] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1058.eqiad.wmnet' (T390914) [20:16:10] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1059.eqiad.wmnet' (T390914) [20:16:15] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [20:17:28] FIRING: InstanceDown: Project cvn instance cvn-app13 is down - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [20:23:12] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1059.eqiad.wmnet' (T390914) [20:23:13] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1060.eqiad.wmnet' (T390914) [20:23:23] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [20:27:28] RESOLVED: InstanceDown: Project cvn instance cvn-app13 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [20:29:56] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1060.eqiad.wmnet' (T390914) [20:29:57] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1061.eqiad.wmnet' (T390914) [20:30:04] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [20:36:07] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1061.eqiad.wmnet' (T390914) [20:36:08] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1062.eqiad.wmnet' (T390914) [20:36:14] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [20:43:18] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1062.eqiad.wmnet' (T390914) [20:43:19] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1063.eqiad.wmnet' (T390914) [20:43:30] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [20:49:14] !log 
andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1063.eqiad.wmnet' (T390914) [20:49:15] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1064.eqiad.wmnet' (T390914) [20:49:20] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [20:55:49] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1064.eqiad.wmnet' (T390914) [20:55:50] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1065.eqiad.wmnet' (T390914) [20:55:55] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [20:57:40] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044) [21:02:45] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1065.eqiad.wmnet' (T390914) [21:02:47] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1066.eqiad.wmnet' (T390914) [21:02:52] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [21:09:55] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1066.eqiad.wmnet' (T390914) [21:09:56] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1067.eqiad.wmnet' (T390914) [21:10:02] T390914: Upgrade
cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [21:17:09] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1067.eqiad.wmnet' (T390914) [21:17:10] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1068.eqiad.wmnet' (T390914) [21:17:17] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [21:23:38] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1068.eqiad.wmnet' (T390914) [21:23:39] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1069.eqiad.wmnet' (T390914) [21:23:46] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [21:30:14] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1069.eqiad.wmnet' (T390914) [21:30:15] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1070.eqiad.wmnet' (T390914) [21:30:20] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [21:37:10] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1070.eqiad.wmnet' (T390914) [21:37:11] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1071.eqiad.wmnet' (T390914) [21:37:16] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [21:43:13] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 
'cloudvirt1071.eqiad.wmnet' (T390914) [21:43:14] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1072.eqiad.wmnet' (T390914) [21:43:20] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [21:48:50] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1072.eqiad.wmnet' (T390914) [21:48:51] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1073.eqiad.wmnet' (T390914) [21:48:57] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [21:51:35] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044) [21:54:33] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1073.eqiad.wmnet' (T390914) [21:54:34] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1074.eqiad.wmnet' (T390914) [21:54:40] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [22:01:12] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1074.eqiad.wmnet' (T390914) [22:01:13] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1075.eqiad.wmnet' (T390914) [22:01:18] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [22:07:07] !log andrew@cloudcumin1001 admin
END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1075.eqiad.wmnet' (T390914) [22:07:08] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1076.eqiad.wmnet' (T390914) [22:07:13] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [22:12:17] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1076.eqiad.wmnet' (T390914) [22:12:23] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914 [22:19:57] (update) addshore: Draft: Components [repos/cloud/toolforge/toolforge-gen-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-gen-cli/-/merge_requests/2 [22:48:53] (PS1) Krinkle: Revert "Move CVNBot17 (cvn-wikivoyage) from cvn-app12 to cvn-app13" [labs/countervandalism/stillalive] - https://gerrit.wikimedia.org/r/1151809 (https://phabricator.wikimedia.org/T395164) [22:49:00] (CR) Krinkle: [C:+2] Revert "Move CVNBot17 (cvn-wikivoyage) from cvn-app12 to cvn-app13" [labs/countervandalism/stillalive] - https://gerrit.wikimedia.org/r/1151809 (https://phabricator.wikimedia.org/T395164) (owner: Krinkle) [22:49:24] (Merged) jenkins-bot: Revert "Move CVNBot17 (cvn-wikivoyage) from cvn-app12 to cvn-app13" [labs/countervandalism/stillalive] - https://gerrit.wikimedia.org/r/1151809 (https://phabricator.wikimedia.org/T395164) (owner: Krinkle) [23:05:13] (open) raymond-ndibe: [builds-api] dummy PR to force-create a new release [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/134 [23:05:18] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[23:07:16] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29763 bytes in 7.581 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static [23:11:18] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static [23:20:08] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29762 bytes in 1.065 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static [23:44:21] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static [23:55:43] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)