[00:11:25] FIRING: TfInfraTestApplyFailed: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[00:21:57] superset.wmcloud.org: query is very slow in superset compared to running it directly - https://phabricator.wikimedia.org/T367676 (Zache) NEW
[01:12:57] FIRING: [3x] CloudVPSDesignateLeaks: Detected 27 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:26:22] Tool-citationanalyzer: Citation Analyzer is currently unavailable - https://phabricator.wikimedia.org/T367489#9896663 (Pppery) Presumably #tool-citationanalyzer should be archived as the tool's sole maintainer was blocked for abusing Toolforge resources.
[04:53:32] Quarry: [bug] Quarry queries not completing - https://phabricator.wikimedia.org/T367464#9896698 (Liz) Well, maybe it depends on the database, queries aren't running on enwiki_p. But I've noticed that editors have just given up on running Quarry queries since about 2 days ago when it stopped working
[05:12:57] FIRING: [3x] CloudVPSDesignateLeaks: Detected 28 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:29:23] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "recommendation-api" project Buster deprecation - https://phabricator.wikimedia.org/T367549#9897089 (Nikerabbit) We're targeting a fix for {T365347} to be included in next week train.
[09:11:36] Toolforge: `toolforge jobs load jobs.yaml` crashes - https://phabricator.wikimedia.org/T367520#9897472 (Slst2020) Hmm, I have tested this in lima-kilo with jobs-api versions 0.0.307 and 0.0.308, and I am so far unable to reproduce this bug with either version. ` slavina@lima-kilo:/home/slavina$ helm lis...
[09:12:57] FIRING: [3x] CloudVPSDesignateLeaks: Detected 27 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:22:31] FIRING: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-3 is lagging behind the primary, the current lag is 66469 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[09:24:31] RESOLVED: ToolsToolsDBReplicationError: ToolsDB replication is broken on tools-db-3 (errno 1595) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationError
[09:24:31] RESOLVED: ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-1 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing
[09:25:22] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "research-collaborations-api" project Buster deprecation - https://phabricator.wikimedia.org/T367551#9897536 (MunizaA) Hi @Isaac, thanks for the ping! There is a cron job that runs every month and imports the latest clickstream dump into sqlite but this could...
[09:26:46] Toolforge: `toolforge jobs load jobs.yaml` crashes - https://phabricator.wikimedia.org/T367520#9897548 (Slst2020) I should probably learn to read properly, y'all were testing with invalid yaml files and there's a reason this task was closed as invalid. (TIL there's no facepalm emoji in Phabricator) xd
[09:28:07] cloud-services-team (FY2023/2024-Q1-Q2), Data-Services: [toolsdb] Replication stopped because of invalid event - https://phabricator.wikimedia.org/T351457#9897551 (fnegri) This happened again yesterday. Similar to the previous occurrences, `START SLAVE;` was enough to resume replication. ` Jun 16 14...
[09:36:49] (CR) Arturo Borrero Gonzalez: [C:+1] "LGTM." [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1043744 (https://phabricator.wikimedia.org/T326373) (owner: Majavah)
[09:41:19] (PS7) Majavah: openstack: Add cookbook to migrate a server to OVS [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1043744 (https://phabricator.wikimedia.org/T326373)
[09:41:51] (CR) Majavah: [C:+2] openstack: cloudnet: Remove migrate_to_ovs [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1043743 (owner: Majavah)
[09:44:59] (Merged) jenkins-bot: openstack: cloudnet: Remove migrate_to_ovs [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1043743 (owner: Majavah)
[09:53:48] Cloud-VPS (Debian Buster Deprecation), Infrastructure-Foundations: Cloud VPS "packaging" project Buster deprecation - https://phabricator.wikimedia.org/T367544#9897657 (Jelto) > packager02.packaging.eqiad1.wikimedia.cloud According to the [etherpad upgrade docs](https://wikitech.wikimedia.org/wiki/Ether...
[10:00:20] (open) brouberol: Remove Quarry related messages from #wikimedia-analytics [toolforge-repos/wikibugs2] - https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/46
[10:02:51] (approved) btullis: Remove Quarry related messages from #wikimedia-analytics [toolforge-repos/wikibugs2] - https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/46 (owner: brouberol)
[10:04:56] FIRING: [2x] SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:08:26] (CR) Majavah: [C:+2] openstack: Add cookbook to migrate a server to OVS (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1043744 (https://phabricator.wikimedia.org/T326373) (owner: Majavah)
[10:09:37] Cloud-VPS (Debian Buster Deprecation), collaboration-services: Cloud VPS "packaging" project Buster deprecation - https://phabricator.wikimedia.org/T367544#9897698 (MoritzMuehlenhoff)
[10:09:56] FIRING: [2x] SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:11:26] (Merged) jenkins-bot: openstack: Add cookbook to migrate a server to OVS [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1043744 (https://phabricator.wikimedia.org/T326373) (owner: Majavah)
[10:11:51] Cloud-VPS (Debian Buster Deprecation), collaboration-services: Cloud VPS "packaging" project Buster deprecation - https://phabricator.wikimedia.org/T367544#9897697 (MoritzMuehlenhoff) >>! In T367544#9897657, @Jelto wrote: >> packager02.packaging.eqiad1.wikimedia.cloud > > According to the [etherpad upgr...
[10:14:56] FIRING: [2x] SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:17:12] !log taavi@cloudcumin1001 project-proxy START - Cookbook wmcs.openstack.migrate_server_to_ovs for server proxy-04
[10:18:22] !log taavi@cloudcumin1001 project-proxy END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server proxy-04
[10:19:56] RESOLVED: [2x] SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:21:38] (merge) brouberol: Remove Quarry related messages from #wikimedia-analytics [toolforge-repos/wikibugs2] - https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/46
[10:21:55] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: Migrate WMCS managed projects to g4 flavors - https://phabricator.wikimedia.org/T367723 (taavi) NEW
[10:22:44] !log taavi@cloudcumin1001 project-proxy START - Cookbook wmcs.openstack.migrate_server_to_ovs for server maps-proxy-04
[10:23:59] !log taavi@cloudcumin1001 project-proxy END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server maps-proxy-04
[10:24:17] !log taavi@cloudcumin1001 project-proxy START - Cookbook wmcs.openstack.migrate_server_to_ovs for server project-proxy-acme-chief-02
[10:26:19] !log taavi@cloudcumin1001 project-proxy END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server project-proxy-acme-chief-02
[10:26:47] !log taavi@cloudcumin1001 project-proxy START - Cookbook wmcs.openstack.migrate_server_to_ovs for server project-proxy-puppetserver-1
[10:27:42] FIRING: [3x] CloudVPSDesignateLeaks: Detected 7 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:27:45] cloud-services-team (FY2023/2024-Q3-Q4), superset.wmcloud.org: Allow Superset to query ToolsDB public databases - https://phabricator.wikimedia.org/T367393#9897767 (fnegri) Open→In progress
[10:27:48] cloud-services-team (FY2023/2024-Q3-Q4), superset.wmcloud.org: Allow Superset to query ToolsDB public databases - https://phabricator.wikimedia.org/T367393#9897769 (fnegri) p:Triage→Medium
[10:28:03] !log taavi@cloudcumin1001 project-proxy END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server project-proxy-puppetserver-1
[10:30:54] Cloud-VPS: Get rid of cloud-cumin VMs in cloudinfra project - https://phabricator.wikimedia.org/T367725 (taavi) NEW
[10:32:22] !log taavi@cloudcumin1001 cloudinfra START - Cookbook wmcs.openstack.migrate_server_to_ovs for server cloud-cumin-03
[10:33:12] !log taavi@cloudcumin1001 cloudinfra END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server cloud-cumin-03
[10:33:18] !log taavi@cloudcumin1001 cloudinfra START - Cookbook wmcs.openstack.migrate_server_to_ovs for server cloud-cumin-04
[10:33:59] !log taavi@cloudcumin1001 cloudinfra END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server cloud-cumin-04
[10:34:24] !log taavi@cloudcumin1001 cloudinfra START - Cookbook wmcs.openstack.migrate_server_to_ovs for server cloudinfra-acme-chief-02
[10:34:28] !log taavi@cloudcumin1001 cloudinfra END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server cloudinfra-acme-chief-02
[10:37:42] FIRING: [3x] CloudVPSDesignateLeaks: Detected 7 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:39:04] (open) taavi: cloudvps_flavors: Add missing g3 flavors [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/4
[10:39:21] !log taavi@cloudcumin1001 cloudinfra START - Cookbook wmcs.openstack.migrate_server_to_ovs for server ntp-03
[10:40:06] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1035.eqiad.wmnet'
[10:40:30] !log taavi@cloudcumin1001 cloudinfra END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server ntp-03
[10:40:42] !log taavi@cloudcumin1001 cloudinfra START - Cookbook wmcs.openstack.migrate_server_to_ovs for server syslog-server-audit01
[10:41:16] (open) aborrero: helpers: add toolforge_kyverno_load_many_resources.sh [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/142 (https://phabricator.wikimedia.org/T367386)
[10:41:50] !log taavi@cloudcumin1001 cloudinfra END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server syslog-server-audit01
[10:48:34] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.openstack.migrate_server_to_ovs for server toolsbeta-static-2
[10:49:45] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server toolsbeta-static-2
[10:49:58] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.openstack.migrate_server_to_ovs for server toolsbeta-acme-chief-2
[10:50:58] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server toolsbeta-acme-chief-2
[10:51:14] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.openstack.migrate_server_to_ovs for server toolsbeta-docker-imagebuilder-2
[10:52:22] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server toolsbeta-docker-imagebuilder-2
[10:53:22] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1035.eqiad.wmnet'
[10:57:35] (open) aborrero: k9s: update to latest version 0.32.5 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/143
[11:01:55] (update) aborrero: helpers: add toolforge_kyverno_load_many_resources.sh [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/142 (https://phabricator.wikimedia.org/T367386)
[11:05:56] FIRING: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:09:22] (update) sstefanova: k9s: update to latest version 0.32.5 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/143 (owner: aborrero)
[11:09:25] (approved) sstefanova: k9s: update to latest version 0.32.5 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/143 (owner: aborrero)
[11:10:56] RESOLVED: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:16:56] FIRING: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:21:56] RESOLVED: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:22:31] RESOLVED: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-3 is lagging behind the primary, the current lag is 3703 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[11:22:36] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.openstack.migrate_server_to_ovs for server toolsbeta-bastion-6
[11:23:52] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server toolsbeta-bastion-6
[11:25:00] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.openstack.migrate_server_to_ovs for server toolsbeta-mail-2
[11:26:09] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server toolsbeta-mail-2
[11:26:42] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.openstack.migrate_server_to_ovs for server toolsbeta-prometheus-1
[11:27:51] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server toolsbeta-prometheus-1
[11:28:02] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.openstack.migrate_server_to_ovs for server toolsbeta-proxy-5
[11:29:03] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server toolsbeta-proxy-5
[11:29:26] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.openstack.migrate_server_to_ovs for server toolsbeta-proxy-6
[11:30:36] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server toolsbeta-proxy-6
[11:31:50] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.openstack.migrate_server_to_ovs for server toolsbeta-puppetdb-03
[11:32:49] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server toolsbeta-puppetdb-03
[11:33:39] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.openstack.migrate_server_to_ovs for server toolsbeta-puppetserver-1
[11:34:49] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server toolsbeta-puppetserver-1
[11:38:19] (merge) aborrero: k9s: update to latest version 0.32.5 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/143
[11:39:41] (update) aborrero: helpers: add toolforge_kyverno_load_many_resources.sh [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/142 (https://phabricator.wikimedia.org/T367386)
[11:40:02] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.openstack.migrate_server_to_ovs for server toolsbeta-harbor-1
[11:41:21] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server toolsbeta-harbor-1
[11:41:31] (approved) aborrero: cloudvps_flavors: Add missing g3 flavors [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/4 (owner: taavi)
[11:41:43] Cloud-VPS (Debian Buster Deprecation), collaboration-services: Cloud VPS "packaging" project Buster deprecation - https://phabricator.wikimedia.org/T367544#9898048 (akosiaris) >>! In T367544#9897657, @Jelto wrote: >> packager02.packaging.eqiad1.wikimedia.cloud > > According to the [etherpad upgrade docs...
[11:41:46] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.openstack.migrate_server_to_ovs for server toolsbeta-legacy-redirector-2
[11:42:01] (merge) taavi: cloudvps_flavors: Add missing g3 flavors [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/4
[11:42:55] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server toolsbeta-legacy-redirector-2
[11:49:12] (PS1) Majavah: openstack: Fix comment [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1046632
[11:49:12] (PS1) Majavah: openstack: Add new g4 flavors to allowlist [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1046633
[11:51:03] (PS3) Majavah: openstack: ensure_canary: Use new g4 flavor for canary instances [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1043149 (owner: Andrew Bogott)
[11:52:04] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.openstack.migrate_server_to_ovs for server toolsbeta-test-k8s-haproxy-5
[11:52:48] (CR) Majavah: [C:+2] openstack: Fix comment [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1046632 (owner: Majavah)
[11:52:51] (CR) Majavah: [C:+2] openstack: Add new g4 flavors to allowlist [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1046633 (owner: Majavah)
[11:53:14] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server toolsbeta-test-k8s-haproxy-5
[11:53:53] (CR) Majavah: [C:+2] openstack: ensure_canary: Use new g4 flavor for canary instances (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1043149 (owner: Andrew Bogott)
[11:55:59] (Merged) jenkins-bot: openstack: Fix comment [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1046632 (owner: Majavah)
[11:55:59] (Merged) jenkins-bot: openstack: Add new g4 flavors to allowlist [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1046633 (owner: Majavah)
[11:56:35] (Merged) jenkins-bot: openstack: ensure_canary: Use new g4 flavor for canary instances [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1043149 (owner: Andrew Bogott)
[11:58:53] !log taavi@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate True, for hosts list: ['cloudvirt1031', 'cloudvirt1032', 'cloudvirt1033', 'cloudvirt1034']
[11:59:12] cloud-services-team, Toolforge, Sustainability (Incident Followup): [k8s,infra] Verify that kyverno policies are evaluated only for namespaced resources - https://phabricator.wikimedia.org/T367350#9898087 (aborrero) In progress→Resolved None of my poking around have demonstrated that the...
[12:00:11] !log taavi@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate True, for hosts list: ['cloudvirt1031', 'cloudvirt1032', 'cloudvirt1033', 'cloudvirt1034']
[12:00:22] !log taavi@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate True, for hosts list: ['cloudvirt1035']
[12:00:40] (close) sstefanova: d/changelog: bump to 16.0.11 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/41 (https://phabricator.wikimedia.org/T366674)
[12:00:41] !log taavi@cloudcumin1001 cloudvirt-canary END (FAIL) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=99) on eqiad1, with recreate True, for hosts list: ['cloudvirt1035']
[12:01:00] (reopen) sstefanova: d/changelog: bump to 16.0.11 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/39 (https://phabricator.wikimedia.org/T366674)
[12:01:13] (update) sstefanova: d/changelog: bump to 16.0.11 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/39 (https://phabricator.wikimedia.org/T366674)
[12:01:51] !log taavi@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate True, for hosts list: ['cloudvirt1035']
[12:02:13] !log taavi@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate True, for hosts list: ['cloudvirt1035']
[12:02:52] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance
[12:03:02] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=0)
[12:03:57] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9898112 (taavi)
[12:04:13] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.openstack.migrate_server_to_ovs for server toolsbeta-test-k8s-worker-10
[12:05:23] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server toolsbeta-test-k8s-worker-10
[12:05:53] (update) aborrero: helpers: add toolforge_kyverno_load_many_resources.sh [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/142 (https://phabricator.wikimedia.org/T367386)
[12:12:31] (PS1) Majavah: openstack: Add cookbook to migrate an entire project to OVS [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1046644 (https://phabricator.wikimedia.org/T326373)
[12:13:26] (PS2) Majavah: openstack: Add cookbook to migrate an entire project to OVS [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1046644 (https://phabricator.wikimedia.org/T326373)
[12:13:45] !log taavi@cloudcumin1001 testlabs START - Cookbook wmcs.openstack.migrate_project_to_ovs
[12:13:49] !log taavi@cloudcumin1001 testlabs END (FAIL) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=99)
[12:14:28] (PS3) Majavah: openstack: Add cookbook to migrate an entire project to OVS [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1046644 (https://phabricator.wikimedia.org/T326373)
[12:14:49] !log taavi@cloudcumin1001 testlabs START - Cookbook wmcs.openstack.migrate_project_to_ovs
[12:14:53] !log taavi@cloudcumin1001 testlabs END (ERROR) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=2)
[12:15:27] (PS4) Majavah: openstack: Add cookbook to migrate an entire project to OVS [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1046644 (https://phabricator.wikimedia.org/T326373)
[12:15:32] !log taavi@cloudcumin1001 testlabs START - Cookbook wmcs.openstack.migrate_project_to_ovs
[12:18:20] (CR) CI reject: [V:-1] openstack: Add cookbook to migrate an entire project to OVS [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1046644 (https://phabricator.wikimedia.org/T326373) (owner: Majavah)
[12:22:45] (PS5) Majavah: openstack: Add cookbook to migrate an entire project to OVS [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1046644 (https://phabricator.wikimedia.org/T326373)
[12:23:53] !log taavi@cloudcumin1001 testlabs END (FAIL) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=99)
[12:24:09] !log taavi@cloudcumin1001 testlabs START - Cookbook wmcs.openstack.migrate_project_to_ovs
[12:24:31] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/12
[12:25:19] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1036.eqiad.wmnet'
[12:25:38] (CR) CI reject: [V:-1] openstack: Add cookbook to migrate an entire project to OVS [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1046644 (https://phabricator.wikimedia.org/T326373) (owner: Majavah)
[12:26:45] !log taavi@cloudcumin1001 testlabs END (FAIL) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=1)
[12:29:45] Cloud-VPS (Debian Buster Deprecation), collaboration-services: Cloud VPS "packaging" project Buster deprecation - https://phabricator.wikimedia.org/T367544#9898244 (Jelto) It might be possible to schedule another Etherpad upgrade (T362432) before the packager02.packaging.eqiad1.wikimedia.cloud host is de...
[12:30:18] !log taavi@cloudcumin1001 terraform START - Cookbook wmcs.openstack.migrate_project_to_ovs
[12:30:41] (merge) sstefanova: d/changelog: bump to 16.0.11 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/39 (https://phabricator.wikimedia.org/T366674)
[12:34:21] !log taavi@cloudcumin1001 terraform END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0)
[12:34:44] (open) aborrero: kyverno: reintroduce resource limits [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/337 (https://phabricator.wikimedia.org/T367386)
[12:34:51] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1036.eqiad.wmnet'
[12:35:05] !log taavi@cloudcumin1001 testlabs START - Cookbook wmcs.openstack.migrate_server_to_ovs for server testlabs-nfs-1
[12:36:16] !log taavi@cloudcumin1001 testlabs END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server testlabs-nfs-1
[12:36:47] (open) aborrero: k8s: deploy wmcs-k8s-metrics [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/144
[12:39:21] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: Migrate WMCS managed projects to g4 flavors - https://phabricator.wikimedia.org/T367723#9898331 (taavi)
[12:43:16] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9898383 (taavi)
[12:47:07] Cloud-VPS (Debian Buster Deprecation), collaboration-services: Cloud VPS "packaging" project Buster deprecation - https://phabricator.wikimedia.org/T367544#9898401 (JMeybohm) > builder-envoy-03.packaging.eqiad1.wikimedia.cloud Any objections to just remove the VM since we moved to (re-)packaging upstrea...
[12:48:10] Cloud-VPS (Debian Buster Deprecation), collaboration-services: Cloud VPS "packaging" project Buster deprecation - https://phabricator.wikimedia.org/T367544#9898402 (akosiaris) >>! In T367544#9898401, @JMeybohm wrote: >> builder-envoy-03.packaging.eqiad1.wikimedia.cloud > > Any objections to just remove...
[12:50:22] (update) aborrero: kyverno: reintroduce resource limits [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/337 (https://phabricator.wikimedia.org/T367386)
[12:52:09] (update) aborrero: kyverno: reintroduce resource limits [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/337 (https://phabricator.wikimedia.org/T367386)
[13:00:05] (PS1) Btullis: Add a Cephx user key for the cephcsi plugin to use [labs/private] - https://gerrit.wikimedia.org/r/1046666 (https://phabricator.wikimedia.org/T327259)
[13:00:28] (PS2) Btullis: Add a dummy Cephx user key for the cephcsi plugin to use [labs/private] - https://gerrit.wikimedia.org/r/1046666 (https://phabricator.wikimedia.org/T327259)
[13:01:44] Quarry: Deduplicate config load - https://phabricator.wikimedia.org/T349135#9898459 (github-toolforge-bot) siddharthvp closed https://github.com/toolforge/quarry/pull/55
[13:01:50] siddharthvp closed https://github.com/toolforge/quarry/pull/55
[13:02:15] (update) aborrero: kyverno: reintroduce resource limits [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/337 (https://phabricator.wikimedia.org/T367386)
[13:02:47] (update) sstefanova: k8s: deploy wmcs-k8s-metrics [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/144 (owner: aborrero)
[13:02:56] (approved) sstefanova: k8s: deploy wmcs-k8s-metrics [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/144 (owner: aborrero)
[13:03:54] (open) sstefanova: d/changelog: bump to 16.0.12 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/42 (https://phabricator.wikimedia.org/T366674)
[13:17:53] (CR) Btullis: [V:+2 C:+2] Add a dummy Cephx user key for the cephcsi plugin to use [labs/private] - https://gerrit.wikimedia.org/r/1046666 (https://phabricator.wikimedia.org/T327259) (owner: Btullis)
[13:18:49] !log taavi@cloudcumin1001 devportal START - Cookbook wmcs.openstack.migrate_project_to_ovs
[13:20:02] !log taavi@cloudcumin1001 devportal END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0)
[13:26:49] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.openstack.migrate_server_to_ovs for server toolsbeta-test-k8s-ingress-7
[13:28:52] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server toolsbeta-test-k8s-ingress-7
[13:50:23] !log taavi@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate True, for hosts list: ['cloudvirt1036']
[13:50:43] !log taavi@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate True, for hosts list: ['cloudvirt1036']
[13:52:14] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance
[13:52:23] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=0)
[13:52:35] Tools: CitationBot v2 - https://phabricator.wikimedia.org/T367737#9898776 (Aklapper) Open→Invalid @RukminiInduru: Hi, please tell your fellow students and teacher(s) to please stop creating more such similar tickets in Phabricator. https://phabricator.wikimedia.org/maniphest/?ids=367489,367292,36...
[13:52:49] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9898784 (taavi)
[14:00:50] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "recommendation-api" project Buster deprecation - https://phabricator.wikimedia.org/T367549#9898802 (Isaac) > We're targeting a fix for T365347: Update endpoints used in Content and Section Translation to use the LiftWing version of the Recommendation API to b...
[14:06:16] cloud-services-team, wikitech.wikimedia.org, Infrastructure-Foundations, LDAP, Patch-For-Review: Update Wikitech's LDAP credentials to be read-only - https://phabricator.wikimedia.org/T367287#9898819 (MoritzMuehlenhoff) p:Triage→Medium
[14:18:05] !log taavi@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate True, for hosts list: ['cloudvirt-wdqs1003']
[14:18:18] !log taavi@cloudcumin1001 cloudvirt-canary END (FAIL) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=99) on eqiad1, with recreate True, for hosts list: ['cloudvirt-wdqs1003']
[14:19:13] !log taavi@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate True, for hosts list: ['cloudvirt-wdqs1003']
[14:19:27] !log taavi@cloudcumin1001 cloudvirt-canary END (FAIL) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=99) on eqiad1, with recreate True, for hosts list: ['cloudvirt-wdqs1003']
[14:21:44] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: hw troubleshooting: server fails to reboot for clouddb1018.eqiad.wmnet - https://phabricator.wikimedia.org/T367499#9898915 (Jclark-ctr) Open→Resolved @Marostegui Updated idrac and bios firmware took server down to min config...
[14:26:04] 10Cloud-VPS (Project-requests): Request creation of wikimania-mautic VPS project - https://phabricator.wikimedia.org/T340439#9898931 (10Andrew) Hello @Robertsky ! Can this project now be deleted? If not, please mark it as in use on https://wikitech.wikimedia.org/wiki/News/Cloud_VPS_2024_Purge#wikimania-maut... [14:26:13] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "recommendation-api" project Buster deprecation - https://phabricator.wikimedia.org/T367549#9898947 (10Andrew) Sounds good! Thanks for keeping track of this. [14:28:45] (03merge) 10aborrero: k8s: deploy wmcs-k8s-metrics [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/144 [14:37:57] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [14:55:01] (03update) 10aborrero: kyverno: reintroduce resource limits [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/337 (https://phabricator.wikimedia.org/T367386) [15:01:00] (03update) 10aborrero: kind: add additional worker nodes to kubernetes [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/139 [15:01:06] (03update) 10aborrero: kind: add additional worker nodes to kubernetes [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/139 [15:05:26] 10Data-Services, 06Data-Persistence, 10Data-Platform-SRE (2024.06.17 - 2024.07.07): Upgrade clouddb1021 to bookworm - https://phabricator.wikimedia.org/T365450#9899186 (10Gehel) [15:05:54] 10Cloud-VPS, 07Puppet: systemd-timer-mail-wrapper should not send mail as root@wikimedia.org from Cloud VPS
- https://phabricator.wikimedia.org/T367028#9899208 (10joanna_borun) [15:07:06] 10Striker, 10Bitu, 06Infrastructure-Foundations, 07Security: Special:NovaKey should have a message not to add production keys - https://phabricator.wikimedia.org/T276761#9899238 (10joanna_borun) p:05Triage→03Low [15:07:58] (03open) 10aborrero: k8s: metrics services require cert-manager [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/145 [15:09:14] (03merge) 10aborrero: k8s: metrics services require cert-manager [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/145 [15:11:16] (03update) 10aborrero: kind: add additional worker nodes to kubernetes [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/139 [15:13:14] 14Tool-citationanalyzer: CitationBot v2 - https://phabricator.wikimedia.org/T367737#9899311 (10JJMC89) [15:17:28] (03PS6) 10Andrew Bogott: openstack: Add cookbook to migrate an entire project to OVS [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1046644 (https://phabricator.wikimedia.org/T326373) (owner: 10Majavah) [15:19:27] 10Data-Services: [toolsdb] Clean up users and manage as code - https://phabricator.wikimedia.org/T367772 (10fnegri) 03NEW [15:19:57] 10Data-Services: [toolsdb] Clean up users and manage as code - https://phabricator.wikimedia.org/T367772#9899378 (10fnegri) [15:20:03] (03CR) 10CI reject: [V:04-1] openstack: Add cookbook to migrate an entire project to OVS [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1046644 (https://phabricator.wikimedia.org/T326373) (owner: 10Majavah) [15:21:18] 06cloud-services-team, 10decommission-hardware: decommission cloudvirt-wdqs100[1,2,3] - https://phabricator.wikimedia.org/T367773 (10Andrew) 03NEW [15:21:30] 06cloud-services-team, 10decommission-hardware: decommission cloudvirt-wdqs100[1,2,3] - 
https://phabricator.wikimedia.org/T367773#9899398 (10Andrew) [15:23:22] 06cloud-services-team, 10Cloud-VPS, 06Data-Platform-SRE: Decom cloudvirt-wdqs servers - https://phabricator.wikimedia.org/T367770#9899405 (10taavi) [15:24:50] (03PS7) 10Andrew Bogott: openstack: Add cookbook to migrate an entire project to OVS [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1046644 (https://phabricator.wikimedia.org/T326373) (owner: 10Majavah) [15:27:26] (03CR) 10CI reject: [V:04-1] openstack: Add cookbook to migrate an entire project to OVS [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1046644 (https://phabricator.wikimedia.org/T326373) (owner: 10Majavah) [15:30:57] (03open) 10aborrero: lima-kilo: refresh source of lima_kilo_docker_addr [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/146 [15:35:05] (03PS8) 10Andrew Bogott: openstack: Add cookbook to migrate an entire project to OVS [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1046644 (https://phabricator.wikimedia.org/T326373) (owner: 10Majavah) [15:41:04] (03CR) 10Andrew Bogott: [C:03+2] openstack: Add cookbook to migrate an entire project to OVS [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1046644 (https://phabricator.wikimedia.org/T326373) (owner: 10Majavah) [15:41:16] 06cloud-services-team, 10decommission-hardware, 13Patch-For-Review: decommission cloudvirt-wdqs100[1,2,3] - https://phabricator.wikimedia.org/T367773#9899476 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cloudvirt-wdqs1001.eqiad.wmnet` - cloudvirt-wdqs1001.e... 
[15:43:49] (03Merged) 10jenkins-bot: openstack: Add cookbook to migrate an entire project to OVS [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1046644 (https://phabricator.wikimedia.org/T326373) (owner: 10Majavah) [15:44:00] (03update) 10aborrero: helpers: add toolforge_kyverno_load_many_resources.sh [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/142 (https://phabricator.wikimedia.org/T367386) [15:46:03] (03merge) 10aborrero: helpers: add toolforge_kyverno_load_many_resources.sh [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/142 (https://phabricator.wikimedia.org/T367386) [15:46:34] (03update) 10aborrero: lima-kilo: refresh source of lima_kilo_docker_addr [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/146 [15:48:15] !log taavi@cloudcumin1001 wmcs-uptime START - Cookbook wmcs.openstack.migrate_project_to_ovs [15:49:27] !log taavi@cloudcumin1001 wmcs-uptime END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0) [15:51:18] !log taavi@cloudcumin1001 xtools START - Cookbook wmcs.openstack.migrate_project_to_ovs [15:54:28] !log taavi@cloudcumin1001 xtools END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0) [15:56:48] 06cloud-services-team, 10decommission-hardware, 13Patch-For-Review: decommission cloudvirt-wdqs100[1,2,3] - https://phabricator.wikimedia.org/T367773#9899563 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cloudvirt-wdqs1002.eqiad.wmnet` - cloudvirt-wdqs1002.e... 
[15:57:37] 06cloud-services-team, 10decommission-hardware, 13Patch-For-Review: decommission cloudvirt-wdqs100[1,2,3] - https://phabricator.wikimedia.org/T367773#9899566 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cloudvirt-wdqs1001.eqiad.wmnet` - cloudvirt-wdqs1001.e... [16:01:31] (03approved) 10lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - 10https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/12 (owner: 10l10n-bot) [16:01:34] (03merge) 10lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - 10https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/12 (owner: 10l10n-bot) [16:05:19] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Data-Services: [wikireplicas] frequent replag spikes in clouddb1017 (s1) - https://phabricator.wikimedia.org/T367778 (10fnegri) 03NEW [16:05:23] 06cloud-services-team, 10decommission-hardware, 13Patch-For-Review: decommission cloudvirt-wdqs100[1,2,3] - https://phabricator.wikimedia.org/T367773#9899610 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cloudvirt-wdqs1001.eqiad.wmnet` - cloudvirt-wdqs1001.e... [16:05:48] !log taavi@cloudcumin1001 account-creation-assistance START - Cookbook wmcs.openstack.migrate_project_to_ovs [16:07:00] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Data-Services: [wikireplicas] frequent replag spikes in clouddb1017 (s1) - https://phabricator.wikimedia.org/T367778#9899615 (10fnegri) 05Open→03In progress p:05Triage→03High [16:07:54] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "sso" project Buster deprecation - https://phabricator.wikimedia.org/T367554#9899621 (10jbond) hi all i wanted to say that the sso project is used so that users have an SSO testing infrastructure to use in cloud services. Originally this was also used to provid... 
[16:10:36] !log taavi@cloudcumin1001 account-creation-assistance END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0) [16:11:43] !log taavi@cloudcumin1001 adiutor START - Cookbook wmcs.openstack.migrate_project_to_ovs [16:14:13] !log taavi@cloudcumin1001 adiutor END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0) [16:14:59] !log taavi@cloudcumin1001 ajapaik START - Cookbook wmcs.openstack.migrate_project_to_ovs [16:15:02] !log taavi@cloudcumin1001 ajapaik END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0) [16:15:22] !log taavi@cloudcumin1001 analytics START - Cookbook wmcs.openstack.migrate_project_to_ovs [16:16:14] 06cloud-services-team, 10decommission-hardware, 13Patch-For-Review: decommission cloudvirt-wdqs100[1,2,3] - https://phabricator.wikimedia.org/T367773#9899711 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cloudvirt-wdqs1002.eqiad.wmnet` - cloudvirt-wdqs1002.e... [16:20:12] !log taavi@cloudcumin1001 analytics END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0) [16:20:38] !log taavi@cloudcumin1001 auditlogging START - Cookbook wmcs.openstack.migrate_project_to_ovs [16:25:31] 06cloud-services-team, 10decommission-hardware, 13Patch-For-Review: decommission cloudvirt-wdqs100[1,2,3] - https://phabricator.wikimedia.org/T367773#9899766 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cloudvirt-wdqs1003.eqiad.wmnet` - cloudvirt-wdqs1003.e... 
[16:27:41] !log taavi@cloudcumin1001 auditlogging END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0) [16:36:02] 06cloud-services-team, 06DC-Ops, 10decommission-hardware, 10ops-eqiad, 13Patch-For-Review: decommission cloudvirt-wdqs100[1,2,3] - https://phabricator.wikimedia.org/T367773#9899829 (10Andrew) a:05Andrew→03None [17:16:57] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1037.eqiad.wmnet' (T364457) [17:17:03] T364457: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457 [17:25:28] FIRING: InstanceDown: Project tools instance tools-k8s-worker-nfs-42 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [17:29:02] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1037.eqiad.wmnet' (T364457) [17:29:07] T364457: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457 [17:30:28] FIRING: [2x] InstanceDown: Project tools instance tools-k8s-worker-nfs-25 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [17:31:23] FIRING: ToolforgeKubernetesNodeNotReady: Kubernetes node tools-k8s-worker-nfs-42 is not ready - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady [17:33:28] FIRING: InstanceDown: Project gitlab-runners instance runner-1030 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [17:35:00] 10Cloud-VPS: dwl reboot coordination request - https://phabricator.wikimedia.org/T367797 (10Giftpflanze) 03NEW [17:36:23] FIRING: [2x] ToolforgeKubernetesNodeNotReady: Kubernetes node tools-k8s-worker-nfs-25 is not ready - 
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady [17:36:48] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1038.eqiad.wmnet' (T364457) [17:36:54] T364457: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457 [17:36:54] 06cloud-services-team: NeutronAgentDownForLong A Neutron agent has been down for more than 2h, VMs will have connectivity issues - https://phabricator.wikimedia.org/T365461#9900206 (10phaultfinder) [17:36:55] FIRING: NeutronAgentDownForLong: Neutron neutron-linuxbridge-agent on cloudvirt-wdqs1001 has been down for more than 2h - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDownForLong [17:37:50] FIRING: [2x] NeutronAgentDown: Neutron neutron-linuxbridge-agent on cloudvirt-wdqs1001 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [17:41:15] PROBLEM - ensure kvm processes are running on cloudvirt1037 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [17:43:54] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS, 13Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9900226 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt1037.eqiad.wmnet with O... 
[17:47:28] FIRING: InstanceDown: Project toolsbeta instance toolsbeta-test-k8s-worker-11 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [17:50:22] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1038.eqiad.wmnet' (T364457) [17:50:28] T364457: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457 [17:50:28] FIRING: [5x] InstanceDown: Project tools instance tools-k8s-ingress-9 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [17:50:44] 06cloud-services-team, 10Cloud-VPS (Debian Buster Deprecation), 10VPS-project-Codesearch, 13Patch-For-Review: Replace or remove Debian Buster VMs in 'codesearch' cloud-vps project - https://phabricator.wikimedia.org/T367479#9900252 (10Dzahn) @Ladsgroup new change https://gerrit.wikimedia.org/r/c/operations... [17:51:49] FIRING: NeutronAgentDownForLong: Neutron neutron-linuxbridge-agent on cloudvirt-wdqs1002 has been down for more than 2h - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDownForLong [17:51:56] 06cloud-services-team: NeutronAgentDownForLong A Neutron agent has been down for more than 2h, VMs will have connectivity issues - https://phabricator.wikimedia.org/T365461#9900256 (10phaultfinder) [17:52:15] PROBLEM - ensure kvm processes are running on cloudvirt1038 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [17:52:28] FIRING: InstanceDown: Project paws instance paws-nfs-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [17:52:56] FIRING: SystemdUnitDown: The service unit maintain-dbusers.service is in failed status on host cloudcontrol1005. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [17:54:24] FIRING: ToolforgeKubernetesNodeNotReady: Kubernetes node toolsbeta-test-k8s-worker-11 is not ready - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady [17:55:28] FIRING: InstanceDown: Project extdist instance extdist-06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [17:56:23] FIRING: [6x] ToolforgeKubernetesNodeNotReady: Multiple Kubernetes nodes are not ready #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady [18:07:56] RESOLVED: SystemdUnitDown: The service unit maintain-dbusers.service is in failed status on host cloudcontrol1005. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [18:10:28] FIRING: [5x] InstanceDown: Project tools instance tools-k8s-ingress-9 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [18:11:23] FIRING: [6x] ToolforgeKubernetesNodeNotReady: Multiple Kubernetes nodes are not ready #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady [18:20:28] FIRING: [5x] InstanceDown: Project tools instance tools-k8s-ingress-9 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [18:21:23] FIRING: [4x] ToolforgeKubernetesNodeNotReady: Kubernetes node tools-k8s-ingress-9 is not ready - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady [18:28:28] RESOLVED: InstanceDown: Project gitlab-runners instance runner-1030 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [18:30:28] RESOLVED: InstanceDown: Project extdist instance extdist-06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [18:30:28] FIRING: [4x] InstanceDown: Project tools instance tools-k8s-ingress-9 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [18:31:23] RESOLVED: [3x] ToolforgeKubernetesNodeNotReady: Kubernetes node tools-k8s-ingress-9 is not ready - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - 
https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady [18:31:43] !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1037'] [18:32:05] !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1037'] [18:32:28] RESOLVED: InstanceDown: Project toolsbeta instance toolsbeta-test-k8s-worker-11 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [18:32:28] RESOLVED: InstanceDown: Project paws instance paws-nfs-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [18:33:31] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS, 13Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9900410 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt1037.eqiad.wmnet with OS bo... 
[18:34:24] RESOLVED: ToolforgeKubernetesNodeNotReady: Kubernetes node toolsbeta-test-k8s-worker-11 is not ready - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady [18:35:28] RESOLVED: [4x] InstanceDown: Project tools instance tools-k8s-ingress-9 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [18:36:58] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS, 13Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9900445 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt1038.eqiad.wmnet with O... [18:37:57] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [18:38:02] 10Cloud-VPS: grafana.wmcloud.org down - https://phabricator.wikimedia.org/T367803 (10JJMC89) 03NEW [18:48:19] RESOLVED: NeutronAgentDownForLong: Neutron neutron-linuxbridge-agent on cloudvirt-wdqs1002 has been down for more than 2h - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDownForLong [18:48:52] 06cloud-services-team, 10Cloud-VPS (Debian Buster Deprecation), 10VPS-project-Codesearch, 13Patch-For-Review: Replace or remove Debian Buster VMs in 'codesearch' cloud-vps project - https://phabricator.wikimedia.org/T367479#9900510 (10Ladsgroup) Yeah, I don't think we had any specific need for a very speci... 
[18:50:19] RESOLVED: [2x] NeutronAgentDown: Neutron neutron-linuxbridge-agent on cloudvirt-wdqs1001 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [18:56:37] 10Cloud-VPS: grafana.wmcloud.org down - https://phabricator.wikimedia.org/T367803#9900571 (10JJMC89) now 503 Service Unavailable [19:08:36] 10Cloud-VPS: grafana.wmcloud.org down - https://phabricator.wikimedia.org/T367803#9900595 (10Andrew) 05Open→03Resolved a:03Andrew This was a combination of me making an error when migrating the trove VM (resulting in it having a broken network for a few minutes) and https://storyboard.openstack.org/#!/... [19:18:49] !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1038'] [19:19:08] !log andrew@cloudcumin1001 cloudvirt-canary END (FAIL) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=99) on eqiad1, with recreate False, for hosts list: ['cloudvirt1038'] [19:21:44] !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1038'] [19:22:07] !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1038'] [19:22:36] RECOVERY - ensure kvm processes are running on cloudvirt1038 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:23:01] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS, 13Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9900622 (10ops-monitoring-bot) Cookbook 
cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt1038.eqiad.wmnet with OS bo... [19:28:20] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1039.eqiad.wmnet' (T364457) [19:28:25] T364457: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457 [19:32:11] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1039.eqiad.wmnet' (T364457) [19:32:40] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1040.eqiad.wmnet' (T364457) [19:44:06] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS, 13Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9900663 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt1039.eqiad.wmnet with O... [19:55:28] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1040.eqiad.wmnet' (T364457) [19:55:34] T364457: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457 [19:55:43] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1041.eqiad.wmnet' (T364457) [19:57:39] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "globaleducation" project Buster deprecation - https://phabricator.wikimedia.org/T367531#9900719 (10Ragesoss) Thanks! I plan to work on this the first week of July. [19:58:23] 10Quarry: [bug] Quarry queries not completing - https://phabricator.wikimedia.org/T367464#9900722 (10Teslaton) If you look through //Execution time// column on //Recent queries// list, it actually seems like that results of virtually //any// query with execution time longer than ~120s will never make it back, cu...
[20:10:10] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1041.eqiad.wmnet' (T364457) [20:15:47] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "research-collaborations-api" project Buster deprecation - https://phabricator.wikimedia.org/T367551#9900891 (10Isaac) > I'll use this as an opportunity to flesh out the README for wikinav with instructions and will link that back here shortly. Thanks! [20:32:12] !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1039'] [20:32:34] !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1039'] [20:33:52] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "dashiki" project Buster deprecation - https://phabricator.wikimedia.org/T367526#9901041 (10Aklapper) [20:34:02] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "etytree" project Buster deprecation - https://phabricator.wikimedia.org/T367529#9901055 (10Aklapper) [20:34:12] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "dumps" project Buster deprecation - https://phabricator.wikimedia.org/T367528#9901043 (10Aklapper) [20:35:15] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS, 13Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9901126 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt1039.eqiad.wmnet with OS bo... 
[20:35:15] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "machine-learning" project Buster deprecation - https://phabricator.wikimedia.org/T367537#9901123 (Aklapper)
[20:35:39] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9901138 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt1040.eqiad.wmnet with O...
[20:35:41] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "linkwatcher" project Buster deprecation - https://phabricator.wikimedia.org/T367536#9901092 (Aklapper)
[21:20:53] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9901302 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt1040.eqiad.wmnet with OS bo...
[21:55:21] !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1040']
[21:55:44] !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1040']
[21:55:58] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9901378 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt1041.eqiad.wmnet with O...
[22:35:53] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), VPS-project-Codesearch, Patch-For-Review: Replace or remove Debian Buster VMs in 'codesearch' cloud-vps project - https://phabricator.wikimedia.org/T367479#9901559 (Dzahn) >>! In T367479#9900510, @Ladsgroup wrote: > Yeah, I don't think w...
[22:37:57] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:38:23] !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1041']
[22:38:30] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), VPS-project-Codesearch, Patch-For-Review: Replace or remove Debian Buster VMs in 'codesearch' cloud-vps project - https://phabricator.wikimedia.org/T367479#9901565 (Dzahn) docker containers with hound are running: ` dzahn@codesearch9:~$...
[22:38:41] !log andrew@cloudcumin1001 cloudvirt-canary END (FAIL) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=99) on eqiad1, with recreate False, for hosts list: ['cloudvirt1041']
[22:40:34] !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1041']
[22:40:56] !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1041']
[22:42:48] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9901567 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt1041.eqiad.wmnet with OS bo...
[22:46:27] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9901568 (Andrew)
[23:11:59] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), VPS-project-Codesearch, Patch-For-Review: Replace or remove Debian Buster VMs in 'codesearch' cloud-vps project - https://phabricator.wikimedia.org/T367479#9901635 (Dzahn) @Ladsgroup The next step to debug here is: https://codesearch-ba...