[00:02:51] RESOLVED: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[00:03:55] FIRING: MaxConntrack: Max conntrack at 80.52% on cloudvirt1040:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:51:32] cloud-services-team: PuppetFailure Puppet failure on cloudbackup2003:9100 - https://phabricator.wikimedia.org/T365638#9916194 (Andrew)
[00:52:02] cloud-services-team: PowerSupplyFailure - https://phabricator.wikimedia.org/T368212#9916192 (Andrew) →Duplicate dup:T365638
[00:54:10] cloud-services-team, DC-Ops, ops-codfw: PowerSupplyFailure Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T368211#9916196 (Andrew) Looks like this server needs a power supply replaced -- please let me know if we need to schedule downtime for this.
[00:58:56] RESOLVED: MaxConntrack: Max conntrack at 85.45% on cloudvirt1040:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[01:10:45] cloud-services-team, Cloud-VPS, WikiCite: cloud-vps Trove instance 'wikicitations' shows host 'none' - https://phabricator.wikimedia.org/T368232 (Andrew) NEW
[01:11:56] cloud-services-team, Cloud-VPS, WikiCite: cloud-vps Trove instance 'wikicitations' shows host 'none' - https://phabricator.wikimedia.org/T368232#9916213 (Andrew) @Harej you seem to be the only admin of this project currently.
Note that the project is also unclaimed on https://wikitech.wikimedia.org/w...
[01:18:57] FIRING: CloudVPSDesignateLeaks: Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:19:06] cloud-services-team, Cloud-VPS: can the db server 'maps-test-2' be deleted? - https://phabricator.wikimedia.org/T368233 (Andrew) NEW
[01:22:13] cloud-services-team, Cloud-VPS: 'mariadb-main' db server in 'checkuser-beta-wiki' project - https://phabricator.wikimedia.org/T368234 (Andrew) NEW
[01:22:44] cloud-services-team, Cloud-VPS: 'mariadb-main' db server in 'checkuser-beta-wiki' project - https://phabricator.wikimedia.org/T368234#9916242 (Andrew) @Dreamy_Jazz can you tell me if that database and associated data needs preserving, or if we can scrap it and start again? Thx.
[02:03:42] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:54:54] Cloud-VPS, Wikispore: vanity domain for Wikispore - https://phabricator.wikimedia.org/T368236 (jeremyb-phone) NEW
[03:57:35] Cloud-VPS, Wikispore: vanity domain for Wikispore - https://phabricator.wikimedia.org/T368236#9916343 (JJMC89)
[03:57:36] Cloud-VPS, Patch-For-Review: Create mechanism to allow the use of vanity domains by projects behind the Cloud VPS shared HTTP proxy - https://phabricator.wikimedia.org/T342398#9916344 (JJMC89)
[04:35:51] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[04:40:23] FIRING: OOM: OOM killer active on cloudcontrol2006-dev:9100 - TODO - https://grafana.wikimedia.org/d/-OcleDKIz/oom-kill - https://alerts.wikimedia.org/?q=alertname%3DOOM
[04:40:51] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[04:45:23] RESOLVED: OOM: OOM killer active on cloudcontrol2006-dev:9100 - TODO - https://grafana.wikimedia.org/d/-OcleDKIz/oom-kill - https://alerts.wikimedia.org/?q=alertname%3DOOM
[04:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records -
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:13:08] PROBLEM - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 508 bytes in 3.207 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[05:28:04] RECOVERY - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 0.277 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[06:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[06:54:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-43 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[07:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records -
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:18:06] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, DC-Ops, ops-eqiad, SRE: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#9916509 (dcaro) @CDanis No problem, let's get that data gathered :) > Given the limited duration expected for thi...
[08:34:15] (PS1) Slyngshede: PAC4J secrets, required for CAS7 [labs/private] - https://gerrit.wikimedia.org/r/1049095 (https://phabricator.wikimedia.org/T367487)
[08:34:53] Cloud-VPS (Debian Buster Deprecation), Wikispeech: Cloud VPS "wikispeech" project Buster deprecation - https://phabricator.wikimedia.org/T367565#9916546 (Lokal_Profil)
[08:38:52] Cloud-VPS (Debian Buster Deprecation), Wikispeech: Cloud VPS "wikispeech" project Buster deprecation - https://phabricator.wikimedia.org/T367565#9916558 (Lokal_Profil)
[08:40:22] (open) nikerabbit: Don't relay wikibugs in #translatewiki [toolforge-repos/bridgebot] - https://gitlab.wikimedia.org/toolforge-repos/bridgebot/-/merge_requests/8
[08:51:42] (CR) Muehlenhoff: [C:+1] "LGTM" [labs/private] - https://gerrit.wikimedia.org/r/1049095 (https://phabricator.wikimedia.org/T367487) (owner: Slyngshede)
[08:51:50] (PS1) FNegri: toolsdb: update replica host name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1049098
[09:02:04] (PS2) Slyngshede: PAC4J secrets, required for CAS7 [labs/private] - https://gerrit.wikimedia.org/r/1049095 (https://phabricator.wikimedia.org/T367487)
[09:02:47] (CR) Slyngshede: "Forgot another set of keys."
[labs/private] - https://gerrit.wikimedia.org/r/1049095 (https://phabricator.wikimedia.org/T367487) (owner: Slyngshede)
[09:07:21] Toolforge (Toolforge iteration 11): Provision more non-NFS k8s workers - https://phabricator.wikimedia.org/T367964#9916748 (taavi) a:taavi
[09:07:58] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
[09:08:00] (CR) FNegri: [C:+2] toolsdb: update replica host name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1049098 (owner: FNegri)
[09:08:40] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster
[09:09:00] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
[09:11:20] (Merged) jenkins-bot: toolsdb: update replica host name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1049098 (owner: FNegri)
[09:14:16] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2024-06-21 - https://phabricator.wikimedia.org/T368250 (fnegri) NEW
[09:15:56] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2024-02-12 - https://phabricator.wikimedia.org/T357264#9916810 (fnegri)
[09:15:57] Data-Services: [toolsdb] Replica is frequently lagging behind the primary - https://phabricator.wikimedia.org/T357624#9916809 (fnegri)
[09:15:58] cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Unplanned: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2024-01-19 - https://phabricator.wikimedia.org/T355411#9916811 (fnegri)
[09:16:00] cloud-services-team (FY2023/2024-Q1-Q2), Data-Services: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2023-09-01 -
https://phabricator.wikimedia.org/T345450#9916812 (fnegri)
[09:16:01] cloud-services-team (FY2023/2024-Q1-Q2), Data-Services: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2023-08-08 - https://phabricator.wikimedia.org/T343819#9916813 (fnegri)
[09:16:03] cloud-services-team (FY2022/2023-Q4), Data-Services: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2023-07-13 - https://phabricator.wikimedia.org/T341891#9916814 (fnegri)
[09:16:04] cloud-services-team (FY2022/2023-Q4), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Unplanned: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2023-06-02 - https://phabricator.wikimedia.org/T338031#9916815 (fnegri)
[09:17:30] Data-Services: [toolsdb] Replica is frequently lagging behind the primary - https://phabricator.wikimedia.org/T357624#9916818 (fnegri)
[09:18:02] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-checker-5
[09:18:58] !log taavi@cloudcumin1001 tools Added a new k8s worker tools-k8s-worker-105.tools.eqiad1.wikimedia.cloud to the cluster
[09:18:58] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster
[09:19:13] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-checker-5
[09:19:48] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-docker-registry-8
[09:20:10] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
[09:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:20:53] !log taavi@cloudcumin1001 tools END
(FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster
[09:21:04] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-docker-registry-8
[09:21:10] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
[09:21:21] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-package-builder-04
[09:22:29] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-package-builder-04
[09:23:15] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-services-05
[09:24:18] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-services-05
[09:24:36] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-proxy-8
[09:25:46] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-proxy-8
[09:26:45] (PS3) Slyngshede: C:apereo_cas Additional secrets required for CAS7 [labs/private] - https://gerrit.wikimedia.org/r/1049095 (https://phabricator.wikimedia.org/T367487)
[09:26:50] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-imagebuilder-2
[09:27:43] (merge) aborrero: kyverno_pod_policy: use patch operation in do_update() [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/47 (https://phabricator.wikimedia.org/T368141)
[09:27:45] (PS4) Slyngshede: C:apereo_cas Additional secrets required for CAS7 [labs/private] - https://gerrit.wikimedia.org/r/1049095 (https://phabricator.wikimedia.org/T367487)
[09:27:59] !log
taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-imagebuilder-2
[09:28:40] (PS5) Slyngshede: C:apereo_cas Additional secrets required for CAS7 [labs/private] - https://gerrit.wikimedia.org/r/1049095 (https://phabricator.wikimedia.org/T367487)
[09:28:43] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-legacy-redirector-2
[09:30:12] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-legacy-redirector-2
[09:30:25] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-mail-4
[09:30:29] !log taavi@cloudcumin1001 tools Added a new k8s worker tools-k8s-worker-106.tools.eqiad1.wikimedia.cloud to the cluster
[09:30:29] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster
[09:30:33] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: maintain-kubeusers: bump to 0.0.153-20240624092755-fd0244da [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/342 (https://phabricator.wikimedia.org/T368141)
[09:30:39] Data-Services: [toolsdb] Replica is frequently lagging behind the primary - https://phabricator.wikimedia.org/T357624#9916892 (fnegri) > Do they take longer to complete in the replica because of RBR replication or do they take a very long time in the primary too? > Are they getting logged in the slow-query l...
[09:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:31:35] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-mail-4
[09:32:10] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-puppetdb-2
[09:33:20] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-puppetdb-2
[09:34:03] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-puppetserver-01
[09:35:14] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-puppetserver-01
[09:35:55] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
[09:39:02] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2024-06-21 - https://phabricator.wikimedia.org/T368250#9916950 (fnegri) Open→In progress
[09:39:31] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-prometheus-6
[09:40:53] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-prometheus-6
[09:41:02] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2024-06-21 - https://phabricator.wikimedia.org/T368250#9916962 (fnegri) p:Triage→High {F55819426} https://grafana.wmcloud.org/goto/6zWMbIQIz?orgId=1
[09:46:29] !log taavi@cloudcumin1001 tools Added a new k8s worker
tools-k8s-worker-107.tools.eqiad1.wikimedia.cloud to the cluster
[09:46:29] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster
[09:47:56] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-harbor-1
[09:49:06] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-harbor-1
[09:50:36] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-haproxy-5
[09:51:45] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-haproxy-5
[09:52:18] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-cumin-1
[09:53:28] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-cumin-1
[09:58:19] !log fnegri@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-43
[09:59:48] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-redis-6
[10:01:49] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-redis-6
[10:02:39] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-redis-7
[10:03:34] !log fnegri@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-43
[10:04:42] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-redis-7
[10:05:11] !log aborrero@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[10:05:23] !log aborrero@cloudcumin1001 toolsbeta
END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[10:06:14] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-control-7
[10:08:14] FIRING: ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: tools-k8s-control-7.tools.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HAproxy - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[10:08:17] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-control-7
[10:09:43] !log fnegri@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-43
[10:11:02] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-ingress-7
[10:11:53] !log fnegri@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-43
[10:13:05] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-ingress-7
[10:13:14] FIRING: [4x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: tools-k8s-control-7.tools.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HAproxy - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[10:17:03] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[10:17:16] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component
maintain-kubeusers
[10:18:14] RESOLVED: [6x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: tools-k8s-control-7.tools.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HAproxy - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[10:19:23] (merge) aborrero: maintain-kubeusers: bump to 0.0.153-20240624092755-fd0244da [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/342 (https://phabricator.wikimedia.org/T368141) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:19:44] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-docker-registry-7
[10:20:53] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-docker-registry-7
[10:28:05] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-redis-5
[10:29:52] (update) aborrero: kyverno_pod_policy: set validation to Enforce [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/46 (https://phabricator.wikimedia.org/T368141)
[10:30:09] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-redis-5
[10:31:04] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 11): Intermittent redis connection timeouts in Toolforge - https://phabricator.wikimedia.org/T318479#9917076 (fnegri) The last error in `spi-tools` was on 2024-06-18, just a coincidence or have things improved for some reason? Is any...
[10:32:49] (CR) Muehlenhoff: [C:+1] "LGTM" [labs/private] - https://gerrit.wikimedia.org/r/1049095 (https://phabricator.wikimedia.org/T367487) (owner: Slyngshede)
[10:34:53] cloud-services-team, Cloud-VPS: 'mariadb-main' db server in 'checkuser-beta-wiki' project - https://phabricator.wikimedia.org/T368234#9917091 (Dreamy_Jazz) @Andrew you can scrap it and start again if necessary. I presume that I'll need to look at setting up mediawiki vagrant after you do this? Is there a...
[10:35:43] (CR) Slyngshede: [C:+2] C:apereo_cas Additional secrets required for CAS7 [labs/private] - https://gerrit.wikimedia.org/r/1049095 (https://phabricator.wikimedia.org/T367487) (owner: Slyngshede)
[10:35:44] (CR) Slyngshede: [V:+2 C:+2] C:apereo_cas Additional secrets required for CAS7 [labs/private] - https://gerrit.wikimedia.org/r/1049095 (https://phabricator.wikimedia.org/T367487) (owner: Slyngshede)
[10:52:34] (update) aborrero: kubernetes: introduce securityContext in the pod template [repos/cloud/toolforge/tools-webservice] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/37 (https://phabricator.wikimedia.org/T362050)
[11:01:12] (update) aborrero: kubernetes: introduce securityContext in the pod template [repos/cloud/toolforge/tools-webservice] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/37 (https://phabricator.wikimedia.org/T362050)
[11:04:30] MediaWiki-extensions-OpenStackManager, Diffusion-Repository-Administrators, Projects-Cleanup, translatewiki.net, Wikimedia-GitHub: Archive the OpenStackManager extension - https://phabricator.wikimedia.org/T367220#9917157 (Jdforrester-WMF)
[11:07:56] FIRING: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:09:17] (PS1) Slyngshede: C:apereo_cas Add dummy secrets for CAS 7 [labs/private] - https://gerrit.wikimedia.org/r/1049129 (https://phabricator.wikimedia.org/T367487)
[11:10:14] (open) aborrero: kyverno: re-enable the reports controller [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/343 (https://phabricator.wikimedia.org/T368141)
[11:12:53] (approved) dcaro: kubernetes: introduce securityContext in the pod template [repos/cloud/toolforge/tools-webservice] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/37 (https://phabricator.wikimedia.org/T362050) (owner: aborrero)
[11:12:56] RESOLVED: [2x] SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:17:38] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-1
[11:17:52] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-1
[11:19:07] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-1
[11:19:11] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-1
[11:20:10] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-2
[11:20:33] Data-Services, Data-Persistence, Data-Platform-SRE (2024.06.17 - 2024.07.07): Remove AAAA records from an-redacteddb1001 and allow connection from cumin - https://phabricator.wikimedia.org/T368220#9917187 (BTullis) Open→Resolved I have removed the AAAA record from an-redacteddb1001 in net...
[11:20:43] (CR) Slyngshede: [C:+2] C:apereo_cas Add dummy secrets for CAS 7 [labs/private] - https://gerrit.wikimedia.org/r/1049129 (https://phabricator.wikimedia.org/T367487) (owner: Slyngshede)
[11:20:45] (PS1) Majavah: kubernetes: Handle pods with no ownerReferences [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1049131
[11:20:46] (CR) Slyngshede: [V:+2 C:+2] C:apereo_cas Add dummy secrets for CAS 7 [labs/private] - https://gerrit.wikimedia.org/r/1049129 (https://phabricator.wikimedia.org/T367487) (owner: Slyngshede)
[11:20:49] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-2
[11:20:57] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-3
[11:21:37] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-3
[11:21:55] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-1
[11:23:13] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-1
[11:23:36] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-2
[11:24:33] (update) dcaro: api: auth and proxy requests to the backend APIs [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/23 (https://phabricator.wikimedia.org/T363983)
[11:24:43] (update) dcaro: api: auth and proxy requests to the backend APIs [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/23 (https://phabricator.wikimedia.org/T363983)
[11:24:53] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for
server tools-k8s-worker-nfs-2
[11:24:55] (open) aborrero: kyverno: downscale the background controller [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/344
[11:25:10] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-3
[11:25:32] Cloud-Services, serviceops, SRE, Patch-For-Review: Modernise memcached systemd unit / sync, and make it presentable - https://phabricator.wikimedia.org/T273950#9917200 (MoritzMuehlenhoff) CAS 7.0 (what we are currently migrating to) removed the memcached backend. As such, this change won't be nee...
[11:25:45] Cloud-Services, serviceops, SRE, Patch-For-Review: Modernise memcached systemd unit / sync, and make it presentable - https://phabricator.wikimedia.org/T273950#9917201 (MoritzMuehlenhoff)
[11:26:20] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-3
[11:29:53] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-4
[11:30:30] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-4
[11:30:38] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-5
[11:30:42] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-4
[11:31:04] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-5
[11:31:34] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-6
[11:31:53] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-4
[11:32:02] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-6 [11:32:09] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-5 [11:32:19] 10Cloud-VPS (Debian Buster Deprecation), 06Research: Cloud VPS "research-collaborations-api" project Buster deprecation - https://phabricator.wikimedia.org/T367551#9917210 (10MunizaA) Hey @Isaac , since there's lots of moving parts to deploying this API (setting up nginx, installing dependencies, invoking guni... [11:33:28] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-5 [11:33:33] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-6 [11:35:43] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-6 [11:36:01] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-7 [11:36:40] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-7 [11:37:09] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-8 [11:37:13] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-7 [11:37:24] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-8 [11:38:29] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-7 [11:40:32] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-8 [11:40:37] !log 
taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-8 [11:40:40] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-9 [11:40:42] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-8 [11:41:05] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-9 [11:41:51] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-8 [11:42:58] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-9 [11:44:07] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-9 [11:44:35] !log aborrero@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno [11:44:48] 10Tool-bridgebot: lint:golang CI job times out - https://phabricator.wikimedia.org/T367969#9917239 (10Nikerabbit) 05Resolved→03Open My MR has timed out twice already: https://gitlab.wikimedia.org/toolforge-repos/bridgebot/-/merge_requests/8 [11:44:56] !log aborrero@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno [11:45:45] !log aborrero@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno [11:45:57] !log aborrero@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno [11:48:38] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno [11:49:01] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component 
kyverno [11:49:35] (03merge) 10aborrero: kyverno: downscale the background controller [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/344 [11:50:50] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-10 [11:50:54] (03update) 10aborrero: kyverno: re-enable the reports controller [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/343 (https://phabricator.wikimedia.org/T368141) [11:56:07] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-10 [11:57:19] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-10 [11:58:05] (03update) 10aborrero: kubernetes: introduce securityContext in the pod template [repos/cloud/toolforge/tools-webservice] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/37 (https://phabricator.wikimedia.org/T362050) [11:58:52] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-10 [11:58:54] !log taavi@cloudcumin1001 tools END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=97) for node tools-k8s-worker-nfs-10 [11:59:16] (03update) 10aborrero: kubernetes: introduce securityContext in the pod template [repos/cloud/toolforge/tools-webservice] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/37 (https://phabricator.wikimedia.org/T362050) [11:59:58] (03merge) 10aborrero: kubernetes: introduce securityContext in the pod template [repos/cloud/toolforge/tools-webservice] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/37 (https://phabricator.wikimedia.org/T362050) [12:00:54] !log taavi@cloudcumin1001 tools END 
(PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-10 [12:02:01] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-acme-chief-4 [12:03:12] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-acme-chief-4 [12:05:22] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-static-15 [12:06:39] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-static-15 [12:26:37] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-control-8 [12:27:44] FIRING: [6x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: tools-k8s-control-8.tools.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HAproxy - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown [12:27:59] RESOLVED: [6x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: tools-k8s-control-8.tools.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HAproxy - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown [12:32:49] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-prometheus-7 [12:32:52] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-11 [12:33:33] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-11 [12:34:07] !log 
taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-prometheus-7 [12:35:24] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-12 [12:35:28] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-11 [12:35:50] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-12 [12:36:37] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-11 [12:37:01] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-12 [12:37:03] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-13 [12:37:41] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-13 [12:39:03] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-12 [12:39:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-idp-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [12:39:56] (03open) 10aborrero: kyverno: raise CPU request and limits [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/345 [12:44:55] (03open) 10aborrero: d/changelog: bump to 0.103.7 [repos/cloud/toolforge/tools-webservice] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/42 (https://phabricator.wikimedia.org/T362050) [12:44:59] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server 
tools-k8s-worker-nfs-13 [12:45:21] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-14 [12:45:59] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-14 [12:46:09] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-13 [12:46:32] (03open) 10dcaro: haproxy: use runbooks for each alert [repos/cloud/toolforge/alerts] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/16 [12:46:42] (03merge) 10aborrero: d/changelog: bump to 0.103.7 [repos/cloud/toolforge/tools-webservice] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/42 (https://phabricator.wikimedia.org/T362050) [12:49:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [12:49:56] (03approved) 10aborrero: haproxy: use runbooks for each alert [repos/cloud/toolforge/alerts] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/16 (owner: 10dcaro) [12:51:20] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-14 [12:51:23] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-15 [12:52:02] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-15 [12:52:37] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-14 [12:53:54] (03update) 10dcaro: functional-tests: add pod-policy 
smoke test [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/325 (owner: 10aborrero) [12:54:44] (03update) 10dcaro: api: auth and proxy requests to the backend APIs [repos/cloud/toolforge/api-gateway] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/23 (https://phabricator.wikimedia.org/T363983) [12:57:17] 10Cloud-Services, 07affects-Kiwix-and-openZIM, 05Cloud-Services-Origin-User: Read-only access to Wikimedia mirror of Kiwix data in dumps.wikimedia.org/kiwix/ - https://phabricator.wikimedia.org/T348226#9917421 (10Benoit74) The #Cloud-Services project tag is not intended to have any tasks. Please check the li... [12:57:18] 10Cloud-Services, 07affects-Kiwix-and-openZIM, 05Cloud-Services-Origin-User: Disk volumes of cloud instances are completely mixed-up - https://phabricator.wikimedia.org/T368265 (10Benoit74) 03NEW The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator... 
[12:58:26] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-15 [12:58:27] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-16 [12:59:08] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-16 [12:59:17] 10Data-Services, 07affects-Kiwix-and-openZIM, 05Cloud-Services-Origin-User: Disk volumes of cloud instances are completely mixed-up - https://phabricator.wikimedia.org/T368265#9917438 (10Benoit74) [12:59:36] 10Data-Services, 07affects-Kiwix-and-openZIM, 05Cloud-Services-Origin-User: Read-only access to Wikimedia mirror of Kiwix data in dumps.wikimedia.org/kiwix/ - https://phabricator.wikimedia.org/T348226#9917439 (10Benoit74) [12:59:42] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-15 [12:59:48] 10Cloud-VPS, 07affects-Kiwix-and-openZIM, 05Cloud-Services-Origin-User: Disk volumes of cloud instances are completely mixed-up - https://phabricator.wikimedia.org/T368265#9917441 (10Benoit74) [13:00:51] (03update) 10dcaro: api: auth and proxy requests to the backend APIs [repos/cloud/toolforge/api-gateway] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/23 (https://phabricator.wikimedia.org/T363983) [13:02:30] (03update) 10dcaro: api: auth and proxy requests to the backend APIs [repos/cloud/toolforge/api-gateway] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/23 (https://phabricator.wikimedia.org/T363983) [13:04:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - 
https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [13:09:55] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Quarry: [bug] Quarry queries not completing - https://phabricator.wikimedia.org/T367464#9917461 (10fnegri) 05In progress→03Resolved I think I found the issue: the Trove database used by Quarry had `wait_timeout` set to 120 seconds, which meant that all idle... [13:09:58] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-16 [13:12:02] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-16 [13:13:14] 10Cloud-VPS, 07affects-Kiwix-and-openZIM, 05Cloud-Services-Origin-User: Disk volumes of cloud instances are completely mixed-up - https://phabricator.wikimedia.org/T368265#9917487 (10taavi) Hi, sorry for that. The servers were rebooted to pick up updated network settings: https://lists.wikimedia.org/hyperkit... [13:14:54] (03update) 10dcaro: api: auth and proxy requests to the backend APIs [repos/cloud/toolforge/api-gateway] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/23 (https://phabricator.wikimedia.org/T363983) [13:15:41] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-17 [13:15:57] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-17 [13:16:01] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-17 [13:16:06] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-17 [13:16:26] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-17 [13:16:28] !log taavi@cloudcumin1001 tools START - Cookbook 
wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-18 [13:17:06] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-18 [13:17:14] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-19 [13:17:35] (03CR) 10FNegri: [C:03+1] "LGTM" [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1049131 (owner: 10Majavah) [13:17:52] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-19 [13:18:25] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-20 [13:18:32] (03CR) 10Majavah: [C:03+2] kubernetes: Handle pods with no ownerReferences [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1049131 (owner: 10Majavah) [13:18:37] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-17 [13:19:03] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-20 [13:19:12] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-18 [13:19:36] (03update) 10dcaro: api: auth and proxy requests to the backend APIs [repos/cloud/toolforge/api-gateway] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/23 (https://phabricator.wikimedia.org/T363983) [13:20:29] (03CR) 10David Caro: [C:03+1] kubernetes: Handle pods with no ownerReferences (031 comment) [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1049131 (owner: 10Majavah) [13:21:01] (03CR) 10David Caro: kubernetes: Handle pods with no ownerReferences (031 comment) [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1049131 (owner: 10Majavah) [13:21:09] (03CR) 10David Caro: [C:03+1] kubernetes: 
Handle pods with no ownerReferences [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1049131 (owner: 10Majavah) [13:21:16] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-18 [13:21:31] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-19 [13:22:01] (03Merged) 10jenkins-bot: kubernetes: Handle pods with no ownerReferences [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1049131 (owner: 10Majavah) [13:23:05] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-21 [13:23:33] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-19 [13:23:44] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-21 [13:24:27] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-20 [13:24:57] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-22 [13:25:27] (03update) 10dcaro: haproxy: use runbooks for each alert [repos/cloud/toolforge/alerts] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/16 [13:25:27] (03merge) 10dcaro: haproxy: use runbooks for each alert [repos/cloud/toolforge/alerts] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/16 [13:25:37] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-22 [13:25:45] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-20 [13:26:41] !log taavi@cloudcumin1001 tools START - Cookbook 
wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-21 [13:26:45] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-23 [13:27:23] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-23 [13:28:51] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-21 [13:29:48] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-22 [13:30:57] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-22 [13:31:02] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Quarry: Allow Quarry to query its own database - https://phabricator.wikimedia.org/T367415#9917538 (10fnegri) 05Open→03In progress a:03fnegri > I checked the db schema (pasted below) and I have some minor privacy concerns if we were to expose all the tables an... 
[13:33:53] 10Data-Services, 07affects-Kiwix-and-openZIM: Read-only access to Wikimedia mirror of Kiwix data in dumps.wikimedia.org/kiwix/ - https://phabricator.wikimedia.org/T348226#9917596 (10Aklapper)
[13:33:59] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-24
[13:34:01] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-23
[13:34:27] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-24
[13:35:11] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-23
[13:37:18] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-25
[13:37:20] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Quarry: Allow Quarry to query its own database - https://phabricator.wikimedia.org/T367415#9917616 (10fnegri) Maybe `wiki_uid` in the `user` table is potentially sensitive? It seems to be the Mediawiki id for the user, I'm not sure if we should expose it.
[13:37:20] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-24 [13:37:44] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-25 [13:39:05] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-26 [13:39:23] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-24 [13:39:42] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-26 [13:39:48] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-25 [13:41:05] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-25 [13:41:19] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-26 [13:42:30] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-26 [13:43:24] 10Cloud-VPS, 07affects-Kiwix-and-openZIM, 05Cloud-Services-Origin-User: Disk volumes of cloud instances are completely mixed-up - https://phabricator.wikimedia.org/T368265#9917646 (10Benoit74) @Rgaudin probably has higher chances to remember, since I wasn't there at that time. [13:43:42] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_dbinstance_to_ovs for server tbd [13:50:05] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_dbinstance_to_ovs (exit_code=0) for server tbd [13:54:24] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Quarry: Allow Quarry to query its own database - https://phabricator.wikimedia.org/T367415#9917694 (10SD0001) >>! 
In T367415#9917616, @fnegri wrote: > Maybe `wiki_uid` in the `user` table is potentially sensitive? It seems to be the Mediawiki id for the user, I'm no... [13:57:11] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-nfs-2 [13:57:16] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-nfs-2 [14:05:08] 10Data-Services, 06Data-Persistence, 10Data-Platform-SRE (2024.06.17 - 2024.07.07), 13Patch-For-Review: Bring an-redacteddb1001 into service to replace clouddb1021 - https://phabricator.wikimedia.org/T365453#9917717 (10BTullis) That patch to refinery is merged, so we start using clouddb1021 during the next... [14:08:15] (03update) 10andrew: Add another integration-specific flavor [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/7 [14:08:55] (03update) 10andrew: Add new flavors: one for integration, one for tools nfs [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/7 [14:10:33] (03update) 10andrew: Add new flavors: one for integration, one for tools nfs [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/7 [14:10:55] (03update) 10taavi: Add new flavors: one for integration, one for tools nfs [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/7 (owner: 10andrew) [14:11:32] (03update) 10taavi: Add new flavors: one for integration, one for tools nfs [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/7 (owner: 10andrew) [14:11:36] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Quarry: Allow Quarry to query its own database - https://phabricator.wikimedia.org/T367415#9917737 (10fnegri) > MediaWiki user ids are public. 
They're exposed by the API and in the replicas as user_id field in the user table. Great, thanks! [14:12:20] (03merge) 10taavi: Add new flavors: one for integration, one for tools nfs [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/7 (owner: 10andrew) [14:14:05] 10Cloud-VPS, 07affects-Kiwix-and-openZIM, 05Cloud-Services-Origin-User: Disk volumes of cloud instances are completely mixed-up - https://phabricator.wikimedia.org/T368265#9917742 (10Rgaudin) We did not touch the mount points. [14:15:56] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Quarry: Allow Quarry to query its own database - https://phabricator.wikimedia.org/T367415#9917744 (10taavi) OTOH exposing the list of users that have logged in to Quarry, even if they've not interacted with anything that leaves a public trace, feels a bit questiona... [14:17:51] (03PS1) 10Andrew Bogott: migrate_server_to_ovs: support migration from two more flavors [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1049184 [14:18:55] (03CR) 10Majavah: [C:03+2] migrate_server_to_ovs: support migration from two more flavors [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1049184 (owner: 10Andrew Bogott) [14:19:24] (03CR) 10Jforrester: "check experimental" [labs/tools/stewardbots] - 10https://gerrit.wikimedia.org/r/1027495 (owner: 10Libraryupgrader) [14:21:15] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Quarry: Allow Quarry to query its own database - https://phabricator.wikimedia.org/T367415#9917780 (10fnegri) > OTOH exposing the list of users that have logged in to Quarry, even if they've not interacted with anything that leaves a public trace, feels a bit questi... 
[14:21:42] (03Merged) 10jenkins-bot: migrate_server_to_ovs: support migration from two more flavors [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1049184 (owner: 10Andrew Bogott) [14:24:14] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-nfs-2 [14:25:38] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Quarry: Allow Quarry to query its own database - https://phabricator.wikimedia.org/T367415#9917799 (10fnegri) p:05Triage→03Medium [14:25:49] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-nfs-2 [14:27:50] FIRING: [5x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [14:30:00] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers [14:30:16] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-bastion-12 [14:31:17] 10Toolforge (Toolforge iteration 11): Provision more non-NFS k8s workers - https://phabricator.wikimedia.org/T367964#9917844 (10taavi) 05Open→03Resolved [14:31:52] 06cloud-services-team, 06DC-Ops, 10ops-codfw, 06SRE: PowerSupplyFailure Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T368211#9917849 (10Jhancock.wm) I think it might have been a different issue. I reseated the cable and psu and this server's alert cleared.... 
[14:32:04] 06cloud-services-team, 06DC-Ops, 10ops-codfw, 06SRE: PowerSupplyFailure Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T368211#9917854 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm [14:32:26] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-bastion-12 [14:32:50] FIRING: [5x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [14:34:15] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-bastion-13 [14:36:18] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-bastion-13 [14:37:19] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-sgebastion-10 [14:38:24] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-sgebastion-10 [14:42:50] RESOLVED: [5x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [14:53:47] (03CR) 10Andrew Bogott: "Wait, I totally misunderstood what this was for, sorry for the incoherent comment!" 
[labs/striker] - 10https://gerrit.wikimedia.org/r/1035718 (https://phabricator.wikimedia.org/T362318) (owner: 10Slyngshede) [14:58:33] 06cloud-services-team, 06Infrastructure-Foundations, 10netops, 06SRE, 13Patch-For-Review: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9917973 (10aborrero) [15:01:16] (03approved) 10dcaro: [jobs-api] fix issues in openapi schema [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/96 (owner: 10raymond-ndibe) [15:01:30] 10Cloud-VPS, 07affects-Kiwix-and-openZIM, 05Cloud-Services-Origin-User: Disk volumes of cloud instances are completely mixed-up - https://phabricator.wikimedia.org/T368265#9917979 (10Andrew) I can't guess at the historical reason why fstab doesn't have uuids, but adding them there is the right solution for t... [15:02:03] 06cloud-services-team, 06DC-Ops, 10ops-codfw, 06SRE: PowerSupplyFailure Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T368211#9917983 (10Andrew) works for me! Thank you for reseating. [15:04:45] 10Cloud-VPS, 07affects-Kiwix-and-openZIM, 05Cloud-Services-Origin-User: Disk volumes of cloud instances are completely mixed-up - https://phabricator.wikimedia.org/T368265#9917991 (10Benoit74) We do have a header which indicate that /etc/fstab is managed by Puppet: ` # HEADER: This file was autogenerated at... 
[15:05:47] (03update) 10dcaro: api: remove unprefixed endpoints [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/94 (https://phabricator.wikimedia.org/T363346) (owner: 10sstefanova) [15:06:42] (03approved) 10dcaro: kyverno: raise CPU request and limits [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/345 (owner: 10aborrero) [15:06:43] (03update) 10dcaro: kyverno: raise CPU request and limits [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/345 (owner: 10aborrero) [15:06:57] (03update) 10dcaro: PSP: delete them [repos/cloud/toolforge/maintain-kubeusers] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/49 (https://phabricator.wikimedia.org/T368142) (owner: 10aborrero) [15:18:48] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all NFS workers [15:26:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-3 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [15:31:03] FIRING: [5x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-26 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [15:31:37] 10Cloud-VPS (Debian Buster Deprecation), 
06collaboration-services, 13Patch-For-Review: Cloud VPS "packaging" project Buster deprecation - https://phabricator.wikimedia.org/T367544#9918113 (10LSobanski) a:03Jelto [15:32:30] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-27 [15:33:10] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-27 [15:36:03] FIRING: [7x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-26 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [15:41:03] RESOLVED: [7x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-26 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [15:41:54] 10Cloud-VPS, 07affects-Kiwix-and-openZIM, 05Cloud-Services-Origin-User: Disk volumes of cloud instances are completely mixed-up - https://phabricator.wikimedia.org/T368265#9918223 (10Andrew) >>! In T368265#9917991, @Benoit74 wrote: > We do have a header which indicate that /etc/fstab is managed by Puppet: >... 
[15:42:01] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-28 [15:42:03] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-27 [15:42:39] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-28 [15:43:20] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-27 [15:43:49] !log fnegri@cloudcumin1001 superset START - Cookbook wmcs.vps.add_user_to_project for user 'fnegri' in role 'member' (T367393) [15:43:52] T367393: Allow Superset to query ToolsDB public databases - https://phabricator.wikimedia.org/T367393 [15:43:56] !log fnegri@cloudcumin1001 superset END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'fnegri' in role 'member' (T367393) [15:43:59] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-28 [15:44:05] 06cloud-services-team, 10Cloud-VPS: 'mariadb-main' db server in 'checkuser-beta-wiki' project - https://phabricator.wikimedia.org/T368234#9918233 (10Andrew) I don't know anything about your project so can't advise about what will or won't break after removal of the database. 
[15:44:30] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-29 [15:45:07] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-29 [15:45:09] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-28 [15:46:26] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-29 [15:46:28] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-30 [15:47:06] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-30 [15:47:42] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-29 [15:48:24] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-30 [15:48:25] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-31 [15:49:03] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-31 [15:49:34] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-30 [15:51:39] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-32 [15:51:41] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-31 [15:52:03] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-32 [15:52:51] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook 
wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-31 [15:53:03] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-32 [15:55:12] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-32 [15:58:54] 10cloud-services-team (FY2023/2024-Q3-Q4), 10superset.wmcloud.org: Allow Superset to query ToolsDB public databases - https://phabricator.wikimedia.org/T367393#9918315 (10fnegri) From a first look, it's not possible (at least not easily) to give superset access to //all// ToolsDB `_p` databases, but we need to... [16:01:00] 06cloud-services-team, 06DC-Ops, 10ops-codfw, 06SRE: Test new hardware candidate for cloudbackup replacement - https://phabricator.wikimedia.org/T353746#9918318 (10Jhancock.wm) got the other rail type and will test it out this week. [16:01:01] 10cloud-services-team (FY2023/2024-Q3-Q4), 10superset.wmcloud.org: Allow Superset to query ToolsDB public databases - https://phabricator.wikimedia.org/T367393#9918319 (10fnegri) The databases Superset can access are listed here: https://github.com/toolforge/superset-deploy/blob/a974d90cd36f40010078e2c581ece0a... [16:03:39] 06cloud-services-team, 10Cloud-VPS: 'mariadb-main' db server in 'checkuser-beta-wiki' project - https://phabricator.wikimedia.org/T368234#9918326 (10Dreamy_Jazz) 05Open→03Resolved a:03Dreamy_Jazz It seems to be un-used after looking at this further. As such I've deleted it without re-creating it. 
[16:04:36] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1054.eqiad.wmnet' [16:09:15] (03merge) 10bd808: docs: Add assert_private_file() + load_private_yaml() [toolforge-repos/python-toolforge] - 10https://gitlab.wikimedia.org/toolforge-repos/python-toolforge/-/merge_requests/24 (https://phabricator.wikimedia.org/T333728) (owner: 10lucaswerkmeister) [16:13:22] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_server_to_ovs for server integration-agent-docker-1044 [16:14:40] !log andrew@cloudcumin1001 integration END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server integration-agent-docker-1044 [16:16:14] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_server_to_ovs for server integration-agent-docker-1042 [16:17:40] !log andrew@cloudcumin1001 integration END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server integration-agent-docker-1042 [16:17:49] 10Cloud-VPS, 07affects-Kiwix-and-openZIM, 05Cloud-Services-Origin-User: Disk volumes of cloud instances are completely mixed-up - https://phabricator.wikimedia.org/T368265#9918403 (10Benoit74) I rebooted all four instances (I modified mwoffliner4 myself) and everything is ok on mwoffliner2, mwoffliner3 and m... 
[16:17:50] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_server_to_ovs for server integration-cumin [16:19:07] !log andrew@cloudcumin1001 integration END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server integration-cumin [16:19:14] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_server_to_ovs for server integration-agent-docker-1040 [16:20:30] !log andrew@cloudcumin1001 integration END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server integration-agent-docker-1040 [16:20:42] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_server_to_ovs for server integration-agent-docker-1041 [16:20:54] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1054.eqiad.wmnet' [16:21:15] 10Cloud-VPS, 07affects-Kiwix-and-openZIM, 05Cloud-Services-Origin-User: Disk volumes of cloud instances are completely mixed-up - https://phabricator.wikimedia.org/T368265#9918426 (10Benoit74) @Audiodude do you wanna do the same on WP1 instance mwcurator? I just checked and it has the same problem in /etc/fs... 
[16:21:57] !log andrew@cloudcumin1001 integration END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server integration-agent-docker-1041 [16:22:53] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_server_to_ovs for server integration-agent-docker-1047 [16:24:11] !log andrew@cloudcumin1001 integration END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server integration-agent-docker-1047 [16:24:27] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_server_to_ovs for server integration-agent-docker-1048 [16:25:42] !log andrew@cloudcumin1001 integration END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server integration-agent-docker-1048 [16:25:50] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9918445 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt1054.eqiad.wmnet with OS bookworm [16:26:04] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_server_to_ovs for server integration-agent-docker-1049 [16:27:29] !log andrew@cloudcumin1001 integration END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server integration-agent-docker-1049 [16:28:59] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_server_to_ovs for server integration-agent-docker-1043 [16:29:02] (03open) 10theprotonade: Fix typo in landing page text content [toolforge-repos/matchandsplit] - 10https://gitlab.wikimedia.org/toolforge-repos/matchandsplit/-/merge_requests/1 [16:29:19] 10Cloud-VPS, 07affects-Kiwix-and-openZIM, 05Cloud-Services-Origin-User: Disk volumes of cloud instances are completely mixed-up - https://phabricator.wikimedia.org/T368265#9918490 (10Audiodude) I can try, but I'm not sure I know what I'm doing. 
Where do I get the UUIDs from? If you have root on mwoffliner y... [16:30:15] !log andrew@cloudcumin1001 integration END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server integration-agent-docker-1043 [16:30:53] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_server_to_ovs for server integration-agent-docker-1046 [16:31:36] (03approved) 10soda: Fix typo in landing page text content [toolforge-repos/matchandsplit] - 10https://gitlab.wikimedia.org/toolforge-repos/matchandsplit/-/merge_requests/1 (owner: 10theprotonade) [16:31:42] (03merge) 10soda: Fix typo in landing page text content [toolforge-repos/matchandsplit] - 10https://gitlab.wikimedia.org/toolforge-repos/matchandsplit/-/merge_requests/1 (owner: 10theprotonade) [16:32:08] !log andrew@cloudcumin1001 integration END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server integration-agent-docker-1046 [16:33:00] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_server_to_ovs for server integration-agent-docker-1050 [16:34:10] !log andrew@cloudcumin1001 integration END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server integration-agent-docker-1050 [16:35:58] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_server_to_ovs for server integration-agent-docker-1056 [16:37:23] !log andrew@cloudcumin1001 integration END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server integration-agent-docker-1056 [16:37:29] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_server_to_ovs for server integration-agent-docker-1055 [16:38:43] !log andrew@cloudcumin1001 integration END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server integration-agent-docker-1055 [16:39:16] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_server_to_ovs for server 
integration-agent-docker-1057 [16:40:42] !log andrew@cloudcumin1001 integration END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server integration-agent-docker-1057 [16:41:00] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_server_to_ovs for server integration-agent-docker-1051 [16:42:08] !log andrew@cloudcumin1001 integration END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server integration-agent-docker-1051 [16:43:18] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_server_to_ovs for server integration-agent-docker-1052 [16:44:34] !log andrew@cloudcumin1001 integration END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server integration-agent-docker-1052 [16:46:50] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_server_to_ovs for server integration-agent-docker-1053 [16:48:06] !log andrew@cloudcumin1001 integration END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server integration-agent-docker-1053 [16:48:21] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_server_to_ovs for server integration-agent-docker-1045 [16:49:38] !log andrew@cloudcumin1001 integration END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server integration-agent-docker-1045 [16:49:41] FIRING: CloudVPSDesignateLeaks: Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [16:50:15] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_project_to_ovs [16:54:45] !log andrew@cloudcumin1001 integration END (FAIL) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=1) [17:02:16] (03PS1) 
10Andrew Bogott: migrate_server_to_ovs: two more flavors for the integration project [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1049240 [17:03:06] (03open) 10andrew: Two new weird flavors for integration [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/8 [17:05:50] !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1054'] [17:07:06] (03approved) 10bd808: Don't relay wikibugs in #translatewiki [toolforge-repos/bridgebot] - 10https://gitlab.wikimedia.org/toolforge-repos/bridgebot/-/merge_requests/8 (owner: 10nikerabbit) [17:07:53] !log andrew@cloudcumin1001 cloudvirt-canary END (FAIL) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=99) on eqiad1, with recreate False, for hosts list: ['cloudvirt1054'] [17:08:32] !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1054'] [17:08:40] 10cloud-services-team (FY2023/2024-Q3-Q4), 10superset.wmcloud.org: Allow Superset to query ToolsDB public databases - https://phabricator.wikimedia.org/T367393#9918670 (10KCVelaga_WMF) @fnegri ad-hoc is fine. I don't need access a specific database at the moment. But let me explain the need: for product tea... 
[17:08:43] (03merge) 10andrew: Two new weird flavors for integration [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/8 [17:08:50] !log andrew@cloudcumin1001 cloudvirt-canary END (FAIL) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=99) on eqiad1, with recreate False, for hosts list: ['cloudvirt1054'] [17:09:15] (03CR) 10Andrew Bogott: [C:03+2] migrate_server_to_ovs: two more flavors for the integration project [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1049240 (owner: 10Andrew Bogott) [17:12:38] !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1054'] [17:13:00] !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1054'] [17:13:14] (03Merged) 10jenkins-bot: migrate_server_to_ovs: two more flavors for the integration project [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1049240 (owner: 10Andrew Bogott) [17:13:43] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS, 13Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9918685 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt1054.eqiad.wmnet with OS bo... 
[17:15:05] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_project_to_ovs [17:15:23] !log andrew@cloudcumin1001 integration END (FAIL) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=1) [17:17:01] 10Cloud-VPS (Debian Buster Deprecation), 06Research: Cloud VPS "research-collaborations-api" project Buster deprecation - https://phabricator.wikimedia.org/T367551#9918692 (10Isaac) > I've containerized these and added a docker-compose.yml file (PR here) so that all this can be easily deployed on any instance... [17:17:23] !log andrew@cloudcumin1001 integration START - Cookbook wmcs.openstack.migrate_project_to_ovs [17:17:32] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1055.eqiad.wmnet' [17:19:28] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS, 13Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9918706 (10Andrew) [17:20:22] !log andrew@cloudcumin1001 integration END (FAIL) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=1) [17:24:26] FIRING: SystemdUnitDown: The service unit maintain-dbusers.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [17:27:19] 06cloud-services-team, 10Cloud-VPS: 'mariadb-main' db server in 'checkuser-beta-wiki' project - https://phabricator.wikimedia.org/T368234#9918742 (10Andrew) Thanks! 
[17:28:13] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1055.eqiad.wmnet' [17:28:59] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS, 13Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9918775 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt1055.eqiad.wmnet with O... [17:29:38] 10Tool-containers, 10Toolforge: Provide a Redis container for use within a tool's namespace - https://phabricator.wikimedia.org/T360378#9918780 (10bd808) >>! In T360378#9891200, @bd808 wrote: >>>! In T360378#9888704, @Pintoch wrote: >> One problem I encountered when using the supplied commands was that I got a... [17:34:31] 06cloud-services-team, 06DC-Ops, 10ops-eqiad, 06SRE: reapply thermal paste to processors in cloudvirt1063 - https://phabricator.wikimedia.org/T368093#9918796 (10VRiley-WMF) [17:35:17] 06cloud-services-team, 06DC-Ops, 10ops-eqiad, 06SRE: reapply thermal paste to processors in cloudvirt1063 - https://phabricator.wikimedia.org/T368093#9918794 (10VRiley-WMF) Hey @Andrew we upon looking at this ticket, I'm guessing we are seeing some thermal issues on this server? We can check the thermal pa... [17:45:16] 10Cloud-VPS, 07affects-Kiwix-and-openZIM, 05Cloud-Services-Origin-User: Disk volumes of cloud instances are completely mixed-up - https://phabricator.wikimedia.org/T368265#9918858 (10Andrew) > mwoffliner1 seems to be up but ssh is probably not starting, at least I cannot SSH in the instance. I tried to put t... [17:46:31] 06cloud-services-team, 06DC-Ops, 10ops-eqiad, 06SRE: reapply thermal paste to processors in cloudvirt1063 - https://phabricator.wikimedia.org/T368093#9918865 (10Andrew) Sure, let's try moving it. You can do that at your convenience since the server doesn't have any workload on it. I don't have a theory ab... 
[17:49:39] 10Cloud-VPS, 07affects-Kiwix-and-openZIM, 05Cloud-Services-Origin-User: Disk volumes of cloud instances are completely mixed-up - https://phabricator.wikimedia.org/T368265#9918870 (10Audiodude) Just to be completely clear, I don't really feel comfortable editing /fstab on mwcurator. If one of you could do it... [18:04:41] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [18:05:20] 14MediaWiki-extensions-OpenStackManager, 06Diffusion-Repository-Administrators, 10Projects-Cleanup, 06translatewiki.net, and 2 others: Archive the OpenStackManager extension - https://phabricator.wikimedia.org/T367220#9918926 (10Pppery) [18:09:26] RESOLVED: SystemdUnitDown: The service unit maintain-dbusers.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [18:10:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-36 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [18:12:26] FIRING: SystemdUnitDown: The service unit maintain-dbusers.service is in failed status on host cloudcontrol1005. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [18:17:41] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS, 13Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9918973 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt1055.eqiad.wmnet with OS bo... [18:23:45] !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1055'] [18:24:07] !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1055'] [18:28:23] !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1055'] [18:28:27] !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1055'] [18:29:14] !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate True, for hosts list: ['cloudvirt1055'] [18:29:36] !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate True, for hosts list: ['cloudvirt1055'] [18:31:02] !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate True, for hosts list: ['cloudvirt1055'] [18:31:24] !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook 
wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate True, for hosts list: ['cloudvirt1055'] [18:38:01] !log andrew@cloudcumin1001 machine-learning START - Cookbook wmcs.openstack.migrate_project_to_ovs [18:43:14] !log andrew@cloudcumin1001 machine-learning END (ERROR) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=97) [18:43:16] !log andrew@cloudcumin1001 machine-learning START - Cookbook wmcs.openstack.migrate_project_to_ovs [18:43:19] !log andrew@cloudcumin1001 machine-learning END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0) [18:45:08] !log andrew@cloudcumin1001 machine-learning START - Cookbook wmcs.openstack.migrate_project_to_ovs [18:45:43] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS, 06DC-Ops, 10ops-eqiad, and 2 others: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#9919050 (10CDanis) Unfortunately `cloudcephosd1020` has too old a Debian / kernel for this without some more wo... 
[18:46:14] !log andrew@cloudcumin1001 machine-learning END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0) [18:47:32] !log andrew@cloudcumin1001 maps-experimens START - Cookbook wmcs.openstack.migrate_project_to_ovs [18:47:33] andrew@cloudcumin1001: Unknown project "maps-experimens" [18:47:33] !log andrew@cloudcumin1001 maps-experimens END (ERROR) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=97) [18:47:33] andrew@cloudcumin1001: Unknown project "maps-experimens" [18:47:35] !log andrew@cloudcumin1001 maps-experiments START - Cookbook wmcs.openstack.migrate_project_to_ovs [18:48:47] !log andrew@cloudcumin1001 maps-experiments END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0) [18:49:18] !log andrew@cloudcumin1001 search START - Cookbook wmcs.openstack.migrate_project_to_ovs [18:49:37] !log andrew@cloudcumin1001 search END (FAIL) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=1) [18:50:37] 10Tool-python-toolforge: Add read_private() function to python-toolforge library - https://phabricator.wikimedia.org/T333728#9919083 (10LucasWerkmeister) 05In progress→03Resolved [18:52:27] (03open) 10andrew: Add new flavor for 'search' project [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/9 [18:53:32] (03update) 10andrew: Add new flavor for 'search' project [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/9 [18:54:05] (03merge) 10andrew: Add new flavor for 'search' project [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/9 [18:56:13] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1056.eqiad.wmnet' [18:58:13] !log andrew@cloudcumin1001 search START - Cookbook wmcs.openstack.migrate_project_to_ovs [18:58:56] !log andrew@cloudcumin1001 search END (PASS) - 
Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0) [19:00:49] 10Tool-bridgebot: lint:golang CI job times out - https://phabricator.wikimedia.org/T367969#9919120 (10LucasWerkmeister) The first messages in a [successful build](https://gitlab.wikimedia.org/nikerabbit/bridgebot/-/jobs/293688) that aren’t seen in a [timed-out](https://gitlab.wikimedia.org/nikerabbit/bridgebot/-... [19:01:06] !log andrew@cloudcumin1001 cloudinfra START - Cookbook wmcs.openstack.migrate_server_to_ovs for server cloudinfra-internal-puppetserver-1 [19:02:15] !log andrew@cloudcumin1001 cloudinfra END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server cloudinfra-internal-puppetserver-1 [19:02:28] !log andrew@cloudcumin1001 cloudinfra START - Cookbook wmcs.openstack.migrate_server_to_ovs for server cloudinfra-cloudvps-puppetserver-1 [19:03:13] (03merge) 10lucaswerkmeister: Don't relay wikibugs in #translatewiki [toolforge-repos/bridgebot] - 10https://gitlab.wikimedia.org/toolforge-repos/bridgebot/-/merge_requests/8 (owner: 10nikerabbit) [19:03:37] !log andrew@cloudcumin1001 cloudinfra END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server cloudinfra-cloudvps-puppetserver-1 [19:03:44] !log andrew@cloudcumin1001 cloudinfra START - Cookbook wmcs.openstack.migrate_server_to_ovs for server cloudinfra-db03 [19:04:53] !log andrew@cloudcumin1001 cloudinfra END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server cloudinfra-db03 [19:05:51] !log andrew@cloudcumin1001 cloudinfra START - Cookbook wmcs.openstack.migrate_server_to_ovs for server enc-1 [19:05:55] !log andrew@cloudcumin1001 cloudinfra END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server enc-1 [19:06:05] !log andrew@cloudcumin1001 cloudinfra START - Cookbook wmcs.openstack.migrate_server_to_ovs for server enc-2 [19:07:13] !log andrew@cloudcumin1001 cloudinfra END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs 
(exit_code=0) for server enc-2 [19:07:26] RESOLVED: SystemdUnitDown: The service unit maintain-dbusers.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [19:08:26] FIRING: SystemdUnitDown: The service unit maintain-dbusers.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [19:10:01] FIRING: CloudinfraMariaDBWritableState: There should be exactly one writable MariaDB instance instead of 0 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCloudinfraMariaDBWritableState [19:11:49] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1056.eqiad.wmnet' [19:15:43] !log andrew@cloudcumin1001 cloudinfra START - Cookbook wmcs.openstack.migrate_server_to_ovs for server fff879b9-8300-4a11-9cf6-3d424be9ffa3 [19:17:02] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS, 13Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9919156 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt1056.eqiad.wmnet with O... 
[19:20:01] RESOLVED: CloudinfraMariaDBWritableState: There should be exactly one writable MariaDB instance instead of 0 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCloudinfraMariaDBWritableState [19:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [19:21:52] 10Cloud-VPS, 07affects-Kiwix-and-openZIM, 05Cloud-Services-Origin-User: Disk volumes of cloud instances are completely mixed-up - https://phabricator.wikimedia.org/T368265#9919158 (10Benoit74) > This was due to me making a typo in fstab, which is now fixed. (I have access to a raw console which is annoying b... [19:22:28] FIRING: InstanceDown: Project cloudinfra instance cloudinfra-db04 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [19:22:56] !log andrew@cloudcumin1001 cloudinfra END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server fff879b9-8300-4a11-9cf6-3d424be9ffa3 [19:23:36] 10Cloud-VPS, 07affects-Kiwix-and-openZIM, 05Cloud-Services-Origin-User: Disk volumes of cloud instances are completely mixed-up - https://phabricator.wikimedia.org/T368265#9919162 (10Benoit74) I don't know what "claiming" a task actually mean, feel free to claim this if it is important to you, I honestly... 
[19:23:59] 10Cloud-VPS, 07affects-Kiwix-and-openZIM, 05Cloud-Services-Origin-User: Disk volumes of cloud instances are completely mixed-up - https://phabricator.wikimedia.org/T368265#9919159 (10Benoit74) 05Open→03Resolved a:03Benoit74 [19:24:02] !log andrew@cloudcumin1001 cloudinfra START - Cookbook wmcs.openstack.migrate_server_to_ovs for server fff879b9-8300-4a11-9cf6-3d424be9ffa3 [19:24:06] !log andrew@cloudcumin1001 cloudinfra END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server fff879b9-8300-4a11-9cf6-3d424be9ffa3 [19:27:28] RESOLVED: InstanceDown: Project cloudinfra instance cloudinfra-db04 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [19:27:39] !log andrew@cloudcumin1001 cloudinfra START - Cookbook wmcs.openstack.migrate_project_to_ovs [19:30:32] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [19:34:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-idp-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [19:35:08] !log andrew@cloudcumin1001 cloudinfra END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0) [19:38:18] !log andrew@cloudcumin1001 wikisp START - Cookbook wmcs.openstack.migrate_project_to_ovs [19:40:29] !log andrew@cloudcumin1001 wikisp END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0) [19:41:11] !log andrew@cloudcumin1001 wcdo START - Cookbook wmcs.openstack.migrate_project_to_ovs [19:54:14] RESOLVED: SystemdUnitDown: The service unit maintain-dbusers.service is in failed status on host cloudcontrol1005. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [19:55:36] !log andrew@cloudcumin1001 wcdo END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0) [19:56:06] !log andrew@cloudcumin1001 metricsinfra START - Cookbook wmcs.openstack.migrate_project_to_ovs [20:00:43] !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1056'] [20:01:05] !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1056'] [20:01:31] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS, 13Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9919259 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt1056.eqiad.wmnet with OS bo... [20:05:41] PROBLEM - Check DNS auth via UDP of k8s.svc.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is CRITICAL: DNS CRITICAL - 7.233 seconds response time (k8s.svc.tools.eqiad1.wikimedia.cloud. 300 IN A 172.16.6.113) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [20:05:48] PROBLEM - Check DNS auth via UDP of www.wmcloud.org on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is CRITICAL: DNS CRITICAL - 8.238 seconds response time (www.wmcloud.org. 3600 IN CNAME wmcloud.org.) 
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [20:06:33] RECOVERY - Check DNS auth via UDP of k8s.svc.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is OK: DNS OK - 0.210 seconds response time (k8s.svc.tools.eqiad1.wikimedia.cloud. 300 IN A 172.16.6.113) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [20:06:33] RECOVERY - Check DNS auth via UDP of www.wmcloud.org on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is OK: DNS OK - 0.215 seconds response time (www.wmcloud.org. 3600 IN CNAME wmcloud.org.) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [20:09:42] !log andrew@cloudcumin1001 metricsinfra END (FAIL) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=1) [20:11:53] FIRING: SystemdUnitDown: The service unit maintain-dbusers.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [20:18:58] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS, 13Patch-For-Review: Migrate eqiad1 hypervisors to Neutron OVS agent - https://phabricator.wikimedia.org/T364457#9919281 (10Andrew) [20:24:29] FIRING: PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-idp-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [20:25:44] 06cloud-services-team: maintain-dbusers.service failing on cloudcontrol1005 - https://phabricator.wikimedia.org/T368316 (10Andrew) 03NEW [20:27:14] 10Quarry: refreshing a running query changes favicon from orange to blue - https://phabricator.wikimedia.org/T362101#9919300 (10SD0001) 05Open→03Resolved a:03SD0001 [20:27:30] 06cloud-services-team, 10Toolforge: maintain-dbusers.service 
failing on cloudcontrol1005 - https://phabricator.wikimedia.org/T368316#9919298 (10Andrew) [20:28:04] 06cloud-services-team, 10Data-Services: maintain-dbusers.service failing on cloudcontrol1005 - https://phabricator.wikimedia.org/T368316#9919303 (10taavi) [20:28:25] 10Tool-schedule-deployment, 10WikimediaDebug: Integrate schedule-deployment with WikimediaDebug - https://phabricator.wikimedia.org/T367213#9919304 (10bd808) >>! In T367213#9906179, @jhsoby wrote: > Can't extensions add CSS to pages? I have some extensions that do that. (It might require more permissions for t... [20:29:52] 06cloud-services-team, 10Data-Services, 06SRE: [wikireplicas] Make sure there is no sensitive data in clouddb hosts - https://phabricator.wikimedia.org/T368136#9919306 (10bd808) Some info on the sanitization (dropping columns, tables) and redaction (content hidden from end users via views) of the replicated... [20:36:47] 06cloud-services-team, 10Data-Services, 06SRE: [wikireplicas] Make sure there is no sensitive data in clouddb hosts - https://phabricator.wikimedia.org/T368136#9919314 (10bd808) What sort of data y'all are concerned about exposing to new roots on the replica db hosting nodes themselves? These boxes already e... [20:38:44] 06cloud-services-team, 10Data-Services: maintain-dbusers.service failing on cloudcontrol1005 - https://phabricator.wikimedia.org/T368316#9919317 (10Andrew) clouddb1021.eqiad.wmnet is still appearing in dbusers.yaml, which doesn't really fit with the 'replacing' story. Seems like we may have two problems. 
[20:43:36] 10Tool-bridgebot, 10Toolforge: bridgebot tool build service quota not going down - https://phabricator.wikimedia.org/T368317 (10LucasWerkmeister) 03NEW [20:44:05] 06cloud-services-team, 10Data-Services: maintain-dbusers.service failing on cloudcontrol1005 - https://phabricator.wikimedia.org/T368316#9919332 (10Andrew) ferm shows ` 10_mysql_wmcs_db_admin_s1:&R_SERVICE(tcp, 3311, (@resolve((cloudcontrol1005.eqiad.wmnet cloudcontrol1006.eqiad.wmnet cloudcontrol1007.eqiad.... [20:44:48] 06cloud-services-team, 10Data-Services: maintain-dbusers.service failing on cloudcontrol1005 - https://phabricator.wikimedia.org/T368316#9919334 (10BTullis) We currently have both servers in the `
Role::Wmcs::Db::Wikireplicas::Dedicated::Analytics_multiinstance` role. ` btullis@cumin1002:~$ sudo cumin O:wmcs... [20:46:53] 06cloud-services-team, 10Data-Services: maintain-dbusers.service failing on cloudcontrol1005 - https://phabricator.wikimedia.org/T368316#9919337 (10BTullis) Could you restart the tool, please? It might have a cached ipv6 IP address in memory, which I removed earlier today in {T368220}. [20:49:24] 06cloud-services-team, 10Data-Services: maintain-dbusers.service failing on cloudcontrol1005 - https://phabricator.wikimedia.org/T368316#9919357 (10Andrew) I have restarted the tool a dozen or so times in the last hour. [20:50:02] 06cloud-services-team, 10Data-Services: maintain-dbusers.service failing on cloudcontrol1005 - https://phabricator.wikimedia.org/T368316#9919358 (10BTullis) OK, sorry. I see what you mean. ` btullis@cloudcontrol1005:~$ telnet clouddb1021 3311 Trying 10.64.0.118... Connected to clouddb1021. Escape character is... [20:50:56] RESOLVED: SystemdUnitDown: The service unit maintain-dbusers.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [20:53:58] 10Tool-schedule-deployment: Display link to wikitext diff when reporting a successful patch addition - https://phabricator.wikimedia.org/T367948#9919372 (10bd808) [20:55:23] 06cloud-services-team, 10Data-Services: maintain-dbusers.service failing on cloudcontrol1005 - https://phabricator.wikimedia.org/T368316#9919387 (10Andrew) For now I have removed references to an-redacteddb1001 from dbusers.yaml and stopped puppet on cloudcontrol1005; that allows db creds to update. @cmooney,... 
[21:12:15] 06cloud-services-team, 10Data-Services: maintain-dbusers.service failing on cloudcontrol1005 - https://phabricator.wikimedia.org/T368316#9919444 (10BTullis) I rebooted an-redacteddb1001 for good measure, but it didn't change anything. I also had a quick look at the [[https://gerrit.wikimedia.org/g/operations/h... [21:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [21:44:51] 10Cloud-VPS (Debian Buster Deprecation), 06Infrastructure-Foundations, 10Puppet CI: Cloud VPS "puppet-diffs" project Buster deprecation - https://phabricator.wikimedia.org/T367547#9919531 (10colewhite) pcc-db1001 is critical to the catalog compiler's continued service, no? Some workers are on bookworm, but... [21:52:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-36 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [21:56:02] 10Cloud-VPS (Debian Buster Deprecation), 06Infrastructure-Foundations, 10Puppet CI: Cloud VPS "puppet-diffs" project Buster deprecation - https://phabricator.wikimedia.org/T367547#9919554 (10jhathaway) correct, I plan to migrate pcc-db1001, before the end of the month [22:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks 
[22:41:26] 06cloud-services-team, 10Cloud-VPS (Debian Buster Deprecation), 10video2commons: Replace or remove Debian Buster VMs in 'video' cloud-vps project - https://phabricator.wikimedia.org/T360711#9919635 (10JJMC89) [22:48:21] 10Cloud-Services: Remove matanya as an admin from VPS projects - https://phabricator.wikimedia.org/T368330 (10Matanya) 03NEW The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a more specific pro... [22:49:21] 10Cloud-VPS: Remove matanya as an admin from VPS projects - https://phabricator.wikimedia.org/T368330#9919666 (10bd808) [23:00:10] 10Cloud-VPS: Remove matanya as an admin from VPS projects - https://phabricator.wikimedia.org/T368330#9919689 (10bd808) > I am an admin on > > bastion Actually just a normal member here. This project is magically added/removed based on membership in other #Cloud-vps projects. > deployment-prep Just a normal mem... [23:09:30] 10Cloud-VPS: Remove matanya as an admin from VPS projects - https://phabricator.wikimedia.org/T368330#9919706 (10bd808) [23:18:40] 06cloud-services-team, 10Toolforge (Toolforge iteration 11): toolforge: review pod templates for PSP replacement - https://phabricator.wikimedia.org/T362050#9919714 (10bd808) ` $ webservice perl5.36 shell --mount=all Error from server (Forbidden): pods "shell-1719270590" is forbidden: PodSecurityPolicy: unable... [23:29:14] 06cloud-services-team, 10Toolforge (Toolforge iteration 11): toolforge: review pod templates for PSP replacement - https://phabricator.wikimedia.org/T362050#9919718 (10bd808) Until `webservice shell` is fixed generally, hacky workarounds are: * Run from login-buster.toolforge.org which has an older version of...