[01:04:03] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q2:rack/setup/install cloudcephosd2004-dev - https://phabricator.wikimedia.org/T378825#10414678 (Jhancock.wm) Andrew, getting this error now in the installer. [!!] Partition disks Failed to run preseeded command Ex...
[03:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:31:14] FIRING: KernelError: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelError
[04:19:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:34:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:31:14] FIRING: KernelError: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelError
[07:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:27:25] cloud-services-team: KernelError Server cloudgw1002 may have kernel errors - https://phabricator.wikimedia.org/T382421#10415349 (fnegri) Open→Resolved a:fnegri This is just a VMX warning that was logged after a reboot: ` fnegri@cloudgw1002:~$ sudo journalctl -k -perr -- Journal begins at Thu...
[12:39:43] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, DBA: Prepare and check storage layer for tigwiki - https://phabricator.wikimedia.org/T381378#10415374 (fnegri) Open→In progress
[12:39:52] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Data-Platform-SRE, DBA: Prepare and check storage layer for idwikivoyage - https://phabricator.wikimedia.org/T381079#10415376 (fnegri) Open→In progress
[12:44:53] cloud-services-team (FY2024/2025-Q1-Q2): KernelError Server cloudgw1002 may have kernel errors - https://phabricator.wikimedia.org/T382220#10415385 (fnegri) In progress→Resolved No kernel errors were logged in cloudgw1002 since the reboot. It might be that the kernel errors are only happening wh...
[13:29:53] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10415523
[13:38:49] cloud-services-team (Hardware), DC-Ops, ops-eqiad: Q2:rack/setup/install cloudvirt10[68-74] - https://phabricator.wikimedia.org/T382492 (RobH) NEW
[13:41:10] cloud-services-team (Hardware), DC-Ops, ops-eqiad: Q2:rack/setup/install cloudvirt10[68-74] - https://phabricator.wikimedia.org/T382492#10415566 (RobH) a:Andrew @Andrew, Two call outs! The original ordering task had a bad hostname range provided by you for racking "**Hostnames:** cloudvirt1068...
[13:41:22] cloud-services-team (Hardware), DC-Ops, ops-eqiad: Q2:rack/setup/install cloudvirt10[68-74] - https://phabricator.wikimedia.org/T382492#10415571 (RobH)
[13:56:54] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10415630
[14:13:02] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10415684
[14:55:49] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install cloudvirt10[68-74] - https://phabricator.wikimedia.org/T382492#10415805 (Andrew) >>! In T382492#10415566, @RobH wrote: > @Andrew, > > Two call outs! The original ordering task had a bad hostname...
[14:57:18] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10415807 (Andrew)
[15:01:40] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10415854 (Andrew)
[15:02:16] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10415859 (Andrew) a:Andrew→None
[15:23:42] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q2:rack/setup/install cloudcephosd2004-dev - https://phabricator.wikimedia.org/T378825#10415928 (Andrew) This server seems to have a raid controller, which is different from all the other standard ceph OSD nodes. Not sure how that happened b...
[16:04:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance metricsinfra-meta-monitor-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[16:09:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-50 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[16:14:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-50 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[16:46:48] PAWS: New upstream release for Wikimedia Commons Extension for OpenRefine - https://phabricator.wikimedia.org/T382444#10416474 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/472
[16:46:50] PAWS: New upstream release for Wikimedia Commons Extension for OpenRefine - https://phabricator.wikimedia.org/T382444#10416475 (rook) Open→Resolved a:rook
[17:08:25] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10416581
[17:10:37] Tool-bullseye: Indicate geolocation resolution on map - https://phabricator.wikimedia.org/T382526 (AntiCompositeNumber) NEW
[17:13:32] Tool-bullseye: 190.237.157.138 is globally blocked, but Bullseye claims it has no Wikimedia blocks - https://phabricator.wikimedia.org/T382527 (AntiCompositeNumber) NEW
[17:16:52] Tool-bullseye: Create "AmandaNP's recommendation" for proxy blocking - https://phabricator.wikimedia.org/T382528 (AntiCompositeNumber) NEW
[17:21:16] Tool-bullseye: Partial blocks are incorrectly displayed - https://phabricator.wikimedia.org/T382529 (AntiCompositeNumber) NEW
[18:44:17] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q2:rack/setup/install cloudcephosd2004-dev - https://phabricator.wikimedia.org/T378825#10416914 (Andrew) I designated every drive a non-raid drive in the bios and now the install is completing. I can't make it stop installing though, it just...
[18:45:04] Tool-bullseye: Update Bullseye to python3.11, update dependencies - https://phabricator.wikimedia.org/T382533 (AntiCompositeNumber) NEW
[18:52:57] (open) anticomposite: Update README.md, footer after fok [toolforge-repos/bullseye] - https://gitlab.wikimedia.org/toolforge-repos/bullseye/-/merge_requests/1
[18:53:01] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudcontrol1011 - https://phabricator.wikimedia.org/T380499#10416940 (Jclark-ctr)
[19:32:54] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q2:rack/setup/install cloudcephosd2004-dev - https://phabricator.wikimedia.org/T378825#10417061 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudcephosd2004-dev.codfw.wmnet with OS bul...
[19:49:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance metricsinfra-meta-monitor-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[20:09:56] Tool-bullseye, Toolforge-standards-committee: Adoption request for bullseye - https://phabricator.wikimedia.org/T380537#10417107 (AntiCompositeNumber)
[20:10:12] Tool-bullseye: bullseye's Spur API key has expired - https://phabricator.wikimedia.org/T380193#10417108 (AntiCompositeNumber)