[07:56:27] Traffic, Release-Engineering-Team, collaboration-services: CI on gitlab for eBPF / networking heavy projects - https://phabricator.wikimedia.org/T353279 (LSobanski)
[08:26:02] netops, Infrastructure-Foundations, SRE, ops-codfw: cr2-codfw:xe-1/0/1:1 down - https://phabricator.wikimedia.org/T353256 (ayounsi) Open→Resolved a:ayounsi > Dear Customer, > A patch that was incorrectly connected/labelled and the tech fixed it.
[08:45:26] netops, Ganeti, Infrastructure-Foundations: prometheus5002 unable to ping ipv6 ganeti500[74] eqsin - https://phabricator.wikimedia.org/T353254 (ops-monitoring-bot) Draining ganeti5007.eqsin.wmnet of running VMs
[09:01:26] Traffic, Release-Engineering-Team, collaboration-services: CI on gitlab for eBPF / networking heavy projects - https://phabricator.wikimedia.org/T353279 (Vgutierrez) we currently perform manual tests on developer machines (far from optimal). So if we can spawn our own runner we could run docker conta...
[09:28:33] netops, Ganeti, Infrastructure-Foundations: prometheus5002 unable to ping ipv6 ganeti500[74] eqsin - https://phabricator.wikimedia.org/T353254 (ops-monitoring-bot) Draining ganeti5006.eqsin.wmnet of running VMs
[09:45:47] netops, Ganeti, Infrastructure-Foundations: prometheus5002 unable to ping ipv6 ganeti500[74] eqsin - https://phabricator.wikimedia.org/T353254 (ops-monitoring-bot) Draining ganeti5005.eqsin.wmnet of running VMs
[10:52:44] netops, Ganeti, Infrastructure-Foundations: prometheus5002 unable to ping ipv6 ganeti500[74] eqsin - https://phabricator.wikimedia.org/T353254 (ops-monitoring-bot) Draining ganeti5004.eqsin.wmnet of running VMs
[11:28:01] Traffic, Release-Engineering-Team, collaboration-services, Patch-For-Review: CI on gitlab for eBPF / networking heavy projects - https://phabricator.wikimedia.org/T353279 (CodeReviewBot) vgutierrez opened https://gitlab.wikimedia.org/repos/sre/tcp-mss-clamper/-/merge_requests/10 e2e: Provide e2e...
[11:31:16] netops, Ganeti, Infrastructure-Foundations: prometheus5002 unable to ping ipv6 ganeti500[74] eqsin - https://phabricator.wikimedia.org/T353254 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff This has been fixed, please reopen if you run into other network issues with the eqsin Ganet...
[11:32:03] Traffic, Release-Engineering-Team, collaboration-services, Patch-For-Review: CI on gitlab for eBPF / networking heavy projects - https://phabricator.wikimedia.org/T353279 (Vgutierrez) @thcipriani being able to run privileged containers seems to be enough, at least for basic eBPF tests (not sure a...
[11:40:40] godog: anything I should consider before deploying https://gerrit.wikimedia.org/r/c/operations/alerts/+/980280 :?
[11:43:31] grafana doesn't show any results for the query so it won't get triggered
[12:09:21] Traffic, DC-Ops: cp4037 reimage for cookbook getting stuck at PXE boot - https://phabricator.wikimedia.org/T352876 (Fabfur)
[12:09:48] Traffic, DC-Ops: cp4037 reimage for cookbook getting stuck at PXE boot - https://phabricator.wikimedia.org/T352876 (Vgutierrez) p:Triage→Medium @Papaul I see that you triggered the cookbook last week. Are you stuck with something? do you need help from our side? it would be great to get this host...
[12:54:27] vgutierrez: checking
[12:56:35] vgutierrez: yeah good to go, as you said it isn't going to fire right away
[14:10:04] godog: nice, I'll submit another CR adding cluster & site variables on the final result to be able to point to the proper dashboard on grafana
[14:13:57] vgutierrez: SGTM, feel free to send the review my way
[14:36:40] Traffic, DC-Ops: cp4037 reimage for cookbook getting stuck at PXE boot - https://phabricator.wikimedia.org/T352876 (Papaul) @Vgutierrez please give me until the end of today. Thank you
[17:58:23] Traffic, DC-Ops, SRE, ops-eqiad: Decommission task for old cp hosts (cp1075-1090) - https://phabricator.wikimedia.org/T352253 (VRiley-WMF)
[17:58:53] Traffic, DC-Ops, SRE, ops-eqiad: Decommission task for old cp hosts (cp1075-1090) - https://phabricator.wikimedia.org/T352253 (VRiley-WMF)
[19:14:18] netops, Infrastructure-Foundations, SRE: Adjust "port with no description on access switch" alert - https://phabricator.wikimedia.org/T353364 (cmooney) p:Triage→Low
[19:19:44] netops, Infrastructure-Foundations, SRE: Adjust "port with no description on access switch" alert - https://phabricator.wikimedia.org/T353364 (cmooney)
[19:23:43] netops, Infrastructure-Foundations, SRE: Adjust "port with no description on access switch" alert - https://phabricator.wikimedia.org/T353364 (cmooney) Also - we should change the regexp to also catch "et-" prefixes for 25G interfaces ` REGEXP "^(g|x)e-[0-9]+/[0-9]+/[0-9]+" `
[20:15:57] Traffic, DC-Ops, SRE, ops-eqiad: Decommission task for old cp hosts (cp1075-1090) - https://phabricator.wikimedia.org/T352253 (VRiley-WMF) Open→Resolved a:Fabfur→VRiley-WMF
[20:17:12] Traffic, Release-Engineering-Team, collaboration-services, Patch-For-Review: CI on gitlab for eBPF / networking heavy projects - https://phabricator.wikimedia.org/T353279 (Vgutierrez) Docker is definitely not a valid option here since we need to test against several kernels (at least 5.10 and 6.1)
[20:38:11] netops, Infrastructure-Foundations, SRE: Adjust "port with no description on access switch" alert - https://phabricator.wikimedia.org/T353364 (cmooney)
[20:59:59] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad: 1 VM request for acme-chief - https://phabricator.wikimedia.org/T353295 (BCornwall) @MoritzMuehlenhoff The other instances also have 10G. Would you still recommend that despite it bringing inconsistency?
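The regexp quoted in the 19:23:43 T353364 comment only matches `ge-` and `xe-` interface names, so `et-` ports are never checked by the alert. A minimal sketch of the extension suggested there, assuming the alerting backend's regex dialect treats this simple anchored pattern the same way Python's `re` does; `PROPOSED` and `is_physical_port` are hypothetical names for illustration, not the deployed rule:

```python
import re

# Regexp quoted in T353364: only ge- (1G) and xe- (10G) ports match.
CURRENT = re.compile(r"^(g|x)e-[0-9]+/[0-9]+/[0-9]+")

# Hypothetical extension that also catches et- ports (e.g. 25G).
PROPOSED = re.compile(r"^(ge|xe|et)-[0-9]+/[0-9]+/[0-9]+")


def is_physical_port(pattern: re.Pattern, ifname: str) -> bool:
    """Return True if the interface name starts with a matching prefix.

    re.match anchors at the start only, so trailing channel suffixes
    such as ":1" in "xe-1/0/1:1" do not prevent a match.
    """
    return pattern.match(ifname) is not None


if __name__ == "__main__":
    for name in ["ge-0/0/1", "xe-1/0/1:1", "et-0/0/48", "ae1", "irb.1001"]:
        print(name, is_physical_port(CURRENT, name), is_physical_port(PROPOSED, name))
```

Spelling out the full prefixes (`ge|xe|et`) rather than extending the `(g|x)e-` character trick keeps the pattern readable, since `et-` does not share the trailing `e` of the other two.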
[21:16:21] netops, Infrastructure-Foundations, SRE: Adjust "port with no description on access switch" alert - https://phabricator.wikimedia.org/T353364 (ayounsi) More or less a duplicate of {T306007}
[21:20:43] netops, Infrastructure-Foundations, SRE: Adjust "port with no description on access switch" alert - https://phabricator.wikimedia.org/T353364 (cmooney)
[21:20:55] netops, DC-Ops, Infrastructure-Foundations, SRE, netbox: Avoid ghost hosts on the network - https://phabricator.wikimedia.org/T306007 (cmooney)
[21:22:14] netops, Infrastructure-Foundations, SRE: Adjust "port with no description on access switch" alert - https://phabricator.wikimedia.org/T353364 (cmooney) Ah yeah I'd forgotten about that one. What do you think about changing the alert text? I'm sure after investigating today I'll remember the details...
[22:35:20] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad: 1 VM request for acme-chief - https://phabricator.wikimedia.org/T353295 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host acmechief1002.eqiad.wmnet with OS bookworm
[23:19:29] Traffic, DC-Ops: cp4037 reimage for cookbook getting stuck at PXE boot - https://phabricator.wikimedia.org/T352876 (Papaul) @Vgutierrez I had a meeting with network and automation team today. We discussed this issue and we seem to not know the real cause of this issue. We decided we let traffic t...
[23:48:05] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad: 1 VM request for acme-chief - https://phabricator.wikimedia.org/T353295 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host acmechief1002.eqiad.wmnet with OS bookworm executed with errors: - ac...