[13:02:35] netops, Infrastructure-Foundations, SRE: CRs ECMP traffic to LVS VIPs despite higher MED on backup route - https://phabricator.wikimedia.org/T348446 (ayounsi)
[13:06:11] netops, Infrastructure-Foundations, SRE: CRs ECMP traffic to LVS VIPs despite higher MED on backup route - https://phabricator.wikimedia.org/T348446 (ayounsi) Some of our transits like Lumen use MEDs so we need to make sure that a global knob doesn't impact those negatively. Another idea is to use BG...
[13:18:37] Traffic, netops, Infrastructure-Foundations, SRE: Do we need ping offload servers at all POPs? - https://phabricator.wikimedia.org/T345809 (ayounsi) Thanks for the task and feedback. If the issue is abuse from a limited number of providers (like in {T163312} it seems better to filter out that kin...
[13:59:46] sukhe
[13:59:56] oops
[13:59:58] hello!
[14:00:01] haha
[14:00:01] hi
[14:00:41] just a random thought, but with all the anycast checks relying on "/usr/lib/nagios/plugins/XXX" I'm wondering if we're going to have an outage the day we decom icinga
[14:00:53] Traffic, netops, Infrastructure-Foundations, SRE, Patch-For-Review: Remove static routes for ns[01] and replace their announcements with bird - https://phabricator.wikimedia.org/T348041 (ssingh)
[14:02:04] Traffic, SRE, Patch-For-Review: Rename ACAST_PS_ADVERTISE in bird and anycast-healthchecker to BIRD_IP_ADVERTISE - https://phabricator.wikimedia.org/T348174 (ssingh) Open→Declined As mentioned above, abandoning this rename pursuit. There is a lot of stuff to rename and we won't get to it all,...
[14:02:09] I'm wondering if we should add a "last hope comment" wherever it's being set up in puppet, or if it's a package, re-add it to the bird or dnsbox class
[14:06:21] XioNoX: the /usr/lib/nagios/checks are all local though?
[14:09:00] in other words, they're a dependency that we're not specifying in bird or dnsbox puppet class
[14:11:01] it would be enough to require the File I think (which is being provided by icinga-related stuff, and would then break puppetization at that point rather than cause an outage)
[14:13:40] yeah not sure where it's being configured, but that would make sense
[14:17:07] found at least one modules/nagios_common/manifests/check_dns_query.pp
[14:21:42] XioNoX: found the issue related to 10.3.0.1 being omitted
[14:22:12] nice!
[14:22:13] that's the recursor side of profile::bird::advertise_vips
[14:24:12] coming back to the above, so the idea is to add dependencies on the various checks here?
[14:25:47] XioNoX: while I have you here, want to merge https://gerrit.wikimedia.org/r/c/operations/homer/public/+/963375 (also happy to take care of it)
[14:33:52] sukhe: yeah, on the scripts used by the checks
[14:44:55] hello! I am afraid I have yet another lvs change if anyone could be so kind :) these are ingress services so they can skip lvs_setup https://gerrit.wikimedia.org/r/c/operations/puppet/+/964923
[14:51:29] sukhe: done
[14:51:38] XioNoX: thanks!
[14:52:15] guessing no more changes on the core-routers now and simply: announce the IPs via bird, make sure everything is OK, remove statics
[14:52:23] hnowlan: hmm no conftool section at all?
[14:52:38] sukhe: correct
[14:57:20] vgutierrez: not for ingress, they just use k8s-ingress-wikikube
[14:57:42] I followed https://wikitech.wikimedia.org/wiki/Kubernetes/Ingress#Add_a_new_service_under_Ingress which says the same
[14:58:47] hnowlan: so I would say that it doesn't require a review from traffic? :)
[14:59:30] vgutierrez: fair point - I guess the review is more a prelude to asking about restarting pybal
[15:00:09] no pybal restart required in this case?
[15:00:17] or maybe my brain is melting at 31C at the moment
[15:01:02] 8C here so brain freeze, but no restart required IMO
[15:01:39] no lvs section at all in those services
[15:07:39] oh, awesome. Thanks! sorry for the bother in that case :)
[15:08:16] no bother at all :)
[15:35:47] Traffic, DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cp11[00-15] - https://phabricator.wikimedia.org/T342159 (VRiley-WMF)
[15:39:18] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-eqiad: Cabling for Eqiad racks E5-8 and F5-8 - https://phabricator.wikimedia.org/T334231 (Jclark-ctr)
[15:49:57] netops, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): cloud: decide on general idea for having cloud-dedicated hardware provide service in the cloud realm & the internet - https://phabricator.wikimedia.org/T296411 (aborrero)
[17:10:56] Traffic, DNS, SRE: Update DNS records for Greenhouse - https://phabricator.wikimedia.org/T348335 (ssingh) Hi @NMariano-WMF: thanks for the request. We have some questions about this task, specifically related to some of the records requested here, that is better suited for a call. Is it fine if we se...
[17:51:11] Traffic, DNS, SRE: Update DNS records for Greenhouse - https://phabricator.wikimedia.org/T348335 (NMariano-WMF) Hi @Lhiraide would you be ok with meeting with @ssingh since this is your request to have DNS updated for Greenhouse?
[18:07:20] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Migrate row E/F network aggregation to dedicated Spine switches - https://phabricator.wikimedia.org/T322937 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=60fd6a7d-c8e6-49a7-96ff-ccbed13297a2) set by cmooney@cumin1001 f...
[18:10:13] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Migrate row E/F network aggregation to dedicated Spine switches - https://phabricator.wikimedia.org/T322937 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=01394557-10ca-4b57-b8c9-c263e86708ec) set by cmooney@cumin1001 f...
[19:14:13] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host ncredir5001.eqsin.wmnet with OS bookworm
[19:29:22] netops, Infrastructure-Foundations, SRE: Migrate row E/F network aggregation to dedicated Spine switches - https://phabricator.wikimedia.org/T322937 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=1acb901c-b161-4437-8a77-d11252fb6315) set by cmooney@cumin1001 for 2:00:00 on 6 host(s...
[19:29:44] netops, Infrastructure-Foundations, SRE: Migrate row E/F network aggregation to dedicated Spine switches - https://phabricator.wikimedia.org/T322937 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=7e1738b5-8479-4892-843b-26ddc9d964ea) set by cmooney@cumin1001 for 2:00:00 on 18 host(...
[19:50:31] Traffic, SRE-Sprint-Week-Sustainability-March2023, Sustainability (Incident Followup): cp3050 seemd more affected then otheres in recent incident - https://phabricator.wikimedia.org/T330682 (BCornwall) Open→Stalled
[20:13:55] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host ncredir5001.eqsin.wmnet with OS bookworm executed with errors: - ncredir5001 (**FAIL**) - Downt...
[20:14:14] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host ncredir5001.eqsin.wmnet with OS bookworm
[20:40:38] netops, Infrastructure-Foundations, SRE: Change EPVN RR setup to use different cluster ID on each host - https://phabricator.wikimedia.org/T348583 (cmooney) p:Triage→Low
[21:33:27] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host ncredir5001.eqsin.wmnet with OS bookworm executed with errors: - ncredir5001 (**FAIL**) - Remov...
[21:34:58] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host ncredir5001.eqsin.wmnet with OS bookworm
[21:54:11] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-eqiad: Cabling for Eqiad racks E5-8 and F5-8 - https://phabricator.wikimedia.org/T334231 (cmooney) Thanks @Jclark-ctr, I can confirm things look good (including light levels and pings I've not added here). ` cmooney@ssw1-f1-eqiad> show int...
[22:23:21] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Move 25% of mediawiki external requests to mw on k8s - https://phabricator.wikimedia.org/T348122 (matmarex) The Kubernetes work so far has caused problems with cross-wiki Echo notifications (see T223413, T342201). Please help resolve this before...
[22:37:25] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Change EPVN RR setup to use different cluster ID on each host - https://phabricator.wikimedia.org/T348583 (cmooney)
[22:38:04] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Change EPVN RR setup to use single BGP group and different cluster ID on every RR - https://phabricator.wikimedia.org/T348583 (cmooney)
[22:41:35] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Change EPVN RR setup to use single BGP group and different cluster ID on every RR - https://phabricator.wikimedia.org/T348583 (cmooney)
[22:45:47] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host ncredir5001.eqsin.wmnet with OS bookworm executed with errors: - ncredir5001 (**FAIL**) - Remov...