[06:13:51] netbox, Infrastructure-Foundations, serviceops: Netbox and Redis - https://phabricator.wikimedia.org/T311385 (ayounsi) The git tree is a bit confusing and needs cleanup, but that file in master seems to be on the old 2.10 version. You can see the 3.2.2 version there: https://gerrit.wikimedia.org/r/pl...
[06:15:01] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ayounsi)
[07:58:06] SRE-tools, Infrastructure-Foundations: sre.hosts.downtime: add network devices support - https://phabricator.wikimedia.org/T317082 (ayounsi)
[07:59:03] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=7eb8120c-f8b6-4c79-8deb-b18a305a2353) set by ayounsi@cumin1001 for 2:00:00 on 1 host(s) and th...
[08:00:42] SRE-tools, Infrastructure-Foundations: sre.hosts.downtime: add network devices support - https://phabricator.wikimedia.org/T317082 (ayounsi) Nevermind, this does seem to work: ` cumin1001:~$ sudo cookbook sre.hosts.downtime -r 'router upgrade' -t T295690 -H 2 D{cr3-ulsfo.wikimedia.org} START - Cookbook s...
[08:03:30] SRE-tools, Infrastructure-Foundations: sre.hosts.downtime: add network devices support - https://phabricator.wikimedia.org/T317082 (Volans) @ayounsi you have to use `--force` (see `--help`) and can pass a `NodeSet`-accepted syntax of hostnames as they are in Icinga, like: ` cr[3-4]-ulsfo,cr[2-3]-ulsfo IP...
[08:10:08] SRE-tools, Infrastructure-Foundations: sre.hosts.downtime: add network devices support - https://phabricator.wikimedia.org/T317082 (Volans) From a dry-run test it seems that it should work despite the space.
[08:56:36] Mail, Fundraising Tech - Chaos Crew, Infrastructure-Foundations, SRE, and 2 others: DMarc Email Address for Wikimedia.org - https://phabricator.wikimedia.org/T316899 (MatthewVernon) @Jgreen It looks to me like this is no longer an SRE access request; are you OK with me removing that tag, please?...
[10:13:52] Puppet, Cloud-VPS, Infrastructure-Foundations, cloud-services-team (Kanban): Remove prod-specific bits from cloud puppetmasters - https://phabricator.wikimedia.org/T309281 (taavi)
[11:40:17] netbox, Infrastructure-Foundations, serviceops: Netbox and Redis - https://phabricator.wikimedia.org/T311385 (akosiaris) >>! In T311385#8212732, @ayounsi wrote: > The git tree is a bit confusing and needs cleanup, but that file in master seems to be on the old 2.10 version. > You can see the 3.2.2 ve...
[11:47:03] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ayounsi) This has been quite eventful. To keep in mind that those upgrades need the !!no-validate!! knob, more details in the [[ https://www.juniper.net/documentation/us/en/software...
[11:50:40] SRE-tools, Infrastructure-Foundations: sre.hosts.downtime: add network devices support - https://phabricator.wikimedia.org/T317082 (taavi)
[12:10:49] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ayounsi)
[12:23:21] ok, I can finally say I'm done with email backlog. I've just 2 large CRs left (python-wmf-ldap and sre.network.peering) to do. Besides that, if you're still waiting on me for something let me know as I've probably just missed it.
[13:44:55] Hi folks; I'm person-on-clinic duty this week and there are some outstanding untriaged tasks for foundations - T316223 T316114 T315867 T315608 T315486 T189522 . Would you mind triaging them, please?
Or alternatively I can either triage them as medium or remove the SRE tag as you prefer.
[13:44:55] T315608: icinga raid montioring inoperable for H750 controllers - https://phabricator.wikimedia.org/T315608
[13:44:56] T316114: Use vlan trunking instead of multiple physical interfaces - https://phabricator.wikimedia.org/T316114
[13:44:56] T315486: Add xcollazo@wikimedia.org to the analytics-alerts mailing list - https://phabricator.wikimedia.org/T315486
[13:44:56] T316223: Resize webperf1004/2004 VM for arc-lamp - https://phabricator.wikimedia.org/T316223
[13:44:57] T189522: Detect IP address collisions - https://phabricator.wikimedia.org/T189522
[13:44:57] T315867: Identity Management System for Wikimedia developer accounts - https://phabricator.wikimedia.org/T315867
[13:47:26] CAS-SSO, Infrastructure-Foundations, SRE, Patch-For-Review: Update CAS to 6.5 - https://phabricator.wikimedia.org/T311235 (MoritzMuehlenhoff) CAS 6.6 was released two days ago and features several changes related to webauthn and OIDC, so we'll move to 6.6 instead. Notable changes are: **Ope...
[14:09:07] Mail, Fundraising Tech - Chaos Crew, fundraising-tech-ops: DMarc Email Address for Wikimedia.org - https://phabricator.wikimedia.org/T316899 (Jgreen) >>! In T316899#8213167, @MatthewVernon wrote: > @Jgreen It looks to me like this is no longer an SRE access request; are you OK with me removing that t...
[14:42:05] netops, Cloud-VPS, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): Use vlan trunking instead of multiple physical interfaces - https://phabricator.wikimedia.org/T316114 (jbond) p:Triage→Medium
[14:44:35] Mail, Data Engineering Planning, Data-Engineering-Operations, SRE: Add xcollazo@wikimedia.org to the analytics-alerts mailing list - https://phabricator.wikimedia.org/T315486 (jbond) p:Triage→Medium
[14:45:30] XioNoX: in relation to T189522 seems you changed the priority from high to needs triage, looking at the description i suspect you meant to resolve it?
[14:45:31] T189522: Detect IP address collisions - https://phabricator.wikimedia.org/T189522
[14:45:39] Emperor: i have triaged the rest
[14:55:07] jbond: nah it's still valid but it's not high
[14:55:53] netops, Infrastructure-Foundations, SRE: Detect IP address collisions - https://phabricator.wikimedia.org/T189522 (ayounsi) p:Triage→Low
[14:55:59] I set it to low for the sake of setting something
[14:56:24] ack thanks
[14:56:26] moritzm, slyngs: for the python-wmf-ldap project, how were you envisioning to deploy it, via venv or deb package?
[15:03:45] via a deb
[15:04:23] netops, Infrastructure-Foundations, SRE: Upgrade management routers and switches to Junos 21 - https://phabricator.wikimedia.org/T316529 (Papaul) @ayounsi the mentioned : "All management routers are running Junos 20 except mr1-codfw and mr1-esams that are running 18." and "The current Junos recommen...
[15:07:11] thanks :)
[15:08:41] netops, Infrastructure-Foundations, SRE: Upgrade management routers and switches to Junos 21 - https://phabricator.wikimedia.org/T316529 (ayounsi) Only those 2 from 18 to 21. 20 is recent enough.
[15:08:52] ack
[15:12:06] netops, Infrastructure-Foundations, SRE: Upgrade management routers and switches to Junos 21 - https://phabricator.wikimedia.org/T316529 (Papaul) Thanks
[15:16:26] SRE-tools, Infrastructure-Foundations: sre.hosts.downtime: add network devices support - https://phabricator.wikimedia.org/T317082 (ayounsi) Ok, thanks. How are the alertmanager silences managed? would the command below do everything needed: * all Icinga "hosts" * alertmanager (and LibreNMS by extension...
[15:55:03] SRE-tools, Infrastructure-Foundations: sre.hosts.downtime: add network devices support - https://phabricator.wikimedia.org/T317082 (Volans) It would try to do the same thing on Alertmanager, yes, assuming there are alerts that match the given hostnames in the proper tag :)
[16:00:54] SRE-tools, Infrastructure-Foundations: sre.hosts.downtime: add network devices support - https://phabricator.wikimedia.org/T317082 (ayounsi) Open→Resolved a:ayounsi Awesome, doc updated! https://wikitech.wikimedia.org/wiki/Juniper_router_upgrade
[16:07:28] netops, Infrastructure-Foundations, SRE: Upgrade management routers and switches to Junos 21 - https://phabricator.wikimedia.org/T316529 (Papaul)
[17:48:48] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): Remove 185.15.56.0/24 from network::external - https://phabricator.wikimedia.org/T265864 (Dzahn) >>! In T265864#6995696, @Legoktm wrote: > This will remove Cloud VPS from `wikimedia_nets`, which gets some...
[18:00:04] netops, Infrastructure-Foundations, SRE, ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (Jclark-ctr) c2 <-- G2204190495000069 --> a1 c7 <-- G2204190495000136 --> a8 d2 <-- G2204190495000072 --> a1 d7 <-- G2204190495000097 --> a8
[19:09:31] netops, Infrastructure-Foundations, Observability-Alerting, SRE, SRE Observability (FY2022/2023-Q1): Ingest Cron and Root Alerts Into Logstash - https://phabricator.wikimedia.org/T274377 (lmata) p:Triage→Medium
[19:09:47] netops, Infrastructure-Foundations, Observability-Alerting, SRE, SRE Observability (FY2022/2023-Q1): Ingest Cron and Root Alerts Into Logstash - https://phabricator.wikimedia.org/T274377 (lmata) a:herron→andrea.denisse
[19:15:11] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review, User-jbond: replace all puppet crons with systemd timers - https://phabricator.wikimedia.org/T273673 (Dzahn)