[07:56:18] netops, Cloud-Services, Infrastructure-Foundations, SRE: Undocumented IP on WMCS network - https://phabricator.wikimedia.org/T315955 (cmooney) Ok cool well we can close this in that case I think. Cheers.
[08:01:47] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=1e573369-5fdd-4621-8ae7-786b5a67de04) set by cmooney@cumin1001 for 2:00:00 on 1 host(s) and th...
[08:32:45] topranks: FYI if you also need to downtime the mgmt and the IPv6 hosts in icinga for the routers, you can use the downtime cookbook with --force
[08:33:04] * volans doesn't recall if this was discussed publicly or privately with ar.zhel for the last upgrade last week
[08:33:27] hmm... ok thanks
[08:33:52] I didn't discuss this with Arzhel specifically, no.
[08:34:08] actually he updated the docs https://wikitech.wikimedia.org/wiki/Juniper_router_upgrade
[08:34:10] I did run the downtime cookbook with the --force flag however, just for the one router (cr3-esams) I'm doing
[08:34:15] see https://phabricator.wikimedia.org/T317082#8214799
[08:34:34] --force is used to specify 'hosts' that are not puppetdb hosts
[08:34:48] Ah sorry yes I see, I need to include the other hostnames
[08:35:17] thanks!!! very timely bit of education, I'd missed that change to the doc
[08:36:25] np :) just saw the updates in the task and thought to mention it ;)
[08:36:31] Will that cover both "management" IPs? The two REs have mgmt IPs with these DNS names:
[08:36:32] re0.cr3-esams.mgmt.esams.wmnet
[08:36:35] re1.cr3-esams.mgmt.esams.wmnet
[08:37:10] So should I downtime hosts "re0.cr3-esams.mgmt" and "re1.cr3-esams.mgmt" rather than just "cr3-esams.mgmt"?
[08:37:42] I see that icinga has only re0.cr3-esams.mgmt
[08:37:55] so you should replace cr3-esams.mgmt with re0.cr3-esams.mgmt
[08:38:10] I don't see re1 being monitored
[08:39:02] ok cool. I'll have a chat with Arzhel about that when he's back - probably best we monitor both.
[08:39:20] But yes thanks will do now - cookbook failed for just cr3-esams.mgmt which is expected I guess.
[08:39:58] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=57f0ae1d-0fa1-4b98-9454-bea638ac3971) set by cmooney@cumin1001 for 2:00:00 on 3 host(s) and th...
[08:39:59] that doesn't exist in icinga indeed
[08:41:41] cool thanks, I added a note in the process about the mgmt naming when we have multiple REs.
[08:43:26] topranks: only the main IP "cr3-esams" is paging, so that's the most important one to downtime :)
[08:43:32] the others can be best effort
[08:44:10] shit ok, I thought I'd done that one
[08:44:12] * topranks checking
[08:44:27] yeah I think you did
[08:44:38] just adding context
[08:45:01] ah ok sorry yeah
[08:45:07] also re1 (the backup RE) used to not be reachable when acting as backup, that's why it's not in monitoring
[08:45:20] but looks like with the move to "mgmt_junos" it is now
[08:45:20] cool - good context thanks
[08:45:36] so worth adding it
[09:46:05] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=39465e0b-b93d-45ba-b1d8-0c49dacc39fb) set by cmooney@cumin1001 for 2:00:00 on 3 host(s) and th...
[10:14:20] Puppet, Cloud-VPS, Infrastructure-Foundations, Patch-For-Review, cloud-services-team (Kanban): Remove prod-specific bits from cloud puppetmasters - https://phabricator.wikimedia.org/T309281 (taavi)
[13:50:12] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (cmooney) Upgrade of cr3-esams went well earlier. Firmware upgrade works as per docs. I will put up more info on that later for our own reference.
[13:51:08] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (cmooney)
[13:51:24] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (cmooney)
[14:00:02] * cdanis running a few mins late
[14:18:47] netops, Infrastructure-Foundations: Set frdata1001 switch ports to fundraising vlan - https://phabricator.wikimedia.org/T317539 (Jgreen)
[15:04:16] netops, Infrastructure-Foundations, SRE, ops-eqiad: Set frdata1001 switch ports to fundraising vlan - https://phabricator.wikimedia.org/T317539 (Jgreen)
[15:26:01] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review: Facter is slow on a few hosts - https://phabricator.wikimedia.org/T251293 (colewhite) @MoritzMuehlenhoff @jbond Facter does not appear to be detecting the raid on some hosts. Not sure how widespread the issue is. current fact (direct c...
[15:48:54] cwhite: did you mean to remove the update to https://phabricator.wikimedia.org/T251293#8229337
[15:53:37] jbond: yeah, but was too lazy to input totp :)
[15:55:02] ack
[15:55:11] jbond: thanks for your review for systemd override_filename :]
[15:55:26] I have a call this evening, will follow up tomorrow
[15:55:59] ack and no problem
[17:26:58] Mail, Infrastructure-Foundations, fundraising-tech-ops: Investigate in-house DMARC analysis tool options - https://phabricator.wikimedia.org/T317443 (Jgreen)
[17:27:01] netops, Infrastructure-Foundations, SRE, ops-eqiad: Set frdata1001 switch ports to fundraising vlan - https://phabricator.wikimedia.org/T317539 (Jgreen)
[18:26:24] netops, Infrastructure-Foundations, SRE, ops-eqiad: Set frdata1001 switch ports to fundraising vlan - https://phabricator.wikimedia.org/T317539 (cmooney) @Jgreen I believe I've done what's required now (not all that familiar with this workflow however). Both ports that are labelled for frdata100...
[18:42:55] netops, Infrastructure-Foundations, SRE, ops-eqiad: Set frdata1001 switch ports to fundraising vlan - https://phabricator.wikimedia.org/T317539 (Jgreen) @cmooney Both interfaces show no-carrier, can you confirm that the switch ports are enabled?
[20:28:54] netops, Infrastructure-Foundations, SRE, ops-eqiad: Set frdata1001 switch ports to fundraising vlan - https://phabricator.wikimedia.org/T317539 (cmooney) @Jgreen my bad yeah they were both still part of the disabled group. Both up/up now, hopefully looks better your side too. ` cmooney@fasw-c-eq...
[21:04:04] netops, Infrastructure-Foundations, SRE, ops-eqiad: Set frdata1001 switch ports to fundraising vlan - https://phabricator.wikimedia.org/T317539 (Jgreen) >>! In T317539#8230385, @cmooney wrote: > @Jgreen my bad yeah they were both still part of the disabled group. > > Both up/up now, hopefully lo...
[21:04:14] Mail, Infrastructure-Foundations, fundraising-tech-ops: Investigate in-house DMARC analysis tool options - https://phabricator.wikimedia.org/T317443 (Jgreen)
[21:04:36] netops, Infrastructure-Foundations, SRE, ops-eqiad: Set frdata1001 switch ports to fundraising vlan - https://phabricator.wikimedia.org/T317539 (Jgreen) Open→Resolved a:Jgreen