[08:57:51] topranks, XioNoX: I'd like to reboot the ping* hosts, is either of you available for the router redirection changes today?
[09:11:46] Hey Moritz I should be able to take care of it
[09:12:08] when were you looking to do it?
[09:28:39] Puppet, Infrastructure-Foundations, SRE: Validate all yaml files in puppet.git - https://phabricator.wikimedia.org/T305676 (Volans) p:Triage→Medium
[09:31:28] Hello, when you have a moment could you please set the priority of the following tasks: T305896, T305582, T305567. Thanks (cc cdanis and jhathaway specifically)
[09:31:29] T305896: Internationalization (i18n) & localization (l10n) of www.wikimediastatus.net - https://phabricator.wikimedia.org/T305896
[09:31:29] T305567: MX: increasing disk space - https://phabricator.wikimedia.org/T305567
[09:31:29] T305582: Annotate X-Analytics header with any matching actions - https://phabricator.wikimedia.org/T305582
[09:40:34] topranks: thx, I'm fine any time, we can also do it somewhat async, just drain one of 1001/2001/3001 and I'll ping back when the next node can be drained?
[09:41:34] moritzm: sounds good, let me disable the redirect for 1001 / eqiad now and then you can let me know when that's done and we move on?
[09:42:07] excellent, sounds like a plan
[09:45:57] moritzm: Ok should be good to go for Eqiad
[10:02:17] topranks: eqiad/ping1002 is done
[10:03:06] ok thanks give me a moment I'll re-enable the redirect there, and move on to codfw
[10:04:14] no hurry :-)
[10:05:09] ah it's a quick job, should be good to go now for codfw/ping2001
[10:06:31] topranks: out of curiosity, could this be automated? either via a cookbook once we have support for network devices there or via homer changing some netbox data from active to something else
[10:14:11] it happens only a few times per year, don't think we really need a cookbook
[10:18:57] volans: yes we could look at something like that, but it's not a major problem doing it manually given it's a rarity.
[10:19:32] I was actually doing this manually, but it can be controlled with homer too (commenting out the "ping_offload_redirect" key for a site in sites.yaml)
[10:20:23] I hadn't realized we had that option in automation so I did it from cli.
[10:22:20] topranks: codfw/ping2002 is done
[10:35:04] moritzm: thanks, offload re-enabled in codfw, I've disabled in esams if you want to do ping3001
[10:46:29] moritzm: in case you missed it, you're good to go for esams / ping3001
[10:47:54] thanks, I in fact missed it, lost the connection to Libera. will ping you when done
[11:06:08] topranks: and esams is done as well
[12:07:07] cool, offload config reapplied in esams.
[12:55:53] out of curiosity, why do we only have ping_offload in some DCs but not in all of them?
[12:57:20] taavi: where we get most of ICMP traffic
[13:26:22] SRE-tools, netops, Infrastructure-Foundations, SRE, Spicerack: Spicerack: add network devices support - https://phabricator.wikimedia.org/T306552 (Volans) Thanks for opening the task to discuss details. As the first feedback I've a primary question that is how you envision this new third way...
[13:40:08] SRE-tools, netops, Infrastructure-Foundations, SRE, Spicerack: Spicerack: add network devices support - https://phabricator.wikimedia.org/T306552 (ayounsi) Yeah, I'm expecting Netbox to always be the source of truth so a homer run after a spicerack run would be a NOOP. `junos-eznc` is what I...
[14:16:21] Mail, Infrastructure-Foundations, SRE: MX: increasing disk space - https://phabricator.wikimedia.org/T305567 (jhathaway) p:Triage→Medium
[14:16:38] Mail, Infrastructure-Foundations, SRE: MX: increasing disk space - https://phabricator.wikimedia.org/T305567 (jhathaway) a:jhathaway
[16:04:56] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-eqiad: Q2:(Need By: TBD) replace mr1-eqiad - https://phabricator.wikimedia.org/T294474 (Cmjohnson) @arzhel fixed the reboot issue, the external disk attached to the router was causing the reboots. I updated JUNOS to junos-srxsme-20.2R3-S2....
[16:46:17] netops, Infrastructure-Foundations: Agree strategy for Kubernetes BGP peering to top-of-rack switches - https://phabricator.wikimedia.org/T306649 (cmooney)
[16:50:13] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-eqiad: Q2:(Need By: TBD) replace mr1-eqiad - https://phabricator.wikimedia.org/T294474 (Cmjohnson) a:Jclark-ctr→Cmjohnson
[16:50:49] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-eqiad: Q2:(Need By: TBD) replace mr1-eqiad - https://phabricator.wikimedia.org/T294474 (Cmjohnson)
[16:57:23] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-eqiad: Q2:(Need By: TBD) replace mr1-eqiad - https://phabricator.wikimedia.org/T294474 (ayounsi) Swap has been done successfully! Left to do: wipe the old one, rename the console server port of the new one.
[16:58:16] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-eqiad: Q2:(Need By: TBD) replace mr1-eqiad - https://phabricator.wikimedia.org/T294474 (Cmjohnson) loaded, configuration file verified working moved cables to new mr1-eqiad left scs connection to old mr1 to wipe, still requires scs connecti...
[18:35:42] netops, Infrastructure-Foundations, SRE, Sustainability (Incident Followup): Add linecard diversity to the router-to-router interconnect in codfw - https://phabricator.wikimedia.org/T248506 (Krinkle)
[18:59:23] SRE-tools, DBA, Infrastructure-Foundations, Sustainability (Incident Followup), User-Ladsgroup: Create or modify an existing tool that quickly shows the db replication status in case of master failure - https://phabricator.wikimedia.org/T281249 (Krinkle)