[00:08:55] Mail, Infrastructure-Foundations, SRE, Sustainability (Incident Followup): Bringing mx2001 back into service - https://phabricator.wikimedia.org/T297128 (Dzahn) re: making current kernel version persistent The one running now was selected in grub but wasn't the default selection. Either edit gru...
[07:16:32] netbox, Infrastructure-Foundations: Upgrade Netbox to 3.1 - https://phabricator.wikimedia.org/T296452 (ayounsi)
[07:16:55] netbox, Infrastructure-Foundations: Upgrade Netbox to 3.1 - https://phabricator.wikimedia.org/T296452 (ayounsi) 3.1 is out of beta, updated the task description accordingly.
[08:04:31] netops, Infrastructure-Foundations, SRE, Patch-For-Review, Sustainability (Incident Followup): Use next-hop-self for iBGP sessions - https://phabricator.wikimedia.org/T295672 (ayounsi) As a general note we need to be careful with rolling out config fixes in reaction to unexpected issues. Even...
[08:04:56] netops, Infrastructure-Foundations, SRE, Patch-For-Review, Sustainability (Incident Followup): Use next-hop-self for iBGP sessions - https://phabricator.wikimedia.org/T295672 (ayounsi) Open→In progress
[08:07:36] Mail, Infrastructure-Foundations, SRE, Patch-Needs-Improvement, User-herron: Outdated TLS config for MXes - https://phabricator.wikimedia.org/T203260 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff This has been resolved with the update of the mail servers to Bullseye in the...
[08:41:35] netops, Infrastructure-Foundations, SRE: Packet Drops on Eqiad ASW -> CR uplinks - https://phabricator.wikimedia.org/T291627 (ayounsi)
[08:41:43] netops, Infrastructure-Foundations, SRE: Create an alert for output discards on network devices - https://phabricator.wikimedia.org/T284593 (ayounsi) In progress→Resolved This is now set to alert to NOC through alertmanager. Added a quick mention in https://wikitech.wikimedia.org/wiki/Networ...
[09:27:06] netbox, Infrastructure-Foundations: Netbox: import from PuppetDB script creates VIP also if exists - https://phabricator.wikimedia.org/T278936 (Volans) a:Volans The specific address issue was due to the fact that the original address was having the wrong netmask in netbox (/64 instead of /128). I've...
[09:45:40] CAS-SSO, Infrastructure-Foundations, SRE: icinga Blocked by X-Frame-Options Policy - https://phabricator.wikimedia.org/T251513 (jbond) Open→Resolved a:jbond Going to resolve this as the current fix seems to alleviate the majority of the pain points and providing further fixes doesn't feel...
[09:46:58] CAS-SSO, Infrastructure-Foundations, SRE, observability, User-jbond: Icinga Monitoring for CAS - https://phabricator.wikimedia.org/T233935 (jbond) In progress→Resolved We currently monitor the tomcat process and further have monitoring for now this is adequate
[09:47:02] CAS-SSO, Infrastructure-Foundations, SRE, Security-Team, User-jbond: Further steps for CAS/web SSO - https://phabricator.wikimedia.org/T233921 (jbond)
[10:05:57] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review: Unused puppet resources audit, 2021 - https://phabricator.wikimedia.org/T272559 (jbond) >>! In T272559#7546852, @Dzahn wrote: > icinga::nsca::client is used in fundraising. so there are special cases that can be in use but this audit scri...
[10:14:42] netops, Infrastructure-Foundations, SRE, Patch-For-Review, Sustainability (Incident Followup): Use next-hop-self for iBGP sessions - https://phabricator.wikimedia.org/T295672 (cmooney) > To be clear, I agree that your proposal is a good solution however I'm wondering what's most future-proof....
[11:13:29] Mail, Infrastructure-Foundations, SRE, Sustainability (Incident Followup): Bringing mx2001 back into service - https://phabricator.wikimedia.org/T297128 (MoritzMuehlenhoff) >>! In T297128#7551879, @Dzahn wrote: > re: making current kernel version persistent > > The one running now was selected i...
[13:17:28] netops, Data-Engineering, Infrastructure-Foundations, SRE, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (JAllemandou) > What would be the next steps? Here is a proposal: # [DE, SRE]Agree on the name of the flow :) Will it be `sflow` (this f...
[13:42:04] netops, Infrastructure-Foundations, SRE, Patch-For-Review, Sustainability (Incident Followup): Use next-hop-self for iBGP sessions - https://phabricator.wikimedia.org/T295672 (ayounsi) Ok, sounds good to me!
[14:02:53] netops, Data-Engineering, Infrastructure-Foundations, SRE, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (ayounsi) Sounds good! 1. we can use "internal_flows" (not _netflow as netflow is a protocol). 2. can I start this anytime, or we need to...
[14:29:21] netops, Data-Engineering, Infrastructure-Foundations, SRE, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (Ottomata) > Agree on the name of the flow : Some guidelines: https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#Event...
[14:31:47] netops, Data-Engineering, Infrastructure-Foundations, SRE, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (JAllemandou) >>! In T263277#7552972, @ayounsi wrote: > Sounds good! > 1. we can use "internal_flows" (not _netflow as netflow is a proto...
[14:37:51] netops, Infrastructure-Foundations, SRE, SRE Observability (FY2021/2022-Q2), Sustainability (Incident Followup): Alert that should have paged via VictorOps was delayed because of partial networking outage - https://phabricator.wikimedia.org/T294166 (herron)
[14:40:21] netops, Data-Engineering, Infrastructure-Foundations, SRE, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (Ottomata) > can I start this anytime, or we need to create the kafka topic somewhere? Not really needed, unless you need to set special t...
[15:35:47] netops, Data-Engineering, Infrastructure-Foundations, SRE, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (ayounsi) `internal_network_flows` works, `network.flows.internal` too. @Ottomata indeed we do have restriction on the producer side (it'...
[15:55:14] netops, Data-Engineering, Infrastructure-Foundations, SRE, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (BTullis) In case it helps, I came across this abandoned change from 2020: https://gerrit.wikimedia.org/r/c/schemas/event/secondary/+/6080...
[17:28:52] topranks, XioNoX: ok to reboot rpki* for https://phabricator.wikimedia.org/T297180? AFAICT these are generally fine to reboot as long as one node is up and running
[17:29:50] moritzm: yes as long as it is staggered it will be fine. Probably leave 10-15 mins between each if possible.
[17:31:31] k, will start with rpki1001 now
[17:51:52] both done now
[17:54:51] ok thanks!
[17:55:04] checked on a few routers and all sessions back up and looking healthy
[17:59:17] great :-)
[19:42:47] netops, Data-Engineering, Infrastructure-Foundations, SRE, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (Ottomata) Ah, right! https://phabricator.wikimedia.org/T248865#6289287 So yeah, unless we can at least control the event format, we can...
[19:45:53] Puppet, Infrastructure-Foundations, Readers-Web-Backlog, MobileFrontend (Tracking), User-Jdlrobson: Mobile site does not automatically redirect to desktop version (and not possible to use browser "use desktop view") - https://phabricator.wikimedia.org/T60425 (Jdlrobson)
[20:49:27] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review: Unused puppet resources audit, 2021 - https://phabricator.wikimedia.org/T272559 (Dzahn) >>! In T272559#7552446, @jbond wrote: >>>! In T272559#7546852, @Dzahn wrote: >> icinga::nsca::client is used in fundraising. so there are special case...
[21:06:39] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review: Unused puppet resources audit, 2021 - https://phabricator.wikimedia.org/T272559 (Dzahn) > profile::beta::motd This isn't instantiated and does not have any include line elsewhere but it shows up like this: hieradata/cloud/eqiad1/deploy...
[21:11:20] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review: Unused puppet resources audit, 2021 - https://phabricator.wikimedia.org/T272559 (Dzahn) xdummy: T133183#7554483
[21:11:26] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review: Unused puppet resources audit, 2021 - https://phabricator.wikimedia.org/T272559 (Dzahn)