[05:56:25] SRE-tools, DBA, Infrastructure-Foundations, Recommendation-API, and 2 others: Switchover m2 master (db1183 -> db1159) - https://phabricator.wikimedia.org/T300329 (Marostegui) Thanks everyone! I will get this scheduled for Thursday 3rd Feb at 9:00AM UTC
[05:56:35] SRE-tools, DBA, Infrastructure-Foundations, Recommendation-API, and 2 others: Switchover m2 master (db1183 -> db1159) - https://phabricator.wikimedia.org/T300329 (Marostegui)
[10:08:38] XioNoX, topranks: did you do anything this morning on asw1-b13-drmrs? I saw the homer live config check failures and I was having a look but cannot repro it right now
[10:09:08] volans: I think it was due to telxius failure
[10:09:44] ack
[10:10:10] Ah ok. Wasn’t looking at it myself anyway volans
[10:24:54] sent patch, I've added you two
[10:29:00] thanks!
[10:29:02] +1
[10:32:54] godog: hello! not sure if expected but prometheus1003 is saturating its uplink https://librenms.wikimedia.org/device/device=160/tab=port/port=14116/
[10:33:40] (it's a 1G NIC, so not an issue network wise)
[10:34:45] XioNoX: thanks for the heads up! it is yeah I'm transferring data to the new hw, should I limit it down or the interface is fine even if busy ?
[10:36:06] yeah it's fine in term of overall network saturation, but of course if anything uses that server for anything else they will experience retransmits/slowdowns
[10:41:09] yeah good point, I'll rate limit rsync a bit
[12:16:48] SRE-tools, DBA, Infrastructure-Foundations, Recommendation-API, and 2 others: Switchover m2 master (db1183 -> db1159) - https://phabricator.wikimedia.org/T300329 (Marostegui)
[13:42:55] XioNoX, godog: I've noticed that on alertmanager the BGP alert for esams has as "instance" label the FQDN ("cr2-esams.wikimedia.org"), while the others seems to just have the hostname. Is that intended?
[13:43:28] no idea :)
[13:45:47] SRE-tools, Icinga, Infrastructure-Foundations, Observability-Alerting, and 2 others: ops-monitoring-bot creating dupes - https://phabricator.wikimedia.org/T226908 (lmata)
[13:49:38] Puppet, Infrastructure-Foundations, Observability-Alerting, SRE: Icinga alert for hosts with no Puppet roles - https://phabricator.wikimedia.org/T238006 (lmata)
[13:50:30] Mail, Icinga, Infrastructure-Foundations, Observability-Alerting, and 2 others: fix/streamline mail routing off of neon - https://phabricator.wikimedia.org/T80890 (lmata)
[13:52:11] volans: 'bgp peer above prefix limit' ? that's librenms vs icinga, but I hadn't notice that before no
[13:52:22] not sure if it is an issue heh
[13:56:50] godog: if we need to group by host I'd say it is
[13:57:23] for example if we want to replicate what we have in spicerack for icinga for alertmanager, we need a way to group by host, to have a concept of "host is green"
[13:59:11] yeah so no outstanding alerts, but sure we can change it in librenms
[14:00:58] I noticed looking at the outstanding ones and yes is the 'BGP peer above prefix limit' one
[14:09:53] yeah that reminds of T293198 again and whether to have the port in 'instance' for alerts
[14:09:54] T293198: Strip port from "instance" label on outgoing alertmanager alerts - https://phabricator.wikimedia.org/T293198
[14:10:48] most don't have the port, some do, like centrallog2001:3903, kafka-jumbo1006:7800
[14:11:39] indeed, the "compat" alerts from icinga-exporter don't have the port because we didn't add it, but should
[14:12:08] and the native prometheus ones do yeah
[15:02:36] my machine is acting up here guys, trying to join meeting
[16:06:20] netops, Data-Engineering, Data-Engineering-Kanban, Infrastructure-Foundations, and 3 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (JAllemandou) Open→Resolved
[16:06:28] netops, Infrastructure-Foundations, SRE, Traffic-Icebox, Epic: Capacity planning for (& optimization of) transport backhaul vs edge egress - https://phabricator.wikimedia.org/T263275 (JAllemandou)
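
Editor's sketch of the "group by host" idea discussed between 13:42 and 14:12: the three "instance" shapes mentioned in the log (FQDN, bare hostname, hostname:port) could be reduced to one key before grouping. This is only an illustration, not how spicerack, librenms, or alertmanager do it. The function names, the alert dict layout, and every alert name except 'BGP peer above prefix limit' are hypothetical, and IPv6 literals are not handled.

```python
from collections import defaultdict


def normalize_instance(instance: str) -> str:
    """Reduce an "instance" label to a bare hostname (hypothetical helper)."""
    # Drop a trailing ":port" only when the part after the last colon is numeric.
    host, sep, port = instance.rpartition(":")
    if not (sep and port.isdigit()):
        host = instance
    # Leave IPv4 addresses untouched instead of cutting them at the first dot.
    if host.replace(".", "").isdigit():
        return host
    # Keep only the first DNS label, e.g. "cr2-esams.wikimedia.org" -> "cr2-esams".
    return host.split(".", 1)[0]


def group_by_host(alerts):
    """Group alert dicts by normalized host (assumed layout: alert["labels"]["instance"])."""
    grouped = defaultdict(list)
    for alert in alerts:
        grouped[normalize_instance(alert["labels"]["instance"])].append(alert)
    return dict(grouped)


# Instance values are taken from the log; the second and third alert names are made up.
alerts = [
    {"labels": {"alertname": "BGP peer above prefix limit",
                "instance": "cr2-esams.wikimedia.org"}},
    {"labels": {"alertname": "ExampleNativeAlert", "instance": "centrallog2001:3903"}},
    {"labels": {"alertname": "ExampleCompatAlert", "instance": "centrallog2001"}},
]
by_host = group_by_host(alerts)
```

With a key like this, a "host is green" check amounts to the host having no entry in `by_host`.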