[08:14:45] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Q2:rack/setup E8/F8 new leaf switches - https://phabricator.wikimedia.org/T382017#10524070 (ayounsi) Sure, as usual for power/console/mgmt. Regarding production ports : On the ssw1 side: `use `et-0/0/7` towards e8 and `et-0/0/15` tow...
[10:09:06] A question for the trafficserver experts. I have a mapping for query.wikidata.org and query.wikidata.org/querybuilder in the backend.yaml. But the mapping for /querybuilder is not recognized (istio answers instead of apache).
[10:09:06] My conclusion is that the order of the entries might be relevant.
[10:09:06] So does this reordering make sense and could it solve the issue? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1117498
[10:12:54] jelto: I'm no expert but I think this might be relevant: https://docs.trafficserver.apache.org/en/9.2.x/admin-guide/files/remap.config.en.html#precedence
[10:16:26] thanks a lot for the docs! The docs clearly state "that the order of the rules matters". So it should make sense to move the mapping of /querybuilder up before the generic mapping
[10:21:33] yes, if both have a host (or both don't)
[10:22:00] it seems there is a similar example
[10:22:13] "In the above examples, the second rule is never applied because all URLs that match the second rule also match the first rule. "
[10:23:21] jelto: I would add a comment maybe to remind to keep the / rule at the end
[10:23:28] of the block
[10:28:30] volans: I added a comment to every query endpoint in patchset 2. Or do you mean just one single comment at the end?
[10:30:12] that's fine
[11:02:48] jelto: CR looks good but the /querybuilder endpoint doesn't
[11:03:28] +1ed anyways cause it definitely is out of scope for that CR
[11:04:43] thank you for the +1 and uncovering the other issue. I'll take care of the http part after fixing the mapping for querybuilder
[11:10:25] IIRC it's an outstanding issue
[11:11:32] aka canonical wikidata URLs are still using http instead of https
[11:39:46] netops, Infrastructure-Foundations, SRE: Extend sre.network.configure-switch-interfaces cookbook to add sflow and qos config - https://phabricator.wikimedia.org/T379549#10524854 (cmooney) Open→Resolved
[11:50:31] netops, Infrastructure-Foundations, SRE: Homer trying to delete BGP peerings for VMs on new Eqiad ganeti nodes - https://phabricator.wikimedia.org/T381175#10524944 (cmooney) Open→Resolved >>! In T381175#10520327, @ayounsi wrote: > For (1) we can have the `sre.ganeti.addnode` cookbook call...
[12:54:00] netops, Infrastructure-Foundations, observability, Prod-Kubernetes, and 3 others: Prevent BGP alerts triggering when K8s host maintenance is being done - https://phabricator.wikimedia.org/T384731#10525215 (cmooney) >>! In T384731#10516013, @ayounsi wrote: > An alternative (or short term solution...
[12:56:10] netops, Infrastructure-Foundations, Observability-Alerting, SRE: Migrate port utilisation alert from LibreNMS to alertmanager - https://phabricator.wikimedia.org/T384052#10525220 (cmooney) >>! In T384052#10516521, @ayounsi wrote: > I'm wondering if we could re-write the "instance" in Prometheus t...
[14:22:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[14:42:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[14:49:11] netops, Infrastructure-Foundations, ops-magru: Jan 2025 - Magru core router connectivity blips - https://phabricator.wikimedia.org/T384774#10525590 (cmooney) I've added BFD to this particular session now. Not that it will fix things but it should give us more datapoints for the (likely) case with Ju...
[15:41:02] netops, Infrastructure-Foundations, ops-magru: Jan 2025 - Magru core router connectivity blips - https://phabricator.wikimedia.org/T384774#10525894 (ayounsi) Good idea regarding BFD. From https://supportportal.juniper.net/s/article/Observing-BGP-IO-ERROR-CLOSE-SESSION-error-logs-when-BGP-protocolgoes...
[16:46:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[16:51:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[17:29:36] sukhe: hi. thanks for having the conversation with the researchers. They are energized by it and just wrote that they may follow up with me in the coming weeks if their work requires access to non-public data.
[17:30:39] sukhe: fyi, I shared with them what I share with everyone else. I'm very happy to consider Formal Collaboration proposals (which is required for me to approve access to non-public data) if there is a team in WMF that has the area of research prioritized as a goal on their end (essential or okr).
[17:31:33] sukhe: unfortunately I won't be able to prioritize without that alignment b/c the program is somewhat (by design) heavy handed and requires resources on multiple teams (including SRE:).
[17:38:51] leila: thanks! I am glad they found it useful
[17:38:56] :)
[17:39:08] and also for sharing the above context, that makes sense
[17:39:34] if you think it helps, we can set up some time to talk about this
[17:39:46] for June onwards planning
[17:40:30] I speak only for myself here but perhaps we can see what extent of input is required from SRE and take it from there?
[17:40:51] that is only if you and Research want to do the same
[17:42:19] there might be some overlap with WE3.4
[17:43:11] bonus points if we can roughly scope it this week so we have something to talk about next week at SRE Summit
[17:44:08] yep!
[18:43:54] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Homer trying to delete BGP peerings for VMs on new Eqiad ganeti nodes - https://phabricator.wikimedia.org/T381175#10526733 (cmooney) Resolved→Open
[18:48:23] Traffic, Data-Engineering (Q3 2024 January 1st - March 31th), Essential-Work, Experimentation Lab Radar: Cookie % has been rejected because it is foreign and does not have the "Partitioned" attribute - https://phabricator.wikimedia.org/T375256#10526751 (YaroslavDubo)
[20:33:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[20:58:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[23:05:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[23:12:49] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Install and cable Nokia test devices and test servers in codfw - https://phabricator.wikimedia.org/T385217#10527296 (cmooney) @Jhancock.wm no rush but putting down while I remember. Whenever you've a chance it'd be good to do a revi...
[23:15:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[23:51:46] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Install and cable Nokia test devices and test servers in codfw - https://phabricator.wikimedia.org/T385217#10527341 (Jhancock.wm) we received two each of these. SFP-GIG BASE-T RJ45 SFP28-25GE-LR SFP+ 10GE-LR QSFP28-100GB-CWDM4 QSFP...