[09:42:50] jelto: feel free to merge my puppet change
[09:43:11] thanks, I was just about to ask! I'll merge now
[09:43:29] :)
[09:44:30] merged!
[13:01:28] elukey, btullis, https://turnilo.wikimedia.org/#wmf_netflow/ is broken again
[13:01:59] XioNoX: OK, sorry about that. Looking now.
[13:04:04] thanks! I need to dig into netflow to figure out why we're using much more transit than expected (and that's going to cost us)
[13:04:39] https://librenms.wikimedia.org/bill/bill_id=3/ NTT bill :)
[13:07:31] Got it. I have restarted turnilo. The dashboard looks as it should now, I believe. I will take ownership of that ticket T351731 and put it on our board to have a proper look at a permanent solution.
[13:07:32] T351731: Turnilo: invalid transforms on wmf_netflow dashboard - https://phabricator.wikimedia.org/T351731
[13:07:53] btullis: yep, looks good, thx for the quick turnaround
[13:08:08] yw
[13:15:05] vgutierrez, are you aware of any change over the weekend that would increase the amount of requests we receive?
[13:15:23] XioNoX: which cluster?
[13:15:27] text | upload?
[13:15:46] vgutierrez: it looks organic, both
[13:16:23] quite a lot of events during this weekend could explain an organic increase in traffic
[13:16:53] mostly upload though
[13:17:30] XioNoX: I am going to file a task for the cr3-ulsfo flapping on the weekend; will add you
[13:18:01] btullis: the url shortener feature stopped working too on turnilo "Couldn't create short link"
[13:18:26] vgutierrez: oh you think it could be news related?
[13:19:03] for example https://w.wiki/Afpg
[13:20:46] sukhe: thx
[13:21:55] XioNoX: That may be my fault
[13:22:00] XioNoX: Yes, I see. There is an error in the log, which seems to be related, but it's the first time I've seen it.
[13:22:11] The turnilo short link error
[13:22:22] I switched to use mw-api-int instead of the bare metal clutser
[13:22:24] cluster*
[13:22:29] I'd tested it though
[13:22:54] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1053657
[13:23:11] claime: Oh yeah, looks likely. The error is `RequestError: Error: tunneling socket could not be established, statusCode=403`
[13:23:12] XioNoX: zooming out to the whole month per cluster we aren't seeing an increase
[13:23:32] XioNoX: see https://grafana.wikimedia.org/goto/PmvNzqlSg?orgId=1
[13:23:39] btullis: must have had the request url cached when I tested...
[13:23:56] 403 though
[13:28:39] vgutierrez: ah yeah, thx! I see what happened
[13:29:03] https://librenms.wikimedia.org/graphs/to=1721049900/id=11610/type=port_bits/from=1720445100/
[13:29:03] vs.
[13:29:03] https://librenms.wikimedia.org/graphs/to=1721048400/id=11611/type=port_bits/from=1720443600/
[13:29:55] dunno what happened in HE's network, but we're not doing much traffic with them anymore, all shifted to NTT (cc topranks)
[13:32:15] and there is the weekend's events too
[13:34:31] url shortening in turnilo should work again, puppet patch incoming
[13:45:53] <_joe_> !oncall-now
[13:45:54] Oncall now for team SRE, rotation business_hours:
[13:45:54] s.ukhe, v.olans, e.ffie
[13:46:28] <_joe_> sukhe / volans / effie please don't use conftool/requestctl on puppetmaster2001 for the next 10 minutes, I'm testing the current upgrade
[13:46:32] claime: great, thanks!
[13:46:41] noted!
[13:48:02] ack
[13:54:21] XioNoX: yeah odd switch, overall bw looks up also (i.e. not just a shift)
[13:54:21] https://grafana.wikimedia.org/goto/n-qei3_IR?orgId=1
[13:54:54] topranks: the HE session bounced, and traffic naturally preferred NTT
[13:55:04] and I'm wrong, my graph is missing HE
[13:55:22] we treat HE as a transit as otherwise it "attracts" too much traffic
[13:55:43] one option could be to bounce the NTT session, but it's not viable long term :(
[13:56:23] yeah that seems too random
[13:57:14] and not sure we want to add a new community/local-pref for "somewhere between transit and peering" :)
[13:58:42] usage on the NTT link isn't insane, do we need to force it back?
[13:59:20] topranks: we're above quota : https://librenms.wikimedia.org/bill/bill_id=3/
[13:59:21] an in-between local-pref is still gonna come down to a tie-break where one wins, so setting HE to "between" pref is probably gonna be not much different to setting them to "peering" pref?
[13:59:36] ah yeah good point
[14:00:16] it's also not going to be that much of a cost increase, so maybe it's better to just let it be
[14:00:20] but overall no objection to either really if we've a good reason to push the traffic back to HE
[14:01:35] in general it's a balance between cost and complexity on the network, and we shouldn't go too far down the road of the latter imo
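
Editor's note on the 13:59:21 point about an in-between local-pref: the sketch below is a heavily simplified model of BGP best-path selection, covering only the first two commonly described steps (highest local-pref, then shortest AS path). The `Route` type, the `best_path` helper, the local-pref values (100, 150, 200), and the AS-path lengths are all hypothetical and are not taken from the actual router configuration. It only illustrates why any local-pref above the transit value may behave much like the peering value when compared against a transit route: the higher value wins outright rather than splitting traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    neighbor: str       # hypothetical label for the BGP neighbor
    local_pref: int     # hypothetical local-pref value
    as_path_len: int    # hypothetical AS-path length


def best_path(routes):
    """Pick one route: highest local-pref first, then shortest AS path.

    Simplified on purpose: real BGP has further tie-breakers
    (origin, MED, eBGP vs iBGP, router ID, ...) that are ignored here.
    """
    return max(routes, key=lambda r: (r.local_pref, -r.as_path_len))


# Assumed values: 100 for transit, 200 for peering, 150 for "in between".
ntt = Route("NTT", local_pref=100, as_path_len=3)

he_as_transit = Route("HE", local_pref=100, as_path_len=5)
he_in_between = Route("HE", local_pref=150, as_path_len=5)
he_as_peering = Route("HE", local_pref=200, as_path_len=5)

for he in (he_as_transit, he_in_between, he_as_peering):
    winner = best_path([ntt, he])
    print(f"HE local_pref={he.local_pref}: best path via {winner.neighbor}")
```

In this model HE wins every prefix it advertises as soon as its local-pref exceeds NTT's, whether that value is 150 or 200; only at equal local-pref does the AS-path length decide.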