[09:04:58] 10Traffic, 10SRE-swift-storage: OpenSSL 3.x performance issues - https://phabricator.wikimedia.org/T352744 (10MoritzMuehlenhoff) I'll prepare the respective OpenSSL 1.1 forward ports. I'm optimistic I'll have something ready before the holiday break. Given haproxy's importance for our DDoS resiliency this seem...
[09:19:23] 10Traffic, 10Data-Engineering, 10Movement-Insights, 10Patch-For-Review: Identify and label prefetch proxy data in our traffic - https://phabricator.wikimedia.org/T346463 (10Vgutierrez) VCL patch submitted by @Ottomata (https://gerrit.wikimedia.org/r/c/operations/puppet/+/981352) looks good to me, @elukey C...
[09:30:43] 10Traffic, 10SRE-swift-storage: OpenSSL 3.x performance issues - https://phabricator.wikimedia.org/T352744 (10Vgutierrez) >>! In T352744#9398828, @MoritzMuehlenhoff wrote: > I'm wondering though if we reproduced this with the pilot bookworm cp installation? The pilot cp bookworm installation on cp4052 (upload@...
[09:36:40] 10Traffic, 10SRE-swift-storage: OpenSSL 3.x performance issues - https://phabricator.wikimedia.org/T352744 (10Vgutierrez) HAProxy 2.9 has been released, introducing AWS-LC support and with some interesting mentions of OpenSSL [[ https://www.mail-archive.com/haproxy@formilux.org/msg44400.html | on its release no...
[09:55:16] 10netops, 10Infrastructure-Foundations, 10Prod-Kubernetes, 10SRE, 10serviceops: Agree strategy for Kubernetes BGP peering to top-of-rack switches - https://phabricator.wikimedia.org/T306649 (10ayounsi) 05Stalled→03Resolved Automation is up and running. Doc updated: https://wikitech.wikimedia.org/w/in...
[12:34:30] 10Traffic, 10Data-Engineering, 10Movement-Insights, 10Patch-For-Review: Identify and label prefetch proxy data in our traffic - https://phabricator.wikimedia.org/T346463 (10BTullis) Either approach seems fine to me and I don't have strong opinions on which is better. I have +1d the VCL change based on @Ott...
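(Editor's note: for context on the T346463 labeling approach above, Chrome's private prefetch proxy is documented to mark its requests with a `Sec-Purpose: prefetch;anonymous-client-ip` header, so traffic can be tagged at the edge by matching on that header. The toy filter below illustrates the classification idea only; the sample log lines and labels are invented and are not the contents of the actual VCL patch in gerrit 981352.)

```shell
# Label sample request lines as prefetch-proxy traffic when their (toy)
# sec-purpose field contains "prefetch", mirroring the header Chrome's
# private prefetch proxy sends. Input format here is invented for illustration.
printf '%s\n' \
  '/wiki/Foo sec-purpose=prefetch;anonymous-client-ip' \
  '/wiki/Bar sec-purpose=-' |
awk '{ label = ($2 ~ /prefetch/) ? "prefetch-proxy" : "normal"; print $1, label }'
```

In real VCL the same test would run against the incoming request headers rather than log lines, but the decision logic is the same one-line match.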
[13:44:50] 10netops, 10Infrastructure-Foundations: prometheus5002 unable to ping ipv6 ganeti500[74] eqsin - https://phabricator.wikimedia.org/T353254 (10fgiunchedi)
[13:54:11] 10netops, 10Infrastructure-Foundations, 10ops-codfw: cr2-codfw:xe-1/0/1:1 down - https://phabricator.wikimedia.org/T353256 (10ayounsi) p:05Triage→03High
[13:59:50] 10Traffic, 10Data-Engineering, 10Movement-Insights, 10Patch-For-Review: Identify and label prefetch proxy data in our traffic - https://phabricator.wikimedia.org/T346463 (10Ottomata) > updating the varnishkafka JSON format Also, this would require schema changes in Hive. So ya let's go with VCL!
[14:01:28] 10Traffic, 10Data-Engineering, 10Movement-Insights, 10Patch-For-Review: Identify and label prefetch proxy data in our traffic - https://phabricator.wikimedia.org/T346463 (10JAllemandou) > So ya let's go with VCL! +1
[14:04:07] 10netops, 10Ganeti, 10Infrastructure-Foundations: prometheus5002 unable to ping ipv6 ganeti500[74] eqsin - https://phabricator.wikimedia.org/T353254 (10ayounsi) Thanks for finding the issue! The host lost its IP in favor of a SLAAC IP ` ganeti5007:~$ ip -6 addr 1: lo: mtu 65536 state...
[15:04:58] I'll try again now that people are awake :) Horizon.wikimedia.org is defined in profile::trafficserver::backend::mapping_rules; is there any way to adjust the connection timeouts for that service? I'm trying to upload an enormous file and after a few minutes it errors out with a 'this site is down' message.
[15:22:04] hi andrewbogott can't promise anything but I'm checking :)
[15:24:02] did you already open a ticket for this?
[15:25:12] andrewbogott: is it the connection timeout that's an issue, or the total request time?
[15:30:08] Good question, probably total request time
[15:50:50] andrewbogott: what timeout are you hitting? :)
[15:51:26] we do support large uploads to commons.wm.o
[15:51:53] so what's the behavior that you're experiencing at the moment?
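(Editor's note: on the 15:04:58 timeout question, Apache Traffic Server sets its transaction timeouts globally in records.config rather than per mapping rule. The record names below are standard upstream ATS settings that govern connect and transaction lifetimes; the values are illustrative examples, not WMF production configuration.)

```
# Illustrative ATS records.config fragment (example values only).
CONFIG proxy.config.http.connect_attempts_timeout INT 30
CONFIG proxy.config.http.transaction_no_activity_timeout_in INT 120
CONFIG proxy.config.http.transaction_no_activity_timeout_out INT 120
CONFIG proxy.config.http.transaction_active_timeout_in INT 900
CONFIG proxy.config.http.transaction_active_timeout_out INT 0
```

The no_activity records fire when a transaction stalls; the active records cap total transaction time regardless of progress, which is the kind of limit a long-running large upload can hit.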
[15:52:09] to answer your original question, no, we don't set timeouts per mapping_rule
[15:52:18] I'll run another test in a few, I'm briefly out of the house
[16:13:37] andrewbogott: also.. what are you trying to upload (in terms of size)?
[16:17:04] 10Traffic, 10Data-Engineering, 10Observability-Logging: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117 (10Milimetric) >>! In T351117#9379025, @Fabfur wrote: > Hi @Milimetric sorry for the late reply, I'll try to answer your question but consider we're still investig...
[16:20:19] ok, I'm back (barely). I'm uploading a VM base image, 500 megs. I'm running a test now to see how long it uploads before failing...
[16:23:34] It fails after around 3:30 with a message 'Our servers are currently under maintenance or experiencing a technical problem.' That last is why I thought it might be the proxy layer but I haven't yet investigated what's happening on the backend.
[16:35:05] yeah it would be helpful to understand the case better at an HTTP level. Are you still transmitting packets of that 500MB regularly when it happens? or did everything buffer up and kind of halt because horizon can't ingest it as fast as you want to send, or?
[16:35:23] there's a lot of different kinds of timeouts that can happen in the stack, some in just the rx or tx direction at just one layer, etc.
[16:36:34] would it be feasible to use the openstack cli to upload these VM images (eg. from a bastion host using local storage) to check if it's something directly related to horizon?
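(Editor's note: a quick back-of-envelope check on the numbers reported above. Assuming "3:30" means three and a half minutes, a 500 MB upload dying at that point implies only a modest average transfer rate, which is consistent with the failure being a time-based cutoff rather than a bandwidth problem.)

```shell
# Average rate implied by a 500 MB upload failing after ~3:30 (210 s).
size_mb=500
elapsed_s=210
awk -v s="$size_mb" -v t="$elapsed_s" 'BEGIN { printf "%.2f MB/s\n", s / t }'
# prints "2.38 MB/s"
```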
[16:36:35] comparing what's happening here to large commons uploads might be interesting too
[16:56:22] 10Traffic, 10Release-Engineering-Team: CI on gitlab for eBPF / networking heavy projects - https://phabricator.wikimedia.org/T353279 (10Vgutierrez)
[16:57:34] 10Traffic, 10Release-Engineering-Team: CI on gitlab for eBPF / networking heavy projects - https://phabricator.wikimedia.org/T353279 (10Vgutierrez) p:05Triage→03Medium
[16:58:38] andrewbogott: could you provide full request/response headers please? :)
[17:00:26] 10Traffic, 10Release-Engineering-Team: CI on gitlab for eBPF / networking heavy projects - https://phabricator.wikimedia.org/T353279 (10Vgutierrez)
[17:27:18] Sorry all, I disappeared into a meeting. I'll try to gather some more info and then will reappear. This is all buried deep in a docker container so it's tedious to extract extra logging
[17:56:13] vgutierrez: here is what I'm able to get so far: https://phabricator.wikimedia.org/P54344
[17:56:36] I should also note that after discussion we maybe care less about this use case than I thought. So you should only pursue this if you're curious, otherwise I'll set it aside for now.
[17:58:25] Err you might want to invalidate that session :)
[18:03:01] It is
[18:38:41] 10Traffic, 10Release-Engineering-Team: CI on gitlab for eBPF / networking heavy projects - https://phabricator.wikimedia.org/T353279 (10thcipriani) I think this is a case where it would make sense to bring your own runner to GitLab—that is, this is a use-case that would be likely to disrupt other users, so a s...
[21:09:35] 10Traffic, 10Infrastructure-Foundations, 10vm-requests: eqiad: 1 VM request for acme-chief - https://phabricator.wikimedia.org/T353295 (10BCornwall)
[21:14:01] 10Traffic, 10Infrastructure-Foundations, 10vm-requests: eqiad: 1 VM request for acme-chief - https://phabricator.wikimedia.org/T353295 (10MoritzMuehlenhoff) LGTM, but better use 20G for the disks, we have plenty of storage on the ganeti servers and that leaves some wiggle room for logs and e.g. kernel images...