[04:16:57] (EdgeTrafficDrop) firing: 42% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org
[04:21:57] (EdgeTrafficDrop) resolved: 62% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org
[06:49:56] (EdgeTrafficDrop) firing: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org
[07:04:56] (EdgeTrafficDrop) resolved: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org
[08:40:27] netops, Infrastructure-Foundations: Paramiko > 2.8.1 incompatibility with some Juniper devices - https://phabricator.wikimedia.org/T299482 (ayounsi) p:Triage→High
[10:36:02] XioNoX, ema: do we have data about the expected RTT between a final user and the preferred PoP?
[10:36:28] vgutierrez: what do you mean?
[10:37:53] I need to configure a specific envoy timeout
[10:37:54] https://www.envoyproxy.io/docs/envoy/v1.18.3/api-v3/extensions/filters/network/http_connection_manager/v3/http_connection_manager.proto.html?highlight=delayed_close_timeout
[10:38:11] To be useful in avoiding the race condition described above, this timeout must be set to at least <max round trip time expected between clients and Envoy>+<100ms to account for a reasonable “worst” case processing time for a full iteration of Envoy’s event loop>.
[10:38:29] so I was wondering what would be a safe value without going extra high
[10:39:33] vgutierrez: between any possible user on the planet and any PoP? (as we might depool one and they will go to some less-preferred ones)
[10:40:08] the default value is 1 second, and that's too low
[10:40:16] indeed
[10:40:27] so I need a sensible value without setting it to 5 minutes ;P
[10:40:39] 4min59s ?
[10:40:42] lol
[10:40:59] vgutierrez: maybe 302 performance team too?
[10:41:08] they have all the latency data
[10:42:04] latency wise 5s for a packet RTT is already quite high, unless you count retransmits
[10:46:11] I think you should be able to get some data from the ripe atlas network and we have credits AFAIK
[10:46:46] netops, Infrastructure-Foundations, SRE: Paramiko > 2.8.1 incompatibility with some Juniper devices - https://phabricator.wikimedia.org/T299482 (ayounsi) Open→Resolved a:ayounsi Workaround pushed.
[10:47:31] but if you're looking for the slowest connection then the limit will end up being quite high
[10:50:47] netops, Infrastructure-Foundations, SRE: all network devices must run OpenSSH >= 7.2p1 but != 7.4p1 - https://phabricator.wikimedia.org/T254013 (ayounsi) Juniper bumped their recommended version to at least Junos 20 on a lot of platforms.
* pfw: T295691
* cr: T295690
* mr: T278289
[11:20:30] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-ulsfo: ulsfo: (2) mx80s to become temp cr[34]-drmrs - https://phabricator.wikimedia.org/T295819 (ayounsi) Stalled→Declined Not needed anymore.
[15:05:27] vgutierrez: I suggest 20 seconds
[15:11:37] cdanis: thx
[15:11:47] I've added 30 on the initial CR, not so far :)
[15:11:52] s/added/set to/
[15:16:12] CR updated
[15:26:14] 30 sounds fine too and it does seem like the kind of setting where in general over-estimating is better
[18:06:38] Traffic, Data-Engineering, SRE, Patch-For-Review: VarnishKafka to propagate user agent client hints headers to webrequest - https://phabricator.wikimedia.org/T299401 (phuedx) @JAllemandou: @elukey highlighted that we (Data Engineering and other stakeholders) should agree on the names for these he...
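
Editor's note on the delayed_close_timeout thread (10:36 to 15:26): the Envoy guidance quoted in the log amounts to "worst expected client RTT plus about 100 ms for one event-loop iteration". The sketch below shows one way such a value could be derived from RTT samples (for example, RIPE Atlas measurements, as suggested in the log). It is only an illustration and not what was done in the CR, where 30 seconds was chosen directly. The function name, the `headroom` multiplier for retransmits, and the clamping bounds (1 s, the default mentioned in the log, and 300 s, the jokingly rejected "5 minutes") are all assumptions made for this example.

```python
import math

EVENT_LOOP_MARGIN_MS = 100.0  # margin from the Envoy guidance quoted in the log


def suggest_delayed_close_timeout(rtt_ms, percentile=0.99, headroom=1.0,
                                  floor_s=1.0, ceiling_s=300.0):
    """Return a candidate delayed_close_timeout in seconds.

    Hypothetical helper: takes RTT samples in milliseconds, picks a high
    percentile with the nearest-rank method, scales it by `headroom`
    (an assumed allowance for retransmits), adds the 100 ms event-loop
    margin, and clamps the result to [floor_s, ceiling_s].
    """
    if not rtt_ms:
        raise ValueError("need at least one RTT sample")
    if not 0.0 < percentile <= 1.0:
        raise ValueError("percentile must be in (0, 1]")

    ordered = sorted(rtt_ms)
    # Nearest-rank percentile: smallest sample with at least
    # `percentile` of the data at or below it.
    rank = max(1, math.ceil(percentile * len(ordered)))
    worst_rtt_ms = ordered[rank - 1]

    timeout_s = (worst_rtt_ms * headroom + EVENT_LOOP_MARGIN_MS) / 1000.0
    return min(ceiling_s, max(floor_s, timeout_s))


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    samples = [40, 80, 120, 250, 900, 5000]
    print(suggest_delayed_close_timeout(samples, headroom=3.0))
```

As noted at 15:26:14, over-estimating is the safer direction for this setting, so a derived value like this would be a lower bound to round up from rather than a target.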