[10:20:04] Traffic, Liberica, Patch-For-Review: Switch to katran as forwarding plane on non-core DCs - https://phabricator.wikimedia.org/T396561#10937739 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=35502db1-983a-49a5-aa11-a581fc67c467) set by vgutierrez@cumin1002 for 1 day, 0:00:00 on 1 ho...
[13:32:50] Traffic, Liberica: Switch to katran as forwarding plane on non-core DCs - https://phabricator.wikimedia.org/T396561#10938311 (Vgutierrez)
[13:58:17] Traffic, SRE, FY2025-26 WE 3.3.4 Reading Lists on Web: [Reading Lists] Monitor potential performance impact of Reading Lists for Web - https://phabricator.wikimedia.org/T397526#10938446 (Jdrewniak)
[14:02:51] Domains, Traffic, cloud-services-team, IPv6: Add IPv6 glue records for WMCS Designate-hosted domains - https://phabricator.wikimedia.org/T397185#10938453 (taavi) We talked about this in the WMCS team meeting last week and the result was that this can go ahead.
[14:03:06] Traffic, DC-Ops, ops-codfw, Patch-For-Review: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#10938456 (elukey) @Jhancock.wm Hi! The I/F team is doing a hackathon this week so I'll try to work on this but I can't promise a lot of progress :( From a quick che...
[14:19:44] Traffic, Liberica, Patch-For-Review: Switch to katran as forwarding plane on non-core DCs - https://phabricator.wikimedia.org/T396561#10938507 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=2e68756a-6bdb-400c-9e31-f38ec1973312) set by vgutierrez@cumin1002 for 1 day, 0:00:00 on 1 ho...
[14:59:36] Traffic, Liberica, Patch-For-Review: Switch to katran as forwarding plane on non-core DCs - https://phabricator.wikimedia.org/T396561#10938639 (Vgutierrez)
[15:05:55] sukhe: related to what you helped me with last week - one of the prep tasks for sunsetting m-dot next quarter is adding the missing "Vary" header to mobile responses in MediaWiki.
Could you or someone else from Traffic let me know what we should take into account when deploying this? i.e. Does it matter whether we ride the train vs backport? The main theoretical issue I foresee (as raised by Tim) is that adding a new header, even if
[15:05:55] there are no variants today, might change some cache key and thus churn the cache once (in Varnish, ATS, or both, for pageviews). https://phabricator.wikimedia.org/T390929#10706729
[15:06:28] not "might" as in "it's possible", but rather it's possible that it's possible - because I don't know what Varnish/ATS will do there, hence the question :)
[15:23:58] Krinkle: I don't see any particular concerns with adding Vary as far as ATS is concerned (that's where we will add this). IMO we should just set Vary on all responses
[15:24:06] (instead of the more gradual rollout)
[15:24:17] but I would definitely 302 to bblack and vgutierrez here too
[15:48:37] Domains, Traffic, cloud-services-team, IPv6: Add IPv6 glue records for WMCS Designate-hosted domains - https://phabricator.wikimedia.org/T397185#10938907 (ssingh) I updated Markmonitor to further add the v6 glue records: ` ;; ADDITIONAL SECTION: ns0.openstack.eqiad1.wikimediacloud.org. 3600 IN A...
[15:58:27] sukhe: RE "where we will add this" - do you mean we should synthesize a Vary resp header at the ATS layer?
[16:00:17] We create the X-Subdomain header at Varnish, and it passes through ATS to MW. MW varies by this header, but it doesn't emit Vary:X-Subdomain today. That's masked today by the fact that we also have m-dot URLs (m-dot URL == X-Subdomain:M, canonical URL == no X-Subdomain). As prep for not having m-dot domains, it is my understanding that MW should emit Vary:X-Subdomain resp header.
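The variance described above can be illustrated with a toy cache. This is a minimal sketch of generic HTTP `Vary` matching, not of how Varnish or ATS actually store objects; `ToyCache`, `StoredResponse`, the URL, and the bodies are all made up for illustration. A stored response is reused only when every request header named in its `Vary` has the same value as in the request that originally produced it.

```python
from dataclasses import dataclass, field


def _norm(headers):
    """Lower-case header names so lookups are case-insensitive."""
    return {name.lower(): value for name, value in headers.items()}


@dataclass
class StoredResponse:
    body: str
    vary: tuple            # lower-cased request header names listed in Vary
    request_headers: dict  # normalized headers of the request that produced it


@dataclass
class ToyCache:
    # url -> list of stored variants (hypothetical structure, illustration only)
    store: dict = field(default_factory=dict)

    def insert(self, url, request_headers, body, vary=()):
        variant = StoredResponse(
            body=body,
            vary=tuple(name.lower() for name in vary),
            request_headers=_norm(request_headers),
        )
        self.store.setdefault(url, []).append(variant)

    def lookup(self, url, request_headers):
        wanted = _norm(request_headers)
        for variant in self.store.get(url, []):
            # A variant matches when every Vary-nominated header has the same
            # value (or is absent in both) as in the original request.
            if all(wanted.get(h) == variant.request_headers.get(h)
                   for h in variant.vary):
                return variant
        return None


cache = ToyCache()
url = "/wiki/Example"  # hypothetical URL

# Canonical request without X-Subdomain; the backend declares the variance.
cache.insert(url, {}, "desktop html", vary=["X-Subdomain"])

hit = cache.lookup(url, {})                      # same (absent) header value
miss = cache.lookup(url, {"X-Subdomain": "M"})   # different value, no variant yet
```

In this model a request that still carries no X-Subdomain keeps hitting the stored object, while the first request with `X-Subdomain: M` has to go to the backend. Whether a real cache behaves the same way when `Vary` first appears on an already-cached URL is exactly the open question in the chat.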
[16:05:54] Traffic, SRE-swift-storage: OpenSSL 3.x performance issues - https://phabricator.wikimedia.org/T352744#10939037 (Fabfur) a:Fabfur
[16:07:53] inflatador: I've refreshed the changes on https://gerrit.wikimedia.org/r/q/topic:%22T387309%22
[16:31:32] Krinkle: sorry, was in a meeting. looking at this again, I think if we need to add Vary to mobile responses, that should only be in VCL but still with a full rollout rather than gradual (to answer your first question).
[16:32:03] I am really not sure about the MW-bit but we did add X-Experiment-Enrollments to Vary recently so a 302 to vgutierrez is in order here for confirmation and the above
[16:32:08] including the MW side
[16:33:14] hmmm if I understood correctly Krinkle plans MW to send the Vary: X-Subdomain back to the CDN
[16:33:35] that doesn't require any change for us AFAIK
[16:40:18] X-Experiment-Enrollments is different cause we decided to do that at the CDN to avoid reconfiguring every backend service behind the text and upload CDN clusters
[16:42:01] we did that at once BTW and we didn't experience any major disruption in cache hitrate
[16:42:18] ah right, edge uniques is limited to the CDN itself
[16:45:26] vgutierrez: the other question was slow vs full rollout. I suggested full based on what I have seen in the past. but could use your confirmation.
[16:46:08] yeah.. hence my comment about the full rollout for Vary: X-E-E
[16:46:53] we should be OK with a full rollout
[16:48:19] cool thanks!
[17:36:52] Traffic, DC-Ops, ops-codfw, Patch-For-Review: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#10939544 (Volans) Those new servers are of generation 17, that is the first one shipped with iDRAC 10 and a firmware version of 1.20.x.x. Its Redfish support is slig...
[19:01:54] hmm sukhe i just remembered that we were able to roll out vary: x-e-e at once cause we didn't have any traffic setting x-e-e initially
[19:02:11] but i don't think that's the case with X-Subdomain right?
[19:04:41] no, not the case
[19:26:03] what's the current volume of mobile traffic?
[19:32:10] well. not sure what you are specifically looking for but usually mobile traffic exceeds desktop
[19:32:37] mostly I would say
[19:34:37] but what's your worry here for my own understanding? isn't adding a new header such as x-e-e worse than adding variance on an existing header?
[19:49:23] nope
[19:50:06] vary: x-e-e didn't get triggered for a few days till we started the first a/b tests
[19:50:35] so we were letting varnish and ATS know that cache will be varied based on that header without impacting cache hit rate
[19:51:29] ok interesting then and I recall this bit now.
[19:52:25] if we start varying the cache on X-subdomain and we get a lot of traffic with x-subdomain set we definitely are going to have some "fun"
[19:53:56] so yeah in this case a staggered rollout makes sense
[20:03:36] please comment on the task :) maybe tomorrow. or I can on your behalf
[20:03:54] I still have some questions but I will bother you tomorrow.
[20:03:57] vgutierrez: ack, I know it doesn't require, functionally, a change in the cdn, but I was wondering about potential cache churn impact when you go from having no variant to having a single variant that already matches.
[20:04:34] for the canonical urls we're going from not setting x-sd to still not setting it but MW saying it varies on it.
[20:05:25] I don't think mobile exceeds desktop. human-classified pageviews are a bit above 50% on mobile compared to desktop. But if we include crawlers and bots, I think that's tipped the other way.
[20:06:01] and we have a TTL on them so presumably short of a purge, varnish/ats won't know about it for any given URL until it naturally replaces itself.
[20:08:29] nvm, I take back that crawlers tip the balance a lot. They tip it basically back to 50/50.
[20:08:45] Turns out 1 billion people is a lot of people.
[20:10:41] It's like 9B mobile vs 5B desktop for presumed-humans, and then 12B/12B when we include crawlers.
[20:14:17] monthly pageviews, that is.
[20:15:46] if we synthetically told varnish/ats that everything 'varies', that'd be a hard miss across the board, but that won't happen when MW introduces the header; it'll be naturally gradual given it's not discovered until a refresh or renew.
[20:16:14] although I suppose it might be "learning" from logged-in pageviews a bit quicker than cache expiry
[20:16:30] not sure if there's cross-talk for hit-for-pass like that.
[20:17:03] is
[20:17:26] it's a hard miss if we vary it and it starts seeing x-sd traffic
[20:19:01] well.. varnish/ats won't see the vary at all till TTL on those URLs expires
[20:19:24] cause there is no good reason to reach the backend layer while the cache is fresh
[20:21:13] so TTL should give us some breathing time instead of
[20:21:29] seeing a sudden decrease of cache hit rate
[20:41:03] Traffic, Phabricator, Release-Engineering-Team: Phabricator videos fail in Firefox ("Range" request gets 503 from Varnish) - https://phabricator.wikimedia.org/T397661 (Krinkle) NEW
[20:41:49] Traffic, Phabricator, Release-Engineering-Team: Phabricator videos fail in Firefox ("Range" request gets 503 from Varnish) - https://phabricator.wikimedia.org/T397661#10940001 (Krinkle)
[20:46:57] Traffic, Phabricator, Release-Engineering-Team: Phabricator videos fail in Firefox ("Range" request gets 503 from Varnish) - https://phabricator.wikimedia.org/T397661#10940019 (Krinkle)
[21:41:24] vgutierrez: yeah, ok, that's my thinking as well. So I suggest we don't do an immediate-everywhere backport but rather ride the train this week?
[21:41:37] that'll roll it out somewhat gradually over the projects, with most on Thursday.
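The TTL argument above (a cache only learns about the new Vary header once an object expires and is refetched) can be put into rough numbers. This is a back-of-the-envelope sketch under loud assumptions: remaining TTLs are taken to be uniformly spread between zero and a maximum, purges and hit-for-pass effects are ignored, and the 24-hour maximum is a hypothetical value, not a figure from the chat.

```python
def fraction_refetched(elapsed_s: float, max_ttl_s: float) -> float:
    """Share of cached objects that have expired at least once after
    `elapsed_s` seconds, assuming remaining TTLs are uniform in
    [0, max_ttl_s]. Only these objects can have picked up a newly
    emitted Vary header in this simplified model."""
    if max_ttl_s <= 0:
        raise ValueError("max_ttl_s must be positive")
    return min(max(elapsed_s, 0.0) / max_ttl_s, 1.0)


HOUR = 3600
MAX_TTL = 24 * HOUR  # hypothetical cap, chosen only for illustration

for hours in (1, 6, 12, 24):
    share = fraction_refetched(hours * HOUR, MAX_TTL)
    print(f"after {hours:>2}h: {share:.0%} of objects have seen the new header")
```

Under these assumptions the exposure grows linearly rather than all at once, which is the "breathing time" mentioned above. Riding the train staggers it further, since each wiki group starts its own clock on a different day.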
[21:42:20] Would you mind writing something on T390929 or is it okay if I summarize this chat?
[21:42:22] T390929: MobileFrontend should declare "X-Subdomain" variance via "Vary" response header - https://phabricator.wikimedia.org/T390929
[21:43:57] sounds good. feel free to summarize this chat
[22:49:22] [non-urgent] hello, traffic friends - FYI, I have a somewhat unusual DNS patch [0] that some of you may soon be added to via reviewer-bot
[22:49:22] it's a procedure we've used before for soft-shutdown of a "we swear nothing uses this, but can't prove it" discovery service, which we'd like to use in turning down the swift-r[ow] services that caused confusion (again) today
[22:49:22] in any case, I wanted to flag it here and offer to chat more about the plan in [1] if desired :)
[22:49:22] [0] https://gerrit.wikimedia.org/r/c/operations/dns/+/1163055
[22:49:22] [1] https://phabricator.wikimedia.org/T376237
[22:53:26] I am going to deploy this varnish config change now, if there are no objections: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1161727