[08:39:08] Hi there. I plan to enable collection of Client Hints data in the UTC afternoon backport window today. I wanted to get a quick check that this won't cause problems from a performance standpoint. Collection is already enabled on group0 and most group1 wikis.
[08:39:09] The two things that Kosta and I have discussed that could be a problem from a performance standpoint are:
[08:39:09] A ResourceLoader module is included for all page views for browsers that support Client Hints (at the moment browsers based on Chromium)
[08:39:09] After all edits, a POST request is made to a CheckUser extension REST endpoint.
[08:39:09] Thanks. If you would like a task and/or to delay enabling on all wikis I can do so.
[08:39:36] (For clarity, this is part of my work as a contractor under the username WBrown (WMF))
[08:40:24] Furthermore, to be clear, collection is enabled on all group0 and group1 wikis except commons and wikidata.
[08:41:07] (Deployment task is T341110)
[08:41:07] T341110: Deploy client hints functionality - https://phabricator.wikimedia.org/T341110
[08:42:08] <_joe_> so if it's group0 and group1 except the highest traffic wikis I am not worried
[08:42:27] <_joe_> but I want to understand one part of what you said
[08:42:33] <_joe_> Dreamy_Jazz> A ResourceLoader module is included for all page views for browsers that support Client Hints (at the moment browsers based on Chromium)
[08:42:46] <_joe_> this means an additional call to load.php, in practice?
[08:42:56] <_joe_> will that be easily cachable?
[08:43:33] I'm not entirely sure, but the relevant PHP code is "$out->addModules( 'ext.checkUser.clientHints' )"
[08:44:44] My plan would be to enable on all wikis today, so that would be enabling on all group2, wikidata and commons.
[08:46:33] <_joe_> I'd let the folks from traffic chime in
[08:48:01] The RL module is loaded in BeforePageDisplayHook, so it should be bundled with any other RL modules that are loaded on the page already
[08:50:20] That is my understanding too.
[09:06:27] _joe_: any thoughts on who to ping from traffic? Or should we just comment in #wikimedia-traffic?
[09:07:23] <_joe_> kostajh: oh so it would be bundled in a single call to load.php
[09:08:02] <_joe_> so while doubling the caching space for load.php, that's still irrelevant
[09:08:27] <_joe_> i don't think there is any reliability worry then
[09:09:01] ok
[09:09:35] topranks: eoghan: Heads up, moving to 2% global traffic on mw-on-k8s
[09:09:51] claime: Exciting! Good luck.
[09:15:59] Thanks for thoughts. I will proceed with collection on all wikis unless objected to in the UTC afternoon backport window.
[10:06:37] Dreamy_Jazz: do you check if the browser supports the client hint and then inject the module?
[10:06:45] Yes
[10:06:55] For logged out users that will lead to cache poisoning on the edge
[10:07:22] As html is cached on the edge and shared between browsers
[10:07:33] Only split is based on URL
[10:07:41] Okay. Do you suggest removing this check?
[10:07:48] Yes
[10:08:08] Thanks. Will file a task to remove it.
[10:08:44] Would you suggest halting the roll-out until this is fixed?
[10:09:05] On top of that. Does it run anything specific? Make sure it short circuits in js if not supported
[10:09:10] Yes please
[10:09:38] Amir1: https://github.com/wikimedia/mediawiki-extensions-CheckUser/blob/master/src/HookHandler/ClientHints.php#L53 is relevant code
[10:09:54] Amir1: also https://github.com/wikimedia/mediawiki-extensions-CheckUser/blob/master/modules/ext.checkUser.clientHints/index.js
[10:10:06] The first line of the JS file checks for support and if not, it just stops execution of the JS file at that point.
[10:10:14] As linked above in the second link.
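The cache-poisoning concern raised above can be illustrated with a toy model: if the edge keys cached HTML on the URL alone, any origin-side decision that depends on another request property (here, whether the browser supports Client Hints) is frozen by whichever visitor populates the cache first. This is a hypothetical sketch; `PageRequest`, `UrlKeyedEdgeCache` and both renderers are invented for illustration and are not MediaWiki or CDN code.

```typescript
// Toy model: HTML cached at the edge, keyed only on the URL.
// All names are hypothetical; this is not MediaWiki or CDN code.

interface PageRequest {
  url: string;
  // Stand-in for "this browser appears to support Client Hints".
  supportsClientHints: boolean;
}

type Renderer = (req: PageRequest) => string;

// Problematic: varies the page on a browser property that is NOT part of
// the cache key.
const renderConditionally: Renderer = (req) => {
  const modules = ["site"];
  if (req.supportsClientHints) {
    modules.push("ext.checkUser.clientHints");
  }
  return `<html data-modules="${modules.join(",")}"></html>`;
};

// Suggested direction from the discussion: always add the module and let
// its JS bail out early where unsupported, so every cached copy is valid
// for every browser.
const renderUnconditionally: Renderer = () =>
  `<html data-modules="site,ext.checkUser.clientHints"></html>`;

class UrlKeyedEdgeCache {
  private readonly store = new Map<string, string>();

  fetch(req: PageRequest, origin: Renderer): string {
    const cached = this.store.get(req.url);
    if (cached !== undefined) {
      // Cache hit: served as-is, whichever browser originally populated it.
      return cached;
    }
    const body = origin(req);
    this.store.set(req.url, body);
    return body;
  }
}
```

With `renderConditionally`, a non-Chromium visitor who is first to request a page causes later Chromium visitors to receive HTML without the module (and the reverse ships the module to browsers that cannot use it). With `renderUnconditionally`, the cached copy is the same for everyone.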
[10:10:36] PHP-wise I have a good understanding of what is happening but didn't know JS-wise
[10:11:26] navigator.userAgentData provides access to the Client Hints API. If this is undefined (by checking using "!") then the browser doesn't support Client Hints.
[10:11:27] One last question: I need to read the code, but why exactly are client hints done JS side?
[10:11:44] This was to make it possible to use the postEdit hook
[10:11:54] For interfaces that might not use the standard way of editing
[10:12:00] i.e. DiscussionTools etc.
[10:12:30] It's also not possible to set client hint headers selectively, e.g. just for API requests
[10:12:50] ^
[10:13:04] Hmm. OK. Makes sense. Maybe it shouldn't be a separate RL module, given how expensive a large number of RL modules is
[10:13:07] At least in Chrome's implementation. Either the header is set for the page, in which case client hint headers are sent with API requests, or it is not, and API requests cannot have client hint headers.
[10:13:14] But that's for later
[10:13:42] Furthermore, if a user loads the edit form without intending to edit, the non-JS way would mean having to ask for Client Hints data then.
[10:14:13] Which would mean if the user clicks on the "edit" window then exits, the next page they load on the wiki would have Client Hints data sent to it
[10:14:21] Even if that wasn't submitting an edit
[10:14:33] That's not too much of a loss tbh
[10:15:01] Not happening often etc.
[10:15:05] I think we are straying from the discussion about what we need to do to enable in group2; we could discuss design decisions in the relevant phab task, e.g. T337944
[10:15:06] T337944: Implement support for requesting client hint header - https://phabricator.wikimedia.org/T337944
[10:15:13] Sure.
[10:15:29] Yeah. I am trying to understand it
[10:15:30] I will remove the enabling from the deployment window that is coming up.
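The "short-circuit in JS if not supported" pattern described above (bail out when `navigator.userAgentData` is missing) might look roughly like this. It is a minimal sketch under stated assumptions: the navigator object is passed in so the function can run outside a browser, `send` is a hypothetical stand-in for the POST to the CheckUser REST endpoint, and the two hint names requested are illustrative, not taken from the actual `ext.checkUser.clientHints` module.

```typescript
// Subset of the User-Agent Client Hints API (NavigatorUAData) used here.
interface UADataLike {
  getHighEntropyValues(hints: string[]): Promise<Record<string, unknown>>;
}

interface NavigatorLike {
  userAgentData?: UADataLike;
}

// Returns true if data was collected and handed to `send`, false if the
// browser has no Client Hints API and we stopped immediately.
async function collectClientHints(
  nav: NavigatorLike,
  send: (data: Record<string, unknown>) => void, // hypothetical REST POST stand-in
): Promise<boolean> {
  // No navigator.userAgentData means no Client Hints API (e.g. non-Chromium
  // browsers): stop before doing any further work.
  if (!nav.userAgentData) {
    return false;
  }
  // Illustrative hint names only; the real module may request others.
  const data = await nav.userAgentData.getHighEntropyValues([
    "architecture",
    "platformVersion",
  ]);
  send(data);
  return true;
}
```

In a browser this would be called with `navigator`; keeping the early return as the first step means unsupported browsers pay only for loading the module, not for running it.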
[10:16:12] Thanks
[10:27:06] (discussion moved to #chrome-ua-deprecation on WMF slack, for any interested WMF staff.)
[10:32:24] :(
[10:38:02] I can copy you in taavi if wanted.
[10:40:39] XioNoX, topranks: could I ask one of ye to give this a look please? simple enough homer-public change but just wanted to be sure to be sure :) https://gerrit.wikimedia.org/r/951439
[10:42:33] * topranks looking
[10:45:15] hnowlan: lgtm
[10:45:21] topranks: thanks!
[10:45:37] cmd to run: homer cr*eqiad* commit "Add new K8s peers"
[10:45:56] let me know if you've any trouble
[10:45:59] ah cool, thanks!
[10:58:36] topranks: I should do the same for cr*codfw* too right?
[10:59:52] hnowlan: yep
[10:59:53] hnowlan: sorry yep indeed you should :)
[11:00:15] cool!
[12:23:17] heads-up, I'm rebooting grafana1002 shortly, should be back in ~3 min
[12:52:03] heads-up: we will be repooling esams today, in ~8 mins
[12:53:16] <_joe_> https://media.tenor.com/MdHCDjq5R-IAAAAC/diving-pool.gif
[12:58:57] _joe_: hopefully this but smoother
[12:59:49] <_joe_> oh you mean like https://media.tenor.com/o3oR3bxwseYAAAAd/diving-fail.gif
[13:00:12] this works
[13:00:36] <_joe_> well godspeed, the time has come :)
[13:01:21] <3
[13:21:24] esams is repooled
[13:23:09] nice!
[13:28:10] <_joe_> uhm did we repool esams for all its traffic?
[13:28:22] <_joe_> I'm seeing a surge 4x of traffic to appservers right now
[13:28:31] <_joe_> I guess that's what we get with esams with a cold cache
[13:28:50] <_joe_> sorry not 4x, 1.5x
[13:28:55] _joe_: yes it was repooled
[13:29:35] <_joe_> topranks: yeah I was suggesting maybe next time we pool a dc that funnels half of our traffic during peak hours we do it progressively :)
[13:31:24] <_joe_> https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red?orgId=1&refresh=1m&viewPanel=65 it keeps growing
[13:31:36] <_joe_> we might need to move geodns for some regions to another dc
[13:39:42] in the past we've also done it off-peak
[13:40:22] <_joe_> cdanis: yeah either / or at least :)
[14:07:56] I'm going to disable Puppet briefly (~ 5m)
[14:11:41] sukhe: T344704
[14:11:42] T344704: Blocked on English Wikipedia / Wikimedia thinks my IP is 10.80.1.11 - https://phabricator.wikimedia.org/T344704
[14:11:54] that is probably because that one mediawiki config file is missing the ip ranges used by new esams
[14:12:32] 11.1.80.10.in-addr.arpa domain name pointer cp3067.esams.wmnet.
[14:12:49] 10.80/16 is the new IP range
[14:12:58] and by "that one mediawiki config file" I mean https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/refs/heads/master/wmf-config/reverse-proxy.php
[14:13:08] taavi: oh
[14:13:23] yeah, these are the old ones
[14:13:31] we need to update these
[14:13:46] can we (easily) depool in the meantime, or would that take more time than fixing?
[14:13:54] more time than fixing
[14:14:10] a mw config push will be quicker than the DNS ttl
[14:14:24] I can prep the patch with the new ranges. someone can help push the change?
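Why a missing range in the reverse-proxy list makes "Wikimedia think my IP is 10.80.1.11" can be shown with a simplified model of client-IP resolution: walk back through the X-Forwarded-For chain from the nearest hop, skipping addresses in trusted proxy ranges, and treat the first untrusted address as the client. This is a sketch under loud assumptions: IPv4 only, a single proxy hop, `192.0.2.0/24` is a placeholder for the pre-existing ranges (not the real contents of `reverse-proxy.php`), `203.0.113.7` is a made-up client address, and MediaWiki's actual logic may differ. Only `10.80.0.0/16` (the new esams range) and `10.80.1.11` come from the log.

```typescript
// Convert dotted-quad IPv4 to an unsigned 32-bit integer.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`invalid IPv4 address: ${ip}`);
  }
  // ">>> 0" keeps the value unsigned (192.x would otherwise go negative).
  return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

// True if `ip` falls inside the IPv4 CIDR block `cidr`, e.g. "10.80.0.0/16".
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsStr] = cidr.split("/");
  const bits = Number(bitsStr);
  // Shifting by 32 is a no-op in JS, so a /0 mask is special-cased.
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// Simplified resolution: start at the TCP peer and walk back through
// X-Forwarded-For, skipping trusted proxies; the first untrusted hop wins.
function resolveClientIp(peer: string, xff: string[], trusted: string[]): string {
  const chain = [...xff, peer];
  for (let i = chain.length - 1; i >= 0; i--) {
    if (!trusted.some((cidr) => inCidr(chain[i], cidr))) {
      return chain[i];
    }
  }
  // Every hop is trusted: fall back to the furthest address.
  return chain[0];
}
```

With only the placeholder old range trusted, a request arriving from cache host `10.80.1.11` resolves to `10.80.1.11` itself, so every user behind that cache shares one "IP" (and its blocks and rate limits). Adding `10.80.0.0/16` lets the walk skip the cache and reach the real client address.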
[14:14:29] I'm pretty sure that file only requires the private* ranges, as we don't have any caches in the public vlans and nothing else relevant for that file in the edges
[14:14:37] I can push it out
[14:14:46] thanks, I am working on the patch
[14:19:26] puppet has been re-enabled
[14:26:12] taavi: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/951508
[14:26:17] ranges have been reviewed
[14:26:25] you should have all the rights to push this?
[14:27:06] sukhe: it's failing the php linter, looks like a missing comma between elements
[14:27:15] fixing
[14:27:52] probably tabs vs spaces there too
[14:27:55] fixing to make all tabs
[14:28:04] I have root now, but yes I've had mw deployment rights for a while now
[14:28:51] I'm pretty sure for the edges we only need the private ranges in there, but we can fix that later
[14:28:55] taavi: yes
[14:28:57] we discussed that
[14:29:07] but because the other public ranges were there, we just decided to be extra safe for now
[14:30:44] taavi: thanks for pointing this out and taking care of the deploy
[14:30:56] we are going to put this in the puppet repo as well so when we update stuff there, we update this as well
[14:30:59] when we do a new site
[14:31:51] +1 to that, I'm pretty sure we got bit by this when we brought the first appservers in the eqiad expansion up too (and I barely remembered this before drmrs went live)
[14:36:49] note to selves: when we get around to cleaning up that file, we should probably look at adding the localhost IPs to the set as well
[14:37:16] I will start by adding the references to update it and then we should clean it up
[14:37:17] (we've had issues in the past where we wanted to have some of our proxy layers use the real loopback IP, and then it broke the world because it's not in this file)
[14:38:55] also, it makes me question what impact this had on those reqrates at the applayer (and transport saturation)
[14:39:22] if we had bots that were failing because of the reverse-proxy list, they might've been spamming uncached requests due to that, rather than our actual hitrate issues
[14:40:30] fix is live
[14:40:42] thanks!
[14:44:30] https://gerrit.wikimedia.org/r/c/operations/puppet/+/951514/
[14:44:40] I will do the cleanup later but just so we don't forget this o