[11:22:14] hashar: Any chances https://gerrit.wikimedia.org/r/c/operations/puppet/+/1114966 could have made gerrit act funny? I cannot +1 any changes
[11:23:09] hashar: I've tried with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1114969 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/1114971
[11:23:33] The +1 button isn't there anymore
[11:25:01] oh, yeah, my CR now has no +1 or +2, only "Submit". YOLO time :-D
[11:26:05] the whole code review line is gone, there is only the verified one
[11:26:47] I also got a banner with a loading error, but cannot repro it anymore
[11:29:44] Are sre-collab looking at this? [I gather this is how to ping them on IRC]
[11:30:20] they have a chan
[11:30:29] This is also the way :)
[11:30:48] (I was going by https://office.wikimedia.org/wiki/Team_interfaces/SRE_-_Collaboration_Services/Team )
[11:31:04] bah s/Team/Request WTF
[11:31:52] this was merged recently https://gerrit.wikimedia.org/r/c/operations/puppet/+/1114966 and probably needs a rollback. I pinged hashar
[11:32:12] jelto: yeah, I mentioned it above as possible issue
[11:33:58] I'll create a revert
[11:35:02] I will +1, oh wait!
[11:35:07] :D
[11:35:16] I'm sure you can via the API
[11:36:05] lol, I'll self +2/+2, merge it and run puppet on the gerrit host
[11:41:04] change is merged but puppet run was a noop. I'm not sure if there is anything else which is needed to update the puppet repo settings
[11:41:15] Are the buttons back for you?
[11:41:17] jelto: It seems to have worked though
[11:41:20] jelto: Yep
[11:41:24] great
[11:41:38] Thanks
[11:42:02] WFM also, thanks :)
[11:45:09] marostegui: sorry I broke it, I wanted to monitor the effect of the change and well... I missed it :)
[11:45:15] thank you for the revert jelto !
[11:45:50] I guess I should try on the test/gerrit-ping repo first :b
[12:06:47] I'm seeing a change in the cookbooks.sre.dns.netbox cookbook due to cloudgw1004
[12:07:29] pinging @dcaro @arturo
[12:07:40] hey
[12:07:50] federico3: what would be the diff?
[12:08:09] @arturo https://phabricator.wikimedia.org/P72739
[12:08:34] federico3: thanks, please proceed
[12:08:44] ok, proceeding
[12:09:03] the server was recently relocated
[12:09:05] done, thanks
[12:09:15] thanks
[14:03:48] for some context that output is actually from the "sre.puppet.sync-netbox-hiera" cookbook, which gets triggered by the dns one
[14:04:08] not a dns change as such
[14:19:12] ottomata: re: https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1114340 anything that should be done after merging ?
[14:31:43] godog: hi! I have to defer to joseph and gabriele on that one. some discussion in slack here https://wikimedia.slack.com/archives/C05RHK7PS6Q/p1738058018842099
[14:49:09] I'm trying to upload a debian pkg using https://wikitech.wikimedia.org/wiki/Debian_packaging#Upload_to_Wikimedia_Repo but getting some errors ( https://phabricator.wikimedia.org/P72760 ). Any suggestions?
[14:51:14] you don't need opensearch-pkgs/
[14:51:40] component and the changes files should be enough
[14:55:10] sukhe the `opensearch-pkgs/` part points it to the dir w/the pkg, do I need to put it in a specific dir? I tried running `sudo -i reprepro -C thirdparty/opensearch1 include bullseye-wikimedia opensearch_1.3.20_amd64.changes ` from inside the opensearch-pkgs dir and it's still giving an error
[14:56:13] maybe it's different for thirdparty repo? hmm
[14:56:25] inflatador: -C should take care of that ... I think? what is the error you get with the above?
[14:56:48] `Error opening 'opensearch_1.3.20_amd64.changes': No such file or directory
[14:56:48] There have been errors!`
[14:56:56] I think there is an envvar I need to set...let me check
[14:57:09] yeah, probably this https://phabricator.wikimedia.org/P19522$26
[14:57:28] ah, yeah, I just do sudo -H bash for that from my ~
[14:59:49] OK, making progress. Looks like it doesn't like the "changes" file generated by upstream...will work on it more. Thanks!
[15:02:17] 'Sleeping for 20s to avoid race conditions...' always fills me with confidence :D
[15:12:53] ...and a surprising amount of "do you really mean it?". I mean, I'm trashing 432T of storage, that's nothing really :)
[15:16:10] Prompts? Bah! Give me a big red "self-destruct" button ;P
[15:17:22] Delete All The Things!
[15:17:29] (definitely not the data persistence team motto)
[15:17:46] Emperor: ot
[15:17:50] DATT-persistence
[15:17:53] it's 2025, that can be the motto
[15:19:00] :)
[15:22:48] Should we be worrying about '/var/lib/puppet/lib/puppet/reports/logstash.rb:20: warning: Socket.gethostbyname is deprecated; use Addrinfo.getaddrinfo instead.' ?
[15:29:05] not worried, since it's just a deprecation warning for now, but we should fix it since it's an internal module and the deprecation will turn into an error with some later Ruby release, can you please open a task and tag it with SRE Observability?
[15:38:45] ottomata: nice, thank you!
[16:06:58] moritzm: {{done}} T385058
[16:07:03] T385058: logstash.rb uses deprecated Socket.gethostbyname - https://phabricator.wikimedia.org/T385058
[17:21:02] reaching out from a discussion on Slack (please no pitchforks for a Slack link!): https://wikimedia.slack.com/archives/C8W3HEHLG/p1738164732203619
[17:21:14] tl;dr: on mediawiki.org, the search bar disappears when you click it
[17:22:14] I'm going to add a userscript to make all the UI elements do that, it sounds very relaxing
[17:22:22] https://phabricator.wikimedia.org/T385055
[17:22:29] (but thanks for the heads up)
[17:22:54] question from the group: is there anything we could have done to cause this change?
[17:23:20] sukhe: have we checked what Vary headers MW is sending for load.php responses?
[17:24:50] cdanis: I don't think so :) I am also having trouble reproducing it (and so are others) so I think that's not helping
[17:26:39] hm
[17:27:04] is there an example URL I can fetch to quickly check the presence/absence of the bug?
[17:27:27] preferably without having to get as far as to be able to execute `mw.loader.require('vue').version` in js
[17:29:38] mediawiki.org search bar but if you meant the vue version then perhaps https://www.mediawiki.org/w/load.php?lang=en&modules=startup&only=scripts&raw=1&skin=vector-2022?
[17:37:27] * swfrench-wmf is trying to figure out how this could possibly be related to PHP 8.1
[17:49:32] sukhe: https://phabricator.wikimedia.org/P72791
[17:50:24] the pattern is very very strange
[17:50:36] I think it's likely that MW is serving inconsistent versions?
[17:53:42] ouch
[17:55:01] alright, so it's the smaller subset of cache-text hosts there that are serving the "correct" 3.5.x version?
[18:03:47] cdanis: (back from lunch) yeah I don't see how it can be anything else if you look at the above but I am also curious why just vue and nothing else (or there are other things but we haven't found them)
[18:04:32] anyway thanks to you and swfrench-wmf for jumping on the Slack thing
[18:09:03] sukhe: sooooo.... https://phabricator.wikimedia.org/P72793 uhhhhhhh
[18:09:52] cool cool ... cool cool cool
[18:10:00] lol sigh
[18:11:13] the difference in the cp hosts doesn't exactly explain it either :/ because in theory users shouldn't be switching between cp hosts often?
[18:13:06] !log 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕐☕ echo 'http://www.mediawiki.org/w/load.php?lang=en&modules=vue&skin=vector-2022&version=n99vv' | mwscript purgeList.php
[18:13:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:13:56] !log and https://... too because idk if it matters tbh
[18:13:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:15:00] https://phabricator.wikimedia.org/P72797
[18:15:01] cdanis: on the hitting the same host: I would say that not always though?
[18:15:24] yeah thanks, nice debugging, that's the hash we want
[18:15:47] please post a quick update on the Slack thread too (since you ran it)
[18:16:18] thanks for doing that, cdanis!
[18:16:29] I wish I knew *why*
[18:16:37] RL cache entries aren't supposed to last in the cache for long at all
[18:16:46] is this another revalidate-while-stale thing
[18:17:03] also those &version URLs are supposed to serve as a content-addressable hash, IIRC
[18:17:05] so like
[18:17:08] what???
[18:17:34] > public, max-age=2592000, s-maxage=2592000, stale-while-revalidate=60
[18:17:45] literally the content of RL URLs isn't supposed to change
[18:17:49] which is the confusing part
[18:18:27] I know it is minified, but we were piping the output to sha256sum. did we ever notice what was differing?
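(Editor's note: the `Cache-Control` value quoted at 18:17:34 can be unpacked with a few lines of Python. This is only a minimal sketch: `parse_cache_control` is a hypothetical helper, not part of MediaWiki or any CDN tooling, and it does not handle quoted values that contain commas.)

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control header value into {directive: int | str | True}.

    Hypothetical helper for illustration only.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        name = name.strip().lower()
        if not sep:
            # Flag-style directive such as "public"
            directives[name] = True
        else:
            arg = arg.strip().strip('"')
            directives[name] = int(arg) if arg.isdigit() else arg
    return directives


# Header value copied from the log above
header = "public, max-age=2592000, s-maxage=2592000, stale-while-revalidate=60"
parsed = parse_cache_control(header)
shared_cache_days = parsed["s-maxage"] / 86400  # 86400 seconds per day
print(parsed)
print(f"s-maxage is {shared_cache_days:g} days")
```

(With these values a shared cache may keep serving a versioned load.php response for 30 days, which is why two different bodies under one `version` param is such a problem.)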
[18:18:36] I should have done that in retrospect
[18:18:41] yeah, I am kicking myself for that now
[18:18:51] honestly I want to write a tool for doing what I just did
[18:19:04] I figure out how to cumin it from scratch about once every 3-6 months
[18:19:34] and this is something that we should be able to give access to the web team to falsify the CDN being at fault, for instance (which is what I thought I was doing in the first place)
[18:21:22] make it a cookbook :D
[18:22:04] sukhe: https://phabricator.wikimedia.org/P72797 still happening for enwiki
[18:22:26] cdanis: old paste?
[18:22:54] oops
[18:22:56] https://phabricator.wikimedia.org/P72798
[18:23:11] same two hashes
[18:23:38] cool let's check what's up
[18:23:48] yeah, no one purge anything :D
[18:29:17] This is the content of the 'bad' load.php hash, seen on enwiki on one (1) CDN host in the fleet: https://phabricator.wikimedia.org/P72802
[18:29:18] (warning: large)
[18:29:43] I also generated a diff, but it's not exactly useful as-is: https://phabricator.wikimedia.org/P72800
[18:29:59] yeah it's a mess, diffing this
[18:30:01] sorry, that's the abbreviated diff
[18:30:19] full diff (even less legible) at https://phabricator.wikimedia.org/P72799
[18:31:24] why can't I reproduce this though, again
[18:31:31] only cp3066 should have the right version (501b965e33a226dd818b6992701bbae17f34713866edbb4ec3ae7906b68e6dc9)
[18:31:37] and yet I am not able to reproduce this for enwiki
[18:31:38] at all
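(Editor's note: the "tool for doing what I just did" wished for at 18:18:51 boils down to hashing each host's response body and grouping hosts by digest. The sketch below shows only that grouping step. How the bodies are fetched from each cache host is left out, and the host names and bodies are placeholders, not real fleet data.)

```python
import hashlib
from collections import defaultdict


def group_hosts_by_digest(bodies: dict[str, bytes]) -> dict[str, list[str]]:
    """Map sha256 hex digest -> sorted list of hosts that returned that body."""
    groups = defaultdict(list)
    for host, body in bodies.items():
        groups[hashlib.sha256(body).hexdigest()].append(host)
    return {digest: sorted(hosts) for digest, hosts in groups.items()}


# Placeholder data standing in for per-host load.php responses (hypothetical)
sample = {
    "cp-a.example": b"vue 3.4.27 ...",
    "cp-b.example": b"vue 3.5.13 ...",
    "cp-c.example": b"vue 3.4.27 ...",
}

groups = group_hosts_by_digest(sample)
# Largest group first, so the odd host out is easy to spot at the bottom
for digest, hosts in sorted(groups.items(), key=lambda kv: -len(kv[1])):
    print(digest[:12], len(hosts), ", ".join(hosts))
```

(More than one digest for a single versioned URL means the hosts disagree about its content, which is the pattern seen in the pastes above.)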
[18:35:30] I am quite baffled [18:35:37] I think there's probably something we're missing on the Mediawiki side [18:38:32] apologies this is actually the presumed-bad Vue version https://phabricator.wikimedia.org/P72803 [18:39:06] cdanis: probably already known but Chrome formatted the response of /w/load.php?lang=en&modules=vue&skin=vector-2022&version=n99vv quite nicely for me (of course the variable names are minified but it's something) [18:39:12] aye [18:40:06] cdanis: I know I am not helping but the hash of this is the bad hash we had from the previous mediawiki version? [18:40:54] sukhe: it's... confusing [18:43:06] sukhe: yes https://phabricator.wikimedia.org/P72806 [18:43:07] BUT! [18:44:07] Roan tells me that this part of the structure is supposed to be the RL module version hash embedded in the response https://phabricator.wikimedia.org/P72807 [18:44:13] so now we're all even more confused [18:44:17] it's the same! [18:48:38] I mean on cp1102 (I am hitting that), I can't reproduce the bug [18:48:40] on enwiki [18:48:58] curl --connect-to 127.0.0.1:443 "https://en.wikipedia.org/w/load.php?lang=en&modules=vue&skin=vector-2022&version=n99vv" gives me the presumably incorrect version here, 3.4.27 [18:49:33] has anyone been able to reproduce this on enwiki anyway? [18:49:50] uhh [18:52:37] yeah so the bug does not occur there because of unrelated reasons (Codex upgrades) but that still doesn't solve the issue of why we are seeing 3.4 [18:53:09] at least it's not broken there [18:57:35] swfrench-wmf: do you know anyone who knows ResourceLoader internals [18:58:00] cdanis: unfortunately, I do not =/ [18:59:58] just to make sure I'm following correctly, we're fairly confident that (1) at least as of now, mw is returning the "correct" content, however (2) the older vue version was at some point cached under that same `version` (n99vv) hash param? [19:00:13] and we're trying to figure out how #2 came to be? 
[19:01:59] we are serving the older version of vue (3.4) as opposed to 3.5, but both share the same opcode for some reason
[19:03:03] return ["vue@254op", {
[19:03:11] and version 3.4 on enwiki
[19:03:18] return ["vue@254op", {
[19:03:26] and 3.5.13 on mediawiki.org
[19:03:27] swfrench-wmf: yes, that's the current theory
[19:03:37] both have the same query param
[19:03:50] https://phabricator.wikimedia.org/P72808
[19:03:55] that's Not Supposed To Happen
[19:06:28] if `version` param (and I guess more importantly) `etag` is not actually a function of the _content_ served that's ... fun
[19:06:30] swfrench-wmf: doubt this is PHP8.1 related anyway so you can rest up :P
[19:07:02] swfrench-wmf: yeah that is exactly what I am fretting about
[19:07:36] sukhe: yeah, I sincerely doubt it at this point :) I'm oncall, though, so I'm here to help if you need more (admittedly less capable) hands, heh
[19:08:06] swfrench-wmf: I am not capable either but mostly learning as I go along (and also because I stirred the pot :))
[19:08:14] swfrench-wmf: so, one of the things is, mediawiki load.php computes a minified version and stashes it in memcached
[19:10:31] ohwow
[19:11:04] no, we already made mediawiki.org the same; it was returning the same hash now sorry.
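(Editor's note: the worry at 19:06:28, a `version` param that is not a function of the content served, can be shown with a toy example. This is not ResourceLoader's actual hashing. `version_hash`, the wrapper string and the library stand-ins are all hypothetical, and the sketch simply assumes a hash that covers only some of a module's inputs.)

```python
import hashlib


def version_hash(parts: list[bytes]) -> str:
    """Short hash over a list of inputs (hypothetical, for illustration).

    Each part is length-prefixed so that part boundaries affect the digest.
    """
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()[:8]


# Hypothetical module inputs: a small unchanged wrapper plus the library itself
wrapper = b"module.exports = require('./vue.js');"
lib_old = b"/* library 3.4.27 */"
lib_new = b"/* library 3.5.13 */"

# Hash that ignores the library code: identical before and after the upgrade
partial_old = version_hash([wrapper])
partial_new = version_hash([wrapper])

# Hash that covers everything served: changes when the library changes
full_old = version_hash([wrapper, lib_old])
full_new = version_hash([wrapper, lib_new])

print("partial hash unchanged across upgrade:", partial_old == partial_new)
print("full hash unchanged across upgrade:", full_old == full_new)
```

(If the version only covers part of the inputs, old and new content share one URL, and a long-lived cache entry for the old body keeps being served under it.)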
[19:14:32] yeah
[19:17:26] Roan has found something that seems like a smoking gun, and it's because of the (apparently) special construction of the Vue module for ResourceLoader
[19:17:59] I quite literally laughed out loud when I read that :)
[19:18:41] this seems pretty solidly "whoops, probably been like this for a while"
[19:19:09] (not having the hash depend on the main library code)
[19:20:09] yeah
[19:24:18] I am a bit worried I killed Druid
[19:47:09] I didn't, or at least, it came back after a while, and I was able to find some corroboration with looking at the change in request size over time
[19:47:33] https://phabricator.wikimedia.org/T385055 is the UBN tracking this, which also has the two cherrypicks to prod attached to it, I need to run now, Roan said he would follow up with the deploys
[19:50:05] thank you so much, c.danis!
[19:54:32] indeed! thanks cdani.s!
[21:41:00] hmm.. is netbox ok?
[21:42:08] ok, I got a weird error at login but now it's gone