[08:49:45] gehel: do we run our ES instances on JRE 11 perhaps?
[08:51:11] zpapierski: we're still on Java 8
[08:51:36] thx - it seems we aren't safe from log4shell, apparently
[08:51:51] :/
[08:53:10] RCE isn't possible, but that wasn't the only vulnerability that came in this bunch
[08:53:53] Philipp Krenn, useful as always - https://xeraa.net/blog/2021_mitigate-log4j2-log4shell-elasticsearch/#what-does-that-mean-for-elasticsearch
[08:54:46] lot to digest, but if I read this correctly, in our case the system property should be sufficient
[08:54:56] It wouldn't be super hard to monkey patch the elastic jars on the servers and replace with log4j 2.17.x
[08:57:00] apparently, the only vulnerability left is one easily fixed with a system property, so it looks like monkey patching won't be necessary
[08:57:44] but I fear something else here - it's super clear (as it was before, but one can hope) that Elastic has zero intention of keeping the 7.10.* line safe from CVEs
[08:58:44] which means that there might, as is often the case, be a CVE that won't be as easily fixed and the only option will be to move forward
[08:59:00] Yeah, we'll need to move to OpenSearch at some point
[08:59:04] and I can't imagine that es -> opensearch migration can be done promptly
[09:04:46] all our ES servers have the mitigation deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/745901
[09:05:48] perhaps some servers need a restart tho
[09:06:55] checked a random one and it has the option at runtime
[09:08:04] I think we're good for now, but I agree that staying on es 7.10.2 for long is not a great option
[09:15:42] ok, I'm closing the ticket now
[10:44:07] Lunch
[11:05:21] lunch
[11:13:17] lunch
[11:13:19] Trey314159: I can't tell you how much sugar Maciej consumed, but stats say that on average, we Poles consume about 51 kg of sugar per year. Obviously, this takes into account all kinds of sugar
[12:57:29] mpham: I summarized the latest access logs for both beta 1 and beta 2 - https://drive.google.com/drive/u/0/folders/1ojrcehL7Bz0Cc4wKgdtD8CccruNK8lyh
[12:57:40] I'll do so again on my last day
[13:17:59] lunch
[14:28:58] greetings
[14:36:39] dcausse how are you checking the ES runtime options? I know moritzm and ryankemper were talking about ES restarts yesterday. I have ~2 hrs before my training, can help w/restarts now if you like
[14:40:52] inflatador: I just checked with "ps"
[14:41:07] all jvm options are on the command line
[14:41:15] and o
[14:41:19] /
[14:41:19] dcausse ah OK, that makes it easy ;)
[14:53:36] o/
[15:00:50] * ebernhardson is dubious about the other half of "log4shell" which is an information leak from DNS. Like, sure that's a vulnerability somewhere. But what is the actual threat vector of someone convincing our dns servers to look up their domain name. but anyways, not starting work yet :P
[15:17:28] ebernhardson: I closed the ticket - the mitigation for that part was already introduced in December
[15:17:33] in fact you reviewed it :D
[15:38:25] zpapierski: indeed, i'm not opposed to closing the hole. I suppose I'm opposed to something more theoretical, it's that we would have never heard about the information leakage and likely would have only closed it when updating to a new version with new upstream defaults, if it wasn't found close-in-time to the RCE. Basically it feels like a bunch of people took a real problem (RCE) and
[15:38:28] then a bunch of not-really problems, and conflated them together with a single name
[15:39:10] but not actually important, just things that annoy me on the internet that don't really matter :P
[15:49:08] isn't there a common thread when it comes to the cause of those vulnerabilities (asking, I don't really know the DNS ones)
[15:49:11] ?
[15:49:52] zpapierski: afaik they all came from different researchers, the RCE came first and then over the next week other researchers announced additional findings about log4j
[15:50:20] ah, I see - I guess the thread is additional diligence spawned by the RCE?
[15:53:37] zpapierski: yea, i think the overall process is probably fine. An RCE was announced so more researchers looked to see if there were more problems, they found some information leaks to address but no more RCEs. Overall the process probably went well enough
[15:54:26] i guess they are tied together, in that one was big enough it convinced people to spend some time looking for more, but i feel like the timeline is the only thing holding those pieces together
[15:54:43] * ebernhardson apparently likes to complain about things that don't matter, ignore me :P
[15:54:50] I think there are really no victims here
[15:55:20] but I get your point
[15:55:29] and somewhat agree
[15:56:17] anyway, getting to more current things - I failed to reproduce the issue with T301650
[15:56:18] T301650: WCQS "Application Connection Error" E009 - https://phabricator.wikimedia.org/T301650
[15:56:41] asked for more clarification, in the meantime I wonder if it is related to our Kask usage
[15:56:47] working out, back in ~30
[15:56:49] zpapierski: hmm, you should be able to force it to happen by issuing parallel requests
[15:57:21] zpapierski: i've been able to cause that in this system, but not reliably. It's sent me back to .css pages instead of the domain root before
[15:57:23] why? if you're already authorized with the proxy it shouldn't make more requests to WM, right?
[15:57:59] I feel I'm confusing something here
[15:58:11] zpapierski: while the redirect bounce is happening from oauth->mw->oauth make more requests, perhaps by having an old browser tab that you tab into that tries to grab a .css file
[15:58:34] ah, but that happens during initial auth, right?
[15:58:46] zpapierski: during any time the token has to be refreshed, the token is only valid for a couple of hours
[15:59:00] ah, and that
[15:59:12] i suppose i've only seen it using tabs that were previously auth'd
[15:59:20] but that hardly seems like a "frequent" issue
[15:59:38] so, should be reproducible if session is left for a few hours
[15:59:52] it depends on your use case. If your use case is to have browser tabs from days ago that you tab into and start doing something, this whole thing will fail regularly :)(
[15:59:57] err, no :) just :( :P
[16:00:16] ah, that's not good
[16:00:41] but that's helpful, I might be able to create a test based on that
[16:01:07] for hacky solutions, we can whitelist the valid paths for session redirects. Only allow root and the sparql query url perhaps. But I don't have a real solution yet without integrating oauth with the UI
[16:02:14] also i guess i didn't say, but to answer my question yesterday about WCQS being an SPA with one url, only kinda. There are also the query urls so it does need to remember redirects
[16:02:15] can't we do an additional token verification?
[16:02:46] zpapierski: sure, but when? The SPA is already open. The only request will be AJAX
[16:03:04] well, i guess the redirect bounce might work if it's already authed. hmm
[16:03:14] that's what I meant
[16:03:27] * ebernhardson wonders if only old people call it AJAX
[16:03:31] anyways :)
[16:03:43] it isn't supposed to be called AJAX?
[16:04:12] ah, X here is XML
[16:04:20] so maybe not necessarily :)
[16:04:56] yea i was trying to look for a good reference, but poking around people still use the term ajax, still newing up an XMLHttpRequest object
[16:05:00] also, what if we shorten the session length to be less than token validity?
[16:05:38] zpapierski: hmm, can we? I thought those two had to be the same value
[16:06:07] nope - public static final int SESSION_MAX_AGE = Integer.MAX_VALUE;
[16:06:23] then there's public static final Duration AUTH_TOKEN_MAX_AGE = Duration.ofHours(2)
[16:06:27] curiously not used
[16:06:38] * ebernhardson has to open idea...sec :P
[16:08:36] zpapierski: huh, indeed the default instead comes from config (also 2 hours). For how long we store data in the session, i guess that could be limited to a minute or two but not sure if it would help
[16:12:14] if the issue is caused by using a stale token, that should help (no need to go as low as a minute)
[16:12:16] if
[16:14:45] zpapierski: hmm, i suppose i've been thinking more that the token is too-new rather than too-old, i suppose both are possible. I've been thinking multiple requests were made to the backend and only the last one gets to be the redirect
[16:15:04] but i dunno....something doesn't line up
[16:18:03] both can be correct and appear similarly
[16:18:28] but authToken validity is definitely shorter than that of the wcqsSession
[16:20:26] zpapierski: i dunno if it would work, but in theory this might be easier to reproduce using the docker-compose setup i put in the mw-oauth directory, turn the timeouts down to a minute
[16:21:21] i suppose what it doesn't have is anything that makes additional requests, it's a plain hello-world html page after auth success
[16:24:27] need to think more, but my brain is not cooperating at this hour
[16:24:55] I can't confirm it yet, but I don't think that the flow that caused the issue would trigger if wcqsSession expired before the token does
[16:25:01] (it should anyway)
[16:26:19] anyway, clocking out for today, will pick it up tomorrow
[16:26:25] have fun :)
[16:47:53] Hi search team, I'm making a new vm which will be 1 of 3 elasticsearch / opensearch nodes for the data catalog. I see elasticsearch nodes are running debian 9 stretch, do you have any advice on choosing between debian 9, 10, and 11 ?
[16:51:00] razzi: i'm not aware of anything in particular. I would lean towards debian 11 and the latest openjdk that they package
[16:51:29] sounds good ebernhardson , I'll try the latest and greatest and see how that goes
[16:51:37] i forget which exact version we are moving to, but we are moving away from the current debian stretch in a few months
[18:35:34] sigh... giving up on making a clean tested build of the hebrew plugin, the tests are very minimal so not sure we miss much...
[18:37:43] ejoseph: uploaded a zip to https://people.wikimedia.org/~dcausse/analysis-hebrew-7.10.2.zip
[18:45:51] dinner
[19:23:01] ryankemper, inflatador : I suspect we cancel our pairing session since you are in the ES training ?
[19:23:16] gehel: yes
[19:23:40] gehel agreed
[19:35:30] * ebernhardson ponders alert names: CirrusSearchIndexSanityDecreasing, CirrusSearchStrugglingWithSanity ... but probably something less fun :P
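
Note on the 16:05-16:25 thread (session length vs. token validity): the idea floated there was to make the session expire before the auth token does, instead of leaving SESSION_MAX_AGE at Integer.MAX_VALUE. Below is a minimal sketch of that idea only. AUTH_TOKEN_MAX_AGE and its 2 hour value are quoted from the log; the class name SessionLifetimes, the method sessionMaxAgeSeconds and the 5 minute SAFETY_MARGIN are hypothetical and not taken from the WCQS code. Nothing in the log confirms that this resolves T301650.

```java
import java.time.Duration;

/** Hypothetical helper; names other than AUTH_TOKEN_MAX_AGE are illustrative only. */
final class SessionLifetimes {
    // Value quoted in the discussion above.
    static final Duration AUTH_TOKEN_MAX_AGE = Duration.ofHours(2);

    // Assumption: how much earlier than the token the session should expire.
    static final Duration SAFETY_MARGIN = Duration.ofMinutes(5);

    /**
     * Returns a session max age, in seconds, that is strictly shorter
     * than the token validity by the given margin.
     */
    static int sessionMaxAgeSeconds(Duration tokenMaxAge, Duration margin) {
        if (margin.isNegative() || margin.isZero()) {
            throw new IllegalArgumentException("margin must be positive");
        }
        Duration session = tokenMaxAge.minus(margin);
        if (session.isNegative() || session.isZero()) {
            throw new IllegalArgumentException("margin must be shorter than token validity");
        }
        // toIntExact guards against overflow when narrowing to int.
        return Math.toIntExact(session.getSeconds());
    }

    public static void main(String[] args) {
        int maxAge = sessionMaxAgeSeconds(AUTH_TOKEN_MAX_AGE, SAFETY_MARGIN);
        System.out.println("session max age (s): " + maxAge);
    }
}
```

Deriving the session age from the token age, rather than keeping two independent constants, means the "session outlives token" case cannot be reintroduced by changing only one of them.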