[13:22:34] greetings
[13:30:12] o/
[13:32:07] hmm, apparently we have a tool called 'phaste' that lets you paste to phabricator from any server: https://github.com/wikimedia/puppet/blob/5e5a3101b621286f5ee249b430e433cfe95da701/modules/base/manifests/phaste.pp
[14:08:10] oh did not know that we could create a paste from a server
[14:11:09] Yeah, that's a new one on me
[14:12:05] I found this too, a user complaining that BG performance is worse on Java 17. Unscientific and vague, but we'll eventually need to upgrade so worth bookmarking I guess https://github.com/blazegraph/database/issues/222
[14:16:09] dcausse are we logging the kind of error you found in https://phabricator.wikimedia.org/T242453 ? I've been looking in /var/log/wcqs , haven't seen anything similar so far
[14:17:12] inflatador: this is not really an error that can be logged sadly, only something that jstack is able to detect when taking a snapshot of the thread statcks
[14:17:34] s/statcks/stacks
[14:17:58] ah OK, too resource intensive to continually capture?
[14:18:35] in a busy loop that'd be bad but I think it's fine to capture every minute or so
[14:18:55] we don't need to be extremely reactive here anyways
[14:19:13] OK, just thinking about how we could monitor that deadlock condition, like we were talking about yesterday
[14:20:01] yes I think it's totally to ship a small script that does jstack + some grep scheduled every minute
[14:20:16] s/totally/totally fine/
[14:21:51] cool, I will pop a ticket for that, probably won't get to it today but will keep it in mind
[14:23:12] sure!
[14:28:45] quick break, back in ~15
[14:42:25] and back
[15:22:23] \o
[15:24:07] I'm rebooting wdqs hosts via cookbook for remediation, see https://phabricator.wikimedia.org/T304938 for details
[15:24:18] o/
[15:24:49] inflatador: sure, thanks
[15:25:57] np, it's one at a time so no impact is expected
[15:53:15] meh, checking out a patch with 1200 changed files over nfs is slow :P
[15:53:34] * ebernhardson dreams of a day without crappy ways to share volumes into vm's/containers/etc.
[15:56:22] :)
[15:58:45] i keep thinking i should go through the browsertest fixtures and prune them, i bet we could have 50 instead of 403...but then someone has to actually ponder those 350 cases and decide what to do
[15:59:58] i guess they are just searchText fixtures, they are voluminous though because i bootstrapped that by copying all the queries the browsertests use
[16:01:12] Quick workout. wdqs1010.eqiad.wmnet is in the middle of rebooting, but I'm not doing any production hosts until I get back
[16:13:24] not sure there's an easy set of criteria to follow to prune these tests tho... I guess we'd have to categorize them and try to remove duplicates
[16:13:47] also I'm sure a bunch of the features we want to test are now tested more directly under some unit tests
[16:14:36] yea, i'm not sure how to prune the list either. There isn't an obvious way to look at the queries and decide
[16:15:34] still some failures coming out of cindy, but surprisingly not a single deprecation warning now
[16:17:25] nice, we're getting close
[16:51:58] sorry, been back
[17:01:38] resuming reboots with wdqs-internal
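(For reference, a rough sketch of the per-minute "jstack + some grep" deadlock check discussed above around 14:20, for the ticket inflatador said he'd file. This is only an illustration under stated assumptions, not the team's actual script: the process lookup via `pgrep -f blazegraph`, the grep pattern, and the use of `logger` are guesses; the pattern would need to match whatever jstack actually reports for the condition in T242453.)

    #!/bin/bash
    # Sketch only: assumes the JVM can be found with `pgrep -f blazegraph` and
    # that the condition surfaces as jstack's standard
    # "Found one Java-level deadlock" report; adjust the pattern to whatever
    # T242453 actually shows. Intended to run from cron or a systemd timer
    # every minute or so.

    PID=$(pgrep -f blazegraph | head -n 1)
    if [ -z "$PID" ]; then
        exit 0  # service not running, nothing to check
    fi

    if jstack "$PID" 2>/dev/null | grep -q 'Found one Java-level deadlock'; then
        logger -t wdqs-deadlock-check "possible JVM deadlock in pid $PID"
        exit 1
    fi
    exit 0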
[17:21:41] ryankemper: I think we have an interview tomorrow together. Do you mind adding/reviewing our interview questions by end of day?
[17:23:00] mpham: absolutely, thanks for the heads up
[17:28:15] dinner
[17:31:49] lunch, back in ~30-45
[18:01:14] meh, turns out when a bulk from the OtherIndex impl fails it's not actually recorded as a failure, the logs still claim success :(
[18:01:25] * ebernhardson wonders if the regular bulks have same problem...
[18:19:30] back
[18:20:37] probably a question for our upcoming SRE pairing, but does anyone have a good idea what to use as test Elasticsearch data? Was thinking of something from https://wikimedia.bringyour.com/
[18:20:48] (wikimedia dumps mirror)
[18:20:57] inflatador: something like curl -s https://dumps.wikimedia.your.org/other/cirrussearch/20220314/enwiki-20220314-cirrussearch-content.json.gz | zcat | head -n 20000 | split -l 40 --filter 'curl -s http://localhost:9200/my_wiki_content/_bulk -H "Content-Type: application/json" --data-binary @- >/dev/null'
[18:21:20] inflatador: that imports the first 10k docs (each doc is two lines) from a public dump
[18:21:26] inflatador: ebernhardson: no sre pairing for me today
[18:21:38] Perfect ebernhardson , you are the man as usual!
[18:21:40] inflatador: I'll be able to pair around 1:15 my time tho
[18:22:11] inflatador: i suppose one part that leaves out: you still have to use cirrus scripts to create the index
[18:22:54] ebernhardson np, what/where are the cirrus scripts?
[18:24:12] inflatador: i suppose if it's a test instance and it's only this thing, run extensions/CirrusSearch/tests/jenkins/cleanSetup.php . How exactly that gets run depends on the installation :)
[18:24:24] np, it's a test instance and will play around
[18:24:41] ebernhardson ryankemper gonna go ahead and cancel that pairing session then, but will reach out later if that's cool
[18:24:48] ok
[18:25:26] thanks, still churning thru the remediation stuff. We'll have to do the elastic hosts too in the near future
[18:27:32] i suppose if you are only testing rolling restarts/upgrade, in theory you can `curl -XPUT /my_wiki_content` and it will create the index and allow dynamic names.
[18:27:48] cirrus would only be needed for the proper settings and mappings to allow scripted updates/queries/etc.
[19:21:58] lunch
[19:27:18] the timestamp and size columns are smooshed together! https://dumps.wikimedia.your.org/other/cirrussearch/current/
[19:49:26] yea the interface there is pretty bad :P That's a mirror (that's faster than our own dumps), but you can use https://dumps.wikimedia.org/other/cirrussearch/current/ which is a little easier to read
[19:49:32] also, back
[20:02:46] no worries, there must be a space somewhere in there. When I copy/paste it looks OK
[20:05:33] seems like gitlab's auth process could be significantly improved, it often bothers me to provide my two-factor if it's been a few days. It seems like as long as i'm viewing pages that are the same as what every logged out person sees it shouldn't be necessary
[20:05:52] * ebernhardson is perhaps annoyed because my phone is rarely in the same room :P
[20:17:10] I wonder if they'll end up putting it behind Okta, I think that's how they did it at my old job
[20:18:33] it would have to be some dual-auth setup, I don't think we're going to expect volunteers to use okta (but maybe?)
[20:18:52] not sure if gitlab supports multiple auth sources or not
[20:19:08] Oh good call, I'm not sure either
[20:22:43] ryankemper I'm up at https://meet.google.com/reh-jcmr-fie , ebernhardson feel free to join if you want. I think we're going to do the wdqs reboots and importing data into our sandbox based on your one-liner above
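(For reference, the import approach from ~18:20-18:27 above, spelled out as a script. This is a sketch under assumptions, not a vetted procedure: the localhost:9200 target, the my_wiki_content index name, and the dump URL are taken from the one-liner in the log, and skipping the CirrusSearch mappings is only reasonable for a throwaway test instance; for the real settings/mappings you'd run cleanSetup.php as ebernhardson described.)

    #!/bin/bash
    # Throwaway test import: default/dynamic mappings are assumed to be good
    # enough for restart/upgrade testing.
    ES=http://localhost:9200
    INDEX=my_wiki_content
    DUMP=https://dumps.wikimedia.your.org/other/cirrussearch/20220314/enwiki-20220314-cirrussearch-content.json.gz

    # Create the index with default settings; proper CirrusSearch settings and
    # mappings would come from extensions/CirrusSearch/tests/jenkins/cleanSetup.php instead.
    curl -s -XPUT "$ES/$INDEX" >/dev/null

    # Each document in the dump is two lines (action + source), so 20000 lines
    # is roughly the first 10k docs, sent to the bulk API in 40-line batches.
    curl -s "$DUMP" | zcat | head -n 20000 \
      | split -l 40 --filter "curl -s $ES/$INDEX/_bulk -H 'Content-Type: application/json' --data-binary @- >/dev/null"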
[20:31:01] inflatador: heading back from lunch in a sec so I'll be around in 10 mins
[22:41:49] gitlab here will never be linked to Okta. They are completely different user directories (Developer accounts vs WMF staff accounts)
[22:42:24] and yes the current gitlab 2fa timeout is annoying