[08:30:42] dcausse, pfischer: could you have a look at https://wikitech.wikimedia.org/wiki/Incidents/2023-09-27_Kafka-jumbo_mirror-makers and make sure that the impact on WDQS is reflected appropriately?
[08:31:51] btullis: I've pinged dcausse and pfischer (above) for a last look. I've already added minimal comments on the downstream impact of this outage and increased the number of responders. Even if David and Peter were not involved directly with Kafka, they have participated on the WDQS side.
[08:32:26] thanks! looking
[08:32:40] Great! Many thanks.
[09:11:12] errand + lunch
[10:06:16] lunch
[12:59:33] o/
[13:17:40] addshore having issues uploading to cloudflare using rclone...do you remember what perms you gave your token? I'm giving "object read & write" to mine and it seems to be read-only
[13:20:04] *logs into the dashboard*
[13:21:34] hmmm, i currently have no access tokens apparently, so can't check
[13:21:48] it's possible i used my Global API Key and just rotated it after
[13:22:06] i believe i also remember having some permission issue though
[13:22:41] I've had a bit of weirdness before...my public buckets went into infinite redirects a couple of times
[13:32:15] OK, I added a token with full admin capabilities and it works. I guess I'll open a ticket with CF...not sure what is going on there
[13:50:24] https://community.cloudflare.com/t/r2-token-per-bucket/389067 looks like some of the perm stuff is new, maybe pre-existing buckets don't work with them
[14:17:14] dcausse how's https://phabricator.wikimedia.org/T347515 going? Guessing we'll need to update the streaming updater unit file with the new pipeline option? If there's any other work to do around that LMK
[14:19:18] inflatador: added a simple flag to disable emitting such problematic events in the meantime
[14:21:27] dcausse nice. Does that mean we can start the experimental app again?
[14:22:01] have two patches to review for that but hopefully soon :)
[14:22:47] these are https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/962050 and https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/962626/
[14:23:10] will ping Erik and Peter, will make a new image very soon
[14:23:24] np, if I can do anything to help LMK
[14:23:57] sure thanks!
[14:24:08] dr0ptp4kt the new JNL file is uploading, should be ready in ~10m or so
[14:37:59] What should we call the wdqs graph split experiment hosts? Was thinking wdqs-split
[14:39:12] dr0ptp4kt (or anyone else) JNL file from Sept 29th up at https://w.makeitrepeatable.com/wikidata.jnl.zst
[14:39:35] ooooh, zst
[14:39:54] inflatador: sounds good!
[14:42:09] inflatador: we might just call them wdqsXXXX, like we do with the test servers. Their assignment is somewhat fluid and we might reassign them from one role to the other as time goes on.
[14:42:28] OTOH, this might be confusing having the same name but somewhat different roles...
[14:47:36] thanks inflatador! driving from weekly appointment momentarily, and once settled in will kick off download. i got some additional drives over the weekend
[14:49:19] \o
[14:51:29] o/
[14:52:02] perhaps just too early but these numbers don't seem all that sane...Asking kafka for message -10,000,000 in codfw.mediawiki.cirrussearch-request only goes back ~24 hours
[15:02:03] pfischer, ebernhardson: triage meeting in https://meet.google.com/eki-rafx-cxi
[15:54:47] inflatador axel is saying 'Server unsupported, starting from scratch with one connection.' the first attempt the connection dropped, so trying again. it's estimating 4h15m. any config that could allow concurrent conns? maybe it needs to get into their edge before they can do byte range offsets.
[16:02:49] addshore, if you smoosh the bread into a zst sandwich is it still one sandwich?
[16:24:24] * ebernhardson should not find thinking through a skewed join this difficult :P getting closer...
[16:50:45] inflatador other thing: once this is downloaded will a pigz --keep wikidata.jnl.zst do the trick or will i need to invoke some other gunzip / tar magic (or even just a pigz command line switch) ? i realized i better ask now rather than later, as these things can take time. addshore i'll run it with time to see how it goes - the desktop rig is a six core, and assuming quasi-linear performance hopefully that'll give us a rough
[16:50:45] estimate of things; i'll need to watch the ram a bit to see what it's looking like - usually these kinds of operations are disk bound more than memory bound, but i don't know what to expect and haven't grokked the entirety of pigz source...some other rainy day
[16:52:31] * ebernhardson wishes idea were smarter. If i shift-shift-type a class name and press enter...that only works if i wait around for intellij to populate the dropdown
[16:53:04] is there perhaps something obvious i'm missing that would cause it to jump to class regardless of what's in the dropdown?
[16:53:37] sadly ideavim doesn't implement tag handling :P
[16:54:54] * ebernhardson wonders what should be more annoying..that it goes to the wrong thing or that i can type `TimerCounterUnitTest` faster than idea can populate a single entry in the autocomplete
[17:03:32] Ctrl-E is perhaps slightly better if you know what to open; if it's empty, pressing enter opens file name search, which might perhaps be faster?
[17:04:18] Ctrl-E does seem to populate faster, i'll try that. thanks!
[17:05:30] Ctrl-n is class search (I think)
[17:05:47] Ctrl-shift-n is file search
[17:06:07] And it is smart about capitalisation
[17:23:00] dr0ptp4kt `axel -n 16 $URL` works for me without complaining about unsupported
[17:25:02] dr0ptp4kt also I don't think pigz supports zstd, but zstd supposedly decompresses in parallel a lot better.
[17:28:39] thanks inflatador. i'll go try axel in a new terminal instance and see if it will play nicely, third time a charm and all that, and if it'll go faster that way than just letting the current thing finish (lemme know if i owe you a mineral water or something for bandwidth). so for pigz just use the standard invocation to deflate, is that right? or are you saying you compressed with zstd and i should use a different tool?
[17:29:44] dr0ptp4kt I'm saying pigz probably won't work with zstd format, so you'll need to use the zstd CLI tool...I got it thru Macports on my M1 Mac but probably available thru homebrew as well
[17:30:37] Don't worry about the bandwidth costs, Cloudflare is pretty cheap, esp. since they have no egress charges
[17:40:35] inflatador: here's a funny one. so the second connection attempt had also fallen over. so i tried a third, which landed on using just one connection. and i stopped that and tried a fourth, which also landed on using just one connection. so, in exasperation, i did axel -a -n 16 -v -v -v ... and it managed to make multiple parallel connections...it has dropped a couple of them 'unexpectedly' already, but it still seems to be going
[17:40:35] somewhat strong, so i'm not going to question why this is - for once, adding more debugging makes something go faster (makes me wonder what funny conditional is possibly making this so, assuming it isn't a server gremlin)
[17:42:09] Y, I've seen a bunch of "unexpected drops" with axel too...haven't looked closely, does it start a new connection when it detects a drop?
[17:42:36] thx inflatador on the advice - i figured the filename extension was indicative of it in fact being zstd. the machine i'm doing this on is ubuntu 22 wsl on an i7 and it looks like i can get zstd easily (makes sense seeing as you could get it on macports)
[17:46:17] does anyone know if we should still be using `team-discovery` as our icinga contactgroup?
[17:46:24] i don't think it tries to reconnect b/c the other day with the plain .jnl it eventually got down to one connection but was chugging away fast enough that i didn't want to mess with it. it seems like the .st file is only scoped to one command line invocation - it'd be cool if it could pick back up where it started if the program terminates but one is using the same url with similar etags or whatever
[17:49:34] `-a` flag for axel could be interesting: `This will show an alternate progress indicator. A bar displays the progress and status of the different threads`. Not at home but will check it out once I get back
[17:53:00] re: contactgroups, looks like `admins` might be the right contact group
[17:54:00] dinner
[17:58:14] Yeah, I'm using -a. It's okay...I may have a bad ncurses setup or something, am curious to see what your experience is like inflatador...gotta run, 1:1
[18:01:22] er, 1:1 rescheduled
[18:11:52] hmm, looks like we had an alert for cirrussearch p95 in codfw...checking
[18:31:16] Alerting patch is ready for review if anyone has time to look: https://gerrit.wikimedia.org/r/c/operations/puppet/+/962660/
[19:07:14] * ebernhardson crosses fingers that the skew-join i just added actually works...
[19:29:40] thanks for the review! Back from the doctor's office but I need to grab a late lunch. back in ~30
[19:35:26] seems to be running acceptably. As expected it now has much more work to do though. now we get 2 multi-tb joins :P
[19:35:41] although not done yet...
[19:54:47] OK back
[20:07:18] finished, took 50 minutes and ~130 hours of compute time. Realized i forgot to turn on adaptive query execution though...let's try one more time.
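For reference, one common way to handle a skewed join is key salting. The log does not say which technique was actually used here, so the following is only an illustrative sketch in plain Python: `salted_join`, `num_salts`, and the sample rows are hypothetical, and a real job would run on the cluster's query engine rather than over in-memory lists.

```python
import random
from collections import defaultdict


def salted_join(big, small, num_salts=4, seed=0):
    """Inner-join two lists of (key, value) pairs using key salting.

    Each row on the big (skewed) side gets one random salt, so rows that
    share a hot key are spread over `num_salts` buckets. The small side
    is replicated once per salt so every bucket can still find its match.
    """
    rng = random.Random(seed)

    # Replicate the small side: one copy of each row per salt value.
    small_index = defaultdict(list)
    for key, value in small:
        for salt in range(num_salts):
            small_index[(key, salt)].append(value)

    # Salt the big side and look up matches on the (key, salt) pair.
    joined = []
    for key, value in big:
        salt = rng.randrange(num_salts)
        for other in small_index.get((key, salt), []):
            joined.append((key, value, other))
    return joined


# Hypothetical data: one "hot" key dominates the big side.
big_rows = [("hot", i) for i in range(10)] + [("cold", 99)]
small_rows = [("hot", "H"), ("cold", "C")]
rows = salted_join(big_rows, small_rows)
print(len(rows))  # 11, the same row count as an unsalted inner join
```

The salt changes only how the work is bucketed, not the result: replicating the small side costs `num_salts` times its size, which is why the trick is normally reserved for the side that is small or for the few keys that are actually hot.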
[21:09:42] took so long my okta key expired :P
[21:10:27] hard to say if comparable though, probably depends on how busy the cluster is as well
[21:11:53] * ebernhardson also notes the stage that's been running 54 minutes now is not the one i fixed, this has three distinct stages and i only fixed the second one :P
[22:32:42] hmm, this article claims "Every business data has a unique way of identification". But i'm pretty sure the only unique id for an rdf triple is the full triple :P
[22:33:06] * ebernhardson was trying to find reasonable ways to compare multi-gb datasets for equality. This one amounts to join on id and then compare the individual rows :P
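To make the last point concrete: if the only unique id of a triple is the whole triple, an equality check reduces to a multiset difference over complete (subject, predicate, object) rows rather than a join on an id followed by a row comparison. A minimal in-memory sketch, assuming triples are plain tuples of strings; `diff_triples` and the sample data are hypothetical, and at multi-GB scale the same comparison would presumably be run as a distributed set difference instead.

```python
from collections import Counter


def diff_triples(left, right):
    """Compare two collections of (subject, predicate, object) tuples.

    Returns (only_in_left, only_in_right) as Counters keyed by the full
    triple. Both are empty exactly when the two inputs hold the same
    triples with the same multiplicities, regardless of order.
    """
    left_counts = Counter(left)
    right_counts = Counter(right)
    # Counter subtraction keeps only the positive remainders.
    return left_counts - right_counts, right_counts - left_counts


# Hypothetical sample data.
a = [
    ("wd:Q1", "rdfs:label", '"universe"@en'),
    ("wd:Q2", "rdfs:label", '"Earth"@en'),
]
b = [
    ("wd:Q2", "rdfs:label", '"Earth"@en'),
    ("wd:Q1", "rdfs:label", '"cosmos"@en'),
]

only_a, only_b = diff_triples(a, b)
print(dict(only_a))
print(dict(only_b))
```

Using a Counter rather than a set means duplicated triples are also compared; if the datasets are known to be duplicate-free, a plain set difference gives the same answer.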