[07:00:18] o/
[07:04:18] o/
[09:53:50] lunch
[13:04:25] o/
[13:29:18] \o
[13:30:02] o/
[14:02:28] dcausse stuck in mtg, will probably be at least 5-10m
[14:06:08] inflatador_: np
[14:12:34] dcausse actually, let's go ahead and cancel if that's OK... gotta get some stuff done before Weds mtg
[14:12:43] sure
[14:17:23] ACK, thanks
[14:34:52] I won’t be able to join the Wednesday meeting today.
[16:12:59] re: cluster quorum... another possibly-silly thing we could do is update to the latest 1.x opensearch
[16:13:55] anyway, workout.. back in ~40
[17:06:57] back
[17:34:07] dinner
[18:01:20] lunch, back in ~45
[18:41:06] meh, writing dumps from spark has a few more annoyances than i first thought :P All solvable... but annoying
[18:42:18] like, we have to partition the output by wiki, can't mix them together in a file. But we can't have a single file for big wikis, because that would all go through a single executor
[18:42:57] so have to first loop through the data to determine partitioning, then a second pass to do the work. But we stored the data as avro, so the first pass still has to read the entire text content
[18:43:23] could use the index without content for the partitioning count... but then that feels odd to count one table and use it to partition another
[18:50:44] back
[19:09:45] hmm, maybe this would be easier if it entirely skipped pyspark... what we really need is a simple 1-to-1 mapping from the source .parquet files to a .txt file. Then we can reuse the partitioning that was done at ingestion time
[19:10:07] except we still need the distributed processing :P will have to ponder more
[19:36:37] have to head out early today, but maybe found a solution to the above. might actually be easy and i was making it overcomplicated.
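(Editor's note: the two-pass partitioning idea discussed above — count rows per wiki first, then split big wikis across several output files so no single executor writes a whole large wiki — can be sketched in plain Python. This is a hypothetical illustration of the bucketing arithmetic only, not the actual dumps pipeline; the column names, the `ROWS_PER_FILE` target, and both function names are assumptions.)

```python
import math
import random

# Assumed target: roughly how many rows each output file should hold.
ROWS_PER_FILE = 1_000_000


def files_per_wiki(row_counts):
    """Pass 1 result -> output-file plan.

    row_counts maps wiki name -> row count (from a groupBy/count over
    the source data). Each wiki gets ceil(n / ROWS_PER_FILE) files,
    with a minimum of one file for small wikis.
    """
    return {wiki: max(1, math.ceil(n / ROWS_PER_FILE))
            for wiki, n in row_counts.items()}


def assign_bucket(wiki, plan, rng=random):
    """Pass 2: pick a bucket for one row of the given wiki.

    Writing the output partitioned by (wiki, bucket) spreads a big
    wiki across plan[wiki] files instead of funnelling it through a
    single writer.
    """
    return rng.randrange(plan[wiki])


plan = files_per_wiki({"enwiki": 5_500_000, "simplewiki": 40_000})
# enwiki is split into 6 buckets; simplewiki fits in a single file.
```

In actual PySpark this would correspond to a `groupBy("wiki").count()` pass, a join of the plan back onto the data, and a write partitioned by wiki and bucket; the sketch only shows the arithmetic that decides the split.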
[20:38:05] break, back in ~20
[21:02:09] inflatador_: finishing up lunch, 7 mins
[21:07:33] ACK, just joined
[21:50:15] ebernhardson: I'm going down a phabricator-related ticket infinite recursion loop and came across https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1154300 / https://phabricator.wikimedia.org/T391383#10891089. Is that patch still something we want to ship out?