[11:13:01] lunch
[13:12:31] \o
[15:14:06] workout, back in ~30
[15:34:30] hmm, cindy doesn't like something about highlighting now...looking into it
[15:42:08] hmm, so the problem is it's expecting an exact highlighted string, and the response now includes the prefix `Template:Citation needed File:Damavand3.jpg`. Not exactly what it was testing, but does seem like a regression. Having that prefix is probably not useful
[15:45:47] oh, actually it is what it was testing. That string is a thumbnail caption and was supposed to end up in auxiliary_text, but it's not there and is found in the text field instead
[15:49:40] back
[15:59:12] my okta password has expired :S How often do i need to change a 40 character random string
[16:00:48] ^ same haha
[16:01:01] saw the recent email about it but wasn't prepared for it to actually flip over :P
[16:02:15] mpham, ryankemper: retrospective time: https://meet.google.com/eki-rafx-cxi
[19:09:34] hmm, quite odd...the updated glent dags are failing due to spark's new stricter checks. It complains `Cannot safely cast 'q1_hitstotal': string to int`. I wonder if somehow we've been treating these as strings and they've been auto-magically cast to ints when writing
[19:10:09] sadly spark makes tracking and understanding the types of columns a bit difficult
[19:12:36] oh wow, a little simpler...quite literally in the CirrusLogReader class we have `col("main_request.hitsTotal").cast("string")`
[19:12:42] i...wonder why
[19:15:12] of course the string cast comes from a mostly undocumented patch called 'initial version' :)
[19:28:08] * ebernhardson is trying to decide how to validate this change...the other tables were easier because i just take some stats on the current data and re-run it in spark3 and expect the stats to be ~the same
[20:27:50] spark has such detailed and useful error messages: `Failed to execute user defined function(functions$$$Lambda$1490/1143053352: (int) => string)`
[20:28:22] turns out code generation at execution time makes some things more difficult :P
[20:34:56] ;P
[21:04:11] sigh...got this working but it's going to require proper evaluation. m0prep went from 285M queries in the set to 72M. I'm not entirely sure how this even worked before with the hitsTotal read in as strings though...we were doing math on these strings and i just double checked, spark2 correctly complains that it can't do '2' - '1'
[21:09:36] curiously all the other stats besides the total row count are about the same. the min/max values all match exactly
[21:13:33] mean/stddev are different but quite similar, whatever change it made was fairly similar across the board i guess
[21:19:56] doh, never mind i'm an idiot :P The first time i took the stats over all m0prep partitions and not just the most recent one. This works fine :P
[21:59:53] ryankemper Here's the first patch for the elastic client version...no client specific stuff yet, the current ban.py has issues with variables that I think I fixed in this one https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/902502
[22:00:51] had to add a lot of "self.thing" where it was only "thing" before l)
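
A rough sketch of the stats comparison mentioned at 19:28:08, in plain Python rather than spark. The function names, the `partition` and `hits_total` field names, the dates, the rows and the 5% tolerance are all made up for illustration, not taken from the glent code. It only shows that both sides have to be summarized over the same partition, which is the mix-up from 21:19:56.

```python
import math
import statistics


def summarize(rows, column, partition=None):
    """Summary stats for one column; partition=None means all partitions."""
    values = [
        row[column]
        for row in rows
        if partition is None or row["partition"] == partition
    ]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "stddev": statistics.pstdev(values),
    }


def stats_match(old, new, rel_tol=0.05):
    """True when every stat in `old` is within rel_tol of the one in `new`."""
    return all(math.isclose(old[key], new[key], rel_tol=rel_tol) for key in old)


# Hypothetical rows standing in for the output of the old and new jobs.
old_rows = [
    {"partition": "2023-03-19", "hits_total": 5},
    {"partition": "2023-03-19", "hits_total": 9},
    {"partition": "2023-03-26", "hits_total": 0},
    {"partition": "2023-03-26", "hits_total": 12},
    {"partition": "2023-03-26", "hits_total": 40},
]
new_rows = [
    {"partition": "2023-03-26", "hits_total": 0},
    {"partition": "2023-03-26", "hits_total": 12},
    {"partition": "2023-03-26", "hits_total": 40},
]

latest = "2023-03-26"
new_stats = summarize(new_rows, "hits_total", latest)

# Like for like: both sides restricted to the most recent partition.
same_partition_ok = stats_match(summarize(old_rows, "hits_total", latest), new_stats)

# The mix-up: old stats taken over every partition, new over only the latest.
mixed_ok = stats_match(summarize(old_rows, "hits_total"), new_stats)
```

With these made-up rows the like-for-like comparison passes, while the mixed one fails on the row count even though min and max still agree.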