[07:37:27] o/
[08:24:31] o/
[08:50:11] q: other than sessionization, are there other notable differences in the content of `query_clicks_hourly` and `query_clicks_daily`? I am working on a heuristic to detect "easy queries", and so far I've used `query_clicks_daily` (AFAIK it's what the mjolnir dag uses)
[09:00:12] gmodena: using query_clicks_daily is fine imo, I think that query_clicks_daily only excludes actors making more than 1000 queries a day
[09:29:10] dcausse ack
[10:01:51] dcausse: I'll be 2' late
[10:01:58] np!
[10:33:19] gmodena: adding you as reviewer on some wdqs patches, no rush on these but whenever you have some time we can have a quick chat to discuss the context
[10:33:42] dcausse ack
[10:34:20] dcausse happy to hop on a call after lunch if that works
[10:34:33] gmodena: sure!
[10:34:56] sending an invite
[10:36:22] errand+lunch
[11:16:55] I had to rebuild some muscle memory, but I finally have a (query, page, wiki) dataset to experiment on with query heuristics
[11:17:09] * gmodena dusts off the data scientist hat
[11:17:19] errand+lunch
[14:24:21] added Gabriele to https://gerrit.wikimedia.org/r/admin/groups/c98683b4697e675458519cccf4d8ff879f9283f0,members
[14:25:53] and https://gitlab.wikimedia.org/groups/repos/search-platform/-/group_members
[14:43:25] dcausse thank you!
[15:44:26] I've put together a list of goals for Q3: https://docs.google.com/document/d/1DT2iH0-dh2PDS9aWFwq5gq8oqe7Vb6-WFgcyE66ktFU/edit . Fairly high level as always.
[15:44:47] dcausse, gmodena: your review is welcome. In particular the last section about improvements to MLR.
[15:45:33] Trey314159: I have copied over the section on language stuff, since we're still working on Japanese. It probably makes sense to review it based on what we've already accomplished. Could you have a look?
[15:46:29] gehel: will do
[15:52:53] gehel ack - will review
[16:08:37] do you have a preferred library for fast string matching in python/spark? So far I've (ab)used pyspark.ml.feature. It ain't pretty.
[16:26:43] gmodena: nothing specific comes to mind (we tend to use Lucene analysis components when we need to normalize/stem some text)
[16:28:07] dcausse ack, thanks. I'll go with whatever seems reasonable then.
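[Editor's note: the log doesn't say how pyspark.ml.feature was used for string matching, but a common route is NGram + MinHashLSH, which compares strings by the Jaccard similarity of their character n-gram sets. A minimal pure-Python sketch of that underlying measure, with illustrative helper names (not from any library):]

```python
# Sketch of the similarity measure behind n-gram based approximate string
# matching: represent each string as a set of character n-grams and compare
# sets with Jaccard similarity. Helper names are hypothetical.

def char_ngrams(s: str, n: int = 3) -> set:
    """Return the set of character n-grams of s (empty if len(s) < n)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the character n-gram sets of a and b, in [0, 1]."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 1.0  # two strings too short to produce n-grams
    return len(ga & gb) / len(ga | gb)
```

[In a Spark pipeline, the analogous steps would be tokenizing into n-grams, hashing them into feature vectors, and using an LSH approximate similarity join so the comparison doesn't become a quadratic pairwise scan; the sketch only shows the measure itself.]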