[07:37:27] o/
[08:24:31] o/
[08:50:11] q: other than sessionization, are there other notable differences in the content of `query_clicks_hourly` and `query_clicks_daily`? I am working on a heuristic to detect "easy queries", and so far I've used `query_clicks_daily` (AFAIK it's what the mjolnir dag uses)
[09:00:12] gmodena: using query_clicks_daily is fine imo, I think that query_clicks_daily only excludes actors making more than 1000 queries a day
[09:29:10] dcausse ack
[10:01:51] dcausse: I'll be 2' late
[10:01:58] np!
[10:33:19] gmodena: adding you as reviewer on some wdqs patches, no rush on these but whenever you have some time we can have a quick chat to discuss the context
[10:33:42] dcausse ack
[10:34:20] dcausse happy to hop on a call after lunch if that works
[10:34:33] gmodena: sure!
[10:34:56] sending an invite
[10:36:22] errand+lunch
[11:16:55] I had to rebuild some muscle memory, but I finally have a (query, page, wiki) dataset to experiment on with query heuristics
[11:17:09] * gmodena dusts off the data scientist hat
[11:17:19] errand+lunch
[14:24:21] added Gabriele to https://gerrit.wikimedia.org/r/admin/groups/c98683b4697e675458519cccf4d8ff879f9283f0,members
[14:25:53] and https://gitlab.wikimedia.org/groups/repos/search-platform/-/group_members
[14:43:25] dcausse thank you!
[15:44:26] I've put together a list of goals for Q3: https://docs.google.com/document/d/1DT2iH0-dh2PDS9aWFwq5gq8oqe7Vb6-WFgcyE66ktFU/edit . Fairly high level as always.
[15:44:47] dcausse, gmodena: your review is welcome. In particular the last section about improvements to MLR.
[15:45:33] Trey314159: I have copied over the section on language stuff, since we're still working on Japanese. It probably makes sense to review it based on what we've already accomplished. Could you have a look?
[15:46:29] gehel: will do
[15:52:53] gehel ack - will review
[16:08:37] do you have a preferred library for fast string matching in python/spark? So far I've (ab)used pyspark.ml.feature. It ain't pretty.
[16:26:43] gmodena: nothing specific comes to mind (we tend to use Lucene analysis components when we need to normalize/stem some text)
[16:28:07] dcausse ack, thanks. I'll go with whatever seems reasonable then.
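[Editor's note: the log doesn't say how pyspark.ml.feature was used for string matching, but a common route is NGram + MinHashLSH, which compares strings by the Jaccard similarity of their character n-gram sets. A minimal pure-Python sketch of that underlying measure, with illustrative helper names (not from any library):]

```python
# Sketch of the similarity measure behind n-gram based approximate string
# matching: represent each string as a set of character n-grams and compare
# sets with Jaccard similarity. Helper names are hypothetical.

def char_ngrams(s: str, n: int = 3) -> set:
    """Return the set of character n-grams of s (empty if len(s) < n)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the character n-gram sets of a and b, in [0, 1]."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 1.0  # two strings too short to produce n-grams
    return len(ga & gb) / len(ga | gb)
```

[In a Spark pipeline, the analogous steps would be tokenizing into n-grams, hashing them into feature vectors, and using an LSH approximate similarity join so the comparison doesn't become a quadratic pairwise scan; the sketch only shows the measure itself.]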