[08:23:30] I'm getting puppet alerts about rel2.search.eqiad.wmflabs - what is this host?
[08:25:37] no clue, might be a mwvagrant host to test some search profiles
[08:55:48] Build timed out (after 45 minutes) sigh...
[08:58:32] and it did not even finish downloading artifacts
[08:59:59] looks like downloading from wmf.mirrored is extremely slow
[09:08:54] filed T287445 and going to release locally
[09:08:55] T287445: wikidata-query-rdf-maven-release-docker build is too slow and always times out - https://phabricator.wikimedia.org/T287445
[09:45:26] hmm, that's new
[09:45:43] errand
[09:55:42] ryan had a similar issue for the previous release
[10:09:55] Antoine is looking into it
[10:12:40] folks would anyone be able to advise on "relforge1004" ?
[10:12:58] It's in row B in eqiad - where we are doing maintenance at 15:00 UTC today
[10:13:22] Expect a sub-second interruption to network (T286061), but the row should be considered "at risk"
[10:13:22] T286061: Switch buffer re-partition - Eqiad Row B - https://phabricator.wikimedia.org/T286061
[10:14:16] Not sure if it's something we'd want to take any pre-emptive action about?
[10:16:26] topranks: no, relforge1004 is only used for offline analytics, thanks for the heads up!
[10:20:40] dcausse: no probs thanks for the info :)
[10:35:55] lunch
[12:15:25] meal 3 break
[13:03:50] For Query Completion, has there been a decision for how it interacts with the title match completions that are currently in effect?
[13:03:50] T250436 lists this UI design decision as a remaining question, but I don't recall seeing any mockups or answers to that question
[13:03:51] T250436: [Epic] Query Completion - https://phabricator.wikimedia.org/T250436
[13:07:31] mpham: I haven't seen any mockup nor do I remember any discussions about that
[13:10:05] note that Special:MediaSearch has already replaced the classic title completion with wbsearchentities (wikidata item completion, which is language dependent)
[13:42:24] dcausse: so in the mediasearch case, the classic title completion was replaced entirely?
[13:43:38] mpham: yes I think so, classic title search is still active top-right (we won't touch it) and in Special:Search (which is where we originally wanted to enable/test query completion)
[13:45:30] so for mediasearch the question to figure out is: do we want to completely replace the wikidata suggestions or complement them with query completion?
[13:46:44] was there a designer working on mediasearch before that I can talk to?
[13:47:19] yes but I think she left IIRC
[13:47:41] no, it's Matthew
[13:47:51] ok thanks
[13:47:55] he's the UX designer for MediaSearch
[13:48:05] but Matthias was the person who created the wbsearchentities based autocomplete
[13:48:26] I was definitely assuming we would be *complementing* not *replacing*
[13:48:33] and would prefer that approach if possible
[13:49:07] i think that makes sense too, but I'm not sure what that looks like (literally) and want to make sure the design isn't confusing
[13:50:00] Matthew hasn't really worked on the autocomplete part of it so might not be able to help much, but if you go to MediaSearch and type in any search term you can see the search suggestions
[13:51:20] I remember Pam did some design but don't know if it's related to this particular completion widget
[13:51:36] Pam did the design for SDoC stuff, but MediaSearch was all Matthew
[13:51:45] but there wasn't really any design for the search widget
[13:51:56] ok
[14:23:24] break
[14:25:00] dcausse: did you come across something like this when creating pure unit tests (I'm trying to use CirrusTestCase)?
[14:25:04] https://www.irccloud.com/pastebin/nOI5Onip/
[14:40:29] hmm, it works for QueryCompletionSuggesterTest but not for TitleCompletionSuggesterTest, I'll keep that one as an integration test for now
[14:50:13] zpapierski: yes, when you depend on the Title class it's quite hard to have a real unit test
[14:50:38] I see, I'll stick with the integration test for Title completion then, thx
[14:50:45] when you see MediaWikiServices involved it generally means you need to mock more stuff or inject more dependencies
[15:00:57] \o
[15:02:42] o/
[15:03:17] o/
[15:05:42] Trey314159: I'd like to reindex 823 wikis, it might take a while so wanted to know if now is a good time
[15:06:35] dcausse: 823! that's almost all of them! Go ahead!
[15:06:53] thanks!
[15:06:55] how many wikis do we have?
[15:06:56] Do you have a list somewhere?
[15:07:03] yes
[15:07:16] zpapierski: I thought it was ~900
[15:07:33] it's mwmaint2002.codfw.wmnet:~dcausse/wikis_to_reindex.lst
[15:07:56] cool, thanks!
[15:08:12] checking cloudelastic to see if it lacks anything, but eqiad & codfw seemed to agree on the ones that need to be reindexed for the ores_articletopics -> weighted_tags rename
[15:08:53] we have 976 (expanddblist all) but some of them are private
[15:08:57] zpapierski: my list from last week says 976
[15:09:08] what David said, LOL
[15:09:24] I see
[15:09:27] thx
[15:12:17] dcausse: Looks like you are going to cover all the big ones from T284691. When you are done I'll pick up the leftovers.
[15:12:17] T284691: Reindex Basque, Catalan, Danish Wikis - https://phabricator.wikimedia.org/T284691
[15:12:28] do you have a phab ticket?
[15:13:15] Trey314159: no, I think we just mentioned this in the big tracking ticket but nothing specific
[15:13:28] okay. no worries!
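[Editor's note: the 14:50 exchange about MediaWikiServices is the classic service-locator vs. dependency-injection trade-off. CirrusSearch is PHP, but the same idea can be sketched in Python: a class that receives its collaborators as constructor arguments can be unit-tested with a mock, with no global service registry involved. All names here (`TitleFormatter`, `CompletionSuggester`) are invented for illustration and are not the actual CirrusSearch classes.]

```python
from unittest import mock

class TitleFormatter:
    """Stand-in for a service that would otherwise be fetched from a
    global registry (the MediaWikiServices situation described above)."""
    def get_prefixed_text(self, ns: str, text: str) -> str:
        return f"{ns}:{text}" if ns else text

class CompletionSuggester:
    def __init__(self, formatter: TitleFormatter):
        # The dependency is injected, so a unit test can pass a mock
        # instead of needing the whole service container to exist.
        self._formatter = formatter

    def suggest(self, ns: str, prefix: str) -> str:
        return self._formatter.get_prefixed_text(ns, prefix)

# In a pure unit test the collaborator is replaced with a mock:
fake = mock.Mock(spec=TitleFormatter)
fake.get_prefixed_text.return_value = "Help:Contents"
suggester = CompletionSuggester(fake)
assert suggester.suggest("Help", "Cont") == "Help:Contents"
```

The same code hits the integration-test wall when the collaborator (like `Title`) can only be built through the global service locator, which is exactly the situation discussed above.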
[17:03:34] i must be losing my mind... joining my first round of NN preds against thumbs yesterday i had 4M matches and was missing another ~9M. So i re-downloaded those thumbs and they are just finishing predictions now. But now looking at my previous dataset i'm only missing 500k images...
[17:03:40] i need a better history of what i did in jupyter :P
[17:18:34] notebooks are a minefield for me, I rerun stuff many times to make sure I don't use a stale var... and it makes me wonder if I should not just have a single cell...
[17:24:44] yea, the whole notebook way of working is so different from typical software, but i can't figure out a better way :S I feel like the whole field of working with data is still in its infancy comparatively
[17:45:13] dinner
[19:41:46] meh, no great way to determine the mediasearch query from cirrus logs, and the mediasearch schema doesn't record the query it sends (only the one typed) or the search_id to allow joining :S Have an insane plan to guess at what the user typed from cirrus logs and then join on the guess + # hits returned + timestamp windows... but not feeling promising :P
[19:49:29] we could strip the keywords attached (filetype: I guess?) but then we won't get any clicks anyway, so I'm afraid we'll have to adapt the searchSatisfaction js code to mediasearch
[19:53:30] yea my guessing is mostly going to amount to stripping keywords from the beginning :)
[19:55:09] it's mostly filetype:, but also haswbstatement: is common. A few others can be added through ui elements
[19:57:37] maybe we can stick to the most common ones (the set of filters you get by default when entering MediaSearch)
[19:58:36] as to search satisfaction, it might not be necessary.
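[Editor's note: a minimal sketch of the keyword-stripping idea discussed above — removing leading search keywords like `filetype:` and `haswbstatement:` from a logged query to recover what the user actually typed. The keyword list and the regex are assumptions for illustration; this is not CirrusSearch's real query grammar or parser.]

```python
import re

# Keywords mentioned in the discussion above; illustrative, not exhaustive.
LEADING_KEYWORDS = ("filetype", "haswbstatement")

# Match one leading keyword:value token; the value may be quoted or bare,
# possibly with |-separated alternatives like filetype:bitmap|drawing.
_LEADING = re.compile(
    r'^\s*(?:%s):(?:"[^"]*"|\S+)\s*' % "|".join(LEADING_KEYWORDS),
    re.IGNORECASE,
)

def strip_leading_keywords(query: str) -> str:
    """Repeatedly strip keyword:value tokens from the front of the query."""
    prev = None
    while prev != query:
        prev = query
        query = _LEADING.sub("", query, count=1)
    return query.strip()

print(strip_leading_keywords(
    "filetype:bitmap|drawing haswbstatement:P180=Q146 cute cats"
))
# -> "cute cats"
```

Only keywords at the beginning are stripped, matching the "stripping keywords from the beginning" plan; keywords embedded mid-query (or added through UI elements in other positions) would need the real query parser.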
[19:58:36] I haven't looked too deeply, but they seem to be logging info about the elements I interact with, like result_click
[19:59:50] the longer term solution is probably to inject search_id into their schema, but for now yea maybe just strip the expected filter for the image tab (filetype:bitmap|drawing) and call it close enough
[20:00:25] mostly i want the search_id because we need query + result lists
[23:11:37] ooh that's fun: when using df.toLocalIterator(), a python UDF throwing exceptions doesn't bubble up; instead the iterator ends without emitting anything and pretends it's ok :)
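[Editor's note: the 23:11:37 gotcha is about PySpark's `df.toLocalIterator()`, but the failure mode — an iterator that treats a worker-side error as end-of-stream — can be illustrated in plain Python. This is a sketch of the symptom, not Spark's actual internals:]

```python
def failing_udf(x: int) -> int:
    # Stand-in for a Python UDF that blows up on one row.
    if x == 3:
        raise ValueError("bad row")
    return x * 10

def swallowing_iterator(rows):
    # Deliberately buggy wrapper mimicking the observed behaviour:
    # the exception is treated as end-of-stream instead of re-raised.
    try:
        for r in rows:
            yield failing_udf(r)
    except Exception:
        return  # error silently dropped; iteration just stops

collected = list(swallowing_iterator([1, 2, 3, 4]))
print(collected)  # -> [10, 20]: rows 3 and 4 are lost, no error raised
```

One defensive workaround (an assumption, not a verified fix) is to force the computation with an eager action such as `df.count()` before iterating, since actions do surface UDF exceptions, and/or to sanity-check the row count received from the iterator against the expected count.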