[09:11:29] dcausse: I was about to summarize our weekly report, but I am not permitted to create the following page: https://wikitech.wikimedia.org/wiki/Search_Platform/Weekly_Updates/2023-10-18 - are you permitted?
[09:12:10] pfischer: this seems strange, looking
[09:13:03] pfischer: do you have an error message? or perhaps you're not logged-in?
[09:13:41] I think wikitech does not allow anon edits
[09:15:04] I am logged in and I don’t see an error
[09:15:57] Ah, maybe it has to do with account migrations… looking
[09:45:00] errand+lunch
[12:09:51] Reminder: the engagement & inclusion survey is out. The deadline is Nov 1, just after I'm back from vacation. Please take time to answer it!
[13:21:42] o/
[14:00:51] \o
[14:16:21] o/
[14:18:58] i wonder if something about categories in wdqs broke? In the ticket about deepcat a user mentioned that deepcategory:Science is now simply returning incategory:Science. That's usually what the result of the rdf query looks like when the db is empty
[14:19:43] re: T369808
[14:19:44] T369808: The Commons search "deepcategory" operator often does not work (Deep category query returned too many categories) - https://phabricator.wikimedia.org/T369808
[14:19:49] perhaps? we don't have much monitoring on this endpoint
[14:21:11] wdqs2020 has fewer triples than the other nodes (https://grafana-rw.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&refresh=1m&var-cluster_name=wdqs&var-graph_type=%289103%7C9194%29)
[14:22:12] but maybe cirrus targets the internal endpoint
[14:22:57] for which all nodes have varying triple counts :/
[14:23:20] looking at the public side of the api, getting reasonable responses to the documented example queries
[14:23:31] hmm, i'm not sure which one cirrus queries. checking
[14:23:55] well +/- 400k triples
[14:24:49] I did start and stop categories a couple of times on wdqs2020 a couple of weeks ago when I was looking at the migration stuff, but nothing too disruptive AFAIK
[14:25:10] heh, we set wgCirrusSearchCategoryEndpoint twice in the same config array :S should fix. but i think the second wins, which means http://localhost:6009/bigdata/namespace/categories/sparql and 6009 is wdqs-internal
[14:25:24] :/
[14:26:33] reading the code there should be a warning
[14:27:52] hmm, well i suppose this isn't super important, was just something i thought of reading the morning emails :)
[14:28:39] but i'll look into it a little today
[14:30:43] I think it's because media search does not display the warning
[14:30:50] https://commons.wikimedia.org/w/index.php?go=Go&search=deepcategory%3AScience&title=Special:Search vs https://commons.wikimedia.org/w/index.php?go=Go&search=deepcategory%3AScience&title=Special:MediaSearch&type=image
[14:31:13] oh, that makes sense. We should get that added
[14:32:20] also the timeout at 3sec is perhaps a bit restrictive
[14:33:07] i just adjusted the example query from the docs to query Science, times out on the public api too :P But yea maybe 3s could be higher
[14:33:40] ouch even with a limit ?
[14:34:37] oh, i guess the example subcategories query is different than the one in the keyword, it has maxIterations 8 instead of a depth limit
[14:34:47] but i'm assuming that results in the same thing
[14:36:25] i wonder how we could "know" when to limit the depth. In a repl test i can get results with depth=3 on Science, but how would we know when to limit
[14:37:22] or maybe with a 3s timeout we could simply fall back from 5 to 3 and hope it's better
[14:37:39] but that might be mysterious if query behaviour depends on backend load
[14:37:55] i dunno, anyways i gotta do a school run, back in 20
[15:00:23] back
[15:02:34] switching the depth seems confusion... I realize that in its current form it's also confusing because you may return a number of categories < threshold and not display any warnings, but just because you did not scan the full graph...
[15:02:43] s/confusion/confusing/
[15:06:25] also the current error cases don't line up. Too many categories -> filter is empty. Way too many categories (times out) -> filter is the source category
[15:07:11] maybe we should let users provide a depth and the "smart client" can choose a better depth
[15:07:21] basically punt :P
[15:13:02] true, yes perhaps leaving the choice of maxdepth to the user could be interesting indeed
[17:10:35] heading out, have a nice week-end
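The depth-fallback idea floated at 14:37 (retry with a shallower depth when the 3s timeout hits, otherwise keep today's behaviours of an empty filter on "too many categories" and the bare source category when nothing could be resolved) could be sketched roughly like this. Everything here is illustrative, not the actual CirrusSearch code: the function names, the 5/3 depths, and the result threshold are assumptions.

```python
# Sketch of the proposed deepcategory depth fallback.
# Hypothetical names/values; not the real CirrusSearch implementation.

DEFAULT_DEPTH = 5    # assumed current depth limit
FALLBACK_DEPTH = 3   # "simply fall back from 5 to 3"


class QueryTimeout(Exception):
    """Raised when the categories SPARQL endpoint exceeds its timeout."""


def resolve_deep_category(fetch_subcategories, category, max_results=256):
    """Expand `category` into its subcategory tree.

    `fetch_subcategories(category, depth)` stands in for the SPARQL call
    against the categories namespace; it may raise QueryTimeout.

    Mirrors the error cases described at 15:06:25:
      - too many categories  -> empty filter ([])
      - everything times out -> just the source category
    """
    for depth in (DEFAULT_DEPTH, FALLBACK_DEPTH):
        try:
            cats = fetch_subcategories(category, depth)
        except QueryTimeout:
            continue  # retry once at the shallower depth
        if len(cats) > max_results:
            # "Deep category query returned too many categories":
            # today the filter ends up empty; a user-visible warning
            # would be the caller's job (and MediaSearch's gap).
            return []
        return cats
    # Both depths timed out: degrade to a plain incategory: filter.
    return [category]
```

One caveat the log itself raises (14:37:39): because the chosen depth would then depend on backend load, identical queries could return different result sets at different times, which is part of why punting the depth choice to the user (15:07:11) may be the less mysterious option.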