[00:29:28] If only Trey had said "provide the answer to any question, including making up an answer if the AI feels like it" he'd be Nostradamus
[05:21:31] If I were Nostradamus I'd be picking stocks, not prognosticating about AI! LOL
[08:18:31] pfischer: o/ would you have a couple minutes to discuss redirects and the cirrusbuilddoc api?
[09:51:29] lunch
[12:33:53] o/
[13:03:53] dcausse: sorry, overlooked your request. I’m available now.
[13:07:56] pfischer: np! https://meet.google.com/csd-ycbu-pgp
[13:19:57] left this meeting ^, I'll schedule something for Monday
[13:32:01] Ah, crap, I don’t know why I only occasionally get the IRCCloud notifications. Sorry for that.
[13:32:38] no worries :)
[14:06:00] dcausse: In regard to “4:42 PM I wonder what happens with the set handler if you still provide the content of the array in the source body” - I checked the SuperNoopScript.update method: if no handler is specified, a source entry is mapped as-is. If a handler is specified (in the case of the set handler) it would fail, as it expects “path.to.property” to be a map (with add/remove, and/or max_size) and not a list.
[14:25:04] oh... I see, that means the schema won't match if we wanted to encode that in the Update Row object...
[14:32:27] Hm, that could be solved with a oneOf schema declaration, but those aren't allowed per the schema guidelines: https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines
[14:34:24] I could extend the set handler with a “set” parameter (verify it’s either that or a combination of add, remove, and max_size)
[14:35:12] That would simply override the value and we’d only have to support one schema.
[14:39:29] ah, meaning we'd always have to rely on the set handler for the redirect field; there would be no way to set it as a plain array (a field in the schema can't be an array and a map at the same time...)
[14:39:59] unsure what's best/appropriate...
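A minimal Python sketch of the handler semantics described in the messages above: with no handler a source value is copied as-is, while the set handler expects a map of add/remove/max_size instructions and rejects a plain list. This is an illustration of the described behavior, not the actual SuperNoopScript Java code; the function name and signature are hypothetical.

```python
def apply_update(doc, field, value, handler=None):
    """Apply one update-row field to an existing document (illustrative sketch)."""
    if handler is None:
        # No handler: the source entry is mapped as-is.
        doc[field] = value
        return doc
    if handler == "set":
        # The set handler expects a map with add/remove and/or max_size,
        # so a plain array here is an error.
        if not isinstance(value, dict):
            raise TypeError(
                f"set handler expects a map for {field}, got {type(value).__name__}"
            )
        current = set(doc.get(field, []))
        current |= set(value.get("add", []))
        current -= set(value.get("remove", []))
        result = sorted(current)
        max_size = value.get("max_size")
        if max_size is not None:
            result = result[:max_size]
        doc[field] = result
        return doc
    raise ValueError(f"unknown handler: {handler}")
```

This is why a field can't be both a plain array and a map in the schema: the same path would need two incompatible types depending on whether the set handler is used.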
let's discuss the pros & cons next week
[14:51:23] errand, back later tonight
[15:09:57] \o
[15:10:43] can anyone sanity-check this for me? I think the data says we've gotten at most 25 concurrent requests to query.wikidata.org hosts in codfw over the last 2 weeks: https://superset.wikimedia.org/superset/dashboard/p/NjaBaYxBL1Q/ Let me know if I'm mistaken
[15:12:35] I'm trying to add some of the new Bullseye hosts into rotation, but want to get a good idea of how much traffic goes to codfw before I pool them
[15:12:50] inflatador: I dunno if we can really get concurrent counts from that, although we could make some estimates by combining the expected response time with the rate of requests
[15:13:24] inflatador: the top graph goes to 25, but that's requests per some time period (unlabeled)
[15:14:25] lol, then the superset tab crashed in my browser :P
[15:14:48] Sorry to bring that evil upon you ;P
[15:15:22] it's hanging my browser too. Is there a better place to check this type of data?
[15:18:13] nah, superset is probably it; it might just not like the way these datasets are aggregated or something, too many data points
[15:19:21] 2 weeks might be a bit much
[15:19:58] looks like the data is per minute, so it's ~25 requests per minute, but that's sampled at 1/128, so 3200 per minute or ~53/sec incoming. Not sure if that's too many or not
[15:20:59] unfortunately this mostly seems to have TTFB, which isn't that relevant here (sparql streams results); we're looking for the total time... hmm
[15:23:10] Yeah, I need to figure out what % of our overall traffic that is... I think I have the pieces to figure that out
[15:23:44] * inflatador tries to get the most out of his laptop's RAM
[15:24:13] ok, yeah, we mostly have to combine the 53/s incoming with the expected time each query runs to get a concurrency number.
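The estimate being worked out above is Little's law (L = λ × W): average concurrency equals arrival rate times average service time. A quick sketch with the numbers from the chat; the per-query service time is an assumed placeholder, not a measured value.

```python
def concurrency(rate_per_sec, avg_service_time_sec):
    """Average number of in-flight requests via Little's law: L = lambda * W."""
    return rate_per_sec * avg_service_time_sec

# ~25 req/min observed in superset, sampled at 1/128 -> scale back up
incoming = 25 / 60 * 128
print(round(incoming, 1))                      # 53.3 req/s
print(round(concurrency(incoming, 0.5), 1))    # 26.7 concurrent at an assumed 500 ms/query
```

With an assumed 500 ms average query time, the ~53/s rate lands right back around the ~25 concurrent requests mentioned earlier.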
500ms is probably a plausible guess, getting back to the ~25 concurrent requests
[15:27:39] eqiad seems like it's getting a smaller % of traffic than codfw, that's a little surprising
[15:27:58] or at least, based on the requests per minute
[15:28:40] does seem surprising
[15:49:57] ah well, I'm gonna pool wdqs2020 and see if errors start to tick up... I'll be looking at https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&refresh=1m&from=now-15m&to=now&viewPanel=1 and https://superset.wikimedia.org/superset/dashboard/p/yxXBlboOG8R/ ; if you have suggestions for other places to check, LMK
[16:00:15] break, back in ~30-45
[16:01:13] I saw an error on 2020... probably nothing, but I've depooled it until I get back
[16:35:17] back
[16:42:50] I guess the cookbook isn't touching data_loaded at the end... hmm
[16:48:13] o/ API question: when I explicitly set `srwhat` to `text` on the Search API, I get different results than when I leave it undeclared. Looking at the code (https://gerrit.wikimedia.org/g/mediawiki/core/+/343b70a975a42639d53d1d98bfe72c135d134ab9/includes/api/ApiQuerySearch.php#129), I think it should default to `text`, because my understanding is that title search is disabled, so I'm confused as to what's going on and which is better. Example where result #4 is different: https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=wedding&srlimit=10&srnamespace=0&format=json&formatversion=2&srwhat=text vs. https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=wedding&srlimit=10&srnamespace=0&format=json&formatversion=2
[17:02:49] isaacj: you can get explains that show what's going on, sec
[17:05:24] basically, append `&cirrusDumpQuery` to see what query is issued, or `&cirrusDumpResult&cirrusExplain=pretty` to get a breakdown of the scoring factors used.
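The two debugging flags just mentioned can be tacked onto any Search API URL; a small Python sketch building both variants (the helper name is hypothetical, the flags and API parameters are the ones from the chat).

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def search_debug_urls(term, srwhat=None):
    """Build a Search API URL plus its cirrusDumpQuery / cirrusExplain variants."""
    params = {
        "action": "query", "list": "search", "srsearch": term,
        "srlimit": 10, "srnamespace": 0,
        "format": "json", "formatversion": 2,
    }
    if srwhat:
        params["srwhat"] = srwhat
    base = f"{API}?{urlencode(params)}"
    return {
        # dump the Elasticsearch query CirrusSearch would issue
        "query": base + "&cirrusDumpQuery",
        # dump results with a per-document breakdown of scoring factors
        "explain": base + "&cirrusDumpResult&cirrusExplain=pretty",
    }
```

Comparing the "query" output with and without `srwhat=text` is how the thread below confirms the same query is issued in both cases.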
If you just reload the explain a few times you can see that the numbers are not the same for every execution, even for the exact same query
[17:05:50] this happens because the queries land on different hosts, and different hosts have slightly different ideas of what the language model is; they use the statistics for their slice of the dataset and not the whole thing
[17:07:06] but reviewing the `&cirrusDumpQuery` part, we can see that the same query is being issued in both cases.
[17:12:39] ahh, so it's not the parameter itself, it's just that there's a slight randomness within search
[17:14:08] thanks ebernhardson for the demonstration of how to debug, and the explanation!
[17:14:55] isaacj: yup, you can expect some minor reordering of the search results, particularly if the resulting scores are very similar to each other
[17:21:08] Trey314159: looks like I was thinking about https://www.aps.org/publications/apsnews/201611/hossenfelder.cfm (or at least something similar). Turns out she still offers that service today, although it looks like they connect you with their group of physicists and not her specifically. http://backreaction.blogspot.com/p/talk-to-physicist_27.html
[17:22:05] ebernhardson: Cool—thanks for finding that!
[17:25:19] actually, https://aeon.co/ideas/what-i-learned-as-a-hired-consultant-for-autodidact-physicists looks a little closer to what I remember, but same premise; the difference is this is a blog written by Sabine instead of an interview
[17:28:06] OK, think I fixed the data_loaded thing
[17:39:00] huh. Turns out when we converted the ores jobs from weekly to hourly... I forgot to change the input data selector.
It's been trying to read 7 days of content (but since that's today through today+7 days, it means each hour ships everything it saw that day)
[17:39:12] I guess in terms of correctness it's still correct, it just re-ships the same updates a lot
[17:47:03] 'autodidact' sounds so much nicer than 'dilettante'
[18:01:49] lunch/library, back in ~45
[18:56:43] back
[21:37:12] ryankemper FYI, here is the current state of the hosts as far as data transfers/scap deploys: https://phabricator.wikimedia.org/T332314#8997406 . Heading out in ~20 but LMK if you have questions
[21:37:48] inflatador: excellent, thx
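The ores input-selector bug discussed at 17:39 comes down to a run's input window: an hourly job still using the weekly job's 7-day span re-reads (and re-ships) mostly the same partitions every hour. A sketch under that reading; the function name and dates are illustrative, not the actual job code.

```python
from datetime import datetime, timedelta

def input_window(run_time, span=timedelta(hours=1)):
    """Half-open (start, end) interval of input data one job run should read."""
    return run_time, run_time + span

run = datetime(2023, 6, 16, 14, 0)
# the bug: an hourly run still configured with the weekly job's 7-day window
buggy = input_window(run, span=timedelta(days=7))
# the fix: each hourly run reads only its own hour of new data
fixed = input_window(run)
```

As noted in the chat, the buggy window is still *correct* (no updates are missed, they're just shipped repeatedly); shrinking the span to one hour removes the redundant work.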