[09:14:14] FYI: I increased the SUP fetch timeout to 30s for the CODFW consumer-search; if that brings down the retry rate and number of retries per fetch, especially for wikidata, I’ll let EQIAD follow
[09:32:01] pfischer: thanks! seems like it helped a lot! https://w.wiki/AJXN
[10:08:21] dcausse: yes, interestingly, the retry rate (update it at least one retry is necessary) appears unchanged, but the number of attempts per fetch dropped 50%
[10:08:41] from 2 to 1
[10:10:23] so we retry less often and overall with a greater success rate
[10:10:59] …less often >per fetch< …
[10:10:59] nice!
[10:11:31] I’ll update EQIAD
[10:18:43] lunch
[12:20:47] ryankemper: created a smaller folder for testing the cookbook (https://phabricator.wikimedia.org/P64016#257349)
[12:44:48] o/
[13:38:23] \o
[13:46:40] o/
[13:54:35] dropping off my son, back in ~20
[14:24:42] back
[15:01:32] hmm, i wonder if the saneitizer needs a special mode. The problem is i'm looking at our wikitech doc replacement, and SUP doesn't really have a replacement for the 'Populate the search index' step of creating indices for a new wiki
[15:02:06] but for a tiny wiki, saneitizer could just do a quick loop over minutes instead of weeks
[15:03:06] i dunno, maybe not important. Maybe we hope the indices are created soon enough to when the wiki is created (historically, it depends :P)
[15:03:19] saneitizer will still fix it up, just takes 2 weeks
[15:07:00] in theory index creation happens at wiki creation time but it's true that without ForceSearchIndex we're not very flexible...
[15:10:11] maybe could make ForceSearchIndex.php work in non-jobqueue mode
[15:10:41] Not sure if it would be a big hack or not, would have to look around, but for the use case of a new wiki letting cirrus write 10 or 20 documents shouldn't be a big deal
[15:11:40] yes or perhaps relax the rerender queue to allow creating the pages, we could loop over allpages api and ship those
[15:11:58] the rerender events have a "reason" field we could perhaps use?
[15:13:27] what was the goal in switching from upsert to update? I suppose it shrank the request sizes, perhaps gives visibility into correctness?
[15:15:26] I think to avoid re-creating the page if it was deleted given the rerenders are processed out of order?
[15:16:37] but that might not happen a lot? since fetch very late
[15:16:46] ahh, yes
[15:17:28] a very late rerender event would now probably be a 404 so perhaps no need to worry?
[15:19:08] i suppose, yea the api should fail the cirrusdoc
[15:19:29] yes so perhaps we can revisit this and always upsert?
[15:19:52] yea that seems reasonable, we can switch to upsert which also gives editors back the null-edit fix attempt
[15:20:01] oh, that's unrelated. Not sure why i thought it was :P
[15:20:18] well, i mean on the pages missing, but yea.
[15:20:47] yes, a rerender would now have a chance to fix missing pages
[15:21:32] also I can perhaps extend the little python script I'm working on to allow using allpages without filters (it currently expects a namespace filter)
[15:22:05] might be enough to bootstrap small wikis
[15:23:03] seems reasonable
[16:00:16] * ebernhardson wonders why cindy is timing out more again
[16:28:08] Hm, according to our metrics, the constellation (PAGE_RERENDER -> _FAILURE) was the result in 2.75% of 5.3 million actions processed within the last 6h. (PAGE_RERENDER -> NOT_FOUND), which would come closest to updating a deleted page, did not occur at all.
[16:29:26] (However, _FAILURE is a catch-all bucket, so it’s unclear what exactly the failure was)
[16:32:32] So I guess we could give page_rerender-based upserts a shot.
[16:33:20] dinner
[16:37:46] dcausse: CI for rdf is failing due to scala errors in SubgraphRuleMapperUnitTest - https://integration.wikimedia.org/ci/job/wikidata-query-rdf-maven-java8/61/console did that pass for your changes?
[17:02:54] back
[17:07:32] pfischer: weird... https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/1034861 did pass (the postmerge build failed tho), will take a look tomorrow
[17:09:39] ah might be because of https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/1032745 which might remove the all arg ctor?
[17:11:02] or making it private
[17:27:29] found a problem where we are indexing redirects which should be in text form in their dbkey form, should be an easy fix but tests are fighting me :P
[17:49:45] lunch/picking up kid...back in ~1h
[18:14:08] ebernhardson: do we have some measure of the load reduction on jobqueue due to SUP?
[18:28:51] ebernhardson: did the previous search update pipeline rely on change prop? Or only on post save hooks?
[18:31:17] Actually, let me move that to slack, Olja is interested...
[19:20:12] back
[19:21:55] Gonna resume codfw search reboots in ~10m
[20:23:42] * ebernhardson realizes SUP integration testing doesn't have anything for moved pages. We might simply not have any handling for it?
[20:56:52] Stepping out for 15m, codfw elastic reboot still ongoing
[20:58:20] ACK
[21:32:04] ryankemper if you have time, could you review https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1039838 ? Just changing a namespace on the dse cluster
[21:32:48] eating some food, can look in 8’
[21:43:30] np, thanks
[21:45:46] inflatador: +1'd
[21:46:35] ACK, thanks
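
Note on the 15:11-15:22 idea of looping over the allpages API to bootstrap a small wiki: a minimal sketch of just the enumeration part, assuming the standard MediaWiki Action API (`action=query&list=allpages` with `continue` handling). This is not the python script mentioned in the log. The `iter_all_pages` name and the injected `fetch` callable are hypothetical, and since `allpages` enumerates one namespace per request, "without filters" is approximated here by looping over a caller-supplied list of namespaces.

```python
from typing import Callable, Dict, Iterable, Iterator

Params = Dict[str, str]


def iter_all_pages(
    fetch: Callable[[Params], dict],
    namespaces: Iterable[int] = (0,),
    limit: int = 500,
) -> Iterator[dict]:
    """Yield page records from list=allpages, following API continuation.

    `fetch` is a hypothetical callable that sends the given query
    parameters to a wiki's api.php and returns the decoded JSON response.
    """
    for ns in namespaces:
        params = {
            "action": "query",
            "list": "allpages",
            "apnamespace": str(ns),
            "aplimit": str(limit),
            "format": "json",
            "formatversion": "2",
        }
        cont: dict = {}
        while True:
            # Merge the continuation values from the previous response
            # into the next request, as the Action API expects.
            data = fetch({**params, **cont})
            yield from data.get("query", {}).get("allpages", [])
            cont = data.get("continue") or {}
            if not cont:
                break
```

Each yielded record carries the page id, namespace, and title. Turning those into rerender events is left out because the event schema is not described in this log.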