[08:38:11] marostegui: From SAL I assume you're preparing the circular replication for Wednesday's switchover? Can you tell me when you're done so I can do the pre-checks?
[08:42:52] claime: will do
[08:44:17] ty
[10:20:04] claime: you can proceed
[10:20:15] marostegui: fantastic, thanks
[10:23:39] marostegui: just checking, did you also change pc to rw in eqiad?
[10:25:03] claime: those are always writable
[10:25:40] marostegui: Should we remove "Verify that the parsercache servers are set to read-write in the passive DC." from the checklist then?
[10:25:58] claime: I would leave it, just in case, it doesn't take much to check
[10:26:02] ack
[16:04:58] Amir1: apropos of nothing, a three-year comparison of jobqueue traffic:
[16:04:59] https://performance.wikimedia.org/arclamp/svgs/daily/2021-04-01.excimer.RunSingleJob.svgz
[16:05:02] https://performance.wikimedia.org/arclamp/svgs/daily/2022-04-01.excimer.RunSingleJob.svgz
[16:05:06] https://performance.wikimedia.org/arclamp/svgs/daily/2023-04-01.excimer.RunSingleJob.svgz
[16:05:29] you can see the increase in load (samples per day), as well as the relative and absolute increase/decrease by job type.
[16:05:55] 399K samples/day in 2021, 510K/day in 2023.
[16:06:21] 70% is Cirrus+RefreshLinks, more or less unchanged.
[17:38:28] Krinkle: oh thanks. That looks interesting
[17:39:34] I really wish we could optimize cirrus search jobs a bit
[17:40:15] especially the multi-write piece
[17:43:40] Amir1: do we have stats on whether Cirrus jobs are effective in using parser cache?
[17:43:58] I don't think so
[17:43:59] the flame graphs naturally show most of the job time spent on cache misses with almost nothing in cache hits, but that merely means cache hits are fast.
[17:44:21] also there is a job that goes around the whole wiki to re-do indexes once a month, I assume that's also a big part of the load
[17:48:33] tbh, most of the job can be set to be done way slower for wikidata and commons which in total have 200M pages to reparse
[17:58:33] I wonder if it could e.g. perform the different elastic writes in a single job after doing the parse, and then only re-queue itself for a subset of the clusters iff the write failed. The re-queue could in theory be idempotent for a given page ID and likely leverage parser cache.
[17:58:52] I vaguely recall that search couldn't use parser cache, I think you ended up improving that a year ago or so.
[17:59:26] It seems like it should be able to use the parser cache if it's there, but I think you ended up optimising it so that for wikidata, it only generates the non-html parser output on-demand.
[17:59:46] but presumably if there is a with-html parser output object in the cache already, it could/should/does use it?
[19:31:31] Does anyone have recommendations for python libraries that handle long-running file xfers? We're basically using transfer.py for our data-transfer cookbook ( https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/wdqs/data-transfer.py ) and it no longer completes
[19:31:58] inflatador: why isn't it completing?
[19:32:34] The DBA version of transfer.py was made by jynus I think so he might have ideas
[19:32:40] we've gotten at least 3 different failure scenarios over the last week, I'm working on https://phabricator.wikimedia.org/T321605 with more info
[19:33:38] We're looking for stability rather than speed. I'd like to get rid of all the shelling out and just use pure python if possible
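
A minimal sketch of the "pure Python, no shelling out" approach raised at the end of the log, assuming paramiko is an acceptable dependency; the host, paths, and retry parameters are illustrative placeholders and are not taken from the actual wdqs data-transfer cookbook.

import logging
import time

import paramiko


def transfer_file(host, local_path, remote_path, attempts=3, backoff=30.0):
    """Copy local_path to host:remote_path over SFTP, retrying on failure."""
    for attempt in range(1, attempts + 1):
        client = paramiko.SSHClient()
        client.load_system_host_keys()
        try:
            client.connect(host)  # relies on ssh config / agent for auth
            sftp = client.open_sftp()
            # put() blocks until the whole file has been written, or raises.
            sftp.put(local_path, remote_path)
            return
        except (paramiko.SSHException, OSError) as exc:
            logging.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            # back off a little longer on each retry
            time.sleep(backoff * attempt)
        finally:
            client.close()

Given the stated goal of stability over speed, verifying the result after put() returns (for example by comparing file sizes or checksums on both ends) would be a natural addition to this sketch.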