[08:38:11] marostegui: From SAL I assume you're preparing the circular replication for Wednesday's switchover? Can you tell me when you're done so I can do the pre-checks?
[08:42:52] claime: will do
[08:44:17] ty
[10:20:04] claime: you can proceed
[10:20:15] marostegui: fantastic, thanks
[10:23:39] marostegui: just checking, did you also change pc to rw in eqiad?
[10:25:03] claime: those are always writable
[10:25:40] marostegui: Should we remove "Verify that the parsercache servers are set to read-write in the passive DC." from the checklist then?
[10:25:58] claime: I would leave it, just in case, it doesn't take much to check
[10:26:02] ack
[16:04:58] Amir1: apropos of nothing, a three-year comparison of jobqueue traffic:
[16:04:59] https://performance.wikimedia.org/arclamp/svgs/daily/2021-04-01.excimer.RunSingleJob.svgz
[16:05:02] https://performance.wikimedia.org/arclamp/svgs/daily/2022-04-01.excimer.RunSingleJob.svgz
[16:05:06] https://performance.wikimedia.org/arclamp/svgs/daily/2023-04-01.excimer.RunSingleJob.svgz
[16:05:29] you can see the increase in load (samples per day), as well as the relative and absolute increase/decrease by job type.
[16:05:55] 399K samples/day in 2021, 510K/day in 2023.
[16:06:21] 70% is Cirrus+RefreshLinks, more or less unchanged.
[17:38:28] Krinkle: oh thanks. That looks interesting
[17:39:34] I really wish we could optimize cirrus search jobs a bit
[17:40:15] especially the multi-write piece
[17:43:40] Amir1: do we have stats on whether Cirrus jobs are effective in using parser cache?
[17:43:58] I don't think so
[17:43:59] the flame graphs naturally show most of the job time spent on cache misses with almost nothing in cache hits, but that merely means cache hits are fast.
[17:44:21] also there is a job that goes around the whole wiki to re-do indexes once a month, I assume that's also a big part of the load
[17:48:33] tbh, most of the job can be set to be done way slower for wikidata and commons which in total have 200M pages to reparse
[17:58:33] I wonder if it could e.g. perform the different elastic writes in a single job after doing the parse, and then only re-queue itself for a subset of the clusters iff the write failed. The re-queue could in theory be idempotent for a given page ID and likely leverage parser cache.
[17:58:52] I vaguely recall that search couldn't use parser cache, I think you ended up improving that a year ago or so.
[17:59:26] It seems like it should be able to use the parser cache if it's there, but I think you ended up optimising it so that for wikidata, it only generates the non-html parser output on-demand.
[17:59:46] but presumably if there is a with-html parser output object in the cache already, it could/should/does use it?
[19:31:31] Does anyone have recommendations for python libraries that handle long-running file xfers? We're basically using transfer.py for our data-transfer cookbook ( https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/wdqs/data-transfer.py ) and it no longer completes
[19:31:58] inflatador: why isn't it completing?
[19:32:34] The DBA version of transfer.py was made by jynus I think so he might have ideas
[19:32:40] we've gotten at least 3 different failure scenarios over the last week, I'm working on https://phabricator.wikimedia.org/T321605 with more info
[19:33:38] We're looking for stability rather than speed. I'd like to get rid of all the shelling out and just use pure python if possible
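
A minimal sketch of the "pure Python, no shelling out" approach raised at the end of the log, assuming paramiko is an acceptable dependency; the host, paths, and retry parameters are illustrative placeholders and are not taken from the actual wdqs data-transfer cookbook.

import logging
import time

import paramiko


def transfer_file(host, local_path, remote_path, attempts=3, backoff=30.0):
    """Copy local_path to host:remote_path over SFTP, retrying on failure."""
    for attempt in range(1, attempts + 1):
        client = paramiko.SSHClient()
        client.load_system_host_keys()
        try:
            client.connect(host)  # relies on ssh config / agent for auth
            sftp = client.open_sftp()
            # put() blocks until the whole file has been written, or raises.
            sftp.put(local_path, remote_path)
            return
        except (paramiko.SSHException, OSError) as exc:
            logging.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            # back off a little longer on each retry
            time.sleep(backoff * attempt)
        finally:
            client.close()

Given the stated goal of stability over speed, verifying the result after put() returns (for example by comparing file sizes or checksums on both ends) would be a natural addition to this sketch.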