[16:36:29] Amir1: looking at parser cache stats since the TTL was raised back from 20 to 30 days. I see only a small usage bump on a single day, and then flat. https://grafana.wikimedia.org/d/000000106/parser-cache?orgId=1&from=now-30d&to=now&viewPanel=9 ref T280604
[16:36:29] T280604: Post-deployment: (partly) ramp parser cache retention back up - https://phabricator.wikimedia.org/T280604
[16:37:03] The flat line suggests usage is neither going up nor down, which is interesting. I was expecting a 10-day rise
[16:37:40] I think that will show up in twenty days
[16:37:54] right now, stuff that was set twenty days ago is being cleaned
[16:38:48] and also PC's storage may have changed a lot due to binlogs
[16:39:53] (given that deletions don't shrink tables, the change doesn't show itself and hides a lot of stuff, e.g. growth can be hidden now because we expanded the clusters, so the logical size got smaller, but that didn't show either)
[20:33:59] Amir1: hm.. right, I forgot that it's measuring disk space, not logical db size. But.. given a longer TTL, we expect disk space used to increase, not stay equal. But maybe the theory is that a big portion of the disk space is not the db but the binlogs, and given that for the next 10 days we basically won't delete anything, that makes for much smaller binlogs, which maybe counteracts the logical db increase for a little while.
[20:34:31] I thought the purge script acts based on a retroactive measure, not based on the TTL during set(), but maybe we changed that?
[20:34:39] since it's a parameter to the script
[20:35:24] $expireTime = (int)$this->getConfig()->get( MainConfigNames::ParserCacheExpireTime );
[20:35:24] $timestamp = time() + $expireTime - intval( $inputAge );
[20:36:09] I remember how brain-breaking this was when we changed it downward. There was a specific order of operations in either direction (mw-config change, puppet maint param change, one-off script run; not in that order)
[20:40:51] Maybe it makes sense to switch the maint script to --expiredate=now and let the TTL handle it. That at least has more understandable behaviour compared to the fragile, retroactively calculated --age logic.
[20:41:55] If you want to proactively purge stuff, e.g. akin to --age=10 when most objects are stored with ttl=30 days, then you'd do "--expiredate='10 days from now'" as a way to purge stuff earlier in an emergency (which I imagine is the only use case for having these options at all in the first place)
[21:02:18] on top of the binlog: since we added the fourth cluster, we reduced entries per table/db/cluster, but the space is not reclaimed, just reserved for future growth, and since we now added the extra 9 days of TTL, it doesn't show itself as growth of the table files, or at least most of the impact is absorbed
[21:03:57] I don't have a good reason for the change to --expiredate=now. It's nicer, but we don't have any issues with deletion to my knowledge
[23:28:29] It just seems odd that the day-over-day growth suddenly stopped when the TTL was increased, when I expected it to have been stable before and to start temporarily increasing. I guess the 4th server was still populating its (temporarily, logically duplicate) portion of the data.
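
For context on the --age vs --expiredate discussion above, here is a minimal standalone PHP sketch of the cutoff calculation quoted at 20:35:24. It is not the actual purgeParserCache.php maintenance script; the function names and the example values are illustrative, and it only assumes the quoted formula (time() + $expireTime - $inputAge) plus PHP's strtotime().

<?php
// Illustrative sketch of the two purge cutoff modes discussed above;
// not the actual purgeParserCache.php maintenance script.

// --age mode: the cutoff is derived from the TTL configured *at purge
// time* ($expireTimeSeconds), not the TTL in effect when each entry was
// written, which is why its behaviour shifts whenever the TTL changes.
function cutoffFromAge( int $ageSeconds, int $expireTimeSeconds ): int {
	return time() + $expireTimeSeconds - $ageSeconds;
}

// --expiredate mode: an absolute cutoff; entries whose stored expiry is
// earlier than this time get purged. With 'now', only entries whose own
// TTL has already elapsed are removed.
function cutoffFromExpireDate( string $date ): int {
	return strtotime( $date );
}

$day = 86400;

// With a 30-day TTL configured, an age of 20 days and an expire date of
// '+10 days' land on the same cutoff (entries written over ~20 days ago):
echo date( 'c', cutoffFromAge( 20 * $day, 30 * $day ) ), "\n";
echo date( 'c', cutoffFromExpireDate( '+10 days' ) ), "\n";

// The simpler mode proposed above: purge only what has genuinely expired.
echo date( 'c', cutoffFromExpireDate( 'now' ) ), "\n";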