[01:12:45] RoanKattouw: I'd like to know what your four-cornered triangle looks like. https://phabricator.wikimedia.org/T344386#9132451
[02:50:57] Reedy: did you need https://gerrit.wikimedia.org/r/c/mediawiki/core/+/954927/ for something?
[06:04:39] source maps enabled on mediawiki.org, works for me
[08:05:35] Krinkle: Not specifically, but was seeing what would be needed to trivially backport https://gerrit.wikimedia.org/r/c/mediawiki/core/+/954643
[08:06:14] in turn, due to https://gerrit.wikimedia.org/r/c/mediawiki/core/+/954641
[08:06:29] (I know I can probably just delete the large chunks of conflict instead)
[16:30:19] very promising results from applying apc.ttl to entries with an explicit TTL as well https://usercontent.irccloud-cdn.com/file/Pvug8PqN/Screenshot%202023-09-13%20at%2018.29.28.png
[16:30:34] especially as we started getting expunges again after a recent change
[20:12:36] mszabo: hm.. so if I understand correctly, you believe without this change, APCu stores entries as long as the TTL, evicting only if storage is full (and thus likely not very efficiently, e.g. oldest stored instead of least used). And after this change, you're able to use apc.ttl as a nudge to proactively GC stuff that hasn't been accessed for $ttl seconds, where $ttl is presumably significantly smaller than the TTL most values are allowed to stay in memory for, yet popular values can remain for their full specified TTL.
[20:12:40] Does that sound right?
[20:12:57] well, almost
[20:13:16] without this change, APCu cannot evict an entry if it has an explicit TTL and that TTL has not yet expired
[20:13:51] so if there is an insert coming, there is not enough memory to accommodate it, and there are not enough expired items that can be removed to free up enough space, you get a full expunge
[20:14:39] whereas with this change, once there isn't enough memory available for the next insert, it can remove not just entries that are past their TTL, but also entries not accessed in the last apc.ttl seconds
[20:14:59] which then in practice allows it to free enough memory to accommodate the insert and avoid a full expunge of the entire cache
[20:15:39] full expunge = complete apcu clearing/reset, so it's like an opcache reset.
[20:15:47] yep, removing all values from the cache
[20:15:51] I thought that changed in PHP7, wow, that's terrible.
[20:16:07] yeah it used to just fill up and then reset, so that's basically still the same.
[20:16:34] yup, that behavior is nicely visible on the left side prior to the change :)
[20:16:45] mszabo: what about latency - does it manage to do some of this work in a side thread of sorts?
[20:17:04] I think in our case we don't hit the reset maybe, just gave it more memory.
[20:17:27] Yea I don't think Wikimedia gets full expunges, as you stay well below the SHM size
[20:17:32] We inherited that model from the HHVM era, where there was no GC either. So it disciplined us to make sure we don't treat apc like memcached, don't store what you can't fit basically.
[20:17:48] I wonder what you have that makes it so that so much stuff ends up there.
[20:18:09] it's mostly ExtensionRegistry keys (top right panel), probably due to more wikis and so more load queue variations
[20:18:19] and a way smaller SHM - 768M, not gigabytes
[20:18:35] right, we don't put high cardinality values in apc often but per-wiki is probably an example where we kind of get away with it and you can't.
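To make the before/after behaviour described above concrete, here is a minimal C sketch of the two expiry rules. The struct and field names (ctime, atime, per-entry ttl, cache-wide apc.ttl) are assumptions for illustration; this models the logic as described in the conversation, not the literal php-apcu source.

```c
#include <time.h>

/* Simplified stand-ins for APCu's internal structures (names assumed). */
typedef struct {
    time_t ctime;  /* when the entry was stored */
    time_t atime;  /* when the entry was last accessed */
    long   ttl;    /* explicit TTL passed to apcu_store(); 0 = none */
} cache_entry;

typedef struct {
    long ttl;      /* apc.ttl from php.ini */
} cache;

/* Hard expiry: the entry's own TTL has elapsed since it was stored. */
static int hard_expired(const cache_entry *e, time_t now) {
    return e->ttl && (time_t)(e->ctime + e->ttl) < now;
}

/* Soft expiry: the entry has not been accessed for apc.ttl seconds. */
static int soft_expired(const cache *c, const cache_entry *e, time_t now) {
    return c->ttl && (time_t)(e->atime + c->ttl) < now;
}

/* Before the change (roughly): soft expiry only applies to entries stored
 * without an explicit TTL, so an entry with an unexpired explicit TTL can
 * never be reclaimed short of a full expunge. */
static int expired_before(const cache *c, const cache_entry *e, time_t now) {
    return hard_expired(e, now) || (!e->ttl && soft_expired(c, e, now));
}

/* After the change (the one-liner described below): drop the !e->ttl guard,
 * so any entry idle for apc.ttl seconds becomes evictable when space is
 * needed, while recently used entries keep their full TTL. */
static int expired_after(const cache *c, const cache_entry *e, time_t now) {
    return hard_expired(e, now) || soft_expired(c, e, now);
}
```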
[20:18:57] yeah, ER is an outlier as it is not just high cardinality in our case but also very large in size
[20:19:25] yeah, that's definitely on the small side for a large wiki farm. what's your opcache size like?
[20:20:27] 512M, and from utilization it seems it's only ~50% used
[20:21:02] that's incredible.
[20:21:31] we're talking k8s pods with php-fpm right?
[20:21:35] yea
[20:21:56] https://grafana.wikimedia.org/d/GuHySj3mz/mediawiki-application-php?orgId=1
[20:22:12] ah yeah, now that we do restarts after every deploy, our use dropped a lot as well.
[20:22:21] no more multiple compilations of the same file
[20:22:45] yup plus the restart also prunes rarely used files
[20:22:49] we turned opcache revalidation off as well, which in theory helps latency by reducing mtime stat lookups in native php for every require/autoloaded class.
[20:23:07] yeah, we've had opcache.validate_timestamps=0 for a while
[20:23:10] cool
[20:23:20] and I guess you're not using l10n cache in .php files
[20:23:26] that'd increase it to 1.5G or so
[20:24:17] hmm, we actually do
[20:24:58] https://grafana.wikimedia.org/d/GuHySj3mz/mediawiki-application-php?orgId=1&viewPanel=35&from=now-7d&to=now
[20:25:30] our apcu seems to peak at 1.8G and then once it's forced to clear old stuff stabilises around 1.6G
[20:26:05] hmm, what's your shm size? 4GiB still?
[20:26:10] but far less on api and jobrunner https://grafana.wikimedia.org/d/GuHySj3mz/mediawiki-application-php?orgId=1&viewPanel=35&from=now-7d&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cluster=api_appserver&var-server=All
[20:26:17] 500M on api appservers
[20:27:07] https://codesearch.wmcloud.org/puppet/?q=apc.shm&files=mediawiki&excludeFiles=&repos=
[20:27:24] yeah, 4G on most appservers, 6G for some
[20:27:36] Hm.. I wonder if I'm misinterpreting something here
[20:27:51] https://github.com/wikimedia/operations-puppet/blob/ef0f8341eeae7da05c019ad126a2fbb9da80b58d/hieradata/role/common/mediawiki/appserver/api.yaml#L40
[20:27:53] interesting
[20:27:58] 4G on api cluster
[20:29:49] units seem correct.
[20:29:52] last 1y not very different
[20:30:16] and given the switch from graphite to prometheus, no more retention beyond 1y unfortunately
[20:30:42] I'm guessing our stable equilibrium used to be higher when we didn't restart every deploy
[20:31:12] which, given your team's php-apcu change, I guess was largely made up of unevictable entries that weren't used.
[20:31:36] we actually did a sprint around 2015 where we explicitly triaged all apcu use and made sure a ttl was set
[20:31:56] to avoid OOM in HHVM
[20:32:09] yeah, before a change recently caused us to have expunges again, our usage would also grow e.g. over the weekend
[20:32:42] we now also typically hold opcache and apcu for only 1 MW version
[20:32:54] https://grafana.wikimedia.org/d/GuHySj3mz/mediawiki-application-php?orgId=1&viewPanel=35&from=now-7d&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cluster=api_appserver&var-server=All chart shows the usual trends
[20:33:04] instead of multiple, assuming by Thursday we're on 1 version and so the last restart removes any other MW version related keys and compiled data
[20:33:09] deployment at 09/07 evening, rapid fill of ER and other common keys, slow gradual growth with other keys
[20:33:21] and then 24h later the first ER keys and other TTL_DAY keys expire and usage drops
[20:34:31] in any case, we will observe our change over this week and weekend and if all works out, we'll probably try to upstream it behind an INI flag
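If the behaviour does get upstreamed behind an INI flag as floated above, the switch would presumably sit in the same expiry predicate. A purely hypothetical sketch, mirroring the earlier one: the flag name soft_ttl_for_all and the struct fields are invented for illustration.

```c
#include <time.h>

/* Invented stand-ins, same shape as the earlier sketch. */
typedef struct {
    time_t ctime;            /* stored at */
    time_t atime;            /* last accessed at */
    long   ttl;              /* per-entry TTL; 0 = none */
} cache_entry;

typedef struct {
    long ttl;                /* apc.ttl */
    int  soft_ttl_for_all;   /* hypothetical INI flag gating the new behaviour */
} cache;

static int entry_expired(const cache *c, const cache_entry *e, time_t now) {
    /* hard expiry: the entry's own TTL has elapsed */
    if (e->ttl && (time_t)(e->ctime + e->ttl) < now) {
        return 1;
    }
    /* soft expiry: idle for apc.ttl seconds. With the flag off, this only
     * applies to entries stored without an explicit TTL (the behaviour
     * described earlier); with it on, it applies to every entry, as in the
     * Wikia patch. */
    return (c->soft_ttl_for_all || !e->ttl)
        && c->ttl && (time_t)(e->atime + c->ttl) < now;
}
```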
[20:34:35] how does apcu GC? It seems to do it lazily during apcu_store if-and-only-if there is no space, right?
[20:34:59] yeah, and there is opportunistic cleanup in apcu_store() if there happen to be expired items in the same hash table slot being inserted into
[20:35:02] so how would that result in a big drop, other than if the stored key is smaller than the evicted one. but that would also mean it won't evict more for a while
[20:35:14] right, so it always does the whole slab
[20:36:12] I wonder how your php-apcu change impacts latency, given today we basically never engage soft expiry since practically nothing in MW is stored without a ttl
[20:36:27] seems like it'd be doing more work, but is it significant?
[20:36:35] yep, if it finds it does not have enough memory for the entry being stored, then it iterates through all entries, removes expired ones, then checks if that freed up enough memory
[20:36:37] lock contention as well.
[20:39:04] so before this Monday (when we removed some content from apcu which paradoxically caused it to have full expunges again), we didn't trigger full expunges
[20:39:16] but performance would badly degrade if left running for extended periods e.g. the weekend
[20:39:29] the Sunday daily flamegraph puts APCu's share at a whopping 7%
[20:39:37] today's incomplete graph puts it at 1.3%
[20:39:42] wall time
[20:40:33] so before the current change, it would iterate over the whole cache, and actually manage to find enough expired items to insert the next one and so avoid an expunge
[20:40:42] however, it didn't prevent growth of cached item count over time
[20:40:55] so that iteration would become more and more expensive as we proceeded into the weekend
[20:41:12] entry count would go from ~30K to nearing ~80K
[20:41:56] now that we consider apc.ttl for all items, the cache size is stable around 25-30K entries, so there is no growth over time
[20:48:05] mszabo: hm.. so previously it looked only for expired items and (after hard looking) always found enough to make space, and now it's also proactively pruning unexpired-but-unused items more readily.
[20:48:42] I guess they happen in the same pass.
[20:49:05] what value did you use for apc.ttl as the unused threshold?
[20:52:40] 10 minutes based on the time-to-expunge that we observed this week
[20:52:53] you can probably get away with a higher amount ;)
[20:57:46] and it's all in the same pass. it was a one-line change: https://github.com/Wikia/apcu/pull/1/commits/f9314aff6308c7ef82ae9938d53d1e89555cfe4d
[21:32:30] mszabo: right. apc_cache_entry_soft_expired controls 1 entry, called from apc_cache_entry_expired (also for 1 entry, checks both hard and soft expired), which in turn is called either from apc_cache_wlocked_insert or from apc_cache_default_expunge (as "sma->expunge").
[21:33:02] apc_cache_wlocked_insert says something about runtime cleanup to reduce expired items in the linked list.
[21:33:19] yeah, but that'd only occur if there were other items in the same hash table slot (hash collision)
[21:33:45] it does not seem to amount to much, based on the behavior at least
[21:35:40] ah you mean there's usually a 1:1 between a key and the linked list this iterates? I thought each linked list there is akin to a memc slab with a fairly large number of items in each one.
[21:36:02] the implication of that doing eviction is presumably that in practice we won't apply the soft ttl just when we need space but basically all the time.
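A rough C sketch of the make-room path described earlier in this exchange: when the next insert doesn't fit, walk every entry, drop the expired ones, and only wipe the whole cache (a full expunge) if that still didn't free enough space. The names and data layout are invented; the real implementation works on shared-memory slots under a write lock.

```c
#include <stddef.h>
#include <time.h>

/* Invented, simplified layout: a singly linked list of entries with sizes. */
typedef struct entry {
    struct entry *next;
    size_t        mem_size;  /* bytes this entry occupies */
    time_t        ctime, atime;
    long          ttl;       /* per-entry TTL; 0 = none */
} entry;

typedef struct {
    entry  *head;
    size_t  avail;           /* free bytes in the shared memory segment */
    long    ttl;             /* apc.ttl */
} cache;

/* Expired under the patched rule: own TTL elapsed, or idle > apc.ttl. */
static int expired(const cache *c, const entry *e, time_t now) {
    return (e->ttl && (time_t)(e->ctime + e->ttl) < now)
        || (c->ttl && (time_t)(e->atime + c->ttl) < now);
}

/* "Full expunge": drop every entry and reclaim all of its memory. */
static void wipe(cache *c) {
    while (c->head) {
        entry *dead = c->head;
        c->head = dead->next;
        c->avail += dead->mem_size;
    }
}

/* Called when an insert of `needed` bytes doesn't fit: sweep expired
 * entries first, and only wipe everything if that still isn't enough. */
static void make_room(cache *c, size_t needed, time_t now) {
    entry **p = &c->head;
    while (*p) {
        if (expired(c, *p, now)) {
            entry *dead = *p;
            *p = dead->next;
            c->avail += dead->mem_size;
            /* the real code also returns the block to the shm allocator */
        } else {
            p = &(*p)->next;
        }
    }
    if (c->avail < needed) {
        wipe(c);   /* the entire cache is cleared, like an opcache reset */
    }
}
```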
[21:36:19] probably not a big difference in practice since we'll eventually be at size anyway
[21:36:59] and it's doing it to reduce the cost of insertions, so the cost of eviction could be offset by the win of fewer soon-to-be-evicted items to iterate past. interesting
[21:37:34] but I'm guessing then that broadly speaking you haven't seen notable changes (or perhaps reduction?) in appserver latency percentiles?
[21:38:00] not over the last day or so, but we will compare the upcoming weekend with the last one
[21:38:17] as the performance degradation over weekends used to be significant
[21:41:56] the weekend effect https://usercontent.irccloud-cdn.com/file/lx6mTyGw/Screenshot%202023-09-13%20at%2023.41.34.png
[22:00:50] ack
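And a small sketch of the insert-time cleanup discussed above: before linking a new entry into its hash slot, expired entries already sitting in that slot's collision chain are dropped, which is cleanup proportional to one chain rather than the whole cache. Names and layout are again invented, locking and freeing omitted, and the expiry predicate is the same one as in the earlier sketches. Because only the one chain is walked, the effect is limited when keys rarely collide, which matches the observation that it "does not seem to amount to much".

```c
#include <stddef.h>
#include <time.h>

/* Invented, simplified hash-slot layout: each slot is a collision chain. */
typedef struct slot_entry {
    struct slot_entry *next;
    time_t             ctime, atime;
    long               ttl;      /* per-entry TTL; 0 = none */
} slot_entry;

typedef struct {
    slot_entry **slots;   /* array of collision chains */
    size_t       nslots;
    long         ttl;     /* apc.ttl */
} hash_cache;

static int is_expired(const hash_cache *c, const slot_entry *e, time_t now) {
    return (e->ttl && (time_t)(e->ctime + e->ttl) < now)
        || (c->ttl && (time_t)(e->atime + c->ttl) < now);
}

/* Insert-time housekeeping: walk only the chain the new entry hashes to,
 * unlink anything expired, then prepend the new entry. The cost is bounded
 * by that one chain, which is why it helps insert latency but does little
 * for overall cache growth when collisions are rare. */
static void insert_with_cleanup(hash_cache *c, size_t slot_idx,
                                slot_entry *fresh, time_t now) {
    slot_entry **p = &c->slots[slot_idx];
    while (*p) {
        if (is_expired(c, *p, now)) {
            *p = (*p)->next;   /* the real code also frees the entry */
        } else {
            p = &(*p)->next;
        }
    }
    fresh->next = c->slots[slot_idx];
    c->slots[slot_idx] = fresh;
}
```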