[09:41:53] <_joe_> In production, where we have 900 languages and 6 GB worth of l10n data, it was evaluated to have a cost larger than the benefit.
[10:44:12] <Krinkle> Last I heard was "Memory usage is higher with LCStoreArray, but not significantly enough to be a worry in our current setup.", after confirming up to a 5% latency win: https://phabricator.wikimedia.org/T99740#6085180
[10:45:38] <Krinkle> I'd expect a 1-2 GB theoretical max, not 6? Our APCu is also over-allocated, last I checked. Afaik there are no blockers at this time, besides scheduling and awaiting other projects (k8s, php8).
[10:50:56] <Krinkle> More recent data at https://phabricator.wikimedia.org/T99740#6111784, but the same conclusion, given one deploy per container.
[11:24:28] <_joe_> 6 GB is the space on disk, yes; the expectation is to need 1 GB per version IIRC. I'm still convinced the larger memory usage per pod will require us to have fatter pods (because the cost is independent of the number of workers you're running) and thus worse performance under decent usage of resources. But I'm willing to experiment with that once we're fully on kubernetes.
[15:25:21] <Krinkle> _joe_: the PHP files are smaller than the CDB files, last we checked. Before 2018 the array format was more verbose; in 2018 it was improved to be smaller. It's reasonable to question this, since binary formats should be smaller, but CDB is optimised for fast reads, so I imagine it's probably not gzipped and has a lookup table. The PHP, on the other hand, can count on a giant parser and opcache infra to leverage, so the format doesn't need any of that. It's worth double-checking this on latest master.
[15:25:54] <Krinkle> https://phabricator.wikimedia.org/T99740#4609279 – 874 KiB for this slim PHP version, …, 980 KiB for the CDB version.
[15:26:14] <_joe_> Krinkle: yes, that's what I was saying: the CDB files are 6 GB of disk we'd trade for a smaller footprint, but in RAM
[15:26:22] <Krinkle> oh I see, got it.
[15:26:54] <_joe_> my point is that RAM is a more scarce resource, so it might or might not be beneficial in the global scale of things
[15:31:23] <Krinkle> ack, I'm thinking about it from the POV of T302623, and how few things there are that have a >1% latency win with essentially only (albeit non-trivial) configuration. Closing some sources of latency outliers by relying on disk less. I'm sure there are endpoints that will benefit from this more than others, including for example load.php warm-up after deployments, which needs to do a large number of l10n reads. The alternative there, long-term, would be requiring a warm-up script as part of pod start, which might be feasible as well. But prebuilding things into opcache definitely has the potential to cut many complex things out. Ultimately, I suppose we could make it more formal and say X latency gains increase effective pod capacity and thus justify a Y RAM increase. We could meet that target by working on other latency gains first to pay down the budget (although we've done many such things by now since 2018, so maybe we've created that space already, if we measure from back then).
[15:31:28] T302623: FY2022-2023: Improve Backend Pageview Timing - https://phabricator.wikimedia.org/T302623
[15:32:04] <Krinkle> also thinking about per-request memory, which has been increased each year with seemingly no pushback. Is that not more impactful?
[15:33:09] <Krinkle> Once we agree on a goal post, I'm sure we can trade some goods and get there. We've not done any work, afaik, to reduce per-request memory limits. We totally could, though.
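
For context on the two backends compared above: MediaWiki selects its l10n cache store via $wgLocalisationCacheConf, where 'files' is the CDB-on-disk store and 'array' is the static-PHP-array store that opcache can keep resident. A minimal sketch of that choice, assuming current settings names; the values are illustrative, not the actual production wmf-config:

```php
<?php
// The two LCStore backends under discussion, selected via
// $wgLocalisationCacheConf. Paths and values are illustrative.

// CDB files on local disk (what production uses today), rebuilt at
// deploy time rather than on demand:
$wgLocalisationCacheConf = [
	'store' => 'files',                              // LCStoreCDB
	'storeDirectory' => '/srv/mediawiki/cache/l10n', // one .cdb file per language
	'manualRecache' => true,
];

// The alternative evaluated in T99740: static PHP arrays kept hot in opcache.
// Per the discussion above, this trades ~6 GB of on-disk CDB data for roughly
// 1 GB of RAM per deployed MediaWiki version, in exchange for fewer disk reads.
$wgLocalisationCacheConf['store'] = 'array';         // LCStoreStaticArray
```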
[16:47:08] <bd808> RelEng planning has started talking about single-version containers once we get fully deployed on k8s. Would having only one MW version of strings in a pod make any meaningful difference in the grand scheme of cached i18n RAM usage?
[16:48:32] <bd808> Naively it feels like it would be cut in half, but in practice probably not exactly, as we would at least half the time have version N and N+1 pods running (group0 -> 1 -> 2 train progression)
[17:36:26] <_joe_> bd808: if anything, I think that single-version containers would create some more problems than they'd solve
[17:36:50] <_joe_> I've gone back and forth on this; there are some unsolved engineering and operating issues we need to iron out
[17:37:38] <bd808> Bold to say that they are more problematic in the long term, but I agree there are still lots of things to consider
[17:38:17] <_joe_> oh no, I said that *right now* they would. I'm just saying it's not as simple as having a proper dispatcher of requests
[17:38:53] <bd808> :nod:
[17:39:21] <_joe_> there are quite a few problems to balance and consider carefully. Right now our traffic is divided by purpose, which is very useful when we're under pressure
[17:39:51] <_joe_> we can sacrifice the internal api requests or jobs while giving more resources to the web requests or to the external api clients
[17:40:22] <_joe_> so we would probably want to keep that distinction
[17:40:46] <_joe_> but then you have 3x the clusters to reason about
[17:41:10] <_joe_> and they're gonna be of very different sizes. Group 0 receives basically no traffic
[17:41:17] <_joe_> group1 is maybe 10%?
[17:42:21] <_joe_> so while I agree there would be quite a few advantages (including the ability to deploy selectively, smaller image sizes, smaller memory/disk footprint at runtime), we need to do something about the rest
[17:42:44] <_joe_> maybe one day we'll have an amazing horizontal pod autoscaler we trust and most of these problems will be gone.
[17:45:10] <bd808> I think we can figure it out. We've been working for 7 years to get to this point in the mw-on-k8s saga, so I guess we are not afraid of challenging problems with unknown time commitments. :)
[17:46:10] <bd808> the per-version traffic imbalance is an aspect of this I had not thought much about yet
[17:48:08] <bd808> _joe_: when you mention 3x clusters, does that mean you think these use cases need full k8s control plane separation from each other, or just that there would be more groupings in the same k8s hosting cluster to reason about?
[17:48:45] <bd808> If the answer is "it depends" or "it's complicated" at this point, that's fine too :)
[17:48:52] <_joe_> sorry, yeah, overloaded terms
[17:48:56] <_joe_> "mediawiki deployments"
[17:49:11] <bd808> :nod:
[17:49:12] <_joe_> right now we have
[17:49:34] <_joe_> mw-web, mw-api-{ext,int}, mw-jobrunner, mw-parsoid
[17:50:48] <_joe_> the other engineering problem is what is your source of truth about which hostname is in which group, and how to sync that information across layers, but that's relatively easy to solve
[17:52:08] <bd808> I have casually assumed etcd storage of that stuff, but again not pondered the consequences and tradeoffs deeply
[17:53:09] <bd808> This is all very much in the "it must be possible, we just need to figure out how" stage of thinking for me and I think the other folks I've been chatting with.
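
The "source of truth about which hostname is in which group" question at the end maps onto machinery the deployment train already has: wikiversions.json maps each wiki dbname to a deployed MediaWiki version, and the multiversion layer resolves hostnames against it. A minimal sketch of that lookup in PHP; the helper names, version strings, and the mw-web-next deployment name are hypothetical, not the real wmf-config code:

```php
<?php
// Sketch of the "which version serves this host" lookup a per-version
// dispatcher would need: hostname -> wiki dbname -> deployed version.
// hostToDbName() and the deployment names below are hypothetical.

/** Toy hostname -> dbname resolution, for illustration only. */
function hostToDbName( string $host ): string {
	// "en.wikipedia.org" => "enwiki"; the real code knows every site family.
	[ $lang, $family ] = explode( '.', $host, 3 );
	return $family === 'wikipedia' ? "{$lang}wiki" : "{$lang}{$family}";
}

/** Look up the MediaWiki version for a host from a wikiversions.json-style map. */
function getVersionForHost( string $host, string $mapFile ): ?string {
	// e.g. { "testwiki": "php-1.43.0-wmf.5", "enwiki": "php-1.43.0-wmf.4", ... }
	$wikiVersions = json_decode( file_get_contents( $mapFile ), true );
	return $wikiVersions[ hostToDbName( $host ) ] ?? null;
}

// A routing layer fed from this map (or from etcd, as suggested above) could
// then pick the matching per-version deployment, e.g.:
//   "php-1.43.0-wmf.5" => mw-web-next  (group0/group1 wikis)
//   "php-1.43.0-wmf.4" => mw-web       (group2 / everything else)
```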