[20:58:30] [discord] So it's cgroup OOM, which means MariaDB must have a misconfig somewhere @CosmicAlpha, rather than something eating up all the RAM
[20:58:52] [discord] e.g. Maria is requesting more than it's allowed in the cgroup config
[21:01:20] [discord] Well yeah that makes sense, but they all should have the same amount on each of our db servers IIRC, so it didn't make sense why only one server (db121) seems to OOM often. Unless they don't actually have all the same resources.
[21:03:39] [discord] I would run ```cat /proc/meminfo``` on them all to confirm they do have the same resources, and I'll take a look at puppet to see what the config is 🙂
[21:07:02] [discord] Thanks, I'll try that in about 15 minutes. Stepped away from PC.
[21:08:07] [discord] No rush from me, just throwing ideas/suggestions
[21:08:52] [discord] Thanks a lot for that. This has been an issue I have been trying to figure out for months too, because it OOMs far too often (sometimes multiple times in just a few hours)
[21:10:33] [discord] we limited the buffer pool to 5g I think
[21:10:55] [discord] i have this change https://github.com/miraheze/puppet/pull/3013
[21:10:56] [url] mariadb: stop using thread pool by paladox · Pull Request #3013 · miraheze/puppet · GitHub | github.com
[21:11:00] [discord] dunno if that would help
[21:11:12] [discord] also we have the open tables cache at like 50k
[21:12:27] [discord] So you have two things: the cgroup setting for the service, and then the config for Maria... are the db servers dedicated nodes?
[21:13:12] [discord] well they are hosted as VMs
[21:13:28] [discord] but they have around 20g of RAM (apart from one of the smaller dbs that hosts our misc)
[21:13:48] [discord] it seems only db121 OOMs whereas the others don't
[21:13:58] [discord] https://grafana.miraheze.org/d/W9MIkA7iz/miraheze-cluster?orgId=1&var-job=node&var-node=db121.miraheze.org&var-port=9100&viewPanel=78
[21:19:17] [discord] 20G of RAM, but a 5G buffer pool? what else are you running on there
[21:19:39] [discord] Nothing else
[21:19:43] [discord] just mariadb 10.5
[21:34:49] [discord] All the same
[21:42:58] [discord] we could lower the buffer pool to 1g, lower the open cache size and also stop using the thread mode in mariadb
[21:43:21] [discord] Oh I might know why it happens only on db121
[21:43:26] [discord] oh?
[21:43:48] [discord] Because that is our parsercache database, it is used more.
[21:43:56] [discord] oh
[21:47:37] [discord] > root@db121:/srv/mariadb/parsercache# du -sh ../parsercache
[21:47:38] [discord] > 63G ../parsercache
[21:47:39] [discord] :wow:
[21:48:12] [discord] I am looking at something to see if we can improve that...
[22:12:15] [discord] https://phabricator.wikimedia.org/T245489
[22:12:16] [url] ⚓ T245489 Possibly disable optimizer flag: rowid_filter on 10.4 | phabricator.wikimedia.org
[22:12:22] [discord] wonder if we should disable that
[22:15:56] [discord] This has been super fascinating to get caught up on. Currently still in low availability, but would love to get a better handle on service/logical server structures.
[22:16:28] [discord] lol I was literally just about to say that (it's why I opened this back up)
[22:16:40] [discord] whoops. Wrong reply lol
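A quick way to run the "same resources" check suggested above (the `cat /proc/meminfo` step) is to compare each host's total RAM against the memory limit systemd/cgroups enforces on the MariaDB unit, since that limit is what triggers the cgroup OOM kill. This is only a sketch: the unit name `mariadb.service` and the host list are assumptions, not confirmed in the conversation.

```sh
#!/bin/sh
# Sketch: compare kernel-visible RAM with the cgroup memory ceiling on MariaDB.
# Assumes the systemd unit is named mariadb.service and that ssh access exists;
# extend the host list to cover the whole db fleet (db121 is the one OOMing).
for host in db121.miraheze.org; do
  echo "== $host =="
  # Total RAM the kernel sees (the /proc/meminfo comparison suggested above)
  ssh "$host" grep MemTotal /proc/meminfo
  # Memory limit and current usage for the MariaDB service cgroup; the OOM
  # killer fires inside the cgroup once usage reaches MemoryMax.
  ssh "$host" systemctl show mariadb.service -p MemoryMax -p MemoryCurrent
done
```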
[22:16:43] [discord] @paladox
[22:17:33] [discord] heh
[22:17:46] [discord] well we could just use what the WMF has the optimiser set to
[22:17:56] [discord] they use mrr which is https://mariadb.com/kb/en/multi-range-read-optimization/
[22:17:58] [url] Multi Range Read Optimization - MariaDB Knowledge Base | mariadb.com
[22:18:05] [discord] was just reading it, which is fascinating.
[22:21:01] [discord] the optimiser switch can be changed on the fly FYI
[22:21:03] [discord] it's dynamic
[22:23:33] [discord] ok... let's experiment. I'm going to set the optimiser_switch to 'mrr=on,mrr_cost_based=on,mrr_sort_keys=on,optimize_join_buffer_size=on,rowid_filter=off'
[22:23:36] [discord] @CosmicAlpha
[22:24:42] [discord] > | index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,ou
[22:24:46] [discord] is what it currently is
[22:28:03] [discord] ok, done, set.
[22:37:36] [discord] did you find anything
[22:43:23] [discord] @CosmicAlpha https://github.com/miraheze/mw-config/pull/4990 thoughts?
[22:43:24] [url] Cache: shard parsercache table by paladox · Pull Request #4990 · miraheze/mw-config · GitHub | github.com
[22:43:52] [discord] Not certain what that will do.
[22:44:39] [discord] ok we can give it a go i guess
[22:59:41] [discord] @paladox https://github.com/miraheze/puppet/pull/3035 also
[22:59:42] [url] mediawiki: add purge_parsercache cron by Universal-Omega · Pull Request #3035 · miraheze/puppet · GitHub | github.com
[22:59:53] [discord] nice
[23:00:29] [discord] shouldn't that equal 11 days, not 21?
[23:00:39] [discord] as we have wgParserCacheExpireTime set to 10
[23:03:46] [discord] Updated to 10 days. I will run it once right now to make sure it works, also @paladox
[23:03:58] [discord] thanks!
[23:04:09] [discord] note that I've truncated the table
[23:06:24] [discord] ```ps
[23:06:24] [discord] Deleting objects expiring before Sat, 26 Nov 2022 23:05:57 GMT
[23:06:26] [discord] ... 100.0% done (+1 iterations in 0.3s)
[23:06:27] [discord] ```
[23:07:21] [discord] @paladox okay to merge?
[23:07:30] [discord] yeh
[23:09:05] [discord] done
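The "Deleting objects expiring before ..." output above is what MediaWiki's `purgeParserCache.php` maintenance script prints, which is presumably the script the purge_parsercache cron in the puppet PR wraps. A rough sketch of an equivalent manual run is below; the install path, the `--wiki` value, and the 10-days-to-seconds conversion are assumptions for illustration, not taken from the PR.

```sh
# Sketch: purge parser cache rows older than 10 days (864000 seconds),
# matching the wgParserCacheExpireTime value discussed above.
# /srv/mediawiki/w and --wiki=metawiki are assumed, not from the chat.
php /srv/mediawiki/w/maintenance/purgeParserCache.php --wiki=metawiki --age=864000
```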