[02:37:27] Hi All, I am not feeling well, down with a fever. I will be taking sick leave for Tuesday to rest and recover.
[10:05:56] get better soon :(
[10:37:36] Amir1: afaict the new es RO host looks happy, can we move forward with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1185879/2 ?
[10:45:38] go for it
[10:47:10] once that's in place, please add it to dbctl and pool it in before moving to es2050
[12:26:53] federico3: I'm investigating why many sections have way more replicas in api group than their codfw counterpart. You did provision db1253 (T385141) which is currently in api group of eqiad in s7. I can't find any place that says it should be in api group. Do you remember why it's there?
[12:26:53] T385141: Productionize db125[0-4] - https://phabricator.wikimedia.org/T385141
[12:28:24] I can't even figure out what replica it's supposed to refresh so I could check history of that host
[12:31:43] Amir1: similarly to https://phabricator.wikimedia.org/T385141#10938494 https://phabricator.wikimedia.org/T385141#10946600 , I was told to mimic the weights set on some other hosts in the same section
[12:53:54] Amir1: are there some dbs I can start moving between sections in codfw?
[12:54:49] Amir1: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1188769 when you have a sec
[12:57:55] sorry I was in a meeting
[12:58:17] it should mimic the weight of the host it's replacing
[13:03:20] Amir1: yes I'll copy from `sudo dbctl instance es2026 get`
[13:57:47] federico3: also you forgot to add back the 0 weight of the old s7 master. I added it last night: https://phabricator.wikimedia.org/P83351
[14:08:19] uhm, I saved the initial values of db2220 and db2218 before the flip: *if* we want to use the values from db2218 the main weight was 300 and it also had api with 100
[14:09:22] nope, we have way too many replicas in api group. I'm standardizing the weights
[14:10:00] if a host is only pooled in the general group -> weight 500. If it's pooled in another group too: weight in that group -> 100, weight in the general group -> 300
[14:10:18] the rest mediawiki can adjust with load monitor
[14:10:47] (the actual weight changes automatically in mw based on number of connections)
[14:12:47] ok in this new setup can a host have weight > 0 in more than one non-main group at the same time?
[14:13:21] it always has. We don't really pool hosts in api or vslow group only
[14:14:06] I'm asking about the new setup
[14:14:38] I'd like to start using Amazon's shuffle sharding and have both hosts that are shared and non-shared but that's a way-future-me problem.
[14:14:40] "If it's pooled in another group too" -> if it's pooled in more than one other group e.g. vslow and dump are they both at 100?
[14:14:55] yeah, they always should be 100 in other groups
[14:15:01] and 300 in the main group
[14:18:30] hah you reminded me of Colm from https://aws.amazon.com/blogs/architecture/shuffle-sharding-massive-and-magical-fault-isolation/ who was in an "adjacent" team at amazon :D
[14:19:25] yup, I really like that idea
[14:19:36] They made it for Route 53 IIRC
[14:20:15] (I was in the DNS and LB team but R53 is a different team)
[14:21:18] ok let me summarize the weight "classes" and then I can put a table in https://wikitech.wikimedia.org/wiki/MariaDB#Core
[14:21:40] Thanks!
[14:22:20] After MW team finishes building an auth system for mw requests, I think we can actually start properly using shuffle sharding and get rid of api group altogether
[14:24:03] https://phabricator.wikimedia.org/P83376 something like this Amir1 ?
[14:25:00] sometimes we have hosts with a main weight of 1
[14:25:43] Yup, that's my goal and I'm fixing it
[14:25:57] we should not have any host with weight of 1, it breaks the circuit breaker
[14:26:16] unless temporarily for testing a new version of mariadb or os or something like that
[14:28:04] if the paste is ok I can save it in the wiki
[14:33:59] yup, lgtm
[14:38:18] ok, it's here -> https://wikitech.wikimedia.org/wiki/MariaDB#Core_(MediaWiki_databases)
[14:47:05] thanks. I need to be afk for a bit
[14:47:10] TTYL
[17:13:02] Amir1: can I merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/1188769 ?
[17:43:03] Amir1: I added a quick basic consistency check on https://zarcillo.wikimedia.org/ui/weights
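
The weight classes agreed on in the chat (500 in the main group when a host is pooled only there; 300 in main plus 100 in each extra group otherwise; no main weight of 1 outside temporary testing) could be checked roughly as follows. This is only a sketch under stated assumptions: it is not the dbctl or zarcillo code, the function names and host names are hypothetical, and treating a main weight of 0 (as mentioned for the old s7 master) as exempt is an assumption.

```python
# Hypothetical sketch of the weight classes from the chat.
# Not the actual dbctl or zarcillo implementation.
MAIN_ONLY_WEIGHT = 500    # host pooled only in the general (main) group
MAIN_SHARED_WEIGHT = 300  # main weight when the host is also in other groups
GROUP_WEIGHT = 100        # weight in each non-main group (api, vslow, dump, ...)


def expected_weights(groups):
    """Return (main_weight, {group: weight}) for a host pooled in `groups`."""
    if not groups:
        return MAIN_ONLY_WEIGHT, {}
    return MAIN_SHARED_WEIGHT, {g: GROUP_WEIGHT for g in groups}


def check_host(name, main_weight, group_weights):
    """Return a list of human-readable problems for one host."""
    problems = []
    if main_weight == 0:
        # Assumption: weight 0 (e.g. a master) is outside these classes.
        return problems
    if main_weight == 1:
        problems.append(
            f"{name}: main weight 1 (only acceptable temporarily, for testing)"
        )
        return problems
    # Only groups with a positive weight count as "pooled".
    pooled = [g for g, w in group_weights.items() if w > 0]
    exp_main, exp_groups = expected_weights(pooled)
    if main_weight != exp_main:
        problems.append(f"{name}: main weight {main_weight}, expected {exp_main}")
    for g in pooled:
        if group_weights[g] != exp_groups[g]:
            problems.append(
                f"{name}: {g} weight {group_weights[g]}, expected {exp_groups[g]}"
            )
    return problems


if __name__ == "__main__":
    # Made-up hosts and values, for illustration only.
    hosts = {
        "db-a": (500, {}),
        "db-b": (300, {"api": 100}),
        "db-c": (300, {"vslow": 100, "dump": 100}),
        "db-d": (1, {}),
        "db-e": (500, {"api": 100}),
    }
    for name, (main, groups) in hosts.items():
        for problem in check_host(name, main, groups):
            print(problem)
```

With the made-up data above, only `db-d` (main weight 1) and `db-e` (main weight 500 while also pooled in api) would be reported; the actual per-request weight is still adjusted by MediaWiki's load monitor, which this sketch does not model.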