[11:35:56] sigh. It's lunchtime, and so far today I have refactored all the code I wrote yesterday
[14:49:35] qq, best practices for defining master vs replica read_only setting?
[14:49:47] different roles and different hiera declaring read_only for the instance?
[14:49:54] e.g., ...:master and ...:replica
[14:50:03] almost the same except different read_only setting in role hiera?
[14:50:24] or... some host specific hiera declaring master or replica?
[14:50:40] we use the same roles, we specify the mariadb "role" in hiera
[14:50:43] (master/slave/standalone)
[14:50:59] in host specific hiera?
[14:51:12] yes
[14:51:18] k will do the same then, ty
[14:52:14] hm kormat, do you have an example (perhaps with multi instance)? do i need to explicitly hiera merge things? or do I need to provide the whole profile hiera config block in the host hiera to override?
[14:53:16] look at any of our db primaries in hiera
[14:53:41] multiinstance never sets a role as they are defined to be replicas in production
[14:55:29] aye hm, i have multiinstance masters
[14:55:42] i think the only setting to vary between the two is read_only
[14:55:50] since replication setup is manual
[14:55:56] i see you vary expire_log_days
[14:56:06] annnnd, some semi_sync stuff (what is that? :) )
[14:56:20] replication monitoring needs to know
[14:57:09] oo monitor_replication
[14:57:11] semisync means that the primary doesn't acknowledge a write until at least one replica has replicated it
[14:57:28] oh wow
[14:57:30] that's cool
[14:58:07] that's a plugin? so the master waits for the replica to ACK somehow? is that through binlog or some other channel?
[14:58:45] yes re plugin (a pair, actually, one for primary, one for replicas)
[14:59:08] it's via an rpc mechanism, of some sort
[14:59:23] i don't know the specifics
[14:59:32] correct re ACK
[14:59:48] pretty cool, i guess it somehow just waits to make sure the replica executes past position X in the binlog
[15:02:37] kormat: it looks like monitor_replication can figure out how the instance should be monitored by looking at the show slave status etc. output
[15:02:51] e.g. it does
[15:02:54] unless ($status) {
[15:02:54]     printf("%s %s not a slave", $OK, $check);
[15:02:54]     exit($EOK);
[15:02:54] }
[15:07:43] monitoring needs to know the _expected_ state in order to alert you when it's not in that state. that's especially important if you're doing replication changes by hand, imo
[15:08:36] as far as i can tell though, mariadb::monitor_replication doesn't take a param like that?
[15:08:39] except for maybe... source_dc?
[15:08:45] which may not be applicable here... hmmm
[15:08:52] it's not the only monitoring
[15:08:57] oh?
[15:09:35] mariadb::monitor_readonly
[15:09:36] ?
[15:10:00] I'm not at a pc, but there's also replication lag monitoring, for example
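To make the read_only / role question from earlier in the conversation (around 14:50) concrete: the host-specific hiera pattern being described would look roughly like the sketch below. profile::mariadb::mysql_role and its master/slave/standalone values come up later in this log; the hostnames and the read_only key shown here are purely illustrative, not the actual wiring in the WMF puppet tree.

    # hieradata/hosts/db-foo1001.yaml -- hypothetical multi-instance master
    profile::mariadb::mysql_role: 'master'
    mariadb::read_only: false    # illustrative key; the real read_only handling may differ

    # hieradata/hosts/db-foo1002.yaml -- hypothetical replica
    profile::mariadb::mysql_role: 'slave'
    mariadb::read_only: true

Both hosts would apply the same role/profile; only these host-level keys differ.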
[15:10:28] aye k. looking for it. i think
[15:10:29] mariadb::monitor_replication
[15:10:29] does that
[15:10:42] but check_mariadb.pl is smart enough to not try to check that on a non-slave
[15:11:22] but that's the one that uses $source_dc
[15:11:24] nrpe_command => "${check_mariadb} --check=slave_sql_lag \
[15:11:24]     --shard=${name} --datacenter=${source_dc} \
[15:16:03] ottomata: monitor_replication is only included by puppet if the host is a replication client
[15:16:39] i'm using mariadb::instance
[15:16:51] https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/mariadb/core.pp#L53-L60
[15:16:52] looks like it is always included
[15:17:17] yea, was just reading profile::mariadb::section_params, this looks useful
[15:17:47] ottomata: mariadb::instance is only included in multiinstance profiles
[15:17:53] which are all replicas by definition
[15:18:08] oh. i'm writing a multi instance master profile... which may be my problem
[15:18:13] yes.
[15:18:34] as discussed previously, you're doing something we never do
[15:18:44] so you won't be able to directly adapt things
[15:18:48] that seems like something we should support though, right? at least for non MW things
[15:18:56] which 'we'? :P
[15:19:09] WMF SRE?
[15:19:19] unclear.
[15:19:20] does data persistence only work with the MediaWiki app?
[15:19:45] we work with a bunch of different things. we do not use multi-instance primaries in any of those installations.
[15:19:53] it's an unnecessary complication.
[15:20:16] yargh... so maybe we shouldn't be doing multi instance at all then?
[15:20:37] but we do have many different isolated databases
[15:20:44] would be a shame to have to buy hardware for each one
[15:21:16] from what i saw, you don't have a demonstrated need for that level of isolation?
[15:21:33] for load reasons? probably not (at least not yet)
[15:22:09] our misc sections (m{1,2,3,5}) all host multiple unrelated databases
[15:23:25] yeah i guess we thought it was the right thing to do from an aesthetic / maintenance perspective. isolated things should be isolated. colocating them will work most of the time though.
[15:24:04] kormat: i am close to making it work, and it may be only a param to mariadb::instance that is needed to make it work.
[15:24:17] may I keep trying? or is that a patch that you would not accept?
[15:29:13] yeah go on, let's see where it ends up
[15:31:42] kormat: qq, is the separation out of configs into things like section_ports, section_params, instance params, etc. desired or a historical artifact?
[15:32:10] if i were doing this from scratch, i'd probably have params for an instance defined in one place (e.g. port, read_only, is master, etc.)
[15:33:53] am starting to wonder if i shouldn't even use the mariadb module... except maybe for ::packages / ::packages_wmf
[15:36:02] kormat: if i'm running totally separate non-mediawiki mariadb servers... should I be using packages_wmf or just the default debian packages?
[15:36:18] i do want to use the mariadb backup support we have
[15:39:54] ottomata: section_ports needs to be defined centrally, as it's used by our tooling. it's the canonical mapping between a section name and its port in a multi-instance context
[15:40:27] section_params being defined in a single place greatly simplifies dc-switchover, as there's only one file to edit
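As a concrete illustration of the point about section_ports being the canonical section-to-port mapping: centrally, it might be declared along these lines. The key name is only modeled on the section_ports naming used above, and the port numbers are invented; only the general shape reflects what's being described.

    # hieradata/common/profile/mariadb.yaml (sketch only, not the real contents)
    profile::mariadb::section_ports:
      s3: 3313      # made-up ports, for illustration
      m1: 3321
      m2: 3322

    # puppet/tooling can iterate this single hash to render
    # /etc/wmfmariadbpy/section_ports.csv, so every consumer agrees on the mapping

Keeping that mapping in exactly one place is what lets the wmfmariadbpy tooling rely on it, just as keeping section_params in a single file simplifies dc-switchover, per the explanation above.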
[15:40:27] right, we have lots of cases like that, e.g. kafka
[15:40:39] so we define a common.yaml kafka_clusters var
[15:41:04] defining in one place makes sense, but there are params for a single instance defined in many places
[15:42:12] > i'd probably have params for an instance defined in one place (e.g. port, read_only, is master, etc.)
[15:42:17] that sounds like a maintenance nightmare
[15:42:33] why? wouldn't it be easier to see what the params are for an instance?
[15:42:42] rather than searching through multiple files?
[15:42:47] you'd still have overrides like you have them
[15:42:51] it would be the polar opposite of DRY
[15:42:53] but the global config would be in one place
[15:43:19] * kormat squints
[15:43:26] it's possible i don't understand what you're proposing
[15:43:31] re: mariadb packages, yes, use ours
[15:44:05] no, i mean, e.g. a section port would be declared in the same place as the other params for that 'section'. param overrides, like read_only or is master, would be declared in host specific hiera or something
[15:44:39] ottomata: how do you iterate over sections in order to build /etc/wmfmariadbpy/section_ports.csv ?
[15:44:49] i'm suggesting that everything that is common to a 'section' is in the same config place, likely common.yaml. overrides go in other hiera override places
[15:45:53] kormat: everything is defined in common.yaml, mariadb_core_instances or something
[15:46:01] so you iterate over each of those by section / instance name
[15:46:03] and get the port
[15:47:47] trying to figure out how that's different to hieradata/common/profile/mariadb.yaml
[15:48:02] you only have very few params there
[15:48:14] what if you wanted to set the innodb_buffer_pool size?
[15:48:22] those are defined elsewhere
[15:48:26] that's host-specific
[15:48:30] typically, at least
[15:48:35] right, so you'd put that in host specific override
[15:48:46] but in common.yaml, and not in the host hiera?
[15:48:52] yes, host hiera
[15:49:01] common.yaml has global / shared config
[15:49:06] host specific overrides in host hiera
[15:49:13] common.yaml has global / shared / default config
[15:49:14] *
[15:49:14] ok. so we're back to my question about how this differs from what we already have :)
[15:49:27] oh, just having the configs all in one place, rather than in multiple files/variables
[15:49:36] i'm having trouble finding and reasoning about them all, as a newcomer to this code
[15:49:51] in your proposal: you'd have common stuff in common.yaml, and host-specific stuff in host hiera
[15:50:00] what we have: common stuff in mariadb.yaml, and host-specific stuff in host hiera
[15:50:16] oh you'll get no argument from me about the structure being easy to follow. it absolutely isn't.
[15:50:22] it needs a vast overhaul
[15:50:38] but the section_ports/section_params thing was one of the things i did which greatly _improved_ things
[15:50:40] you don't have all common stuff in mariadb.yaml tho
[15:51:09] e.g.
[15:51:10] that's entirely possible, what do you have in mind?
[15:51:10] profile::mariadb::dbstore_multiinstance::instances
[15:51:28] or
[15:51:29] mariadb::binlog_format
[15:51:39] or
[15:51:39] profile::mariadb::mysql_role
[15:51:45] profile::mariadb::dbstore_multiinstance::instances is host-specific
[15:51:53] binlog_format is at least partially host-specific
[15:51:57] mysql_role is _definitely_ host-specific
[15:53:04] fine, except, they are different variables. i'd expect to find those kind of configs in the same top level variable, all merged together on the host to do the right thing
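Roughly, the structure being proposed in this exchange would be something like the sketch below. Everything in it is illustrative: the top-level variable name just echoes the "mariadb_core_instances or something" above, and the section name, port, and override keys are made up; nothing shaped exactly like this exists in the puppet tree.

    # common.yaml -- one deep-merged hash with the params shared by every host in a section
    mariadb_core_instances:          # hypothetical name, echoing "mariadb_core_instances or something"
      analytics_meta:                # hypothetical section/instance name
        port: 3351                   # made-up port
        binlog_format: 'ROW'         # a default; per-host overrides still possible (see below)

    # hieradata/hosts/<some-host>.yaml -- only the per-host deviations
    mariadb_core_instances:
      analytics_meta:
        is_master: true                  # or read_only: false; the exact key is illustrative
        innodb_buffer_pool_size: '10G'

The idea is that a single hash per instance keeps the shared defaults in common.yaml, while host hiera carries only the handful of per-host overrides, deep-merged on top.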
[15:53:37] e.g. i'd expect binlog_format to be the same for the same 'section' / instance
[15:53:48] oh no, it's not, that's my point
[15:54:10] e.g. have a look at https://orchestrator.wikimedia.org/web/cluster/alias/s3
[15:54:16] you'll see a mix of ROW vs STATEMENT
[15:54:21] for various reasons
[15:54:32] same for any other mw section
[15:54:54] interesting, ok so sometimes override makes sense, but i'd guess you'd want a default for a particular 'section'?
[15:55:09] aside: is that for backup / performance purposes?
[15:55:12] for non-mw sections, i _think_ some of them use the same binlog_format for all hosts
[15:55:24] (if so for those it's a section-level setting)
[15:55:35] sometimes for performance, sometimes for correctness
[15:55:57] aye
[15:55:58] e.g. parsercache is all statement replication for performance
[15:56:12] sanitarium masters are all row for correctness
[15:56:22] (or robustness, depends on how you look at it)
[15:56:40] aye
[15:56:43] the primary and primary candidate are statement, again for performance
[15:56:55] the other replicas are _usually_ row, i believe
[15:57:34] a given puppet profile usually has a default, and it's overridden on a per-host level as needed
[15:58:02] e.g. profile::mariadb::dbstore_multiinstance hard-codes ROW
[15:58:08] so no overrides there
[15:58:53] aye, ok. thanks for the info. I'm going to look at this a little bit more to see if i can adapt to a multi instance master, but likely i will give up on this refactor and keep what we have now
[15:59:08] no problem
[15:59:19] i'm heading off now, talk to you next week
[16:02:37] ok, thanks so much! laters!
[18:05:52] hey, who should i talk to about how a possible major change in how video transcodes are stored might impact swift storage & the front-end caches?
[18:14:55] godog comes immediately to mind but I'm pretty sure he's off
[18:15:12] Emperor might also but it'll be past end of day as he's in the UK
[18:16:36] and re: front-end caches, I'd talk to either bblack or ema
[18:17:48] cool, i'll catch up with everybody next week, no need to track people down on a friday ;)
[18:18:10] of interest is the possibility of splitting large files into many small files
[18:18:19] and if i should, instead, keep them in large files ;)
[18:21:36] brionv: fyi godo.g is on vacation now. Not sure how long for
[18:22:10] it'll keep, i have other stuff to finish before this is near deployable :D
[18:22:20] thanks!
[21:27:42] brionv: [not really here, but] my inclination would be to stick with larger files unless they're really very large; but I should probably understand your use case first :)
[21:28:03] * Emperor got their Openstack verification email through - only took a week(!)