[01:07:28] PROBLEM - MariaDB sustained replica lag on m1 on db2160 is CRITICAL: 6.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[01:10:20] RECOVERY - MariaDB sustained replica lag on m1 on db2160 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[05:33:29] Amir1: Any idea why this has stopped being readable? https://noc.wikimedia.org/dbconfig/eqiad.json
[07:08:34] It works for me
[07:09:08] In a readable format?
[07:09:26] well, firefox makes it readable for me
[07:09:51] indeed, it is chrome :(
[07:09:58] but I guess depending on what you mean by readable :-D
[07:10:09] just tried with firefox, and it is readable :)
[07:10:15] I wonder what chrome has changed :-/
[07:14:03] maybe we changed the content to be minified now?
[07:14:22] there is a json formatter extension for chrome AFAICT
[07:14:44] I was just looking for that in the store XD
[07:15:25] fixed with that
[07:45:11] yesterday i had the same feeling on that file. can't sleep 🥱
[07:46:23] get that extension installed and you'll fall asleep in no time
[07:53:05] alias dbconfig_eqiad='curl -sSL https://noc.wikimedia.org/dbconfig/eqiad.json | jq "." | bat -l json' I've put this in my shell, marostegui
[07:53:09] if it helps
[07:53:50] Oh that's actually cool
[07:53:53] Thanks!
[07:57:40] 😄 apropos that file, for schema changes, is it about accurate to say that things break down into the following workflows?
[07:57:52] 1. if a schema-change-in-production tagged change is *not* adding a new table, most likely it'll be an extension-side change from the abstractSchemaChanges folder whose statements will get copied into a change in the schema-changes repo, then deployed via auto_schema (which usually will use that file or the other dc file)
[07:57:59] 2. if stuff is *only* adding a new table, most likely anyone with deployer rights will do https://wikitech.wikimedia.org/wiki/Creating_new_tables
[08:00:56] 3. if stuff is *not* adding a new table and it's core / extension-side stuff and it's been agreed upon in Phabricator / IRC / whatever, it might sometimes be the case that auto_schema won't be used and someone will just use sql.php ... or is this never to be the case?
[08:01:50] you are probably answering someone else, but I wonder if you are confusing eqiad.json with tables.json?
[08:02:58] 4. if stuff isn't a core / extension-side change it'd probably happen with auto_schema, or perhaps be applied directly (maybe with something left in puppet land repos)
[08:04:02] in any case you will probably get most of the answers at https://wikitech.wikimedia.org/wiki/Schema_changes
[08:05:06] jynus: i was referring to https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/+/refs/heads/master/dbtools/auto_schema/auto_schema/config.py#49
[08:06:31] i've been reading through that and going back through old tickets and patches and stuff. and i just wanted to make sure i had it right. i was wondering if i was interpreting the documentation and then the real-world practices about right? or if there may be other nuances (i mean, schema changes are hard and have plot twists)
[08:10:16] I will let the DBAs answer for themselves, but my guess is they will choose the best/preferred method on a case by case basis, based on factors like risk, performance and ease/availability
[08:11:17] yeah, i was sort of taking that impression, pareto principle and all that.
[08:12:14] auto_schema is not so much a schema deployment system as an automation for maintenance in general, which can be used for several tasks (including schema changes)
[08:12:16] dr0ptp4kt: Whatever is creating a new table, we usually like to review it, but we don't execute it. Any change that touches a table (adding or removing a column or an index, changing a data type etc) we deploy via auto_schema, unless it is something that only touches metadata (adding an enum for instance), which we could potentially deploy directly on the masters
[08:14:44] ah okay, thank you jynus and marostegui - i think i do understand, then! okay, then i wanted to verify this, too:
[08:17:49] as far as actual replication of alter statements to wiki replicas, what i infer is that it would be atypical for those to be applied directly to a sanitarium (db1154, db1155, db2186, db2187). rather, i think the sanitariums (sanitaria?) get their alters propagated via a source that'll (not coincidentally) be in https://noc.wikimedia.org/dbconfig/{dc}.json . and the clouddb#### then get the alters by way of a sanitarium
[08:18:05] is that right?
[08:18:41] dr0ptp4kt: We deploy them on the sanitarium master (that is the host above db1154, db1155, db2186, db2187) and it gets replicated to sanitarium and then to wiki replicas via the replication channel
[08:27:00] (happy to hop on video if it'll be easier, and could replay back here; i don't know if my typed words are making sense - not that spoken ones will be that much better at this hour!)
[08:28:31] So, we deploy it on db1196
[08:28:40] And it automatically goes to db1154 and then to wikireplicas
[08:28:43] All through replication
[08:30:46] aha! okay, sweet. tysm marostegui ! now i'll go listen to some related metallica...
[08:31:01] hahaha, sleep well
[08:35:26] dr0ptp4kt: for when you wake up, this might help https://wikitech.wikimedia.org/wiki/Auto_schema
[08:36:51] thanks Amir1 !
[08:38:24] the reason auto_schema needs the eqiad.json file is to find the master; from the master it then takes all replicas
[08:43:05] looking at replica_set.py and config.py a little, got it i think - that's neat.
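Editor's sketch of the point made at [08:38:24] and [08:28:31]: read the master for a section out of a dbconfig-style JSON document, then walk the replication tree from that master to reach every replica, including the sanitarium chain (db1196, then db1154, then wiki replicas). This is not auto_schema's actual code. The `sectionLoads` layout (first dict holds the master), the section name `sX`, the `db-*-example` and `clouddb-example` host names, the `TOPOLOGY` mapping, and the placement of db1196 under that particular master are all assumptions made for illustration.

```python
import json

# Hypothetical excerpt of a dbconfig-style file. The layout (a list whose
# first dict holds the master and whose second dict holds replicas) is an
# assumption, and the "*-example" host names are made up.
SAMPLE_DBCONFIG = json.loads("""
{
  "sectionLoads": {
    "sX": [
      {"db-master-example": 0},
      {"db-replica-example": 100}
    ]
  }
}
""")

# Hypothetical replication topology: host -> hosts replicating directly from it.
# Only the db1196 -> db1154 -> wiki replica chain comes from the conversation.
TOPOLOGY = {
    "db-master-example": ["db-replica-example", "db1196"],
    "db1196": ["db1154"],
    "db1154": ["clouddb-example"],
}


def find_master(config, section):
    """Return the master host name for a section (assumed layout, see above)."""
    master_part = config["sectionLoads"][section][0]
    (master,) = master_part  # exactly one key is expected in the master dict
    return master


def all_replicas(topology, host):
    """Collect every host below `host`, depth first, following replication."""
    found = []
    for child in topology.get(host, []):
        found.append(child)
        found.extend(all_replicas(topology, child))
    return found


if __name__ == "__main__":
    master = find_master(SAMPLE_DBCONFIG, "sX")
    print(master, all_replicas(TOPOLOGY, master))
```

In this model a change applied on db1196 only needs replication to reach db1154 and the wiki replica below it, which matches "All through replication" above.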
[08:50:44] arnaudb: I'm going to run the schema change for pagelinks in s3 master as I need to make a config change afterwards
[08:50:57] noted!
[08:52:17] Amir1: remember last day of maintenance today :)
[08:52:33] that's why I'm rushing to do everything :(((
[12:42:08] Amir1: I have a first draft; I may need suggestions on how to make the text clear: https://phabricator.wikimedia.org/T346233#9163481
[13:25:57] thanks!
[17:57:11] PROBLEM - MariaDB sustained replica lag on m1 on db2132 is CRITICAL: 180 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[17:57:37] PROBLEM - MariaDB sustained replica lag on m1 on db1217 is CRITICAL: 257 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[18:16:42] m1 lag?
[18:19:33] RECOVERY - MariaDB sustained replica lag on m1 on db1217 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[18:20:27] RECOVERY - MariaDB sustained replica lag on m1 on db2132 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[18:21:47] PROBLEM - MariaDB sustained replica lag on m1 on db2160 is CRITICAL: 296 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[18:27:27] PROBLEM - MariaDB sustained replica lag on m1 on db2132 is CRITICAL: 244 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[18:27:57] PROBLEM - MariaDB sustained replica lag on m1 on db1217 is CRITICAL: 273 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[18:36:22] we're doing a librenms schema migration, maybe it is that
[18:36:41] ah, I just saw it
[18:36:59] let me downtime so it doesn't alert many times until tomorrow
[18:37:06] thank you
[18:37:30] please log something like ongoing schema changes on m1 for librenms
[18:37:46] will do so now
[18:38:48] {{done}}
[18:39:40] I've downtimed it, but it will log a recovery one more time
[19:07:31] RECOVERY - MariaDB sustained replica lag on m1 on db1217 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[19:11:21] RECOVERY - MariaDB sustained replica lag on m1 on db2132 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[19:41:40] RECOVERY - MariaDB sustained replica lag on m1 on db2160 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321