[05:10:19] PROBLEM - MariaDB sustained replica lag on m1 on db1117 is CRITICAL: 63 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[05:16:13] RECOVERY - MariaDB sustained replica lag on m1 on db1117 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[07:12:15] I am keeping an eye on db1150; if the schema change (and its impact on replication) extends close to 24 hours, I will switch the backups over to a separate instance, to make sure the backups generated are fresh (replication-wise)
[10:17:25] for all hundreds (maybe millions) of people using backup-mariadb, I am about to change its default behaviour: https://gerrit.wikimedia.org/r/c/operations/software/wmfbackups/+/820664
[10:18:31] "backup-mariadb" will no longer generate a full backup, you have to write "backup-mariadb all" (or "backup-mariadb s1 s2 ...")
[10:19:03] this is to avoid accidental backup runs when checking its command line options, now it will just fail
[10:19:42] the other thing is that it will only return exit code != 0 on config or argument errors, but not if the backups themselves fail
[10:20:21] this is to prevent alert spam on the systemd unit; logs or monitoring of its metadata have to be used to check that backups completed correctly (this was already true for icinga checks)
[10:21:09] this will also make rerunning failed backups easier, one will just have to do backup-mariadb , will document when deployed on the troubleshooting part of the docs
[10:21:31] thank you for attending my TED talk
[10:46:17] apparently db1150 finished the schema? Should I then revert the change so it can be done on db1145? or should I leave it like that for the weekend?
[10:46:32] *schema change
[13:32:20] hi Amir1, I noticed that itwikisource and testwiki have the new schema for templatelinks. Can we recreate those views now or is the change in progress and more complicated than that?
[13:32:50] milimetric: change is not fully deployed yet
[13:33:11] it is WIP
[13:33:23] milimetric: sure thing jynus to my knowledge it's fully done in those wikis
[13:33:49] milimetric: just do it on those two wikis, maintain-views has support for it
[13:33:57] (no need to depool either)
[13:34:46] well, I can't run maintain-views (no SRE permissions) and Ben is out today. If you want to run it, cool, otherwise we can just wait until Monday
[13:35:55] I'll do it soon
[13:39:56] milimetric: thinking about it again, I accept bribes
[13:40:12] hahaha
[13:40:13] you know how high inflation is?
[13:40:29] ok, don't worry Amir1, we'll do it Monday, nothing is really blocked on it
[13:40:32] (nothing big)
[13:41:37] jk, I'll do it now. don't worry
[14:33:06] (works great, sorry was in a meeting, thx again!)
[15:33:04] jynus: Thank you so much <3
[15:38:41] I have an extra gift for you, but for monday
[15:41:47] https://gerrit.wikimedia.org/r/c/operations/puppet/+/820773 we will comment on it on monday
[15:49:48] so many
[15:49:49] Thanks
[15:55:30] only 3 dbs really, the others are a cleaner way to disable test hosts
[16:05:45] have a nice weekend, going away for lunch
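(note on the 10:17-10:21 backup-mariadb messages) A minimal sketch of the command-line behaviour described there, assuming an argparse-style interface. This is not the actual wmfbackups code: `parse_sections`, `main` and the `backup` callable are hypothetical names used only for illustration, and how "all" gets expanded into sections is left out.

```python
import argparse
import sys


def parse_sections(argv):
    """Require at least one positional argument: 'all' or section names."""
    parser = argparse.ArgumentParser(prog="backup-mariadb")
    parser.add_argument(
        "sections",
        nargs="+",  # no arguments is an error, so a bare call cannot start a backup
        help="'all', or one or more sections such as s1 s2",
    )
    return parser.parse_args(argv).sections


def main(argv, backup=lambda section: None):
    """Return the exit code: non-zero only for argument errors.

    `backup` is a hypothetical stand-in for whatever really runs a backup.
    """
    try:
        sections = parse_sections(argv)
    except SystemExit as exc:
        # argparse exits with status 2 on a usage error (and 0 for --help)
        return exc.code if isinstance(exc.code, int) else 2

    for section in sections:
        try:
            backup(section)
        except Exception as exc:
            # A failed backup is only logged; it does not change the exit
            # code, so the systemd unit does not alert on it.
            print(f"backup of {section} failed: {exc}", file=sys.stderr)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

With this shape, running the command with no arguments prints a usage error and exits with a non-zero status instead of starting a full backup, while a backup that fails at run time still exits 0 and has to be caught through logs or metadata monitoring.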