[05:11:45] PROBLEM - MariaDB sustained replica lag on m1 on db2078 is CRITICAL: 105 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2078&var-port=13321
[05:17:05] RECOVERY - MariaDB sustained replica lag on m1 on db2078 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2078&var-port=13321
[07:39:36] just disabled prometheus.service on db1144 (as it is multi-instance) - it was alerting
[07:39:41] i guess it was reimaged yesterday?
[11:45:01] marostegui: yes, that was my fault
[11:45:14] np, it was quick
[11:45:22] I forgot to check it because we decided to stop
[11:45:50] Thanks
[11:46:12] oh, swift-ring-builder why do you make me so sad?
[11:48:32] FYI I am going to file a feature request for mariabackup to implement upstream a percona xtrabackup feature for T253959
[11:48:33] T253959: Check we are preparing (xtrabackup --prepare) with the same package version as the server version of which the backup was taken - https://phabricator.wikimedia.org/T253959
[11:48:57] not as much implement, as backport
[11:49:30] jynus: once you've got it, if you can send the link I will vote for it
[12:06:11] there was already an open issue, so I tried to provide more context: https://jira.mariadb.org/browse/MDEV-23718
[12:06:38] I believe it is an "easy" backport from the functionality already being at percona Xtrabackup
[12:09:49] the linked ticket on the one I put, marostegui, may be interesting for you - lots of people complaining on unclear path for upgrade beyond 10.4. Seems scary
[12:24:32] which one?
[12:25:15] https://jira.mariadb.org/browse/MDEV-23394
[12:25:49] people are having problems recovering into 10.5 tables from 10.4
[12:26:13] I think we wouldn't be in that case, as we run prepare with the same version, but still an interesting read
[12:26:16] jynus: I am expecting to get a 10.6 in s1 later this week or next week, so we can start testing that
[12:26:39] I just saw it linked on the ticket I commented and wanted to flag it
[12:26:45] I will first do a normal clone (transferring the files + mysql_upgrade), we'll see!
[14:38:21] marostegui, jynus, Amir1, Emperor: just a heads-up in case you didn't see the email, cumin1001 gets reimaged tomorrow. make sure you back up anything you care about beforehand, and use cumin2002 in the meantime.
[14:44:41] yep, saw it earlier. I am wrapping up long running tasks there (mentioned in the mail)
[15:52:11] My ~ there seems to have a clone of operations/software and operations/software/tendril
[15:56:24] FYI /home will be copied all by mo.ritz to /root/old-home (according to the email)
[15:57:50] Mmm, but now I know I can just clone fresh :)
[15:58:13] tendril isn't dead?
[15:58:30] as for ops/software, if it's needed should be puppetized IMHO
[15:58:53] people get upset when i suggest checking it out via puppet 🤷‍♀️
[15:59:16] swift frontends get it deployed with puppet :)