[01:07:58] PROBLEM - MariaDB sustained replica lag on m1 on db1117 is CRITICAL: 15.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[01:08:16] PROBLEM - MariaDB sustained replica lag on m1 on db2160 is CRITICAL: 9.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[01:10:26] RECOVERY - MariaDB sustained replica lag on m1 on db1117 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[01:11:56] RECOVERY - MariaDB sustained replica lag on m1 on db2160 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[05:16:27] Amir1: db1207 was depooled by your script the 14th, can you double check why it's not repooled back?
[05:43:40] I am going to switchover x2 eqiad, which is always super complex, fingers crossed
[06:18:14] eqiad done, it is such a complex process to switch master-master :(
[06:18:29] I am going to wait a few hours before doing codfw, to make sure it is all ok
[06:18:34] or possibly wait till tomorrow
[07:52:35] marostegui: sure, thanks, I'll check
[07:52:41] x2 :(
[07:52:49] flaggedrevs of databases
[07:52:54] XDDDDD
[08:08:24] sigh, the whole thing broke halfway through
[08:08:46] what thing?
[08:09:02] https://phabricator.wikimedia.org/P46949
[08:09:21] I should make it retry
[08:09:22] ah the schema change
[10:36:26] Hi folks, could I get a +1 for https://gerrit.wikimedia.org/r/c/operations/puppet/+/909207 please? move rclone a bit earlier
[10:40:39] thanks jynus :)
[10:42:35] as usually the dbas have each other for sanity check reviews, we should "make pineapple" (as we say in Spanish) for reviews
[10:52:05] Emperor: deploying the fix now
[10:52:16] (if you want to take a look at logs/errors/etc.)
[10:52:29] <3