[00:04:35] PROBLEM - MariaDB sustained replica lag on s1 on db1234 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1234&var-port=9104
[00:09:35] RECOVERY - MariaDB sustained replica lag on s1 on db1234 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1234&var-port=9104
[04:49:29] I am going to switch s6 eqiad primary master
[05:25:28] Aborting the primary master swap due to https://phabricator.wikimedia.org/T364067#9782437
[06:14:22] Starting es4 codfw switch
[09:12:06] PROBLEM - MariaDB sustained replica lag on s7 on db2218 is CRITICAL: 3.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2218&var-port=9104
[09:12:14] PROBLEM - MariaDB sustained replica lag on s7 on db2182 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2182&var-port=9104
[09:12:44] PROBLEM - MariaDB sustained replica lag on s7 on db1181 is CRITICAL: 6.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1181&var-port=9104
[09:12:56] PROBLEM - MariaDB sustained replica lag on s7 on db1227 is CRITICAL: 4.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1227&var-port=9104
[09:12:56] PROBLEM - MariaDB sustained replica lag on s7 on db1170 is CRITICAL: 6.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1170&var-port=9104
[09:13:06] RECOVERY - MariaDB sustained replica lag on s7 on db2218 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2218&var-port=9104
[09:13:44] RECOVERY - MariaDB sustained replica lag on s7 on db1181 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1181&var-port=9104
[09:13:56] RECOVERY - MariaDB sustained replica lag on s7 on db1227 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1227&var-port=9104
[09:13:56] RECOVERY - MariaDB sustained replica lag on s7 on db1170 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1170&var-port=9104
[09:14:14] RECOVERY - MariaDB sustained replica lag on s7 on db2182 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2182&var-port=9104
[09:24:42] jynus: you ok if we enable writes on es6 and es7 next week or would you rather wait until the 14th to make sure the automatic backup runs correctly? I am fine either way.
[09:25:27] I tested it works, so I was pinging you to unblock and you can decide any time you want
[09:25:36] basically saying: anytime now is ok
[09:25:39] jynus: great, thank you!
[09:26:29] Amir1: Let's enable writes on es6 and es7 monday morning, you okay with that? I will merge the MW patch and leave writes going to es4, es5, es6 and es7 for a couple of days
[09:27:26] Oh, he's off today!
[09:27:40] Sorry for pinging you
[09:29:52] I am going to upgrade db1150 & db1171
[10:49:48] marostegui: fine with me. Can I run a schema change on s6 now?
[10:50:32] you can
[10:50:41] the master has not been switched though
[10:55:46] yeah, I want to prepare it for next week :D
[10:59:50] oh actually nvm, I thought s6 pk change is done and I can start dropping it but no I have to wait for the switchover
[11:00:35] I will do it next week :)
[11:23:35] \o/
[13:58:26] db1150:s3 replication broke
[13:58:39] "Unknown column pl_namespace"
[14:00:12] let me check
[14:02:04] sorry yeah, I initiated the conversation in -operations
[14:18:52] I won't consider this an outage, as it didn't affect end users
[14:19:04] I am going to leave for the day, - will see you all on Monday as I am off tomorrow o/
[14:19:05] just as an avoided incident
[14:19:34] but it may be interesting to see what can be done to avoid cases like this, that would be actual outages
[14:19:39] have a good day, marostegui
[14:19:46] thanks jynus :*
[19:16:32] PROBLEM - MariaDB sustained replica lag on s8 on db1214 is CRITICAL: 2.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1214&var-port=9104
[19:19:32] RECOVERY - MariaDB sustained replica lag on s8 on db1214 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1214&var-port=9104