[01:08:28] PROBLEM - MariaDB sustained replica lag on m1 on db2160 is CRITICAL: 3.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[01:09:54] RECOVERY - MariaDB sustained replica lag on m1 on db2160 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[06:33:06] Amir1: Something weird happened with the schema change on s1, I think: db1132 was downtimed, but it was not depooled and is lagging 3h behind
[06:33:22] So it didn't page, but it has been lagging behind while in production
[06:33:33] And I cannot find it on SAL
[06:33:37] So I wonder what happened there
[06:33:44] Can you review your schema change logs?
[06:33:50] I have depooled it
[06:34:30] It was downtimed yesterday at 15:13 for 3 days
[06:35:00] 15:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 22 hosts with reason: Schema change
[06:35:00] 15:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on 22 hosts with reason: Schema change
[06:35:08] Ah right
[06:35:13] You deployed on the master directly?
[06:35:26] I forgot eqiad is depooled :)
[06:49:40] marostegui: I assume that is also why it doesn't show on https://wikitech.wikimedia.org/wiki/Map_of_database_maintenance as there's replag for the Toolforge wiki replicas
[06:51:43] Yeah, that lag is expected
[07:06:32] Cool
[07:06:46] Relayed it in -cloud, where it was asked
[07:09:30] marostegui: sorry, I just woke up. Yeah, s1 and s8
[07:12:34] I actually asked Alex to make sure, and on top of that I pinged you here yesterday :P
[07:16:25] I'll go with s4 now
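
For context on the alert quoted at the top of the log: the check compares the replica's lag against the warning (1) and critical (2) thresholds shown in the recovery message. Below is a minimal point-in-time sketch of that kind of check, not the actual Icinga/Prometheus "sustained replica lag" probe used in production; the hostname and port come from the alert text, while the credentials and the `replica_lag` helper are purely illustrative assumptions.

```python
#!/usr/bin/env python3
"""Hedged sketch of a replica-lag check with warn/crit thresholds as quoted
in the alert ((C)2 ge (W)1). Credentials and helper names are placeholders."""
import sys

import pymysql

WARN, CRIT = 1, 2  # seconds, as quoted in the alert message


def replica_lag(host: str, port: int, user: str, password: str) -> float:
    """Return Seconds_Behind_Master for the given MariaDB replica instance."""
    conn = pymysql.connect(host=host, port=port, user=user, password=password,
                           cursorclass=pymysql.cursors.DictCursor)
    try:
        with conn.cursor() as cur:
            # SHOW SLAVE STATUS works on all MariaDB versions;
            # SHOW REPLICA STATUS is the 10.5+ alias.
            cur.execute("SHOW SLAVE STATUS")
            row = cur.fetchone()
            return float(row["Seconds_Behind_Master"])
    finally:
        conn.close()


if __name__ == "__main__":
    # db2160:13321 is the instance named in the alert; user/password are placeholders.
    lag = replica_lag("db2160.codfw.wmnet", 13321, "monitor", "secret")
    if lag >= CRIT:
        print(f"CRITICAL - replica lag {lag} ge {CRIT}")
        sys.exit(2)
    if lag >= WARN:
        print(f"WARNING - replica lag {lag} ge {WARN}")
        sys.exit(1)
    print(f"OK - replica lag {lag}")
```

Note the real alert measures *sustained* lag over a time window (via the Grafana/Prometheus data linked in the message), whereas this sketch only samples the lag once; that distinction is why the brief 3.6s spike at 01:08 recovered by itself a minute later.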