[06:07:09] PROBLEM - MariaDB sustained replica lag on s4 on db1190 is CRITICAL: 124.2 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1190&var-port=9104
[06:07:28] ^ that is gone
[06:12:09] RECOVERY - MariaDB sustained replica lag on s4 on db1190 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1190&var-port=9104
[08:32:48] FIRING: MysqlReplicationLag: MySQL instance db2243:9104@s8 has too large replication lag (13h 8m 29s). Its replication source is db2161.codfw.wmnet. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db2243&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[08:32:48] FIRING: [2x] MysqlReplicationLagPtHeartbeat: MySQL instance db2243:9104 has too large replication lag (13h 8m 30s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db2243&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[08:40:03] ^ this was because of yesterday's raid controller tests
[09:37:48] RESOLVED: MysqlReplicationLag: MySQL instance db2243:9104@s8 has too large replication lag (5m 27s). Its replication source is db2161.codfw.wmnet. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db2243&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[09:37:48] RESOLVED: [2x] MysqlReplicationLagPtHeartbeat: MySQL instance db2243:9104 has too large replication lag (5m 27s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db2243&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[10:44:19] I'm going to do a couple of schema changes on old masters of codfw (T371742)
[10:44:20] T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[14:42:25] In the past three months, we removed 22TB just from one swift backend of codfw https://grafana.wikimedia.org/d/000000378/ladsgroup-test?from=now-90d&orgId=1&to=now&viewPanel=26
[14:42:41] (I assume this is the case for the rest of backends too)
[15:23:46] PROBLEM - MariaDB sustained replica lag on s2 on db1182 is CRITICAL: 45 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1182&var-port=9104
[15:23:52] PROBLEM - MariaDB sustained replica lag on s2 on db1222 is CRITICAL: 24.2 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1222&var-port=9104
[15:24:20] PROBLEM - MariaDB sustained replica lag on s2 on db1156 is CRITICAL: 56 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1156&var-port=9104
[15:24:40] PROBLEM - MariaDB sustained replica lag on s2 on db1155 is CRITICAL: 76 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1155&var-port=13312
[15:24:41] woot?
[15:24:46] PROBLEM - MariaDB sustained replica lag on s2 on db1254 is CRITICAL: 33 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1254&var-port=9104
[15:24:48] PROBLEM - MariaDB sustained replica lag on s2 on db2207 is CRITICAL: 48.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2207&var-port=9104
[15:24:52] PROBLEM - MariaDB sustained replica lag on s2 on db1233 is CRITICAL: 45.2 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1233&var-port=9104
[15:24:58] Oh s2
[15:25:04] It is all lagging
[15:25:10] Amir1: anything you are running there?
[15:25:33] I'm running but it should have sleep
[15:25:34] PROBLEM - MariaDB sustained replica lag on s2 on db2238 is CRITICAL: 18.8 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2238&var-port=9104
[15:25:41] why the sleep is not working
[15:25:42] PROBLEM - MariaDB sustained replica lag on s2 on db2148 is CRITICAL: 27 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2148&var-port=9104
[15:25:42] jeez
[15:25:44] looks like it doesn't have enough XD
[15:25:46] PROBLEM - MariaDB sustained replica lag on s2 on db2175 is CRITICAL: 26.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2175&var-port=9104
[15:25:58] s2 shall sleep no more! Amir has murdered sleep!
[15:26:34] I'm sure there is a bug. We have a full system of measuring lag and sleeping between each batch in maint scripts
[15:27:07] something made it ignore that
[15:27:10] I stopped the script
[15:28:52] RECOVERY - MariaDB sustained replica lag on s2 on db1222 is OK: (C)10 ge (W)5 ge 1.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1222&var-port=9104
[15:31:46] RECOVERY - MariaDB sustained replica lag on s2 on db1254 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1254&var-port=9104
[15:31:46] RECOVERY - MariaDB sustained replica lag on s2 on db1182 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1182&var-port=9104
[15:31:48] RECOVERY - MariaDB sustained replica lag on s2 on db2207 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2207&var-port=9104
[15:32:34] RECOVERY - MariaDB sustained replica lag on s2 on db2238 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2238&var-port=9104
[15:32:40] RECOVERY - MariaDB sustained replica lag on s2 on db1155 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1155&var-port=13312
[15:32:42] RECOVERY - MariaDB sustained replica lag on s2 on db2148 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2148&var-port=9104
[15:32:46] RECOVERY - MariaDB sustained replica lag on s2 on db2175 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2175&var-port=9104
[15:32:52] RECOVERY - MariaDB sustained replica lag on s2 on db1233 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1233&var-port=9104
[15:33:20] RECOVERY - MariaDB sustained replica lag on s2 on db1156 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1156&var-port=9104
[17:01:39] dropping u4c2024_edits and u4c202404_edits everywhere https://phabricator.wikimedia.org/T355594#10700267
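
Note on the 15:26:34 message: the "measure lag and sleep between each batch" behaviour mentioned there can be pictured with a minimal Python sketch. This is not the actual maintenance-script code, which is not shown in this log; `run_batches`, `apply_batch`, `get_max_lag` and the 5-second default threshold are all hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable, Iterable, List


def run_batches(
    batches: Iterable[List[int]],
    apply_batch: Callable[[List[int]], None],
    get_max_lag: Callable[[], float],
    max_lag: float = 5.0,          # assumed threshold in seconds, illustrative only
    poll_interval: float = 1.0,    # assumed pause between lag checks
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Apply batches one at a time, pausing while replicas are lagged.

    Returns the number of times the loop had to sleep.
    """
    waits = 0
    for batch in batches:
        # Before each write batch, wait until the worst replica lag
        # is back at or below the threshold.
        while get_max_lag() > max_lag:
            sleep(poll_interval)
            waits += 1
        apply_batch(batch)
    return waits
```

If the lag check is skipped or always reports zero, the loop writes at full speed, which would be consistent with the symptom seen on s2 above; the log does not say what the actual bug was.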