[01:06:42] PROBLEM - MariaDB sustained replica lag on m1 on db1217 is CRITICAL: 16.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[01:07:38] PROBLEM - MariaDB sustained replica lag on m1 on db2160 is CRITICAL: 10 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[01:11:00] PROBLEM - MariaDB sustained replica lag on m1 on db2132 is CRITICAL: 16.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[01:12:02] RECOVERY - MariaDB sustained replica lag on m1 on db1217 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[01:12:08] RECOVERY - MariaDB sustained replica lag on m1 on db2132 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[01:14:14] RECOVERY - MariaDB sustained replica lag on m1 on db2160 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[05:30:01] I just switched over s6
[05:30:07] So the master is now running 10.6
[05:33:09] Amir1: I just noticed the commit comment here is wrong: https://gerrit.wikimedia.org/r/c/operations/dns/+/953488/ (it says s2, but it should be s6). The change itself is okay, but you might want to double-check what is going on.
[06:49:47] "There is no switchback: codfw will stay primary for the next ~6 months" interesting
[07:01:23] I'll check. Thanks
[07:05:02] I'm taking a look at m1; sometimes backups can overload the server, but it shouldn't be them this time, as they start at 2am
[07:09:27] I think it was either network or etherpad
[07:28:39] marostegui: Fixed https://gitlab.wikimedia.org/toolforge-repos/switchmaster/-/commit/d12471ffd045646a93100c5df90b4a838730985a
[07:28:48] thanks
[07:33:42] btw my schema change is done everywhere except s6. Can I run it there? https://phabricator.wikimedia.org/T343718
[07:33:44] or when
[07:34:11] yeah
[07:34:13] it can go now
[07:34:39] awesome. thanks.
[09:02:18] hi! there is an alert
[09:02:20] clouddb1017/MariaDB memory is CRITICAL
[09:02:26] CRIT Memory 98% used. Largest process: mysqld (1667042) = 60.2%
[09:02:37] is this known / tracked / etc?
[09:03:23] let me open a phab ticket
[09:06:00] https://phabricator.wikimedia.org/T345322
[13:28:30] PROBLEM - MariaDB sustained replica lag on s1 on db1132 is CRITICAL: 6051 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1132&var-port=9104
[13:45:44] RECOVERY - MariaDB sustained replica lag on s1 on db1132 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1132&var-port=9104
[15:40:32] I gotta love MariaDB: the exact same schema change on two different hosts of the same section takes 2 minutes and 30 minutes respectively.
[15:40:56] cache?
[15:41:09] My guess is that in one it just changes the DDL, while the other one rebuilds the db
[15:41:16] *the table
[15:57:59] that'd be a bug
[15:59:15] and it is not the first time this happens; I reported this years ago, and it was between two minor versions: https://jira.mariadb.org/browse/MDEV-13175
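
A minimal way to tell the two behaviors apart, assuming an InnoDB table on MariaDB 10.3 or later (the table and column names below are made up for illustration): request the algorithm explicitly, and MariaDB raises an error instead of silently falling back to a full table copy.

    -- Hypothetical names, for illustration only.
    -- If the change can be applied as a metadata-only operation, this succeeds almost instantly.
    ALTER TABLE example_table
        ADD COLUMN example_flag TINYINT NOT NULL DEFAULT 0,
        ALGORITHM=INSTANT, LOCK=NONE;

    -- If the statement is rejected ("ALGORITHM=INSTANT is not supported"), retrying with
    -- ALGORITHM=NOCOPY or ALGORITHM=INPLACE shows at which point the engine insists on
    -- rebuilding the table (ALGORITHM=COPY), which is the kind of difference that turns
    -- a 2-minute change into a 30-minute one.
    ALTER TABLE example_table
        ADD COLUMN example_flag TINYINT NOT NULL DEFAULT 0,
        ALGORITHM=NOCOPY, LOCK=NONE;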