[01:08:43] PROBLEM - MariaDB sustained replica lag on m1 on db2160 is CRITICAL: 15.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[01:09:51] PROBLEM - MariaDB sustained replica lag on m1 on db1117 is CRITICAL: 14 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[01:11:29] RECOVERY - MariaDB sustained replica lag on m1 on db1117 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[01:11:57] RECOVERY - MariaDB sustained replica lag on m1 on db2160 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[07:17:30] db1125 (test host) got rebooted
[07:17:42] I am not creating a task since it is a test host and will be decommissioned anyways
[09:09:32] (MysqlReplicationLag) firing: (2) MySQL instance db1196:9104 has too large replication lag (1h 9m 30s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[09:14:32] (MysqlReplicationLag) firing: (3) MySQL instance db1154:13311 has too large replication lag (59m 8s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[09:15:45] ^that is all me
[09:19:32] (MysqlReplicationLag) firing: (3) MySQL instance db1154:13311 has too large replication lag (31m 18s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[09:23:40] iasaw
[09:23:43] ups
[09:23:47] the cat
[09:24:32] (MysqlReplicationLag) resolved: (3) MySQL instance db1154:13311 has too large replication lag (20m 9s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[09:26:31] "iasaw" I think she may be communicating murder attempts?
[09:28:30] hahahaha
[09:34:47] (MysqlReplicationLag) firing: (4) MySQL instance db1166:9104 has too large replication lag (49m 58s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[09:35:46] ^ me
[09:39:47] (MysqlReplicationLag) resolved: (3) MySQL instance db1166:9104 has too large replication lag (21m 16s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[10:05:05] db2139 and db1145 with replication stopped is me testing the new host for backups
[10:05:25] cool
[10:07:53] I wonder if I should run the es bacula now early or wait for tomorrow
[10:08:13] as it takes 17 hours
[10:09:19] I think I will wait
[10:09:27] yeah, maybe tomorrow is safer
[10:10:16] I will wait for disabling the jobs, as in case bacula restarts, the disabling gets lost
[10:13:06] I will also now start the es2020 data check
[10:13:21] nice, let me know if you need help
[10:40:00] I will leave compare running on the last 4 million rows of each table on cumin2002
[11:15:39] I have started a new compare on db1198; the last time, before the DIMM was changed, it made the host crash. So I have started it again to see if it crashes again
[16:00:07] taking care of the bacula unscheduling now
[16:10:45] should be good now
[16:11:10] what time in the end for tomorrow? marostegui after the x1 thingy?
[16:12:27] whenever you feel comfortable with it jynus
[16:12:41] I don't mind
[16:13:10] ok, then I predict around 8 UTC should be ok
[16:15:45] to finish the day in a positive way, things looking good for the dbprov expansion so far: https://phabricator.wikimedia.org/T327155#8557575
[16:16:58] good!