[00:58:52] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[01:11:44] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[04:52:58] DBA, Data-Persistence-Backup: db2100 rebooted, mysqld alerted after to say it hadn't started - https://phabricator.wikimedia.org/T283995 (Marostegui) a:jcrespo Thank you for creating this task. This is a backup source host, so I'd rather not start mysql until @jcrespo has taken a look (better to have...
[05:38:55] DBA, Data-Persistence-Backup: db2100 rebooted, mysqld alerted after to say it hadn't started - https://phabricator.wikimedia.org/T283995 (Marostegui) p:Triage→High Setting to high as we don't have any other backup source for s7 and s8 in codfw.
[08:28:38] marostegui: thanks for the quick triage
[09:04:20] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[09:06:08] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[10:07:15] DBA, Data-Persistence-Backup, Patch-For-Review: db2100 rebooted, mysqld alerted after to say it hadn't started - https://phabricator.wikimedia.org/T283995 (jcrespo) Given this was a hw crash- which probably won't fail again, I will recover from the latest successful backup (2021-05-28, as the one tod...
[14:31:37] DBA: db2094:3318 (sanitarium on codfw) needs recloning - https://phabricator.wikimedia.org/T283793 (Marostegui) See email - s8 reported some tables that need to be dropped
[15:53:19] DBA, Data-Persistence-Backup, ops-codfw: db2100 rebooted, mysqld alerted after to say it hadn't started - https://phabricator.wikimedia.org/T283995 (jcrespo) a:jcrespo→Papaul Data has been recovered, I am generating a new backup now. This should be still under warranty- most likely cause (se...
[20:39:32] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[20:41:18] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[21:22:53] DBA, Data-Persistence-Backup, ops-codfw: db2100 rebooted, mysqld alerted after to say it hadn't started - https://phabricator.wikimedia.org/T283995 (RhinosF1) Backup freshness alert for s8 went off a few moments ago. Not sure if ack/downtime worth it.