[05:20:59] both sanitarium hosts seem to be broken
[05:21:37] and db2110 crashed
[05:21:40] which is unrelated to the above
[10:22:58] https://phabricator.wikimedia.org/T337445#8879465
[10:38:50] it's time to switch to ARM!
[11:05:29] I wonder why that's not in the idrac's log
[11:06:35] No idea, it is a different kind of log, but lately I've been using the http one as my default go-to because of that
[11:07:23] I will update https://wikitech.wikimedia.org/wiki/SRE/Dc-operations/Hardware_Troubleshooting_Runbook#Gathering_Support_Logs_for_Warranty_Replacement because it is not well explained how to set up the proxy
[11:14:47] https://wikitech.wikimedia.org/w/index.php?title=SRE%2FDc-operations%2FHardware_Troubleshooting_Runbook&diff=2080003&oldid=2067899
[11:56:43] thanks!
[12:04:18] I am so stupid that I cloned all the sanitarium hosts but noted the wrong positions and took the primary master and not the sanitarium master
[12:04:27] I can't believe I was so careless
[12:04:39] I definitely need a break
[12:05:56] :'(
[17:14:43] PROBLEM - MariaDB sustained replica lag on m3 on db1217 is CRITICAL: 3.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13323
[17:20:53] RECOVERY - MariaDB sustained replica lag on m3 on db1217 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13323