[07:17:19] PROBLEM - MariaDB sustained replica lag on s4 on db1244 is CRITICAL: 87 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1244&var-port=9104
[07:20:19] RECOVERY - MariaDB sustained replica lag on s4 on db1244 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1244&var-port=9104
[10:51:13] Hi folks, I'm still hoping for a review on https://gerrit.wikimedia.org/r/c/operations/puppet/+/1054864 please?
[10:56:54] Emperor: +1ed the puppet side; as for the actual service to be stopped, I trust your analysis
[10:57:05] TY :)
[12:00:46] PROBLEM - MariaDB sustained replica lag on x2 on db2144 is CRITICAL: 400 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2144&var-port=9104
[12:08:40] FIRING: [84x] MysqlReplicationLagPtHeartbeat: MySQL instance db2115:9104 has too large replication lag (8m 14s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[12:12:29] RESOLVED: [84x] MysqlReplicationLagPtHeartbeat: MySQL instance db2115:9104 has too large replication lag (8m 14s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[12:17:14] arnaudb: What's the issue with the long-running backup email?
[12:17:51] The backup took longer than last time; I was waiting for the next one to run so I could compare. It finished properly.
[12:18:35] Is that the one from yesterday or today?
[12:18:47] There are two
[12:19:30] I missed the one from yesterday, hold on
[12:22:03] I don't see the one you're mentioning, marostegui. Is it from backupmon1001?
[12:23:55] We had the s6 one and today's email
[12:24:59] I don't see the s6 email, could you forward it to me?
[13:13:51] I've pinged g.odog on T351927 again, will silence the Thanos disk space alerts for 1w
[13:13:51] T351927: Decide and tweak Thanos retention - https://phabricator.wikimedia.org/T351927
[13:15:05] ...though that does highlight a bunch of nodes where prometheus-mysqld-exporter has been failed for a couple of days.
[14:49:57] arnaudb: hey!
[14:50:16] sorry, I should have reached out: that last switch upgrade has been postponed until next Tues (Jul 23rd)
[14:50:26] oooh
[14:50:31] even simpler for me
[14:50:36] thanks topranks, I forgot to double-check the date
[14:51:31] no probs - we have another big network change today, so we didn't want to trip over ourselves
[14:51:47] last switch upgrade is Tuesday, so we're almost there!
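
[Editor's note] The lag alerts above come from pt-heartbeat, and the linked runbook covers depooling a lagging replica. Below is a minimal sketch of confirming lag and depooling, assuming the conventional pt-heartbeat table (heartbeat.heartbeat) and standard dbctl usage; treat the exact table and section names as assumptions, not facts from this log.

    # On the lagging replica (e.g. db2115 from the 12:08 alert), check
    # replication state directly; Seconds_Behind_Master should be near 0.
    sudo mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Seconds_Behind_Master|Slave_(IO|SQL)_Running'

    # pt-heartbeat-based lag: age of the newest heartbeat row.
    # (heartbeat.heartbeat is the conventional pt-heartbeat table; assumed here.)
    sudo mysql -e "SELECT GREATEST(0, TIMESTAMPDIFF(SECOND, MAX(ts), UTC_TIMESTAMP())) AS lag_seconds FROM heartbeat.heartbeat;"

    # If lag persists, depool per the runbook and commit the config change.
    sudo dbctl instance db2115 depool
    sudo dbctl config commit -m "Depool db2115: sustained replication lag"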
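
[Editor's note] The one-week silence on the Thanos disk space alerts (13:13) can be set with amtool against the Alertmanager behind alerts.wikimedia.org. A sketch follows; the alertname matcher is a placeholder, so check the real alert name on the dashboard before running it.

    # Silence for one week; 'ThanosDiskSpace' is an assumed alertname,
    # not confirmed by the log above.
    amtool silence add alertname="ThanosDiskSpace" \
        --duration="1w" \
        --comment="Pending retention decision, see T351927" \
        --author="$(whoami)"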
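
[Editor's note] For the failed prometheus-mysqld-exporter units noted at 13:15, a fleet-wide check is possible with Cumin; the host alias below is illustrative, not taken from this log.

    # From a cluster management host; 'A:db-all' is an assumed Cumin alias
    # for the database fleet. Hosts where the unit is down show as failures.
    sudo cumin 'A:db-all' 'systemctl is-active prometheus-mysqld-exporter'

    # On a single affected host, inspect the unit and recent logs, then restart.
    sudo systemctl status prometheus-mysqld-exporter
    sudo journalctl -u prometheus-mysqld-exporter --since '-2 days' | tail -n 50
    sudo systemctl restart prometheus-mysqld-exporter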