[03:52:32] (MysqlReplicationLag) firing: MySQL instance db2139:13313 has too large replication lag (59m 50s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db2139&var-port=13313 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[03:57:32] (MysqlReplicationLag) resolved: MySQL instance db2139:13313 has too large replication lag (11m 51s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db2139&var-port=13313 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[04:23:32] (MysqlReplicationLag) firing: MySQL instance db1145:13313 has too large replication lag (1h 3m 15s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1145&var-port=13313 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[04:28:32] (MysqlReplicationLag) resolved: MySQL instance db1145:13313 has too large replication lag (14m 21s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1145&var-port=13313 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[06:26:59] marostegui: thanks for the s3 switchover
[06:28:22] <3
[06:59:56] jynus: when would it be the best time to reboot db2151 (mariadb::shard: 'mediabackupstemp' profile::mariadb::mysql_role: 'master')
[07:00:41] any time now
[07:11:54] Cool, will do it now then
[07:22:58] jynus: all done
[07:55:30] All misc codfw masters rebooted
[08:13:29] I made a first pass on the backups presentation section
[08:13:55] I started DBs yesterday
[08:14:02] Hopefully I will finish the first draft today too
[08:16:58] let me know what you think of my header image
[08:17:55] XDDDD
[08:18:10] Are you sure you're going to be done in 5-6 minutes? XD
[08:18:28] yes, I created a lot of slides but each one is like one of your bullet points
[08:18:33] Ah cool
[08:19:04] I thought it was more visual to add images
[08:19:51] jynus: Don't forget to mention that you'll give a talk about backups too
[08:19:57] So those interested can also go there
[08:20:16] right now I speak for 3 minutes
[16:45:12] PROBLEM - MariaDB sustained replica lag on s6 on db2117 is CRITICAL: 15.75 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2117&var-port=9104
[16:46:52] RECOVERY - MariaDB sustained replica lag on s6 on db2117 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2117&var-port=9104
[17:12:32] PROBLEM - Check unit status of swift_ring_manager on thanos-fe1001 is CRITICAL: CRITICAL: Status of the systemd unit swift_ring_manager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[18:43:14] RECOVERY - Check unit status of swift_ring_manager on thanos-fe1001 is OK: OK: Status of the systemd unit swift_ring_manager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers