[01:07:39] PROBLEM - MariaDB sustained replica lag on m1 on db1217 is CRITICAL: 5.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[01:08:33] PROBLEM - MariaDB sustained replica lag on m1 on db2160 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[01:09:07] RECOVERY - MariaDB sustained replica lag on m1 on db1217 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[01:09:59] RECOVERY - MariaDB sustained replica lag on m1 on db2160 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[08:53:24] at this point, I might just patch etherpad to avoid replag like this daily ugh
[09:02:08] marostegui: arnaudb: I'm rebooting eqiad masters, you might see a sea of red and yellow in orch, don't worry
[09:03:14] noted!
[09:09:24] oki
[09:45:23] there is a trnsfr_db1210.eqiad.wmnet_4400.lock in /tmp/ of db1210 making script skip that host, is that intentional?
[09:48:06] check if it is actually running and the modification timestamp, could be old ?
[09:49:25] drwxr-xr-x 2 root root 4096 Apr 11 06:12 trnsfr_db1210.eqiad.wmnet_4400.lock
[09:51:12] if pgrep looks also clean, probably an old error
[13:20:31] Amir1: slow query log enabled on db2173 for +6h queries, the log is at /srv/sqldata/db2173-slow.log
[13:20:43] thanks!
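
Editor's sketch of the stale-lock check from the 09:45 to 09:51 exchange (look at the lock's modification time, then confirm with pgrep that nothing is still running). This is a rough illustration, not the script the channel uses: the function names, the 24-hour age threshold, and the process pattern passed in are all assumptions.

```python
import os
import subprocess
import time
from typing import Callable, Optional


def pgrep_matches(pattern: str) -> bool:
    """Return True if `pgrep -f <pattern>` finds at least one process."""
    result = subprocess.run(
        ["pgrep", "-f", pattern],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    # pgrep exits 0 when something matched, 1 when nothing matched.
    return result.returncode == 0


def lock_looks_stale(
    lock_path: str,
    process_pattern: str,
    max_age_seconds: float = 24 * 3600,  # assumed threshold, tune as needed
    is_running: Callable[[str], bool] = pgrep_matches,
    now: Optional[float] = None,
) -> bool:
    """Return True if the lock is old AND no matching process is running.

    Works for lock files and lock directories alike, since only the
    modification time of the path is inspected.
    """
    if now is None:
        now = time.time()
    age_seconds = now - os.stat(lock_path).st_mtime
    return age_seconds > max_age_seconds and not is_running(process_pattern)
```

It could be called as `lock_looks_stale("/tmp/trnsfr_db1210.eqiad.wmnet_4400.lock", "<transfer process pattern>")`, where the pattern is a placeholder for whatever the transfer job's command line actually looks like. The function only reports; removing the lock stays a manual decision.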