[05:44:14] Rebooting the proxies I failed over yesterday
[08:21:02] es backups finished
[09:39:13] hi
[09:39:32] I see some alerts for clouddb1017 and clouddb1013 about replication lag
[09:39:36] is that known / expected?
[09:40:35] arturo: yup T321562
[09:40:35] T321562: db1154 is not coming back after restart - https://phabricator.wikimedia.org/T321562
[09:40:55] It's connected to the status of -cloud channel
[09:41:14] right, thanks
[10:49:34] BTW you got an "ERROR: No query specified" because you used 2 end query delimiters, so you sent an empty query
[14:45:48] * Amir1 imagines T321562 happen on a section master, shivers and cries
[15:09:57] That's why we have https://phabricator.wikimedia.org/T196366
[15:10:11] But orchestrator should fix that for us, I hopefully can get to test it this Q
[15:11:46] yeah, the base was taken into account long ago, it is the automation that is not -yet- there
[15:13:36] also: https://phabricator.wikimedia.org/T196367
[15:14:16] yeah, orchestrator won't help with that one :(
[15:14:57] well, what I mean it is possible with the current design, "just" waiting for someone to implement it :-P
[15:15:19] for a bit value of just :-D
[15:15:21] *big
[15:17:35] (if it was simple and fast it would have been done years ago)
[22:47:38] PROBLEM - MariaDB sustained replica lag on s1 on db1154 is CRITICAL: 1.19e+05 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1154&var-port=13311
[22:48:44] PROBLEM - MariaDB sustained replica lag on s8 on db1154 is CRITICAL: 1.185e+05 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1154&var-port=13318
[22:48:52] PROBLEM - MariaDB sustained replica lag on s3 on db1154 is CRITICAL: 1.186e+05 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1154&var-port=13313
[22:48:52] PROBLEM - MariaDB sustained replica lag on s5 on db1154 is CRITICAL: 1.169e+05 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1154&var-port=13315
[22:49:32] (MysqlReplicationLag) firing: (4) MySQL instance db1154:13311 has too large replication lag (1d 8h 56m 39s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[23:29:32] (MysqlReplicationLag) firing: (4) MySQL instance db1154:13311 has too large replication lag (1d 2h 44m 11s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[23:31:50] RECOVERY - MariaDB sustained replica lag on s5 on db1154 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1154&var-port=13315