[01:09:50] PROBLEM - MariaDB sustained replica lag on m1 on db2160 is CRITICAL: 8.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[01:11:24] RECOVERY - MariaDB sustained replica lag on m1 on db2160 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[14:36:57] I see an UNKNOWN on db2151, could it need some zarcillo updates?
[14:51:25] mmmmm maybe
[14:51:29] I'll take a look
[14:51:34] thanks
[14:54:36] no rush, not meaning to tell you to do stuff, just trying to understand the ongoing status - there was recently a flood of alerts due to eqsin, and I was reviewing icinga
[15:00:17] Should be fixed
[15:00:18] now
[15:00:48] I have forced the check to run again
[15:03:01] mmm it is not recovering, but the issue was two entries in the instance_section table on zarcillo
[15:04:23] do you want me to have a second look? maybe it is not zarcillo, but somewhere else
[15:04:39] oh, sorry, I just read the last sentence
[15:04:57] then it may need a faster refresh on the prometheus hosts, I can do that
[15:05:46] I ran "/usr/local/sbin/mysqld_exporter_config.py codfw '/srv/prometheus/ops/targets'" on the prometheus hosts
[15:05:52] let's see if that helped
[15:07:33] (I think it normally only runs every 20 minutes or so)
[15:11:27] I think that did it
[15:14:21] Yeah, I also completely deleted it and re-added it
[15:15:48] he he
[15:16:06] Anyways, looks good, thanks!
[15:18:33] I am going to stop the sanitarium master for s1, so there will be lag on s1 on the wikireplicas
[15:18:36] I will !log it now
[16:30:48] marostegui: size of the table in every wiki https://phabricator.wikimedia.org/P42984
[16:31:43] Amir1: thanks, yeah I checked it too :)
[16:31:49] I am running it with replication enabled
[16:31:55] cool
[16:32:10] I will get the patch merged anyways
[16:32:13] As I already sent it
[16:32:19] It doesn't hurt to have another example there
[16:32:23] in s1 we probably have to run it without replication I guess, 1M rows
[16:32:32] yeah, enwiki will need to be done host by host
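(Editor's note: a minimal sketch of what a host-by-host run "without replication" could look like on an s1 replica, as discussed above. The actual table and ALTER come from the patch mentioned in the log and are not shown here, so the statement below is purely illustrative.)

```sql
-- Hypothetical sketch only: the real ALTER comes from the patch referenced above.
-- Disabling sql_log_bin for the session keeps the change out of the binlog,
-- so it has to be repeated on every s1 host, one by one.
SET SESSION sql_log_bin = 0;
ALTER TABLE example_table ADD COLUMN example_column INT DEFAULT NULL;
SET SESSION sql_log_bin = 1;
```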
[16:37:15] btw. it seems that https://phabricator.wikimedia.org/T321126 hasn't reached labtestwiki, could someone do that? Thanks. :)
[16:42:52] zabe: AFAIK labtestwiki is not a WMF production wiki
[16:43:35] I think it lives in cloud or outside our network; andrew used to handle that, but I'm not sure if he is still doing it
[16:45:49] ok
[16:45:55] (someone in the cloud IRC channel may know more, sorry)
[16:48:34] zabe: I will get it done
[16:48:51] zabe: done
[16:53:32] (MysqlReplicationLag) firing: MySQL instance db1106:9104 has too large replication lag (1h 26m 11s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1106&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[16:53:39] ^ me
[16:58:32] (MysqlReplicationLag) firing: (3) MySQL instance db1106:9104 has too large replication lag (1h 0m 4s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[17:08:32] (MysqlReplicationLag) firing: (3) MySQL instance db1106:9104 has too large replication lag (8m 44s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[17:18:32] (MysqlReplicationLag) resolved: (3) MySQL instance db1106:9104 has too large replication lag (8m 44s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[18:27:36] thanks
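(Editor's note: for the MysqlReplicationLag alerts above, a minimal sketch of how the lag on an affected replica such as db1106 could be inspected and cleared by hand. The named replication connection 's1' is an assumption for a multi-source replica; the actual setup on that host may differ.)

```sql
-- Sketch only: inspect replication lag and, once the maintenance that caused
-- it is finished, resume the stopped connection. The connection name 's1'
-- is assumed for a multi-source replica.
SHOW ALL SLAVES STATUS\G   -- check Seconds_Behind_Master per connection
START SLAVE 's1';          -- resume the connection that was stopped
```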