[06:57:08] 08:50:21 (PrometheusMysqldExporterFailed) firing: Prometheus-mysqld-exporter failed (db1208:13351) - TODO - https://grafana.wikimedia.org/d/000000278/mysql-aggregated - https://alerts.wikimedia.org/?q=alertname%3DPrometheusMysqldExporterFailed
[06:57:08] muted until tuesday ↑
[12:15:29] urandom: ack thanks! I was out yesterday for a bank holiday :)
[12:26:57] filed a change to move restbase2021 to pki
[14:05:20] 👍
[14:11:30] urandom: do you prefer to do it on monday? The golden rule says to change nothing on a Friday, but for 2021 it may be low impact
[14:12:26] yeah, I think it would be safe enough
[14:43:32] urandom: 2021 is on pki! If you want to sanity check
[14:43:42] elukey: thanks, I'll have a look
[14:50:16] seems ok!
[15:35:35] super :)
[15:36:20] urandom: if clients are not going to be impacted (IIUC nothing uses TLS and checks the cert's validity atm for Restbase) we can set up the move next week
[15:36:25] for the whole cluster I mean
[15:36:42] we can do codfw and then eqiad as we did for aqs
[15:40:13] elukey: works for me
[16:02:12] urandom: filed all the patches, including moving sessionstore to the new truststore, but I have no idea if there are special things to do for that cluster
[16:02:29] since it is very delicate and IIRC kask didn't like cassandra instances being restarted a while ago
[18:10:45] Ha! Yeah, oddly enough that wasn't restarts, it was reboots of all things. And Kask wasn't at fault, it was the cluster in a sort of split-brain. Either way, those days should be behind us.
[18:12:32] I think sessionstore should be ready to go too. Since the consequences of any unforeseen problems are high, we could depool codfw, upgrade there, and repool after everything checks out.
[18:13:02] I think the risk at this point is basically zero, but it's easy enough to take the precaution
[19:40:10] PROBLEM - MariaDB sustained replica lag on s3 on db1154 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1154&var-port=13313
[19:41:10] RECOVERY - MariaDB sustained replica lag on s3 on db1154 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1154&var-port=13313
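
Editor's note: the 14:43 exchange is about sanity-checking the certificate restbase2021 serves after the move to PKI. The log does not say how that check was done; below is a minimal, hedged sketch of one way to do it. The hostname suffix (.codfw.wmnet), the port (9042, the Cassandra CQL TLS port), and the "looks right" criteria (issuer names the internal PKI, expiry is in the future) are all assumptions, not details from the log.

```python
#!/usr/bin/env python3
# Hedged sketch: fetch and inspect the TLS certificate a node serves
# after a PKI migration. Requires the third-party "cryptography" package.
import ssl
from datetime import datetime, timezone
from cryptography import x509

HOST = "restbase2021.codfw.wmnet"  # hostname suffix is an assumption
PORT = 9042                        # Cassandra CQL TLS port, an assumption

# ssl.get_server_certificate() fetches the peer cert in PEM form without
# validating it against a trust store, which is what we want here: the
# goal is to inspect the new cert, not to trust it yet.
pem = ssl.get_server_certificate((HOST, PORT), timeout=5)
cert = x509.load_pem_x509_certificate(pem.encode())

print("issuer :", cert.issuer.rfc4514_string())
print("subject:", cert.subject.rfc4514_string())
print("expires:", cert.not_valid_after_utc)

# Basic sanity checks: cert is not expired and (assumption) the issuer
# should mention the internal PKI rather than the old Puppet CA.
assert cert.not_valid_after_utc > datetime.now(timezone.utc), "certificate already expired"
```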