[06:57:08] 08:50:21 (PrometheusMysqldExporterFailed) firing: Prometheus-mysqld-exporter failed (db1208:13351) - TODO - https://grafana.wikimedia.org/d/000000278/mysql-aggregated - https://alerts.wikimedia.org/?q=alertname%3DPrometheusMysqldExporterFailed
[06:57:08] muted until tuesday ↑
[12:15:29] urandom: ack thanks! I was out yesterday for a bank holiday :)
[12:26:57] filed a change to move restbase2021 to pki
[14:05:20] 👍
[14:11:30] urandom: do you prefer to do it on monday? The golden rule says to change nothing on a Friday, but for 2021 it may be low impact
[14:12:26] yeah, I think it would be safe enough
[14:43:32] urandom: 2021 is on pki! If you want to sanity check
[14:43:42] elukey: thanks, I'll have a look
[14:50:16] seems ok!
[15:35:35] super :)
[15:36:20] urandom: if clients are not going to be impacted (IIUC nothing uses TLS and checks the cert's validity atm for Restbase) we can set up the move next week
[15:36:25] for the whole cluster I mean
[15:36:42] we can do codfw and then eqiad as we did for aqs
[15:40:13] elukey: works for me
[16:02:12] urandom: filed all the patches, including moving sessionstore to the new truststore, but I have no idea if there are special things to do for that cluster
[16:02:29] since it is very delicate and IIRC kask didn't like cassandra instances being restarted a while ago
[18:10:45] Ha! Yeah, oddly enough that wasn't restarts, it was reboots of all things. And Kask wasn't at fault, it was the cluster in a sort of split-brain. Either way, those days should be behind us.
[18:12:32] I think sessionstore should be ready to go too. Since the consequences of any unforeseen problems are high, we could depool codfw, upgrade there, and repool after everything checks out.
[18:13:02] I think the risk at this point is basically zero, but it's easy enough to take the precaution
[19:40:10] PROBLEM - MariaDB sustained replica lag on s3 on db1154 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1154&var-port=13313
[19:41:10] RECOVERY - MariaDB sustained replica lag on s3 on db1154 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1154&var-port=13313
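
Editor's note: the 14:43 exchange is about sanity-checking the certificate restbase2021 serves after the move to PKI. The log does not say how that check was done; below is a minimal, hedged sketch of one way to do it. The hostname suffix (.codfw.wmnet), the port (9042, the Cassandra CQL TLS port), and the "looks right" criteria (issuer names the internal PKI, expiry is in the future) are all assumptions, not details from the log.

```python
#!/usr/bin/env python3
# Hedged sketch: fetch and inspect the TLS certificate a node serves
# after a PKI migration. Requires the third-party "cryptography" package.
import ssl
from datetime import datetime, timezone
from cryptography import x509

HOST = "restbase2021.codfw.wmnet"  # hostname suffix is an assumption
PORT = 9042                        # Cassandra CQL TLS port, an assumption

# ssl.get_server_certificate() fetches the peer cert in PEM form without
# validating it against a trust store, which is what we want here: the
# goal is to inspect the new cert, not to trust it yet.
pem = ssl.get_server_certificate((HOST, PORT), timeout=5)
cert = x509.load_pem_x509_certificate(pem.encode())

print("issuer :", cert.issuer.rfc4514_string())
print("subject:", cert.subject.rfc4514_string())
print("expires:", cert.not_valid_after_utc)

# Basic sanity checks: cert is not expired and (assumption) the issuer
# should mention the internal PKI rather than the old Puppet CA.
assert cert.not_valid_after_utc > datetime.now(timezone.utc), "certificate already expired"
```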