[05:30:50] going to switch s5 primary
[13:27:51] fyi this patch has been merged: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1048006
[13:28:29] it's supposed to be transparent, but stupider last words have been written before so I prefer to raise awareness :D
[13:30:18] https://grafana.wikimedia.org/goto/I46f7JlSR?orgId=1
[13:49:50] arnaudb: FYI the spicerack patch is ready for review w/ tests completed
[13:55:48] FIRING: [32x] MysqlReplicationLagPtHeartbeat: MySQL instance es2020:9104 has too large replication lag (55d 8h 11m 7s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[13:57:46] 🎉 thanks volans !
[13:58:02] I'm handling those false positives and will go for it
[13:58:33] thanks!
[14:00:41] marostegui: silence id 86ac4d8e-a940-40ec-beb9-a1a4760b887c
[14:00:48] FIRING: [63x] MysqlReplicationLagPtHeartbeat: MySQL instance es1023:9104 has too large replication lag (55d 7h 41m 23s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[14:00:52] urandom: hey just checking is all ok for the switch upgrade in rack E1 today?
[14:01:08] I know Matthew had some comments on task relating to ms-fe1012
[14:04:12] (and silenceid: 2cb2adf8-0dad-4988-9c51-5c930c36504f as the first one was not enough)
[14:04:42] topranks: gtg on my end (including backup1010 as we're missing j.ynus)
[14:04:59] arnaudb: great, thanks!
[14:12:36] nb PrometheusMysqldExporterFailed is a side effect of the false positives ↑ godog I'll push an exclusion for es1..5 soon
[14:13:25] the exporter does not fail silently while the timestamp is missing
[14:26:00] arnaudb: interesting, ok thank you !
[14:27:59] marostegui: db1233 is in vslow/dump with weight of 1, it doesn't make a lot of sense to me. Is it intentional? https://noc.wikimedia.org/dbconfig/eqiad.json should we change that?
[14:28:37] (it is in general group with weight of 500)
[14:29:06] What doesn't make sense, to have 1 or to have 500?
[14:30:42] Ah I see what you mean
[14:30:48] We shouldn't have two hosts there
[14:30:54] It can be removed
[14:30:54] the 500 part is okay. just weight of 1 for vslow is weird. number of vslow/dump queries should be quite low (while being slow) so the chance of it actually taking anything is low and on top it has a high weight in general group so it might cause issues
[14:31:14] awesome. I remove it :D
[14:31:36] Just leave db1246
[14:32:02] yeah. I thought it might be a test host for 10.11 or something but it's not
[14:49:11] topranks: sorry, I was in a meeting...which task?
[14:51:01] T365993 I guess...
[14:51:02] T365993: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e1-eqiad - https://phabricator.wikimedia.org/T365993
[14:53:38] topranks: I'll depool ms-fe1012
[14:53:54] urandom: super, thanks
[14:56:00] done.
[15:01:44] awesome
[15:49:32] switch upgrade is complete and things look ok if people want to re-pool things
[15:53:28] topranks: thanks; done!
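
Note on the vslow/dump discussion above: the concern was a host sitting in the vslow/dump groups next to another host while also carrying a high general weight. A minimal sketch of how such a case could be spotted, assuming a simplified mapping of section to group to host weights. The layout, the section name "sX", the weight given to db1246, and the function name are all hypothetical and are not taken from the real eqiad.json:

```python
# Hypothetical data: the structure, the "sX" section name and db1246's
# weight are assumptions for illustration, not the actual dbconfig content.
group_loads = {
    "sX": {
        "vslow": {"db1233": 1, "db1246": 100},
        "dump": {"db1233": 1, "db1246": 100},
    },
}
general_loads = {"sX": {"db1233": 500}}


def shared_special_groups(group_loads, general_loads, groups=("vslow", "dump")):
    """List hosts that share a special group with another host
    and also carry a non-zero weight in the general group."""
    findings = []
    for section, by_group in group_loads.items():
        for group in groups:
            hosts = by_group.get(group, {})
            if len(hosts) < 2:
                continue  # a single dedicated host is the expected setup
            for host, weight in sorted(hosts.items()):
                general = general_loads.get(section, {}).get(host, 0)
                if general > 0:
                    findings.append((section, group, host, weight, general))
    return findings


for finding in shared_special_groups(group_loads, general_loads):
    print(finding)
```

With the sample data this flags db1233 in both groups and leaves db1246 alone, which matches the outcome agreed in the channel (remove db1233, just leave db1246).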