[01:08:50] PROBLEM - MariaDB sustained replica lag on m1 on db1117 is CRITICAL: 20.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[01:09:16] PROBLEM - MariaDB sustained replica lag on m1 on db2160 is CRITICAL: 14.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[01:11:58] RECOVERY - MariaDB sustained replica lag on m1 on db1117 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[01:14:00] RECOVERY - MariaDB sustained replica lag on m1 on db2160 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[05:52:24] dbproxy1020 needs a reload
[05:52:29] I guess due to the maintenance yesterday?
[05:52:50] Anyways, I have done it
[05:53:29] so does dbproxy1021
[05:53:42] dhinus: dbproxy1018 needs it too
[05:53:48] which is the wikireplicas one
[07:03:54] I am failing over m3-master and m5-master
[07:18:28] thanks marostegui, I should've checked yesterday after the switch upgrade! how did you spot it? the maintenance affected dbproxy1018 but not the other two IIRC
[07:18:50] dhinus: it was on icinga along with the other dbproxy hosts (which are owned by us)
[07:19:19] dhinus: the other two were not affected, the hosts IN there were affected, but those are owned by DBAs :)
[07:20:52] gotcha
[07:38:10] marostegui: I mentioned the proxies, but I couldn't find a maintainer for those, so I left them untouched, as it is not something we handle
[07:38:34] jynus: no worries, two of them were ours, the other one was dhinus' :)
[07:38:46] just reloaded dbproxy1018
[07:39:07] maybe we should route alerts for 1018/1019 to team=wmcs in alertmanager? not sure if that's easy
[07:39:29] pinging me or #-cloud-admin also works :)
[07:40:35] dhinus: I believe those should've arrived in #wikimedia-operations
[07:48:17] yes I see them in #-operations, but that channel is quite noisy, especially during a maintenance
[07:48:57] dhinus: As I said, they were showing up in icinga as well, so whatever works best for you
[07:49:54] yep, Icinga works, but the way I look at Icinga is usually through alerts.wikimedia.org, and it was in there but with team=sre so I didn't spot it :)
[07:50:16] yeah, so far I am still using icinga.wikimedia.org and only check criticals
[08:00:54] we should report those problems, multitenancy is something we want to make better
[08:01:16] there are many services that are indeed reported to the wrong team
[08:29:17] jynus: db1102 needs to be decommissioned, so it needs replacement with db1225 (https://phabricator.wikimedia.org/T326669)
[08:29:37] no rush, just adding it there for your TO-DO
[08:29:40] ok
[08:29:44] please add me to the ticket
[08:29:48] I can create a specific task for you as I did for db1108, just let me know if that's easier
[08:29:59] both are ok
[08:30:04] ok I will create one!
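[editor's note] The dbproxy "reload" mentioned above is a service reload on the proxy host itself. As a minimal sketch only: it assumes the dbproxy hosts run HAProxy under systemd with the default config path, which is not stated in the log; the path and unit name are assumptions, not the production tooling.

    #!/usr/bin/env python3
    """Validate the HAProxy config, then reload the service on a proxy host."""
    import subprocess
    import sys

    CONFIG = "/etc/haproxy/haproxy.cfg"  # assumed default config path

    def reload_haproxy() -> None:
        # Syntax-check the configuration first, so a broken config never
        # reaches a running proxy.
        subprocess.run(["haproxy", "-c", "-f", CONFIG], check=True)
        # Graceful reload via systemd keeps existing connections alive.
        subprocess.run(["systemctl", "reload", "haproxy"], check=True)

    if __name__ == "__main__":
        try:
            reload_haproxy()
        except subprocess.CalledProcessError as exc:
            sys.exit(f"reload aborted: {exc}")

Checking the config before reloading is the main point of the sketch; a reload with a bad config is exactly the failure mode you want to catch after a maintenance.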
[08:30:30] I still have db1150 pending to fix, it's been a busy week :-(
[08:31:29] no worries, create a task and assigned it to you
[08:31:33] *created
[08:56:09] thank you so much for coordinating that
[08:56:36] I am going to try to finish the minio upgrade to at least advance on something
[08:57:12] please ignore grafana issues on db1150:13315 and db1145:13313, I hopefully will fix that today
[09:26:56] new console access to minio: https://wikitech.wikimedia.org/wiki/Media_storage/Backups#How_to_access_the_web_UI_of_minio It is 16% fancier, with 50% fewer supported file storage features!!!
[09:30:44] Amir1: I need to switchover the s2 master in eqiad
[09:30:50] Can I proceed?
[09:31:09] I also need to change the s1 master in eqiad too
[09:40:25] marostegui: hi. Sure
[09:40:38] cool thanks
[10:05:26] Emperor: thanks for subscribing, I was about to send you that ticket to ask if you wanted me to do it
[11:38:43] db1150 should now be back as a (passive) backup source for s3 and s4
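[editor's note] Both the "sustained replica lag" alerts at the top of the log and the final confirmation that db1150 is back as a passive backup source for s3 and s4 come down to reading per-section replication status. The sketch below shows one way to do that check in Python, assuming pymysql, a monitoring account, and a section-to-port mapping; the hostname, ports and credentials are placeholders, not values from the log, and only the critical threshold (lag ge 2 s) echoes the alert text above.

    #!/usr/bin/env python3
    """Check replication health per section on a multi-instance MariaDB host."""
    import pymysql

    HOST = "db1150.example.org"          # hypothetical FQDN
    SECTIONS = {"s3": 3313, "s4": 3314}  # assumed section -> port mapping
    LAG_CRITICAL = 2                     # seconds, mirroring the "ge 2" alert

    def replica_status(host: str, port: int) -> dict:
        # One connection per section instance; DictCursor gives named columns.
        conn = pymysql.connect(host=host, port=port, user="monitor",
                               password="secret",
                               cursorclass=pymysql.cursors.DictCursor)
        try:
            with conn.cursor() as cur:
                cur.execute("SHOW SLAVE STATUS")
                return cur.fetchone() or {}
        finally:
            conn.close()

    for section, port in SECTIONS.items():
        status = replica_status(HOST, port)
        lag = status.get("Seconds_Behind_Master")
        running = (status.get("Slave_IO_Running") == "Yes"
                   and status.get("Slave_SQL_Running") == "Yes")
        ok = running and lag is not None and lag < LAG_CRITICAL
        print(f"{section}: threads running={running} lag={lag}s -> "
              f"{'OK' if ok else 'CRITICAL'}")

Note that Seconds_Behind_Master is NULL (None) when replication is not running, so the check treats a missing lag value as critical rather than as zero lag.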