[01:08:50] PROBLEM - MariaDB sustained replica lag on m1 on db1117 is CRITICAL: 20.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[01:09:16] PROBLEM - MariaDB sustained replica lag on m1 on db2160 is CRITICAL: 14.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[01:11:58] RECOVERY - MariaDB sustained replica lag on m1 on db1117 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[01:14:00] RECOVERY - MariaDB sustained replica lag on m1 on db2160 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[05:52:24] dbproxy1020 needs a reload
[05:52:29] I guess due to the maintenance yesterday?
[05:52:50] Anyways, I have done it
[05:53:29] so does dbproxy1021
[05:53:42] dhinus: dbproxy1018 needs it too
[05:53:48] which is the wikireplicas one
[07:03:54] I am failing over m3-master and m5-master
[07:18:28] thanks marostegui, I should've checked yesterday after the switch upgrade! how did you spot it? the maintenance affected dbproxy1018 but not the other two IIRC
[07:18:50] dhinus: it was on icinga along with the other dbproxy hosts (which are owned by us)
[07:19:19] dhinus: the other two were not affected, the hosts IN there were affected, but those are owned by DBAs :)
[07:20:52] gotcha
[07:38:10] marostegui: I mentioned the proxies, but I couldn't find a maintainer for those, so I left them untouched, as it is not something we handle
[07:38:34] jynus: no worries, two of them were ours, the other one was dhinus' :)
[07:38:46] just reloaded dbproxy1018
[07:39:07] maybe we should route alerts for 1018/1019 to team=wmcs in alertmanager? not sure if that's easy
[07:39:29] pinging me or #-cloud-admin also works :)
[07:40:35] dhinus: I believe those should've arrived in #wikimedia-operations
[07:48:17] yes I see them in #-operations, but that channel is quite noisy, especially during a maintenance
[07:48:57] dhinus: As I said, they were showing up in icinga as well, so whatever works best for you
[07:49:54] yep, Icinga works, but the way I look at Icinga is usually through alerts.wikimedia.org, and it was in there but with team=sre so I didn't spot it :)
[07:50:16] yeah, so far I am still using icinga.wikimedia.org and only check criticals
[08:00:54] we should report those problems, multitenancy is something we want to make better
[08:01:16] there are many services that are indeed reported to the wrong team
[08:29:17] jynus: db1102 needs to be decommissioned, so it needs replacement with db1225 (https://phabricator.wikimedia.org/T326669)
[08:29:37] no rush, just adding it there for your TO-DO
[08:29:40] ok
[08:29:44] please add me to the ticket
[08:29:48] I can create a specific task for you as I did for db1108, just let me know if that's easier
[08:29:59] both are ok
[08:30:04] ok I will create one!
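[editor's note] The dbproxy "reload" mentioned above is a service reload on the proxy host itself. As a minimal sketch only: it assumes the dbproxy hosts run HAProxy under systemd with the default config path, which is not stated in the log; the path and unit name are assumptions, not the production tooling.

    #!/usr/bin/env python3
    """Validate the HAProxy config, then reload the service on a proxy host."""
    import subprocess
    import sys

    CONFIG = "/etc/haproxy/haproxy.cfg"  # assumed default config path

    def reload_haproxy() -> None:
        # Syntax-check the configuration first, so a broken config never
        # reaches a running proxy.
        subprocess.run(["haproxy", "-c", "-f", CONFIG], check=True)
        # Graceful reload via systemd keeps existing connections alive.
        subprocess.run(["systemctl", "reload", "haproxy"], check=True)

    if __name__ == "__main__":
        try:
            reload_haproxy()
        except subprocess.CalledProcessError as exc:
            sys.exit(f"reload aborted: {exc}")

Checking the config before reloading is the main point of the sketch; a reload with a bad config is exactly the failure mode you want to catch after a maintenance.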
[08:30:30] I still have db1150 pending to fix, it's been a busy week :-(
[08:31:29] no worries, create a task and assigned it to you
[08:31:33] *created
[08:56:09] thank you so much for coordinating that
[08:56:36] I am going to try to finish the minio upgrade to at least advance on something
[08:57:12] please ignore grafana issues on db1150:13315 and db1145:13313, I hopefully will fix that today
[09:26:56] new console access to minio: https://wikitech.wikimedia.org/wiki/Media_storage/Backups#How_to_access_the_web_UI_of_minio It is 16% fancier, with 50% fewer supported file storage features!!!
[09:30:44] Amir1: I need to switchover the s2 master in eqiad
[09:30:50] Can I proceed?
[09:31:09] I also need to change the s1 master in eqiad too
[09:40:25] marostegui: hi. Sure
[09:40:38] cool thanks
[10:05:26] Emperor: thanks for subscribing, I was about to send you that ticket to ask if you wanted me to do it
[11:38:43] db1150 should now be back as a (passive) backup source for s3 and s4
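[editor's note] Both the "sustained replica lag" alerts at the top of the log and the final confirmation that db1150 is back as a passive backup source for s3 and s4 come down to reading per-section replication status. The sketch below shows one way to do that check in Python, assuming pymysql, a monitoring account, and a section-to-port mapping; the hostname, ports and credentials are placeholders, not values from the log, and only the critical threshold (lag ge 2 s) echoes the alert text above.

    #!/usr/bin/env python3
    """Check replication health per section on a multi-instance MariaDB host."""
    import pymysql

    HOST = "db1150.example.org"          # hypothetical FQDN
    SECTIONS = {"s3": 3313, "s4": 3314}  # assumed section -> port mapping
    LAG_CRITICAL = 2                     # seconds, mirroring the "ge 2" alert

    def replica_status(host: str, port: int) -> dict:
        # One connection per section instance; DictCursor gives named columns.
        conn = pymysql.connect(host=host, port=port, user="monitor",
                               password="secret",
                               cursorclass=pymysql.cursors.DictCursor)
        try:
            with conn.cursor() as cur:
                cur.execute("SHOW SLAVE STATUS")
                return cur.fetchone() or {}
        finally:
            conn.close()

    for section, port in SECTIONS.items():
        status = replica_status(HOST, port)
        lag = status.get("Seconds_Behind_Master")
        running = (status.get("Slave_IO_Running") == "Yes"
                   and status.get("Slave_SQL_Running") == "Yes")
        ok = running and lag is not None and lag < LAG_CRITICAL
        print(f"{section}: threads running={running} lag={lag}s -> "
              f"{'OK' if ok else 'CRITICAL'}")

Note that Seconds_Behind_Master is NULL (None) when replication is not running, so the check treats a missing lag value as critical rather than as zero lag.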