[03:00:44] PROBLEM - MariaDB sustained replica lag on m2 on db2160 is CRITICAL: 2.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13322
[03:01:44] RECOVERY - MariaDB sustained replica lag on m2 on db2160 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13322
[09:01:57] heads up for https://gerrit.wikimedia.org/r/c/operations/puppet/+/941368
[09:16:32] jynus: you wanting that reviewed?
[09:40:19] nah, it is ok, it is a single digit change, it is the deployment that is not super easy
[09:44:41] it is more of a heads up of an ongoing backup issue
[09:59:16] 👍
[10:26:01] I think we should not be unhappy that the issue happened, and actually be happy that the monitoring we put in place caught it almost instantly
[18:18:43] marostegui: based on beta cluster drop (https://phabricator.wikimedia.org/T312666#9042365), the production drop would be around 245GB for s4 and 70GB for s1. Fingers crossed it'll be that big of change
[18:21:39] Made T342685
[18:21:40] T342685: Create cluster28 and cluster29 in existing es4 and es5 hosts - https://phabricator.wikimedia.org/T342685