[07:49:44] Going to restart sanitarium hosts for T297696
[07:49:44] T297696: Create ipinfo_ip_changes table [M] - https://phabricator.wikimedia.org/T297696
[08:30:36] pc1014 promoted to pc2 master
[08:54:37] marostegui: before bringing back pc1012, I got this error in email "SMART error (OfflineUncorrectableSector) detected on host: pc1012"
[08:54:46] that's expected during reimages yep
[08:54:51] ok cool
[08:55:11] I always double check the raid status anyways
[08:55:22] As I need to reconfigure replication so it get the keys it "lost" during the reimage
[09:00:02] https://grafana.wikimedia.org/d/000000106/parser-cache?viewPanel=1&orgId=1&from=1642465126264&to=1642496355180 small drop on hits, but not as big as if I had put pc1014 without replicating for a few days
[09:00:13] I will do the same for pc3 so the impact is limited
[10:00:12] pc1012 is back as pc2 master
[10:28:51] jynus: I am going to reimage db1117 (logical backups finished there already)
[10:29:03] ok
[10:42:23] Amir1: i left you a few comments on the preso
[10:42:48] Thanks!
[11:58:41] PROBLEM - MariaDB sustained replica lag on m1 on db2132 is CRITICAL: 15.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[11:59:59] RECOVERY - MariaDB sustained replica lag on m1 on db2132 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[14:38:18] marostegui: there's a problem with the wmfmariadbpy 0.8 deployment. investigating currently.
[14:38:31] kormat: interview :)
[14:38:40] no ty