[01:09:24] PROBLEM - MariaDB sustained replica lag on m1 on db1117 is CRITICAL: 14.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[01:09:54] PROBLEM - MariaDB sustained replica lag on m1 on db2160 is CRITICAL: 7 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[01:10:58] RECOVERY - MariaDB sustained replica lag on m1 on db1117 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[01:11:30] RECOVERY - MariaDB sustained replica lag on m1 on db2160 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[07:05:33] I am going to swap s1's sanitarium master as part of https://phabricator.wikimedia.org/T326669
[08:41:08] I feel so sad I need to depool db1106, I have some good memories with that host
[08:41:14] and then decommission it
[08:51:06] marostegui: I can gladly do the honors
[08:51:43] hahaha
[08:51:58] I have a special relationship with that host, so I'd rather do it myself
[08:52:03] Our last connection
[08:57:18] is db1106 notable somehow, then?
[09:09:30] haha yeah
[09:09:39] We've spent quite some time together
[09:09:51] I also miss db1064 and db1089...
[09:13:19] mine are db1157 and db1115. I would personally like to kill them myself
[09:13:58] why 1157?
[09:18:23] s3 master, always a pain to deal with
[09:23:49] haha
[11:28:16] Hm, there may be more swift sadness
[11:28:51] ?
[11:30:40] "ms-be1069 swift-rclone-sync[1539164]: Failed to sync with 56355 errors: last error was: failed to delete 29156 files"; suggests that a) there's a similar number of similarly-sad objects in codfw [not surprising] b) it may be hard to delete these objects from the container listing [quite concerning, but I don't know what rclone's deletion code path is] c) there are 56 more such sad objects in eqiad than there were last week
[11:48:35] Amir1: https://phabricator.wikimedia.org/T321312 some of those hosts are going to be decommissioned soon (i.e. db1106), I will check the others. But from a quick check, we still have db1108, which is also analytics. I will create a task for them
[11:48:55] I will check the others and prepare the ones for misc, as that involves floating HW that we don't currently have. But I will figure it out
[11:49:06] Thank you <3
[11:49:13] do you want me to assign it to you?
[11:49:20] sure, that works
[11:50:01] I think I'm done with it unless some core DBs are left (I think the eqiad s8 master is one of them), but I will get them done once you're done
[11:50:36] we haven't had a new batch recently, it's fishy
[11:50:39] sounds good
[11:56:33] * Emperor updates T327253 with the fresh sorrow
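
(The swift-rclone-sync failure quoted at 11:30:40 carries the numbers the conversation compares week over week. Below is a minimal Python sketch of how such a journal line could be parsed and tracked; the regex is reverse-engineered from that single sample line and the baseline figure is hypothetical, so none of this reflects the actual tooling.)

    import re

    # Pattern reverse-engineered from the single journal line quoted above;
    # real swift-rclone-sync output may vary.
    SYNC_FAILURE = re.compile(
        r"(?P<host>\S+) swift-rclone-sync\[\d+\]: "
        r"Failed to sync with (?P<errors>\d+) errors: "
        r"last error was: failed to delete (?P<deletes>\d+) files"
    )

    def parse_failure(line):
        """Return (host, total errors, failed deletions), or None if no match."""
        m = SYNC_FAILURE.search(line)
        if m is None:
            return None
        return m.group("host"), int(m.group("errors")), int(m.group("deletes"))

    line = ("ms-be1069 swift-rclone-sync[1539164]: Failed to sync with 56355 "
            "errors: last error was: failed to delete 29156 files")
    host, errors, deletes = parse_failure(line)
    last_week_errors = 56299  # hypothetical baseline; the log only says "56 more than last week"
    print(f"{host}: {errors} errors, {deletes} failed deletions "
          f"({errors - last_week_errors:+d} vs last week)")
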
[12:05:30] dbproxy2001/2 failed over?
[12:05:45] ah, it is maintenance
[12:06:46] yep
[12:06:47] it is me
[12:06:51] should recover in a bit
[12:07:19] the others will fail over too
[12:17:53] jynus: when is it a good moment to reboot db2160 (misc codfw backup source)?
[12:18:22] any time between now and 0 hours
[12:18:25] \o/
[12:18:29] doing it now then
[12:29:13] jynus: all done
[15:09:53] PROBLEM - MariaDB sustained replica lag on s2 on db1105 is CRITICAL: 40 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1105&var-port=13312
[15:14:43] RECOVERY - MariaDB sustained replica lag on s2 on db1105 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1105&var-port=13312
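
(The PROBLEM/RECOVERY lines throughout this log encode the check's thresholds: "(C)2 ge (W)1 ge 0" means the reported lag is compared with >= against a critical value of 2 seconds and a warning value of 1. A minimal Python sketch of that comparison, fed with the readings from this log; the "sustained" averaging window and the actual lag measurement are omitted, so this illustrates the threshold semantics only, not the production check.)

    # Thresholds as printed in the alerts: "(C)2 ge (W)1 ge 0".
    CRITICAL_SECONDS = 2
    WARNING_SECONDS = 1

    def classify_lag(lag_seconds):
        """Map a replica lag reading (in seconds) to an alert state."""
        if lag_seconds >= CRITICAL_SECONDS:
            return "CRITICAL"
        if lag_seconds >= WARNING_SECONDS:
            return "WARNING"
        return "OK"

    # Readings reported in this log, plus a recovery value of 0:
    for host, lag in [("db1117", 14.6), ("db2160", 7), ("db1105", 40), ("db1105", 0)]:
        print(f"{host}: lag {lag}s -> {classify_lag(lag)}")
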