[08:21:35] mmhh thanos-be2004 is basically out of disk space on sdb3 (the ssd for containers), I'm taking a look at what's up with that
[08:23:00] there's a couple of big container dbs in "quarantined" status, that's what I found so far
[08:27:36] :-/
[08:29:13] does thanos have smaller SSDs or just containers with many more entries in?
[08:31:17] ssd size should be the same IIRC, but yeah the tegola/osm containers are huge
[08:31:37] not a new problem sadly, i.e. https://phabricator.wikimedia.org/T307184
[08:40:19] I'll open a task for tracking and move the quarantined container dbs out of the way into the sd[ab]4 partitions
[08:40:41] which we could shrink to make space for the container partitions, if it comes to that
[09:18:33] godog: checking on replication, the frontends are getting ECONNREFUSED when polling the new backend nodes for replication info; do I need to do a rolling restart or somesuch (e.g. to make something notice the change to swift::storagehosts)?
[09:18:42] (this is ms)
[09:25:30] backends seem to have got at least some data on them, so I think they're working OK
[09:27:08] backends have a ferm conf.d file with the swift nodes in it
[09:31:25] Hm, problem seems to be on the backends.
[09:32:40] godog: new object backends lack a listener on port 6022, whereas older ones have something listening (swift-object-server). Going to try restarting swift on one backend
[09:33:57] Aug 01 09:33:24 ms-be2066 object-server[3044374]: Unable to bind to port 6012: >
[09:34:43] oh, red herring, that's not the affected port, and it got there in the end
[09:35:23] right, that fixed it on that node, so I'll do it on the others.
[10:06:23] Emperor: hah! so a reboot did it? or a restart of object-server?
[10:19:35] I restarted swift-* because I wanted a moderately-sized hammer :)
[10:20:09] fair!
[10:24:02] ok so I freed some space on thanos-be2004, though I think depending on how things shuffle there isn't enough space for the tegola containers
[10:24:11] :(
[10:24:26] also, the eqiad dispersion report picked up way too many unmounted disks, going to investigate
[10:25:07] sigh, I'm assuming for the ms cluster?
[10:25:13] yep
[10:26:18] FCOL, some of this is the new nodes having stupid disks. ms-be1071 has 2x swift-sdl1 and 0x swift-sd01
[10:27:34] sigh
[10:28:02] I'll fix it, but :sadface:
[10:43:48] ok, for T314275 I don't see many other short-term solutions but to shrink sd[ab]4 and grow sd[ab]3
[10:43:49] T314275: thanos-be2004 sdb3 fully used - https://phabricator.wikimedia.org/T314275
[10:58:27] that's going to be a bit painful, isn't it?
[11:02:48] sorry, a bit painful?
[11:03:36] well, they're not LVM partitions, so changing them is going to be quite invasive? I think it's probably the only answer, though.
[11:06:27] yeah, I think we're going to have to lose the filesystem on the "4" partition (not a lot of data on those, not a huge deal), though for the "3" partition I think we can extend the partition and grow the filesystem
[11:08:20] I'll test ^ in pontoon after lunch
[11:09:59] I think I've got all the ms drives back into a known-plausible state (except for the two waiting for repair)
[11:10:41] nice!
[11:10:45] * godog lunch, bbiab
[15:15:22] Emperor: ok I got this out, I've tested it on thanos-be-01 and it looks to me like it does the right thing https://gerrit.wikimedia.org/r/c/operations/puppet/+/819095
[15:15:58] 👀
[15:21:56] I think I'd have been tempted to use shell for this, since it's so much gluing commands together :)
[15:23:56] :)
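
A minimal sketch of how one might find the large quarantined container DBs eating sdb3 (08:21-08:40) and park them on the roomier sd[ab]4 partitions. The /srv/swift-storage mountpoint layout and the quarantined/ path are assumptions, not the exact commands used.

```
# Assumed mountpoint layout (/srv/swift-storage/<device>); adjust to the host.
df -h /srv/swift-storage/sdb3

# Largest container DBs on the SSD partition, quarantined ones included
find /srv/swift-storage/sdb3 -name '*.db' -printf '%s\t%p\n' 2>/dev/null \
    | sort -rn | head -20 | numfmt --field=1 --to=iec

# Move a quarantined DB onto the bigger HDD partition to buy headroom
# (illustrative path only; swift no longer serves quarantined data anyway)
mv /srv/swift-storage/sdb3/quarantined/containers/<hash> /srv/swift-storage/sda4/
```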
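The ECONNREFUSED at 09:18 turned out to be missing listeners on the new backends (09:32), fixed by restarting swift there (10:19). A rough sketch of the checks and the "moderately-sized hammer", assuming systemd units matching swift*; host and port values are taken from the log.

```
# From a frontend: is the backend reachable on its replication port at all?
nc -zv ms-be2066 6022

# On the backend: which swift daemons are actually listening?
sudo ss -tlnp | grep -E ':60[0-9]{2}\b'

# The moderately-sized hammer: restart every swift unit on the backend
sudo systemctl restart 'swift*'

# Confirm the object server came back and bound its ports this time
sudo journalctl -u 'swift*' --since '10 min ago' | tail
sudo ss -tlnp | grep ':6022'
```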
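For the dispersion report flagging unmounted disks and ms-be1071's duplicated label (10:24-10:26), a sketch of spotting and fixing a mislabelled XFS partition, assuming the swift filesystems are mounted by label; the device names below are placeholders, not the real ms-be1071 layout.

```
# Which filesystem labels exist, and which ones are duplicated or missing?
lsblk -o NAME,FSTYPE,LABEL,MOUNTPOINT
sudo blkid -s LABEL | sort

# Relabel the partition that wrongly got "swift-sdl1" (placeholder device);
# XFS labels can only be changed while the filesystem is unmounted.
sudo umount /dev/sdX1
sudo xfs_admin -L swift-sdX1 /dev/sdX1
sudo mount -a
```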
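The shrink-sd[ab]4 / grow-sd[ab]3 plan (10:43, 11:06) was ultimately handled by the puppet change above (819095); purely as a back-of-the-envelope illustration of the manual equivalent, assuming the partitions are adjacent, the filesystems are XFS (so "3" can grow online but "4" has to be recreated), and sdb4's data has already been moved off.

```
# NOT the actual change -- a rough manual sketch only.
sudo umount /srv/swift-storage/sdb4
sudo parted -s /dev/sdb rm 4

# Recreate a smaller "4" at the tail of the disk (the 90% boundary is a placeholder)
sudo parted -s /dev/sdb mkpart container4 xfs 90% 100%
sudo mkfs.xfs -f /dev/sdb4
sudo mount /dev/sdb4 /srv/swift-storage/sdb4

# Now extend "3" into the freed gap and grow its filesystem in place
sudo growpart /dev/sdb 3
sudo xfs_growfs /srv/swift-storage/sdb3
```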