[09:45:36] I'm not sure this is accurate: "Those clusters do not use the replicas for reads, everything goes to the master."
[09:45:50] Feel free to change it
[09:45:54] I have no idea how that works - but it would contradict an outage happening
[09:46:09] and it getting fixed when depooled
[09:47:09] if my understanding of how it works, based on the outage, is the opposite - not try to fix them, but depool them, as the primary can work on its own
[09:47:16] * is right
[09:47:43] but you are right to ask mw people for help, that is only a guess
[09:48:34] From the start it was said they were not going to receive reads; whether that has changed I don't know. I wasn't in the outage so I don't know. If something else was said, please modify it
[09:48:44] I am just documenting what I knew at the time
[09:48:49] But again, I wasn't present
[09:48:52] So I don't know
[09:48:57] he he
[09:49:07] it looks like we both don't know much
[09:50:37] my suggestion is to try to meet with Tim and Timo sometime early in the morning to coordinate those doubts I only had
[09:50:41] *also
[09:51:03] I just asked timo in _security
[09:51:07] Anyways
[09:51:35] now, it is possible you are right and the outage was a bug, cannot say
[09:52:17] https://phabricator.wikimedia.org/T306118#7981824
[09:52:30] I hope that makes it clear why I added such documentation
[09:52:35] sure
[09:52:47] I just, during the outage, didn't see that it was right
[09:52:59] so just trying to communicate that
[09:53:39] I believe the promises were not met
[09:54:25] I will reopen that
[09:58:27] https://phabricator.wikimedia.org/T306118#8177503
[10:01:41] my guess is that was the intention, but either it changed or there was a bug
[10:41:08] ugh, puppet and swift-drive-audit are fighting over sdh1 on ms-be2039 and it's also filled up /
[10:53:52] godog: I've fixed ms-be2039 (at least temporarily) but am swamped today and about to go on leave - could you have a look at the puppet stuff for mounting swift drives, please? I don't see how it can work - if swift-drive-audit comments out the drive in fstab and umounts it, puppet will make a new fstab entry (not commented out) and then systemd will try and mount it again, leading to these sorts of cycles, with an occasional race failure that fills up /
[11:00:17] It seems a bit "how can this ever work with systemd's desire to keep re-mounting filesystems?"
[11:01:30] (but I don't really have time to look further today)
[11:02:38] I guess the mount resource needs some sort of "is this already in fstab but commented-out" check?
[11:09:59] marostegui: Tim kind of backed up my fear - there is some work to do to reach that thing. I suggested a way to move forward, but I will let both of you decide on the best way.
[11:11:31] as technically I created more work, I offer to amend the docs myself, so as not to put the burden on you
[11:30:10] Emperor: sorry I really can't today :|
[11:42:26] jynus: sure, I hope they fix that soon
[11:42:44] I documented the bug in the docs
[11:43:05] so you are right, that is how it should work; sadly that is not true at the moment :-(
[11:43:51] I am also updating the incident as this was new to me, and I think that is at the root of the issue, not operator error
[11:52:05] Cool, excellent thanks
[17:35:37] just a heads-up, I'm doing a mass delete of the giant container from T314835 . I'm taking it slowly (concurrency at 5) but did want to let y'all know
[17:35:38] T314835: wdqs space usage on thanos-swift - https://phabricator.wikimedia.org/T314835
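
Regarding the puppet/swift-drive-audit mount cycle discussed at 10:53-11:02 above, a minimal sketch of the "is this already in fstab but commented-out" guard. The custom fact swift_audit_disabled is hypothetical (it would still have to be written, e.g. by parsing /etc/fstab for commented swift entries), the device and mountpoint are illustrative, and pick() is assumed from puppetlabs-stdlib; this is not the actual WMF swift module.

    # Devices swift-drive-audit has commented out of /etc/fstab
    # (hypothetical custom fact; empty list when nothing is disabled).
    $disabled   = pick($facts['swift_audit_disabled'], [])
    $device     = '/dev/sdh1'                  # illustrative
    $mountpoint = '/srv/swift-storage/sdh1'    # illustrative

    if ! ($device in $disabled) {
      # Only manage the mount while swift-drive-audit has not disabled it;
      # otherwise Puppet rewrites an uncommented fstab entry and systemd
      # re-mounts the drive, recreating the cycle described above.
      mount { $mountpoint:
        ensure  => mounted,
        device  => $device,
        fstype  => 'xfs',
        options => 'noatime',
      }
    }

The guard only stops Puppet from re-adding the fstab entry for a drive the audit has pulled; once the device is uncommented again (or the fact reports it as healthy), the next Puppet run manages the mount as before.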