[06:47:49] Amir1: I'm starting a schema change in s3 in eqiad
[06:49:46] actually there's a warning on db1150
[06:50:41] running a backup it seems
[08:00:48] FIRING: PuppetFailure: Puppet has failed on ms-fe1009:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[08:10:48] RESOLVED: PuppetFailure: Puppet has failed on ms-fe1009:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[10:11:54] Last dump for db_inventory at eqiad (db1215) taken on 2025-07-29 00:42:28 is 105 KiB, but the previous one was 112 KiB, a change of -6.9 %
[10:41:59] that puppet failure was related to gitlab maintenance
[10:42:20] (I was going to ignore it since it self-resolved, but then couldn't actually leave it alone)
[10:50:29] last hours to ask me anything 🙌
[10:55:02] is there any progress on backups for gitlab, in case I'm asked about it while you're away? I think full deployment of its storage on apus is currently blocked on this question
[10:55:37] the blocker was you being available so we could discuss it
[10:56:17] as we don't want to set up something that wouldn't work for other things outside of gitlab
[10:57:10] tbf, it was also blocked on me splitting gitlab onto its own separate storage
[10:57:29] but the backup sources got priority these past weeks
[10:59:55] if you want to send me questions to think about while you're away, fire away (though maybe by email); I hadn't realised you needed more input from me
[11:00:30] yeah, basically, I will set it up, but there always has to be a compromise
[11:00:54] the gitlab maintainers set out their requirements
[11:02:29] for gitlab, maintaining an offline copy and sending it to bacula is enough, but that may not scale for other stores
[11:03:20] At least initially it'll only be small use cases, 'cos the cluster is quite small :)
[11:03:35] yeah, but that is why I wanted to talk to you
[11:04:02] what the future hypothetical needs of other users could be
[11:04:08] and if we need to prepare for those
[11:04:36] or if we should just do something easy for gitlab only, and change strategy later
[11:07:10] in general: what does a backup of an object store look like?
[11:07:28] and it is ok to say: "don't know, ignore it until we get there, just back up gitlab"
[11:07:48] I mean in terms of requirements, not implementation
[11:12:24] I think I'm tempted to say "let's not block solving the problem for gitlab on trying to solve the wider question"
[11:12:50] I will interpret that in the worst possible way: "I will do whatever I think is ok"
[11:13:47] I think if you do whatever is sensible to achieve gitlab backups, I'm content - I've previously said a couple of ways I'd think about doing so, and am happy to review any plan you come up with :)
[11:14:40] I haven't seen those. Would love to see even hypothetical or theoretical solutions. Not that I don't have some, but I am open to further ideas.
[11:14:56] as maybe I could be missing some
[11:15:47] Note this will be the first time we back up an object storage (media backups cover mediawiki, not swift)
[11:17:50] I've shared a doc with you if you want to share those ideas (ok to copy and paste)
[11:17:59] and I will have a look after I come back
[11:18:22] (I will ofc end up adding more stuff)
[11:18:44] I am more asking about things I haven't taken into account, like "read rate should be limited to 1000 reads/s"
[11:18:55] that as the service owner you may want to impose
[11:19:00] if that makes sense
[11:19:27] but any input is welcome
[11:19:48] or you may have expertise on ceph-related tooling I don't know about
[11:20:01] which are all reasons I value highly your opinion
[11:20:08] *why
[11:20:17] even if it is ultimately my thing to solve
[11:25:32] I added a comment focusing on my main questions (not for you to solve, but so you can understand my biggest questions, in case you have a good answer)
[11:27:30] having said that, gitlab is "easy" because almost anything will work
[11:27:48] I just like asking questions even if we decide not to answer them yet
[11:28:50] And then there are things where I may need an expert like you: "Can I ask for all objects changed since X timestamp efficiently?"
[11:29:16] if not, is there a way to do so by activating some log?
[11:31:37] rclone has a command option for "objects more recent than [date]"
[11:31:48] oh, I don't doubt it
[11:31:55] the question is, how fast it runs
[11:32:07] I suspect it's a metadata query, so scales with container size
[11:32:15] as if it has to read every single object to do so
[11:32:29] it won't scale to the sizes of e.g. mediabackups
[11:32:45] No, I think the bucket listing tells you what you need
[11:33:03] those are the kind of things that I may have questions about, and while I don't expect you to have all the answers
[11:33:07] i.e. you list the bucket once (which will be slow for a large bucket), and that contains the info you need
[11:33:11] you sometimes may know better
[11:33:33] yeah, that's the kind of bad scenario I feared
[11:34:51] again, later that will have a practical answer "it takes X seconds to do it" and that will either work or not
[11:35:54] anyway, please dump there everything you can think of
[11:36:16] and I will take it into account and try to see what's the best way
[11:37:07] Re: your current answer, I was talking about versioning on storage, not on source
[11:37:52] on backup storage
[11:38:44] because in the end, someone will want "give me the status of the storage at 11:38"
[11:39:02] and depending on how we store it that will be either impossible or possible but too slow
[11:39:37] and that is ok if we decide "you will only be able to get the status of it every 1 hour, or every 1 day"
[11:39:49] but it is still a decision
[11:42:11] it is more important to think about the speed (and logic) of recovery than the speed of taking backups
[11:42:14] I'm sorry, I don't understand what "status of the storage at 11:38" means
[11:42:41] I am not that worried about knowing that something changed at 11:38
[11:43:00] that can be done, faster or slower
[11:43:36] but how can I recover the 11:38 state, where some files had not yet been created, and others were modified later
[11:43:49] if I have a backup, e.g. every day
[11:44:45] "A bug started at 11:38, corrupting files, please revert gitlab to that timestamp"
[11:45:38] files uploaded after that time should not be available for security reasons
[11:45:52] can you see that is not trivial?
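To make the "all objects changed since X" question above concrete: with an S3-compatible API the usual answer is a full bucket listing filtered on each object's LastModified timestamp - a metadata-only operation (no object reads), but one that still scales with the number of keys in the bucket, which is exactly the scaling worry raised above; rclone's age filters presumably work the same way. A minimal sketch follows, assuming a hypothetical endpoint, bucket name, and credentials (none of these are the real apus configuration):

```python
# Minimal sketch (not the real setup): list objects modified since a given
# timestamp by walking the bucket listing and filtering on LastModified.
# Metadata-only (no object reads), but the listing itself still scales with
# the number of keys in the bucket.
import os
from datetime import datetime, timezone

import boto3

ENDPOINT = "https://apus.example.wmnet"  # hypothetical S3/RGW endpoint
BUCKET = "gitlab-backups"                # hypothetical bucket name
SINCE = datetime(2025, 7, 29, 11, 38, tzinfo=timezone.utc)

s3 = boto3.client(
    "s3",
    endpoint_url=ENDPOINT,
    aws_access_key_id=os.environ["S3_ACCESS_KEY"],
    aws_secret_access_key=os.environ["S3_SECRET_KEY"],
)

changed = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        if obj["LastModified"] >= SINCE:
            changed.append((obj["Key"], obj["LastModified"], obj["Size"]))

print(f"{len(changed)} objects changed since {SINCE.isoformat()}")
```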
[11:46:37] I think you are now starting to understand my doubts :-D
[11:46:56] based on your last doc comment
[11:47:16] In an old-style backup setup (like I have at home), the answer would be "I can restore from the last backup I took before that time, which was last night"
[11:47:26] yep
[11:47:41] it is just that with object storage things are more granular
[11:48:39] plus there are some more restrictions - usually we want to use deduplication as much as possible to save space
[11:50:09] bacula handles incrementals for us in terms of files. I don't know of a tool doing the same automation for object storage
[11:50:36] anyway, I don't want to steal a lot of your time, but wanted to know if you had thoughts
[11:50:59] you will have time to add stuff while I am out, and I will see what's the best way after I come back
[11:53:37] at least I made you see matrix :-D
[11:54:20] and the fact that I was able to demonstrate that backing up commons was theoretically impossible, but we still made it work, gives us hope 0:-)
[11:56:41] do you have a view on whether you'd like to back up to an object store or a filesystem?
[11:57:06] view, as in preference? nope
[11:57:06] [and do you know if bacula or minio claim to be able to usefully back up S3 buckets?]
[11:57:16] [yes, I meant preference]
[11:57:25] I have a view that we shouldn't use minio
[11:57:57] fair enough
[11:58:14] and that filesystem is simple and reliable, but only if a bunch of objects are stored together (the bacula solution)
[11:58:38] I mean, everything is a filesystem, right?
[11:59:15] so it will be like a circle of competing needs: reliability, simplicity, performance, etc.
[11:59:17] No, definitely not :)
[11:59:22] ha ha ha
[12:00:15] I have not looked at plugins
[12:00:30] but I consider those implementation details; I am still at "what's the architecture we need"
[12:00:46] and once we fix the requirements, we choose the best technology there is for it
[12:01:01] it helps, ofc, to know the available architectures when looking at tech
[12:01:50] but it is important to note that backups have a completely different set of needs than production serving (no need for high concurrency)
[12:02:51] there was this other s3 open source solution that I have yet to look at
[12:03:34] garagehq
[12:04:28] the good thing is gitlab is small enough to be able to experiment, unlike mediabackups
[12:10:48] thank you, Emperor, even just this talk was already useful
[12:11:12] and I hope at least this was informative for you in understanding the questions I face
[13:12:38] best wishes
[14:30:15] o/ can I get a review of https://gerrit.wikimedia.org/r/c/operations/puppet/+/1173974 ?
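Going back to the "give me the status of the storage at 11:38" question discussed above: one way a point-in-time restore could look is sketched below, assuming the backup-side bucket has S3 object versioning enabled - that is an assumption, not something confirmed in the discussion. The endpoint, bucket name, credentials, and restore path are hypothetical, and delete markers are ignored for brevity, so keys deleted before the target time would still be restored by this sketch.

```python
# Minimal sketch (not the real setup): reconstruct the state of a bucket at a
# given timestamp, assuming the backup-side bucket has object versioning
# enabled. For each key, keep the newest version created at or before the
# target time; keys first created after it are skipped. Delete markers are
# ignored for brevity.
import os
from datetime import datetime, timezone

import boto3

ENDPOINT = "https://apus.example.wmnet"  # hypothetical S3/RGW endpoint
BUCKET = "gitlab-backups"                # hypothetical versioned bucket
RESTORE_DIR = "/srv/restore"             # hypothetical restore target
AT = datetime(2025, 7, 29, 11, 38, tzinfo=timezone.utc)

s3 = boto3.client(
    "s3",
    endpoint_url=ENDPOINT,
    aws_access_key_id=os.environ["S3_ACCESS_KEY"],
    aws_secret_access_key=os.environ["S3_SECRET_KEY"],
)

state = {}  # key -> (version_id, last_modified) as of AT
paginator = s3.get_paginator("list_object_versions")
for page in paginator.paginate(Bucket=BUCKET):
    for version in page.get("Versions", []):
        if version["LastModified"] > AT:
            continue  # created after the target time: not part of the 11:38 state
        key = version["Key"]
        if key not in state or version["LastModified"] > state[key][1]:
            state[key] = (version["VersionId"], version["LastModified"])

for key, (version_id, _) in state.items():
    dest = os.path.join(RESTORE_DIR, key)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    s3.download_file(BUCKET, key, dest, ExtraArgs={"VersionId": version_id})
```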