[08:59:51] dcausse: o/
[08:59:58] o/
[09:00:03] helloooo
[09:00:06] hey :)
[09:00:28] sorry to ping you, I noticed two alerts related to unassigned shards in karma - https://alerts.wikimedia.org/?q=%40state%3Dactive&q=%40cluster%3Dwikimedia.org&q=alertname%3DElasticSearch%20unassigned%20shard%20check%20-%209643 (and another one for a different port)
[09:00:32] is it known?
[09:00:51] the other one is https://alerts.wikimedia.org/?q=%40state%3Dactive&q=%40cluster%3Dwikimedia.org&q=alertname%3DElasticSearch%20unassigned%20shard%20check%20-%209443
[09:00:58] it seems to have started a day ago
[09:01:23] elukey: yes, I think Guillaume silenced them; it's related to the opensearch migration in eqiad
[09:01:36] I have a meeting with him now, I'll raise the issue
[09:01:40] thanks for the ping!
[09:01:51] ooook thanks! I'll leave it there then!
[12:59:50] jynus: have you seen this? https://www.reddit.com/r/selfhosted/comments/1kva3pw/avoid_minio_developers_introduce_trojan_horse/. The linked GitHub PR is https://github.com/minio/object-browser/pull/3509
[13:00:10] I haven't dug into it enough yet, but there is a bit of drama around minio
[13:03:19] XioNoX: there's a sync-netbox-hiera diff for the virtual-magru subnets you added
[13:03:33] taavi: right, you can merge it
[13:03:39] or I can if you prefer
[13:03:58] sure
[13:04:40] done
[13:07:09] Jeff_Green, see the message above from akosiaris about Minio, it might interest you ^
[13:14:20] don't overhype it fwiw. It seems contained to the web interface right now.
[13:14:32] but it is somewhat concerning
[14:19:45] akosiaris: yeah, you're not the first person to point that out to me
[14:21:31] minio is something that I wanted to migrate away from, not an ideal piece of tech for our needs, but I have yet to find a proper alternative that is not swift or ceph
[14:28:31] how 'bout gluster? (kidding)
[14:29:28] brouberol: ok to merge your change?
[14:29:33] has anyone worked with Garage? https://garagehq.deuxfleurs.fr/
[14:29:58] jynus: I'm using that in my homelab, it works reasonably well for my use cases
[14:30:13] vgutierrez: yes please, sorry
[14:30:22] thx, merging
[14:30:33] the phone rang and I got distracted
[14:30:45] Cool, I'll have to check out Garage too. I'm using Cloudflare R2, but it's always good to have a self-hosted option
[14:31:02] akosiaris: in any case it is not a use case that would impact us; there are worse things in their development policy
[14:31:40] brouberol: done
[14:31:57] thanks
[14:32:18] taavi: thanks, their upgrade policy already seems very sane
[14:39:10] Here's another slightly funnier story about abusing the unwritten rules of open source: https://virtualize.sh/blog/ground-control-to-major-trial/
[14:39:10] * akosiaris bookmarking to review Garage
[14:43:30] in any case, backups have very little redundancy per dc and are slow to migrate, so we will probably wait for a hw refresh to rclone them to another data store
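A migration like the one jynus describes above, moving backup objects from MinIO to another data store, boils down to streaming every object between two S3-compatible endpoints. A minimal sketch in Python with boto3, assuming hypothetical endpoints, credentials, and bucket name (illustrative only, not actual WMF tooling or configuration):

```python
import boto3

# Hypothetical endpoints and credentials -- replace with real values.
src = boto3.client(
    "s3",
    endpoint_url="https://old-minio.example.org:9000",
    aws_access_key_id="SRC_KEY",
    aws_secret_access_key="SRC_SECRET",
)
dst = boto3.client(
    "s3",
    endpoint_url="https://new-store.example.org:3900",  # e.g. a Garage S3 endpoint
    aws_access_key_id="DST_KEY",
    aws_secret_access_key="DST_SECRET",
)

def migrate_bucket(bucket: str) -> None:
    """Stream every object in `bucket` from src to dst, page by page."""
    paginator = src.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            body = src.get_object(Bucket=bucket, Key=key)["Body"]
            # upload_fileobj reads the stream in chunks, so large objects
            # never need to fit in memory; multipart upload is handled for us.
            dst.upload_fileobj(body, bucket, key)
            print(f"copied {bucket}/{key}")

migrate_bucket("backups")  # hypothetical bucket name
```

At the near-petabyte scale mentioned later in the conversation, one would parallelize this and verify checksums (which is what a tool like rclone does out of the box), hence the plan to wait for a hardware refresh rather than migrate in place.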
[14:44:03] jynus: for what it's worth, I am experimenting with minio on my VPS. There is something that somehow stops me from adopting it more.
[14:44:18] maybe that instinct wasn't wrong.
[14:44:23] let me tell you my biggest issue
[14:44:37] a major upgrade requires reimporting all files, with no upgrade path
[14:44:45] what?
[14:44:49] I was unaware
[14:44:50] as you hear it
[14:44:56] holy...
[14:45:23] You had one job: being able to store stuff long term
[14:45:35] imagine if that was true for the dbs
[14:45:52] fwiw, this was (still is?) true for postgres too
[14:46:52] it has always been my biggest pain when operating that software.
[14:47:05] I don't think that's true anymore, but it is not a big issue for our usage
[14:47:30] but for almost 1 petabyte of data it would be insane
[14:47:46] which is literally the first thing I checked Garage had: https://garagehq.deuxfleurs.fr/documentation/operations/upgrading/
[14:51:55] But I agree that mysql had some operational & querying simplicity, and that made it more liked by both wmf devs and ops, given we didn't need many of the fancy features that postgres did better
[14:52:59] akosiaris: this is not urgent, but at some point in the future I would like to pick your brain about backup technology for a thing
[14:53:57] we have some new requirements coming up, and it would be nice to count on you for arch/tech stack suggestions
[14:54:18] count me in
[14:54:40] thank you, likely it will just be a future meeting invite; it shouldn't take a lot of your time
[14:54:55] cool
[14:54:58] we are still gathering requirements; they are not yet clear
[15:53:45] small downtime on bacula, in case someone wants to recover something in the next few minutes
[15:53:58] should be back soon
[17:45:37] FYI, I've merged a change [0] to avoid an issue with spurious metrics collection by k8s prometheus instances (in short, due to IP collisions with terminated pods; more details in T395052).
[17:45:37] this seems to be working as expected, but just in case: if you encounter metrics normally collected by k8s* prometheus instances that are missing since ~17:30-17:40 UTC today, this may be the cause (and it can be trivially rolled back).
[17:45:37] [0] https://gerrit.wikimedia.org/r/1149505
[17:45:39] T395052: Stale labels applied when the pod IP of a terminated k8s job is reused - https://phabricator.wikimedia.org/T395052
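As a quick sanity check for metrics that may have gone missing after a change like the one above, one could query the Prometheus HTTP API using PromQL's `absent()` function. A minimal sketch, assuming a hypothetical Prometheus URL and example metric names (not the actual WMF instances or the specific metrics affected):

```python
import requests

# Hypothetical Prometheus URL and metric names -- adjust for your instance.
PROM = "https://prometheus-k8s.example.org"
METRICS = ["container_cpu_usage_seconds_total", "kube_pod_info"]

for metric in METRICS:
    # absent(m) returns a single series only when metric m has no data now,
    # so a non-empty result means the metric is currently missing.
    resp = requests.get(
        f"{PROM}/api/v1/query",
        params={"query": f"absent({metric})"},
        timeout=10,
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    print(f"{metric}: {'MISSING' if result else 'present'}")
```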