[02:29:22] * AntiComposite grumbles about dependencies
[02:30:28] if it worked on my machine, and it worked on toolforge-python39-sssd-base:latest on Docker on my machine, why shouldn't it work in webservice --backend=kubernetes python3.9 shell?
[02:30:59] * AntiComposite reverts back to 3.7
[02:33:41] ("it" being in this case `pip install pywikibot==6.6.0`, and "not working" being https://tools-static.wmflabs.org/anticompositebot/pip.log)
[02:34:43] (and no, installing git+https://gerrit.wikimedia.org/r/pywikibot/core didn't work either)
[04:10:06] AntiComposite: your tool account's .bash_profile adds /shared/pywikibot to $PYTHONPATH, which is the difference between your local machine and Toolforge
[08:53:51] Hi. I'm a new(ish) SRE in Data Persistence, and we're thinking about using Ceph (and the RGW) for MOSS (the next object storage cluster). I've used Ceph before (at the Sanger, my previous job), but we'd be interested in hearing your thoughts on Ceph as a platform, and what you think about Pacific (e.g. ceph orch and containers vs. still using packages)? Happy to put this in an email if you'd like to tell me where to send it :)
[08:59:30] that seems like a dcaro question ^
[09:00:56] heyo :)
[09:02:55] hi Emperor!! nice to meet you!
[09:03:25] Emperor: Welcome :)
[09:05:26] We are still using Octopus with packages on our systems; so far it's easy to keep up and running, and nice on failures (self-healing, etc.). Performance-wise it's a bit trickier, as we have very little control over the rack network setup and the racking itself, among other things.
[09:07:51] About Pacific and ceph orch/cephadm: I'm only using it on my personal setup. You'll have to learn how to work with containers, but once you get that sorted out (know where the logs are, how to debug a running container, etc.), it's OK too. Though if you have full control of the hardware it will run on (e.g.
no other services running on the host) it might be simpler to use packages, as the
[09:07:54] main benefit (as I see it) of the container deploy is running multiple versions on the same host (e.g. if you have some client on the same host, upgrading ceph will upgrade some libraries that might mess up the client)
[09:09:10] (I must say that my personal cluster is built on Raspberry Pi hardware, and there were a bunch of bugs, so it might be even smoother on supported setups)
[09:10:23] oh, and we don't use RGWs yet (on the roadmap)
[09:17:08] Thanks :)
[09:17:43] Sanger made quite heavy use of RGWs, so if you want surplus opinions on them at any point, do ask :)
[09:28:44] That'd be great! Is there anything we should avoid at all costs? (you can add more details if you have them on T276961; it's not specific to RGWs, but input on them is appreciated too)
[09:28:44] T276961: Support Openstack Swift APIs via the radosgw - https://phabricator.wikimedia.org/T276961
[10:00:29] I don't think there are any total show-stoppers, but I'll look at that Phab task
[10:27:52] Containers do let you decouple distro and Ceph upgrades (which we managed at Sanger by using the Ubuntu Cloud Archive packages), but upstream are really pushing cephadm & containers (despite lengthy ceph-users threads!), which is why I was thinking of trying that for MOSS
[10:33:12] yep, it makes life way easier for them to be able to control the runtime versions of the libs along with the ceph services (for testing/debugging/etc.)
[10:33:38] but it makes it a bit more complex (imo), as you have to learn and manage that indirection on your systems (containers, orchestrator, ...)
[10:35:56] Mmm, particularly "what is this daemon doing / logging" typically requires a layer of indirection via podman (or Docker if you must)...
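(Editor's note: a hedged sketch of the podman/journald indirection discussed above, for a cephadm-managed cluster. The fsid and daemon name are placeholders, not values from this log.)

```shell
#!/bin/sh
# Under cephadm, each Ceph daemon runs in its own container, managed by a
# systemd unit whose name embeds the cluster fsid, so the pre-container
# habit of "journalctl -u ceph-osd@0" no longer applies.
fsid="4e687a60-0000-0000-0000-000000000000"   # placeholder fsid
daemon="osd.0"                                # placeholder daemon name

# The systemd unit name cephadm generates for a daemon:
unit="ceph-${fsid}@${daemon}"
echo "unit: ${unit}"

# The usual ways in (shown as comments, not executed here):
#   cephadm ls                        # list cephadm-managed daemons on this host
#   cephadm logs --name osd.0         # journald logs for one daemon
#   podman ps                         # find the daemon's running container
#   podman exec -it <container> bash  # shell inside it for live debugging
```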
[10:36:23] the upgrade process does look quite nice, though :)
[10:37:45] yep, and the fact that it will respawn any secondary services automatically is also nice (dashboard, etc.), even auto-setup of OSDs xd (though I'd be a bit afraid of that without testing properly)
[10:49:00] !log admin [codfw1dev] create manila database on cloudcontrol-dev nodes (galera) T291257
[10:49:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:49:04] T291257: Cloud: NFS: PoC: manila with generic driver using DHSS=true - https://phabricator.wikimedia.org/T291257
[10:57:06] !log admin [codfw1dev] created manila user @ labtestwikitech (T291257)
[10:57:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:57:10] T291257: Cloud: NFS: PoC: manila with generic driver using DHSS=true - https://phabricator.wikimedia.org/T291257
[11:06:10] !log admin [codfw1dev] created manila project (T291257)
[11:06:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:06:14] T291257: Cloud: NFS: PoC: manila with generic driver using DHSS=true - https://phabricator.wikimedia.org/T291257
[11:06:15] we had fun with that, particularly given we needed to tie up NVMe partition and OSD device (which could be hard once the disk had failed); ended up writing some hacky tooling...
https://github.com/wtsi-ssg/ceph-disk-utils
[11:06:28] !log admin [codfw1dev] give manila user admin role @ manila project (T291257)
[11:06:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:32:01] !log admin [codfw1dev] populated manila DB & created service endpoints (T291257)
[11:32:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:32:07] T291257: Cloud: NFS: PoC: manila with generic driver using DHSS=true - https://phabricator.wikimedia.org/T291257
[11:45:04] !log admin [codfw1dev] created rabbitmq user (T291257)
[11:45:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:45:08] T291257: Cloud: NFS: PoC: manila with generic driver using DHSS=true - https://phabricator.wikimedia.org/T291257
[12:13:12] !log admin [codfw1dev] trying to create a manila service image (T291257)
[12:13:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:13:17] T291257: Cloud: NFS: PoC: manila with generic driver using DHSS=true - https://phabricator.wikimedia.org/T291257
[14:00:06] majavah: thanks, that was the problem. The only question is why that didn't break in 3.7
[19:29:52] !log tools.notwikilambda undeployed workaround for T291325, seems to be fixed
[19:29:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.notwikilambda/SAL
[23:40:48] !log tools.notwikilambda set pygments-server memory limits to 256Mi (3cea8f6a18)
[23:40:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.notwikilambda/SAL
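(Editor's note: the 04:10:06 / 14:00:06 exchange above is ordinary sys.path precedence: directories on $PYTHONPATH are searched before pip's site-packages, so the checkout at /shared/pywikibot shadows a `pip install pywikibot==6.6.0`. A minimal reproduction, using a temporary directory in place of /shared/pywikibot:)

```shell
#!/bin/sh
# Fake a checkout that shadows any pip-installed pywikibot.
tmp=$(mktemp -d)
mkdir "$tmp/pywikibot"
printf '__version__ = "shadow-checkout"\n' > "$tmp/pywikibot/__init__.py"

# $PYTHONPATH entries are searched before site-packages, so the fake wins:
PYTHONPATH="$tmp" python3 -c 'import pywikibot; print(pywikibot.__version__)'
# prints: shadow-checkout
```

This is why the behaviour differed between the local machine and Toolforge even though the same Docker image worked locally: the .bash_profile (and hence the extra $PYTHONPATH entry) is only sourced on the Toolforge side.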