[09:21:14] I was thinking about adding support for network policy generation for egress to the kerberos and HDFS namenode servers, the same way we do it for kafka, zk, mariadb, etc. This will probably be a recurring theme as we start migrating data platform services to k8s.
[09:22:34] It does not have to be done now, as we're only working on packaging our first service to run in k8s, but it would help scaffolding the next ones without copy/pasting too much. WDYT? Thanks!
[09:27:36] I'm not very fond of how we do those things currently tbh (but it's a twofold problem really, as it usually is about networkpolicies *and* providing IPs/ports etc. to charts). I have created https://phabricator.wikimedia.org/T331894 to think about a more generic solution for the first (networkpolicy) problem
[09:35:08] https://wikitech.wikimedia.org/w/index.php?title=Kubernetes%2FClusters%2FAdd_or_remove_nodes&diff=2134698&oldid=2134615 fyi
[09:39:37] jayme: oh I see, so we'd only need to maintain the IPs within the service/endpoint for a given, say, kafka cluster, and we'd reference that service within our network policies
[09:40:34] XioNoX: Great news, thanks!
[09:40:50] tell me if I'm wrong, but I think that'd be particularly sweet as we wouldn't have to redeploy _every single app_ that needs to talk to a given cluster when we add/remove a node. Instead, we'd only have to redeploy the service itself?
[09:41:04] yeah, that's the idea
[09:42:08] because that really sucks...also it's a lot of repetition currently in some places (lists of IPs all over the place)
[09:42:31] agreed
[09:44:05] I'm happy to try this for one service I need egress to that isn't currently supported by the current way of doing things
[09:45:16] jayme: and this https://wikitech.wikimedia.org/w/index.php?title=Kubernetes%2FClusters%2FNew&diff=2134700&oldid=2112654
[09:45:38] brouberol: unfortunately there is no "this" yet :)
[09:46:03] oh right, I got that, I meant I'm happy to give it a shot, or pair on it
[09:46:50] this would be a real DX improvement for data platform SREs
[09:47:12] it would def. help to have another brain on this...but I fear that there is no quick solution in sight.
[09:47:39] we could add this as a topic for next week's k8s-sig and try to start things off there - wdyt?
[09:48:29] for sure
[10:13:00] XioNoX: <3 thanks for the work!
[10:16:03] jayme: synced ml-serve-codfw, we should be ok with the istio/cert-manager rollout right?
[10:16:22] yes, all done then
[10:49:21] very interesting reading https://github.com/python/cpython/issues/80235
[10:49:52] we are finding this issue more and more in various libraries, and everybody implements their own workarounds
[11:34:42] indeed
[11:38:21] I am adding some support for the inference-services repo, maybe we could agree on something to add on wmflib?
[16:36:31] isn't wmflib a cookbook thing?
[16:36:41] well..cumin
[16:38:16] oh, it's not. But is it used in services code. Anyhow. Might make sense to have something indeed. At least written to the evil wikitech page as best practice for python
[16:41:02] ack, I am testing https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/982401 for our model servers, if all goes as planned I can propose it to Riccardo and/or document it
[17:24:03] sounds good