[12:07:46] <topranks>	 effie: sorry to bug you but in case you might know, mc2038 is in rack A2 which we are going to do maintenance on 
[12:08:20] <topranks>	 in Netbox it's at status=failed so we didn't notice it before, but it is reachable via ssh 
[12:08:24] <topranks>	 is it an active server?
[12:19:27] <effie>	 kill it
[12:19:39] <effie>	 it is active but we have backup servers
[12:19:56] <effie>	 so nothing will happen
[12:20:28] <XioNoX>	 cool, thx!
[13:27:18] <topranks>	 effie: seems like "failed" status is wrong in Netbox btw?  I'll change that to active if so (maybe some weirdness when it was provisioned?)
[13:29:02] <effie>	 mmm it should be online  and active yes, let me take a quick look
[13:32:03] <effie>	 topranks: yes this should be active and all 
[13:32:23] <topranks>	 cool I'll make the change 
[13:32:39] <topranks>	 I was just laughing about when I first started and you mentioned "mc" to me and I thought you were talking about rappers 
[14:36:52] <urandom>	 If I have a k8s service that needs to connect to other k8s services, presumably I need an egress rule (yes?).  What is the right way to do that?
[14:40:09] <urandom>	 i.e. do you hardcode the IP and port?
[14:57:39] <jayme>	 urandom: in our current setup it does not since we generally allow pod-to-pod traffic. But I'd prefer to make that dependency clear with a specific rule. Can you talk about which services are involved? Since I might suggest to go via the service mesh in which case the helm modules will create appropriate rules for you
[14:58:42] <urandom>	 jayme: linked-artifacts needs to connect to inference-staging
[14:58:56] <urandom>	 (inference-staging.svc.codfw.wmnet)
[14:59:03] <jayme>	 oh, that's cross cluster then
[14:59:19] <urandom>	 oh right, yeah, that's ml's cluster
[15:00:07] <jayme>	 the pod-to-pod rule won't do it then, obviously. Since it's a different IP space
[15:01:19] <jayme>	 urandom: I would suggest to go via the service mesh then
[15:01:41] <jayme>	 it has listeners defined for inference and inference-staging: https://gerrit.wikimedia.org/g/operations/puppet/+/refs/heads/production/hieradata/common/profile/services_proxy/envoy.yaml#337
[15:01:54] <jayme>	 See https://wikitech.wikimedia.org/wiki/Envoy#Use_a_listener
[15:03:33] <urandom>	 oh, I see
[15:03:38] <urandom>	 yeah, that would be better
[15:04:12] <jayme>	 as said, once you enable the listener in linked-artifacts it will auto create the required egress rule for you
[15:04:52] <jayme>	 just remember to talk to localhost instead of inference-staging.svc.codfw.wmnet ;)
[15:27:17] <Raine>	 +1 to service mesh
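(Editor's note: a minimal sketch of what "talk to localhost" could look like on the client side, assuming the service currently builds URLs against inference-staging.svc.codfw.wmnet. The port 6000 and the request path are hypothetical placeholders, not values from the listener definition linked above; the real port is whatever that definition assigns to inference-staging. The helper only rewrites the URL and makes no network call.)

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical port: replace with the one assigned to the
# inference-staging listener in the service mesh configuration.
LISTENER_PORT = 6000


def via_local_listener(url: str, port: int = LISTENER_PORT) -> str:
    """Point a URL at the local service-mesh listener.

    Keeps path, query and fragment; swaps scheme and host for plain
    HTTP on localhost, on the assumption that the sidecar handles the
    connection to the remote cluster.
    """
    parts = urlsplit(url)
    return urlunsplit(
        ("http", f"localhost:{port}", parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # The path below is an illustrative placeholder.
    direct = "https://inference-staging.svc.codfw.wmnet/v1/models/example:predict"
    print(via_local_listener(direct))
```

(In practice the base URL would more likely live in the chart's configuration than in code, so switching to the listener is a values change rather than a code change.)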
[16:40:47] <XioNoX>	 that's for next week's rack maintenance, very k8s - https://phabricator.wikimedia.org/T427301
[16:44:48] <jayme>	 XioNoX: cool, thanks! As long as it's just some workers or single ctrl nodes it's fine
[16:45:18] <jayme>	 XioNoX: what about the 'skipping host' things? Those won't lose connectivity?
[16:46:02] <XioNoX>	 jayme: it's part of my WIP cookbook, "skipping host" means it won't be depooled automatically by the cookbook
[16:46:16] <XioNoX>	 so it means either manual depool, or nothing special to do
[16:48:02] <jayme>	 ah, I see. Doing nothing is fine if it's just one kafka-main broker. Will note in the task
[16:48:19] <jayme>	 (doing nothing apart from downtiming that is)