[12:07:46] effie: sorry to bug you but in case you might know, mc2038 is in rack A2 which we are going to do maintenance on
[12:08:20] in Netbox it's at status=failed so we didn't notice it before, but it is reachable via ssh
[12:08:24] is it an active server?
[12:19:27] kill it
[12:19:39] it is active but we have backup servers
[12:19:56] so nothing will happen
[12:20:28] cool, thx!
[13:27:18] effie: seems like "failed" status is wrong in Netbox btw? I'll change that to active if so (maybe some weirdness when it was provisioned?)
[13:29:02] mmm it should be online and active yes, let me take a quick look
[13:32:03] topranks: yes this should be active and all
[13:32:23] cool I'll make the change
[13:32:39] I was just laughing about when I first started and you mentioned "mc" to me and I thought you were talking about rappers
[14:36:52] If I have a k8s service that needs to connect to other k8s services, presumably I need an egress rule (yes?). What is the right way to do that?
[14:40:09] i.e. do you hardcode the IP and port?
[14:57:39] urandom: in our current setup it does not since we generally allow pod-to-pod traffic. But I'd prefer to make that dependency clear with a specific rule. Can you talk about which services are involved? Since I might suggest to go via the service mesh in which case the helm modules will create appropriate rules for you
[14:58:42] jayme: linked-artifacts needs to connect to inference-staging
[14:58:56] (inference-staging.svc.codfw.wmnet)
[14:59:03] oh, that's cross cluster then
[14:59:19] oh right, yeah, that's ml's cluster
[15:00:07] the pod-to-pod rule won't do it then, obviously. Since it's a different IP space
[15:01:19] urandom: I would suggest to go via the service mesh then
[15:01:41] it has listeners defined for inference and inference-staging: https://gerrit.wikimedia.org/g/operations/puppet/+/refs/heads/production/hieradata/common/profile/services_proxy/envoy.yaml#337
[15:01:54] See https://wikitech.wikimedia.org/wiki/Envoy#Use_a_listener
[15:03:33] oh, I see
[15:03:38] yeah, that would be better
[15:04:12] as said, once you enable the listener in linked-artifacts it will auto create the required egress rule for you
[15:04:52] just remember to talk to localhost instead of inference-staging.svc.codfw.wmnet ;)
[15:27:17] +1 to service mesh
[16:40:47] that's for next week's rack maintenance, very k8s - https://phabricator.wikimedia.org/T427301
[16:44:48] XioNoX: cool, thanks! As long as it's just some workers or single ctrl nodes it's fine
[16:45:18] XioNoX: what about the 'skipping host' things? Those won't lose connectivity?
[16:46:02] jayme: it's part of my WIP cookbook, "skipping host" means it won't be depooled automatically by the cookbook
[16:46:16] so it means either manual depool, or nothing special to do
[16:48:02] ah, I see. Doing nothing is fine if it's just one kafka-main broker. Will note in the task
[16:48:19] (doing nothing apart from downtiming that is)
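
Editor's sketch for the service mesh advice at 15:01-15:04 (talk to localhost instead of inference-staging.svc.codfw.wmnet). This is only an illustration of the idea, not code from the discussion: the helper name `via_service_mesh`, the `LOCAL_LISTENERS` mapping, the port 6999, and the example request path are all hypothetical, and it assumes the local listener accepts plain HTTP. The real listener port would come from the envoy.yaml linked above.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping from upstream hostname to the local mesh listener port.
# 6999 is a placeholder; look up the real port in the linked envoy.yaml.
LOCAL_LISTENERS = {
    "inference-staging.svc.codfw.wmnet": 6999,
}


def via_service_mesh(url: str) -> str:
    """Rewrite a direct upstream URL to point at the local mesh listener.

    URLs whose host has no configured listener are returned unchanged.
    """
    parts = urlsplit(url)
    port = LOCAL_LISTENERS.get(parts.hostname)
    if port is None:
        return url
    # Assumption: the listener on localhost speaks plain HTTP and the mesh
    # proxy handles the connection to the remote cluster.
    return urlunsplit(
        ("http", f"localhost:{port}", parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(via_service_mesh("https://inference-staging.svc.codfw.wmnet/v1/example"))
```

Keeping the mapping in one place means the application never hardcodes the remote IP or port, which is the concern raised at 14:40:09.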