[15:14:33] kamila_: jelto: staging-codfw should be fine now. There was a config piece missing which blocked BGP announcements on the core routers (where the masters BGP peer with) - so packets could not flow back to the pods running on the masters, as their IPs were not announced
[15:17:44] great, happy to hear that. Yes, the istio pods look healthy now. Should I also try to deploy one dummy service to staging-codfw (like ipoid)?
[15:17:44] What was the config piece? Somewhere in puppet or on the router?
[15:35:04] jelto: basically, homer was not run against the core routers but only against the ToR
[15:36:03] jelto: you should rerun the istioctl apply to make sure it did all it had to, then you can deploy ipoid to make health checks happy, yes
[15:36:35] Ah yes, that makes sense. I'll do that now
[15:38:26] oh, thanks jayme
[15:39:58] Istio is happy: "✔ Installation complete"
[15:40:03] nice
[15:46:15] I have an RBAC question that has been on my mind lately. Context: we're working on orchestrating the wiki dumps from airflow. The process works by having airflow read a job spec, tweak it slightly, and create a pod from that spec that will run the dumps process.
[15:46:17] ipoid deployment worked
[15:46:17] ipoid-staging-95bdc9869-697vb 2/2 Running 0 11s
[15:47:05] The issue is: the way this whole thing was initially designed would have the job spec deployed in a separate namespace. The spec would be deployed in ns A and airflow would be running in ns B
[15:47:25] and airflow would need to be able to create the pod in ns A as well
[15:47:51] until then, I've worked around this by deploying everything (airflow, the job spec, the dumps pod) to the same NS
[15:48:43] one issue that I _think_ I see is that we'd need to have a ClusterRole bound to the airflow service account to be able to read the job spec / create the pod in another NS, and that ClusterRole would need to be defined in our chart
[15:49:08] which means that the role deploying our chart would need to be able to create ClusterRoles, which sounds like a security issue to me
[15:51:27] one alternative btullis was thinking about was to define the ClusterRole (and associated permissions) in admin_ng/helmfile_rbac, and ship the ClusterRoleBinding in the chart itself
[15:52:58] jelto: could you also remove downtimes for all staging-codfw nodes?
[15:53:12] yep, one sec
[15:54:43] * s/and ship the clusterRoleBinding in the chart itself/and ship the roleBinding in the chart itself/
[15:54:50] as we can rolebind a ClusterRole
[15:58:34] brouberol: I feel like my bouncer missed the start of this, but that is something we did for other things as well IIRC
[15:59:22] so TL;DR: ClusterRole in admin_ng/helmfile_rbac and RoleBinding in the chart?
[16:00:08] jayme: I removed the downtimes for kubestagemaster[2003-2005] and kubestage[2001-2004]
[16:00:39] jelto: ack, thanks
[16:01:24] brouberol: I'm not 100% sure from the top of my head... it might as well be that regular deploy users can't deploy RoleBindings
[16:01:50] they can't, _but_ we're also using a specific deploy user for airflow, with appropriate permissions
[16:02:51] because the airflow serviceaccount needs to be able to create pods and tail pod logs in its current namespace, when running tasks as k8s pods
[16:02:55] so I think we're good
[16:03:09] as said, I think I'm missing context. :)
[16:03:26] It could also be an option to deploy the rolebinding during namespace creation
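
For reference, the pattern agreed on above (cluster-scoped ClusterRole, namespace-scoped RoleBinding shipped with the chart) could look roughly like the sketch below. All names here are placeholders, not the actual admin_ng/helmfile_rbac or chart definitions: a hypothetical `airflow-dumps` ClusterRole, a dumps namespace `ns-a`, an airflow namespace `ns-b`, and an `airflow` ServiceAccount.

```yaml
---
# Cluster-scoped: would be defined by cluster admins (e.g. via admin_ng/helmfile_rbac).
# It only describes a set of permissions; it grants nothing until it is bound.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: airflow-dumps
rules:
  - apiGroups: [""]
    resources: ["configmaps"]       # read the job spec
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods"]             # create and watch the dumps pod
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["pods/log"]         # tail pod logs
    verbs: ["get"]
---
# Namespace-scoped: would be shipped in the chart. Binding a ClusterRole via a
# RoleBinding limits its effect to the RoleBinding's own namespace (ns-a), while
# the subject can be a ServiceAccount living in another namespace (ns-b).
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: airflow-dumps
  namespace: ns-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: airflow-dumps
subjects:
  - kind: ServiceAccount
    name: airflow
    namespace: ns-b
```

This is why the "rolebind a ClusterRole" point matters: the deploy user only needs to create a namespaced RoleBinding, and the resulting permissions never extend beyond the namespace the RoleBinding is created in.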