[09:01:49] claime, effie: can I do the experiment on staging today? or should I wait until Monday?
[09:02:29] duesen: I am off today I am afraid
[09:03:21] ok, enjoy!
[09:03:42] duesen: yeah, go for it, I'm around
[09:18:03] ok, I'll start fiddling with it in about an hour or so
[10:54:21] claime: i'm giving it a go now. chart version is pinned to 0.15.16.
[10:55:32] ack
[10:57:48] claime: i'm hitting: pods "api-gateway-main-smokepy" is forbidden: violates PodSecurity "restricted:latest"
[10:58:04] do you have time to chat and denug, or should i just record on the ticket and we come back to it later?
[10:58:25] let me look at something
[10:58:41] "denug", nice. can you tell i have a new keyboard?...
[11:02:05] I think what's happening is something is trying to use an image with "latest" tag
[11:02:23] let me check
[11:04:26] smokepy should use v0.2-dev
[11:09:33] that may not be it though
[11:09:44] I'm getting lost in ValidatingAdmissionPolicy, jayme, do you have an idea?
[11:10:06] 👀
[11:10:41] I need to go afk for a little bit, sorry
[11:10:41] hold on, let me paste the full output...
[11:10:51] claime: np, I can take it
[11:11:16] duesen: what are you trying to deploy and where to?
[11:13:38] https://phabricator.wikimedia.org/P93890
[11:14:19] jayme: i'm experimenting with "helmfile test" on the rest gateway (on staging). The idea is to run integration tests from a pod deployed in k8s.
[11:14:35] The deployment works fine, running the test (that is, creating the test pod) fails
[11:15:27] jayme: this is totally experimental. i deployed from my home dir, i'll roll staging back to master after testing.
[11:15:51] AFAIK this is the first time we do this kind of thing, so problems are expected. and making it work is not urgent.
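For context on the "helmfile test" idea at 11:14:19: Helm treats any chart resource annotated with "helm.sh/hook": test as a test, and "helm test" (which "helmfile test" runs per release) creates that pod on demand and reports pass or fail from how the pod finishes. A minimal sketch of such a test pod follows. The template file location and the registry path are assumptions, not taken from the api-gateway chart, and as written it would still be rejected by the "restricted" PodSecurity level because it sets no securityContext.

```yaml
# Hypothetical chart test template (e.g. templates/tests/smokepy.yaml; path assumed).
apiVersion: v1
kind: Pod
metadata:
  # In a real chart the name would be templated from the release;
  # this is the rendered name seen in the error at 10:57:48.
  name: api-gateway-main-smokepy
  annotations:
    # Marks the pod as a test: "helm test" creates it on demand,
    # it is not created on install or upgrade.
    "helm.sh/hook": test
    # Delete a leftover pod from a previous run before creating a new one.
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  # The test result comes from the pod's outcome, so it must not restart.
  restartPolicy: Never
  containers:
    - name: smokepy
      # Registry path is a placeholder; the tag is the one mentioned at 11:04:26.
      image: example.registry/smokepy:v0.2-dev
```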
[11:15:51] this is the relevant part:
[11:15:53] allowPrivilegeEscalation != false (container "smokepy" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "smokepy" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "smokepy" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "smokepy" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
[11:16:00] i'm just trying to collect info
[11:16:14] the test container needs to specify all those things in order to pass validation
[11:16:57] Claude tells me I should include base.helper.restrictedSecurityContext. would that do the trick?
[11:17:49] yep
[11:18:15] hm... can i tell helmfile to use a local chart, instead of loading a packaged chart? that would allow me to experiment with that without the need to merge into master to generate the package, then revert...
[11:18:17] that needs to be included for every container IIRC, or at pod level
[11:19:10] I need some way to deploy experimental chart versions...
[11:19:14] I don't think there is a cli option to override that, but you can modify the helmfile.yaml temporarily
[11:19:42] what would i put there?
[11:19:56] or better, copy all of /srv/deployment-charts to your $HOME and modify it there, to not interfere with others
[11:20:23] this is already in my home
[11:20:26] I'm already pinning the chart version, but that still means i have to generate and publish a package for that version...
[11:20:42] IIRC a relative filesystem path (from where helmfile.yaml is) works
[11:20:46] or an absolute fs path
[11:21:22] where would I put that path?
[11:21:47] helmfile currently has: chart: wmf-stable/api-gateway
[11:21:58] I suppose i need to replace that with... something?
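The four checks in the 11:15:53 error map directly onto standard Kubernetes securityContext fields. A sketch of the smokepy container with all four set explicitly is below; whether base.helper.restrictedSecurityContext renders exactly this is an assumption, not something confirmed in the log. In line with the 11:18:17 remark, runAsNonRoot and seccompProfile can alternatively be set once at pod level, while allowPrivilegeEscalation and capabilities only exist at container level.

```yaml
spec:
  containers:
    - name: smokepy
      securityContext:
        # Fixes "allowPrivilegeEscalation != false".
        allowPrivilegeEscalation: false
        # Fixes "unrestricted capabilities".
        capabilities:
          drop: ["ALL"]
        # Fixes "runAsNonRoot != true". The image must then run as a non-root
        # numeric UID (or runAsUser must be set to one), otherwise the kubelet
        # refuses to start the container.
        runAsNonRoot: true
        # Fixes "seccompProfile"; type "Localhost" would also be accepted.
        seccompProfile:
          type: RuntimeDefault
```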
[11:21:59] instead of 'wmf-stable/api-gateway'
[11:22:20] so 'chart: ../../../chart/api-gateway'
[11:22:27] if I'm counting correctly :D
[11:22:41] 'chart: ../../../charts/api-gateway'
[11:22:54] huh, right. I can do that. in that case, i wouldn't specify a version i suppose.
[11:23:06] no, you don't have to
[11:23:14] ok, let me fiddle with that
[11:23:16] releases:
[11:23:18] - name: foo
[11:23:20] chart: ./path/to/foo
[11:23:44] so yeah, local chart path should work
[11:25:34] duesen: you can also use the kind.sh script in deployment-charts to bootstrap a local cluster on your machine (via kind) that has the same config as production has
[11:25:45] in terms of all these validations etc.
[11:27:28] jayme: i have a partial setup with minikube and helm, i never got helmfile working locally... I should try out kind...
[11:27:41] claime: under releases? not under templates?
[11:28:02] wherever it currently is
[11:28:05] oh i see, that's where the yaml alias goes
[11:28:10] yea
[11:30:27] the kind.sh thing won't help with the complexities of the helmfile setup, but it will give you a cluster with the same features/limitations as wikikube
[11:31:10] ok nice, the pod runs now, but the tests fail
[11:31:13] i'm getting:
[11:31:14] OSError: Cannot connect to http://api-gateway-main:8087. Perhaps the service is not running or port-forwarding needs to be enabled.
[11:31:50] (also, I have to go grab the log manually, helmfile test is only saying "pod api-gateway-main-smokepy failed")
[11:33:43] helmfile test --logs will help with the latter
[11:35:20] jayme: yes, but it will fail if the pod takes more than a fraction of a second to come up. if it needs to fetch an image, --logs fails because it can't find the pod. it's kind of stupid...
[11:35:50] (at least that's what happens with helm test, haven't tested with helmfile, but I assume the problem exists there as well)
[11:37:28] anyway... any idea why accessing http://api-gateway-main:8087 fails? that's the right service name and port... it should be accessible from inside the pod, right?
[11:37:45] this is two pods in the same namespace talking to each other
[11:40:23] hm, looks like on minikube I am using http://localhost:8087... that doesn't seem like it should work actually... or it works by accident, since minikube is only one node...
[11:40:41] claime: thoughts?
[11:43:06] scratch that, minikube tests run against http://api-gateway-restgw:8087, which is the local service name.
[11:44:54] I can look in 5
[11:48:05] If you are busy and we should just leave it and revisit next week, just tell me. this is an experiment, we can just abort it any time.
[11:54:10] I do have other things on the side ofc :)
[11:55:59] in staging the name is api-gateway-staging, not api-gateway-main
[11:58:11] jayme: that's not what kubectl get svc tells me...
[12:11:00] jayme: found it. needs to be https, not http.
[12:13:16] oh yes, that as well
[12:13:20] # kubectl -n api-gateway get svc
[12:13:22] NAME                  TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)         AGE
[12:13:23] api-gateway-staging   NodePort   10.64.76.43   <none>        8087:8087/TCP   445d
[12:14:40] maybe you're deploying to somewhere else
[12:15:58] jayme: -n rest-gateway
[12:16:27] api-gateway chart, rest-gateway service.
[12:17:17] that does raise the question why the rest-gateway service uses "main" for the staging service... but apparently it does.
[12:18:27] ah 🤦 sorry
[12:18:53] I think the different names in staging are an historic artifact
[12:19:07] there is def. no need to make them different from prod
[12:32:18] jayme, claime: staging is back to master now. thank you for your help! running the tests in the pod works, i can get logs, and it's a LOT faster! It takes less than 5 seconds to run all tests. Nearly as fast as it goes locally on minikube. Running them from the deployment hosts takes 5 minutes!
[12:32:40] nice one!
[12:35:17] I'll go ahead and merge the Rakefile change. I was hoping to get another +1, but I guess it's safe.
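The local-chart workaround discussed from 11:19 to 11:23 amounts to pointing the release's chart key at a directory instead of a repository chart, in a personal copy of deployment-charts. A sketch, assuming the release is named "main" (inferred from the pod name, not stated in the log) and that the relative depth suggested at 11:22:41 matches where the helmfile.yaml actually lives:

```yaml
# Personal copy of helmfile.yaml, edited only for the experiment.
releases:
  - name: main  # assumed release name
    # Previously: chart: wmf-stable/api-gateway, with the version pinned to 0.15.16.
    # A relative path resolves from the directory containing helmfile.yaml;
    # an absolute path also works.
    chart: ../../../charts/api-gateway
    # No version key: the unpackaged chart on disk is used as-is.
```

In the real file the chart line may live in a YAML alias rather than directly under releases (see 11:27:41 to 11:28:10); the edit goes wherever the existing chart line is.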
[12:44:44] sgtm
[14:41:20] ^ oh, neat!