[10:58:33] hi folks
[10:58:42] I left some notes about istio cni experiments in https://phabricator.wikimedia.org/T297612#7759552
[10:59:08] it seems to me that upstream assumes that people install the cni plugin only via their install daemonset
[10:59:11] (sigh)
[11:00:56] I prepped https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/767924 to test the install-cni workflow, but I am aware that it may be very controversial
[11:01:07] so before taking any actions I'll wait for feedback
[11:18:28] (of course there is the nit to solve that puppet will try to enforce a calico-only config, meanwhile the install-cni daemonset will append the istio-cni chained one to the file)
[14:24:31] serviceops, Prod-Kubernetes, Kubernetes: High API server request latencies (LIST) for istio API groups - https://phabricator.wikimedia.org/T303184 (JMeybohm)
[14:35:08] serviceops, SRE: Move VTRS db passwords to a different hiera location - https://phabricator.wikimedia.org/T303272 (Kormat)
[15:09:01] jayme: about the istio-cni thing - I see stuff like https://github.com/istio/istio/blob/release-1.9/manifests/charts/istiod-remote/files/injection-template.yaml#L13 that makes me think that the flag is needed to fully configure pods for the mesh
[15:09:53] any issues with using a guaranteed QoS strategy in k8s? (request == limits) Increasing requests has improved throttling on jobqueue but it's still way above the levels that changeprop is seeing for a similar service https://gerrit.wikimedia.org/r/769038
[15:12:01] elukey: ah, I see. Your comment led me to believe it was more or less about the missing annotation. Hmm. That looks indeed like it needs at least extra config to work
[15:13:18] hnowlan: nothing in particular. Just the fact that you are actively blocking all the resources you define, regardless if you actually use them or not.
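
For context on the "guaranteed QoS (request == limits)" question above: Kubernetes assigns a pod the Guaranteed class only when every container has CPU and memory limits and its requests equal those limits. The following is a minimal Python sketch of that rule, not code from the changes discussed in the channel. The function name `qos_class` and the plain-dict container shape are hypothetical, and quantities are compared as strings, which is a simplification (the real API server normalizes values such as "500m" and "0.5").

```python
def qos_class(containers):
    """Classify a pod from its containers' resource settings.

    `containers` is a list of dicts, each with optional "requests" and
    "limits" dicts keyed by "cpu" and "memory" (hypothetical shape,
    used here for illustration only).
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(resource, limit)
            # Simplification: string comparison, no quantity normalization.
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    pod = [{"requests": {"cpu": "2", "memory": "1Gi"},
            "limits": {"cpu": "2", "memory": "1Gi"}}]
    print(qos_class(pod))
```

Note that setting only CPU requests equal to CPU limits is not enough under this rule: without matching memory requests and limits the pod stays Burstable.
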
[15:14:13] in jobqueue's case using them is almost certain (hopefully if this works as well as I hope we can actually tune the number of replicas down)
[15:14:23] there's a lot of hope in that sentence
[15:15:27] You should maybe also set requests.memory == limits.memory as well I'd say. Just to be very explicit about what happens there
[15:15:55] hnowlan: abandon any hope with k8s
[15:15:57] :D
[15:16:22] (I am new to this new fancy world but I already learned the lesson)
[15:19:04] hnowlan: +1ed ...let's see
[15:19:10] jayme: thank you!
[15:19:15] a few days ago I saw someone wearing a bag for an infra team that had the slogan "ní straitéis é dóchas" on it which translates to "hope is not a strategy"
[15:19:47] which is a good slogan but also something I think requires an asterisk at the end
[15:20:25] something like *in k8s, hope is the only strategy :P
[15:20:42] speaking of hope
[15:21:02] jayme: I merged https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/768681 but helmfile diff looks weird
[15:21:13] grrrreat :)
[15:21:19] now I suspect that it is helm that wants to add things that are already there
[15:22:47] hmm...well. That might actually be the case because we have strict dependencies in helmfile yaml (helm does not really know about them, as they are not part of the same release)
[15:22:51] umpf
[15:23:06] thankfully, I got to go :D
[15:23:32] * elukey complains about k8s and colleagues
[15:23:34] :D
[15:23:45] I am going to revert, or maybe ask others
[15:24:01] we can test this in ml-serve in theory, but I'd be afraid for other clusters
[15:24:05] elukey: I'd revert the change for now...guess the path to fixing this typo needs thorough testing :-|
[15:24:40] exactly that. One does not simply add an "n" to dependecies! :D
[15:25:14] sigh
[15:25:19] I am going to revert and open a task
[15:25:45] ack, thanks!
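
The "dependecies" typo above hints at how a misspelled key can go unnoticed and then change behaviour once corrected. As a rough illustration only, here is a Python sketch that flags top-level keys that look like near-misses of the documented Helm `Chart.yaml` fields. It is an assumption that the typo lived in chart metadata of this kind; the log does not say which file was affected. The helper `find_suspect_keys` and the 0.8 similarity cutoff are hypothetical choices, not part of any Wikimedia or Helm tooling.

```python
import difflib

# Top-level fields documented for a Helm v3 Chart.yaml.
KNOWN_CHART_KEYS = [
    "apiVersion", "name", "version", "kubeVersion", "description",
    "type", "keywords", "home", "sources", "dependencies",
    "maintainers", "icon", "appVersion", "deprecated", "annotations",
]


def find_suspect_keys(chart_metadata, known_keys=KNOWN_CHART_KEYS):
    """Return (found_key, probable_key) pairs for likely misspellings.

    `chart_metadata` is the already-parsed mapping (for example the
    result of loading the YAML file); parsing is left out on purpose.
    """
    suspects = []
    for key in chart_metadata:
        if key in known_keys:
            continue
        # Hypothetical cutoff: only report very close matches.
        matches = difflib.get_close_matches(key, known_keys, n=1, cutoff=0.8)
        if matches:
            suspects.append((key, matches[0]))
    return suspects


if __name__ == "__main__":
    chart = {"apiVersion": "v2", "name": "example", "version": "0.1.0",
             "dependecies": []}
    for found, probable in find_suspect_keys(chart):
        print(f"unknown key {found!r}, did you mean {probable!r}?")
```

A check like this only reports the problem. As the conversation shows, actually correcting the key still needs testing, because the corrected key starts being honoured by the tooling.
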
[15:29:30] serviceops, Machine-Learning-Team: Fix calico, cfssl-issuer and knative-serving Helm dependencies - https://phabricator.wikimedia.org/T303279 (elukey)
[15:38:18] ah lovely, since I reverted I'd still need to bump the chart's versions right?
[15:41:13] that is https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/769062/
[15:46:42] akosiaris: if you have a minute to proof-read --^ (it is a follow up after a revert, I merged https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/768681 but the helmfile diff was weird, opened T303279 about it)
[15:54:31] (going to merge it, I'd like to avoid leaving things in the current status quo for too much time)
[15:56:50] elukey: just saw, +1ed
[15:56:59] even if a posteriori
[15:57:06] ah thanks!
[15:59:35] yeah now the helmfile diff looks better
[22:46:17] serviceops, Product-Infrastructure-Team-Backlog, Wikipedia-iOS-App-Backlog, iOS-app-v6.9-Carp-On-A-Zamboni: Rotate APNS key before deploying Push Notifications to Production - https://phabricator.wikimedia.org/T288546 (MattCleinman)
[22:46:55] serviceops, Product-Infrastructure-Team-Backlog, Wikipedia-iOS-App-Backlog, iOS-app-v6.9-Carp-On-A-Zamboni: Rotate APNS key before deploying Push Notifications to Production - https://phabricator.wikimedia.org/T288546 (MattCleinman) While we won't need help for ~3 weeks, adding Service Ops to thi...