[09:42:19] elukey: do you have time for the istio/calico cert changes right now?
[09:42:42] testing them on ml-staging I mean
[09:49:02] jayme: you can go ahead anytime
[09:49:13] I can do it if you want, lemme know
[09:50:08] elukey: ack. I just wanted you around for a second pair of eyes for verification actually
[09:50:40] the change is totally capable of breaking everything :-D
[09:53:08] ah yes definitely :D
[09:53:26] we can kill pods here and there, it should be sufficient to verify if it works
[09:53:34] (also if the kubelet screams in the logs etc..)
[09:53:47] yep
[09:53:58] going to stop puppet on all k8s first
[09:54:27] <_joe_> pods have feelings you know elukey
[09:57:16] _joe_ I have feelings as well and k8s drained all of them from me, I deserve some justice :D
[09:57:41] <_joe_> elukey: stop deflecting; you have been drained by istio and kubeflow
[10:02:02] elukey: changes applied to ml-staging*
[10:05:27] elukey: non-istio pods restart just fine
[10:05:37] calicoctl is working as well
[10:07:56] _joe_ yeah sure the rest is so nice and easy :D
[10:08:00] jayme: nice!
[10:08:24] we can proceed with ml-serve
[10:08:29] elukey: can I just kill random revscoring pods to check on istio?
[10:08:37] yes yes definitely
[10:08:43] I thought you already did it
[10:08:58] if you tell me which one I'll try to get a score from it
[10:09:02] to double check
[10:09:07] ruwiki-articlequality-predictor-default-00005-deployment-5vj6p7
[10:09:19] ack lemme know when I can test
[10:10:02] a new one did start up already
[10:10:12] the old one is terminating
[10:11:33] "PreStop hook failed" err="Get \"http://10.194.61.185:8022/wait-for-drain\": EOF" pod="revscoring-articlequality/ruwiki-articlequality-predictor-default-00005-deployment-5vj6p7"
[10:11:47] probably unrelated to the change, but might be interesting
[10:11:56] mmmm
[10:12:02] I tested the endpoint and it works
[10:13:10] nice
[10:13:12] thanks
[10:13:32] the old one is still in terminating state
[10:16:20] I've re-enabled puppet on the other hosts as well to have this roll out
[10:21:59] all good now right?
[10:22:04] I don't see pods in terminating state
[10:22:19] yeah, it got cleaned up now
[10:28:02] jayme: do you want to attempt to break calico this afternoon?
[10:28:10] after the switchover etc..
[11:23:15] elukey: wdym?
[11:23:29] the client cert change for calico?
[11:25:08] yes exactly
[11:25:08] that one I already merged together with the istio client cert
[11:25:22] so everything is on fire already, apart from that it's not
[11:25:24] :)
[11:29:32] ahhh I lost that part, I thought you did istio only
[11:29:34] perfect :)
[14:26:32] jayme: is there anything left before merging https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/887945 ?
[14:26:51] nothing urgent of course, but Jenkins +2 gave me hopes
[14:28:27] XioNoX: I think it's fine, but let me take a look again first please
[14:36:29] no pb!
[15:54:49] actually, all the patches under https://gerrit.wikimedia.org/r/q/topic:T306649 should be ready to merge if I understand correctly
[15:55:36] akosiaris (and everybody else) let me know when would be a good time to discuss https://phabricator.wikimedia.org/T306649#8618735
[16:01:46] XioNoX: Sent you an invite to reserve an hour, otherwise we will never get it done
[16:01:58] and it will be entirely my fault and I can't have that
[16:02:41] akosiaris: damn, I can't even blame you now :(
[16:02:45] accepted :)
[16:10:19] akosiaris: I'd like to join please :)
[16:29:17] jayme: added
[16:29:22] thanks
[16:37:03] ah damn, I'm out friday. nvm then