[09:44:08] I am rebooting all ganeti ml-* nodes (requested by SRE)
[09:47:49] we only need to reboot VMs in codfw, though?
[09:48:20] none of the baremetal hosts need a reboot and the equivalent migration for eqiad will only start in ~ 2 weeks
[09:48:52] sorry if that was unclear earlier
[09:49:16] essentially for now just the VMs listed at https://phabricator.wikimedia.org/T294119
[09:50:17] moritzm: my bad I didn't get the "codfw" part, should be finished sooner then :)
[09:50:32] :-)
[09:52:14] Lift-Wing, Machine-Learning-Team (Active Tasks): Add an envoy proxy sidecar to Kserve inference pods - https://phabricator.wikimedia.org/T294414 (elukey) https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/741092 is an example of something that may work (needs to be tested), to add a sidecar...
[10:01:50] while trying to mess with the istio sidecar proxy, I wondered if using a dedicated istio egress gateway for all inferenceservices
[10:02:00] could be a good way to go
[10:02:32] it is more consistent with the whole mesh concept, and in the future if we want to enable mTLS it should be easier
[11:33:46] * elukey lunch!
[16:10:38] o/
[16:13:16] good morning :)
[16:15:11] elukey: i like this idea of a dedicated istio egress gateway for the isvcs
[16:15:51] it could keep things clean/organized
[16:23:16] the bit that I am not sure about is how it works when you don't have a tls mesh, like in our case
[16:23:50] in the mesh use case, IIUC istiod instructs the envoy sidecars on each pod how/where to route egress traffic
[16:23:59] in our case, nothing does that
[16:24:12] so we should, in theory
[16:24:24] Machine-Learning-Team, artificial-intelligence, Wikilabels, articlequality-modeling: Build article quality model for Dutch Wikipedia - https://phabricator.wikimedia.org/T223782 (Ciell) There has not been much discussion, but everybody seems to agree on loosening the requirements for sources in th...
[16:24:25] 1) create a kubernetes svc in front of the egress pods
[16:24:56] 2) add network policies to allow only istio egress pods to contact, say, the mediawiki api
[16:25:06] 3) instruct our pods to point to this new svc
[16:25:29] then I have no idea if the proxy bits on the egress side will work
[16:32:25] ah i see, yeah i forgot about us not having mtls
[17:59:12] Machine-Learning-Team, artificial-intelligence, Wikilabels, articlequality-modeling: Build article quality model for Dutch Wikipedia - https://phabricator.wikimedia.org/T223782 (Halfak) Do you think we could apply the new criteria to list of articles I have in my Sandbox? https://nl.wikipedia.or...
[18:16:19] Machine-Learning-Team, artificial-intelligence, Wikilabels, articlequality-modeling: Build article quality model for Dutch Wikipedia - https://phabricator.wikimedia.org/T223782 (Ciell) Yes, please do!
[19:04:30] Machine-Learning-Team, artificial-intelligence, Wikilabels, articlequality-modeling: Build article quality model for Dutch Wikipedia - https://phabricator.wikimedia.org/T223782 (Halfak) Ahh. That was more of an ask to Dutch Wikipedians to help choose what label those articles should ultimately h...
[20:24:45] Machine-Learning-Team, artificial-intelligence, Wikilabels, articlequality-modeling: Build article quality model for Dutch Wikipedia - https://phabricator.wikimedia.org/T223782 (Ciell) No: they say something they labelled an E now, would for instance become a C becomes without the strict sourcing...
[20:48:37] Machine-Learning-Team, artificial-intelligence, Wikilabels, articlequality-modeling: Build article quality model for Dutch Wikipedia - https://phabricator.wikimedia.org/T223782 (Halfak) If folks aren't interested in doing more labeling, it sounds like the best approach would be to just take the m...
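Editor's sketch of the three-step egress plan discussed at 16:24-16:25 (Service in front of the egress pods, NetworkPolicy for the egress pods, workloads pointed at the Service). This is only an illustration under stated assumptions, not what the team deployed: the Service name `isvc-egress`, the `istio: egressgateway` label, the ports, and the CIDR are all hypothetical placeholders. The manifests are built as plain Python dicts and could be dumped to JSON for `kubectl apply -f`.

```python
import json

# Hypothetical label assumed to select the istio egress gateway pods.
EGRESS_LABELS = {"istio": "egressgateway"}
SERVICE_NAME = "isvc-egress"  # hypothetical Service name


def egress_service(namespace):
    """Step 1: a Service in front of the egress pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": SERVICE_NAME, "namespace": namespace},
        "spec": {
            "selector": EGRESS_LABELS,
            # Port numbers are placeholders.
            "ports": [{"name": "https", "port": 443, "targetPort": 8443}],
        },
    }


def allow_egress_policy(namespace, api_cidr):
    """Step 2: let the egress pods reach the external API on 443/TCP.

    Note: this policy only constrains the gateway pods themselves.
    Keeping other pods away from the API needs a separate policy
    that restricts their egress.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-egress-gateway-to-api", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": EGRESS_LABELS},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [{"ipBlock": {"cidr": api_cidr}}],
                    "ports": [{"protocol": "TCP", "port": 443}],
                }
            ],
        },
    }


def egress_endpoint(namespace):
    """Step 3: the in-cluster address workloads would be pointed at."""
    return f"{SERVICE_NAME}.{namespace}.svc.cluster.local:443"


if __name__ == "__main__":
    # Namespace and CIDR below are made-up example values.
    print(json.dumps(egress_service("istio-system"), indent=2))
    print(json.dumps(allow_egress_policy("istio-system", "192.0.2.0/24"), indent=2))
    print(egress_endpoint("istio-system"))
```

Whether the proxy on the egress side then forwards the traffic correctly without a mesh is exactly the open question raised at 16:25:29; the sketch does not answer it.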
[21:17:43] Yay, I got the new ml-sandbox running enwiki-goodfaith using the old storage bucket, still working on integrating minio for local dev storage
[21:18:43] so at least we know how to setup the kserve stack with the wmf images/helm chart/etc. for dev now
[21:30:43] oh weird it seems like the top-level knative service is working but kserve inference service route doesn't get updated...
[21:38:41] when i look at the kserve-controller logs it says `failed to create IngressConfig: unable to parse ingress config json`
[21:40:33] the isvc pods are running fine though -- nothing unusual in those logs
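Editor's sketch for the `unable to parse ingress config json` error quoted at 21:38:41. It assumes the controller reads its ingress settings from a JSON string stored in a ConfigMap, so a quick first check is whether that string is valid JSON at all. The helper name `check_ingress_config` and the sample key `ingressGateway` are illustrative assumptions, and the log does not say what the actual cause was.

```python
import json


def check_ingress_config(raw):
    """Return (ok, detail) for a raw JSON string taken from a ConfigMap value.

    On failure, detail is the parser's position and message; on success,
    detail is the sorted list of top-level keys.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(parsed, dict):
        return False, "top-level value is not a JSON object"
    return True, sorted(parsed)


if __name__ == "__main__":
    # A trailing comma is a common way for templated JSON to break.
    broken = '{"ingressGateway": "example/gateway",}'
    fixed = '{"ingressGateway": "example/gateway"}'
    print(check_ingress_config(broken))
    print(check_ingress_config(fixed))
```

Pasting the raw value from the rendered helm chart output into such a check would at least separate a templating problem from a schema problem.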