[08:09:20] chrisalbon: o/ re: HF docker image size - the size is considerable, so ideally SREs handling the Docker registry are not 100% happy about it, but the main plus in this case is that a layer of 10G is sharable between multiple docker images, so in case a k8s worker has already pulled it down future docker image updates wouldn't require a complete repull but just the upper layers that changed (say hf
[08:09:26] pip changes, code fixes, etc.. - basically the extra 1G that Ilias mentioned)
[09:08:59] o/ morning
[09:14:29] (PS5) MPGuy2824: Migrate usage of Database::delete, insert, update and upsert to QueryBuilder [extensions/ORES] - https://gerrit.wikimedia.org/r/1007862 (https://phabricator.wikimedia.org/T358831)
[10:15:31] Machine-Learning-Team, serviceops: Rename the envoy's uses_ingress option to sets_sni - https://phabricator.wikimedia.org/T346638#9692225 (JMeybohm)
[10:15:36] (CR) MPGuy2824: Migrate usage of Database::delete, insert, update and upsert to QueryBuilder (5 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/1007862 (https://phabricator.wikimedia.org/T358831) (owner: MPGuy2824)
[12:49:37] o/
[12:49:39] hello folks
[12:49:50] sorry I am a little late today, my schedule is messed up :D
[12:49:53] will stay a bit more
[13:20:45] (CR) Elukey: "Left some nits and questions but LGTM!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986) (owner: Ilias Sarantopoulos)
[14:07:34] Good morning all
[14:09:20] morninggg
[14:13:48] Machine-Learning-Team: Add Dragonfly to the ML k8s clusters - https://phabricator.wikimedia.org/T359416#9692846 (elukey) Open→Resolved
[14:13:52] Machine-Learning-Team: Find an efficient strategy to add Pytorch and ROCm packages to our Docker images - https://phabricator.wikimedia.org/T359067#9692848 (elukey) Open→Resolved
[14:14:04] Machine-Learning-Team, Patch-For-Review: Create a Pytorch base image - https://phabricator.wikimedia.org/T360638#9692859 (elukey) Open→Resolved
[14:16:56] hi Chris!
[14:29:06] o/ I'm going to roll out RRLA KI v0.6 to production
[14:29:36] I'll be monitoring the istio and kserve dashboards to make sure everything works fine after deployment
[14:29:46] https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&var-cluster=eqiad%20prometheus%2Fk8s-mlserve&var-namespace=revertrisk&var-backend=All&var-response_code=All&var-quantile=All&from=now-1h&to=now&refresh=30s
[14:29:52] https://grafana.wikimedia.org/d/n3LJdTGIk/kserve-inference-services?orgId=1&var-cluster=eqiad%20prometheus%2Fk8s-mlserve&var-component=All&var-namespace=revertrisk&var-model_name=revertrisk-language-agnostic&from=now-1h&to=now
[14:44:17] aiko: for the next time, deploying on a friday is generally not advised, if anything goes south it may spill to the weekend
[14:52:48] elukey: ah sorry > I'll wait until Monday :)
[14:58:01] deployment on codfw looks good but we don't have traffic there
[15:00:30] aiko: it is fine to complete if codfw looks ok, so we are not imbalanced (say if SRE needs to depool eqiad etc..)
[15:00:45] just keep it in mind for the next time :)
[15:24:40] ack!
[15:45:38] aiko: time for a quick code review? https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1017284
[15:47:30] (PS2) Elukey: python: upgrade aiohttp's version to avoid issues with py3.11 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1015070
[15:57:17] elukey: looking
[15:59:55] super thanks
[16:00:06] also the next in the chain if you have more time, it is related to RR
[16:00:15] I'll deploy only to staging
[16:14:23] +1
[16:16:04] thanksssss
[16:47:09] going afk for the weekend folks! Have a nice rest of the day!
[16:47:18] (and weekend
[16:47:19] )
[16:52:09] bye Luca! have a nice weekend :D
[18:57:01] Machine-Learning-Team, Patch-For-Review: Deploy RevertRisk language-agnostic with knowledge integrity v0.6.0 - https://phabricator.wikimedia.org/T360423#9693606 (achou)
[19:00:33] Machine-Learning-Team, Patch-For-Review: Deploy RevertRisk language-agnostic with knowledge integrity v0.6.0 - https://phabricator.wikimedia.org/T360423#9693612 (achou) Open→Resolved We have deployed the new RRLA model server to production.