[06:01:26] Good morning o/
[06:13:04] we're having a "wonderful" heatwave here :)
[06:42:20] Machine-Learning-Team: Solve revscoring models increased latencies for big revision sizes - https://phabricator.wikimedia.org/T366772#9878277 (isarantopoulos)
[06:43:14] Machine-Learning-Team: Apply multi-processing to preprocess() in isvcs that suffer from high latency - https://phabricator.wikimedia.org/T349274#9878279 (isarantopoulos)
[06:43:15] Machine-Learning-Team: Solve revscoring models increased latencies for big revision sizes - https://phabricator.wikimedia.org/T366772#9878280 (isarantopoulos)
[06:43:55] Machine-Learning-Team: Solve revscoring models increased latencies for big revision sizes - https://phabricator.wikimedia.org/T366772#9878283 (isarantopoulos)
[06:43:55] Machine-Learning-Team: Tweak partman recipe for ML k8s workers - https://phabricator.wikimedia.org/T365971#9878284 (isarantopoulos)
[06:43:56] Machine-Learning-Team, Goal: 2024 Q4 Goal: Operational Excellence - Improve base monitoring, alerting and logging of Lift Wing services - https://phabricator.wikimedia.org/T362674#9878282 (isarantopoulos)
[08:40:34] good morning o/
[08:40:58] isaranto: did you turn on AC already? :D
[08:41:21] full on
[08:41:26] morning
[08:42:12] I try to delay it a bit but by noon it is boiling hot. I think it is going to be like that until Friday
[08:55:19] (CR) Kevin Bazira: [C:+1] "Thank you for working on this Ilias! The hf model-server README contains instructions for updating kserve that mention syncing the `liftwi" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1041082 (https://phabricator.wikimedia.org/T367048) (owner: Ilias Sarantopoulos)
[09:02:55] (PS4) Ilias Sarantopoulos: huggingface: kserve 0.13.0 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1041082 (https://phabricator.wikimedia.org/T367048)
[09:03:42] Morning!
[09:04:30] (PS5) Ilias Sarantopoulos: huggingface: kserve 0.13.0 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1041082 (https://phabricator.wikimedia.org/T367048)
[09:05:14] (CR) Ilias Sarantopoulos: "Right! thanks for spotting this. I updated it to give both options (branch and tag)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1041082 (https://phabricator.wikimedia.org/T367048) (owner: Ilias Sarantopoulos)
[09:05:24] o/ Tobias!
[09:26:56] (CR) Kevin Bazira: "Thank you for updating this. LGTM!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1041082 (https://phabricator.wikimedia.org/T367048) (owner: Ilias Sarantopoulos)
[09:38:01] Machine-Learning-Team, DC-Ops: hw troubleshooting: memory errors during boot for ml-staging2001.codfw.wmnet - https://phabricator.wikimedia.org/T366670#9878631 (klausman) I repooled the machine just now, as I don't want to fly this close to the capacity ceiling for prolonged periods.
[09:45:31] Machine-Learning-Team: Tweak partman recipe for ML k8s workers - https://phabricator.wikimedia.org/T365971#9878667 (klausman) Open→Resolved
[09:47:01] (CR) Ilias Sarantopoulos: [C:+2] huggingface: kserve 0.13.0 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1041082 (https://phabricator.wikimedia.org/T367048) (owner: Ilias Sarantopoulos)
[09:47:44] (Merged) jenkins-bot: huggingface: kserve 0.13.0 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1041082 (https://phabricator.wikimedia.org/T367048) (owner: Ilias Sarantopoulos)
[10:02:16] * klausman lunch
[10:48:48] Machine-Learning-Team, serviceops, Patch-For-Review: Rename the envoy's uses_ingress option to sets_sni - https://phabricator.wikimedia.org/T346638#9879003 (JMeybohm)
[11:01:09] There is a network policy update that affects ores-legacy (and rec-api). I will deploy & test in staging, then codfw
[11:01:31] isaranto: are there any pending changes for o-legacy that should not be deployed?
[11:05:37] Doesn't look like it, proceeding
[11:10:46] All is fine! Proceed!
[11:10:54] * isaranto afk lunch
[11:30:04] Ok, all deployed
[12:34:05] https://news.ycombinator.com/item?id=40610794 - "Ask HN: Machine learning engineers, what do you do at work?"
[12:34:08] :D
[12:36:55] lol
[12:37:47] Cry over YAML. Sob about Helm. Rage at Istio. And that's just the morning!
[12:38:03] Oh wait, no, that's ML SREs.
[12:44:16] Goood morning all!
[12:50:17] good morning Chris o/
[12:55:24] Machine-Learning-Team, Research: Add Article Quality Model to LiftWing - https://phabricator.wikimedia.org/T360455#9879514 (isarantopoulos) Adding a todo list of tasks: [] Add a model server to inference-services - start with a dummy preprocess/predict function can be just identity functions [] Create b...
[13:12:22] Machine-Learning-Team, serviceops, Patch-For-Review: Rename the envoy's uses_ingress option to sets_sni - https://phabricator.wikimedia.org/T346638#9879651 (JMeybohm)
[13:28:48] kevinbazira: o/ if you want to help you can work on one of the following things from the list https://phabricator.wikimedia.org/T360455#9879514
[13:29:07] I'm creating the dummy service at the moment
[13:29:16] we can discuss it later!
[13:34:22] isaranto: okok. will there be a model onboarding presentation for the team?
[13:44:44] not a presentation, but I have added it as a topic for tomorrow's meeting so we can have the "model onboarding session" as we have had before and discuss any potential issues and solutions
[13:45:45] great! ty for adding the topic.
[13:46:56] this one is a bit different because it is an internship project
[14:00:45] Machine-Learning-Team, Goal: 2024 Q4: Users can "pip install liftwing" and access 20% of models - https://phabricator.wikimedia.org/T359140#9879944 (isarantopoulos) We made request validation optional and it is now really simple to add support for a new model to the package. Have also added metadata (opt...
[14:40:21] Machine-Learning-Team: Investigate kserve 0.13.0 upgrade - https://phabricator.wikimedia.org/T367048#9880128 (isarantopoulos)
[14:41:01] Machine-Learning-Team: Investigate kserve 0.13.0 upgrade - https://phabricator.wikimedia.org/T367048#9880130 (isarantopoulos) a:isarantopoulos
[14:45:27] Machine-Learning-Team: Solve revscoring models increased latencies for big revision sizes - https://phabricator.wikimedia.org/T366772#9880134 (isarantopoulos) Open→Declined Covered by other tasks (https://phabricator.wikimedia.org/T363336)
[14:48:28] Machine-Learning-Team, Patch-For-Review: Use local tls proxy for Lift Wing staging (inference-staging) - https://phabricator.wikimedia.org/T366801#9880143 (isarantopoulos) Open→Resolved a:isarantopoulos
[14:59:06] (PS1) Ilias Sarantopoulos: ci: .gitignore(s) only top level /models dir [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1041680
[15:04:48] btw Mercelis noticed an issue with the API GW docs https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Get_reverted_risk_language_agnostic_prediction. The code blocks for curl and python aren't visible. For me it works with Firefox and Safari but not with Chrome
[15:59:52] I've visited the link and code blocks for curl, python and JS are visible on both Chrome v125.0.6422.141 and Firefox v127.0.
[16:01:23] aha! ok thanks Kevin!
[16:01:49] np
[16:07:16] logging off folks, have a nice day/afternoon/evening o/
[21:03:20] Machine-Learning-Team, DC-Ops, ops-codfw, SRE: Q3:rack/setup/install ml-staging2003 - https://phabricator.wikimedia.org/T357415#9882249 (RobH)