[11:27:42] * elukey lunch!
[15:45:25] Machine-Learning-Team, Observability-Logging: Indexing errors from logs generated by Activator - https://phabricator.wikimedia.org/T288549 (fgiunchedi) This is back just now FWIW ` "@timestamp": "2021-12-06T15:43:54.769Z", "message": "Could not index event to Elasticsearch. status: 400, action: [\"...
[15:59:14] Machine-Learning-Team, ops-codfw: Possible faulty cable between asw-d-codfw and ml-serve2004 - https://phabricator.wikimedia.org/T297126 (elukey)
[16:09:41] o/
[16:11:09] Machine-Learning-Team, ops-codfw: Possible faulty cable between asw-d-codfw and ml-serve2004 - https://phabricator.wikimedia.org/T297126 (Papaul) @elukey Replaced the cable ` Interface Admin Link Description ge-6/0/4 up up ml-serve2004`` ` note: If ml-serve200[1-4] are in service can y...
[16:11:12] Machine-Learning-Team, ops-codfw: Possible faulty cable between asw-d-codfw and ml-serve2004 - https://phabricator.wikimedia.org/T297126 (Papaul) Open→Resolved a:Papaul
[16:13:58] Machine-Learning-Team, ops-codfw: Possible faulty cable between asw-d-codfw and ml-serve2004 - https://phabricator.wikimedia.org/T297126 (elukey) Applied the "Active" label to all nodes, thanks!
[16:14:15] accraze: o/
[16:24:18] thx for the code review all! upgrading the inference-service images is pretty tricky :)
[16:25:44] i still haven't found a great process other than upgrading requirements.txt (swap out kfserving for kserve) and then just see what all breaks while installing in a fresh virtualenv
[16:26:01] I think it is the same approach that I'd take :(
[16:26:40] and of course kept running into the pip backtracking issue where it attempts to resolve dependencies by downloading all known versions and trying them out :/
[16:31:41] oh shoot one of my jenkins jobs for editquality ran for 9 hours due to backtracking :(
[16:34:12] Did it still complete?
[16:35:58] haha no it failed: `pip._vendor.resolvelib.resolvers.ResolutionTooDeep: 2000000`
[16:36:28] https://integration.wikimedia.org/ci/job/inference-services-pipeline-editquality/42/execution/node/77/log/
[16:37:17] maybe we should reduce the max resolution depth a bit, to at least prevent burning CPU for nine hours.
[16:37:35] (not sure if resolvelib even exposes that)
[16:37:59] i believe it does and i am definitely down to reduce max res depth
[16:40:58] i think we can also still use the legacy-resolver flag, but that's supposed to be deprecated in the next version
[16:42:38] Yeah, I think the new resolver's behavior needs to be addressed anyway, so going with the legacy flag should only be the last resort if things get completely stuck otherwise.
[16:45:15] agreed, btw i checked out poetry (pip alternative) again last Friday and it seemed to do something similar for backtracking, just much much slower
[16:50:10] Grrreat.
[16:54:09] python dependency management is just one of those things i guess ¯\_(ツ)_/¯
[16:55:17] Now if all our models were written in Go.... *ducks and runs*
[16:55:35] loooooll
[16:55:45] actually im trying to think of a language i've used where i haven't encountered dependency hell at some point
[16:55:52] Go's new(ish) library versioning etc system is very neat, tho
[16:55:59] is it all just binaries?
[16:56:20] No, but your applications specify minimum versions and it's all enforced to be semver
[16:56:33] And the go building tool resolves stuff for you.
[16:57:16] Granted, not all libraries are up-to-date in this regard, but for my needs, I have yet to run into an incompatibility
[16:57:59] * accraze is intrigued
[16:59:19] We can chat about it after the team meeting :)
[16:59:29] i've only done a little bit of Go, but I remember it felt pretty ergonomic
[16:59:39] Yeah, agreed
[16:59:51] accraze: interview in 1m ;)
[17:00:04] haha thanx 4 the reminder :)
[17:00:46] Just making sure :-P
[17:16:06] Morning all!
[17:16:14] My kid is sick so I have a 3yo with me today
[17:16:24] Totally not going to affect my productivity :/
[19:07:10] * elukey afk!
[20:52:24] Lift-Wing, Machine-Learning-Team: Configure outlink topic model deployment pipeline - https://phabricator.wikimedia.org/T290930 (ACraze) Open→Resolved a:ACraze Alright, pipelines have been configured for the outlink topic predictor and transformer, you can see them configured in jenkins here:...
[20:52:26] Lift-Wing, Machine-Learning-Team: Deploy Outlinks topic model to production - https://phabricator.wikimedia.org/T287056 (ACraze)
[21:04:47] Lift-Wing, Machine-Learning-Team: Deploy Outlinks topic model to production - https://phabricator.wikimedia.org/T287056 (ACraze) Stalled→Open Removing the 'stalled' status and setting back to 'open' now that the work in T272919 is complete. We have also added deployment pipelines and images to t...
[22:06:34] Lift-Wing, artificial-intelligence, articlequality-modeling, Machine-Learning-Team (Active Tasks), Patch-For-Review: Add enwiki-articlequality inference service to LiftWing - https://phabricator.wikimedia.org/T294141 (ACraze) To reach feature parity with ORES, we have added a pre-processing [...
[22:30:13] Alright I’m
[22:30:15] Back
[22:30:26] I had to take my kid to her grandparents
[22:31:22] cool cool
[22:35:16] wow trying to upgrade the editquality inference-service image to use kserve v0.7.0 is turning out to be a difficult game of jenga!
[22:39:26] have a WIP CR with my work so far here: https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/743496
[22:40:17] have to do that weird git+https syntax for yamlconf in requirements.txt again
[22:41:05] mostly because kserve deps use pyyaml==5.4 and yamlconf has updated its deps in the repo but has not pushed a release to pypi
[22:41:56] gonna go take a walk and think through my approach a bit more
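
Editor's note on the nine-hour Jenkins run discussed between 16:31 and 16:37: whether or not the resolver's depth limit can be lowered, a coarser guard is to give the install step a wall-clock budget. The sketch below is one possible way to do that with Python's standard library. It is not what the team did and it does not touch resolvelib. The function name `run_with_time_budget`, the constant `PIP_INSTALL_CMD`, and the 30-minute figure are all assumptions made for illustration.

```python
import subprocess
import sys

# Hypothetical install step; the requirements file name is an assumption.
PIP_INSTALL_CMD = [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"]


def run_with_time_budget(cmd, budget_seconds):
    """Run cmd and stop it if it exceeds budget_seconds.

    Returns ("ok", returncode) when the command finishes in time,
    or ("timeout", None) when it had to be killed.
    """
    try:
        completed = subprocess.run(cmd, timeout=budget_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before re-raising.
        return ("timeout", None)
    return ("ok", completed.returncode)


# Example call (not executed here), with an arbitrary 30-minute budget:
#   status, code = run_with_time_budget(PIP_INSTALL_CMD, 30 * 60)
```

A timeout like this only bounds the damage: it says nothing about why the resolver backtracks. It also only kills the direct child process, so any build subprocesses pip has spawned may need separate cleanup.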