[07:41:33] klausman: ah okok! I was worried that I missed something :) I agree that it could lead to problems, hopefully not, we'll see!
[08:47:04] (CR) Elukey: WIP - outlink: use tornado async http client to fetch outlinks (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/807135 (https://phabricator.wikimedia.org/T311043) (owner: AikoChou)
[08:47:55] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Test async preprocess on kserve - https://phabricator.wikimedia.org/T309623 (kevinbazira) @achou thank you for digging into the async-mediawiki library. Following yesterday's chat in the meeting, I wonder whether we would benefit more...
[08:57:43] (PS2) Elukey: WIP - add support for revision-score events [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/808247 (https://phabricator.wikimedia.org/T301878)
[09:00:09] (CR) CI reject: [V: -1] WIP - add support for revision-score events [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/808247 (https://phabricator.wikimedia.org/T301878) (owner: Elukey)
[10:12:30] (PS3) Elukey: editquality: add support for revision-score events [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/808247 (https://phabricator.wikimedia.org/T301878)
[10:19:10] (PS4) Elukey: editquality: add support for revision-score events [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/808247 (https://phabricator.wikimedia.org/T301878)
[10:20:24] (PS5) Elukey: editquality: add support for revision-score events [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/808247 (https://phabricator.wikimedia.org/T301878)
[10:23:31] (PS6) Elukey: editquality: add support for revision-score events [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/808247 (https://phabricator.wikimedia.org/T301878)
[10:28:24] * elukey lunch!
[10:28:37] event generation code basically ready, now I need to test it somehow
[12:36:21] elukey: Where do we define the (default) replicacount for the revscoring pods on LW? I have tried two different approaches, but I am not seeing a diff in the Jenkins-triggered tests
[13:15:21] klausman: if you check the .fixtures of the inference-services chart there should be some example IIRC
[13:18:20] but it is 1 by default
[13:19:03] ok so this is an example
[13:19:05] - name: itwiki-goodfaith
[13:19:05] predictor:
[13:19:05] config:
[13:19:05] serviceAccountName: "kserve-override"
[13:19:07] minReplicas: 2
[13:19:10] canaryTrafficPercent: 10
[13:19:21] Then I dunno why we get three replicas for revscoring
[13:20:12] what namespace?
[13:20:19] all of them?
[13:21:19] # kubectl get pods -n revscoring-draftquality
[13:21:21] NAME READY STATUS RESTARTS AGE
[13:21:23] enwiki-draftquality-predictor-default-7mbtv-deployment-5d87zwlt 3/3 Running 0 23h
[13:21:28] ah no okok
[13:21:35] if you describe the pod you'll see 3 containers
[13:21:45] istio, storage-initializer and kserve-inference
[13:21:52] it is expected
[13:22:08] I see.
[13:22:39] Well, then I misunderstood your comment on change 811313 as well :)
[13:23:56] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks), Patch-For-Review: Set up the ml-cache clusters - https://phabricator.wikimedia.org/T302232 (elukey) Informed @LSobanski via email as well, so Data Persistence is aware of this extra new cluster :) I think that, if everybody agrees, this task...
[13:24:18] klausman: what I meant was to have fewer isvcs defined for each namespace, to limit the number of pods
[13:24:32] like instead of 5 to reduce it to 2 or 3 max at the beginning
[13:24:36] (since we have two workers)
[13:24:38] You mean having only, e.g., en and ru, but not more?
[13:24:46] exactly yes
[13:24:50] Right.
[13:25:02] I mean it was a proposal, just to be more conservative on capacity in there
[13:25:03] I can trim that, but how do we keep it somewhat representative?
[13:25:08] but we can do anything :)
[13:25:24] I mean, why not pile it in, see if things explode/swap/meltdown?
[13:26:07] my view was a little different, conservative deploy and then watch for resource usage :)
[13:26:39] Looking at what we have now, the machines seem mostly bored. But sure we can do a slow start
[13:27:06] for example, damaging can have "en" "eswikisomething" "wikidata"
[13:27:15] that are particular use cases
[13:27:31] remember that Aiko and Kevin will likely deploy more pods during the coming months
[13:27:38] better to leave space in my opinion
[13:29:36] Alright, PTAL
[13:30:09] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks), Patch-For-Review: Send score to eventgate when requested - https://phabricator.wikimedia.org/T301878 (elukey) @Ottomata if you have time I'd need some help in deploying https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/810007, I...
[14:13:23] (PS7) Elukey: editquality: add support for revision-score events [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/808247 (https://phabricator.wikimedia.org/T301878)
[14:35:12] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks), Patch-For-Review: Send score to eventgate when requested - https://phabricator.wikimedia.org/T301878 (elukey) I tried to send an event manually with curl (see below) and I am getting: ` context":{"message":"event 50571b69-47d4-4923-8502-8524...
[14:35:13] ok so mediawiki config deployed for the mediawiki.revision-score-test stream
[14:35:23] but I think that eventgate's pods need to be restarted
[14:35:41] I'll wait for Andrew's confirmation, and then I'll retry T301878#8055728
[14:35:59] the code is ready and it just needs to be tested somewhere (maybe in the ml-sandbox)
[14:39:54] nice work!
[15:34:01] going afk folks, have a nice rest of the day :)
[15:34:52] \o
[15:36:32] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks), Patch-For-Review: Send score to eventgate when requested - https://phabricator.wikimedia.org/T301878 (Ottomata) eventgate-main does require a pod restart: https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate/Administration#EventStrea...
[17:10:42] Machine-Learning-Team, ORES, SRE, serviceops: Migrate ORES Redis servers to Stretch/Buster - https://phabricator.wikimedia.org/T224569 (akosiaris) Open→Resolved a:akosiaris Done a long time ago. Now [misc_redis](https://wikitech.wikimedia.org/wiki/Redis#Cluster_redis_misc) is being us...