[04:26:54] Lift-Wing, Machine-Learning-Team, I18n, NewFunctionality-Worktype, Patch-For-Review: Create a language detection service in LiftWing - https://phabricator.wikimedia.org/T340507 (santhosh)
[06:06:24] Machine-Learning-Team, Continuous-Integration-Infrastructure, Release-Engineering-Team: Python torch fills disk of CI Jenkins instances - https://phabricator.wikimedia.org/T338317 (isarantopoulos) The intended use of this image is to be used with GPUs, so torch/lib/rocblas is def needed for operation...
[07:28:12] (PS6) Santhosh: Add language identification service [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/932828 (https://phabricator.wikimedia.org/T340507)
[07:35:12] (CR) CI reject: [V: -1] Add language identification service [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/932828 (https://phabricator.wikimedia.org/T340507) (owner: Santhosh)
[08:42:11] Machine-Learning-Team: The nsfw model hangs in predict() after moving to Kserve 0.10 - https://phabricator.wikimedia.org/T331416 (elukey) @Htriedman The model server seems really not used/supported, shall we remove it from Lift Wing for the moment?
[08:48:11] elukey: o/ I think everything required by https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/929690 in the schema repo and MW EventBus are merged? Not really urgent at this point but wanted to check with you what would be best to keep track of it, e.g. should I file a task for the deploy?
[08:48:41] s/are merged\?/are merged./
[08:59:00] dcausse: o/ we have a problem with CI at the moment but I'll take care of merging and deploying!
[08:59:03] sorry for the delay
[08:59:21] isaranto: I think that we should add the CI configs for the language identification service
[08:59:49] elukey: actually I think I messed up something and https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/929735/5/python/preprocess_utils.py might be required sooner than I thought
[09:00:11] (PS4) Elukey: Unify the meta subfield in events [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/929690 (owner: DCausse)
[09:02:02] (PS6) Elukey: events: propagate the event time with the dt field [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/929735 (https://phabricator.wikimedia.org/T267648) (owner: DCausse)
[09:02:08] (PS4) Elukey: events: drop support for /mediawiki/revision/create#1.x events [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930665 (https://phabricator.wikimedia.org/T267648) (owner: DCausse)
[09:02:56] yes I messed up something... :(, https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventBus/+/930668 will ride the train
[09:03:50] that'll start producing revision/2.0.0 events that'll probably get rejected without https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/929735
[09:03:55] damn sorry about that
[09:06:36] dcausse: no problem! When will the next train run?
[09:07:20] (CR) Elukey: [C: +2] Unify the meta subfield in events [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/929690 (owner: DCausse)
[09:07:46] elukey: today for group1 :(
[09:07:55] ok the CI issues are not related to this so I can merge and update the docker images :)
[09:08:44] thanks!
[09:09:30] (CR) DCausse: [C: -1] "this one will have to wait for MW 1.41.0-wmf.15 to be deployed everywhere" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930665 (https://phabricator.wikimedia.org/T267648) (owner: DCausse)
[09:11:34] elukey: everything is ready regarding the ci configs for language identification service
[09:13:12] let's do it then :)
[09:13:36] (Merged) jenkins-bot: Unify the meta subfield in events [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/929690 (owner: DCausse)
[09:15:51] (CR) Elukey: [C: +2] events: propagate the event time with the dt field [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/929735 (https://phabricator.wikimedia.org/T267648) (owner: DCausse)
[09:22:14] (Merged) jenkins-bot: events: propagate the event time with the dt field [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/929735 (https://phabricator.wikimedia.org/T267648) (owner: DCausse)
[09:23:33] Lift-Wing, Machine-Learning-Team, I18n, NewFunctionality-Worktype, Patch-For-Review: Create a language detection service in LiftWing - https://phabricator.wikimedia.org/T340507 (isarantopoulos) @santhosh is this model service going to be used by an existing service? Is it going to replace a o...
[09:26:34] isaranto: I am thinking that maybe it would be nice to add ores-legacy.wikimedia.org, so that we can test it end-to-end
[09:26:50] 100% in
[09:26:57] then we'll just add a CNAME ores.wikimedia.org -> ores-legacy
[09:27:23] we'll need to add a SAN to the TLS cert of ores-legacy but I think it should be fine
[09:27:35] shall I try to do it?
[09:28:00] nono you have enough on your plate, lemme take care of it
[09:28:57] I'm a bit blocked trying to break the wall with ores extension, so let me know if I can help somehow
[09:30:06] also regarding new Lift Wing requests I'll make a request so that we have a custom form on phabricator https://www.mediawiki.org/wiki/Phabricator/Help/Forms#Creating_custom_forms
[09:31:38] does anyone have access to this url? https://phabricator.wikimedia.org/maniphest/task/edit/form/17/?projects=Phabricator
[09:31:42] I'm getting access denied
[09:32:53] I do yes
[09:47:33] regarding GPU tests I just figured I can just go and do them in an interactive shell on statnodes
[10:34:13] yep, everything kind of works (apart from older rocm version) but experimenting will be much faster/better than creating patches, go through CI etc
[10:43:29] * isaranto afk lunch
[10:45:28] all patches are ready to make ores-legacy live, will wait for Traffic's +1
[10:45:39] hopefully we'll have it in the afternoon
[10:45:48] also, ml-cache cassandra runs on java 11
[10:48:05] * elukey lunch
[11:39:15] wow great stuff!
[11:43:06] * klausman lunch too
[11:43:26] I need comfort food after wrangling mediawiki syntax all morning :)
[12:41:43] Machine-Learning-Team, Continuous-Integration-Infrastructure, Release-Engineering-Team: Python torch fills disk of CI Jenkins instances - https://phabricator.wikimedia.org/T338317 (isarantopoulos) There isn't any cache in the image since only the specific files are copied in the production variant. B...
[13:17:17] Machine-Learning-Team, API Platform, Anti-Harassment, Cloud-Services, and 19 others: Migrate PipelineLib repos to GitLab - https://phabricator.wikimedia.org/T332953 (jnuche)
[13:54:17] klausman: after puppet updates the cp nodes we should be able to have ores-legacy.wikimedia.org
[13:54:27] \o/
[13:54:46] I fixed the SANs on the istio ingress, so we will be able to just CNAME ores.w.o -> ores-legacy.w.o in theory
[13:55:12] and see the new API to test it etc..
[13:57:20] \o/
[14:00:32] Just in time for the brundown meeting :)
[15:00:39] https://github.com/asciimoo/wuzz <- I came across this tool the other day, may come in handy when debugging http endpoints/rest services
[15:00:54] It's sort of "interactive curl"
[15:01:39] it works! https://ores-legacy.wikimedia.org/
[15:01:56] Hooray!
[15:02:22] and here https://ores-legacy.wikimedia.org/docs
[15:02:38] great job folks
[15:03:48] mmm it is weird, I get nxdomain
[15:04:32] does it work for all of you?
[15:04:47] definitely works here, yes
[15:05:03] maybe you have a caching resolver somewhere that got primed during testing?
[15:05:12] works for me!
[15:05:38] also works from sta1004
[15:05:40] +t
[15:06:14] mmm I get nx domain also via dig ores-legacy.wikimedia.org @8.8.8.8
[15:06:43] That gives me a correct answer (185.15.58.224 aka dyna)
[15:07:08] Does your ISP maybe intercept DNS?
[15:07:21] yeah I think my router is messing with me
[15:12:25] lol I disabled the firewall on the vodafone station in my office and now it works
[15:13:14] isaranto: can we make https://ores-legacy.wikimedia.org/docs to be the front page via uvicorn?
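(Editorial sketch, not part of the log.) The request above, making /docs the front page of a service served by uvicorn, can be met by answering "/" with an HTTP redirect. The following is a minimal dependency-free illustration written as a bare ASGI callable; the names `app` and `call_app`, the 307 status, and the 404 fallback are assumptions made for the example and are not taken from the ores-legacy code or from the patch that was later uploaded.

```python
import asyncio


async def app(scope, receive, send):
    """Hypothetical ASGI app: redirect "/" to "/docs", answer 404 elsewhere."""
    if scope["type"] != "http":
        # Non-HTTP scopes (e.g. lifespan) are ignored in this sketch.
        return
    if scope["path"] == "/":
        status, headers = 307, [(b"location", b"/docs")]
    else:
        status, headers = 404, [(b"content-type", b"text/plain")]
    await send({"type": "http.response.start", "status": status, "headers": headers})
    await send({"type": "http.response.body", "body": b""})


def call_app(path):
    """Drive the app once without a server and return the messages it sent."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    asyncio.run(app({"type": "http", "method": "GET", "path": path}, receive, send))
    return sent
```

Calling `call_app("/")` returns the response messages, the first of which carries the redirect status and the `location` header. An ASGI server such as uvicorn could serve the same callable; a real service would more likely do the redirect inside its web framework.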
[15:16:27] sure, I'll do it
[15:36:12] hey folks I'd need a review for https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/933967/
[15:36:42] the mw train went out with a change and we'd need to fix the model servers
[15:39:57] on it
[15:46:11] (PS1) Ilias Sarantopoulos: ores-legacy: redirect root to docs [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/933970
[15:46:36] (CR) Elukey: [C: +1] "<3" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/933970 (owner: Ilias Sarantopoulos)
[15:47:23] (CR) CI reject: [V: -1] ores-legacy: redirect root to docs [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/933970 (owner: Ilias Sarantopoulos)
[15:47:38] The simplest way is to just redirect root to docs, so I did that
[15:47:50] yes yes it looks ok
[15:47:59] elukey: there is just a typo in your patch, other than that feel free to deploy!
[15:48:06] I double checked the images
[15:48:20] saw it thanks!
[15:48:33] I am going to deploy drafttopic and outlink, since we have streams for those
[15:48:36] the rest tomorrow
[15:48:44] cc: dcausse: --^
[15:50:58] elukey: <3
[15:54:03] dcausse: done!
[15:54:09] thanks!!
[15:55:16] going afk for today folks!
[15:55:20] have a nice rest of the day
[15:58:44] (PS2) Ilias Sarantopoulos: ores-legacy: redirect root to docs [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/933970
[15:59:31] ciao! I'll be logging off as well in a short while. have a nice evening every1
[16:13:13] Hello, where can I find a list of all supported languages in revertrisk-language-agnostic model? Example: be-tarask: "not supported lang", be_tarask: "not supported lang", be-x-old: just a timeout (!) error.
[16:16:07] Lift-Wing: Test LiftWing API/Predictions from Hadoop - https://phabricator.wikimedia.org/T304425 (Aklapper)
[16:31:34] Machine-Learning-Team, Gerrit-Privilege-Requests: Grant ML Team members +2 rights to the recommendation-api repository - https://phabricator.wikimedia.org/T340531 (Aklapper) **[offtopic]** @kevinbazira: Hi, as this ticket seems to have been created in a WMF staff role, it would be helpful for faster veri...
[16:43:49] Machine-Learning-Team, Continuous-Integration-Infrastructure, Release-Engineering-Team: Python torch fills disk of CI Jenkins instances - https://phabricator.wikimedia.org/T338317 (hashar) Resolved→Open
[17:12:14] Hi Iluvatar, thanks for reaching out. The revertrisk-language-agnostic model supports any wikipedia language edition (total 253 wikis) including 'be-x-old'
[17:12:32] but I just discovered there is a bug in our codebase for this language that results in a timeout error. I will fix this issue asap.
[17:13:04] I will also add a redirect between 'be_tarask' and 'be-x-old', so that you can use 'be_tarask' to query
[17:14:40] Thanks!:)
[19:30:28] Machine-Learning-Team, Gerrit-Privilege-Requests: Grant ML Team members +2 rights to the recommendation-api repository - https://phabricator.wikimedia.org/T340531 (thcipriani) Open→Resolved a:thcipriani Added folks to the `mediawiki-services-recommendation-api` group in Gerrit (which was prev...
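(Editorial sketch, not part of the log.) The 'be_tarask' to 'be-x-old' redirect promised at 17:13:04 amounts to normalizing the requested language code before looking up the model's supported wikis. One possible shape for that, assuming a simple alias table, is shown below; `LANG_ALIASES` and `canonical_lang` are hypothetical names, and only the single pairing mentioned in the chat is included. How the real service implements it is not shown in the log.

```python
# Hypothetical alias table: contains only the pairing named in the chat.
LANG_ALIASES = {"be_tarask": "be-x-old"}


def canonical_lang(code: str) -> str:
    """Return the code to query with, resolving known aliases; others pass through."""
    return LANG_ALIASES.get(code, code)
```

With this in place a request for `be_tarask` would be evaluated as `be-x-old`, while any code without an alias is left unchanged.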