[03:33:48] (CR) Kevin Bazira: "Thank you for working on this Ilias!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[07:31:18] (PS12) Ilias Sarantopoulos: article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940)
[07:31:51] (CR) CI reject: [V: -1] article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[07:38:23] (PS13) Ilias Sarantopoulos: article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940)
[07:38:41] (CR) Ilias Sarantopoulos: article-descriptions: enable local run (4 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[07:38:49] o/
[07:40:46] kevinbazira: thanks for the review! I made some changes. I'm not sure how best to tackle the rest gateway url. I added a solution and we can discuss it
[07:42:46] isaranto: o/
[07:42:56] ok, let me check ...
[07:50:37] Machine-Learning-Team, MediaWiki-extensions-ORES, MW-1.42-notes (1.42.0-wmf.9; 2023-12-12): Update ORES extension configuration - https://phabricator.wikimedia.org/T351703 (isarantopoulos) Open→Resolved
[07:52:01] (CR) CI reject: [V: -1] article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[07:52:11] Machine-Learning-Team, MediaWiki-extensions-ORES, MW-1.42-notes (1.42.0-wmf.9; 2023-12-12): Update ORES extension configuration - https://phabricator.wikimedia.org/T351703 (isarantopoulos) `OresLiftWingAddHostHeader` is now set to true by default for all mediawiki deployments.
[07:53:01] (PS14) Ilias Sarantopoulos: article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940)
[07:53:40] missed a change so CI failed --^. Just updated it!
[08:22:19] (CR) Kevin Bazira: article-descriptions: enable local run (3 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[08:22:44] isaranto: most of the changes LGTM, I've just added a small correction about the rest-gateway endpoint path.
[08:26:32] good morning folks
[08:26:56] isaranto: o/ remember to change the integration/config paths beforehand
[08:27:59] ah already done, lovely
[09:06:48] o/ yeah I had changed them but forgot that bit
[09:07:27] * elukey bbiab
[09:33:01] (PS15) Ilias Sarantopoulos: article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940)
[09:33:34] (CR) Ilias Sarantopoulos: article-descriptions: enable local run (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[09:35:14] kevinbazira: did you manage to run the model locally? Please spend some time on it to make sure it works, that I haven't missed something, and to see whether we need to add something to the instructions
[09:45:59] Morning!
[09:46:25] elukey: I presume you already synced Kevin's update of art-desc?
[09:46:48] (the pod shows 62m age, so...)
[09:47:39] Ah, in experimental kevin can actually do it himself
[09:47:57] So did the asyncio session name change help?
[09:51:49] Morning Tobias!
[09:53:29] afaik we can all run helmfile sync in all our namespaces (at least I can). If this is not the case we should follow up and allow everyone to do it
[09:55:53] ack. It does make sense you have that ability, after all
[09:56:16] I guess the only things only Luca and I can do are the admin_ng charts (e.g. network policy)
[09:58:02] Yes that's what I remember
[09:58:20] (CR) Kevin Bazira: [C: +1] article-descriptions: enable local run (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[09:58:48] isaranto: I have built the image on the ml-sandbox and run it locally. the model-server ran well and returned the expected prediction. thank you for working on this. 
added a +1!
[09:58:48] klausman: o/
[09:58:48] yes, I updated the isvc image. when I send a request it hangs like it did yesterday and times out:
[09:58:48] ```
[09:58:48] $ time curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/article-descriptions:predict" -X POST -d '{"lang": "en", "title": "Clandonald", "num_beams": 2}' -H "Host: article-descriptions.experimental.wikimedia.org" -H "Content-Type: application/json" --http1.1
[09:58:48] request timeout
[09:58:48] real 5m0.084s
[09:58:49] user 0m0.013s
[09:58:49] sys 0m0.013s
[09:58:50] ```
[10:00:44] weird. The hand-built requests still work (tried a few secs ago)
[10:01:30] I wish pdb could attach to a running program
[10:05:07] Another interesting thing in the logs I see is "INFO:root:Opening a new Asyncio session for restgateway." _twice_
[10:07:35] when you're in the model-server under path: `/srv/article_descriptions/model_server`
[10:07:35] try running the python code below and we see at what point it fails. we would like to especially see the `print(preprocessed_data)` part.
[10:07:35] ```
[10:07:35] # load model
[10:07:35] import model
[10:07:36] model_server = model.ArticleDescriptionsModel("article_descriptions")
[10:07:36] # preprocess
[10:07:37] import asyncio
[10:07:37] input = {
[10:07:38] "lang": "en",
[10:07:38] "title": "Clandonald",
[10:07:39] "num_beams": 2
[10:07:39] }
[10:07:40] preprocessed_data = asyncio.run(model_server.preprocess(input))
[10:10:34] `import model` is taking a long time, but I guess that's to be expected
[10:14:06] yeah, that just makes the container OOM in the end :)
[10:15:05] ok so we are running into a memory issue. is it possible to bump up the memory for this test?
[10:15:47] Working on it
[10:16:01] kevinbazira: o/ can I try to make some requests to article-desc in staging? as test
[10:16:10] I don't want to step on your current testing setup
[10:16:41] elukey: sure no problem.
[10:17:25] Go ahead, holding off on memory edits
[10:19:48] kevinbazira: I don't get a request timeout though
[10:19:50] it hangs
[10:20:01] The timeout is 5m
[10:20:02] ah wait you have 5 minutes
[10:20:04] yes yes
[10:20:40] I am checking via nsenter if the code is hanging waiting for the network, but no socket in SYN or similar
[10:21:17] I have too little knowledge of what asyncio might be doing internally to know if maybe it's waiting for something to happen. DNS resolution clearly works in the requests case.
[10:23:54] elukey@ml-staging2001:~$ ps -eLf | grep 3203798 | wc -l
[10:23:55] 138
[10:24:13] does it use xgboost?
[10:25:00] ahahahahah wow
[10:25:04] klausman: https://grafana.wikimedia.org/d/hyl18XgMk/kubernetes-container-details?orgId=1&var-datasource=codfw%20prometheus%2Fk8s-mlstaging&var-namespace=experimental&var-pod=article-descriptions-predictor-default-00008-deployment-64hstgv&var-container=All
[10:25:05] no it doesn't
[10:25:18] it is being heavily throttled
[10:25:28] see the kserve container in the graph above
[10:26:16] that memusage was likely me running a second python process loading the model
[10:26:33] not mem usage, cpu throttling
[10:26:47] well, loading the model is likely hitting CPU as well
[10:27:05] At least the moments around 10:10 were me, after 10:25 probably not
[10:27:26] But why would asyncio burn so much CPU for five minutes?
[10:28:05] so the number of threads is 138 (see above), if some of them work at the same time the Limit is hit very fast
[10:28:14] let's use perf to figure out what it is doing
[10:28:46] can nsenter use external (host-side) tools like strace or ltrace?
[10:29:14] you can strace a process, it runs on different namespaces but it is not a problem
[10:29:18] for perf is the same
[10:29:32] so I started another call to article descr
[10:29:36] we have 5 mins :)
[10:30:04] I am on ml-staging2001, just ran `sudo perf record -F 99 -p 3203798`
[10:30:13] will wait some seconds, then I'll run perf report -n
[10:30:33] at the top I see
[10:30:34] 33.76% 368 python3 libgomp-a34b3233.so.1
[10:30:59] that's a multiprocessing lib
[10:31:05] libgomp is OpenMP, it seems to be the same issue that we had with xgboost
[10:31:12] when not recognizing the cgroups v2 limits
[10:31:47] So you're thinking it spawns a lot of threads that are then ground to standstill by the cgroup limits?
[10:31:47] klausman: do you want to try perf on ml-staging2001?
[10:32:25] sure
[10:32:49] go ahead I am not using it now
[10:32:55] the request is still ongoing
[10:33:07] kevinbazira: lemme know if you want more info about what's happening
[10:34:13] elukey: I am following :)
[10:34:30] sure I meant if you have doubts questions etc..
[10:35:35] Hmm perf record -n just hangs?
[10:36:12] you need to use perf record -F 99 -p $pid, then perf report -n
[10:36:24] at least this is what I usually do first, there are other combinations
[10:36:35] yes, That's what I am trying
[10:36:48] no you have `perf record -n`
[10:36:52] not `report`
[10:36:56] oh
[10:37:15] dman you, short edit distance!
[10:37:43] yeah, showing hanging around in libgomp and libtorch a lot
[10:37:44] I used nsenter -m to get the content of /opt/lib/python/site-packages/ in the container, and I see
[10:37:47] root@ml-staging2001:/opt/lib/python/site-packages# find -name *gomp*
[10:37:50] ./torch/lib/libgomp-a34b3233.so.1
[10:38:40] But libtorch is supposed to be there, no?
[10:38:52] yes torch is used IIRC
[10:38:54] https://github.com/pytorch/pytorch/issues/57715
[10:39:27] https://github.com/pytorch/pytorch/issues/18183#issuecomment-474629623
[10:39:56] so OMP_NUM_THREADS may need to be added
[10:40:09] Should be easy enough
[10:40:25] we can try to set it briefly to see if it solves
[10:40:37] but it is not great for maintainability
[10:40:57] catboost and xgboost do recognize cgroupsv2 and their limits
[10:41:09] Arguably, we want that env var permanently, even if it doesn't completely solve this, so we might as well make it a whole change
[10:41:12] but maybe the article-description's code does something weird behind the scenes
[10:41:29] (as opposed to kubectl edit)
[10:41:48] klausman: I'd prefer not to have any variables, so we can change the limits as we want and torch automatically adjusts
[10:42:00] otherwise we will surely forget to fix values
[10:42:12] Agreed, but as it is, it doesn't work at all.
[10:42:39] sure, but I'd like to check the article-descr code first
[10:42:45] to make sure that they don't set num_threads or similar
[10:42:50] I didn't mean permanent as in "forever", but rather making it oart of the chart
[10:43:02] s/oart/part/
[10:43:08] or values, rather.
[10:43:16] we can add custom env variables to isvcs, no need for changes (just add them in helmfile's value.yaml)
[10:43:25] yeah.
[10:44:00] kevinbazira: what is the repo that contains the code for article-description?
[10:44:08] RR-wikidata has the entry already, it's simple c&p
[10:44:13] can you check if they use something weird like num_threads etc.. ?
[10:44:20] elukey: https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/refs/heads/main/article-descriptions/model-server/model.py
[10:44:42] yep that's it
[10:44:44] Machine-Learning-Team: refactor revertrisk model server to run locally - https://phabricator.wikimedia.org/T352181 (achou) a:achou
[10:44:47] o/ I'm following the conversation as well. 
I'm curious why other model servers that use pytorch don't do this (like nllb)
[10:44:56] I mean if there was anything external like we have for readability
[10:45:02] I know where to find model.py :)
[10:45:20] isaranto: same question, maybe it depends on the version?
[10:45:25] or functions used
[10:45:47] Luca is referring to this repo https://github.com/wikimedia/descartes
[10:45:53] I don't see any issue there
[10:46:07] yes thank you :)
[10:47:27] jumping in a call with Aiko and I'll be back. I want to update the README with some missing info that Aiko pointed out and then I'll push the change. If you want I can set the env var OMP_NUM_THREADS in the patch I'll create to update the image
[10:47:54] nono we can set it in helmfile's value.yaml
[10:48:07] klausman: can you kubectl edit and add it on the fly to see if we resolve?
[10:48:10] already making a change
[10:48:16] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/979042
[10:49:13] klausman: nit on the commit msg - we use ml-services: description or similar
[10:50:22] fixed
[10:50:54] ack
[10:51:04] doing the kubectl edit now
[10:52:25] another thing to ask in our form for onboarding models - threads, parallelism, etc..
[10:52:46] it should also be easy to check it when testing on ml-sandbox or locally
[10:52:52] Ok, new pod is up, hitting it with a query
[10:53:03] takes 15s, but it works!
[10:53:12] super
[10:53:22] kevinbazira: can you confirm?
[10:53:35] okok checking ...
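The throttling discussed above comes from the OpenMP runtime sizing its thread pool from the host CPU count rather than the container's cgroup v2 limit. A minimal sketch of one way to derive a value for OMP_NUM_THREADS from that limit at startup follows. It assumes the standard cgroup v2 `cpu.max` file format ("<quota> <period>", or "max <period>" when unlimited); the function names and the default path are illustrative and not taken from the inference-services code. In the log, the fix actually applied was a static env var in the helmfile values.

```python
import math
import os
from pathlib import Path


def threads_from_cpu_max(cpu_max_text: str, fallback: int) -> int:
    """Derive a thread count from the contents of a cgroup v2 cpu.max file.

    The file holds "<quota> <period>" (microseconds), or "max <period>"
    when no CPU limit is configured. Anything unparsable yields `fallback`.
    """
    fields = cpu_max_text.split()
    if len(fields) != 2 or fields[0] == "max":
        return fallback
    try:
        quota, period = int(fields[0]), int(fields[1])
    except ValueError:
        return fallback
    if quota <= 0 or period <= 0:
        return fallback
    # Round up so a 1.5 CPU limit gets 2 threads, never fewer than 1.
    return max(1, math.ceil(quota / period))


def suggested_omp_threads(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> int:
    """Read the limit from the (assumed) cgroup v2 mount, else use the CPU count."""
    fallback = os.cpu_count() or 1
    try:
        text = Path(cpu_max_path).read_text()
    except OSError:
        return fallback
    return threads_from_cpu_max(text, fallback)


if __name__ == "__main__":
    # The variable has to be in the environment before the OpenMP runtime
    # is loaded, i.e. before importing torch in the same process.
    os.environ.setdefault("OMP_NUM_THREADS", str(suggested_omp_threads()))
    print(os.environ["OMP_NUM_THREADS"])
```

Computing the value at startup addresses the maintainability concern raised in the chat: the thread count follows the pod's CPU limit instead of being a second number that has to be kept in sync by hand.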
[10:55:06] \o/
[10:55:07] it's working thank you klausman and elukey:
[10:55:07] ```
[10:55:07] time curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/article-descriptions:predict" -X POST -d '{"lang": "en", "title": "Clandonald", "num_beams": 2}' -H "Host: article-descriptions.experimental.wikimedia.org" -H "Content-Type: application/json" --http1.1
[10:55:07] {"lang":"en","title":"Clandonald","blp":false,"num_beams":2,"groundtruth":"Hamlet in Alberta, Canada","latency":{"wikidata-info (s)":0.04006314277648926,"total network (s)":0.30153417587280273,"model (s)":13.69589614868164,"total (s)":13.997446298599243},"features":{"descriptions":{"fr":"hameau d'Alberta","en":"hamlet in central Alberta, Canada"},"first-paragraphs":{"en":"Clandonald is a hamlet in central Alberta, Canada within the
[10:55:07] County of Vermilion River. It is located approximately 28 kilometres (17 mi) north of Highway 16 and 58 kilometres (36 mi) northwest of Lloydminster.","fr":"Clandonald est un hameau (hamlet) du Comté de Vermilion River, situé dans la province canadienne d'Alberta."}},"prediction":["Hamlet in Alberta, Canada","human settlement in Alberta, Canada"]}
[10:55:07] real 0m14.030s
[10:55:08] user 0m0.013s
[10:55:08] sys 0m0.000s
[10:55:09] ```
[10:55:54] kevinbazira: one suggestion - for long paste let's use https://phabricator.wikimedia.org/paste/
[10:56:24] For screenshots, https://phabricator.wikimedia.org/file/ is great
[10:58:42] elukey: should I wait for your +1 on the OMP change, or just submit?
[10:58:54] already +1ed
[10:59:14] huh. completely missed that. Submitting.
[11:01:04] as FYI I just upgraded the ml-staging-codfw istio control plane
[11:02:04] the bullseye update?
[11:02:25] yep
[11:02:43] Roger that
[11:25:28] Machine-Learning-Team: Fix istio gateway's PodDisruptionBudgets for ml-serve - https://phabricator.wikimedia.org/T352400 (elukey)
[11:30:02] * klausman lunch
[11:35:51] aiko: https://docs.python.org/3/reference/import.html#regular-packages on __init__.py
[11:38:40] (PS16) Ilias Sarantopoulos: article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940)
[11:45:06] Nice job making this work folks!
[11:45:13] isaranto: o/ nice, thanks!
[11:54:50] * elukey lunch!
[12:04:11] aiko: let me know if you are ok with the article-descriptions patch
[12:31:43] (PS17) Ilias Sarantopoulos: article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940)
[12:32:05] I added the git clone step that was missing --^ and tested the whole process
[12:32:09] * isaranto afk lunch
[13:13:30] (CR) AikoChou: [C: +1] "LGTM! just have a question to better understand a change you made." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[13:17:05] aiko: the issue is that setting headers= when creating the object would throw an error, as the constructor has no such parameter and self.headers is initialized afterwards
[13:32:35] isaranto: but we set it after creating the object? 
I meant the change here https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/976670/4..5 you removed line 134 session.headers["Host"] = host_header and added line 136 session_params={"headers": {"Host": host_header}},
[13:32:52] (CR) Ilias Sarantopoulos: [C: +2] article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[13:33:31] isaranto: session.headers["Host"] = host_header would throw an error?
[13:35:38] no no that would work. what failed was this example https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/c46ed2a8e02637cfcaf37b9e42f9fda345eb3b02%5E%21/article-descriptions/model-server/model.py
[13:35:51] Morning all
[13:37:20] the only reason I changed the working part was to simplify the code. Since the constructor offers the variable setting it is a better practice to use that function instead of manually modifying a class attribute
[13:37:37] which would be private in another language other than python
[13:37:42] o/ Chris!
[13:42:06] isaranto: ooh got it! yeah I think using the variable setting is better. I'll change it in revertrisk as well.
[13:42:48] morning Chris o/
[13:45:34] (CR) CI reject: [V: -1] article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[13:49:28] kevinbazira: aiko: oupsy it seems that headers is not set that way. 
The way you folks did it previously is the only way to set the headers
[13:50:31] there is a session inside session so my approach sets the header in the session inside the object which is not used in the request
[13:50:33] https://github.com/mediawiki-utilities/python-mwapi/blame/master/mwapi/async_session.py#L84
[13:51:01] in my patch I set the host in self.session.headers where it should be set in self.headers
[13:52:37] well, saved by CI as the llm image failed so the patch wasn't merged
[13:53:23] isaranto: ahhh I see. yeah that wouldn't work
[13:54:45] yep, local runs couldn't reveal this but k8s/LiftWing would have asked for the host headers :)
[13:54:52] (PS18) Ilias Sarantopoulos: article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940)
[13:55:21] I'm going to open a patch for mwapi to support setting headers for async requests
[14:00:10] ack
[14:02:56] *pull request
[14:03:22] pull request/patch/merge request let's just give it one name :)
[14:04:59] (just kidding, don't want to open that can of worms - the one about proper names)
[14:11:19] (CR) Ilias Sarantopoulos: [C: +2] article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[14:18:56] (Merged) jenkins-bot: article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[14:42:50] my internet router failed earlier today. Dunno yet if it's bricked or fixable. Will have spotty availability for no
[14:42:57] now*
[14:44:58] ack!
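The "session inside session" pitfall described above can be modelled in a few lines. This is a sketch with hypothetical stand-in classes (`InnerSession`, `WrapperSession`), not mwapi's actual code: it only reproduces the behaviour described in the chat, where the wrapper sends its own `headers` on each request and never consults the headers stored on the inner session.

```python
class InnerSession:
    """Stand-in for the underlying HTTP client session (hypothetical)."""

    def __init__(self, headers=None):
        self.headers = dict(headers or {})


class WrapperSession:
    """Stand-in for a wrapper that keeps its own headers (hypothetical).

    Constructor keyword arguments in `session_params` are forwarded to the
    inner session, while requests are built from `self.headers` only.
    """

    def __init__(self, host, session_params=None):
        self.host = host
        self.headers = {"User-Agent": "example-agent"}
        self.session = InnerSession(**(session_params or {}))

    def build_request(self, path):
        # Only self.headers is used; self.session.headers is ignored.
        return {"url": self.host + path, "headers": dict(self.headers)}


# Approach from the patch: the Host header lands on the inner session.
via_params = WrapperSession(
    "http://localhost",
    session_params={"headers": {"Host": "en.wikipedia.org"}},
)

# Previous approach: the Host header is set on the wrapper itself.
direct = WrapperSession("http://localhost")
direct.headers["Host"] = "en.wikipedia.org"

print("Host" in via_params.build_request("/w/api.php")["headers"])
print("Host" in direct.build_request("/w/api.php")["headers"])
```

A local run does not catch this kind of mistake when nothing on the path requires the Host header, which matches the remark in the chat that only k8s/LiftWing would have asked for it.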
[15:08:24] Machine-Learning-Team, MediaWiki-extensions-ORES, Growth-Team, Wikipedia-Android-App-Backlog, Patch-For-Review: Add revertrisk-language-agnostic to RecentChanges filters - https://phabricator.wikimedia.org/T348298 (Samwalton9-WMF)
[15:18:22] (CR) Kevin Bazira: [C: +1] "ok, so this means we'll have to set the `LOW_CPU_MEM_USAGE="True"` in the helm configs." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/979111 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[15:20:51] (CR) Elukey: article-descriptions: fix boolean parsing of env var (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/979111 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[15:27:38] (CR) Ilias Sarantopoulos: article-descriptions: fix boolean parsing of env var (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/979111 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[15:28:00] (PS2) Ilias Sarantopoulos: article-descriptions: fix boolean parsing of env var [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/979111 (https://phabricator.wikimedia.org/T351940)
[15:28:25] (CR) Elukey: article-descriptions: fix boolean parsing of env var (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/979111 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[15:29:36] (CR) Ilias Sarantopoulos: article-descriptions: fix boolean parsing of env var (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/979111 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[15:32:30] (CR) Ilias Sarantopoulos: article-descriptions: fix boolean parsing of env var (1 comment) [machinelearning/liftwing/inference-services] - 
https://gerrit.wikimedia.org/r/979111 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[16:58:09] (CR) Ilias Sarantopoulos: [C: +2] article-descriptions: fix boolean parsing of env var [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/979111 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[16:58:53] hm, I think that PageTriage extension is still using ores.wikimedia.org under the hood https://github.com/wikimedia/mediawiki-extensions-PageTriage/blob/46ba114c53a30582299645565bfb1f5e4711f8df/includes/Api/ApiPageTriageList.php#L15
[16:58:55] (Merged) jenkins-bot: article-descriptions: fix boolean parsing of env var [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/979111 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[16:59:06] I'll have to verify by looking at the code + access logs
[17:09:28] Updated the article-desc with latest image. https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/979114
[17:09:46] Going afk folks, cya tomorrow!
[17:15:32] kevinbazira: thanks for the +1 I just merged and deployed it!
[17:18:21] * elukey afk!
[17:19:06] oh no, the article-desc model now fails :(
[17:26:40] Machine-Learning-Team, Patch-For-Review: Enable local runs for article-descriptions model - https://phabricator.wikimedia.org/T351940 (isarantopoulos) After deploying the changes in this task I'm getting a 500 with the following error logs ` Traceback (most recent call last): File "/opt/lib/python/site...
[17:27:54] I reverted the deployment and will fix it later or in the morning
[17:28:07] * isaranto afk! nighty night!
[17:29:06] isaranto: revertrisk is running locally \o/
[17:29:24] 🎉
[17:30:05] I'll submit the patch and you can review it tomorrow!
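The "fix boolean parsing of env var" change merged above is not quoted in the log, so the following is only a sketch of the usual pitfall and one common remedy. Environment variables are always strings, and `bool("False")` is `True` because any non-empty string is truthy. The helper name `env_flag` and the accepted spellings are assumptions for illustration; `LOW_CPU_MEM_USAGE` is the variable named in the review comment.

```python
import os

# Spellings treated as "true"; this set is an illustrative assumption.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=None):
    """Parse an environment variable as a boolean.

    `environ` can be any mapping and defaults to os.environ, which keeps
    the helper easy to test without touching the real environment.
    """
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


# The naive approach treats every non-empty string as True.
naive = bool("False")

low_cpu_mem_usage = env_flag(
    "LOW_CPU_MEM_USAGE", environ={"LOW_CPU_MEM_USAGE": "True"}
)
print(naive, low_cpu_mem_usage)
```

With a helper like this, the value set in the helm configs can be written as `"True"`, `"true"` or `"1"` and still parse to the same flag.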
[17:30:41] have a nice evening :)
[17:30:53] aiko: I was thinking the following: to make it even easier for anyone to run the model server locally, we could add a script (a Makefile would be better) and have all the instructions there
[17:31:41] which will include a wget command to download the model from the public repo (analytics.wikimedia.org) etc. If folks like the idea we can create a task for it
[17:31:49] isaranto: kubectl logs show:
[17:32:00] https://www.irccloud.com/pastebin/ZbOcaOf2/
[17:32:19] this URL:
[17:32:19] ```
[17:32:19] http://api-ro.discovery.wmnet/v1/page/summary/Clandonald
[17:32:19] ```
[17:32:19] should be:
[17:32:20] ```
[17:32:20] http://rest-gateway.discovery.wmnet:4113/en.wikipedia.org/v1/page/summary/Clandonald
[17:32:21] ```
[17:32:36] isaranto: that sounds good! +1
[17:33:00] *rather:
[17:33:00] ```
[17:33:00] http://rest-gateway.discovery.wmnet:4111/en.wikipedia.org/v1/page/summary/Clandonald
[17:33:00] ```
[17:33:51] you're right I missed that part when I copied the logs
[17:35:27] sorry for the back and forth with the fixes Kevin! The old version is running and I'll fix this tomorrow
[17:36:02] sure sure, no problem. we shall pick this up tomorrow.
[17:36:07] enjoy your evening o/
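The corrected URL above has a fixed shape: gateway base, then the wiki's domain, then the page-summary path. A minimal sketch of building it follows, assuming the `<lang>.wikipedia.org` domain pattern seen in the example; the function name `summary_url` is hypothetical, and the model server's real code may assemble the URL differently.

```python
from urllib.parse import quote


def summary_url(gateway: str, lang: str, title: str) -> str:
    """Build a page-summary URL routed through a REST gateway.

    The wiki domain goes into the path, directly after the gateway base,
    as in the corrected URL quoted in the chat.
    """
    base = gateway.rstrip("/")
    # Percent-encode the title so characters such as "/" or spaces
    # cannot alter the path structure.
    encoded_title = quote(title, safe="")
    return f"{base}/{lang}.wikipedia.org/v1/page/summary/{encoded_title}"


url = summary_url("http://rest-gateway.discovery.wmnet:4111", "en", "Clandonald")
print(url)
```

Keeping the gateway base as a parameter lets a local run point at a different endpoint while the deployed service keeps using the internal gateway.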