[03:33:48] <wikibugs>	 (CR) Kevin Bazira: "Thank you for working on this Ilias!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[07:31:18] <wikibugs>	 (PS12) Ilias Sarantopoulos: article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940)
[07:31:51] <wikibugs>	 (CR) CI reject: [V: -1] article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[07:38:23] <wikibugs>	 (PS13) Ilias Sarantopoulos: article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940)
[07:38:41] <wikibugs>	 (CR) Ilias Sarantopoulos: article-descriptions: enable local run (4 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[07:38:49] <isaranto>	 o/
[07:40:46] <isaranto>	 kevinbazira: thanks for the review! I made some changes. I'm not sure how best to tackle the rest gateway URL. I added a solution and we can discuss it
[07:42:46] <kevinbazira>	 isaranto: o/
[07:42:56] <kevinbazira>	 ok, let me check ...
[07:50:37] <wikibugs>	 Machine-Learning-Team, MediaWiki-extensions-ORES, MW-1.42-notes (1.42.0-wmf.9; 2023-12-12): Update ORES extension configuration - https://phabricator.wikimedia.org/T351703 (isarantopoulos) Open→Resolved
[07:52:01] <wikibugs>	 (CR) CI reject: [V: -1] article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[07:52:11] <wikibugs>	 Machine-Learning-Team, MediaWiki-extensions-ORES, MW-1.42-notes (1.42.0-wmf.9; 2023-12-12): Update ORES extension configuration - https://phabricator.wikimedia.org/T351703 (isarantopoulos) `OresLiftWingAddHostHeader` is now set to true by default for all mediawiki deployments.
[07:53:01] <wikibugs>	 (PS14) Ilias Sarantopoulos: article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940)
[07:53:40] <isaranto>	 missed a change so CI failed --^. Just updated it!
[08:22:19] <wikibugs>	 (CR) Kevin Bazira: article-descriptions: enable local run (3 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[08:22:44] <kevinbazira>	 isaranto: most of the changes LGTM, I've just added a small correction about the rest-gateway endpoint path.
[08:26:32] <elukey>	 good morning folks
[08:26:56] <elukey>	 isaranto: o/ remember to change the integration/config paths beforehand
[08:27:59] <elukey>	 ah already done, lovely
[09:06:48] <isaranto>	 o/ yeah I had changed them but forgot that bit
[09:07:27] * elukey bbiab
[09:33:01] <wikibugs>	 (PS15) Ilias Sarantopoulos: article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940)
[09:33:34] <wikibugs>	 (CR) Ilias Sarantopoulos: article-descriptions: enable local run (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[09:35:14] <isaranto>	 kevinbazira: did you manage to run the model locally? Please take some time to do it, so we can make sure it works, that I haven't missed anything, and that the instructions don't need additions
[09:45:59] <klausman>	 Morning!
[09:46:25] <klausman>	 elukey: I presume you already synced Kevin's update of art-desc?
[09:46:48] <klausman>	 (the pod shows 62m age, so...)
[09:47:39] <klausman>	 Ah, in experimental kevin can actually do it himself
[09:47:57] <klausman>	 So did the asyncio session name change help?
[09:51:49] <isaranto>	 Morning Tobias! 
[09:53:29] <isaranto>	 afaik we can all run helmfile sync in all our namespaces (at least I can). If this is not the case we should follow up and allow everyone to do it
[09:55:53] <klausman>	 ack. It does make sense you have that ability, after all
[09:56:16] <klausman>	 I guess the only things only Luca and I can do are the admin_ng charts (e.g. network policy)
[09:58:02] <isaranto>	 Yes that's what I remember
[09:58:20] <wikibugs>	 (CR) Kevin Bazira: [C: +1] article-descriptions: enable local run (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[09:58:48] <kevinbazira>	 isaranto: I have built the image on the ml-sandbox and run it locally. the model-server ran well and returned the expected prediction. thank you for working on this. added a +1!
[09:58:48] <kevinbazira>	 klausman: o/
[09:58:48] <kevinbazira>	 yes, I updated the isvc image. when I send a request it hangs like it did yesterday and times out:
[09:58:48] <kevinbazira>	 ```
[09:58:48] <kevinbazira>	 $ time curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/article-descriptions:predict" -X POST -d '{"lang": "en", "title": "Clandonald", "num_beams": 2}' -H  "Host: article-descriptions.experimental.wikimedia.org" -H "Content-Type: application/json" --http1.1
[09:58:48] <kevinbazira>	 request timeout
[09:58:48] <kevinbazira>	 real	5m0.084s
[09:58:49] <kevinbazira>	 user	0m0.013s
[09:58:49] <kevinbazira>	 sys	0m0.013s
[09:58:50] <kevinbazira>	 ```
[10:00:44] <klausman>	 weird. The hand-built requests still work (tried a few secs ago)
[10:01:30] <klausman>	 I wish pdb could attach to a running program
[10:05:07] <klausman>	 Another interesting thing in the logs I see is "INFO:root:Opening a new Asyncio session for restgateway." _twice_
[10:07:35] <kevinbazira>	 when you're in the model-server under path: `/srv/article_descriptions/model_server` 
[10:07:35] <kevinbazira>	 try running the python code below and we'll see at what point it fails. we would like to especially see the `print(preprocessed_data)` part.
[10:07:35] <kevinbazira>	 ```
[10:07:35] <kevinbazira>	 # load model
[10:07:35] <kevinbazira>	 import model
[10:07:36] <kevinbazira>	 model_server = model.ArticleDescriptionsModel("article_descriptions")
[10:07:36] <kevinbazira>	 # preprocess
[10:07:37] <kevinbazira>	 import asyncio
[10:07:37] <kevinbazira>	 input = {
[10:07:38] <kevinbazira>	     "lang": "en",
[10:07:38] <kevinbazira>	     "title": "Clandonald",
[10:07:39] <kevinbazira>	     "num_beams": 2
[10:07:39] <kevinbazira>	 }
[10:07:40] <kevinbazira>	 preprocessed_data = asyncio.run(model_server.preprocess(input))
[10:10:34] <klausman>	 `import model` is taking a long time, but I guess that's to be expected
[10:14:06] <klausman>	 yeah, that just makes the container OOM in the end :)
[10:15:05] <kevinbazira>	 ok so we are running into a memory issue. is it possible to bump up the memory for this test?
[10:15:47] <klausman>	 Working on it
[10:16:01] <elukey>	 kevinbazira: o/ can I try to make some requests to article-desc in staging? as test
[10:16:10] <elukey>	 I don't want to step on your current testing setup
[10:16:41] <kevinbazira>	 elukey: sure no problem.
[10:17:25] <klausman>	 Go ahead, holding off on memory edits
[10:19:48] <elukey>	 kevinbazira: I don't get a request timeout though
[10:19:50] <elukey>	 it hangs
[10:20:01] <klausman>	 The timeout is 5m
[10:20:02] <elukey>	 ah wait you have 5 minutes
[10:20:04] <elukey>	 yes yes
[10:20:40] <elukey>	 I am checking via nsenter if the code is hanging waiting for the network, but no socket in SYN or similar
[10:21:17] <klausman>	 I have too little knowledge of what asyncio might be doing internally to know if maybe it's waiting for something to happen. DNS resolution clearly works in the requests case.
[10:23:54] <elukey>	 elukey@ml-staging2001:~$ ps -eLf | grep 3203798 | wc -l
[10:23:55] <elukey>	 138
[10:24:13] <elukey>	 does it use xgboost?
[10:25:00] <elukey>	 ahahahahah wow
[10:25:04] <elukey>	 klausman: https://grafana.wikimedia.org/d/hyl18XgMk/kubernetes-container-details?orgId=1&var-datasource=codfw%20prometheus%2Fk8s-mlstaging&var-namespace=experimental&var-pod=article-descriptions-predictor-default-00008-deployment-64hstgv&var-container=All
[10:25:05] <kevinbazira>	 no it doesn't
[10:25:18] <elukey>	 it is being heavily throttled
[10:25:28] <elukey>	 see the kserve container in the graph above
[10:26:16] <klausman>	 that memusage was likely me running a second python process loading the model
[10:26:33] <elukey>	 not mem usage, cpu throttling
[10:26:47] <klausman>	 well, loading the model is likely hitting CPU as well
[10:27:05] <klausman>	 At least the moments around 10:10 were me, after 10:25 probably not
[10:27:26] <klausman>	 But why would asyncio burn so much CPU for five minutes?
[10:28:05] <elukey>	 so the number of threads is 138 (see above), if some of them work at the same time the Limit is hit very fast
[10:28:14] <elukey>	 let's use perf to figure out what it is doing
[10:28:46] <klausman>	 can nsenter use external (host-side) tools like strace or ltrace? 
[10:29:14] <elukey>	 you can strace a process, it runs on different namespaces but it is not a problem
[10:29:18] <elukey>	 for perf it is the same
[10:29:32] <elukey>	 so I started another call to article descr
[10:29:36] <elukey>	 we have 5 mins :)
[10:30:04] <elukey>	 I am on ml-staging2001, just ran `sudo perf record -F 99 -p 3203798`
[10:30:13] <elukey>	 will wait some seconds, then I'll run perf report -n
[10:30:33] <elukey>	 at the top I see
[10:30:34] <elukey>	 33.76%           368  python3          libgomp-a34b3233.so.1
[10:30:59] <klausman>	 that's a multiprocessing lib
[10:31:05] <elukey>	 libgomp is OpenMP, it seems to be the same issue that we had with xgboost 
[10:31:12] <elukey>	 when not recognizing the cgroups v2 limits
[10:31:47] <klausman>	 So you're thinking it spawns a lot of threads that are then ground to standstill by the cgroup limits?
[10:31:47] <elukey>	 klausman: do you want to try perf on ml-staging2001?
[10:32:25] <klausman>	 sure
[10:32:49] <elukey>	 go ahead I am not using it now
[10:32:55] <elukey>	 the request is still ongoing
[10:33:07] <elukey>	 kevinbazira: lemme know if you want more info about what's happening
[10:34:13] <kevinbazira>	 elukey: I am following :)
[10:34:30] <elukey>	 sure I meant if you have doubts questions etc..
[10:35:35] <klausman>	 Hmm perf record -n just hangs?
[10:36:12] <elukey>	 you need to use perf record -F 99 -p $pid, then perf report -n
[10:36:24] <elukey>	 at least this is what I usually do first, there are other combinations
[10:36:35] <klausman>	 yes, That's what I am trying
[10:36:48] <elukey>	 no you have `perf record -n`
[10:36:52] <elukey>	 not `report`
[10:36:56] <klausman>	 oh
[10:37:15] <klausman>	 dman you, short edit distance!
[10:37:43] <klausman>	 yeah, showing hanging around in libgomp and libtorch a lot
[10:37:44] <elukey>	 I used nsenter -m to get the content of /opt/lib/python/site-packages/ in the container, and I see 
[10:37:47] <elukey>	 root@ml-staging2001:/opt/lib/python/site-packages# find -name *gomp*
[10:37:50] <elukey>	 ./torch/lib/libgomp-a34b3233.so.1
[10:38:40] <klausman>	 But libtorch is supposed to be there, no?
[10:38:52] <elukey>	 yes torch is used IIRC
[10:38:54] <elukey>	 https://github.com/pytorch/pytorch/issues/57715
[10:39:27] <elukey>	 https://github.com/pytorch/pytorch/issues/18183#issuecomment-474629623
[10:39:56] <elukey>	 so OMP_NUM_THREADS may need to be added
[10:40:09] <klausman>	 Should be easy enough
[10:40:25] <elukey>	 we can try to set it briefly to see if it solves
[10:40:37] <elukey>	 but it is not great for maintainability
[10:40:57] <elukey>	 catboost and xgboost do recognize cgroupsv2 and their limits
[10:41:09] <klausman>	 Arguably, we want that env var permanently, even if it doesn't completely solve this, so we might as well make it a whole change
[10:41:12] <elukey>	 but maybe the article-description's code does something weird behind the scenes
[10:41:29] <klausman>	 (as opposed to kubectl edit)
[10:41:48] <elukey>	 klausman: I'd prefer not to have any variables, so we can change the limits as we want and torch automatically adjusts
[10:42:00] <elukey>	 otherwise we will surely forget to fix values
[10:42:12] <klausman>	 Agreed, but as it is, it doesn't work at all.
[10:42:39] <elukey>	 sure, but I'd like to check the article-descr code first
[10:42:45] <elukey>	 to make sure that they don't set num_threads or similar
[10:42:50] <klausman>	 I didn't mean permanent as in "forever", but rather making it oart of the chart
[10:43:02] <klausman>	 s/oart/part/
[10:43:08] <klausman>	 or values, rather.
[10:43:16] <elukey>	 we can add custom env variables to isvcs, no need for changes (just add them in helmfile's values.yaml)
[10:43:25] <klausman>	 yeah.
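The discussion above suggests that OpenMP sizes its thread pool from the host's CPU count rather than from the container's cgroup limit. A minimal sketch of how a thread count could be derived from the cgroup v2 `cpu.max` file instead of being hard-coded follows. The helper names and the rounding policy are assumptions for illustration only; this is not what the deployment-charts change does (that change sets the variable in the chart values).

```python
import math
import os
from pathlib import Path
from typing import Optional


def cpu_limit_from_cgroup_v2(cpu_max_text: str) -> Optional[float]:
    """Parse the content of a cgroup v2 `cpu.max` file.

    The file holds "<quota> <period>" in microseconds, or "max <period>"
    when no CPU limit is set. Returns the limit in CPUs, or None if unlimited.
    """
    quota, period = cpu_max_text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def suggested_omp_threads(cpu_max_text: str, fallback: int = 1) -> int:
    """Hypothetical policy: one thread per whole CPU of quota, at least one."""
    limit = cpu_limit_from_cgroup_v2(cpu_max_text)
    if limit is None:
        # No cgroup limit: fall back to the number of CPUs Python can see.
        return os.cpu_count() or fallback
    return max(1, math.floor(limit))


if __name__ == "__main__":
    # Assumed mount point for the container's own cgroup; it may differ.
    cpu_max = Path("/sys/fs/cgroup/cpu.max")
    if cpu_max.exists():
        threads = suggested_omp_threads(cpu_max.read_text())
        # Only set the variable if the operator has not set it already.
        os.environ.setdefault("OMP_NUM_THREADS", str(threads))
```

Something like this would have to run before the OpenMP runtime is loaded (in practice, before `import torch`), since the variable is typically read once at initialisation. It keeps the thread count tied to the pod limits, which is the maintainability concern raised above.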
[10:44:00] <elukey>	 kevinbazira: what is the repo that contains the code for article-description?
[10:44:08] <klausman>	 RR-wikidata has the entry already, it's simple c&p
[10:44:13] <elukey>	 can you check if they use something weird like num_threads etc.. ?
[10:44:20] <klausman>	 elukey: https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/refs/heads/main/article-descriptions/model-server/model.py
[10:44:42] <kevinbazira>	 yep that's it
[10:44:44] <wikibugs>	 Machine-Learning-Team: refactor revertrisk model server to run locally - https://phabricator.wikimedia.org/T352181 (achou) a:achou
[10:44:47] <isaranto>	 o/ I'm following the conversation as well. I'm curious why other model servers that use pytorch don't do this (like nllb)
[10:44:56] <elukey>	 I mean if there was anything external like we have for readability
[10:45:02] <elukey>	 I know where to find model.py :)
[10:45:20] <elukey>	 isaranto: same question, maybe it depends on the version?
[10:45:25] <elukey>	 or functions used
[10:45:47] <isaranto>	 Luca is referring to this repo https://github.com/wikimedia/descartes
[10:45:53] <isaranto>	 I don't see any issue there
[10:46:07] <elukey>	 yes thank you :)
[10:47:27] <isaranto>	 jumping in a call with Aiko and I'll be back. I want to update the README with some info missing that Aiko pointed out and then I'll push the change. If you want I can set the env var OMP_NUM_THREADS in the patch I'll create to update the image
[10:47:54] <elukey>	 nono we can set it in helmfile's values.yaml
[10:48:07] <elukey>	 klausman: can you kubectl edit and add it on the fly to see if we resolve?
[10:48:10] <klausman>	 already making a change
[10:48:16] <klausman>	 https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/979042
[10:49:13] <elukey>	 klausman: nit on the commit msg - we use ml-services: description or similar
[10:50:22] <klausman>	 fixed
[10:50:54] <elukey>	 ack
[10:51:04] <klausman>	 doing the kubectl edit now
[10:52:25] <elukey>	 another thing to ask in our form for onboarding models - threads, parallelism, etc..
[10:52:46] <elukey>	 it should also be easy to check it when testing on ml-sandbox or locally
[10:52:52] <klausman>	 Ok, new pod is up, hitting it with a query
[10:53:03] <klausman>	 takes 15s, but it works!
[10:53:12] <elukey>	 super
[10:53:22] <klausman>	 kevinbazira: can you confirm?
[10:53:35] <kevinbazira>	 okok checking ...
[10:55:06] <kevinbazira>	 \o/
[10:55:07] <kevinbazira>	 it's working thank you klausman and elukey:
[10:55:07] <kevinbazira>	 ```
[10:55:07] <kevinbazira>	 time curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/article-descriptions:predict" -X POST -d '{"lang": "en", "title": "Clandonald", "num_beams": 2}' -H  "Host: article-descriptions.experimental.wikimedia.org" -H "Content-Type: application/json" --http1.1
[10:55:07] <kevinbazira>	 {"lang":"en","title":"Clandonald","blp":false,"num_beams":2,"groundtruth":"Hamlet in Alberta, Canada","latency":{"wikidata-info (s)":0.04006314277648926,"total network (s)":0.30153417587280273,"model (s)":13.69589614868164,"total (s)":13.997446298599243},"features":{"descriptions":{"fr":"hameau d'Alberta","en":"hamlet in central Alberta, Canada"},"first-paragraphs":{"en":"Clandonald is a hamlet in central Alberta, Canada within the 
[10:55:07] <kevinbazira>	 County of Vermilion River. It is located approximately 28 kilometres (17 mi) north of Highway 16 and 58 kilometres (36 mi) northwest of Lloydminster.","fr":"Clandonald est un hameau (hamlet) du Comté de Vermilion River, situé dans la province canadienne d'Alberta."}},"prediction":["Hamlet in Alberta, Canada","human settlement in Alberta, Canada"]}
[10:55:07] <kevinbazira>	 real	0m14.030s
[10:55:08] <kevinbazira>	 user	0m0.013s
[10:55:08] <kevinbazira>	 sys	0m0.000s
[10:55:09] <kevinbazira>	 ```
[10:55:54] <elukey>	 kevinbazira: one suggestion - for long paste let's use https://phabricator.wikimedia.org/paste/
[10:56:24] <klausman>	 For screenshots, https://phabricator.wikimedia.org/file/ is great
[10:58:42] <klausman>	 elukey: should I wait for your +1 on the OMP change, or just submit?
[10:58:54] <elukey>	 already +1ed
[10:59:14] <klausman>	 huh. completely missed that. Submitting.
[11:01:04] <elukey>	 as FYI I just upgraded the ml-staging-codfw istio control plane
[11:02:04] <klausman>	 the bullseye update?
[11:02:25] <elukey>	 yep
[11:02:43] <klausman>	 Roger that
[11:25:28] <wikibugs>	 Machine-Learning-Team: Fix istio gateway's PodDisruptionBudgets for ml-serve - https://phabricator.wikimedia.org/T352400 (elukey)
[11:30:02] * klausman lunch
[11:35:51] <isaranto>	 aiko: https://docs.python.org/3/reference/import.html#regular-packages on __init__.py
[11:38:40] <wikibugs>	 (PS16) Ilias Sarantopoulos: article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940)
[11:45:06] <isaranto>	 Nice job making this work folks!
[11:45:13] <aiko>	 isaranto: o/ nice, thanks! 
[11:54:50] * elukey lunch!
[12:04:11] <isaranto>	 aiko: let me know if you are ok with the article-descriptions patch
[12:31:43] <wikibugs>	 (PS17) Ilias Sarantopoulos: article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940)
[12:32:05] <isaranto>	 I added the git clone step that was missing --^ and tested the whole process
[12:32:09] * isaranto afk lunch
[13:13:30] <wikibugs>	 (CR) AikoChou: [C: +1] "LGTM! just have a question to better understand a change you made." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[13:17:05] <isaranto>	 aiko: the issue is that setting headers= when creating the object would throw an error as there is no such class attribute as a parameter in the constructor and self.headers is initialized afterwards
[13:32:35] <aiko>	 isaranto: but we set it after creating the object? I meant the change here  https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/976670/4..5 you removed line 134 session.headers["Host"] = host_header and added line 136 session_params={"headers": {"Host": host_header}}, 
[13:32:52] <wikibugs>	 (CR) Ilias Sarantopoulos: [C: +2] article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[13:33:31] <aiko>	 isaranto: session.headers["Host"] = host_header would throw an error?
[13:35:38] <isaranto>	 no no that would work. what failed was this example https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/c46ed2a8e02637cfcaf37b9e42f9fda345eb3b02%5E%21/article-descriptions/model-server/model.py
[13:35:51] <chrisalbon>	 Morning all
[13:37:20] <isaranto>	 the only reason I changed the working part  was to simplify the code. Since the constructor offers the variable setting it is a better practice to use that function instead of manually modifying a class attribute 
[13:37:37] <isaranto>	 which would be private in another language other than python 
[13:37:42] <isaranto>	 o/ Chris!
[13:42:06] <aiko>	 isaranto: ooh got it! yeah I think using the variable setting is better. I'll change it in revertrisk as well.
[13:42:48] <aiko>	 morning Chris o/
[13:45:34] <wikibugs>	 (CR) CI reject: [V: -1] article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[13:49:28] <isaranto>	 kevinbazira: aiko: oupsy it seems that headers is not set that way. The way you folks did it previously is the only way to set the headers
[13:50:31] <isaranto>	 there is a session inside session so my approach sets the header in the session inside the object which is not used in the request
[13:50:33] <isaranto>	 https://github.com/mediawiki-utilities/python-mwapi/blame/master/mwapi/async_session.py#L84
[13:51:01] <isaranto>	 in my patch I set  the host in self.session.headers where it should be set in self.headers
[13:52:37] <isaranto>	 well , saved by CI as the llm image failed so the patch wasn't merged
[13:53:23] <aiko>	 isaranto: ahhh I see. yeah that wouldn't work
[13:54:45] <kevinbazira>	 yep, local runs couldn't reveal  this but k8s/LiftWing would have asked for the host headers :)
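The "session inside session" pitfall described above can be sketched with two stand-in classes. These are hypothetical and are not mwapi's actual code; they only mirror the behaviour reported in the chat, where the wrapper sends its own `headers` attribute with each request and ignores the headers stored on the inner session.

```python
class InnerSession:
    """Stand-in for the underlying HTTP client session (hypothetical)."""

    def __init__(self, headers=None):
        self.headers = dict(headers or {})


class WrapperSession:
    """Stand-in for a wrapper that owns an inner session (hypothetical)."""

    def __init__(self, session_params=None):
        # Constructor parameters are forwarded to the inner session only.
        self.session = InnerSession(**(session_params or {}))
        # The wrapper keeps a separate headers dict of its own.
        self.headers = {}

    def build_request_headers(self):
        # Only the wrapper's own headers are attached to outgoing requests;
        # whatever was stored on self.session.headers is never consulted.
        return dict(self.headers)


# Passing the header through the constructor lands on the inner session,
# so it is silently missing from the request.
via_constructor = WrapperSession(session_params={"headers": {"Host": "example.org"}})
print(via_constructor.build_request_headers())

# Setting it on the wrapper after construction is what reaches the request.
via_attribute = WrapperSession()
via_attribute.headers["Host"] = "example.org"
print(via_attribute.build_request_headers())
```

A local run does not surface the difference because nothing checks the Host header there, which matches the observation that only k8s/LiftWing would have caught it.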
[13:54:52] <wikibugs>	 (PS18) Ilias Sarantopoulos: article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940)
[13:55:21] <isaranto>	 I'm going to open a patch for mwapi to support setting headers for async requests
[14:00:10] <kevinbazira>	 ack
[14:02:56] <isaranto>	 *pull request
[14:03:22] <isaranto>	 pull request/patch/merge request let's just give it one name :)
[14:04:59] <isaranto>	 (just kidding, don't want to open that can of worms - the one about proper names)
[14:11:19] <wikibugs>	 (CR) Ilias Sarantopoulos: [C: +2] article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[14:18:56] <wikibugs>	 (Merged) jenkins-bot: article-descriptions: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/976670 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[14:42:50] <klausman>	 my internet router failed earlier today. Dunno yet if it's bricked or fixable. Will have spotty availability for no
[14:42:57] <klausman>	 now*
[14:44:58] <isaranto>	 ack!
[15:08:24] <wikibugs>	 Machine-Learning-Team, MediaWiki-extensions-ORES, Growth-Team, Wikipedia-Android-App-Backlog, Patch-For-Review: Add revertrisk-language-agnostic to RecentChanges filters - https://phabricator.wikimedia.org/T348298 (Samwalton9-WMF)
[15:18:22] <wikibugs>	 (CR) Kevin Bazira: [C: +1] "ok, so this means we'll have to set the `LOW_CPU_MEM_USAGE="True"` in the helm configs." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/979111 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[15:20:51] <wikibugs>	 (CR) Elukey: article-descriptions: fix boolean parsing of env var (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/979111 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[15:27:38] <wikibugs>	 (CR) Ilias Sarantopoulos: article-descriptions: fix boolean parsing of env var (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/979111 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[15:28:00] <wikibugs>	 (PS2) Ilias Sarantopoulos: article-descriptions: fix boolean parsing of env var [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/979111 (https://phabricator.wikimedia.org/T351940)
[15:28:25] <wikibugs>	 (CR) Elukey: article-descriptions: fix boolean parsing of env var (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/979111 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[15:29:36] <wikibugs>	 (CR) Ilias Sarantopoulos: article-descriptions: fix boolean parsing of env var (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/979111 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[15:32:30] <wikibugs>	 (CR) Ilias Sarantopoulos: article-descriptions: fix boolean parsing of env var (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/979111 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[16:58:09] <wikibugs>	 (CR) Ilias Sarantopoulos: [C: +2] article-descriptions: fix boolean parsing of env var [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/979111 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
[16:58:53] <isaranto>	 hm, I think that PageTriage extension is still using ores.wikimedia.org under the hood https://github.com/wikimedia/mediawiki-extensions-PageTriage/blob/46ba114c53a30582299645565bfb1f5e4711f8df/includes/Api/ApiPageTriageList.php#L15
[16:58:55] <wikibugs>	 (Merged) jenkins-bot: article-descriptions: fix boolean parsing of env var [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/979111 (https://phabricator.wikimedia.org/T351940) (owner: Ilias Sarantopoulos)
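The merged patch concerns boolean parsing of an environment variable such as `LOW_CPU_MEM_USAGE`. Its actual code is not shown in this log, so the following is only a sketch of the usual pitfall and one common remedy. The helper name `env_flag` and the set of accepted spellings are assumptions.

```python
import os

# Spellings treated as "true"; this set is an illustrative choice.
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=None):
    """Read a boolean flag from the environment.

    Environment variables are always strings, and bool("False") is True
    because any non-empty string is truthy, so the text has to be compared
    explicitly instead of being cast with bool().
    """
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


# The naive cast gets this wrong:
print(bool("False"))
# The explicit comparison does not:
print(env_flag("LOW_CPU_MEM_USAGE", environ={"LOW_CPU_MEM_USAGE": "False"}))
```

With a helper like this, setting `LOW_CPU_MEM_USAGE="True"` in the helm configs, as mentioned in the review comment, would enable the flag, and any other value would leave it off.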
[16:59:06] <isaranto>	 I'll have to verify by looking at the code + access logs
[17:09:28] <isaranto>	 Updated the article-desc with latest image. https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/979114
[17:09:46] <isaranto>	 Going afk folks, cya tomorrow! 
[17:15:32] <isaranto>	 kevinbazira: thanks for the +1 I just merged and deployed it!
[17:18:21] * elukey afk!
[17:19:06] <isaranto>	 oh no, the article-desc model now fails :(
[17:26:40] <wikibugs>	 Machine-Learning-Team, Patch-For-Review: Enable local runs for article-descriptions model - https://phabricator.wikimedia.org/T351940 (isarantopoulos) After deploying the changes in this task I'm getting a 500 with the following error logs ` Traceback (most recent call last):   File "/opt/lib/python/site...
[17:27:54] <isaranto>	 I reverted the deployment and will fix it later or in the morning
[17:28:07] * isaranto afk! nighty night!
[17:29:06] <aiko>	 isaranto: revertrisk is running locally \o/
[17:29:24] <isaranto>	 🎉
[17:30:05] <aiko>	 I'll submit the patch and you can review it tomorrow!
[17:30:41] <aiko>	 have a nice evening :)
[17:30:53] <isaranto>	 aiko: I was thinking the following: in order to make it even easier for anyone to run the model server locally, we could add a script (a Makefile would be better) and have all the instructions there
[17:31:41] <isaranto>	 which will include a wget command to download the model from the public repo (analytics.wikimedia.org) etc. If folks like the idea we can create a task for it 
[17:31:49] <kevinbazira>	 isaranto: kubectl logs show:
[17:32:00] <kevinbazira>	 https://www.irccloud.com/pastebin/ZbOcaOf2/
[17:32:19] <kevinbazira>	 this URL:
[17:32:19] <kevinbazira>	 ```
[17:32:19] <kevinbazira>	 http://api-ro.discovery.wmnet/v1/page/summary/Clandonald
[17:32:19] <kevinbazira>	 ```
[17:32:19] <kevinbazira>	 should be:
[17:32:20] <kevinbazira>	 ```
[17:32:20] <kevinbazira>	 http://rest-gateway.discovery.wmnet:4113/en.wikipedia.org/v1/page/summary/Clandonald
[17:32:21] <kevinbazira>	 ```
[17:32:36] <aiko>	 isaranto: that sounds good! +1
[17:33:00] <kevinbazira>	 *rather:
[17:33:00] <kevinbazira>	 ```
[17:33:00] <kevinbazira>	 http://rest-gateway.discovery.wmnet:4111/en.wikipedia.org/v1/page/summary/Clandonald
[17:33:00] <kevinbazira>	 ```
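The corrected URL above has a regular shape: gateway host and port, then the wiki domain, then the REST path. A small sketch of building it from the request's `lang` and `title` follows. The function name is hypothetical, the Wikipedia-only domain pattern is an assumption, and the model server's real code may differ.

```python
from urllib.parse import quote

# Host and port as given in the correction above.
REST_GATEWAY = "http://rest-gateway.discovery.wmnet:4111"


def summary_url(lang, title, base=REST_GATEWAY):
    """Build the page-summary URL routed through the rest gateway.

    The title is percent-encoded so that characters such as "/" or "?"
    cannot change the path structure.
    """
    return f"{base}/{lang}.wikipedia.org/v1/page/summary/{quote(title, safe='')}"


print(summary_url("en", "Clandonald"))
```

For a local run the `base` argument could be pointed at a publicly reachable endpoint instead, which is the kind of switch the "enable local run" patch had to handle.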
[17:33:51] <isaranto>	 you're right I missed that part when I copied the logs
[17:35:27] <isaranto>	 sorry for the back and forth with the fixes Kevin! The old version is running and I'll fix this tomorrow
[17:36:02] <kevinbazira>	 sure sure, no problem. we shall pick this up tomorrow.
[17:36:07] <kevinbazira>	 enjoy your evening o/