[00:57:05] Machine-Learning-Team, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install ml-serve1009-1011 (3x), ml-lab1001-1002 (2x), dse-k8s-worker1009 (1x) - https://phabricator.wikimedia.org/T372432#10139467 (Papaul) Some notes here: I checked console redirect, it was working for me and the issue i found was th...
[07:10:21] (PS2) Santhosh: Support Default collections [research/recommendation-api] - https://gerrit.wikimedia.org/r/1072175
[07:29:22] good morning o/
[09:00:22] (PS2) Nik Gkountas: WIP: Fetch campaign metadata and return them with recommendations [research/recommendation-api] - https://gerrit.wikimedia.org/r/1070308 (https://phabricator.wikimedia.org/T373132)
[09:00:22] (CR) Nik Gkountas: "Looks good overall, just a minor comment." [research/recommendation-api] - https://gerrit.wikimedia.org/r/1070308 (https://phabricator.wikimedia.org/T373132) (owner: Nik Gkountas)
[09:05:45] good day o/
[09:06:12] o/ how's it going
[09:06:13] ?
[09:21:17] all good :D
[09:24:25] all right
[09:24:57] I'm going to open a task for the Artikelbeschreibungsmodell (the article description model)
[09:25:07] Artikelbeschreibungsmodell == article-descriptions
[09:32:03] lol that's a long word
[09:32:04] ack!
[09:37:05] (PS3) Santhosh: Support Default collections [research/recommendation-api] - https://gerrit.wikimedia.org/r/1072175 (https://phabricator.wikimedia.org/T374597)
[10:25:52] Lift-Wing, Machine-Learning-Team, Patch-For-Review: [LLM] Use vllm for ROCm in huggingface image - https://phabricator.wikimedia.org/T370149#10140125 (isarantopoulos) At the moment I'm trying to build vllm on top of the pytorch base image according to [[ https://docs.vllm.ai/en/stable/getting_started...
[10:37:27] * isaranto afk lunch and errand!
[10:50:01] hello
[10:50:49] copy paste of my questions to #wikimedia-tech
[10:50:52] [15:13] is there someone hosting gemini in wikimedia provided infrastructure? if so, how to access it? assuming it's possible at all.
i don't want to use the google provided version
[10:50:52] [15:13] if this is somehow against tos, let me know
[10:50:52] [15:14] i would like to write some software that helps to create content which is revised by a human before submitting
[10:51:02] thank you in advance for your help :-)
[11:08:26] We are currently not running Gemini on Lift Wing (where we run other ML models, as mentioned here: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Usage)
[11:08:42] hello klausman
[11:08:50] why are you not running gemini?
[11:08:58] So far there was no need.
[11:09:02] what could i use for processing text?
[11:10:08] Most of the models we currently run are not LLMs, but rather inference models that take an article or revision and e.g. return how likely it is that this revision is reverted.
[11:10:24] i would like my program to read a text and to extract, for example, the location which it describes
[11:11:03] context: semi-automate http://en.wikinews.org/wiki/Wikinews:5W
[11:11:32] gemini does a lovely job, but i would prefer to not rely on google-hosted infrastructure :)
[11:11:34] That is beyond the scope of what we currently do. Once isaranto is back from lunch, he can probably explain more. Or chrisalbon, once he's up (he's on the US West coast, so it's really early for him)
[11:11:57] My chili is about to burn in the pan, so I gotta run :)
[11:12:04] i am in Sydney and it is 9:11pm; I do leave irc open, though
[11:12:17] enjoy the cooking and the meal :)
[11:14:20] ty!
[11:59:09] Hi gry !
[11:59:09] We currently are not hosting any LLMs open to the public.
We are doing work to host some open source models for limited use cases for the moment
[11:59:59] I say limited because we do have a limited number of GPUs (we just got them)
[13:06:08] (PS1) AikoChou: locust: entry for reference_quality model [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1072541 (https://phabricator.wikimedia.org/T371902)
[13:06:25] Machine-Learning-Team, Temporary accounts: Allow calling revertrisk language agnostic and revert risk multilingual APIs in a pre-save context - https://phabricator.wikimedia.org/T356102#10140679 (JayCano)
[13:41:44] klausman: could you merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/1063213 when you have time? it is the article-models httpbb tests (no +2 on operations/puppet)
[14:02:36] (CR) AikoChou: "The load testing result doesn't look promising. I'm asking the research team to provide an alternative set of inputs instead of using reve" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1072541 (https://phabricator.wikimedia.org/T371902) (owner: AikoChou)
[14:03:34] Lift-Wing, Machine-Learning-Team, OKR-Work: Request to host article-country model on Lift Wing - https://phabricator.wikimedia.org/T371897#10140928 (PWaigi-WMF) @Isaac As discussed today, the LPL team will be ready to incorporate the country model into this T113257 area of work; basically, the next i...
[15:01:50] klausman aiko : can someone help me debug this -> https://phabricator.wikimedia.org/T374387
[15:02:07] there seem to be issues with rec-api
[15:09:40] looking
[15:11:17] I added a comment asking for more error detail
[15:13:13] cmw
[15:13:19] s/cmw//
[15:14:28] thank you!
looking through the logs doesn't help without that
[15:14:44] also, there are no exact timestamps, which doesn't help either
[15:14:49] I'll amend my comment
[15:16:42] I was trying to share a logstash link with them so that they can have more context but this filter which would work for the isvcs doesn't work for rec-api-ng https://logstash.wikimedia.org/goto/ffbb35c08b058a76d4144a5536045137
[15:19:25] It's also not guaranteed they have access to Logstash
[15:20:08] Well, the rec-api is not a kserve service, right?
[15:20:17] I am looking at the Istio logs
[15:20:34] E.g. https://logstash.wikimedia.org/goto/c3e1493e7790097e29b4b7bba69a71f9
[15:22:35] yes it is not kserve
[15:22:53] I do see some 500s in the last 3h, looking deeper
[15:23:26] I was trying to find the pod logs in case there was sth related to data fetching etc
[15:24:01] a clear error - I assume an exception is thrown somewhere
[15:25:03] Nothing obvious in the pod logs so far
[15:26:11] lol I now realized I was looking at the kserve logs
[15:27:13] this request indeed returns a 500 - `https://api.wikimedia.org/service/lw/recommendation/api/v1/translation/sections?source=en&target=fr&topic=films%7Ctelevision&count=24`
[15:29:10] found sth, let me pastebin it
[15:30:08] https://phabricator.wikimedia.org/P69061
[15:31:15] Not sure it's useful
[15:31:50] Nothing useful before/after that entry in the logs
[15:34:52] It looks like the CXSERVER (contacted via http://localhost:6015) is serving 503s
[15:42:44] yes, thanks for that. it seems useful.
I'm wondering if it would explain the difference between local runs and LW
[15:43:21] the error _may_ happen in the Envoy proxying from LW to the actual cxs, but I am not sure about that
[15:43:24] just cause someone mentions in the task that when they ran the service locally requests were successful
[15:45:28] I checked the tlsproxy container in the rec-api pods
[15:45:30] [2024-09-12T15:35:01.378Z] "GET /v2/suggest/sections/The%20Godfather/en/fr HTTP/1.1" 503 UC 0 95 0 - "-" "WMF Recommendation API (https://recommend.wmflabs.org/; leila@wikimedia.org)" "99ecd643-c19b-46a9-ab91-57b287c2e5bb" "cxserver.wikimedia.org" "10.2.2.18:4002"
[15:56:58] I found this change deployed a couple of weeks ago that may be related https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1062348
[15:58:58] I don't know enough about rec-api-ng internals to tell if that's it
[15:59:56] ack, just pasting what I found. Although this seems more like a transfer of configs from an ini to yaml
[16:01:01] The % of 500s seems to have increased somewhat more recently than that deployment/change
[16:01:39] Starting at about 20240904 and more in earnest at ...08
[16:02:17] Since we have no insight into cxserver (it's run by the language team), I don't think there is much we can do right now.
[16:04:26] ok, it makes sense. thanks for all the work/help!
[16:04:32] np :)
[17:10:33] Machine-Learning-Team, MW-1.43-notes (1.43.0-wmf.17; 2024-08-06), OKR-Work, Patch-For-Review: Deploy Modernized Recommendation API to LiftWing - https://phabricator.wikimedia.org/T371465#10141910 (isarantopoulos) I found out some issues with the staging deployment A request mentioned above doesn...
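Note on the proxy log entry quoted at 15:45:30: its field order appears to match Envoy's default access log format, where the token right after the status code is the response flag (UC stands for upstream connection termination). Below is a minimal Python sketch for pulling those fields out of such a line, assuming that default format. The regex and the `parse_access_log` helper are illustrative names and are not part of rec-api-ng or any WMF tooling.

```python
import re

# Assumption: the line follows Envoy's default access log field order.
ACCESS_LOG_RE = re.compile(
    r'\[(?P<start_time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<flags>\S+) '
    r'(?P<bytes_received>\d+) (?P<bytes_sent>\d+) (?P<duration_ms>\d+) '
    r'(?P<upstream_time>\S+) '
    r'"(?P<forwarded_for>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'"(?P<request_id>[^"]*)" "(?P<authority>[^"]*)" "(?P<upstream_host>[^"]*)"'
)


def parse_access_log(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = ACCESS_LOG_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields


# The entry pasted in the channel, split only for readability.
SAMPLE = (
    '[2024-09-12T15:35:01.378Z] '
    '"GET /v2/suggest/sections/The%20Godfather/en/fr HTTP/1.1" '
    '503 UC 0 95 0 - "-" '
    '"WMF Recommendation API (https://recommend.wmflabs.org/; leila@wikimedia.org)" '
    '"99ecd643-c19b-46a9-ab91-57b287c2e5bb" '
    '"cxserver.wikimedia.org" "10.2.2.18:4002"'
)

entry = parse_access_log(SAMPLE)
if entry is not None and entry["status"] >= 500:
    # Show which upstream the failing request was routed to.
    print(entry["status"], entry["flags"], entry["authority"], entry["upstream_host"])
```

Grouping parsed entries by day and status would be one way to check the impression that the share of 5xx responses rose from around 20240904, provided the raw proxy logs are available.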
[17:22:33] I removed the comment above as I was trying to access the staging endpoint through a wrong url
[17:28:47] the requests mentioned in the task seem to work (https://phabricator.wikimedia.org/T374387)
[17:36:32] going afk for the evening folks o/
[18:26:31] hello
[18:28:39] isaranto, may i have developer access to your LLM, please? i would like to develop some software. i am happy to keep it to myself until your LLM is public. you can see the desired functionality description in my chat above. in my testing it would be less than 50 requests a day.
[18:28:51] isaranto: do you need help with the installation?
[18:30:25] isaranto: (you appeared at my 10pm and disappeared at my 3:36am, both times i was asleep; i hope it is not every day like that. if it is, please let me know, i will adjust my schedule. :)
[21:01:23] (PS3) Eamedina: WIP: Fetch campaign metadata and return them with recommendations [research/recommendation-api] - https://gerrit.wikimedia.org/r/1070308 (https://phabricator.wikimedia.org/T373132) (owner: Nik Gkountas)
[21:02:11] (CR) CI reject: [V:-1] WIP: Fetch campaign metadata and return them with recommendations [research/recommendation-api] - https://gerrit.wikimedia.org/r/1070308 (https://phabricator.wikimedia.org/T373132) (owner: Nik Gkountas)
[21:04:21] (CR) Eamedina: WIP: Fetch campaign metadata and return them with recommendations (2 comments) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1070308 (https://phabricator.wikimedia.org/T373132) (owner: Nik Gkountas)
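Note on the failing request quoted at 15:27:13: the `%7C` in its query string is the percent-encoded `|` separating the two topics. Below is a minimal Python sketch that rebuilds the same URL from its parts, which may help when retrying it with different parameters. The `build_sections_url` helper is a hypothetical name, and the parameter names are taken only from that one URL, not from the API's documentation.

```python
from urllib.parse import urlencode

# Endpoint copied from the request pasted in the channel.
BASE_URL = (
    "https://api.wikimedia.org/service/lw/recommendation"
    "/api/v1/translation/sections"
)


def build_sections_url(source, target, topics, count):
    """Join topics with '|' and let urlencode percent-encode the query."""
    query = urlencode(
        {
            "source": source,
            "target": target,
            "topic": "|".join(topics),
            "count": count,
        }
    )
    return f"{BASE_URL}?{query}"


url = build_sections_url("en", "fr", ["films", "television"], 24)
print(url)
```

The sketch only builds the URL and sends nothing; fetching it is left to whatever HTTP client is at hand.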