[06:53:54] <isaranto>	 good morning!
[06:54:42] <ozge_>	 Good morning
[07:00:59] <elukey>	 kalimera!
[07:01:19] <elukey>	 isaranto: do you have more info about why outlink got to zero pods yesterday?
[07:04:09] <elukey>	 btw I rechecked helmfile diff for all ml-services and nothing seccomp related is pending
[07:04:45] <isaranto>	 no I didn't have the chance to dig into it. I was just going afk so I was focused on just fixing it.  I saw that there was a revision but no pod.
[07:04:54] <isaranto>	 I don't recall anything about the deployment though
[07:15:15] <elukey>	 very weird, maybe it was something autoscaling-related
[07:15:31] <elukey>	 anyway, we should be good now
[07:19:18] <isaranto>	 sorry should have kept some logs. At first sight I didn't see anything though, it seemed weird so I guess it required more digging
[07:22:12] <elukey>	 nono it was just a curiosity, it is totally my bad to have missed the seccomp rollout in outlink
[07:22:53] <elukey>	 I'll add a note to double check all namespaces in the procedure that we'll follow for eqiad
[07:24:43] <isaranto>	 no worries at all, thanks for all the work!
[07:24:56] <elukey>	 np! <3
[07:26:15] <wikibugs>	 06Machine-Learning-Team, 10Editing-team (Tracking): Peacock detection model GPU deployment returns inconsistent results - https://phabricator.wikimedia.org/T393154#10798989 (10isarantopoulos) Let's create a plan and test some things in order to debug this. I'm starting with some suggestions. In every test we s...
[07:37:20] <georgekyz>	 good mornig
[07:37:54] <bartosz>	 good morning! 
[07:44:04] <isaranto>	 georgekyz: o/ check my comment in https://phabricator.wikimedia.org/T393154#10798989 and let me know if you agree or have a different plan for this
[07:44:22] <georgekyz>	 isaranto: I am on it
[07:45:01] <isaranto>	 my main suggestion is to try to capture all our efforts while debugging this so that we reach a proper conclusion and learnings which we'll use in the future
[07:45:14] <isaranto>	 and ofc make some documentation out of it
[07:45:18] <georgekyz>	 isaranto: looks good,  I think we have already prove that we have deterministic results both on gpu and cpu
[07:45:36] <isaranto>	 ok! thanks
[07:45:42] <georgekyz>	 I will document everything in each experiment
[08:20:56] <wikibugs>	 10Lift-Wing, 06Machine-Learning-Team: Use rocm/vllm image on Lift Wing - https://phabricator.wikimedia.org/T385173#10799184 (10kevinbazira) In T385173#10737743, we ran inference latency benchmarks using the upstream ROCm-vLLM image to understand how vLLM performs when serving the `aya-expanse-32b` model on an...
[08:22:36] <kevinbazira>	 o/ morning morning
[08:22:36] <kevinbazira>	 here are the benchmarking results of vLLM serving the `aya-expanse-32b` model in the `wmf-debian-vllm` image: https://phabricator.wikimedia.org/T385173#10799184
[08:22:36] <kevinbazira>	 the ported image has similar performance as the upstream image.
[08:23:14] <georgekyz>	 I am trying to build the blubber in ml-lab but I am getting a permission denied: 
[08:23:30] <georgekyz>	 https://www.irccloud.com/pastebin/kaAbM0tH/
[08:24:11] <elukey>	 georgekyz: you need to sudo :)
[08:24:40] <elukey>	 elukey@ml-lab1001:~$ ls -l /var/run/docker.sock
[08:24:40] <elukey>	 srw-rw---- 1 root docker 0 Apr 29 08:15 /var/run/docker.sock
[08:24:51] <georgekyz>	 thnx elukey 
[08:24:55] <elukey>	 so either root or a member of the docker group can use it
[08:25:27] <georgekyz>	 I did not know that I can use sudo powers in ml-lab
[08:25:28] <georgekyz>	 :p
[08:25:39] <elukey>	 use it with extreme care :D
[08:26:51] <georgekyz>	 I am not sure which is my sudo pass tho :P
[08:27:10] <elukey>	 it is passwordless, your ssh key basically grants you the sudo
[08:27:11] <georgekyz>	 so I am using it with extreme care :P :P
[08:27:51] <georgekyz>	 it doesn't seem to be like this :(
[08:28:02] <elukey>	 ah wait no, I thought that ml-team-admins granted sudo
[08:28:06] <elukey>	 I am checking puppet and it doesn't
[08:28:13] <elukey>	 okok now I get it, it asks for the password
[08:28:19] <georgekyz>	 yeap....
[08:28:49] <georgekyz>	 but I never set up a pass for sudoers
[08:29:17] <elukey>	 okok so the easiest should probably be to change the /var/run/docker.sock ownership, or to automatically add folks to the docker group
[08:30:40] <georgekyz>	 is it something that I could do ?
[08:31:04] <elukey>	 lemme check puppet, in another host (build2001) where we have docker I am in the docker group, there may be a setting
[08:32:41] <elukey>	 okok there is a quick way in admin/data.yaml, but I think we need to first figure out what is the plan to use docker on ml-lab
[08:32:52] <isaranto>	 georgekyz: why not just pull the image from the registry to test first? unless you want to try sth else
[08:53:22] <georgekyz>	 https://www.irccloud.com/pastebin/6K5nLNqn/
[09:12:02] <isaranto>	 hmm wait lemme check
[09:13:29] <isaranto>	 ok found it georgekyz are you using ml-lab1002 or ml-lab1001? docker is only installed on ml-lab1002, there you shouldnt have an issue
[09:15:18] <elukey>	 it is installed on 1001 as well
[09:16:23] <elukey>	 on both the problem is the same, namely the docker socket needs to be accessed by root or members of the docker group
[09:16:42] <elukey>	 we already have automation in puppet to say "this group of uids will get to docker"
[09:16:50] <isaranto>	 ack, you're right the message george says indicates that as well
[09:17:03] <elukey>	 but IIRC the docker install step was a test, we never really decided if we wanted to officially install it
[09:17:14] <elukey>	 if so, there are some decisions to make, namely who can run docker etc..
[09:17:29] <elukey>	 because running docker means escalating to root
[09:17:55] <elukey>	 and we cannot allow, imho, everybody using ml-lab to root in that way
[09:18:06] <elukey>	 maybe ml-team-admins only 
[09:18:17] <elukey>	 but it needs to be confided to some people
[09:19:16] <wikibugs>	 10Lift-Wing, 06Machine-Learning-Team: Use rocm/vllm image on Lift Wing - https://phabricator.wikimedia.org/T385173#10799385 (10kevinbazira) Hi @elukey, following your suggestion in T385173#10538744, we ported the upstream Ubuntu based [[ https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_mi300_ubuntu22.04_py3.1...
[09:23:27] <klausman>	 elukey: how about 9for now) adding the ML team people to the docker group?
[09:30:09] <wikibugs>	 10Lift-Wing, 06Machine-Learning-Team: Use rocm/vllm image on Lift Wing - https://phabricator.wikimedia.org/T385173#10799408 (10isarantopoulos) I would suggest the following in order to make reviewing a bit easier: @kevinbazira can you open a new MR in the same repo and add a "polished" version of the dockerfil...
[09:37:51] <elukey>	 klausman: you can do it properly via admin's data.yaml, we did it for other use cases, my main point is that we need to figure out what we want to do 
[09:37:59] <elukey>	 make a case etc..
[09:38:08] <elukey>	 otherwise we add somethihg manually and we forget
[09:38:40] <klausman>	 yeah, that was my bad. I had added Ilias initially and then never made it a Puppet thing.
[09:39:00] <elukey>	 that's fine, a small initial test is ok
[09:39:01] <klausman>	 I think having the ml admin members be in the docker group is the easiest approach for now. 
[09:39:06] <elukey>	 but now it is becoming bigger :)
[09:40:34] <klausman>	 Yeah, if I hadn't _forgotten about it,_...
[09:43:54] <elukey>	 let's open a task for access request etc..
[09:49:25] <wikibugs>	 06Machine-Learning-Team: Add the ML team to the POSIX group `docker` on the ML lab machines. - https://phabricator.wikimedia.org/T393566 (10klausman) 03NEW
[09:49:31] <klausman>	 Opened ^^^ for now
[09:52:14] <elukey>	 ack
[10:16:22] <georgekyz>	 sorry folks I was in a meeting. I am using `ml-lab1001` which also has docker on it. In both machines I am getting the same error.
[10:18:02] <klausman>	 I can quickly add you to the docker greoup, but you will have to re-ssh-in
[10:18:08] <georgekyz>	 ok
[10:18:11] <georgekyz>	 thnx
[10:18:32] <klausman>	 done on both machines
[10:19:27] <georgekyz>	 danke
[10:20:32] <klausman>	 Παρακαλώ!
[10:21:18] <klausman>	 Ahwait, that's the same false friend as "bitte"<->"you're welcome"/"please"
[10:21:37] <klausman>	 Or is it? Now I have confused myself :D
[10:53:02] <georgekyz>	 hahaha, no you are correct, it is used mainly as bitte so you were correct on the answer
[10:53:22] <georgekyz>	 so it is an answer to thank you, you say parakalw.
[10:54:25] <georgekyz>	 It is mainly used as the dutch "Alstublieft" which you can use it as a response to dankje, but you can also use it as "please".
[11:00:02] <wikibugs>	 10Lift-Wing, 06Machine-Learning-Team, 13Patch-For-Review: Use rocm/vllm image on Lift Wing - https://phabricator.wikimedia.org/T385173#10799694 (10kevinbazira) >>! In T385173#10799408, @isarantopoulos wrote: > I would suggest the following in order to make reviewing a bit easier: @kevinbazira can you open a...
[12:13:02] <klausman>	 georgekyz: interesting, I did not know languages other than German (and Swamp German ;)) had the please/you're welcome overlap
[12:37:07] <kevinbazira>	 georgekyz: o/ for archival purposes, here is the command we used to run the edit-check model-server on ml-lab1002 with GPU and mounted model volume: https://phabricator.wikimedia.org/P75872
[12:37:40] <georgekyz>	 kevinbazira: thnx so much mate! I will include it to the phab ticket with the experiments
[12:38:06] <kevinbazira>	 sure sure. np!
[13:01:59] <wikibugs>	 06Machine-Learning-Team, 10LDAP-Access-Requests, 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Bartosz Wójtowicz - https://phabricator.wikimedia.org/T393595#10800232 (10isarantopoulos)
[13:04:50] <wikibugs>	 06Machine-Learning-Team, 10LDAP-Access-Requests, 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Bartosz Wójtowicz - https://phabricator.wikimedia.org/T393595#10800237 (10isarantopoulos)
[13:07:31] <wikibugs>	 06Machine-Learning-Team, 10LDAP-Access-Requests, 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Bartosz Wójtowicz - https://phabricator.wikimedia.org/T393595#10800246 (10isarantopoulos) I approve adding Bartos...
[20:50:52] <wikibugs>	 06Machine-Learning-Team, 06Discovery-Search, 10MediaWiki-Search: Build and enable thesaurus / synonym list for search - https://phabricator.wikimedia.org/T85770#10802450 (10TJones) @Jack_who_built_the_house, as things currently stand, I don't think this is the right ticket for what you are proposing. This ti...
[22:27:22] <wikibugs>	 07artificial-intelligence, 10WikiCite: Reference recommender system - https://phabricator.wikimedia.org/T155846#10802749 (10SEgt-WMF) Hi @Harej ! I'm interested in both the status of this amazing proposal (and if there are any plans of continuing the work for Librarybase as well!)