[05:46:25] Machine-Learning-Team: Containerize Content Translation Recommendation API - https://phabricator.wikimedia.org/T338805 (kevinbazira) It's good to know that we have an existing scaffolding for fastapi-apps. The recommendation-api project is a good example of projects we are likely to be handed to host on Lift...
[05:54:34] Machine-Learning-Team: Containerize Content Translation Recommendation API - https://phabricator.wikimedia.org/T338805 (elukey) Sure, I am fine with the approach, the only thing that I asked earlier on was if you had thoughts/time to figure out how long would it take to migrate to fastapi (if even possible),...
[05:56:48] (PS3) Elukey: llm: add clean up steps when GPU errors are raised [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930622 (https://phabricator.wikimedia.org/T334583)
[05:57:18] (CR) Elukey: "recheck" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930622 (https://phabricator.wikimedia.org/T334583) (owner: Elukey)
[06:03:15] (CR) CI reject: [V: -1] llm: add clean up steps when GPU errors are raised [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930622 (https://phabricator.wikimedia.org/T334583) (owner: Elukey)
[06:17:50] (CR) Elukey: [C: +2] "recheck" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930622 (https://phabricator.wikimedia.org/T334583) (owner: Elukey)
[06:23:54] (CR) CI reject: [V: -1] llm: add clean up steps when GPU errors are raised [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930622 (https://phabricator.wikimedia.org/T334583) (owner: Elukey)
[06:27:08] (Merged) jenkins-bot: llm: add clean up steps when GPU errors are raised [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930622 (https://phabricator.wikimedia.org/T334583) (owner: Elukey)
[07:13:25] (CR) Elukey: [C: +1] ores-legacy: Change message in RevisionNotFound error [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930166 (https://phabricator.wikimedia.org/T330414) (owner: Ilias Sarantopoulos)
[07:21:29] (CR) Elukey: "Really nice! I left some comments to better understand the patch :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/929743 (https://phabricator.wikimedia.org/T330414) (owner: Ilias Sarantopoulos)
[07:32:19] (PS1) Elukey: llm: fix call to empty_cache() [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930747
[07:49:12] (CR) Ilias Sarantopoulos: [C: +1] llm: fix call to empty_cache() [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930747 (owner: Elukey)
[07:49:32] Machine-Learning-Team: Containerize Content Translation Recommendation API - https://phabricator.wikimedia.org/T338805 (kevinbazira) Rebuilding a project of this scale using a different framework requires careful planning as we would have to rethink the implementation architecture to keep the current app fun...
[07:50:09] (CR) Elukey: [C: +2] llm: fix call to empty_cache() [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930747 (owner: Elukey)
[07:51:10] (Merged) jenkins-bot: llm: fix call to empty_cache() [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930747 (owner: Elukey)
[07:51:16] \o Morning!
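For reference on the Flask-to-FastAPI move discussed in T338805 above: a minimal sketch of what such a port could look like, assuming FastAPI served by uvicorn as mentioned in the task. The route path and parameters below are illustrative only, not the recommendation-api's actual endpoints.
```python
# Hypothetical sketch of a Flask-style endpoint rewritten for FastAPI;
# the path and query parameters are made up for illustration.
from fastapi import FastAPI

app = FastAPI()

@app.get("/api/v1/translation/recommendations")
async def recommendations(source: str, target: str, count: int = 12) -> dict:
    # A real port would call the existing recommendation logic here.
    return {"source": source, "target": target, "count": count, "articles": []}

# Served with uvicorn, e.g.: uvicorn main:app --host 0.0.0.0 --port 8080
```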
[07:52:38] (PS8) Ilias Sarantopoulos: feat: add Response Models in ores-legacy API [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/929743 (https://phabricator.wikimedia.org/T330414)
[07:52:57] (PS5) Ilias Sarantopoulos: ores-legacy: Change message in RevisionNotFound error [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930166 (https://phabricator.wikimedia.org/T330414)
[07:53:11] Morning as well!
[07:56:14] Machine-Learning-Team: Containerize Content Translation Recommendation API - https://phabricator.wikimedia.org/T338805 (elukey) Totally get your point but I don't agree 100%, in this case we don't really need a complete design doc nor roadmaps, it would just be moving the API from Flask to fast-api and uvico...
[08:01:01] (CR) Ilias Sarantopoulos: feat: add Response Models in ores-legacy API (2 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/929743 (https://phabricator.wikimedia.org/T330414) (owner: Ilias Sarantopoulos)
[08:02:04] (PS6) Ilias Sarantopoulos: ores-legacy: Change message in RevisionNotFound error [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930166 (https://phabricator.wikimedia.org/T330414)
[08:02:23] (PS9) Ilias Sarantopoulos: feat: add Response Models in ores-legacy API [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/929743 (https://phabricator.wikimedia.org/T330414)
[08:05:17] (CR) Elukey: [C: +1] "great work!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/929743 (https://phabricator.wikimedia.org/T330414) (owner: Ilias Sarantopoulos)
[08:31:58] elukey: any objections to me deploying the replicacount change immediately?
[08:32:16] nope
[08:32:30] Alright, will do so once Jenkins merges
[08:48:10] Machine-Learning-Team, Spike: [Spike] Run models and frameworks on AMD GPU and identify challenges - https://phabricator.wikimedia.org/T334583 (elukey) Challenge with Falcon 7b, this is the first call to the model server (model tensors loaded to the GPU, plus tokens related to features): ` 2023-06-16 08...
[08:48:40] isaranto: added some thoughts to --^ I think that the model itself doesn't fit into the GPU's VRAM, so we cannot do much
[08:49:17] the gpu is left in an inconsistent state, I think that this is why we get the second error msg (all the way to predict())
[08:51:31] So I watched the logs of the predictor-kserve container as I updated the setup in eqiad (using kubectl logs -f --tail=30 -n articletopic-outlink outlink-topic-model-predictor-default-XXXX for both old and new) and you could see the traffic migrating from one to the other over a few seconds, and then at 90s, the old pods started to terminate. Very nice.
[08:52:22] the complete termination and cleanup of the old pods usually happens after 5m or so (I suspect we specifically configured that or it's a k8s default)
[08:52:55] yep it is all knative doing the hard work
[08:53:41] It's nice to see that deployment with transparent cutover has made it to the outside world. I was completely amazed when I first saw this kinda stuff in 2010 after joining the Goo
[08:55:22] it can also do more, like canary deployments etc...
[08:55:49] Yeah, I suspected as much, once you have versioned config with Helm, rollbacks and all that become a lot more feasible.
[08:58:17] isaranto: when you talked about falcon-7b-8bit, was it something like https://huggingface.co/legendhasit/falcon-7b-instruct-8bit ?
[08:58:41] that is not from https://huggingface.co/tiiuae but it may work for us
[09:04:02] elukey: I'll poke Hugh/Kamila about how they feel about updating changeprop on a Friday
[09:05:39] klausman: I'd suggest to not rush it, if we have problems with the firehose we'll need to do more deployments etc..
[09:05:45] it is fine to wait for monday
[09:06:29] Yeah, there also was some oddity with 1-2 of the changeprop Grafana graphs (processing time increasing quite a bit). But Hugh mentioned that might just be an artifact of how changeprop does sharding.
[09:07:24] Also, for watching traffic in multiple pods etc, `kubetail` in my homedir on deploy1002 is great.
[09:07:46] it's from https://github.com/johanhaleby/kubetail
[09:08:23] elukey: I was referring to just loading the same model in 8bit (perhaps the link you posted has done this and then saved the model). https://huggingface.co/docs/transformers/main_classes/quantization
[09:08:23] This is what I was referring to
[09:08:23] ```
[09:08:23] model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", load_in_8bit=True)
[09:08:23] ```
[09:08:24] but didn't work out of the box. I'll add this info on phab
[09:08:42] btw huggingface has amazing resources
[09:10:58] ahhh nice!
[09:11:02] I was reading https://huggingface.co/tiiuae/falcon-40b-instruct/discussions/15
[09:11:15] "in bfloat16 it takes ~65GB of VRAM on A1000 80GB, in 8bit ~46GB"
[09:11:28] so bigger llms are probably out of range for us
[09:11:57] As an 80s kid, I am a big fan of 8bit :)
[09:12:52] isaranto: what is the diff between setting the load_in_8bit param and running torch_dtype=torch.bfloat8 in transformers.pipeline?
[09:13:49] ah right we have the issue while loading, so the latter is already at predict time
[09:13:53] okok self answered :)
[09:14:06] :)
[09:14:36] but probably if we want to load the model in 8bit integers then we'd also need inference to run in 8bit right?
[09:14:41] to preserve memory
[09:14:49] (sorry for all the qs, trying to get the code :)
[09:15:07] once you load it in 8bit, it also runs in 8 bit
[09:15:18] perfect, without any extra settings
[09:15:34] Also, can this transformation be done offline, i.e. to the on-disk version and reduce its footprint?
[09:15:37] I mean you lose the extra information by downcasting the model weights at loading time
[09:17:47] klausman: I think so https://huggingface.co/docs/transformers/main_classes/quantization#push-quantized-models-on-the-hub
[09:19:07] That would also help with the /var/lib/{docker,kubelet} thing
[09:19:20] yes, we could do that! the only downside is the extra step required, but we already download the models and upload them manually to swift
[09:21:46] Machine-Learning-Team, Spike: [Spike] Run models and frameworks on AMD GPU and identify challenges - https://phabricator.wikimedia.org/T334583 (elukey) We are discussing https://huggingface.co/docs/transformers/main_classes/quantization#load-a-large-model-in-8bit, as a way to reduce the model's footprint...
[09:22:36] klausman: did you see https://phabricator.wikimedia.org/T339231? Ok for you?
[09:23:19] LGTM
[09:23:24] we could also think about using less disk space for the partition, to leave something out for emergencies, but not sure if worth it
[09:23:31] Do we also need to change the partman recipe?
[09:23:43] later yes
[09:24:19] I wonder about the spinning rust as well. How fast it is etc.
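For context on the 8-bit loading discussed above (09:08-09:15): a sketch of the two options being compared, based on the transformers quantization docs linked in the chat. This is not the exact code used on Lift Wing; load_in_8bit needs the bitsandbytes package and a supported GPU backend, and the model id is just the one mentioned in the conversation.
```python
# Sketch only: comparing bf16 loading vs 8-bit quantized loading of Falcon.
import torch
from transformers import AutoModelForCausalLM

model_id = "tiiuae/falcon-7b-instruct"

# Option 1: half-precision weights (~2 bytes per parameter).
model_bf16 = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # Falcon shipped custom modelling code at the time
)

# Option 2: 8-bit weights (~1 byte per parameter); once loaded in 8-bit,
# inference also runs in 8-bit, no extra settings needed.
model_int8 = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,
    device_map="auto",
    trust_remote_code=True,
)
```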
[09:24:59] I would be on the fence about using them, the ssd vs hdd latency could be hard to diagnose
[09:26:13] yeah, I would not want to have them in the same VG
[09:26:28] but I wonder what uses they might be suitable for.
[09:27:16] yep I meant latency in general, we'd need to be aware of what disk calls are slow and what not
[09:27:34] for example, adding the kubelet partition or the docker one on hdds may be a recipe for big trouble
[09:28:11] and I am very scared about having things possibly hitting different latency-class partitions
[09:28:21] Yep. I was wondering if there is anything we're downloading from somewhere that would be faster to load from a spinning disk, but I can't think of anything
[09:28:24] (then we'd need to add that variable when debugging)
[09:30:23] I think if we actually run out of disk space for the VGs we already have, replacing the existing rust with more SSDs would be another option. I suspect DCops wouldn't mind having spare disks
[09:30:50] And SSD prices have plummeted in the last 6-9 months
[09:31:08] yeah but we need approved budget for those in the CapEx
[11:01:50] * elukey lunch!
[11:07:48] same
[13:07:43] (CR) Ilias Sarantopoulos: [C: +2] ores-legacy: Change message in RevisionNotFound error [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930166 (https://phabricator.wikimedia.org/T330414) (owner: Ilias Sarantopoulos)
[13:13:59] (Merged) jenkins-bot: ores-legacy: Change message in RevisionNotFound error [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930166 (https://phabricator.wikimedia.org/T330414) (owner: Ilias Sarantopoulos)
[13:19:40] (PS10) Ilias Sarantopoulos: feat: add Response Models in ores-legacy API [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/929743 (https://phabricator.wikimedia.org/T330414)
[13:33:23] (PS1) Elukey: llm: wipe VRAM memory when an out of memory event occurs [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930797 (https://phabricator.wikimedia.org/T334583)
[13:34:20] (CR) CI reject: [V: -1] llm: wipe VRAM memory when an out of memory event occurs [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930797 (https://phabricator.wikimedia.org/T334583) (owner: Elukey)
[13:34:32] (PS2) Elukey: llm: wipe VRAM memory when an out of memory event occurs [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930797 (https://phabricator.wikimedia.org/T334583)
[13:35:36] (PS3) Elukey: llm: wipe VRAM memory when an out of memory event occurs [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930797 (https://phabricator.wikimedia.org/T334583)
[13:41:51] really nice https://vilsonrodrigues.medium.com/run-your-private-llm-falcon-7b-instruct-with-less-than-6gb-of-gpu-using-4-bit-quantization-ff1d4ffbabcc
[13:43:15] maybe a bit aggressive
[13:46:32] It's obviously a tradeoff between memory usage and quality
[13:47:00] Then again, isn't "so-so predictions" better than "none at all, because we don't have a GPU big enough"?
[13:50:03] no idea yet..
[14:00:35] (CR) Ilias Sarantopoulos: [C: +1] "I'm curious if we need to (un)set anything in the torch model. I'm wondering if it expects the model to still be on GPU." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930797 (https://phabricator.wikimedia.org/T334583) (owner: Elukey)
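The 4-bit approach from the Medium post linked at 13:41 roughly follows the pattern below. This is a sketch, not something tested on Lift Wing: BitsAndBytesConfig needs a recent transformers/bitsandbytes, and at the time bitsandbytes primarily targeted CUDA, so whether it works on the AMD/ROCm GPUs is an open question.
```python
# Sketch of 4-bit (NF4) loading as described in the linked post; untested here.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # weights in 4-bit, compute in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```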
[14:09:44] (CR) Elukey: [C: +2] llm: wipe VRAM memory when an out of memory event occurs [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930797 (https://phabricator.wikimedia.org/T334583) (owner: Elukey)
[14:10:49] (Merged) jenkins-bot: llm: wipe VRAM memory when an out of memory event occurs [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930797 (https://phabricator.wikimedia.org/T334583) (owner: Elukey)
[14:12:08] elukey: this is a good read as well https://huggingface.co/docs/accelerate/usage_guides/big_modeling
[14:12:08] Using device_map="auto" prioritizes GPU, then CPU, then disk according to their capacity
[14:18:41] isaranto: ah wow nice, should we test it?
[14:21:54] seems supported by AutoModelForCausalLM.from_pretrained as well
[14:22:03] sure! at the moment I'm working on the model registry
[14:22:12] ok lemme file the code change
[14:22:30] I am also trying to download falcon from hf's website and build the llm image locally
[14:22:33] not easy
[14:22:41] I'm trying to think of a faster way to try out stuff instead of going through ci/cd..
[14:23:01] perhaps attaching to the pod and changing the code could even work
[14:24:26] you can do this if you want to run falcon locally https://phabricator.wikimedia.org/P49441
[14:25:22] replace model_path with 'tiiuae/falcon-7b' and set local_files_only to false
[14:25:35] ah nice
[14:26:01] do you think that we should still use .to(device) in the tokenizer when using device auto?
[14:26:04] or just remove it?
[14:26:38] maybe AutoTokenizer has the same option
[14:26:53] I think it doesn't, because the tokenizer is only on cpu
[14:27:22] the tokenizer remains on cpu but we load the inputs on the device where the model exists
[14:28:23] btw when you download huggingface models they go to a cache directory with separated blobs and symlinks, if you want to use a specific dir you can do a snapshot download using this script https://phabricator.wikimedia.org/P49442
[14:29:11] mmm so the tokenizer runs on cpu, then the inputs go on the GPU otherwise they cannot be computed?
[14:29:22] or can we do model on gpu and inputs in regular memory?
[14:33:01] the latter will result in an error. both need to be on the same device for the computation (inference) to take place
[14:35:07] (PS1) Elukey: llm: test device_auto functionality in AutoModelForCausalLM [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930806
[14:36:16] (PS2) Elukey: llm: test device_auto functionality in AutoModelForCausalLM [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930806
[14:36:58] (PS3) Elukey: llm: test device_auto functionality in AutoModelForCausalLM [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930806
[14:38:32] (PS4) Elukey: llm: test device_auto functionality in AutoModelForCausalLM [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930806
[14:38:56] there you go --^
[14:39:00] this is the idea right?
[14:42:50] (CR) Ilias Sarantopoulos: [C: +1] "Hopefully!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930806 (owner: Elukey)
[14:42:56] yep!
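To illustrate the tokenizer/device point made at 14:27-14:33: the tokenizer stays on CPU, and the encoded inputs are moved to wherever device_map placed the model before generate() runs. A minimal sketch; the prompt and generation settings are illustrative, and this is not the contents of P49441/P49442.
```python
# Sketch: tokenization happens on CPU, inputs are moved to the model's device.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b"  # or a local snapshot directory
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # accelerate also moves activations between devices as needed
    trust_remote_code=True,
)

# The tokenizer itself has no device; only its output tensors are moved.
inputs = tokenizer("Wikipedia is", return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```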
[14:44:29] (CR) CI reject: [V: -1] llm: test device_auto functionality in AutoModelForCausalLM [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930806 (owner: Elukey)
[14:53:17] (CR) Elukey: "recheck" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930806 (owner: Elukey)
[15:02:24] (CR) Elukey: [C: +2] llm: test device_auto functionality in AutoModelForCausalLM [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930806 (owner: Elukey)
[15:03:24] (Merged) jenkins-bot: llm: test device_auto functionality in AutoModelForCausalLM [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930806 (owner: Elukey)
[15:29:24] Machine-Learning-Team, Spike: [Spike] Run models and frameworks on AMD GPU and identify challenges - https://phabricator.wikimedia.org/T334583 (elukey) Tried to use the device_auto setting in , but this is the result: ` Explicitly passing a `revision` is encouraged when loading a model with custom code...
[15:29:30] I get this very weird error --^
[15:29:36] wondering if the GPU is in a weird state
[15:30:42] yeah this is weird
[15:30:59] seems so from https://grafana.wikimedia.org/d/ZAX3zaIWz/amd-rocm-gpu?orgId=1&var-source=eqiad%20prometheus%2Fops&var-instance=ml-serve1001:9100
[15:31:05] one gpu is completely used
[15:31:26] lol
[15:31:36] actually I mean "lol"
[15:31:49] but in theory the new pod should be scheduled on the other, free one
[15:31:50] mmmmm
[15:32:26] it makes zero sense
[15:33:21] I do see https://grafana.wikimedia.org/d/ZAX3zaIWz/amd-rocm-gpu?orgId=1&var-source=eqiad%20prometheus%2Fops&var-instance=ml-serve1001:9100&from=1686928654949&to=1686929576451&viewPanel=7
[15:33:30] so maybe the device_auto is scheduled on the new gpu but it fails
[15:34:04] I've reset both gpus from ml-serve1001, maybe it helps
[15:34:35] nope
[15:35:26] How do you reset GPUs?
[15:36:11] sudo /opt/rocm/bin/rocm-smi --gpureset -d 0
[15:36:23] but it doesn't always restore them in a good state
[15:36:41] anyway, the device_auto thing seems to lead to a worse situation
[15:36:49] ack
[15:41:09] (PS1) Elukey: Revert "llm: test device_auto functionality in AutoModelForCausalLM" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930779
[15:52:37] isaranto: I tested the previous version of the LLM image as well, and the cleanup of the GPU works
[15:52:51] I keep seeing the same message now, " HIP out of memory. Tried to allocate 80.00 MiB (GPU 0; 15.98 GiB total capacity" etc..
[15:53:00] ack
[15:57:21] Machine-Learning-Team, Spike: [Spike] Run models and frameworks on AMD GPU and identify challenges - https://phabricator.wikimedia.org/T334583 (elukey) Current status for falcon: * I deployed the last version of the docker image that boots correctly, but that leads to a consistent VRAM out of memory eve...
[15:58:50] all right heading out for the weekend folks
[15:58:53] have a nice one
[15:58:56] see you on monday :)
[15:59:30] ciao Luca!
[16:02:10] \o
[16:02:20] I'm heading out as well. Have a great weekend, everyone
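The merged "llm: wipe VRAM memory when an out of memory event occurs" change is not quoted in this log; the general pattern it refers to looks roughly like the sketch below. This is illustrative only, not the patch's actual code; on ROCm builds of PyTorch the torch.cuda API is backed by HIP, hence the "HIP out of memory" messages above.
```python
# Sketch: free cached VRAM when generation hits an out-of-memory error.
import gc
import torch

def generate_with_cleanup(model, inputs):
    try:
        with torch.no_grad():
            return model.generate(**inputs)
    except torch.cuda.OutOfMemoryError:
        # Drop dangling references and return cached blocks to the device,
        # so the GPU is not left in an inconsistent state for the next request.
        gc.collect()
        torch.cuda.empty_cache()
        raise
```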
[17:02:47] (PS1) Ilias Sarantopoulos: llm: add the ability for facilitate various Open Source LLMs [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930847 (https://phabricator.wikimedia.org/T333861)
[17:03:33] (CR) Ilias Sarantopoulos: [C: +2] Revert "llm: test device_auto functionality in AutoModelForCausalLM" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930779 (owner: Elukey)
[17:08:18] (CR) CI reject: [V: -1] Revert "llm: test device_auto functionality in AutoModelForCausalLM" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930779 (owner: Elukey)
[17:09:03] (CR) Ilias Sarantopoulos: [C: +2] "recheck" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930779 (owner: Elukey)
[17:10:12] (Merged) jenkins-bot: Revert "llm: test device_auto functionality in AutoModelForCausalLM" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/930779 (owner: Elukey)
[17:18:25] going away too, have a nice weekeeend o/
[22:02:11] Machine-Learning-Team, API-Portal, Platform Team Initiatives (API Gateway): Add documentation about LiftWing to the API Portal - https://phabricator.wikimedia.org/T325759 (apaskulin) Hi @elukey and @achou! Just wanted to let you know that I moved some things around in the API Portal. You can now acce...