GGUF vllm
It seems that the newest version of vLLM supports GGUF models. Has anyone figured out how to make this work in RunPod serverless? It seems like some custom env vars need to be set. Or does anyone know a way to convert GGUF back to safetensors?
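On the "convert GGUF back to safetensors" part, here is a rough sketch of one possible route, assuming the model architecture is one that Hugging Face Transformers can load from GGUF through the `gguf_file` argument of `from_pretrained`. The weights are dequantized on load, so the saved checkpoint will be much larger than the GGUF file. The function names and the output directory naming are made up for illustration.

```python
from pathlib import Path


def default_output_dir(gguf_file):
    """Hypothetical naming helper: 'model.Q4_K_M.gguf' -> 'model.Q4_K_M-safetensors'."""
    return Path(gguf_file).stem + "-safetensors"


def gguf_to_safetensors(repo_id, gguf_file, out_dir=None):
    """Load a GGUF checkpoint with Transformers and re-save it as safetensors.

    Assumes the architecture is supported by Transformers' GGUF loader.
    """
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    out_dir = out_dir or default_output_dir(gguf_file)
    # Transformers dequantizes the GGUF tensors while loading.
    tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
    model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
    # Write the weights as .safetensors plus the usual config and tokenizer files.
    model.save_pretrained(out_dir, safe_serialization=True)
    tokenizer.save_pretrained(out_dir)
    return out_dir
```

The resulting directory could then be pushed to a model repo and used like any other safetensors model, at the cost of losing the GGUF quantization.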
9 Replies
Unknown User•14mo ago (message not public)
hi, is there any solution to this?
Unknown User•11mo ago (message not public)
the problem is that you have to specify the GGUF file name, and I believe there is no such env var for the vLLM worker
we could download the model and pack it in the container, but I was just looking for an out-of-the-box solution
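For anyone trying the "pack it in the container" route, a minimal sketch of what that could look like, assuming `huggingface_hub` and `vllm` are installed in the image. The vLLM docs show `vllm serve <file>.gguf --tokenizer <base model repo>`, so the script just downloads one GGUF file and builds that command. The env var names `GGUF_REPO`, `GGUF_FILE` and `TOKENIZER_REPO` are hypothetical, not something the RunPod vLLM worker reads.

```python
import os


def build_vllm_serve_command(gguf_path, tokenizer, extra_args=()):
    """Return the argv list for `vllm serve` pointed at a local GGUF file."""
    return ["vllm", "serve", gguf_path, "--tokenizer", tokenizer, *extra_args]


def fetch_gguf(repo_id, filename, local_dir="/models"):
    """Download a single GGUF file from the Hugging Face Hub and return its path."""
    # Imported lazily so the command builder works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)


def main():
    # Hypothetical env vars; set them however your image or endpoint allows.
    repo = os.environ["GGUF_REPO"]             # Hub repo that hosts the GGUF file
    filename = os.environ["GGUF_FILE"]         # exact .gguf file name in that repo
    tokenizer = os.environ["TOKENIZER_REPO"]   # base model repo to take the tokenizer from

    gguf_path = fetch_gguf(repo, filename)
    cmd = build_vllm_serve_command(gguf_path, tokenizer)
    # Replace the current process with the vLLM server.
    os.execvp(cmd[0], cmd)
```

Calling `fetch_gguf` at image build time and `main()` from the container entrypoint would avoid downloading on every cold start. Wiring this into the serverless handler is left out here.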
Unknown User•11mo ago (message not public)
will do that as a workaround, but it would be nice to support that natively
Unknown User•11mo ago (message not public)
I will add native support for this
Unknown User•11mo ago (message not public)