vLLM Endpoint - Gemma3 27b quantized
Hello,
I’ve been trying to run the quantized Gemma3 model from Hugging Face via a vLLM endpoint, but the repository only provides a GGUF model file without the configuration files required by vLLM.
I’m aware that vllm serve has an option to pass a custom configuration, but that doesn’t seem to be available when using the vLLM endpoint.
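For reference, this is roughly what works with a local vllm serve but doesn't seem possible through the endpoint (a sketch; the GGUF filename is hypothetical, and --tokenizer points at the original google/gemma-3-27b-it repo since the GGUF repo ships no config files):

```bash
# Serve a local GGUF file; vLLM loads the tokenizer/config
# from the original (non-GGUF) model repo on Hugging Face
vllm serve ./gemma-3-27b-it-Q4_K_M.gguf \
  --tokenizer google/gemma-3-27b-it
```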
Has anyone encountered and resolved this issue?
Try checking the vllm-worker repository on GitHub.
It may not be supported yet, but feel free to comment on the open issues there.
I got a response from support that the vLLM worker doesn't support GGUF models :(
I'll try to fix this for you by updating our vLLM worker - it's a little difficult to work on Docker images right now given #🚨|incidents.
@Dj thank you for your help. I decided to move on with the Ollama worker instead, and today I finished a basic version that fits my needs.
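For anyone else who lands here, a minimal sketch of the Ollama route (assuming the gemma3 tag in the Ollama model library; your worker image and configuration will differ):

```bash
# Pull the quantized Gemma 3 27B build from the Ollama library and smoke-test it
ollama pull gemma3:27b
ollama run gemma3:27b "Hello, world"
```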