vLLM Endpoint - Gemma 3 27B quantized

Hello, I’ve been trying to run the quantized Gemma3 model from Hugging Face via a vLLM endpoint, but the repository only provides a GGUF model file without the configuration files required by vLLM. I’m aware that vllm serve has an option to pass a custom configuration, but that doesn’t seem to be available when using the vLLM endpoint. Has anyone encountered and resolved this issue?
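For context, on a self-hosted vLLM instance the usual workaround is to point vllm serve at the GGUF file while borrowing the tokenizer and config from the original unquantized repo. A minimal sketch, assuming a recent vLLM version with its experimental GGUF support; the local file name is illustrative:

```bash
# Sketch: serving a standalone GGUF file with vanilla vLLM (GGUF support
# is experimental). The local file name is hypothetical; the tokenizer and
# config are pulled from the original unquantized Hugging Face repo, which
# supplies the configuration files the GGUF-only repo is missing.
vllm serve ./gemma-3-27b-it-Q4_K_M.gguf \
  --tokenizer google/gemma-3-27b-it
```

The question here is that the managed vLLM endpoint doesn't expose this escape hatch.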
Jason · 2w ago
Try checking the vllm-worker repository on GitHub. Maybe it's not supported yet, but feel free to comment on the open issues there.
Eugene_Swanley (OP) · 2w ago
I got a response from support: the vLLM worker doesn't support GGUF models :(
Dj · 2w ago
I'll try to fix this for you by updating our vLLM worker - it's a little difficult to work on Docker images right now given #🚨|incidents
Eugene_Swanley (OP) · 2w ago
@Dj thank you for your help. I decided to move on with the Ollama worker and finished a basic version today that fits my needs; a rough sketch of that route is below.
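For anyone landing here with the same problem, the Ollama route can look roughly like this. The model tag is an assumption; check the Ollama library for the exact quantized variant you want:

```bash
# Rough sketch of the Ollama alternative: pull a quantized Gemma 3 27B
# build, then run a quick smoke test against it.
ollama pull gemma3:27b                 # tag assumed; verify in the Ollama library
ollama run gemma3:27b "Hello, world"   # quick sanity check of the local model
```

Ollama ships quantized GGUF builds with their configuration baked in, which sidesteps the missing-config problem entirely.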