vLLM Endpoint - Gemma3 27b quantized
Hello,
I’ve been trying to run the quantized Gemma3 model from Hugging Face via a vLLM endpoint, but the repository only provides a GGUF model file without the configuration files required by vLLM.
I’m aware that vllm serve has an option to pass a custom configuration, but that doesn’t seem to be available when using the vLLM endpoint.
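For reference, this is roughly what works with a local vllm serve but doesn't seem possible through the endpoint (a sketch; the GGUF filename is hypothetical, and --tokenizer points at the original google/gemma-3-27b-it repo since the GGUF repo ships no config files):

```bash
# Serve a local GGUF file; vLLM loads the tokenizer/config
# from the original (non-GGUF) model repo on Hugging Face
vllm serve ./gemma-3-27b-it-Q4_K_M.gguf \
  --tokenizer google/gemma-3-27b-it
```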
Has anyone encountered and resolved this issue?
Try checking the vllm-worker repository on GitHub.
It may not be supported yet, but feel free to comment on the open issues there.
I got a response from support that the vLLM worker doesn't support GGUF models :(
I'll try to fix this for you by updating our vLLM worker - it's a little difficult to work on Docker images right now given #🚨|incidents.
@Dj thank you for your help. I decided to move on with the Ollama worker instead, and today I finished a basic version that fits my needs.
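For anyone else who lands here, a minimal sketch of the Ollama route (assuming the gemma3 tag in the Ollama model library; your worker image and configuration will differ):

```bash
# Pull the quantized Gemma 3 27B build from the Ollama library and smoke-test it
ollama pull gemma3:27b
ollama run gemma3:27b "Hello, world"
```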