vLLM Endpoint - Gemma 3 27B quantized
Hello,
I’ve been trying to run the quantized Gemma 3 model from Hugging Face via a vLLM endpoint, but the repository only provides a GGUF model file without the configuration files required by vLLM.
I’m aware that vllm serve has an option to pass a custom configuration, but that doesn’t seem to be available when using the vLLM endpoint.
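For context, this is roughly the invocation I'd expect with a plain vLLM install, where you can point vllm serve at the GGUF file and borrow the tokenizer and config from the original repository (a sketch only; the filename, repo IDs, and flags are my assumptions, and GGUF support in vLLM is still experimental):

```bash
# Sketch: serve a local GGUF file, borrowing the tokenizer and config
# from the original Hugging Face repo. The filename and repo IDs here
# are assumptions, not values from the actual repository.
vllm serve ./gemma-3-27b-it-q4_0.gguf \
  --tokenizer google/gemma-3-27b-it \
  --hf-config-path google/gemma-3-27b-it
```

It's exactly these per-model overrides that I can't find a way to pass when using the vLLM endpoint.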
Has anyone encountered and resolved this issue?
I got a response from support that the vLLM worker doesn't support GGUF models.
I'll try to fix this for you by updating our vLLM worker. It's a little difficult to work on Docker images right now given #🚨|incidents.
@Dj thank you for your help. I decided to move on with an Ollama worker and finished a basic version today that fits my needs.
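In case it helps anyone else, the Ollama route boils down to something like this (a minimal sketch; the GGUF filename and the gemma3-27b model name are placeholders I picked for illustration):

```bash
# Point a Modelfile at the local GGUF weights (filename is a placeholder).
cat > Modelfile <<'EOF'
FROM ./gemma-3-27b-it-q4_0.gguf
EOF

# Register the model with Ollama, then run it.
ollama create gemma3-27b -f Modelfile
ollama run gemma3-27b "Hello"
```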