vLLM Endpoint - Gemma 3 27B quantized

Hello,

I’ve been trying to run the quantized Gemma 3 model from Hugging Face via a vLLM endpoint, but the repository only provides a GGUF model file, without the configuration files (e.g. `config.json` and the tokenizer files) that vLLM requires.

I’m aware that `vllm serve` can point to a separate config and tokenizer when loading a GGUF file, but that option doesn’t seem to be exposed when deploying a vLLM endpoint. A rough sketch of the local setup is below.
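
For reference, here is roughly what I mean by the local `vllm serve` workaround. This is a sketch, not something I can use on the endpoint: the `.gguf` filename is a placeholder, and I’m assuming the original unquantized repo (`google/gemma-3-27b-it`) can supply the missing config and tokenizer via the `--tokenizer` / `--hf-config-path` flags:

```bash
# Minimal sketch: serve a bare GGUF file locally with vLLM.
# The .gguf filename is a placeholder for the actual quantized file;
# --tokenizer and --hf-config-path point at the original unquantized
# repo to provide the config.json and tokenizer files the GGUF repo lacks.
vllm serve ./gemma-3-27b-it-Q4_K_M.gguf \
  --tokenizer google/gemma-3-27b-it \
  --hf-config-path google/gemma-3-27b-it
```

As far as I can tell, the vLLM endpoint deployment doesn’t let me pass these flags, which is the crux of my problem.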

Has anyone encountered and resolved this issue?