vLLM Endpoint - Gemma3 27b quantized
Hello,
I’ve been trying to run the quantized Gemma3 27B model from Hugging Face via a vLLM endpoint, but the repository only provides a GGUF model file, without the configuration files vLLM expects (e.g. config.json).
I’m aware that `vllm serve` has options for pointing it at a custom configuration, but those don’t seem to be available when using the vLLM endpoint.
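For reference, this is roughly the kind of invocation I mean for a local run. The GGUF filename is just a placeholder, and I’m going by the `--tokenizer` and `--hf-config-path` flags that recent vLLM releases document for GGUF models, so treat it as a sketch rather than a verified command:

```bash
# Local sketch: serve a GGUF file while borrowing the config and tokenizer
# from the original (unquantized) repo. The .gguf filename is a placeholder.
vllm serve ./gemma-3-27b-it-Q4_K_M.gguf \
  --tokenizer google/gemma-3-27b-it \
  --hf-config-path google/gemma-3-27b-it
```

It’s exactly this kind of config override that I can’t find a way to set on the endpoint.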
Has anyone encountered and resolved this issue?