I’ve been trying to run the quantized Gemma3 model from Hugging Face via a vLLM endpoint, but the repository only provides a GGUF model file without the configuration files required by vLLM.
I’m aware that vllm serve has an option for passing a custom configuration, but that option doesn’t seem to be exposed when deploying through the vLLM endpoint.
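For context, this is roughly what I would run locally. Treat it as a sketch: the GGUF filename is just an example, and I’m assuming the --tokenizer and --hf-config-path flags from the vllm serve CLI, borrowing the config/tokenizer from the original (non-quantized) Gemma 3 repo:

```bash
# Hypothetical local invocation -- the GGUF filename is illustrative.
# The GGUF repo ships no config.json or tokenizer files, so I point
# vLLM at the original google/gemma-3-4b-it repo for both
# (assuming --hf-config-path works the way I think it does):
vllm serve ./gemma-3-4b-it-Q4_K_M.gguf \
  --tokenizer google/gemma-3-4b-it \
  --hf-config-path google/gemma-3-4b-it
```

Is there an equivalent way to supply that configuration when using the endpoint?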