Runpod • 15mo ago
StandingFuture
Does vLLM support quantized models?
Trying to figure out how to deploy this, but I didn't see an option for selecting which quantization I wanted to run.
https://huggingface.co/bartowski/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF
Thanks!
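For context, vLLM does support quantized models (AWQ, GPTQ, and others, plus experimental loading of single-file GGUF checkpoints). Below is a minimal sketch of loading one of the linked repo's GGUF files directly with vLLM; the quant file name and the base-model tokenizer repo are illustrative assumptions, not settings taken from this thread.

```python
# Minimal sketch: loading a single GGUF file via vLLM's experimental GGUF support.
from vllm import LLM, SamplingParams

llm = LLM(
    # One of the quant files from the repo above, downloaded locally (assumed name):
    model="./DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-Q4_K_M.gguf",
    # GGUF files don't bundle the HF tokenizer config, so point vLLM at the
    # base model's tokenizer (assumed repo):
    tokenizer="meta-llama/Meta-Llama-3.1-8B-Instruct",
)

out = llm.generate(["Hello!"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```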
StandingFuture (OP) • 10/25/24, 1:39 AM
I tried setting the download directory to the quant model, but I see that the model card says "Using llama.cpp release b3496 for quantization." and I don't see that as an option on Runpod for the quantization method.
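Worth noting: "llama.cpp release b3496" is just the llama.cpp build bartowski used to produce the GGUF files; it isn't a quantization method you can select anywhere. In vLLM's terms the method for these files would simply be gguf. A quick way to check which method names your installed vLLM actually recognizes (the import path below is an assumption and has moved between vLLM versions):

```python
# Sketch: list the quantization method names this vLLM install recognizes.
from vllm.model_executor.layers.quantization import QUANTIZATION_METHODS

print(QUANTIZATION_METHODS)  # e.g. ['awq', 'gptq', 'gguf', 'bitsandbytes', ...]
```

If gguf isn't in that list (or isn't exposed by Runpod's vLLM worker settings), that would explain why nothing on the model card maps to a selectable quantization option.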