Runpod • 15mo ago
StandingFuture
Does vLLM support quantized models?
Trying to figure out how to deploy this, but I didn't see an option for selecting which quantization I wanted to run.
https://huggingface.co/bartowski/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF
Thanks!
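For context, vLLM does support quantized models (AWQ, GPTQ, and others, plus experimental loading of single-file GGUF checkpoints). Below is a minimal sketch of loading one of the linked repo's GGUF files directly with vLLM; the quant file name and the base-model tokenizer repo are illustrative assumptions, not settings taken from this thread.

```python
# Minimal sketch: loading a single GGUF file via vLLM's experimental GGUF support.
from vllm import LLM, SamplingParams

llm = LLM(
    # One of the quant files from the repo above, downloaded locally (assumed name):
    model="./DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-Q4_K_M.gguf",
    # GGUF files don't bundle the HF tokenizer config, so point vLLM at the
    # base model's tokenizer (assumed repo):
    tokenizer="meta-llama/Meta-Llama-3.1-8B-Instruct",
)

out = llm.generate(["Hello!"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```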
StandingFuture (OP) • 10/25/24, 1:39 AM
I tried setting the download directory to the quant model, but I see that the model card says "Using llama.cpp release b3496 for quantization." and I don't see that as an option on Runpod for the quantization method.
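Worth noting: "llama.cpp release b3496" is just the llama.cpp build bartowski used to produce the GGUF files; it isn't a quantization method you can select anywhere. In vLLM's terms the method for these files would simply be gguf. A quick way to check which method names your installed vLLM actually recognizes (the import path below is an assumption and has moved between vLLM versions):

```python
# Sketch: list the quantization method names this vLLM install recognizes.
from vllm.model_executor.layers.quantization import QUANTIZATION_METHODS

print(QUANTIZATION_METHODS)  # e.g. ['awq', 'gptq', 'gguf', 'bitsandbytes', ...]
```

If gguf isn't in that list (or isn't exposed by Runpod's vLLM worker settings), that would explain why nothing on the model card maps to a selectable quantization option.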