Runpod • 2y ago • 9 replies
Volko
Is AWQ faster than GGUF? In what order do AWQ, GGUF, GPTQ, QAT, and EXL2 rank by inference speed?
digigoblin • 4/17/24, 5:05 PM
EXL2
Volko (OP) • 4/17/24, 5:13 PM
Okay, thanks
digigoblin • 4/17/24, 5:15 PM
That's the fastest; I don't know about the others, never actually used QAT or GGUF. @aikitoria will probably know.
aikitoria • 4/18/24, 7:27 PM
I've not used AWQ or GPTQ directly; those are older formats.
aikitoria • 4/18/24, 7:27 PM
You use GGUF if you want to run on a very small GPU and have to keep some of the model on CPU only. It's for hybrid CPU/GPU inference.
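The hybrid split described above is what llama.cpp controls with its `--n-gpu-layers` (`-ngl`) flag: offload as many transformer layers as fit in VRAM and run the rest on CPU. A minimal sketch of that budgeting; the function name and all sizes are illustrative assumptions, not taken from llama.cpp:

```python
def gpu_layer_count(vram_mib: int, per_layer_mib: int,
                    reserve_mib: int, total_layers: int) -> int:
    """Return how many model layers fit on the GPU; the rest stay on CPU.

    reserve_mib holds back VRAM for the KV cache and compute buffers.
    """
    usable = vram_mib - reserve_mib
    if usable <= 0:
        return 0  # no headroom: run everything on CPU
    return min(total_layers, usable // per_layer_mib)

# Illustrative numbers: a 4 GiB card, ~120 MiB per quantized layer,
# 1 GiB reserved, 32-layer model -> 25 layers offloaded (e.g. -ngl 25).
print(gpu_layer_count(4096, 120, 1024, 32))
```

With a bigger card the same call saturates at `total_layers` and the whole model runs on the GPU, which is when GGUF stops paying a CPU-offload penalty.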
aikitoria • 4/18/24, 7:27 PM
You use EXL2 for maximum speed on a single GPU.
aikitoria • 4/18/24, 7:28 PM
You use aphrodite-engine or TensorRT-LLM (good luck!) for maximum speed on multiple GPUs.
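Multi-GPU engines like these get their speed from tensor parallelism: each weight matrix is sharded across GPUs, every device multiplies against its own shard, and the partial outputs are combined. A toy pure-Python sketch of a column-parallel matmul; the helper names are mine, not from either engine:

```python
def matmul(a, b):
    """Plain row-by-column matrix multiply on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def split_columns(w, parts):
    """Shard a weight matrix column-wise, one shard per 'GPU'."""
    n = len(w[0]) // parts
    return [[row[i * n:(i + 1) * n] for row in w] for i in range(parts)]

x = [[1, 2], [3, 4]]               # activations (same copy on every device)
w = [[1, 0, 2, 1], [0, 1, 1, 2]]   # full weight matrix

# Each "device" multiplies against its own column shard...
shards = [matmul(x, wi) for wi in split_columns(w, 2)]
# ...and the outputs are concatenated (an all-gather in a real engine).
gathered = [r0 + r1 for r0, r1 in zip(*shards)]

assert gathered == matmul(x, w)  # same result as the unsharded multiply
```

The per-device matmuls run concurrently in a real engine, which is where the multi-GPU speedup comes from; the all-gather after each layer is the communication cost that makes fast interconnect matter.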
Geri • 7/12/24, 8:53 PM
Hi, do you use TensorRT-LLM?