Runpod • 16mo ago • 6 replies
Yasmin
Llama
Hello! For those who have tried it: how much GPU memory does Llama 70B need for inference only, and for fine-tuning? How about inference of the 400B version (for knowledge distillation)? Is the quality difference worth it? Thanks!
Solution
Only for inference
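For a rough sense of scale, here is a back-of-the-envelope sketch (not from the thread; the byte-per-parameter figures are common rules of thumb, and the helper functions are illustrative). Weights-only FP16 inference needs about 2 bytes per parameter, 4-bit quantized inference about 0.5 bytes, and full fine-tuning with mixed-precision Adam roughly 16 bytes per parameter; real usage is higher once KV cache, activations, batch size, and framework overhead are counted.

```python
# Rough VRAM estimates for Llama-class models (weights only; KV cache,
# activations, and framework overhead come on top of these numbers).

def inference_vram_gb(n_params_b: float, bytes_per_param: float) -> float:
    """Weights-only memory for inference, in GB (params in billions)."""
    return n_params_b * bytes_per_param

def full_finetune_vram_gb(n_params_b: float) -> float:
    """Mixed-precision Adam fine-tuning: fp16 weights + grads, fp32
    master weights + two optimizer moments ~= 16 bytes/param."""
    return n_params_b * 16

for name, n in [("Llama 70B", 70), ("Llama 400B", 400)]:
    print(f"{name}: fp16 inference ~{inference_vram_gb(n, 2):.0f} GB, "
          f"4-bit inference ~{inference_vram_gb(n, 0.5):.0f} GB, "
          f"full fine-tune ~{full_finetune_vram_gb(n):.0f} GB")
```

Under these assumptions, 70B FP16 inference (~140 GB) already needs two 80 GB GPUs, while full fine-tuning (~1.1 TB) needs a multi-node setup; parameter-efficient methods like LoRA on a quantized base reduce the fine-tuning footprint substantially.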