Runpod · 16mo ago
__den3b__

Workers configuration for Serverless vLLM endpoints: 1-hour lecture with 50 students

Hey there, I need to show 50 students how to do RAG with open-source LLMs (e.g., Llama 3). Which configuration do you suggest? I want to make sure they have a smooth experience. Thanks!
Solution:
16GB isn't enough; you need 24GB
digigoblin · 16mo ago
Depends on which Llama 3 model
Madiator2011 · 16mo ago
For the 70B non-quantized model you would need at least 2x 80GB GPUs
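As a rough back-of-the-envelope check (my own arithmetic, not from the thread): model weights alone take roughly params × bytes-per-param of VRAM, before any KV cache or activation overhead, which is where the GPU sizes quoted in this thread come from.

```python
# Rough VRAM estimate for serving an LLM: weights only, before
# KV-cache and activation overhead. All numbers are approximations.

def weight_vram_gb(params_billions: float, bytes_per_param: float) -> float:
    """GB of memory needed just to hold the model weights."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# Llama 3 70B in fp16: ~140 GB of weights -> hence 2x 80GB GPUs.
print(f"70B fp16: ~{weight_vram_gb(70, 2):.0f} GB")

# Llama 3 8B in fp16: ~16 GB of weights, so a 16GB card leaves no
# room for the KV cache -> hence the 24GB advice later in the thread.
print(f"8B fp16:  ~{weight_vram_gb(8, 2):.0f} GB")

# A 4-bit quantized 8B is ~4 GB of weights and fits far smaller cards.
print(f"8B int4:  ~{weight_vram_gb(8, 0.5):.0f} GB")
```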
Unknown User · 16mo ago
[Message not public]
digigoblin · 16mo ago
Pods are expensive
Unknown User · 16mo ago
[Message not public]
__den3b__ (OP) · 16mo ago
The 8B-param model can also suffice
Unknown User · 16mo ago
[Message not public]
Solution
digigoblin · 16mo ago
16GB isn't enough; you need 24GB
digigoblin · 16mo ago
Unless you use a quantized version
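To illustrate the quantization route, a minimal vLLM sketch (my own example, not from the thread): serve an AWQ-quantized Llama 3 8B so it fits in 16-24GB of VRAM. The checkpoint name below is an assumption; substitute whichever quantized Llama 3 build you trust.

```python
# Minimal sketch: quantized Llama 3 8B with vLLM on a smaller GPU.
from vllm import LLM, SamplingParams

llm = LLM(
    model="casperhansen/llama-3-8b-instruct-awq",  # assumed checkpoint, swap as needed
    quantization="awq",          # tell vLLM the weights are AWQ-quantized
    max_model_len=8192,          # cap the context to keep the KV cache small
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
out = llm.generate(["Explain RAG in one paragraph."], params)
print(out[0].outputs[0].text)
```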
Unknown User · 16mo ago
[Message not public]
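For the lecture itself, a hedged sketch of how students could query a deployed Serverless vLLM endpoint for the RAG demo. This assumes the OpenAI-compatible route that RunPod's vLLM worker documents; ENDPOINT_ID, the model name, and the retrieved context are placeholders.

```python
# Sketch of a client call for the RAG demo against a RunPod Serverless
# vLLM endpoint, assuming its OpenAI-compatible route is enabled.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",  # placeholder ID
    api_key=os.environ["RUNPOD_API_KEY"],
)

retrieved = "..."  # text returned by your vector-store lookup

resp = client.chat.completions.create(
    model="casperhansen/llama-3-8b-instruct-awq",  # must match the deployed model
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{retrieved}\n\nQuestion: What is RAG?"},
    ],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```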
