I am trying to deploy the "meta-llama/Llama-3.1-8B-Instruct" model on Serverless vLLM.
I configured it with the maximum available memory. After setup, I try to run the "hello world" sample, but the request is stuck in the queue and I get "[error] worker exited with exit code 1" with no other error or message in the log. Is it even possible to run this model? What is the problem, and can it be resolved? (For the record, I did manage to run a much smaller model using the same procedure.)
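For reference, this is roughly the kind of request I'm sending (a minimal sketch assuming Runpod's standard /runsync endpoint and the vLLM worker's input schema; ENDPOINT_ID, API_KEY, and the sampling parameters are placeholders, not my actual values):

```python
import requests

# Hypothetical placeholders -- substitute a real endpoint ID and API key.
ENDPOINT_ID = "your_endpoint_id"
API_KEY = "your_runpod_api_key"

# Runpod serverless endpoints accept synchronous jobs at /runsync;
# the vLLM worker reads the prompt from the "input" object.
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {
            "prompt": "Hello, world!",
            "sampling_params": {"max_tokens": 64},
        }
    },
    timeout=120,
)
print(resp.status_code, resp.json())
```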