I am trying to deploy the meta-llama/Llama-3.1-8B-Instruct model on Serverless vLLM, configured with the maximum possible memory.
After setup, I try to run the "hello world" sample, but the request gets stuck in the queue and I get "[error]worker exited with exit code 1" with no other error or message in the log.
Is it even possible to run this model?
What is the problem? Can it be resolved?
(For the record, I did manage to run a much smaller model using the same procedure.)
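For reference, the "hello world" request I'm sending is essentially the following (a minimal sketch; the endpoint ID and API key are placeholders, and I'm assuming the standard RunPod serverless /runsync API rather than anything specific to my setup):
```python
import requests

# Placeholders: substitute your own serverless endpoint ID and RunPod API key.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-runpod-api-key"

# Synchronous test request against the serverless vLLM endpoint.
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "Hello, world!"}},
    timeout=300,
)
print(resp.status_code, resp.json())
```
With the smaller model this returns a completion; with Llama-3.1-8B-Instruct the request just sits in the queue until the worker exits.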
17 Replies
Unknown User•7mo ago (message not public)
I tried all of them, I think. The strongest possible, for sure.
I did
Unknown User•7mo ago (message not public)
Is this choice OK?

Unknown User•7mo ago (message not public)
It's the default (I've since deleted the instance).
yes
How can I choose a GPU? (There is no choice available in the setup process.)
should I be using a different template?
Unknown User•7mo ago (message not public)
no
Unknown User•7mo ago (message not public)
How do I choose a GPU? Where do I even see which GPU I got?
Unknown User•7mo ago (message not public)
Is a read-only token OK?
Unknown User•7mo ago (message not public)
I get an error saying I need to request access to the model on Hugging Face.
I applied and am waiting for approval...
Thanks for your time.
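Once approval comes through, a quick way to confirm the token can actually see the gated repo is something like this (a sketch assuming the huggingface_hub Python library; the token value is a placeholder, and a read-only token is sufficient for downloads):
```python
from huggingface_hub import HfApi
from huggingface_hub.utils import GatedRepoError

# Placeholder token: a read-only Hugging Face token works for model downloads.
api = HfApi(token="hf_...")

try:
    info = api.model_info("meta-llama/Llama-3.1-8B-Instruct")
    print("Access OK:", info.id)
except GatedRepoError:
    print("Access request still pending or not granted.")
```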
Unknown User•7mo ago (message not public)
It works now. Thanks.
Unknown User•7mo ago (message not public)