I am trying to deploy the "meta-llama/Llama-3.1-8B-Instruct" model on Serverless vLLM.
I configured it with the maximum available memory. After setup, I try to run the "hello world" sample, but the request is stuck in the queue and I get "[error] worker exited with exit code 1" with no other error or message in the log. Is it even possible to run this model? What is the problem, and can it be resolved? (For the record, I did manage to run a much smaller model using the same procedure.)
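For reference, this is roughly the kind of request I'm sending (a minimal sketch assuming Runpod's standard /runsync endpoint and the vLLM worker's input schema; ENDPOINT_ID, API_KEY, and the sampling parameters are placeholders, not my actual values):

```python
import requests

# Hypothetical placeholders -- substitute a real endpoint ID and API key.
ENDPOINT_ID = "your_endpoint_id"
API_KEY = "your_runpod_api_key"

# Runpod serverless endpoints accept synchronous jobs at /runsync;
# the vLLM worker reads the prompt from the "input" object.
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {
            "prompt": "Hello, world!",
            "sampling_params": {"max_tokens": 64},
        }
    },
    timeout=120,
)
print(resp.status_code, resp.json())
```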