Pod SSH keeps disconnecting

I terminated my pod to avoid being charged. I have tried terminating and recreating the pod multiple times, thinking it might be an issue with a specific machine, but the same thing keeps happening: SSH disconnects so randomly that I am not able to get any work done. Here's the SSH log:

-- RUNPOD.IO -- Enjoy your Pod #puuuogf0fbma0f ^_^
Error response from daemon: Container 068a5870008ded73495e58f7295b7e240f686b96ec67f1020bc829c33b378fad is not running
Connection to 100.65.19.183 closed.
Connection to ssh.runpod.io closed.

If necessary, I can post the verbose output, but the error clearly shows that the container stopped running after a while.
Calvinn (OP) · 5d ago
This is terrible. My container keeps on stopping. The same happens for GPUs with both low and medium availability.
Calvinn (OP) · 5d ago
I tried waiting 10 minutes after the pod started, and it still fails.

-- RUNPOD.IO -- Enjoy your Pod #bt7lakn5m1c8xz ^_^
Error response from daemon: container 00777c6698202fc8dd16c538fcf57f06694996b86e1fd66f5a6f5b208fc5114a is not running
Connection to 100.65.27.24 closed.
Connection to ssh.runpod.io closed.
riverfog7 · 5d ago
If the logs show nothing, try checking the System Logs tab.
Calvinn (OP) · 5d ago
I see that it is trying to load the meta-llama/Meta-Llama-3.1-8B-Instruct model as soon as it launches. I am using the latest vLLM Docker image. I did not run anything myself; the errors appear as soon as the pod is created.
riverfog7 · 5d ago
Did you provide the Hugging Face auth token? It is failing on auth.
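[A quick way to test this outside the pod: a minimal sketch, assuming huggingface_hub >= 0.24 is installed and you have already requested access to the gated repo; the token string is a placeholder.]

```python
# Sketch: confirm a Hugging Face token can access the gated Llama repo.
# Assumes huggingface_hub >= 0.24; the token value is a placeholder.
from huggingface_hub import auth_check, login

login(token="hf_xxx")  # token from https://huggingface.co/settings/tokens
# auth_check raises GatedRepoError (or RepositoryNotFoundError) if the
# token cannot see the repo; it returns nothing on success.
auth_check("meta-llama/Meta-Llama-3.1-8B-Instruct")
print("Token can access meta-llama/Meta-Llama-3.1-8B-Instruct")
```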
Calvinn (OP) · 5d ago
But why is it trying to launch a server? I am just launching a Pod; it is not supposed to start a vLLM server right away, right? Got it. I tried the PyTorch image and it works. I just noticed the vllm:latest image is a community image. Not sure what is happening, but I'm pretty sure the behavior isn't what's normally expected. Marking this issue as resolved. Thank you for all your help!!
riverfog7 · 5d ago
vLLM should try to launch an OpenAI-compatible API server right away. That's its intended behavior. You can check the CMD of the image.
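[For reference, one way to inspect an image's Entrypoint/Cmd without starting it, sketched with the Docker SDK for Python; "vllm/vllm-openai:latest" is an assumed stand-in for the template's image, and a local Docker daemon plus `pip install docker` are required.]

```python
# Sketch: print the Entrypoint and Cmd baked into the vLLM image.
# Assumes a local Docker daemon and the `docker` Python package;
# "vllm/vllm-openai:latest" stands in for the template's image name.
import docker

client = docker.from_env()
image = client.images.pull("vllm/vllm-openai:latest")
config = image.attrs["Config"]
print("Entrypoint:", config.get("Entrypoint"))
print("Cmd:", config.get("Cmd"))
```

Whatever command this prints is what the container runs at boot, which is why the pod starts loading the model (and exits on the auth failure) the moment it is created.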
Calvinn (OP) · 5d ago
Oh I see. Thank you for pointing it out!
