Container is not running error

I'm having an issue and can't figure out how to work around it. I'm new to RunPod, so please excuse my limited knowledge at this point. I have my pod running (trying to fine-tune Mistral 7B) and have my SSH public key configured under Settings (I can also see it when launching the pod). But when the pod is ready and I try to connect over SSH with my private key, I get the same error every time: "Error response from daemon: Container 7b7a3790f1500c544348d2c6e09c286ee3fe3849adcb241ac54bceb3c518619f is not running", even though I can see the container running in the Container Logs.
Clicking "Start Web Terminal" in the UI doesn't do anything either. I have restarted and terminated the pod multiple times, but no luck.
justin (4mo ago)
Start Web Terminal has a bug right now. Can you share the exact commands you're running? Also, try this instead; the setup is much simpler: https://github.com/justinwlin/Runpod-Tips-and-Tricks/tree/main/SSH%20On%20Runpod#step-1-install-the-runpod-python-package Note that this only works for future pods. If you want to modify an existing pod and it already has OpenSSH installed, you can follow the password setup in the guide for SSH.
justin (4mo ago)
What template are you using? It's odd that you're getting a daemon error when you're trying to SSH. My recommendation is to launch with a RunPod PyTorch template, which shouldn't have any issues with OpenSSH or anything like that.
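One way to narrow down a connection problem like this from your own machine: if the pod exposes SSH on a public TCP port, a running SSH server sends an identification line starting with "SSH-" as soon as a client connects (RFC 4253, section 4.2). The following is a minimal probe sketch under that assumption. The host and port are placeholders for whatever the pod's connection panel shows, and read_ssh_banner is a hypothetical helper, not part of any RunPod tool. A None result suggests nothing SSH-like is listening on that port, which is consistent with an image that has no SSH server installed.

```python
import socket
from typing import Optional


def read_ssh_banner(host: str, port: int, timeout: float = 5.0) -> Optional[str]:
    """Return the server's SSH identification line, or None if none arrives.

    Hypothetical diagnostic helper: it only checks whether something on
    host:port speaks the SSH protocol, not whether your key is accepted.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            buf = b""
            # RFC 4253 lets a server send other lines first, so scan each
            # complete line until one starts with "SSH-" (capped at 8 KiB).
            while len(buf) < 8192:
                chunk = sock.recv(1024)
                if not chunk:  # server closed the connection
                    break
                buf += chunk
                for line in buf.split(b"\n")[:-1]:
                    if line.startswith(b"SSH-"):
                        return line.rstrip(b"\r").decode("ascii", "replace")
    except OSError:  # connection refused, unreachable, or timed out
        return None
    return None
```

For example, read_ssh_banner("203.0.113.10", 22022) would probe a placeholder address and port; substitute the values from your own pod.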
Solution
swordfish01 (4mo ago)
Thank you, guys. The article and the questions sent me in the right direction. I'm using a custom template and didn't realize the image "winglian/axolotl:main-py3.10-cu118-2.0.1" didn't have SSH installed. I updated the template to use "winglian/axolotl-runpod:main-latest" and everything is working now. I'm fine-tuning the model as I write this. Thanks for all the help!
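Since the root cause here was a custom image without an SSH server, one way to check an image before putting it in a template is to start a throwaway container locally and look for sshd. This is a minimal sketch assuming Docker is installed on your machine and the image contains /bin/sh; sshd_check_command and image_has_sshd are hypothetical helpers. Keep in mind that a failed pull also produces a nonzero exit (reported here as False), and CUDA-based images can take a while to download.

```python
import subprocess
from typing import List


def sshd_check_command(image: str) -> List[str]:
    """Build a docker command that exits 0 only if the image has an sshd binary."""
    return [
        "docker", "run", "--rm",           # throwaway container, removed afterwards
        "--entrypoint", "sh", image,        # bypass the image's own entrypoint
        "-c", "command -v sshd >/dev/null || test -x /usr/sbin/sshd",
    ]


def image_has_sshd(image: str) -> bool:
    """Return True if sshd is found in the image (requires a local Docker daemon)."""
    result = subprocess.run(sshd_check_command(image), capture_output=True, text=True)
    return result.returncode == 0
```

For example, image_has_sshd("winglian/axolotl-runpod:main-latest") runs the check against the image that ended up working in this thread.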