Help with constantly crashing GPU pods

Hello, I’ve been struggling for the past few days to get a Docker image up and running on a GPU pod. I had success with a template I made (Docker image mcgillrobotics/mujoco:cuda118) and managed to connect and get things running, but since then I have not been able to connect to a pod. The Docker image pulls, but when I click “Connect to web terminal” nothing happens. When I try to SSH in, it says the container is not running and kicks me out instantly. I’ve tried different rocket images, different CUDA versions, GPUs, and template overrides, but have had no luck. I reached out to support on the website and was told I would receive an email, but it’s been a few days and I’ve heard nothing. I would be super grateful if anyone has any input!
4 Replies
ashleyk, 4mo ago
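Sounds like your Docker image isn't keeping the container alive. If the container's main process exits right after it starts, the container stops, and there is nothing for the web terminal or SSH to connect to.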
Antoine Dangeard
What do you mean by “keeping it alive”? Is there something special I need to do so the container doesn’t die? Or is it crashing or something?
Solution
ashleyk, 4mo ago
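Yes, you need to add sleep infinity as the container's start command, so the main process never exits and the container stays running.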
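For anyone landing here later, this is a minimal sketch of one way to do that: a small Dockerfile that extends the image from the question and makes sleep infinity its default command. It assumes the base image ships GNU coreutils (whose sleep accepts infinity) and does not define an ENTRYPOINT that would receive the CMD as arguments instead of running it; neither was checked against the actual image.

```dockerfile
# Sketch only: extend the image from the question so the container stays up.
FROM mcgillrobotics/mujoco:cuda118

# A container stops as soon as its main process exits.
# "sleep infinity" blocks forever, so the container keeps running and
# the web terminal / SSH have a live container to attach to.
# Assumption: the base image has no ENTRYPOINT that would swallow this CMD.
CMD ["sleep", "infinity"]
```

If your template lets you override the container start command, setting that override to sleep infinity should have the same effect without rebuilding the image.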
Antoine Dangeard
gotcha, thanks!