CUDA 12.6 Image Having Issues with A40 - Is it a CUDA version issue?
My container is built on nvidia/cuda:12.6.3-cudnn-devel-ubuntu24.04, and I've been able to run it on various secure GPUs with no issue. Recently, though, I started running into a problem on an A40: the pod keeps cycling and doesn't give me any meaningful messages.
The only message I got was the NVIDIA one about the license agreement. This makes me think it's possibly a CUDA version mismatch. Any thoughts?
If it's a version issue, how do I filter which GPUs to request via GraphQL for an on-demand pod? This is all automated and scaled, so I can't manually pick a GPU every time 😦
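One way to test the version-mismatch theory is to compare the host's driver version against the minimum the image needs. The sketch below is only that, a sketch: `ASSUMED_MIN_DRIVER` is my assumption for the CUDA 12.x minor-version compatibility floor on Linux and should be checked against NVIDIA's release notes, and the helper names are made up for illustration. Since the failing pod never stays up, it would have to run from a lighter image on the same GPU type.

```python
import subprocess

# ASSUMPTION: minimum Linux driver for CUDA 12.x minor-version compatibility.
# Verify this against NVIDIA's CUDA release notes before relying on it.
ASSUMED_MIN_DRIVER = (525, 60, 13)


def parse_version(text):
    """Turn a dotted version such as '535.129.03' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports(driver_version, minimum=ASSUMED_MIN_DRIVER):
    """Return True if the host driver is at least the assumed minimum."""
    return parse_version(driver_version) >= minimum


def host_driver_version():
    """Ask nvidia-smi for the driver version of the first GPU it lists."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()[0].strip()


if __name__ == "__main__":
    version = host_driver_version()
    verdict = "ok" if driver_supports(version) else "too old for the image"
    print(version, verdict)
```

If the driver clears the assumed minimum, a version mismatch becomes less likely and the cause is probably elsewhere.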
1 Reply
Solution
Looks like it might be user error. Will mark as resolved once I confirm.