Inconsistent Pod behavior

I'm running the exact same custom Docker image on two pods with the same instance type. One pod always fails with a CUDA memory error, and the other doesn't. Both pods use the same setup and the same inputs; everything is identical.

I have tried pods in several different regions, different volumes, etc. The behavior is still the same.