Urgent: All new GPU pods are broken
Hi, both our existing pods and the new pods we are creating have the same issue: they cannot find any CUDA devices. All of them give this error:
Warning: caught exception 'CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.', memory monitor disabled
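
In case it helps with triage, here is a minimal sketch of a check that could be run inside an affected pod. It is not something from our setup: the function name `cuda_env_report` and the `/dev/nvidia[0-9]*` device path pattern are assumptions for illustration, and it uses only the Python standard library. It reports whether `CUDA_VISIBLE_DEVICES` is set (the error message points at it), whether any NVIDIA device nodes are visible in the container, and whether `nvidia-smi` is on the PATH.

```python
import glob
import os
import shutil


def cuda_env_report(environ=None, device_glob="/dev/nvidia[0-9]*"):
    """Collect basic facts about GPU visibility inside the container.

    Hypothetical helper for illustration; the device path pattern is an
    assumption about how the GPUs are exposed to the pod.
    """
    environ = os.environ if environ is None else environ
    visible = environ.get("CUDA_VISIBLE_DEVICES")  # None means the variable is unset
    return {
        "cuda_visible_devices": visible,
        # An empty value hides every GPU from CUDA.
        "hidden_by_env": visible is not None and visible.strip() == "",
        # Device nodes the container can see, if any.
        "device_nodes": sorted(glob.glob(device_glob)),
        # Full path to nvidia-smi, or None if it is not on the PATH.
        "nvidia_smi_path": shutil.which("nvidia-smi"),
    }


if __name__ == "__main__":
    for key, value in cuda_env_report().items():
        print(f"{key}: {value}")
```

If `device_nodes` comes back empty or `nvidia_smi_path` is `None`, that would suggest the GPUs are not being passed through to the pod at all, rather than a problem in the application itself.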