Description:
I'm unable to run ComfyUI or any PyTorch workloads on Blackwell-based GPUs (RTX PRO 6000).
Error:
RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
Environment:
GPU: NVIDIA RTX PRO 6000 (Blackwell)
Driver: 580.95.05
CUDA Version (from nvidia-smi): 13.0
Templates tried: "ComfyUI - Blackwell Edition" and a custom template based on runpod/pytorch:2.4.0-py3.11-cuda12.4.1
Observations:
nvidia-smi shows the GPU is working correctly
PyTorch fails during CUDA initialization
The issue appears to be a version mismatch: the host driver reports CUDA 13.0, while the custom image ships CUDA 12.4.1, and no stable PyTorch release appears to support CUDA 13.0 yet
Same error occurs across multiple pod restarts and different templates
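
To help narrow this down, the following minimal sketch could be run inside the pod. It reports which CUDA version the installed PyTorch build targets and, if CUDA initializes, whether the build lists native kernels for the GPU's compute capability. The helper names (arch_supported, collect_diagnostics) are hypothetical, and the assumption that this card reports compute capability 12.0 (sm_120) should be verified on the machine. The check only looks for an exact sm_XY entry and ignores PTX fallback, so it is a rough indicator rather than a definitive test.

```python
import importlib.util
import platform


def arch_supported(capability, arch_list):
    """Return True if arch_list has a native entry for the given capability.

    capability is a (major, minor) tuple, e.g. (12, 0) -> "sm_120".
    Simplification: PTX ("compute_XY") entries are not considered.
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def collect_diagnostics():
    """Gather version info without crashing when torch or CUDA is unavailable."""
    info = {"python": platform.python_version()}
    if importlib.util.find_spec("torch") is None:
        info["torch"] = None  # PyTorch is not installed in this environment
        return info

    import torch

    info["torch"] = torch.__version__
    info["torch_cuda_build"] = torch.version.cuda  # CUDA version the wheel targets
    try:
        info["cuda_available"] = torch.cuda.is_available()
    except RuntimeError as exc:
        info["cuda_available"] = False
        info["cuda_error"] = str(exc)

    if info["cuda_available"]:
        capability = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        info["device"] = torch.cuda.get_device_name(0)
        info["capability"] = capability
        info["compiled_archs"] = archs
        info["native_kernels"] = arch_supported(capability, archs)
    return info


if __name__ == "__main__":
    for key, value in collect_diagnostics().items():
        print(f"{key}: {value}")
```

If cuda_available comes back False while nvidia-smi still lists the GPU, the failure is happening before any kernel is launched, which matches the error above. If it comes back True but native_kernels is False, the image's PyTorch build may simply predate this GPU architecture.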
Request:
Could you either:
Provide a PyTorch image that is compatible with CUDA 13.0 drivers, or
Downgrade the drivers on Blackwell machines to a version that reports CUDA 12.x, for compatibility
Thank you.