Intermittent "CUDA error: device-side assert triggered" on Runpod Serverless GPU Worker
I’m deploying a GPU-based service on Runpod Serverless. Most requests run fine, but after some time I start getting errors like this:

"error": "CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.",
"executionTime": 120575
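
The error text itself suggests CUDA_LAUNCH_BLOCKING=1, so here is a minimal sketch of how that could be wired into a Python worker. It assumes the service is Python-based and that the variable is set before any CUDA work starts (setting it before importing the GPU framework is the conservative choice). The helper names `enable_cuda_debug_env` and `find_out_of_range` are hypothetical and not part of Runpod or PyTorch. The second helper reflects one common cause of device-side asserts, an index or label outside the valid range; whether that applies to this service is an assumption.

```python
import os


def enable_cuda_debug_env(env=os.environ):
    """Set CUDA_LAUNCH_BLOCKING=1 so kernel launches run synchronously.

    With synchronous launches the Python traceback should point at the
    call that actually failed. Expect slower inference while it is on.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def find_out_of_range(indices, upper_bound):
    """Return the values that fall outside the range [0, upper_bound)."""
    return [i for i in indices if not 0 <= i < upper_bound]


# Call this at the very top of the worker entry point, before the GPU
# framework is imported or any CUDA call is made.
enable_cuda_debug_env()

# Hypothetical pre-flight check on the CPU side: validate token ids or
# class labels against the model's vocabulary or class count before
# they reach the GPU, and log the offending request if any are invalid.
bad = find_out_of_range([0, 5, -1, 9], upper_bound=6)
if bad:
    print(f"invalid indices in request: {bad}")
```

Running the failing input once with this enabled, and logging any out-of-range values, should narrow down whether specific request payloads trigger the assert.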