RunpodR
Runpod10mo ago
11 replies
Shubham Patel DJ

Serverless instances are not assigned GPUs, resulting in job error in Production. Require Assist

Error Message 1 with Stack Trace:
Task Failed [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char, const char, ERRTYPE, const char, const char, int) [with ERRTYPE = cudnnStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char, const char, ERRTYPE, const char, const char, int) [with ERRTYPE = cudnnStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDNN failure 4: CUDNN_STATUS_INTERNAL_ERROR ; GPU=0 ; hostname=0220236a79a1 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=177 ; expr=cudnnCreate(&cudnnhandle); \n

Error Message 2:
Failed to get job. | Error Type: ClientOSError | Error Message: [Errno 104] Connection reset by peer

Will refreshing the worker help in this situation ?
Was this page helpful?