Runpod · 2y ago · 23 replies
galakurpismo3

Problem with RunPod cuda base image. Jobs stuck in queue forever

Hello, I'm trying to send a request to a serverless endpoint whose Dockerfile uses this base image:
FROM runpod/base:0.4.0-cuda11.8.0

I want the worker to run the input_fn function when I make the request. This is part of the server-side code:
import runpod

model = model_fn('/app/src/tapnet/checkpoints/')
runpod.serverless.start({"handler": input_fn})
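For context, runpod.serverless.start registers the handler and invokes it once a job arrives; the handler receives a job dict whose "input" key holds the request payload. A minimal sketch of a handler with that shape (the payload fields are made up for illustration, and it is exercised here by calling it directly rather than through the RunPod worker loop):

```python
# Hypothetical handler with the same signature input_fn needs;
# the payload contents below are assumptions, not from the original post.
def input_fn(job):
    payload = job["input"]        # JSON body of the request
    # ... run inference with the preloaded model here ...
    return {"received": payload}  # the returned dict becomes the job output

# Exercised locally, without the RunPod SDK:
result = input_fn({"input": {"checkpoint": "tapnet"}})
print(result)
```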


If I use the CUDA base image, input_fn never runs: I only see the debug prints from model_fn, and then the job stays in queue forever (see attached screenshot).

The thing is, if I use this base image instead:
FROM python:3.11.1-buster
it does run both model_fn and input_fn.

So my questions are:
- Why does the problem happen with the CUDA base image?
- What are the implications of using the second base image? Are CUDA or PyTorch dependencies missing there?
- Which base image should I use? What do I do?
(Attachment: job_inqueue_runpod.jpg)
Solution
Hmm, yeah, I guess Python 3.11 is missing from that RunPod base image..
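If the runpod/base image does ship its Python interpreters side by side rather than exposing 3.11 as the default python, one workaround is to invoke the 3.11 binary explicitly in the Dockerfile. A sketch (requirements.txt and the handler path are placeholders, not from the original post):

```dockerfile
FROM runpod/base:0.4.0-cuda11.8.0

# Install dependencies against the 3.11 interpreter explicitly,
# so the handler's packages land in the interpreter that runs it.
COPY requirements.txt /app/requirements.txt
RUN python3.11 -m pip install --no-cache-dir -r /app/requirements.txt

COPY src /app/src
# Launch the worker with python3.11 rather than whatever `python` resolves to.
CMD ["python3.11", "-u", "/app/src/handler.py"]
```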