Runpod•2y ago

Problem with RunPod cuda base image. Jobs stuck in queue forever

Hello, I'm trying to send a request to a serverless endpoint whose Dockerfile uses this base image:
FROM runpod/base:0.4.0-cuda11.8.0
I want the server side to run the input_fn function when the request arrives. This is part of the server-side code:

model = model_fn('/app/src/tapnet/checkpoints/')
runpod.serverless.start({"handler": input_fn})
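One way to tell whether the worker ever reaches the handler is to log on entry and to return exceptions rather than lose them. This is only a minimal sketch, assuming the usual RunPod handler shape where the job is a dict carrying the request payload under "input". The video_url field and the handler body are hypothetical stand-ins, not the poster's actual code:

```python
import logging
import sys
import traceback

# Log to stdout so the lines show up in the worker's container logs.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
log = logging.getLogger("worker")


def input_fn(job):
    """Handler sketch: log on entry so it is obvious whether a job ever arrives."""
    log.info("handler called for job %s", job.get("id"))
    try:
        job_input = job["input"]  # request payload sent by the client
        # Hypothetical field name; the real inference call would go here.
        video_url = job_input["video_url"]
        return {"received": video_url}
    except Exception as exc:
        # Return the failure instead of letting it vanish silently.
        log.error("handler failed:\n%s", traceback.format_exc())
        return {"error": f"{type(exc).__name__}: {exc}"}


# In the worker this is registered the same way as in the snippet above:
# runpod.serverless.start({"handler": input_fn})
```

If the "handler called" line never appears in the logs while the model_fn prints do, the job is not reaching the handler at all, which points at worker startup rather than at the handler code.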

If I use the cuda base image it does not run input_fn. I only see the debug prints from model_fn, and then the job stays in the queue forever (photo).

The thing is that if I use this base image instead:
FROM python:3.11.1-buster
it runs both input_fn and model_fn.

So my questions are:
- Why is the problem happening in the cuda base image?
- What are the implications of using the 2nd base image? Are there cuda or pytorch dependencies missing here?
- What base image should I use? What do I do?
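The question about missing cuda or pytorch dependencies can be checked directly by running a small report inside each image and comparing the output. This is a rough sketch, not a definitive diagnosis; the function name environment_report is made up for illustration, and it only assumes that torch, if present, is importable under the same interpreter that runs the worker:

```python
import importlib.util
import platform
import shutil
import sys


def environment_report():
    """Collect basic facts about the interpreter and GPU tooling in this image."""
    report = {
        "python_version": platform.python_version(),
        "python_executable": sys.executable,
        # find_spec returns None when the package is not installed.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "runpod_installed": importlib.util.find_spec("runpod") is not None,
        # nvidia-smi is typically made available by the NVIDIA container runtime.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "cuda_available": None,  # stays None when torch is not installed
    }
    if report["torch_installed"]:
        try:
            import torch
            report["cuda_available"] = torch.cuda.is_available()
        except Exception as exc:
            # A broken install is still worth reporting rather than crashing.
            report["cuda_available"] = f"torch import failed: {exc}"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}", flush=True)
```

Running it with the same interpreter as the CMD line (python3.11) in both images shows whether the two builds actually differ in Python version, installed packages, or GPU visibility.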

Solution:

Message Not Public


12 Replies

Unknown User•2y ago

Message Not Public

galakurpismo3OP•2y ago

FROM runpod/base:0.4.0-cuda11.8.0
FROM python:3.11.1-buster

# Python dependencies
COPY builder/requirements.txt /requirements.txt
RUN python3.11 -m pip install --upgrade pip && \
    python3.11 -m pip install --upgrade -r /requirements.txt --no-cache-dir && \
    rm /requirements.txt

# Add src files (Worker Template)
COPY src /app/src

# Ensure the checkpoints directory exists and copy the checkpoint file
RUN mkdir -p /app/src/tapnet/checkpoints
COPY src/tapnet/checkpoints/bootstapir_checkpoint.pt /app/src/tapnet/checkpoints/bootstapir_checkpoint.pt

# Set working directory
WORKDIR /app

# Set AWS credentials. DEBUG, later put in env or
ENV AWS_ACCESS_KEY_ID=...
ENV AWS_SECRET_ACCESS_KEY=...
ENV AW...

ENV PYTHONPATH=/app

CMD ["python3.11", "-u", "src/inference.py"]

If I use the cuda image it doesn't run; if I use the other image it runs, gets the video and everything. Sorry for the bad formatting of the Dockerfile, but it's just the typical thing, I guess.

Unknown User•2y ago

Message Not Public