Pipeline is not using GPU on serverless
Hi!
I'm running bart-large-mnli on serverless, but as far as I can see from the worker stats it's not using the GPU. Do you know what I'm doing wrong?
The image is my current handler.py
And as docker base I'm using "FROM runpod/base:0.6.2-cuda12.2.0", also tried with "runpod/pytorch:2.2.1-py3.10-cuda12.1.1-devel-ubuntu22.04" but still 0% usage of gpu.
Let me know if you need more details!
Thank you!

57 Replies
How are you running the model?
this is the Dockerfile; I'm building + pushing to my Docker registry and running it on a 24 GB GPU on serverless

and this is the model downloader

I have a feeling this line:
Is doing something funky.
You should try doing a print right after that:
And see if your code thinks it is running on a CPU.
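A sketch of that check, with the possible outcomes spelled out (`classifier` and the messages below are illustrative, not from the thread's actual handler):

```python
# Illustrative sketch of the suggested check. In the real handler you would
# print torch.cuda.is_available() and str(classifier.model.device) right
# after building the pipeline; this helper just interprets those two values.
def gpu_diagnosis(cuda_available: bool, model_device: str) -> str:
    if model_device.startswith("cuda"):
        return "model is on the GPU"
    if cuda_available:
        return "CUDA works, but the pipeline defaulted to CPU - pass device=0"
    return "torch cannot see a GPU - check the base image / torch build"
```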
thank you! I'll try it immediately and let you know
@PatrickR this is the output

I can give you the full repo if you need
Yep, will be useful for us to help you test it
That would be useful yes! Would love to test out and see what is going on.
here it is! thank you so much for your help
Risky click
Unknown User • 17mo ago (message not public)
if you'd prefer I can give you single files
this is the folder structure

Unknown User • 17mo ago (message not public)
it's already doing that
Unknown User • 17mo ago (message not public)
with 5 concurrent requests ~5s per request

Unknown User • 17mo ago (message not public)
let me try again because I don't remember
I'll launch the 32vcpu and let you know!
Unknown User • 17mo ago (message not public)
sure no problem, I see 100% CPU usage and 0% for the GPU
Unknown User • 17mo ago (message not public)
thanks for the tip, but I'm performing stress tests, constantly sending requests for 1 minute, to understand how many requests it can handle, so it's always running
Unknown User • 17mo ago (message not public)
another strange thing is that it performs faster on a cheap CPU on a Hugging Face inference endpoint than on a 24 GB GPU on RunPod (that's also why I think it's not using it)
always ~5 seconds with 5 concurrent requests on a 32 vcpu
Unknown User • 17mo ago (message not public)
@nerdylive tried now, still 100% CPU usage and 0% for the GPU
I might look at it
thank you!
Hey, so I went through this and I have this input:
and this output:
Here is my Python code:
Unknown User • 17mo ago (message not public)
So I am getting the GPU to run through CUDA.
Yes, output of the device is GPU.
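For reference, a minimal sketch of the usual fix (an assumption about the handler, not necessarily the exact code PatrickR ran): `transformers.pipeline()` stays on CPU unless a `device` argument is passed, which matches the symptom in this thread.

```python
# Minimal sketch, assuming the standard transformers API. pipeline() accepts
# device=-1 for CPU or a CUDA index such as 0 for the first GPU.
def pick_device(cuda_available: bool) -> int:
    return 0 if cuda_available else -1

def build_classifier():
    # Lazy imports: torch/transformers are only needed at worker runtime.
    import torch
    from transformers import pipeline
    return pipeline(
        "zero-shot-classification",
        model="facebook/bart-large-mnli",
        device=pick_device(torch.cuda.is_available()),  # without this, CPU
    )
```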
BTW I used the CLI tool
runpodctl project create for faster iteration cycles / not having to rebuild Docker constantly.
Unknown User • 17mo ago (message not public)
I rebuilt the new Docker image based off another image:
I think he's trying to use cache_model.py to cache the model locally when building the Docker image. He set local_files_only=True just to make sure it never downloads from the internet.
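A hedged sketch of what such a cache_model.py typically looks like (the function names are illustrative; only the model ID and `local_files_only=True` come from the thread):

```python
MODEL_ID = "facebook/bart-large-mnli"  # the model discussed in this thread

def cache_model():
    # Run during `docker build`: downloads the weights into the HF cache,
    # baking them into an image layer.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    AutoTokenizer.from_pretrained(MODEL_ID)
    AutoModelForSequenceClassification.from_pretrained(MODEL_ID)

def load_cached():
    # Run inside the worker: local_files_only=True fails fast if the cache is
    # missing instead of silently downloading at request time.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    tok = AutoTokenizer.from_pretrained(MODEL_ID, local_files_only=True)
    mdl = AutoModelForSequenceClassification.from_pretrained(
        MODEL_ID, local_files_only=True
    )
    return tok, mdl
```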
Unknown User • 17mo ago (message not public)
I don't see anything wrong with that. I'm still wondering what Patrick changed to make it start using the GPU.
Unknown User • 17mo ago (message not public)
Sorry, my code was a little bit of a red herring. Here is a screenshot of it running on GPU though.

Unknown User • 17mo ago (message not public)
Hi! Thank you so much for your help, I'll try the suggested Docker image
I think this might be the root cause: in your requirements.txt, you have to set:
torch==2.2.1

Make sure to install the CUDA version, not the CPU one
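Assuming a CUDA 12.1 base image (matching the tags mentioned earlier in the thread), the CUDA wheel can be requested explicitly; the `cu121` suffix is the part you match to your image's CUDA version:

```shell
# Pin the CUDA build of torch explicitly; cuXXX must match the image's CUDA.
pip install torch==2.2.1 --index-url https://download.pytorch.org/whl/cu121

# Sanity check: a CPU-only wheel shows "+cpu" in the version and prints False.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```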
I'll try setting the torch version manually, because it's strange that I still see 0% GPU usage

so I have to remove torch and use pytorch and pytorch-cuda=12.1 right?
Assuming your base image is CUDA 12.1
that's crazy, always 0%

It's using the GPU if the GPU memory is showing as used
That telemetry isn't real-time and isn't reliable
but it's strange that even when I run a stress test on it for over a minute it's never used
check nvidia-smi
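For example, from a terminal inside the running worker:

```shell
# Live GPU utilisation and memory, refreshed every second - unlike the
# dashboard graph, this is real-time.
watch -n 1 nvidia-smi
```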
I added some logs in the code and it is using the GPU.


Yep, the GPU utilization telemetry always confuses people because it's not real-time
this one is interesting, lol

Unknown User • 17mo ago (message not public)