vLLM doesn't seem to use the GPU
I'm using vLLM, and on the graph, when I launch some requests, only CPU usage increases.
If I open a terminal and run nvidia-smi, I don't see any process either.
Settings line:
--model NousResearch/Meta-Llama-3-8B-Instruct --max-model-len 8192 --port 8000 --dtype half --enable-chunked-prefill true --max-num-batched-tokens 6144 --gpu-memory-utilization 0.97
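A minimal first sanity check (a sketch, assuming PyTorch is installed in the pod, since vLLM builds on it): if this prints False, vLLM can't see the GPU at all, which would explain the CPU-only behaviour.
```python
# Sketch: check whether PyTorch (and therefore vLLM) can see the GPU.
import torch

print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```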

36 Replies
Unknown User•13mo ago
Message Not Public
I tried on 4 different pods.
As for the CUDA version, I don't know where I can set it.
Unknown User•13mo ago
Message Not Public
I'm trying a Pod, not Serverless.
I don't see where I can filter by CUDA version when creating a Pod.
Unknown User•13mo ago
Message Not Public
Thanks!
Unknown User•13mo ago
Message Not Public
I used an A40, so CUDA 12.4.
I'll try with an RTX 6000 (CUDA 12.5) to check if I see a difference.
I don't understand why I don't see any processes here.
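A hedged note on this (not confirmed from the thread itself): inside a container, nvidia-smi often shows no processes even when the GPU is busy, because the container's PID namespace hides them. Querying utilisation directly via NVML sidesteps the process list; a minimal sketch, assuming the nvidia-ml-py package is installed:
```python
# Sketch: read GPU utilisation via NVML instead of relying on nvidia-smi's
# process list, which is often empty inside containers (PID namespace isolation).
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"GPU util: {util.gpu}%  memory: {mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB")
pynvml.nvmlShutdown()
```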

Unknown User•13mo ago
Message Not Public
Hi, was this issue ever solved? I have the same problem with the latest PyTorch and CUDA as well. I also reset my pod, etc., but CPU is at 100%, GPU utilisation is low, and I have no processes showing up in nvidia-smi.
Unknown User•10mo ago
Message Not Public
What do you mean by the official vLLM? I'm installing it via pip.
I have a pod
Unknown User•10mo ago
Message Not Public
I already reset the pod; it doesn't seem to be that.
runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04
Unknown User•10mo ago
Message Not Public
pyenv virtualenv, then pip install vllm
And yes, I deleted the pod and started a new one
Or do you mean a pod with a different PyTorch version?
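For what it's worth, a minimal offline smoke test separates "vLLM itself is broken" from "the server setup is broken". A sketch, assuming the same model as the settings line earlier (any small model you have locally works too):
```python
# Sketch: run vLLM offline, bypassing the API server, to isolate the problem.
from vllm import LLM, SamplingParams

llm = LLM(model="NousResearch/Meta-Llama-3-8B-Instruct",
          dtype="half", max_model_len=8192)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```
If GPU utilisation stays near zero even here, the problem is in the vLLM install or the CUDA stack rather than the serving setup.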
Unknown User•10mo ago
Message Not Public
Yes, after deleting I still have this problem
Unknown User•10mo ago
Message Not Public
Yes, the same issue with a new pod
Do you mean 100% CPU and ~50% GPU, or 100% GPU?
I don't see a process there either, nor for lmdeploy,
but it does show up in nvtop.
Unknown User•10mo ago
Message Not Public
No, tokens per second are very low for me (10-12 tps).
Thanks for the video!
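For reference, a rough way to measure tps against the OpenAI-compatible endpoint from the settings line earlier (a sketch; the localhost:8000 URL and the Llama-3 model name are assumptions, adjust to your setup):
```python
# Sketch: rough tokens-per-second measurement against a vLLM OpenAI-compatible server.
import time
import requests

payload = {
    "model": "NousResearch/Meta-Llama-3-8B-Instruct",
    "messages": [{"role": "user", "content": "Explain KV caching in two sentences."}],
    "max_tokens": 256,
}
t0 = time.time()
resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload).json()
elapsed = time.time() - t0
tokens = resp["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tok/s")
```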
Unknown User•10mo ago
Message Not Public
An A100 GPU and Qwen2.5 32B Instruct.
I'm starting to think it might have something to do with the JSON output I'm generating.
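That's a plausible lead: if the JSON comes from vLLM's guided decoding, the grammar/FSM work runs on the CPU and can bottleneck generation while the GPU sits mostly idle (an assumption worth testing, not a confirmed diagnosis). A quick A/B sketch, sending the same prompt with and without the guided_json extra parameter that vLLM's server accepts (model id and URL are assumptions):
```python
# Sketch: A/B test - same request with and without guided JSON decoding.
# If the guided variant is much slower with high CPU, constrained decoding
# is the likely bottleneck.
import time
import requests

URL = "http://localhost:8000/v1/chat/completions"  # assumption: local vLLM server
schema = {"type": "object", "properties": {"answer": {"type": "string"}}}

for guided in (False, True):
    payload = {
        "model": "Qwen/Qwen2.5-32B-Instruct",
        "messages": [{"role": "user", "content": "Answer as JSON: what is 2+2?"}],
        "max_tokens": 128,
    }
    if guided:
        payload["guided_json"] = schema  # vLLM-specific extra parameter
    t0 = time.time()
    requests.post(URL, json=payload)
    print("guided" if guided else "plain", f"{time.time() - t0:.1f}s")
```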
Unknown User•10mo ago
Message Not Public
I have deleted and recreated it, and it's the same.
Unknown User•10mo ago
Message Not Public
I tried it on an RTX 6000 Ada or so, and it had regular performance, which is why I'm surprised.
That was on a different hosting provider.
Unknown User•10mo ago
Message Not Public
I didn't use an A6000 on RunPod, but on Hetzner.
In the meantime I have set up LMDeploy, and it seems to fully utilise the A100 on RunPod.
So it might just be vLLM that's broken.
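For reference, LMDeploy's Python pipeline API makes for a quick comparison run; a minimal sketch, assuming pip install lmdeploy and the same Qwen model as above (the model id is an assumption, swap in whatever is local):
```python
# Sketch: minimal LMDeploy pipeline, used here as a cross-check against vLLM.
from lmdeploy import pipeline

pipe = pipeline("Qwen/Qwen2.5-32B-Instruct")
responses = pipe(["What is the capital of France?"])
print(responses[0].text)
```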
Unknown User•10mo ago
Message Not Public
Huh, okay
There are issues on GitHub as well, where people are stuck at 100% CPU but low GPU utilisation.
No solutions there either.
Unknown User•10mo ago
Message Not Public
In the vLLM GitHub repo.
At least LMDeploy seems to work for now
Thank you!
Unknown User•10mo ago
Message Not Public