vLLM does not seem to use the GPU
I'm using vLLM, and on the monitoring graph, when I launch some requests, only CPU usage increases.
If I open a terminal and run nvidia-smi, I don't see any process either.
Settings line:
--model NousResearch/Meta-Llama-3-8B-Instruct --max-model-len 8192 --port 8000 --dtype half --enable-chunked-prefill true --max-num-batched-tokens 6144 --gpu-memory-utilization 0.97
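For reference, here is a minimal sketch of a check I could run in the same Python environment where vLLM is launched (assuming a standard PyTorch/CUDA install, which vLLM depends on) to confirm whether the GPU is visible at all:

```python
# Minimal diagnostic sketch (not part of my setup): check whether PyTorch,
# in the environment used to launch vLLM, can see the GPU at all.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device count:", torch.cuda.device_count())
    print("Device 0:", torch.cuda.get_device_name(0))
```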


