Serverless Endpoint with vLLM (Qwen2.5-VL-3B-Instruct)
I’m trying to set up a Serverless Endpoint on RunPod with vLLM, running Qwen2.5-VL-3B-Instruct.
My goal is to generate a large number of image descriptions.
Here is how I set it up:
Docker Image:
GPU:
ENV vars:
This call with one image works:
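Roughly, the call is along these lines (a minimal sketch assuming the endpoint's OpenAI-compatible route; the endpoint ID, API key, and image URL are placeholders):

```python
# Minimal sketch of the single-image call, assuming the OpenAI-compatible
# route of the serverless endpoint. Endpoint ID, API key, and image URL
# are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    api_key="<RUNPOD_API_KEY>",
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-3B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```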
Now I have several questions.
Is it worth passing multiple images to the model in a single call? Will it be more efficient?
If so, how should I pass the parameters?
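I'm guessing the request would just carry extra image_url entries in the same content list, something like this (untested sketch, placeholder URLs):

```python
# Guess at a multi-image call (not tested): same chat format, with extra
# image_url entries in one user message. `client` is configured as in the
# sketch above.
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-3B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe each image, one per line."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo1.jpg"}},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo2.jpg"}},
            ],
        }
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```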
Did I miss anything in the ENV vars that would help it run faster?
Thank you very much for any help or tips you can give me.