Serverless Endpoint with vLLM (Qwen2.5-VL-3B-Instruct)
I’m trying to set up a Serverless Endpoint on RunPod with vLLM (running Qwen2.5-VL-3B-Instruct).
My goal is to generate a lot of image descriptions.
Here is how I set it up:
Docker Image:
GPU:
ENV vars:
This call with one image works:
Now I have several questions.
Is it worth passing multiple images to the model in a single call? Will it be more efficient?
If so, how should I pass the parameters?
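For context, here is the kind of payload I imagine for a multi-image call, assuming the endpoint is OpenAI-compatible (the URLs and the exact structure are my guess, not something I’ve confirmed works):

```python
import json

# Hypothetical multi-image request body for an OpenAI-compatible
# chat completions endpoint. The image URLs are placeholders; I'm
# not sure whether vLLM accepts several image_url parts per message.
payload = {
    "model": "Qwen/Qwen2.5-VL-3B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe each image briefly."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/image1.jpg"}},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/image2.jpg"}},
            ],
        }
    ],
}

# Serialize to JSON as it would be sent in the POST body.
body = json.dumps(payload)
print(body[:60])
```

Is something like this the right shape, or does each image need its own request?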
Did I miss anything in the ENV vars that would be important to go faster?
Thank you very much for any help or tips you can give me.