Serverless Endpoint with vLLM (Qwen2.5-VL-3B-Instruct)
I’m trying to set up a Serverless Endpoint on RunPod with vLLM serving Qwen2.5-VL-3B-Instruct. My goal is to generate a large number of image descriptions. Here is how I set it up:
Now I have several questions. Is it worth passing multiple images to the model in a single call? Would that be more efficient? If so, how should I pass the parameters? Did I miss any ENV vars that would be important for speed? Thank you very much for any help or tips you can give me.
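For reference, the OpenAI-compatible chat API that vLLM exposes does accept several `image_url` parts inside one user message, so multiple images per call is possible. A minimal sketch of building such a request; the endpoint URL, API key, and model name in the commented send step are placeholders for your RunPod endpoint values:

```python
# Sketch: one chat message carrying a text prompt plus several images,
# in the OpenAI vision message format that vLLM's chat endpoint accepts.

def build_multi_image_messages(image_urls, prompt):
    """Return a messages list whose single user turn mixes a text part
    with one image_url part per image."""
    content = [{"type": "text", "text": prompt}]
    content += [
        {"type": "image_url", "image_url": {"url": url}}
        for url in image_urls
    ]
    return [{"role": "user", "content": content}]

messages = build_multi_image_messages(
    ["https://example.com/a.jpg", "https://example.com/b.jpg"],
    "Describe each image in one sentence.",
)

# Sending it would look roughly like this (requires the `openai` package;
# base_url and api_key below are hypothetical placeholders):
# from openai import OpenAI
# client = OpenAI(base_url="https://<endpoint-id>.api.runpod.ai/v1",
#                 api_key="<runpod-api-key>")
# resp = client.chat.completions.create(
#     model="Qwen/Qwen2.5-VL-3B-Instruct", messages=messages)
```

Note that vLLM also batches concurrent requests server-side (continuous batching), so firing many single-image requests in parallel is another way to keep the GPU busy; which is faster for your workload is worth measuring.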