vLLM Dynamic Batching

Hi, I currently use a locally hosted exl2 setup but want to migrate my inference to RunPod serverless. My use case requires processing hundreds, sometimes thousands of prompts at the same time, and I currently rely on exl2's dynamic batching to figure out the optimal collation for batch processing. Does the vLLM backend support taking in thousands of prompts (some of which could be close to 4096 tokens long) through the OpenAI API, processing them as a single job, and returning the results as a batch?
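
For context, the client side I have in mind is roughly the sketch below: fire all prompts concurrently at the OpenAI-compatible endpoint and let vLLM's continuous batching handle the scheduling. The base URL, model name, and concurrency cap are placeholders, not something I've tested against RunPod yet.

```python
import asyncio
from openai import AsyncOpenAI

# Placeholders: swap in your serverless endpoint URL, API key, and model name.
client = AsyncOpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    api_key="YOUR_API_KEY",
)

async def complete(prompt: str, sem: asyncio.Semaphore) -> str:
    # Cap in-flight requests client-side; vLLM batches whatever arrives concurrently.
    async with sem:
        resp = await client.completions.create(
            model="MODEL_NAME",
            prompt=prompt,
            max_tokens=512,
        )
        return resp.choices[0].text

async def run(prompts: list[str]) -> list[str]:
    sem = asyncio.Semaphore(256)  # hypothetical concurrency limit
    return await asyncio.gather(*(complete(p, sem) for p in prompts))

if __name__ == "__main__":
    prompts = [f"Summarize item {i}." for i in range(1000)]
    results = asyncio.run(run(prompts))
    print(len(results))
```

Is this the intended pattern, or is there a dedicated batch-job interface I should be using instead?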