I've been using vLLM on my homelab servers for a while, and I'm looking to add the ability to scale my application using RunPod. On my locally hosted vLLM instances, I use guided output via the "outlines" guided-decoding backend to constrain LLM output to specified JSON Schemas or regexes.
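For context, here's a minimal sketch of the kind of request I send to my local vLLM instance. The base URL, model name, and schema are placeholders, not my real setup; the relevant part is vLLM's non-standard `guided_json` / `guided_decoding_backend` fields, which the OpenAI-compatible server accepts as extra body parameters:

```python
import json

# Example JSON Schema the model output must conform to (placeholder schema)
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# vLLM's OpenAI-compatible server accepts guided-decoding options as
# extra (non-standard) fields on /v1/chat/completions:
extra_body = {
    "guided_json": person_schema,           # constrain output to this schema
    "guided_decoding_backend": "outlines",  # select the outlines backend
}

# With the `openai` client pointed at a local vLLM server, the call
# would look roughly like this (not executed here):
#
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
#   resp = client.chat.completions.create(
#       model="my-model",
#       messages=[{"role": "user", "content": "Describe a person as JSON."}],
#       extra_body=extra_body,
#   )

print(json.dumps(extra_body, indent=2))
```

The question below is whether RunPod's serverless OpenAI-compatible endpoint passes these extra fields through to vLLM.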
One question I haven't been able to find an answer to: does RunPod's serverless vLLM hosting support this functionality through its OpenAI-compatible API? (I assume it works on pods, since there you set up your own vLLM instance.)
It's looking like the answer is no, but I'm hoping it's "yes," as I'd really like to combine the benefits of serverless hosting with guided output.
Appreciate any help or insight you can provide. Thanks in advance, cheers.