Hi I'm new in Runpod. I deployed a finetined LLM using the vLLM template in Runpod. I'm having problems with the API, when I fire requests using the OpenAI chat completions API it gets stuck processing the request for a couple of minutes and returns 500. When I hit the API using the Runpod endpoint console and afterwards hit it again with the request that 500'd previously it works as expected in about 8 seconds. I am doing this without using a Handler, am I doing something wrong?
Continue the conversation
Join the Discord to ask follow-up questions and connect with the community
R
Runpod
We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!