Created by whirr on 4/24/2024 in #⚡|serverless
API not properly propping up?
Hi I'm new in Runpod. I deployed a finetined LLM using the vLLM template in Runpod. I'm having problems with the API, when I fire requests using the OpenAI chat completions API it gets stuck processing the request for a couple of minutes and returns 500. When I hit the API using the Runpod endpoint console and afterwards hit it again with the request that 500'd previously it works as expected in about 8 seconds. I am doing this without using a Handler, am I doing something wrong?
3 replies