vLLM worker OpenAI stream timeout
The OpenAI client code from the tutorial (https://docs.runpod.io/serverless/workers/vllm/openai-compatibility#streaming-responses-1) is not reproducible for me.
I'm hosting a 70B model, which usually has a ~2 minute delay per request.
Using the OpenAI client with stream=True times out after ~1 min and returns nothing. Any solutions?
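Roughly what I'm calling, as a minimal sketch based on the tutorial (endpoint ID, API key, and model name are placeholders; the explicit client timeout is an extra beyond the tutorial snippet):

```python
from openai import OpenAI

# Placeholders: endpoint ID, API key, and model name are specific to my deployment
client = OpenAI(
    api_key="<RUNPOD_API_KEY>",
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    timeout=600,  # client-side timeout raised well above the ~2 min request delay
)

stream = client.chat.completions.create(
    model="<MODEL_NAME>",  # the Hugging Face model ID the worker was started with
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

# The connection is closed after ~1 min, before any chunks arrive
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```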
11 Replies
Did you set the model name?
Or was it left as-is, MODEL_NAME?
MODEL_NAME is a Hugging Face link, as usual
Basically, what I experience is that the server closes the connection after ~1 min when stream=True; non-streaming works fine.
Yes, this is what I meant, sorry.
I'm not sure how MODEL_NAME affects this problem at all.
Yes, this waits for the whole request to finish.
Adding stream=True sends the request, which I can see in the dashboard, but it terminates the connection after ~1 min.
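To make the two cases concrete, here's a sketch of both calls (endpoint ID, API key, and model name are placeholders):

```python
from openai import OpenAI

client = OpenAI(
    api_key="<RUNPOD_API_KEY>",
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
)

# Non-streaming: blocks for ~2 min, then returns the full completion as expected
resp = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

# Streaming: the request appears in the dashboard, but the connection
# is terminated after ~1 min and no chunks are ever received
stream = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```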
Nope
@Misterion
Escalated To Zendesk
The thread has been escalated to Zendesk!
Same issue here, but even without streaming.