Runpod · 11mo ago
Misterion

vllm worker OpenAI stream timeout

The OpenAI client code from the tutorial (https://docs.runpod.io/serverless/workers/vllm/openai-compatibility#streaming-responses-1) is not reproducible. I'm hosting a 70B model, which usually has a ~2 min delay per request. Using the openai client with stream=True times out after ~1 min and returns nothing. Any solutions?
11 Replies
wiki · 11mo ago
Did you set the model name? Or did you leave it as MODEL_NAME?
Misterion (OP) · 11mo ago
MODEL_NAME is a Hugging Face repo ID, as usual. Basically, what I'm seeing is that the server closes the connection after ~1 min when stream=True; non-streaming works fine.
Unknown User · 11mo ago
Message Not Public
Misterion (OP) · 11mo ago
Yes, that's what I meant. Sorry, but I'm not sure how MODEL_NAME affects this problem at all.
Unknown User · 11mo ago
Message Not Public
Misterion (OP) · 11mo ago
Yes, this waits for the whole request to finish.
client = OpenAI(
    base_url=f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1",
    api_key=api_key,
)

stream = client.chat.completions.create(
    model="neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8",
    messages=[
        {
            "role": "user",
            "content": "Say hello!",
        },
    ],
)
Adding stream=True sends the request, which I can see in the dashboard, but the connection is terminated after ~1 min.
Unknown User · 11mo ago
Message Not Public
Misterion (OP) · 11mo ago
Nope
Unknown User · 11mo ago
Message Not Public
Poddy · 11mo ago
@Misterion
Escalated To Zendesk
The thread has been escalated to Zendesk!
Justin · 11mo ago
Same issue here, but even without streaming.