Runpod · 11mo ago
Misterion

vllm worker OpenAI stream timeout

The OpenAI client code from the tutorial (https://docs.runpod.io/serverless/workers/vllm/openai-compatibility#streaming-responses-1) is not reproducible. I'm hosting a 70B model, which usually has a ~2 min delay per request. Using the openai client with stream=True times out after ~1 min and returns nothing. Any solutions?
11 Replies
wiki · 11mo ago
Did you set the model name? Or did you leave it as MODEL_NAME?
Misterion (OP) · 11mo ago
MODEL_NAME is a Hugging Face repo ID, as usual. Basically, what I'm seeing is that the server closes the connection after ~1 min when stream=True; non-streaming works fine.
Unknown User · 11mo ago
Message Not Public
Misterion (OP) · 11mo ago
Yes, that's what I meant. Sorry, but I'm not sure how MODEL_NAME affects this problem at all.
Unknown User · 11mo ago
Message Not Public
Misterion (OP) · 11mo ago
Yes, this waits for the whole request to finish.
client = OpenAI(
    base_url=f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1",
    api_key=api_key,
)

stream = client.chat.completions.create(
    model="neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8",
    messages=[
        {
            "role": "user",
            "content": "Say hello!",
        },
    ],
)
Adding stream=True sends the request, which I can see in the dashboard, but the connection is terminated after ~1 min.
Unknown User · 11mo ago
Message Not Public
Misterion (OP) · 11mo ago
Nope
Unknown User · 11mo ago
Message Not Public
Poddy · 11mo ago
@Misterion
Escalated To Zendesk
The thread has been escalated to Zendesk!
Justin · 11mo ago
Same issue here, but even without streaming.