vLLM worker OpenAI stream timeout
The OpenAI client streaming example from the tutorial (https://docs.runpod.io/serverless/workers/vllm/openai-compatibility#streaming-responses-1) isn't working for me.
I'm hosting a 70B model, which usually takes ~2 minutes to respond to a request.
Using the OpenAI client with stream=True, the call times out after ~1 minute and returns nothing. Any solutions? My setup is roughly the sketch below.
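For reference, this is approximately what I'm running, adapted from the tutorial; the endpoint ID, API key, and model name are placeholders for my actual values:

```python
from openai import OpenAI

# Placeholders: substitute your real RunPod API key, endpoint ID, and model name.
client = OpenAI(
    api_key="RUNPOD_API_KEY",
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",
)

# Streaming chat completion; with the 70B model this sits for ~1 minute
# and then gives up without yielding any chunks.
stream = client.chat.completions.create(
    model="YOUR_70B_MODEL",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```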