vLLM containers serving LLM streaming requests abruptly stop streaming tokens
This has been an issue for a while, and I thought it was a vLLM thing, but I've deployed the same image to AWS and have never hit these issues there. It happens on A40s and isn't region-specific, while the A10 equivalent on AWS doesn't have the problem. I've looked through the RunPod container logs and there's nothing unusual in them.
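For anyone trying to narrow this down, here is a minimal diagnostic sketch that logs the gap between streamed chunks so an abrupt stop shows up clearly in the output. It assumes a vLLM OpenAI-compatible endpoint; the BASE_URL, model name, prompt, and the 5-second flag threshold are all placeholders, not values from this thread.
```python
# Diagnostic sketch: log gaps between streamed chunks from a vLLM
# OpenAI-compatible endpoint. Endpoint and model are placeholders.
import time

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder endpoint
    api_key="EMPTY",
)

stream = client.chat.completions.create(
    model="my-model",  # placeholder model name
    messages=[{"role": "user", "content": "Write a long story."}],
    stream=True,
)

last = time.monotonic()
for chunk in stream:
    gap = time.monotonic() - last
    last = time.monotonic()
    if gap > 5.0:  # arbitrary threshold for a suspiciously long pause
        print(f"\n[stall?] {gap:.1f}s since previous chunk")
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
If the stream dies cleanly, the loop simply ends early; if the server goes silent without closing the connection, the client hangs until its own timeout, which is the failure mode worth distinguishing.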
6 Replies
Unknown User•4mo ago
Message Not Public
what kind of workarounds are possible here?
Unknown User•3mo ago
Message Not Public
to be honest it's been quite variable, but from my testing I don't think it has ever stopped before 30 seconds into the stream
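Given that 30-second floor, one client-side workaround is to set an httpx read timeout so a stalled stream fails fast and can be retried, rather than hanging. A rough sketch, assuming the same placeholder endpoint and model as above; depending on the openai client version the stall can surface as httpx.ReadTimeout or openai.APITimeoutError, so both are caught.
```python
# Workaround sketch: abort a silent stream after 30 s and retry.
# Endpoint and model are placeholders, not values from this thread.
import httpx
import openai
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder endpoint
    api_key="EMPTY",
    # read= bounds the wait between received chunks, so a stream that
    # goes silent for 30 s raises instead of hanging indefinitely
    timeout=httpx.Timeout(connect=10.0, read=30.0, write=10.0, pool=10.0),
)

def stream_with_retry(prompt: str, max_attempts: int = 3) -> str:
    """Restart the whole request if the stream stalls mid-reply."""
    for attempt in range(1, max_attempts + 1):
        try:
            stream = client.chat.completions.create(
                model="my-model",  # placeholder model name
                messages=[{"role": "user", "content": prompt}],
                stream=True,
            )
            return "".join(
                chunk.choices[0].delta.content or "" for chunk in stream
            )
        except (httpx.ReadTimeout, openai.APITimeoutError):
            print(f"stream stalled on attempt {attempt}, retrying...")
    raise RuntimeError("stream kept stalling after retries")
```
Retrying from scratch does throw away the partial output; carrying the partial text into a follow-up prompt is possible but model-dependent, so it's left out of the sketch.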
Unknown User•3mo ago
Message Not Public
ok, I can try an H100