Runpod · 4mo ago

vLLM containers serving LLM streaming requests abruptly stop streaming tokens

This has been an issue for a while. I first assumed it was a vLLM problem, but I've deployed the same image to AWS and it has never happened there. The issue occurs on A40 pods and is not region-specific; the A10-equivalent instances on AWS are unaffected. I've checked the RunPod container logs and there's nothing unusual around the time streaming stops.
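One way to pin down where the stall happens (a diagnostic sketch, not anything from the thread): wrap whatever token iterator your streaming client exposes, for example the chunks of a vLLM `/v1/completions` response with `stream=True`, and record the longest gap between tokens. The function name and the injectable `clock` parameter are my own choices for testability.

```python
import time
from typing import Callable, Iterable, List, Tuple


def log_token_gaps(tokens: Iterable[str],
                   clock: Callable[[], float] = time.monotonic) -> Tuple[List[str], float]:
    """Consume a token stream and report the longest inter-token gap.

    Feed this the token/chunk iterator from your streaming client to
    see whether the stream simply goes quiet mid-response. `clock` is
    injectable so the logic can be tested without real delays.
    """
    out: List[str] = []
    worst_gap = 0.0
    last = clock()  # time the request started producing
    for tok in tokens:
        now = clock()
        worst_gap = max(worst_gap, now - last)
        last = now
        out.append(tok)
    return out, worst_gap
```

If the worst gap on RunPod grows without bound while the same client against AWS stays in the tens of milliseconds, that points at the pod/network layer rather than vLLM itself.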
6 Replies
Unknown User · 4mo ago (message not public)
ZOP · 3mo ago
What kind of workarounds are possible here?
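One client-side mitigation (a sketch of my own, not a RunPod or vLLM recommendation): treat a stream that produces no token within some timeout as dead, so the caller can retry instead of hanging. The name `stream_with_timeout` and the watchdog-thread approach are assumptions, not anything from this thread.

```python
import queue
import threading
from typing import Iterator


def stream_with_timeout(tokens: Iterator[str],
                        per_token_timeout: float = 30.0) -> Iterator[str]:
    """Yield tokens from `tokens`, raising TimeoutError if the stream stalls.

    A background thread pumps the upstream iterator into a queue; the
    consumer waits at most `per_token_timeout` seconds per token, so a
    silently stalled stream surfaces as an exception the caller can
    catch and turn into a retry.
    """
    q: "queue.Queue[object]" = queue.Queue()
    _DONE = object()

    def pump() -> None:
        try:
            for tok in tokens:
                q.put(tok)
            q.put(_DONE)
        except Exception as exc:  # forward producer errors to the consumer
            q.put(exc)

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            item = q.get(timeout=per_token_timeout)
        except queue.Empty:
            raise TimeoutError("stream stalled: no token within timeout")
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

This doesn't fix the underlying stall, but it bounds how long a request can hang and gives you a hook for retry logic.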
Unknown User · 3mo ago (message not public)
ZOP · 3mo ago
To be honest it's been quite variable; from my testing, the stream never stalls before the 30-second mark.
Unknown User · 3mo ago (message not public)
ZOP · 3mo ago
OK, I can try an H100.
