vLLM containers serving LLM streaming requests abruptly stop streaming tokens
This has been an issue for a while, and I thought it was a vLLM thing, but I've deployed the same image to AWS and have never hit these issues there. It happens on A40s and isn't region-specific, while the A10 equivalent on AWS doesn't have the problem. I've looked through the RunPod container logs and there's nothing unusual in them.
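For anyone trying to narrow this down, here is a minimal diagnostic sketch that logs the gap between streamed chunks so an abrupt stop shows up clearly in the output. It assumes a vLLM OpenAI-compatible endpoint; the BASE_URL, model name, prompt, and the 5-second flag threshold are all placeholders, not values from this thread.
```python
# Diagnostic sketch: log gaps between streamed chunks from a vLLM
# OpenAI-compatible endpoint. Endpoint and model are placeholders.
import time

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder endpoint
    api_key="EMPTY",
)

stream = client.chat.completions.create(
    model="my-model",  # placeholder model name
    messages=[{"role": "user", "content": "Write a long story."}],
    stream=True,
)

last = time.monotonic()
for chunk in stream:
    gap = time.monotonic() - last
    last = time.monotonic()
    if gap > 5.0:  # arbitrary threshold for a suspiciously long pause
        print(f"\n[stall?] {gap:.1f}s since previous chunk")
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
If the stream dies cleanly, the loop simply ends early; if the server goes silent without closing the connection, the client hangs until its own timeout, which is the failure mode worth distinguishing.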
6 Replies
Unknown User•4mo ago
Message Not Public
what kind of workarounds are possible here?
Unknown User•3mo ago
Message Not Public
to be honest it's been quite variable, but from my testing I don't think it has ever stopped before 30 seconds into the stream
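Given that 30-second floor, one client-side workaround is to set an httpx read timeout so a stalled stream fails fast and can be retried, rather than hanging. A rough sketch, assuming the same placeholder endpoint and model as above; depending on the openai client version the stall can surface as httpx.ReadTimeout or openai.APITimeoutError, so both are caught.
```python
# Workaround sketch: abort a silent stream after 30 s and retry.
# Endpoint and model are placeholders, not values from this thread.
import httpx
import openai
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder endpoint
    api_key="EMPTY",
    # read= bounds the wait between received chunks, so a stream that
    # goes silent for 30 s raises instead of hanging indefinitely
    timeout=httpx.Timeout(connect=10.0, read=30.0, write=10.0, pool=10.0),
)

def stream_with_retry(prompt: str, max_attempts: int = 3) -> str:
    """Restart the whole request if the stream stalls mid-reply."""
    for attempt in range(1, max_attempts + 1):
        try:
            stream = client.chat.completions.create(
                model="my-model",  # placeholder model name
                messages=[{"role": "user", "content": prompt}],
                stream=True,
            )
            return "".join(
                chunk.choices[0].delta.content or "" for chunk in stream
            )
        except (httpx.ReadTimeout, openai.APITimeoutError):
            print(f"stream stalled on attempt {attempt}, retrying...")
    raise RuntimeError("stream kept stalling after retries")
```
Retrying from scratch does throw away the partial output; carrying the partial text into a follow-up prompt is possible but model-dependent, so it's left out of the sketch.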
Unknown User•3mo ago
Message Not Public
ok, I can try an H100