Runpod · 6mo ago

vLLM containers serving LLM streaming requests abruptly stop streaming tokens

This has been an issue for a while. I initially assumed it was a vLLM problem, but I've deployed the same image to AWS and it has never happened there. The issue occurs on A40 pods and isn't region-specific; the A10 equivalent on AWS doesn't show it. I've looked through the Runpod container logs and there's nothing unusual in them.
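For anyone trying to reproduce or narrow this down, here's a minimal client-side sketch that streams from a vLLM OpenAI-compatible `/v1/completions` endpoint and times the gap between chunks, so a stall surfaces as a timeout instead of hanging silently. The `BASE_URL` and `MODEL` values are placeholders; substitute your pod's URL and served model name.

```python
import time

import requests

# Placeholders: point these at your pod's endpoint and served model.
BASE_URL = "http://localhost:8000"
MODEL = "my-model"

payload = {
    "model": MODEL,
    "prompt": "Write a short story about a robot.",
    "max_tokens": 512,
    "stream": True,
}

# The read timeout bounds the gap between chunks: if the server stops
# streaming for 30s, requests raises ReadTimeout instead of blocking forever.
last = time.monotonic()
try:
    with requests.post(
        f"{BASE_URL}/v1/completions",
        json=payload,
        stream=True,
        timeout=(5, 30),  # (connect, per-read) seconds
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue  # skip SSE keep-alive blank lines
            now = time.monotonic()
            print(f"[+{now - last:6.2f}s] {line[:80]!r}")
            last = now
except requests.exceptions.ReadTimeout:
    print("Stream stalled: no data received for 30s")
```

Logging the inter-chunk gaps on both Runpod and AWS with the same image should show whether the A40 pods stall mid-stream or the connection is being dropped somewhere in between.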