shensmobile · 2y ago · 44 replies

vLLM streaming ends prematurely

I'm having issues with my vLLM worker ending a generation early. When I send a prompt to my API without "stream": true, the full response comes back. When "stream": true is added to the request, generation stops early, sometimes right after {"user":"assistant"} is sent. It was working earlier this morning; I see this in the system logs around the time it stopped working:

2024-06-13T15:37:10Z create pod network
2024-06-13T15:37:10Z create container runpod/worker-vllm:stable-cuda12.1.0
2024-06-13T15:37:11Z start container

Was a newer version pushed? I see there were two new updates pushed to the vllm_worker GitHub repo in the last 24 hours.
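For reference, this is roughly the pattern I'm describing, as a minimal sketch rather than my exact client. The endpoint paths (/runsync, /run, /stream) follow the RunPod serverless API, and the input keys ("prompt", "sampling_params", "stream") are what I understand the worker-vllm README to expect; ENDPOINT_ID and API_KEY are placeholders.

```python
# Sketch: compare a non-streaming /runsync call with a streaming /run + /stream
# poll against a RunPod serverless vLLM endpoint. Placeholder IDs/keys and
# assumed worker-vllm input keys; adjust to your actual setup.
import time
import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

payload = {
    "input": {
        "prompt": "Write a short story about a robot.",
        "sampling_params": {"max_tokens": 512, "temperature": 0.7},
    }
}

# Non-streaming: /runsync returns the whole completion in one response.
resp = requests.post(f"{BASE}/runsync", json=payload, headers=HEADERS, timeout=300)
print("non-streaming output:", resp.json().get("output"))

# Streaming: submit the job with "stream": true, then poll /stream/{job_id}
# and print partial chunks until the job reports a terminal status.
payload["input"]["stream"] = True
job = requests.post(f"{BASE}/run", json=payload, headers=HEADERS, timeout=30).json()
job_id = job["id"]

while True:
    chunk = requests.get(f"{BASE}/stream/{job_id}", headers=HEADERS, timeout=60).json()
    for part in chunk.get("stream", []):
        print(part.get("output"), end="", flush=True)
    if chunk.get("status") in ("COMPLETED", "FAILED", "CANCELLED"):
        break
    time.sleep(0.5)
```

With this setup, the non-streaming call completes normally, while the streaming loop is where the generation cuts off early.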