Created by shensmobile on 6/13/2024 in #⚡|serverless
vLLM streaming ends prematurely
I'm having issues with my vLLM worker ending a generation early. When I send the same prompt to my API without "stream": true, the response comes back in full. When "stream": true is added, the generation stops early, sometimes right after {"user":"assistant"} is sent.

It was working earlier this AM. I see this in the system logs around the time it stopped working:

2024-06-13T15:37:10Z create pod network
2024-06-13T15:37:10Z create container runpod/worker-vllm:stable-cuda12.1.0
2024-06-13T15:37:11Z start container

Was a newer version pushed? I see two new updates were pushed to the vllm_worker GitHub repo in the last 24 hours.
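For reference, here's roughly what I'm sending. The endpoint ID and sampling params below are placeholders, not my real values, and I'm assuming the standard worker-vllm input schema; the only difference between the working and broken cases is the "stream" flag:

```python
import json

# Placeholder, not my real endpoint ID
ENDPOINT_ID = "YOUR_ENDPOINT_ID"

def build_payload(prompt: str, stream: bool) -> dict:
    """Build the request body for the worker's /run endpoint.

    The only thing I change between the two cases is `stream`;
    the sampling params here are illustrative placeholders.
    """
    return {
        "input": {
            "prompt": prompt,
            "stream": stream,
            "sampling_params": {"max_tokens": 512},
        }
    }

# Non-streaming: completes fully. Streaming: cuts off early for me.
payload_ok = build_payload("Hello", stream=False)
payload_broken = build_payload("Hello", stream=True)
print(json.dumps(payload_broken, indent=2))
```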