We're running worker-vllm with an AWQ model in production, and it recently turned out that there are problems with scaling it: all of the requests are being sent to a single worker.

I'd also like to build a custom Docker image for worker-vllm. It works when using a pre-built Docker image, but I need a custom image with a slightly modified vLLM (there's one minor update that negatively affects the quality of the outputs). How can I do this with worker-vllm? Thanks in advance!
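One possible approach, sketched below, is to layer a patched vLLM on top of the pre-built worker-vllm image rather than rebuilding it from scratch. Note that the base image tag, fork URL, and branch name here are placeholders, not actual values from worker-vllm's documentation; substitute the pre-built image you already use and your own vLLM fork:

```dockerfile
# Hypothetical Dockerfile sketch: start from the pre-built worker-vllm
# image (placeholder tag) and overwrite its vLLM install with a fork
# that reverts the problematic update.
FROM runpod/worker-vllm:latest

# Force-reinstall vLLM from a patched fork (placeholder URL/branch).
RUN pip install --no-cache-dir --force-reinstall \
    "vllm @ git+https://github.com/your-org/vllm.git@your-patched-branch"
```

Whether this works depends on the fork staying binary-compatible with the CUDA and Python versions baked into the base image; if it is not, building the repository's own Dockerfile with the modified vLLM source is the safer route.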