I deployed a serverless vLLM endpoint using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, but when I make a request the output is only 16 tokens long (tested many times). I didn't change anything from the default settings except max_model_length, which I set to 32768. How can I fix this, or did I miss a config option? A rough sketch of how I'm calling it is below.
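This is roughly what my request looks like (a minimal sketch: the endpoint ID, API key, and prompt are placeholders, and I'm going through the OpenAI-compatible route of the worker):

```python
from openai import OpenAI

# Placeholders: substitute your actual Runpod endpoint ID and API key.
client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    api_key="<RUNPOD_API_KEY>",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    messages=[{"role": "user", "content": "Explain the Pythagorean theorem."}],
    # Note: I'm not setting max_tokens here, just using whatever the default is.
)
print(response.choices[0].message.content)  # comes back truncated after ~16 tokens
```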