Total token limit stuck at 131

I use vLLM and set the max model length to 8000, but the output is only 131 tokens total (input + output combined), even though I set max tokens to 2048. I tried two models and got the same result.
8 Replies
Jason · 2w ago
Hi, you need to put the max output tokens in the right place in the input:
```json
{
  "input": {
    "messages": [
      {
        "role": "system",
        "content": "You are an AI assistant."
      },
      {
        "role": "user",
        "content": "Explain llm models"
      }
    ],
    "sampling_params": {
      "max_tokens": 3000,
      "temperature": 0.7,
      "top_p": 0.95,
      "n": 1,
      "stream": false,
      "stop": [],
      "presence_penalty": 0,
      "frequency_penalty": 0,
      "logit_bias": {},
      "best_of": 1
    }
  }
}
```
Something like this; feel free to modify it. The key point is that max_tokens has to go inside sampling_params.
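For reference, here's a minimal sketch of posting that payload from Python. The endpoint ID and API key are placeholders, and I'm assuming the standard /runsync route; adjust for your setup:
```python
import requests

# Placeholder credentials -- replace with your own.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-runpod-api-key"

payload = {
    "input": {
        "messages": [
            {"role": "system", "content": "You are an AI assistant."},
            {"role": "user", "content": "Explain llm models"},
        ],
        # max_tokens belongs inside sampling_params, not at the top level.
        "sampling_params": {
            "max_tokens": 3000,
            "temperature": 0.7,
            "top_p": 0.95,
        },
    }
}

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=120,
)
print(resp.json())
```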
riverfog7 · 2w ago
Does RunPod do JSON schema validation? Why did that invalid JSON not cause an error?
Jason · 2w ago
Which one? Oh, for vLLM's unknown input, right? Hmm, yeah, interesting. Does RunPod check them?
riverfog7 · 2w ago
It should return a 4xx error (if they do validation).
Jason · 2w ago
Is 4xx an HTTP status code? I see, yeah, it should work that way.
riverfog7 · 2w ago
Yes. 4xx: essentially it's your fault. 5xx: uh oh, I messed up. 3xx: go somewhere else.
Jason · 2w ago
Actually, I checked; I don't see any validate() calls in the vLLM worker.
riverfog7 · 2w ago
That's unfortunate. Pydantic schema and JSON validation would be nice.
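For what it's worth, a minimal sketch of what that could look like: a hypothetical Pydantic model mirroring the request shape above, not anything that actually exists in the vLLM worker today:
```python
from pydantic import BaseModel, ValidationError

# Hypothetical schema for the worker input shown earlier;
# the actual vLLM worker does no such validation.
class Message(BaseModel):
    role: str
    content: str

class SamplingParams(BaseModel):
    max_tokens: int = 16
    temperature: float = 1.0
    top_p: float = 1.0

class WorkerInput(BaseModel):
    messages: list[Message]
    sampling_params: SamplingParams = SamplingParams()

def validate(raw: dict) -> WorkerInput | None:
    try:
        return WorkerInput(**raw["input"])
    except (KeyError, ValidationError) as e:
        # In a real handler this would surface as a 4xx response.
        print(f"Bad request: {e}")
        return None
```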
