Total token limit stuck at 131
I use vLLM and set the max model length to 8000, but the output stops at 131 tokens total (input + output combined), even though I set max tokens to 2048. I tried two models and got the same result.


8 Replies
Hi, you need to use the right input field for the max output tokens.
Something like this, feel free to modify it (see the sketch below).
It should be max_tokens inside the sampling_params.
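For reference, a minimal sketch of what that request could look like, assuming the standard worker-vllm payload shape; the endpoint ID, API key, and prompt below are placeholders:

```python
import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder

payload = {
    "input": {
        "prompt": "Explain KV caching in one paragraph.",
        "sampling_params": {
            # Per-request output cap; separate from max_model_len,
            # which bounds input + output for the whole model.
            "max_tokens": 2048,
            "temperature": 0.7,
        },
    }
}

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=300,
)
print(resp.json())
```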
does RunPod do JSON schema validation?
why did that invalid JSON not cause an error?
Which?
Oh, for vLLM's unknown input fields, right?
Hmm, yeah, interesting. Does RunPod check them?
it should return a 4xx error
(if they do validation)
Is 4xx an HTTP status code?
I see, yeah, it should be that way.
essentially:
4xx: it's your fault
5xx: uh oh, I messed up
3xx: go somewhere else
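In code terms, a tiny sketch of those classes (a hypothetical helper, not from any library):

```python
def classify(status: int) -> str:
    """Map an HTTP status code to the rough meaning above."""
    if 300 <= status < 400:
        return "3xx: go somewhere else (redirect)"
    if 400 <= status < 500:
        return "4xx: it's your fault (client error, e.g. bad input)"
    if 500 <= status < 600:
        return "5xx: uh oh, the server messed up"
    return "2xx or other"

print(classify(422))  # 422 is a typical response to rejected input
```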
Actually, I checked and I don't see any validate() calls in the vLLM worker.
that's unfortunate
a Pydantic schema and JSON validation would be nice
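Something like this could catch bad input up front. A sketch only: the field names mirror the worker-vllm input shape from above, and extra="forbid" is what turns an unknown key into an error instead of silently dropping it:

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError

class SamplingParams(BaseModel):
    model_config = ConfigDict(extra="forbid")  # unknown keys become errors
    max_tokens: int = Field(default=16, ge=1)
    temperature: float = Field(default=1.0, ge=0.0)

class JobInput(BaseModel):
    model_config = ConfigDict(extra="forbid")
    prompt: str
    sampling_params: SamplingParams = Field(default_factory=SamplingParams)

# A misspelled field now fails loudly instead of being silently ignored,
# which is where a worker could return a 4xx.
try:
    JobInput(prompt="hi", sampling_params={"max_token": 2048})  # note the typo
except ValidationError as err:
    print(err)  # flags "max_token" as an extra field
```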