Runpod•3mo ago
Ellroy

Not possible to set temperature / top_p using Serverless vLLM via quick deploy?

By default, vLLM loads sampling parameters (e.g. temperature / top_p) from a model's generation_config.json if present (see here: https://github.com/vllm-project/vllm/issues/15241). To override this, you have to pass --generation-config when starting the vLLM server. Because RunPod's worker-vllm (https://github.com/runpod-workers/worker-vllm) doesn't expose an environment variable to pipe through a --generation-config value, does this mean it's not possible to change the temperature or top_p for any model deployed by Serverless vLLM quick deploy that has a generation_config.json file, e.g. all the Meta Llama models? Is the solution a custom Docker image / worker?
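For context, this is a minimal sketch of how I'd normally pass per-request sampling parameters to a Serverless vLLM endpoint; the endpoint ID and API key are placeholders, and the "sampling_params" field name follows my reading of the worker-vllm README:

```python
# Sketch: send a runsync request to a RunPod Serverless vLLM endpoint with
# per-request sampling parameters. ENDPOINT_ID and the API key env var are
# placeholders; "sampling_params" follows the worker-vllm README as I read it.
import os
import requests

ENDPOINT_ID = "your-endpoint-id"            # placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]      # placeholder env var

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {
            "prompt": "Write a haiku about GPUs.",
            "sampling_params": {"temperature": 0.2, "top_p": 0.9},
        }
    },
    timeout=120,
)
print(resp.json())
```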
5 Replies
3WaD
3WaD•3mo ago
generation_config="vllm" is required when using the OpenAI-compatible API too. I think the official template should take inspiration from this and allow the user to pass any vLLM engine argument / env variable. Ellroy, if you want a custom worker, check this one. We allow exactly that, even dynamically per cold-start request.
Ellroy
EllroyOP•3mo ago
vLLM ignores the input from the OpenAI client unless the vLLM server is started with --generation-config vllm. Thanks @3WaD - will check it out 🙂
Ellroy
EllroyOP•3mo ago
For anyone seeing this - I misunderstood how this works. Defaults are loaded from generation_config.json if present, but values passed at runtime still take precedence, as you would expect – https://chatgpt.com/share/68516a00-0b28-8000-b2b6-a51b890e4be0
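To illustrate, a minimal sketch using the OpenAI client against the endpoint's OpenAI-compatible route (the /openai/v1 base URL format and the model name are assumptions on my part); the temperature / top_p passed here take precedence over any defaults picked up from generation_config.json:

```python
# Sketch, assuming the worker-vllm OpenAI-compatible route at /openai/v1 and a
# Llama model name. Per-request temperature / top_p override defaults loaded
# from the model's generation_config.json.
import os
from openai import OpenAI

client = OpenAI(
    base_url=f"https://api.runpod.ai/v2/{os.environ['RUNPOD_ENDPOINT_ID']}/openai/v1",
    api_key=os.environ["RUNPOD_API_KEY"],
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",   # placeholder model name
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    temperature=0.2,   # overrides any generation_config.json default
    top_p=0.9,
)
print(completion.choices[0].message.content)
```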
3WaD
3WaD•3mo ago
For anyone overusing ChatGPT - the AI bot on the vLLM docs page has access to the entire source code, documentation, and GitHub issues and discussions.
