RUNPOD_API_KEY and MAX_CONTEXT_LEN_TO_CAPTURE
We are also starting a vLLM project and I have two questions:
1) In the environment variables, do I have to define the RUNPOD_API_KEY with my own secret key to access the final vLLM OpenAI endpoint?
2) Isn't MAX_CONTEXT_LEN_TO_CAPTURE now deprecated? Do we still need to provide it, if MAX_MODEL_LEN is already set?
Thank you
14 Replies
After some trial and error, I figured out the answer to 1): the RUNPOD_API_KEY environment variable has no effect here. You need to use an actual API key, which can be generated under Account -> Settings, to access the OpenAI URL.
I'm still not quite sure how to set the model length. I'm getting an error right now because Llama-3 only supports 8192 tokens; however, I was expecting it to use RoPE to increase that automatically. Is that not how it works? RoPE scaling is supported in vLLM: https://github.com/vllm-project/vllm/pull/555
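For what it's worth, here is a minimal sketch of how I'd expect RoPE scaling to be passed to vLLM directly via its Python API. This is only an assumption on my side: the rope_scaling keys differ between vLLM versions (some use "type", newer ones "rope_type"), and depending on the version the scaling may need to come from the model's HF config instead. The model name and factor are placeholders.

```python
# Sketch (unverified): extending Llama-3's native 8192-token window with
# dynamic RoPE scaling through vLLM's Python API. Keys and availability of
# the rope_scaling argument depend on the vLLM version you deploy.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",            # placeholder model
    max_model_len=16384,                                     # target context length
    rope_scaling={"rope_type": "dynamic", "factor": 2.0},    # 2x the native 8192
)

params = SamplingParams(max_tokens=256, temperature=0.7)
out = llm.generate(["Summarize RoPE scaling in one sentence."], params)
print(out[0].outputs[0].text)
```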
Yeah, that's easily done with Aphrodite-engine to increase the model length (at the cost of more memory); vLLM is quite limited here.
But based on that PR it must be possible, just not as straightforward, I guess.
You are right @nerdylive, but it's called MAX_MODEL_LEN.
I don't see how it's possible to set max_model_len to a value that's higher than what the model supports; that doesn't make sense to me @houmie
@Alpay Ariyak is the best person to advise on this.
In Aphrodite-engine I can set CONTEXT_LENGTH to 16384 and it automatically applies RoPE scaling; in return, it requires more memory.
See bullet point 3 (https://github.com/PygmalionAI/aphrodite-engine?tab=readme-ov-file#notes).
I'm using that in production right now. It really is possible 🙂
Guys, I really hope you can help me with question 1 about API keys.
Is there a way I could define the API key for vLLM myself, instead of having RunPod create it for me?
This one is quite urgent due to a migration request.
Of course, happy to help.
Thank you. And sorry, do you happen to know anything about the API key issue? I hope there is a way.
What is the API key issue? You have to generate an API key in the RunPod web console and use it to make requests; you can't use a custom API key, you have to use a RunPod one for RunPod serverless to function correctly.
This is also pretty clear in the docs: https://github.com/runpod-workers/worker-vllm
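For reference, a request then looks roughly like this. The /openai/v1 base URL follows the worker-vllm README; the endpoint ID and model name are placeholders for your own deployment, so treat this as a sketch rather than a guaranteed recipe.

```python
# Sketch: calling the worker-vllm OpenAI-compatible endpoint with a key
# generated under Account -> Settings in the RunPod console.
# <ENDPOINT_ID> and the model name are placeholders.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["RUNPOD_API_KEY"],  # the RunPod-generated key, not a custom one
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
)

resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```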
I see. Ok, so there is no way to set a custom key. Thanks
Nope, not possible. Create your own backend as a proxy in front of serverless if you want to use custom API keys.
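If it helps, here is a minimal sketch of such a proxy. FastAPI and httpx are arbitrary choices on my part, and the endpoint ID, env var names, and route are all placeholders; it only illustrates the idea of checking your own key and forwarding with the real RunPod key.

```python
# Minimal proxy sketch: clients authenticate with MY_CUSTOM_API_KEY, the proxy
# forwards the request to the RunPod serverless endpoint using the RunPod key.
# All names and the endpoint URL are illustrative, not part of RunPod itself.
import os
import httpx
from fastapi import FastAPI, Header, HTTPException, Request

app = FastAPI()
RUNPOD_KEY = os.environ["RUNPOD_API_KEY"]       # key generated in the RunPod console
CUSTOM_KEY = os.environ["MY_CUSTOM_API_KEY"]    # the key you hand out to your clients
RUNPOD_BASE = "https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1"  # placeholder

@app.post("/v1/chat/completions")
async def chat_completions(request: Request, authorization: str = Header("")):
    # Reject callers that don't present your custom key.
    if authorization != f"Bearer {CUSTOM_KEY}":
        raise HTTPException(status_code=401, detail="Invalid API key")
    body = await request.json()
    # Forward the untouched request body upstream with the real RunPod key.
    async with httpx.AsyncClient(timeout=300) as client:
        upstream = await client.post(
            f"{RUNPOD_BASE}/chat/completions",
            json=body,
            headers={"Authorization": f"Bearer {RUNPOD_KEY}"},
        )
    return upstream.json()
```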