Adding a Hugging Face access token to a vLLM serverless endpoint
Hi. How do I add a Hugging Face access token to a RunPod vLLM serverless endpoint? I tried adding it through the environment variables setting (the field that says "max 50"), where I typed HUGGING_FACE_HUB_TOKEN= followed by the token. But whenever I run a request, the state shows "in queue", the worker side shows "unhealthy", and the logs contain:

  File "/usr/local/lib/python3.10/dist-packages/vllm/transformers_utils/config.py", line 355, in get_config
    config_dict, _ = PretrainedConfig.get_config_dict(
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 649, in get_config_dict
Access to model meta-llama/Llama-3.1-8B-Instruct is restricted. You must have access to it and be authenticated to access it. Please log in.

I've already been granted access to that model, and the access token's permissions are set to "write". In the docs and screenshots I see a dedicated Hugging Face token input box, but in my endpoint UI I don't get that option. Is that expected?
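One thing I'm wondering about: since I typed HUGGING_FACE_HUB_TOKEN= plus the token into the value field, maybe the value ended up containing the variable name too, so the token the worker sees is malformed. A small local sketch of the check I mean (the token and the helper function here are placeholders, not anything from RunPod or vLLM):

```python
import os

# Placeholder token for illustration only; a real one starts with "hf_".
os.environ["HUGGING_FACE_HUB_TOKEN"] = "hf_exampleToken123"

def looks_like_bare_token(value: str) -> bool:
    """Hypothetical sanity check: the env var value should be the bare token,
    with no "KEY=" prefix and no whitespace accidentally pasted in."""
    return "=" not in value and " " not in value and value.startswith("hf_")

print(looks_like_bare_token(os.environ["HUGGING_FACE_HUB_TOKEN"]))   # True: bare token
print(looks_like_bare_token("HUGGING_FACE_HUB_TOKEN=hf_example"))    # False: key pasted into value
```

If that's the issue, the fix would just be putting HUGGING_FACE_HUB_TOKEN in the key field and only the token itself in the value field, but I'd like confirmation that this is how the endpoint reads it.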