Adding a Hugging Face access token to a vLLM serverless endpoint
Hi. How do I add a Hugging Face access token to a RunPod vLLM serverless endpoint? I tried adding it through the environment variables setting (where it says max 50) and entered HUGGING_FACE_HUB_TOKEN= followed by the token. But whenever I run a request, the state stays "in queue", the worker shows as unhealthy, and the logs show:

File "/usr/local/lib/python3.10/dist-packages/vllm/transformers_utils/config.py", line 355, in get_config
config_dict, _ = PretrainedConfig.get_config_dict(
File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 649, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 708, in _get_config_dict
resolved_config_file = cached_file(
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 321, in cached_file
file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 543, in cached_files
raise OSError(
OSError: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct.
401 Client Error. (Request ID: Root=1-68d26fda-316827ec000944685ee70a65;30dfdf37-c1e0-4658-bf30-9124f96b8ff3)
Cannot access gated repo for url https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/resolve/main/config.json.
Access to model meta-llama/Llama-3.1-8B-Instruct is restricted. You must have access to it and be authenticated to access it. Please log in.
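For reference, the config load the worker fails on can be reproduced outside the endpoint (a minimal sketch, assuming transformers is installed locally; the token value below is just a placeholder):

```python
import os

# same variable name I set on the endpoint; value here is a placeholder
os.environ["HUGGING_FACE_HUB_TOKEN"] = "hf_xxx"

from transformers import AutoConfig

# goes through the same cached_file() path as the traceback above, so it
# raises the same 401/gated-repo OSError if the token or access grant is wrong
config = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
print(config.model_type)
```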
I've been granted access to that model, and my access token's permissions are set to "write".
In the docs and screenshots I see a dedicated Hugging Face token input box, but in my endpoint UI I don’t get that option. Is that expected?
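To rule out the token itself, this is the kind of check I can run locally (a sketch assuming huggingface_hub is installed; the token string is a placeholder):

```python
from huggingface_hub import hf_hub_download, whoami

token = "hf_xxx"  # placeholder: same value as HUGGING_FACE_HUB_TOKEN on the endpoint

# fails with 401 if the token is invalid or revoked
print(whoami(token=token)["name"])

# fails with a gated-repo error if the account behind the token
# hasn't actually been granted access to the model
path = hf_hub_download(
    repo_id="meta-llama/Llama-3.1-8B-Instruct",
    filename="config.json",
    token=token,
)
print("config.json downloaded to:", path)
```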