worker-vllm cannot download private model
I built my image successfully, and it was able to download the model during the build. However, when I deploy it on RunPod Serverless, the worker fails to start on the first request because it cannot download the model at runtime. This is the build command I used:
```bash
export DOCKER_BUILDKIT=1
export HF_TOKEN="your_token"
docker build -t user/app:0.0.1 \
  --secret id=HF_TOKEN \
  --build-arg MODEL_NAME="my_model_path" \
  ./worker-vllm
```
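For context on why the build works but the deploy does not: a BuildKit secret is only mounted for the duration of the `RUN` step that requests it and never ends up in an image layer. A minimal sketch of such a step, assuming worker-vllm's Dockerfile does something along these lines (the download script name here is hypothetical):

```dockerfile
# The secret is mounted at /run/secrets/HF_TOKEN only while this RUN executes,
# so the token is available at build time but absent from the final image.
RUN --mount=type=secret,id=HF_TOKEN \
    HF_TOKEN=$(cat /run/secrets/HF_TOKEN) python3 /download_model.py
```

So the running container has no Hugging Face credentials unless they are provided again at deploy time.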
At startup, engine.py is not able to find the tokenizer. These are the engine args it is constructed with:

```json
{
"model":"<my_model_path>",
"download_dir":"/runpod-volume",
"quantization":"None",
"load_format":"auto",
"dtype":"auto",
"disable_log_stats":true,
"disable_log_requests":true,
"trust_remote_code":false,
"gpu_memory_utilization":0.95,
"max_parallel_loading_workers":48,
"max_model_len":"None", snapshot_download(
model,
cache_dir=download_dir,
allow_patterns=[
"*token*",
"config.json",
]
)
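Since no token is passed here, a private repo returns 401 at runtime even though the build (with the secret mounted) succeeded. Below is a sketch of the workaround I would expect, assuming HF_TOKEN is set as an environment variable on the Serverless endpoint; `token=` is a real `snapshot_download` parameter, but the surrounding wiring is illustrative, not worker-vllm's actual code:

```python
import os

from huggingface_hub import snapshot_download

# Placeholder values mirroring the engine args above.
model = "<my_model_path>"
download_dir = "/runpod-volume"

# Assumption: HF_TOKEN is configured as an endpoint environment variable in
# the RunPod Serverless template, so the worker can authenticate at runtime.
snapshot_download(
    model,
    cache_dir=download_dir,
    token=os.environ.get("HF_TOKEN"),
    allow_patterns=[
        "*token*",
        "config.json",
    ],
)
```

Alternatively, since the weights were already downloaded at build time, pointing `download_dir` at the baked-in model path instead of `/runpod-volume` may skip the runtime download entirely.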