Runpod · 2y ago · 9 replies
marshall

vllm + Ray issue: Stuck on "Started a local Ray instance."

Trying to run TheBloke/goliath-120b-AWQ on vllm + runpod with 2x48GB GPUs:
2024-02-03T12:36:44.148649796Z The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
2024-02-03T12:36:44.149745508Z 
0it [00:00, ?it/s]
0it [00:00, ?it/s]
2024-02-03T12:36:44.406220237Z WARNING 02-03 12:36:44 config.py:175] awq quantization is not fully optimized yet. The speed can be slower than non-quantized models.
2024-02-03T12:36:46.465465797Z 2024-02-03 12:36:46,465    INFO worker.py:1724 -- Started a local Ray instance.


It's stuck on "Started a local Ray instance." and never progresses. I've tried both with and without RunPod's FlashBoot.

Has anyone encountered this issue before?

requirements.txt:
vllm==0.2.7
runpod==1.4.0
ray==2.9.1


build script:
from huggingface_hub import snapshot_download

snapshot_download(
    "TheBloke/goliath-120b-AWQ",
    local_dir="model",
    local_dir_use_symlinks=False
)
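For reference, a quick sanity check I run after the download to make sure the snapshot actually landed where vLLM will look for it (hypothetical helper, file list is just the minimum I'd expect):

```python
from pathlib import Path

def model_files_present(model_dir, needed=("config.json",)):
    # Verify the downloaded snapshot contains the files vLLM needs.
    # "needed" is just a minimal sketch; AWQ repos also ship safetensors shards.
    root = Path(model_dir)
    return all((root / name).exists() for name in needed)
```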


initialization code:
import os

from vllm import AsyncLLMEngine, AsyncEngineArgs

llm = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(
        model="./model",
        quantization="awq",
        tensor_parallel_size=int(os.getenv("tensor_parallel_size", 1)),
    )
)
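One thing worth checking before the engine starts: vLLM only spins up Ray when tensor_parallel_size > 1, so if the env var isn't set (or exceeds the GPUs the container actually sees) the behavior changes. A small sanity-check sketch (hypothetical helper name; in the pod I'd pass torch.cuda.device_count() as visible_gpus):

```python
import os

def resolve_tp_size(visible_gpus, default=1):
    # Parse tensor_parallel_size from the environment and sanity-check it
    # against the number of GPUs the process can actually see.
    tp = int(os.getenv("tensor_parallel_size", default))
    if tp > visible_gpus:
        raise ValueError(
            f"tensor_parallel_size={tp} exceeds visible GPUs ({visible_gpus})"
        )
    return tp
```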