trl vllm-serve not binding to port
I have a pod with two A6000s and I am trying to run vLLM on one of them via:
AFAICT, the model launches fine, but there seems to be a problem binding to the port: I see nothing when running lsof -i :8000.
Is there any obvious additional configuration I need to do?
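Besides lsof, a quick way to confirm whether anything actually bound the port is a plain TCP connect probe. A minimal sketch, assuming the default port 8000 (adjust if you passed a different --port to trl vllm-serve):

```python
import socket

def port_open(host="127.0.0.1", port=8000, timeout=1.0):
    """Return True if something is accepting TCP connections on host:port."""
    try:
        # create_connection raises OSError (ConnectionRefusedError/timeout)
        # when nothing is listening on the target port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("listening on 8000" if port_open() else "nothing listening on 8000")
```

If this prints "nothing listening" while the model logs look healthy, the server process never reached the bind step, which matches the TRL issue linked below.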
Looks to be a problem with TRL, specifically this issue (NCCL timeout when GRPO training with vLLM): https://github.com/huggingface/trl/issues/2923
My trl env output, in case it's helpful to anyone: