Issue with vLLM on L40s GPU on RunPod
Hey everyone,
I’m running into an issue trying to use vLLM on a RunPod instance. Here’s the setup:
Instance: L40s (4x GPUs), Ubuntu base image
Python: 3.11.10
vLLM: 0.7.3 (also tried 0.8.0 and 0.9.1, but they get stuck earlier)
PyTorch: 2.4.0+cu124
CUDA: 12.4
NCCL Runtime: 2.20.5
Originally vLLM was just hanging; after changing the PyTorch and CUDA versions it no longer gets stuck, but now I see the following error when I run vLLM:
NotImplementedError: Could not run '_C::rms_norm' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build).
...
'_C::rms_norm' is only available for these backends: [HIP, Meta, BackendSelect, Python, ...]
🔍 Notes:
This setup works perfectly on AWS L40s instances.
I suspect the issue might be due to missing CUDA kernels in the PyTorch or model build inside the RunPod container.
Has anyone faced this on RunPod L40s? Any workaround or Docker base image suggestion to get RMSNorm CUDA kernels working?
Appreciate any help!
1 Reply
It looks like this error is caused by an incompatible torch/vLLM version combination; see https://github.com/vllm-project/vllm/issues/12441
[Bug]: Could not run '_C::rms_norm' with arguments from the 'CUDA' ...
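If you want to confirm the mismatch before reinstalling, a tiny check like this compares the torch wheel you actually have against the pin your vLLM release declares (the `torch==2.5.1` pin below is only an illustrative placeholder; check the requirement your vLLM release actually declares, e.g. via `pip show vllm`):

```python
# Minimal check that an installed torch version satisfies an '==' pin.
# This is a sketch: real resolvers use packaging.specifiers for full
# PEP 440 rules, but an exact-pin comparison is enough to spot a mismatch.
def satisfies_pin(installed: str, pin: str) -> bool:
    _name, _, wanted = pin.partition("==")
    # Drop a local build tag like '+cu124' before comparing.
    return installed.split("+")[0] == wanted.strip()

# Versions from the post; 'torch==2.5.1' is a placeholder pin.
print(satisfies_pin("2.4.0+cu124", "torch==2.5.1"))  # -> False
```

If this prints `False`, the cleanest fix is usually to uninstall both packages and let `pip install vllm==<version>` pull in the exact torch build it was compiled against.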