Issue with vLLM on L40s GPU on RunPod
Hey everyone,
I’m running into an issue trying to use vLLM on a RunPod instance. Here’s the setup:
Instance: L40s (4x GPUs), Ubuntu base image
Python: 3.11.10
vLLM: 0.7.3 (also tried 0.8.0 and 0.9.1, but they get stuck earlier)
PyTorch: 2.4.0+cu124
CUDA: 12.4
NCCL Runtime: 2.20.5
Originally vLLM was just hanging; after changing the PyTorch and CUDA versions it no longer gets stuck, but now I see the following error when I run vLLM:
NotImplementedError: Could not run '_C::rms_norm' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build).
...
'_C::rms_norm' is only available for these backends: [HIP, Meta, BackendSelect, Python, ...]
🔍 Notes:
This setup works perfectly on AWS L40s instances.
I suspect the issue might be due to missing CUDA kernels in the PyTorch or model build inside the RunPod container.
Has anyone faced this on RunPod L40s? Any workaround or Docker base image suggestion to get RMSNorm CUDA kernels working?
Appreciate any help!
1 Reply
It looks like this error is caused by an incompatible torch/vLLM version combination; see https://github.com/vllm-project/vllm/issues/12441
[Bug]: Could not run '_C::rms_norm' with arguments from the 'CUDA' ...
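If you want to confirm the mismatch before reinstalling, a tiny check like this compares the torch wheel you actually have against the pin your vLLM release declares (the `torch==2.5.1` pin below is only an illustrative placeholder; check the requirement your vLLM release actually declares, e.g. via `pip show vllm`):

```python
# Minimal check that an installed torch version satisfies an '==' pin.
# This is a sketch: real resolvers use packaging.specifiers for full
# PEP 440 rules, but an exact-pin comparison is enough to spot a mismatch.
def satisfies_pin(installed: str, pin: str) -> bool:
    _name, _, wanted = pin.partition("==")
    # Drop a local build tag like '+cu124' before comparing.
    return installed.split("+")[0] == wanted.strip()

# Versions from the post; 'torch==2.5.1' is a placeholder pin.
print(satisfies_pin("2.4.0+cu124", "torch==2.5.1"))  # -> False
```

If this prints `False`, the cleanest fix is usually to uninstall both packages and let `pip install vllm==<version>` pull in the exact torch build it was compiled against.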