RunPod•13mo ago
Dhruv Mullick

TensorRT-LLM setup

Has anyone been able to successfully install tensorrt_llm? I'm trying with pip, but I'm running into MPI-related errors:
Cannot open configuration file /build-result/hpcx-v2.16-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.18-x86_64/ompi/share/openmpi/mpicc-wrapper-data.txt
Error parsing data file mpicc: Not found
I've tried a few templates (runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04; nvcr.io/nvidia/tritonserver:24.03-trtllm-python-py3) on an A100 and on a 4090, with CUDA 12.2.
30 Replies
Madiator2011•13mo ago
Have you tried:
apt update
apt install libopenmpi-dev
Dhruv Mullick (OP)•13mo ago
Doesn't work, unfortunately. Tried uninstalling and reinstalling as well, but that doesn't help.
Madiator2011•13mo ago
apt-get install libopenmpi-dev openmpi-bin
Dhruv Mullick (OP)•13mo ago
Yeah, tried them too. I've narrowed the problem down to building mpi4py, which gets built as a dependency of tensorrt_llm.
Madiator2011•13mo ago
are you running it in venv or normal?
Dhruv Mullick (OP)•13mo ago
Normally. Let me try in a venv.
Madiator2011•13mo ago
mpicc --version
do you get output?
Dhruv Mullick (OP)•13mo ago
Same error:
root@afabf97a0d57:/workspace# mpicc --version
Cannot open configuration file /build-result/hpcx-v2.16-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.18-x86_64/ompi/share/openmpi/mpicc-wrapper-data.txt
Error parsing data file mpicc: Not found
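Since mpicc itself fails here, one way to dig further is to check which wrapper the shell resolves and whether OPAL_PREFIX is set. This is only a diagnostic sketch. The path in the error looks like a build-time install prefix, and Open MPI reads OPAL_PREFIX to locate an install that has been moved, but whether that variable is the cause in this case is an assumption that the thread does not confirm. The function name describe_mpicc is made up for illustration.

```shell
#!/bin/sh
# Diagnostic sketch (hypothetical helper, not part of any tool):
# report which mpicc is first on PATH and whether OPAL_PREFIX is set.
# Nothing is installed or modified.
describe_mpicc() {
    wrapper=$(command -v mpicc 2>/dev/null) || wrapper=""
    if [ -z "$wrapper" ]; then
        echo "mpicc: not on PATH"
    else
        echo "mpicc: $wrapper"
    fi
    # Assumption: OPAL_PREFIX may matter for a relocated Open MPI install.
    echo "OPAL_PREFIX: ${OPAL_PREFIX:-<unset>}"
}

describe_mpicc
```

If the reported wrapper is not the one installed by apt, that would suggest another MPI install earlier on PATH is being picked up first.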
Madiator2011•13mo ago
try with venv
apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev
pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com
Dhruv Mullick (OP)•13mo ago
Same error 😅
Dhruv Mullick (OP)•13mo ago
(Bottom part -> )
Madiator2011•13mo ago
you will probably need to ask on their repo
Dhruv Mullick (OP)•13mo ago
Okay, thank you
Dhruv Mullick (OP)•13mo ago
@Papa Madiator, are we doing anything MPI-related while spawning the container on RunPod?
https://github.com/mpi4py/mpi4py/issues/483
Per this, on a clean container from the image I shared, the MPI issue isn't there.
GitHub
pip installation fails with "Cannot open configuration file" · Iss...
Hello, I'm trying to install mpi4py (dependency of tensorrt_llm) using pip, but I get the error: Cannot open configuration file /build-result/hpcx-v2.16-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nc...
Madiator2011•13mo ago
@Dhruv Mullick I mean runpod does not change files in docker container
Dhruv Mullick (OP)•13mo ago
https://discord.com/channels/912829806415085598/948767517332107274/1225899896532504596
With reference to the new error here (reached this point thanks to @aikitoria): can we increase the limit? I don't have permissions to do so...
aikitoria•13mo ago
you should be able to stop openmpi from trying to increase it
idk why the variable I posted doesn't work for you
Madiator2011•13mo ago
It's not possible, as containers are not privileged
Dhruv Mullick (OP)•13mo ago
@aikitoria, did you do an apt install libopenmpi-dev as well, if you remember? I'm not sure if we should be doing that, based on the GitHub link I shared above.
But if I don't, then I get a different set of errors like:
/usr/bin/ld: cannot find -lvt.mpi: No such file or directory
/usr/bin/ld: cannot find -lvt-hyb: No such file or directory
/usr/bin/ld: cannot find -lvt.ompi: No such file or directory
_configtest.c:2:10: fatal error: mpi.h: No such file or directory
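The last line of that output says the compiler could not find mpi.h. Below is a minimal sketch for checking whether the header exists in a few likely places before rebuilding mpi4py. The helper name find_mpi_header and the directories searched are assumptions: the Open MPI include path is where Ubuntu's libopenmpi-dev typically puts the header, and /opt/hpcx is a guess at where an HPC-X install might live. Adjust both for the image in use.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the first mpi.h found under the given
# directories, or return non-zero if none of them contains one.
find_mpi_header() {
    for root in "$@"; do
        [ -d "$root" ] || continue
        hit=$(find "$root" -name mpi.h 2>/dev/null | head -n 1)
        if [ -n "$hit" ]; then
            echo "$hit"
            return 0
        fi
    done
    return 1
}

# The directory list is an assumption, not taken from the thread.
find_mpi_header /usr/include /usr/lib/x86_64-linux-gnu/openmpi/include /opt/hpcx \
    || echo "mpi.h not found in the searched directories"
```

If nothing is found, the MPI development headers are probably missing from the image, which would fit the errors seen when libopenmpi-dev is not installed.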
aikitoria•13mo ago
https://www.reddit.com/r/LocalLLaMA/comments/1b4iy16/comment/kt2nuee/
I ended up not having any time to mess more with tensorrt-llm
my original goal was to run tritonserver
Dhruv Mullick (OP)•13mo ago
Worked!
aikitoria•13mo ago
so I made a container off the nvidia one that runpod can launch, here https://discord.com/channels/912829806415085598/1211077936338178129/1211673633727057920
Dhruv Mullick (OP)•13mo ago
Thanks a lot!! I think the apt-get command along with the exports you shared together worked out for me.
I'm on the runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04 template. Will have to see if it works with others too.
aikitoria•13mo ago
if you don't want to run triton that should work just fine
Dhruv Mullick (OP)•13mo ago
Well, Triton is the goal. Will go through your post 😄
aikitoria•13mo ago
then you should run it in the nvidia container image like I did there
yeah but you have to install trtllm the same way to get the tools to build the engine locally
I didn't get to the step of actually running triton
realized it would be more work than I have time for rn
I definitely want min-p sampling for example but my feature request died it seems https://github.com/NVIDIA/TensorRT-LLM/issues/1154
it's probably not that hard to add it
except if I build trtllm myself the built executable doesn't work
world's least stable software
Dhruv Mullick (OP)•13mo ago
Does seem that way! Thanks for helping out here 😄
Geri•12mo ago
hi guys - is someone using torch tensorrt?
Madiator2011•12mo ago
What are the requirements? It might take time, but if I get some I can try to build one.
