TensorRT-LLM setup

Has anyone been able to install tensorrt_llm successfully? I'm trying with pip, but I'm running into MPI-related errors:
Cannot open configuration file /build-result/hpcx-v2.16-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.18-x86_64/ompi/share/openmpi/mpicc-wrapper-data.txt
Error parsing data file mpicc: Not found
I've tried a few templates (runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04; nvcr.io/nvidia/tritonserver:24.03-trtllm-python-py3) on an A100 and on a 4090, with CUDA 12.2.
Madiator2011 • 2mo ago
tried:
apt update
apt install libopenmpi-dev
Dhruv Mullick • 2mo ago
Doesn't work, unfortunately. I tried uninstalling and reinstalling as well, but that doesn't help.
Madiator2011 • 2mo ago
apt-get install libopenmpi-dev openmpi-bin
Dhruv Mullick • 2mo ago
Yeah, tried those too. I've narrowed the problem down to building mpi4py, which gets built when installing tensorrt_llm.
Madiator2011 • 2mo ago
Are you running it in a venv or normally?
Dhruv Mullick • 2mo ago
Normally. Let me try in a venv.
Madiator2011 • 2mo ago
Do you get any output from mpicc --version?
Dhruv Mullick • 2mo ago
Same error:
root@afabf97a0d57:/workspace# mpicc --version
Cannot open configuration file /build-result/hpcx-v2.16-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.18-x86_64/ompi/share/openmpi/mpicc-wrapper-data.txt
Error parsing data file mpicc: Not found
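The path in this error appears to come from an HPC-X build of Open MPI, which suggests the mpicc being run may not be the one installed by apt. A minimal shell sketch for checking which mpicc comes first on PATH, assuming a POSIX shell. The helper name list_mpicc is hypothetical, and OPAL_PREFIX is the Open MPI variable for a relocated install prefix, which may or may not be set in this image:

```shell
# Print every executable named mpicc on PATH, in lookup order.
# The first line printed is the one the shell resolves when you type "mpicc".
list_mpicc() {
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        if [ -n "$dir" ] && [ -x "$dir/mpicc" ]; then
            printf '%s\n' "$dir/mpicc"
        fi
    done
    IFS=$old_ifs
}

list_mpicc

# Open MPI uses OPAL_PREFIX to locate its data files when the install was moved.
echo "OPAL_PREFIX=${OPAL_PREFIX:-(unset)}"
```

If more than one path is printed, the broken wrapper may be shadowing a working one; that would need to be confirmed on the actual pod.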
Madiator2011 • 2mo ago
try with venv
apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev
pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com
Dhruv Mullick • 2mo ago
Same error 😅
Dhruv Mullick • 2mo ago
(Bottom part -> )
Madiator2011 • 2mo ago
You will probably need to ask on their repo.
Dhruv Mullick • 2mo ago
Okay, thank you
Dhruv Mullick • 2mo ago
@Papa Madiator, are we doing anything MPI-related when spawning the container on RunPod?
https://github.com/mpi4py/mpi4py/issues/483
Per this, the MPI issue doesn't occur on a clean container from the image I shared.
GitHub
pip installation fails with "Cannot open configuration file" · Iss...
Hello, I'm trying to install mpi4py (dependency of tensorrt_llm) using pip, but I get the error: Cannot open configuration file /build-result/hpcx-v2.16-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nc...
Madiator2011 • 2mo ago
@Dhruv Mullick I mean, RunPod does not change files in the Docker container.
Dhruv Mullick • 2mo ago
https://discord.com/channels/912829806415085598/948767517332107274/1225899896532504596
With reference to the new error here (I reached this point thanks to @aikitoria): can we increase the limit? I don't have permissions to do so...
aikitoria • 2mo ago
You should be able to stop Open MPI from trying to increase it. Idk why the variable I posted doesn't work for you.
Madiator2011 • 2mo ago
It's not possible, as containers are not privileged.
Dhruv Mullick • 2mo ago
@aikitoria, did you do an apt install libopenmpi-dev as well, if you remember? I'm not sure we should be doing that, based on the GitHub link I shared above. But if I don't, I get a different set of errors like:
/usr/bin/ld: cannot find -lvt.mpi: No such file or directory
/usr/bin/ld: cannot find -lvt-hyb: No such file or directory
/usr/bin/ld: cannot find -lvt.ompi: No such file or directory
_configtest.c:2:10: fatal error: mpi.h: No such file or directory
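The last error in that list, mpi.h not found, means the compiler could not locate the MPI development headers, which libopenmpi-dev provides on Ubuntu. A small shell sketch for checking whether the header exists anywhere under a given directory. The helper name find_mpi_header and the default search root /usr are assumptions for illustration; an HPC-X install may live elsewhere:

```shell
# Search a directory tree for the MPI C header and print each match.
# The root defaults to /usr; pass another directory to search there instead.
find_mpi_header() {
    root=${1:-/usr}
    find "$root" -type f -name mpi.h 2>/dev/null
}

# No output here means no mpi.h was found under /usr.
find_mpi_header /usr || true
```

If nothing is printed, the headers are likely missing from that tree, which would be consistent with the error above.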
aikitoria • 2mo ago
https://www.reddit.com/r/LocalLLaMA/comments/1b4iy16/comment/kt2nuee/
I ended up not having any time to mess more with tensorrt-llm. My original goal was to run tritonserver.
Dhruv Mullick • 2mo ago
Worked!
aikitoria • 2mo ago
So I made a container based on the NVIDIA one that RunPod can launch, here: https://discord.com/channels/912829806415085598/1211077936338178129/1211673633727057920
Dhruv Mullick • 2mo ago
Thanks a lot!! I think the apt-get command together with the exports you shared worked out for me. I'm on the runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04 template. Will have to see if it works with others too.
aikitoria • 2mo ago
If you don't want to run Triton, that should work just fine.
Dhruv Mullick • 2mo ago
Well, Triton is the goal. Will go through your post 😄
aikitoria • 2mo ago
Then you should run it in the NVIDIA container image like I did there, yeah. But you have to install trtllm the same way to get the tools to build the engine locally.
I didn't get to the step of actually running Triton; I realized it would be more work than I have time for rn.
I definitely want min-p sampling, for example, but my feature request died, it seems: https://github.com/NVIDIA/TensorRT-LLM/issues/1154
It's probably not that hard to add, except that if I build trtllm myself, the built executable doesn't work. World's least stable software.
Dhruv Mullick • 2mo ago
Does seem that way! Thanks for helping out here 😄
Geri • 4w ago
Hi guys, is anyone using torch tensorrt?
Madiator2011 • 3w ago
What are the requirements? It might take time, but if I get some, I can try to build one.