Santosh

NGC containers

Has anyone gotten NGC containers running on RunPod? I see it as an option but I think it doesn't work because you need to install the SSH libraries on top. I need this to use FP8 on H100s, since the PyTorch NGC container includes Transformer Engine for FP8. Building Transformer Engine manually takes a long time (it requires downloading a cuDNN tarball from the NVIDIA website).
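(For context, a minimal sketch of the kind of FP8 code Transformer Engine enables, following its documented fp8_autocast/DelayedScaling API; the layer sizes are arbitrary examples and it assumes an H100 or other FP8-capable GPU:)

import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling FP8 recipe: E4M3 for forward, E5M2 for backward (HYBRID)
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

layer = te.Linear(768, 768, bias=True).cuda()
x = torch.randn(16, 768, device="cuda")

# GEMMs inside this context run in FP8 on supported hardware
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

print(y.shape)  # torch.Size([16, 768])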
40 Replies
Madiator2011•2y ago
Yes
SantoshOP•2y ago
Are there any docs or a quick example on how to use it? Any update on this? CC @mmoy
Unknown User•2y ago
Message Not Public
SantoshOP•2y ago
yeah I got that, I guess I was just too lazy to add the required SSH libs and create that template. I also didn't understand why RunPod PyTorch NGC containers are available in the dropdown selection if the limitations are known. Maybe I'm just not using it correctly?
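(For anyone landing here later: a rough, untested sketch of what "adding the required SSH libs on top" of the NGC image could look like; package names assume the image's Ubuntu base, and RunPod-specific key provisioning and port exposure are not shown:)

FROM nvcr.io/nvidia/pytorch:24.04-py3
RUN apt-get update && \
    apt-get install -y --no-install-recommends openssh-server && \
    mkdir -p /run/sshd && \
    rm -rf /var/lib/apt/lists/*
# start sshd in the background and keep the pod alive
CMD ["bash", "-c", "/usr/sbin/sshd && sleep infinity"]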
Madiator2011•2y ago
I can always take commissions
Unknown User•2y ago
Message Not Public
SantoshOP•2y ago
how do you use a container and "Connect" if there's no SSH access? There's also no option to SSH into the host and use the container interactively, so I'm not sure what you can do after deploying a "RunPod PyTorch NGC" template
Madiator2011•2y ago
if you run a bare image you might need to set the container command to
bash -c 'sleep infinity'
SantoshOP•2y ago
but how would I SSH into the container, or is the SSH command for the host machine with Docker access? Anyway, I think I can create a template to fix it with my remaining few dollars of credits 😅
Unknown User•2y ago
Message Not Public
SantoshOP•2y ago
this is for pods; a pod still runs a container, and a pod doesn't give you access to the host machine
Madiator2011•2y ago
Give me like 1h, I'll build a container for you
Unknown User•2y ago
Message Not Public
Madiator2011•2y ago
any specific docker image as base? @sbhavani
SantoshOP•2y ago
latest container from a few days ago: nvcr.io/nvidia/pytorch:24.04-py3
Madiator2011•2y ago
I think it should work. Note volume storage won't be /workspace. @sbhavani btw you wanted a template for pods? Note the image requires a host with CUDA 12.4. @sbhavani do you have any code to test with?
SantoshOP•2y ago
yes, a template for pods, I guess it depends on the driver version of the host too
Madiator2011•2y ago
I got the template done, just need to run some tests, and if you have any small code to test 8-bit quant let me know
SantoshOP•2y ago
GitHub
GitHub - NVIDIA/TransformerEngine: A library for accelerating Trans...
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio...
Madiator2011•2y ago
what kinda output shall I get from it?
SantoshOP•2y ago
hmm actually that code is more for functional testing, I don't have anything readily available to test perf/speedup. I can clean up this repo and add an HF Llama-2/3 example comparing BF16 and FP8 throughput: https://github.com/sbhavani/h100-performance-tests
Madiator2011•2y ago
I kinda ran it and I'm not getting anything, no output or error
SantoshOP•2y ago
then it sounds like it works! If you publish it to the community I'll test it out as well
Madiator2011•2y ago
It should be cached on H100 PCIe in the CA region on Secure Cloud at least @sbhavani. https://runpod.io/console/deploy?template=lc5dch2fuv&ref=vfker49t The template name is pytorch-ngc-runpod, the password for Jupyter is RunPod, and volume storage is mounted at /vol. btw @sbhavani let me know if it worked for you. I'm happy to help build templates, but not if you ask me to add 50 models from Civitai
SantoshOP•2y ago
thanks! I'll test it out on friday!
Unknown User•2y ago
Message Not Public
Madiator2011•2y ago
I can build you a container that would block access to Civitai
Unknown User•2y ago
Message Not Public
Madiator2011•2y ago
FROM runpod/pytorch:2.2.1-py3.10-cuda12.1.1-devel-ubuntu22.04
RUN echo "127.0.0.1 civitai.com" >> /etc/hosts
Unknown User•2y ago
Message Not Public
Geri•2y ago
I'm looking for a PyTorch Docker container without RunPod. Can I just do docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.10-py3?
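(That command should work as-is where the NVIDIA Container Toolkit is installed; NVIDIA's NGC container docs also recommend --ipc=host, or a larger --shm-size, so PyTorch DataLoader workers don't run out of shared memory, e.g.:)
docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:23.10-py3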
Geri•2y ago
I want to use PyTorch with Sentence Transformers from Hugging Face (https://github.com/huggingface/setfit), do a torch.compile, and run predictions
GitHub
GitHub - huggingface/setfit: Efficient few-shot learning with Sente...
Efficient few-shot learning with Sentence Transformers - huggingface/setfit
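(A rough, untested sketch of one way to try torch.compile with a sentence-transformers model; the model name is just an example, and the st[0].auto_model indexing assumes the default SentenceTransformer module layout, so you'd adapt it for a SetFit model's body:)

import torch
from sentence_transformers import SentenceTransformer

st = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="cuda")

# compile the underlying HF transformer that encode() calls;
# the first encode() triggers compilation, so time the later calls
st[0].auto_model = torch.compile(st[0].auto_model)

with torch.inference_mode():
    emb = st.encode(["a quick smoke test", "for torch.compile"], convert_to_tensor=True)
print(emb.shape)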
Unknown User•2y ago
Message Not Public
Geri•2y ago
has someone tried torch.compile?
Unknown User•2y ago
Message Not Public
Geri•2y ago
Where can I find which torch-tensorrt version is compatible with CUDA, torch, etc.? Is it expected that pip install torch-tensorrt==2.2.0 installs both nvidia-cuda-runtime-cu11 and nvidia-cuda-runtime-cu12, and likewise nvidia-cudnn-cu11 and nvidia-cudnn-cu12, plus some other nvidia packages? And does torch-tensorrt work with an older GPU like a g4dn.xlarge?
Unknown User•2y ago
Message Not Public
SantoshOP•2y ago
@Geri Take a look at the versions used in https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html. That should give you an idea of compatibility across torch and nvidia packages
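(A quick way to check what actually got installed in your environment against that matrix; torch_tensorrt exposes __version__ like most of these packages:)

import torch
print(torch.__version__, torch.version.cuda, torch.backends.cudnn.version())

import torch_tensorrt
print(torch_tensorrt.__version__)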
Unknown User•2y ago
Message Not Public
Geri•2y ago
hi, does someone know how to configure a config.pbtxt for ONNX or PyTorch?
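(config.pbtxt is the Triton Inference Server model config; a minimal, untested sketch for an ONNX model is below. The tensor names, dtypes, and dims are placeholders that must match your model's actual inputs/outputs, and for TorchScript you'd use platform "pytorch_libtorch" with name__index-style tensor names:)

name: "my_model"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]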
