RunPod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


Ignore root start.sh and use custom persistent script.

I'm trying to avoid using start.sh since I need to experiment with some different processes. I've copied the contents of start.sh and pointed the container start command to an install_req.sh script that lives in the workspace folder. I'm also unchecking "start jupyter notebook" and "ssh terminal access" since I don't want the container to run the original start.sh file. Maybe I'm confusing how these work, or I'm missing something. The logs show that all is good, but I can no longer use the RunPod UI to start Jupyter, and the SSH connection no longer works. Why is that happening, even though install_req.sh is the same as the start.sh in the root dir? ...
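For reference, the UI toggles don't just decide which script runs: the stock start.sh is what launches sshd and Jupyter, so a replacement script has to start those services itself. A rough sketch of what a custom script might need; the service names, the port, and the JUPYTER_PASSWORD variable are assumptions based on the stock RunPod PyTorch template, so check your template's actual start.sh for the exact flags:

```shell
#!/bin/bash
# Hypothetical /workspace/install_req.sh: re-create what the stock start.sh
# would have done, since unchecking the UI toggles disables those services.

# SSH: generate host keys if missing, then start the daemon
ssh-keygen -A
service ssh start

# Jupyter on the port the RunPod proxy expects (8888 in the stock template)
jupyter lab --allow-root --ip=0.0.0.0 --port=8888 \
  --ServerApp.token="${JUPYTER_PASSWORD:-}" &

# ...your experimental processes go here...

sleep infinity  # keep the container alive
```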

streamlit app not loading up on CPU node

This is my Dockerfile:
```
FROM runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04
WORKDIR /workspace...
```
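For a Streamlit app behind RunPod's HTTP proxy, a common gotcha is that Streamlit binds to localhost by default, so the proxied port serves nothing. A hedged sketch of the launch command; app.py and port 8501 are assumptions, and the port must also be listed in the template's exposed HTTP ports:

```shell
# Bind to all interfaces so the RunPod proxy can reach the app
streamlit run app.py --server.address 0.0.0.0 --server.port 8501
```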

Issues with changing file permission to 400

I have an SSH key whose permissions I'm trying to set to 400 by running `chmod 400 id_rsa_git`, but upon running `ls -l` I'm seeing the permissions as 444...
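One way to narrow this down is to test chmod on a throwaway file on the container's local disk: on RunPod, files under /workspace sit on a network volume whose filesystem may not honor all permission bits, so keys are often copied to the container filesystem (e.g. ~/.ssh) first. A minimal sketch, with an illustrative path:

```shell
# Sanity check: chmod a throwaway file on local disk (path is illustrative)
key="$(mktemp /tmp/demo_key.XXXXXX)"
chmod 400 "$key"
stat -c '%a' "$key"   # prints 400 when the filesystem honors permission bits
```

If this prints 400 but the real key on /workspace still shows 444, the volume's filesystem is the culprit, not chmod.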

Why can't FileBrowser be opened?

Why does an "HTTP Error 404" appear when I click on HTTP Service [Port 4040]? Also, the log output says "No module named 'ip_adapter'". I want to check whether the 'ip_adapter' file has been copied from the Docker Hub image into the 'runpod-volume/my project' directory. I do have this file in my local project.

Are there very few GPUs that support CUDA 11.8?

When I create a GPU Pod on Secure Cloud, if I select the CUDA 11.8 version, there are very few GPUs available. However, when I choose 'any', there are many more GPUs available for deployment. My project currently requires the use of CUDA 11.8.

GPU speed getting slower and slower

Yesterday I was using a 3090 and it was writing at 3.5 it/s, which was great, but today I'm using a 3090 Ti and it started at 2.5 it/s and has now slowed to 1.4 it/s, still going down... Wasting my money.

How can I run multiple templates in one pod?

How do I run Docker in a RunPod environment?

I want to run Docker inside the GPU pod, but the pod itself may be a Docker container. How can I run Docker in it?

[ONNXRuntimeError] when running ComfyUI

I'm a total noob when it comes to these things. I had been running a vid2vid workflow for about a week with no issues, but since yesterday I've been hitting this error and I have no clue what to do. Is anyone able to help?

Running sshuttle in my pod

I am trying to connect my pod to my k8s cluster and I need to work with sshuttle -- I need iptables DNAT and REDIRECT modules installed. Is there a way to enable this on my instance? Alternatively I could also use nftables or TPROXY...

How to stop a Pod?

The model has not been fully uploaded yet, and I would like to continue the upload tomorrow. If I don't stop the pod, it will continue to incur costs.
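Besides the Stop button in the web console, a pod can be stopped from a shell, assuming runpodctl is installed and configured; inside a pod the ID is exposed as $RUNPOD_POD_ID. Note that a stopped pod keeps its volume and may still bill for disk, but GPU billing stops:

```shell
# Stop the current pod; substitute a pod ID if running this elsewhere
runpodctl stop pod "$RUNPOD_POD_ID"
```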

Network issues with 3090 pods

Pods tfc6texf3xrkip and 33laj8z8yzm0du both have borked networking. The download speeds are very slow, and I get issues like these:
```
Collecting fairseq@ git+https://github.com/pzelasko/fairseq@ba2f4bae68107c9d8a838f19611f951e718577b4 (from -r requirements.txt (line 60))
  Cloning https://github.com/pzelasko/fairseq (to revision ba2f4bae68107c9d8a838f19611f951e718577b4) to /tmp/pip-install-wt38ex46/fairseq_32d4b5f22eec428196d0a086873b7d52...
```

Are we able to run a DinD image for GPU pods?

Hi, anyone tried running DinD in GPU pods?

Runpod error starting container

```
2024-03-07T14:40:19Z error starting container: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 534: elf_machine_rela_relative: Assertion `ELFW(R_TYPE) (reloc->r_info) == R_X86_64_RELATIVE' failed!
nvidia-container-cli: detection error: driver rpc error: failed to process request: unknown
```
I restarted the pod, but it still errors...

Runpod SD ComfyUI Template missing??

Where did the "Runpod SD ComfyUI" template go? Can anyone help? I've been using it extensively for a month now, and suddenly it's gone?

Pod Outage

Pulling the Docker image is currently taking 100x longer than usual, and when it eventually builds, the API server I run inside the container takes an absurdly long time to do inference, which is breaking production (API timeouts). Is there a current problem with the servers I should know about?

CUDA out-of-memory error while the 2nd GPU is not utilized

I have a pod with 2 x 80 GB PCIe GPUs and am trying to load and run the Smaug-72B-v0.1 LLM. I can download it, but when I try to load it I get a CUDA out-of-memory exception while the 2nd GPU's memory is empty. I was expecting that when I choose 2 x GPU I can use their combined capacity. If you check the screenshot, the 2nd GPU's memory is not used at all when the exception fires. Also, there are no GPU instances with that much VRAM on a single card, so I have to choose 2x or 3x. How can I fix it? Thanks. The exception is:
```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 384.00 MiB. GPU 0 has a total capacty of 79.11 GiB of which 168.50 MiB is free. Process 3311833 has 78.93 GiB memory in use. Of the allocated memory 78.31 GiB is allocated by PyTorch, and 189.60 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF...
```
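For context on the error above: a 72B model in fp16 needs well over 140 GB for weights alone, so it cannot fit on one 80 GB card, and most loaders place everything on GPU 0 unless told otherwise. One way to shard across both visible GPUs, assuming a Hugging Face transformers loader with accelerate installed (the Hub ID below is an assumption based on the model name in the post):

```shell
# Let accelerate spread the layers across every visible GPU
python - <<'PY'
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "abacusai/Smaug-72B-v0.1",  # assumed Hub ID for the model in the post
    torch_dtype=torch.float16,
    device_map="auto",          # shard weights across all visible GPUs
)
PY
```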

Backdrop Build V3 Credits missing

Hi team, I hope this message finds you well. I am writing to follow up on the recent offer I received to sign up for RunPod and connect it with my Build account. As instructed, I have successfully signed up for RunPod, ensured that my RunPod account is connected to the same email as my Build account, and marked myself as “interested in” or “building with” on the partner page. However, it has been over 48 hours since I completed these steps, and I have yet to see the promised credits applied to my RunPod account. I am reaching out to inquire about the status of this offer and to kindly request assistance in ensuring the credits are credited to my account as promised....

When on 4000 ADA, it's RANDOMLY NOT DETECTING GPU!

When on 4000 Ada, it's RANDOMLY NOT DETECTING the GPU! Yesterday I set it up and it was okay. Today I set it up and it's not detecting the GPU, even though nvidia-smi says it's there. Why is that??? It isn't workable if I need to reinstall torch with CUDA every time; it's a waste of time and money....
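When the GPU intermittently disappears, it can help to verify right after the pod starts whether the container actually sees the device, before spending time on setup. A minimal check, assuming a torch-based template:

```shell
# Quick sanity check: does the container see the GPU, and does torch agree?
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
  python3 -c "import torch; print('torch sees CUDA:', torch.cuda.is_available())" \
    2>/dev/null || echo "torch not importable"
else
  echo "nvidia-smi not found: the driver was not mounted into this container"
fi
```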

Can't get my pod to work right

Hi, I'm new to RunPod. I'm trying to add models and LoRAs to my pod, and also to install runpodctl, but I can't figure it out: when I try to follow the runpodctl tutorial I keep getting errors. Help would be greatly appreciated, thank you in advance...