Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

COLMAP in custom docker template doesn't use CUDA/GPU

docker.io/chenhsuanlin/colmap:3.8 I'm trying to get COLMAP to work with the Docker image above, but it seems to be using the CPU instead of CUDA/Nvidia GPUs. I've looked through the Runpod container templates at https://github.com/runpod/containers, but I'm not 100% sure which bash file or Dockerfile I should base mine on...

My public pod template is not visible to others in the explore section

Hello, I created a public pod template for a Docker image that I host publicly on DockerHub. In my primary account I can use it and see it listed in the Explore section. However, in my other account it is not visible even though it is public. I can only deploy it through the share link (https://console.runpod.io/deploy?template=g77d7didja&ref=9oqlbhoc). Why isn't it listed in search for other users? Info about the template is below...
Solution:
I got a reply by mail from the support team: "In order for a public template to appear in the Explore section, it typically needs to have at least 24 hours of runtime from other users." Thanks everyone...

CUDA 11.8 Seems unavailable in templates

Any 11.8 template I deploy ends up being 12.4. Is there a CUDA 11.8 template that still runs 11.8?...

No docs for RunPod volumes in SkyPilot

Hi, is there currently any method for adding a network volume or changing the volume size on SkyPilot? I can't find any docs regarding this. I would like to use this for training, without using buckets.

ComfyUI installs outside /workspace

Hey everyone! I’m a complete beginner working with ComfyUI on RunPod (using the official runpod/stable-diffusion:comfy-ui-6.0.0 template), and I’m running into an issue where some files don’t persist after I stop or restart a pod. I have a Network Volume mounted, and I understand that only files inside /workspace are saved. That leaves out custom things like my output images, custom nodes, and a lot of files that might be important in the future. In the root directory of the pod, I see many folders like /ComfyUI, /models, /custom_nodes, etc. that are not inside /workspace, and these are the folders that get deleted when I terminate and restart the pod. It seems like ComfyUI is installed directly at root level rather than under /workspace....
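One generic workaround for directories that live outside a persistent mount is to move them onto the volume and leave a symlink at the old path. The sketch below illustrates that idea only. It is not taken from the thread or from Runpod's documentation, the helper name `relocate_to_volume` is hypothetical, and whether the template's startup scripts tolerate a symlinked /ComfyUI is an assumption you would need to verify. A symlink created on the container disk also disappears when the pod is recreated, so it would have to be re-created on each start.

```python
import shutil
from pathlib import Path


def relocate_to_volume(src: Path, volume_root: Path) -> Path:
    """Move `src` under `volume_root` and leave a symlink at the old path.

    Hypothetical helper: the name and behaviour are illustrative only.
    """
    src = Path(src)
    dest = Path(volume_root) / src.name
    if src.is_symlink():
        # Already relocated on a previous run; nothing to do.
        return dest
    if dest.exists():
        raise FileExistsError(f"{dest} already exists on the volume")
    Path(volume_root).mkdir(parents=True, exist_ok=True)
    # shutil.move falls back to copy-and-delete across filesystems.
    shutil.move(str(src), str(dest))
    # Old path keeps working for anything that still references it.
    src.symlink_to(dest, target_is_directory=True)
    return dest


# Example call using the paths mentioned in the post (stop ComfyUI first):
# relocate_to_volume(Path("/ComfyUI"), Path("/workspace"))
```
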

Help Needed: RunPod Pod Creation & Proxy Connection Issues for Stable Diffusion

Hi RunPod community, I'm facing persistent issues with pod creation and proxy connections for Stable Diffusion batch image generation on my Ubuntu server. Here's a summary of the problems:
Pod Creation Failures: Frequent "Something went wrong" errors during pod creation via GraphQL and SDK, even with fallback logic for GPU types, regions, and network volume settings.
Proxy Connectivity Timeouts: Connection timeouts to the Stable Diffusion Web UI proxy URLs (e.g., <pod-id>.proxy.runpod.net:7860), tried both HTTP and HTTPS across multiple ports (3000, 7860).
Infrastructure Delays: Pods often stuck in "RUNNING (waiting for runtime)" state without container startup....

So you guys REALLY like making things as difficult as possible don't you?

So I tried to build a training pod ... Your prebaked Kohya image is about 50 versions behind and doesn't support Flux. I need to install a later version of the CUDA library, but I can't because while old drivers are installed/ they are not installed so the installer can't replace them. Not to mention you have 5090s installed and yet don't have the drivers installed to support them! And of course I can't install these either. I wasted 6 hours trying to get a damn pod to run damned Kohya ... A task I did on my own desktop in 20 mins! You can send my refund to kris.paypal@luxelite.com.au WTF do you insist on making EVERYTHING shit?...

How to add more RAM to the Pod

Hi! Is there a way to set RAM manually? The current options are limited, for example to a maximum of 283 GB of RAM, but I need 1 TB. How do I add more RAM? Thank you....

Support Ticket#20560 / created - no follow-up / POD deletes all data / Require response via email.

"River" responded to the ticket originally, with no follow-up. We are now losing critical time with no support. Please have someone reach out directly here, on the Discord, or through email, as the situation needs immediate resolution....

GPU occupied by unknown process at boot

Hi everyone, I have just initialized a pod with four 3090 GPUs, but one of them keeps being occupied by an unknown process, as shown in the attached screenshot. I have tried to terminate and re-initialize another pod with the same config, but it doesn't help. Could anyone help resolve this issue? Thanks....

No GPU in Pod, Unless I Create New Pod

I understand how spinning up a pod leases a GPU connected to its data center, and reserves it for as long as my pod is running, and I also understand how a region might not have the GPU I want available and how to resolve it using networked storage. My question is, why does my Original pod (named: Orignal) state that there are no GPUs available, but I can spin up a new pod, in the same region, using the same parameters and it works?...

Help with connecting to pods with SSH connection

Hi everyone, I've generated my SSH keys following this guideline: https://docs.runpod.io/pods/configuration/use-ssh and I've entered the output into the SSH keys setting for my Runpod account. But when I try to connect, it says [numbers@ssh.runpod.io: Permission denied (publickey).]....

Hi! I need to run a custom Docker image on

Hi! I need to run a custom Docker image on RunPod with full Docker-in-Docker (DIND) and Privileged mode enabled. However, I do not see any "Privileged" option in the Pod creation UI.
- Does my account/plan/template support privileged containers?
- How can I enable privileged mode, or what are the requirements?
My workflow requires running dockerd, docker run, and full system-level operations inside the container....

Which Server is Best for Running StabilityAI Stable Diffusion 3.5 Large?

I’m planning to deploy and run stabilityai/stable-diffusion-3.5-large and would like to hear your suggestions. Which server or service offers the best performance, stability, and cost-efficiency for running this model? I’m mainly focused on smooth performance, low latency, and reasonable pricing for image generation tasks. Would love to hear about your experiences and any recommendations!

custom image from quay not being pulled

I built and pushed an updated ComfyUI image (based off the official one) to quay.io. I can pull the image myself to my own machine. When deploying a pod using quay.io/thoraxe/stable-diffusion-comfyui:v0.3.44-3, nothing happens. It doesn't appear that a pull was ever even attempted, because no logs appeared after ~5+ minutes. Any thoughts?...
Solution:
The problem appeared to be temporary. After waiting a while, it seems to be pulling now.

Terminate pod after it finishes its command

Is there a way to automatically remove a pod after it finishes the dockerStartCmd command, instead of restarting it? I had to install runpodctl in my container and append ; runpodctl stop pod "$RUNPOD_POD_ID" to the command, which I don't think is ideal....
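The workaround described in the post (run the job, then call runpodctl with the pod's own ID) can also be written as a small wrapper script. This is a minimal sketch of that same approach, not a built-in Runpod feature: it assumes runpodctl is on the PATH inside the container and that RUNPOD_POD_ID is set, as in the post. The function names and the `dry_run` flag are hypothetical.

```python
import os
import subprocess
from typing import List


def build_stop_command(pod_id: str) -> List[str]:
    """Return the argv for the stop call quoted in the post."""
    return ["runpodctl", "stop", "pod", pod_id]


def run_then_stop(job: List[str], dry_run: bool = False) -> int:
    """Run `job` to completion, then ask runpodctl to stop this pod.

    Returns the job's exit code. With dry_run=True the stop call is skipped,
    which is useful when trying the wrapper outside a pod.
    """
    result = subprocess.run(job)
    pod_id = os.environ.get("RUNPOD_POD_ID")
    if pod_id and not dry_run:
        # check=False: a failed stop should not mask the job's own exit code.
        subprocess.run(build_stop_command(pod_id), check=False)
    return result.returncode


# Example (hypothetical job command):
# run_then_stop(["python", "train.py"])
```

Keeping the stop call in a wrapper still requires runpodctl in the image, so it tidies up the start command without removing the dependency the poster was unhappy about.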

Good documentation would help keep everyone happier!

It would be such a help to be able to get good working sample code and good documentation. I'm finding it extremely frustrating trying to get SD working in a pod that I can call up, run and destroy. The presales preamble sounds so easy!

Invalid container image name

From today, I get an Invalid Container Image Name when using my AWS hosted container. It has always worked before. xxxxxxxxxxx.dkr.ecr.eu-west-2.amazonaws.com/company/myimage:latest...

Slow Disk Performance For H200 nodes

"Container volumes provide fast read and write speeds since they are locally attached to workers." Based on the RunPod documentation, I would expect the pods to have local SSDs. However, I've noticed slow disk speeds on certain H200 nodes. I'd expect 5 GB/s+ with local NVMe SSDs, but I'm getting < 1 GB/s with some pods. Here's the command I use to benchmark:
fio --name=write-test --directory=. --numjobs=8 --size=10G --rw=write --bs=128K --iodepth=1 --time_based --runtime=10s --group_reporting --direct=1
My company is saving very large model checkpoints, so we need performant disk speeds....
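For a quick cross-check where fio is not installed, a rough sequential-write measurement can be done in plain Python. This is only a sanity-check sketch and not equivalent to the fio run above: it uses a single writer, goes through the page cache with one fsync at the end instead of direct I/O, and defaults to a much smaller file, so the figures will not be directly comparable. The function name and default sizes are assumptions chosen for illustration. The 128 KiB block size mirrors --bs=128K.

```python
import os
import time


def sequential_write_mb_per_s(directory: str, total_mb: int = 256,
                              block_kb: int = 128) -> float:
    """Write roughly total_mb MiB in block_kb KiB blocks and return MiB/s."""
    block = os.urandom(block_kb * 1024)
    n_blocks = max(1, (total_mb * 1024) // block_kb)
    path = os.path.join(directory, "write-test.tmp")
    start = time.perf_counter()
    try:
        with open(path, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            # Force the data to the device so the timing includes the flush.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        if os.path.exists(path):
            os.remove(path)
    written_mb = n_blocks * block_kb / 1024
    return written_mb / elapsed


# Example: print(f"{sequential_write_mb_per_s('.'):.0f} MiB/s")
```
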

Haven't been able to use Stable Diffusion for 2 days.

Hello. I'm trying to use my pod so I can use Stable Diffusion, but I keep getting a message that I have no GPUs available. I was able to get on briefly today, and then I got kicked off because the GPUs disappeared. I haven't been able to use Stable Diffusion for 2 days. Please fix this as soon as possible.