RunPod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⛅|pods-clusters

Output in ComfyUI

Dear all, I have a very basic question: I’m using Runpod for ComfyUI and I don’t understand where to download all my images … Is there a way to download them in one shot as well? Thank you very much in advance and sorry for the newbie question...
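One way to grab everything in one shot is to zip the output folder on the pod and download the single archive. A minimal sketch, assuming the images sit in one directory; the helper name `zip_outputs` and the paths in the note below are illustrative assumptions, not something confirmed in this thread:

```python
import shutil
from pathlib import Path


def zip_outputs(output_dir, archive_base):
    """Pack every file under output_dir into <archive_base>.zip and return its path.

    archive_base should point outside output_dir so the archive does not
    end up inside itself.
    """
    output_dir = Path(output_dir)
    if not output_dir.is_dir():
        raise FileNotFoundError(f"output folder not found: {output_dir}")
    # make_archive appends the ".zip" suffix and returns the final file name.
    return shutil.make_archive(str(archive_base), "zip", root_dir=output_dir)
```

For example, `zip_outputs("/workspace/ComfyUI/output", "/workspace/comfyui_outputs")` would write `/workspace/comfyui_outputs.zip`, assuming ComfyUI lives under `/workspace/ComfyUI` on your pod; adjust the path to wherever your install writes images.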

storage

having issues with pod disconnecting

Issue with Checkpoint Switching and Runtime Error

Hello, I’m experiencing an issue when switching checkpoints on RunPod. When I change the checkpoint, it looks like it switches, but in reality, it doesn’t. Then, when I try to generate an image, I get the following error: "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)" It seems like the model is not correctly loading onto the GPU. I’ve tried restarting the instance, but the problem persists. Does anyone know how to fix this?...

model cache status?

There was a feature for caching HF models in RunPod, right? I just want an update: where does it stand right now?

Http reverse proxy disconnects

Hi, I have noticed that when we send an API request to our Pod through the HTTP reverse proxy and the inference job takes longer than ~20 seconds, we get a 524 response back. This does not happen when we use direct TCP, but with that method we have to handle dynamic ports. This is happening on all of our pods running this job, mostly 4090s.
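A general pattern for long jobs behind a proxy that drops slow requests is to return a job id right away and let the client poll for the result, so no single HTTP request stays open for the whole inference. The sketch below shows only the in-process bookkeeping; `JobStore` and `wait_for` are hypothetical helpers, not a RunPod API, and wiring them to HTTP routes is left out:

```python
import threading
import time
import uuid


class JobStore:
    """In-memory registry of background jobs (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}

    def submit(self, fn, *args):
        """Start fn(*args) in a background thread and return a job id at once."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "running", "result": None, "error": None}
        worker = threading.Thread(target=self._run, args=(job_id, fn, args), daemon=True)
        worker.start()
        return job_id

    def _run(self, job_id, fn, args):
        # Record either the result or the error message once the job finishes.
        try:
            update = {"status": "done", "result": fn(*args), "error": None}
        except Exception as exc:
            update = {"status": "failed", "result": None, "error": str(exc)}
        with self._lock:
            self._jobs[job_id] = update

    def status(self, job_id):
        """Return a snapshot of the job state; each call is short-lived."""
        with self._lock:
            return dict(self._jobs[job_id])


def wait_for(store, job_id, interval=0.05, timeout=5.0):
    """Client-side polling loop: many short checks instead of one long request."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = store.status(job_id)
        if state["status"] != "running":
            return state
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout} s")
```

In a real service, `submit` would sit behind a "start job" route and `status` behind a "check job" route, so each request through the proxy returns quickly regardless of how long inference takes.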

ComfyUI keeps reconnecting on Pod (EU-SE-1) with network storage

I connect over SSH, spin up ComfyUI, and then access it at http://localhost:8188. This worked great previously, but for the last 2 days even the initial load has been very slow, and once it has loaded it shows "Reconnecting". I can't really work with the UI. I've tried:...

How to determine server location in respect to AWS

We are running our application stack on AWS Singapore. We want to know the RunPod server locations so we can find the closest one, but the names don't really help. Is there a mapping that can help us?

having trouble connecting to my RunPod via SSH

Hello everyone! I'm having trouble connecting to my RunPod via SSH. I've uploaded my public key in the RunPod settings, but I'm still getting "Permission denied" when I try to connect using: ssh -i ~/.ssh/runpod_key -p 13241 [email protected] I've tried:...

Pod downloading template every time I start a new one

I'm using a custom template which is quite heavy, every time I start a new pod it downloads it from my private registry. Is there a way to store it in runpod / cache it so download times are faster?

No http start

Has anyone run into an issue where they can't start the http service? First image is one pod with the button missing, second image is with it looking normal. This has happened for a few pods, not sure how to fix it / get it running....

Network and Local Storage Performance

Hi, we are noticing very slow performance loading in our model on our Pods in the IS region. We are also noticing a very slow sequential read time when we copy the same model into local storage. The model loading takes about 10x as much time as it did for us on a different network. When we compare the sequential read time, we see about a 3x increase in time on Runpod. Local storage is about 5s faster than network storage. our old network read ```13550863+1 records in 13550863+1 records out 6938042106 bytes (6.9 GB, 6.5 GiB) copied, 7.09554 s, 978 MB/s...
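To compare runs like the one quoted above, the throughput can be recomputed from the `dd` summary line itself (bytes copied divided by elapsed seconds). A minimal sketch, assuming GNU `dd` output in the same format as the post; `dd_throughput_mb_s` is an illustrative helper name:

```python
import re

# Matches the byte count and elapsed seconds in a GNU dd summary line.
DD_SUMMARY = re.compile(r"(\d+) bytes .*?copied, ([\d.]+) s")


def dd_throughput_mb_s(summary):
    """Return throughput in MB/s (1 MB = 10**6 bytes, as dd reports it)."""
    match = DD_SUMMARY.search(summary)
    if match is None:
        raise ValueError("no dd summary line found")
    n_bytes = int(match.group(1))
    seconds = float(match.group(2))
    return n_bytes / seconds / 1e6


old_network = "6938042106 bytes (6.9 GB, 6.5 GiB) copied, 7.09554 s, 978 MB/s"
print(round(dd_throughput_mb_s(old_network)))  # 978
```

Running the same function on the summary line from each storage type gives directly comparable numbers, which is handy when the file size differs between tests.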

Run own CPU

Hey! I have the following machine. Could I run this machine on RunPod to earn rewards? Thanks!

2-3 hours for Pod to load OneTrainer and Flux model

See the images; the time it took to load is marked in red. I really want to use the cloud for Flux LoRA training with OneTrainer, but using the OneTrainer CLI template with the Flux model, it took 2-3 hours to prepare before training could begin. Is it correct that I should wait 3 hours before training begins for a Flux LoRA? If I have done something wrong, what is it and how do I correct it?...

Pod SSH Connection Slow and Failing in EU-SE-1

I'm using "SSH over exposed TCP". The SSH connection is very slow from both VS Code and the terminal. From the terminal I do get connected, but I occasionally see these logs: ``` channel 22: open failed: connect failed: open failed channel 24: open failed: connect failed: open failed channel 26: open failed: connect failed: open failed...

4090 Power capped

Hi, I was testing an inference job on a 4090 pod. I noticed it was running very slowly. When I checked the nvidia logs, I noticed a "sw power cap" message when it got to about 1/3 of the Power (450W). How do we get full performance of our 4090 GPU?

Network volume for GPU and CPU

I'd like to try two GPUs with my big dataset, so I was looking for a DC with both GPUs and found one (US-KS-2). But I also found that network volumes aren't available for CPU at that DC. Am I right? Is there a chance to get this option in the near future?...
Solution:
CPU pods are only at EU-RO-1 and EUR-IS-1

Unable to run any docker image in runpod instance

I have set up the instance and installed Docker successfully, but I am unable to run any image in the instance. It always fails with the error below. ```root@6f45062b53d4:~# docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES root@6f45062b53d4:~# docker run hello-world Unable to find image 'hello-world:latest' locally...

How can I open a port for my pod and run it as an API endpoint

I cannot ping my endpoint, and I cannot run nginx in my pod. How can I open the port? I created it with RunPod PyTorch 2.4 and global networking enabled...

[runpodctl] Error creating a pod with a network volume attached

Hi. I am getting an error trying to create a pod with an existing network volume attached. The erroring command: ``` runpodctl create pod --secureCloud --networkVolumeId=="214w8k0zq1" volumePath="/workspace" --gpuType="NVIDIA GeForce RTX 4090" --imageName="ubuntu:latest" Error: Something went wrong. Please try again later or contact support....

Mimicking UI with API issues

Hello! I'm trying to use the template byecho/simpletuner-image:latest via the API. When I deploy via the UI it all works fine and I get a public IP and port that I can ssh/scp into. ...