Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Quality Control Issues?

I've noticed this a lot and it's becoming very problematic in my workflows. When I have production GPUs, some of them are just objectively worse (and by a lot, like 3x worse in performance). Same GPU type, same number of CPUs, same template, same storage, same codebase, but it runs 3x slower per iteration in the same region, and I can't even swap to a new one because some of the new ones have the same problem. I don't want to gamble every time I boot up a GPU and hope that it matches the quali...

How do I know when my pod has the GPU available again?

Do I just keep refreshing the page? I moved away from a network volume because it slowed down my ComfyUI generations, but now I can't pause pods because I'll lose my GPU... Is there a better way to have persistent storage while being able to leave and come back to the GPU at max speed? I feel like I'm missing something here....

How to connect CivitAI models to Runpod?

As the title states, how? And if there are multiple options, please let me know....

Is there a way to release without downtime?

If there is no downtime when publishing the image, is there an optimal time? Currently, the reset function seems to be able to achieve no downtime?

A40 pod in EU-SE-1 in weird state, showed GPU utilization at 100% despite no running processes.

Basically the title: I noticed the GPU was in a weird state. I've attached a screenshot of what nvidia-smi returned. Container template was runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04, pod ID was 7d5lkyzaceqkq1

Getting insufficient storage!

I have a 200GB network volume (ID: eghhav8xzy), and only 33GB of it is occupied (screenshot attached). But for some reason I cannot download a 16GB model from Hugging Face onto it! I get the following error (see the attached screenshot for the complete message): "RuntimeError: Data processing error: CAS service error : IO Error: No space left on device (os error 28)". Please try to resolve this ASAP.....
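Not an official answer, just a minimal sketch for narrowing this down. "os error 28" is the operating system's "no space left on device" code, and it refers to whichever filesystem the write actually landed on, which is not necessarily the network volume. The sketch below prints free space for a few candidate locations. The `/workspace` mount path and the 16 GiB model size are assumptions taken from this post's context, and `free_space_report` is a hypothetical helper name; `~/.cache/huggingface` is the default Hugging Face cache location unless it has been overridden.

```python
import os
import shutil


def free_space_report(path, needed_bytes):
    """Report disk usage for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return {
        "path": path,
        "total_gib": round(usage.total / gib, 1),
        "free_gib": round(usage.free / gib, 1),
        # True when the download would fit in the remaining free space.
        "fits": usage.free >= needed_bytes,
    }


if __name__ == "__main__":
    model_size = 16 * 1024 ** 3  # assumed size of the model being downloaded
    # Assumed locations: the network volume mount, the default
    # Hugging Face cache directory, and the temp directory.
    candidates = [
        "/workspace",
        os.path.expanduser("~/.cache/huggingface"),
        "/tmp",
    ]
    for path in candidates:
        if os.path.exists(path):
            print(free_space_report(path, model_size))
```

If one of these locations reports `"fits": False` while the network volume still has room, the download is probably being staged somewhere other than the volume.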

Secure Cloud: Jupyter fails to load on about 50% of starts

When I create a new pod and start it, JupyterLab often doesn't open via the direct link.
The browser console shows load errors like this: lab:1 GET https://p2uet3rx491wcu-8888.proxy.runpod.net/static/lab/main.89b98d6484fcb68c324e.js?v=89b98d6484fcb68c324e net::ERR_ABORTED 404 (Not Found)...

Best ComfyUI Template for RunPod & Network Storage?

Hey everyone, I run a YouTube channel about AI in German. I’m planning a video on installing and using ComfyUI on RunPod. What’s the best template to use with network storage for this setup? Appreciate your suggestions!...

Refunds for nonfunctional community GPUs?

I went through multiple community GPUs to find one that was working.

"We have detected critical error on this machine..."

Hello everyone, I'm currently running into this problem:
We have detected a critical error on this machine which may affect some pods. We are looking into the root cause and apologize for any inconvenience. We would recommend backing up your data and creating a new pod in the meantime....

Hey, I restarted my pod and now I can't start anything.

Where the ports are, I usually see ComfyUI and Jupyter File Manager. Now they are gone and I only see the ports under the 'Connect' tab. What can I do to fix this? And a second question: how do I stop my pod from charging me when I'm not actually using it? Do I just terminate it?...

I don't know how to access my Automatic1111

Hi everyone, I'm new to Run Diffusion and I'm having a problem. I don't know how to access my Automatic1111 interface because I'm stuck at the beginning where I see some blue folders on the right. Thank you very much, even though I'm embarrassed, but I need to learn how it works. Thanks and best regards!...

kohya_ss GUI logs

How do I see the logs for Kohya_ss GUI? A previous post said to check the readme file for instructions but I don't see anything in there for viewing logs. Also nothing seems to happen when I try to start training or do wd14 captioning. I've loaded my image folder and clicked 'Caption Images' but it's never produced any captions....

Error in starting New Pod

For the last 30 minutes, whenever I try to create a new pod, the system logs show this error. I tried multiple templates and still get a similar daemon error each time. Any idea how to resolve this? 'error starting container: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.8, please update your driver to a newer version, or use an earlier cuda container: unknown'...

Used bank to get $5 credit, but the credit is not showing up

Hi, it said that if I used my bank to set up payment and loaded $10, I would get $5 in credits, but that hasn't happened. I'd just like some help with that.

Pod not downloading container correctly from docker.io

The pod shows that it has downloaded the container from docker.io. The digest is the same as when I pull it onto my own server. The pod shows 15MB/30GB even though the container is 16GB. It then just says "starting" but never starts. The SSH connection shows: -- RUNPOD.IO -- Enjoy your Pod #1ly..7o ^_^ Error response from daemon: container 3ae07de.... is not running Connection to 1.. closed....

Any good template for beginners?

I just tried the setup below, but I had a lot of problems installing extensions. Is there any good template for beginners? On-demand, A4000, 70GB network storage. Template: RunPod Stable Diffusion runpod/stable-diffusion:web-ui-10.2.1...

“We have detected critical error..” issue

Hi team, I am currently seeing this message in one of the pods I deployed: “we have detected a critical error on this machine which may affect some pods. We are looking into the root cause and apologize for any inconvenience. We would recommend backing up your data and creating a new pod in the meantime”. I cannot access the server, and I have some important data in this workspace. Could you help me with this?...

H100 VRAM usage limited by power

So I am paying for an H100 but only getting the "power" of an A40? This doesn't seem right. It never exceeds 30% of the total VRAM available. Also, a 310 watt power limit seems very low?...

Can't use anything on ComfyUI

Hey, I just finished downloading and setting up ComfyUI on RunPod, but after I installed it, I noticed I couldn't move or use almost any of the buttons, most notably the "Run" button.