Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


⚡｜serverless

⛅｜pods

🔧｜api-opensource

📡｜instant-clusters

🗂｜hub

opdroid1234

5/14/2025

Unable to create MI300x pod

I was able to do so about a week ago, when I was using the pods to write some ROCm code. Now when I try, I get a transient UI error message.

CJ Wolff

5/13/2025

Determining a fit for my needs.

Hello, I currently have an Ubuntu VM with GPU support, a static IP, and storage with Vultr. The project is still in development, so it's not really using any resources (no clients). The client iPhone and Android apps will offload GPU processing to the backend GPU-enabled server...

peanut_

5/13/2025

Running out of memory

Hi, the OG Kohya template from Runpod was taken down and not replaced, so now I'm using the InvokeAI template. I can't complete any training because it keeps crashing after running out of memory. I've never had this happen before.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB. GPU 0 has a total capacty of 23.54 GiB of which 303.12 MiB is free. Process 2505369 has 384.00 MiB memory in use. Process 2505421 has 7.50 GiB memory in use. Process 2521763 has 15.35 GiB memory in use. Of the allocated memory 14.30 GiB is allocated by PyTorch, and 581.12 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF


...
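The OOM message itself lists every process holding memory on the GPU. As a quick way to read those figures, here is a minimal Python sketch (standard library only; the helper name `per_process_gib` is hypothetical) that extracts the per-process usage from such a message and converts it to GiB:

```python
import re

# Matches fragments such as "Process 2505421 has 7.50 GiB memory in use"
_PROCESS_RE = re.compile(r"Process (\d+) has ([\d.]+) (GiB|MiB) memory in use")


def per_process_gib(oom_message: str) -> dict[int, float]:
    """Map each PID named in a PyTorch CUDA OOM message to the GiB it holds."""
    usage = {}
    for pid, amount, unit in _PROCESS_RE.findall(oom_message):
        # Normalize MiB figures to GiB so all entries are comparable
        gib = float(amount) / 1024 if unit == "MiB" else float(amount)
        usage[int(pid)] = gib
    return usage


if __name__ == "__main__":
    log = (
        "Process 2505369 has 384.00 MiB memory in use. "
        "Process 2505421 has 7.50 GiB memory in use. "
        "Process 2521763 has 15.35 GiB memory in use."
    )
    usage = per_process_gib(log)
    for pid, gib in usage.items():
        print(f"PID {pid}: {gib:.2f} GiB")
    print(f"Total: {sum(usage.values()):.2f} GiB")
```

Applied to the log above, the three processes hold roughly 0.38 + 7.50 + 15.35 ≈ 23.2 GiB of the 23.54 GiB card. Since the failing process reports 14.30 GiB allocated by PyTorch, it is most likely PID 2521763, which suggests (but does not prove) that other processes, possibly the InvokeAI server itself, are holding close to 8 GiB. Only 581.12 MiB is reserved but unallocated, so the `max_split_size_mb` setting that the message recommends targets fragmentation and is unlikely to recover much here.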

LiquidMayo

5/13/2025

ollama template not working

Trying to start a pod running Ollama, with either the base template or better-ollama-webui:
create container madiator2011/better-ollama-webui:cuda12.4
cuda12.4 Pulling from madiator2011/better-ollama-webui
Digest: sha256:2926109085a545a2d5e83545fa0b7e2d10c3fa773c7cb2cf95331da37ff72413...

Evgeniy_Wis

5/13/2025

How do I get rid of "Running your Pod without GPUs"?

I've already set up Fooocus on this pod and spent a lot of time on it, but now I can't use it because of this error. The only option I see is to create a new pod, transfer all the settings there, and initialize Fooocus again, which means doing the work twice. Waiting is clearly not an option, because this message never disappears. Does anyone have other options?

lululatortue#9992

5/13/2025

Issue with Changing Pod Template on RunPod (PyTorch 2.4.0)

Hello everyone, I'm experiencing an issue on RunPod when trying to change the template of my pod. Currently, the template is set to PyTorch 2.1, but I want to switch it to PyTorch 2.4.0 to use ComfyUI. The problem is that even after I select the PyTorch 2.4.0 template from the list, the interface automatically switches back to PyTorch 2.1 without applying the change. I've tried several solutions: Reloading the page...

GoldBowl

5/13/2025

Hello Runpod team,

I registered on the Runpod platform and was very excited to speed up my project with better hardware, but the experience has been very disappointing. I deployed a ComfyUI pod (RTX 4090), but the machine itself had a problem, and after waiting for 2 days I had to terminate the pod, even though I had already finished the configuration. I configured another pod (RTX 4090), and now I don't know when GPUs will be available, even though it is shown as a High Availability instance. I feel stuck. Can someone guide me on how to get GPUs allocated, instead of not knowing when they will be available? Also, please let me know if there are any best practices for a short-term project...

turtlebasket

5/13/2025

Proxied SSH not working

SSH'ing via ssh.runpod.io hasn't worked for the last 2 weeks. Is there any plan to fix this?

yumb

5/13/2025

Persistent Issues with SDXL + IP-Adapter Stability Across RunPod AI Templates (ComfyUI & A1111)

Hey @RunPod Support / Community, I'm running into persistent critical issues with the hearmeman/comfyui-wanvideo:v9 template when trying to use SDXL + IP-Adapter (specifically with h94 IP-Adapter weights and the LAION CLIP-ViT-H-14 encoder). Despite correct model placement and extensive troubleshooting, enabling IP-Adapter consistently leads to highly unstable/disfigured outputs or backend crashes, even with simple prompts and base SDXL 1.0 (which works fine without ControlNet/IP-Adapter). The "Resampler size mismatch" error has been resolved by using the correct ~2.5GB ViT-H encoder. Is anyone else facing issues with stable SDXL + IP-Adapter character consistency? I'm trying to get basic Text-to-Image, Image-to-Image, and eventually Image-to-Video workflows functional. I'm currently unable to get a usable character output. Any verified workflow examples (JSONs), specific ComfyUI node configurations for this template, or insights into known incompatibilities would be hugely appreciated. I've been trying to resolve this since last Friday....

billpress

5/13/2025

SSH key generation issue with `runpodctl config`

I'm receiving a new error as of today when I run `runpodctl config --apiKey $RUNPOD_API_KEY`:

Error: failed to update SSH key in the cloud: failed to get SSH keys from the cloud: API error: Unauthorized


I wasn't getting this error before today. I tried generating a new API key (with full perms) just in case there was an issue with it (even though I could use it successfully with a GraphQL request). Still fails. I added a print statement before this to confirm that $RUNPOD_API_KEY has the correct value....
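A plain `print` does not make a trailing newline or stray quotation marks visible, so a stricter check of the variable can rule out a formatting problem. This is an assumption-driven diagnostic, not a known fix: the helper `api_key_problems` is hypothetical, and because the same key works in a GraphQL request, a server-side change is at least as likely a cause.

```python
import os
from typing import Optional


def api_key_problems(value: Optional[str]) -> list[str]:
    """Return formatting problems found in an API key string (empty list if none)."""
    if value is None:
        return ["variable is not set"]
    problems = []
    if not value.strip():
        problems.append("value is empty")
    elif value != value.strip():
        # Catches e.g. a trailing "\n" left over from reading the key out of a file
        problems.append("leading/trailing whitespace or newline")
    stripped = value.strip()
    if len(stripped) >= 2 and stripped[0] == stripped[-1] and stripped[0] in "'\"":
        problems.append("value is wrapped in quotes")
    return problems


if __name__ == "__main__":
    # Only the problem list is printed, so the key itself never reaches the terminal
    found = api_key_problems(os.environ.get("RUNPOD_API_KEY"))
    print("; ".join(found) if found else "no obvious formatting problems")
```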

SzymonOżóg

5/12/2025

Unable to get MI300X pod, website error

Tried on 2 browsers and 2 computers, always the same result

sugarUnderflow

5/12/2025

Something wrong with A4000

My Stable Diffusion generations get mangled and very blurry after 15 minutes of use with the same checkpoint when using an A4000. I use Forge UI. I have to switch models from time to time to get decent generations.

Onebyte

5/11/2025

404 Proxy / Terminal error

I've consistently been unable to load a template (which mostly worked before). It always shows a 404 error, or the web terminal cannot be spawned. This is the first time this has happened. Does anyone have a similar issue? The template is atinoda/text-generation-webui:latest. Alternatively, does anyone have a different template they recommend?...

Solution:

Yep, I think that's it; the image is broken. I'll be using the attached template, I guess.

yfeng997

5/10/2025

US-TX-3 Pod Availability

There doesn't seem to be any pod availability in US-TX-3 region. Do we have any estimation on when availability will be back? Thanks!

Trojaner

5/9/2025

SFTP Support

Is SFTP supported on CPU instances?

Vedran

5/9/2025

global networking through REST API

Is it possible to create a pod using the REST API with “Global networking” enabled? On the "Global networking" docs page (https://docs.runpod.io/pods/networking) or on the REST API docs (https://rest.runpod.io/v1/docs#tag/docs/GET/openapi.json) I can't find info about that. Thanks...
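One way to check whether the REST API exposes such a setting is to search the OpenAPI spec directly. A minimal sketch, assuming you have saved the spec JSON locally (for example as `openapi.json`, a hypothetical filename), using a hypothetical helper `find_keys` that lists the JSON path of every key whose name contains a search term:

```python
from typing import Any, Iterator


def find_keys(node: Any, needle: str, path: str = "$") -> Iterator[str]:
    """Yield JSON paths of every dict key containing `needle` (case-insensitive)."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if needle.lower() in str(key).lower():
                yield child
            # Keep descending so nested schema properties are also searched
            yield from find_keys(value, needle, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_keys(item, needle, f"{path}[{i}]")
```

For example, after `import json`, `list(find_keys(json.load(open("openapi.json")), "network"))` returns the paths of all matching keys; searching for `"global"` as well covers other naming choices. An empty result would suggest, but not prove, that the option is not exposed in that version of the spec.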

neural-soupe

5/9/2025

No GPUs available on Instant Clusters?

bghira

5/9/2025

torch cuda shows no devices available (B200)

On an 8x B200 system (lz1ew4cgoiot8f):

```
Python 3.11.11 (main, Dec 4 2024, 08:55:07) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
...
```
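`torch.cuda` reporting no devices can have several causes, and the post does not show which one applies. One cheap thing to rule out is the `CUDA_VISIBLE_DEVICES` environment variable: if it is set to an empty string, no devices are visible, and if an entry is invalid, CUDA hides that entry and every device listed after it. The minimal Python sketch below (hypothetical helper `visible_device_indices`; integer indices only, GPU UUID entries are not modeled) mirrors that rule so the variable's value can be compared against the eight physical GPUs:

```python
import os
from typing import Optional


def visible_device_indices(env_value: Optional[str], physical_count: int) -> list[int]:
    """Sketch of how CUDA interprets CUDA_VISIBLE_DEVICES (integer indices only)."""
    if env_value is None:  # unset: every physical GPU is visible
        return list(range(physical_count))
    visible = []
    for token in env_value.split(","):
        try:
            idx = int(token.strip())
        except ValueError:
            break  # invalid entry: it and everything after it are hidden
        if idx < 0 or idx >= physical_count:
            break  # out-of-range index is also treated as invalid
        visible.append(idx)
    return visible


if __name__ == "__main__":
    raw = os.environ.get("CUDA_VISIBLE_DEVICES")
    print("CUDA_VISIBLE_DEVICES =", repr(raw))
    print("indices CUDA would expose on an 8-GPU node:", visible_device_indices(raw, 8))
```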

bghira

5/8/2025

very slow 5090 pod

Hello, this pod (a02462e46395) seems to be terribly slow. I'm trying to install flash_attn, and it has been building for more than 30 minutes. Can someone please check?

heyado

5/8/2025

error starting container: Error response from daemon: failed to create task for container

Received this email at 2AM: There seems to have been a possible issue with the server that one or more of your pods is hosted on. The following pods were impacted. 80pvzctxtwaruc ...
