Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Join

⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

Paying for H100 SXM but got an A100

After a reboot, the Web UI shows that I launched an H100 (256 GB of VRAM), but nvidia-smi shows that I got an A100. Unless I'm missing something, it looks like I'm not getting what I'm paying for....
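When the console and the driver disagree, the driver is the ground truth. A small sketch for capturing what the pod actually reports (assumes `nvidia-smi` is on PATH, as it is in Runpod GPU pods; the sample line is purely illustrative):

```python
import subprocess

def parse_gpu_info(csv_line: str) -> tuple:
    """Parse one line of `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader`."""
    name, mem = (part.strip() for part in csv_line.split(",", 1))
    return name, mem

def query_gpu_info() -> list:
    """Ask the driver which GPUs are really attached to this pod."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_info(line) for line in out.splitlines() if line.strip()]

# Illustrative sample of what an A100 pod would print:
sample = parse_gpu_info("NVIDIA A100-SXM4-80GB, 81920 MiB")
```

Attaching this output to the support ticket along with the pod ID makes it easy to reconcile billing against the hardware actually provisioned.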

Degraded performance on EU datacenters when using ComfyUI

Since around two days ago, performance on EU-hosted pods has declined. I mainly use ComfyUI, and generated images load extremely slowly, or in some cases do not load at all, in the Comfy nodes that have image previews; I have to open them manually, especially when the images are larger. In addition, once ComfyUI is attached to a port, the interface takes a long time to load in the browser window (it is stuck on a white screen for a while). I am deploying with network storage on EU-SE-1, usually on RTX A5000 GPUs....
Solution:
Looks like it's back to normal today. Thanks for the reply, though.

Issue with Texas pods - very low bandwidth

Hi guys, I'm experiencing very low download speeds with Texas pods. No unusual errors in the inspect screen. If you need my account, message me privately. Pod ID: o63bji5h1cgg1j...
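For reports like this it helps to attach a number. A rough single-stream measurement, stdlib only (which URL to test against is up to you):

```python
import time
import urllib.request

def download_speed(url: str, chunk: int = 64 * 1024) -> float:
    """Download `url` and return the rough single-stream rate in bytes/second."""
    start = time.monotonic()
    total = 0
    with urllib.request.urlopen(url) as resp:
        while True:
            data = resp.read(chunk)
            if not data:
                break
            total += len(data)
    return total / max(time.monotonic() - start, 1e-9)
```

Run it from the pod and from your local machine against the same file; two concrete numbers make a much stronger support ticket than "it feels slow".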

Raised a request for an increase in A100 SXM Pods. How long would it take for approval?

My team recently requested an increase in A100 SXM pods. We'd like to know how long approval typically takes, and we kindly request that the process be expedited. Thanks!...

My on-demand CPU Pod has been running 24/7 ...

My on-demand CPU Pod has been running 24/7. Is it possible to have it run only when a request is made to the server instead?
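That is essentially the serverless model: the process only does work inside a request handler instead of running a 24/7 loop. A minimal stdlib sketch of the pattern (Runpod Serverless wraps the same idea in its own handler API; `run_job` here is a hypothetical placeholder for the real workload):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_job(payload: bytes) -> bytes:
    # Placeholder for the actual workload that currently runs 24/7.
    return b"processed: " + payload

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        result = run_job(body)  # work happens only when a request arrives
        self.send_response(200)
        self.send_header("Content-Length", str(len(result)))
        self.end_headers()
        self.wfile.write(result)

    def log_message(self, *args):
        pass  # keep demo output quiet

# Demo: start the server, make one request, shut down.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
reply = urllib.request.urlopen(
    f"http://127.0.0.1:{server.server_port}", data=b"hello"
).read()
server.shutdown()
```

On Runpod this maps to moving the workload into a Serverless endpoint, which spins workers up only when requests arrive instead of keeping a pod running around the clock.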

Container not running error

I was working in ComfyUI normally when it suddenly stopped working; restarting the pod did not fix the problem. Now, when creating new pods, the container will not start running. Any suggestions/fixes? I'm in the EU-RO region, running an RTX 4090 with CUDA 12.7 on the Runpod PyTorch 2.4 template...

ComfyUI webpage is taking a lot of time to load. The services come up fine, in the expected time.

The webpage takes almost 6-8 minutes to load. Using pytorch:2.8.0-py3.11-cuda12.8.1 and an RTX A4500 in EU-RO-1...

ComfyUI is not loading

It has just been saying ‘Backends are still loading on server’ for the last hour. It was working fine early this morning, but not anymore....

No root on pod. Cannot write to workspace.

I've tried launching a new pod, but I get the same error. It was working perfectly fine yesterday, so I'm not sure what's happening or if I should open a support ticket for this. I've tried:
```
(f2l) mambauser@3ccffae5dc96:/$ chmod -R 775 /workspace/...
```
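`chmod` on /workspace fails for a non-root user unless that user actually owns the files, so the first thing to check is ownership versus the current UID. A small diagnostic sketch (the throwaway temp directory stands in for /workspace):

```python
import os
import stat
import tempfile

def diagnose(path: str) -> dict:
    """Summarize why a path may be unwritable for the current user."""
    st = os.stat(path)
    return {
        "owner_uid": st.st_uid,          # who owns the path
        "my_uid": os.getuid(),           # who we are
        "mode": stat.filemode(st.st_mode),
        "writable": os.access(path, os.W_OK),
    }

# Demo on a throwaway directory (a stand-in for /workspace):
demo = tempfile.mkdtemp()
os.chmod(demo, 0o775)
info = diagnose(demo)
```

If `owner_uid` differs from `my_uid` and you have no root, no amount of `chmod` will help (only the owner or root can change the mode); that mismatch is worth quoting verbatim in the support ticket.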

Slow network speed on EUR-IS-1

I have been maxing out at 100 KB/s download speed for the last couple of hours; I tried multiple pod restarts. My local machine's connection is fine. Any ideas?

Unstable pod socket connection

We have experienced slow connections with multiple pods in the past 12 hours. We run a socket-based Python app using a custom Docker image. We haven't updated the image, and we hadn't had any issues in the long time we've been using these pod configs. However, when launching a pod today, we found the connection very strange and slow. We inspected our client and observed the packet flow: it was as if all packets sent and received were ‘throttled’, with the client only able to receive one 64 KB packet per second. It was not always problematic; roughly 1 out of every 2 connections hit the issue. It feels like we alternately get a bad connection, and once the socket is established on a ‘bad route’, it stays bad until we reconnect....
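A symptom like "one 64 KB packet per second" is easy to quantify with a raw-socket throughput probe. A stdlib sketch that measures receive rate; here both ends run on localhost for the sake of the demo, but in practice the client would point at the pod's exposed port:

```python
import socket
import threading
import time

CHUNK = 64 * 1024
N_CHUNKS = 16

# Server side: accept one connection and stream N_CHUNKS blocks of 64 KiB.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)

def send_chunks():
    conn, _ = srv.accept()
    for _ in range(N_CHUNKS):
        conn.sendall(b"\0" * CHUNK)
    conn.close()

threading.Thread(target=send_chunks, daemon=True).start()

# Client side: receive everything and compute the effective rate.
cli = socket.create_connection(srv.getsockname())
received, start = 0, time.monotonic()
while True:
    data = cli.recv(CHUNK)
    if not data:
        break
    received += len(data)
rate = received / max(time.monotonic() - start, 1e-9)
cli.close()
```

A healthy route should report many MB/s; a route stuck near 64 KB/s reproduces the throttled symptom described above and pins the problem on the network path rather than the application.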

"error creating container: nvidia-smi: parsing output of line 0: failed to"

Recently started getting this error "error creating container: nvidia-smi: parsing output of line 0: failed to parse (pcie.link.gen.max) into int: strconv.Atoi: parsing "": invalid syntax" when creating new pods.
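The message indicates that the host agent got an empty string where `nvidia-smi` should have printed an integer for `pcie.link.gen.max`, which usually points at a driver hiccup on the host rather than anything in the template. A tolerant parse of that field would look like this (a sketch; `parse_pcie_gen` is hypothetical, not Runpod's code):

```python
from typing import Optional

def parse_pcie_gen(raw: str) -> Optional[int]:
    """Parse the `pcie.link.gen.max` field from nvidia-smi output.

    An empty value is exactly what trips the reported
    `strconv.Atoi: parsing "": invalid syntax` on the Go side.
    """
    raw = raw.strip()
    return int(raw) if raw else None
```

Since the failure happens while the container is being created, there is little to fix on the user side; reporting the pod ID so the host can be checked is the practical step.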

Missing CUDA 12.9 version in Create Pod API

Hi, I'm trying to create a pod via the API with vLLM 0.9.2 on an A40/A6000 in EU-SE-1. How can I force it to be created on a host with CUDA 12.9? vLLM 0.9.0+ requires CUDA 12.8+, but the A40 hosts there have a mix of 12.4, 12.7, and 12.9. If I don't set a CUDA version, it works when I'm lucky enough to get a host with 12.9; otherwise it won't. Is it possible to get CUDA 12.9 added to the API options for creating a new pod?...
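For reference, the pod-creation input is where such a filter would live. A sketch of the request body; `allowedCudaVersions` is reported to be accepted by the pod-creation API, but the field name and values here are assumptions, so check the current API reference before relying on it:

```python
import json

# Hypothetical pod-creation input; field names are assumptions.
payload = {
    "input": {
        "gpuTypeId": "NVIDIA A40",
        "cloudType": "SECURE",
        # Assumption: restricts scheduling to hosts whose driver
        # supports one of these CUDA versions.
        "allowedCudaVersions": ["12.8", "12.9"],
    }
}
body = json.dumps(payload)
```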

Lora Loader Node

Hello, I have a Power Lora Loader node to choose different LoRAs from my files, but when I click it, nothing happens, and this is what I see in the console.

Disk quota exceeded error hours after resizing

Hello, yesterday I created a network volume to which I uploaded a compressed archive, forgetting that I need double the space to extract it. A few hours ago I resized the volume to get extra space, but even after retrying on multiple pods I still get a "Disk quota exceeded" error after filling up the original size. I am wondering how long it should take for the new size to apply. Update: even after creating a new network storage with enough space (60 GB), it says "Disk quota exceeded" at 30 GB of usage. I am at a bit of a loss about what to do....
Solution:
Answered by support: "Thank you for your patience. Our reliability team ran diagnostics on your network volume and, while you only see ~11 GB via du, the backend reports a “size” of 56 GB and a “real size” of 112 GB due to how files and metadata are stored. This exceeds your 60 GB quota and is triggering the error. To resolve this, you can either:...
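The support answer hinges on the gap between what `du` reports (apparent size) and what the filesystem actually accounts against the quota (allocated blocks plus metadata). Both numbers can be inspected directly; a stdlib sketch, demonstrated on a sparse file where the gap runs in the opposite direction:

```python
import os
import tempfile

def apparent_and_disk_size(path: str) -> tuple:
    """Return (apparent size in bytes, bytes actually allocated on disk)."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512  # st_blocks is in 512-byte units

# Demo: a 1 MiB sparse file has a large apparent size but allocates
# almost no blocks -- the mirror image of the quota problem, where
# allocation and metadata exceed what `du` reports.
fd, path = tempfile.mkstemp()
os.close(fd)
os.truncate(path, 1024 * 1024)
apparent, on_disk = apparent_and_disk_size(path)
```

From the shell, comparing `du --apparent-size` with plain `du` shows the same split.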

Help with how to install models into ComfyUI on Runpod, please and thank you

Hello, I need help because I have no clue how to install models and such using Runpod; right now I'm using the ComfyUI Slim Better template. I need to know how to download specific models and put them in specific file locations; I don't know how to do that with online GPUs like this. For example, I need this model for a specific workflow: https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main which is supposed to go as FLUX1-DEV.SAFETENSORS in the \diffusion_models directory. Any help would be appreciated, please and thank you (:...
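The general recipe on a pod is: find the folder ComfyUI scans (`models/<kind>` under the ComfyUI root) and download the file straight into it from a pod terminal. A hedged sketch; the ComfyUI root path varies by template, and gated repos such as FLUX.1-dev additionally require a Hugging Face access token, which plain `urlopen` does not send:

```python
import pathlib
import shutil
import urllib.request

def place_model(url: str, comfy_root: str,
                subdir: str = "models/diffusion_models") -> pathlib.Path:
    """Download `url` into the ComfyUI model folder the workflow expects.

    `comfy_root` is wherever ComfyUI lives on the pod (often somewhere
    under /workspace -- an assumption; check your template).
    """
    dest_dir = pathlib.Path(comfy_root) / subdir
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / url.rsplit("/", 1)[-1]  # keep the original filename
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```

For an ungated file this is enough; for gated ones, download with the Hugging Face CLI (logged in with a token) into the same target directory instead.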

Is it possible to set up ZeroTier on a pod or serverless?

I'm wondering whether using Cloudflare and the Runpod proxy may prevent this. Has this been attempted before? Thanks for any assistance.

COLMAP in custom docker template doesn't use CUDA/GPU

docker.io/chenhsuanlin/colmap:3.8 — I'm trying to get this to work with the Docker image above, but it seems to be using the CPU instead of CUDA/NVIDIA GPUs. I've looked at the https://github.com/runpod/containers Runpod container templates, but I'm not 100% sure which bash file or Dockerfile I should copy the format from...
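Before reworking the Dockerfile, it is worth confirming the GPU is visible inside the container at all: CUDA-based tools like COLMAP silently fall back to the CPU when it is not. A quick stdlib check (on a machine without the NVIDIA driver, both fields come back empty):

```python
import ctypes.util
import glob

def gpu_visible_in_container() -> dict:
    """Quick check of whether the NVIDIA driver is exposed in this container."""
    return {
        # Device nodes the runtime should have mounted in:
        "nvidia_devices": sorted(glob.glob("/dev/nvidia*")),
        # The NVIDIA management library (libnvidia-ml), if findable:
        "nvml_library": ctypes.util.find_library("nvidia-ml"),
    }

status = gpu_visible_in_container()
```

If the devices list is empty inside the pod, the problem is the container launch / GPU attachment rather than the COLMAP build itself.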

My public pod template is not visible to others in the explore section

Hello, I created a public pod template for a Docker image that I host publicly on Docker Hub. In my primary account I can use it and see it listed in the Explore section. However, in my other account it is not visible, even though it is public; I can only deploy it through the share link https://console.runpod.io/deploy?template=g77d7didja&ref=9oqlbhoc Why is it not listed in search for others? Info about the template is below...
Solution:
I got a reply by mail from the support team: "In order for a public template to appear in the Explore section, it typically needs to have at least 24 hours of runtime from other users." Thanks everyone...

CUDA 11.8 seems unavailable in templates

Any 11.8 template I deploy ends up being 12.4. Is there a CUDA 11.8 template that still runs 11.8?...