Runpod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

What's going on with JupyterLab on pods?

I created a new GPU pod in RunPod, but JupyterLab does not start up. The pod itself seems to be running, but JupyterLab never comes online.
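Not an official fix, but a quick way to tell whether JupyterLab ever bound its port is to check from inside the pod (web terminal or SSH). A minimal sketch, assuming the template's usual Jupyter port 8888; adjust if your template maps it differently:

```
# Sketch: check from inside the pod whether anything is listening on the
# Jupyter port. Port 8888 is the usual default in RunPod templates and is
# an assumption here; change it if your template uses another port.
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("JupyterLab port 8888 listening:", port_open("127.0.0.1", 8888))
```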

Wan2.2 generation is too slow

Hey, even though I'm using a good GPU, Wan2.2 still takes a lot of time to generate a video. Not sure what to do.

Can't SSH into fresh pod

```
➜ ~ ssh 9ef4j1jcr7ks20-64411a82@ssh.runpod.io -i ~/.ssh/ided25519
-- RUNPOD.IO -- Enjoy your Pod #9ef4j1jcr7ks20 ^^
...
```

CUDA NO WORKY?

I'm unable to get SSH working to pods from a clean CUDA Docker image. Despite the pods saying they're ready and giving me an SSH line (and charging me $$$), they all spit out the same error:
Error response from daemon: container a94707bd5f391d6a3f25d13f3ba02a425757bdbecfcb7de3b1169ddda866d434 is not running
Error response from daemon: container a94707bd5f391d6a3f25d13f3ba02a425757bdbecfcb7de3b1169ddda866d434 is not running
...

Intermittent network/proxy issue on the path to Runpod's S3 API

When using the S3 API I get lots of botocore.exceptions.ClientError: "An error occurred (502) when calling the HeadObject operation (reached max retries: 4): Bad Gateway" and "An error occurred (502) when calling the GetObject operation (reached max retries: 4): Bad Gateway". Anyone else experiencing this?...
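Not a fix for the gateway itself, but a minimal sketch of cranking up botocore's client-side retries and backoff so transient 502s get absorbed before they surface as exceptions. The endpoint URL, bucket name, and object key below are placeholders; use the values the RunPod console shows for your datacenter and network volume:

```
# Sketch: more aggressive client-side retries for an S3-compatible endpoint
# that intermittently returns 502s. Placeholders must be filled in.
import boto3
from botocore.config import Config

ENDPOINT_URL = "https://<your-datacenter-s3-endpoint>"  # placeholder
BUCKET = "<your-network-volume-id>"                      # placeholder

s3 = boto3.client(
    "s3",
    endpoint_url=ENDPOINT_URL,
    config=Config(
        retries={"max_attempts": 10, "mode": "adaptive"},  # backoff on 5xx
        connect_timeout=10,
        read_timeout=60,
    ),
)

# If HeadObject still fails after this many attempts, the problem is almost
# certainly on the network/proxy path rather than in the client.
resp = s3.head_object(Bucket=BUCKET, Key="some/object/key")  # placeholder key
print(resp["ContentLength"])
```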

-- RUNPOD.IO --

-- RUNPOD.IO -- Enjoy your Pod #hw9olfpxhfnmlz ^_^
root@208f833dda91:/workspace#
exit status 137
Connection to 100.65.25.141 closed.
...

Infinity Fabric networking for multi-AMD-GPU pods

Hi guys, I can use PCIe to let the AMD GPUs talk to each other. Is there any way to use Infinity Fabric instead? It's much faster!
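Whether the GPUs in a pod are linked over XGMI (Infinity Fabric) or only PCIe is determined by the host hardware, not by anything you toggle inside the container, so a first step is just to look at the topology ROCm reports. A minimal sketch, assuming `rocm-smi` is present on the image (it ships with the ROCm base images):

```
# Sketch: print the GPU link topology as ROCm sees it. In the output, XGMI
# links are Infinity Fabric; PCIE means peer-to-peer traffic goes over PCIe.
import subprocess

result = subprocess.run(
    ["rocm-smi", "--showtopo"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```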

Pod auto-moving if no GPU available

I was wondering when we will be able to start a Pod in a region regardless of GPU availability on my Pod's rack, to avoid "zero GPUs assigned to my Pod" issues. I think a feature is coming to automatically move the Pod's files to a server with available GPUs to avoid this pain. Is it still planned?

Runpod terminal crash

"Directory not found" keeps popping up and I can't dismiss or close it, it just keeps popping up. Once I refresh the page it gives me errors on both ports and I can't even use the terminal anymore. Resetting/restarting the pod doesn't help; I just keep getting this error in the logs: ModuleNotFoundError: No module named 'safetensors'...

Volume

I want to use the volume from another device (outside RunPod). I got S3 keys from "Create S3 API Key", but I get permission denied. Why?...
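For what it's worth, here is a minimal sketch of how the S3 API keys are typically wired up from an outside machine with boto3. The access key, secret, endpoint URL, region, and volume ID below are all placeholders; the correct endpoint and region for your volume's datacenter come from the RunPod console/docs, and a mismatch between the key and the endpoint is one plausible cause of a permission error:

```
# Sketch: use the "Create S3 API Key" credentials from outside RunPod.
# Every value in angle brackets is a placeholder.
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="<s3-api-access-key>",       # from the RunPod console
    aws_secret_access_key="<s3-api-secret-key>",   # from the RunPod console
    endpoint_url="https://<datacenter-s3-endpoint>",
    region_name="<datacenter-id>",
)

# List a few objects just to confirm the credentials and endpoint line up.
resp = s3.list_objects_v2(Bucket="<network-volume-id>", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```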

Pod takes a long time to start and uses no RAM with vLLM GPT-OSS-120B

I am so confused. Yesterday I did the same thing and the Pod was running quite quickly. Now I have tried restarting it, and the GPUs were taken, OK. So I spin up a new pod and it just does not work. I get a 502 error, and it seems like it just doesn't do anything anymore.
Solution:
For anyone else wondering: it took 30 minutes to start and load the model, then it worked. That's what caused the 502 error. The low RAM utilization persisted throughout.
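A minimal sketch of the workaround implied by the solution above: poll the server's health endpoint until the model finishes loading instead of treating the early 502s as failures. This assumes the vLLM OpenAI-compatible server on its default port 8000; adjust the URL for your pod's proxy, and give very large models a generous deadline:

```
# Sketch: wait for the vLLM server to finish loading a large model before
# sending requests. Base URL and the 45-minute deadline are assumptions.
import time
import requests

BASE_URL = "http://localhost:8000"  # adjust to your pod's proxy URL/port

deadline = time.time() + 45 * 60
while time.time() < deadline:
    try:
        if requests.get(f"{BASE_URL}/health", timeout=5).status_code == 200:
            print("server is up")
            break
    except requests.RequestException:
        pass  # not listening yet (or a 502 from the proxy); keep waiting
    time.sleep(30)
else:
    raise TimeoutError("vLLM never became healthy within the deadline")
```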

Pod start time varies although I'm using the same docker image.

I'm using the same Docker image all the time. Normally my pods only have a startup time of around 12 minutes, and generating an image takes around 1-2 minutes. But sometimes a pod takes 30 minutes to start up, and when that happens, generation time for an image is 10-15 minutes. What could be the cause of these varying startup times? My network volume? The GPU region?...

RTX 5090 permanently unavailable on US-CA-2?

Is someone hoarding 5090s on US-CA-2, or are they doing maintenance or something? For the last 24-36 hours no 5090 has been available, and I refresh and check every 15-30 minutes.

Network Volume + RTX PRO 6000 not starting on EU-RO-1

As the title states. Logs are empty. It looks like the instance is not initializing while credits are being consumed.

HELP! ComfyUI defaults to CPU on pod startup but switches to GPU after ComfyUI Manager restart

I'm experiencing a consistent issue with ComfyUI on RunPod where device selection behaves differently depending on how ComfyUI is started. When I first start the pod and launch ComfyUI, any workflow I run processes on the CPU (confirmed by low GPU utilization in monitoring). If I restart ComfyUI through ComfyUI Manager and run the exact same workflow, it switches to GPU processing...
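One way to narrow it down: run the snippet below with the same Python interpreter ComfyUI uses, once right after the pod starts and again after the Manager restart, and compare. It's just a sketch to see whether CUDA is even visible to PyTorch at first launch:

```
# Sketch: check CUDA visibility from the interpreter ComfyUI runs under.
# If this prints False on first boot but True after a restart, the device
# is only becoming available after ComfyUI has already picked CPU.
import torch

print("torch:", torch.__version__)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```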

Is it possible to launch a Runpod official image while specifying available shm (shared memory)?

I'm running into an shm-related bottleneck, and I noticed that shm is always lower than the available memory. The start command option overrides the entrypoint, so that doesn't help me. Do I have to create a custom Docker image for this, or am I missing something?...
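You're right that the start command won't change it; the /dev/shm size is set by the container runtime configuration rather than by anything the entrypoint does. A minimal sketch for confirming the actual size from inside the pod before deciding whether a custom image (or a support request) is needed:

```
# Sketch: report how big /dev/shm actually is inside the container.
import os

st = os.statvfs("/dev/shm")
total_gib = st.f_blocks * st.f_frsize / 2**30
free_gib = st.f_bavail * st.f_frsize / 2**30
print(f"/dev/shm: {total_gib:.1f} GiB total, {free_gib:.1f} GiB free")
```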

Help meee!! Kohya SS GUI template help needed!

Hey! I'm trying to train a LoRA using the Kohya SS GUI template, and when I go to open port 3000 I keep receiving this message. Can someone help? Jupyter is working, and I checked the logs: it says the container is ready... IDK. Full disclaimer: I'm new to this and not a programmer lol, thanks!

Help - Workflow for realistic product video - WAN 2.2 14B

Guys, someone help. I'm trying to generate a realistic product video. I can rent any GPU with high VRAM just to get this done, but I need a workflow that makes the videos realistic. I'm using WAN 2.2 14B I2V. I tried on an RTX 5090 32GB at 1040 x 541 resolution and got a TV video. I used dpm_3m_gpu, 25 steps, batch size 1, and it didn't work well for me, horrible result. I used the default...

Docker Hub rate limits

We are planning to run workloads where our GPU instances will be sparsely used during the night. Since the GPUs will be turned off when idle, whenever a new request comes in, the instance will need to pull our image from DockerHub. Given DockerHub’s rate limits, we’re concerned that repeated image pulls might result in throttling. Could you share how this situation is generally handled on RunPod? For example, are there caching mechanisms, best practices, or alternative registries we should consider to avoid hitting DockerHub’s limits?...
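Can't speak to how RunPod caches images internally, but you can at least see what quota Docker Hub is granting your pulls. A minimal sketch using Docker's documented rate-limit preview endpoint; anonymous pulls share a per-IP quota, while passing Docker Hub credentials to the token request would show the (higher) authenticated quota instead:

```
# Sketch: read the current Docker Hub pull quota via the documented
# ratelimitpreview endpoint. This checks the anonymous (per-IP) quota.
import requests

token = requests.get(
    "https://auth.docker.io/token",
    params={
        "service": "registry.docker.io",
        "scope": "repository:ratelimitpreview/test:pull",
    },
    timeout=10,
).json()["token"]

resp = requests.head(
    "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest",
    headers={"Authorization": f"Bearer {token}"},
    timeout=10,
)
print("ratelimit-limit:    ", resp.headers.get("ratelimit-limit"))
print("ratelimit-remaining:", resp.headers.get("ratelimit-remaining"))
```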

Network Storage constantly "Not Ready" - EU-RO-1

EU-RO-1, volume ID pa49xr3u5g (Nayuki), template "ComfyUI – Python 3.12": the pod is "Not Ready" only when the volume is attached. I tried detaching and reattaching it and tried different templates. Nothing works. Your ask-ai bot mentions a recent degradation of EU-RO-1. Also, the volume shows as almost full when I haven't really used it, apart from downloading some basic models/checkpoints. df -h:...