RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⛅|pods-clusters

You must remove this network volume from all pods before deleting it.

Why can't I delete my storage? It says "You must remove this network volume from all pods before deleting it." but I don't have any pod or serverless endpoint running...

CA-MTL-1 | Network Volume Input/output error

I am receiving Input/output errors. My network volume is 2TB and I am currently only using 13% of it, so running out of space shouldn't be the problem. I also get random "Directory not found" errors while inside JupyterLab. Could there be an issue with the network storage in that region? Is anyone else facing the same issue?

CA | Latency when loading files from Storage

Hello there, I'm currently using a network volume in the CA region, and when I try to load files from it on an A40 machine it's very slow (see attached). Any support is really appreciated. ...

Random 404 errors thrown by runpod secure cloud instance

We are running an LLM in production 24/7 and we have a heartbeat detector that checks the vLLM health endpoint /health. We've noticed seemingly random 404 errors returned when we hit the RunPod instance. Can someone from RunPod comment on what might be causing these errors? We only hit it every 5 minutes and we have been averaging 3-4 404s over the last few days. Our pod is in data center CA-1.
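For a heartbeat like the one described above, one option is to retry a few times before counting a 404 as an outage, and to log every status code seen so transient errors can be reported. The following is only a minimal sketch: the function names `probe` and `is_healthy`, the retry count, and the delay are all hypothetical, and it assumes the /health endpoint answers 200 when the server is up.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (such as the 404s in question) land here.
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout, and similar.
        return None


def is_healthy(url, attempts=3, delay=2.0, probe_fn=probe, sleep=time.sleep):
    """Return (healthy, statuses): healthy is True if any attempt returned 200."""
    statuses = []
    for i in range(attempts):
        status = probe_fn(url)
        statuses.append(status)
        if status == 200:
            return True, statuses
        if i < attempts - 1:
            sleep(delay)  # brief pause before retrying
    return False, statuses


# Hypothetical usage (the URL is a placeholder, not a real pod address):
# healthy, seen = is_healthy("https://<pod-id>-8000.proxy.runpod.net/health")
```

Keeping the list of status codes makes it easier to show support exactly when a 404 was followed by a 200 on the next attempt.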

A100 in CA is just a bad node

I usually opt for the A100, but honestly at this point I am wasting money because it just can't download a 4GB image. It takes 10 minutes to pull and then starts extracting the image, but only at 1MB per second for the full 3GB file. Could we please fix this node? I am being charged for unused time and it's an unpleasant experience overall....

Automatically start Jupyter notebook with API call?

Hi, when I make a new pod via the web UI I can check the box 'start jupyter notebook' and it will start that up. Is there a way to pass this to the API when I hit rest.runpod.io/v1/pods to create a new pod?

nvidia-deepstream container template hangs when starting

I am trying to use the nvidia-deepstream Docker image as a template for a Pod. I am using "nvcr.io/nvidia/deepstream:7.1-gc-triton-devel" as the container image. I start the pod, it downloads the required files, and then it hangs. I checked for previous posts; there was an archived message that didn't have any responses. Thank you for your help! Here is the GitHub link for the Dockerfile: https://github.com/NVIDIA-AI-IOT/deepstream_dockers/blob/main/docker/Dockerfile

CUDA 12.6 Image Having Issues with A40 - Is it a CUDA version issue?

I've been able to use my container that runs off of nvidia/cuda:12.6.3-cudnn-devel-ubuntu24.04 on various secure GPUs with no issue, but all of a sudden I started running into an issue with an A40 recently where it would keep cycling the pod (and not give me any meaningful messages). The only message I got was the NVIDIA one about the license agreement. This makes me think it's possibly a CUDA mismatch issue? Any thoughts? If it's a version issue, how do I filter the GPUs to request via GraphQL for an on-demand pod? This is all automated and scaled so I can't manually pick a GPU every time 😦...
Solution:
looks like it might be user error.. will mark as resolved once I confirm

I lose my data every time I stop my pod

Could somebody please provide a simple explanation for how I'm supposed to use network storage for persistent data? Every time I stop a pod I've been using it becomes unavailable next time, so I have to choose a different GPU on a different datacenter. When the pod is launched it does not connect to my network storage on the original datacenter, so I have to start everything all over again. The whole datacenter/pod thing is incredibly confusing

Can't set volumeInGb to 0 from API while creating pod

I can set it to 1 but I can't set it to 0. Is there a way to do that?

How do I transfer my pod’s data to my network volume storage?

New to RunPod and I accidentally built a lot inside a pod instead of on a network volume. How do I move it to my network volume so I can terminate the Pod? Can anyone help with that? 🙂...

Stable Diffusion ComfyUI for Krita

When I try to connect the Krita AI plugin, I get the error I shared 99% of the time. How can I fix it? Could not establish websocket connection at wss://cl62xa2yqbxt38-3001.proxy.runpod.net: timed out during handshake Logfile content: --------------->...

textual inversion embeddings not showing in automatic

Can someone please help me! Everything is in the correct folders, I have run every command in the terminal to verify, and yet the models will not show!

How do I change my instruct template using text generation webui v2

Hello, I tried using the text-generation-webui-v2 template on a pod but I don't know how to change the chat template

Unexpected Storage Full Issue on Network Volume

I am currently using a network volume. I have rented a total of 800GB, and during the process of uploading data and extracting compressed files, I encountered an issue where the storage space became insufficient. I have tried running commands such as du -sh, df -h, and df -i, but I couldn't identify any apparent issues. The output of du -sh workspace/ shows approximately 352GB, and there are no additional data or files that I have added in other directories. If anyone has experienced a similar issue or knows a possible solution, I would greatly appreciate your advice....
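When du and df disagree like this, it can help to put the two numbers side by side in one report. The sketch below is a rough diagnostic under stated assumptions, not a fix: the function names are hypothetical, the mount path in the usage comment is a placeholder, and it sums apparent file sizes (as `du --apparent-size` would), so block overhead on the storage backend will not show up in the file total.

```python
import os
import shutil


def apparent_size_bytes(root):
    """Sum the apparent sizes of all regular files and symlinks under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # File vanished or is unreadable; skip it.
                pass
    return total


def usage_report(mount):
    """Compare filesystem-level usage with the summed file sizes under mount."""
    usage = shutil.disk_usage(mount)  # total, used, free in bytes
    files = apparent_size_bytes(mount)
    return {
        "total": usage.total,
        "used": usage.used,
        "free": usage.free,
        "files": files,
        # Space the filesystem counts as used that the files do not explain.
        "unaccounted": usage.used - files,
    }


# Hypothetical usage, assuming the volume is mounted at /workspace:
# print(usage_report("/workspace"))
```

A large "unaccounted" value suggests the space is going somewhere the file listing does not show, which is useful detail to include in a support request.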

What is the best way to transfer my local folder to pod?

I tried using SFTP, but it is very slow. I have models ranging from 10 to 100 GB, and it's taking a long time to zip, unzip, and transfer them. I also explored storage options, but the network volumes are only available for certain pods. We are in the testing phase and trying each pod one by one to find the best one based on our requirements....

Automatic templates not installing models

Hi, is anyone else having the issue where none of the Automatic templates will install any models?

Where are community templates?

I just signed up and added credits to my account, and I need to use a community template, but I only see official templates. How do I filter to see community ones?

Bad Gateway

Hosting a service on a RunPod pod via FastAPI and it is consistently timing out ....

How do I ssh tunnel into my runpod instance?

I have tried just testing connectivity. I start up a normal connection using ssh [email protected] -i ~/.ssh/id_ed25519 and then start a Python server to emulate Ollama...