Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

EU-RO-1 consistently slow network performance compared to other data centers

We consistently get significantly slower network speeds when using GPUs in this data center; GitHub pulls are dreadfully slow. It's a shame, since it seems to have high availability for the RTX 4090.
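To put a rough number on it, timing a streamed download from inside the pod is a quick way to compare data centers. This is a generic sketch, not RunPod-specific; the test URL is only an example, so substitute any large file you trust.
```
# Rough throughput check to compare data centers (generic sketch).
# The test URL is only an example -- substitute any large file you trust.
import time
import requests

TEST_URL = "https://speed.hetzner.de/100MB.bin"  # example public test file

def measure_mb_per_s(url, limit_bytes=100 * 1024 * 1024):
    start = time.monotonic()
    downloaded = 0
    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        for chunk in resp.iter_content(chunk_size=1 << 20):
            downloaded += len(chunk)
            if downloaded >= limit_bytes:
                break
    return downloaded / (time.monotonic() - start) / 1e6

print(f"~{measure_mb_per_s(TEST_URL):.1f} MB/s")
```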

deepseek-r takes >1h to load into VRAM

It seems to be related to mmap on the network drive. How do you solve it?
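If the bottleneck really is mmap'd reads going over the network volume, a common workaround is to stage the weights on the pod's local container disk before loading. A minimal sketch, with placeholder paths:
```
# Stage model weights from the network volume onto local container disk so the
# loader reads from local NVMe instead of the network mount.
# Both paths are placeholders -- adjust to your pod's layout.
import shutil
from pathlib import Path

NETWORK_COPY = Path("/workspace/models/deepseek")  # hypothetical path on the network volume
LOCAL_COPY = Path("/root/models/deepseek")         # hypothetical path on container disk

if not LOCAL_COPY.exists():
    LOCAL_COPY.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(NETWORK_COPY, LOCAL_COPY)

# Then point the loader (vLLM, llama.cpp, Ollama, ...) at LOCAL_COPY.
```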

Docker image from Docker Hub

Hi, I am very new to RunPod, so this might be a silly question. I have a custom Docker image pushed to Docker Hub. Can I build a pod based on that image? I am trying to run the NVIDIA Ingest microservice (https://github.com/NVIDIA/nv-ingest) on RunPod. I created a custom template that points to the Docker repo where the image is stored (and made the image public), but it is unable to find the image. I am sure the path is right. Any advice would help, thanks!
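Yes, a pod can run any public image. For reference, a minimal sketch with the runpod Python SDK is below; the image reference and GPU type are placeholders, and the parameter names are from the SDK as I recall them, so double-check against the current docs. A common gotcha is omitting the tag, which can make the image look "not found".
```
# Create a pod from a public Docker Hub image (sketch using the runpod SDK;
# parameter names as I recall them -- verify against the current SDK docs).
import runpod

runpod.api_key = "YOUR_API_KEY"  # placeholder

pod = runpod.create_pod(
    name="nv-ingest-test",
    image_name="yourname/nv-ingest:latest",  # hypothetical image reference, tag included
    gpu_type_id="NVIDIA GeForce RTX 4090",   # illustrative GPU type
)
print(pod["id"])
```
Running docker pull yourname/nv-ingest:latest on your own machine is also a quick way to confirm the reference is pullable anonymously before pointing a template at it.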

Only 1 CPU Core getting utilized

I wanted to deploy an ONNX model that runs on CPU on RunPod, served with uvicorn. I selected Compute Optimized (CPU5) with 16 vCPU and 32 GB RAM. Load testing with Locust showed that only 1 CPU core is being utilized, even though my code creates multiple threads. To further confirm:
>>>os.sched_getaffinity(0)
{0}
Which shows that only core 0 can be utilized by the process. This issue is specific to RunPod, as it does not happen on my local laptop (Windows)...
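A quick way to tell whether the limit is the process's affinity mask (which the program can widen) or the container's cgroup CPU quota (which it cannot) is a check along these lines; it's a generic Linux diagnostic, not RunPod-specific:
```
# Compare visible CPU count, the affinity mask, and (on cgroup v2) the CPU quota.
# If widening the affinity fails, or the quota works out to ~1 CPU, the limit is
# imposed by the container rather than by the application code.
import os

print("os.cpu_count():      ", os.cpu_count())
print("sched_getaffinity(0):", os.sched_getaffinity(0))

try:
    os.sched_setaffinity(0, range(os.cpu_count()))  # try to widen to all CPUs
    print("after setaffinity:   ", os.sched_getaffinity(0))
except OSError as exc:
    print("could not widen affinity:", exc)

try:
    # cgroup v2: "max 100000" means unlimited; "100000 100000" is roughly 1 CPU.
    print("cpu.max:", open("/sys/fs/cgroup/cpu.max").read().strip())
except FileNotFoundError:
    pass
```
Separately, a single uvicorn worker is one Python process, so CPU-bound threads inside it also contend on the GIL; multiple workers are usually needed to use several cores regardless of the affinity question.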

Billing issue I found

I've set up some pods and calculated the total hourly billing myself. It comes to $10.61, but the UI shows $11.14. This discrepancy has persisted for several days...
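One possible source of a gap like this is that container-disk and network-volume storage is billed on top of the GPU hourly rate. Purely as an arithmetic sketch with made-up rates (not RunPod's actual prices):
```
# Hypothetical reconciliation -- all rates and sizes here are placeholders,
# not RunPod's published pricing. The point is only that storage accrues on
# top of the summed per-GPU hourly prices.
gpu_hourly = 10.61                       # sum of per-GPU hourly prices
storage_gb = 4000                        # total disk + volume GB (hypothetical)
storage_rate_per_gb_hour = 0.10 / 730    # e.g. $0.10/GB-month over ~730 hours

total = gpu_hourly + storage_gb * storage_rate_per_gb_hour
print(f"${total:.2f} per hour")          # ~$11.16 with these made-up numbers
```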

network volume

How can I copy my network volume from one region to another? The network is slow, so downloading and reinstalling everything from scratch is a nightmare.

Kobold.cpp - Remote tunnel loads before the model, causing confusion (possible off-product issue)

Here's the log snippet:
```
load_tensors: offloading 88 repeating layers to GPU
load_tensors: offloading output layer to GPU...
```
Solution:
Should be fixed

VRAM stuck at 77% usage

VRAM usage is stuck at 77% on 1 of my 4 GPUs. I've already restarted, done a hard stop and start, and reset; it's still stuck. I don't want to switch pods because I have hundreds of GB of data on the volume that would take a long time to set up again. Is there anything else I can do? ID: ox02c3pvm058j3...
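For what it's worth, listing what is actually holding memory on each GPU can at least show whether the usage belongs to a process inside the pod. A generic sketch, assuming nvidia-ml-py is installed (imported as pynvml):
```
# List memory usage and the compute processes on each GPU. If a GPU shows high
# usage but no processes, the memory is probably held by something outside the
# container and is not something a restart from inside the pod can clear.
# Assumes: pip install nvidia-ml-py
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
    print(f"GPU {i}: {mem.used / mem.total:.0%} used, {len(procs)} compute process(es)")
    for p in procs:
        used = p.usedGpuMemory
        print(f"  pid {p.pid}: {used / 2**20:.0f} MiB" if used else f"  pid {p.pid}: n/a")
pynvml.nvmlShutdown()
```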

Restore_snapshot error.

Hello, has anyone seen an error like this? Docker build error log: Restore the snapshot to install custom nodes...

Choose CPU model on Pods

Hi everyone, I have to test AMD EPYC 9354 performance for our product and found that it is supported in serverless mode. Is it possible to get it on a pod? I only saw two options, CPU3 and CPU5, but the pods I started for checking contained an older model...

Maintenance

Does this mean the server will be taken down during this time, or at the end of May? Because the 6th of Feb has already passed.

Network Storage question

Hi, I am looking to create several GPU pods that all share the same shared network storage. When I go to create a network storage, it looks like I have to deploy a new GPU pod that is always running. How do I create a storage that doesn't rely on a GPU pod being always on? I want to be able to turn off these pods when they are not being used and use the shared storage when they turn back on.

no more full ssh? cannot connect vs code / cursor

Hi, I used to be able to connect VS Code to my pods over SSH using the 'full SSH' option (supports SCP & SFTP). That option doesn't seem to be around any more. I have connected to multiple A40 pods and now an H100 over the last couple of days, and there's no full SSH option. Is this going to come back? Is there some new configuration needed?

runpodctl communityCloud + spot

How do I create a spot pod on Community Cloud with runpodctl? The CLI help only shows: start a pod from runpod.io Usage:...

Multiple Pods with same network storage, ports?

I'm trying to run multiple pods simultaneously, all connected to the same network storage. According to the info on the website this is possible, but I cannot connect to the second pod's Jupyter. Do I need to assign a different port to the second pod for this to work, and if so, how and which port? I've tried adding 8889 to HTTP and TCP, but I probably had to specify it somewhere before trying to start Jupyter?

HTTP 502 on vLLM pod

I'm getting a 502 when trying to connect to the deployed service. I'm using the vllm-latest image with these arguments: --host 0.0.0.0 --port 8000 --model mistralai/Mistral-Small-24B-Instruct-2501 --dtype auto --enforce-eager --gpu-memory-utilization 0.95 --tensor-parallel-size 2. Using the Ollama service doesn't have any issues. Any ideas?...
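To help narrow down whether the 502 comes from the proxy or from vLLM itself, probing the OpenAI-compatible endpoint from both sides is a cheap test. The pod ID is a placeholder, and the {pod_id}-{port}.proxy.runpod.net URL form is my understanding of RunPod's HTTP proxy, so verify it for your pod:
```
# Probe vLLM's OpenAI-compatible server through the proxy and on localhost.
# A 502 from the proxy while localhost responds usually means the server is
# still loading the model or is listening on a different port.
# POD_ID is a placeholder; the proxy URL scheme is an assumption to verify.
import requests

POD_ID = "abc123"
URLS = {
    "proxy": f"https://{POD_ID}-8000.proxy.runpod.net/v1/models",
    "local": "http://localhost:8000/v1/models",  # run this check from inside the pod
}

for name, url in URLS.items():
    try:
        resp = requests.get(url, timeout=10)
        print(name, resp.status_code, resp.text[:200])
    except requests.RequestException as exc:
        print(name, "error:", exc)
```
With tensor-parallel-size 2 the 502 can simply last as long as the model is still sharding and loading, so it's worth checking the container logs that the API server has finished starting before blaming the proxy.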

Potential L40S P2P Communication Issue via NCCL on Some Hosts in US-TX-4

I’m seeing a possible NCCL P2P issue on some L40S hosts in US-TX-4. Some pods hang indefinitely while others in the same region work fine. Here’s a reproducible example:
runpod-vllm-nccl-diagnostic

Observations:
- Environment: 2 x L40S GPU pods in US-TX-4
...
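For anyone who wants a repro smaller than the full vLLM stack, a bare NCCL all-reduce across the two GPUs is a reasonable sanity check. This is a generic PyTorch sketch (not the linked diagnostic repo); if it hangs with defaults but completes with NCCL_P2P_DISABLE=1, P2P on that host is the likely culprit:
```
# Minimal NCCL sanity check across the GPUs in one pod.
# Run with:  torchrun --nproc_per_node=2 nccl_check.py
# Re-run with NCCL_P2P_DISABLE=1 set to test whether P2P is the problem.
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)
    x = torch.ones(1 << 20, device="cuda") * (rank + 1)
    dist.all_reduce(x)  # hangs here if the P2P transport is broken
    print(f"rank {rank}: all_reduce ok, value={x[0].item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```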

Pod data loss after disk resize

I deployed an RTX 3090 pod on RunPod and was working on a project. I then edited the pod to increase the disk size. Afterward, I could no longer connect to the server, and all my files and folders seem to have disappeared. What can I do?

Can't connect to terminal or JupyterLab on RunPod PyTorch 2.1 or 2.4 template

This is for the EU-RO region. I tried a different region and it worked fine, but I need to use EU-RO because of my network volume.

NVIDIA Driver Selection

Is there a way to select which NVIDIA GPU driver my pod is using?
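As far as I know the driver is installed on the host and shared into the pod, so it can't be chosen from inside the container; the practical approach is to check what a given pod offers before committing to it. A generic check, assuming nvidia-ml-py (imported as pynvml):
```
# Print the host driver version and the highest CUDA version it supports.
# Assumes: pip install nvidia-ml-py
import pynvml

pynvml.nvmlInit()
driver = pynvml.nvmlSystemGetDriverVersion()
cuda = pynvml.nvmlSystemGetCudaDriverVersion_v2()  # e.g. 12040 -> 12.4
print("driver version:", driver)
print("max supported CUDA:", f"{cuda // 1000}.{(cuda % 1000) // 10}")
pynvml.nvmlShutdown()
```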