Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Grape

6/24/2025

Missing files

My pod restarted and wiped my entire /workspace directory. I had crucial scripts and files there that weren't backed up elsewhere. Is there a snapshot or backup you can restore?

igt0

6/23/2025

Not able to run a pod

gstnt

6/23/2025

When will the RTX 3070 be available?

I'm in a hurry... I am writing my thesis and need to collect the data on an RTX 3070.

Solution:

I do see one. If you don't, make sure your filters are not too strict.

John lanser

6/21/2025

Not able to update template

Whenever I try to change anything in my template, I get this error: "tcpPort field must have less than or equal to 10 items", even though my TCP ports can be up to 50...

Hinky

6/21/2025

Multi-node without clustering

Hello, hope you are well. Sorry, this might be obvious from other questions on the board about multi-node, but I just want to confirm: can I only run multi-node through the Runpod cluster service, because of internal networking limitations on normal pods (no internal IPs, from what I can see)? I mainly want to run around 4 nodes with 10 A40s each....

marcusbiz

6/21/2025

Installation Error Due To Security Level Configuration

[Installation Errors] 'ComfyUI-WanVideoWrapper': With the current security level configuration, only custom nodes from the "default channel" can be installed. I received the above error when trying to install a missing node for ComfyUI. ...

Simon

6/21/2025

stopAfter and terminateAfter

I created a pod through the API using PodFindAndDeployOnDemandInput and set stopAfter and terminateAfter, assuming (the docs don't say what should happen) that the pod would be stopped at that time. It doesn't seem to work, since the pod is still running. Do those fields stop and terminate a pod at the time specified? If yes, what am I doing wrong? Here is an example:...

artem

6/20/2025

Waiting for creating a placement group of specs for 310 seconds

INFO 06-20 11:24:54 [ray_utils.py:232] Waiting for creating a placement group of specs for 310 seconds. specs=[{'node:172.19.0.2': 0.001, 'GPU': 1.0}, {'GPU': 1.0}]. Check `ray status` and `ray list nodes` to see if you have enough resources, and make sure the IP addresses used by ray cluster are the same as VLLM_HOST_IP environment variable specified in each node if you are running on a multi-node.

Does this mean it can't find a GPU?...
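The log message itself suggests comparing the node IP named in the placement-group spec with the VLLM_HOST_IP set on each node. A minimal sketch of that comparison is below; the helper names `node_ips_in_spec` and `host_ip_matches` are hypothetical and not part of Ray or vLLM, and the sample string is copied from the log line above.

```python
import re


def node_ips_in_spec(log_line: str) -> list[str]:
    """Return the IPs that appear in 'node:<ip>' resource keys of the log line."""
    return re.findall(r"'node:([0-9.]+)'", log_line)


def host_ip_matches(log_line: str, vllm_host_ip: str) -> bool:
    """True if the given VLLM_HOST_IP value is one of the node IPs in the spec."""
    return vllm_host_ip in node_ips_in_spec(log_line)


# Spec fragment taken from the log message in the post.
line = "specs=[{'node:172.19.0.2': 0.001, 'GPU': 1.0}, {'GPU': 1.0}]"
print(node_ips_in_spec(line))
```

If the value of VLLM_HOST_IP on a node is not among the IPs Ray reports, that mismatch may be worth ruling out before concluding that a GPU is missing.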

TomS

6/20/2025

Docker pull randomly gets stuck

When launching a Pod on EU-RO-1 (Nvidia 4090), the download speed drops to almost nothing at some point. This doesn't happen every time, sometimes the download completes without issues. Pod ID: dp302es4x8ad1b...

Solution:

Transitioned to serverless and it just works

6/20/2025

P2P transport between gpus issue in EU-SE-1

I've been training LLMs using DeepSpeed. However, I've noticed that when the pod is created in the EU-SE-1 data center, the process sometimes hangs right after the model has been loaded and some of the parameters have been moved to the GPUs, just before training starts (indefinitely, as far as I can tell). The only way I've found so far to prevent this is to set the env var NCCL_P2P_DISABLE=1, which disables P2P transport between GPUs; however, this in turn causes issues when tensor parallelism is enabled, as it creates data inconsistencies between GPUs. ...
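For anyone reproducing the workaround described above, here is a minimal sketch of setting the variable from Python before any distributed initialization happens. The helper `nccl_env` is a hypothetical name; NCCL_P2P_DISABLE comes from the post, and NCCL_DEBUG=INFO is an added assumption that more verbose NCCL logging helps when reporting the hang.

```python
import os


def nccl_env(disable_p2p: bool, debug: bool = True) -> dict[str, str]:
    """Build the NCCL environment overrides to apply before training starts."""
    env: dict[str, str] = {}
    if disable_p2p:
        # Workaround from the post: turn off P2P transport between GPUs.
        env["NCCL_P2P_DISABLE"] = "1"
    if debug:
        # Assumption: verbose NCCL logs are useful when diagnosing the hang.
        env["NCCL_DEBUG"] = "INFO"
    return env


# Apply the overrides before the training framework initializes NCCL.
os.environ.update(nccl_env(disable_p2p=True))
```

This only applies the workaround; it does not address the tensor-parallelism inconsistency mentioned in the post.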

6/20/2025

[Beginner] How to run unsloth llama3.1 8b finetune in a pod?

I just finetuned my first model with unsloth, llama3.1 8b, and I'm unsure how to host it on Runpod for inference. Can anyone point me in the right direction on how to do it or where to read up on it?

elo.siema

6/19/2025

Mounting network volume via s3 on the filesystem with geesefs

I was able to mount the runpod volume using https://github.com/yandex-cloud/geesefs. All other solutions I tried didn't work - s3fs, goofys, rclone... Command:

geesefs --endpoint https://s3api-eur-is-1.runpod.io --region EUR-IS-1 --profile runpod --no-checksum xxxxxxxxx ~/.ollama/models

...

artem

6/19/2025

cuda>=12.8 A6000

Some A6000s have CUDA >= 12.8 and others don't. If I select an image that requires PyTorch 2.8 or CUDA >= 12.8, shouldn't there be a filter so that only pods that can run it are offered? Can I see the CUDA version of the hardware I select?

Solution:

Found
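Once a pod is running, the CUDA version supported by its driver can be read from the header that `nvidia-smi` prints. The sketch below parses that header and compares it with a required version; the function names are hypothetical, and the sample header is an illustrative string, not output from a real Runpod machine.

```python
import re
from typing import Optional, Tuple


def max_cuda_version(smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' field, if present."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supports(smi_output: str, required: Tuple[int, int] = (12, 8)) -> bool:
    """True if the reported CUDA version is at least the required one."""
    version = max_cuda_version(smi_output)
    return version is not None and version >= required


# Illustrative header line only; real output comes from running `nvidia-smi`.
sample = "| NVIDIA-SMI 550.54.15   Driver Version: 550.54.15   CUDA Version: 12.4 |"
print(max_cuda_version(sample), supports(sample))
```

On a live pod the input could come from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`, assuming `nvidia-smi` is on the PATH.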

omidcode

6/19/2025

GPU is not available on 1 x RTX A6000

root@bd7ed317681c:~# python3
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

import torch
print(torch.__version__) # Should show 2.2.0...

ART_K

6/19/2025

How to access Ubuntu with a GUI

Hi, I tried to run a pod, and all the available templates are PyTorch or ComfyUI. Is the Ubuntu GUI feature not available anymore?

delucca

6/18/2025

Can't access SSH over exposed TCP

Hey team, I've properly added my SSH key to my account, and I can access SSH without exposed TCP. But when I try accessing it over exposed TCP, it keeps asking for the root user password. What should I do?...

Solution:

I deleted the pod, created another one and now it worked

Juckye | Slingy

6/18/2025

Enable SSH access with SCP using Template and Shared Storage

I've been trying to set up a pod using a custom template and shared storage with SSH enabled. The only option that is active is the basic one without SCP, but I need SSH with SCP. I have the "SSH Terminal Access" box checked. What else can I try?

Solution:

Well, if anyone else is curious, I managed to get it done by adding TCP port 22 and adding the SSH server and key through an environment parameter; after that, SSH works.

Igor Zinovyev[MTSK]

6/18/2025

Template editor resets on switching tabs

I've run into a problem with the template editor. What I'm trying to do: - Open an existing template editor by clicking the pencil icon on the templates page. - Edit any details on the General tab. - Switch to the README tab to edit anything there....

Ludovic

6/17/2025

network outage

I have this error message: "This server has recently experienced a network failure and may have inconsistent network connectivity. We aim to restore connectivity soon, but you might have connection issues until they are resolved. You will not be charged during a network outage." What is the deadline for resolution?...

임주형

6/17/2025

How do I get files from a Runpod SSH server to a local directory?

I used the scp command, but the server asks for a password. What is this password?
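Other threads on this page report that SSH over an exposed TCP port worked with an SSH key rather than a password. Under that assumption, here is a minimal sketch of building an `scp` download command that passes the port and key explicitly. The host, port, paths, and the helper name `scp_download_cmd` are placeholders and not values from the post; the `root` user is taken from the related SSH thread above.

```python
def scp_download_cmd(host: str, port: int, remote_path: str,
                     local_dir: str, key_path: str) -> list[str]:
    """Build an scp command that copies remote_path from the pod to local_dir."""
    return [
        "scp",
        "-P", str(port),   # scp uses a capital -P for the port
        "-i", key_path,    # private key matching the public key added to the account
        f"root@{host}:{remote_path}",
        local_dir,
    ]


# Placeholder values for illustration only.
cmd = scp_download_cmd("203.0.113.10", 22022, "/workspace/results.csv",
                       ".", "~/.ssh/id_ed25519")
print(" ".join(cmd))
```

If the command still prompts for a password, the key may not have been accepted by the pod, which would be worth checking before looking for a password.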
