Missing files
My pod restarted and wiped my entire /workspace directory. I had crucial scripts and files there that weren't backed up elsewhere. Is there a snapshot or backup you can restore?
When will the RTX 3070 be available?
I'm in a hurry... I am writing my thesis and need to collect the data on an RTX 3070.
Solution:
I do see one. Make sure your filters are not too strict if you don't.
Not able to update template
Whenever I try to change anything in my template I get this error:
"tcpPort field must have less than or equal to 10 items"
Even though my TCP ports can be up to 50...
Multi-node without clustering
Hello, hope you are well. Sorry, this might be obvious from other questions on the board regarding multi-node, but I just want to confirm.
Can I only run multi-node through the RunPod cluster service, due to internal networking limitations on normal pods (no internal IPs, from what I can see)?
I mainly want to run around 4 nodes, with 10 A40s each....
Installation Error Due To Security Level Configuration
[Installation Errors]
'ComfyUI-WanVideoWrapper': With the current security level configuration, only custom nodes from the "default channel" can be installed.
I received the above error when trying to install a missing node for ComfyUI.
...
stopAfter and terminateAfter
I have created a pod with the API PodFindAndDeployOnDemandInput and set the stopAfter and terminateAfter, assuming (the docs don't say what should happen) the pods will be stopped at that time, but it doesn't seem to work since the pod is still running.
Do those fields stop and terminate a pod at the time specified? If yes, what am I doing wrong?
Here is an example:...
Waiting for creating a placement group of specs for 310 seconds
INFO 06-20 11:24:54 [ray_utils.py:232] Waiting for creating a placement group of specs for 310 seconds. specs=[{'node:172.19.0.2': 0.001, 'GPU': 1.0}, {'GPU': 1.0}]. Check `ray status` and `ray list nodes` to see if you have enough resources, and make sure the IP addresses used by ray cluster are the same as VLLM_HOST_IP environment variable specified in each node if you are running on a multi-node.
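The log message asks that the IP addresses used by the Ray cluster match the VLLM_HOST_IP environment variable on each node. A minimal Python sketch of that comparison, assuming the placement-group specs have the shape printed in the log and that the check runs on the node the spec pins. The helper names `node_ips_from_specs` and `host_ip_mismatches` are hypothetical and are not part of Ray or vLLM:

```python
import os


def node_ips_from_specs(specs):
    """Collect IPs from keys of the form 'node:<ip>' in placement-group specs."""
    ips = []
    for spec in specs:
        for key in spec:
            if key.startswith("node:"):
                ips.append(key.split(":", 1)[1])
    return ips


def host_ip_mismatches(specs, env=None):
    """Return node IPs in specs that differ from VLLM_HOST_IP (all of them if it is unset)."""
    env = os.environ if env is None else env
    host_ip = env.get("VLLM_HOST_IP")
    return [ip for ip in node_ips_from_specs(specs) if ip != host_ip]


# Specs copied from the log line above.
specs = [{"node:172.19.0.2": 0.001, "GPU": 1.0}, {"GPU": 1.0}]
print(host_ip_mismatches(specs, {"VLLM_HOST_IP": "172.19.0.2"}))  # []
print(host_ip_mismatches(specs, {}))  # ['172.19.0.2']
```

An empty list only means the variable agrees with the pinned node address; it says nothing about whether the cluster has enough GPUs, which `ray status` reports.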
Docker pull randomly gets stuck
When launching a Pod on EU-RO-1 (Nvidia 4090), the download speed drops to almost nothing at some point. This doesn't happen every time, sometimes the download completes without issues.
Pod ID: dp302es4x8ad1b...
Solution:
Transitioned to serverless and it just works

P2P transport between gpus issue in EU-SE-1
I've been training LLMs using DeepSpeed.
However, I've noticed that when the pod is created in the EU-SE-1 data center, the process sometimes hangs once the model has been loaded and training is about to start, right after moving some of the parameters to the GPUs (indefinitely, as far as I can tell).
The only way I've found so far to prevent this is to set the env var NCCL_P2P_DISABLE=1, disabling P2P transport between GPUs; however, this in turn causes issues when tensor parallelism is enabled, as it creates data inconsistencies between GPUs.
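A minimal sketch of the workaround described above, assuming the variable is set inside the training script before the distributed process group is created. `NCCL_P2P_DISABLE` and `NCCL_DEBUG` are NCCL environment variables; the helper `nccl_env` is hypothetical:

```python
import os


def nccl_env(disable_p2p=True, debug=False):
    """Build the NCCL environment overrides for a training run."""
    env = {}
    if disable_p2p:
        env["NCCL_P2P_DISABLE"] = "1"  # turn off direct GPU-to-GPU (P2P) transport
    if debug:
        env["NCCL_DEBUG"] = "INFO"  # have NCCL log its setup, including chosen transports
    return env


# Apply early, before torch/deepspeed initialize NCCL.
os.environ.update(nccl_env(disable_p2p=True, debug=True))
```

The debug output may help narrow down where the hang occurs; it does not address the tensor-parallelism inconsistency mentioned in the post.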
...
[Beginner] How to run unsloth llama3.1 8b finetune in a pod?
I just fine-tuned my first model with unsloth, llama3.1 8b, and I'm unsure how to host it on RunPod for inference. Can anyone point me in the right direction on how to do it or where to read up on it?
Mounting network volume via s3 on the filesystem with geesefs
I was able to mount the runpod volume using https://github.com/yandex-cloud/geesefs. All other solutions I tried didn't work - s3fs, goofys, rclone...
Command:
geesefs --endpoint https://s3api-eur-is-1.runpod.io --region EUR-IS-1 --profile runpod --no-checksum xxxxxxxxx ~/.ollama/models
...
cuda>=12.8 A6000
Some A6000s have cuda>=12.8 and others don't. If I select an image that requires PyTorch 2.8 or cuda>=12.8, shouldn't there be a filter so that only pods that can run it are offered? Can I see the CUDA version of the hardware I select?
Solution:
Found
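From inside a running pod, one way to see which CUDA version the driver supports is to read the `CUDA Version:` field in the `nvidia-smi` header. A minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names `driver_cuda_version` and `supports` are hypothetical:

```python
import re
import shutil
import subprocess


def driver_cuda_version(smi_text):
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi output."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def supports(smi_text, required=(12, 8)):
    """True if the driver reports a CUDA version at or above `required`."""
    found = driver_cuda_version(smi_text)
    return found is not None and found >= required


if shutil.which("nvidia-smi"):  # only query when the tool exists on this machine
    output = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
    print(driver_cuda_version(output), supports(output))
```

Versions are compared as integer tuples rather than strings, so 12.10 would correctly rank above 12.8.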
GPU is not available on 1 x RTX A6000
root@bd7ed317681c:~# python3
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)  # Should show 2.2.0...
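A slightly fuller check than printing the version, as a sketch: it reports whether the installed PyTorch build targets CUDA at all and whether it can see a device. The function name `cuda_report` is hypothetical; the `torch` attributes it reads are standard PyTorch ones:

```python
def cuda_report():
    """Summarize what PyTorch can see; also works when torch is not installed."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


print(cuda_report())
```

If `built_for_cuda` is a version string but `cuda_available` is False, the build and the driver on the host may not be compatible, which is one possible cause worth ruling out.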
How to access the Ubuntu GUI
Hi, I tried to run a pod, and all the available templates are PyTorch or ComfyUI. Are the Ubuntu UI features not available anymore?
Can't access SSH over exposed TCP
Hey team, I've properly added my SSH key to my account, and I can access SSH without exposed TCP. But when I try accessing it over exposed TCP, it keeps asking for the root user password.
What should I do?...
Solution:
I deleted the pod, created another one, and now it works.
Enable SSH access with SCP using Template and Shared Storage
I've been trying to set up a pod using a custom template and shared storage with SSH enabled. The only option that is active is the basic one without SCP, but I need to enable SSH with SCP. I have the checkmark checked for "SSH Terminal Access". What else can I try?
Solution:
Well, if anyone else is curious, I managed to get it done by adding TCP port 22 and adding the SSH server and key through an environment parameter. After that, SSH works.
Template editor resets on switching tabs
I've run into a problem with the template editor.
What I'm trying to do:
- Open an existing template editor by clicking the pencil icon on the templates page.
- Edit any details on the General tab.
- Switch to the README tab to edit anything there....
network outage
I have this error message:
This server has recently experienced a network failure and may have inconsistent network connectivity. We aim to restore connectivity soon, but you might have connection issues until they are resolved. You will not be charged during a network outage.
What is the deadline for resolution?...
How do I get files from the RunPod SSH server to a local directory?
I used the scp command, but the server needs a password. What is this password?
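A minimal sketch of an scp download that authenticates with a key instead of a password, assuming the pod exposes SSH on a public IP and TCP port and that the matching public key is already on the account. With OpenSSH, a password prompt can appear when no accepted key is offered, though whether that is the cause here is an assumption. The helper `scp_download_cmd` and every host, port, and path value are placeholders:

```python
import shlex


def scp_download_cmd(host, port, remote_path, local_path, key_path):
    """Build an scp command that copies remote_path from the pod to local_path."""
    return [
        "scp",
        "-P", str(port),  # scp takes the port with a capital -P (ssh uses lowercase -p)
        "-i", key_path,   # private key matching the public key added to the account
        f"root@{host}:{remote_path}",
        local_path,
    ]


# All values below are placeholders, not real pod details.
cmd = scp_download_cmd(
    "203.0.113.10", 22022, "/workspace/results.csv", ".", "/home/me/.ssh/id_ed25519"
)
print(shlex.join(cmd))
```

The command is built as a list so it can be passed directly to `subprocess.run(cmd)` without shell quoting concerns.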
