RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

⛅|pods-clusters

Slow download

I'm currently getting 2mb/s download on 2xa100 pod; normally get way higher than this -- anyone else running into this rn?
Solution:
I shut down the pod and started another one, and it was a lot faster.

`ERROR | InternalServerError: None is not in list`

I was using several machines but hit the same error (above) on each. Someone said it was due to the DDoS attack. Is that right? Can a DDoS attack hit the secure pods that easily? Thanks...

Increase Spot Warning Time

I see in the docs that there is a 5s window before a spot instance is interrupted. 5s isn't really enough time to save or do anything - e.g. AWS has a 2 minute warning. Even if 2 minutes is too much, it would be huge if we could get 1m or even 30s of a warning, so that we don't need to check so often.
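With a window this short, one common pattern is to react to the termination signal instead of polling. The following is a minimal sketch, assuming the interruption notice arrives as SIGTERM (the thread does not confirm this). `CHECKPOINT_PATH`, `save_checkpoint`, and `train` are hypothetical names used only for illustration.

```python
import json
import os
import signal
import tempfile

# Hypothetical location; on a pod this would sit on persistent storage.
CHECKPOINT_PATH = os.path.join(tempfile.gettempdir(), "checkpoint.json")

state = {"step": 0}
stop_requested = False


def save_checkpoint(path: str, data: dict) -> None:
    """Write to a temp file, then rename, so a kill mid-write keeps the old checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def handle_sigterm(signum, frame) -> None:
    """Assumed interruption notice: only set a flag, the loop does the saving."""
    global stop_requested
    stop_requested = True


# Register the handler once at startup (assumes we run in the main thread).
signal.signal(signal.SIGTERM, handle_sigterm)


def train(max_steps: int) -> None:
    """Stand-in work loop that stops early and checkpoints when asked to."""
    for _ in range(max_steps):
        if stop_requested:
            break
        state["step"] += 1  # placeholder for one unit of real work
    save_checkpoint(CHECKPOINT_PATH, state)
```

The handler only flips a flag so that the save happens at a safe point in the loop; this only helps if one unit of work plus the save fits inside the warning window.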

how to route docker secrets to pod automatically

I have some credentials saved as runpod secrets. After creating a new pod using runpodctl, I manually have to add the secrets to the pod. Is there a way to have the secrets available in the pod, without manually adding them?

Network issue ETA?

Several of my pods got hit with "This server has recently suffered a network outage and may have spotty network connectivity. We aim to restore connectivity soon, but you may have connection issues until it is resolved. You will not be charged during any network downtime.", including e.g. 82mr3meakiiytt. Do you have an ETA for the fix? They are still not back up....

same GPU, different machine -> different speed

The image shows 2 YOLO object detection runs with identical setup (same batch size, image size, number of epochs) on 2 different RunPod pods. In both cases the GPU was the RTX 4090. Slow machine: +---------------------------------------------------------------------------------------+...

Kohya port not working

I'm trying to launch Kohya. I've tried everything and nothing works; I even used the command tail -f /workspace/logs/kohya_ss.log...
Solution:
You only need to post in one place, not several; you have already been answered in #🎤|general.

runpodctl -> get public IP + exposed ports

Let's say I create a new pod using runpodctl create pod --name 'Whatever' \ --imageName 'runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04' \ --gpuType 'NVIDIA GeForce RTX 3070' ...
Solution:

This pod suddenly appeared in my account (I didn't create it)

vi9vaz7fu77b52 That's the pod ID; I already deleted it. I think it's because of vLLM workers / template?...

pod has no public ip

A pod has no public ip despite me clicking on the "public ip" checkmark

can I deploy flask, celery, redis, postgreSQL on runpod?

Hi, as you know, the pod only persists data under the /workspace folder. For all Python-related packages I can use venv to keep the data and configuration under /workspace. But tools like flask, celery, redis, and postgreSQL are not Python installations, so their configuration files will be scattered around the filesystem, and all of these files and configuration will disappear after a pod restart. ...
Solution:
You can install whatever you want, but I don't recommend installing databases etc. on RunPod. It's better to deploy those things to a CPU cloud provider and use RunPod serverless for offloading tasks that need to run on a GPU.

CudaToolkit >= 12.2

When selecting the POD to deploy, I can filter the GPU supported cuda version up to v12.4. I suppose this refers to the CUDA display driver, right? The runpod base images however, only provide up to "cuda 12.1.1" which is not the driver- but the cuda toolkit version, correct?...
Solution:
There are two CUDA versions. The one shown by nvidia-smi is the maximum CUDA version supported by the host. The version from nvcc --version is the one bundled with the template ...
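To see both numbers side by side on a pod, something like the following could be used. It is a sketch that assumes the usual output shapes of the two tools (a `CUDA Version: X.Y` field in the `nvidia-smi` header and a `release X.Y` field in `nvcc --version`); the function names are made up for this example.

```python
import re
import subprocess
from typing import List, Optional, Tuple

Version = Tuple[int, int]


def _first_version(pattern: str, text: str) -> Optional[Version]:
    """Return (major, minor) from the first match of pattern, or None."""
    match = re.search(pattern, text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_driver_cuda(smi_output: str) -> Optional[Version]:
    """Maximum CUDA version the host driver supports, from nvidia-smi output."""
    return _first_version(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)


def parse_toolkit_cuda(nvcc_output: str) -> Optional[Version]:
    """CUDA toolkit version bundled with the image, from nvcc --version output."""
    return _first_version(r"release\s+(\d+)\.(\d+)", nvcc_output)


def toolkit_fits_host(host_max: Optional[Version], toolkit: Optional[Version]) -> bool:
    """Simple rule of thumb: the toolkit should not be newer than the host maximum."""
    return host_max is not None and toolkit is not None and toolkit <= host_max


def command_output(args: List[str]) -> str:
    """Run a command and return its stdout, or an empty string if it is missing."""
    try:
        return subprocess.run(args, capture_output=True, text=True).stdout
    except FileNotFoundError:
        return ""


def report() -> str:
    """Query both tools on the current machine and summarize the result."""
    host = parse_driver_cuda(command_output(["nvidia-smi"]))
    toolkit = parse_toolkit_cuda(command_output(["nvcc", "--version"]))
    return f"host max CUDA: {host}, bundled toolkit: {toolkit}, fits: {toolkit_fits_host(host, toolkit)}"
```

Calling `print(report())` inside a pod would show whether the template's toolkit is at or below what the host reports; either value comes back as `None` if the corresponding tool is not installed.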

why don't I have a stop option, only terminate option available

Solution:
I would use rclone rather than cloud sync. Cloud sync is built on top of rclone anyway.

are network volumes slower than "normal" volumes?

Hey everyone! I've been experimenting with network volumes because of their perk of not needing to reinstall everything whenever my pod 'loses' its GPU. However, I've noticed that the upload/download speeds are pretty slow every time I use them. Has anyone else experienced this? Do these volumes need a few hours or days to reach optimal performance, similar to AWS? I'd really appreciate any insights or experiences you might have!
Solution:
It's accessed over the network and is not directly attached to the machine.
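To put a number on the difference, a rough sequential-write test can be run against each location. This is only a sketch: `write_throughput_mb_s` is a hypothetical helper, and a single sequential write is a crude measure that says nothing about small-file or random-access performance.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory: str, size_mb: int = 64) -> float:
    """Write size_mb of random data into directory and return MB/s, including fsync."""
    chunk = os.urandom(1024 * 1024)  # 1 MiB of incompressible data
    path = os.path.join(directory, "throughput_test.bin")
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # force the data out of the page cache
    elapsed = time.perf_counter() - start
    os.remove(path)  # clean up the test file
    return size_mb / elapsed
```

For example, comparing `write_throughput_mb_s("/workspace")` with `write_throughput_mb_s(tempfile.gettempdir())` on the same pod would contrast the volume with container-local disk, assuming the network volume is the one mounted at `/workspace`.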

cannot find my network volume in the running ubuntu pod.

Hi, I have created a pod with a 300GB network volume. It is shown in the pod details, but when I log on to the pod and run the command "df -h", I cannot find the network volume attached to the running pod. Please help.
Solution:
The workspace one
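A quick way to confirm this from inside the pod is to check whether the path is a mount point and how large it is. The helper below is a small sketch using only the standard library; `describe_mount` is a name chosen for this example, and `/workspace` is taken from the answer above.

```python
import os
import shutil


def describe_mount(path: str) -> dict:
    """Report whether path is a mount point and its total and free size in GB."""
    usage = shutil.disk_usage(path)
    return {
        "path": path,
        "is_mount_point": os.path.ismount(path),
        "total_gb": round(usage.total / 1e9, 1),
        "free_gb": round(usage.free / 1e9, 1),
    }
```

Running `describe_mount("/workspace")` should then show a total size that can be compared with the size chosen for the volume; the reported figure may not match it exactly.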

Apply for a fixed public IP and attach it to the running pod; attach a network volume to the same pod.

Hi, I am a new RunPod user. I have one pod running, but I cannot find any place to apply for a fixed public IP and attach it to the running pod. I also need to put the data on persistent storage, which is why I created a network volume, but I cannot find anywhere to attach it to my pod. I think these are very basic requirements that the majority of users will have, so it must be somewhere in the documentation, but unfortunately I did not find the answer there either. Please help, thanks...
Solution:
1. You cannot attach a public IP to an existing pod. In Secure Cloud all pods should have a public IP by default. In Community Cloud, you need to check the filter at the top of the page before deploying your pod.
2. You cannot attach a network volume to an existing pod. You either need to click the Deploy button from the network storage to deploy a new pod with it attached, or alternatively select it from the filter at the top of the page in Secure Cloud before deploying your pod.
Basically, it seems like you are not using any of the available filters....

graphql Unauthorized

When I perform the "myPods" query [https://graphql-spec.runpod.io/#query-myself looks similar] with the "machines" field, I receive a strange output: ``` { "errors": [ {...
Solution:
The solution was given on Slack. I had to use this query not like
"query myPods {\n myself { pods {\n desiredStatus \n dockerId\n id\n imageName\n lastStatusChange\n locked\n machineId\n name\n machineType\n templateId\n uptimeSeconds\n }\n machines { id } }\n}"
...
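For reference, a query string like the one quoted above can be sent from Python with the standard library. This is a sketch under assumptions: the endpoint URL and the `api_key` query parameter are assumed and should be checked against the API documentation, and `build_request` and `run_query` are hypothetical helpers. The excerpt is truncated, so it does not establish whether the quoted query is the working form or the failing one.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; confirm it against the current API documentation.
GRAPHQL_URL = "https://api.runpod.io/graphql"

# Field names copied from the query quoted in the thread.
MY_PODS_QUERY = """
query myPods {
  myself {
    pods {
      desiredStatus
      dockerId
      id
      imageName
      lastStatusChange
      locked
      machineId
      name
      machineType
      templateId
      uptimeSeconds
    }
    machines { id }
  }
}
"""


def build_request(api_key: str, query: str) -> urllib.request.Request:
    """Build a POST request carrying the GraphQL query as a JSON body."""
    url = GRAPHQL_URL + "?" + urllib.parse.urlencode({"api_key": api_key})
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def run_query(api_key: str, query: str) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(build_request(api_key, query), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A GraphQL response normally carries either a `data` key or an `errors` list, so printing `run_query(key, MY_PODS_QUERY).get("errors")` is a quick way to see whether a given field is being rejected.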

Help needed with Docker Installation

Hey guys, how can I install Docker within an Ubuntu container? I tried, but I am unable to run it.
Solution:
Not possible on RunPod, you will have to do it somewhere else, like AWS etc.

Update image runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04 ?

I have used a conda environment with PyTorch for weeks, and it worked perfectly on the RunPod container until today. Now PyTorch doesn't work anymore. Was something changed on the image?

Questions About Resuming GPU Sessions

Hi, so I saw this message when I paused my session. It said, "You can start the pod later, but it's not guaranteed to be available." So, if I wanna start it up again and the GPU I was using isn't available, will it set me up with another 4090? Or do I have to wait until the one I was using is available again?
Solution:
Wait until it's available, or just start with CPU only.