RunPod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


⛅|pods-clusters

graphql Unauthorized

When I run the "myPods" query (https://graphql-spec.runpod.io/#query-myself looks similar) with the "machines" field, I receive a strange output: ``` { "errors": [ {...
Solution:
The solution was given on Slack. I had to use this query:
"query myPods {\n myself { pods {\n desiredStatus \n dockerId\n id\n imageName\n lastStatusChange\n locked\n machineId\n name\n machineType\n templateId\n uptimeSeconds\n }\n machines { id } }\n}"
...
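A minimal Python sketch of sending the query above from a script, using only the standard library. It assumes the endpoint `https://api.runpod.io/graphql` with the API key passed as an `api_key` query parameter; verify both against RunPod's API documentation. A missing or invalid key is one plausible cause of an "Unauthorized" error, but the thread does not confirm that this was the cause here.

```python
import json
import urllib.parse
import urllib.request

# Assumption: RunPod's GraphQL endpoint, authenticated via an api_key query parameter.
GRAPHQL_URL = "https://api.runpod.io/graphql"

# Same fields as the query in the solution above, written out for readability.
MY_PODS_QUERY = """
query myPods {
  myself {
    pods {
      desiredStatus
      dockerId
      id
      imageName
      lastStatusChange
      locked
      machineId
      name
      machineType
      templateId
      uptimeSeconds
    }
    machines { id }
  }
}
"""


def build_request(api_key: str, query: str = MY_PODS_QUERY) -> urllib.request.Request:
    """Build the POST request; the key is URL-encoded into the query string."""
    url = f"{GRAPHQL_URL}?{urllib.parse.urlencode({'api_key': api_key})}"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(api_key: str) -> dict:
    """Send the query and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(api_key)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

If the response still contains an `errors` array, printing `run_query(key)["errors"]` shows the server's message for each failing field.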

Help needed with Docker Installation

Hey guys, how can I install Docker within an Ubuntu container? I tried, but I am unable to run it.
Solution:
It's not possible on RunPod; you will have to do it somewhere else, such as AWS.

Update image runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04?

I have been using a conda environment with PyTorch for weeks, and it worked perfectly on the RunPod container until today. Now PyTorch doesn't work anymore. Was something changed in the image?
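When PyTorch suddenly breaks inside a container, a quick first step is to confirm which Python interpreter is running (is the conda environment actually active?), which PyTorch build it imports, and whether CUDA is visible. A minimal diagnostic sketch; it does not know anything RunPod-specific about the image:

```python
import importlib
import sys


def torch_report() -> dict:
    """Collect basic Python/PyTorch/CUDA facts without crashing if torch is broken."""
    report = {
        "python": sys.version.split()[0],
        "executable": sys.executable,  # shows whether the conda env's python is in use
        "torch_installed": False,
    }
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError) as exc:  # OSError covers missing shared libraries
        report["error"] = str(exc)
        return report
    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    report["torch_cuda_build"] = torch.version.cuda  # CUDA version torch was built against
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in torch_report().items():
        print(f"{key}: {value}")
```

Comparing this output before and after the breakage (or between two pods) narrows the problem to the environment, the PyTorch build, or the GPU driver.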

Questions About Resuming GPU Sessions

Hi, I saw this message when I paused my session: "You can start the pod later, but it's not guaranteed to be available." So, if I want to start it up again and the GPU I was using isn't available, will it set me up with another 4090? Or do I have to wait until the one I was using is available again?
Solution:
Either wait until it is available, or start the pod with CPU only.

"The port is not up yet"

Having problems again: I created a new pod about an hour ago. The cloud sync took an hour, and now the pod will not run anything. I have tried restarting it a couple of times, but I always get this error message...

Change disk volume

Apologies for the newbie question: is it possible to change the size of an existing persistent disk volume, or is it necessary to create a new one and transfer the data from the old one? Thank you.
Solution:
It is possible; there is no need to create a new one first.

There is no pod available

Hi! All GPU pods, whether Secure or Community, are unavailable no matter which filter I use. What's going on? Edit: Now it seems to be working, but the page is taking a long time to load. Is there any maintenance work going on?...
Solution:
There is no maintenance; otherwise, everyone would be affected, not just you. It sounds like an issue with your internet connection.

wget not working inside the terminal for Stable Diffusion WebUI

When I try to run the wget command to download models from Civitai, it throws an error about a username and password. I've watched many videos about it, and I seem to be doing everything right, but I still can't get it to work.
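A username/password error from a download link usually means the server wants authentication. A minimal sketch, assuming Civitai accepts an API token as a `token` query parameter on its download URLs (check Civitai's API documentation; the model version ID `12345` below is hypothetical). It also quotes the URL for the shell, because an unquoted `&` makes the shell cut the URL short and run wget in the background:

```python
import shlex
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_token(url: str, token: str) -> str:
    """Append a `token` query parameter, keeping any existing parameters (assumed auth scheme)."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("token", token))
    return urlunsplit(parts._replace(query=urlencode(query)))


def wget_command(url: str, token: str, output: str) -> str:
    """Build a shell-safe wget command line; shlex quotes the URL so '&' is not interpreted."""
    return shlex.join(["wget", "-O", output, add_token(url, token)])


if __name__ == "__main__":
    # Hypothetical model version ID and placeholder token.
    print(wget_command("https://civitai.com/api/download/models/12345?type=Model",
                       "YOUR_TOKEN", "model.safetensors"))
```

Pasting the printed command into the pod's terminal keeps the whole URL, including the token, intact.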

RTX 6000 Ada performance much worse than expected

From the NVIDIA specs, I would expect its performance to be on the order of 10-20% slower than the L40S. However, in my current training, I am finding it closer to 2x slower or worse (FP16 mixed-precision training). That's pretty bad considering the price. Perhaps there is some other issue in how the pods or nodes are set up that could be worth looking into?
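One way to separate raw GPU throughput from other bottlenecks (data loading, CPU, disk) is to run the same FP16 matrix-multiply microbenchmark on both GPUs. A minimal sketch, assuming PyTorch with CUDA is installed; the matrix size and iteration count are arbitrary choices. An (m × k) by (k × n) matmul performs about 2·m·n·k floating-point operations:

```python
import time


def matmul_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput of one (m x k) @ (k x n) matmul taking `seconds`, in TFLOPS."""
    return 2 * m * n * k / seconds / 1e12


def benchmark_fp16(size: int = 8192, iters: int = 50) -> float:
    """Time square FP16 matmuls on the current GPU; requires PyTorch with CUDA."""
    import torch

    a = torch.randn(size, size, device="cuda", dtype=torch.float16)
    b = torch.randn(size, size, device="cuda", dtype=torch.float16)
    for _ in range(5):  # warm-up so cuBLAS initialization is not timed
        a @ b
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()  # kernels run asynchronously; wait before stopping the clock
    per_iter = (time.perf_counter() - start) / iters
    return matmul_tflops(size, size, size, per_iter)


if __name__ == "__main__":
    print(f"{benchmark_fp16():.1f} TFLOPS (FP16 matmul)")
```

If the matmul numbers are close to the expected 10-20% gap but training is still about 2x slower, the difference more likely comes from the rest of the pipeline than from the GPU itself.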

Slow model download speeds/bandwidth

Can anyone explain why the download speed from Hugging Face is so bad on RunPod? I consistently get 10-30 MB/s download speeds compared to 100+ MB/s on Vast.ai. I have often had to keep instances running for 1-5 hours just to download Llama 3 70B or LLaVA 34B. Quite frankly, this issue is so bad that it has pushed me to Vast.ai for most model training. The only issue I have seen with Vast is that I can't select 5 GPUs instead of 4 or 8, which is the required amount for our use case. Running batc...
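The reported wait times follow directly from the bandwidth. A 70B-parameter model stored in FP16/BF16 takes 2 bytes per parameter, so about 140 GB; at the reported 10-30 MB/s that is roughly 1.3 to 3.9 hours, versus about 23 minutes at 100 MB/s. A small sketch of the arithmetic (using 1 GB = 1000 MB and ignoring protocol overhead):

```python
def fp16_size_gb(params_billions: float) -> float:
    """FP16/BF16 weights take 2 bytes per parameter, so 1B params is about 2 GB."""
    return params_billions * 2


def download_hours(size_gb: float, speed_mb_s: float) -> float:
    """Hours to download size_gb gigabytes at speed_mb_s megabytes per second."""
    return size_gb * 1000 / speed_mb_s / 3600


if __name__ == "__main__":
    size = fp16_size_gb(70)  # Llama 3 70B in FP16
    for speed in (10, 30, 100):
        print(f"{speed:>3} MB/s -> {download_hours(size, speed):.2f} h")
```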

Container Log From Saved Storage stuck on loading loop

I have saved Stable Diffusion storage using the EU-SE-1 server and cannot get it to finish loading, even after multiple attempts of waiting 15 minutes for it to load on the A40/RTX6000/RTX5000. I have tried deploying via 'Deploy GPU Pod' and from 'Storage.' How can I check where the issue is, in Jupyter Lab or via the terminal? Do I need to remove or add any arguments?

Server Volume Access

I'm using RunPod primarily to run the Stable Diffusion WebUI. I also set up a server volume so that I could upload models and have them persist across any pod I create, but I can't find out how to access the volume storage rather than the temporary pod storage. How can I upload items to the volume so that they won't be deleted every time I terminate a pod?
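To see which directory inside a pod is backed by which storage, you can read the kernel's mount table. A minimal, generic Linux sketch; the assumption that a network volume appears at `/workspace` is a common RunPod convention that should be verified against the output on your own pod:

```python
from pathlib import Path
from typing import List, Optional, Tuple

Mount = Tuple[str, str, str]  # (device, mount point, filesystem type)


def parse_mounts(text: str) -> List[Mount]:
    """Parse /proc/mounts-style text into (device, mount point, fs type) tuples."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append((fields[0], fields[1], fields[2]))
    return mounts


def mount_for(path: str, mounts: List[Mount]) -> Optional[Mount]:
    """Return the mount whose mount point is the longest directory prefix of `path`."""
    best = None
    for device, mount_point, fstype in mounts:
        inside = (mount_point == "/" or path == mount_point
                  or path.startswith(mount_point + "/"))
        if inside and (best is None or len(mount_point) > len(best[1])):
            best = (device, mount_point, fstype)
    return best


if __name__ == "__main__":
    table = parse_mounts(Path("/proc/mounts").read_text())
    # Assumption: the persistent volume is mounted at /workspace.
    print(mount_for("/workspace/models", table))
```

If `/workspace/models` resolves to the same mount as `/`, files saved there live in the container disk rather than on a separate volume.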

Cloud Sync - "Something went wrong"

I have tried to set up Cloud Sync with both Google Cloud and Backblaze, and both have issues when I try to sync: I get the same "Something went wrong" message. Sometimes, if I just keep entering my bucket data again and again, it eventually starts syncing even though I get the error message every time; other times, like now, it will not sync at all...

Feature Request: `runpodctl send` TO specific machine & folder (à la SCP)

This can be achieved today by running: ``` runpodctl send foo ssh machine 'cd /workspace && runpodctl receive ...'...

SSH connection issue

Is anyone else having problems connecting to pods with SSH over TCP at the moment? I'm getting "connection refused" every time. Connecting via ssh.runpod.io works, though. I never had this issue before, and I have already tried different networks...
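A quick way to tell "nothing is listening on that port" apart from a broader network problem is a plain TCP connection test. A refused connection generally means the host answered but no process (for example, an SSH daemon inside the container) accepted connections on that port, while a timeout points more toward filtering or routing. A minimal sketch; the host and port in the usage line are placeholders for the values shown in your pod's connection details:

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError, timeouts, and DNS failures
        return False


if __name__ == "__main__":
    # Placeholder address from the documentation range; substitute your pod's TCP endpoint.
    print(tcp_port_open("203.0.113.10", 22022))
```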

Better solution for 0 GPU stranded volumes

Since on-demand GPUs can get taken, it would be great to have some better escape valves for getting our data off the volume. Right now, the 0.5 vCPU / 512 MB RAM pod you provide keeps killing my upload task. I would happily pay for more resources to speed up getting my data out. It would also be nice to be able to attach a network volume to a pod after creation, or to have cross-region network volumes. A network volume that only works in the same region is of limited value, because a big reason for moving...

Kasmweb Runpod Desktop failing to connect

Hi there, I have tried setting up the runpod/kasm-docker:cuda11 image multiple times now, but I have not been able to connect to the pod in any attempt. When I click to connect to the HTTP service/terminal, a new browser tab opens, but the page fails to connect every time. Is this a known issue with the Kasmweb RunPod Desktops?...

Terminate pod after command finishes

Hi folks: it seems that if RunPod notices the entrypoint command for my pod has finished, it restarts the container and runs it again. Is that expected, and is there any way to turn that off and have the pod terminate instead of re-running?
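One workaround pattern is to make the entrypoint remove its own pod once the real job ends, so there is nothing left to restart. A minimal sketch under two assumptions to verify in your own pod: that `runpodctl` is installed in the container and supports `runpodctl remove pod <id>`, and that the pod's ID is exposed in the `RUNPOD_POD_ID` environment variable. `remove` deletes the pod; if you want to keep its disk, `runpodctl stop pod <id>` would be the analogous choice. The job command is hypothetical:

```python
import os
import subprocess
import sys
from typing import Callable, List, Sequence


def terminate_command(pod_id: str) -> List[str]:
    """Assumed runpodctl invocation that deletes this pod."""
    return ["runpodctl", "remove", "pod", pod_id]


def run_then_terminate(job: Sequence[str], runner: Callable = subprocess.run) -> int:
    """Run the job to completion, then remove the pod so it is not restarted."""
    result = runner(list(job))
    pod_id = os.environ.get("RUNPOD_POD_ID")  # assumption: set inside RunPod pods
    if pod_id:
        runner(terminate_command(pod_id))
    return result.returncode


if __name__ == "__main__":
    # Usage (hypothetical job): python run_then_terminate.py python train.py
    sys.exit(run_then_terminate(sys.argv[1:]))
```

This version removes the pod whether the job succeeds or fails; checking `result.returncode` before calling the terminate command would keep failed pods around for debugging.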

waiting for logs...

Hi, I wanted to start an RTX A4000 pod with Stable Diffusion, but I only got "waiting for logs" for more than 5 minutes, so I stopped it after a while. Is there an overload, or should I look for the problem on my side? I'm new to runpod.io.

Kohya_SS: Clicked the "Start Training" button. How can I tell that it's working?

I'm running Kohya_ss through RunPod (via the Stable Diffusion Kohya_ss ComfyUI Ultimate template). When I click "Start Training", the GUI gives me no indication that anything is happening. Because this process takes so long, it's hard to know whether an error happened or not. Everything I read seems to suggest that I should be able to see the training happening via the terminal, if nothing else to confirm that activity is taking place and things are working. ...
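Independently of what the GUI shows, GPU utilization and memory use are a reasonable signal that training has actually started: both typically rise once the model is loaded and steps are running. A minimal sketch that polls `nvidia-smi` with its CSV query options; the 10% threshold is an arbitrary assumption, and a busy GPU suggests, but does not prove, that training is progressing:

```python
import subprocess
from typing import List, Tuple

QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(output: str) -> List[Tuple[int, int]]:
    """Parse lines like '87, 12034' into (utilization %, memory used MiB) per GPU."""
    stats = []
    for line in output.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append((int(util), int(mem)))
    return stats


def gpu_busy(threshold: int = 10) -> bool:
    """True if any GPU reports utilization at or above `threshold` percent."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return any(util >= threshold for util, _ in parse_gpu_stats(out))


if __name__ == "__main__":
    print("GPU busy" if gpu_busy() else "GPU idle")
```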