RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Multinode training Runpod ports

I'm trying to train a distributed model across multiple nodes: 2 pods with 8x 4090 GPUs each. We can't train using torchrun because I need the same TCP port on each machine, but RunPod assigned me a random external port. Command example: NODE A:...
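For context on why the port matters here: in a static torchrun launch, every node is given the same master address and master port, and only the node rank differs. A minimal sketch of that pattern follows, assuming a hypothetical address (10.0.0.1), port (29500), and script name (train.py), none of which come from the thread. It only builds the command lines and does not launch anything.

```python
def torchrun_command(node_rank, master_addr, master_port,
                     nnodes=2, nproc_per_node=8, script="train.py"):
    """Build the torchrun argument list for one node.

    The address, port, and script name are placeholders for illustration.
    """
    return [
        "torchrun",
        f"--nnodes={nnodes}",                  # 2 pods in the question
        f"--nproc_per_node={nproc_per_node}",  # 8 GPUs per pod
        f"--node_rank={node_rank}",            # the only value that differs per node
        f"--master_addr={master_addr}",        # must be reachable from every node
        f"--master_port={master_port}",        # must be identical on every node
        script,
    ]


# One command per node; both point at the same address and port.
commands = [torchrun_command(rank, "10.0.0.1", 29500) for rank in range(2)]
for cmd in commands:
    print(" ".join(cmd))
```

If the externally mapped port differs from the internal one, the nodes cannot agree on a single master port, which appears to be the situation described above.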

Feature Request / Is it possible RunpodCTL

Just sharing a wish for the backlog: is it possible to add a CLI command to runpodctl that generates SSH keys, lets me send the public key to another pod where it is automatically added to the authorized public keys, and then opens a connection and does a direct SCP file transfer? ...

How to mount persistent storage volume in pod?

I've created persistent storage and launched a pod from the storage UI. When I log in via ssh I can't see the storage volume. How do I find/mount it for use?

RunPod SD InvokeAI v3.3.0 Errors

When I try to run a runpod with invokeai, I just get a Server Error and Runtime Error when I try to generate an image.

ENDPOINT IS

#⛅|gpu-cloud Hi all, can somebody please tell me where to find the "endpoint" code? I would like to connect to my GPU Cloud using Python. It would be great if somebody could post an example of working Python code to connect to and use the GPU. Thanks a lot to everyone who would like to help 😆...

connect ssh vscode to runpod gpu server

Following the blog post https://blog.runpod.io/how-to-connect-vscode-to-runpod/: 1. I created an SSH key and added the public key to my account. 2. I created a pod with TCP port 22. 3. I accessed the pod via the terminal and ran the following command to turn on the SSH connection: bash -c "apt update;apt install -y wget;DEBIAN_FRONTEND=noninteractive apt-get install openssh-server -y;mkdir -p ~/.ssh;cd $_;chmod 700 ~/.ssh;echo YOUR_PUBLIC_KEY > authorized_keys;chmod 700 authorized_keys;service ssh start;sleep infinity"...
Solution:
Hey, got it resolved, thanks for the help. I have WSL on my Windows machine; I was creating an SSH key on WSL and ...

environment variable not accessible from true ssh ?

I see it when using fake SSH but not when using true SSH. I am not sure how to set this up.

Pod disappeared after yesterday's maintenance

Hello, yesterday I wanted to start my Pod but I got the message that the system was down for maintenance till my (local) midnight. That's fine, but this morning I wanted to try again and my entire pod is gone. I hope it's possible to recover this because quite some time went into making it. I checked whether it was stale: it was getting close but still had 2 days on that timer. Also plenty of credits. Bit strange how it can just disappear... Guess I can try to recreate it but a lot of work went into that one. Hope it can be restored somehow....

How to enable Jupyter Notebook and SSH support in a custom Docker container?

I built my own Docker image to deploy on a pod. After creating the Custom Template with my Docker image, there is no option to enable Jupyter Notebook or SSH for it. I tried my best to imitate the official Runpod containers by installing jupyterlab and openssh-server, but when setting up the pod, there is still no option to enable Jupyter Notebook or SSH on the pod. I am also not able to find any guides on how to incorporate Jupyter Notebook support in a custom Docker image....

open ports

I would like to open the ports in my instance. How do I do it?
Solution:
https://docs.runpod.io/docs/expose-ports maybe this doc can help?...

[Urgent] One GPU suddenly went away

Hi, we have a prod issue right now: one of the GPUs from our pod suddenly disappeared.

Does GPU Cloud service support lllyasviel/Fooocus AI?

My PC has low VRAM and always gets disconnected from Fooocus AI. I'm interested in upgrading to a RunPod GPU service. Does it support the https://github.com/lllyasviel/Fooocus service?

Pod suddenly says "0x A100 80GB" and cuda not available

Hi, I created a pod a few days ago and worked with it, no problem. I stopped the pod after the session. Today I try again and suddenly it says 0x A100 80GB and the cuda is not available. If I look at starting a new pod it seems the A100 80GB is available in the same location, so why can't I start my pod with this GPU? What should I do? Is there a way to transfer the data to a new pod?...

Moving storage location

My storage drive is in region EU-CZ-1, but there are no pods available to launch. Is there any way I can move my storage drive to another region?

is your network volume charged by actual usage or the fixed number keyed in during setup?

is your network volume charged by actual usage or the fixed number keyed in during setup?
Solution:
charged by the quota you ask for
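
To make the answer above concrete: under quota-based billing, the charge depends only on the size provisioned at setup, not on how much data is stored. A minimal sketch of the arithmetic, assuming a hypothetical rate of 0.10 per GB per month (the real rate is not stated in the thread):

```python
def monthly_volume_cost(provisioned_gb, rate_per_gb_month):
    """Cost of a network volume billed by provisioned quota.

    Actual usage is deliberately not a parameter: it does not affect the bill.
    """
    return provisioned_gb * rate_per_gb_month


HYPOTHETICAL_RATE = 0.10  # assumed for illustration only, not a published price

# A 100 GB volume costs the same whether it holds 1 GB or 99 GB of data.
cost = monthly_volume_cost(100, HYPOTHETICAL_RATE)
print(f"Monthly cost for a 100 GB quota: {cost:.2f}")
```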

Error 804: forward compatibility was attempted on non supported HW

Writing to the online chat bounces the messages, despite me being obviously connected.

"We have detected a critical error on this machine...failing pods

I get a lot of these errors lately: "We have detected a critical error on this machine which may affect some pods. We are looking into the root cause and apologize for any inconvenience. We would recommend backing up your data and creating a new pod in the meantime." I lost pods (H100 in the secure cloud) and don't know why; today the 6th pod in 2 weeks failed. Runpod support is not helping either. Can someone help me? I'm not going to use Runpod's service anymore until this issue is addressed, thanks. Current pod failing: ID: jfktfsgsvw19i1...

Webhook URL

How can I pass a webhook URL in the JSON body?

stop pod

Hello, I am kind of confused. I haven't used RunPod in a while. I want to stop my GPU instances, but if I select the trash button on my pods, it seems to want to delete the volume. I am using a volume and running secure cloud GPUs. Isn't there a way to terminate the pod but keep all the data in the volume?

How to transfer between pods?

I'm running Stable Diffusion and would like to transfer my outputs to a different pod to continue working. When using runpodctl to transfer data from one pod to another, what is the command? I have tried using runpodctl send “file path name” but this isn’t working for me. What file path should I be using? Can someone share an example of the file path structure, please? It was suggested I post the question here. I'm not getting an error; it's just that nothing is happening.