Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Environment variables in direct SSH

Is there a way to access environment variables defined in the web app from an SSH connection over an exposed TCP port?
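A workaround often used for this (an assumption here, not something the thread confirms for Runpod) follows from how OpenSSH works: `sshd` builds a fresh environment for each session instead of inheriting the environment of the container's main process, so variables set in the web app are not visible over SSH. The start script can dump them into a file that login shells source. This minimal Python sketch renders selected variables as quoted `export` lines; the prefixes and the `/etc/profile.d/pod_env.sh` target path are hypothetical choices.

```python
import os
import shlex


def render_exports(env, prefixes=None):
    """Render env vars as shell `export` lines, optionally filtered by prefix."""
    lines = []
    for key in sorted(env):
        if prefixes and not key.startswith(tuple(prefixes)):
            continue
        # Skip names that are not valid shell identifiers (e.g. starting with a digit).
        if not key.replace("_", "a").isalnum() or key[0].isdigit():
            continue
        # shlex.quote keeps spaces and quotes in values intact.
        lines.append(f"export {key}={shlex.quote(env[key])}")
    return "\n".join(lines) + ("\n" if lines else "")


def write_profile(path, env=None, prefixes=None):
    """Write the export lines to `path` so new login shells can source them."""
    content = render_exports(dict(os.environ) if env is None else env, prefixes)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(content)
    return content
```

For example, calling `write_profile("/etc/profile.d/pod_env.sh", prefixes=["RUNPOD_"])` from the start script before `sshd` is launched would make the matching variables available to new bash login shells. Keep in mind that the file stores any secrets in plain text.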

How does Runpod handle pod termination?

It seems very likely that Runpod simply sends a SIGKILL to the main container process. This is really annoying when you are trying to handle termination. Could you please explain how your orchestration system handles pod termination and how I can receive the OS signal?
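For context, a process cannot catch SIGKILL at all, so graceful cleanup is only possible if the platform sends SIGTERM first and waits, as `docker stop` does with its grace period. Whether Runpod's orchestrator does this is not answered in the thread. Assuming a SIGTERM is delivered, a minimal Python sketch of a handler that lets a work loop finish its current step and then clean up might look like this. The signal only reaches your program if it is the container's main process (for example, started via `exec`) or if a wrapper forwards it; `GracefulShutdown` and `run` are hypothetical names.

```python
import signal
import threading


class GracefulShutdown:
    """Sets a flag when SIGTERM or SIGINT arrives so the main loop can clean up."""

    def __init__(self):
        self.stop_requested = threading.Event()
        self.received = None
        # Handlers can only be installed from the main thread.
        signal.signal(signal.SIGTERM, self._handle)
        signal.signal(signal.SIGINT, self._handle)

    def _handle(self, signum, frame):
        self.received = signal.Signals(signum).name
        self.stop_requested.set()


def run(worker_step, shutdown, poll_seconds=0.1):
    """Call worker_step() until a stop is requested, then return the signal name."""
    while not shutdown.stop_requested.is_set():
        worker_step()
        shutdown.stop_requested.wait(poll_seconds)
    # Save checkpoints / flush logs here; a SIGKILL would skip this entirely.
    return shutdown.received
```

The handler only sets a flag, so the actual cleanup happens in normal code after the loop rather than inside the signal handler.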

KoboldCpp - Official Template broken

I've tried to launch the KoboldCpp template a few times, but I keep hitting errors. The model I want to use downloads in two parts (split with commas in the launch arguments). The downloads finish and are appended, but the logs show 'rm: cannot remove './mmproj.gguf': No such file or directory' right before it finishes. The container then restarts and the downloads begin again from square one. These same models worked last week. I have saved the full logs if needed.

Secret not showing up in the pod `env` output

Hi, I added some secrets and set them as environment variables for my pod, but I couldn't see them when I ran `env` in my pod. I'm using {{ RUNPOD_SECRET_secret_name }} as the environment variable value...
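One way to narrow this down is to check, from inside the pod, whether each expected variable is missing entirely or still holds the literal `{{ RUNPOD_SECRET_secret_name }}` text, which would mean the reference was never substituted. This is a hypothetical diagnostic, not a Runpod tool; the names you pass in are whatever environment variables you configured, and it reports a status without printing any secret values.

```python
import os
import re

# Matches a value that still looks like an unsubstituted secret reference.
PLACEHOLDER = re.compile(r"^\{\{\s*RUNPOD_SECRET_\w+\s*\}\}$")


def check_secrets(names, env=None):
    """Report 'ok', 'missing', or 'unresolved placeholder' for each variable name."""
    env = os.environ if env is None else env
    report = {}
    for name in names:
        value = env.get(name)
        if value is None:
            report[name] = "missing"
        elif PLACEHOLDER.match(value.strip()):
            report[name] = "unresolved placeholder"
        else:
            report[name] = "ok"  # value present; deliberately not printed
    return report
```

For example, `print(check_secrets(["MY_API_KEY"]))`, where `MY_API_KEY` stands in for your own variable name.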

Transfer data from a stopped pod to a new one

Hey, I finished my training on a big pod and I want to move all the data to another pod using storage (a network volume). How can I do that?

Pod error

```
2024-08-19T23:00:50Z create pod network
2024-08-19T23:00:51Z create container runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04
2024-08-19T23:00:52Z 2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04 Pulling from runpod/pytorch
2024-08-19T23:00:52Z Digest: sha256:75bf115d87ee3813f8026fed3e11bae3bf68bfd789a9566878735245b723ef8b
2024-08-19T23:00:52Z Status: Image is up to date for runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04...
```

Pod down for hours

Any idea how long this will take to resolve? I cannot access my pod.

Can pods shutdown from inside the pod itself?

I'm wondering whether the pod can accept a shutdown command to stop billing.
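A sketch of one approach, assuming the `runpodctl` CLI is available inside the pod and that the pod's own ID is exposed as the `RUNPOD_POD_ID` environment variable (both worth verifying on your image). Storage attached to a stopped pod may still incur charges. The hypothetical `stop_self` helper defaults to a dry run that only returns the command it would execute.

```python
import os
import shlex
import subprocess


def build_stop_command(env=None):
    """Return argv to stop the current pod (assumes `runpodctl` and RUNPOD_POD_ID)."""
    env = os.environ if env is None else env
    pod_id = env.get("RUNPOD_POD_ID")
    if not pod_id:
        raise RuntimeError("RUNPOD_POD_ID is not set; are we running inside a pod?")
    return ["runpodctl", "stop", "pod", pod_id]


def stop_self(dry_run=True, env=None):
    """Return the stop command as a string, executing it only when dry_run is False."""
    argv = build_stop_command(env)
    if not dry_run:
        subprocess.run(argv, check=True)
    return shlex.join(argv)
```

Calling `stop_self(dry_run=False)` at the end of a job script would then stop the pod once the work is done.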

Does Runpod provide environment isolation?

Hi, if we want to have two isolated environments, dev and prod, what can we do in Runpod? Thanks,...

Error pulling image (US community server)

When creating a new community pod based in the US, I get this message: `error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)` What is the problem here?...

Resuming an on-demand pod via the SDK

Hello, how can I resume a spot pod through the Python SDK? I am using the `resume_pod` function, but I am not able to.

Possibility of Pausing a Pod Created with Network Storage

Hello, I am a new user of RunPod. Currently, I am using a pod created with network storage. I noticed that regular pods have a pause function, but I couldn't find this feature on pods created with network storage. I would like to know whether this feature is available for such pods and, if so, how I can use it.

Docker run in interactive mode

Hi, I want to be able to SSH into my pod and run bash commands. If I provide no entry command in my Dockerfile, I am unable to connect to my pod via SSH. I also don't see any option to edit the `docker run` command to include the interactive flag. Any help is appreciated...

Made an optimized SimplerTuner RunPod: Failed to save template: Public templates cannot have Registr

Hi! I've spent a couple of days creating and testing a Docker flow for RunPod, and I've run the pod privately multiple times with no problems. There is no registry information in the Dockerfile, but I keep encountering this error, with absolutely no indication of its origin or how to fix it. Any help would be greatly appreciated, as we have a community eager to train Flux1 on RunPod....

URGENT! Network Connection issues

Hi, it looks like there is a general issue: all pods are suffering network connection problems. Can someone look into this?

Looking for suggestions to achieve faster SDXL outputs

Hi, I am currently trying to generate a large number of images (200+) every session with an SDXL model in the Automatic1111/Forge UI, and I was wondering how I can generate them faster. I tried using an RTX 3090 for generation and got about 1.5-2 it/s, which is pretty slow in the long run. Perhaps there is a faster alternative or workflow? Please suggest a workflow and GPU that can generate a large number of images quickly....

Official Template not running correct version of CUDA

Hello! I'm trying to run a pod using the official templates `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04` and `runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04`. Unless I completely misunderstood the notation, the image should run with CUDA 11.8.0, right? I've tried with Secure Cloud RTX 4090 and Secure Cloud RTX Ada 6000...
Solution:
@InnerSun `nvidia-smi` shows the maximum CUDA version supported by the host's driver, not the CUDA version installed inside the container.
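To illustrate the solution: the `CUDA Version` in the `nvidia-smi` header is the highest version the host driver supports, while the toolkit shipped in the image (11.8 here) is what `nvcc --version` reports, and an older toolkit generally runs on a newer driver. This Python sketch parses both from captured command output; the sample strings are illustrative, not taken from the thread, and the `<=` comparison is a rule of thumb rather than NVIDIA's full compatibility policy.

```python
import re


def driver_max_cuda(nvidia_smi_text):
    """Extract 'CUDA Version: X.Y' from the nvidia-smi header (the driver's maximum)."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def toolkit_cuda(nvcc_text):
    """Extract 'release X.Y' from `nvcc --version` output (the toolkit in the image)."""
    m = re.search(r"release\s+(\d+)\.(\d+)", nvcc_text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def compatible(driver, toolkit):
    """Rule of thumb: a toolkit no newer than the driver's maximum should work."""
    return driver is not None and toolkit is not None and toolkit <= driver


if __name__ == "__main__":
    smi = "| NVIDIA-SMI 550.54.15  Driver Version: 550.54.15  CUDA Version: 12.4 |"
    nvcc = "Cuda compilation tools, release 11.8, V11.8.89"
    d, t = driver_max_cuda(smi), toolkit_cuda(nvcc)
    print(d, t, compatible(d, t))  # (12, 4) (11, 8) True
```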

I can't run the pod with a container start command

`bash -c "cd /workspace/ && sh run.sh"` I tried this start command, but it does not work: it seems to run repeatedly. However, after I connect to the pod and run `cd /workspace && sh run.sh` manually, it works well ...
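A likely explanation (not confirmed in the thread) is that the start command becomes the container's main process: when `run.sh` finishes or fails, the container exits and is started again, which looks like the script running repeatedly. Overriding the start command may also skip the image's default startup steps, such as SSH setup. A common pattern is to keep the main process alive after the script, for example by appending `; sleep infinity` to the shell command. The same idea as a minimal Python entrypoint, where `run_once_then_idle` and its `idle` flag are hypothetical:

```python
import subprocess
import sys
import threading


def run_once_then_idle(script, workdir, idle=True):
    """Run `sh script` inside workdir once, then optionally block forever.

    Blocking keeps the container's main process alive, so the container does not
    exit (and get restarted) as soon as the script returns.
    """
    result = subprocess.run(["sh", script], cwd=workdir)
    print(f"{script} exited with code {result.returncode}", file=sys.stderr, flush=True)
    if idle:
        threading.Event().wait()  # never set: sleeps until the container is stopped
    return result.returncode
```

Logging the exit code also makes it visible in the pod logs whether `run.sh` is failing rather than finishing normally.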

Volume / Storage issues

I am attempting to install Comfy on a few different machines. Before Comfy and the Flux dev models finish installing, I get an out-of-volume error and cannot run the pod. Could this just be bad luck with a few broken pods? Or am I not setting something up correctly?...
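Since the error appears before the model downloads complete, one thing to check is whether the disk the downloads land on is simply too small. A minimal sketch using Python's `shutil.disk_usage`; `/workspace` as the volume mount point is an assumption based on common Runpod templates, and the required size is yours to fill in.

```python
import shutil


def free_gib(path="/workspace"):
    """Return free space at `path` in GiB."""
    return shutil.disk_usage(path).free / 2**30


def check_space(required_gib, path="/workspace"):
    """Raise OSError if `path` has less than `required_gib` GiB free; else return free GiB."""
    available = free_gib(path)
    if available < required_gib:
        raise OSError(f"only {available:.1f} GiB free at {path}, need {required_gib} GiB")
    return available
```

Running this at the start of an install script fails fast with a clear message instead of stopping halfway through a download.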

I'm trying to start a CPU pod using the GraphQL endpoint and specifying an image

Hey, I've successfully created a CPU pod using the GraphQL endpoint; however, it does not seem to follow the same structure as GPU pod creation. What I'm trying, which is working, is:
```
mutation { deployCpuPod( input: { ...
```