Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

How is runpod secret / environment vars for credentials more secure?

I'm looking at the runpod Secret feature for handling AWS credentials. It looks like 'best practice' for handling credentials in a docker image is to set them as environment variables; and Runpod's "Secrets" feature feeds into that. Could anyone explain how using runpod's "Secrets" is more secure than just passing environment variables? If the security concern is to avoid writing your credentials directly into the image and instead pass them on launch with env vars, how do "Secrets" do anything more? Is it a feature for handling credentials within a runpod account managed by a team?...
Solution:
Yes, they are meant to keep keys secure in a team environment. With ENV variables all team members could view your keys in clear text in the template definition.
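From inside the container, the credentials are read the same way whether they arrived as plain environment variables or through a secret; per the answer above, the difference is who can see the value in the template definition. A minimal sketch of the reading side, assuming the template exposes the standard AWS SDK variable names (the helper names `load_aws_credentials` and `redact` are hypothetical):

```python
import os

# Assumed variable names: these are the standard AWS SDK names, but the
# pod template decides which variables are actually set.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def load_aws_credentials(env=None):
    """Return the required credentials from the environment, or raise."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def redact(value, visible=4):
    """Mask a secret for logging, keeping only the last few characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```

Failing fast on a missing variable and redacting values before logging keeps the key out of pod logs regardless of how it was injected.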

Get SSH Login Via API

When getting a pod via the API, it does not return any information on connecting via the Basic Terminal Access. Obviously the first part of the username is the pod ID, but I haven't been able to identify the numbers following the dash. How might you get this username via the API or programmatically? Example: ssh cbdf4581hxb1vy-64411092@ssh.runpod.io -i ~/.ssh/id_ed25519 (cbdf4581hxb1vy == pod ID)...

Llama3 setup

Hi, everyone. We are planning to deploy Llama3 for our app with millions of users. How can we achieve this? And which GPU series or cloud platforms are best for achieving high speed and scalability?...

BROKEN: TheLastBen Fast Stable Diffusion

2024-07-26T18:04:17.934600984Z --2024-07-26 18:04:17-- https://huggingface.co/datasets/TheLastBen/RNPD/raw/main/Notebooks.txt
2024-07-26T18:04:17.960726061Z Resolving huggingface.co (huggingface.co)... 65.9.95.31, 65.9.95.61, 65.9.95.114, ...
2024-07-26T18:04:17.964895834Z Connecting to huggingface.co (huggingface.co)|65.9.95.31|:443... connected.
2024-07-26T18:04:18.292202440Z HTTP request sent, awaiting response... 401 Unauthorized
2024-07-26T18:04:18.292233330Z ...
Solution:
The template was pulled a while ago: RunPod cancelled its contract with TheLastBen, so he removed the files from his repo, which broke it.

Network volume

Hi guys, I am new to Runpod. I am trying to set up a network volume, but I cannot see the "Connect to Jupyter Notebook" option after I deployed the GPU within the network volume. What did I miss?

ollama won't pull manifest - weird error.

In a runpod I've tried the various ollama templates, and also installed ollama on a basic template. I can run ollama serve; but in every case when I run ollama run <model> I always get the error: Error: pull model manifest: Get "https://registry.ollama.ai/v2/library/mistral-large/manifests/latest": dial tcp: lookup registry.ollama.ai on 127.0.0.11:53: read udp 127.0.0.1:59647->127.0.0.11:53: i/o timeout ...

is disk volume faster than network volume?

I found that network volume is very slow when loading models to gpu. I wonder if disk volume is faster? Is disk volume physically attached to pod? Also can I mount both disk volume and network volume to the same pod machine?...
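One way to compare the two volume types is to time a sequential read of the same large file from each mount. A rough sketch using only the standard library; the function name is hypothetical and the mount paths are for the reader to fill in:

```python
import time


def read_throughput_mb_s(path, chunk_size=16 * 1024 * 1024):
    """Read `path` sequentially and return the observed throughput in MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return (total / 1e6) / elapsed if elapsed > 0 else float("inf")
```

Repeated reads of the same file may be served from the operating system's page cache, so the first run on a fresh pod is the more representative number.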

Cannot open Checkpoints folder - Fooocus

Hi, I can't open the Checkpoints folder in Jupyter Fooocus. When I click on it, nothing happens, but I can open the other folders. I tried deleting the pod and recreating it, but it still doesn't work. Please let me know.

CA-MTL-1 region has GPU loaded at 87%

I created a pod in the CA-MTL-1 region with A40 GPU. The pod started with GPU 100% utilization and 87% GPU memory in use. I tried it multiple times, but same result.

3 pods inaccessible after network outage

There was a network outage in EU NO and the pods are up, but cannot start:
error creating container: container: create: container create: Error response from daemon: layer does not exist
error creating container: container: create: container create: Error response from daemon: layer does not exist
This is the second time an incident like this has occurred. I have >2 TB of storage I cannot access. Am I being billed for these pods? No response from support....

Build docker image

I have a pod, I would like to use it to build a docker image, specifically threestudio https://github.com/threestudio-project/threestudio/blob/main/docs/installation.md. But I've heard that running docker in a pod is not supported. How should I build the docker image?

How to set environment variable when launching pod with network volume

I am launching a pod with ashleykza's automatic1111 template using a network volume; however, it starts to redownload everything even though it's already on my network volume. She provided an environment variable to skip syncing, which I thought I had set when editing the template overrides, as shown in the second pic. Despite this, it's still redownloading everything. What am I supposed to set 'key' to here to prevent it from redownloading everything?

'Background' options for Pod Initiated file transfer

I'm trying to scope out whether there's a way to have a runpod send me back a small .db/txt file on completion of a task, or a progress file before it is closed due to being outbid (Community pods). I've been looking at rsync, runpodctl, and SSH, and they all seem to require the transfer to be 'initiated' from the recipient machine. I'm looking at the Google Drive API, which I think is going to be my best bet for an 'always ready to receive' solution. ...
Solution:
You might need something like this, detect the signal and do something: import signal import boto3 import os...
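The answer's code is cut off, so the following is only a sketch of the idea it names (catch the termination signal, then push the file out), not the original snippet. It assumes the pod process receives SIGTERM before it is stopped, which the thread does not confirm; the file path, bucket, and key are hypothetical placeholders, and `boto3` is a third-party package that needs AWS credentials in the environment.

```python
import os
import signal


def upload_to_s3(path, bucket, key):
    """Upload one file with boto3 (imported lazily; needs AWS credentials)."""
    import boto3  # third-party: pip install boto3

    boto3.client("s3").upload_file(path, bucket, key)


def make_shutdown_handler(path, bucket, key, upload=upload_to_s3):
    """Build a signal handler that pushes `path` to storage before exit."""

    def handler(signum, frame):
        # Only upload if the task has actually written the file.
        if os.path.exists(path):
            upload(path, bucket, key)
        raise SystemExit(0)

    return handler


def install(path, bucket, key, upload=upload_to_s3):
    """Register the handler for SIGTERM (assumed to arrive before shutdown)."""
    handler = make_shutdown_handler(path, bucket, key, upload)
    signal.signal(signal.SIGTERM, handler)
    return handler
```

Calling something like `install("/workspace/progress.db", "my-bucket", "progress.db")` once at startup (from the main thread) makes the transfer pod-initiated, so nothing on the receiving side has to be listening. The `upload` parameter is injectable so a different destination can be swapped in.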

No such image

I just created an image, pushed it to docker.io and created a Pod template referencing this image. However, startup fails with: Error response from daemon: No such image: $IMAGENAME. I can pull the image locally from my machine without being logged in to docker.io. Why is my Pod not able to pull the image?...
Solution:
Yep, solved. Building the image with docker buildx build --platform linux/amd64 helped. Not a Runpod issue at all.
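For scripted builds, the command from the solution can be assembled programmatically. This sketch wraps the same `docker buildx build --platform linux/amd64` invocation; the image tag, the `--push` step, and the function names are illustrative assumptions. A plausible reason the flag helps is that the image was built on a non-amd64 machine, though the thread does not say so.

```python
import subprocess


def buildx_command(tag, context=".", platform="linux/amd64", push=True):
    """Return the argv list for a single-platform `docker buildx build`."""
    cmd = ["docker", "buildx", "build", "--platform", platform, "-t", tag]
    if push:
        # Push straight to the registry after a successful build.
        cmd.append("--push")
    cmd.append(context)
    return cmd


def build(tag, **kwargs):
    """Run the build; raises CalledProcessError if docker exits non-zero."""
    subprocess.run(buildx_command(tag, **kwargs), check=True)
```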

network volume usage on pod deploy

I created a community pod with 40 GB volume storage. By default it started with 59% usage. I tried deploying another pod and the same thing happened. This is in the US region.
Solution:
If it's really empty but shows as used, you can report it via the contact button on the website. @mathew

Is it possible to use Runpod to finetune a text to speech model

I am not super tech savvy, so I am unsure if this is possible. The TTS is (https://github.com/erew123/alltalk_tts). I know how to connect to runpod via SSH, but I don't know how connecting the two would work, if it's possible at all.

Predict SSH over TCP command predicting <username> - trying to automate pulling a repo at pod deploy

I want to pull a git repo into the workspace of a pod as it is deployed. I am trying to ssh into a pod without accessing the GUI. I know the command has a typical form ssh <username>@<runpodproxy> -i (path to ssh), but I do not know how <username> is generated. I can tell that the <username> is <[podID]-[string]>. Does anyone know what the [string] is? Is it predictable or otherwise associated with the pod? I am also looking into the runpodctl exec python [file] [pod id] command. Any suggestions would be appreciated....

text gen webui template not downloading models

When I try downloading a model in text gen web UI, nothing happens.

Error response from daemon: driver failed external connectivity on endpoint.

Suddenly I am getting the error below when I try to docker compose up. Docker was working fine on the pod. I just made some code changes and rebuilt it, and now I am getting the errors below: Gracefully stopping... (press Ctrl+C again to force) Error response from daemon: driver failed programming external connectivity on endpoint mia-runpod-backend-engine-1 (f4a69cb1cbf0100d22af23c3d5dc5a09aeeac3425476d4bc8bfbf886e42a77f1): Unable to enable MASQUERADE rule: (iptables failed: iptables --wait -t nat -A POSTROUTING -p tcp -s 172.19.0.4 -d 172.19.0.4 --dport 8000 -j MASQUERADE: /usr/sbin/iptables: error while loading shared libraries: libip4tc.so.2: cannot close file descriptor: Error 24 (exit status 127))...