Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Join

Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Join

⚡｜serverless

⛅｜pods

🔧｜api-opensource

📡｜instant-clusters

🗂｜hub

Flynn

8/27/2024

Looking for best A1111 Stable Diffusion template

Anyone know any custom templates for Stable Diffusion A1111 that have ADETAILER and CONTROLNET extensions pre-installed?

Flynn

8/27/2024

No A40s available

Been checking all throughout the day, but no A40s are available. Anyone know why?

streamize

8/27/2024

community cloud spot POD

A spot instance suddenly automatically switched to an on-demand instance. Is this normal? Also, when downloading Docker images, it often fails or becomes slow. (The speed variance when downloading Docker images each time a pod is created is too large (depends on luck)). Is this normal? Is there a way to minimize this? I host an average of 20 RTX 4090 instances for about 12 hours a day, automatically removing or adding pods to match demand. I'm curious about situations where Docker image downloads suddenly fail and about the behavior of spot instances...

lil_sean

8/27/2024

Does the pod hardware differ a lot in US?

Hi, We deployed several times in US region (secure cloud) with runpod cli, but the inference performance/speed differs a lot, even model loading time differs a lot, what's the reason? and how do I know what data center I'm using. it only shows 'US'. thanks...

delirious

8/26/2024

GPU requires reset

Restarted and re-created the pod a couple times, getting the same error on container start. I assume it keeps grabbing the same bad node. I was able to start the container by switching to a different instance type. 2024-08-26T21:15:45Z error creating container: nvidia-smi: parsing output of line 5: failed to parse ([GPU requires reset]) into int: strconv.Atoi: parsing "": invalid syntax Pod ID: 2hvpqmtrowunjp...

tallariel

8/26/2024

Problems updating admin passwords on kasm image

I'm trying to change the default admin and kasm passwords on a kasm instance using the image runpod/kasm-docker:cuda11 once the pod is running, I login via ssh and successfully use passwd to change the admin password. Then I successfully change the kasm password using vncpasswd -u kasm_user then when i login using kasm, i can login successfully but the screen is completely gray and the cursor doesn't appear. something's broken and i have no clue what it is. ...

Flynn

8/26/2024

No A40s available?

I have my pod on an A40 and -a lot- of material i've downloaded onto it, but the A40 gpu's have been taken up all night. Is there anyway to quickly transfer all my downloaded material to another pod, or will the lack of availability be solved quickly?...

bluekk__

8/26/2024

kernel dying issue.

Starting today, the kernel has suddenly stopped working properly, and it keeps dying or failing to run. I need to quickly check the results, but all my work has come to a halt. I need a quick response regarding this kernel dying issue.

Bob

8/26/2024

Running out of disk space

I am trying to load a large dataset to train my model. How do I increase the available disk space of my pod?

Bob

8/25/2024

Interested in multinode training on Runpod

Hi guys, my team is interested in using RunPod for multinode training. We are looking for 24-96 a100s for larger scale model training. Do you guys currently support this?

Guiman

8/24/2024

Continuous Deployment for Pods

Hello, I recently transitioned from using Serverless Endpoints to Pods, but I'm encountering issues with my existing build and deployment workflow. Previously, with Serverless Endpoints, I had a setup in GitHub where I used GitHub Actions workflows to build container images, push them to my registry, and update the template image reference via the GraphQL API. When I updated the template, the endpoint would automatically restart and pull the new image. However, with Pods, this behavior doesn't seem to work the same way. Even after updating the template, the Pod continues running the "old" image and doesn't refresh automatically. Could you suggest a method to trigger a dynamic update or replacement of the Pod? Additionally, are there any other deployment strategies you recommend for my situation? I appreciate your assistance! ...

Mandragora.ai

8/24/2024

Production pod suddenly unreachable, how long can I expect this to last for? (Please provide ETA)

Hi, I have an On-Demand Secure Cloud pod that runs the backend for my app. My app is now not working, and the pod has the message in the screenshot. How long can I expect this to last for? Minutes? Hours?

polar

8/23/2024

Test Support Thread

Test Support Description

PierrunoYT

8/23/2024

How to stop a Pod?

sluzorz

8/23/2024

Maximum number of A40s that can run at one time

I'm looking to run as many A40s to finish a large-scale inference/LLM generation job. How many could I run at one time? 40, 80, 100?

June Thai

8/22/2024

Cannot SSH over exposed TCP (multiple pods, tested from different local machine)

Hi @here I cannot SSH over TCP but is able to do basic. I suspected my Docker at first, but I have the same issue with multiple Docker image. I tested it from multiple local machine. This is the verbosed error message: debug1: Reading configuration data ~/.ssh/config...

sluzorz

8/22/2024

Does RunPod support other repos other than Docker Hub?

Wodering if we can use AWS or GitHub as an alternative

1AndOnlyPika

8/22/2024

Persistent container disk

Is there a way to make the container disk mounted at / persistent for a pod instead of the additional drive at /workspace or whatever?

remi

8/21/2024

How to avoid Cloudflare timeouts on pods?

I saw a previous post mentionning using the public IP but it doesn't seem to work for me? I'm using runpod to host a vLLM server (the serverless endpoint doesn't work for me). I'm running batch workloads and those timeout (cloudflare)...

pac-man

8/21/2024

Environment variables in direct SSH

Is there a way to access environment variables defined in the web app in an SSH connection over exposed TCP port?

Previous Next

Gaming

Programming

Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!