RunPod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


⛅|pods-clusters

Automatically stop a Pod after some time while using Ollama

Hi everyone. As the title says, I would like my pod to "wake up" at 8 AM from Monday to Friday and stop when its Ollama endpoint has not been triggered for 30 minutes. Is something like this possible?...
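
One way to approximate the stop-after-idle half of this is a small watchdog inside the pod that tracks when the Ollama endpoint was last hit and stops the pod through the RunPod Python SDK. A minimal sketch, assuming the `runpod` package is installed, an API key in `RUNPOD_API_KEY`, and that whatever fronts Ollama touches the timestamp file on every request (the file path and the 30-minute limit are illustrative):

```python
# idle_watchdog.py - sketch: stop this pod after 30 minutes without Ollama traffic.
import os
import time

import runpod  # RunPod Python SDK

runpod.api_key = os.environ["RUNPOD_API_KEY"]

POD_ID = os.environ["RUNPOD_POD_ID"]           # set by RunPod inside the pod
LAST_REQUEST_FILE = "/workspace/last_request"  # illustrative: touched on every Ollama call
IDLE_LIMIT = 30 * 60                           # seconds

while True:
    try:
        idle = time.time() - os.path.getmtime(LAST_REQUEST_FILE)
    except FileNotFoundError:
        idle = 0  # nothing recorded yet; treat the pod as active
    if idle > IDLE_LIMIT:
        runpod.stop_pod(POD_ID)  # stop (not terminate), so the pod can be resumed later
        break
    time.sleep(60)
```

The 8 AM weekday wake-up would then be a cron job on some always-on machine calling `runpod.resume_pod()` (or the REST/GraphQL equivalent) with the same pod ID.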

Can't view my ComfyUI workflow even though I exposed ports

I exposed some ports, but I get 'Not ready', and when I try to access the ports I get a Bad Gateway error. The only port that opens is 8888 (Jupyter). I'm using the RunPod PyTorch template on the pod...
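
A 'Bad Gateway' from the RunPod proxy usually means nothing inside the pod is listening on that port yet, or the service is bound only to 127.0.0.1. A rough check you can run from a terminal inside the pod (port 8188, ComfyUI's default, is only an example):

```python
# port_check.py - sketch: see whether a service is actually reachable on an exposed port.
import socket

PORT = 8188  # example: ComfyUI's default port

def reachable(host: str, port: int) -> bool:
    """True if a TCP connection to host:port succeeds within 2 seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(2)
        return s.connect_ex((host, port)) == 0

pod_ip = socket.gethostbyname(socket.gethostname())  # the container's own IP
print("loopback:", reachable("127.0.0.1", PORT))
print("pod IP  :", reachable(pod_ip, PORT))
```

If only the loopback check passes, the service is bound to 127.0.0.1; ComfyUI, for instance, needs to be started with `--listen 0.0.0.0` before the exposed port (and the proxy behind it) can reach it.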

Trouble comparing pods

Is there any way to compare the performance of different pods in terms of RAM (GB), VRAM (GB), and vCPUs?
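
VRAM per GPU type can at least be pulled programmatically; here is a sketch using the `runpod` Python SDK's GPU-type listing (the exact field names, such as `memoryInGb`, may differ between SDK versions, and RAM/vCPU counts are shown per configuration on the deploy page rather than per GPU type):

```python
# list_gpus.py - sketch: list GPU types and their VRAM via the RunPod API for a quick comparison.
import os

import runpod  # RunPod Python SDK

runpod.api_key = os.environ["RUNPOD_API_KEY"]

for gpu in runpod.get_gpus():
    name = gpu.get("displayName", gpu.get("id", "?"))
    vram = gpu.get("memoryInGb", "?")
    print(f"{name:<30} {vram} GB VRAM")
```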

A pod started with runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04 shows CUDA version 12.6

I am confused about what determines the CUDA version of a pod I start. I would expect that when I start a Docker image with a CUDA version in its name, that CUDA toolkit is bundled into the image and is the version I see once the pod is running, but this is not the case. How can I start a pod with a predictable CUDA version?
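
Part of the confusion is that two different numbers are in play: the image tag (cuda12.4.1) describes the CUDA toolkit baked into the image, while `nvidia-smi` reports the highest CUDA version supported by the host machine's driver, which varies from host to host. A small sketch to print both inside a running pod:

```python
# cuda_versions.py - sketch: compare the CUDA toolkit inside the image with the host driver's CUDA.
import subprocess

import torch

def first_line_containing(cmd: list[str], needle: str) -> str:
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    return next((line.strip() for line in out.splitlines() if needle in line), f"{needle!r} not found")

print("PyTorch built against CUDA:", torch.version.cuda)
print(first_line_containing(["nvcc", "--version"], "release"))   # toolkit baked into the image
print(first_line_containing(["nvidia-smi"], "CUDA Version"))     # host driver's CUDA
```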

Broken pod

RunPod Pytorch 2.4.0, ID: qq5a8cbw7q0jms
2024-11-18T16:42:49Z create container runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04
2024-11-18T16:42:49Z image pull: runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04: pending
2024-11-18T16:42:57Z error creating container: container: create: container create: Error response from daemon: No such image: runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04...

No GPU available

Hi, since yesterday I have had no GPUs available for my pod with 3x 4090. This has happened to me before, but after a few minutes I was able to boot with a GPU. I use a virtual disk; I use SD, and the other machine that runs Flux I can boot fine. Any solution, or does anyone know why this happens?...

I am not using my GPU, but someone else is occupying it. What is the solution?

ID: xx5vmcdbbkab3m, 6x A100, 1-week service. When I first initialized it, it showed that someone else was already occupying my GPU. How should I handle this?...
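
A first step is to confirm from inside the pod which processes actually hold GPU memory; `nvidia-smi`'s query flags are standard, the rest of this sketch is just illustrative plumbing:

```python
# gpu_users.py - sketch: list the processes currently holding memory on the visible GPUs.
import subprocess

out = subprocess.run(
    ["nvidia-smi", "--query-compute-apps=pid,process_name,used_memory", "--format=csv,noheader"],
    capture_output=True,
    text=True,
).stdout.strip()

print(out if out else "no compute processes found on the visible GPUs")
```

If memory is shown as used but no process is visible from inside the container, that is worth a support ticket with the pod ID, since it points at something outside your own container.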

"There are no longer any instances available with the requested specifications."

I've been getting this error a lot lately when trying to deploy a pod, even when the GPU is listed as available, and it persists after multiple page refreshes. A feature request to help with finding an available instance: either include in the error which requested resource couldn't be matched (e.g. drive storage), or allow filtering by more parameters so I can see what's actually available and pick a machine that meets my needs.

Stuck on Pod Initialization

Hi everyone, I’m new to RunPod and facing an issue while setting up a GPU pod. Every time I try to launch one, it gets stuck during initialization and shows “Waiting for logs,” but no logs are generated. I’ve tried using different servers, CPUs, and GPUs, but the problem persists across all scenarios. I would greatly appreciate any guidance or suggestions to resolve this issue. Thank you!...

MI300X in RO cannot be created

Creating the pod is failing with:
2024-11-17T12:20:23Z Status: Image is up to date for runpod/pytorch:2.4.0-py3.10-rocm6.1.0-ubuntu22.04
2024-11-17T12:20:23Z error creating container: container: create: container create: Error response from daemon: layer does not exist...

Pods getting erased/terminated

I finally decided to give RunPod a try, deposited some credit, and deployed on a spot node with a network volume. Several seconds after it started running, it was erased automatically; I thought that was because it was a spot instance. I tried to deploy on-demand, and that one is gone too. Now when I try to access my account, the RunPod website just keeps loading. TBH, not a really good first experience. Any help?...

Hosting RunPod as an API endpoint

I have hosted my workflow on the RunPod Pods service. Is there any way to host it as an API endpoint, or to run it as a script based on user input?...
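
A pod can serve HTTP directly, so one option is to wrap the workflow in a small web server bound to 0.0.0.0 on a port listed under the pod's exposed HTTP ports; this is a minimal standard-library sketch where `run_workflow()` and port 8000 are placeholders for your own setup:

```python
# api_server.py - minimal sketch: expose a pod-hosted workflow as an HTTP endpoint.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_workflow(user_input: dict) -> dict:
    # placeholder: call your actual workflow/script here
    return {"echo": user_input}

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        result = run_workflow(json.loads(body or b"{}"))
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

# Bind to 0.0.0.0 so RunPod's proxy (or a direct TCP port) can reach it.
HTTPServer(("0.0.0.0", 8000), Handler).serve_forever()
```

The endpoint is then reachable through RunPod's HTTP proxy URL for that port, or through a direct TCP port if you expose one.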

accessing nginx server on my local machine

Hello, can anyone help me access a server hosted on localhost:8000 on my pod from my PC's web browser? Or can you provide a basic Nginx setup and how to access it from my local machine? I have gone through the documentation on exposing ports and it didn't work well...
Solution:
Try to break it down into smaller problems: 1. HTTP server. 2. Connectivity. 3. Nginx config. ...
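
For steps 1 and 2, a quick sketch that first checks the server from inside the pod and then through the HTTP proxy; the proxy URL pattern follows the port-exposure docs and assumes port 8000 is listed under the pod's exposed HTTP ports:

```python
# connectivity_check.py - sketch: verify the server inside the pod, then through the proxy.
import os
import urllib.request

PORT = 8000
POD_ID = os.environ.get("RUNPOD_POD_ID", "<pod-id>")

for url in (
    f"http://localhost:{PORT}/",                   # step 1: server answers inside the pod
    f"https://{POD_ID}-{PORT}.proxy.runpod.net/",  # step 2: reachable through the proxy
):
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            print(url, "->", resp.status)
    except Exception as exc:
        print(url, "->", exc)
```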

Does the Kohya_ss template support FLUX?

I want to use Kohya to train Flux DreamBooth models; just curious whether it works with your settings or whether I would have to upload my own install of Kohya to do so.

Network volume permissions

Is there a way to change permissions for files/directories on a network volume? I’d like to save Postgres data to a network drive but the directory needs permissions 700 or 750 rather than 777. I haven’t been able to find a way to change any permissions for any file/directory on a network volume. Permissions for container volumes can be modified no problem with chmod....
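
A quick way to see what the volume's filesystem will actually accept is to attempt the chmod and read the mode back; a small sketch, with /workspace as the usual network-volume mount point and the directory name purely illustrative:

```python
# perm_check.py - sketch: check whether chmod sticks on a network-volume path.
import os
import stat

path = "/workspace/pgdata-test"  # illustrative test directory on the network volume
os.makedirs(path, exist_ok=True)
try:
    os.chmod(path, 0o700)
except OSError as exc:
    print("chmod failed:", exc)
mode = stat.S_IMODE(os.stat(path).st_mode)
print(f"{path}: requested 700, got {oct(mode)[2:]}")
```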

How to migrate serverless endpoint to a pod?

I have a strange use case: a functional serverless endpoint that must run on AMD hardware (for non-technical reasons). Everything is set up and currently working on NVIDIA hardware. AMD hardware is not yet available for serverless; can I recreate the serverless behaviour using a pod?...
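
The serverless part of the worker is mostly the thin `runpod.serverless.start({"handler": handler})` entry point, so one way to move to a pod is to keep the handler untouched and call it directly with the same event shape. A sketch, where the `handler` module name and the input payload are illustrative:

```python
# pod_runner.py - sketch: reuse an existing serverless handler on a plain pod.
from handler import handler  # your existing serverless worker module (name is illustrative)

def run(job_input: dict) -> dict:
    """Build a serverless-style event and call the handler directly."""
    event = {"id": "local-job", "input": job_input}
    return handler(event)

if __name__ == "__main__":
    print(run({"prompt": "hello from a pod"}))
```

From there it can sit behind a small HTTP server (like the sketch under "Hosting RunPod as an API endpoint" above) or a simple queue poller, depending on how callers need to reach it.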

Ollama on Runpod

After following all the instructions in this article: https://docs.runpod.io/tutorials/pods/run-ollama I am able to set up Ollama on a pod; however, after a few inferences I get a 504 (sometimes 524) error in response. I have been making inferences to Ollama on a RunPod pod for the past few months and never faced this issue, so it's definitely recent. Any thoughts on what might be going on?...
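
504 and 524 are gateway-timeout style responses from the proxy in front of the pod rather than errors from Ollama itself, so long generations that produce no bytes until they finish are a likely trigger. One mitigation is to stream the response so data starts flowing immediately; a sketch against Ollama's /api/generate endpoint (the URL, model name, and timeouts are illustrative, and `requests` is a third-party package):

```python
# ollama_stream.py - sketch: stream tokens from Ollama so the proxy sees traffic
# before its timeout, instead of waiting for the full completion.
import json

import requests  # third-party; pip install requests

OLLAMA_URL = "https://<pod-id>-11434.proxy.runpod.net/api/generate"  # adjust to your setup

with requests.post(
    OLLAMA_URL,
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": True},
    stream=True,
    timeout=(10, 300),  # (connect, read) timeouts in seconds
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            chunk = json.loads(line)
            print(chunk.get("response", ""), end="", flush=True)
```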

My pod is down, and won't restart

After this log, the pod is down and won't restart. I tried restarting the pod, stopping it, and resetting it, but nothing works.

A100 PCIe is not working with EU-RO-1 storage.

I have created storage (a network volume in EU-RO-1), and an A100 PCIe is shown as available, but I am getting an error while deploying the RunPod instance: "There are no longer any instances available with the requested specifications. Please refresh and try again." What am I doing wrong?...
Solution:
Hmm, maybe the GPU is taken or low on stock, and there are none currently available.