RunPod

Open-WebUI 404 Error

When using the Better Ollama CUDA 12 template and following the instructions at blog.runpod.io/run-llama-3-1-405b-with-ollama-a-step-by-step-guide, I get an error when posting a query through open-webui: Ollama: 404, message='Not Found', url='https://<snip>-11434.proxy.runpod.net/api/chat'. Interestingly, replacing the open-webui localhost URL with the URL above works fine with cURL during network diagnostics. I wanted to replicate the issue on a less expensive server, but can no longer find the template....
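For anyone comparing the two paths, here is a minimal sketch that reproduces the working cURL call from Python; it assumes the pod exposes Ollama on port 11434 through the RunPod proxy, and `<pod-id>` and the model name are placeholders, not values from the original post.

```python
# Hypothetical reproduction of the cURL check that worked; <pod-id> and the
# model name are placeholders.
import requests

url = "https://<pod-id>-11434.proxy.runpod.net/api/chat"
payload = {
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": False,
}
resp = requests.post(url, json=payload, timeout=120)
print(resp.status_code)
print(resp.json())
```

If this returns 200 while open-webui gets a 404, the mismatch is likely in the base URL open-webui is configured with (for example an /api segment being doubled or dropped).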

Why is upload speed so slow?

A week back when I downloaded a 6GB checkpoint, it took 1-2 hours. Now it's telling me it'll take 12 hours. Is there a reason for this?

GPU errored, machine dead

2024-09-04T11:12:09Z stop container
2024-09-04T11:12:44Z remove container
...

Slow Container Image download

Two EU datacenters, EU-SE-1 and EU-RO-1, are experiencing extreme slowdowns during Docker container image downloads, to the point where our scaler can't keep up with load spikes because it takes over 30 minutes to start a pod. This needs to be resolved: it's directly costing us money, we can't scale properly, our queue keeps spiking and building, and we're forced to use on-demand instead of spot because of the slow download speed....

Can I specify CUDA version for a pod?

A vLLM-based container image fails to start with: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.4, please update your driver to a newer version, or use an earlier cuda container: unknown...
Solution:
In the deploy view, click Filters; you can specify the CUDA version there.
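For scripted deployments there may be an equivalent knob in the Python SDK. A hedged sketch, assuming your runpod-python version exposes an allowed_cuda_versions parameter on create_pod (check the signature of your installed version before relying on it):

```python
# Hedged sketch: pin acceptable CUDA versions at pod-creation time.
# allowed_cuda_versions is an assumption about the SDK; all other values
# are placeholders.
import runpod

runpod.api_key = "YOUR_API_KEY"

pod = runpod.create_pod(
    name="vllm-pod",
    image_name="vllm/vllm-openai:latest",
    gpu_type_id="NVIDIA GeForce RTX 4090",
    allowed_cuda_versions=["12.4", "12.5"],
)
print(pod["id"])
```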

Pods won't start

Looks like auth to Hugging Face failed; I cannot launch any pods. Tried with multiple configs, same result. Clicking "start web terminal" does nothing; sometimes a "connect to Jupyter" button appears but doesn't do anything. Pod ID: 5d15c6q1grfm6p ``` .254316737Z ...done....
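Since the failure looks like a Hugging Face auth problem, a quick way to isolate it is to check the token from inside a working environment. A minimal sketch, assuming the token is provided to the pod as an HF_TOKEN environment variable:

```python
# Hedged sketch: confirm the Hugging Face token authenticates at all.
# Assumes the token is passed as the HF_TOKEN env var.
import os
from huggingface_hub import whoami

print(whoami(token=os.environ["HF_TOKEN"]))  # raises if the token is invalid
```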

Create a pod with a full Intel Sapphire Rapids CPU for a parallel-algorithm scalability test

Hi, I usually create pods for GPU tasks and access them through SSH, so I'm very familiar with that workflow. But now we need to rent a pod with just a modern Intel CPU fully available to us. In particular, we need one with the Intel Sapphire Rapids architecture, so that it supports AMX matrix instructions. This is for a parallel CPU algorithm for which we need to obtain performance and energy-consumption results (plots). I went through the RunPod menus but could not find options on the CPU side, nor exact info on the pod's CPU model. Am I missing something obvious? Thanks in advance...
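Once a CPU pod is up, AMX support can be confirmed from inside it. A minimal Linux-only check reads the CPU flags, since Sapphire Rapids exposes amx_tile, amx_int8, and amx_bf16:

```python
# Minimal check (Linux only): Sapphire Rapids CPUs report amx_* flags in
# /proc/cpuinfo; an empty result means no AMX on this pod.
cpuinfo = open("/proc/cpuinfo").read()

model = next(l for l in cpuinfo.splitlines() if "model name" in l).split(":", 1)[1].strip()
flags = next(l for l in cpuinfo.splitlines() if l.startswith("flags")).split()

print("CPU:      ", model)
print("AMX flags:", [f for f in flags if f.startswith("amx")] or "none")
```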

My pod is stuck during initialization

ogw47gdxzk3a26 - stuck during image pulling. Could you check what happened and handle the issue? Our infra is not ready to handle this kind of error.

Creating instances with a bunch of open ports

I'm using several GPU pods and ran into a lack of open ports. AFAIK the number of ports is restricted when creating an instance; at most 10 are supported. How can I get 20 or 30 ports when creating an instance?...
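On the API side, ports are requested as a comma-separated "port/protocol" string when creating a pod. A hedged sketch using runpod-python's create_pod (whether the platform grants more ports this way than the UI limit is something support would have to confirm; image and GPU type are placeholders):

```python
# Hedged sketch: request multiple exposed ports at creation time via the
# Python SDK's ports string. The port-count limit may still apply.
import runpod

runpod.api_key = "YOUR_API_KEY"

pod = runpod.create_pod(
    name="many-ports",
    image_name="runpod/base:0.4.0-cuda11.8.0",  # placeholder image
    gpu_type_id="NVIDIA GeForce RTX 4090",
    ports="22/tcp,8000/http,8001/http,8002/http,8003/http",
)
print(pod["id"])
```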

Creating an instance from an image file

I want to create an instance from an image file (faster than pulling from a registry); any idea how to do it? I'd prefer to use RunPod storage, because it's faster that way.

Creating pods with different GPU types.

Hello, can I create pods with different GPU types? Say I want to create a pod with 2 A40s and 1 RTX A5000. I ask because there is a gpuTypeIdList property in the RunPod GraphQL specs. Also, it would be amazing to have that feature. Thanks!
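The gpuTypeIdList property lives in the same GraphQL API you can query directly. As a starting point, this sketch lists the available GPU type IDs; the query shape follows RunPod's public GraphQL docs, and the API key is a placeholder:

```python
# Hedged sketch: enumerate GPU type IDs from RunPod's GraphQL API, the same
# API that exposes the gpuTypeIdList property mentioned above.
import requests

API_KEY = "YOUR_API_KEY"
query = "query { gpuTypes { id displayName memoryInGb } }"

resp = requests.post(
    f"https://api.runpod.io/graphql?api_key={API_KEY}",
    json={"query": query},
    timeout=30,
)
print(resp.json()["data"]["gpuTypes"][:5])
```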

Slowish Downloads

I'm trying to set up a pod running ComfyUI for Flux at the moment, and it's going to take 30-40 minutes just to download the models at the current speed. ```Downloading 1 model(s) to /workspace//storage/stable_diffusion/models/unet... Downloading: https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors 0K .......... .......... .......... .......... .......... 0% 10.9M 34m23s...
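If the template's single-stream download is the bottleneck, one hedged alternative is pulling the checkpoint through huggingface_hub with its hf_transfer backend, which often uses the pod's bandwidth better than one wget stream. Requires `pip install huggingface_hub hf_transfer` and, for this gated repo, an HF token:

```python
# Hedged alternative: fetch the Flux checkpoint via huggingface_hub with the
# hf_transfer backend. The env var must be set before the import.
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="black-forest-labs/FLUX.1-dev",
    filename="flux1-dev.safetensors",
    local_dir="/workspace/storage/stable_diffusion/models/unet",
    token=os.environ.get("HF_TOKEN"),  # assumption: token provided in HF_TOKEN
)
print("saved to", path)
```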

Can't cloud sync with Backblaze B2

I need help; I can't cloud sync with Backblaze B2. I put in the key ID, the application key, and the bucket root path, but it says "Something went wrong!"...
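One way to narrow down whether the credentials or the sync UI is at fault is to try the same key pair against B2's S3-compatible API directly. In this sketch the endpoint region, bucket name, and keys are all placeholders:

```python
# Hedged sketch: validate the B2 key ID / application key outside RunPod's
# cloud sync by listing the bucket over B2's S3-compatible API.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.us-west-004.backblazeb2.com",  # placeholder region
    aws_access_key_id="KEY_ID",               # placeholder
    aws_secret_access_key="APPLICATION_KEY",  # placeholder
)
for obj in s3.list_objects_v2(Bucket="my-bucket", MaxKeys=5).get("Contents", []):
    print(obj["Key"])
```

If this also fails, the key or bucket values themselves are wrong rather than anything on the RunPod side.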

How do I deploy a Worker with a Pod?

I have deployed a worker as a Serverless deployment. I expected to be able to deploy the exact same image to a Pod and get an endpoint URL for a similar Worker request, but I'm not having success. I am currently using the following as the initial entrypoint for handler.py...
runpod.serverless.start({"handler": handler})
Is there any doc that discusses how to get a Serverless Worker deployed to a Pod? Thanks....
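The runpod SDK does ship a way to exercise a serverless handler outside the Serverless platform: running the handler file with the --rp_serve_api flag starts a local FastAPI test server instead of the queue loop, which you can then expose through one of the Pod's HTTP ports. A minimal sketch (the default port and exact routes are from memory of the SDK docs, so verify against your installed version):

```python
# handler.py: the same entrypoint works in both modes.
# On Serverless, runpod.serverless.start runs the worker loop.
# In a Pod, run:  python handler.py --rp_serve_api
# to get a local HTTP test server for the handler instead.
import runpod

def handler(job):
    # Placeholder logic; replace with the real worker code.
    return {"echo": job["input"]}

runpod.serverless.start({"handler": handler})
```

Note that a Pod only gives you the proxy URL to whatever server you run; the managed /run and /runsync endpoint URLs exist only on Serverless.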

Funds not appearing in account balance

Hi - I deposited $300 in my account. I was emailed the receipt, but the funds haven't appeared as credit. Could you look into this please?

Very inconsistent performance

I recently started using RunPod and am a fan of the setup simplicity and pricing. But I've recently noticed huge inconsistency in performance, with identical training runs taking up to 3x longer to finish. I am on the Secure Cloud. Do you know why this may be?

Can someone help me fix my TensorFlow installation so it sees the GPU?

I've been trying to fix this for over a week. Running the official template with PyTorch 2.1.0, CUDA 11.8...
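A quick check that separates driver problems from wheel problems: if the device list below is empty but nvidia-smi works, the installed TensorFlow wheel was most likely built against a different CUDA version than the container ships.

```python
# Diagnostic sketch: does this TensorFlow build see the GPU at all?
import tensorflow as tf

print("TF version:      ", tf.__version__)
print("Built with CUDA: ", tf.test.is_built_with_cuda())
print("Visible GPUs:    ", tf.config.list_physical_devices("GPU"))
```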

Save state of pod to persistent storage?

Hi, once I'm done training with a pod, is there a way to save my storage/current state to longer-term storage, so I don't have to set everything up again via SSH for my next training session?...
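As far as I know there is no built-in pod snapshot, but a common workaround is to freeze the environment and archive anything living outside the persistent volume before stopping the pod. A hedged sketch, assuming /workspace is a network volume that survives pod termination:

```python
# Hedged sketch: persist enough state to rebuild the environment next session.
# Assumes /workspace is a network volume that outlives the pod.
import subprocess

# Freeze installed Python packages for re-installation next time.
with open("/workspace/requirements.freeze.txt", "w") as f:
    subprocess.run(["pip", "freeze"], stdout=f, check=True)

# Archive home-directory state (SSH keys, shell config, tool caches).
subprocess.run(
    ["tar", "czf", "/workspace/home_state.tar.gz", "-C", "/root", "."],
    check=True,
)
```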

There's inconsistency in performance (Pod)

Hello. I rent and operate 20 RTX 4090 GPUs all day long, but there are significant differences in inference speeds. Each line in the table in the attached image represents 2 RTX 4090 GPUs. One pair processes 150 images in 3 minutes, but the rest only process 50-80. On my own 2-way RTX 4090 server, which I purchased directly, the throughput is 180 images in 3 minutes. I haven't been able to figure out why these speed differences occur. The inference task is generating one image....
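One way to make the comparison apples-to-apples before blaming the pipeline: run the same tiny synthetic benchmark on each pod. Raw fp16 matmul throughput is only a rough proxy for diffusion inference, but a large spread here points at the host (power limits, thermals, PCIe) rather than your code.

```python
# Rough per-pod benchmark sketch: sustained fp16 matmul throughput on CUDA.
import time
import torch

x = torch.randn(8192, 8192, device="cuda", dtype=torch.float16)
torch.cuda.synchronize()  # finish setup before timing

start = time.time()
for _ in range(50):
    y = x @ x  # roughly 2 * 8192^3 FLOPs per call
torch.cuda.synchronize()  # wait for all queued kernels
print(f"50 matmuls in {time.time() - start:.2f}s")
```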

Pod's connection is less stable than the Tower of Babel

I'm trying to use Ollama in a container on RunPod as a pod, and I keep running into connection errors over and over. I've tried different pods, Secure Cloud vs. Community, and different GPUs, but I keep getting timeouts like this:
ResponseError: <!DOCTYPE html> <!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]--> <!--[if IE 7]> <html class="no-js ie7 oldie" lang="en-US"> <![endif]--> <!--[if IE 8]> <html class="no-js ie8 oldie" lang="en-US"> <![endif]--> <!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]--> <head> <title>3k27lkqzwstw36-11434.proxy.runpod.net | 524: A timeout occurred</title>
...
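The 524 page is Cloudflare, which fronts the RunPod proxy, giving up on a long silent request. A hedged workaround is streaming the Ollama response so bytes keep flowing over the connection; the pod ID and model name below are placeholders:

```python
# Hedged sketch: stream /api/chat so the proxy sees continuous traffic
# instead of one long silent request. <pod-id> and model are placeholders.
import json
import requests

url = "https://<pod-id>-11434.proxy.runpod.net/api/chat"
payload = {
    "model": "llama3",
    "messages": [{"role": "user", "content": "Tell me a long story."}],
    "stream": True,  # Ollama streams NDJSON chunks
}
with requests.post(url, json=payload, stream=True, timeout=300) as resp:
    for line in resp.iter_lines():
        if line:
            chunk = json.loads(line)
            print(chunk.get("message", {}).get("content", ""), end="", flush=True)
```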