Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Join

Robbie

6/23/2024

Why is only 1 GPU available?

I want to run my pod with at least 2 GPUs. My pod is an A5000. Now only 2 GPUs are available. What happened?...

Solution:

@Robbie once a pod is created, you can't edit its number of GPUs. You would need to create a new pod with the correct amount.

svennova

6/21/2024

Faster-Whisper worker template is not fully up-to-date

Hi, we're using the Faster-Whisper worker (https://github.com/runpod-workers/worker-faster_whisper) on Serverless. I saw that Faster-Whisper itself is currently on version 1.0.2, whereas the Runpod template is still on 0.10.0. A few changes have been introduced in Faster-Whisper since then (it now uses CUDA 12) that we would like to benefit from, especially the language_detection_threshold setting. Most of our transcriptions of speakers with British accents are being transcribed into Welsh (with a language detection confidence of around 0.51 to 0.55), which could be circumvented by increasing the threshold....
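The thresholding idea in this post can be sketched in plain Python. This is only an illustration of rejecting a low-confidence detection, not Faster-Whisper's own implementation, and the library's handling of `language_detection_threshold` may differ. The `pick_language` helper, its `fallback` argument, and the probabilities are all hypothetical.

```python
def pick_language(probs, threshold=0.7, fallback="en"):
    """Return the most probable language, or `fallback` if confidence is too low.

    probs: mapping of language code -> detection probability (hypothetical input).
    """
    if not probs:
        return fallback
    lang, confidence = max(probs.items(), key=lambda item: item[1])
    return lang if confidence >= threshold else fallback


# A clip detected as Welsh ("cy") with ~0.53 confidence, as in the post.
detected = {"cy": 0.53, "en": 0.44, "ga": 0.03}
print(pick_language(detected))                 # threshold 0.7 rejects the Welsh guess
print(pick_language(detected, threshold=0.5))  # threshold 0.5 accepts it
```

With a threshold above the reported 0.51 to 0.55 range, the weak Welsh detection is discarded in favour of the fallback language.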

zkreutzjanz

6/21/2024

Slow IO speeds on serverless

An always-active A6000 worker takes twice as long to run my code as a normal A6000. I think it is IO speed. How can I see IO speeds?

Solution:

It looks like the method I was using for seeking had really high IO. Changing to another method sped up serverless a lot, but not necessarily a ton on the pod. This led me to believe that serverless IO is just slow.
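One rough way to see IO speeds is to time a sequential write and read of a temporary file from inside the worker. The sketch below uses only the Python standard library; `measure_disk_throughput` is a hypothetical helper, and the read figure may be inflated by the operating system's page cache, so treat the numbers as a coarse comparison rather than a benchmark.

```python
import os
import tempfile
import time


def measure_disk_throughput(directory, size_mb=64, chunk_mb=4):
    """Write then read a temporary file; return (write_mb_s, read_mb_s)."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    n_chunks = size_mb // chunk_mb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # force the data out of Python and OS buffers
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(len(chunk)):
                pass
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)
    total_mb = n_chunks * chunk_mb
    return total_mb / write_s, total_mb / read_s


# Point `directory` at the disk or volume you want to test.
write_speed, read_speed = measure_disk_throughput(tempfile.gettempdir(), size_mb=16)
print(f"write: {write_speed:.1f} MB/s, read: {read_speed:.1f} MB/s")
```

Running the same script on a pod and on a serverless worker gives a like-for-like comparison of the two environments.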

Hermann

6/20/2024

How to download models for Stable Diffusion XL on serverless?

1) I created a new network storage of 26 GB for various models I'm interested in trying.
2) I created a Stable Diffusion XL endpoint on serverless, but couldn't attach the network storage.
3) After the deployment succeeded, I clicked on edit endpoint and attached that network storage to it. So far so good I believe. But how do I exactly download various SDXL models into my network storage, so that I could use them via Postman?...

Yash

6/20/2024

0% GPU utilization and 100% CPU utilization on Faster Whisper quick deploy endpoint

I used the "Quick Deploy" option to deploy a Faster Whisper custom endpoint (https://github.com/runpod-workers/worker-faster_whisper). Then, I called the endpoint to transcribe a 1 hour long podcast by using the following parameters: ``` { 'input': { 'audio': 'https://www.podtrac.com/pts/redirect.mp3/pdst.fm/e/traffic.megaphone.fm/ISOSO6446456065.mp3?updated=1715037715',...

thisisfine

6/19/2024

Loading models from network volume cache is taking too long.

Hello all, I'm loading my model like following so that I can use the cache from my network volume. model = AutoModel.from_pretrained(...

digigoblin

6/19/2024

Are webhooks fired from Digital Ocean?

I set up a WAF in AWS to block bots, and a bunch of requests to my RunPod Serverless webhook are being blocked by AWS#AWSManagedRulesBotControlRuleSet#SignalKnownBotDataCenter. The IP address in these requests seems to belong to a Digital Ocean data center. I have temporarily disabled the WAF on the ALB for my RunPod webhooks, but I am hoping someone can confirm whether these are legitimate requests, because I was under the impression that RunPod uses AWS and not Digital Ocean.

Bitman

6/18/2024

best architecture opinion

Hello, I would like to build an app that takes 1 prompt specified by a user and creates 10 prompts from it. It then calls a model once for each of these 10 prompts, giving me 10 responses, and makes a final call to aggregate the 10 responses into one final response that is returned to the user. My question is: do you have any advice on how to build this? Option a) send the user prompt to the serverless endpoint, and within the endpoint, create the 10 prompts, call the model sequentially, and then call it one last time to aggregate the result. All of that in 1 call from the user to the serverless endpoint...
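The fan-out and aggregate flow described here can be sketched with `asyncio`. This is a minimal sketch under stated assumptions: `call_model` and `expand_prompt` are hypothetical stand-ins for the real model call and prompt generation, and the sub-prompts are issued concurrently rather than sequentially as in option a).

```python
import asyncio


async def call_model(prompt):
    """Placeholder for a real model call (hypothetical); echoes the prompt."""
    await asyncio.sleep(0.01)  # stands in for network or inference latency
    return f"response to: {prompt}"


def expand_prompt(user_prompt, n=10):
    """Derive n sub-prompts from one user prompt (trivial stand-in)."""
    return [f"{user_prompt} (variant {i + 1})" for i in range(n)]


async def answer(user_prompt, n=10):
    sub_prompts = expand_prompt(user_prompt, n)
    # Fan out: issue all sub-prompt calls concurrently.
    responses = await asyncio.gather(*(call_model(p) for p in sub_prompts))
    # Fan in: one final call that aggregates the intermediate responses.
    summary_prompt = "Combine these answers:\n" + "\n".join(responses)
    return await call_model(summary_prompt)


final = asyncio.run(answer("Explain GPUs"))
print(final)
```

Whether concurrency helps in practice depends on whether the model backend can serve several requests at once; with a single sequential backend the total time would be similar to option a).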

Ardgon

6/18/2024

Cancelling job resets flashboot

For some reason whenever we cancel a job the next time the serverless worker cold boots it doesn't use flash boot and instead it reloads the llm model weights into the gpu from scratch. Any idea why cancelling jobs might be causing this problem? Is there maybe a more graceful solution for stopping jobs early than the /cancel/{job_id} endpoint?

Hermann

6/18/2024

RUNPOD_API_KEY and MAX_CONTEXT_LEN_TO_CAPTURE

We are also starting a vLLM project and I have two questions: 1) In the environment variables, do I have to define the RUNPOD_API_KEY with my own secret key to access the final vLLM OpenAI endpoint? 2) Isn't MAX_CONTEXT_LEN_TO_CAPTURE now deprecated? Do we still need to provide it, if MAX_MODEL_LEN is already set? ...

Hermann

6/18/2024

Do I need to allocate extra container space for Flashboot?

I'm planning to use Llama3 model that takes about 40 GB space. I believe Flashboot takes a snapshot of the worker and keeps it on the disk to load it within seconds when the worker becomes active. Do I need to allocate enough space on the container for this? In this case, since I'm planning to select a 48 GB vRAM GPU, do I need to allocate 40 GB Model + 48 GB for snapshot + 5 GB extra space = 93 GB container space?
Thanks...

jax

6/17/2024

When serverless is used, does the machine reboot if it is executed consecutively? Currently seeing iss

When serverless is used, does the machine reboot if it is executed consecutively? Currently seeing issues with the last execution affecting the next.

lexicon

6/17/2024

How can I view a generated image or video ?

Other than storing it in s3 ?

사탄

6/17/2024

unusual usage

Hello ! we got billed weirdly this past weekend...

Itay Elgazar

6/16/2024

Slow I/O

Hey, I am trying to download a 7GB file and run an ffmpeg process to extract the audio from that file (it's a video). Locally it takes around 5 minutes on average, but when I try it on the cloud (I chose the general purpose CPU, since a GPU doesn't seem to give any advantage here) the I/O looks SUPER SLOW. Is there anything I can do to speed up the disk I/O?...
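One option that may reduce disk I/O here is to let ffmpeg read the video over HTTP and write only the audio stream, so the 7GB file is never stored locally. The sketch below only builds the command line; `build_audio_extract_cmd` is a hypothetical helper, the URL is a placeholder, and copying the audio codec assumes the output container supports it (for example AAC audio into `.m4a`).

```python
import shlex


def build_audio_extract_cmd(source, output, copy_codec=True):
    """Build an ffmpeg argv that drops the video stream and keeps the audio."""
    cmd = ["ffmpeg", "-i", source, "-vn"]  # -vn: disable video output
    if copy_codec:
        cmd += ["-c:a", "copy"]  # copy the audio stream without re-encoding
    cmd.append(output)
    return cmd


# Placeholder URL; replace with the real location of the video.
cmd = build_audio_extract_cmd("https://example.com/video.mp4", "audio.m4a")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Whether streaming is faster than downloading first depends on the network path and on how the source file is laid out, so it is worth timing both.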

galakurpismo3

6/14/2024

Problem with RunPod cuda base image. Jobs stuck in queue forever

Hello, I'm trying to do a request to a serverless endpoint that uses this base image on its Dockerfile FROM runpod/base:0.4.0-cuda11.8.0 I want the serverside to run the input_fn function when I do the request. This is part of the server side code: ```model = model_fn('/app/src/tapnet/checkpoints/')...

Solution:

Hmm yeah I guess python 3.11 is missing from that runpod base image..

BBAzn

6/14/2024

runpod-worker-a1111 and loras

I don't think my LoRAs are working with this worker. But it seems to be able to list LoRAs with /sdapi/v1/loras (https://github.com/ashleykleynhans/runpod-worker-a1111/blob/main/docs/api/a1111/get-loras.md), so am I able to use LoRAs with this worker or not?...

digigoblin

6/14/2024

Intermittent connection timeouts to api.runpod.ai

```json { "endpointId":"oic105cyzlovnk" "workerId":"3cwou4m0x6hxl0" "level":"error"...

shensmobile

6/13/2024

vLLM streaming ends prematurely

I'm having issues with my vLLM worker ending a generation early. When I send the same prompt to my API without "stream": true, the prompt returns fully. When "stream": true is added to the API, it stops early, sometimes right after {"user":"assistant"} gets sent. It was working earlier this AM, I see this in the system logs around the time that it stopped working: 2024-06-13T15:37:10Z create pod network 2024-06-13T15:37:10Z create container runpod/worker-vllm:stable-cuda12.1.0 2024-06-13T15:37:11Z start container...

fireice

6/13/2024

Why no gpu in canada data center today?

My network volume is in ca-mtl-1, and there are no GPUs available there now.

Solution:

Hey y'all, we disable the creation of new pods four days before a maintenance to stop further issues (this was not something I was personally aware of until now otherwise it would have been posted in #🚨｜incidents). However, I talked with the team and you should be able to create new pods again, let me know if you're running into any issues.
