RunPod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


No logs when build fails

There are no logs shown. How am I supposed to find out why the build failed?

Help with deploying WhisperX ($35 bounty)

I've been trying to get WhisperX to run on RunPod serverless. Here is what I have so far: https://github.com/YashGupta5961/whisperx-worker . The worker deploys, but it's running into some problems processing the request, and I can't seem to debug what's going wrong. I am willing to offer $35 USD to anyone who can get it working with diarization. I know it's not much, but I hope it's motivating enough to bang your head against the wall for me 😄...

How can I connect my code to a RunPod GPU with the API

How can I connect my own code to a RunPod GPU through the API?
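RunPod serverless endpoints are called over plain HTTPS, so any language with an HTTP client can drive a GPU worker. A minimal sketch in Python using only the standard library; the endpoint ID, API key, and input payload shape here are placeholders, since the handler deployed on the endpoint decides what `input` should contain:

```python
import json
import urllib.request

API_BASE = "https://api.runpod.ai/v2"

def build_request(endpoint_id, api_key, payload):
    """Build the URL, headers, and body for a synchronous endpoint call."""
    url = f"{API_BASE}/{endpoint_id}/runsync"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"input": payload}).encode("utf-8")
    return url, headers, body

def run_sync(endpoint_id, api_key, payload, timeout=120):
    """POST the job and return the parsed JSON response."""
    url, headers, body = build_request(endpoint_id, api_key, payload)
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # "my-endpoint-id", the key, and the payload are all placeholders.
    result = run_sync("my-endpoint-id", "YOUR_API_KEY", {"prompt": "hello"})
    print(result)
```

For long-running jobs, the asynchronous `/run` route plus `/status/{job_id}` polling works the same way, just with different paths.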

Do we get billed partially or rounded up to the second?

If my execution time is 0.35 seconds, will I get billed for a full second for that request, or only for the fraction used?
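The difference matters at scale, so here is a small arithmetic sketch of the two billing models the question contrasts. The per-second rate is a made-up placeholder, not an actual RunPod price:

```python
import math

RATE_PER_SECOND = 0.00034  # hypothetical $/s, not a real RunPod price

def cost_fractional(exec_seconds, rate=RATE_PER_SECOND):
    """Billed on the exact execution time."""
    return exec_seconds * rate

def cost_rounded_up(exec_seconds, rate=RATE_PER_SECOND):
    """Billed with execution time rounded up to a whole second."""
    return math.ceil(exec_seconds) * rate

# For a 0.35 s request the two models differ by roughly 2.9x:
print(cost_fractional(0.35))
print(cost_rounded_up(0.35))
```

For many short requests, the rounding model would nearly triple the bill in this example, which is why the answer matters.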

Max workers increase

Hi, we are planning a production launch and are currently using a serverless setup. We see that max workers is 5 right now, and that with a balance of 100 we can increase it to 10. I want to understand the process for increasing it to, say, 20 or 100 in the future.

RunPod workers getting staggered when I call more than one at a time

I'm currently connected to the endpoint, and I've noticed that the workers tend to be deployed in a staggered way. I have a function that splits a workload into 50 RunPod jobs, but the endpoint does not actually use all 50 workers that I have ready. Instead, the workers seem to get deployed in stages: I'll see that 36 of the jobs went through and are running while I still have 14 jobs in the queue, even though I h...
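For reference, the fan-out pattern described here can be sketched like this: split the workload into chunks and submit each chunk as its own job to the asynchronous `/run` route. The endpoint ID, API key, and `input` shape are placeholders; only the chunking helper is exercised without network access:

```python
import json
import urllib.request

API_BASE = "https://api.runpod.ai/v2"

def split_workload(items, n_jobs):
    """Split a list of work items into at most n_jobs contiguous chunks."""
    size = max(1, -(-len(items) // n_jobs))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def _post(url, api_key, payload):
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def submit_all(endpoint_id, api_key, items, n_jobs=50):
    """Submit each chunk as its own async job; returns the job IDs."""
    job_ids = []
    for chunk in split_workload(items, n_jobs):
        out = _post(f"{API_BASE}/{endpoint_id}/run", api_key,
                    {"input": {"items": chunk}})
        job_ids.append(out["id"])
    return job_ids
```

How many of those 50 jobs start simultaneously is then up to the endpoint's scheduler and max-worker settings, which is exactly the staggering the post is asking about.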

Feb 20 - Serverless Issues Mega-Thread

Many people seem to be running into the following issue: workers are "running" but they're not working on any requests, and requests just sit there queued up for 10+ minutes without anything happening. I think there is an issue with how requests are being assigned to workers: there are a number of idle workers and a number of queued requests, and both stay in that state for many minutes without any requests getting picked up! ...

Default execution timeout

The docs say that all serverless endpoints have a 10-minute default execution timeout, but we have had a few instances where a job was stuck in processing for hours. Are the docs incorrect, and do we need to set the execution timeout manually?
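One defensive option while this is investigated is to set the timeout explicitly on each request. A sketch of the request body, assuming the per-request execution policy (`policy.executionTimeout`, in milliseconds) described in RunPod's serverless docs; the payload contents are placeholders:

```python
import json

def build_job_body(payload, execution_timeout_ms=600_000):
    """Request body with an explicit execution timeout (in ms).

    The policy block mirrors RunPod's documented execution policy;
    600_000 ms reproduces the 10-minute default the docs describe.
    """
    return json.dumps({
        "input": payload,
        "policy": {"executionTimeout": execution_timeout_ms},
    })

# Example: cap a job at 2 minutes instead of relying on the default.
body = build_job_body({"prompt": "hello"}, execution_timeout_ms=120_000)
print(body)
```

If jobs still run for hours with an explicit policy set, that would point at a platform bug rather than a documentation gap.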

GPU hosting with API

Hi there. I need 24/7 GPU hosting that I can scale via an API, bringing on more instances as needed and taking them offline when not in use. I'm looking at Novita, but I don't like their serverless pricing for the 4090. Is this something you offer? Thanks.

Job stuck in queue even though worker is ready

I am using a serverless endpoint with an H100, but I am experiencing high queue times. If you send a single request to a RunPod endpoint you may get a 2-second delay, yet on the very next request you get a queue time of 7 seconds, which should not happen. I think they should optimize their queue and worker communication code. 1st run: 3 seconds, 2nd run: 15.84 seconds...

us-tx3 region cannot spin up new worker

My endpoint is deployed in the us-tx3 region. When I submit a new request to the endpoint, it spins up a new worker in the console. However, there are no logs in the console and no response over SSH. Any thoughts?

Builds are slower than ever & not showing logs at all

After the announced 2-3x improvement in build times, we are getting builds that are 2-3x slower and have received zero logs since then. Please raise attention to this.

Workers stuck at initializing

For the past couple of days I've been unable to get any workers. I followed the AI advice (https://discord.com/channels/912829806415085598/1341152205549338804/1341152314190204940) and even created a new endpoint in the hope that selecting (almost) all options for everything would help. No workers at all have shown up in the new endpoint, and the old one has been stuck on initializing, as shown in the image, for over 6 hours now. What's the problem here?

Avoiding hallucinations/repetitions when using the faster-whisper worker?

worker: https://github.com/runpod-workers/worker-faster_whisper Hi everyone, as the title suggests, I'm encountering an issue where the transcription occasionally repeats the same word or sentence. When this occurs, it ruins the entire transcription from the point where it happens....
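Repetition loops like this are a known failure mode of Whisper-family models; two `transcribe` options that often help in the faster-whisper library are disabling conditioning on previous text and enabling the built-in VAD filter. A sketch (the model size, audio path, and temperature schedule are placeholders and starting points, not tuned values):

```python
# Decoding options that commonly reduce repetition loops in faster-whisper.
ANTI_REPEAT_KWARGS = {
    # Don't feed the previous window's text back in as a prompt -- this is
    # the usual way a repeated phrase "infects" the rest of the file.
    "condition_on_previous_text": False,
    # Skip long silences, which Whisper tends to fill with hallucinations.
    "vad_filter": True,
    # Fall back to higher temperatures when greedy decoding degenerates.
    "temperature": [0.0, 0.2, 0.4],
}

if __name__ == "__main__":
    from faster_whisper import WhisperModel  # pip install faster-whisper

    model = WhisperModel("large-v3")  # model size is a placeholder
    segments, info = model.transcribe("audio.mp3", **ANTI_REPEAT_KWARGS)
    for seg in segments:
        print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")
```

If the worker only exposes its own JSON input schema rather than raw `transcribe` kwargs, the same options would need to be mapped through whatever parameters the worker accepts.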

Serverless Docker tutorial or sample

Hi, where can I find a Dockerfile deployment tutorial? I'm interested in deploying a custom Docker image on serverless...
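For anyone in the same spot: the core of a custom serverless image is small, a Python handler started through the `runpod` SDK plus a Dockerfile that installs it. A minimal handler sketch (the greeting logic is just a stand-in for real work):

```python
# handler.py -- entrypoint for a RunPod serverless worker.

def handler(job):
    """Called once per request; job["input"] is the JSON payload
    sent to the endpoint's /run or /runsync route."""
    name = job["input"].get("name", "world")
    return {"greeting": f"Hello, {name}!"}

if __name__ == "__main__":
    import runpod  # RunPod's serverless SDK (pip install runpod)
    runpod.serverless.start({"handler": handler})
```

A matching Dockerfile would typically start `FROM` a CUDA or slim Python base image, `pip install runpod`, `COPY` handler.py, and end with `CMD ["python", "-u", "handler.py"]`; the image is then pushed to a registry and referenced from the endpoint's template.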

Baking a model into the Docker image

Hello, I'm trying to bake, or rather download, the model directly via vLLM while building, so that the image contains the model. Sadly, I haven't found any kind of simple "vllm download" command. The only options seem to be either running vLLM and afterwards adding the files to the image, which would be too big to host on my registry, or letting RunPod serverless build the image for me and download the model during the build.
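One possible workaround, since vLLM fetches weights through `huggingface_hub` anyway: download the snapshot yourself in a build step (`RUN python download_model.py`) so the weights land in the cache directory vLLM reads at runtime. The model ID and cache path here are placeholders:

```python
# download_model.py -- run during `docker build` to bake weights into the image.
import os

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder model
DEFAULT_CACHE = "/models"

def cache_dir(env):
    """Resolve the cache directory, honoring HF_HOME if it is set."""
    return env.get("HF_HOME", DEFAULT_CACHE)

if __name__ == "__main__":
    # huggingface_hub is the library vLLM itself uses to pull weights,
    # so pre-populating its cache at build time avoids a runtime download.
    from huggingface_hub import snapshot_download
    snapshot_download(MODEL_ID, cache_dir=cache_dir(os.environ))
```

At runtime, pointing vLLM at the same model ID with the same cache location should let it find the baked-in weights instead of downloading them; whether the resulting image is small enough for your registry is, as the post notes, a separate problem.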

Facing read timeout error in faster-whisper

When I pass the link, it sometimes fails with a "read timed out" error:

8od67s8p9ijjao[error]Captured Handler Exception
8od67s8p9ijjao[info]Failed to download https://gen7.icreatelabs.com/generate/download?mp3=azhoM2gzaTljN2gxZzFnMWYyaDN5N24yeDdvNGIxejB5N3owZTF4N3A2ejB0MXg3ajl5N2cxdDFsMHYydjJ6MGIxZzF4OWwweTdqOWEzZzFxMGsxdTN5NnczaDNzM2w4YTN5N2Ex: HTTPSConnectionPool(host='ytdl.vreden.web.id', port=443): Read timed out. (read timeout=5)
8od67s8p9ijjao[info]_common.py :120 2025-02-15 12:10:08,546 Giving up download_file(...) after 3 tries (requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='ytdl.vreden.web.id', port=443): Read timed out. (read timeout=5))
8od67s8p9ijjao[info]_common.py :105 2025-02-15 12:10:01,305 Backing off download_file(...) for 0.1s (requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='ytdl.vreden.web.id', port=443): Read timed out. (read timeout=5))
...
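The log shows the download giving up after 3 tries with a 5-second read timeout, so the source host is simply responding slowly. If you control the download step in your handler, a longer timeout plus exponential backoff usually rides it out. A stdlib sketch (the 30-second timeout and retry count are values to tune, not recommendations from RunPod):

```python
import time
import urllib.request

def backoff_delays(tries, base=1.0):
    """Exponential backoff schedule: base, 2*base, 4*base, ..."""
    return [base * (2 ** i) for i in range(tries)]

def download(url, dest, tries=4, timeout=30.0):
    """Fetch url to dest, retrying with exponential backoff on failure."""
    last_err = None
    for attempt, delay in enumerate(backoff_delays(tries)):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp, \
                 open(dest, "wb") as f:
                f.write(resp.read())
            return dest
        except OSError as err:  # covers socket timeouts and HTTP errors
            last_err = err
            if attempt < tries - 1:
                time.sleep(delay)
    raise last_err
```

If the worker's download code isn't yours to change, the alternative is mirroring the file somewhere faster before handing the URL to the endpoint.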

Seems like my serverless instance is running with no requests being processed

My serverless instance appears to be running even though no requests are being processed. Based on the logs and requests tabs, there are no active workers or anything else keeping the instance active, but it has been running.

Flashboot not working after a while

Hello, I wanted to ask why FlashBoot sometimes works when I have an idle worker and sometimes doesn't. It seems that once a certain amount of time has passed, it simply does a cold start again. Is this normal? Is there anything to prevent it?...

Why isn't RunPod reliable?

I have 3 workers set up. When I submit a request, it sometimes sits in the queue for 5+ minutes before processing begins. I can see a single worker running while the rest idle, but the work isn't getting done. This isn't suitable for production if it takes 5+ minutes to kick off a job. Am I doing something wrong, or does this service just not work well?