Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

Am I billed for any of this?

4 workers are on idle and I haven't touched them for 10 minutes, and one extra is initializing, which I didn't ask for. Am I only billed while a request is running, from start to finish?

Unauthorized while pulling image for Faster Whisper Template from Hub

Suddenly getting the following error while using the Faster Whisper Template from the Hub; it worked fine before: loading container image from cache / Loaded image: registry.runpod.net/runpod-workers-worker-faster-whisper-main-dockerfile:bd500dc88 / error pulling image: Error response from daemon: unauthorized...

ComfyUI Serverless Worker CUDA Errors

Some serverless workers run into runtime CUDA errors and fail silently. Is there any way to tackle this? Can I somehow get Runpod to fire a webhook so I can at least retry? Any solutions to make serverless more predictable? How are people deploying production-level ComfyUI inference on serverless? Am I doing something wrong?...
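On the webhook point: the run request itself can carry a top-level `webhook` URL that Runpod calls when the job finishes, which gives you a place to detect failed jobs and retry. A minimal sketch, assuming the standard /run endpoint; the callback URL and workflow placeholder are hypothetical:

```
import requests

# Sketch: include a "webhook" alongside "input" so job completion
# (including failures) is pushed to your own service for retry logic.
resp = requests.post(
    "https://api.runpod.ai/v2/<ENDPOINT_ID>/run",
    headers={"Authorization": "Bearer <API_KEY>"},
    json={
        "input": {"workflow": "<COMFYUI_WORKFLOW_JSON>"},   # your workflow payload
        "webhook": "https://example.com/runpod-callback",   # hypothetical receiver
    },
)
print(resp.json())
```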

CUDA error in ComfyUI

I'm getting this error; everything used to work so far, I don't know what's wrong 🤔
id: dbfc886a-8465-4112-ba2c-e1c8e297bbb4-e2
Workflow execution error: Node Type: CLIPTextEncode, Node ID: 10, Message: CUDA error: no kernel image is available for execution on the device. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Solution:
I'll try RTX 4090 now
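For anyone hitting the same thing: "no kernel image is available" usually means the PyTorch build inside the image was compiled without kernels for the GPU the worker landed on (e.g., an older Torch wheel on a newer card). A quick diagnostic sketch, assuming PyTorch is installed in the worker image:

```
import torch

# Compare the architectures compiled into this PyTorch build against
# the compute capability of the GPU this worker actually got.
print("torch:", torch.__version__, "cuda:", torch.version.cuda)
print("compiled archs:", torch.cuda.get_arch_list())  # e.g. ['sm_80', 'sm_86', 'sm_90']
major, minor = torch.cuda.get_device_capability(0)
print(f"device capability: sm_{major}{minor}")
# If sm_{major}{minor} is missing from the compiled arch list, this exact
# "no kernel image" error is expected on that worker's GPU.
```

Switching the endpoint to a GPU type the image's Torch build supports (as the poster did with the RTX 4090) sidesteps the mismatch.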

Is there any way to force stop a worker?

So I have a project that requires a worker to run so I can access the app on that worker via its API. The problem is, this takes about a minute and then I don't need it anymore, but the worker keeps running for about 8 minutes when I don't need it. So I was wondering, is there any way to force stop it? I didn't find any API calls in the docs, the execution timeout option seems to do nothing, and even cancelling the job and purging the queue via the API doesn't stop it. Help would be appreciated.
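One pattern that may help, assuming the worker is built on the runpod Python SDK: the handler can include a `refresh_worker` flag in its return value, which asks the platform to stop and replace that worker as soon as the job finishes rather than letting it idle. A hedged sketch; `do_work` is a hypothetical stand-in for the app call:

```
import runpod

def do_work(payload):
    # Hypothetical stand-in for the app call the worker exposes.
    return {"ok": True}

def handler(job):
    result = do_work(job["input"])
    # Ask the platform to tear this worker down after the job completes,
    # instead of idling for the remaining idle-timeout window.
    return {"output": result, "refresh_worker": True}

runpod.serverless.start({"handler": handler})
```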

Long delay time

Hi, my serverless inference requests always have a long delay time of 40-50 seconds. What exactly is this delay time? My Docker image is quite big; would making it smaller reduce the delay time? Thank you.
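Roughly speaking, delay time is queue wait plus cold start: pulling the image onto a fresh worker, then importing and loading your model. Image size matters for the first pull, but per-request delay is often dominated by model load, and a common mitigation is loading the model once at module import, outside the handler, so warm workers skip it. A minimal sketch, assuming a runpod Python handler; the `Model` class is a hypothetical stand-in:

```
import runpod

class Model:
    # Hypothetical stand-in for your real model class.
    def run(self, payload):
        return {"ok": True}

# Loaded once per worker at cold start, not once per request.
MODEL = Model()

def handler(job):
    # Warm workers reuse MODEL, so only cold starts pay the load cost.
    return MODEL.run(job["input"])

runpod.serverless.start({"handler": handler})
```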

How do I attach an image to my prompt?

The docs only show: `{ "input": {...`
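Since serverless inputs are plain JSON, the usual pattern is to either pass a URL the worker can download or base64-encode the image into the payload. The exact field name depends on the worker's input schema, so `"image"` below is an assumption. A sketch against the standard run endpoint:

```
import base64
import requests

# Encode a local image so it can travel inside the JSON payload.
with open("cat.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "https://api.runpod.ai/v2/<ENDPOINT_ID>/run",
    headers={"Authorization": "Bearer <API_KEY>"},
    json={"input": {
        "prompt": "describe this image",
        "image": image_b64,   # field name depends on your worker's schema
    }},
)
print(resp.json())
```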

Stuck at pending after pushing

Not like this, noooooooooo, this is a crucial hotfix.

Can't Create 5090 Endpoint via REST APIs

Creating an RTX 5090 serverless endpoint via the REST API returns the following error:
{"error":"create endpoint: create endpoint: graphql: gpuId(s) is required for a gpu endpoint","status":500}
...
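The error suggests the request body is missing the GPU identifier field. A hedged sketch of a create-endpoint call that includes one; the URL, field names, and the 5090 GPU ID string are all assumptions to verify against the current REST API docs:

```
import requests

# All field names and the GPU ID string below are guesses based on the
# error message ("gpuId(s) is required") -- verify against the REST docs.
resp = requests.post(
    "https://rest.runpod.io/v1/endpoints",           # assumed REST base URL
    headers={"Authorization": "Bearer <API_KEY>"},
    json={
        "name": "my-5090-endpoint",
        "templateId": "<TEMPLATE_ID>",
        "gpuTypeIds": ["NVIDIA GeForce RTX 5090"],   # the missing gpuId(s)
        "workersMax": 2,
    },
)
print(resp.status_code, resp.json())
```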

Workers downloading and extracting docker images again and again, not loading from cache

This is happening even after the workers have already downloaded the Docker image. It's also only happening on one of my endpoints; the rest are working normally.

Getting worker IDs via API?

Can I get worker IDs inside of the serverless endpoint via the API? By ID I mean the name of the worker. I was looking for it all over your documentation, but didn't find any info except on getting the endpoint ID, not the workers'.
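Inside a running worker you can usually read the worker's own ID from the environment rather than from a REST call; Runpod injects `RUNPOD_POD_ID` into workers, and the job payload carries the request ID. The endpoint variable name below is an assumption, so it's worth printing `os.environ` once to confirm what your workers actually receive:

```
import os

def handler(job):
    # Runpod injects identifiers into the worker environment; the job
    # payload itself carries the request ID.
    return {
        "worker_id": os.environ.get("RUNPOD_POD_ID"),
        "endpoint_id": os.environ.get("RUNPOD_ENDPOINT_ID"),  # assumed name
        "job_id": job["id"],
    }
```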

Runpod serverless Wan 2.2

Does anyone have a solution/tutorial for deploying a serverless endpoint with the Wan 2.2 t2v 5B model?

Why is serverless 2x more expensive than a normal pod?

A 5090 on a normal pod is $0.89/hour; a 5090 on serverless is almost $1.54/hour. The marketing line says it's "cost effective", but a normal pod is twice as cost-effective?...

Bad performance on runpod

Hi: Inference with my Docker image locally on an RTX 4070 is way faster than on your RTX 3090 serverless. I was expecting some speed increase, or at least the same speed. I'm using NVIDIA's NeMo diarization model, and a 1-hour-long audio file takes 85 seconds to process on my 4070 using the same image as the one used by your worker, while it takes 160 seconds on the 3090 on Runpod. I also use torch.multiprocessing to spawn 2 processes, one for the transcription using WhisperX and one for the diarization, in parallel. I don't...
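For reference, the two-process pattern described above might look like the sketch below, with hypothetical stand-ins for the WhisperX and NeMo wrappers. Note that GPU generation (a 4070 is a newer architecture than a 3090) and shared-host contention can dominate the timing difference, so parallelism alone won't close the gap:

```
import torch.multiprocessing as mp

def run_transcription(audio_path, out):
    # Hypothetical WhisperX wrapper.
    out["transcript"] = f"transcript of {audio_path}"

def run_diarization(audio_path, out):
    # Hypothetical NeMo diarizer wrapper.
    out["speakers"] = []

if __name__ == "__main__":
    mp.set_start_method("spawn")  # required when child processes touch CUDA
    with mp.Manager() as manager:
        out = manager.dict()
        procs = [
            mp.Process(target=run_transcription, args=("audio.wav", out)),
            mp.Process(target=run_diarization, args=("audio.wav", out)),
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(dict(out))
```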

Serverless with max 2 workers and 5s idle timeout: 4-6 idle workers created and not terminated

Worker went idle before it finished downloading my Docker image

It seems that the worker kept trying to pull my 30 GB Docker image and went idle before it finished downloading. The request is still in the queue and I have no more test credit. I would just like to test one serverless inference on your platform before deciding whether or not to use it in production for my app.

Add more worker restriction options?

I'm running a video encoding application inside serverless and I've run into an issue where some workers just don't support the technologies I'm using, specifically NVENC and Vulkan in my case. Currently the only way to fix it is by removing the entire region from the allowed data centers, which removes a bunch of workers from the selection that would have worked perfectly fine. I know this might be a niche use case, because most of your customers are doing AI, but would it be possible to add more...
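Until finer-grained restrictions exist, one workaround is a capability probe at job start: attempt a tiny NVENC encode and, if it fails, return a retryable error and rotate the worker so another machine picks the job up. A sketch, assuming ffmpeg is present in the image and the worker uses the runpod SDK's `refresh_worker` return flag:

```
import subprocess

def has_nvenc() -> bool:
    # Attempt a tiny NVENC test encode: compiled-in ffmpeg support alone
    # doesn't prove the host GPU/driver can actually run it.
    proc = subprocess.run(
        ["ffmpeg", "-hide_banner", "-f", "lavfi", "-i",
         "color=size=128x128:rate=1", "-frames:v", "1",
         "-c:v", "h264_nvenc", "-f", "null", "-"],
        capture_output=True,
    )
    return proc.returncode == 0

def handler(job):
    if not has_nvenc():
        # Surface a retryable error and rotate the worker instead of
        # failing silently on an unsupported host.
        return {"error": "no NVENC on this worker", "refresh_worker": True}
    return {"ok": True}  # hypothetical: run the real encode here
```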

Delay Time spike via public API; same worker next job is ~2s

Hey folks! I'm seeing intermittent high Delay Time on a serverless endpoint and would love a sanity check.
Setup: A40 (others enabled), concurrency=1, Auto Scaling = Request Count (4 req/worker), up to 9 workers, Min Workers sometimes >0.
Symptom: Via the public API, Delay Time jumps to 1-2 min. The same worker then handles the next request with ~2s delay. Execution Time goes from ~1m8s (first) to ~27s (next). Logs during slow runs look like a cold start.
Questions:...

Delay time of 120,000 ms?

Runpod is advertising <250ms cold start times. I am running a custom ASR model that isn't more than a couple of gigabytes; the total Docker image is 11 GB. For some reason, the delay time is infinite and the request never goes through. Any ideas?...

Terraform provider or alternative deployment strategy?

Currently I release new versions of my infrastructure as part of a terraform apply in my CI/CD. Does Runpod offer a Terraform provider, or an API that I can POST a new version of my Docker image to in order to trigger a new release?
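Releases can be scripted against Runpod's API; the runpod Python SDK wraps template and endpoint creation. A hedged sketch of a CI step that registers a template for the freshly pushed image tag and points an endpoint at it; SDK signatures vary by version and the GPU pool ID is an assumption, so treat this as a sketch rather than the canonical flow:

```
import os
import runpod

runpod.api_key = os.environ["RUNPOD_API_KEY"]
build_tag = os.environ.get("CI_COMMIT_SHA", "latest")  # hypothetical CI variable

# Register a template for the image this pipeline just pushed.
template = runpod.create_template(
    name=f"my-worker-{build_tag}",
    image_name=f"myrepo/my-worker:{build_tag}",
)

# Point a serverless endpoint at the new template.
endpoint = runpod.create_endpoint(
    name="my-worker",
    template_id=template["id"],
    gpu_ids="ADA_24",   # GPU pool identifier; verify valid values in the docs
    workers_min=0,
    workers_max=3,
)
print(endpoint["id"])
```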