Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

What are some good analogues of Runpod Serverless?

Because the original serverless has not been working for 2 days already

Finish task with error: CUDA error: no kernel image is available for execution on the device

I get this error always in all my workers now:
3czrvanpdpzxz3 [error] [2025-10-14 19:48:25] ERROR [Task Queue] Finish task with error: CUDA error: no kernel image is available for execution on the device
3czrvanpdpzxz3 [error] [2025-10-14 19:48:25] ERROR [Task Queue] Finish task with error: CUDA error: no kernel image is available for execution on the device
Also, it started after increasing the GPU memory (was 24 GB, now 32 GB). ...
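
"no kernel image is available for execution on the device" usually means the binaries baked into the worker image (e.g., the PyTorch wheel) weren't compiled for the new GPU's compute capability, which is plausible after the 24 GB to 32 GB change if it moved the endpoint onto a newer architecture. A minimal diagnostic sketch, assuming PyTorch is installed in the image:

```python
# Compare the GPU's compute capability with the architectures the
# installed torch build was compiled for.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"GPU: {torch.cuda.get_device_name(0)} (sm_{major}{minor})")
print(f"torch compiled for: {torch.cuda.get_arch_list()}")
# If sm_<major><minor> is missing from the list, any kernel launch fails
# with "no kernel image is available"; rebuild the image with a wheel
# built for that architecture.
```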

Throttling on multiple endpoints and failed workers

All of our endpoints with RTX 4090 workers are fully throttled, some with over 100 workers. There is no incident report or any update here or on the status page. Workers consistently come up, get stuck loading the image, and to top it all off they sit in the executing state and charge the account.

Ongoing Throttling Issues with Multiple Serverless Endpoints

Hey guys, hi! I'm having ongoing throttling issues with several serverless endpoints in Runpod (thfv8pa98n0zmx, 3uo2k0k7717auu, 9o42o47k1v1wn)—they've been stuck for the second day now and it's disrupting work. Which section/channel should I post a detailed support request to get a quick response?

Serverless throttled

Hi! Since yesterday I can't run my serverless endpoint - I'm constantly being throttled or given unhealthy workers. Can we do something to make it work?
Solution:
I believe I've spoken to all of you in a mixture of other threads and in the general channel - but sharing this for visibility: Throughout this week we've been running emergency maintenance and the users most affected are those running serverless workloads with popular GPUs. Where we may have a surplus of a specific GPU, we have to delist the machines that host the GPUs (where it's up to 8 GPUs per machine) to perform work on them. We are obligated to perform this maintenance across the fleet and only ask for your patience until it's done and we can disclose the reason....

Hugging Face cached models seem to not be working

I added the repo, but there are no models inside the container and no logs.
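
One way to narrow this down is to bypass the cached-models feature and pull the repo explicitly inside the worker, logging where the files land. A hedged fallback sketch; the repo ID and cache path are placeholders:

```python
# Pull the repo with huggingface_hub and log the result, so the worker
# logs show whether any files actually arrive inside the container.
import os
from huggingface_hub import snapshot_download

REPO_ID = "org/model-name"  # placeholder: your Hugging Face repo
CACHE_DIR = os.environ.get("HF_HOME", "/runpod-volume/hf")

path = snapshot_download(repo_id=REPO_ID, cache_dir=CACHE_DIR)
print(f"model files at: {path}")
print(os.listdir(path))
```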

vLLM jobs not processing: "deferring container creation"

We just noticed there are 2000+ jobs waiting in our queue and no jobs in progress. I'm getting super-frustrated with Serverless. In the logs I see this message: "deferring container creation: waiting for models to complete: [meta-llama/llama-3.3-70b-instruct]" I just terminated a few workers hoping that they would start back up and work again, but can someone help me figure out how to resolve this? Why are my workers not processing jobs (which has been working mostly ok for a couple of weeks now with no changes)...
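
For watching a queue like this programmatically, the runpod Python SDK exposes an endpoint health call that reports queued/in-progress jobs and worker states. A monitoring sketch, assuming `pip install runpod` and a placeholder endpoint ID:

```python
import os
import runpod

runpod.api_key = os.environ["RUNPOD_API_KEY"]

endpoint = runpod.Endpoint("YOUR_ENDPOINT_ID")  # placeholder ID
print(endpoint.health())
# e.g. {"jobs": {"inQueue": 2000, "inProgress": 0, ...}, "workers": {...}}

# If the queued jobs are stale, purge_queue() drops queued (not
# in-progress) jobs so fresh requests aren't stuck behind them:
# endpoint.purge_queue()
```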

Serverless first-time download of Docker images takes a long time

15 mins for an 8 GB image to download. Is that normal? Any way to optimize? I am using Docker Hub.
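
Pull time scales with image size, so one common mitigation is keeping the image small and loading multi-GB weights from a network volume at startup instead of baking them in. A sketch of that pattern; the paths and repo are placeholders, with /runpod-volume being the usual network volume mount point:

```python
# Load large weights from a network volume, downloading them once on the
# first run, so the Docker image itself stays small and pulls quickly.
import os

WEIGHTS_DIR = "/runpod-volume/models"
MODEL_FILE = os.path.join(WEIGHTS_DIR, "model.safetensors")  # placeholder

if not os.path.exists(MODEL_FILE):
    from huggingface_hub import hf_hub_download
    hf_hub_download(
        repo_id="org/model-name",  # placeholder repo
        filename="model.safetensors",
        local_dir=WEIGHTS_DIR,
    )
```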

Serverless Worker Crashed but Request Still Running

A serverless worker suddenly died. However, an inexplicable phenomenon occurred — the request is still being processed. Endpoint ID: h10qsr3s6f5puk Request ID: 1c30dc85-d5e4-472d-b5e8-034d40249e7c-e2 Worker ID: 01yuqqrddjl88x...
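
While support investigates the dead worker, the stuck request can usually be cancelled explicitly via the serverless cancel route so it stops accruing time. A sketch using the IDs from this thread:

```python
# POST /v2/<endpoint_id>/cancel/<request_id> cancels a queued or
# in-progress request on a serverless endpoint.
import os
import requests

API_KEY = os.environ["RUNPOD_API_KEY"]
ENDPOINT_ID = "h10qsr3s6f5puk"
REQUEST_ID = "1c30dc85-d5e4-472d-b5e8-034d40249e7c-e2"

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/cancel/{REQUEST_ID}",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
print(resp.status_code, resp.json())
```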

CI/CD for Runpod: how to automatically trigger all workers to update their Docker images?

When my CI/CD pipeline publishes an update, I want to automatically trigger Runpod workers to shut down and pick up the new set of Docker images. How do I do that?
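
One approach, sketched here under the assumption that the runpod Python SDK's create_template and update_endpoint_template helpers fit your setup: have the pipeline create a template pointing at the freshly pushed image tag, then repoint the endpoint at it, so workers are recycled onto the new image.

```python
# CI/CD step (run after docker push): repoint the endpoint at a template
# that references the new image tag.
import os
import runpod

runpod.api_key = os.environ["RUNPOD_API_KEY"]

IMAGE = f"myorg/my-worker:{os.environ['GIT_SHA']}"  # placeholder image tag

template = runpod.create_template(
    name=f"my-worker-{os.environ['GIT_SHA']}",
    image_name=IMAGE,
    is_serverless=True,
)
runpod.update_endpoint_template(
    endpoint_id="YOUR_ENDPOINT_ID",  # placeholder ID
    template_id=template["id"],
)
```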

/dev/nvidia-caps never mounts

Hey all—seeing NVENC fail on the RTX 4090 serverless pod even though NVIDIA_DRIVER_CAPABILITIES=compute,display,graphics,utility,video is set. Boot log shows /dev/nvidia3, /dev/nvidiactl, /dev/nvidia-uvm, etc., but /dev/nvidia-caps never mounts, so FFmpeg’s OpenEncodeSessionEx returns “unsupported device (2)”. Looks like the caps device hook isn’t firing for this SKU—could someone from the RunPod team confirm whether the serverless runtime is supposed to surface /dev/nvidia-caps/nvidia-cap*? Happy to share the bootcheck output if useful....
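
A small bootcheck sketch that logs which NVIDIA device nodes actually exist at container start, useful to attach to a report like this:

```python
# Log which NVIDIA device nodes are present, to confirm whether
# /dev/nvidia-caps was surfaced on this worker.
import glob
import os

for path in ["/dev/nvidiactl", "/dev/nvidia-uvm", "/dev/nvidia-caps"]:
    print(f"{path}: {'present' if os.path.exists(path) else 'MISSING'}")

print("GPU nodes:", sorted(glob.glob("/dev/nvidia[0-9]*")))
print("cap nodes:", sorted(glob.glob("/dev/nvidia-caps/nvidia-cap*")))
```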

Nvidia-smi parsing error

Hi, I’m encountering an issue when launching a video generation on my RunPod instance. Sometimes, this error message appears repeatedly: ...
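
Parsing failures like this often come from scraping nvidia-smi's human-readable table, which changes between driver versions; the query flags emit stable CSV instead. A sketch:

```python
# Query nvidia-smi in machine-readable CSV form rather than parsing the
# default table output.
import subprocess

out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=name,memory.used,memory.total,utilization.gpu",
     "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.strip().splitlines():
    name, mem_used, mem_total, util = [f.strip() for f in line.split(",")]
    print(f"{name}: {mem_used}/{mem_total} MiB, {util}% util")
```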

Workflow to API wizard not working properly

I've run 3 different Docker images, and it keeps saying the build failed, but I don't know why; it doesn't give me any error messages. Can anybody help?...

Serverless GitHub builds failing

I'm getting this error "Pod could not be created for tests. This is most likely because there were no instances available to" when deploying https://github.com/runpod-workers/worker-comfyui via Runpod's GitHub serverless deployment. How can I resolve this?

RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.

I am using CUDA 12.9, and I have also set "Allowed CUDA Versions" to 12.9, but sometimes an error occurs. Could you please tell me how to solve this problem? The logs show: RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.
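
A defensive pattern when a worker occasionally lands on a host where the driver fails to initialize: check CUDA before doing any work and fail fast, so the job errors cleanly instead of crashing mid-inference. A minimal sketch using the standard serverless handler shape:

```python
import runpod
import torch

def handler(job):
    # Fail fast on a broken host; returning {"error": ...} marks the
    # job as failed rather than hanging or crashing the worker.
    if not torch.cuda.is_available():
        return {"error": "CUDA driver initialization failed on this worker"}
    # ... real inference goes here ...
    return {"status": "ok", "device": torch.cuda.get_device_name(0)}

runpod.serverless.start({"handler": handler})
```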

The Delay Time is extremely long

The Delay Time is extremely long. Out of 10 access attempts, 5 result in a Delay Time of at least 5 minutes. This issue did not occur in previous versions. Could you please advise on what might be causing this problem?
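
To quantify this, the status route for a request reports delayTime and executionTime in milliseconds, so the queue delay can be logged per request. A sketch with placeholder IDs:

```python
# GET /v2/<endpoint_id>/status/<request_id> includes delayTime (queue
# wait) and executionTime, both in milliseconds.
import os
import requests

API_KEY = os.environ["RUNPOD_API_KEY"]
ENDPOINT_ID = "YOUR_ENDPOINT_ID"  # placeholder
REQUEST_ID = "YOUR_REQUEST_ID"    # placeholder

status = requests.get(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{REQUEST_ID}",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
).json()

print(f"delayTime: {status.get('delayTime', 0) / 1000:.1f}s, "
      f"executionTime: {status.get('executionTime', 0) / 1000:.1f}s")
```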

Serverless loadbalancer scaling config (Request Count)

From the documentation: "Request Count scaling strategy adjusts worker numbers according to the total requests in the queue and in progress. It automatically adds workers as the number of requests increases, ensuring tasks are handled efficiently. Total Workers Formula: Math.ceil((requestsInQueue + requestsInProgress) / 4). Use this when you have many requests and workers won't have a chance to idle (e.g., with vLLM). This allows your app to scale down when traffic drops. With queue delay, once a worker scales up, if it's always busy, scaling down is harder." It's not clear at what interval Math.ceil runs. If my server can handle 10 requests per second, what's a good value for the Request Count? Also, since my FastAPI app handles an internal queue, requestsInQueue will always be 0. ...
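
A worked example of the documented formula, assuming the literal 4 in the docs is the configurable Request Count value:

```python
# workers = ceil((requestsInQueue + requestsInProgress) / requestCount)
import math

def target_workers(in_queue: int, in_progress: int, request_count: int = 4) -> int:
    return math.ceil((in_queue + in_progress) / request_count)

print(target_workers(0, 8))        # 2 workers for 8 in-flight requests
print(target_workers(30, 10))      # 10 workers for 40 total requests
print(target_workers(30, 10, 20))  # 2 workers with Request Count = 20

# If a FastAPI app buffers requests in its own internal queue,
# requestsInQueue stays 0 and only requestsInProgress drives scaling,
# so Request Count should roughly match one worker's real concurrency.
```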

Serverless FAILING to add Workers

I have a queue-based endpoint created, and I have 4 requests in the pipeline. It's been over 30-40 minutes and Serverless has failed to recruit any new H100 worker for me. I don't have any data centers (regions) specified....

Serverless crashing

Why does my browser crash after editing/browsing in a running serverless instance? Whenever I do anything in Serverless, my whole browser lags and then crashes. Everything else, like Pods, works fine...