Runpod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!



Why is serverless 2x more expensive than a normal pod?

A 5090 on a normal pod is $0.89/hour; a 5090 on serverless is almost $1.54/hour. The marketing line says it's "cost effective", but isn't the normal pod nearly twice as cost-effective?...

Bad performance on Runpod

Hi: inference with my Docker image locally on an RTX 4070 is way faster than on your RTX 3090 serverless. I was expecting some speedup, or at least the same speed. I'm using NVIDIA's NeMo diarization model, and a 1-hour audio file takes 85 seconds to process on my 4070 using the same image as your worker, while it takes 160 seconds on the 3090 on Runpod. I also use torch.multiprocessing to spawn 2 processes in parallel, one for the transcription using whisperx and one for the diarization. I don't...
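For readers unfamiliar with the pattern, here is a minimal sketch of running two GPU tasks in parallel with torch.multiprocessing; `run_transcription` and `run_diarization` are hypothetical placeholders for the whisperx and NeMo calls described above:

```python
import torch.multiprocessing as mp

def run_transcription(audio_path):
    # Placeholder for the whisperx transcription step.
    pass

def run_diarization(audio_path):
    # Placeholder for the NeMo diarization step.
    pass

if __name__ == "__main__":
    # The "spawn" start method is required when child processes touch CUDA.
    ctx = mp.get_context("spawn")
    procs = [
        ctx.Process(target=run_transcription, args=("audio.wav",)),
        ctx.Process(target=run_diarization, args=("audio.wav",)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

On a serverless worker, both processes share one GPU, so it may be worth checking whether they are serializing on the device rather than truly running in parallel.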

Serverless max 2 workers, queue delay with 5s idle timeout: create 4-6 idle workers not terminated

Worker went idle before finishing downloading my Docker image

It seems the worker kept trying to pull my 30 GB Docker image and went idle before finishing the download. The request is still in the queue and I have no more test credit. I would just like to test one serverless inference on your platform before deciding whether to use it in production for my app.

Add more worker restriction options?

I'm running a video encoding application inside serverless and I've run into an issue where some workers just don't support the technologies I am using, specifically NVENC and Vulkan in my case. Currently the only way to fix it is by removing the entire region from the allowed data centers, which removes a bunch of workers from the selection that would have worked perfectly fine. I know this might be a niche use case because most of your customers are doing AI, but would it be possible to add more...

Delay Time spike via public API; same worker next job is ~2s

Hey folks! I'm seeing intermittent high Delay Time on a serverless endpoint and would love a sanity check. Setup: A40 (others enabled), concurrency=1, Auto Scaling = Request Count (4 req/worker), up to 9 workers, Min Workers sometimes >0. Symptom: via the public API, Delay Time jumps to 1–2 min. The same worker then handles the next request with ~2s delay. Execution Time goes from ~1m8s (first) to ~27s (next). Logs during slow runs look like a cold start. Questions:...

Delay time of 120,000 ms?

Runpod is advertising <250ms cold start times. I am running a custom ASR model that isn't more than a couple of gigabytes; the total Docker image is 11 GB. For some reason, the delay time is infinite and the request never goes through. Any ideas?...

Terraform provider or alternative deployment strategy?

Currently I release new versions of my infrastructure as part of a terraform apply in my CI/CD pipeline. Does Runpod offer a Terraform provider, or an API I can POST a new version of my Docker image to in order to trigger a new release?
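I'm not aware of an official Terraform provider, but a rough sketch of the release step using the runpod-python SDK might look like the following (function names are from memory, so check the current SDK docs; the template name, image tag, and IDs are placeholders):

```python
import os
import runpod

# Assumption: API key injected by the CI/CD environment.
runpod.api_key = os.environ["RUNPOD_API_KEY"]

# Register a template pointing at the image tag just pushed by CI...
template = runpod.create_template(
    name="my-worker-v42",                          # hypothetical template name
    image_name="registry.example.com/worker:v42",  # hypothetical image tag
    is_serverless=True,
)

# ...then point the existing endpoint at the new template to roll it out.
runpod.update_endpoint_template(
    endpoint_id=os.environ["RUNPOD_ENDPOINT_ID"],  # hypothetical env var
    template_id=template["id"],
)
```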

Stuck on "Not Ready" - Skill issue?

Hi all, I'm new to the forum but love RunPod - great product. I'm currently trying to deploy a production service and need a bit of guidance. I have a custom worker running FastAPI on port 80 (health port 80) - I did this by letting it default. I can see from the worker logs that FastAPI does boot; I can even see /ping being received with a 200 response code. ...
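As a point of comparison, here is a minimal worker that should pass the health check, assuming /ping on port 80 is what the endpoint is configured to probe:

```python
from fastapi import FastAPI
import uvicorn

app = FastAPI()

@app.get("/ping")
def ping():
    # Return 200 only once the worker is genuinely ready to serve requests.
    return {"status": "healthy"}

if __name__ == "__main__":
    # Bind to 0.0.0.0 so the platform's proxy can reach the server.
    uvicorn.run(app, host="0.0.0.0", port=80)
```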

1 hour of uploading in serverless

This is the first time this has happened to me, and I already have hundreds of pushes to serverless. Please check.

Regarding the new serverless load-balancing endpoint

I tried it with my existing FastAPI application and found the worker is unable to go back from the running -> idle state. My application has asyncio processes running in the background. Is that the reason? Also, does the new load-balancing endpoint support socket-based APIs?...
Solution:
Sockets should not work. I say "should not" because we use a proxy to forward the request; SSE will work, but I doubt sockets will.
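To illustrate the distinction, here is a minimal SSE stream in FastAPI (the endpoint path and payload are made up): each frame is an ordinary HTTP chunk, which is why it survives the proxy while raw sockets do not.

```python
import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def event_stream():
    for i in range(5):
        # SSE frames are plain "data: ..." lines followed by a blank line.
        yield f"data: chunk {i}\n\n"
        await asyncio.sleep(1)

@app.get("/stream")
async def stream():
    return StreamingResponse(event_stream(), media_type="text/event-stream")
```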

Serverless worker Docker entrypoint override?

Hello, is there a way to change the Docker entrypoint in a serverless worker?...

Not using all workers

I have ~10 requests pending and it's only using one worker.

Is Vulkan supported?

Is Vulkan supposed to be supported on the serverless endpoints? I don't know if I just set it up incorrectly or what, but running vulkaninfo in the container gives me the following error:
```
ERROR at /build/source/vulkaninfo/./vulkaninfo.h:573:vkCreateInstance failed with ERROR_INCOMPATIBLE_DRIVER
ERROR: [Loader Message] Code 0 : vkCreateInstance: Found no drivers!
```
...
Solution:
OK, so this was actually a skill issue on my part. I needed to add this environment variable to let vulkan-loader know where the NVIDIA driver ICD is, and now it works: VK_ICD_FILENAMES=/etc/vulkan/icd.d/nvidia_icd.json...

What compiler flags to use?

Hi, I am looking to run a native CPU- and GPU-bound application inside serverless, and I am wondering what kind of CPU I should target when compiling the executable. Right now I am building for x86-64 baseline, which supports all Intel/AMD CPUs from around 2004. I'm sure Runpod doesn't use CPUs this old, so I could target a newer instruction set for better performance; the question is which one. Are there any guarantees about what kind of CPUs Runpod will give me? What instruction set would be...
Solution:
Cool question! I don't know a lot about compilers, but I do have this list of every CPU that runs our serverless fleet: https://docs.runpod.io/references/cpu-types#serverless-cpu-types...

How do I get rid of this HTTPSConnectionPool error? I get it after every few requests

```
Aug 22 15:12:04 voiceclone gunicorn[356]: Bad Request: /api/audio_to_audio/
Aug 22 15:12:04 voiceclone gunicorn[356]: Unexpected error: HTTPSConnectionPool(host='api.runpod.ai', port=443): Max retries exceeded with url: /v2/vn2o4vgw0aes0k/runsync (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x73558ad9f7f0>, 'Connection to api.runpod.ai timed out. (connect timeout=3600)'))
Aug 22 15:12:04 voiceclone gunicorn[356]: - - [22/Aug/2025:10:12:04 +0000] "POST /api/audio_to_audio/ HTTP/1.0" 400 0 "-" "okhttp/5.1.0"
```
...
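Not a confirmed fix, but a sketch of a more defensive client: a short connect timeout with retries fails fast instead of hanging for the 3600s shown in the log (the endpoint ID and payload below are placeholders):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
# Retry transient connection failures with exponential backoff.
retries = Retry(total=3, backoff_factor=2, status_forcelist=[502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retries))

resp = session.post(
    "https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/runsync",  # placeholder ID
    json={"input": {"audio_url": "https://example.com/in.wav"}},  # hypothetical payload
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=(10, 600),  # (connect, read): fail the connect quickly
)
resp.raise_for_status()
print(resp.json())
```

For jobs this long, the async /run endpoint with status polling also avoids holding a single HTTPS connection open for the whole run.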

Recipe for Llama 4 Scout on vLLM

I am trying to follow this Llama 4 recipe from vLLM and deploy it on Runpod Serverless. Even using 2 x H100 or a B200, I could not deploy the LLM. Has anyone managed to deploy it?...
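For what it's worth, a sketch of the vLLM Python entrypoint to start from (the model ID and settings are assumptions, not a verified recipe): Scout's ~109B total parameters need roughly 220 GB in bf16, which already exceeds 2 x H100 (160 GB), so fp8 quantization may be the variable worth testing.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed HF model ID
    tensor_parallel_size=2,   # shard across both H100s
    quantization="fp8",       # assumption: halves weight memory to ~110 GB
    max_model_len=8192,       # cap context to limit KV-cache growth
)

outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```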

Multi-node Serverless Endpoint

Can I create a serverless deployment spanning two or more nodes? For example, DeepSeek R1 671B needs at least two nodes. Thanks....

Serverless

Hi team 👋, I ran into an issue with unexpected billing (around $400) on my serverless vLLM endpoint while it was idle. Support explained it was caused by a CUDA 12.9 misconfiguration in my endpoint settings. They kindly applied a $100 credit 🙏, but I'd like to make sure I configure things correctly going forward. ...

Network connectivity issues on EUR–IS nodes (works fine on EUR–RO)

Hi Runpod team, I’m running the same workload across different regions, and I’ve observed consistent connectivity problems when using the EUR–IS nodes, while everything works perfectly fine on EUR–RO nodes. Symptoms: My service makes HTTPS calls to https://freesound.org/apiv2/.......