Runpod•6mo ago

Serverless GPU is unstable

Hi team,
We are currently using serverless to host our inference model, but we've observed that GPU performance is highly unstable — the same task can take anywhere from 3ms to 100ms. In contrast, performance is very stable on a reserved pod, consistently ranging from 3ms to 5ms.
We’re wondering if RunPod’s serverless is sharing a single GPU across multiple users' jobs. If that’s the case, please let us know so we can make an informed decision about whether to continue using serverless or switch to a reserved pod.

Thank you!

Jason•6/28/25, 7:46 AM

when running your workload the gpu that runs your image should be reserved for yourself

Jason•6/28/25, 7:46 AM

is 3 -100 ms is just the inferernce part (without network, queue on runpod) ?

zongheng1619OP•6/28/25, 7:54 AM

yes, it's a very limit scope inference cuda function.

Jason•6/28/25, 8:02 AM

i dont know the explanation why that is, what if you try using active workers, is it still the same range? shouldnt affect anything but just wondering.

do you want to create a ticket?

zongheng1619OP•6/28/25, 8:03 AM

I haven't tried the active worker, is active worker just like reserved pods?

zongheng1619OP•6/28/25, 8:03 AM

yes, I can create one.

zongheng1619OP•6/28/25, 8:03 AM

but this is very random, sometimes the speed is fast enough as a normal 4090 standard

Zzongheng1619 Hi team, We are currently using serverless to host our inference model, but we'v...

PoddyAPP•6/28/25, 8:05 AM

@zongheng1619

Escalated To Zendesk

The thread has been escalated to Zendesk!

Ticket ID: #19577

Jason•6/28/25, 8:05 AM

if you havent created one

Zzongheng1619 I haven't tried the active worker, is active worker just like reserved pods?

Jason•6/28/25, 8:05 AM

serverless is just managed pods by runpod, so active workers is also a pod

Zzongheng1619 but this is very random, sometimes the speed is fast enough as a normal 4090 sta...

Jason•6/28/25, 8:06 AM

i see yeah, i hope you can get an explanation on this

zongheng1619OP•6/28/25, 8:18 AM

hi, I just tried the active worker, it's the same stable as the reserved pods

zongheng1619OP•6/28/25, 8:18 AM

So I am now sure the non-active worker is less efficient as the reserved/active pods or worker

zongheng1619OP•6/28/25, 8:10 PM

false alarm, it can just because midnight, gpu is more stable. now I am using active wroker, the perf is bad

zongheng1619OP•6/28/25, 8:47 PM

ok I have test the pods, it's not stable at the day time either. so the problem is more about why the perf is very stable at night, but not at the day time

Serverless GPU is unstable

Similar Threads

Similar Threads

Similar Threads