Serverless GPU is unstable

Hi team,
We are currently using serverless to host our inference model, but we've observed that GPU performance is highly unstable — the same task can take anywhere from 3ms to 100ms. In contrast, performance is very stable on a reserved pod, consistently ranging from 3ms to 5ms.
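For context, the timings above were collected with a simple harness along these lines (the `dummy_inference` stand-in is illustrative, not our actual model call):

```python
import statistics
import time

def measure_latency_ms(fn, runs=100):
    """Time repeated calls to fn and return per-call latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

def dummy_inference():
    # Stand-in for the real model invocation; replace with the actual handler.
    sum(i * i for i in range(10_000))

samples = measure_latency_ms(dummy_inference)
print(f"min={min(samples):.2f}ms  "
      f"p50={statistics.median(samples):.2f}ms  "
      f"max={max(samples):.2f}ms")
```

On serverless the min/max spread is wide (3ms to 100ms for the same input), while on a reserved pod it stays tight (3ms to 5ms).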
We’re wondering if RunPod’s serverless is sharing a single GPU across multiple users' jobs. If that’s the case, please let us know so we can make an informed decision about whether to continue using serverless or switch to a reserved pod.

Thank you!