Severe performance disparity on RunPod serverless (5090 GPUs)
I’ve deployed workflows on RunPod serverless with 5090 GPUs, and the performance differences I’m seeing are concerning.
Same endpoint, same model, same operation — yet the results vary a lot:
Sometimes the workflow finishes in around 44 seconds
Other times it takes over 3 minutes
That’s more than a 4x slowdown for the exact same task.
The main bottleneck seems to be model loading. On some cards it loads in just a few seconds, while on others it takes much longer.
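To pin down whether model loading really is the culprit, it helps to time each stage separately inside the handler. Below is a minimal sketch of a timing helper (the `timed` helper and the `model_load` label are my own, not part of RunPod's SDK); in a real handler you would wrap the model-load call and the inference call in separate `timed` blocks so the logs show which stage varies:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, log=print):
    # Hypothetical helper: measures wall-clock time for a block and
    # reports it, so model-load time can be separated from inference time.
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log(f"{label}: {elapsed:.2f}s")

# Sketch of use inside a serverless handler (load_model is a placeholder):
# with timed("model_load"):
#     model = load_model(...)
# with timed("inference"):
#     result = model(job_input)
```

Logging the GPU name alongside these timings (e.g. via `torch.cuda.get_device_name(0)` when PyTorch is available) would also show whether slow loads correlate with specific workers.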
This kind of inconsistency makes it difficult to rely on serverless for predictable performance. Running on the same hardware should not feel like a lottery...
The task ultimately failed because it hit a timeout I had set — a limit that should never be reached under normal performance.
