We are using two serverless endpoints on RunPod, and the "Delay Time" (which I assume measures end-to-end time) varies drastically between them. Both use the same hardware (the A5000 option), yet one has sub-second delay times while the other sees ~50 seconds, and up to 180 seconds.
On the slow endpoint, the worst cold start time is reported as 13s and the execution time is ~2s, which together come nowhere near the delay time. There are ~50 seconds unaccounted for.
The other endpoint, on the same hardware, does not show such drastic delay times.
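
To make the arithmetic concrete, here is a rough sketch of how I'm computing the gap. It assumes "Delay Time" really is end to end (my assumption, not something I've confirmed), and the variable and function names are just mine for illustration, not anything from the RunPod API. The size of the gap depends on which delay figure you plug in.

```python
# Figures reported above, in seconds. Names are illustrative only.
worst_cold_start_s = 13
execution_s = 2
observed_delay_s = (50, 180)  # typical and worst delay time on the slow endpoint


def unexplained_gap(delay_s, cold_start_s, execution_s):
    """Delay time left over after subtracting cold start and execution time.

    Assumes delay time is an end-to-end measurement, which may not be the case.
    """
    return delay_s - (cold_start_s + execution_s)


gaps = [unexplained_gap(d, worst_cold_start_s, execution_s) for d in observed_delay_s]
print(gaps)
```

Even using the worst reported cold start for every request, a large chunk of the delay is still unexplained.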