Why is there a 3–4s delay in API responses due to RunPod’s reverse proxy?
Hi,
We deployed a GPU cloud service on RunPod that exposes an API endpoint. The API generates image results, and while the model inference on the server side is quite fast, we’ve noticed that the actual response time observed from the frontend is significantly longer.
Specifically, the frontend receives the result around 3–4 seconds after the server finishes computing the image. After investigating, we confirmed that the extra latency is not caused by our code or model; it seems to come from RunPod’s reverse proxy service.
Could you explain why the reverse proxy introduces such a large delay, and whether there are any options to reduce or bypass this latency?
6 Replies
Anyone here?
I'm assuming this is serverless?
Does your input or output have a big payload?
Not serverless, and not a big payload. We tested a custom API endpoint that just returns an empty response, and it still takes 2–3 s for the frontend to get the result.
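For anyone wanting to reproduce the empty-response test described above, here is a minimal sketch of how the delay could be split into phases. It uses only the Python standard library. The function name `time_request` and the `/ping` path in the usage note are hypothetical, not part of our service or of RunPod's API.

```python
import http.client
import time


def time_request(host, port, path="/", use_tls=False, timeout=30.0):
    """Time one GET request and return the duration of each phase in seconds."""
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    try:
        t0 = time.perf_counter()
        conn.connect()             # TCP handshake (plus TLS handshake if use_tls)
        t1 = time.perf_counter()
        conn.request("GET", path)
        resp = conn.getresponse()  # returns once the status line and headers arrive
        t2 = time.perf_counter()
        resp.read()                # drain the (possibly empty) body
        t3 = time.perf_counter()
    finally:
        conn.close()
    return {
        "status": resp.status,
        "connect_s": t1 - t0,      # connection setup
        "first_byte_s": t2 - t1,   # request sent until headers received
        "total_s": t3 - t0,
    }
```

Calling something like `time_request("<proxy-hostname>", 443, "/ping", use_tls=True)` a few times would show whether the 2–3 s is spent in connection setup or in waiting for the first byte, which may help narrow down where the proxy adds time.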
Could you open a ticket with the pod ID so the support team can investigate?
It could be the HTTP proxy.
You can try with TCP ports.
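A rough way to check this suggestion is to hit the same endpoint through both routes and compare medians. The sketch below assumes the pod has a TCP port exposed that reaches the same server directly. Both URLs are placeholders you would fill in yourself, and `median_latency` and `compare` are hypothetical helper names.

```python
import statistics
import time
import urllib.request


def median_latency(url, runs=5, timeout=30.0):
    """Return the median wall-clock time in seconds over `runs` GET requests."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # include body transfer in the measurement
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(proxy_url, direct_url, runs=5):
    """Measure both routes and report how much slower the proxy route is."""
    proxy = median_latency(proxy_url, runs)
    direct = median_latency(direct_url, runs)
    return {"proxy_s": proxy, "direct_s": direct, "overhead_s": proxy - direct}
```

If `overhead_s` comes out close to the 2–3 s seen from the frontend, that would point at the HTTP proxy rather than the server. If both routes are equally slow, the cause is probably elsewhere.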
Pod id is 6gli2n3vkuyo0m.
Maybe, but we can't change that