Why is there a 3–4s delay in API responses due to RunPod’s reverse proxy?

Hi, we deployed a GPU cloud service on RunPod that exposes an API endpoint. The API generates images, and model inference on the server is fast, but the response time observed from the frontend is significantly longer. Specifically, the frontend receives the result about 3–4 seconds after the server finishes computing the image. After investigating, we confirmed that the extra latency is not caused by our code or model; it appears to come from RunPod’s reverse proxy. Could you explain why the reverse proxy introduces such a large delay, and whether there are any options to reduce or bypass it?
6 Replies
トトロ1号 (OP) · 4w ago
Anyone here?
flash-singh · 4w ago
I'm assuming this is serverless? Does your input or output have a big payload?
トトロ1号 (OP) · 4w ago
Not serverless, and the payload isn't big. We tested a custom API endpoint that just returns an empty response, and the frontend still takes 2-3s to get the result.
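For anyone who wants to reproduce the empty-response check described above, here is a minimal sketch in Python using only the standard library. It times repeated GET requests and reports the min, median, and max round-trip time. The helper names (`fetch`, `time_calls`, `summarize`) and the placeholder URL are illustrative assumptions, not something from the thread; substitute the pod's real endpoint.

```python
import statistics
import time
import urllib.request


def fetch(url, timeout=30.0):
    """GET the URL and read the full body, so timing covers the whole response."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def time_calls(call, n=5):
    """Run call() n times sequentially; return wall-clock seconds per run."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return (min, median, max) of the samples, in seconds."""
    return min(samples), statistics.median(samples), max(samples)


# Example usage (hypothetical URL, replace with your own endpoint):
# samples = time_calls(lambda: fetch("https://<your-endpoint>/empty"), n=10)
# print("min=%.3fs median=%.3fs max=%.3fs" % summarize(samples))
```

Running the same script from inside the pod (against localhost) and from outside (through the public URL) should show how much of the 2-3s is added between the server and the client, assuming the endpoint itself does no work.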
flash-singh · 4w ago
Could you open a ticket with the pod ID so the support team can investigate?
riverfog7 · 4w ago
It could be the HTTP proxy. You can try exposing the service over TCP ports instead.
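One way to test the HTTP-proxy theory is to split a single request into phases and run it against both routes: the HTTP proxy URL and a directly exposed TCP port. The sketch below is a rough illustration using Python's standard `http.client`; the function names and the placeholder addresses in the comments are assumptions, not RunPod-specific values.

```python
import http.client
import time
from urllib.parse import urlsplit


def phase_timings(url, timeout=30.0):
    """Time one GET request in phases: connect, first byte, body (seconds)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        t0 = time.perf_counter()
        conn.connect()              # DNS lookup + TCP handshake (+ TLS for https)
        t1 = time.perf_counter()
        conn.request("GET", path)
        resp = conn.getresponse()   # returns once the status line and headers arrive
        t2 = time.perf_counter()
        resp.read()                 # drain the response body
        t3 = time.perf_counter()
    finally:
        conn.close()
    return {"connect": t1 - t0, "first_byte": t2 - t1, "body": t3 - t2,
            "status": resp.status}


def slowest_phase(timings):
    """Return the name of the phase that took the longest."""
    return max(("connect", "first_byte", "body"), key=lambda k: timings[k])


# Example usage (placeholder addresses, replace with your own):
# via_proxy = phase_timings("https://<proxy-host>/empty")
# via_tcp = phase_timings("http://<public-ip>:<tcp-port>/empty")
# print(via_proxy, slowest_phase(via_proxy))
# print(via_tcp, slowest_phase(via_tcp))
```

If the delay shows up in `first_byte` only on the proxy route, that points at the proxy; if both routes are equally slow, the cause is more likely elsewhere.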
トトロ1号 (OP) · 4w ago
The pod ID is 6gli2n3vkuyo0m. Maybe it is the HTTP proxy, but we can't change that.
