Intermittent 502 Bad Gateway Errors on Serverless Load Balancer Endpoints
Hi RunPod team and community,
I’m experiencing intermittent 502 Bad Gateway errors on my serverless load balancer endpoints. The requests are usually processed normally, and my logs don’t show any clear pattern or error when the 502 occurs. The timeout settings are reasonable and not being hit.
Has anyone else encountered this issue?
* The error appears randomly, even when the endpoint is otherwise healthy.
* There’s no indication in the logs as to why the 502 is triggered.
* Restarting the endpoint sometimes helps, but the issue returns sporadically.
Possible causes I’ve considered:
* Backend worker or pod becoming temporarily unresponsive.
* Temporary network or infrastructure issues.
* Application-level handler failures that don’t show up in logs.
I’ve seen similar advice for pods (e.g., restarting after a "Bad Gateway" error in Stable Diffusion setups), but nothing specific for serverless endpoints [Stable Diffusion OpenPose Blog].
Would appreciate any insights or suggestions from others who have faced this, or from the RunPod team.
Is there a recommended way to debug or mitigate these intermittent 502 errors on serverless endpoints?
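One possible client-side stopgap while the root cause is unknown is to retry only on 502 with exponential backoff. This is a minimal sketch using the Python standard library, not an official RunPod recommendation. The function name `post_with_retry` and its parameters are hypothetical, and the URL and headers are whatever your endpoint already expects. Retrying re-runs the request on the worker, so it is only safe when the handler is idempotent.

```python
import time
import urllib.error
import urllib.request


def post_with_retry(url, body, headers=None, attempts=4, base_delay=1.0,
                    timeout=300, opener=urllib.request.urlopen, sleep=time.sleep):
    """POST `body` (bytes) to `url`, retrying only when the response is a 502.

    `opener` and `sleep` are injectable so the retry logic can be exercised
    without a network connection. All names here are illustrative.
    """
    last_error = None
    for attempt in range(attempts):
        request = urllib.request.Request(
            url, data=body, headers=headers or {}, method="POST"
        )
        try:
            with opener(request, timeout=timeout) as response:
                return response.status, response.read()
        except urllib.error.HTTPError as err:
            # Any status other than 502 is not the intermittent gateway
            # failure described above, so surface it immediately.
            if err.code != 502:
                raise
            last_error = err
        # Back off before the next attempt: 1s, 2s, 4s, ... by default.
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    # Every attempt returned 502; re-raise the last one for the caller.
    raise last_error
```

Keeping the client `timeout` well above the worker's processing time helps ensure that a 502 seen here comes from the gateway rather than from the client giving up early.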
Note: I’ve created 3–4 custom load balancer endpoints so far, and based on my observations, this issue occurs on endpoints where the worker takes more than about 1.5 minutes to process a request. It happens most of the time, though occasionally the same request succeeds with the exact same GPU type and RAM configuration, which is why I’ve ruled out OOM as the cause.
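To check whether the roughly 1.5 minute correlation holds up, one option is to record the elapsed time and status code of each request on the client and compare 502 rates on either side of a threshold. This is a small sketch of that comparison. The `Sample` type, the function name, and the 90 second default are my own assumptions for illustration, not anything provided by RunPod.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    seconds: float  # client-measured time until the response arrived
    status: int     # HTTP status code of that response


def failure_rate_by_duration(samples, threshold_seconds=90.0):
    """Return (rate_below, rate_at_or_above): the fraction of 502 responses
    among requests shorter than the threshold and among the rest.
    A side with no samples yields None instead of a rate."""
    def rate(group):
        if not group:
            return None
        return sum(s.status == 502 for s in group) / len(group)

    short = [s for s in samples if s.seconds < threshold_seconds]
    long = [s for s in samples if s.seconds >= threshold_seconds]
    return rate(short), rate(long)
```

If the rate is near zero below the threshold and high above it, that points toward a time limit somewhere in the request path rather than a random worker failure, which is useful detail to include in a support ticket.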
I can provide more details on request.
Thanks!
3 Replies
I have confirmed it's a problem with the RunPod serverless load balancer system by deploying the exact same Docker image as a Pod. I called the FastAPI endpoints through the proxy and it worked as expected every single time.
Can you DM me the details? I'll work with you to see if it's a bug or something else. We can also set up a meeting through DMs.
Normally these things go through support, and you can still open a ticket, but I've been closely involved with the load balancer and we are actively trying to get it from the current beta to the GA stage.
Shared the details over DM for now. I will open a new ticket as soon as my current one is closed.