Unable to connect to serverless load balancing workers

I'm running a serverless load balancing endpoint for my FastAPI server, but when I send a request to the endpoint I get a 400 response after over two minutes. Moreover, the HTTPS services are marked unready and the web terminal is not starting. I have set PORT in the environment variables to the port my server is running on. I cannot see errors anywhere. How can I fix that?
46 Replies
emilwallner
emilwallner3d ago
I'm also figuring it out 🙂 1) Make sure the Docker credentials are set, 2) set minimum workers to 1, and 3) then you'll start seeing the API logs when you click on the server instance that's active. Also, set the health endpoint to /ping, and set PORT and HEALTH_PORT to the port you are running the server on.
بطرفلاي
بطرفلايOP3d ago
I'm using a private registry, and I set the credentials. The running worker's logs show that the server is running; nothing looks odd. I can't find a minimum workers option, but I set maximum workers to 3 and active workers to 1 (sometimes I see all workers idle, though). I tried to set PORT and HEALTH_PORT to 8000, but nothing happened. Now, PORT = 8000, HEALTH_PORT = 80.
emilwallner
emilwallner3d ago
And EXPOSE 8000 is set in the Dockerfile?
بطرفلاي
بطرفلايOP3d ago
yup
emilwallner
emilwallner3d ago
HEALTH_PORT should probably be 8000, but I'm also trying to figure it out. If you do figure it out, please let me know!
بطرفلاي
بطرفلايOP3d ago
Sure
emilwallner
emilwallner3d ago
For me it's also not clear whether a standard call is routed to :80, or if you have to add :8000 at the end of the URL.
بطرفلاي
بطرفلايOP3d ago
tried both ways. Nothing changed.
بطرفلاي
بطرفلايOP3d ago
I'm wondering why the service is marked unready, and why the web terminal is not starting?
emilwallner
emilwallner3d ago
Interesting, did you set HEALTH_PORT to 8000? Ah, I saw you tried it.
بطرفلاي
بطرفلايOP2d ago
Docs
emilwallner
emilwallner2d ago
Ah, I missed this; weird to have different ports.
Neutrino Resonance
Are we expected to actually do something with the PORT_HEALTH environment variable in our code? The example code at https://github.com/runpod-workers/worker-load-balancing/ doesn't do anything with it
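For what it's worth, here is one way a worker could honor a separate health-check port, if the platform really does expect one. This is a sketch, not the official template's approach: it assumes the env var names from this thread (`PORT`, `PORT_HEALTH`) and uses only the Python standard library so it doesn't depend on FastAPI. Falling back to a single shared port when `PORT_HEALTH` is unset is my assumption.

```python
import json
import os
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def resolve_ports():
    # Runpod appears to inject PORT (main traffic) and PORT_HEALTH
    # (health checks). When PORT_HEALTH is unset, fall back to
    # serving health on the main port (assumption).
    port = int(os.getenv("PORT", "80"))
    health_port = int(os.getenv("PORT_HEALTH", str(port)))
    return port, health_port


class HealthHandler(BaseHTTPRequestHandler):
    """Minimal /ping responder for the health-check port."""

    def do_GET(self):
        if self.path == "/ping":
            body = json.dumps({"status": "healthy"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()


def start_health_server(health_port):
    # Serve /ping on its own port in a background thread so the main
    # app can bind PORT separately.
    server = HTTPServer(("0.0.0.0", health_port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

If `PORT_HEALTH` equals `PORT` (which later messages in this thread suggest is the working configuration), you would skip the extra server entirely and just expose /ping on the main app.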
Neutrino Resonance
I'm running into the same issue, and I'm not even seeing requests getting queued. I made an API key specifically for this endpoint (read/write permissions) as well.
emilwallner
emilwallner2d ago
@flash-singh is the port issue a typo?
Neutrino Resonance
And yes, I have deployed that code and am currently working with it
emilwallner
emilwallner2d ago
Have you set workers to 1?
Neutrino Resonance
...and that's all I have in the worker console. I can't even get a button to launch a web shell.
Neutrino Resonance
I've set max workers to 1
emilwallner
emilwallner2d ago
Set active workers to 1. That will start the Docker container.
Neutrino Resonance
Ok, just did it - now I do get
Neutrino Resonance
Can't hit anything on it though
```shell
runpod-load-balance % curl -i -k https://REMOVED.api.runpod.ai/stats -H 'Authorization: Bearer: rpa_REMOVED'
HTTP/2 401
date: Wed, 01 Oct 2025 10:28:04 GMT
content-length: 0
cf-ray: 987b45667991b18d-MIA
cf-cache-status: DYNAMIC
set-cookie: __cflb=REMOVED; SameSite=None; Secure; path=/; expires=Wed, 01-Oct-25 10:58:04 GMT; HttpOnly
server: cloudflare
```
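Editor's note: one possible cause of the 401 above is the header value `Bearer: rpa_…`, which has a colon after `Bearer`. The standard scheme is `Authorization: Bearer <token>` (a space, no second colon), so the server may be rejecting the malformed scheme. A quick stdlib sketch; the endpoint URL and key are placeholders:

```python
from urllib.request import Request

API_KEY = "rpa_EXAMPLE"  # placeholder, not a real key

# Correct:   Authorization: Bearer rpa_EXAMPLE
# Incorrect: Authorization: Bearer: rpa_EXAMPLE  (extra colon, may cause a 401)
req = Request(
    "https://ENDPOINT_ID.api.runpod.ai/stats",  # placeholder endpoint ID
    headers={"Authorization": f"Bearer {API_KEY}"},
)
```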
Neutrino Resonance
Here's another weird signal: both services are marked as "not ready" even though Uvicorn says it came online.
Neutrino Resonance
I enabled the web shell and get 403'd on that
emilwallner
emilwallner2d ago
The example is a bit weird; it doesn't expose any ports in the Docker container, and it's not clear how port 5001 is routed.
Neutrino Resonance
"Request History" has zero requests in it, no global logs, no request volume
Neutrino Resonance
Yes, it is weird. The service I'm developing that isn't a test service actually spawns another HTTP server, using Python's multiprocessing, on that port in particular... still nothing. The port exposures from the Dockerfile shouldn't matter if you manually set them in the endpoint config, correct? Surely they don't need us to mark them twice?
emilwallner
emilwallner2d ago
Yeah, in the example they only expose it on the Runpod side. Also noticed that the test examples include RUNPOD_API_KEY:

```shell
curl -X POST "https://ENDPOINT_ID.api.runpod.ai/generate" \
  -H 'Authorization: Bearer RUNPOD_API_KEY' \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, world!"}'
```
Neutrino Resonance
^^ did that actually work for you? Yes, I did create one as described above
emilwallner
emilwallner2d ago
I noticed you have rpa_REMOVED
Neutrino Resonance
I just used your exact command line and I get 401'd. Yes, it isn't a good practice to post credentials on a public forum.
emilwallner
emilwallner2d ago
lol, true
Neutrino Resonance
So I edited the output / command after running it. I'm starting to think I need to find another cloud provider. This is getting ridiculous - I have spent days on this and my time at this point is precious.
emilwallner
emilwallner2d ago
Yeah, it seems like the load balancing endpoints are a bit wonky. I've got to the stage of having the server up, but I'm unable to make a test call to the HTTPS endpoint.
Neutrino Resonance
Yep, then we're in the same boat. Waste of like 16 hours for me.
emilwallner
emilwallner2d ago
Yeah, although it's been smoother than AWS so far 😅
Neutrino Resonance
Ehhhh, AWS is archaic, but I haven't seen it behave like this... it at least is doing what it says it is. What screws up everyone is their VPC system. Does support actually read any of these?
emilwallner
emilwallner2d ago
I've tried the test deployment here: https://github.com/runpod-workers/worker-load-balancing. I can't get the HTTPS endpoint to work, but if I SSH into the instance, the app is running and works when I call it from inside.
emilwallner
emilwallner2d ago
Also getting the "Not Ready" status. Okay, I got it running by setting `port = int(os.getenv("PORT", "80"))` and leaving everything else default, i.e. not setting PORT, HEALTH_PORT, or exposing any ports. Interestingly, when I did not configure the port, it automatically set PORT=80, PORT_HEALTH=80, and Expose HTTP Ports=80. This is working. I tried with port 5000; it's working too when setting PORT=5000, PORT_HEALTH=5000, and Expose HTTP Ports=5000.
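The working configuration just described can be captured in one small helper: keep the app's fallback port in sync with what appears to be the platform's default of 80, and if you override it, change `PORT`, `PORT_HEALTH`, and Expose HTTP Ports together. A minimal sketch (the helper name `get_port` is mine, not from the template):

```python
import os


def get_port(default: int = 80) -> int:
    """Port the server should bind to.

    Runpod injects PORT into the environment; defaulting to 80 here
    matches what the console sets automatically when no port is
    configured, which is the combination reported to work above.
    """
    return int(os.getenv("PORT", str(default)))
```

Then `uvicorn.run(app, host="0.0.0.0", port=get_port())` binds whatever the endpoint config injected, or 80 when nothing is set.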
emilwallner
emilwallner2d ago
Worth noting, the requests are working, but the status of the server is still "Not Ready"
بطرفلاي
بطرفلايOP2d ago
What port does your Dockerfile expose? @flash-singh, can you help us with this problem?
emilwallner
emilwallner2d ago
None.

Dockerfile:

```dockerfile
FROM nvidia/cuda:12.1.0-base-ubuntu22.04

RUN apt-get update -y \
    && apt-get install -y python3-pip

RUN ldconfig /usr/local/cuda-12.1/compat/

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY app.py .

# Start the handler
CMD ["python3", "app.py"]
```

FastAPI:

```python
import os

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

# Create FastAPI app
app = FastAPI()

# Define request models
class GenerationRequest(BaseModel):
    prompt: str
    max_tokens: int = 100
    temperature: float = 0.7

class GenerationResponse(BaseModel):
    generated_text: str

# Global variable to track requests
request_count = 0

# Health check endpoint; required for Runpod to monitor worker health
@app.get("/ping")
async def health_check():
    return {"status": "healthy"}

# Our custom generation endpoint
@app.post("/generate", response_model=GenerationResponse)
async def generate(request: GenerationRequest):
    global request_count
    request_count += 1
    # A simple mock implementation; we'll replace this with an actual model later
    generated_text = f"Response to: {request.prompt} (request #{request_count})"
    return {"generated_text": generated_text}

# A simple endpoint to show request stats
@app.get("/stats")
async def stats():
    return {"total_requests": request_count}

# Run the app when the script is executed
if __name__ == "__main__":
    import uvicorn

    # When you deploy the endpoint, make sure to expose port 5000
    # and add it as an environment variable in the Runpod console
    port = int(os.getenv("PORT", "5000"))

    # Start the server
    uvicorn.run(app, host="0.0.0.0", port=port)
```

From the example, it works. You also need to set up an API key and add it in the call:

```shell
curl -X POST "https://ENDPOINT_ID.api.runpod.ai/generate" \
  -H 'Authorization: Bearer RUNPOD_API_KEY' \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, world!"}'
```

You expose the port in the Container Start Command > Expose HTTP Port.
Neutrino Resonance
I wasn't able to get a web shell, but I can SSH in, and the worker can connect back out. Inside a python3 shell on that SSH session:
```python
>>> response = urlopen('http://localhost:80/stats')
>>> response.readlines()
[b'{"total_requests":0}']
```
... What on earth happened? Now it all works! Same API key, same permissions / scope / etc., but now it works! I do not understand what happened! What I did was temporarily coarsen the policy scope of the API key: I made it full read/write for the API in general, and after that it all worked. Then I changed it back to the original endpoint-restricted read/write scope and it still works! I think something was wrong with KMS??
emilwallner
emilwallner2d ago
lol, glad it's running!
