Unable to connect to serverless load balancing workers
I'm running a serverless load balancing endpoint for my FastAPI server, but when I send a request to the endpoint I get a 400 response after more than two minutes.
Moreover, the HTTPS services are marked unready and the web terminal is not starting. I have set PORT in the environment variables to the same port my server runs on. I can't see errors anywhere. How can I fix this?
46 Replies
I'm also figuring it out 🙂
1) Make sure the Docker credentials are set, 2) set minimum workers to 1, and 3) then you'll start seeing the API logs when you click on the active server instance
Also, set the health endpoint to /ping, and set PORT and HEALTH_PORT to the port your server is running on
I'm using a private registry, and I set the credentials. The running worker's logs show that the server is running; nothing looks odd.
I can't find a minimum workers option, but I set maximum workers to 3 and active workers to 1 (sometimes I see all workers idle, though).
I tried setting PORT and HEALTH_PORT to 8000, but nothing happened. Right now, PORT = 8000 and HEALTH_PORT = 80

And EXPOSE 8000 is set in the Dockerfile?
yup

HEALTH_PORT should probably be 8000, but I'm also trying to figure it out
If you figure it out, please let me know!
Sure
For me it's also not clear whether a standard call is routed to :80, or if you have to add :8000 at the end of the URL
tried both ways. Nothing changed.
I'm wondering why the service is marked unready, and why the web terminal isn't starting?!

Interesting, did you set HEALTH_PORT to 8000?
Ah, saw you tried it
Docs

Ah, I missed this, weird to have different ports
Are we expected to actually do something with the PORT_HEALTH environment variable in our code? The example code at https://github.com/runpod-workers/worker-load-balancing/ doesn't do anything with it
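Purely as a guess at what PORT_HEALTH could be for: if the health probe really does hit a separate port, you could answer /ping there from a tiny stdlib server running next to the main app. `PingHandler`, `start_health_server`, and the whole arrangement are my assumptions, not anything the docs confirm:

```python
import http.server
import os
import threading

class PingHandler(http.server.BaseHTTPRequestHandler):
    """Answers the /ping health probe with a 200 and a tiny JSON body."""

    def do_GET(self):
        if self.path == "/ping":
            body = b'{"status": "healthy"}'
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the worker logs quiet

def start_health_server(port: int) -> http.server.ThreadingHTTPServer:
    """Serve /ping on a background thread; returns the server handle."""
    server = http.server.ThreadingHTTPServer(("0.0.0.0", port), PingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

# PORT_HEALTH defaulted to 80 in the endpoint configs we've seen
health_port = int(os.getenv("PORT_HEALTH", "80"))
```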
I'm running into the same issue, and I'm not even seeing requests getting queued
I made an API key specifically for this endpoint (read/write permissions) as well
@flash-singh is the port issues a typo?
And yes, I have deployed that code and am currently working with it

Have you set workers to 1?
...and that's all I have in the worker console. I can't even get a button to launch a web shell


I've set max workers to 1
Set active to 1
That will start the Docker container
Ok, just did it - now I do get

Can't hit anything on it though
Here's another weird signal: both services are marked as "not ready" even though Uvicorn reported it came online

I enabled the web shell and get 403'd on that
The example is a bit weird; it doesn't expose any ports in the Docker container, and it's not clear how port 5001 is routed
"Request History" has zero requests in it, no global logs, no request volume



Yes, it is weird. The non-test service I'm developing actually spawns another HTTP server on that exact port using Python's multiprocessing
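(For context, a rough sketch of that arrangement; `serve` here is a stand-in for whatever the child process actually runs, not the real service:)

```python
import multiprocessing

def serve(port: int) -> None:
    """Stand-in for the real child server: serve HTTP on the given port."""
    import http.server
    handler = http.server.SimpleHTTPRequestHandler
    with http.server.ThreadingHTTPServer(("0.0.0.0", port), handler) as srv:
        srv.serve_forever()

def spawn_server(port: int) -> multiprocessing.Process:
    """Fork off a second HTTP server the way the main app does."""
    proc = multiprocessing.Process(target=serve, args=(port,), daemon=True)
    proc.start()
    return proc
```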
.... still nothing.
The port exposures from the Dockerfile shouldn't matter if you manually set them in the endpoint config, correct?
Surely they don't need us to mark them twice?
Yeah, in the example they only expose it on the Runpod side.
Also noticed that the test examples include RUNPOD_API_KEY
curl -X POST "https://ENDPOINT_ID.api.runpod.ai/generate" \
-H 'Authorization: Bearer RUNPOD_API_KEY' \
-H "Content-Type: application/json" \
-d '{"prompt": "Hello, world!"}'
^^ did that actually work for you?
Yes, I did create one as described above
I noticed you have rpa_REMOVED
I just used your exact command line and I get 401'd
Yes, it isn't good practice to post credentials on a public forum
lol, true
So I edited the output / command after running it
I'm starting to think I need to find another cloud provider
This is getting ridiculous - I have spent days on this and my time at this point is precious
yeah, it seems like the load balancing endpoints are a bit wonky
I've got to the stage of having the server up, but unable to make a test call to the HTTPS endpoint
Yep, then we're in the same boat.
Waste of like 16 hours for me.
yeah, although it's been smoother than AWS so far 😅
Ehhhh, AWS is archaic, but I haven't seen it behave like this... at least it does what it says it does. What screws everyone up is their VPC system
Does support actually read any of these?
I've tried the test deployment here: https://github.com/runpod-workers/worker-load-balancing
I can't get the https endpoint to work, but if I ssh into the instance, the app is running and works when I call it from inside
Also getting the "Not Ready" status
Okay, I got it running by setting port = int(os.getenv("PORT", "80")), and leaving everything else default, i.e. not setting PORT, HEALTH_PORT, or exposing any ports
Interesting, when I did not configure the port, it automatically set PORT=80, PORT_HEALTH=80, and Expose HTTP Ports=80. This is working.
I tried with port 5000 too; it works when setting PORT=5000, PORT_HEALTH=5000, and Expose HTTP Ports=5000
Worth noting, the requests are working, but the status of the server is still "Not Ready"


What port does your Dockerfile expose?
@flash-singh can you help us with this problem?
None
Dockerfile:
```dockerfile
FROM nvidia/cuda:12.1.0-base-ubuntu22.04

RUN apt-get update -y \
    && apt-get install -y python3-pip
RUN ldconfig /usr/local/cuda-12.1/compat/

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY app.py .

# Start the handler
CMD ["python3", "app.py"]
```
app.py:
```python
import os

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

# Create FastAPI app
app = FastAPI()

# Define request models
class GenerationRequest(BaseModel):
    prompt: str
    max_tokens: int = 100
    temperature: float = 0.7

class GenerationResponse(BaseModel):
    generated_text: str

# Global variable to track requests
request_count = 0

# Health check endpoint; required for Runpod to monitor worker health
@app.get("/ping")
async def health_check():
    return {"status": "healthy"}

# Our custom generation endpoint
@app.post("/generate", response_model=GenerationResponse)
async def generate(request: GenerationRequest):
    global request_count
    request_count += 1
    # A simple mock implementation; we'll replace this with an actual model later
    generated_text = f"Response to: {request.prompt} (request #{request_count})"
    return {"generated_text": generated_text}

# A simple endpoint to show request stats
@app.get("/stats")
async def stats():
    return {"total_requests": request_count}

# Run the app when the script is executed
if __name__ == "__main__":
    import uvicorn

    # When you deploy the endpoint, make sure to expose port 5000
    # and add it as an environment variable in the Runpod console
    port = int(os.getenv("PORT", "5000"))
    # Start the server
    uvicorn.run(app, host="0.0.0.0", port=port)
```
From the example, it works
You also need to set up an API key, and add it in the call:
curl -X POST "https://ENDPOINT_ID.api.runpod.ai/generate" \
-H 'Authorization: Bearer RUNPOD_API_KEY' \
-H "Content-Type: application/json" \
-d '{"prompt": "Hello, world!"}'
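For anyone scripting that call instead of using curl, here's a Python equivalent (ENDPOINT_ID and RUNPOD_API_KEY are placeholders, exactly as in the curl command):

```python
import json
import urllib.request

def build_request(endpoint_id: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build the POST /generate request the curl command sends."""
    return urllib.request.Request(
        f"https://{endpoint_id}.api.runpod.ai/generate",
        data=json.dumps({"prompt": prompt}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def call_generate(endpoint_id: str, api_key: str, prompt: str) -> dict:
    """Send the request and decode the JSON reply."""
    with urllib.request.urlopen(build_request(endpoint_id, api_key, prompt)) as resp:
        return json.loads(resp.read())
```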
You expose the port in the Container Start Command > Expose HTTP Port. I wasn't able to get a web shell, but I can SSH in, and the worker can connect back out
Inside of a python3 shell on that ssh session:
.... what on earth happened? Now it all works!
Same API key, same permissions / scope / etc but now it works! I do not understand what happened!
What I did was temporarily coarsen the API key's policy scope: I made it full read/write for the API in general, and then everything worked. After that I changed it back to the original endpoint-restricted read/write scope, and it still works!
I think something was wrong with KMS??
lol, glad it's running!