Unable to connect to serverless load balancing workers

I'm running a serverless load balancing endpoint for my FastAPI server, but when I send a request to the endpoint I get a 400 response after over two minutes. Moreover, the HTTPS services are marked unready and the web terminal is not starting. I have set PORT in the environment variables to the port my server is running on. I cannot see errors anywhere. How can I fix that?
46 Replies
emilwallner
emilwallner3d ago
I'm also figuring it out 🙂 1) Make sure the Docker credentials are set, 2) set minimum workers to 1, and 3) then you'll start seeing the API logs when you click on the server instance that's active. Also, set the health endpoint to /ping, and set PORT and HEALTH_PORT to the port you are running the server on.
بطرفلاي
بطرفلايOP3d ago
I'm using a private registry, and I set the credentials. The running worker's logs show that the server is running; nothing looks odd. I can't find a minimum workers option, but I set maximum workers to 3 and active workers to 1 (sometimes I see all workers idle, though). I tried to set PORT and HEALTH_PORT to 8000, but nothing happened. Now, PORT = 8000, HEALTH_PORT = 80.
emilwallner
emilwallner3d ago
And EXPOSE 8000 is set in the Dockerfile?
بطرفلاي
بطرفلايOP3d ago
yup
emilwallner
emilwallner3d ago
HEALTH_PORT should probably be 8000, but I'm also trying to figure it out. If you do figure it out, please let me know!
بطرفلاي
بطرفلايOP3d ago
Sure
emilwallner
emilwallner3d ago
For me it's also not clear whether a standard call is routed to :80, or if you have to add :8000 at the end of the URL.
بطرفلاي
بطرفلايOP3d ago
tried both ways. Nothing changed.
بطرفلاي
بطرفلايOP3d ago
I'm wondering why the service is marked unready, and why the web terminal is not starting?
emilwallner
emilwallner3d ago
Interesting, did you set HEALTH_PORT to 8000? Ah, I saw you tried it.
بطرفلاي
بطرفلايOP2d ago
Docs
emilwallner
emilwallner2d ago
Ah, I missed this; weird to have different ports.
Neutrino Resonance
Are we expected to actually do something with the PORT_HEALTH environment variable in our code? The example code at https://github.com/runpod-workers/worker-load-balancing/ doesn't do anything with it
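For what it's worth, here is one way a worker could honor a separate health-check port, if the platform really does expect one. This is a sketch, not the official template's approach: it assumes the env var names from this thread (`PORT`, `PORT_HEALTH`) and uses only the Python standard library so it doesn't depend on FastAPI. Falling back to a single shared port when `PORT_HEALTH` is unset is my assumption.

```python
import json
import os
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def resolve_ports():
    # Runpod appears to inject PORT (main traffic) and PORT_HEALTH
    # (health checks). When PORT_HEALTH is unset, fall back to
    # serving health on the main port (assumption).
    port = int(os.getenv("PORT", "80"))
    health_port = int(os.getenv("PORT_HEALTH", str(port)))
    return port, health_port


class HealthHandler(BaseHTTPRequestHandler):
    """Minimal /ping responder for the health-check port."""

    def do_GET(self):
        if self.path == "/ping":
            body = json.dumps({"status": "healthy"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()


def start_health_server(health_port):
    # Serve /ping on its own port in a background thread so the main
    # app can bind PORT separately.
    server = HTTPServer(("0.0.0.0", health_port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

If `PORT_HEALTH` equals `PORT` (which later messages in this thread suggest is the working configuration), you would skip the extra server entirely and just expose /ping on the main app.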
Neutrino Resonance
I'm running into the same issue, and I'm not even seeing requests getting queued. I made an API key specifically for this endpoint (read/write permissions) as well.
emilwallner
emilwallner2d ago
@flash-singh is the port issue a typo?
Neutrino Resonance
And yes, I have deployed that code and am currently working with it
emilwallner
emilwallner2d ago
Have you set workers to 1?
Neutrino Resonance
...and that's all I have in the worker console. I can't even get a button to launch a web shell.
Neutrino Resonance
I've set max workers to 1
emilwallner
emilwallner2d ago
Set active workers to 1. That will start the Docker container.
Neutrino Resonance
Ok, just did it - now I do get
Neutrino Resonance
Can't hit anything on it though
```shell
runpod-load-balance % curl -i -k https://REMOVED.api.runpod.ai/stats -H 'Authorization: Bearer: rpa_REMOVED'
HTTP/2 401
date: Wed, 01 Oct 2025 10:28:04 GMT
content-length: 0
cf-ray: 987b45667991b18d-MIA
cf-cache-status: DYNAMIC
set-cookie: __cflb=REMOVED; SameSite=None; Secure; path=/; expires=Wed, 01-Oct-25 10:58:04 GMT; HttpOnly
server: cloudflare
```
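Editor's note: one possible cause of the 401 above is the header value `Bearer: rpa_…`, which has a colon after `Bearer`. The standard scheme is `Authorization: Bearer <token>` (a space, no second colon), so the server may be rejecting the malformed scheme. A quick stdlib sketch; the endpoint URL and key are placeholders:

```python
from urllib.request import Request

API_KEY = "rpa_EXAMPLE"  # placeholder, not a real key

# Correct:   Authorization: Bearer rpa_EXAMPLE
# Incorrect: Authorization: Bearer: rpa_EXAMPLE  (extra colon, may cause a 401)
req = Request(
    "https://ENDPOINT_ID.api.runpod.ai/stats",  # placeholder endpoint ID
    headers={"Authorization": f"Bearer {API_KEY}"},
)
```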
Neutrino Resonance
Here's another weird signal: both services are marked as "not ready" even though Uvicorn says it came online.
Neutrino Resonance
I enabled the web shell and get 403'd on that
emilwallner
emilwallner2d ago
The example is a bit weird; it doesn't expose any ports in the Docker container, and it's not clear how port 5001 is routed.
Neutrino Resonance
"Request History" has zero requests in it, no global logs, no request volume
Neutrino Resonance
Yes, it is weird. The service I'm developing that isn't a test service actually spawns another HTTP server, using Python's multiprocessing, on that port in particular... still nothing. The port exposures from the Dockerfile shouldn't matter if you manually set them in the endpoint config, correct? Surely they don't need us to mark them twice?
emilwallner
emilwallner2d ago
Yeah, in the example they only expose it on the Runpod side. Also noticed that the test examples include RUNPOD_API_KEY:

```shell
curl -X POST "https://ENDPOINT_ID.api.runpod.ai/generate" \
  -H 'Authorization: Bearer RUNPOD_API_KEY' \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, world!"}'
```
Neutrino Resonance
^^ did that actually work for you? Yes, I did create one as described above
emilwallner
emilwallner2d ago
I noticed you have rpa_REMOVED
Neutrino Resonance
I just used your exact command line and I get 401'd. Yes, it isn't a good practice to post credentials on a public forum.
emilwallner
emilwallner2d ago
lol, true
Neutrino Resonance
So I edited the output / command after running it. I'm starting to think I need to find another cloud provider. This is getting ridiculous - I have spent days on this and my time at this point is precious.
emilwallner
emilwallner2d ago
Yeah, it seems like the load balancing endpoints are a bit wonky. I've got to the stage of having the server up, but I'm unable to make a test call to the HTTPS endpoint.
Neutrino Resonance
Yep, then we're in the same boat. Waste of like 16 hours for me.
emilwallner
emilwallner2d ago
Yeah, although it's been smoother than AWS so far 😅
Neutrino Resonance
Ehhhh, AWS is archaic, but I haven't seen it behave like this... it at least is doing what it says it is. What screws up everyone is their VPC system. Does support actually read any of these?
emilwallner
emilwallner2d ago
I've tried the test deployment here: https://github.com/runpod-workers/worker-load-balancing. I can't get the HTTPS endpoint to work, but if I SSH into the instance, the app is running and works when I call it from inside.
emilwallner
emilwallner2d ago
Also getting the "Not Ready" status. Okay, I got it running by setting `port = int(os.getenv("PORT", "80"))` and leaving everything else default, i.e. not setting PORT, HEALTH_PORT, or exposing any ports. Interestingly, when I did not configure the port, it automatically set PORT=80, PORT_HEALTH=80, and Expose HTTP Ports=80. This is working. I tried with port 5000; it's working too when setting PORT=5000, PORT_HEALTH=5000, and Expose HTTP Ports=5000.
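The working configuration just described can be captured in one small helper: keep the app's fallback port in sync with what appears to be the platform's default of 80, and if you override it, change `PORT`, `PORT_HEALTH`, and Expose HTTP Ports together. A minimal sketch (the helper name `get_port` is mine, not from the template):

```python
import os


def get_port(default: int = 80) -> int:
    """Port the server should bind to.

    Runpod injects PORT into the environment; defaulting to 80 here
    matches what the console sets automatically when no port is
    configured, which is the combination reported to work above.
    """
    return int(os.getenv("PORT", str(default)))
```

Then `uvicorn.run(app, host="0.0.0.0", port=get_port())` binds whatever the endpoint config injected, or 80 when nothing is set.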
emilwallner
emilwallner2d ago
Worth noting, the requests are working, but the status of the server is still "Not Ready"
بطرفلاي
بطرفلايOP2d ago
What port does your Dockerfile expose? @flash-singh, can you help us with this problem?
emilwallner
emilwallner2d ago
None.

Dockerfile:

```dockerfile
FROM nvidia/cuda:12.1.0-base-ubuntu22.04

RUN apt-get update -y \
    && apt-get install -y python3-pip

RUN ldconfig /usr/local/cuda-12.1/compat/

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY app.py .

# Start the handler
CMD ["python3", "app.py"]
```

FastAPI:

```python
import os

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

# Create FastAPI app
app = FastAPI()

# Define request models
class GenerationRequest(BaseModel):
    prompt: str
    max_tokens: int = 100
    temperature: float = 0.7

class GenerationResponse(BaseModel):
    generated_text: str

# Global variable to track requests
request_count = 0

# Health check endpoint; required for Runpod to monitor worker health
@app.get("/ping")
async def health_check():
    return {"status": "healthy"}

# Our custom generation endpoint
@app.post("/generate", response_model=GenerationResponse)
async def generate(request: GenerationRequest):
    global request_count
    request_count += 1
    # A simple mock implementation; we'll replace this with an actual model later
    generated_text = f"Response to: {request.prompt} (request #{request_count})"
    return {"generated_text": generated_text}

# A simple endpoint to show request stats
@app.get("/stats")
async def stats():
    return {"total_requests": request_count}

# Run the app when the script is executed
if __name__ == "__main__":
    import uvicorn

    # When you deploy the endpoint, make sure to expose port 5000
    # and add it as an environment variable in the Runpod console
    port = int(os.getenv("PORT", "5000"))

    # Start the server
    uvicorn.run(app, host="0.0.0.0", port=port)
```

From the example, it works. You also need to set up an API key and add it in the call:

```shell
curl -X POST "https://ENDPOINT_ID.api.runpod.ai/generate" \
  -H 'Authorization: Bearer RUNPOD_API_KEY' \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, world!"}'
```

You expose the port in the Container Start Command > Expose HTTP Port.
Neutrino Resonance
I wasn't able to get a web shell, but I can SSH in, and the worker can connect back out. Inside a python3 shell on that SSH session:
```python
>>> response = urlopen('http://localhost:80/stats')
>>> response.readlines()
[b'{"total_requests":0}']
```
... What on earth happened? Now it all works! Same API key, same permissions / scope / etc., but now it works! I do not understand what happened! What I did was temporarily coarsen the policy scope of the API key: I made it full read/write for the API in general, and after that it all worked. Then I changed it back to the original endpoint-restricted read/write scope and it still works! I think something was wrong with KMS??
emilwallner
emilwallner2d ago
lol, glad it's running!
