RunPod
•Created by drazenz on 4/7/2025 in #⚡|serverless
Serverless SGLang spent credits on phantom requests
I deployed a serverless endpoint (id ua6ui6kfksdocn). I tried sending a sample request from the web dashboard; that one still seems to be in the queue, 20 hours later.
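As a side note, here is a minimal sketch of checking on and cancelling that stuck job through the serverless job API (assuming the standard /status and /cancel routes; the job id below is a placeholder and the API key is read from the environment, neither is from the original post):

```python
import os
import requests

ENDPOINT_ID = "ua6ui6kfksdocn"
JOB_ID = "<job-id-from-the-dashboard>"  # placeholder, not a value from the post
API_KEY = os.environ["RUNPOD_API_KEY"]  # assumes the key is exported in the shell
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Ask the endpoint what state the job is actually in (IN_QUEUE, IN_PROGRESS, ...).
status = requests.get(f"{BASE}/status/{JOB_ID}", headers=HEADERS, timeout=30)
print(status.json())

# Cancel it so it stops sitting in the queue.
cancelled = requests.post(f"{BASE}/cancel/{JOB_ID}", headers=HEADERS, timeout=30)
print(cancelled.json())
```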
However, looking at the logs, there are lots of requests like this:
I'm assuming that's what kept the workers alive, spending the credits in vain.
I'm assuming the addresses in the log are the source addresses of the requests - would that be some RunPod process trying to get the list of models?
Any clue on how to resolve this and prevent it from happening in the future?
RunPod
•Created by drazenz on 2/19/2025 in #⛅|pods-clusters
Downloading models causes the pod to freeze
Hey, not sure if I'm missing something obvious here.
I'm noticing two problems (might have the same cause):
1. I'm trying to download phi-4 (14B) from Hugging Face.
I'm not doing anything out of the ordinary; I just run a plain download command (a rough sketch of what that might look like is after this list).
The download is going to the volume disk.
Somewhere in the middle of downloading the 2nd out of 6 .safetensors files, the pod freezes.
I lose the SSH connection and the RunPod dashboard shows 100% memory usage. At this point I can only restart the pod.
2. When I try rsyncing 6 GB of files from my local machine to a pod (e.g. Llama3.2-3B-Instruct), it uses up the RAM and freezes more often than not. Sometimes it helps to restart the pod, but sometimes the only way is to download the weights instead of rsyncing them up.
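For reference, a rough sketch of the kind of download I mean in point 1 (the repo id microsoft/phi-4 and the /workspace mount point are assumptions on my part, not details confirmed above):

```python
# Rough sketch, not the exact command from the post. Assumptions: the model is
# the "microsoft/phi-4" repo on Hugging Face and the volume disk is mounted at
# /workspace (the usual default for RunPod pods).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="microsoft/phi-4",
    local_dir="/workspace/phi-4",  # write the weights to the volume disk
    max_workers=2,                 # fewer parallel downloads, gentler on RAM
)
```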
I'm using:
- 1xA40, 50GB RAM
- the runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04 image
- 20GB container, 40GB volume (with enough free space before starting the download)
Thanks!