esp.py · 6d ago

400 error on Load balancing endpoint

Hello, first time here. I am using the llama.cpp server image to host a model via a load-balancing serverless endpoint. The worker is running, and I can see in the logs that my server is up, but when I try to hit the endpoint it returns a 400 error. Here is how I am making the request:
import requests

RUNPOD_API_KEY = "..."  # my RunPod API key

headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer " + RUNPOD_API_KEY,
}

data = {
    "prompt": [
        {"role": "system", "content": ""},
        # just try to limit the characters
        {"role": "user", "content": "who are you? I am trying to connect to you"},
    ],
    "n_predict": 512,
    "temperature": 0.3,
    "top_k": 40,
    "top_p": 0.90,
    "stopped_eos": True,
    "repeat_penalty": 1.05,
    "stop": [
        "assistant",
        "<|im_end|>",
    ],
    "seed": 42,
}

BASE_URL = "https://id.api.runpod.ai/completion"

response = requests.post(
    BASE_URL,
    headers=headers,
    json=data,
    timeout=3000,
)
This request takes about two minutes and then returns a 400 error. For more context, I am running the following image: ghcr.io/ggerganov/llama.cpp:server
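In case the payload shape matters: my understanding is that llama.cpp's native /completion route expects "prompt" as a single string (chat-style role/content messages would go to /v1/chat/completions instead), so a string-prompt version of the same request would look roughly like the sketch below. The "id" host is just the placeholder for my endpoint ID, and the parameter values are the same ones as above, not confirmed correct.

import requests

RUNPOD_API_KEY = "..."  # RunPod API key (placeholder)

headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer " + RUNPOD_API_KEY,
}

# Same request, but with "prompt" as one formatted string instead of a list
# of role/content dicts, which is what llama.cpp's /completion expects as far
# as I can tell. "stopped_eos" is dropped here because I believe it is a field
# in llama.cpp's response, not a request parameter.
data = {
    "prompt": "who are you? I am trying to connect to you",
    "n_predict": 512,
    "temperature": 0.3,
    "top_k": 40,
    "top_p": 0.90,
    "repeat_penalty": 1.05,
    "stop": ["assistant", "<|im_end|>"],
    "seed": 42,
}

# "id" is a placeholder for the endpoint ID
response = requests.post(
    "https://id.api.runpod.ai/completion",
    headers=headers,
    json=data,
    timeout=3000,
)
print(response.status_code, response.text)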
2 Replies
Unknown User · 5d ago (message not public)
Poddy · 5d ago
@esp.py
Escalated To Zendesk
The thread has been escalated to Zendesk!
Ticket ID: #26369