Runpod · 11mo ago
vlad000ss

Custom vLLM OpenAI compatible API

Hello, I'm running an OpenAI-compatible server using vLLM. On RunPod's SERVERLESS service you cannot choose which endpoint path receives the POST requests; it's /run or /runsync by default. My question is: how do I either change the RunPod configuration of this endpoint to /v1 (the OpenAI endpoint), or how do I run my vLLM Docker image so that it is compatible with RunPod?
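For context, on Serverless RunPod only delivers jobs that were POSTed to /run or /runsync; the worker has to pull them with a handler, and the container never sees the original URL path. Below is a minimal sketch of that contract using the runpod Python SDK; the "prompt" field name and the stubbed generation step are assumptions for illustration, not something RunPod mandates.

# handler.py: minimal RunPod Serverless worker sketch.
# A job POSTed to /run or /runsync shows up here as job["input"];
# a server listening on /v1 inside the container is never reached directly.
import runpod

def handler(job):
    prompt = job["input"].get("prompt", "")
    # placeholder: generate with vLLM here and return the text
    return {"output": f"received: {prompt}"}

runpod.serverless.start({"handler": handler})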
11 Replies
vlad000ss (OP) · 11mo ago
OpenAI compatibility | RunPod Documentation
Discover the vLLM Worker, a cloud-based AI model that integrates with OpenAI's API for seamless interaction. With its streaming and non-streaming capabilities, it's ideal for chatbots, conversational AI, and natural language processing applications.
Unknown User · 11mo ago
Message Not Public
vlad000ss (OP) · 11mo ago
It didn't work... The problem is that when I send the request to /openai/v1, the endpoint is invoked but the request is never processed. I guess it's because my vLLM process is only listening on the /v1 endpoint. Didn't you have this problem? I'm using my own custom vLLM image, not the RunPod one.
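One way to bridge exactly that mismatch (a sketch under assumptions, not a documented RunPod or vLLM recipe): keep vLLM listening on localhost /v1 and let the RunPod handler forward the OpenAI-style body it receives. The port (8000) and the assumption that job["input"] already holds a chat-completions request body are placeholders.

# proxy handler: forward the RunPod job payload to the local vLLM OpenAI server (assumed port 8000)
import requests
import runpod

VLLM_CHAT = "http://127.0.0.1:8000/v1/chat/completions"

def handler(job):
    resp = requests.post(VLLM_CHAT, json=job["input"], timeout=300)
    resp.raise_for_status()
    return resp.json()

runpod.serverless.start({"handler": handler})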
Unknown User · 11mo ago
Message Not Public
Sagor Sarker · 11mo ago
I am having the same issue. I have prepared a Docker container with a custom vLLM server inside, and created a template with that image from Docker Hub. In Serverless the machine gets created and I can reach the endpoint via localhost:port, but from outside I can't access the server; it just gets stuck. Maybe it can't make the connection using the above OpenAI script. Anyone have any clue?
Unknown User · 11mo ago
Message Not Public
Sagor Sarker · 11mo ago
Hi @nerdylive, I have exposed port 8000 as a TCP port since the server is running on that port.
1. I tried the "Request" method available in the Serverless console. It stays in queue indefinitely.
2. I tried it programmatically like below:
curl --request POST \
  --url https://api.runpod.ai/v2/my_endpoint_id/runsync \
  --header "accept: application/json" \
  --header "authorization: my_runpod_api_key" \
  --header "content-type: application/json" \
  --data '
{
  "input": {
    "prompt": "What is the weather in Dhaka?"
  }
}
'
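For reference, the same /runsync call can also be made with the RunPod Python SDK. A sketch under assumptions: the endpoint ID, API key, and "prompt" field are the placeholders from the curl above, and the timeout value is arbitrary.

# equivalent /runsync call via the RunPod Python SDK (placeholders, not real credentials)
import runpod

runpod.api_key = "my_runpod_api_key"
endpoint = runpod.Endpoint("my_endpoint_id")

result = endpoint.run_sync(
    {"input": {"prompt": "What is the weather in Dhaka?"}},
    timeout=120,  # seconds to wait for the synchronous result
)
print(result)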
3. I tried the OpenAI compatibility route, since I serve the model with the vllm serve command:
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/my_endpoint_id/openai/v1",
    api_key="my_runpod_api_key",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user",
               "content": "আজকে ঢাকার আবহাওয়া কেমন?"}],  # Bengali: "How is the weather in Dhaka today?"
    tools=[{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    }],
    tool_choice="auto",
)

print(response)
Unknown User · 11mo ago
Message Not Public
Sagor Sarker · 11mo ago
Hi, there is no error showing. It stays in queue and I can't see any process log on the worker machine. I didn't specify the port; how can I do that? I tried the RunPod proxy method, and that worked for a single worker machine.
Unknown User · 11mo ago
Message Not Public
Sagor Sarker · 11mo ago
One of the main problems with the RunPod vLLM-based Docker image is that tool calling doesn't work, which is why I moved to a custom Docker build using the vllm serve method. You are right: running "vllm serve ................ " directly as the container's entrypoint is probably not compatible with RunPod. I will try to follow your suggestions, thank you. I don't know why the RunPod vLLM worker hasn't fixed the tool-calling issue yet, but it's an essential feature.
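If it helps, here is one way a custom image could wire "vllm serve" and a RunPod handler together. This is a sketch only: the port, model name, health-check path, and input shape are assumptions based on vLLM's OpenAI-compatible server, not a verified RunPod recipe. The idea is to launch vllm serve as a background subprocess, wait for /health, then start a handler that forwards the OpenAI-style body, tools included, to the local /v1/chat/completions.

# start_worker.py: run vllm serve in the background, then start a RunPod proxy handler.
# Port, model name, and the shape of job["input"] are illustrative assumptions.
import subprocess
import time

import requests
import runpod

VLLM = "http://127.0.0.1:8000"

# 1) start the OpenAI-compatible vLLM server in the background
subprocess.Popen(["vllm", "serve", "meta-llama/Llama-3.1-8B-Instruct", "--port", "8000"])

# 2) wait until vLLM reports healthy before accepting jobs
while True:
    try:
        if requests.get(f"{VLLM}/health", timeout=2).status_code == 200:
            break
    except requests.ConnectionError:
        pass
    time.sleep(2)

# 3) same proxy idea as the earlier sketch: pass messages/tools/tool_choice through untouched
def handler(job):
    resp = requests.post(f"{VLLM}/v1/chat/completions", json=job["input"], timeout=600)
    resp.raise_for_status()
    return resp.json()

runpod.serverless.start({"handler": handler})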
