Custom vLLM OpenAI compatible API

vlad000ss · 2024-12-01T19:21:04.095Z

Hello, I'm running OpenAI compatble server using vLLM. In runpod for SERVERLESS service you cannot choose the endpoint you want to track the POST requests to, it's /run or /runsync by default, ny question is how do I either change the runpod configuration of this endpoint to /v1 (OpenAI endpoint) or how do I run the vLLM docker image so that it is compatible with the runpod?

V

vlad000ssOP•12/1/24, 7:25 PM

https://docs.runpod.io/serverless/workers/vllm/openai-compatibility#initialize-your-project

For anyone who will face the same issue as I did

OpenAI compatibility | RunPod Documentation

Discover the vLLM Worker, a cloud-based AI model that integrates with OpenAI's API for seamless interaction. With its streaming and non-streaming capabilities, it's ideal for chatbots, conversational AI, and natural language processing applications.

J

Jason•12/2/24, 3:58 AM

oh

J

Jason•12/2/24, 3:58 AM

you use the openai package

J

Jason•12/2/24, 3:58 AM

ssearch "openai package pip" in google

J

Jason•12/2/24, 3:58 AM

its for python

J

Jason•12/2/24, 3:59 AM

but then you set it like this:

J

Jason•12/2/24, 3:59 AM

fill in the RUNPOD_ENDPOINT_ID variable in python with your endpoint id. it is the random characters on top left of your endpoint

V

vlad000ssOP•12/2/24, 10:05 PM

It didn't work... The problem is that when I send the request to /openai/v1 the enpoint is invoked but the request is not processed, I guess because my vllm process is listening to just /v1 endpoint, didn't you have such problem? I'm using my custom vllm image, not the runpod one

J

Jason•12/3/24, 12:26 AM

Oh never tried custom image.. But staff said that it would be in the same url formar

J

Jason•12/3/24, 12:26 AM

Format

JJason fill in the RUNPOD_ENDPOINT_ID variable in python with your endpoint id. it is ...

J

Jason•12/3/24, 12:27 AM

They proxy the url, so you should use their format like this one, just replace the runpod endpoint ID with yours

S

Sagor Sarker•12/20/24, 4:02 AM

I am having the same issue. I have a prepare a docker container with custom vLLM serving there. Created a template with that docker using docker hub. In serverless machine got created and I can use the endpoint using localhost:port but from outside I can't access the server. It got stuck. Maybe the it can't make connection using the above openai script. Anyone have any clue?

SSagor Sarker I am having the same issue. I have a prepare a docker container with custom vLLM...

J

Jason•12/20/24, 11:30 AM

How did you access it, did you expose any ports?

S

Sagor Sarker•12/20/24, 11:38 AM

Hi @nerdylive
I have exposed 8000 port as TCP port as the server is running in this port.

I am trying to access it both in "request" method exist in the servereless. It's infinitely in queue.
I tried programmtically like below:```bashcurl --request POST \ --url https://api.runpod.ai/v2/my_endpoint_id/runsync \ --header "accept: application/json" \ --header "authorization: my_runpod_api_key" \ --header "content-type: application/json" \ --data '{"input": { "prompt": "What is the weather in Dhaka?"}}'```
Tried openai compatibility as I served using vLLM serve command

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/my_endpoint_id/openai/v1", 
    api_key="my_runpod_api_key"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user",
            "content": "আজকে ঢাকার আবহাওয়া কেমন?"}    ],
    tools=[{
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        }
                    },
                    "required": ["location"]
                }
            }
        }],
    tool_choice="auto"
)

print(response)

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/my_endpoint_id/openai/v1", 
    api_key="my_runpod_api_key"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user",
            "content": "আজকে ঢাকার আবহাওয়া কেমন?"}    ],
    tools=[{
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        }
                    },
                    "required": ["location"]
                }
            }
        }],
    tool_choice="auto"
)

print(response)

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/my_endpoint_id/openai/v1", 
    api_key="my_runpod_api_key"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user",
            "content": "আজকে ঢাকার আবহাওয়া কেমন?"}    ],
    tools=[{
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        }
                    },
                    "required": ["location"]
                }
            }
        }],
    tool_choice="auto"
)

print(response)

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/my_endpoint_id/openai/v1", 
    api_key="my_runpod_api_key"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user",
            "content": "আজকে ঢাকার আবহাওয়া কেমন?"}    ],
    tools=[{
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        }
                    },
                    "required": ["location"]
                }
            }
        }],
    tool_choice="auto"
)

print(response)

J

Jason•12/20/24, 11:48 AM

So what's the error, can is ee some logs

J

Jason•12/20/24, 11:50 AM

Or anything about the error

J

Jason•12/20/24, 11:50 AM

Ah and how did you access the ports for external?

S

Sagor Sarker•12/20/24, 11:54 AM

Hi,
there no error showing. It stays in queue and I can't see any process log in the worker machine. I didn't specify the port. How can I do that?
I tried with runpod proxy method, then it was working for a single worker machine.

J

Jason•12/20/24, 12:15 PM

Did you use any starting ponnt

J

Jason•12/20/24, 12:15 PM

Point?

J

Jason•12/20/24, 12:15 PM

I'd suggest looking at vllm-worker in github and start there if you want to customize

J

Jason•12/20/24, 12:17 PM

Then you can use the openai api, unless you broke it somehow, then try to undo things until it's working back

S

Sagor Sarker•12/20/24, 12:20 PM

One of the main problem in runpod vLLM based docker is it's not working for tool calling.
That is the reason I move for the custom docker builder using vLLM serve method.
You are right, maybe I directly serve as "vllm serve ................ " in the docker starting point might be not compatible with runpod.
I will try to follow your suggestions.
Thank you.

S

Sagor Sarker•12/20/24, 12:22 PM

I don't know why runpod vLLM is not solving the tool-calling issue yet. But it's an essential need.

Custom vLLM OpenAI compatible API

Similar Threads

Similar Threads

Similar Threads