# Text-completion request against the RunPod-hosted model.
# NOTE(review): `client` is constructed elsewhere in the file (OpenAI-compatible
# SDK client pointed at a RunPod serverless endpoint) — confirm against the
# surrounding example.
completion_kwargs = {
    "model": "llama3-dumm/llm",
    "prompt": ["hello? How are you "],
    "temperature": 0.8,
    "max_tokens": 600,
}
response = client.completions.create(**completion_kwargs)
# Chat-completion request against the same RunPod-hosted model.
# NOTE(review): `client` must already be constructed before this call runs;
# see the reconstructed constructor below.
response = client.chat.completions.create(
    model="llama3-dumm/llm",
    messages=[{'role': 'user', 'content': "hell0"}],
    max_tokens=100,
    temperature=0.9,
)

# Reconstructed client construction. The original extraction fused the tail of
# an `OpenAI(...)` constructor (`api_key=..., base_url=...`) onto the closing
# paren of the chat call above, leaving a syntax error. In a working example
# this constructor must execute BEFORE either request is made, and `OpenAI`
# must be imported at the top of the file (`from openai import OpenAI`) —
# TODO confirm against the full page.
# NOTE(review): "endpoint_id" in the URL is a placeholder — substitute your
# real RunPod serverless endpoint ID. The original wrote this as an f-string
# with no placeholders, so a plain string literal is equivalent.
client = OpenAI(
    api_key=api_key,
    base_url="https://api.runpod.ai/v2/endpoint_id/openai/v1",
)
Join the Discord to ask follow-up questions and connect with the community.
We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, machine learning, and GPUs!
21,906 Members
Join