"Error decoding stream response" on Completed OpenAI compatible stream requests

Context
I have a custom worker on Runpod Serverless that streams a response from the async OpenAI Python client.

Error
Requests to the OpenAI-compatible API endpoint work fine without streaming, but streaming requests always return:
- Response code: 200
- Body: plain text only:

"Error decoding stream response"

I attached the run status results, which show the expected output and a Completed status.

Example Command

curl "https://api.runpod.ai/v2/ID/openai/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer rpa_xxx" \
    -d '{
        "model": "meow",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Explain why cats are cute"
            }
        ],
        "stream": true
    }'

Response data

HTTP/1.1 200 OK
CF-RAY: 90cfa0bc08aafa05-SJC
Cache-Control: no-cache
Connection: keep-alive
Content-Type: text/event-stream
Date: Wed, 05 Feb 2025 02:56:32 GMT
Server: cloudflare
Set-Cookie: xxxx
Transfer-Encoding: chunked
cf-cache-status: DYNAMIC

Error decoding stream response
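For comparison, here is a minimal sketch of what an OpenAI-style client expects in a `text/event-stream` body: `data: <json>` lines, ending with `data: [DONE]`. The helper name `parse_sse_body` and the sample payloads are made up for illustration, and the parser only handles `data:` lines (no SSE comments or multi-line events). It shows that the body above contains no decodable events at all.

```python
import json


def parse_sse_body(body: str):
    """Split an SSE body into decoded JSON events and lines that are not valid events."""
    events, bad_lines = [], []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines separate events
        if not line.startswith("data:"):
            bad_lines.append(line)
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            continue  # end-of-stream sentinel used by OpenAI-style streams
        try:
            events.append(json.loads(payload))
        except json.JSONDecodeError:
            bad_lines.append(line)
    return events, bad_lines


# Hypothetical well-formed stream versus the body actually returned.
good = 'data: {"choices": [{"delta": {"content": "Hi"}}]}\n\ndata: [DONE]\n\n'
bad = "Error decoding stream response"

print(parse_sse_body(good))
print(parse_sse_body(bad))
```

So the 200 status and the `text/event-stream` header look like a stream, but the payload is a bare error string rather than events.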

Relevant Code
Here's a snippet of the handler:

async def async_handler(event):
    job_input = JobInput(**event["input"])
    openai_input = job_input.openai_input

    response = await self.openai_client.chat.completions.create(**openai_input)

    if "stream" in openai_input and openai_input["stream"] == True:
        async for chunk in response:
            # Yield only JSON-serializable types
            yield chunk.to_dict(mode="json")
    else:
        print(f"Response: {response}")
        yield response.to_dict(mode="json")

runpod.serverless.start(
    {
        "handler": async_handler,
        "return_aggregate_stream": True,
    }
)
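One way to rule out the handler side is to check locally that every yielded item survives a JSON round trip. This is only a sketch: `fake_handler` is a hypothetical stand-in for the real handler (it does not call OpenAI or Runpod), and `check_chunks` is a helper name I made up.

```python
import asyncio
import json


async def fake_handler(event):
    # Hypothetical stand-in: yields dicts shaped like chat completion chunks.
    for text in ("Cats", " are", " cute"):
        yield {
            "object": "chat.completion.chunk",
            "choices": [{"index": 0, "delta": {"content": text}}],
        }


async def check_chunks(handler, event):
    """Collect every yielded item and confirm it round-trips through JSON unchanged."""
    chunks = []
    async for chunk in handler(event):
        # json.dumps raises TypeError if the chunk holds a non-serializable value.
        if json.loads(json.dumps(chunk)) != chunk:
            raise ValueError(f"chunk changed during JSON round trip: {chunk!r}")
        chunks.append(chunk)
    return chunks


chunks = asyncio.run(check_chunks(fake_handler, {"input": {}}))
print(len(chunks))
```

If the real handler passes the same check, the chunks themselves are probably not what the API fails to decode.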

I cannot reproduce the error when running the pod locally and making requests to /runsync with the matching input data. Any insight would be helpful. I'm not sure whether there's an additional layer of decoding or deserializing in the API that isn't happy with the streaming responses.

curl -X POST http://localhost:8000/runsync \
  -H "Content-Type: application/json" \
  -d '{
   "input":{
      "openai_input":{
         "messages":[
            {
               "content":"You are a helpful assistant.",
               "role":"system"
            },
            {
               "content":"hello",
               "role":"user"
            }
         ],
         "model":"meow",
         "stream": true
      },
      "openai_route":"/v1/chat/completions"
   }
}'
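Judging by the two curl commands in this post, the local request is the OpenAI request body wrapped under `input.openai_input` with an `openai_route` next to it. A small sketch of that wrapping, so the local test can reuse the exact body sent to the hosted endpoint (the function name `to_worker_input` is hypothetical, and the shape is inferred from the commands here rather than from Runpod documentation):

```python
import json


def to_worker_input(openai_body: dict, route: str = "/v1/chat/completions") -> dict:
    """Wrap an OpenAI-style request body in the job input shape used by the local /runsync call."""
    return {"input": {"openai_input": openai_body, "openai_route": route}}


# Same body as the request sent to the OpenAI-compatible endpoint.
openai_body = {
    "model": "meow",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain why cats are cute"},
    ],
    "stream": True,
}

payload = to_worker_input(openai_body)
print(json.dumps(payload, indent=3))
```

The printed JSON can be passed to `curl -d` against the local `/runsync` endpoint.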

Attachment: example_request.json (5.3 KB)

Runpod•14mo ago•

2 replies

tzushi