I've been watching the AI Gateway, and most errors are either capacity or something like
```
AiError: AiError: unexpected tokens remaining in message header: ["……………...]???…….……...…..", "………...…ż…………..………………….………………………..", "...??…",
```
…and it goes on for a while, until it looks like some of the strings are words from the model's thoughts?
Huh, I haven't seen that before. Thanks for raising!
My findings so far:
- I'm not seeing other accounts impacted by this, are you able to share your prompts with me so I can try and reproduce it?
- There's this related vLLM issue that might be the underlying cause: https://github.com/vllm-project/vllm/issues/23567
I can't even reproduce it reliably. I do retries on errors, and often this error will appear for the first few requests, then the request will randomly succeed with nothing changing.
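The retry loop is nothing fancy, basically just this (a minimal sketch; `call_with_retries` and the use of `RuntimeError` as a stand-in for the gateway's `AiError` are illustrative, not the actual bot code):

```python
import random
import time

def call_with_retries(send_request, max_attempts=5, base_delay=1.0):
    """Retry a flaky request; the same payload often succeeds on a later attempt."""
    for attempt in range(max_attempts):
        try:
            return send_request()
        except RuntimeError:  # stand-in for the gateway's AiError
            if attempt == max_attempts - 1:
                raise
            # exponential backoff with jitter, then retry the unchanged payload
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

# demo: a request that fails twice, then succeeds with nothing changing
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("unexpected tokens remaining in message header")
    return "ok"

print(call_with_retries(flaky, base_delay=0.0))  # → ok
```

The key point is that nothing about the request changes between attempts, which is what makes the error so hard to pin down.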
I should note that this is mainly used for fun in a Discord server, so it's just an annoyance for me and the people in the server who use it.
The context is usually pretty long, since it sends at least the past 100 messages from the channel (more if the context window allows it and the bot has been running long enough), so I'm thinking that might have something to do with it? I use gpt-oss-20b on CF for another use case that doesn't have as long a context, and that one hasn't failed once as far as I've seen.
Yeah, I just checked that other use case: in 203 requests, only 2 errored, and both were just the capacity error.
Even with that long context, I should note it's well under the 128k-token context window shown in CF's docs for the model (the biggest requests appear to be around 12k tokens).
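For context, the "fit as many past messages as the window allows" logic is roughly like this (a sketch; the ~4 characters per token estimate is just a heuristic, not the model's real tokenizer, and `trim_history` is an illustrative name):

```python
def estimate_tokens(text):
    # very rough heuristic: ~4 characters per token for English text
    return max(1, len(text) // 4)

def trim_history(messages, budget_tokens=128_000):
    """Keep the newest messages whose estimated total fits under the token budget."""
    kept, total = [], 0
    for msg in reversed(messages):  # walk newest-first
        cost = estimate_tokens(msg)
        if total + cost > budget_tokens:
            break
        kept.append(msg)
        total += cost
    # restore chronological order for the prompt
    return list(reversed(kept)), total

# demo: 100 channel messages, small budget for illustration
history = [f"message {i}: " + "x" * 40 for i in range(100)]
trimmed, used = trim_history(history, budget_tokens=500)
```

So the history always stays under the budget, but it still ends up far longer than the other use case's prompts.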
I should note that the `unexpected tokens remaining in message header` error also happens with the 120b model. Possibly less often? Though it's hard to tell whether that's just because that model is called less often.
If it's helpful, I might be able to send the full request (in DMs) that the error happened on.
Yes please!