I've been watching the AI Gateway, and most errors are either capacity errors or something like:

AiError: AiError: unexpected tokens remaining in message header: ["……………...]???…….……...…..", "………...…ż…………..………………….………………………..", "...??…", ...

It goes on like that for a while, until it looks like some of the strings are words from the model's thoughts?
samjs — 16h ago
Huh, I haven't seen that before. Thanks for raising!
samjs — 16h ago
My findings so far:
- I'm not seeing other accounts impacted by this. Are you able to share your prompts with me so I can try to reproduce it?
- There's this related vLLM issue that might be the underlying cause: https://github.com/vllm-project/vllm/issues/23567
(GitHub link preview) [Bug]: openai_harmony.HarmonyError: unexpected tokens remaining in message header — reported in multi-turn conversations with gpt-oss-120b.
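For what it's worth, when I try to reproduce something like this I run it roughly like the sketch below — the account ID, token, and message history are placeholders, and the model ID is my assumption for the gpt-oss-120b deployment on Workers AI:

```ts
// Rough reproduction sketch against the Workers AI REST API.
// ACCOUNT_ID, API_TOKEN, and longMultiTurnHistory are placeholders.
const ACCOUNT_ID = "your-account-id";
const API_TOKEN = "your-api-token";

// A long multi-turn history, since the error seems tied to long contexts.
const longMultiTurnHistory = [
  { role: "user", content: "first message" },
  { role: "assistant", content: "first reply" },
  // ...repeat until the payload is in the ~12k-token range
  { role: "user", content: "latest message" },
];

const res = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/@cf/openai/gpt-oss-120b`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ messages: longMultiTurnHistory }),
  },
);
console.log(res.status, await res.text());
```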
ajgeiss0702 (OP) — 12h ago
I can't even reproduce it reliably. I do retries on errors (roughly like the sketch below), and often this error will appear for the first few requests, then it will randomly succeed with nothing changing.

I should note that this is mainly just used for fun in a Discord server, so it's mostly just an annoyance for me and the people in the server who use it. The context is usually pretty long, since it sends at least the past 100 messages from the channel (more if the context window allows it and the bot has been running long enough), so I'm thinking that might have something to do with it? I use gpt-oss-20b on CF for another use case that doesn't have as long a context, and that hasn't failed once as far as I've seen.

Yeah, I just checked that other use case: in 203 requests, only 2 errored, and those were just the capacity error.

Even with that long context, I should note it's well under the 128k-token context window shown in CF's docs for the model (the biggest requests appear to be around 12k tokens).

I should also note that the "unexpected tokens remaining in message header" error happens with the 120b model too. Possibly less often? It's hard to tell whether it's actually less often, because that model is also called less often.

If it's helpful, I might be able to send the full request (in DMs) that the error happened on.
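To be clear about what I mean by retries, it's conceptually something like this — a minimal sketch rather than my actual bot code; the model ID, attempt count, and backoff are illustrative:

```ts
// Minimal sketch of retry-on-error around a Workers AI call.
// Assumes a Workers `env.AI` binding; parameters are illustrative.
interface Env {
  AI: { run(model: string, input: unknown): Promise<unknown> };
}

async function runWithRetries(env: Env, messages: unknown[], maxAttempts = 3) {
  let lastErr: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await env.AI.run("@cf/openai/gpt-oss-120b", { messages });
    } catch (err) {
      lastErr = err;
      // The header-parsing error is intermittent: the same request
      // often succeeds on a later attempt with nothing changed.
      await new Promise((r) => setTimeout(r, 500 * attempt));
    }
  }
  throw lastErr;
}
```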
samjs — 10h ago
> If it's helpful, I might be able to send the full request (in DMs) that the error happened on.
Yes please!
