@Kathy | Browser Rendering PM Ah fair, it's related to "context caching" at Google / Vertex AI. For huge contexts and long repeated runs with the same context, this can significantly speed up inference and reduce cost. https://ai.google.dev/gemini-api/docs/caching?lang=rest

The problem I'm having here is that Google takes increasingly long to create this context cache the more context you give it. We often hit 2+ minutes for bigger contexts (20-30+ PDF files, some of them 100+ pages, which I think are being transformed into images, and then into tokens, on their side).
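For reference, creating a cache via the linked REST endpoint looks roughly like this (a sketch based on that docs page; the API key, model name, and file URI are placeholders, and field names may have shifted between API versions):

```
# Sketch: create a cached content entry from an already-uploaded PDF.
# GEMINI_API_KEY and the file URI are placeholders.
curl -X POST "https://generativelanguage.googleapis.com/v1beta/cachedContents?key=$GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "models/gemini-1.5-flash-001",
    "contents": [{
      "role": "user",
      "parts": [{"file_data": {
        "mime_type": "application/pdf",
        "file_uri": "https://generativelanguage.googleapis.com/v1beta/files/YOUR_FILE_ID"
      }}]
    }],
    "ttl": "300s"
  }'
```

The cache-creation call is where the latency shows up: the request only returns once the content has been processed and tokenized server-side.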