Yes. It's a super simple flow of:
1. Embed the query and retrieve from the vector store
2. Build a big prompt from the query + RAG results
3. Send that to the LLM
There is no need to double-guardrail it. The query is already checked before the embedding step, and the big prompt is built only from safe bits (content guarded before entering the DB + the guarded user query). So a second guard is just extra latency.
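
The flow above can be sketched as follows. This is a minimal toy illustration, not a real library API: `toy_embed`, `VectorStore`, `is_safe`, and `answer` are all hypothetical stand-ins, and the guardrail is a trivial keyword filter standing in for a real one. Note the single guard applied once, before embedding and retrieval.

```python
from math import sqrt

def toy_embed(text: str) -> list[float]:
    # Stand-in embedding: normalized character-frequency vector over a-z.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class VectorStore:
    def __init__(self) -> None:
        self.docs: list[tuple[list[float], str]] = []

    def add(self, text: str) -> None:
        # Documents are guarded *before* entering the DB,
        # so anything retrieved later is already a "safe bit".
        self.docs.append((toy_embed(text), text))

    def query(self, text: str, k: int = 2) -> list[str]:
        # Cosine-style similarity search (vectors are unit-normalized).
        qv = toy_embed(text)
        scored = sorted(
            self.docs,
            key=lambda d: -sum(a * b for a, b in zip(qv, d[0])),
        )
        return [doc for _, doc in scored[:k]]

def is_safe(text: str) -> bool:
    # Placeholder guardrail: block an obvious injection phrase.
    return "ignore previous instructions" not in text.lower()

def answer(query: str, store: VectorStore) -> str:
    # Single guardrail, applied once before embedding/retrieval.
    if not is_safe(query):
        return "Request blocked by guardrail."
    # Embed query and retrieve from the vector store.
    context = "\n".join(store.query(query))
    # Build the big prompt from query + RAG results.
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    # The prompt now contains only guarded pieces, so there is no
    # second guard here; just send it to the LLM.
    return prompt  # in practice: llm.generate(prompt)
```

Because both the stored documents and the user query pass through the guard before the prompt is assembled, re-checking the assembled prompt would only re-inspect already-inspected text.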