yeah definitely, since I can just add the IP to metadata. Currently I use a Durable Object to rate limit each user to 1 req/15s, not sure which is more cost-effective tho. It would be great if I could also filter token cost by metadata, i.e. I want to know how much I should charge a user by user_id
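For context, the 1 req/15s check described above reduces to a fixed-window timestamp comparison. A minimal sketch of the logic such a Durable Object could run per user (the function and constant names are illustrative, and the surrounding Durable Object plumbing is only sketched in comments):

```typescript
// One request per user every 15 seconds: allow only if no earlier request
// was recorded, or the last one is at least WINDOW_MS old.
const WINDOW_MS = 15_000;

function shouldAllow(lastRequestMs: number | null, nowMs: number): boolean {
  return lastRequestMs === null || nowMs - lastRequestMs >= WINDOW_MS;
}

// Inside a Durable Object fetch handler it might look like (sketch, not runnable here):
//   const last = await this.state.storage.get<number>("last");
//   if (!shouldAllow(last ?? null, Date.now())) {
//     return new Response("rate limited", { status: 429 });
//   }
//   await this.state.storage.put("last", Date.now());
```

Because Durable Objects serialize requests per instance (one instance per user), this check is race-free without any extra locking, which is part of why they're a natural fit here.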
Hi everyone, can the gateways be configured to protect my custom serverless function? Currently, they seem to be set up only for direct connections to various mainstream AI API providers. I have an AI bot that connects to different databases and prompts, all bundled into a serverless function. The AI gateways are excellent for setting up authentication, rate limiting, and other features. Is there a way to combine them?
Is there a new bug with Gemini logs? I think after I turned on caching, Gemini logs stopped showing up in the AI Gateway dashboard. It also seems like the cache isn't being applied to them.
thank you! Still experimenting, but mostly due to speed. Llama on Cerebras hardware outputs at 2k tokens/sec, which makes a world of difference for latency in the UX
Hi guys, is there a way to do analytics and collect token costs without enabling logs? I feel my users probably don't want me to see their inappropriate prompts in the logs
I'm using Google AI Studio with no changes to default caching. I did more investigation, and it seems that when I send requests locally, things show up on the dashboard, but inside my deployed Cloudflare Workflow only the Claude API calls show up in the dashboard.
we shipped a binding you can use for calling @cloudflaredev ai gateway directly from a worker! if you're already using workers + workers ai, just add this to your existing code:
gateway: { id: "my-gateway" } or send granular feedback w your logs → https://t.co/ZZIvA04kPb
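The snippet in the tweet slots into the options argument of `env.AI.run()` in a Worker that uses the Workers AI binding. A small sketch of building that options object, with optional per-request metadata attached (the `withGateway` helper is hypothetical, and the `metadata` field is my assumption; only the `gateway: { id }` shape comes from the tweet):

```typescript
// Shape of the gateway options passed to env.AI.run(model, inputs, options).
interface GatewayOptions {
  gateway: { id: string; metadata?: Record<string, string> };
}

// Hypothetical helper: attach a gateway id, and optionally per-request
// metadata (e.g. a user_id for later log filtering), to a Workers AI call.
function withGateway(id: string, metadata?: Record<string, string>): GatewayOptions {
  return { gateway: { id, ...(metadata ? { metadata } : {}) } };
}

// In a Worker (runtime only, not runnable standalone):
//   const res = await env.AI.run(
//     "@cf/meta/llama-3.1-8b-instruct",
//     { prompt: "hello" },
//     withGateway("my-gateway", { user_id: "u123" }),
//   );
```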
Does anyone know how to use the WebSockets API with streaming? I'm trying to use the universal endpoint with OpenAI.
If I send a Content-Type: application/json header with no streaming, it works fine. If I try to do it with streaming, I see the request going through on the dashboard, but my worker never receives any WebSocket messages. If I send it without a Content-Type header, or with event-stream, I always get an error about a missing model
Your AI gateway supports different strategies for handling requests to providers, which allows you to manage AI interactions effectively and ensure your applications remain responsive and reliable.
Does anyone get errors when trying to use Google Gemini 1.5 Flash? I know it's not an AI Gateway problem, but I just want to be sure I'm not the only one having trouble.
There is no universal way to use AI Gateway with any OpenAI-compatible endpoint, correct? I am using DeepInfra and would like to use AI Gateway, but I don't see any option to set my own base URL
Hey guys, can anyone tell me how to download the AI Gateway logs in order to create a dataset for LLM fine-tuning? I already created a dataset from a date range, but I can't do anything with it besides delete it.
AI Gateway allows you to securely export logs to an external storage location, where you can decrypt and process them. You can toggle Workers Logpush on and off in the Cloudflare dashboard settings. This product is available on the Workers Paid plan. For pricing information, refer to Pricing.
Hi, I'm experiencing an issue with the CF AI Gateway while using Google Gemini Flash 2.0.
About 90% of the time, I receive the following error:
423: 'Resource has been exhausted (e.g., check quota).'
Here are some details about my situation:
My API key is on a paid plan with Gemini. I send a relatively low number of requests: 1-5 per minute, totaling around 1,000 per month. When I bypass the gateway and use a direct curl request from my PC, I don't encounter any blocking issues. Could you help me resolve this?