are you using google vertex or google ai studio? i just tested on google ai studio and seems like caching is working properly. Have you set any different cache configuration for gemini? https://developers.cloudflare.com/ai-gateway/configuration/caching/
Cloudflare Docs
Caching · Cloudflare AI Gateway docs
Override caching settings on a per-request basis.
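The caching doc linked above covers per-request overrides. As a minimal sketch (placeholder key and gateway URL; the `cf-aig-cache-ttl` / `cf-aig-skip-cache` header names come from that page), building the headers might look like:

```typescript
// Sketch: per-request cache overrides for Cloudflare AI Gateway.
// Header names follow the AI Gateway caching docs; the API key,
// account ID, and gateway name below are placeholders.
interface CacheOverrides {
  ttlSeconds?: number; // cf-aig-cache-ttl: cache this response for N seconds
  skipCache?: boolean; // cf-aig-skip-cache: bypass the cache for this request
}

function buildCacheHeaders(
  apiKey: string,
  overrides: CacheOverrides = {},
): Record<string, string> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
  if (overrides.ttlSeconds !== undefined) {
    headers["cf-aig-cache-ttl"] = String(overrides.ttlSeconds);
  }
  if (overrides.skipCache) {
    headers["cf-aig-skip-cache"] = "true";
  }
  return headers;
}

// Usage sketch (placeholder IDs):
// fetch("https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/my-gateway/...",
//   { method: "POST", headers: buildCacheHeaders("KEY", { ttlSeconds: 3600 }), body: "..." });
```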
Kathy
KathyOP4mo ago
thanks for pointing out. ill get that fixed
laurynas
laurynas4mo ago
thank you! Experimenting still but mostly due to speed. LLaMa on Cerebras architecture outputs at 2k tokens/sec which makes a world of difference in terms of latency in the UX
BumblebeeSquare
BumblebeeSquare4mo ago
Hi guys, is there a way to do analytics and collect token costs without enabling logs? I feel my users probably don't want me to see their inappropriate prompts in logs
tornado
tornado4mo ago
I'm using Google AI Studio with no changes to the default caching. I did more investigation and it seems that when I send requests locally, things show up on the dashboard, but inside my deployed Cloudflare Workflow only the Claude API calls show up in the dashboard.
rob
rob4mo ago
rita kozlov 🐀 (@ritakozlov_) on X
we shipped a binding you can use for calling @cloudflaredev ai gateway directly from a worker! if you're already using workers + workers ai, just add this to your existing code: gateway: { id: "my-gateway" } or send granular feedback w your logs → https://t.co/ZZIvA04kPb
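The tweet above shows the new binding option: adding `gateway: { id: "my-gateway" }` to an `env.AI.run()` call routes it through AI Gateway. A small sketch of how you might merge that option into existing run options (the helper and the model name are illustrative, not part of the API):

```typescript
// Sketch of the Workers AI gateway option from the tweet above.
// `withGateway` is a hypothetical helper; only the shape
// `gateway: { id: "my-gateway" }` comes from the announcement.
type RunOptions = Record<string, unknown> & { gateway?: { id: string } };

function withGateway(options: RunOptions, gatewayId: string): RunOptions {
  // Merge the gateway id into whatever options the call already uses.
  return { ...options, gateway: { id: gatewayId } };
}

// Inside a Worker (sketch; model name is a placeholder):
// const result = await env.AI.run(
//   "@cf/meta/llama-3.1-8b-instruct",
//   { prompt: "Hello" },
//   withGateway({}, "my-gateway"),
// );
```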
Kathy
KathyOP4mo ago
hope yall like! 🤗
Zig
Zig4mo ago
Does anyone know how to use the WebSockets API with streaming? I'm trying to use the universal endpoint with OpenAI. If I send a Content-Type: application/json header with no streaming it works fine. If I try to do it with streaming, I see the request going through on the dashboard but my worker never receives any WebSocket messages. If I send it without a Content-Type header, or with event-stream, I always get an error about a missing model.
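For reference, a sketch of a Universal Endpoint request step for streaming through OpenAI. The `{ provider, endpoint, headers, query }` shape follows the AI Gateway universal endpoint docs; the key and model name are placeholders, and note the model lives inside `query`, which may be why requests with an odd Content-Type report it missing:

```typescript
// Sketch: one step of an AI Gateway Universal Endpoint request body.
// Shape per the universal endpoint docs; API key and model are placeholders.
interface UniversalStep {
  provider: string;
  endpoint: string;
  headers: Record<string, string>;
  query: Record<string, unknown>;
}

function openAiStreamingStep(
  apiKey: string,
  model: string,
  prompt: string,
): UniversalStep {
  return {
    provider: "openai",
    endpoint: "chat/completions",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    query: {
      model, // the model goes inside `query`, not the URL
      stream: true,
      messages: [{ role: "user", content: prompt }],
    },
  };
}

// POST JSON.stringify([openAiStreamingStep("KEY", "gpt-4o-mini", "hi")])
// to https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/my-gateway (placeholder IDs)
```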
dom
dom4mo ago
Using claude-3-5-sonnet-latest, which currently defaults to claude-3-5-sonnet-20241022. I see o3-mini also has this issue.
usualdev
usualdev4mo ago
Feature request: add the ability to adjust column widths. I don't want feedback, latency, or status to be that wide; I'd rather see the model name.
Kathy
KathyOP3mo ago
Cloudflare Docs
Request handling · Cloudflare AI Gateway docs
Your AI gateway supports different strategies for handling requests to providers, which allows you to manage AI interactions effectively and ensure your applications remain responsive and reliable.
Kathy
KathyOP3mo ago
nevermind, i found it in the changelog. THANKS
rob
rob3mo ago
Cloudflare Developers (@CloudflareDev) on X
🙂We've got 2 new exciting updates for AI Gateway! •Gain observability and control over your Cerebras, ElevenLabs and Cartesia usage via AI Gateway. •Add more control to your requests with new timeout, retry, and fallback options. Learn more about these two updates below👇
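The second update above adds per-request timeout and retry controls. A minimal sketch of setting them via request headers, assuming the `cf-aig-request-timeout`, `cf-aig-max-attempts`, and `cf-aig-retry-delay` header names from the request-handling docs linked earlier (values here are illustrative):

```typescript
// Sketch: per-request timeout/retry headers for AI Gateway.
// Header names assumed from the request-handling docs; check the docs
// page before relying on them. All values are illustrative.
function retryHeaders(
  timeoutMs: number,
  maxAttempts: number,
  retryDelayMs: number,
): Record<string, string> {
  return {
    "cf-aig-request-timeout": String(timeoutMs), // fail the request after this long
    "cf-aig-max-attempts": String(maxAttempts), // total tries before giving up
    "cf-aig-retry-delay": String(retryDelayMs), // wait between retries
  };
}

// Usage sketch: spread into your normal request headers.
// fetch(gatewayUrl, { headers: { ...authHeaders, ...retryHeaders(5000, 3, 1000) }, ... });
```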
luiseok
luiseok3mo ago
Does anyone get errors when trying to call Google Gemini 1.5 Flash? I know it's not an AI Gateway problem, but I just want to be sure I'm not the only one having trouble.
rob
rob3mo ago
📈
Kavatch
Kavatch3mo ago
There's no universal way to use AI Gateway with any OpenAI-compatible endpoint, correct? I'm using DeepInfra and would like to use AI Gateway, but I don't see any option to set my own base URL.
morpig
morpig3mo ago
hi! wondering if there are any plans to return the request ID (what we see in AI Gateway logs) in Workers. or is that possible already?