Hey @Sudhan. There are a couple of things that might help here:

1. You can pass in a stream of bytes as the input for many of the models, e.g. in the nova-3 example you can fetch the audio and pass in the response body directly (without needing to await it): https://developers.cloudflare.com/workers-ai/models/nova-3/ (see the first sketch below).
2. Some models (like nova-3) support our async batch API (https://developers.cloudflare.com/workers-ai/features/batch-api/), so you can submit the inference job and then poll for the result (see the second sketch below).

In both cases, very little compute is used in the Worker itself.
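
For (1), here's a minimal sketch of what streaming the fetched body straight into the model might look like in a Worker. The audio URL is a placeholder, and the input shape (`{ audio: { body, contentType } }`) is my reading of the nova-3 model page, so double-check the schema there:

```ts
interface Env {
  AI: Ai; // Workers AI binding configured in wrangler.toml
}

export default {
  async fetch(_request: Request, env: Env): Promise<Response> {
    // response.body is a ReadableStream, so the audio is never
    // buffered in the Worker's memory.
    const audio = await fetch("https://example.com/audio.mp3"); // placeholder URL
    if (!audio.body) {
      return new Response("no audio body from upstream", { status: 502 });
    }

    // Pass the stream through to the model without awaiting the bytes.
    const result = await env.AI.run("@cf/deepgram/nova-3", {
      audio: {
        body: audio.body,
        contentType: "audio/mpeg",
      },
    });

    return Response.json(result);
  },
};
```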
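
For (2), the batch flow is submit-then-poll. A rough sketch based on the batch API docs linked above (the `queueRequest` option and the `request_id` polling shape come from my reading of those docs, so treat the details as approximate):

```ts
interface Env {
  AI: Ai; // Workers AI binding configured in wrangler.toml
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const requestId = new URL(request.url).searchParams.get("request_id");

    if (requestId === null) {
      // Step 1: queue the inference instead of running it inline.
      const audio = await fetch("https://example.com/audio.mp3"); // placeholder URL
      if (!audio.body) {
        return new Response("no audio body from upstream", { status: 502 });
      }
      const queued = await env.AI.run(
        "@cf/deepgram/nova-3",
        { audio: { body: audio.body, contentType: "audio/mpeg" } },
        { queueRequest: true },
      );
      // Shaped roughly like { status: "queued", request_id, model }.
      return Response.json(queued);
    }

    // Step 2: poll with the request_id returned in step 1 (e.g. from a
    // cron trigger or a client retry). While the job is still running
    // you get a status back; once finished, the actual output.
    const result = await env.AI.run("@cf/deepgram/nova-3", {
      request_id: requestId,
    });
    return Response.json(result);
  },
};
```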