error: '4002: could not route request to AI model', with model: '@cf/baai/bge-m3'. Error: 3030: 1 validation error for VllmBatchRequest. Does anyone know what this means and how to resolve it?

InferenceUpstreamError: error code: 1031, and now I'm getting InferenceUpstreamError: <!DOCTYPE html>... where it returns the Cloudflare HTML error page with a 500 error. I've tried using my production Worker, then dev with --remote, and dev locally. In all cases it returns some variation of the above, but all fail. The code worked fine yesterday and hasn't changed. It fails at the call to "@cf/qwen/qwen2.5-coder-32b-instruct". All other requests appear to work.

I had remote = true included in my wrangler.toml; it was set in both queues and r2_buckets, but oddly not in ai. Removing them fixed the problem.

remote = true solved the problem in dev but not in production. I redeployed the Worker, but the request still hangs. Nothing in the logs...

Other errors seen, for example: Workers AI: 9003: unknown internal error, and Workers AI: Operation timed out after 40000 ms.

Notes on input format
Smart Turn takes 16kHz PCM audio as input. Up to 8 seconds of audio is supported, and we recommend providing the full audio of the user's current turn.
The model is designed to be used in conjunction with a lightweight VAD model such as Silero. Once the VAD model detects silence, run Smart Turn on the entire recording of the user's turn, truncating from the beginning to shorten the audio to around 8 seconds if necessary.
If additional speech is detected from the user before Smart Turn has finished executing, re-run Smart Turn on the entire turn recording, including the new audio, rather than just the new segment. Smart Turn works best when given sufficient context, and is not designed to run on very short audio segments.
Note that audio from previous turns does not need to be included.
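The truncation rule above can be sketched in a few lines. This is a hypothetical helper, not part of any SDK: it assumes the turn audio is held as a flat sequence of 16 kHz PCM samples and keeps only the most recent 8 seconds, dropping from the beginning as the notes describe.

```python
# Hypothetical helper illustrating the truncation described above.
# Assumption: the turn buffer is a flat list/array of PCM samples at 16 kHz.

SAMPLE_RATE = 16_000           # Smart Turn expects 16 kHz PCM input
MAX_SECONDS = 8                # up to 8 seconds of audio is supported
MAX_SAMPLES = SAMPLE_RATE * MAX_SECONDS

def truncate_turn(samples: list[int]) -> list[int]:
    """Keep at most the last 8 seconds of the turn, truncating from the start."""
    return samples[-MAX_SAMPLES:]
```

Following the notes, you would run this over the entire turn recording each time the VAD signals silence, and if more speech arrives, append it to the same buffer and re-run on the full (re-truncated) turn rather than on the new segment alone.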
.venv/lib/python3.10/site-packages/llama_index/embeddings/cloudflare_workersai/base.py", line 118, in _aget_text_embeddings
return resp["result"]["data"]
KeyError: 'data'
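The KeyError above indicates the Workers AI response had no "result"/"data" keys, which typically happens when the API returns an error payload instead of embeddings. A minimal defensive sketch, assuming Cloudflare's usual response envelope with "success", "errors", and "result" fields (an assumption, not confirmed by the traceback), surfaces the upstream error instead of a bare KeyError:

```python
# Hedged sketch: unwrap an embeddings response defensively.
# Assumption: the response follows Cloudflare's {"success", "errors", "result"}
# envelope; on failure, "result" or "result"["data"] is absent.

def extract_embeddings(resp: dict) -> list:
    result = resp.get("result") or {}
    data = result.get("data")
    if data is None:
        # Surface whatever error detail the payload carries.
        errors = resp.get("errors") or resp.get("error") or resp
        raise RuntimeError(f"Workers AI returned no embedding data: {errors}")
    return data
```

Wrapping the lookup this way turns the opaque KeyError: 'data' into a message that includes the API's own error codes (such as the 3030 or 9003 errors reported above), which makes the failure much easier to diagnose.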