Hi Cloudflare team,

I've been working with AutoRAG and it's a great feature for building RAG systems. However, I notice it's currently limited to models available in Workers AI.
Is there any plan to support custom model providers and custom embedding providers in AutoRAG?

Specifically:
Custom LLM providers (OpenAI, Claude, etc.)
Custom embedding providers (OpenAI embeddings, specialized models)
Bring-your-own-model support

This would enable using specialized embeddings for specific domains, cost optimization across providers, and better architectural flexibility for enterprise use cases.
Is this being considered for the roadmap, or is there currently an undocumented way to achieve this?

Thanks for the great work on Workers AI.

Chaika•9/18/25, 8:39 PM

I would ask in #autorag , it's a bit weird to ask in the workers-ai channel about not using workers-ai lol

samjs•9/18/25, 9:00 PM

Haha, that's okay we have thick skin

Does sound like a reasonable use case, but indeed you'll get a better response in the #autorag channel!

𝗞ennethOP•9/18/25, 9:20 PM

@samjs Sorry for posting in the wrong channel! I meant to ask this in #autorag. Thanks for pointing that out.

achesui•9/18/25, 11:38 PM

Hello, I can't get the new nova-3 model to work with Spanish over WebSocket via the REST API. I'm testing with this repository: https://github.com/cloudflare/realtime-examples/tree/main/ai-tts-stt When I set the language to 'es' or 'es-419', it gives me a 500 error: Failed to establish Nova WebSocket: 500, while the websocket keeps running. It works for English only. Can anyone help me?

Replying to fortytwo: Ah I missed that space. Thanks!

.r20•9/19/25, 7:56 PM

yeah itll just index the entire bucket not just the folder

fortytwo•9/19/25, 8:11 PM

I did notice you can filter on directories. But not sure when multiple rag source buckets vs one bucket + multiple dirs + dir filters would be best

.r20•9/19/25, 8:40 PM

idt it matters id just do the directory that’s easiest

.r20•9/19/25, 8:40 PM

but i think autorag has some fundamental issues they need to fix so i would rather use vectorize directly its better

fortytwo•9/19/25, 8:46 PM

What issues have you seen there?

engnadeau•9/19/25, 8:51 PM

error of the day: error: '4002: could not route request to AI model', with model: '@cf/baai/bge-m3'

is this also due to high demand?
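
If errors like the 4002 above turn out to be transient, one common mitigation is to retry with exponential backoff. The following is only a minimal sketch under that assumption: `call_with_backoff` and its parameters are hypothetical names, not part of any Cloudflare SDK, and `fn` stands in for whatever function makes the embedding request.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles each attempt (base, 2*base, 4*base, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Retrying does not help during a prolonged incident, so a small attempt cap that then surfaces the error is usually the safer default.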

engnadeau•9/19/25, 9:07 PM

is Workers AI considered production ready?

James•9/19/25, 9:42 PM

https://www.cloudflarestatus.com/incidents/k9wl5k2hb13m

Workers AI experiencing degraded availability in some models

.r20•9/19/25, 9:48 PM

workers ai is still a few iterations away from production ready i cant launch it to my customers in the state its in rn (although they are working on it p fast), and indexing isnt super reliable rn imo (partially due to workers ai and then other generic errors it throws)

.r20•9/19/25, 9:49 PM

for prototyping its fine but vectorize is very good and i still recommend that and not a lot more work than autorag

fortytwo•9/19/25, 9:49 PM

Yeah I did have a bunch of indexing errors when I pushed stuff, had to redo it 5 or 6 times to get all of them done

.r20•9/19/25, 9:49 PM

yeah the problem is my customers update the rag extremely frequently so i cant rely on their indexing, but like i said its still good for prototyping

fortytwo•9/19/25, 9:55 PM

Ok, well that's good to know. So you're using vectorize "manually", basically? Still storing data in a cloudflare bucket but generating and searching the vectors yourself rather than the autorag apis?

.r20•9/19/25, 10:38 PM

i store data in d1 and yeah i just make the embeddings through openai and insert into the vectorize

.r20•9/19/25, 10:38 PM

and then searching isnt that hard either
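
To illustrate what the "searching" step amounts to, here is a minimal in-memory sketch of top-k cosine-similarity search. It is not the Vectorize API (Vectorize performs the nearest-neighbour lookup for you); `cosine`, `top_k`, and the dictionary index are hypothetical stand-ins used only to show the idea.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

In the setup described above, the query vector would come from the same embedding model used at insert time, and the returned ids would be used to look up the stored text.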

.r20•9/19/25, 10:39 PM

u can vibe code most of this stuff anyway through cloudflare's mcp server and whatever ai powered ide u use

ray•9/21/25, 4:23 AM

Is anyone else getting issues with tool calls? I can only seem to get proper tool call responses via the ai sdk when I use mistral-small-3.1-24b-instruct

ray•9/21/25, 4:24 AM

All the other tool calling models either return a text type with tool params json stringified or error out
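
As a stopgap for models that return the tool call as stringified JSON in a text response, a client can try to recover it. This is a rough sketch, assuming the text is a JSON object with a `name` field and the arguments under `arguments` or `parameters`; those field names and the helper `parse_stringified_tool_call` are assumptions, not a documented response shape.

```python
import json
from typing import Any, Dict, Optional


def parse_stringified_tool_call(text: str) -> Optional[Dict[str, Any]]:
    """Return {"name": ..., "arguments": {...}} if text is a JSON tool call, else None."""
    try:
        payload = json.loads(text.strip())
    except ValueError:
        return None  # ordinary prose, not JSON
    if not isinstance(payload, dict) or not isinstance(payload.get("name"), str):
        return None
    # Assumed shapes: arguments may sit under "arguments" or "parameters",
    # and may themselves be a JSON-encoded string.
    args = payload.get("arguments", payload.get("parameters", {}))
    if isinstance(args, str):
        try:
            args = json.loads(args)
        except ValueError:
            return None
    if not isinstance(args, dict):
        return None
    return {"name": payload["name"], "arguments": args}
```

Anything that does not match is returned as `None`, so normal text replies pass through untouched.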

Ashkan•9/21/25, 9:53 AM

Hello, when I poll a batch request I encounter this error: Error: 3030: 1 validation error for VllmBatchRequest. Does anyone know what it means and how to resolve it?

Ashkan•9/21/25, 9:56 AM

The Gateway logs look like they're hanging. It's been tens of minutes with no change:

hellc•9/22/25, 1:21 PM

seems like bge-m3 over workers-ai is lagging again. Same issue as Friday evening...

.venv/lib/python3.10/site-packages/llama_index/embeddings/cloudflare_workersai/base.py", line 118, in _aget_text_embeddings
    return resp["result"]["data"]
KeyError: 'data'
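
The KeyError above hides whatever the API actually returned. A defensive parse like the sketch below would surface it instead. This assumes the response is a dict with a `result` object and, on failure, an `errors` list; `extract_embeddings` and `EmbeddingResponseError` are hypothetical names, not part of llama_index.

```python
from typing import Any, Dict, List


class EmbeddingResponseError(RuntimeError):
    """Raised when the response does not contain embedding vectors."""


def extract_embeddings(resp: Dict[str, Any]) -> List[List[float]]:
    """Return resp["result"]["data"], or raise an error that includes the API errors."""
    result = resp.get("result")
    if isinstance(result, dict) and isinstance(result.get("data"), list):
        return result["data"]
    # Assumed: failed responses carry an "errors" list; include it for context.
    raise EmbeddingResponseError(
        f"no embeddings in response; errors={resp.get('errors')!r}"
    )
```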

Leo•9/22/25, 3:34 PM

Do you have an automated article-writing bot for your website?

blindChicken•9/22/25, 4:28 PM

I'm having an issue with Workers AI: it is returning all kinds of strange errors with little information about what the problem may be. I have gotten InferenceUpstreamError: error code: 1031, and now I'm getting InferenceUpstreamError: <!DOCTYPE html>..., where it returns a CF HTML error page with a 500 error. I've tried using my production worker, then dev with --remote, and dev locally. In all cases it returns some variation of the above, and all fail. The code worked fine yesterday and hasn't changed. It fails at the call to "@cf/qwen/qwen2.5-coder-32b-instruct". All other requests appear to work.

blindChicken•9/22/25, 6:02 PM

It appears that this error is being caused by having remote = true included in my wrangler.toml. I had it set in both queues and r2_buckets, but oddly not in ai. Removing them fixed the problem.

blindChicken•9/22/25, 6:16 PM

Removing remote = true solved the problem in dev but not in production. I redeployed the worker, but the request still hangs. Nothing in logs...

fortytwo•9/23/25, 5:14 AM

@.r20 regarding your moving off of autorag... is it only literally autorag that is still in beta? Did you try using all cloudflare services but doing it more "manually", did that work ok? Or did you have to use third-party stuff to get something prod-ready?

.r20•9/23/25, 5:15 AM

launched vectorize to prod today it works great

.r20•9/23/25, 5:16 AM

its pretty easy to use

fortytwo•9/23/25, 5:16 AM

so are you just manually chunking, using cloudflare to generate vectors, inserting them, then searching and using one of cloudflare's llm models for the overall query?

fortytwo•9/23/25, 5:16 AM

I.e. all still cloudflare stuff, just not autorag to automatically do some things?

.r20•9/23/25, 5:17 AM

manually chunking, going to openai for vector embeddings, inserting them, and using open router models for agentic capabilities
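
For reference, the "manually chunking" step can be as simple as a fixed-size window with overlap. This is a minimal sketch, not necessarily what .r20 uses; `chunk_text`, the character-based sizes, and the defaults are illustrative assumptions.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of up to max_chars, each sharing `overlap` chars with the previous one."""
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    chunks: List[str] = []
    step = max_chars - overlap  # how far the window advances each time
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

Each chunk would then be sent to the embedding provider and the resulting vector inserted alongside an id that points back to the stored text.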

.r20•9/23/25, 5:17 AM

i don’t use workers ai

fortytwo•9/23/25, 5:18 AM

Ok got it, so only using vector db to hold the vectors generated elsewhere, then search it but pass the found data to a model elsewhere for the actual query? I'm hitting cloudflare errors when trying to hit the rag/handle queries and just wondering if it's just autorag that has issues or if I need to move the whole impl to use other services, basically