Looking for input from folks testing Cloudflare Workers AI! I’m using llama-3.1-8b-instruct-fast (fr

Looking for input from folks testing Cloudflare Workers AI!
I’m using llama-3.1-8b-instruct-fast (free tier) — works fine until prompts go past ~9K tokens, then it starts ignoring system instructions and hallucinating (even though it’s supposed to support 128K context).
Has anyone found free-tier models on Cloudflare that handle large contexts more reliably, or that simply work best for chat systems?
I’m testing a bunch and trying to build a list of the top free-tier models — any pointers would be awesome!
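A common workaround when a model degrades well before its advertised context limit is to cap how much history you send. The sketch below keeps every system message plus as many of the most recent turns as fit in an approximate token budget, so the system instructions are never the part that gets pushed out. The names (ChatMessage, approxTokens, trimToBudget) are hypothetical, and the 4-characters-per-token ratio is a rough assumption, not the Llama tokenizer; you would pick a budget under the ~9K tokens where the problems were observed.

```typescript
// Rough heuristic (an assumption, not the model's real tokenizer):
// about 4 characters per token for English text.
const CHARS_PER_TOKEN = 4;

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function approxTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Keep all system messages, then walk backwards from the newest turn,
// adding messages until the approximate budget would be exceeded.
// Older turns are dropped first; the kept turns stay contiguous.
function trimToBudget(messages: ChatMessage[], budgetTokens: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  let used = system.reduce((sum, m) => sum + approxTokens(m.content), 0);

  const rest = messages.filter((m) => m.role !== "system");
  const kept: ChatMessage[] = [];
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = approxTokens(rest[i].content);
    if (used + cost > budgetTokens) break;
    kept.unshift(rest[i]);
    used += cost;
  }
  return [...system, ...kept];
}
```

Stopping at the first turn that does not fit (rather than skipping it and continuing) avoids sending a history with gaps in the middle of the conversation.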


Razmjoo•10/28/25, 11:19 PM

I have the same issue with gpt-oss-120b: it doesn't follow the JSON format from time to time. I'm unsure whether these models are quantized or something else is going on.
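Until the cause is known, a defensive parsing layer can at least turn an off-format reply into a detectable failure. This is a minimal sketch, assuming the expected output is a single JSON object that the model sometimes wraps in prose or a code fence; extractJsonObject is a hypothetical helper, and a null result is the signal to retry or fall back.

```typescript
// Try to recover a JSON object from a model reply that may wrap it in
// prose or a ```json fence. Returns null if nothing parses, so the caller
// can retry the request or fall back.
function extractJsonObject(reply: string): unknown {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    return JSON.parse(reply.slice(start, end + 1));
  } catch {
    return null;
  }
}
```

This only rescues replies that contain valid JSON somewhere; it does not make the model follow a schema, so validating the parsed fields is still the caller's job.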


David Raphi (OP)•10/28/25, 11:28 PM

@Ahmad Awais @John Spurlock @nora @samjs


Isaac McFadyen•10/29/25, 12:52 AM

?pings

SuperHelpflare•10/29/25, 12:52 AM

Please do not ping community members for non-moderation reasons. Doing so will not solve your issue faster and will make people less likely to want to help you.

muhtasim•10/29/25, 9:03 AM

Hello
I'm suddenly getting this error from Workers AI; the system was working relatively well a few days ago:

{
  "httpCode": 408,
  "internalCode": 3046,
  "message": "AiError: AiError: Request timeout (a22d7817-d6c4-42a8-b5fb-1693703a1845)",
  "name": "AiError",
  "skipSentry": true,
  "description": "Request timeout"
}


What could be the reason? I can't find internal code 3046 documented anywhere. Could it be that I'm hitting the max tokens limit? I'm sending base64 images.

I'm using the model @cf/google/gemma-3-12b-it.

Can anyone help? Any suggestions are welcome.
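One thing worth checking with base64 images is how large the request body actually gets, since base64 encodes every 3 bytes as 4 characters. This sketch just does that arithmetic; the helper names are hypothetical, and whether payload size is what triggers code 3046 is not established in this thread.

```typescript
// Base64 encodes every 3 input bytes as 4 output characters (with padding),
// so the encoded size is about 33% larger than the raw image.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

// Length of a data URL such as "data:image/png;base64,...." for an image of rawBytes bytes.
function dataUrlLength(mimeType: string, rawBytes: number): number {
  return `data:${mimeType};base64,`.length + base64Length(rawBytes);
}
```

For example, a 3,000,000-byte photo becomes 4,000,000 characters of base64 before any prompt text is added, so downscaling or recompressing images before encoding is a cheap experiment.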

DragoDiKomo•10/29/25, 3:58 PM

Sometimes I'm getting this error: Error: error code: 1031

This happens while trying to use AI via an OpenNext worker.

zsueo•10/29/25, 7:14 PM

Hey, I've noticed a fairly large uptick in how often I get "AiError: Capacity temporarily exceeded, please try again." Is Cloudflare Workers AI stable enough to rely on, or will this issue persist?

zsueo•10/29/25, 7:16 PM

Right now I'm getting 500 errors from Workers AI through the Cloudflare REST API.
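Nothing in this thread says how Cloudflare recommends handling these errors, but a common client-side pattern for transient failures like "Capacity temporarily exceeded" or a request timeout is retrying with exponential backoff and jitter. This is a minimal sketch; withRetry and looksTransient are hypothetical names, and which error messages count as transient is an assumption you would tune.

```typescript
// Retry an async call with exponential backoff plus jitter.
// Which errors are retryable is decided by the caller-supplied predicate.
async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 4,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (!isRetryable(err) || attempt === maxAttempts - 1) break;
      // 500 ms, 1 s, 2 s, ... plus up to 50% random jitter.
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay + Math.random() * delay * 0.5));
    }
  }
  throw lastError;
}

// Assumed predicate: treat the error messages quoted in this channel as transient.
function looksTransient(err: unknown): boolean {
  const msg = err instanceof Error ? err.message : String(err);
  return /Capacity temporarily exceeded|Request timeout/.test(msg);
}
```

Jitter spreads retries from many clients apart so they do not all hit the service again at the same instant; for non-interactive workloads, the Batch API mentioned later in this thread is another option.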

Celestial Rose•10/29/25, 10:29 PM

Hey there!

I'm trying to use MeloTTS in French but I get this error. The documentation does mention French (and MeloTTS itself should support many more languages, by the way).
Any guesses?

Capture_decran_2025-10-30_a_00.28.14.png

Capture_decran_2025-10-30_a_00.28.40.png


Chaika•10/30/25, 12:20 AM

Last time this was mentioned, they said they only support English (en); see the earlier message in workers-ai.


Celestial Rose•10/30/25, 6:52 AM

Thanks! I usually have the reflex to search first, but I forgot!
Alright, but then it's either supporting other languages or fixing the docs, right? How can I contribute?


David Raphi (OP)•10/30/25, 7:07 AM

I was surprised to see that this Discord channel has fewer than 20 messages a day overall. Great news: either Cloudflare is perfect and has no issues, or people have no hope of finding solutions here. I run a Discord channel with more than 50 messages a day, and I find it manageable. If one ping disturbed you, I'm really sorry, and thank you for the help. It was really helpful getting nothing.


Ashkan•10/30/25, 8:51 AM

Maybe this is your answer: "Batch processing is useful for large workloads such as summarization or embeddings when there is no human interaction. Using the batch API will guarantee that your requests are fulfilled eventually, rather than erroring out if Cloudflare does not have enough capacity at a given time" https://developers.cloudflare.com/workers-ai/features/batch-api/

Cloudflare Docs

Asynchronous Batch API

Asynchronous batch processing lets you send a collection (batch) of inference requests in a single call. Instead of expecting immediate responses for every request, the system queues them for processing and returns the results later.

KZ•10/30/25, 3:24 PM

Which Cloudflare team owns web-bot-auth, specifically HTTP Message Signatures (RFC 9421)? Is it this group or agents?

Fra3957•10/31/25, 8:59 AM

Hi everyone — I have a question, and I’m not sure if this is the right place to ask. I’m trying to understand which Cloudflare product would be the best fit for my use case.

I’m building an application that, when a user clicks a button, needs to trigger an external process that runs independently from the app itself.
In my view, this process should follow three main steps:
1. Fetch data from the database
2. Call an LLM many times in parallel, maximizing throughput within a limited execution time
3. Write the results back to the database

This entire operation should happen outside the main application, which will later display the updated data once the process is completed. The goal is to run this background workflow efficiently and reliably on Cloudflare’s infrastructure.

I’m considering different ways to orchestrate these jobs:

(A) The application inserts items into a queue on Supabase and then triggers a process that reads and processes the queue.

(B) The queue itself automatically triggers the processing when new messages arrive, though I’m concerned this might rely on polling, which doesn’t seem the most elegant or efficient approach.

(C) The application sends the data directly for processing at the time of activation, without using any intermediate queue.

Each user can initiate this process multiple times in parallel, so concurrency, execution duration, and timeout management are important considerations.

Question: Based on these requirements, which Cloudflare products or combination of products would be best suited to implement this architecture?

If this isn’t the right place to ask — or if Workers / Workers AI aren’t the right tools for this kind of problem — could you please point me to the best channel or community to ask this question?
Thanks a lot!
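Whichever Cloudflare product ends up running the background job, step 2 (calling an LLM many times in parallel) usually needs a cap on in-flight requests so a single job does not trip rate limits or capacity errors. This is a minimal sketch of a bounded-concurrency map; mapWithConcurrency is a hypothetical helper, and `worker` stands in for whatever function makes one LLM call.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once,
// preserving input order in the results.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index. JavaScript runs
  // this synchronous claim without interleaving, so no index is taken twice.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

The limit becomes the knob for trading throughput against the execution-time and rate-limit constraints mentioned above; if one call rejects, the whole map rejects, so per-item error handling belongs inside `worker`.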

ChromaToad•10/31/25, 1:41 PM

Are there any plans to make the toMarkdown functionality more customizable? For example, being able to tell the LLM what you're specifically looking for in its Markdown summary?


Celestial Rose•10/31/25, 1:50 PM

Agreed! I also found that images within a PDF don't get OCR'd: they're "described" like a caption rather than read as text. They should be.

Sytrex•11/1/25, 8:14 PM

I have a problem with the tunnels; here is a picture.

achesui•11/2/25, 3:33 AM

Hello, does anyone know why, when connecting to Deepgram Nova-3 over WebSockets, I'm billed (neurons) for the duration of the WebSocket connection rather than for the input audio? I ran some tests connecting to the WebSocket with no mic and no audio, but Cloudflare still bills every minute just for being connected to Deepgram. I'm confused: it's supposed to bill by usage (audio input: 0.0092 per minute). Am I missing something?

samjs•11/2/25, 3:11 PM

Hey @achesui. That's correct -- we treat the entire websocket connection as if you were continuously sending audio and so the usage applies to the connection. When you create the websocket connection it's connecting to the model/GPU and allocating that capacity for you so that it can respond in realtime with low latency. We're not currently doing anything like reallocating the capacity if the websocket connection is idle.

First of all, we can make the docs clearer on that. Out of curiosity though, do you have a use case for creating the websocket a while before receiving any data?

bigpoppaenzo•11/2/25, 3:18 PM

Good morning, I'm new to this Discord and wanted to see if I could get help with an issue. I can log into my banking app/website on every device except mine. I've followed Cloudflare's troubleshooting process and still can't log in on my iPhone. Does anyone have the same problem, and if so, how did you fix it?


samjs•11/2/25, 5:04 PM

Hey @bigpoppaenzo -- I think you may want the #turnstile channel. I see a few people there discussing a similar issue.

yinxingmaiming6409•11/3/25, 2:39 AM

Does Workers AI not have a streaming option? Can't it be used in Cherry Studio?

Tim•11/3/25, 4:50 AM

Hello, I'm using Deepgram Flux, but I'm having an issue: I connect to the WebSocket, but the open event doesn't fire and I don't receive a "connected" message frame in the message listener.

The odd thing is that this seems intermittent. Every couple of minutes or so, when I establish the connection, it works. However, when I try again with a new connection, the WebSocket fails. For context, I have a phone AI system: when a call is established, I connect to the various systems we use.

Ideally, I'd like to use Flux via CF, but the connection seems intermittent. A WebSocket connection is established for every incoming call, and I close it when the call ends.

What happens is that I establish the connection, and sometimes I get logs telling me the WebSocket opened. Other times I get no feedback, but it finally connects after a second or two instead of almost instantly.

Then I'll try yet another call, and this one simply won't establish the connection. I'm not seeing any errors, and I get nothing in the logs from the open listener.

The "connected" message frame that I believe Flux sends never arrives either. Is there something I might be doing wrong?

Tim•11/3/25, 4:51 AM

I've also noticed that the log entry confirming I've successfully closed the connection sometimes takes a while to appear.
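This does not diagnose why the open event goes missing, but when it never fires, the call silently hangs. A timeout around the open handshake turns that silence into an error you can log and retry. This is a minimal sketch: OpenableSocket and waitForOpen are hypothetical names, the interface is meant to describe a standard WebSocket's event API, and Flux's "connected" frame is not modeled.

```typescript
// Minimal shape of a WebSocket-like object (open/error/close events plus close()).
interface OpenableSocket {
  addEventListener(type: "open" | "error" | "close", listener: () => void): void;
  removeEventListener(type: "open" | "error" | "close", listener: () => void): void;
  close(): void;
}

// Resolve when the socket opens; reject if it errors, closes, or takes too long.
function waitForOpen(socket: OpenableSocket, timeoutMs: number): Promise<void> {
  return new Promise((resolve, reject) => {
    const cleanup = () => {
      clearTimeout(timer);
      socket.removeEventListener("open", onOpen);
      socket.removeEventListener("error", onFail);
      socket.removeEventListener("close", onFail);
    };
    const onOpen = () => {
      cleanup();
      resolve();
    };
    const onFail = () => {
      cleanup();
      reject(new Error("WebSocket failed before opening"));
    };
    const timer = setTimeout(() => {
      cleanup();
      socket.close(); // abandon this attempt so it doesn't linger half-open
      reject(new Error(`WebSocket did not open within ${timeoutMs} ms`));
    }, timeoutMs);
    socket.addEventListener("open", onOpen);
    socket.addEventListener("error", onFail);
    socket.addEventListener("close", onFail);
  });
}
```

Removing the listeners on every path keeps a timed-out attempt from later firing callbacks into a call that has already moved on to a retry.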

Tim•11/4/25, 9:52 PM

Hello, do you plan to offer text-to-speech over WebSockets, or in formats other than MPEG?


samjs•11/4/25, 9:57 PM

I believe aura-2 supports websockets: https://developers.cloudflare.com/workers-ai/models/aura-2-en/


Celestial Rose•11/4/25, 10:00 PM

Any update ?

Tim•11/4/25, 10:02 PM

I tried aura-2 yesterday for websockets, using the websocket: true option, but it didn't return a websocket

Tim•11/4/25, 10:07 PM

I'll try again to make sure I'm not crazy.

yinxingmaiming6409•11/5/25, 5:32 AM

It seems that many endpoints are not fully OpenAI-compatible.

yinxingmaiming6409•11/5/25, 5:32 AM

I hope you'll add the qwen3-vl model.

Aiden•11/5/25, 8:29 PM

Anyone else getting AiError 9002 today? Seems to be a service issue

Aiden•11/5/25, 11:04 PM

It looks like we get the 9002 internal error specifically when editing vector DBs.

DRPower•11/6/25, 3:25 AM

Hello everyone, I'm trying to figure out how to fix the following deploy error from Cloudflare VibeSDK. I keep getting:

▲ [WARNING] You are about to publish a Workers Service that was last published via the Cloudflare Dashboard.
10:18:17.676 Edits that have been made via the dashboard will be overridden by your local code and config.
10:18:17.676 ✘ [ERROR] A request to the Cloudflare API (/accounts/44488d79973a81689876492e372fe199/workers/scripts/october-vibesdk) failed.
10:18:17.677 Cannot apply new-class migration to class 'DORateLimitStore' that is already depended on by existing Durable Objects [code: 10074]
10:18:17.677 If you think this is a bug, please open an issue at: https://github.com/cloudflare/workers-sdk/issues/new/choose
Logs were written to "/opt/buildhome/.config/.wrangler/logs/wrangler-2025-11-05_15-18-11_223.log"
10:18:17.681 error: script "deploy" exited with code 1
10:18:17.684 Failed: error occurred while running deploy command


zsueo•11/6/25, 12:51 PM

Are there any plans to open-source Omni? I'd love to contribute if that's possible.

Celestial Rose•11/6/25, 3:56 PM

Hey! On https://developers.cloudflare.com/workers-ai/models/llama-4-scout-17b-16e-instruct/ the content type for Llama 4 can be "text" or "image_url", but the schema lists it simply as "string". It would be more informative as a union type.
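To illustrate the union-type suggestion, here is a discriminated union in TypeScript where the "type" field selects which other fields exist. This is illustrative only: the part shapes follow the common OpenAI-style "text" / "image_url" layout and are an assumption, not the official Workers AI schema, and describePart is a hypothetical function.

```typescript
// Illustrative content-part union; field names are assumed, check the model page.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type ContentPart = TextPart | ImagePart;

// The compiler narrows `part` on its `type` field, so each branch
// can only access the fields that variant actually has.
function describePart(part: ContentPart): string {
  if (part.type === "text") {
    return `text(${part.text.length} chars)`;
  }
  return `image(${part.image_url.url.slice(0, 30)})`;
}
```

With a plain "string" type, a typo like "image-url" would type-check; with the union, the compiler rejects it and editors can autocomplete the valid values.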

Capture_decran_2025-11-06_a_17.55.58.png

Cloudflare Docs

llama-4-scout-17b-16e-instruct

Meta's Llama 4 Scout is a 17 billion parameter model with 16 experts that is natively multimodal. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

Celestial Rose•11/6/25, 4:31 PM

While experimenting with llama-4-scout in Workers AI via env.AI, the model stops abruptly almost 100% of the time and doesn't finish its reply, even in a single-message chat.
Do we know if it's a temporary issue? Has anyone else experienced this?


Celestial Rose•11/6/25, 4:32 PM

This happens both with and without streaming. In stream mode, a DONE is sent before the reply is finished, so it's not the Worker failing.

itspauv•11/6/25, 7:18 PM

Hey, any chance we'll get the new DeepSeek OCR model (or any other OCR model, for that matter)?

yinxingmaiming6409•11/8/25, 1:22 PM

Does anyone know when qwen3-vl will be supported?

Jadu•11/8/25, 5:12 PM

@SuperHelpflare can you help me with an AI document writer? I'm getting an error in it.

Jadu•11/8/25, 5:13 PM

@SuperHelpflare

Jadu•11/8/25, 5:13 PM

Why isn't it working?

Celestial Rose•11/8/25, 9:39 PM

Opened a PR for Llama4 images to work for the AI SDK https://github.com/cloudflare/ai/pull/303
This follows the documentation found here: https://developers.cloudflare.com/workers-ai/models/llama-4-scout-17b-16e-instruct/

GitHub

Add support for base64 images in chat messages by celestial-rose ·...

Handle base64 image data for Llama 4 in chat messages.
As per the documentation found here https://developers.cloudflare.com/workers-ai/models/llama-4-scout-17b-16e-instruct/


Celestial Rose•11/8/25, 9:55 PM

I forgot that I also had to change the "text" case so it outputs an object for it to work. I've put the PR back to draft while I figure out how to adapt it without breaking anything. The inputs need to have "type": "text".
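The "text" case described here amounts to normalizing message content so a plain string becomes an object tagged with "type": "text". This is a minimal sketch of that idea, not the code in the PR; the Part type and toContentParts helper are hypothetical, and the part shapes are assumed.

```typescript
// Assumed content-part shapes (not taken from the PR or the official schema).
type Part =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Accept either a plain string or an array of parts, and always return an
// array of typed parts, so a bare string becomes { type: "text", text }.
function toContentParts(content: string | Part[]): Part[] {
  return typeof content === "string" ? [{ type: "text", text: content }] : content;
}
```

Because arrays pass through unchanged, callers that already send typed parts keep working, which is one way to avoid breaking existing inputs.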

Andrew Hansen•11/9/25, 3:53 AM

Are there plans to make GPUs available for hosting custom models someday? We have a custom model we're currently integrating.
