Hey @James , @carter and others. Apologies for the slow progress here, and I understand and share the frustration.

We are absolutely working on fixing these issues, and we hope to have an update soon. We've gotten a little backed up on some work here, but should be getting around to it soon. To give you an idea of the kinds of things we've been working on:

- Improved platform stability and performance: we've recently rolled out a lot of "invisible" changes to make the platform more stable under things like high load and to return successful responses more consistently. While there will still be occasions when people see 3040 errors due to limited capacity, these should be much less frequent than before.
- For the gpt-oss models, we've been waiting for some vLLM changes to land before pushing out more updates. In hindsight we shouldn't have let ourselves be blocked on this, and we're working to address this by seeing if we can build this ourselves internally. That's going to bring streaming, chat/completions support, and support for async requests, along with a number of bug fixes.
- Support for realtime audio models like deepgram/flux, and websocket support for those.

On the DX front -- all the feedback that gets shared here/in GitHub is absolutely heard internally. E.g. you have @Kevin and me, both core members of the Workers AI team, checking this regularly.

James•10/23/25, 3:33 PM

Thanks Sam, excited for updates. I'd love to see DX around types with model releases a clear focus, so that models aren't added to the platform with an immediately negative DX. Other platforms have types available with great DX on day one or two, especially for platforms they advertise as a "partner", but some models on Cloudflare end up weeks, or even months out of date, if not missing entirely.

samjsOP•10/23/25, 3:37 PM

Agreed. We have a process and tooling problem we need to fix here. On the former, as you say, we've been way too slow to update types along with model releases. And on the latter, updating types is currently a somewhat manual process for us (which obviously feeds into the former).

samjsOP•10/23/25, 3:38 PM

Kevin and I have been discussing a lot how we want to improve it. We'll get better.

James•10/23/25, 3:40 PM

Great to hear. Let me know if there's anything I can do to help here!

carter•10/23/25, 3:41 PM

Thanks for the updates Sam! Appreciate the communication.

samjsOP•10/23/25, 3:46 PM

Thanks James! Will share with you updated types for feedback when we have them

! Tᐯ-GᗩᗰE.ᗪEᐯ•10/24/25, 10:46 AM

How can I send a request to a model with an MCP server?

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/meta/llama-4-scout-17b-16e-instruct \
  -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_AUTH_TOKEN" \
  -d '{ "messages": [{ "role": "system", "content": "You are a friendly assistant" }, { "role": "user", "content": "Why is pizza so good" }]}'

! Tᐯ-GᗩᗰE.ᗪEᐯ•10/24/25, 10:47 AM

my server: https://mcp.exa.ai/mcp

Razmjoo•10/25/25, 12:34 PM

Hi guys, any active incidents here? I am getting a 408 error on @cf/openai/gpt-oss-120b:

{
  "error": {
    "code": "invalid_prompt",
    "message": "AiError: AiError: Request timeout (a613c7db-647c-4c07-ba94-a5702bc99d52)"
  }
}

Razmjoo•10/25/25, 12:42 PM

@sunbuhui do I click on general support or bug report? unsure which team is it?

Razmjoo•10/25/25, 12:46 PM

@sunbuhui could you please point it out;

Razmjoo•10/25/25, 12:48 PM

done

Razmjoo•10/25/25, 1:00 PM

Seriously, dude? @sunbuhui You're trying to scam people for their wallets in a Cloudflare channel full of security experts, while someone is debugging an error?

Razmjoo•10/25/25, 1:01 PM

@samjs would you please advise, and also ban the scammer user?

zegevlier•10/25/25, 1:03 PM

Thank you for the report, I have banned the user. For the next time, you can ping the green role (community champions) for moderation issues like this

zegevlier•10/25/25, 1:04 PM

And just to confirm, yes, this is a scam. Cloudflare will never ask you to join another discord server for support

Razmjoo•10/25/25, 1:04 PM

He made zero effort; he asked directly for 'wallet info'. What an impatient scammer.

samjsOP•10/25/25, 2:45 PM

Hey @Razmjoo , thanks for reporting! And thank you Zegevlier for jumping on it.

With respect to the timeouts: I'm not seeing any indications of an outage. I can see the spike in inference times that would have caused the timeouts, but there's no correlation with location/server or anything like that to suggest there's a bug. And no other accounts were experiencing any spikes. Were the inputs particularly long? E.g. using reasoning high as well?

Razmjoo•10/25/25, 6:32 PM

@samjs the inputs were a bit long (maybe around 50-60 KB), but the reasoning was default.

Razmjoo•10/25/25, 6:33 PM

I mean the total message (system + user + JSON schema) was around 50-60 KB.

Razmjoo•10/25/25, 7:10 PM

I have reduced it to 19 KB, but I still get the same issue.

Iñaki Fuentes•10/27/25, 8:31 AM

Hey team, anyone know what’s going on here? Looks like the counters reset and show 10k neurons available

but I’m still getting an API error.

Is this expected or is something borked? Lmk if you’ve seen this too.

  "error": {
        "message": "4006: you have used up your daily free allocation of 10,000 neurons, please upgrade to Cloudflare's Workers Paid plan if you would like to continue usage.",
        "stage": "pipeline",
        "code": "internal_error"
    },

    "timestamp": "2025-10-27T08:26:51.928Z"

Iñaki Fuentes•10/27/25, 9:04 AM

Paying is always a solution. Solved...

MichaelC•10/27/25, 12:26 PM

Are there changes ongoing with the llama 4 model? The latency has gone through the roof for about a week now and it is very unstable. Can see some data here as well https://delay.chaika.me/ai/@cf/meta/llama-4-scout-17b-16e-instruct.

ItsAsheer•10/28/25, 7:19 AM

Hi, I've been trying to connect LiteLLM (an LLM proxy) to Cloudflare, and when I try to send a message I get an error (see below). Doing this with other services doesn't give me this error, so I believe it's Cloudflare.

Error code: 400 - {'success': False, 'errors': [{'code': 7000, 'message': 'No route for that URI'}], 'messages': [], 'result': None}

Base Url: https ://api.cloudflare.com/client/v4/accounts/{my account id}/ai/v1

(the space between https and :// is on purpose to not format it like a link)
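One thing worth checking with a route error like this is the full path the client ends up requesting, since OpenAI-style clients append their own suffix to the base URL. The sketch below only composes URLs and sends nothing. It assumes the client appends the OpenAI-style `/chat/completions` suffix to the base URL quoted above; the function names and the account ID are hypothetical placeholders, and this is not a confirmed diagnosis of the LiteLLM error.

```python
from urllib.parse import quote

# Root of the Cloudflare REST API, as it appears in the base URL above.
API_ROOT = "https://api.cloudflare.com/client/v4"


def openai_compat_base_url(account_id: str) -> str:
    """Build the base URL in the same shape as the one quoted in the message."""
    return f"{API_ROOT}/accounts/{quote(account_id, safe='')}/ai/v1"


def chat_completions_url(base_url: str) -> str:
    """Append the OpenAI-style chat route (an assumption about the client)."""
    # Strip a trailing slash first so we never produce "v1//chat/completions".
    return base_url.rstrip("/") + "/chat/completions"


if __name__ == "__main__":
    # "YOUR_ACCOUNT_ID" is a placeholder, not a real account.
    base = openai_compat_base_url("YOUR_ACCOUNT_ID")
    print(chat_completions_url(base))
```

Comparing the printed URL with the one the proxy actually requests (for example from its debug logs) would show whether an extra or missing path segment is involved.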

si.kiskre.•10/28/25, 9:12 AM

Workers Paid vs. free:
isn't 300,000 per month equal to 10,000 per day?

zegevlier•10/28/25, 12:12 PM

Pretty much, yea. The difference is that with the 300k/month, you can (I think) use all of them up in a single day, then nothing for the rest of the month. With 10k/day, you cannot use any more than those 10k that day
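The difference described above can be made concrete with a little arithmetic. This is a sketch under stated assumptions: a 30-day month, and a monthly pool that can be spent in a single day (which the reply above itself hedges with "I think"). The function names are made up for illustration.

```python
DAILY_ALLOWANCE = 10_000      # per-day cap from the discussion
MONTHLY_ALLOWANCE = 300_000   # per-month pool from the discussion
DAYS_IN_MONTH = 30            # assumption: a 30-day month


def usable_with_daily_cap(demand_per_day):
    """Each day is capped on its own; unused allowance does not carry over."""
    return sum(min(day, DAILY_ALLOWANCE) for day in demand_per_day)


def usable_with_monthly_pool(demand_per_day):
    """Assumed behavior: one shared pool that can be spent on any day."""
    return min(sum(demand_per_day), MONTHLY_ALLOWANCE)


# Steady usage: 10,000 every day for the whole month.
steady = [10_000] * DAYS_IN_MONTH
# Bursty usage: everything on the first day, nothing afterwards.
bursty = [300_000] + [0] * (DAYS_IN_MONTH - 1)

if __name__ == "__main__":
    for name, demand in (("steady", steady), ("bursty", bursty)):
        print(name, usable_with_daily_cap(demand), usable_with_monthly_pool(demand))
```

For steady usage both schemes allow the same total; they only diverge when usage is concentrated in a few days.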

si.kiskre.•10/28/25, 12:40 PM

yup

si.kiskre.•10/28/25, 12:40 PM

idk

David Raphi•10/28/25, 11:07 PM

Looking for input from folks testing Cloudflare Workers AI!
I’m using llama-3.1-8b-instruct-fast (free tier) — works fine until prompts go past ~9K tokens, then it starts ignoring system instructions and hallucinating (even though it’s supposed to support 128K context).
Anyone found free-tier models on Cloudflare that handle large contexts more reliably, or just work best for chat systems?
I’m testing a bunch and trying to build a list of the top free-tier models — any pointers would be awesome!

Razmjoo•10/28/25, 11:19 PM

I have the same issue with gpt-oss-120b: it doesn't follow the JSON format from time to time. I am unsure whether these models are quantized or whether something else is going on.

David Raphi•10/28/25, 11:28 PM

@Ahmad Awais @John Spurlock @nora @samjs

Isaac McFadyen•10/29/25, 12:52 AM

?pings

SuperHelpflare•10/29/25, 12:52 AM

Please do not ping community members for non-moderation reasons. Doing so will not solve your issue faster and will make people less likely to want to help you.

muhtasim•10/29/25, 9:03 AM

Hello
I am suddenly getting this error from Workers AI; the system was working relatively well a few days ago.

{
  "httpCode": 408,
  "internalCode": 3046,
  "message": "AiError: AiError: Request timeout (a22d7817-d6c4-42a8-b5fb-1693703a1845)",
  "name": "AiError",
  "skipSentry": true,
  "description": "Request timeout"
}

What could be the reason? I can't find the internal code 3046 documented. Could it be that I am hitting the max tokens? I ask because I am using base64 images.

I am using this model

@cf/google/gemma-3-12b-it

Can anyone help me please? any suggestions are welcome

DragoDiKomo•10/29/25, 3:58 PM

Sometimes I'm getting this error: Error: error code: 1031

while trying to use AI via an OpenNext worker.

zsueo•10/29/25, 7:14 PM

Hey, I've noticed a fairly large uptick in how often I get "AiError: Capacity temporarily exceeded, please try again." Is Cloudflare Workers AI stable enough to be relied on, or will this issue persist into the future?

zsueo•10/29/25, 7:16 PM

I mean right now I am getting 500 errors from workers ai through the cloudflare rest api
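Since the capacity error itself says "please try again", a common client-side mitigation is retrying with exponential backoff and jitter. The following is a minimal sketch, not an official recommendation: `TransientAiError` is a hypothetical exception standing in for whatever capacity or timeout error your client raises, and the attempt count and delays are arbitrary defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientAiError(Exception):
    """Hypothetical stand-in for a capacity-exceeded or timeout failure."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient failures with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientAiError:
            # Give up after the last allowed attempt and surface the error.
            if attempt == max_attempts - 1:
                raise
            # Ceiling doubles each attempt: 0.5s, 1s, 2s, ... capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # "Full jitter": wait a random time between 0 and the ceiling.
            sleep(random.uniform(0, ceiling))
    raise RuntimeError("max_attempts must be at least 1")


if __name__ == "__main__":
    failures = iter([True, True, False])

    def flaky() -> str:
        # Fails twice, then succeeds; simulates a temporarily busy backend.
        if next(failures):
            raise TransientAiError("Capacity temporarily exceeded")
        return "ok"

    # A no-op sleep keeps the demo instant.
    print(call_with_backoff(flaky, sleep=lambda seconds: None))
```

Retries only smooth over short capacity dips; they do not help if the errors persist for long periods.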

Celestial Rose•10/29/25, 10:29 PM

hey there !

trying to use MeloTTS in French but I get this error, the documentation does mention French (and MeloTTS actually should support way more languages btw)
any guesses ?

Capture_decran_2025-10-30_a_00.28.14.png

Capture_decran_2025-10-30_a_00.28.40.png

Chaika•10/30/25, 12:20 AM

Last time it was mentioned, they said they only support en: workers-ai

Celestial Rose•10/30/25, 6:52 AM

Thanks! I usually have the reflex of looking things up first, but I forgot!
Alrighty, but still, it's either a matter of offering the other languages or fixing the docs, right? How can I contribute?

David Raphi•10/30/25, 7:07 AM

I was surprised to see that this Discord channel gets fewer than 20 messages a day overall. Great news: either Cloudflare is perfect and has no issues, or people don't have any hope of finding solutions here. I manage a Discord channel with more than 50 messages a day, and I find it OK to handle. If one ping disturbed you, I am really sorry for that, and thank you for the help. It was really helpful getting nothing.

Ashkan•10/30/25, 8:51 AM

Maybe this is your answer: "Batch processing is useful for large workloads such as summarization or embeddings when there is no human interaction. Using the batch API will guarantee that your requests are fulfilled eventually, rather than erroring out if Cloudflare does not have enough capacity at a given time" https://developers.cloudflare.com/workers-ai/features/batch-api/

Cloudflare Docs

Asynchronous Batch API

Asynchronous batch processing lets you send a collection (batch) of inference requests in a single call. Instead of expecting immediate responses for every request, the system queues them for processing and returns the results later.

KZ•10/30/25, 3:24 PM

Which Cloudflare group owns web-bot-auth, specifically HTTP Message Signatures (RFC 9421)? Is it this group or Agents?

Fra3957•10/31/25, 8:59 AM

Hi everyone — I have a question, and I’m not sure if this is the right place to ask. I’m trying to understand which Cloudflare product would be the best fit for my use case.

I’m building an application that, when a user clicks a button, needs to trigger an external process that runs independently from the app itself.
In my view, this process should follow three main steps:
1. Fetch data from the database
2. Call an LLM many times in parallel, maximizing throughput within a limited execution time
3. Write the results back to the database

This entire operation should happen outside the main application, which will later display the updated data once the process is completed. The goal is to run this background workflow efficiently and reliably on Cloudflare’s infrastructure.

I’m considering different ways to orchestrate these jobs:

(A) The application inserts items into a queue on Supabase and then triggers a process that reads and processes the queue.

(B) The queue itself automatically triggers the processing when new messages arrive, though I’m concerned this might rely on polling, which doesn’t seem the most elegant or efficient approach.

(C) The application sends the data directly for processing at the time of activation, without using any intermediate queue.

Each user can initiate this process multiple times in parallel, so concurrency, execution duration, and timeout management are important considerations.

Question: Based on these requirements, which Cloudflare products or combination of products would be best suited to implement this architecture?

If this isn’t the right place to ask — or if Workers / Workers AI aren’t the right tools for this kind of problem — could you please point me to the best channel or community to ask this question?
Thanks a lot!
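Independent of which product ends up hosting it, step 2 of the workflow above (many LLM calls in parallel, with a bound on concurrency) can be sketched as follows. This is an illustration only: `fake_llm` is a hypothetical placeholder for a real model call, the database fetch and write-back of steps 1 and 3 are left out, and the concurrency limit of 8 is an arbitrary default.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

# Type of an async function that takes a prompt and returns model output.
LlmCall = Callable[[str], Awaitable[str]]


async def fake_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM request."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return prompt.upper()


async def process_batch(
    prompts: Sequence[str],
    llm: LlmCall = fake_llm,
    max_concurrency: int = 8,
) -> List[str]:
    """Run one LLM call per prompt, never more than max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(prompt: str) -> str:
        # The semaphore blocks here once max_concurrency calls are in flight.
        async with semaphore:
            return await llm(prompt)

    # gather preserves input order, so results line up with the prompts.
    return list(await asyncio.gather(*(worker(p) for p in prompts)))


def run_job(prompts: Sequence[str], max_concurrency: int = 8) -> List[str]:
    """Synchronous entry point: step 1 would supply prompts, step 3 would store results."""
    return asyncio.run(process_batch(prompts, max_concurrency=max_concurrency))


if __name__ == "__main__":
    print(run_job(["first row", "second row", "third row"], max_concurrency=2))
```

Bounding concurrency this way keeps each job from flooding the model endpoint, which matters when several users trigger jobs at the same time.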

ChromaToad•10/31/25, 1:41 PM

Are there any plans to make the toMarkdown functionality more customizable? Like being able to tell the llm what specifically you're looking for in its summary conversion to markdown?

Celestial Rose•10/31/25, 1:50 PM

Agreed! I also found that the images within a PDF don't get OCR'd. They're being "described" like a caption, but not analyzed as text. It should

Sytrex•11/1/25, 8:14 PM

i have a problem with the tunnels? here is a picture
