Looking for input from folks testing Cloudflare Workers AI!
I’m using llama-3.1-8b-instruct-fast (free tier). It works fine until prompts go past ~9K tokens, then it starts ignoring system instructions and hallucinating, even though it’s supposed to support 128K context.
Anyone found free-tier models on Cloudflare that handle large contexts more reliably, or just work best for chat systems?
I’m testing a bunch and trying to build a list of the top free-tier models, so any pointers would be awesome!









