Hello! I was wondering if some features were on the roadmap, and if not, I'd like to request them! I think that enforcing JSON output, like Ollama and Llama.cpp do, would be a great feature for devs who want structured output they can parse consistently. Ollama lets you use
format: "json"
in your request and it handles applying the grammar for you. See here for an example: https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion
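For reference, here's a rough sketch of what that looks like against Ollama's API (the model name and local port are just the defaults I'm assuming on my end):

```python
import requests

# Minimal sketch of the Ollama call described in the linked docs.
# Assumes a local Ollama server on its default port and a pulled "llama2" model.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",
        "prompt": "List three fruits as a JSON object under the key 'fruits'.",
        "format": "json",   # Ollama constrains the output to valid JSON
        "stream": False,
    },
)
print(resp.json()["response"])  # a JSON string that can be parsed directly
```

Something equivalent here would save everyone from writing their own retry-and-reparse logic around malformed output.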

Another feature I'd like to see is the option to return token counts along with the response. It's hard to judge my users' usage without token counts, and I don't really want to add an extra step just to calculate them.
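Right now the workaround is something like the sketch below (the tokenizer and model name are purely for illustration): load the model's tokenizer separately on the client and count tokens yourself, which duplicates work the server has already done.

```python
from transformers import AutoTokenizer

# Hypothetical client-side workaround: count tokens ourselves because the
# response doesn't report them. Assumes the served model's tokenizer is
# available on the Hugging Face Hub under this (example) name.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

prompt = "Summarize the last release notes."
completion = "The release adds JSON mode and assorted bug fixes."  # text returned by the API

prompt_tokens = len(tokenizer.encode(prompt))
completion_tokens = len(tokenizer.encode(completion))
print(prompt_tokens, completion_tokens)
```

Having those counts come back in the response itself would make usage tracking much simpler.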