Is there a way to use a custom model?
sherpa-onnx, though I'm not entirely sure how that fits in with how #workers-ai works on the backend.
While in beta it is free for now. Any idea how much it would cost once out of beta?

InferenceUpstreamError: must have required property 'prompt', must NOT have more than 6144 characters, must match exactly one schema in oneOf

InferenceUpstreamError: ERROR 3010: Invalid or incomplete input for the model: failed to decode JSON: Request is too large
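For the first error, the 6144-character cap is quoted straight from the error message. A minimal sketch of guarding against it before calling ai.run() (clampPrompt is a hypothetical helper, not part of the Workers AI API; whether truncation or chunking is appropriate depends on your use case):

```javascript
// The 6144-character limit comes from the error message above (assumption:
// it applies to the model being called; check that model's input schema).
const MAX_PROMPT_CHARS = 6144;

// Hypothetical helper: trim a prompt so upstream schema validation
// doesn't reject the request outright.
function clampPrompt(prompt) {
  return prompt.length > MAX_PROMPT_CHARS
    ? prompt.slice(0, MAX_PROMPT_CHARS)
    : prompt;
}

console.log(clampPrompt("x".repeat(10000)).length); // 6144
```

Truncation silently drops content, so for long inputs splitting into multiple requests is usually the better choice.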


I can reproduce this with @cf/microsoft/resnet-50 and @cf/runwayml/stable-diffusion-v1-5-img2img, and believe the same would hold for any model that accepts a large input. Changing { audio: jsArray } to { image: jsArray } and calling resnet-50 throws the same error. The input is serialized as a JSON array of numbers like [123,78,30,255,0,...], so at some point something has to give unless the request can grow unboundedly. In this case, there seems to be a limit of just below 10 million bytes. I used the following to test and measure:

const srcURL = "https://cdn.openai.com/whisper/draft-20220913a/micro-machines.wav";
const res = await fetch(srcURL);
const blob = await res.arrayBuffer();
const jsArray = [...new Uint8Array(blob)];
const input = { audio: jsArray };
console.log("Blob size: " + (jsArray.length / (1 << 20)).toFixed(1) + " MB"); // bytes -> MiB
console.log("Input array size: " + (jsArray.length / (1 << 18)).toFixed(1) + " MB"); // ~4 chars per byte once JSON-encoded
// ai.run() stringifies input array before calling internal fetch:
// const inpBody = JSON.stringify({ inputs: input });
// console.log("JSON size: " + (inpBody.length / (2 << 19)).toFixed(1) + " MB");
const response = await ai.run("@cf/openai/whisper", input);
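The JSON inflation can be measured without the Workers runtime at all. This standalone sketch assumes ai.run() wraps the input as { inputs: ... } (as the commented-out lines above suggest; that wrapping is an assumption about internals, not documented API) and shows each byte costing roughly 3-4 characters once serialized:

```javascript
// Approximate the JSON body ai.run() would send upstream.
// Assumption: input is wrapped as { inputs: ... } before stringifying.
function jsonPayloadBytes(byteArray) {
  return JSON.stringify({ inputs: { audio: byteArray } }).length;
}

// 1000 synthetic "bytes" covering the full 0-255 range.
const sample = Array.from({ length: 1000 }, (_, i) => i % 256);
const encoded = jsonPayloadBytes(sample);

// Each byte becomes 1-3 digits plus a comma, so expect ~3-4x inflation.
console.log(`raw: ${sample.length} B, JSON: ${encoded} B, ` +
            `inflation: ${(encoded / sample.length).toFixed(2)}x`);
```

At that inflation rate, a WAV of only ~2.5 MB produces a JSON body near the ~10 million byte limit observed above, which matches the errors.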