Some feedback on this. Although whisper might be limited, I believe "Request is too large" is a work

Some feedback on this.
Although whisper might be limited, I believe "Request is too large" is a Workers AI problem rather than a problem with the model. I tested and reproduced the exact same error with @cf/microsoft/resnet-50 and @cf/runwayml/stable-diffusion-v1-5-img2img, and believe that the same would hold for any model that accepts a large input. I used the following to test and measure:

// Assumes `ai` was created beforehand, e.g. const ai = new Ai(env.AI);
const srcURL  = "https://cdn.openai.com/whisper/draft-20220913a/micro-machines.wav";
const res     = await fetch(srcURL);
const blob    = await res.arrayBuffer();
const jsArray = [...new Uint8Array(blob)]; // one JS number per byte
const input   = { audio: jsArray };

// 2 << 19 is 1,048,576 (1 MiB), so this is the raw byte count in MB
console.log("Blob size: " + (jsArray.length / (2 << 19)).toFixed(1) + " MB");
// Dividing by 2 << 17 instead estimates 4 bytes per array element
console.log("Input array size: " + (jsArray.length / (2 << 17)).toFixed(1) + " MB");

// ai.run() stringifies input array before calling internal fetch:
//   const inpBody = JSON.stringify({ inputs: input });
//   console.log("JSON size: " + (inpBody.length / (2 << 19)).toFixed(1) + " MB");

const response = await ai.run("@cf/openai/whisper", input);

This is with 1.1.0 so that I can insert a logging statement, but changing to env.AI.run doesn't affect the outcome. The issue seems to be size, rather than dimensions (e.g. length, width, height). Changing { audio: jsArray } to { image: jsArray } and calling resnet-50 would throw the same error.

After fetching a 5 MB file, the worker has to make a copy to turn it into a ~20 MB array, assuming no overhead. The array is then stringified into a ~17 MB string. The receiving end would be faced with potentially parsing 17 MB of JSON with the format [123,78,30,255,0,...]. Unless there's a limit somewhere, then at some point something has to give. In this case, there seems to be a limit of just below 10 million bytes.
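
The ~17 MB figure can be sanity-checked without building the string. This is a minimal sketch, assuming the payload is a plain JSON array of byte values as in the snippet above; `jsonArrayLength` is a hypothetical helper name, not part of any SDK, and the per-byte average assumes uniformly distributed byte values, which real audio or image data will only approximate.

```javascript
// Hypothetical helper: length of JSON.stringify([...bytes]) computed
// without allocating the string. Each byte costs 1-3 digits plus a comma.
function jsonArrayLength(bytes) {
  if (bytes.length === 0) return 2; // "[]"
  let digits = 0;
  for (const b of bytes) {
    digits += b < 10 ? 1 : b < 100 ? 2 : 3;
  }
  const commas = bytes.length - 1;
  return digits + commas + 2; // +2 for the surrounding brackets
}

// Cross-check against the real serializer on a small input.
const sample = new Uint8Array([0, 10, 255]);
console.log(jsonArrayLength(sample), JSON.stringify([...sample]).length); // 10 10

// For uniformly distributed byte values: 10 one-digit, 90 two-digit and
// 156 three-digit values out of 256, plus one comma per element.
const avgCharsPerByte = (10 * 1 + 90 * 2 + 156 * 3) / 256 + 1;
console.log(avgCharsPerByte.toFixed(2)); // 3.57
```

At roughly 3.5 characters per byte, a 5 MB file would land somewhere around 17 to 18 MB once stringified, which is in line with the estimate above.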

The immediate and simple part of the problem is that developers typically don't have a good way to handle this. I mean, it's not like the above is common knowledge..

Codename_A•4/30/24, 2:20 PM

I'm unsure if using workers for this case would work:
I want to make OpenAI API requests in the backend without showing the user my API key. Something like an API proxy. Of course, I would then need to authenticate to make sure regular people couldn't just use that to make requests. Could Workers be a use case for this? Or another product? How would I do this?

In reply to Raylight: Some feedback on this. Although whisper might be limited, I believe "Request is ...

kingmesal•4/30/24, 2:25 PM

It would be nice if the backend model in these cases could take the URL to process ... it doesn't solve the problem but it does remove 1 more variable from the equation

In reply to Raylight: Some feedback on this. Although whisper might be limited, I believe "Request is ...

Isaac McFadyen•4/30/24, 2:27 PM

Just to add to this: base64 is more efficient in my testing than an actual stringified array of [<number>,<number>,<number>] (by a significant amount), but it would need to be supported internally by Workers AI, which I don't believe it is currently.
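
To put a rough number on "a significant amount", here is a minimal size comparison. It only measures payload length; nothing in the thread establishes that Workers AI accepts base64 input, and `toBase64` is a hypothetical helper written for this sketch.

```javascript
// Encode bytes as base64 using btoa, which exists in Workers, browsers and Node 16+.
// Chunking keeps String.fromCharCode from receiving too many arguments at once.
function toBase64(bytes) {
  let binary = "";
  const chunk = 0x8000;
  for (let i = 0; i < bytes.length; i += chunk) {
    binary += String.fromCharCode(...bytes.subarray(i, i + chunk));
  }
  return btoa(binary);
}

// One byte of each possible value, so the digit counts are representative.
const bytes = Uint8Array.from({ length: 256 }, (_, i) => i);

const asJsonArray = JSON.stringify([...bytes]);
const asBase64 = JSON.stringify(toBase64(bytes)); // includes the two quotes

console.log("JSON array:", asJsonArray.length, "chars"); // 915
console.log("base64:    ", asBase64.length, "chars");    // 346
```

For this input the stringified array is about 2.6 times the size of the base64 string. Base64 has a fixed cost of 4 characters per 3 bytes, while the array form depends on how many digits each byte value needs.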

RaylightOP•4/30/24, 3:19 PM

Yeah.. what's needed (IMO) is a better/faster way to pass large inputs, although optimized internals could go a long way. In some cases, like with a fast model such as @cf/microsoft/resnet-50, the actual inference can be a fraction of the total latency, where the rest is mainly due to juggling input data. I've done some testing on this.
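
A rough way to see how much time goes into "juggling input data" is to time the conversion steps on their own. This is only a sketch under assumptions: `timed` is a hypothetical helper, the input is 1 MiB of pseudo-random bytes rather than a real file, and it is meant to be run under Node. As far as I know, the Workers runtime only advances its clocks after I/O, so synchronous steps like these would read as 0 ms inside a Worker.

```javascript
// Time one synchronous step and return its result with the elapsed milliseconds.
function timed(label, fn) {
  const start = performance.now();
  const value = fn();
  return { label, value, ms: performance.now() - start };
}

// Stand-in for a fetched file: 1 MiB of pseudo-random bytes (assumption; not real audio).
const raw = new Uint8Array(1 << 20).map(() => Math.floor(Math.random() * 256));

// The two client-side steps from the snippet earlier in the thread.
const spread = timed("spread to array", () => [...raw]);
const json = timed("JSON.stringify", () => JSON.stringify({ audio: spread.value }));

for (const step of [spread, json]) {
  console.log(step.label + ": " + step.ms.toFixed(1) + " ms");
}
```

The absolute numbers depend heavily on the machine and runtime, so they are only useful for comparing the steps against each other or against the total request latency.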

In reply to michelle: what does optimized mean here?

RaylightOP•4/30/24, 10:00 PM

1.0.53 has two functions that become expensive on large arrays (serializeType and ensureShape). Additionally, the input is piped through a CompressionStream. "Optimized" is what you'd get if you skipped those operations. SDK 1.1.0 and native have solved much of the issue of high CPU usage attributed to the worker, so I've turned my attention to wall time instead.

falex•4/30/24, 10:02 PM

I am working with meta-llama-3-8b-instruct and sometimes the API produces this error: InferenceUpstreamError. I am not using the stream option.

In reply to michelle: phi-3 coming in hot

Ryan K•4/30/24, 10:47 PM

with lora adapter too?

ddts86•5/1/24, 3:55 AM

What is the difference between beta and GA besides the usage?

another_User•5/2/24, 5:00 AM

Has anyone tried the image to text recently? I have been trying it for about an hour, I think it's down or I'm doing something very wrong.

Fuse•5/2/24, 9:36 AM

Why is the response status always 500, with that error in the response?

In reply to michelle: how are you using it? not noticing any downtime

another_User•5/2/24, 2:16 PM

I first tried running this code I got from the docs on the dashboard. It responded with an error; the code and error are below:

import { Ai } from './vendor/@cloudflare/ai.js';

export default {
  async fetch(request, env) {
    const tasks = [];
    const ai = new Ai(env.AI);
    const res = await fetch("https://cataas.com/cat");
    const blob = await res.arrayBuffer();
    console.log(new Uint8Array(blob));
    const input = {
      image: new Uint8Array(blob),
      prompt: "Generate a caption for this image",
      max_tokens: 512
    };

    let response = await ai.run("@cf/unum/uform-gen2-qwen-500m", input);
    console.log(response);

    return Response.json(response);
  }
};

I got this:

InferenceUpstreamError: must be string, must be array, must match exactly one schema in oneOf
    at Ai.run (vendor/@cloudflare/ai.js:3084:15)
    at async Object.fetch (index.js:16:20)
    at async jsonError (.internal-57a7950f-b522-44d5-b273-8b1fd9d88826-facade-1.js:12:12)
    at async jsonError (.internal-57a7950f-b522-44d5-b273-8b1fd9d88826-facade-1.js:12:12)

Secondly, I tried curl and it was the same thing. I did put the Bearer token in the Authorization header.
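
One possible reading of "must be array" (an assumption; nobody in the thread confirms it) is that the input reaches schema validation as JSON, and a Uint8Array does not serialize to a JSON array. Raylight's snippet earlier in the thread spreads the bytes into a plain array before calling ai.run. A minimal sketch of the difference, using three made-up byte values:

```javascript
const bytes = new Uint8Array([137, 80, 78]);

// A typed array is serialized as an object keyed by index...
const typed = JSON.stringify({ image: bytes });
console.log(typed); // {"image":{"0":137,"1":80,"2":78}}

// ...while a plain array of numbers is serialized as a JSON array.
const plain = JSON.stringify({ image: [...bytes] });
console.log(plain); // {"image":[137,80,78]}
```

If that is what happens here, passing image: [...new Uint8Array(blob)] instead of the typed array would be the first thing to try.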

In reply to michelle: how are you using it? not noticing any downtime

another_User•5/2/24, 2:21 PM

Here is the Dart code I also tried:

var image = Uint8List.fromList(
    (await http.get(Uri.parse("https://cataas.com/cat"))).bodyBytes);
var uri = Uri.https(
  "api.cloudflare.com",
  "/client/v4/accounts/$accountID/ai/run/@cf/unum/uform-gen2-qwen-500m",
);
var img_desc = await http.post(uri,
    headers: {
      "Authorization": token
    },
    body: jsonEncode({
      "prompt": "generate a description for this image",
      "image": image,
      "max_tokens": 512
    }));

It returned with:

another_User•5/2/24, 2:27 PM

flutter: {"errors":[{"message":"InferenceUpstreamError: must be string, must have required property 'image', must be number, must match exactly one schema in oneOf","code":1000}],"success":false,"result":{},"messages":[]}

In reply to Logan Grasby: A simple example using llava 1.5: https://github.com/LoganGrasby/llava-1.5-worke...

Wallacy•5/2/24, 5:43 PM

I didn't understand... is @cf/llava-hf/llava-1.5-7b-hf an official CF model now, or can we just use -hf models or something like that? I was planning to use a llava on my custom VM because resnet-50 is useless for what I need.

In reply to Wallacy: I didn't understand... is @cf/llava-hf/llava-1.5-7b-hf an official CF model now or ...

Isaac McFadyen•5/2/24, 5:44 PM

Official model, you can't use HuggingFace models they haven't added.

Isaac McFadyen•5/2/24, 5:44 PM

?workers-ai-models

Flare•5/2/24, 5:44 PM

Workers AI currently only supports popular open-source models provided by the Cloudflare team, as well as your own LoRAs that can be applied on top of the Cloudflare-provided models. You cannot currently upload your own models or use a model from HuggingFace. See the documentation for the list of Cloudflare-provided models: https://developers.cloudflare.com/workers-ai/models/

Wallacy•5/2/24, 5:47 PM

Ah okay, that's what I was thinking, because I looked in the docs and didn't find llava ...

Wallacy•5/2/24, 5:47 PM

Is there any place where I can see the updated list?

Wallacy•5/2/24, 5:48 PM

I was planning to use the lava-llama-3-8b-v1_1-hf but I'm sure that is not there...

Isaac McFadyen•5/2/24, 5:50 PM

Docs: https://developers.cloudflare.com/workers-ai/models/ and also there's a list on the Cloudflare dashboard.

Wallacy•5/2/24, 6:13 PM

Like I said... The @cf/llava-hf/llava-1.5-7b-hf is not there... That's why I asked how that project worked.

Wallacy•5/2/24, 6:18 PM

Sure! Thanks... I was just double checking if there's any list out there that I missed...

Wallacy•5/2/24, 6:29 PM

Anyway, just so you know... I just deployed one @cf/llava test and it worked very fast... good job!

In reply to Wallacy: Sure! Thanks... I was just double checking if there's any list out there that I m...

Chaika•5/2/24, 6:29 PM

there's the list in the dashboard which includes llava

Chaika•5/2/24, 6:29 PM

Chaika•5/2/24, 6:30 PM

iirc the docs build from that list so I assume the dash will always be more updated/latest if not containing everything

In reply to Chaika: [attachment]

Wallacy•5/2/24, 6:32 PM

Thank you so much! For some reason I didn't bother to look on the dashboard.

In reply to michelle: i wish 😢 docs doesn't build dynamically - we have to manually PR it. moving tow...

Chaika•5/2/24, 6:32 PM

yea I just meant that afaik this script https://github.com/craigsdennis/tmp-model-contentator/blob/main/scripts/gen-simple-docs.ts is used to build those ai docs which is pulling from the same api as the dash

Wallacy•5/2/24, 6:32 PM

I'm the RTFM guy... I always go to the docs first. Now I know! Thanks.

Original message was deleted

Isaac McFadyen•5/2/24, 6:39 PM

@944890 Please don't post phishing links in this Discord (or in this channel, it's unrelated to Workers AI).

Isaac McFadyen•5/2/24, 6:39 PM

I'm removing both your messages so people don't click on them.

Wallacy•5/2/24, 7:41 PM

Hey, I have one more question... Just to know better what to expect in the future. Is there any reasonable limit for memory or other metrics that we can expect when running on Workers AI? Like, will 70b models be a thing in the future?

Chaika•5/3/24, 1:20 AM

Am I confusing something, or as of sometime today were @cf/bytedance/stable-diffusion-xl-lightning and @cf/stabilityai/stable-diffusion-xl-base-1.0 changed from returning a ReadableStream to an object containing an int array of the output? The example doesn't seem to work anymore either because of it.
