Can we serve custom models on Workers AI or do we have to wait for them to get added?

R

RavenOP•5/8/24, 8:46 AM

And if it is not currently possible to serve custom models via Cloudflare, and it's not prohibited to discuss, what are some good options? I have a list and I'm checking it twice, but curious what people think and/or use

RRaven Can we serve custom models on Workers AI or do we have to wait for them to get a...

I

Isaac McFadyen•5/8/24, 12:54 PM

?workers-ai-models

F

Flare•5/8/24, 12:54 PM

Workers AI currently only supports popular open-source models provided by the Cloudflare team, as well as your own LoRAs that can be applied on top of the Cloudflare-provided models. You cannot currently upload your own models or use a model from HuggingFace. See the documentation for the list of Cloudflare-provided models: https://developers.cloudflare.com/workers-ai/models/

E

element14•5/8/24, 7:22 PM

Hi every one , Im trying to use my own LoRA adapters together with

@cf/mistral/mistral-7b-instruct-v0.2-lora

@cf/mistral/mistral-7b-instruct-v0.2-lora

and worker-ai
I have already uploaded my LoRA adapters successfully , but when I try to run an inference to my LoRAS , Im getting an unknown and undocumented error:

InferenceUpstreamError: ERROR 3028: Unknown internal error

InferenceUpstreamError: ERROR 3028: Unknown internal error

Im using the snippet-code described in the tutorial at the following link : https://developers.cloudflare.com/workers-ai/fine-tunes/loras/


export interface Env {

  AI: any;
}

export default {
  async fetch(request, env): Promise<Response> {


    const response = await env.AI.run(
      "@cf/mistral/mistral-7b-instruct-v0.2-lora", //the model supporting LoRAs
      {
          messages: [{"role": "user", "content": "Hello world"}],
          raw: true,
          lora: "4e900000-0000-0000-0000-000000000",
      }
    );

    return new Response(JSON.stringify(response));
  },
} satisfies ExportedHandler<Env>;


export interface Env {

  AI: any;
}

export default {
  async fetch(request, env): Promise<Response> {


    const response = await env.AI.run(
      "@cf/mistral/mistral-7b-instruct-v0.2-lora", //the model supporting LoRAs
      {
          messages: [{"role": "user", "content": "Hello world"}],
          raw: true,
          lora: "4e900000-0000-0000-0000-000000000",
      }
    );

    return new Response(JSON.stringify(response));
  },
} satisfies ExportedHandler<Env>;


export interface Env {

  AI: any;
}

export default {
  async fetch(request, env): Promise<Response> {


    const response = await env.AI.run(
      "@cf/mistral/mistral-7b-instruct-v0.2-lora", //the model supporting LoRAs
      {
          messages: [{"role": "user", "content": "Hello world"}],
          raw: true,
          lora: "4e900000-0000-0000-0000-000000000",
      }
    );

    return new Response(JSON.stringify(response));
  },
} satisfies ExportedHandler<Env>;


export interface Env {

  AI: any;
}

export default {
  async fetch(request, env): Promise<Response> {


    const response = await env.AI.run(
      "@cf/mistral/mistral-7b-instruct-v0.2-lora", //the model supporting LoRAs
      {
          messages: [{"role": "user", "content": "Hello world"}],
          raw: true,
          lora: "4e900000-0000-0000-0000-000000000",
      }
    );

    return new Response(JSON.stringify(response));
  },
} satisfies ExportedHandler<Env>;

The code is working correctly without my LoRA adapters.
The adapter has been trained with rank <= 8 as per documentation , using

Mistral-7B-Instruct-v0.2

Mistral-7B-Instruct-v0.2

as base Model (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
I know the features is in Beta , but any suggestion is appreciated.
PS:
a) The json in the tutorial code is not valid there is a missing

after

Hello world"

Hello world"

b) The suggested model in the same tutorial is not valid model is under the repo

mistral

mistral

and not

mistralai

mistralai

tutorial at the following link https://developers.cloudflare.com/workers-ai/fine-tunes/loras/#running-inference-with-loras

Cloudflare Docs

Using LoRA adapters · Cloudflare Workers AI docs

Upload and use LoRA adapters to get fine-tuned inference on Workers AI.

mistralai/Mistral-7B-Instruct-v0.2 · Hugging Face

I

Isaac McFadyen•5/8/24, 7:48 PM

Also it looks like you're specifying

raw: true

raw: true

but then providing the

messages

messages

field which I don't believe is compatible?

I

Isaac McFadyen•5/8/24, 7:48 PM

If you do

raw: true

raw: true

then you need to format it to ChatML yourself and provide it as an input string if I remember correctly

Mmichelle hey! few things you could try: - did you add `"model_type": "mistral"` to your a...

E

element14•5/8/24, 7:49 PM

yes model_type is defined into the uploaded adapter_config.json

IIsaac McFadyen Also it looks like you're specifying `raw: true` but then providing the `message...

E

element14•5/8/24, 7:50 PM

sorry , forgot to update in the sent message , but yes , I have tried to remove

raw

raw

property also tried true/false using the correct message/prompt format
The result is still the same

InferenceUpstreamError: ERROR 3028: Unknown internal error

InferenceUpstreamError: ERROR 3028: Unknown internal error

Mmichelle hey! few things you could try: - did you add `"model_type": "mistral"` to your a...

E

element14•5/8/24, 7:53 PM

tried also to pass the fine_tune name instead of the id , same result

E

element14•5/8/24, 7:58 PM

maybe something wrong with my adapters , for the record they were generated via mlx_lm , no problem Inferencing with them on my local machine via mlx_lm or Ollama.
Here my adapters_config.json

{
  "adapter_file": null,
  "adapter_path": "adapters",
  "batch_size": 5,
  "config": "lora.yml",
  "data": "./data/",
  "grad_checkpoint": false,
  "iters": 1800,
  "learning_rate": 1e-05,
  "lora_layers": 19,
  "lora_parameters": {
      "keys": [
          "self_attn.q_proj",
          "self_attn.v_proj"
      ],
      "rank": 8,
      "alpha": 16.0,
      "scale": 10.0,
      "dropout": 0.05
  },
  "lr_schedule": {
      "name": "cosine_decay",
      "warmup": 100,
      "warmup_init": 1e-07,
      "arguments": [
          1e-05,
          1000,
          1e-07
      ]
  },
  "max_seq_length": 32768,
  "model": "mistralai/Mistral-7B-Instruct-v0.2",
  "model_type": "mistral",
  "resume_adapter_file": null,
  "save_every": 100,
  "seed": 0,
  "steps_per_eval": 20,
  "steps_per_report": 10,
  "test": false,
  "test_batches": 100,
  "train": true,
  "val_batches": -1
}

{
  "adapter_file": null,
  "adapter_path": "adapters",
  "batch_size": 5,
  "config": "lora.yml",
  "data": "./data/",
  "grad_checkpoint": false,
  "iters": 1800,
  "learning_rate": 1e-05,
  "lora_layers": 19,
  "lora_parameters": {
      "keys": [
          "self_attn.q_proj",
          "self_attn.v_proj"
      ],
      "rank": 8,
      "alpha": 16.0,
      "scale": 10.0,
      "dropout": 0.05
  },
  "lr_schedule": {
      "name": "cosine_decay",
      "warmup": 100,
      "warmup_init": 1e-07,
      "arguments": [
          1e-05,
          1000,
          1e-07
      ]
  },
  "max_seq_length": 32768,
  "model": "mistralai/Mistral-7B-Instruct-v0.2",
  "model_type": "mistral",
  "resume_adapter_file": null,
  "save_every": 100,
  "seed": 0,
  "steps_per_eval": 20,
  "steps_per_report": 10,
  "test": false,
  "test_batches": 100,
  "train": true,
  "val_batches": -1
}

{
  "adapter_file": null,
  "adapter_path": "adapters",
  "batch_size": 5,
  "config": "lora.yml",
  "data": "./data/",
  "grad_checkpoint": false,
  "iters": 1800,
  "learning_rate": 1e-05,
  "lora_layers": 19,
  "lora_parameters": {
      "keys": [
          "self_attn.q_proj",
          "self_attn.v_proj"
      ],
      "rank": 8,
      "alpha": 16.0,
      "scale": 10.0,
      "dropout": 0.05
  },
  "lr_schedule": {
      "name": "cosine_decay",
      "warmup": 100,
      "warmup_init": 1e-07,
      "arguments": [
          1e-05,
          1000,
          1e-07
      ]
  },
  "max_seq_length": 32768,
  "model": "mistralai/Mistral-7B-Instruct-v0.2",
  "model_type": "mistral",
  "resume_adapter_file": null,
  "save_every": 100,
  "seed": 0,
  "steps_per_eval": 20,
  "steps_per_report": 10,
  "test": false,
  "test_batches": 100,
  "train": true,
  "val_batches": -1
}

{
  "adapter_file": null,
  "adapter_path": "adapters",
  "batch_size": 5,
  "config": "lora.yml",
  "data": "./data/",
  "grad_checkpoint": false,
  "iters": 1800,
  "learning_rate": 1e-05,
  "lora_layers": 19,
  "lora_parameters": {
      "keys": [
          "self_attn.q_proj",
          "self_attn.v_proj"
      ],
      "rank": 8,
      "alpha": 16.0,
      "scale": 10.0,
      "dropout": 0.05
  },
  "lr_schedule": {
      "name": "cosine_decay",
      "warmup": 100,
      "warmup_init": 1e-07,
      "arguments": [
          1e-05,
          1000,
          1e-07
      ]
  },
  "max_seq_length": 32768,
  "model": "mistralai/Mistral-7B-Instruct-v0.2",
  "model_type": "mistral",
  "resume_adapter_file": null,
  "save_every": 100,
  "seed": 0,
  "steps_per_eval": 20,
  "steps_per_report": 10,
  "test": false,
  "test_batches": 100,
  "train": true,
  "val_batches": -1
}

E

element14•5/8/24, 7:59 PM

@michelle @Isaac McFadyen | YYZ01 thanks to both of you for the suggestions, appreciated

IIsaac McFadyen ?workers-ai-models

R

RavenOP•5/8/24, 9:15 PM

Thanks, searching "own model" I realize this is a common question. Is bring-your-own-model on the roadmap?
Also I am looking into cloud compute providers to host custom inference endpoints—does Cloudflare offer any solutions along that axis?

I

Isaac McFadyen•5/8/24, 9:19 PM

I'll let Michelle answer the first. As for the second, they don't currently, it's just Workers AI. I'm not sure whether it's planned but I'd doubt it since currently the focus is on improving the core Workers AI product.

R

RavenOP•5/8/24, 9:27 PM

Sounds good, for context there are some TTS models that run in the browser here https://hexgrad.com/ (note: may not work on iOS due to low webapp memory budget).
It's free to use but/because it runs on your own CPU. So if you want to churn through a many-hour audiobook, unless you have a solid desktop with good cooling you're probably looking at major heat issues.
Trying to find options to outsource the compute somewhere else. And also potentially combine the TTS with ASR/LLMs to produce conversational/phone agents. Not a novel idea by any stretch, but TTS tends to be a fairly expensive puzzle piece so I think both the standalone TTS and combo agents can be done at relatively competitive prices.

Hexgrad

Speak any text in any voice on your own machine.

R

RavenOP•5/8/24, 9:28 PM

That website is currently hosted on Cloudflare's free tier via next-on-pages. Satisfied with the experience so far (although it is a static webpage and I saw laurent's caveats this morning)

J

Julian•5/9/24, 10:33 AM

Can I check that this issue is still current? That there's no ability to customise any LLM parameters such as temperature or output format?

https://github.com/cloudflare/cloudflare-docs/issues/11138

It was closed but to me "closed" means "we've done this" not "we're going to ignore it".

GitHub

Add Documentation for Controlling AI Temperature and Related AI Set...

Which Cloudflare product(s) does this pertain to? Workers AI Subject Matter I would like to request documentation that explains how to control the AI temperature and other related AI settings when ...

S

scotto•5/9/24, 2:22 PM

any plans to increase limits of non beta model? i mean having no limits

LLogan Grasby We're expanding the list of models for generating embeddings and taking requests...

S

scotto•5/9/24, 2:22 PM

any multilingual embedding model?

LLogan Grasby Temperature is supported on most LLMs as an input and additional parameters are ...

J

Julian•5/9/24, 2:44 PM

Thanks. Do you have a link to an example of setting temp?

P

Puliczek•5/9/24, 4:51 PM

Is it possible to use llama-3 function calling on workers ai?

R

rob•5/9/24, 8:23 PM

not right now

LLogan Grasby Only PEFT trained loras are compatible. See https://huggingface.co/docs/peft/en/...

E

element14•5/9/24, 11:02 PM

adapter_config.json

adapter_config.json

(click to show the full content)

{
"alpha_pattern": {},
"auto_mapping": null,
"base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.2",

"task_type": "CAUSAL_LM",
"model_type": "mistral",
"use_dora": false,
"use_rslora": false
}

LLogan Grasby Only PEFT trained loras are compatible. See https://huggingface.co/docs/peft/en/...

E

element14•5/9/24, 11:02 PM

Thanks for the suggestion.
I have followed the tutorial and re-trained the mistral model using the suggested jupiter notebook
https://github.com/huggingface/autotrain-advanced/blob/main/colabs/AutoTrain_LLM.ipynb

my

adapter_config.json

adapter_config.json

is now: see message above workers-ai

Do you notice anything wrong ? Because now , when I try to upload my

adapter_model.safetensors

adapter_model.safetensors

I recevie a new error from the wrangler

✘ [ERROR] 🚨 Couldn't upload file: A request to the Cloudflare API (/accounts/1111122223334444/ai/finetunes/6a4a4a4a4a4a4a4a4-a5aa5a5a-aaaaaa/finetune-assets) failed. FILE_PARSE_ERROR: 'file' should be of valid safetensors type [code: 1000], quiting...

✘ [ERROR] 🚨 Couldn't upload file: A request to the Cloudflare API (/accounts/1111122223334444/ai/finetunes/6a4a4a4a4a4a4a4a4-a5aa5a5a-aaaaaa/finetune-assets) failed. FILE_PARSE_ERROR: 'file' should be of valid safetensors type [code: 1000], quiting...

✘ [ERROR] 🚨 Couldn't upload file: A request to the Cloudflare API (/accounts/1111122223334444/ai/finetunes/6a4a4a4a4a4a4a4a4-a5aa5a5a-aaaaaa/finetune-assets) failed. FILE_PARSE_ERROR: 'file' should be of valid safetensors type [code: 1000], quiting...

✘ [ERROR] 🚨 Couldn't upload file: A request to the Cloudflare API (/accounts/1111122223334444/ai/finetunes/6a4a4a4a4a4a4a4a4-a5aa5a5a-aaaaaa/finetune-assets) failed. FILE_PARSE_ERROR: 'file' should be of valid safetensors type [code: 1000], quiting...

So to recap my tests:

The adapter previously trained using mlx_lm (https://github.com/ml-explore/mlx-examples , https://huggingface.co/docs/hub/en/mlx) is accepted during the fine-tune upload/creation process , but it generates the error
```
InferenceUpstreamError: ERROR 3028: Unknown internal error
```
```
InferenceUpstreamError: ERROR 3028: Unknown internal error
```
when I try to run an inference.
The adapter trained using autotrain from huggingface is not accepted during the fine-tune upload/creation process , giving me the error described above.

Is there any other required parameter like (

rank r <=8 or quantization = None

rank r <=8 or quantization = None

) that has not been specified into the documentation ?
Thanks again for the support

GitHub

autotrain-advanced/colabs/AutoTrain_LLM.ipynb at main · huggingface...

AutoTrain Advanced. Contribute to huggingface/autotrain-advanced development by creating an account on GitHub.

GitHub

GitHub - ml-explore/mlx-examples: Examples in the MLX framework

Examples in the MLX framework. Contribute to ml-explore/mlx-examples development by creating an account on GitHub.

Sscotto any plans to increase limits of non beta model? i mean having no limits

S

scotto•5/10/24, 12:41 AM

?

Sscotto ?

I

Isaac McFadyen•5/10/24, 1:11 AM

Not sure about plans but one of the people on the Workers AI team, Michelle, has previously said to reach out to her if you are looking for higher limits: https://canary.discord.com/channels/595317990191398933/1138522314594582578/1237041407982698598

Eelement14 Thanks for the suggestion. I have followed the tutorial and re-trained the mistr...

E

element14•5/10/24, 9:40 AM

I would also like to know if there is any REST API Endpoint (or via wrangler command ) to DELETE a prev created finetune , as after all my (unsuccessful) tests I have a big list of not-working finetune.
From the Documentation I see there are methods to list or create only.
Thanks

W

Wouter J•5/10/24, 11:46 AM

Hi, is it possible to give width/height to text-to-image? I see for stability ai SDXL e.g supports 1216x832. Basically I need to generate portrait and landscapes.

Eelement14 Thanks for the suggestion. I have followed the tutorial and re-trained the mistr...

E

element14•5/10/24, 11:47 AM

Found the solution thanks to @pshek
Be sure to use only

--target_modules q_proj,v_proj

--target_modules q_proj,v_proj

as target modules with autotrain

WWouter J Hi, is it possible to give width/height to text-to-image? I see for stability ai...

R

rob•5/10/24, 1:23 PM

not yet but team said params are on the way for some models

A

a5000•5/10/24, 3:17 PM

Hey there. I'm new to the channel. Testing Workers AI. Should I expect

@cf/meta/llama-3-8b-instruct

@cf/meta/llama-3-8b-instruct

@cf/meta/llama-3-8b-instruct

@cf/meta/llama-3-8b-instruct to be broken/unavailable? I can't get a response, but

@cf/meta/llama-2-7b-chat-fp16

@cf/meta/llama-2-7b-chat-fp16

@cf/meta/llama-2-7b-chat-fp16

@cf/meta/llama-2-7b-chat-fp16 works fine

Aa5000 Hey there. I'm new to the channel. Testing Workers AI. Should I expect `@cf/meta...

C

Chaika•5/10/24, 3:17 PM

https://www.cloudflarestatus.com/incidents/8kql553z8g0p

Issues with Workers AI inference

C

Chaika•5/10/24, 3:18 PM

it's broken right now but you should normally expect it to be working

A

a5000•5/10/24, 3:19 PM

Ah gotcha! Should have checked there. Thanks for the prompt response. I noticed the broken one wasn't' showing in the

Active Models

Active Models

Active Models

Active Models filter

Aa5000 Ah gotcha! Should have checked there. Thanks for the prompt response. I noticed ...

C

Chaika•5/10/24, 3:19 PM

afaik the active models is just supposed to show the models you are using, and is limited to billed models/non-beta

CChaika afaik the active models is just supposed to show the models you are using, and i...

A

a5000•5/10/24, 3:20 PM

ok great

CChaika https://www.cloudflarestatus.com/incidents/8kql553z8g0p

F

falex•5/10/24, 3:37 PM

I'm seeing the same error, 'InferenceUpstreamError', when call to llama 3-8b model.

C

Chaika•5/10/24, 3:38 PM

yea, I've seen it for the last few hours or so, they've got that incident open, hopefully fixed soon

冰

冰淇淋•5/10/24, 5:35 PM

Hello, I'm sorry to interrupt.
I have a problem using the Whisper model.

[wrangler:err] InferenceUpstreamError: AiError: undefined: ERROR 3001: Unknown internal error
    at Ai.run (cloudflare-internal:ai-api:66:23)
    at async Object.fetch (file:///C:/Users/Fathan/PetProject/cloudflare-demo/src/index.ts:15:21)
    at async jsonError (file:///C:/Users/Fathan/PetProject/cloudflare-demo/node_modules/wrangler/templates/middleware/middleware-miniflare3-json-error.ts:22:10)
    at async drainBody (file:///C:/Users/Fathan/PetProject/cloudflare-demo/node_modules/wrangler/templates/middleware/middleware-ensure-req-body-drained.ts:5:10)
[wrangler:inf] POST / 500 Internal Server Error (2467ms)

[wrangler:err] InferenceUpstreamError: AiError: undefined: ERROR 3001: Unknown internal error
    at Ai.run (cloudflare-internal:ai-api:66:23)
    at async Object.fetch (file:///C:/Users/Fathan/PetProject/cloudflare-demo/src/index.ts:15:21)
    at async jsonError (file:///C:/Users/Fathan/PetProject/cloudflare-demo/node_modules/wrangler/templates/middleware/middleware-miniflare3-json-error.ts:22:10)
    at async drainBody (file:///C:/Users/Fathan/PetProject/cloudflare-demo/node_modules/wrangler/templates/middleware/middleware-ensure-req-body-drained.ts:5:10)
[wrangler:inf] POST / 500 Internal Server Error (2467ms)

[wrangler:err] InferenceUpstreamError: AiError: undefined: ERROR 3001: Unknown internal error
    at Ai.run (cloudflare-internal:ai-api:66:23)
    at async Object.fetch (file:///C:/Users/Fathan/PetProject/cloudflare-demo/src/index.ts:15:21)
    at async jsonError (file:///C:/Users/Fathan/PetProject/cloudflare-demo/node_modules/wrangler/templates/middleware/middleware-miniflare3-json-error.ts:22:10)
    at async drainBody (file:///C:/Users/Fathan/PetProject/cloudflare-demo/node_modules/wrangler/templates/middleware/middleware-ensure-req-body-drained.ts:5:10)
[wrangler:inf] POST / 500 Internal Server Error (2467ms)

[wrangler:err] InferenceUpstreamError: AiError: undefined: ERROR 3001: Unknown internal error
    at Ai.run (cloudflare-internal:ai-api:66:23)
    at async Object.fetch (file:///C:/Users/Fathan/PetProject/cloudflare-demo/src/index.ts:15:21)
    at async jsonError (file:///C:/Users/Fathan/PetProject/cloudflare-demo/node_modules/wrangler/templates/middleware/middleware-miniflare3-json-error.ts:22:10)
    at async drainBody (file:///C:/Users/Fathan/PetProject/cloudflare-demo/node_modules/wrangler/templates/middleware/middleware-ensure-req-body-drained.ts:5:10)
[wrangler:inf] POST / 500 Internal Server Error (2467ms)

冰

冰淇淋•5/10/24, 5:36 PM

full wrangle index.ts code

冰

冰淇淋•5/10/24, 5:36 PM

any idea how to fix it? Thank you

R

Rob M.•5/10/24, 8:53 PM

Hey guys

What is the best way to calculate neuron usage at runtime in worker?

LLogan Grasby The media recorder API will output Webm or other file types depending on the bro...

I

Isaac McFadyen•5/12/24, 9:24 PM

https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder/MediaRecorder#mimetype
You should be able to specify the codecs the browser uses, and as long as they're supported it'll use them rather than the defaults.

Eelement14 Thanks for the suggestion. I have followed the tutorial and re-trained the mistr...

J

jameskraus•5/13/24, 12:24 AM

@element14 , did you ever figure this out? I'm getting this same

FILE_PARSE_ERROR

FILE_PARSE_ERROR

FILE_PARSE_ERROR

FILE_PARSE_ERROR with my autotrained finetune

S

Sam White•5/13/24, 4:22 AM

Should workers be compatible with transformers.js yet? I've tried setting it up, hoping to cache everything in KV after the first request! But I can't seem to pull the wasm files. Docs say you should be able to load from the jsdelivr cdn instead of trying to pull from /public, which I thought would work for Workers but I'm getting

Error: no available backend found. ERR: [wasm] RuntimeError: Aborted(both async and sync fetching of the wasm failed)

Error: no available backend found. ERR: [wasm] RuntimeError: Aborted(both async and sync fetching of the wasm failed)

Error: no available backend found. ERR: [wasm] RuntimeError: Aborted(both async and sync fetching of the wasm failed)

Error: no available backend found. ERR: [wasm] RuntimeError: Aborted(both async and sync fetching of the wasm failed) . But the other files (config, onnx) are loading up fine. @Xenova I hope it's still okay to tag you! I'm so excited to get your library up and running!

Can we serve custom models on Workers AI or do we have to wait for them to get added?

Similar Threads

Similar Threads

Similar Threads