Hi, Issac. Is there a way to get rough estimation of neuron usage for a single stable diffusion infe

ToonyGenOP•8/18/24, 7:45 PM

For an A100, I think it takes 1.x seconds for SD v1.5 at 512 x 512, and around 3 to 4 s for SDXL Lightning at 1024 x 1024.

ToonyGenOP•8/18/24, 7:49 PM

Please leave comments if you have any information about the pricing model for stable diffusion later.

TToonyGen Hi, Issac. Is there a way to get rough estimation of neuron usage for a single s...

Isaac McFadyen•8/18/24, 8:09 PM

I don't believe there is currently, no. There is a neuron pricing calculator but I'm not sure whether the text-to-image models have been added yet.

Isaac McFadyen•8/18/24, 8:09 PM

https://ai.cloudflare.com/#pricing-calculator

Rrob I don't think so , I thought flux only Partnered with few providers , possibly o...

Raylight•8/19/24, 7:51 AM

https://huggingface.co/black-forest-labs/FLUX.1-schnell can be used for commercial purposes. flux.1 dev would require some kind of arrangement, though.

TToonyGen Hi, Issac. Is there a way to get rough estimation of neuron usage for a single s...

vin•8/19/24, 8:47 AM

Maybe you can refer to the prices of other online text-to-image providers. For commercial reasons, I think Cloudflare won't make it much more expensive or cheaper than other providers. As far as I know, SDXL at 1024*1024 with 20 steps costs about 0.2~0.3 cents per image.

rchaves•8/19/24, 1:03 PM

hey folks, where can I request the addition of new models? The `@hf/thebloke/llamaguard-7b-awq` there is a bit outdated, still uses llama 2 and it's not very smart, what about upgrading it to https://huggingface.co/meta-llama/Llama-Guard-3-8B which uses llama 3.1? And while at it perhaps also add https://huggingface.co/meta-llama/Prompt-Guard-86M

cloudflare is perfect for running those guardrails models which should run as fast as possible

Ping for toast•8/21/24, 12:35 AM

I'd like https://huggingface.co/BAAI/bge-reranker-large

kevin•8/21/24, 12:48 PM

I created custom LoRA models, but they all stopped working. I'm wondering if anyone else might have had such an issue. I'm not sure if there's a problem with my account or with Cloudflare's LoRA in general.

clew•8/23/24, 1:05 PM

Hello everyone!

I'm currently working on a personal project and have hit a bit of a roadblock. I'm using text generation models to create a chat interface, but I've noticed that I need to specify the system role and user prompts with each query. Unfortunately, there's no option to store a large dataset in the model's memory that it can reference each time it generates an output.

What I'd like to do is store a large set of data in one place, which the model can then access whenever I ask it a question. However, I'm relatively new to the Cloudflare ecosystem and could really use some guidance. Is there a way to achieve this using Cloudflare's functionalities or products?

Thanks in advance for your help!

clew•8/23/24, 1:08 PM

To elaborate, I'm working with the GitHub API and fetching a large amount of JSON data, which I'd then like my model to reference every time it generates an output.

Cclew Hello everyone! 👋🏻 I'm currently working on a personal project and have hit ...

vin•8/23/24, 2:30 PM

I'm not sure I understand you correctly, and I'm not very familiar with LLMs, but it sounds like you are trying to build a RAG (before the LLM responds, it refers to a data set you have prepared)? If that's what you are trying to do, you can refer to https://developers.cloudflare.com/workers-ai/tutorials/build-a-retrieval-augmented-generation-ai .

In short, RAG roughly means you do a simple search yourself and distill the reference data, so that it becomes much smaller before you call the LLM.

As I understand the article, you can split the JSON data into many text segments based on some method (maybe split by git project), use an embedding model on Workers AI to convert each text to a vector, store the original text in Cloudflare D1 to get an ID, and store the (ID, vector) pair in Cloudflare Vectorize. When you run the LLM, you can retrieve relevant data by embedding the input, searching for the input vector in Vectorize to get the IDs of relevant vectors, retrieving the relevant text from D1 by those IDs, and using that text as part of the prompt.
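
The retrieval steps described above could be sketched roughly as follows. This is only an illustration: `RagDeps`, `embed`, `searchVectors`, `loadTexts` and `buildRagPrompt` are hypothetical names, standing in for wrappers around a Workers AI embedding model, a Vectorize query and a D1 lookup. They are not part of any Cloudflare API.

```typescript
// Hypothetical dependency shapes (assumptions for this sketch). In a Worker they
// would wrap Workers AI (embeddings), Vectorize (vector search) and D1 (text store).
interface RagDeps {
  embed(text: string): Promise<number[]>;
  searchVectors(vector: number[], topK: number): Promise<string[]>; // returns IDs
  loadTexts(ids: string[]): Promise<string[]>;
}

// Embed the question, look up the closest chunk IDs, load their text,
// and place that text in front of the question as context.
async function buildRagPrompt(
  question: string,
  deps: RagDeps,
  topK = 3,
): Promise<string> {
  const queryVector = await deps.embed(question);
  const ids = await deps.searchVectors(queryVector, topK);
  const chunks = await deps.loadTexts(ids);
  const context = chunks.map((chunk, i) => `[${i + 1}] ${chunk}`).join("\n");
  return `Use the context below to answer.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}
```

The returned string would then be passed as the prompt (or user message) to the text generation model. Keeping the three stores behind an interface makes the flow easy to try with fake data before wiring up real bindings.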

Cloudflare Docs

Build a Retrieval Augmented Generation (RAG) AI | Cloudflare Worker...

This guide will instruct you through setting up and deploying your first application with Cloudflare AI. You will build a fully-featured AI-powered application, using tools like Workers AI, Vectorize, D1, and Cloudflare Workers.

sypukcje•8/23/24, 6:48 PM

Hello, where can I find more templates that work with workers-ai? I'm looking for something for text-to-image.

sypukcje•8/23/24, 6:57 PM

https://github.com/craigsdennis/image-model-streamlit-workers-ai?tab=readme-ov-file hmm

dctanner•8/24/24, 7:04 AM

For folks in London, UK - we’re hosting the first AI Engineer Meetup at Cloudflare offices on 12th Sep https://lu.ma/ynbdcv1d

AI Engineer London Meetup #1 - RAG in production, LLM fine tuning a...

The Age of the AI Engineer has begun
We're excited to launch the London chapter of AI Engineer meet-ups series. We're bringing a slice of the AI Engineer…

Vvin I'm not sure i understand you correctly and I'm not very familiar with LLM model...

clew•8/24/24, 2:22 PM

Thank you for the response! Yes, I think you're understanding it correctly. I reached a similar conclusion after tinkering around for a bit: I need to set up an external store, break the data down into chunks, store them separately, and then retrieve them contextually based on the prompt.

I took a look at Cloudflare KV (https://developers.cloudflare.com/kv/), which is a key-value data store. I was thinking I could extract the relevant data from the GitHub API, store each piece under its own key, and retrieve them based on some condition. But I think building a RAG like you mentioned might be more relevant and built for this specific use case, although it does sound complex. I'll take a look, thanks! :)

BoNour•8/25/24, 9:28 AM

whats the cost to run whisper-tiny-en?
I only see pricing for normal whisper

BoNour•8/25/24, 9:30 AM

oh nice ty!

Vikash•8/27/24, 3:45 PM

How do I get the tokens consumed by an `ai.run()` request?

For example, with the code below, I want to know the total tokens consumed by this request so that we can charge the customer accordingly.

const response = await env.AI.run('@cf/meta/llama-2-7b-chat-int8', {
  prompt: "tell me a joke about cloudflare",
});

alan•8/27/24, 10:31 PM

will the translation model eventually be updated? m2m100 by meta's own standard is outdated and should be replaced with something like https://ai.meta.com/blog/nllb-200-high-quality-machine-translation/

200 languages within a single AI model: A breakthrough in high-qual...

Meta AI has built a single AI model, NLLB-200, that is the first to translate across 200 different languages with state-of-the-art quality that has been validated through extensive evaluations for each of them.

VVikash How to get the token consumed by a `ai.run()` request? For example, the below c...

Owl Reddy•8/28/24, 3:23 AM

None of the questions regarding the token usage gets answered I see. Still awaiting any update on this

OOwl Reddy None of the questions regarding the token usage gets answered I see. Still await...

Vikash•8/28/24, 3:26 AM

I see CF uses another metric called Neurons, but I can't find how it's calculated in the docs, or any way to read it from the `ai.run()` response at runtime so I can calculate it for each user of the app.

VVikash I see CF use another metric called Neurons, but can’t find how it’s calculated ...

Owl Reddy•8/28/24, 3:32 AM

https://ai.cloudflare.com/#pricing-calculator You can check that here but there's no definite way to measure the usage of neurons or tokens based on individual queries. All we have is an estimate but no clear usage based metrics. Maybe in the birthday week next month they might change that or it'll just get ignored like most of the messages about token usage.
Beta models are free for now, so they won't show up on the pricing calculator.

😈 Donkey 💫•8/28/24, 4:46 AM

const response = await ai.run(model || "@cf/stabilityai/stable-diffusion-xl-base-1.0", requestInput);
// Store the image name in R2, in background        
ctx.waitUntil(updateRecentImages(imageName, (await generateThumbs(imageName, '400x', true)), input, env, response));
return new Response(response, {
    headers: {
        "content-type": "image/png",
    },
});

response should be a Uint8Array, not a Response, but the problem is that it can only be read once (I guess), which causes `TypeError: The ReadableStream has been locked to a reader.` and prevents updateRecentImages from working.

do you have any idea?
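
One possible way around the locked-stream error, sketched under the assumption that `ai.run()` hands back a `ReadableStream` of image bytes for this model: either split the stream with the standard `tee()` method so each consumer gets its own branch, or buffer the bytes once and reuse them. `splitImageStream` and `bufferImage` are hypothetical helper names for illustration, not Workers AI functions.

```typescript
// Sketch only: assumes the model output is a ReadableStream of bytes.
// tee() locks the source stream and returns two independent branches,
// so one can be sent to the client and the other stored in the background.
function splitImageStream(image: ReadableStream<Uint8Array>): {
  forClient: ReadableStream<Uint8Array>;
  forStorage: ReadableStream<Uint8Array>;
} {
  const [forClient, forStorage] = image.tee();
  return { forClient, forStorage };
}

// Alternative: read the stream once into memory; the resulting
// Uint8Array can then be reused as many times as needed.
async function bufferImage(
  image: ReadableStream<Uint8Array>,
): Promise<Uint8Array> {
  return new Uint8Array(await new Response(image).arrayBuffer());
}
```

In the handler above, `forClient` could go into `new Response(...)` while `forStorage` (or the buffered bytes) is passed to the task inside `ctx.waitUntil`. Buffering is simpler but holds the whole image in memory.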

Mmichelle check back soon 👀

1984 Ford Laser•8/28/24, 6:26 AM

I'm checking back soon! Any news on TTS models?

vampirehunter•8/28/24, 9:49 AM

Can anyone please tell me what the Workers AI free-tier limit is? It says 10000 neurons, but I have no idea what that means. I made a few requests to try the llama model and they were successful, but I can't see them on the dashboard.
Help, please.

Vvampirehunter can anyone please tell me what is the limit of worker ai for free tier it says 1...

vin•8/28/24, 3:16 PM

Here is a calculator: https://ai.cloudflare.com/#pricing-calculator . You can think of neurons as an amount of computation, but frankly it's still a pretty confusing concept...

AhmedHalat•8/29/24, 12:58 PM

Anyone know where to lookup error codes?

{
  "errors": [
    {
      "message": "Server Error",
      "code": 6001
    }
  ],
  "success": false,
  "result": {},
  "messages": []
}

FFlare Workers AI currently only supports popular open-source models provided by the Cl...

Avocadio•8/29/24, 2:47 PM

has anything changed?

Avocadio•8/29/24, 2:47 PM

can we deploy our own pretrained models

rajeev•8/30/24, 5:50 PM

Hi,
Not sure if this is the right place for this, but the docs directed me to this server

I was playing around with Workers AI text generation models and have found that setting the `top_p` or `repetition_penalty` value to `0` causes a `Cloudflare API Error`.

If we are using streaming, then we directly get the `data: [DONE]` event; if non-streaming, then it is a `500 "Cloudflare API error"`.

As per the docs, the minimum allowed value for `top_p` and `repetition_penalty` is 0, so ideally we shouldn't get any error.

Ramilysk•8/31/24, 10:40 PM

https://tenor.com/view/cool-sunglasses-swag-penguin-with-gif-16107429049852947642

Oreki•9/3/24, 3:54 AM

Are there some hidden output token limits? My response seems to be cut short after 2000 words regardless of whether I'm streaming or not on beta models

xav•9/3/24, 12:37 PM

Hey, I'm using llama-3.1-8b-instruct to classify and reject spam/inappropriate messages, it works mostly fine, but I'm struggling to reliably return a structured output (eg a json {"appropriate": boolean}).

xav•9/3/24, 12:39 PM

Is there a trick besides fine-tuning the prompt and hoping for the best? So far, asking it to classify messages in languages other than English seems to trip the model (e.g. it sometimes returns "approprié" as the key if the message is in French).

xav•9/3/24, 12:41 PM

I'm aware there are more specific models, eg. https://huggingface.co/predibase/jigsaw my question is how to tell the workers AI to answer following a specific structure

predibase/jigsaw · Hugging Face

mr.niko.la•9/3/24, 2:49 PM

Can we add Parler-TTS to Workers AI?

Parler-TTS is the Hugging Face open-source model.

Is there anyone who has implemented TTS on the Cloudflare edge?

OOreki Are there some hidden output token limits? My response seems to be cut short aft...

Raylight•9/3/24, 3:00 PM

Yes, and the limit depends on the model. See workers-ai

RRaylight Yes, and the limit depends on the model. See https://discord.com/channels/59531...

Oreki•9/3/24, 4:01 PM

I don't think `@cf/meta/llama-3.1-8b-instruct-fp8` has a 32k token limit

Oreki•9/3/24, 4:01 PM

I tried it as well, and it somewhat stopped at 3000 characters

Xxav Hey, I'm using llama-3.1-8b-instruct to classify and reject spam/inappropriate m...

rob•9/3/24, 8:59 PM

Try feeding an assistant response starting with "{" in the message history, and it'll try. I have a few tricks that yield 99% success or so if you want to DM me.
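
A minimal sketch of that prefill idea, paired with a strict parser so that a drifting key such as "approprié" is rejected instead of trusted. Everything here is an assumption for illustration: `buildClassifierMessages` and `parseVerdict` are hypothetical helpers, the system prompt wording is made up, and whether a given model actually continues from a trailing assistant message would need to be tested.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// The partial answer placed in the history so the model continues a JSON object.
const PREFILL = "{";

function buildClassifierMessages(text: string): ChatMessage[] {
  return [
    {
      role: "system",
      content:
        'Reply only with JSON of the form {"appropriate": true} or {"appropriate": false}. ' +
        'Always use the English key "appropriate", whatever the message language.',
    },
    { role: "user", content: text },
    { role: "assistant", content: PREFILL }, // prefilled start of the answer
  ];
}

// Returns the boolean verdict, or null when the output does not match the schema.
function parseVerdict(raw: string): boolean | null {
  // Re-attach the prefill if the model returned only the continuation.
  const candidate = raw.includes("{") ? raw : PREFILL + raw;
  const start = candidate.indexOf("{");
  const end = candidate.lastIndexOf("}");
  if (end <= start) return null;
  try {
    const parsed: unknown = JSON.parse(candidate.slice(start, end + 1));
    if (typeof parsed === "object" && parsed !== null && "appropriate" in parsed) {
      const value = (parsed as { appropriate: unknown }).appropriate;
      return typeof value === "boolean" ? value : null;
    }
    return null;
  } catch {
    return null;
  }
}
```

The messages array would be passed as `messages` to the text generation call; a `null` verdict can then trigger a retry or a safe default rather than a guess.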

Vikash•9/6/24, 8:59 AM

How do I bind a Vectorize index to a Pages project? The button shows "Get started" and redirects to the /vectorize page even though I already have an index.

VVikash How to bind Vector index to Pages project, the button show "Get started" and red...

Vikash•9/6/24, 4:38 PM

I used the API directly to update bindings, the dashboard is broken I think

Vikash•9/6/24, 4:41 PM

I am getting dimensions error in indexing. Any suggestion?

VECTOR_UPSERT_ERROR (code = 40012): invalid vector for id="sdfksd", expected 768 dimensions, and got 1024 dimensions

const embeddingResult = await env.AI.run('@cf/baai/bge-large-en-v1.5', {
  text: value,
});
const embeddingBatch: number[][] = embeddingResult.data;

await env.VECTORIZE.upsert(
  embeddingBatch.map((embedding, index) => ({
    id: sourceId,
    values: embedding,
    namespace: 'default',
    metadata: {
      id: sessionId
    },
  }))
);

VVikash I am getting dimensions error in indexing. Any suggestion? ``` VECTOR_UPSERT_E...

ac•9/6/24, 5:03 PM

bge-large is a 1024-dimension model, so the vectorize index needs to be 1024 dimension as well. Looks like it's 768. bge-base is a 768-dimension model
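
To catch this kind of mismatch before the upsert call, a small guard along these lines might help. It is a sketch, not part of the Vectorize API: `EMBEDDING_DIMENSIONS` and `checkDimensions` are hypothetical names, the index size has to be supplied by the caller, and the bge-small entry (384) is added here for completeness rather than taken from the discussion.

```typescript
// Output sizes of the bge embedding models (large and base as stated above;
// small is an addition in this sketch).
const EMBEDDING_DIMENSIONS: Record<string, number> = {
  "@cf/baai/bge-small-en-v1.5": 384,
  "@cf/baai/bge-base-en-v1.5": 768,
  "@cf/baai/bge-large-en-v1.5": 1024,
};

// Throws a descriptive error if the model or the vectors do not match the
// dimension the index was created with.
function checkDimensions(
  model: string,
  indexDimensions: number,
  vectors: number[][],
): void {
  const expected = EMBEDDING_DIMENSIONS[model];
  if (expected === undefined) {
    throw new Error(`Unknown embedding model: ${model}`);
  }
  if (expected !== indexDimensions) {
    throw new Error(
      `${model} produces ${expected}-dimension vectors, but the index expects ${indexDimensions}`,
    );
  }
  vectors.forEach((vector, i) => {
    if (vector.length !== indexDimensions) {
      throw new Error(`Vector ${i} has ${vector.length} dimensions, expected ${indexDimensions}`);
    }
  });
}
```

Calling `checkDimensions('@cf/baai/bge-large-en-v1.5', 768, embeddingBatch)` would fail fast with a readable message instead of the upsert error shown above.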

Vikash•9/6/24, 5:17 PM

Ok thanks, I will use the base model

Vikash•9/6/24, 5:18 PM

Also, is there any way to alter the index to change it to 1024?

JustinNoel•9/7/24, 12:37 PM

I'm getting really horrible results with all of the text to image AI models (I've tried all of the ones CF has). I'm using the prompts below and getting garbage responses like those in this image.

Does anyone have suggestions on how to actually get what I want?

export default {
  async fetch(request, env) {
    const inputs = {
      prompt: "create an image  that is 512x512. the background should be a solid, plain, yellow color. text over the background should say 'Learn How to Pronounce MySQL' in English. Text should be red and use an Arial font. ",
      negative_prompt: "There shOuld not be any other effects or images.",
      height: 512,
      width: 1024
    };

    const response = await env.AI.run(
      "@cf/bytedance/stable-diffusion-xl-lightning",
      inputs
    );

    return new Response(response, {
      headers: {
        "content-type": "image/png",
      },
    });

  },
};

katopz•9/7/24, 2:03 PM

SD is bad at text. It should be better when CF supports FLUX, I think.

katopz•9/7/24, 2:06 PM

Text embeddings are also useless to me: English only, no multilingual model.
