Can I use a beta LLM model in prod? I need the latest model (llama 3-8b) but it's not out of beta yet.

Victor•6/7/24, 11:25 PM

From a technical perspective, there's nothing stopping you. However, beta comes without SLA guarantees and is simply "best effort". Usage is also free while the model is in beta, but will be charged afterwards. Also:

Beta models may have lower rate limits while we work on performance and scale

https://developers.cloudflare.com/workers-ai/platform/limits/

Cloudflare Docs

Limits · Cloudflare Workers AI docs

Workers AI is now Generally Available. We’ve updated our rate limits to reflect this.

AAtony Hello, could you please ask if it is possible to integrate the Qwen2 series mode...

MissS•6/7/24, 11:45 PM

When qwen2 :O?

FieraRyanOP•6/8/24, 1:07 AM

Does this mean Paid Workers ($5) give 300000 neurons for free, or is this the upper limit?

Unsmart•6/8/24, 1:08 AM

It's included usage; there are no upper limits beyond the standard rate limits.

Isaac McFadyen•6/8/24, 2:39 AM

Considering it was literally released yesterday, it might still be a while.

Fierylion•6/8/24, 1:48 PM

what does included mean? do you mean the $5 includes 300000 neurons?

rob•6/8/24, 7:53 PM

it's confusing

Unsmart•6/8/24, 8:19 PM

What is? The included usage? Not sure how else they could word it to say that you get usage included

Fierylion•6/8/24, 9:18 PM

are you sure its free 300000?

Fierylion•6/8/24, 9:19 PM

oh I see 10k a day, 30 days a month?

Fierylion•6/8/24, 9:19 PM

uh oh that means we get less than 10k a day for months with more than 30 days? xD

Isaac McFadyen•6/9/24, 1:41 AM

You get 10k a day.

Isaac McFadyen•6/9/24, 1:41 AM

So in a month with 31 days, you get 310 000
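
The arithmetic above can be sketched in a few lines. This is only an illustration: the 10,000 neurons per day figure is taken from this thread, and the function names are hypothetical.

```typescript
// Assumption from the thread: the paid plan includes 10,000 neurons per day.
const INCLUDED_NEURONS_PER_DAY = 10_000;

// Number of days in a given month (month is 1-12).
function daysInMonth(year: number, month: number): number {
  // Day 0 of the following month is the last day of the requested month.
  return new Date(Date.UTC(year, month, 0)).getUTCDate();
}

// Included neurons for a whole month at the assumed daily allowance.
function includedNeurons(year: number, month: number): number {
  return INCLUDED_NEURONS_PER_DAY * daysInMonth(year, month);
}

console.log(includedNeurons(2024, 6)); // June 2024 has 30 days: 300000
console.log(includedNeurons(2024, 7)); // July 2024 has 31 days: 310000
```

So the included amount scales with the length of the month rather than being capped at 300,000.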

jepcd•6/9/24, 4:23 AM

are there plans to add any multimodal models like clip?

I’d like to be able to store images and query them with text.

Using replicate rn, but would love to keep everything on cloudflare

Fierylion•6/9/24, 4:08 PM

https://developers.cloudflare.com/workers-ai/models/llava-1.5-7b-hf/

Cloudflare Docs

llava-1.5-7b-hf · Cloudflare Workers AI docs

Run AI models in Workers, Pages, or via API.

Fierylion•6/9/24, 4:09 PM

llava 1.5 only supports tiny tiny images (336x336) though

Fierylion•6/9/24, 4:09 PM

at least its cheap lol

Fierylion•6/9/24, 4:09 PM

I hope they add 1.6 which supports much bigger images

jepcd•6/9/24, 10:44 PM

yeah thats a bit too small unfortunately

Fierylion•6/10/24, 2:22 PM

to be fair CLIP models also only process tiny images 224X224 or 336x336

Afeelingmore•6/10/24, 11:53 PM

Hello everyone, I have a question regarding worker AI token management. I created a token through the REST API page, but I can’t seem to find where the tokens I’ve created are displayed. Can anyone help?

gauravmandall•6/11/24, 4:32 PM

I'm entering this command but somehow it's not working:

bunx wrangler vectorize create --dimensions=1536 ers-v1 --metric=cosine

showing this error:

🚧 Creating index: 'ers-v1'

✘ [ERROR] A request to the Cloudflare API (/accounts/53e0c2270158721ff328e572f56950ea/vectorize/indexes) failed.

  vectorize.not_entitled [code: 1005]

  If you think this is a bug, please open an issue at:
  https://github.com/cloudflare/workers-sdk/issues/new/choose

#cloudflare-typescript #wrangler #vectorize-beta

Fierylion•6/11/24, 5:12 PM

fyi tagging the channels doesn't do anything, you should post your question in the respective channels

Fierylion•6/11/24, 5:12 PM

also according to this u need to be on a paid plan to use vectorize https://github.com/cloudflare/workers-sdk/issues/4042

GitHub

🐛 BUG: Failed wrangler vectorize create because vectorize.not_entit...

Which Cloudflare product(s) does this pertain to? Wrangler core What version(s) of the tool(s) are you using? 3.10.0 [Wrangler] What version of Node are you using? 18.14.0 What operating system are...

linchpin•6/11/24, 10:26 PM

Hello, I am trying to upload adapter_model.safetensors to a created finetune and got an error, see thread

Kathy•6/12/24, 12:36 AM

Use the search bar: search for "token" and "API Tokens" will come up

Kathy•6/12/24, 12:36 AM

or can go to "My Profile" and click on "API Tokens"

Ferdi KIZILTOPRAK•6/13/24, 6:39 AM

Hello, do you plan to add Stable Diffusion 3 Medium to Model Catalog?

Luka•6/13/24, 11:44 AM

Any plans to add newer OS embedding models?

Luka•6/13/24, 11:44 AM

I'm building a speedy search where my API is in the worker, and using bge small seems much nicer. But it has fallen behind in contrast to the latest & greatest...

Isaac McFadyen•6/13/24, 1:40 PM

Given that Stability AI has an extremely restrictive commercial license, and that the model actually isn't that good compared to SDXL (especially the fine-tuned SDXL models) I highly doubt it.

icyfox•6/17/24, 1:25 PM

Are there plans to add constrained generation (specifically json output) to the LLMs and LORA hosting of models on Workers?

I know there was some conversation about this back in March, but haven't seen any updates since: workers-ai

rob•6/17/24, 7:06 PM

Is there a list of text models on Workers AI that support JSON mode, if any?

Isaac McFadyen•6/17/24, 8:27 PM

None currently support JSON-constrained outputs (although you can tell it to output JSON and try and parse it yourself). The person right above your message was asking for updates as well but as far as I know there are no updates.
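
A minimal sketch of the "tell it to output JSON and try and parse it yourself" approach. The helper name `extractJson` is hypothetical and not part of any Cloudflare API; it assumes the model wraps a single JSON object in free-form text.

```typescript
// Hypothetical helper: parse the span from the first "{" to the last "}" in
// the model's text. Returns null when no parseable JSON object is found.
function extractJson(text: string): Record<string, unknown> | null {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch {
    // The model produced something that is not valid JSON.
    return null;
  }
}

// Example with a typical "chatty" model reply.
const reply = 'Sure! Here is the data: {"name": "bge", "dims": 384} Hope that helps.';
console.log(extractJson(reply));
console.log(extractJson("no json here")); // null
```

Because nothing constrains the output, the caller still has to handle the `null` case.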

rob•6/17/24, 8:43 PM

I went through them all, looks like @hf/nousresearch/hermes-2-pro-mistral-7b supports JSON mode @icyfox

rob•6/17/24, 8:44 PM

https://developers.cloudflare.com/workers-ai/models/hermes-2-pro-mistral-7b/ at least thats what the docs said.

But I'm working on something now that should be able to get JSON back from most of the CF AI text models; still testing it

Cloudflare Docs

hermes-2-pro-mistral-7b · Cloudflare Workers AI docs

Run AI models in Workers, Pages, or via API.

Isaac McFadyen•6/17/24, 9:02 PM

The model itself supports JSON mode, but it's not the same as JSON constraining. The model should output JSON if you prompt it properly, but it's not guaranteed like proper JSON constraining is.

rob•6/18/24, 12:36 AM

yeah makes sense

Tojak•6/18/24, 3:40 PM

Hi, how can I get statistics of workers AI usage? I don't see any api documentation for it? Is it possible?

icyfox•6/18/24, 4:21 PM

Thanks @Isaac McFadyen | YYZ01, EWR01, that's what I assumed too based on the docs. Looks like it's not a good fit for my usecase (lora r limitations, lack of constrained schemas) but will follow along in case any updates land soon.

Ali_hat•6/19/24, 3:36 PM

Hey everyone, I'm currently using llama-3-8b-instruct with the REST API. It works somewhat, but it sometimes feels like it's trying to chat, remembering context from previous HTTP calls. Has anyone run into similar issues? I've been refining the prompts using system, user and assistant roles.

rob•6/19/24, 3:46 PM

@Ali_hat what type of tasks are you trying to get from it ? or more so what's the use case

rob•6/19/24, 3:48 PM

I have something I made for exactly this, for getting models to return your structured data properly. DM me if you want a link

icyfox•6/19/24, 6:25 PM

@rob Without logit-level constraints it's not possible to have a guarantee that it will return structured data every time.

rob•6/19/24, 6:26 PM

oh yea for sure. it'll either retry and error but it's a drastic increase over other available methods or having to prompt engineer every single model
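
The retry-then-error idea can be sketched roughly as follows. This is not rob's tool, just an illustration under assumptions: `generate` stands in for whatever call returns the model's text, and `generateJson`, `fakeModel` and `isAnswer` are hypothetical names.

```typescript
// Any function that sends a prompt to a model and resolves with its raw text.
type Generate = (prompt: string) => Promise<string>;

// Call the model up to maxAttempts times until the output parses as JSON and
// passes the caller's validity check; otherwise throw.
async function generateJson<T>(
  generate: Generate,
  prompt: string,
  isValid: (value: unknown) => value is T,
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await generate(prompt);
    try {
      const parsed: unknown = JSON.parse(raw);
      if (isValid(parsed)) return parsed;
    } catch {
      // Not valid JSON; fall through and try again.
    }
  }
  throw new Error(`No valid JSON after ${maxAttempts} attempts`);
}

// Fake model for illustration: answers with prose once, then with JSON.
let calls = 0;
const fakeModel: Generate = async () =>
  ++calls === 1 ? "Sure, here you go!" : '{"answer": "42"}';

// Type guard describing the shape the caller expects.
const isAnswer = (v: unknown): v is { answer: string } =>
  typeof v === "object" &&
  v !== null &&
  typeof (v as { answer?: unknown }).answer === "string";
```

As icyfox notes, this only raises the success rate; without logit-level constraints the final `throw` can still be reached.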

Ali_hat•6/19/24, 9:35 PM

Hi @rob, I basically use it for Q&A based on customer data. I don't need structured data as I only need plain text. Here's the code in a nutshell.

const systemContent = `You are a knowledgeable employee familiar with the company ${companyName}, responding to customer inquiries. Follow these guidelines:
    - Answer in the same language as the question.
    - Do not reveal your identity.
    - If you don't know the answer, admit it without making anything up.
    - Maintain a neutral tone.
    - Do not provide opinions or personal views.
    - Avoid asking for feedback.
    - Keep the conversation strictly to the point; do not engage in small talk or recommendations.
    - Do not apologize.
    - Do not initiate or continue small talk.
    - Do not use phrases like "I'm sorry" or "I apologize."`;
await got.post(`https://api.cloudflare.com/client/v4/accounts/${Env.CLOUDFLARE_ACCOUNT_ID}/ai/run/${model}`, {
  headers: { Authorization: `Bearer ${Env.CLOUDFLARE_WORKERS_AI_KEY}` },
  json: {
    max_tokens: 350,
    messages: [
      { role: 'system', content: systemContent },
      { role: 'user', content: `Question:${question}` },
      { role: 'assistant', content: context }
    ],
    temperature: 0.5
  }
});

rob•6/19/24, 9:43 PM

remembering context from previous HTTP calls

what do you mean there?

℠•6/20/24, 3:13 AM

I’d bet my bald head that he means the model keeps training on chat-data instead of starting every chat-session fresh (from base model) .

Isaac McFadyen•6/20/24, 3:20 AM

Which is not the case - the model only remembers history/messages you give it (if anything) and is not automatically trained or tuned over time.
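
Since each request only carries the messages you send, one way to keep calls independent is to build the `messages` array from scratch every time. The sketch below is an assumption-laden illustration, not a confirmed fix: `buildMessages` is a hypothetical helper, and moving the reference context into the system prompt (instead of an `assistant` turn, as in the snippet earlier in the thread) is a guess at what might reduce the "chatty" feel.

```typescript
type Role = "system" | "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

// Build the messages for one self-contained request. Prior turns are only
// included when the caller passes them explicitly via `history`.
function buildMessages(
  system: string,
  context: string,
  question: string,
  history: Message[] = [],
): Message[] {
  return [
    // Assumption: reference material is placed in the system prompt so the
    // model does not read it as something it already said.
    { role: "system", content: `${system}\n\nContext:\n${context}` },
    ...history,
    { role: "user", content: `Question:${question}` },
  ];
}

// A fresh request with no history: the model sees exactly two messages.
const messages = buildMessages(
  "Answer customer questions briefly.",
  "Opening hours: 9-17.",
  "When are you open?",
);
console.log(messages.length); // 2
```

The resulting array would be sent as the `messages` field of the request body, as in the snippet earlier in the thread.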
