if it's a text generation model you can lower the `max_tokens`

codebamOP•5/15/24, 9:28 AM

you'll get shorter replies though
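For reference, a minimal sketch of what lowering `max_tokens` could look like against the Workers AI REST endpoint. The helper name `buildTextGenRequest`, the account ID, and the token are placeholders for illustration; the request is only built here, not sent.

```typescript
// Plain description of the HTTP request (kept as a simple object so it can be inspected).
interface AiRunRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds a text-generation request with a capped output length.
function buildTextGenRequest(
  accountId: string,
  apiToken: string,
  model: string,
  prompt: string,
  maxTokens: number,
): AiRunRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    // max_tokens caps how many tokens the model may generate (shorter replies);
    // stream: true asks for server-sent events instead of a single JSON body.
    body: JSON.stringify({ prompt, max_tokens: maxTokens, stream: true }),
  };
}

// Placeholder credentials; swap in real values before sending with fetch().
const exampleRequest = buildTextGenRequest(
  "ACCOUNT_ID",
  "API_TOKEN",
  "@hf/meta-llama/meta-llama-3-8b-instruct",
  "Say hi",
  128,
);
```

A lower cap makes the reply finish sooner, at the cost of the answer possibly being cut off.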

Shanmukeshwar•5/15/24, 9:28 AM

I managed the UI to stream the response

Shanmukeshwar•5/15/24, 9:29 AM

Thanks for the reply

raj.js•5/15/24, 5:22 PM

What is the prompt limit for the Llama 3 model? Is there any model on Cloudflare that has at least an 8k token limit?

codebamOP•5/15/24, 7:59 PM

4096 characters. no

Isaac McFadyen•5/16/24, 2:37 AM

^ just to clarify this: that's tokens, not characters.

Isaac McFadyen•5/16/24, 2:38 AM

A token is almost always more than 1 character (to see this for yourself, you can put your text in here, although note that this is not the same tokenizer that the Workers AI models use: https://platform.openai.com/tokenizer)

Isaac McFadyen•5/16/24, 2:38 AM

But yes, 4096 tokens maximum (for Mistral v0.2).

AndyJessop•5/16/24, 2:05 PM

Does anyone know where I can find a price for Llama3-8b-instruct? There's no pricing for it on the models page, and it's not in the dropdown on the calculator.

https://developers.cloudflare.com/workers-ai/platform/pricing/#text-generation
https://ai.cloudflare.com/#pricing-calculator

Isaac McFadyen•5/16/24, 2:21 PM

https://developers.cloudflare.com/workers-ai/platform/pricing/ It's still beta so unbilled currently.

Isaac McFadyen•5/16/24, 2:21 PM

>You can still enjoy unlimited usage on the beta models in the catalog until they graduate out of beta.

AndyJessop•5/16/24, 2:26 PM

That's great news, I thought I was going to be in for a huge bill...

Mrinank•5/16/24, 2:28 PM

does LangSmith work with Workers AI?

Replying to scotto: whisper should work with webm format? in the openai docs says it's supported but...

scotto•5/16/24, 10:41 PM

news?

Isaac McFadyen•5/16/24, 11:34 PM

Note that depending on your source of WebM files (if it's the browser microphone) you can take a look at that link I sent to force the browser to output a more standardized format

Replying to Isaac McFadyen: But yes, 4096 tokens maximum (for Mistral v0.2).

Raylight•5/17/24, 2:36 PM

The limit for Llama 3 (on Cloudflare) seems to be ~2800 tokens though. The model stops mid-sentence when prompt/messages + generated text goes above that limit.

Isaac McFadyen•5/17/24, 2:40 PM

Interesting - doesn't seem to be listed on the model page but the limit for LLaMA 2 is 2048 tokens so perhaps they share similar limits? https://developers.cloudflare.com/workers-ai/models/llama-2-7b-chat-int8/#properties

Isaac McFadyen•5/17/24, 2:40 PM

The 4096 limit I found was for Mistral v0.2: https://developers.cloudflare.com/workers-ai/models/mistral-7b-instruct-v0.2/

Isaac McFadyen•5/17/24, 2:40 PM

So I expect each model is different.

Isaac McFadyen•5/17/24, 2:41 PM

Edited my original message to clarify, thanks for bringing it up

Raylight•5/17/24, 3:08 PM

No idea. I'm reconstructing the raw prompt and counting the total with https://github.com/belladoreai/llama3-tokenizer-js. The limit is 2827 for some reason.
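Since the model stops mid-sentence once prompt plus generated tokens exceed the limit, one way to avoid that is to clamp `max_tokens` to whatever room is left. A rough sketch, assuming the observed 2827 limit holds; `clampMaxTokens` and `roughCounter` are made-up names, and the 4-characters-per-token counter is only a crude stand-in for a real tokenizer such as the one linked above.

```typescript
// Any function that maps text to a token count (a real tokenizer or an estimate).
type TokenCounter = (text: string) => number;

// Returns the largest max_tokens value that still fits in the context window.
function clampMaxTokens(
  promptText: string,
  contextLimit: number,
  requestedMaxTokens: number,
  countTokens: TokenCounter,
): number {
  const promptTokens = countTokens(promptText);
  const room = contextLimit - promptTokens;
  // Never return a negative budget; 0 means the prompt alone is already too long.
  return Math.max(0, Math.min(requestedMaxTokens, room));
}

// Crude assumption: about 4 characters per token for English text.
const roughCounter: TokenCounter = (text) => Math.ceil(text.length / 4);

// An 8000-character prompt is estimated at 2000 tokens, leaving 827 of the 2827 limit.
const safeMaxTokens = clampMaxTokens("a".repeat(8000), 2827, 1024, roughCounter);
```

If the result is 0, the prompt itself needs trimming before the request is worth sending.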

Replying to Isaac McFadyen: Note that depending on your source of WebM files (if it's the browser microphone...

scotto•5/17/24, 7:30 PM

sure that works on mobile?

Isaac McFadyen•5/17/24, 7:40 PM

I'm not sure, no. You'd have to test.

Replying to raj.js: How much is prompt limit for Llama 3 model? Is there any model on Cloudflare whi...

Raylight•5/17/24, 8:57 PM

Looks like `@cf/qwen/qwen1.5-7b-chat-awq` has a context length way above 8k. Managed to chop up and squeeze in ~20k tokens, ~100k characters, and ask a question about a detail buried in the beginning of the text.

SciCat•5/18/24, 8:23 PM

Hello, I was wondering, does anyone know if it is possible to pass an image as input?

Sergey•5/20/24, 6:04 AM

Hey! Is there any rough ETA when text2img models will get out of beta?

Replying to SciCat: Hello, I was wondering, does anyone know if it is possible to pass an image as input?

AngusMa•5/20/24, 2:06 PM

`@cf/llava-hf/llava-1.5-7b-hf` and `@cf/unum/uform-gen2-qwen-500m` accept an image as input.
https://developers.cloudflare.com/workers-ai/models/#image-to-text
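A small sketch of how the input for one of those image-to-text models might be assembled. It assumes the image is passed as a plain array of byte values alongside a `prompt`, which is how I understand the model docs; the helper name `buildImageToTextInput` and the `AI` binding name in the comment are assumptions.

```typescript
// Input shape assumed for the image-to-text models: raw bytes as a number array.
interface ImageToTextInput {
  image: number[];
  prompt: string;
  max_tokens: number;
}

// Hypothetical helper: converts raw image bytes into the model input object.
function buildImageToTextInput(
  imageBytes: Uint8Array,
  prompt: string,
  maxTokens: number,
): ImageToTextInput {
  return {
    image: Array.from(imageBytes), // Uint8Array -> plain number[]
    prompt,
    max_tokens: maxTokens,
  };
}

// First four bytes of a PNG header, used here only as stand-in image data.
const sampleInput = buildImageToTextInput(
  new Uint8Array([137, 80, 78, 71]),
  "Describe this image",
  256,
);

// Inside a Worker with an AI binding (binding name `AI` assumed), this would be roughly:
//   const result = await env.AI.run("@cf/llava-hf/llava-1.5-7b-hf", sampleInput);
```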

AngusMa•5/20/24, 2:11 PM

The model `bge-large-en-v1.5` appears on the model list, but not on the dashboard.
Was it forgotten?

Raylight•5/20/24, 2:31 PM

`@cf/openchat/openchat-3.5-0106`, `@cf/qwen/qwen1.5-14b-chat-awq`, and `@hf/google/gemma-7b-it` are also gone. Seems like the dashboard has a limit of 50 models.

Replying to michelle: taking a look - definitely not the intention.

Chaika•5/20/24, 6:10 PM

you've probably already figured it out, but fwiw the per_page query param on the search models endpoint has been broken for a while, I forgot to mention it and have just been working around it lol, it's always 50 no matter what you specify (in this case, the dash is trying 1000, but getting 50 back, and clicking through the list pages in the dash isn't paginating via the api)
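One possible workaround while `per_page` is stuck at 50 is to walk the pages one by one until an empty page comes back. This is only a sketch: it assumes the endpoint honours a `page` parameter, and `collectAllPages` plus the injected `fetchPage` callback are hypothetical names rather than anything from the Cloudflare API.

```typescript
// Fetches one page of results; the caller decides how (e.g. a fetch() with ?page=N).
type PageFetcher<T> = (page: number) => Promise<T[]>;

// Keeps requesting pages until one is empty, regardless of the server's page size.
async function collectAllPages<T>(
  fetchPage: PageFetcher<T>,
  maxPages: number = 100, // safety cap so a misbehaving API cannot loop forever
): Promise<T[]> {
  const all: T[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const items = await fetchPage(page);
    if (items.length === 0) break; // no more results
    all.push(...items);
  }
  return all;
}

// Fake data source standing in for the models search endpoint: 120 models, 50 per page.
const fakeModels: string[] = Array.from({ length: 120 }, (_, i) => `model-${i}`);
const fakeFetchPage: PageFetcher<string> = async (page) =>
  fakeModels.slice((page - 1) * 50, page * 50);
```

Stopping on an empty page rather than on a short page avoids relying on whatever page size the server actually returns.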

Mrinank•5/20/24, 7:32 PM

can we use pinecone or upstash vector db in cf workers??

Mrinank•5/20/24, 7:35 PM

Getting an error while connecting: `const pinecone = new Pinecone({ apiKey: '' });`

[ERROR] Error compiling schema, function code: const schema2 = scope.schema[2];const schema1 = scope.schema[1]......

X [ERROR] Error in fetch handler: EvalError: Code generation from strings disallowed for this context

Isaac McFadyen•5/20/24, 8:50 PM

`eval` and `new Function` cannot be used in Workers for security reasons. Based on the error message I assume it's trying to do some sort of schema validation, which often uses `new Function`.

Isaac McFadyen•5/20/24, 8:51 PM

If you have control over it then you can switch to a schema library that doesn't use `new Function`, but if it's part of the Pinecone library then that won't work.

Isaac McFadyen•5/20/24, 8:51 PM

You might be able to manually connect to Pinecone over HTTP if they have an HTTP API, or (more work) over TCP if needed.
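A sketch of what the plain-HTTP route could look like, skipping the client library entirely. It assumes Pinecone's REST query endpoint is `POST https://<index-host>/query` with an `Api-Key` header and a `vector`/`topK` JSON body, which is my understanding of their API and worth checking against their docs. The helper name, index host, and key are placeholders.

```typescript
// Plain description of the HTTP call, so nothing here needs eval or new Function.
interface PineconeQueryRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds a vector query against an index host (assumed API shape).
function buildPineconeQuery(
  indexHost: string,
  apiKey: string,
  vector: number[],
  topK: number,
): PineconeQueryRequest {
  return {
    url: `https://${indexHost}/query`,
    method: "POST",
    headers: {
      "Api-Key": apiKey, // assumed auth header name
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ vector, topK, includeMetadata: true }),
  };
}

// Placeholder host and key for illustration only.
const pineconeQuery = buildPineconeQuery(
  "example-index.svc.pinecone.io",
  "PINECONE_API_KEY",
  [0.1, 0.2, 0.3],
  3,
);

// In a Worker: const res = await fetch(pineconeQuery.url, pineconeQuery);
```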

Replying to lui: Hi! I'm evaluating the usage of Whisper from a cloudflare worker for the company...

catnaut.•5/21/24, 7:45 AM

encountered a similar issue. From my reading of the documentation, it seems that the only required input parameter is audio. However, I'm also interested in utilizing other parameters such as prompt, which are crucial for standardizing transcription format and providing hot words. Can anyone guide me on how to use these additional parameters with Whisper? Additionally, I'm looking for an OpenAI-compatible whisper endpoint, so I can migrate OpenAI services to Cloudflare. Any help would be greatly appreciated!

falex•5/23/24, 5:33 PM

I am using `@hf/meta-llama/meta-llama-3-8b-instruct` in stream mode. The JSON response has the structure below:
data: {"response":" to","p":"abcdef"}
data: {"response":"Once","p":"abcdef"}
I'm not sure what the "p" parameter means. Any comments?

falex•5/23/24, 5:34 PM

For example, in this case "p" has a single letter: data: {"response":" this","p":"a"}

Chaika•5/23/24, 5:42 PM

https://blog.cloudflare.com/ai-side-channel-attack-mitigated

The Cloudflare Blog

Mitigating a token-length side-channel attack in our AI products

The Workers AI and AI Gateway team recently collaborated closely with security researchers at Ben Gurion University regarding a report submitted through our Public Bug Bounty program. Through this process, we discovered and fully patched a vulnerability affecting all LLM providers. Here’s the story

Chaika•5/23/24, 5:42 PM

it's a defense against that

"p" is just random data
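In practice that means a client can ignore "p" and only read "response". A minimal sketch of parsing the streamed `data:` lines shown above; `extractStreamText` is a made-up helper, and it assumes the stream ends with a `data: [DONE]` line and that each line arrives whole (a real stream reader would need to buffer partial lines).

```typescript
// One decoded event from the stream; "p" is padding and is deliberately unused.
interface StreamEvent {
  response?: string;
  p?: string;
}

// Concatenates the "response" fields from a block of server-sent event text.
function extractStreamText(sseText: string): string {
  let text = "";
  for (const line of sseText.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue; // skip blank lines and other fields
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") break; // assumed end-of-stream marker
    const event = JSON.parse(payload) as StreamEvent;
    text += event.response ?? "";
  }
  return text;
}

const sampleStream =
  'data: {"response":"Once","p":"abcdef"}\n\n' +
  'data: {"response":" to","p":"a"}\n\n' +
  "data: [DONE]\n";

const streamedText = extractStreamText(sampleStream);
```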

Replying to noman: i'm often getting these cancelled responses inside my worker with no info about ...

jason•5/24/24, 5:48 AM

@noman did you figure out why this is happening? I'm getting the same thing today, with no reason given as to why it might be happening.

POST http://workers-binding.ai/run?version=3 - Canceled @ 24/05/2024, 06:28:16

scotto•5/24/24, 5:24 PM

any news on the whisper webm format support? it seems to have been reported several times on discord, quite important for web support
