It takes a long time because you're not using streaming: the server waits for the LLM to generate the entire text before returning any of it, so the client sees nothing until generation finishes. With streaming enabled, tokens are forwarded to the client as soon as they are produced, so the first output appears almost immediately even though total generation time is the same.
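A minimal sketch of the difference, using a hypothetical `generate_tokens` generator as a stand-in for the model backend (not any specific LLM API): the blocking path joins every token before returning, while the streaming path yields each token as it is produced.

```python
def generate_tokens(prompt):
    # Hypothetical stand-in for an LLM backend that produces tokens one at a time.
    for token in ["Hello", ",", " world", "!"]:
        yield token

def respond_blocking(prompt):
    # Non-streaming: nothing reaches the caller until every token exists.
    return "".join(generate_tokens(prompt))

def respond_streaming(prompt):
    # Streaming: forward each token the moment it is generated,
    # so the caller can start displaying output right away.
    for token in generate_tokens(prompt):
        yield token

full = respond_blocking("hi")
streamed = "".join(respond_streaming("hi"))
```

The final text is identical either way; streaming only changes the latency profile, cutting time-to-first-token from the full generation time down to roughly one token's worth.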