Well, I thought that maybe it has to load the model into memory the first time it runs on a worker, which is why the first request takes so long and the following requests are faster.
I'm using Llama via api.cloudflare.com with Bearer auth. Where can I check my usage? If I exceed the 10,000 free Neurons, am I charged automatically, or does it just stop working?
Check the model page(s) (https://developers.cloudflare.com/workers-ai/models/llama-3.1-8b-instruct-fast#Parameters). Beware that, for the most part, models within a category list the same set of parameters on their pages, whether or not they actually support them all. E.g. @hf/nousresearch/hermes-2-pro-mistral-7b doesn't support temperature or seed, even though those parameters are listed on its model page. (I reported it in workers-ai a while ago but never got a response, so no idea whether it's intended to be that way.) Also, expect some quirks: e.g. a subset of the models will break if you set max_tokens to 597 or higher (reported in workers-ai as well).
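For reference, a request with those parameters looks roughly like this. This is a minimal Python sketch, not official client code: the account ID, token, and prompt are placeholders you'd substitute, and the endpoint path follows the pattern documented for the Workers AI REST API.

```python
import json
import urllib.request

# Placeholders -- substitute your own account ID, API token, and model slug.
ACCOUNT_ID = "YOUR_ACCOUNT_ID"
API_TOKEN = "YOUR_API_TOKEN"
MODEL = "@cf/meta/llama-3.1-8b-instruct-fast"


def build_request(prompt: str, max_tokens: int = 256, temperature: float = 0.6):
    """Build a POST request for the Workers AI run endpoint.

    Per the quirks above: some models silently ignore temperature/seed even
    though the docs list them, and some break when max_tokens is 597+.
    """
    url = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}"
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("Why is the first request slow?", max_tokens=512)
# urllib.request.urlopen(req)  # uncomment to actually send the request
```

Keeping max_tokens well below 597 sidesteps the breakage mentioned above until it's fixed.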