Does anyone know why Cloudflare Workers AI Llama 3.1 is 3x slower than local Llama 3.1 running on an RTX 3080? Is there any way to speed this up? 30-40 seconds for text generation is insane. I get that it's free credits, but damn, that's kinda slow.

