Hi everyone. I wonder if someone could help me fill some gaps in my knowledge relating to using Vectorize (and vectorisation generally) to help with AI data analysis on potentially large datasets.
Without Vectorize, I can send data to an LLM for analysis/categorisation etc., but I'm of course limited by the model's context window as to how much data I can send in one go. I have done some reading, and I understand that getting my data into a Vectorize index, then querying it as a preliminary step before sending data off to the LLM for analysis, might be the answer here.
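For context, here's roughly the flow I've pieced together from the docs, as a Workers sketch. The binding names, the models, and the idea of storing each record's original text as metadata are just my assumptions, so I may well have this wrong:

```ts
// Sketch of what I *think* the query-time flow is, assuming a Workers AI
// binding (AI) and a Vectorize binding (VECTORIZE) in wrangler config,
// and that the data was embedded and upserted into the index beforehand.
export interface Env {
  AI: Ai;
  VECTORIZE: Vectorize;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Example question - placeholder, not from a real dataset.
    const question = "Which records mention billing problems?";

    // 1. Embed the question with the same model used at ingest time.
    const embedding = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
      text: [question],
    });

    // 2. Ask Vectorize for only the nearest handful of records.
    const result = await env.VECTORIZE.query(embedding.data[0], {
      topK: 5,
      returnMetadata: "all", // assumes original text was stored as metadata
    });

    // 3. Send just those few matches to the LLM, not the whole dataset.
    const context = result.matches
      .map((m) => String(m.metadata?.text ?? ""))
      .join("\n---\n");

    const answer = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      messages: [
        { role: "system", content: `Answer using only this context:\n${context}` },
        { role: "user", content: question },
      ],
    });

    return Response.json(answer);
  },
};
```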
But I'm hazy on the exact steps/rationale. Specifically, how does this help limit the amount of data I ultimately send to the LLM? Is the idea that Vectorize reduces it to a subset, so duplicates or near-duplicates are merged/omitted, or...? (Sorry for the long message - just looking for some high-level guidance!)
