So the idea with RAG is this:
  • you embed all your documents and store them in a vector DB (Vectorize or otherwise)
  • when a user question comes in, you embed the user question
  • you search the vector DB for the top <n> items, where n is 3, 5, 10, or higher depending on your use case
  • you give the results to the LLM along with the original question
The idea is that, because embeddings of related items are similar, querying the vector DB with the embedding of the user question is likely to return entries related to that question. You then hand those results to the LLM, which answers the question based on the gathered knowledge.
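The flow above can be sketched end to end. This is a toy illustration, not a real setup: a bag-of-words counter stands in for a proper embedding model, and a plain in-memory list stands in for the vector DB; the function names (`embed`, `top_n`) are made up for the example.

```python
import re
from collections import Counter
from math import sqrt

def embed(text):
    # Toy "embedding": word-count vector. A real system would call an
    # embedding model here and get back a dense float vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "Cats are small domesticated mammals.",
    "The capital of France is Paris.",
    "Python is a popular programming language.",
]

# Step 1: embed all documents and "store" them (list plays the vector DB).
index = [(doc, embed(doc)) for doc in documents]

def top_n(question, n=2):
    # Steps 2-3: embed the question, then search for the top <n> items.
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:n]]

# Step 4: hand the retrieved context plus the question to the LLM.
question = "What is the capital of France?"
context = top_n(question)
prompt = "Answer using this context:\n" + "\n".join(context) + "\nQuestion: " + question
```

Here the document about Paris shares the most words with the question, so it ranks first; a real embedding model captures semantic similarity rather than word overlap, but the retrieval logic is the same.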