So the idea with RAG is this:
  • you embed all your documents and store them in a vector DB (Vectorize or otherwise)
  • when a user question comes in, you embed the user question
  • you search the vector DB for the top <n> items, where n is 3, 5, 10, or higher depending on your use case
  • you give the results to the LLM along with the original question
The idea is that, because embeddings of related items are similar, querying the vector DB with the embedding of the user question is likely to return entries related to that question. You then hand those results to the LLM, which answers the question based on the gathered knowledge.
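The flow above can be sketched end to end. This is a toy illustration, not a real setup: a bag-of-words counter stands in for a proper embedding model, and a plain in-memory list stands in for the vector DB; the function names (`embed`, `top_n`) are made up for the example.

```python
import re
from collections import Counter
from math import sqrt

def embed(text):
    # Toy "embedding": word-count vector. A real system would call an
    # embedding model here and get back a dense float vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "Cats are small domesticated mammals.",
    "The capital of France is Paris.",
    "Python is a popular programming language.",
]

# Step 1: embed all documents and "store" them (list plays the vector DB).
index = [(doc, embed(doc)) for doc in documents]

def top_n(question, n=2):
    # Steps 2-3: embed the question, then search for the top <n> items.
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:n]]

# Step 4: hand the retrieved context plus the question to the LLM.
question = "What is the capital of France?"
context = top_n(question)
prompt = "Answer using this context:\n" + "\n".join(context) + "\nQuestion: " + question
```

Here the document about Paris shares the most words with the question, so it ranks first; a real embedding model captures semantic similarity rather than word overlap, but the retrieval logic is the same.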