How to set up an HNSW index? (Next.js, Prisma, LangChain)

I am using Next.js with Prisma, LangChain, and Neon. Users can create teams and upload content that should be used as context for LLMs. Uploaded content is chunked and stored in the Chunks table together with a vector embedding, using the pgvector extension. Using LangChain, I then retrieve the top 10 vectors for the user query.

So far I haven't created an index on the embedding column, but with a growing vector count retrieval has become slow enough that I have to tackle this. I have read that HNSW is the most suitable index for my use case, since I need both speed and the best possible accuracy. I also need metadata filtering so that retrieval only returns vectors that belong to one team and doesn't include other teams' data. Is that possible with HNSW, or do I need to create an index for each team individually (if that is even possible)?

The question now is how best to set up the index, ideally using Prisma directly. I also read that HNSW is compute-intensive when it is first created: how can I use Neon's autoscaling to scale up automatically while the index is being created or updated? Thanks a lot for the help.

3 Replies

wise-white•12mo ago

cc @raoufchebri might be able to help

correct-apricot•12mo ago

Thanks Mahmoud. Hi @1MS1, can you please share how many vectors you have? What latencies do you observe?

Also I need metadata filtering so that the retrieval only gets the vectors that are from one team and doesn't include other teams' data.

How is your metadata structured? JSON or additional columns in your table?

Is that possible with HNSW

You can filter with HNSW: increase the number of candidate vectors the index returns (to 100, for example), then apply a WHERE clause to those candidates. The trade-off is that this increases query execution time.
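
To make the trade-off concrete, here is a minimal in-memory sketch of "fetch candidates first, filter afterwards". It is a toy model, not pgvector itself: the Chunk type, the teamId field, the sample vectors, and the searchThenFilter helper are all made up for illustration. It shows that a small candidate count can leave fewer than k rows for a team once other teams' near-identical vectors are filtered out, which is why the candidate count has to be raised.

```typescript
// Toy model of "over-fetch, then filter". All names and data are hypothetical.
type Chunk = { id: number; teamId: string; embedding: number[] };

// Cosine distance = 1 - cosine similarity (smaller means more similar).
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Take the `candidates` nearest chunks across ALL teams, then keep only
// the requested team's rows and return at most `k` of them.
function searchThenFilter(
  chunks: Chunk[],
  query: number[],
  teamId: string,
  candidates: number,
  k: number,
): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosineDistance(x.embedding, query) - cosineDistance(y.embedding, query))
    .slice(0, candidates)
    .filter((c) => c.teamId === teamId)
    .slice(0, k);
}

const chunks: Chunk[] = [
  { id: 1, teamId: "a", embedding: [1, 0] },
  { id: 2, teamId: "b", embedding: [0.99, 0.1] },
  { id: 3, teamId: "b", embedding: [0.98, 0.2] },
  { id: 4, teamId: "a", embedding: [0.5, 0.5] },
  { id: 5, teamId: "a", embedding: [0, 1] },
];
const query = [1, 0];

// 3 candidates: team "b" occupies two of the three slots, so only id 1 is left.
console.log(searchThenFilter(chunks, query, "a", 3, 2).map((c) => c.id));
// 5 candidates: enough rows survive the filter to return ids 1 and 4.
console.log(searchThenFilter(chunks, query, "a", 5, 2).map((c) => c.id));
```

In pgvector the equivalent knob is the hnsw.ef_search setting (default 40), which caps how many candidates an HNSW scan produces before the WHERE clause is applied.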

do I need to create an index for each team individually (if that is possible?)

This is a possible approach too, but it is only worth it if you have a large number of teams.

Question now is how I would best set up the index?

I'm not sure how (or if) you can create an HNSW index in Prisma, but you can use SQL: CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops); There are other parameters to take into account, such as ef_construction, m, and ef_search, but their values depend on how large your dataset is.
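
As a rough sketch of how those parameters fit into the statement, the helper below builds the SQL as a string. The helper names, the generated index name, and the "Chunks"/"embedding" identifiers are assumptions for illustration; the fallback values of 16 and 64 are pgvector's documented defaults for m and ef_construction, not tuned recommendations.

```typescript
// Hypothetical helpers: names and values here are illustrative assumptions.
type HnswOptions = {
  table: string;
  column: string;
  m?: number; // max connections per graph node
  efConstruction?: number; // candidate list size while building the graph
};

// Quote an identifier so mixed-case names such as "Chunks" are preserved.
function quoteIdent(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

// Reject anything that is not a positive integer before it reaches the SQL text.
function positiveInt(value: number, label: string): number {
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${label} must be a positive integer`);
  }
  return value;
}

// Build a CREATE INDEX statement for cosine distance with explicit HNSW options.
function hnswIndexSql(opts: HnswOptions): string {
  const m = positiveInt(opts.m ?? 16, "m");
  const ef = positiveInt(opts.efConstruction ?? 64, "efConstruction");
  const indexName = `${opts.table}_${opts.column}_hnsw_idx`;
  return (
    `CREATE INDEX IF NOT EXISTS ${quoteIdent(indexName)} ` +
    `ON ${quoteIdent(opts.table)} USING hnsw (${quoteIdent(opts.column)} vector_cosine_ops) ` +
    `WITH (m = ${m}, ef_construction = ${ef});`
  );
}

// ef_search is a query-time setting, so it is set per session, not on the index.
function efSearchSql(efSearch: number): string {
  return `SET hnsw.ef_search = ${positiveInt(efSearch, "efSearch")};`;
}

console.log(hnswIndexSql({ table: "Chunks", column: "embedding" }));
console.log(efSearchSql(100));
```

With Prisma, a string like this could be passed to prisma.$executeRawUnsafe(...), or pasted into a migration file generated with prisma migrate dev --create-only. Either route is worth trying on a Neon branch first.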

How can I use Neon's autoscaling to automate

Make sure to set up autoscaling in your project, then:

SET max_parallel_maintenance_workers = 7; -- number of CPUs - 1
SET maintenance_work_mem = '8GB'; -- ideally no more than half of the instance's max memory, but depends on how large your dataset is
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
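
The two rules of thumb in those comments can be written as a small helper. This is only a sketch: maintenanceSettingsSql is a hypothetical name, and the 8 CPU / 16 GB figures in the example are simply values that reproduce the numbers above, not a statement about any particular Neon compute size.

```typescript
// Hypothetical helper that turns instance size into the two SET statements.
function maintenanceSettingsSql(cpus: number, memoryGb: number): string[] {
  // Leave one CPU free for the leader process: number of CPUs - 1.
  const workers = Math.max(0, Math.floor(cpus) - 1);
  // Use at most half of the instance memory, but never less than 1 GB.
  const memGb = Math.max(1, Math.floor(memoryGb / 2));
  return [
    `SET max_parallel_maintenance_workers = ${workers};`,
    `SET maintenance_work_mem = '${memGb}GB';`,
  ];
}

// Assumed instance size: 8 CPUs and 16 GB of memory.
console.log(maintenanceSettingsSql(8, 16).join("\n"));
```

SET only affects the current session, so these statements need to run on the same connection as the CREATE INDEX that follows them.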

Basically, the size of your dataset will determine many of the above parameters. Let me know how I can further help.

solid-orangeOP•12mo ago

Thanks for your support 🙂 We are building a chat- and voice-based RAG system, so retrieval has to be very fast, especially for voice. Every client gets their own assistantId in the database, under which they can upload content for retrieval. I use the LangChain Prisma vector store and filter on the assistantId column. (I also have a metadata column, but partial JSON filtering doesn't work with the LangChain code, so I added the dedicated column.) Currently there are 10 customers with 10-30000 vectors each, and retrieval takes 3-5 seconds because I have no index at all.

So your suggestion would be to keep the assistantId column, run the similarity search over all vectors, and filter on assistantId after the retrieval? How does this behave if customers have very similar topics or content?

Thanks for the SQL sample. After the index is created, do I have to reset max_parallel_maintenance_workers and maintenance_work_mem, or can I leave them like this? Also, does the index update automatically when I add new content (vectors), or do I have to rerun the SQL?
