Vector storage
If I wanted to contribute to TypeDB, is there any contributor documentation available? I can't find any.
I managed to compile the 'typedb' repo, but looking into the source I noticed you are using the 'TypeDB sync tool' to generate Cargo.toml files.
I should say that I'm just a Rust beginner. I have been programming professionally for more than 25 years, mainly in Java and Python, but in many other languages as well. I'm mostly interested in trying to add text, vector and hybrid search.
The idea would be to either sync data to an external search engine / vector database (https://superlinked.com/vector-db-comparison) or build a simple version internally in TypeDB for starters.
5 Replies
Hi Jeroen! We would love that, though we haven't fully geared up to support it well yet!
However, your question is timely: we are just now doing the research on vector storage integrations. We're leaning towards embedding LanceDB and borrowing the Postgres vector extension's syntax.
Since we don't have a clearly defined extension system like Postgres has, we'd look to integrate this one into the core directly, which means it would be a rather collaborative process with us, if you're up for it!
It would loosely look like this:
1) define the language extensions to support the new value types and operators for the Vector type
2) decide on the vector engine to embed (probably Lance)
3) decide on the transactionality guarantees & indexing operations we want to offer
4) extend the core with a line all the way from the language -> IR -> compiler -> executors
Note that hybrid search should fall out of the language capabilities if we do it this way!
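To make the "hybrid search falls out of the language" point concrete, here is a minimal sketch in Python. It is not TypeDB code and not a proposed syntax: it only assumes that the operators would be spelled the way the Postgres vector extension (pgvector) spells them, i.e. `<->` for L2 distance, `<=>` for cosine distance and `<#>` for negative inner product. The `OPERATORS` table, the `hybrid_search` helper and the row layout are all hypothetical names made up for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance (pgvector spells this `<->`)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity (pgvector spells this `<=>`).

    Assumes neither vector is all zeros; a real engine would have to
    decide how to handle that case.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def negative_inner_product(a: Vector, b: Vector) -> float:
    """Negated dot product, so smaller still means closer (pgvector `<#>`)."""
    return -sum(x * y for x, y in zip(a, b))


# Hypothetical operator table: a stand-in for what the language -> IR ->
# compiler path would resolve before handing work to an executor.
OPERATORS: Dict[str, Callable[[Vector, Vector], float]] = {
    "<->": l2_distance,
    "<=>": cosine_distance,
    "<#>": negative_inner_product,
}


def hybrid_search(
    rows: List[dict],
    predicate: Callable[[dict], bool],
    query: Vector,
    op: str = "<->",
    k: int = 3,
) -> List[dict]:
    """Filter rows with an ordinary predicate, then rank by vector distance."""
    distance = OPERATORS[op]
    candidates = [row for row in rows if predicate(row)]
    return sorted(candidates, key=lambda row: distance(row["embedding"], query))[:k]


# Made-up rows: the structured filter (lang == "en") and the vector ranking
# are expressed in one call, which is the "hybrid" part.
docs = [
    {"title": "intro to graphs", "lang": "en", "embedding": [1.0, 0.0]},
    {"title": "vector indexes", "lang": "en", "embedding": [0.0, 1.0]},
    {"title": "graphen einführung", "lang": "de", "embedding": [0.9, 0.1]},
]
nearest = hybrid_search(docs, lambda d: d["lang"] == "en", [1.0, 0.1], k=1)
```

This version scans every candidate, so it says nothing about indexing or transactionality (steps 2 and 3); it only illustrates that once a distance operator is an ordinary expression, combining it with regular constraints needs no special hybrid mode.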
Hi Joshua, there is a lot to decide. I have been in the information retrieval / natural language processing business for 18 years, so I might be able to help you guys. There are so many differences between engines, each with its own pros and cons, as shown in the Superlinked comparison. Sadly there is a lot of focus on just the vectors, while so many other things impact precision and recall. Vectors are inherently fuzzy, and that's not always what you want. LanceDB is great for the basics, but there is a lot missing; from a feature perspective it would not be my first choice. It also depends very much on the customers and what their needs are. That's why I was thinking about making multiple integrations possible. I think that would increase adoption of TypeDB, because you would not be forcing them to use one single vector/search solution.
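As a rough illustration of what "multiple integrations" could mean, here is a small Python sketch of a pluggable backend interface. Everything in it is hypothetical: `VectorBackend`, `InMemoryBackend` and `sync_to_all` are names invented for this example, not anything that exists in TypeDB, and a real adapter would wrap an external search engine or vector database instead of a dict.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Sequence, Tuple


class VectorBackend(ABC):
    """Hypothetical contract every search/vector integration would implement."""

    @abstractmethod
    def upsert(self, key: str, vector: Sequence[float]) -> None:
        """Insert or overwrite the vector stored under `key`."""

    @abstractmethod
    def delete(self, key: str) -> None:
        """Remove `key` if present."""

    @abstractmethod
    def search(self, query: Sequence[float], k: int) -> List[Tuple[str, float]]:
        """Return up to `k` (key, distance) pairs, closest first."""


class InMemoryBackend(VectorBackend):
    """Brute-force reference backend: the 'simple internal version'."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def upsert(self, key: str, vector: Sequence[float]) -> None:
        self._vectors[key] = list(vector)

    def delete(self, key: str) -> None:
        self._vectors.pop(key, None)

    def search(self, query: Sequence[float], k: int) -> List[Tuple[str, float]]:
        # Squared L2 distance is enough for ranking; no square root needed.
        scored = [
            (key, sum((x - y) ** 2 for x, y in zip(vec, query)))
            for key, vec in self._vectors.items()
        ]
        return sorted(scored, key=lambda pair: pair[1])[:k]


def sync_to_all(backends: Iterable[VectorBackend], key: str,
                vector: Sequence[float]) -> None:
    """Push one write to every configured backend (no transactional guarantees)."""
    for backend in backends:
        backend.upsert(key, vector)
```

The interface is deliberately tiny. The hard parts raised earlier in the thread, such as what happens when a write succeeds in the database but fails in one backend, are exactly what this sketch leaves out.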
@Jeroen I would be interested to chat to you about it over video, could I reach out to exchange emails and set something up?
Cheers 🙂
I'd love to. I sent you a DM
thanks 🙂