Fulltext-search-like features without ElasticSearch, OpenSearch, Solr and such?

I've read in multiple sources that Apache TinkerPop isn't optimized for text search operations like partial string matching or regex matching. A common "solution" seems to involve integrating the database with full-text search engines like ElasticSearch or Solr. Is there another way of handling these kinds of operations without adding another tool? I'm afraid this is getting way more complex than I wanted. For context, what I'm trying to do is filter nodes by one of their properties, called legal_name, similar to the SQL query SELECT * FROM customers WHERE legal_name LIKE '%John%'. The query itself is of course more complex than that, but that step is making it perform really poorly.
15 Replies
Bo (4mo ago)
There's an ongoing effort to add Couchbase (a storage engine that supports full-text search) to JanusGraph: https://github.com/JanusGraph/janusgraph/pull/4086
GitHub
4084: adds Couchbase as JanusGraph backend by chedim · Pull Request...
Issue #4084 This PR adds couchbase JanusGraph backend and search. The backend is in alfa stage and is not yet recommended for production use. All the dependencies for the backend are either already...
Gil (4mo ago)
that would make it support fulltext search right out of the box?
Bo (4mo ago)
seems so
triggan (4mo ago)
TinkerPop, by itself, is a framework, so the provider that implements the framework would need to implement things such as text search indexing. That being said, TinkerPop 3.6 added a way to provide extensions in the form of call() steps:
https://tinkerpop.apache.org/docs/3.6.0/dev/provider/#_call
https://github.com/apache/tinkerpop/blob/d174572f3fa3d8ff01e628dab18493e13359a632/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/service/Service.java#L41-L51
This likely gets overlooked, as the documentation for it is pretty light. There is, however, a reference implementation of a regex-based search that creates a "service" and uses the related call() step:
https://github.com/apache/tinkerpop/blob/d174572f3fa3d8ff01e628dab18493e13359a632/tinkergraph-gremlin/src/main/java/org/apache/tinkerpop/gremlin/tinkergraph/services/TinkerTextSearchFactory.java
You could use that as the basis for creating a service that makes a remote call to something like OpenSearch.
spmallette (4mo ago)
I think some of the information published in previous years predates Gremlin having native regex support. That was added in 3.6.0 (https://tinkerpop.apache.org/docs/current/upgrade/#_textp_regex), and the feature might satisfy some text search use cases.
triggan (4mo ago)
Yes. Maybe a need for some better examples for Service Registry.
Gil (4mo ago)
I'm currently using TextP.Regex for my queries (as well as "startsWith", "containing", etc.), but it absolutely kills the performance of the query, and this seems to be one of the most common reasons why people go and integrate it with something like ElasticSearch. Well, this still leads to me having to integrate my DB with another tool, which is exactly what I was trying to avoid.
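Editor's sketch: a rough illustration of why a substring or regex filter tends to be slow without a dedicated index, and of the kind of work a text index does instead. It is plain Python, not Gremlin, and not how any particular graph provider or search engine is implemented. The names `trigrams`, `build_index`, `scan_search` and `indexed_search`, and the sample data, are all hypothetical. Matching here is case-insensitive, which may differ from SQL LIKE depending on collation.

```python
from collections import defaultdict

def trigrams(text):
    """Return the set of lowercase 3-character substrings of text."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}

def build_index(names):
    """Map each trigram to the ids (list positions) of the names containing it."""
    index = defaultdict(set)
    for vid, name in enumerate(names):
        for gram in trigrams(name):
            index[gram].add(vid)
    return index

def scan_search(names, needle):
    """Baseline: test every name, like an unindexed substring or regex filter."""
    n = needle.lower()
    return [vid for vid, name in enumerate(names) if n in name.lower()]

def indexed_search(names, index, needle):
    """Intersect trigram posting sets to get candidates, then verify each one."""
    grams = trigrams(needle)
    if not grams:  # needle shorter than 3 characters: fall back to a scan
        return scan_search(names, needle)
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    n = needle.lower()
    return sorted(vid for vid in candidates if n in names[vid].lower())

# Hypothetical sample data standing in for legal_name values.
names = ["John Smith LLC", "Acme Corp", "Johnson & Sons", "St. John's Bakery"]
index = build_index(names)
print(indexed_search(names, index, "John"))  # [0, 2, 3]
```

The scan touches every value on every query, while the indexed version only verifies the candidates that share all of the needle's trigrams. The price is extra storage and an index that has to be kept in sync with the data, which is roughly the job an external search engine takes over.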
dmcmanus (4mo ago)
I was thinking about this recently as well, for attempting to implement a fuzzy search on a name... I haven't fully fleshed out exactly how it would work (or if it would work at all), but essentially I wondered if it could be accomplished by creating a separate vertex for each letter in the legal_name with a CONTAINS_LETTER edge (and maybe a positional property on the edge?). My thought was that, given an input string (a name, in this example), you could use a repeat() step until() some pre-defined match criteria were met.
Edit: I'm a relative Gremlin newbie, so forgive me if that makes zero sense!
spmallette (4mo ago)
the fun thing about graphs is that as soon as you start learning more about them, you start seeing how many problems can be put into a graph context. theoretically, i think you could model search the way you describe, but it does create a lot of extra infrastructure in your graph which might have performance/space/administrative implications.
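Editor's sketch: one possible reading of the letter-vertex idea above, written in plain Python rather than Gremlin so it runs without a graph database. Each (name, position) pair under a letter stands for one CONTAINS_LETTER edge carrying a position property, and the loop plays the role of the proposed repeat()/until(). The function names and sample names are hypothetical, and this does exact substring matching only, not the fuzzy matching dmcmanus had in mind.

```python
from collections import defaultdict

def build_letter_edges(names):
    """letter -> set of (name_id, position): one CONTAINS_LETTER edge per character."""
    edges = defaultdict(set)
    for name_id, name in enumerate(names):
        for pos, ch in enumerate(name.lower()):
            edges[ch].add((name_id, pos))
    return edges

def match(edges, needle):
    """Follow edges letter by letter, keeping only runs at consecutive positions."""
    needle = needle.lower()
    if not needle:
        return []
    # Each frontier entry is (name_id, position of the last matched letter).
    frontier = set(edges.get(needle[0], set()))
    for ch in needle[1:]:  # analogous to repeat() ... until() in the proposal
        nxt = edges.get(ch, set())
        frontier = {(nid, pos + 1) for nid, pos in frontier if (nid, pos + 1) in nxt}
        if not frontier:
            break
    return sorted({nid for nid, _ in frontier})

# Hypothetical sample data.
names = ["John Smith", "Jonah Hill", "Upjohn"]
edges = build_letter_edges(names)
print(match(edges, "john"))                    # [0, 2]
print(sum(len(v) for v in edges.values()))     # 26 edges for three short names
```

The edge count is the point spmallette raises: the model needs one edge per character of every indexed value, so the search structure can easily outgrow the data it describes.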
Bo (4mo ago)
"the fun thing about graphs is that as soon as you start learning more about them, you start seeing how many problems can be put into a graph context."
Haha, cannot agree more. Everything that sits in the old RDBMS space can be reinvented in the graph universe.
spmallette (4mo ago)
i've even infected my children. they see stuff randomly in the world and are like, "whoa, that's a graph"
ManabuBeach (3mo ago)
A bit of an aside, but I have been working extensively on Google Firebase Firestore lately. Same thing: no case-insensitive search whatsoever. Their suggestion: buy an index service. Not wanting that, I just parse out words, lowercase them, and store them in an array property. The latest Gremlin supports a regex predicate now though, so that's a lot better today.
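Editor's sketch: a minimal version of the "parse out words, lowercase them, store them in an array property" workaround described above, in plain Python. The field name `legal_name_tokens` and the helper names are hypothetical, and the record is an ordinary dict standing in for a stored document or vertex.

```python
import re

def search_tokens(text):
    """Split text into words and lowercase them, for storage in an array property."""
    return sorted(set(re.findall(r"\w+", text.lower())))

def matches(tokens, query):
    """Case-insensitive whole-word match against the stored token array."""
    return query.lower() in tokens

# Hypothetical record; the token array is written alongside the original value.
record = {"legal_name": "John SMITH Holdings"}
record["legal_name_tokens"] = search_tokens(record["legal_name"])

print(record["legal_name_tokens"])                     # ['holdings', 'john', 'smith']
print(matches(record["legal_name_tokens"], "JOHN"))    # True
print(matches(record["legal_name_tokens"], "Joh"))     # False
```

The lookup becomes an exact membership test, which most stores can index, but it only matches whole words. A partial match such as LIKE '%Joh%' would still need something else, for example storing prefixes or n-grams as additional tokens.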
M. alhaddad (3mo ago)
@Gil I used TextP a long time ago, but it got slower as the data size increased, so I integrated Elasticsearch. Engine: Neptune AWS. But today I am facing a new problem: it fails when retrieving something with hyphens.
triggan (3mo ago)
Is that failing on the OpenSearch side, or from the call from Neptune?
M. alhaddad (3mo ago)
By fail I meant it fails to retrieve results; it returns empty output.
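Editor's sketch: the thread does not establish what caused the empty results. One possibility, offered purely as an assumption, is tokenization: full-text analyzers commonly split text on hyphens, so the index holds the separate parts and an exact lookup of the whole hyphenated string finds nothing. The toy `analyze` function below is a rough stand-in for such an analyzer, not the behavior of any specific Elasticsearch, OpenSearch or Neptune configuration, and all names and data are hypothetical.

```python
import re

def analyze(text):
    """Rough stand-in for a text analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

# Hypothetical documents and a tiny inverted index: token -> set of doc ids.
docs = {1: "Smith-Jones Ltd", 2: "Smith Ltd"}
index = {}
for doc_id, name in docs.items():
    for token in analyze(name):
        index.setdefault(token, set()).add(doc_id)

def term_lookup(term):
    """Exact lookup of a single, unanalyzed term."""
    return index.get(term, set())

def analyzed_lookup(query):
    """Analyze the query like the documents, then require every token to match."""
    sets = [index.get(t, set()) for t in analyze(query)]
    return set.intersection(*sets) if sets else set()

print(term_lookup("smith-jones"))      # set(): the hyphenated term was never indexed
print(analyzed_lookup("Smith-Jones"))  # {1}
```

If this assumption applies, the usual directions are to query with the same analysis that was applied at index time, or to index the field in an unanalyzed form for exact matches. Whether either fits depends on the actual index mapping and query type, which the thread does not show.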