Very slow regex query (AWS Neptune)
We have a query that searches a data set of about 400,000 vertices, matching properties using a case-insensitive TextP.regex() expression. We are observing very bad query performance; even after several other optimizations, it still takes 20-45 seconds, often timing out.
Simplified query:
We are on a db.r6g.xlarge instance, and do NOT observe any meaningful CPU or memory spikes from this query. We have profiled the query and the TextP.regex() portion seems to take 99%+ of the total runtime.
We're looking for any information that might help us optimize this query or at least understand the poor performance a little better. Thanks in advance!
Solution
When Neptune stores data, it stores it in three different indexed formats (https://docs.aws.amazon.com/neptune/latest/userguide/feature-overview-data-model.html#feature-overview-storage-indexing), each of which is optimized for a specific set of common graph patterns. Each of these indexes is optimized for exact-match lookups, so when a query requires partial text matches, such as a regex query, all of the matching property data has to be scanned to see whether it matches the provided expression.
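To make the difference concrete, here is a toy sketch in Python. It is not how Neptune is implemented; the `vertices` data, the `value_index` dictionary, and both function names are made up for illustration. It only shows why an exact-match index answers an equality lookup with a single probe while a regex has to test every stored value.

```python
import re
from collections import defaultdict

# Toy data (hypothetical): vertex id -> value of one property.
vertices = {
    1: "Alice Smith",
    2: "Bob Jones",
    3: "alice cooper",
    4: "Carol White",
}

# A stand-in for an exact-match index: property value -> set of vertex ids.
value_index = defaultdict(set)
for vid, value in vertices.items():
    value_index[value].add(vid)

def exact_lookup(value):
    """Answer an equality predicate with a single hash probe into the index."""
    return sorted(value_index.get(value, set()))

def regex_scan(pattern):
    """The exact-match index cannot help here: every stored value is tested."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return sorted(vid for vid, value in vertices.items() if compiled.search(value))

print(exact_lookup("Alice Smith"))  # [1]
print(regex_scan("alice"))          # [1, 3]
```

With 400,000 vertices, the second function does 400,000 regex evaluations per query, which is consistent with the regex step dominating the profile.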
To get a performant query for partial text matches, the suggestion is to use the full-text search integration (https://docs.aws.amazon.com/neptune/latest/userguide/full-text-search.html), which integrates with OpenSearch to provide robust full-text search capabilities within a Gremlin query.
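As a rough sketch of what such a query could look like, the Python helper below only builds the Gremlin text; it does not connect to anything. It assumes an OpenSearch domain that has already been populated from the graph. The endpoint URL, the `name` property, the search term, and the helper names `gremlin_str` and `build_fts_query` are all placeholders, and the `Neptune#fts` side-effect key and value prefix should be checked against the full-text search documentation linked above before use.

```python
def gremlin_str(text):
    """Quote text as a single-quoted Gremlin string literal."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

def build_fts_query(endpoint, prop, search):
    """Return Gremlin text that hands the text match to OpenSearch.

    Assumption: the 'Neptune#fts.endpoint' side effect and the 'Neptune#fts '
    value prefix follow the Neptune full-text search documentation.
    """
    fts_value = "Neptune#fts " + search
    parts = [
        f"g.withSideEffect('Neptune#fts.endpoint', {gremlin_str(endpoint)})",
        f".V().has({gremlin_str(prop)}, {gremlin_str(fts_value)})",
        ".limit(25)",  # arbitrary cap so the example returns a small result
    ]
    return "".join(parts)

# Placeholder endpoint and property name, for illustration only.
query = build_fts_query(
    "https://search-example.us-east-1.es.amazonaws.com", "name", "alice"
)
print(query)
```

The resulting string can be submitted through whichever Gremlin client is already in use. Whether matching is case-insensitive depends on how the OpenSearch field is analyzed, so that is worth verifying against the index mapping rather than assuming.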