6 replies

Very slow regex query (AWS Neptune)

We have a query that searches a data set of about ~400,000 vertices, matching properties using a case insensitive

TextP.regex()

TextP.regex()

expression. We are observing very bad query performance; even after several other optimizations, it still takes 20-45 seconds, often timing out.

Simplified query:

g.V()
.has_label("foo").or_(
  __.has("property_1", TextP.regex("(?i)^bar")),
  __.has("property_2", TextP.regex("(?i)^bar")),
  __.has("property_3", TextP.regex("(?i)^bar")),
)
.order()
.by("date", Order.asc)
.limit(1)
.value_map(True)

g.V()
.has_label("foo").or_(
  __.has("property_1", TextP.regex("(?i)^bar")),
  __.has("property_2", TextP.regex("(?i)^bar")),
  __.has("property_3", TextP.regex("(?i)^bar")),
)
.order()
.by("date", Order.asc)
.limit(1)
.value_map(True)

We are on a

db.r6g.xlarge

db.r6g.xlarge

instance, and do NOT observe any meaningful CPU or memory spikes from this query. We have profiled the query and the

TextP.regex()

TextP.regex()

portion seems to take 99%+ of the total runtime.

We're looking for any information that might help us optimize this query or at least understand the poor performance a little better. Thanks in advance!

Solution

When Neptune stores data it stores it in 3 different indexed formats (https://docs.aws.amazon.com/neptune/latest/userguide/feature-overview-data-model.html#feature-overview-storage-indexing), each of which are optimized for a specific set of common graph patterns. Each of these indexes is optimized for exact match lookups so when running queries that require partial text matches, such as a regex query, all the matching property data needs to be scanned to see if it matches the provided expression.

To get a performant query for partial text matches the suggestion is to use the Full Text search integration (https://docs.aws.amazon.com/neptune/latest/userguide/full-text-search.html) , which will integrate with OpenSearch to provide robust full text searching capabilities within a Gremlin query

Full text search in Amazon Neptune using Amazon OpenSearch Service ...

Using Neptune full-text search.

Neptune Graph Data Model - Amazon Neptune

Learn about the four positions of a Neptune quad element.

Jump to solution

Very slow regex query (AWS Neptune)

We have a query that searches a data set of about ~400,000 vertices, matching properties using a case insensitive

TextP.regex()

TextP.regex()

expression. We are observing very bad query performance; even after several other optimizations, it still takes 20-45 seconds, often timing out.

Simplified query:

g.V()
.has_label("foo").or_(
  __.has("property_1", TextP.regex("(?i)^bar")),
  __.has("property_2", TextP.regex("(?i)^bar")),
  __.has("property_3", TextP.regex("(?i)^bar")),
)
.order()
.by("date", Order.asc)
.limit(1)
.value_map(True)

g.V()
.has_label("foo").or_(
  __.has("property_1", TextP.regex("(?i)^bar")),
  __.has("property_2", TextP.regex("(?i)^bar")),
  __.has("property_3", TextP.regex("(?i)^bar")),
)
.order()
.by("date", Order.asc)
.limit(1)
.value_map(True)

We are on a

db.r6g.xlarge

db.r6g.xlarge

instance, and do NOT observe any meaningful CPU or memory spikes from this query. We have profiled the query and the

TextP.regex()

TextP.regex()

Solution

Full text search in Amazon Neptune using Amazon OpenSearch Service ...

Using Neptune full-text search.

Neptune Graph Data Model - Amazon Neptune

Learn about the four positions of a Neptune quad element.

Jump to solution

Very slow regex query (AWS Neptune)

Very slow regex query (AWS Neptune)

Continue the conversation

Apache TinkerPop

Similar Threads

Similar Threads

Continue the conversation

Apache TinkerPop

Similar Threads