11 replies

How to create indexes by Label?

While looking for performance improvements, the AWS Neptune experts suggested that I create some indexes. For context, a single POST endpoint performs three operations against the database: a query for existing data that returns the relationships of a specific ID, a deletion of edges if a record already exists in the database, and an insert/update of vertices and edges. I am currently trying to solve two problems: speed up the creation step, which takes approximately 150 ms, and speed up the query, which currently takes anywhere from 1.2 to 17 seconds.
Is it possible to create indexes for vertices and edges per label, given that my vertices and edges have different labels with different properties? Does anyone know what that implementation would look like?
My current implementation is simple:

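from gremlin_python.driver import client, serializer

# neptune_url is the cluster's Gremlin endpoint, defined elsewhere
client_write = client.Client(neptune_url, "g", message_serializer=serializer.GraphSONMessageSerializer())

# Index-creation statements, submitted as Gremlin script strings
queries = [
    "graph.createIndex('journey_id', Vertex.class)",
    "graph.createIndex('person_type', Vertex.class)",
    "graph.createIndex('relationship_type', Edge.class)",
]

# Submit each statement and block until the server responds
for query in queries:
    client_write.submit(query).all().result()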

Solution

just by calling g.V().limit(1) with concurrent calls on an r6g.2xlarge machine, the average time is 250ms

How many concurrent calls? An r6g.2xlarge instance has 8 vCPUs (and 16 available query execution threads). If you're issuing more than 16 requests in parallel, any additional concurrent requests will queue (an instance can queue up to roughly 8,000 requests). You can see this with the /gremlin/status API (or the %gremlin_status Jupyter magic), which reports the number of executing queries and the number of "accepted" queries. If you need more concurrency, then you'll need to add more vCPUs (either by scaling up or by scaling out read replicas).
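To make the queueing visible from a script rather than a notebook, here is a minimal Python sketch that reads the status document and estimates how many requests are waiting. It assumes the response carries the `acceptedQueryCount` and `runningQueryCount` fields and that "accepted" includes running queries; the function names, the sample payload, and the unauthenticated HTTPS call (which only works when IAM auth is disabled) are assumptions for illustration.

```python
import json
import urllib.request


def summarize_status(payload: dict) -> dict:
    """Reduce a /gremlin/status response to running and queued counts."""
    accepted = int(payload.get("acceptedQueryCount", 0))
    running = int(payload.get("runningQueryCount", 0))
    # Assumption: "accepted" includes running queries, so the difference
    # approximates how many requests are waiting for a free thread.
    queued = max(accepted - running, 0)
    return {"accepted": accepted, "running": running, "queued": queued}


def fetch_status(endpoint: str, port: int = 8182) -> dict:
    """GET the status document; assumes IAM auth is disabled on the cluster."""
    url = f"https://{endpoint}:{port}/gremlin/status"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical sample payload for illustration, not real output.
    sample = {"acceptedQueryCount": 40, "runningQueryCount": 16, "queries": []}
    print(summarize_status(sample))
```

If the queued count stays above zero while the running count sits at the thread limit, the extra latency is likely time spent waiting in the queue rather than query execution.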

But in the query mentioned, the bottleneck starts at the stage where it calls the last otherV() before path().
g.V().has(T.id, "client-id-uuid").bothE("has_profile", "has_affiliated", "has_controlling").has(T.id, containing("tenant-id-uuid")).otherV().path().unfold().dedup().elementMap().toList()

Makes sense, as you're using a text predicate here (containing()). Neptune does not maintain a full-text search index, so any use of text predicates such as containing(), startingWith(), endingWith(), etc. will incur some form of range scan and also require dictionary materialization (we lose all of the benefits of data compression here, as each value must be fetched from the dictionary to compare with the predicate value you've provided).
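The cost difference can be illustrated with a toy model in Python. This is not Neptune's storage engine, only a sketch of dictionary encoding in general: strings are stored once in a dictionary and referenced by integer id, so an exact match needs no decoding, while a substring match has to decode every candidate. All names and the sample IDs are hypothetical.

```python
from typing import Dict, List, Tuple


def build_store(values: List[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Dictionary-encode unique strings as small integer ids."""
    id_to_value = dict(enumerate(values))
    value_to_id = {value: key for key, value in id_to_value.items()}
    return id_to_value, value_to_id


def exact_match(value_to_id: Dict[str, int], needle: str) -> Tuple[List[int], int]:
    """Exact lookup: encode the needle once, decode nothing."""
    found = value_to_id.get(needle)
    return ([] if found is None else [found]), 0


def containing(id_to_value: Dict[int, str], needle: str) -> Tuple[List[int], int]:
    """Substring match: every id is decoded back to its string first."""
    hits: List[int] = []
    decoded = 0
    for key, value in id_to_value.items():
        decoded += 1  # one dictionary materialization per candidate
        if needle in value:
            hits.append(key)
    return hits, decoded


if __name__ == "__main__":
    # Hypothetical edge IDs that embed a tenant identifier.
    edge_ids = ["tenant-a:edge-1", "tenant-b:edge-2", "tenant-a:edge-3"]
    id_to_value, value_to_id = build_store(edge_ids)
    print(exact_match(value_to_id, "tenant-b:edge-2"))
    print(containing(id_to_value, "tenant-a"))
```

In this model the number of decoded values for the substring match grows with the number of candidates, whereas the exact match stays constant, which mirrors why the step after the containing() filter dominates the query time.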

Apache TinkerPop•15mo ago•

Alex
