Apache TinkerPop • 17mo ago
3 replies
Alex

How to improve Performance using MergeV and MergeE?

I made an implementation similar to this:
g.mergeV([(id): 'vertex1']).option(onCreate, [(label): 'Person', 'property1': 'value1', 'updated_at': 'value2']).option(onMatch, ['updated_at': 'value2']).
  mergeV([(id): 'vertex2']).option(onCreate, [(label): 'Person', 'property1': 'value1', 'updated_at': 'value2']).option(onMatch, ['updated_at': 'value2']).
  mergeV([(id): 'vertex3']).option(onCreate, [(label): 'Person', 'property1': 'value1', 'updated_at': 'value2']).option(onMatch, ['updated_at': 'value2'])


So I'm sending 2 requests to Neptune: the first with 11 vertices and the second with 10 edges, and I'm running a performance test against Neptune. The whole process for this amount of content takes around 200ms-500ms. Is there a way to make this query faster? For the connection I'm using
gremlin = client.Client(neptune_url, 'g', transport_factory=lambda: AiohttpTransport(call_from_event_loop=True), message_serializer=serializer.GraphSONMessageSerializer())
and I send the query with
gremlin.submit(query)
Solution
In general, the way to get the best write performance/throughput on Neptune is to both batch multiple writes into a single request and then run multiple batched writes in parallel.

Neptune stores each atomic component of the graph as separate records (node, edge, and property). For example, if you have a node with 4 properties, that turns into 5 records in Neptune. A batched write query with around 100-200 records is a sweet spot that we've found in testing. So issuing queries with that many records and running those in parallel should provide better throughput.
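
For example, here's a minimal sketch (not a drop-in implementation) of that batching-plus-parallelism pattern. It assumes the gremlin_python client from the question; neptune_url, vertex_ids, build_batch_query, and run_batch are illustrative placeholders rather than anything provided by Neptune or TinkerPop:

from concurrent.futures import ThreadPoolExecutor

from gremlin_python.driver import client, serializer

neptune_url = 'wss://your-neptune-endpoint:8182/gremlin'   # placeholder endpoint
vertex_ids = ['vertex%d' % i for i in range(1, 501)]        # placeholder data

# pool_size gives the client one pooled connection per parallel worker.
gremlin = client.Client(
    neptune_url, 'g',
    pool_size=4,
    message_serializer=serializer.GraphSONMessageSerializer())

def build_batch_query(ids):
    """Chain one mergeV() per vertex into a single batched traversal string."""
    steps = ["mergeV([(id): '%s'])"
             ".option(onCreate, [(label): 'Person', 'property1': 'value1', 'updated_at': 'value2'])"
             ".option(onMatch, ['updated_at': 'value2'])" % vid
             for vid in ids]
    return 'g.' + '.'.join(steps)

def run_batch(ids):
    # Submit one batched request and block until the server has processed it.
    return gremlin.submit(build_batch_query(ids)).all().result()

# Each vertex above is 1 node record + 2 property records = 3 records, so
# 50 vertices per request keeps each batch near the 100-200 record sweet spot.
batches = [vertex_ids[i:i + 50] for i in range(0, len(vertex_ids), 50)]

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(run_batch, batches))   # force evaluation so errors surface

gremlin.close()

Each worker blocks on .all().result(), so the thread pool naturally limits how many batched writes are in flight at once; keeping pool_size at least as large as max_workers means every worker gets its own connection.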

Conditional writes will slow things down, as additional locks are being taken to ensure data consistency. So writes that use straight addV(), addE(), and property() steps will be faster than using mergeV() or mergeE(). The latter can also incur more deadlocks (exposed in Neptune as ConcurrentModificationExceptions). So it is also good practice to implement exponential backoff and retries whenever doing parallel writes into Neptune.
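
As a rough illustration of that retry advice (a sketch only: submit_with_backoff and the timing constants are made up, and matching the ConcurrentModificationException by message text is an assumption about how the error surfaces through gremlin_python's GremlinServerError, so adapt it to whatever your driver actually raises):

import random
import time

from gremlin_python.driver.protocol import GremlinServerError

def submit_with_backoff(gremlin_client, query, max_retries=5):
    """Submit a write, retrying with exponential backoff plus jitter on conflicts."""
    for attempt in range(max_retries + 1):
        try:
            return gremlin_client.submit(query).all().result()
        except GremlinServerError as e:
            # Only retry the conflict case; re-raise anything else or the final failure.
            if 'ConcurrentModificationException' not in str(e) or attempt == max_retries:
                raise
            # Sleep 0.1s, 0.2s, 0.4s, ... plus jitter so parallel writers de-synchronize.
            time.sleep((2 ** attempt) * 0.1 + random.uniform(0, 0.05))

Each parallel worker would then call submit_with_backoff(gremlin, build_batch_query(batch)) instead of calling submit() directly.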