Gremlin python MergeV update properties

I'm working with gremlin-python 3.7.1 and AWS Neptune 1.3.2.1, and I'm trying to update vertex properties with MergeV().option(OnMatch, {...}). However, the behavior isn't what I expect: the property should end up as a=b, but it appears as a=[c, b], where c is the old value. Does anyone know how to implement this behavior correctly with MergeV?
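For illustration, a minimal sketch of the pattern described above; the endpoint, vertex id, and property values are placeholders:

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.traversal import Merge, T

# Placeholder endpoint; in the rest of the thread, g refers to a traversal source like this.
conn = DriverRemoteConnection("wss://your-neptune-endpoint:8182/gremlin", "g")
g = traversal().with_remote(conn)

# The first run creates the vertex; later runs hit on_match.
(g.merge_v({T.id_: "x1234"})
  .option(Merge.on_create, {T.label: "Dog", "name": "Toby"})
  .option(Merge.on_match, {"name": "Rex"})
  .iterate())

# With Neptune's default set cardinality, "name" now holds both values:
print(g.V("x1234").values("name").toList())  # e.g. ['Toby', 'Rex']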
Solution
ColeGreer · 9mo ago
If I'm understanding your question correctly, I think what you are seeing is a result of Neptune defaulting to set cardinality for properties. Essentially, that means if I start with a vertex with property("name", "Alice") and then try to overwrite it with property("name", "Bob"), Neptune will instead add the new value to a set, so that vertex.name = {"Alice", "Bob"}. I think this set cardinality behaviour is what you are seeing when using MergeV(). If you want to use mergeV and enforce single cardinality for properties (overwrite existing values instead of appending), you can try a query like this:
from gremlin_python.process.traversal import Merge, T, CardinalityValue

(g.merge_v({T.id_: "x1234"})
  .option(Merge.on_create, {T.label: 'Dog', 'name': 'Toby', 'age': 10})  # only used if the vertex doesn't exist yet
  .option(Merge.on_match, {'age': CardinalityValue.single(11)})          # overwrite 'age' instead of appending
  .toList())
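A quick way to confirm the overwrite took effect (assuming the same g and vertex id as above):

# 'age' should now hold exactly one value.
print(g.V("x1234").values("age").toList())  # expected: [11]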
masterhugo (OP) · 9mo ago
ahhh that's the answer I need, thank you so much
masterhugo (OP) · 9mo ago
hmmm, I get this error when I try to implement it:
GremlinServerError: 499: {"code":"UnsupportedOperationException", "requestID":"...", "detailedMessage":"Unsupported property value type: org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.DefaultGraphTraversal"}
and the dict looks like this
{'node_update':[['CardinalityValueTraversal', <Cardinality.single: 3>, datetime.datetime(...)]], 'company': [['CardinalityValueTraversal', <Cardinality.single: 3>, 'str']]}
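For reference, on a client and engine that support the 3.7+ syntax, the on_match map would be built roughly like this; the keys mirror the dict above and the values are hypothetical:

from datetime import datetime, timezone
from gremlin_python.process.traversal import Merge, T, CardinalityValue

# Hypothetical updates mirroring the keys shown above.
updates = {
    "node_update": datetime.now(timezone.utc),
    "company": "some-company",
}

# Wrap each value so it overwrites instead of appending.
on_match = {k: CardinalityValue.single(v) for k, v in updates.items()}

(g.merge_v({T.id_: "x1234"})
  .option(Merge.on_match, on_match)
  .toList())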
ColeGreer · 9mo ago
That's strange, I'm able to run the exact query I shared above with gremlin-python 3.7.1 and Neptune 1.3.2.1. Could I ask you to double-check the versions you are using here? The syntax I shared above is a relatively recent addition to TinkerPop (3.7.0) (docs), and the error you are seeing is consistent with what I would expect from an older server that does not yet support setting individual cardinalities on each property. I'm also not recognizing what that dict represents. Where are you extracting that dict from? It might be helpful if you could share the full query you are attempting to run (with any sensitive property keys and values replaced with dummy values).
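If it helps, one way to double-check both versions; the endpoint is a placeholder, and the /status call assumes IAM authentication is not enabled on the cluster:

import importlib.metadata
import requests

# Client side: the installed gremlinpython package version.
print(importlib.metadata.version("gremlinpython"))

# Server side: Neptune's instance status endpoint reports the engine version.
resp = requests.get("https://your-neptune-endpoint:8182/status")
print(resp.json().get("dbEngineVersion"))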
masterhugo (OP) · 9mo ago
ohhh you're right, I was using the wrong engine version, I was on Neptune 1.3.1.0
ColeGreer · 9mo ago
In that case, none of the new syntax for specifying cardinalities in property maps is supported. If upgrading your Neptune cluster is an option for you, it looks like 1.3.2.0 is the earliest version that supports the new syntax. https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-client.html If you are stuck on the older version, a query such as this might work for you:
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import Cardinality, Merge, T

(g.merge_v({T.id_: "x1234"})
  .option(Merge.on_create, {T.label: 'Dog', 'name': 'Toby', 'age': 10})
  # mutate the matched vertex in a side effect, then return an empty map so on_match applies nothing further
  .option(Merge.on_match, __.side_effect(__.property(Cardinality.single, "age", 11)).constant(dict()))
  .toList())
I don't like recommending a query like this, as it's really abusing the ability to pass a sub-traversal that produces a map in order to modify the matched vertex directly. This isn't how mergeV was intended to be used, but it might solve your issue.
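Another option on older engine versions, not raised in the thread but a common pattern, is the fold()/coalesce()/unfold() upsert, which avoids mergeV entirely and sets single cardinality explicitly; a sketch using the same hypothetical vertex:

from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import Cardinality, T

(g.V("x1234").fold()
  .coalesce(
      # vertex exists: unfold it and overwrite 'age' with single cardinality
      __.unfold().property(Cardinality.single, "age", 11),
      # vertex doesn't exist: create it with its initial properties
      __.add_v("Dog").property(T.id_, "x1234").property("name", "Toby").property("age", 10))
  .toList())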
masterhugo (OP) · 2mo ago
Hi, thanks for your previous suggestion. I implemented the mergeV().option(onCreate, ...).option(onMatch, ...) pattern for updating node properties. It works well in low-throughput scenarios, but when I deployed it against a high-transaction production dataset on an AWS Neptune 1.4.4.0 cluster, I ran into performance issues: CPU spikes on the writer instance and high write request volumes. It seems that calling mergeV() with onMatch causes heavy overhead, especially when the node's properties haven't changed. Unfortunately, the write load becomes unsustainable under this pattern. Do you have any recommendations or more optimized approaches for updating node properties in Neptune 1.4.4.0, particularly under high concurrency?
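One possible mitigation for the unchanged-properties case, offered only as a sketch and not something suggested in the thread, is to read the current values first and skip the mergeV when nothing differs; the helper name, label, and values below are hypothetical:

from gremlin_python.process.traversal import Merge, T, CardinalityValue

def upsert_if_changed(g, vertex_id, new_props):
    # Hypothetical helper: only write when at least one property value differs.
    found = g.V(vertex_id).element_map().toList()
    current = found[0] if found else {}
    changed = {k: v for k, v in new_props.items() if current.get(k) != v}
    if not found:
        # Vertex missing: let merge_v create it ("Node" is a placeholder label).
        g.merge_v({T.id_: vertex_id}).option(Merge.on_create, {T.label: "Node", **new_props}).iterate()
    elif changed:
        on_match = {k: CardinalityValue.single(v) for k, v in changed.items()}
        g.merge_v({T.id_: vertex_id}).option(Merge.on_match, on_match).iterate()
    # else: nothing changed, skip the write entirely

upsert_if_changed(g, "x1234", {"name": "Toby", "age": 12})

This trades every write for an extra read, so it only pays off when most upserts are no-ops; pointing the reads at a read replica keeps them off the writer.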
triggan · 2mo ago
What size writer instance are you attempting to use? Neptune has a static number of query execution threads equal to 2x the number of vCPUs. So if you're attempting to use something like a db.r7i.large with 2 vCPUs, you'll only be able to execute 4 concurrent write requests at a time (and you should see close to 100% CPU when taxing all of the query threads).
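As an illustration of matching client-side concurrency to those query execution threads, a sketch; the vCPU count and batch contents are placeholders:

from concurrent.futures import ThreadPoolExecutor
from gremlin_python.process.traversal import Merge, T, CardinalityValue

VCPUS = 2                   # e.g. a db.r7i.large (assumption)
MAX_IN_FLIGHT = 2 * VCPUS   # Neptune's query execution threads

batch = [("x1234", {"age": 11})]  # hypothetical (id, properties) pairs

def write_one(vertex_id, props):
    on_match = {k: CardinalityValue.single(v) for k, v in props.items()}
    g.merge_v({T.id_: vertex_id}).option(Merge.on_match, on_match).iterate()

# Capping the pool at the server's thread count queues requests client-side
# instead of piling them up on the writer.
with ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT) as pool:
    for future in [pool.submit(write_one, vid, props) for vid, props in batch]:
        future.result()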
masterhugo (OP) · 2mo ago
In my case I'm using Neptune Serverless, and I'm writing batches of roughly 200 nodes with their relations every 2 seconds or so. The writer instance normally uses around 10 NCUs. I also have multiple Lambdas trying to write that kind of data at the same time.
triggan · 2mo ago
A Neptune Serverless instance (db.serverless) at its maximum of 128 NCUs is roughly equivalent to an 8xl instance, and it pre-allocates the same number of execution threads as an 8xl instance (32 vCPUs == 64 query execution threads). The difference with Serverless is that memory is dynamically allocated as queries/requests are accepted.

Scaling also happens somewhat exponentially: going from 2 to 4 NCUs takes longer than 4 to 8 or 8 to 16, and so on, so starting with a higher minimum NCU lets scaling start "faster". While scaling is happening, you may see higher swap usage, as threads aren't yet allocated all of the memory they may request to execute a query.

The other factor is memory allocated for buffer pool cache. Neptune reserves a bit less than two-thirds of available system memory for buffer pool cache. As data is read in from disk, database pages are cached there to decrease latency for other queries that reference the same pages. On a traditional/provisioned Neptune instance, pages remain in the buffer pool until they are invalidated by an incoming write or until the cache fills, at which point an LRU algorithm evicts older pages. On Serverless, pages can also be evicted on scale-down (as memory is given back). So if you are doing bursts of updates, the latency could be caused by the relevant data not being in the buffer pool and every request having to go to disk.

Neptune Serverless is great for workloads that vary over time, but as with any architecture there are tradeoffs. There was a really great presentation from re:Invent 2023 that discussed when you should (and should not) use Neptune Serverless: https://youtu.be/xAdWa0Ahiok?si=V5bNqrUKQpclqNC3
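If a higher minimum capacity turns out to be the right lever, the scaling range can be adjusted on the cluster; a sketch with boto3, assuming the Neptune ModifyDBCluster API's ServerlessV2ScalingConfiguration setting, with the cluster identifier and NCU values as placeholders:

import boto3

neptune = boto3.client("neptune")

# Raise the floor so scaling starts from a higher baseline (example values).
neptune.modify_db_cluster(
    DBClusterIdentifier="my-neptune-cluster",
    ServerlessV2ScalingConfiguration={"MinCapacity": 8.0, "MaxCapacity": 64.0},
    ApplyImmediately=True,
)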
