triggan
Apache TinkerPop
Created by Captator on 4/22/2025 in #questions
Neptune Local Development Access Methods
SSL/TLS protects against man-in-the-middle attacks, but it also encrypts the data sent between client and server, so that no medium the traffic traverses can decrypt it without the session keys negotiated during the TLS handshake (which is authenticated using the private key behind the server-side SSL certificate).
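To make the verification side of this concrete, here is a minimal sketch using Python's standard ssl module (not a Neptune or Gremlin driver API). It assumes a tunnel, for example an SSH tunnel, that forwards a local port to the cluster; the port and endpoint name are hypothetical placeholders. The default context rejects the connection unless the certificate chains to a trusted CA and matches server_hostname, which is why connecting to "localhost" through a tunnel fails verification unless the cluster endpoint name is used for the handshake.

```python
import socket
import ssl


def make_verified_context() -> ssl.SSLContext:
    # create_default_context() enables certificate verification
    # (CERT_REQUIRED) and hostname checking out of the box.
    return ssl.create_default_context()


def open_tls_through_tunnel(local_port: int, cluster_endpoint: str) -> ssl.SSLSocket:
    # Assumption: 127.0.0.1:<local_port> is the local end of a tunnel
    # that forwards traffic to the Neptune cluster endpoint.
    raw = socket.create_connection(("127.0.0.1", local_port))
    try:
        # server_hostname is what the server certificate is checked against,
        # so it must be the cluster endpoint name, not "localhost".
        return make_verified_context().wrap_socket(raw, server_hostname=cluster_endpoint)
    except Exception:
        raw.close()
        raise
```

Mapping the endpoint name to 127.0.0.1 in /etc/hosts achieves the same hostname match for clients that do not let you set server_hostname directly.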
Created by Captator on 4/22/2025 in #questions
Neptune Local Development Access Methods
Have you thought of just doing all of this within a container? That way you could modify /etc/hosts in the container and not affect your workstation's config.
Created by JK on 4/5/2025 in #questions
Usage of single cardinality AWS Neptune
Edge properties can only have single cardinality, so there's no need to specify cardinality when setting properties with mergeE(). https://tinkerpop.apache.org/docs/3.6.1-SNAPSHOT/reference/#vertex-properties ...
Moreover, while an Edge can only have one property of key "name" (for example), a Vertex can have multiple "name" properties.
Created by masterhugo on 2/10/2025 in #questions
Gremlin python trying to connect Neptune WS when is down
And to clarify further: you mean you actually stopped the cluster using the start/stop API (https://docs.aws.amazon.com/neptune/latest/userguide/manage-console-stop-start.html), and not that the Serverless instance had scaled down to 1? (Just trying to get a clearer picture of what might be going on here.)
Created by masterhugo on 2/10/2025 in #questions
Gremlin python trying to connect Neptune WS when is down
Just for clarity, when you say "disabled", is the cluster in a Stopped state? Did you delete all instances from the cluster? Was the cluster deleted completely? Or was an instance being rebooted?
Created by Coldfire on 12/10/2024 in #questions
Parameterized edges creation in existing graph
You've stumbled upon a common gap in Gremlin: has() steps cannot currently take a traversal as an argument. It's listed as a roadmap item for a future TinkerPop 4.x release: https://github.com/apache/tinkerpop/blob/087b3070914123055d3e4ededc2550f12715a0b4/docs/src/dev/future/index.asciidoc#has-traversal
Created by Alex on 12/11/2024 in #questions
How to create indexes by Label?
The data modeling looks a bit odd here. I'm not sure I would try to use any component of an edge ID as a filter; at that point, you're essentially using the edges to model some form of entity, which can be a bit of an anti-pattern. Edges are meant to represent relationships (actions, verbs) in a graph, whereas nodes/vertices are meant to represent entities (nouns, things). If this is a common query pattern, you may want to look at further de-normalizing the data model and creating a node labeled Tenant. Executing a query such as g.V(<client_id>).repeat(both(<list_of_edge_labels>).simplePath()).times(2).path() should perform better than what you currently have.
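To show what that suggested query returns, here is a toy in-memory model of repeat(both(<labels>).simplePath()).times(2).path() over a hypothetical Tenant-vertex layout. The vertex IDs and the "belongs_to" label are made up for illustration, and this only mirrors the query's result shape; it says nothing about how Neptune executes it.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical edge representation: (out_vertex, label, in_vertex).
Edge = Tuple[str, str, str]


def two_hop_simple_paths(edges: Iterable[Edge], start: str, labels: Set[str]) -> List[List[str]]:
    # both(labels): follow matching edges in either direction.
    adj: Dict[str, List[str]] = {}
    for out_v, label, in_v in edges:
        if label in labels:
            adj.setdefault(out_v, []).append(in_v)
            adj.setdefault(in_v, []).append(out_v)

    paths: List[List[str]] = []
    for mid in adj.get(start, []):
        if mid == start:  # simplePath(): a vertex may not repeat
            continue
        for end in adj.get(mid, []):
            if end not in (start, mid):
                paths.append([start, mid, end])  # path() of vertices only
    return paths


edges = [
    ("client1", "belongs_to", "tenantA"),
    ("client2", "belongs_to", "tenantA"),
    ("client1", "likes", "client3"),  # filtered out: label not requested
]
print(two_hop_simple_paths(edges, "client1", {"belongs_to"}))
```

With a Tenant vertex in the middle, clients in the same tenant are found by following two indexed edge hops instead of filtering on a substring of an edge ID.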
Created by Alex on 12/11/2024 in #questions
How to create indexes by Label?
just by calling g.V().limit(1) with concurrent calls on an r6g.2xlarge machine, the average time is 250ms
How many concurrent calls? An r6g.2xlarge instance has 8 vCPUs (and 16 available query execution threads). If you're issuing more than 16 requests in parallel, any additional concurrent requests will queue (an instance can queue up to 8,000 requests). You can see this with the /gremlin/status API (or the %gremlin_status Jupyter magic), which reports the number of executing queries and the number of "accepted" queries. If you need more concurrency, you'll need to add more vCPUs (either by scaling up or by scaling out with read replicas).
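The arithmetic above can be sketched as a small helper, assuming (as stated) two query execution threads per vCPU and a queue of up to 8,000 requests per instance. What happens to requests beyond the queue limit isn't covered here; the helper only counts them as over capacity so the numbers add up.

```python
from typing import Dict


def request_disposition(concurrent_requests: int, vcpus: int, queue_limit: int = 8000) -> Dict[str, int]:
    threads = 2 * vcpus  # query execution threads = 2x vCPUs
    executing = min(concurrent_requests, threads)
    # Anything beyond the available threads waits in the queue.
    queued = min(max(concurrent_requests - threads, 0), queue_limit)
    # Requests that don't fit in the queue either (behavior not covered here).
    over_capacity = max(concurrent_requests - threads - queue_limit, 0)
    return {"threads": threads, "executing": executing, "queued": queued, "over_capacity": over_capacity}


# r6g.2xlarge: 8 vCPUs, with 40 requests issued in parallel.
print(request_disposition(40, vcpus=8))
```

Anything in "queued" adds waiting time to the observed latency even for a trivial query like g.V().limit(1).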
But in the query mentioned, the bottleneck starts at the stage where it calls the last otherV() before path(). g.V().has(T.id, "client-id-uuid").bothE("has_profile", "has_affiliated", "has_controlling").has(T.id, containing("tenant-id-uuid")).otherV().path().unfold().dedup().elementMap().toList()
That makes sense, as you're using a text predicate here (containing()). Neptune does not maintain a full-text search index, so any use of text predicates such as containing(), startingWith(), endingWith(), etc. will incur some form of range scan and also require dictionary materialization (you lose all of the benefits of data compression here, as each value must be fetched from the dictionary to compare it with the predicate value you've provided).
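A simplified model of why dictionary materialization hurts: if stored values are replaced by integer IDs, an exact match only needs to encode the search value once and compare integers, while a substring predicate must decode every candidate back to a string. This toy class is illustrative only and is not Neptune's actual storage format.

```python
from typing import Dict, List


class ToyDictionary:
    """Toy dictionary encoding: each distinct string is stored once and referenced by an integer ID."""

    def __init__(self) -> None:
        self.value_to_id: Dict[str, int] = {}
        self.id_to_value: List[str] = []
        self.decodes = 0  # how many IDs had to be materialized back into strings

    def encode(self, value: str) -> int:
        if value not in self.value_to_id:
            self.value_to_id[value] = len(self.id_to_value)
            self.id_to_value.append(value)
        return self.value_to_id[value]

    def decode(self, value_id: int) -> str:
        self.decodes += 1
        return self.id_to_value[value_id]


def exact_match(d: ToyDictionary, stored_ids: List[int], needle: str) -> List[int]:
    # has(key, value): look the needle up once, then compare integers only.
    target = d.value_to_id.get(needle)
    return [i for i in stored_ids if i == target]


def containing_match(d: ToyDictionary, stored_ids: List[int], needle: str) -> List[int]:
    # containing(): every candidate must be decoded before the string comparison.
    return [i for i in stored_ids if needle in d.decode(i)]
```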
Created by Alex on 12/11/2024 in #questions
How to create indexes by Label?
This might be one of your issues:
has(T.id, containing("tenant-id-uuid"))
Neptune does not have a full-text search index. Using any of the text predicates (e.g. containing(), startingWith(), endingWith(), etc.) requires materializing the values from the dictionary for each of the solutions up to that point in the query. If that is a common pattern, I'd suggest storing the value you filter on in a separate property so you can use a plain has(key, value) filter.
Created by Alex on 12/11/2024 in #questions
How to create indexes by Label?
If you have some more details on the query, I might be able to help determine the best way to (re)write it to take advantage of the current indexes, or potentially rework your data model to better fit within the existing indexes.
Created by Alex on 12/11/2024 in #questions
How to create indexes by Label?
the AWS Neptune experts suggested that I create some indexes
Which experts were these? Neptune doesn't support creating indexes beyond the three native indexes (SPOG, POGS, GPSO) that are created by default. There's a fourth, optional index (OSGP), but it's only needed for very specific use cases: https://docs.aws.amazon.com/neptune/latest/userguide/features-lab-mode.html#features-lab-mode-features-osgp-index
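As a rough intuition for why those indexes cover most access patterns, here is a simplified model: each index orders quad positions (Subject, Predicate, Object, Graph) differently, and a lookup can use an index when the positions it binds form a prefix of that ordering. This is an illustrative approximation, not Neptune's actual query planner.

```python
from typing import Optional, Set

# Quad positions: S(ubject), P(redicate), O(bject), G(raph).
DEFAULT_INDEXES = ["SPOG", "POGS", "GPSO"]
OPTIONAL_INDEX = "OSGP"


def usable_prefix(index: str, bound: Set[str]) -> int:
    # Count the leading positions of the index ordering that the pattern binds.
    n = 0
    for position in index:
        if position not in bound:
            break
        n += 1
    return n


def pick_index(bound: Set[str], osgp_enabled: bool = False) -> Optional[str]:
    candidates = DEFAULT_INDEXES + ([OPTIONAL_INDEX] if osgp_enabled else [])
    best = max(candidates, key=lambda idx: usable_prefix(idx, bound))
    # None means no index prefix applies, i.e. a much broader scan.
    return best if usable_prefix(best, bound) > 0 else None
```

In this model, only the "object bound, predicate unknown" pattern (for example, all incoming edges of a vertex across many edge labels) lacks a default index, which matches the narrow use case the optional OSGP index targets.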
Created by Coldfire on 12/10/2024 in #questions
Parameterized edges creation in existing graph
The merge() step was released in 3.7.x (mergeV() and mergeE() arrived earlier, in 3.6.0).
Created by Coldfire on 12/10/2024 in #questions
Parameterized edges creation in existing graph
The merge steps are fairly new, so it's hard to say this isn't something people "usually do"; we're still working out patterns for how best to use those steps.
Created by Alex on 11/28/2024 in #questions
How to Work with Transactions with Gremlin Python
You may see write throughput exceed 120,000 objects per second in some cases; a number of factors drive that. But 120,000 is the safe number to use when estimating load speeds/rates.
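A back-of-envelope load-time estimate using that planning number might look like the sketch below. It assumes roughly 120,000 objects/s for vertices and their properties on an x.12xlarge writer, about half that rate for edges, and linear scaling for smaller writers. The 48-vCPU reference size and the cap at the top rate are assumptions made for the sketch.

```python
import math

TOP_RATE = 120_000    # objects/s for vertices + vertex properties on an x.12xlarge writer
REFERENCE_VCPUS = 48  # assumption: vCPU count of the x.12xlarge reference instance


def estimated_load_seconds(vertex_objects: int, edge_objects: int, writer_vcpus: int) -> float:
    # Assume linear scaling below the reference size; cap at the top rate above it.
    scale = min(writer_vcpus / REFERENCE_VCPUS, 1.0)
    vertex_rate = TOP_RATE * scale
    edge_rate = vertex_rate / 2  # edges run at about half speed (vertex reference checks)
    return vertex_objects / vertex_rate + edge_objects / edge_rate


print(estimated_load_seconds(1_200_000, 600_000, writer_vcpus=48))
```

Treat the result as a conservative planning figure rather than a prediction; conditional writes such as mergeV() will be slower.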
Created by Alex on 11/28/2024 in #questions
How to Work with Transactions with Gremlin Python
If you're looking to optimize for write throughput on Neptune, consider the following:
- For each write request, try to batch 100-200 "objects" into a single write request/query. An "object" is a vertex, an edge, or a single vertex/edge property (a vertex with 4 properties == 5 "objects").
- Use parallel write requests. If using Python, consider using multiprocessing to create separate processes. They can share a connection pool to Neptune if you so choose. The number of parallel processes should equal the number of query execution threads available on your Neptune writer instance (2x the number of vCPUs on whatever instance size you're using).
If you follow those guidelines, you should get performance similar to what you would see with Neptune's bulk loader.
Note that conditional writes have overhead. If using mergeV(), you're unlikely to see the same write throughput as Neptune's bulk loader, as the bulk loader does not do conditional writes. Neptune's "top speed" for write throughput is about 120,000 "objects" per second when writing vertices and vertex properties, and about half of that when writing edges (due to vertex reference checks when creating an edge). These numbers can only be attained with an x.12xlarge writer instance or larger; throughput on smaller instances scales down roughly linearly with instance size.
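The batching and sizing guidance can be sketched in plain Python. The element dictionaries (an element with a "properties" map) are a hypothetical input shape, and actually sending each batch as one write query from a pool of worker processes is left out.

```python
from typing import Dict, Iterable, Iterator, List


def object_count(element: Dict) -> int:
    # One "object" for the vertex/edge itself plus one per property.
    return 1 + len(element.get("properties", {}))


def batch_by_objects(elements: Iterable[Dict], max_objects: int = 150) -> Iterator[List[Dict]]:
    # Group elements so each batch stays within the 100-200 object target.
    batch: List[Dict] = []
    size = 0
    for el in elements:
        n = object_count(el)
        if batch and size + n > max_objects:
            yield batch
            batch, size = [], 0
        batch.append(el)
        size += n
    if batch:
        yield batch


def writer_processes(writer_vcpus: int) -> int:
    # One worker per query execution thread on the writer (2x vCPUs).
    return 2 * writer_vcpus
```

Each batch would then become a single write query submitted by one of the writer_processes() workers.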
Created by Coldfire on 12/10/2024 in #questions
Parameterized edges creation in existing graph
The issue here is seeing duplicates when trying to find the matching pairs to create the edges. I have a solution that may be close, but it creates duplicate edges (one in each direction):
g.V().hasLabel('E').as('v1').
V().hasLabel('E').as('v2').
select('v1','v2').
where('v1',neq('v2')).by(id).
or(
where('v1',eq('v2')).by('bName'),
where('v1',eq('v2')).by('cName')
).
constant([:]).
merge([(T.label):'newEdge']).
merge(select('v1').by(id).group().by(constant(from)).by(unfold())).
merge(select('v2').by(id).group().by(constant(to)).by(unfold())).
mergeE()
Note that this will not work in Gremlify, as it uses the merge() step that was introduced in 3.7.x, though I tested it on Neptune and it works fine. It takes a bit of Gremlin "hackery" to create the map that you pass into the mergeE() step at the end.
Basically, this does a cartesian join of all vertices with label "E" to all other vertices with label "E" and then filters on pairs that have different IDs but the same "bName" or "cName" property. At that point, you end up with a list of maps of paired vertices. That then needs to be converted into the map format supported by mergeE(), which is what the chain of merge() steps accomplishes.
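The pairing logic can be mirrored in plain Python to see where the reversed duplicates come from. This is a hypothetical in-memory sketch (the vertex IDs and property values are made up), not something that runs against Neptune. With one_direction=True it keeps only pairs whose first ID sorts before the second, which is one way to drop the reversed duplicates if direction does not matter; the untested Gremlin analogue would be replacing neq('v2') with lt('v2') in the where(...).by(id) filter.

```python
from itertools import product
from typing import Dict, List, Tuple


def matching_pairs(
    vertices: Dict[str, Dict[str, str]],
    keys: Tuple[str, ...] = ("bName", "cName"),
    one_direction: bool = True,
) -> List[Tuple[str, str]]:
    pairs: List[Tuple[str, str]] = []
    # Cartesian join of every "E" vertex with every other one.
    for (id1, p1), (id2, p2) in product(vertices.items(), repeat=2):
        if id1 == id2:  # where('v1', neq('v2')).by(id)
            continue
        if one_direction and not id1 < id2:  # keep one ordering per pair
            continue
        # or(where(...).by('bName'), where(...).by('cName'))
        if any(k in p1 and p1.get(k) == p2.get(k) for k in keys):
            pairs.append((id1, id2))
    return pairs


vertices = {
    "1": {"bName": "x", "cName": "p"},
    "2": {"bName": "x", "cName": "q"},
    "3": {"bName": "y", "cName": "q"},
    "4": {"bName": "z", "cName": "r"},
}
print(matching_pairs(vertices))
```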
Created by Coldfire on 12/10/2024 in #questions
Parameterized edges creation in existing graph
Nvm... I think I see it now. You're trying to connect vertices based on common properties. Does direction matter?
Created by Coldfire on 12/10/2024 in #questions
Parameterized edges creation in existing graph
Are you trying to create a fully connected graph from all vertices with a label of E?
Created by Alex on 11/28/2024 in #questions
How to Work with Transactions with Gremlin Python
I received this suggestion to use transactions to try to get more performance than using query strings; that's why I'm trying to implement it and check the difference in performance.
Unsure where this is coming from. What sort of performance gain are you looking for?
If you're using Gremlin Server, what backing store are you using? TinkerGraph? If so, ensure you're using TinkerTransactionGraph.
There's more on how to use TinkerTransactionGraph for unit testing of transactions here: https://aws.amazon.com/blogs/database/unit-testing-apache-tinkerpop-transactions-from-tinkergraph-to-amazon-neptune/