Direct and indirect Edges

This may also be an "it depends" question, but here goes. I have labels A, B and C, where entity C is created by entity B on behalf of entity A. At the time C is created, I know both the A and the B instance. The cardinality is roughly ten Bs for each A (though a B can be used by many As), and there are potentially millions of Cs, each of which belongs to exactly one B. Sometimes I want to start with all Cs for an A, and sometimes with all Cs created by a particular B on behalf of an A. So my choice seems to be either to have only these edges:
A->B->C
or to also have:
A->C
The A->C edges will of course cost storage, but given the low cardinality of B, is A->C overkill? Can I just find all Cs by iterating over the Bs for an A without too much cost?
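To make the two options concrete, here is a minimal sketch in Python of both models as a toy in-memory adjacency list. It is not Gremlin and says nothing about how Neptune stores edges; the edge labels "uses", "created" and "owns" and the C property "onBehalfOf" are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class ToyGraph:
    """Hypothetical in-memory adjacency list; NOT how Neptune stores edges."""

    def __init__(self):
        self.out_edges = defaultdict(list)  # vertex id -> [(label, target id)]
        self.props = {}                     # vertex id -> property dict

    def add_edge(self, src, label, dst):
        self.out_edges[src].append((label, dst))

    def out(self, v, label):
        """Targets of v's outgoing edges with the given label."""
        return [dst for lbl, dst in self.out_edges[v] if lbl == label]


def build(with_shortcut):
    g = ToyGraph()
    # A1 and A2 both use B1; B2 is used only by A1. Labels are hypothetical.
    for a, b in [("A1", "B1"), ("A1", "B2"), ("A2", "B1")]:
        g.add_edge(a, "uses", b)
    # Each C has exactly one creator B and is created on behalf of one A.
    for b, c, a in [("B1", "C1", "A1"), ("B1", "C2", "A1"),
                    ("B1", "C3", "A2"), ("B2", "C4", "A1")]:
        g.add_edge(b, "created", c)
        g.props[c] = {"onBehalfOf": a}  # hypothetical property
        if with_shortcut:
            g.add_edge(a, "owns", c)  # optional A->C shortcut edge
    return g


def cs_for_a_via_b(g, a):
    # A->B->C: a shared B also created Cs for other As, so filter on onBehalfOf.
    return [c for b in g.out(a, "uses") for c in g.out(b, "created")
            if g.props[c]["onBehalfOf"] == a]


def cs_for_a_shortcut(g, a):
    # A->C: a single hop, no filtering needed.
    return g.out(a, "owns")


def cs_for_a_and_b(g, a, b):
    # Start from one B and keep only the Cs it created on behalf of A.
    return [c for c in g.out(b, "created") if g.props[c]["onBehalfOf"] == a]


plain = build(with_shortcut=False)
print(cs_for_a_via_b(plain, "A1"))        # ['C1', 'C2', 'C4']
print(cs_for_a_and_b(plain, "A1", "B1"))  # ['C1', 'C2']
print(cs_for_a_shortcut(build(with_shortcut=True), "A1"))  # ['C1', 'C2', 'C4']
```

One thing the sketch makes visible: because a B can be shared by many As, the A->B->C route reaches Cs created for other As (C3 here), so it needs some way to filter them out, such as a property on C. The A->C shortcut avoids that filter at the cost of one extra edge per C.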
spmallette•12mo ago
I think if you have just 10 A->B edges, there shouldn't be much need for a shortcut edge from A->C. If there are tons of Cs in that equation, you could end up slowing down the A->B traversals, which would have been really fast without them. I think you mentioned elsewhere that you were using Neptune. I'm not sure if there is additional advice folks might offer specific to that graph database, but let's see if anyone else drops in some more comments.
kelvinl2816•12mo ago
The nice thing about graph data modeling is that it's pretty easy (and typically recommended) to iterate on the data model as needed to improve query performance. A key question is how much edge filtering you will need to do. Will it be by label alone, or by label plus one or more properties? At face value, starting with A->B->C seems reasonable, but until you start writing queries you may not know for sure which optimizations make sense. The thing to worry about is not so much the number of edges (traversing those is fast in Neptune) as the amount of edge filtering, which perhaps needs some more discussion.
Jim Idle•12mo ago
OK, that's good reasoning. As I said in another post, the answer likely lies in experimentation, which I will do. I was just wondering if there were any rules of thumb to consider, and it seems that there are. I'm just wondering why millions of A->C edges would slow down A->B. Or do you mean if I don't explicitly name/label the outgoing edges from A? Being in Taiwan makes back-and-forth follow-ups take a day 😉
spmallette•12mo ago
I guess my statement depends on the graph to some degree. Generally, though, I just meant that with both A->B and A->C, A would become a supernode, whereas with just A->B there would only be 10 or so edges to reason about when traversing.
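A rough way to picture the supernode concern, and the follow-up question about labels, is to compare two hypothetical ways of storing one vertex's outgoing edges: a flat list that must be scanned edge by edge, and a map keyed by label that jumps straight to the wanted edges. This is a toy Python sketch, not a description of Neptune internals; whether a given database behaves more like one layout or the other depends on its storage and indexing. The edge counts and the labels "uses" and "owns" are assumptions for illustration.

```python
from collections import defaultdict


def flat_scan(edges, label):
    """Examine every out-edge and keep those with the wanted label.

    Returns (matching targets, number of edges examined)."""
    hits = [dst for lbl, dst in edges if lbl == label]
    return hits, len(edges)


def group_by_label(edges):
    """Hypothetical label-keyed layout: label -> list of targets."""
    by_label = defaultdict(list)
    for lbl, dst in edges:
        by_label[lbl].append(dst)
    return by_label


def keyed_lookup(by_label, label):
    """Jump straight to one label's edges; only those are examined."""
    hits = by_label.get(label, [])
    return hits, len(hits)


# Vertex A: about 10 A->B edges, plus (optionally) one A->C shortcut per C.
uses = [("uses", f"B{i}") for i in range(10)]
owns = [("owns", f"C{i}") for i in range(100_000)]

for name, edges in [("A->B only", uses), ("A->B plus A->C", uses + owns)]:
    _, flat_cost = flat_scan(edges, "uses")
    _, keyed_cost = keyed_lookup(group_by_label(edges), "uses")
    print(f"{name}: flat scan examines {flat_cost}, "
          f"label lookup examines {keyed_cost}")
```

In the flat layout, adding the shortcut edges makes the A->B step examine 100,010 edges instead of 10; in the label-keyed layout it stays at 10. The vertex still carries all those edges either way, which is the sense in which A becomes a supernode.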
ManabuBeach•12mo ago
Just want to confirm. Given:
a. g.V("09dc9684-fdb6-4729-887f-d653cd35a121").inE("soft-removed")
b. g.V("09dc9684-fdb6-4729-887f-d653cd35a121").inE("primeUser").has("linkKind", "soft-removed")
a is a LOT faster, right? I hope!
spmallette•12mo ago
Speaking purely of query execution time (not serialization), it depends on the graph; not all graphs will optimize that. With TinkerGraph the two would be fairly identical, as all filtering is in memory. JanusGraph would be faster, but only if you defined an index for "linkKind".
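To illustrate why the answer hinges on indexing, here is a toy Python model of one vertex's incoming edges stored per label, with an optional (label, property, value) index loosely analogous to the kind of edge-property index mentioned for JanusGraph. It counts edges examined, which is only a proxy for cost: for a purely in-memory graph the practical time difference may be negligible, as noted in the reply. The class, the 1,000-edge data set and the "active" value are hypothetical.

```python
from collections import defaultdict


class ToyVertex:
    """Hypothetical in-memory store for one vertex's incoming edges."""

    def __init__(self, indexed_keys=()):
        self.in_by_label = defaultdict(list)  # label -> list of edge prop dicts
        self.indexed_keys = set(indexed_keys)  # property keys chosen for indexing
        self.prop_index = defaultdict(list)   # (label, key, value) -> edges

    def add_in_edge(self, label, props):
        self.in_by_label[label].append(props)
        for key in self.indexed_keys:
            if key in props:
                self.prop_index[(label, key, props[key])].append(props)

    def in_e(self, label):
        """Like inE(label): returns (edges, number of edges examined)."""
        edges = self.in_by_label.get(label, [])
        return edges, len(edges)

    def in_e_has(self, label, key, value):
        """Like inE(label).has(key, value): uses the index only if one exists."""
        if key in self.indexed_keys:
            edges = self.prop_index.get((label, key, value), [])
            return edges, len(edges)
        candidates = self.in_by_label.get(label, [])
        return [e for e in candidates if e.get(key) == value], len(candidates)


def build_label_model():
    # Query a's model: the kind is encoded in the edge label itself.
    v = ToyVertex()
    for i in range(1000):
        v.add_in_edge("soft-removed" if i % 20 == 0 else "primeUser", {"n": i})
    return v


def build_property_model(indexed_keys=()):
    # Query b's model: one label, the kind stored as an edge property.
    v = ToyVertex(indexed_keys)
    for i in range(1000):
        kind = "soft-removed" if i % 20 == 0 else "active"
        v.add_in_edge("primeUser", {"linkKind": kind, "n": i})
    return v


runs = [
    ("a", build_label_model().in_e("soft-removed")),
    ("b, no index", build_property_model()
        .in_e_has("primeUser", "linkKind", "soft-removed")),
    ("b, linkKind index", build_property_model({"linkKind"})
        .in_e_has("primeUser", "linkKind", "soft-removed")),
]
for name, (edges, examined) in runs:
    print(f"{name}: {len(edges)} matches, {examined} edges examined")
```

All three variants return the same 50 edges. Without an index, b examines every "primeUser" edge, while a, and b with the index, only touch the matching edges.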
ManabuBeach•12mo ago
Kind of diverging from the main topic here, but since indexing came up: if I understand the auto index builder correctly (which I think is very nice), I would assume that in Neptune performance will improve as certain properties are used often.