TreeStep and MultiQuery support

On JanusGraph 1.0, a traversal like g.V().has(...).out(...).has(...).out(...).has(...) leverages the MultiQuery optimisation and returns results in acceptable time. However, as soon as we add a tree() step, as in g.V().has(...).out(...).has(...).out(...).has(...).tree(), all MultiQuery optimisations are disabled and the traversal time increases drastically.

Based on the following code, I think this applies to all steps with a PATH requirement (e.g. PathStep, TreeStep): https://github.com/JanusGraph/janusgraph/blob/v1.0/janusgraph-core/src/main/java/org/janusgraph/graphdb/tinkerpop/optimize/JanusGraphTraversalUtil.java#L393

Could a knowledgeable person chime in and explain whether disabling MultiQuery is a hard requirement by design (e.g. the traverser's history needs to be kept and MultiQuery does not allow that), whether the optimisation was simply not implemented for this step and could be changed easily (as in just removing that condition), or whether there are other approaches to get a subgraph/tree that wouldn't have this limitation? Thanks!
6 Replies
porunov (5mo ago)
Hi Clement! I couldn't figure out the reason for disabling the multi-query optimization when there is a PathProcessor step, so I left it as it is (disabled) for such traversals. The code responsible for that is here: https://github.com/JanusGraph/janusgraph/blob/12708188397f69616adddc933e539e841af409e4/janusgraph-core/src/main/java/org/janusgraph/graphdb/tinkerpop/optimize/JanusGraphTraversalUtil.java#L392-L405

If you find that it's OK to enable the multi-query optimization for such cases, you can disable that PathProcessor check and add TreeStep.class to the following list of supported parent steps: https://github.com/JanusGraph/janusgraph/blob/12708188397f69616adddc933e539e841af409e4/janusgraph-core/src/main/java/org/janusgraph/graphdb/tinkerpop/optimize/JanusGraphTraversalUtil.java#L110-L135
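The check discussed above can be pictured with a small toy model. This is only a sketch under stated assumptions: MultiQueryEligibilitySketch, Requirement, Step and the enforcePathCheck flag are hypothetical names made up for illustration, and none of this is the actual code in JanusGraphTraversalUtil. It only shows the shape of the rule: one step that needs path history makes the whole traversal ineligible, and dropping the check makes it eligible again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these types are JanusGraph or TinkerPop classes.
final class MultiQueryEligibilitySketch {

    // Hypothetical stand-in for the requirements a step places on traversers.
    enum Requirement { OBJECT, PATH }

    // Hypothetical stand-in for a traversal step: a label plus its requirements.
    static final class Step {
        final String label;
        final Set<Requirement> requirements;

        Step(String label, Requirement first, Requirement... rest) {
            this.label = label;
            this.requirements = Collections.unmodifiableSet(EnumSet.of(first, rest));
        }
    }

    // Models g.V().has(...).out(...).has(...): no step needs the traverser's history.
    static List<Step> plainTraversal() {
        return Arrays.asList(
            new Step("V", Requirement.OBJECT),
            new Step("has", Requirement.OBJECT),
            new Step("out", Requirement.OBJECT),
            new Step("has", Requirement.OBJECT));
    }

    // Same traversal with a trailing tree(), which needs the full path.
    static List<Step> treeTraversal() {
        List<Step> steps = new ArrayList<>(plainTraversal());
        steps.add(new Step("tree", Requirement.OBJECT, Requirement.PATH));
        return steps;
    }

    // True if any step in the traversal needs the traverser's path history.
    static boolean requiresPath(List<Step> traversal) {
        for (Step step : traversal) {
            if (step.requirements.contains(Requirement.PATH)) {
                return true;
            }
        }
        return false;
    }

    // enforcePathCheck = true models the current behaviour described in the thread;
    // enforcePathCheck = false models a patched build with the check removed.
    static boolean isMultiQueryEligible(List<Step> traversal, boolean enforcePathCheck) {
        return !enforcePathCheck || !requiresPath(traversal);
    }

    public static void main(String[] args) {
        System.out.println("plain, check on:  " + isMultiQueryEligible(plainTraversal(), true));
        System.out.println("tree,  check on:  " + isMultiQueryEligible(treeTraversal(), true));
        System.out.println("tree,  check off: " + isMultiQueryEligible(treeTraversal(), false));
    }
}
```

The model says nothing about whether removing the check is safe; it only makes explicit that the veto applies to the whole traversal, not just to the step that needs the path.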
cdegroc (5mo ago)
Thanks! That's worth a try! 👀
👋🏻 Hey. This worked: the traversal now leverages MultiQuery, resulting in a nice performance improvement in my tests.
cdegroc (5mo ago)
Even though unit tests are green, I imagine this could be breaking some traversal types I haven't tried or am not used to. @rngcntr, since you're the original author of this change (https://github.com/JanusGraph/janusgraph/pull/2516/files#diff-e1f91b256e6c63d882f9b043cbfa4d264c15299c52bae1b845dcd90b8beadabbR239-R252), would you remember why MultiQuery optimizations were disabled for Path-based traversals by any chance? 🙇🏻
rngcntr (5mo ago)
Hi @Clément de Groc! The reasoning is explained in the Javadoc (https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-core/src/main/java/org/janusgraph/graphdb/tinkerpop/optimize/JanusGraphTraversalUtil.java#L385-L390): similar to NoOpBarrierStep, the MultiQueryStep's purpose is to aggregate traversers before handling them and passing results to the next step. Not having path tracking enabled is a hard requirement for TinkerPop's NoOpBarrierStep (https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/LazyBarrierStrategy.java#L46), so to be safe, I applied that requirement to MultiQueryStep as well.
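To make "aggregate traversers before handling them" concrete, here is a minimal toy sketch. Everything in it is an assumption for illustration: PathPreservingBatchSketch, Traverser, Backend and the three-hop demo graph are hypothetical and are not JanusGraph's MultiQueryStep or TinkerPop's NoOpBarrierStep. It collects the traversers waiting at a step, fetches neighbours for all distinct vertices in one backend round trip, and then advances each traverser individually so that its own path is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: not JanusGraph's MultiQueryStep, not TinkerPop's NoOpBarrierStep.
final class PathPreservingBatchSketch {

    // Hypothetical traverser: the current vertex plus the path that led to it.
    static final class Traverser {
        final String vertex;
        final List<String> path;

        Traverser(String vertex, List<String> path) {
            this.vertex = vertex;
            this.path = path;
        }

        // Returns a new traverser at `next` whose path is this path plus `next`.
        Traverser moveTo(String next) {
            List<String> extended = new ArrayList<>(path);
            extended.add(next);
            return new Traverser(next, extended);
        }
    }

    // Hypothetical storage backend that counts round trips.
    static final class Backend {
        private final Map<String, List<String>> adjacency;
        int roundTrips = 0;

        Backend(Map<String, List<String>> adjacency) {
            this.adjacency = adjacency;
        }

        // One round trip per vertex.
        List<String> neighbours(String vertex) {
            roundTrips++;
            return adjacency.getOrDefault(vertex, Collections.<String>emptyList());
        }

        // One round trip for a whole batch of vertices.
        Map<String, List<String>> neighboursBatch(Set<String> vertices) {
            roundTrips++;
            Map<String, List<String>> result = new HashMap<>();
            for (String v : vertices) {
                result.put(v, adjacency.getOrDefault(v, Collections.<String>emptyList()));
            }
            return result;
        }
    }

    // Demo graph: a -> b, a -> c, b -> d, c -> d.
    static Backend demoBackend() {
        Map<String, List<String>> adjacency = new HashMap<>();
        adjacency.put("a", Arrays.asList("b", "c"));
        adjacency.put("b", Arrays.asList("d"));
        adjacency.put("c", Arrays.asList("d"));
        return new Backend(adjacency);
    }

    static List<Traverser> start(String vertex) {
        return Collections.singletonList(new Traverser(vertex, Collections.singletonList(vertex)));
    }

    // Unbatched expansion: one backend call per incoming traverser.
    static List<Traverser> expandOneByOne(List<Traverser> in, Backend backend) {
        List<Traverser> out = new ArrayList<>();
        for (Traverser t : in) {
            for (String next : backend.neighbours(t.vertex)) {
                out.add(t.moveTo(next));
            }
        }
        return out;
    }

    // Batched expansion: prefetch for all distinct vertices, then advance each
    // traverser separately so no two traversers (and no two paths) are merged.
    static List<Traverser> expandBatched(List<Traverser> in, Backend backend) {
        Set<String> distinct = new LinkedHashSet<>();
        for (Traverser t : in) {
            distinct.add(t.vertex);
        }
        Map<String, List<String>> prefetched = backend.neighboursBatch(distinct);
        List<Traverser> out = new ArrayList<>();
        for (Traverser t : in) {
            for (String next : prefetched.get(t.vertex)) {
                out.add(t.moveTo(next));
            }
        }
        return out;
    }

    static List<List<String>> paths(List<Traverser> traversers) {
        List<List<String>> result = new ArrayList<>();
        for (Traverser t : traversers) {
            result.add(t.path);
        }
        return result;
    }

    public static void main(String[] args) {
        Backend backend = demoBackend();
        List<Traverser> twoHops = expandBatched(expandBatched(start("a"), backend), backend);
        System.out.println(paths(twoHops) + " in " + backend.roundTrips + " round trips");
    }
}
```

In this model, batching the lookups and merging traversers are separate concerns: two traversers end up on vertex d with different paths and both survive. Whether the same separation holds inside JanusGraph and TinkerPop is exactly the open question in this thread, so the sketch should not be read as evidence that the check can be dropped.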
rngcntr (5mo ago)
I can no longer tell whether I ever managed to figure out why NoOpBarrierStep is not allowed in path-tracking traversals. But since that's part of TinkerPop, there may be test cases in their repository that would fail if the check in LazyBarrierStrategy were dropped.
cdegroc (5mo ago)
Thanks for your quick answer. I can see this requirement was added long ago. I will review the TinkerPop tests and then ask questions on the TinkerPop Discord.
FYI, I started this TinkerPop thread: https://discord.com/channels/838910279550238720/1197157803907874829/1197157803907874829