TreeStep and MultiQuery support
On JanusGraph 1.0, a traversal like g.V().has(...).out(...).has(...).out(...).has(...) nicely leverages the MultiQuery optimisation and returns results in acceptable time.
However, as soon as we add a tree() step, as in g.V().has(...).out(...).has(...).out(...).has(...).tree(), all MultiQuery optimisations are disabled and the traversal time increases drastically.
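For readers unfamiliar with what is lost when batching is switched off, here is a toy model. It is not JanusGraph code: ToyBackend, neighbours and neighboursBatch are hypothetical names, and the only assumption is that a batched lookup costs one round trip to the storage backend while per-vertex lookups cost one round trip each.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only, not JanusGraph code: it counts simulated backend round
// trips for a two-hop traversal, fetched vertex by vertex or in batches.
final class BatchingSketch {

    // Hypothetical in-memory "storage backend" that counts its calls.
    static final class ToyBackend {
        private final Map<Integer, List<Integer>> adjacency;
        int roundTrips = 0;

        ToyBackend(Map<Integer, List<Integer>> adjacency) {
            this.adjacency = adjacency;
        }

        // One round trip per vertex.
        List<Integer> neighbours(int vertex) {
            roundTrips++;
            return adjacency.getOrDefault(vertex, List.of());
        }

        // One round trip for a whole batch of vertices.
        List<Integer> neighboursBatch(List<Integer> vertices) {
            roundTrips++;
            List<Integer> out = new ArrayList<>();
            for (int v : vertices) {
                out.addAll(adjacency.getOrDefault(v, List.of()));
            }
            return out;
        }
    }

    // Small sample graph: 1 -> {2, 3, 4}, 2 -> {5, 6}, 3 -> {7}, 4 -> {8, 9}.
    static Map<Integer, List<Integer>> sampleGraph() {
        return Map.of(1, List.of(2, 3, 4), 2, List.of(5, 6), 3, List.of(7), 4, List.of(8, 9));
    }

    // Two hops, one backend call per traverser.
    static List<Integer> twoHopsOneByOne(ToyBackend backend, int start) {
        List<Integer> result = new ArrayList<>();
        for (int mid : backend.neighbours(start)) {
            result.addAll(backend.neighbours(mid));
        }
        return result;
    }

    // Two hops, one backend call per hop.
    static List<Integer> twoHopsBatched(ToyBackend backend, int start) {
        List<Integer> firstHop = backend.neighboursBatch(List.of(start));
        return backend.neighboursBatch(firstHop);
    }

    public static void main(String[] args) {
        ToyBackend a = new ToyBackend(sampleGraph());
        ToyBackend b = new ToyBackend(sampleGraph());
        System.out.println(twoHopsOneByOne(a, 1) + " in " + a.roundTrips + " round trips");
        System.out.println(twoHopsBatched(b, 1) + " in " + b.roundTrips + " round trips");
    }
}
```

On this sample graph both variants return the same five vertices, but the per-vertex variant needs four round trips and the batched one needs two. The gap grows with the fan-out of each hop, which is consistent with the slowdown described above once batching is disabled.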
Based on the following code, I think this applies to all steps with a PATH requirement (e.g. PathStep, TreeStep): https://github.com/JanusGraph/janusgraph/blob/v1.0/janusgraph-core/src/main/java/org/janusgraph/graphdb/tinkerpop/optimize/JanusGraphTraversalUtil.java#L393
Could someone knowledgeable chime in and explain whether disabling MultiQuery is a hard requirement by design (e.g. the traverser's history needs to be kept and MultiQuery does not allow that), or whether the optimisation simply was not implemented for this step and could be enabled easily (as in just removing that condition)? Alternatively, are there other approaches to get a subgraph/tree that wouldn't have this limitation?
Thanks!
6 Replies
Hi Clement! I couldn't figure out the reason for disabling multi-query optimization when there is a PathProcessor step. Thus, I left it as it is (disabled) for such traversals.
You can see the code responsible for that is here:
https://github.com/JanusGraph/janusgraph/blob/12708188397f69616adddc933e539e841af409e4/janusgraph-core/src/main/java/org/janusgraph/graphdb/tinkerpop/optimize/JanusGraphTraversalUtil.java#L392-L405
If you find out it's OK to enable multi-query optimization for such cases, then you can disable that PathProcessor check and add TreeStep.class to the following list of supported parent steps:
https://github.com/JanusGraph/janusgraph/blob/12708188397f69616adddc933e539e841af409e4/janusgraph-core/src/main/java/org/janusgraph/graphdb/tinkerpop/optimize/JanusGraphTraversalUtil.java#L110-L135
Thanks! That's worth a try! 👀
👋🏻 Hey. This worked and the traversal now leverages multiQuery, resulting in a nice performance improvement in my tests.
Even though unit tests are green, I imagine this could be breaking some traversal types I haven't tried or am not used to.
@rngcntr, since you're the original author of this change (https://github.com/JanusGraph/janusgraph/pull/2516/files#diff-e1f91b256e6c63d882f9b043cbfa4d264c15299c52bae1b845dcd90b8beadabbR239-R252), would you remember why MultiQuery optimizations were disabled for Path-based traversals by any chance? 🙇🏻
Hi @Clément de Groc! The reasoning is explained in the Javadoc (https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-core/src/main/java/org/janusgraph/graphdb/tinkerpop/optimize/JanusGraphTraversalUtil.java#L385-L390): Similar to NoOpBarrierStep, the MultiQueryStep's purpose is to aggregate traversers before handling them and passing results to the next step. Not having path tracking enabled is a hard requirement for TinkerPop's NoOpBarrierStep (https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/LazyBarrierStrategy.java#L46), so to be safe, I applied that requirement to MultiQueryStep as well.
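The thread leaves open why barriers and path tracking do not mix. One possible explanation, stated here as an assumption and not confirmed by anyone in the thread, is that a barrier gains most by merging ("bulking") traversers that sit on the same element, and traversers that carry different histories cannot be merged that way. The sketch below is a toy model with hypothetical names (BarrierBulkingSketch, bulk), not TinkerPop or JanusGraph code; each traverser is represented only by its path, whose last entry is the current vertex.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only, not TinkerPop code: it shows how many distinct traversers
// remain after a barrier merges those it considers equal.
final class BarrierBulkingSketch {

    // Returns a bulk count per merged traverser.
    // Without path tracking, traversers are keyed by their current vertex.
    // With path tracking, they are keyed by their whole history.
    static Map<Object, Long> bulk(List<List<String>> paths, boolean trackPaths) {
        Map<Object, Long> bulks = new LinkedHashMap<>();
        for (List<String> path : paths) {
            String current = path.get(path.size() - 1);
            Object key = trackPaths ? path : current;
            bulks.merge(key, 1L, Long::sum);
        }
        return bulks;
    }

    // Three traversers: two reach "c" through different routes, one reaches "f".
    static List<List<String>> samplePaths() {
        return List.of(
                List.of("a", "b", "c"),
                List.of("a", "d", "c"),
                List.of("a", "e", "f"));
    }

    public static void main(String[] args) {
        System.out.println("without paths: " + bulk(samplePaths(), false));
        System.out.println("with paths:    " + bulk(samplePaths(), true));
    }
}
```

Without path tracking the two traversers on "c" collapse into one with a bulk of 2, so two traversers continue. With path tracking all three stay separate. Note that this only models the merging side of a barrier. Whether the same concern applies to batching backend lookups, which is what MultiQueryStep is described as doing, is exactly the open question of this thread.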
I can't tell anymore whether I actually managed to figure out why NoOpBarrierStep is not allowed in path-tracking traversals. But since that's part of TinkerPop, there may be test cases in their repository that should fail if the check in LazyBarrierStrategy is dropped.
Thanks for your quick answer. I can see this requirement was added long ago. I will review TinkerPop tests, and then ask questions on the TinkerPop discord.
FYI, I started this TinkerPop thread: https://discord.com/channels/838910279550238720/1197157803907874829/1197157803907874829