JanusGraph · 13mo ago
rpuga

Running OLAP queries on JanusGraph outside the Gremlin Console (from Java and G.V())

Hi, I'm able to run OLAP queries against my graph DB from the Gremlin Console, by following the directions provided here: https://docs.janusgraph.org/advanced-topics/hadoop/

However, I would also like to run OLAP queries without using the console, both from an embedded JanusGraph Java application and from G.V().

In G.V(), I tried this while selecting Groovy Mode for query submission:
graph = GraphFactory.open('/opt/janusgraph/conf/hadoop-graph/spark-cql.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().groupCount().by(label()).toList()

but I get the following error:
No such property: SparkGraphComputer for class: Script10

I'm assuming this is because the needed plugins are not loaded. I tried:
:plugin use tinkerpop.hadoop
:plugin use tinkerpop.spark

but this does not work outside of the console.
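
One thing I haven't verified yet is whether simply spelling out the fully-qualified class name (instead of relying on the plugin's imports) gets past that error, assuming the Spark/Hadoop jars are on G.V()'s classpath, e.g.:
graph = GraphFactory.open('/opt/janusgraph/conf/hadoop-graph/spark-cql.properties')
g = graph.traversal().withComputer(org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer)
g.V().groupCount().by(label()).toList()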

Any suggestions, @gdotv?

Also, is there any example code I could use to run OLAP queries from a Java application?

Thanks!
Solution
Thanks, @gdotv. It would be great if the JanusGraph folks could follow up on how to expose a GraphTraversalSource.

In the meantime, I've been able to make progress on the Java question by following this old-ish post by @Bo:
https://li-boxuan.medium.com/spark-on-janusgraph-tinkerpop-a-pagerank-example-43950189b159

I've been able to include all missing dependencies and compile/run this example:
package org.example.gdb;

import org.apache.commons.configuration2.Configuration;
import org.apache.spark.launcher.SparkLauncher;
import org.apache.tinkerpop.gremlin.hadoop.Constants;
import org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.util.GraphFactory;

import static org.apache.tinkerpop.gremlin.hadoop.Constants.GREMLIN_HADOOP_GRAPH_WRITER;
import static org.apache.tinkerpop.gremlin.hadoop.Constants.GREMLIN_HADOOP_OUTPUT_LOCATION;

public class JGGremlinSpark {
    public static void main(String[] args) throws Exception {
        // Base HadoopGraph/Spark configuration (built as in the linked post).
        Configuration sparkGraphConfiguration = getSparkGraphConfig();
        // Cache the graph RDD in memory, spilling to disk if it does not fit.
        sparkGraphConfiguration.setProperty(Constants.GREMLIN_SPARK_GRAPH_STORAGE_LEVEL, "MEMORY_AND_DISK");
        // Write the OLAP output as GraphSON to the given output location.
        sparkGraphConfiguration.setProperty(GREMLIN_HADOOP_GRAPH_WRITER, GraphSONOutputFormat.class.getCanonicalName());
        sparkGraphConfiguration.setProperty(GREMLIN_HADOOP_OUTPUT_LOCATION, "/home/hadoop/jgspark_test/hadoop_output");
        sparkGraphConfiguration.setProperty(SparkLauncher.EXECUTOR_MEMORY, "1g");
        Graph graph = GraphFactory.open(sparkGraphConfiguration);

        long startTime = System.currentTimeMillis();
        // Submit the traversals as Spark OLAP jobs via SparkGraphComputer.
        GraphTraversalSource g = graph.traversal().withComputer(SparkGraphComputer.class);
        final Long vCount = g.V().count().next();
        final Long eCount = g.E().count().next();
        System.out.println("V count = " + vCount);
        System.out.println("E count = " + eCount);
        long duration = (System.currentTimeMillis() - startTime) / 1000;
        System.out.println("Finished JGGremlinSpark test - elapsed time = " + duration + " seconds.");
    }
}
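
For reference, here is a rough sketch of what the getSparkGraphConfig() helper can look like. It is essentially the properties from JanusGraph's conf/hadoop-graph/read-cql.properties example expressed in code, assuming a CQL (Cassandra) storage backend and a local Spark master; the hostname, keyspace, and spark.master values are placeholders to adjust for your own cluster. It would sit in the same class, with the two extra imports shown on top:

import org.apache.commons.configuration2.BaseConfiguration;
import org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph;

    private static Configuration getSparkGraphConfig() {
        Configuration config = new BaseConfiguration();
        // Read the graph through HadoopGraph + JanusGraph's CQL input format.
        config.setProperty("gremlin.graph", HadoopGraph.class.getCanonicalName());
        config.setProperty(Constants.GREMLIN_HADOOP_GRAPH_READER, "org.janusgraph.hadoop.formats.cql.CqlInputFormat");
        config.setProperty(Constants.GREMLIN_HADOOP_JARS_IN_DISTRIBUTED_CACHE, true);
        config.setProperty(Constants.GREMLIN_HADOOP_INPUT_LOCATION, "none");
        // JanusGraph storage settings used by the input format (placeholder values).
        config.setProperty("janusgraphmr.ioformat.conf.storage.backend", "cql");
        config.setProperty("janusgraphmr.ioformat.conf.storage.hostname", "127.0.0.1");
        config.setProperty("janusgraphmr.ioformat.conf.storage.cql.keyspace", "janusgraph");
        config.setProperty("cassandra.input.partitioner.class", "org.apache.cassandra.dht.Murmur3Partitioner");
        // Spark settings: local master plus the Kryo registrator shipped with janusgraph-hadoop.
        config.setProperty("spark.master", "local[*]");
        config.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        config.setProperty("spark.kryo.registrator", "org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator");
        return config;
    }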