JanusGraph · 13mo ago
rpuga

Running OLAP queries on JanusGraph outside the Gremlin Console (from Java and G.V())

Hi, I'm able to run OLAP queries against my graph DB from the Gremlin Console, by following the directions provided here: https://docs.janusgraph.org/advanced-topics/hadoop/

However, I would also like to run OLAP queries without using the console, both from an embedded JanusGraph Java application and from G.V().

In G.V(), I tried this while selecting Groovy Mode for query submission:
graph = GraphFactory.open('/opt/janusgraph/conf/hadoop-graph/spark-cql.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().groupCount().by(label()).toList()

but I get the following error:
No such property: SparkGraphComputer for class: Script10

I'm assuming this is because the needed plugins are not loaded. I tried:
:plugin use tinkerpop.hadoop
:plugin use tinkerpop.spark

but this does not work outside of the console.
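
One thing I haven't verified yet is whether simply spelling out the fully-qualified class name (instead of relying on the plugin's imports) gets past that error, assuming the Spark/Hadoop jars are on G.V()'s classpath, e.g.:
graph = GraphFactory.open('/opt/janusgraph/conf/hadoop-graph/spark-cql.properties')
g = graph.traversal().withComputer(org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer)
g.V().groupCount().by(label()).toList()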

Any suggestions, @gdotv?

Also, is there any example code I could use to run OLAP queries from a Java application?

Thanks!
Solution
Thanks, @gdotv. It would be great if the JanusGraph folks could follow up on how to expose a GraphTraversalSource.

In the meantime, I've been able to make progress on the Java question by following this old-ish post by @Bo:
https://li-boxuan.medium.com/spark-on-janusgraph-tinkerpop-a-pagerank-example-43950189b159

I've been able to include all missing dependencies and compile/run this example:
package org.example.gdb;

import org.apache.commons.configuration2.Configuration;
import org.apache.spark.launcher.SparkLauncher;
import org.apache.tinkerpop.gremlin.hadoop.Constants;
import org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.util.GraphFactory;

import static org.apache.tinkerpop.gremlin.hadoop.Constants.GREMLIN_HADOOP_GRAPH_WRITER;
import static org.apache.tinkerpop.gremlin.hadoop.Constants.GREMLIN_HADOOP_OUTPUT_LOCATION;

public class JGGremlinSpark {
    public static void main(String[] args) throws Exception {
        // Base HadoopGraph/Spark configuration (built as in the linked post).
        Configuration sparkGraphConfiguration = getSparkGraphConfig();
        // Cache the graph RDD in memory, spilling to disk if it does not fit.
        sparkGraphConfiguration.setProperty(Constants.GREMLIN_SPARK_GRAPH_STORAGE_LEVEL, "MEMORY_AND_DISK");
        // Write the OLAP output as GraphSON to the given output location.
        sparkGraphConfiguration.setProperty(GREMLIN_HADOOP_GRAPH_WRITER, GraphSONOutputFormat.class.getCanonicalName());
        sparkGraphConfiguration.setProperty(GREMLIN_HADOOP_OUTPUT_LOCATION, "/home/hadoop/jgspark_test/hadoop_output");
        sparkGraphConfiguration.setProperty(SparkLauncher.EXECUTOR_MEMORY, "1g");
        Graph graph = GraphFactory.open(sparkGraphConfiguration);

        long startTime = System.currentTimeMillis();
        // Submit the traversals as Spark OLAP jobs via SparkGraphComputer.
        GraphTraversalSource g = graph.traversal().withComputer(SparkGraphComputer.class);
        final Long vCount = g.V().count().next();
        final Long eCount = g.E().count().next();
        System.out.println("V count = " + vCount);
        System.out.println("E count = " + eCount);
        long duration = (System.currentTimeMillis() - startTime) / 1000;
        System.out.println("Finished JGGremlinSpark test - elapsed time = " + duration + " seconds.");
    }
}
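
For reference, here is a rough sketch of what the getSparkGraphConfig() helper can look like. It is essentially the properties from JanusGraph's conf/hadoop-graph/read-cql.properties example expressed in code, assuming a CQL (Cassandra) storage backend and a local Spark master; the hostname, keyspace, and spark.master values are placeholders to adjust for your own cluster. It would sit in the same class, with the two extra imports shown on top:

import org.apache.commons.configuration2.BaseConfiguration;
import org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph;

    private static Configuration getSparkGraphConfig() {
        Configuration config = new BaseConfiguration();
        // Read the graph through HadoopGraph + JanusGraph's CQL input format.
        config.setProperty("gremlin.graph", HadoopGraph.class.getCanonicalName());
        config.setProperty(Constants.GREMLIN_HADOOP_GRAPH_READER, "org.janusgraph.hadoop.formats.cql.CqlInputFormat");
        config.setProperty(Constants.GREMLIN_HADOOP_JARS_IN_DISTRIBUTED_CACHE, true);
        config.setProperty(Constants.GREMLIN_HADOOP_INPUT_LOCATION, "none");
        // JanusGraph storage settings used by the input format (placeholder values).
        config.setProperty("janusgraphmr.ioformat.conf.storage.backend", "cql");
        config.setProperty("janusgraphmr.ioformat.conf.storage.hostname", "127.0.0.1");
        config.setProperty("janusgraphmr.ioformat.conf.storage.cql.keyspace", "janusgraph");
        config.setProperty("cassandra.input.partitioner.class", "org.apache.cassandra.dht.Murmur3Partitioner");
        // Spark settings: local master plus the Kryo registrator shipped with janusgraph-hadoop.
        config.setProperty("spark.master", "local[*]");
        config.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        config.setProperty("spark.kryo.registrator", "org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator");
        return config;
    }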