Implementing Graph Filter for Sub-Graph Loading in Spark Cluster with JanusGraph
Hello,
I'm currently utilizing JanusGraph 0.6.4 with Bigtable as the storage backend and encountering difficulties when attempting to run OLAP queries on my graph via SparkGraphComputer. The graph is quite large, containing billions of vertices, and I'm only able to execute queries on significantly smaller graphs.
My queries are being run through the Gremlin console, and the problem appears to be related to loading the graph into Spark RDD. I'm interested in applying a filter to load only vertices and edges with specific labels before running the query.
I've noticed that creating a vertex program and using it as described in the Tinkerpop documentation(https://tinkerpop.apache.org/docs/current/reference/#graph-filter) only loads the specified subgraph into the Spark RDDs:
Is it possible to implement this filtering directly through the Gremlin console?
I've attempted to use g.V().limit(1), but without success. I suspect this is because the entire graph is being loaded into the RDD for this query as well. Here's the code I used:
Any insights or suggestions would be greatly appreciated. Thank you.
I'm currently utilizing JanusGraph 0.6.4 with Bigtable as the storage backend and encountering difficulties when attempting to run OLAP queries on my graph via SparkGraphComputer. The graph is quite large, containing billions of vertices, and I'm only able to execute queries on significantly smaller graphs.
My queries are being run through the Gremlin console, and the problem appears to be related to loading the graph into Spark RDD. I'm interested in applying a filter to load only vertices and edges with specific labels before running the query.
I've noticed that creating a vertex program and using it as described in the Tinkerpop documentation(https://tinkerpop.apache.org/docs/current/reference/#graph-filter) only loads the specified subgraph into the Spark RDDs:
Is it possible to implement this filtering directly through the Gremlin console?
I've attempted to use g.V().limit(1), but without success. I suspect this is because the entire graph is being loaded into the RDD for this query as well. Here's the code I used:
Any insights or suggestions would be greatly appreciated. Thank you.