7 replies

Can `CqlInputFormat` do predicate pushdowns/query based prefilters?

Hi! First of all, thank you all for your work on JanusGraph.

In my use case, I have a medium-large graph, ~3TB currently, might be 1-2 orders of magnitude bigger later. The data in it is generally clustered in a time-based fashion, e.g. newer vertices are mostly connected to other newer vertices (a timestamp is stored as a vertex property).

I am writing an OLAP pipeline with Spark where JanusGraph, backed by Cassandra, is the source, and I use TinkerPop's `hadoop-gremlin` to build vertex programs and run OLAP Gremlin queries. Per my understanding, in this setup the only point of contact with JanusGraph is through the `CqlInputFormat`, and the server itself is not involved at all. Is that correct?
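For context, the read side of a setup like this is usually driven by a `HadoopGraph` properties file. The following is only a minimal sketch modeled on the example in the JanusGraph documentation; the hostname, port, keyspace, and Spark master are placeholder assumptions, and a real cluster would need its own values.

```properties
# Read a JanusGraph graph from Cassandra via hadoop-gremlin (sketch, not a tested config)
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

# Storage settings handed to the input format (placeholder values)
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.hostname=localhost
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner

# Spark settings (placeholder master)
spark.master=local[*]
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
```

Nothing in this sketch points at a JanusGraph Server endpoint: the input format is given the Cassandra contact point and keyspace directly.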

A very common operation that I'm going to have to do, based on the above clustering assumption, is pre-filtering vertices by a timestamp range before running my logic on the subgraph. As an example, I would like to, say, download the last couple days' worth of vertices on my laptop for running some tests. Per my understanding, currently this would entail unconditionally loading the entire dataset in the Spark cluster's memory every time. Is that correct? Is there an alternative?

I have looked into `CqlInputFormat`'s code and I noticed that you can add `WHERE` clauses, but it looks like there are caveats to that, and I could not understand how to map a (simple) predicate on a vertex property to a CQL clause. I was considering rolling my own input format class once I grokked how to run CQL queries directly. I'm not super familiar with JanusGraph's codebase, nor am I a Java expert really, but I'm willing to get my hands dirty. Could I please ask for a bird's-eye-view explanation of how graph data is mapped into the backend, or even just pointers on how to navigate the codebase pertaining to that? Or do you have other suggestions that could point me in the right direction?

Thank you!

cc @criminosis

JanusGraph•15mo ago•

johndisandonato
