OLAP query on a Spark cluster taking much longer than expected
Hi all, we have set up a Spark cluster to run OLAP queries on JanusGraph with Bigtable as the storage backend.
Details:
Now I'm trying to count all the vertices with the label `ticket`, which we know number on the order of ~100k. The query fired to do that is as follows:
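The original query snippet did not survive in this post. For illustration, a typical Gremlin count over SparkGraphComputer for such a label might look like the following sketch (the properties file path is an assumption, and this assumes a Gremlin console with a HadoopGraph configuration for JanusGraph):

```groovy
// Sketch only: assumes a HadoopGraph configured for JanusGraph over Bigtable
// ('conf/hadoop-graph/read-bigtable.properties' is a hypothetical file name)
graph = GraphFactory.open('conf/hadoop-graph/read-bigtable.properties')
g = graph.traversal().withComputer(SparkGraphComputer)

// Count all vertices carrying the 'ticket' label
g.V().hasLabel('ticket').count()
```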
The query has been running for the past 36 hours and is still not complete. Given the average throughput at which data is being read (>50 MB/s), it should have read the full ~3.6 TB of data by now.
Is it possible to use indexes while running the OLAP query, resulting in faster loading of the subgraph into Spark RDDs (currently it is scanning the full graph)?
Reply:

"Is it possible to use indexes while running the OLAP query, resulting in faster loading of the subgraph into Spark RDDs (currently it is scanning the full graph)?"

Unfortunately, no. It sounds like an interesting idea, though!
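For context, in JanusGraph OLAP the SparkGraphComputer loads the graph through a Hadoop input format rather than through any graph index, which is why the full table scan cannot be avoided. A minimal configuration sketch is shown below; the file name and host values are assumptions, and Bigtable is reached through JanusGraph's HBase-compatible input format (via the Bigtable HBase client adapter):

```properties
# Hypothetical read-bigtable.properties; adjust hosts/credentials for your cluster
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat

# Storage: JanusGraph talks to Bigtable through the HBase-compatible client
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=localhost

# Spark settings
spark.master=yarn
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
```

The input format enumerates every row of the backing table, so the read volume is proportional to the whole graph, not to the ~100k `ticket` vertices being counted.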