JanusGraph•2y ago

How to run the mapreduce reindexing job

Did anyone succeed in running the map-reduce reindexing job? We ran into the usual dependency nightmare. I would assume we should bundle all the dependencies into an uber-jar, right? Otherwise we should put in the yarn node classpath the janusgraph dependencies, no?

5 Replies

Bo•2y ago

I used to have a working setup on a YARN cluster, but I cannot find it anymore. IIRC, an uber-jar sounds like the way to go.

we should put in the yarn node classpath the janusgraph dependencies

I am not 100% sure, but I don't think I did this.

dgrecoOP•2y ago

Thank you 🙏 the uber-jar seems the most plausible solution. Then there is the usual mess of putting all the dependencies together. One last point: did you ever think of creating a reindexing job based on Spark instead of MR? It would be more portable; MR is restricted to the Hadoop environment, YARN, etc.

Bo•2y ago

I agree we should gradually move away from MapReduce, or at least allow people to do reindexing using Spark. I don't foresee that happening in the near future unless someone is willing to tackle it. There's an ad hoc way to do it yourself: use a Spark job to scan all vertices, update the properties that you want to reindex, and commit. It could be a no-op update that just does an in-place update without changing the value, but it will trigger a reindexing for that vertex/edge (if I recall correctly).
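For readers who want the shape of that "scan, no-op update, commit" loop, here is a rough, self-contained Java sketch. It deliberately does not use the JanusGraph or Spark APIs: `GraphTx`, `InMemoryTx`, and `touchAll` are hypothetical names invented for illustration, and the batch size is an assumption. In a real job the interface methods would be backed by graph calls (for example, TinkerPop's `vertex.property(key, vertex.value(key))` followed by `graph.tx().commit()`), and whether such a write actually refreshes the index is something to verify on your own setup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: GraphTx and InMemoryTx are hypothetical stand-ins,
// not JanusGraph or TinkerPop classes.
class NoOpReindexSketch {

    /** Minimal view of a graph transaction (hypothetical interface). */
    interface GraphTx {
        List<Long> vertexIds();
        Object getProperty(long vertexId, String key); // null when the property is absent
        void setProperty(long vertexId, String key, Object value);
        void commit();
    }

    /**
     * Rewrites `key` with its current value on every vertex that has it,
     * committing after every `batchSize` writes so transactions stay small.
     * Returns the number of vertices touched.
     */
    static int touchAll(GraphTx tx, String key, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int touched = 0;
        for (long id : tx.vertexIds()) {
            Object current = tx.getProperty(id, key);
            if (current == null) {
                continue; // nothing to re-write for this vertex
            }
            tx.setProperty(id, key, current); // same value: a no-op update
            touched++;
            if (touched % batchSize == 0) {
                tx.commit();
            }
        }
        if (touched % batchSize != 0) {
            tx.commit(); // flush the final partial batch
        }
        return touched;
    }

    /** In-memory stand-in used only to exercise the loop. */
    static final class InMemoryTx implements GraphTx {
        private final Map<Long, Map<String, Object>> vertices = new LinkedHashMap<>();
        int writes = 0;
        int commits = 0;

        void addVertex(long id, Map<String, ?> props) {
            vertices.put(id, new HashMap<>(props));
        }

        @Override
        public List<Long> vertexIds() {
            return new ArrayList<>(vertices.keySet()); // copy, so writes cannot disturb iteration
        }

        @Override
        public Object getProperty(long vertexId, String key) {
            Map<String, Object> props = vertices.get(vertexId);
            return props == null ? null : props.get(key);
        }

        @Override
        public void setProperty(long vertexId, String key, Object value) {
            vertices.get(vertexId).put(key, value);
            writes++;
        }

        @Override
        public void commit() {
            commits++;
        }
    }

    public static void main(String[] args) {
        InMemoryTx tx = new InMemoryTx();
        for (long id = 1; id <= 5; id++) {
            tx.addVertex(id, Map.of("name", "paper-" + id));
        }
        tx.addVertex(6, Map.of("year", 2020)); // has no "name", so it is skipped
        int touched = touchAll(tx, "name", 2);
        System.out.println("touched=" + touched + " commits=" + tx.commits);
    }
}
```

Committing in batches rather than once at the end is a design choice to keep each transaction small; in a Spark job each partition would typically run its own loop like this over its share of the vertices.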

Bo•2y ago

In case you don't know how to "use Spark job to scan all vertices ... and commit", here's an example: https://github.com/Citegraph/citegraph/blob/main/backend/src/main/java/io/citegraph/data/spark/loader/VertexPropertyEnricher.java


dgrecoOP•2y ago

Thank you so much, very helpful thanks
