JanusGraph

JanusGraph - Distributed, open source, massively scalable graph database.

OLAP using a Spark cluster taking much longer than expected

Hi all, we have set up a Spark cluster to run OLAP queries on JanusGraph with Bigtable as the storage backend. Details: ```Backend: Bigtable Vertices: ~4 Billion...

Reindexing using the Mgmt System

Hi all! We have an internal debate on how best to perform a reindex after adding a new index. On JanusGraph 0.6, which of these options is preferred, and why? ```...

Deleting duplicate connections from the schema?

Hi all! I have a Java app working with a Cassandra-based JG that checks the JG schema at startup and adds any missing elements via the JanusGraphManagement interface. Due to a long-standing bug, the app created thousands of duplicate connections between the same node and edge labels via management.addConnection(). This has become a problem because these connections are cached in the StandardSchemaCache, which has unlimited size and has started taking up all of the heap. I'm looking for a way to safely delete the duplicate connections from the schema without dropping the schema and without disrupting other instances of the app working with this graph. Does anyone have experience with anything similar? I'm currently exploring the internals of JanusGraphManagement and ways to use the tx.query() interface to remove the unwanted relations, but I'd really appreciate any tips or ideas for an easier/safer solution...
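A minimal, library-agnostic sketch of just the selection step: it assumes the connections have already been read out of the schema as (edge label, out-vertex label, in-vertex label, relation id) tuples. That tuple shape and the function name are hypothetical, not a JanusGraph API. It keeps the first connection seen for each label triple and returns the ids of the rest as removal candidates; how to delete those relations safely is still the open question of the thread.

```python
from typing import Hashable, Iterable, List, Tuple

# Hypothetical representation: (edge_label, out_label, in_label, relation_id)
Connection = Tuple[str, str, str, Hashable]


def find_duplicate_connections(connections: Iterable[Connection]) -> List[Hashable]:
    """Return relation ids of every connection except the first per label triple."""
    first_seen = {}
    duplicates = []
    for edge_label, out_label, in_label, relation_id in connections:
        key = (edge_label, out_label, in_label)
        if key in first_seen:
            # Same edge label between the same vertex labels: a duplicate.
            duplicates.append(relation_id)
        else:
            # Keep the first occurrence as the canonical connection.
            first_seen[key] = relation_id
    return duplicates
```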

Accelerating the vertex upsert

We need to accelerate the ingestion rate; the scenario is pretty typical. The same vertex can show up repeatedly with new relationships, so on each vertex insertion we have to check whether it has already been inserted. Is there any particular recipe to accelerate this step? I would assume that this check causes contention for maintaining consistency. We are considering introducing an external in-memory cache where we can accumulate all the vertex IDs and check the cache before hitting the DB. Any ot...
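A minimal sketch of the cache-before-DB check described above, assuming a single ingestion process. `lookup_in_db` and `create_in_db` are hypothetical callables standing in for the real graph lookup and vertex creation; nothing here is a JanusGraph API. Keys the process has already seen are resolved from memory, and only unseen keys hit the database.

```python
from typing import Callable, Dict, Hashable, Optional


class VertexIdCache:
    """In-memory map from a natural key to an already-known vertex id."""

    def __init__(
        self,
        lookup_in_db: Callable[[Hashable], Optional[object]],  # hypothetical
        create_in_db: Callable[[Hashable], object],            # hypothetical
    ) -> None:
        self._ids: Dict[Hashable, object] = {}
        self._lookup = lookup_in_db
        self._create = create_in_db
        self.db_lookups = 0  # how many times the DB was actually consulted

    def get_or_create(self, key: Hashable) -> object:
        if key in self._ids:
            # Cache hit: no database round trip at all.
            return self._ids[key]
        # Cache miss: check the graph first, create only if absent.
        self.db_lookups += 1
        vertex_id = self._lookup(key)
        if vertex_id is None:
            vertex_id = self._create(key)
        self._ids[key] = vertex_id
        return vertex_id
```

Note that a per-process dict does not stop two separate writers from creating the same vertex at the same time; with several ingestion workers, the cache would have to be shared, or the input partitioned by key so that each key is only ever handled by one worker.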

JanusGraph Tokenizer & Solr

I recently encountered what I believe is an incompatibility caused by the JanusGraph tokenizer being applied to queries before they are submitted to Solr. It appears this is done only for Solr, not for Elasticsearch, and only for one particular Solr predicate. Has anyone else bumped into this? Here's a link to my post on the mailing list that gives more detail and links to the code in question: https://lists.lfaidata.foundation/g/janusgraph-users/message/6760...

JanusGraph 1.0 full-text search predicate in Python is broken

Hi all, with JanusGraph 0.6 and gremlin-python 3.5.4, I was able to use the following in Python to apply the JanusGraph full-text search predicate: ----- from gremlin_python.process.traversal import P...
Solution:
The problem here is probably that JanusGraph used to serialize its text predicates as if they were TinkerPop predicates, just with a value corresponding to the JanusGraph text predicate; e.g., Text.textContains() was serialized as if it were P.textContains(). JanusGraph 0.6.0 changed this so that its predicates are serialized with a JanusGraph-specific type identifier, but the server kept a fallback mechanism so it could still deserialize predicates sent the old way: https://docs.janusgraph.org/changelog/#serialization-of-janusgraph-predicates-has-changed This fallback mechanism was then removed in JanusGraph 1.0.0: https://docs.janusgraph.org/changelog/#remove-support-for-old-serialization-format-of-janusgraph-predicates ...

Unable to use next() in gremlin-python

Hi, has anyone tried to use gremlin-python with JanusGraph 1.0.0? It seems that there is a bug that makes next() unusable. Here is an example of how to reproduce the issue: ...
Solution:
This is my config: https://github.com/Citegraph/citegraph/blob/main/backend/src/main/resources/gremlin-server-cql.yaml. I just tested the Python driver and the Java driver, and both worked well.

Speeding up adding nodes to JanusGraph

Hi everybody, I am using JanusGraph with Berkeley DB JE. In my use case I initially have to add nodes one by one. The number can be quite large in certain cases, e.g., 200k-500k. It currently takes quite some time compared to an in-memory test setup. I have tried the following to speed it up and got some improvement: .set("storage.berkeleyje.cache-percentage", CACHE_PERCENTAGE) .set("cache.db-cache",true) .set("cache.db-cache-size", DB_CACHE_SIZE)...
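One technique often tried for this kind of bulk load (an assumption here, not something the thread states) is to keep adding nodes one at a time but commit in batches rather than once per node, so the per-commit overhead is amortized. The sketch below only shows the batching logic: `add_one` and `commit` are hypothetical callables standing in for adding a vertex in an open transaction and committing that transaction, and the batch size is an arbitrary placeholder to be tuned.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def add_in_batches(
    items: Iterable[T],
    add_one: Callable[[T], None],   # hypothetical: add a single node
    commit: Callable[[], None],     # hypothetical: commit the open transaction
    batch_size: int = 10_000,       # placeholder value, tune per workload
) -> int:
    """Add items one by one, committing every batch_size items; return commit count."""
    pending = 0
    commits = 0
    for item in items:
        add_one(item)
        pending += 1
        if pending == batch_size:
            commit()
            commits += 1
            pending = 0
    # Flush the final partial batch, if any.
    if pending:
        commit()
        commits += 1
    return commits
```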

Splitting the Backing Elasticsearch Index to Increase Primary Shards as a JG Mixed Index Grows

Has anyone had to resize an Elasticsearch index that's backing a JanusGraph mixed index? Configuration-wise, it seems you can only give JanusGraph a single primary shard and replica count, which it uses whenever it creates an ES index. I project that a couple of mixed indices will eventually exceed what is reasonable for a single primary shard in the backing ES index (given the rule of thumb of 10-50GB or 200M documents per shard). So as a configuration default, it makes sense to leave it at 1 for the other mixed indices...
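The rule of thumb above can be turned into a quick estimate of how many primary shards a growing mixed index would need. This is a minimal sketch assuming documents spread evenly across shards and using the upper bounds of the quoted rule (50GB or 200M documents per shard); the function name is hypothetical.

```python
import math


def primary_shards_needed(
    index_size_gb: float,
    doc_count: int,
    max_shard_gb: float = 50,
    max_shard_docs: int = 200_000_000,
) -> int:
    """Smallest primary shard count keeping each shard under both limits."""
    by_size = math.ceil(index_size_gb / max_shard_gb)
    by_docs = math.ceil(doc_count / max_shard_docs)
    # An index always has at least one primary shard.
    return max(1, by_size, by_docs)
```

For example, a 120GB index with 100M documents needs 3 primary shards (size is the binding limit), while a 40GB index with 450M documents also needs 3 (document count is the binding limit). If the resize is done with Elasticsearch's split index API, the target primary shard count must be a multiple of the source count, so the estimate may have to be rounded up to such a multiple.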

Usage of _lock_ tables with ConfiguredGraphFactory vs. JanusGraphFactory

Hi everybody, we are noticing unexpected behavior of JanusGraph regarding the tables edgestore_lock_ and graphindex_lock_. We are operating two JanusGraph clusters that use the same schema, both running on ScyllaDB. While one instance is managed by JanusGraphFactory, we have configured multiple graphs in the other instance using ConfiguredGraphFactory. Recently, we noticed unexpected storage usage caused by the table edgestore_lock_, so we started comparing the utilization of these tables for both scenarios: ```...
Solution:
The lock is acquired by StandardJanusGraph when vp[~T$VertexExists->true] is deleted. This only affects deletions, because on additions the vertex is always "new": https://github.com/JanusGraph/janusgraph/blob/06526e728f468bf7fca072c3cf2c5d9024830be0/janusgraph-core/src/main/java/org/janusgraph/graphdb/database/StandardJanusGraph.java#L762

OLAP job failing with NullPointerException error

I am running an Apache Spark job on my graph and it fails with the error below. I am not sure what id: 525 is...

Unable to access/drop a vertex after dropping a property key.

We followed the example mentioned here and dropped one property key from our schema, after which we are unable to access or drop any vertex. It throws an Error during serialization: [no message for java.lang.NullPointerException] exception every time we run either g.V(node_id).valueMap() or g.V(node_id).drop().iterate(). Is there any way we could recover from this error, or any workaround...

Slow drop() performance

I have to drop thousands of vertices by ID from the graph, and .drop().iterate() takes about 1 minute per vertex. At this rate my task will take ages to complete. Is there a quicker way to achieve this, like a bulk drop operation?...
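One way to express a bulk drop is to send one traversal per batch of IDs, g.V(id1, id2, ...).drop(), instead of one traversal per vertex. The sketch below assumes `g` is a gremlin-python GraphTraversalSource; the batch size is an arbitrary placeholder and the helper names are hypothetical. Batching only removes per-request overhead: if each single drop is slow for another reason (e.g., vertices with very many edges or expensive index updates, which the thread does not say), it may not help much.

```python
from typing import Iterable, Iterator, List, Sequence


def chunks(ids: Sequence, size: int) -> Iterator[List]:
    """Yield consecutive slices of ids with at most size elements each."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def drop_in_batches(g, vertex_ids: Iterable, batch_size: int = 500) -> int:
    """Drop vertices with one traversal per batch; return the number of batches.

    g is assumed to be a gremlin-python GraphTraversalSource.
    """
    batches = 0
    for batch in chunks(list(vertex_ids), batch_size):
        # g.V() accepts several ids, so one request drops the whole batch.
        g.V(*batch).drop().iterate()
        batches += 1
    return batches
```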

Where to find gremlin-java docs

I'm looking to learn how to make Gremlin queries from a Java app. I've discovered I can't use raw Gremlin inside a Java app and need something called gremlin-java. What is a good resource for learning gremlin-java?...
Solution:
I'm not completely sure what you mean by not being able to use Gremlin in a Java app. Gremlin has different language variants, and the Java GLV was the first one developed. A great way to start, I think, would be to read the Practical Gremlin book by @KelvinL, because it has a lot of great use cases with detailed explanations: https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html If you are interested in Gremlin syntax and how it works, I would suggest the TinkerPop documentation: https://tinkerpop.apache.org/docs/current/reference/ ...

Connect JG tutorial with Java (dependency issues)

I started looking into JanusGraph and wanted to see what a Spring Boot app connecting to JanusGraph would look like. I just can't seem to get past step two of the "connecting to JanusGraph using Java" tutorial. I've placed my two dependencies into my Maven pom.xml file, and when loading my dependencies only the gremlin-driver was loaded. The janusgraph-driver dependency fails with an error stating that it couldn't find the dependency. Error: " ...
Solution:
0.6.4 has not been officially released yet, so you won't find it in Maven.

Changing the Cardinality of a Property

I have a property that was created with the default cardinality (i.e., SINGLE), but I want the cardinality of this property to be SET. What is the easiest way to make this change?

storage.cql.executor-service

Hello, the storage.cql.executor-service.enabled property was removed in 1.0.0. IIRC, the current documentation recommends disabling it in production scenarios and letting the driver handle the queries. Has this changed?

Benchmarks

We are trying to benchmark the ingestion rate of JG, using Solr as the indexing engine. Are any numbers already available?

How to run the MapReduce reindexing job

Has anyone succeeded in running the MapReduce reindexing job? We ran into the usual dependency nightmare. I assume we should bundle all the dependencies into an uber-jar, right? Otherwise, we would have to put the JanusGraph dependencies on the YARN node classpath, no?

Performance Problems/Config review

Hello, I am reaching out because I expect better performance from our Scylla + JanusGraph deployment and am looking for insights on how to optimize it. We have tested versions 0.6.3 and 1.0.0-20230918-091019.c39a12a. We currently operate a small setup on Kubernetes consisting of three nodes. Each node is allocated 10 CPUs exclusively for our tests and 60GB of memory. Scylla and JanusGraph are deployed on the same nodes, and a fourth node runs a benchmark program to assess our configuration's performance...