Does the quintillion-edge limitation still exist after the introduction of custom vertex id support?
We have exhausted almost 25% of our ids after 6 months of using JG (edges are still below 100 billion). I was curious whether the custom-vertex-id change removes the limitation on ids. Does this also mean that the limitation mentioned in the title will be removed?
I would also like to know when the 1.0.0 stable release is planned.
https://docs.janusgraph.org/master/advanced-topics/custom-vertex-id/
3 Replies
I’m not sure how you calculate id exhaustion. How many ids did you use? You said you have fewer than 100 billion edges. Each edge uses 1 id, so in that case you have used ~0.00000867% of the ids. How did you come up with 25%?
Nevertheless, if that isn’t enough, you can switch to string vertex ids to increase the number of possible vertices in the cluster (which in turn increases the number of possible edges).
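The percentage above can be reproduced with a quick calculation. This is only a sketch: it assumes that "a quintillion" means 2^60 edge ids and that each edge consumes exactly one id. The variable names are illustrative.

```python
# Assumption: the "quintillion edges" limit corresponds to 2**60 edge ids.
TOTAL_EDGE_IDS = 2 ** 60

# Upper bound taken from the question: fewer than 100 billion edges.
edges_used = 100_000_000_000

# Share of the edge id space consumed if each edge uses exactly one id.
percent_used = edges_used / TOTAL_EDGE_IDS * 100
print(f"{percent_used:.8f} %")  # prints "0.00000867 %"
```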
As for the JanusGraph 1.0.0 you can follow the milestone here: https://github.com/JanusGraph/janusgraph/milestone/21
There are some issues left that were planned for the 1.0.0 release. However, that doesn’t mean all of those issues will be addressed in 1.0.0. Some of them may be retargeted to later releases.
There is usually no ETA for releases, because how soon they ship depends entirely on the community. More help from the community usually means faster release cycles.
Feel free to get involved in the targeted issues as well as the release discussion here: https://lists.lfaidata.foundation/g/janusgraph-dev/topic/discuss_1_0_0_0_6_3/95312566?p=,,,20,0,0,0::recentpostdate/sticky,,,20,2,0,95312566,previd%3D1679492280756512460,nextid%3D1642671202377452358&previd=1679492280756512460&nextid=1642671202377452358
custom-vertex-id eliminates the need for auto-allocating vertex ids, but JanusGraph still needs edge ids, property ids, etc.
FWIW, supporting custom edge ids doesn't seem hard; it's just that we haven't seen a feature request for it.
"I’m not sure how you calculate ids exhaustion."
@porunov Most of our ids were wasted by server restarts, since we run JanusGraph Server on Kubernetes pods (~20-30 pods). We have also set cluster.max-partitions to 1024, which, as we didn't know at the start, reserves 10 bits of the id, leaving 2^50 ids for edges and 2^49 for vertices.
Initially we ingested a lot of data into the graph with block-size set to 1 million. Recently, while going through the JanusGraph code, we found that the id pool for the edge namespace uses a block size 8 times the base block size, which equates to 8 million ids per id pool.
One deployment therefore costs us 8 million * (1024 id pools) * (~25 pods) = ~204.8 billion edge ids, and we had a lot of deployments in the starting phase. We are planning to move JanusGraph Server to VMs and reduce the block size, but unfortunately we cannot change cluster.max-partitions, as it is a fixed config.
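The per-deployment estimate can be written out as a short calculation. This is a rough sketch of the model described in this post, not of JanusGraph's actual allocator. It assumes every pod reserves one full edge block in each of the 1024 partitions on every deployment, and that the edge id space is 2^50. The function name and the smaller block size of 10,000 are hypothetical, chosen only for comparison.

```python
def ids_reserved_per_deployment(base_block_size, edge_multiplier, id_pools, pods):
    """Edge ids reserved by one deployment under the model in this post."""
    return base_block_size * edge_multiplier * id_pools * pods

# Assumption from the post: 10 bits go to partitions, leaving 2**50 edge ids.
EDGE_ID_SPACE = 2 ** 50

# Figures from the post: block-size 1 million, 8x for edges, 1024 pools, ~25 pods.
per_deployment = ids_reserved_per_deployment(1_000_000, 8, 1024, 25)
print(per_deployment)  # 204800000000, i.e. ~204.8 billion

# Number of deployments needed to burn 25% of the edge id space at that rate.
deployments_to_quarter = 0.25 * EDGE_ID_SPACE / per_deployment
print(round(deployments_to_quarter))  # 1374

# Hypothetical smaller block size, to show how the cost scales down.
per_deployment_small = ids_reserved_per_deployment(10_000, 8, 1024, 25)
print(per_deployment_small)  # 2048000000, i.e. ~2 billion
```

Under this model the cost per deployment scales linearly with the block size and with the pod count, so lowering either one reduces the waste proportionally.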
Please let us know if our understanding is wrong and whether there are any more steps we can take to reduce the wastage of ids.
Also, we sometimes have to restart for the following reasons:
1. Mapping additional props to vertex/edge label.
2. Changing state of newly created index from INSTALLED to REGISTERED.
In both cases the changes are not reflected unless we restart. We tried waiting for hours for the changes to be communicated to all instances via the backend, but they only take effect after a redeployment.