Apache TinkerPop · dracule_redrose

Design decision related to multiple heterogeneous relational graphs

I'm working with over 100k instances of heterogeneous, relational graphs with attributes on both nodes and edges. Each graph has around 5k vertices and 10k edges. Vertices are of 3 types with 10 attributes (7 numerical, 3 string), and edges are of 5 types with 8 attributes (4 numerical, 4 string). Given the complexity and size of the data, running queries such as path traversals, average clustering coefficients, and identifying nodes that form triangles across all these instances is a significant challenge.

I've been using a naive gremlin-server setup with an in-memory database to run my queries on one graph instance, but it's becoming clear that this approach isn't sustainable, either for persisting multiple graphs or for memory efficiency: a single graph instance consumes about 1.2 GB of RAM. Based on the feedback I got from the Gremlin Google group (https://groups.google.com/g/gremlin-users/c/UotOZFVvi3k/m/-hVd2oNNAQAJ), I'm exploring a switch to JanusGraph with a Berkeley DB backend to support persistent storage of multiple graphs.

Given the data structure and requirements, especially the need to load and query individual graph instances efficiently, possibly in a serializable fashion, do you think JanusGraph with Berkeley DB is a viable solution? Or are there alternative approaches I should consider for managing and querying this volume of graph data effectively?

I tried to find a similar question. The closest match I found was https://discord.com/channels/838910279550238720/1087383361129037845, but it asked how to manage multiple graphs in gremlin-server.
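The two clustering metrics in the question can also be computed per instance outside the graph database, once an instance's edge list has been loaded. A minimal sketch in plain Python (no Gremlin or JanusGraph API involved), assuming edge types and directions can be ignored for these metrics; all function names are hypothetical:

```python
from itertools import combinations


def build_adjacency(edges):
    """Collapse an edge list into an undirected simple graph (node -> neighbour set).

    Assumption for these metrics only: edge types, directions and parallel
    edges are ignored, and self-loops are dropped.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # a self-loop can never be part of a triangle
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Share of neighbour pairs of `node` that are themselves connected."""
    neighbours = adj.get(node, set())
    k = len(neighbours)
    if k < 2:
        return 0.0  # undefined for degree < 2; counted as 0 here
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean local clustering coefficient over all vertices (0.0 for an empty graph)."""
    if not adj:
        return 0.0
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def nodes_in_triangles(adj):
    """Vertices in at least one triangle, i.e. with two adjacent neighbours."""
    return {n for n in adj if local_clustering(adj, n) > 0}
```

For example, a triangle 1-2-3 with an extra edge 3-4 gives local coefficients 1, 1, 1/3 and 0, an average of 7/12, and triangle members {1, 2, 3}. Counting degree-0 and degree-1 vertices as 0 follows the common convention (it is NetworkX's default). The pairwise neighbour check is quadratic in a vertex's degree, which should be cheap at about 10k edges per instance but would grow quickly around high-degree hubs.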
ColeGreer · 48d ago
Hosting 100k small graph instances isn't a usage pattern I've seen a whole lot. JanusGraph seems like a reasonable choice to me, although I see you've been running into issues with conflicting vertex/edge IDs. I'm unsure whether JanusGraph supports IDs that aren't globally unique in multi-graph deployments. My understanding is that JanusGraph generally recommends avoiding user-defined IDs whenever possible, in favour of IDs generated automatically by JanusGraph. Perhaps some @janusgraph folks who are more familiar with configuring multiple graphs can give clearer advice for your setup.
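If all instances end up sharing one store, one hypothetical way to avoid the conflicting vertex IDs mentioned above is to derive each ID from the pair (graph index, local vertex index). A minimal sketch of such bit packing in plain Python; the bit widths, function names and the scheme itself are assumptions for illustration, not a JanusGraph feature:

```python
from typing import Tuple

# Hypothetical bit budget: 20 bits each leaves headroom over the ~100k
# instances and ~5k vertices per instance described in the question.
LOCAL_BITS = 20
GRAPH_BITS = 20


def pack_id(graph_idx: int, local_idx: int) -> int:
    """Combine a graph index and a per-graph vertex index into one integer ID."""
    if not 0 <= graph_idx < (1 << GRAPH_BITS):
        raise ValueError(f"graph index out of range: {graph_idx}")
    if not 0 <= local_idx < (1 << LOCAL_BITS):
        raise ValueError(f"local index out of range: {local_idx}")
    return (graph_idx << LOCAL_BITS) | local_idx


def unpack_id(packed: int) -> Tuple[int, int]:
    """Recover (graph index, local vertex index) from a packed ID."""
    return packed >> LOCAL_BITS, packed & ((1 << LOCAL_BITS) - 1)
```

With 20 bits for each part, every packed value is below 2^40, so it fits comfortably in a signed 64-bit long. Whether JanusGraph would accept such values directly as user-supplied vertex IDs depends on its ID configuration, which this sketch does not cover.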
Solution
Bo · 47d ago
No, we actually recommend using user-defined IDs.
Bo · 47d ago
But yeah, I've never seen anyone host 100k small graph instances. In theory it should work. I might be wrong, but IIRC different graphs can use the same IDs without issues. In other words, there's no globally unique ID in a multi-tenant JanusGraph setup.
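If each instance has its own storage, as in a multi-tenant setup, vertex IDs only need to be unique within that instance, which matches Bo's point above. A minimal sketch, assuming one BerkeleyDB directory per instance, that renders the per-instance JanusGraph properties. The keys are standard JanusGraph options; the root path, naming scheme and `instance_properties` helper are hypothetical:

```python
import posixpath

# Hypothetical root directory; adjust to the actual deployment.
DEFAULT_ROOT = "/data/janusgraph"


def instance_properties(graph_idx: int, root: str = DEFAULT_ROOT) -> str:
    """Render JanusGraph properties for one instance with its own BerkeleyDB directory."""
    name = f"graph_{graph_idx:06d}"  # hypothetical naming scheme, one name per instance
    props = {
        "gremlin.graph": "org.janusgraph.core.JanusGraphFactory",
        "storage.backend": "berkeleyje",
        "storage.directory": posixpath.join(root, name),
        "graph.graphname": name,
    }
    return "".join(f"{key}={value}\n" for key, value in props.items())


if __name__ == "__main__":
    # Print the configuration for one example instance.
    print(instance_properties(42), end="")
```

Writing 100k such files is only one option; JanusGraph's ConfiguredGraphFactory is designed to create graphs by name from a shared template configuration, which may suit this many instances better. Either way, the cost of keeping many BerkeleyDB environments open at once is worth measuring before committing to the layout.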
dracule_redrose · 47d ago
Thank you so much, guys. As I make progress, I'll update this thread in case someone asks about it in the future.