Apache TinkerPop

Apache TinkerPop is an open source graph computing framework and the home of the Gremlin graph query language.

Systems Analysis Report on Apache TinkerPop - Where to Start?

Hey all, I'm currently writing an analysis of Apache TinkerPop for grad school and was hoping someone could point me toward good materials or example uses for this system! Any answers are appreciated. I'm finding a good chunk on my own through the official docs, but wanted to see if there is anything specific the community here thinks is notable. I haven't used this before, so I'm trying to put together resources to better explain "w...
Solution:
I'm not sure what you're looking for when it comes to "example uses for this system"; I assume you mean real-world use cases. You often have to search for the actual TinkerPop implementations to find some of those answers, because TinkerPop as a framework tends not to get the main mention in blog posts and other news items. Anyway, here are a few cases you could look at: https://innovation.ebayinc.com/tech/engineering/how-we-export-billion-scale-graphs-on-transactional-graph-databases/ https://aws.amazon.com/blogs/database/cox-automotive-scales-digital-personalization-using-an-identity-graph-powered-by-amazon-neptune/?pg=ln&sec=c ...

Lambda example in TypeScript

Does anyone know where I can find example code that demonstrates up-to-date best practices for writing TypeScript Lambda Functions that interact with Neptune?

mergeE(): increment counter on match

Hi, is there an easy way to increment an existing edge property based on its current value using mergeE() in a single query (e.g., counter += 1)? Something similar to this: ``` g.mergeE([(T.label):'called', (from): person1, (to):person2])....
Solution:
gremlin> g.mergeE([(Direction.from):44,(Direction.to):8]).valueMap(true)
==>[id:5062,label:route,dist:549]
gremlin> g.mergeE([(Direction.from):44,(Direction.to):8]).valueMap(true)
==>[id:5062,label:route,dist:549]
and then...

Serialization Issue

I have a weird error. When I connect with the JanusGraph Gremlin client using conf/remote-graph-binary.yaml, I get results, but when I try to use my Java application I get java.io.IOException: Serializer for custom type 'janusgraph.RelationIdentifier' not found. From googling around, this seems to be a serialization issue. As far as I can tell, the Gremlin client and my Java application have similar configs, yet the Gremlin client has no serialization problem. ``` hosts: [localhost] port: 8182...
Solution:
I have faced a similar issue in the past (though mostly with gremlin-python), and @Boxuan Li suggested a solution in the JanusGraph Discord server. It was something along these lines: ``` private static MessageSerializer createGraphBinaryMessageSerializerV1() { final GraphBinaryMessageSerializerV1 serializer = new GraphBinaryMessageSerializerV1();...

Design decision related to multiple heterogeneous relational graphs

I'm working with over 100k instances of heterogeneous, relational node-and-edge attributed graphs, each graph having around 5k vertices and 10k edges. Vertices are of 3 types with 10 attributes (7 numerical, 3 string), and edges are of 5 types with 8 attributes (4 numerical, 4 string). Considering the complexity and size of the data, running queries like traversal paths, average clustering coefficients, and identifying nodes in clustering triangles across all these instances presents a significant challenge. I've been using a naive gremlin-server setup with an in-memory database to run my queries on one graph instance, but it's becoming clear that this approach isn't sustainable for multi-graph persistence or memory efficiency, as a single graph instance consumes about 1.2 GB of RAM. I'm exploring the possibility of switching to JanusGraph with a Berkeley DB backend to support persistent storage of multiple graphs (based on the feedback I got from the gremlin google group, https://groups.google.com/g/gremlin-users/c/UotOZFVvi3k/m/-hVd2oNNAQAJ). Given the data structure and requirements, especially the need for efficient loading and querying of individual graph instances in a possibly serializable fashion, do you think JanusGraph with Berkeley DB is a viable solution, or are there alternative approaches I should consider for managing and querying this volume of graph data effectively?...
Solution:
No, we actually recommend using user-defined IDs.

Stack overflow when adding a large list of property values using traverser.property()

Hey, we encounter a stack overflow: ``` Exception during Transaction, rolling back ... org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(org/apache/tinkerpop/gremlin/process/traversal/step/util/AbstractStep.java:150): Java::JavaLang::StackOverflowError from org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(org/apache/tinkerpop/gremlin/process/traversal/step/util/ExpandableStepIterator.java:55)...

java: package org.apache.tinkerpop.shaded.jackson.core does not exist

While trying to run mvn clean install with JDK 11 on the master branch, I ran into the above error. Any idea?

Performance issue in large graphs

When making changes to a large graph (ca. 100K nodes, 500K edges) stored in a single Kryo file, I am experiencing huge delays. For example, when writing initially I can change 10K nodes in minutes, but once the graph is big, the same changes take more than an hour. Is there an easy solution, e.g., breaking the graph down and saving it in smaller files? Any suggestion is helpful. My initial preference is saving to the file system (local or network). Thanks for your suggestions/sol...
Solution:
I'm not sure whether other serialization formats such as GraphML or GraphSON would perform better, but this is likely not a common way we see graphs used, so we may not have much data on which techniques work best. With the exception of TinkerGraph, which is often used as an in-memory, somewhat ephemeral graph, we typically see persistent graph stores, where the database persists the data on disk and you do not need to reload it each time. If y...

Concurrent queries to an authentication-required server resulted in a 401 error

Hey guys, I'm playing around with Gremlin and encountered a very odd error where concurrent queries break authentication: ```js import gremlin from "gremlin"; ...
Solution:
Looks like a bug. Could you create an issue in https://issues.apache.org/jira/projects/TINKERPOP ?...

Discrepancy between console server id conventions and Neptune

I'm working with my test server and with Neptune, and I'm noticing a difference in the type of the T.id field. Is there any way to configure the type of ID generated by the Gremlin Server?
Solution:
Amazon Neptune uses strings for all IDs. You can configure a Gremlin Server to also use String IDs. There is a nice writeup here that may be useful (it's from the graph-notebook repo but the steps still apply) https://github.com/aws/graph-notebook/tree/main/additional-databases/gremlin-server

how to connect the amothic/neptune container to the volume?

I need to know which directory needs to be attached to the container so that the data is stored safely, even after a restart.
Solution:
Check out the graphLocation and graphFormat config options here: https://tinkerpop.apache.org/docs/current/reference/#tinkergraph-configuration You may also want to use a mapped directory from your local machine to ensure data is not lost if the container is deleted: https://docs.docker.com/storage/volumes/...

Docker yaml authentication settings (gremlinserver.authentication) question

Does anyone have any experience setting up authentication on Docker by using the supplied .yaml file? I'm having trouble passing in a map to properly set one of the options: gremlinserver.authentication.config. Additional info, not related to my main problem: I have a file with the contents of username/password pairs which follow the schema: ...
Solution:
This happens because Gremlin Server expects a map, but Docker is unable to pass it to the server in the expected format.
I think you simply have a slight misunderstanding of the YAML format here. YAML is basically a nested map of maps. Now, if your YAML looks like this: ...
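A minimal TypeScript sketch of the "nested map of maps" idea, under an assumption the thread does not confirm: that the Docker image treats each dotted, gremlinserver.-prefixed key (as in the question) as a path into the server YAML. Under that assumption, a map-valued option like authentication.config is filled by setting its leaf keys rather than by passing a whole map as a single value. setPath is a hypothetical helper; SimpleAuthenticator and its credentialsDb key come from Gremlin Server's sample secure configuration, and the file path is only illustrative.

```typescript
// Hypothetical model: each dotted override key is treated as a path into
// the nested map that the server's YAML file describes.
type Tree = { [key: string]: string | Tree };

function setPath(tree: Tree, dotted: string, value: string): void {
  const parts = dotted.split(".");
  let node: Tree = tree;
  // Walk the intermediate maps, creating any that are missing
  // (a scalar sitting in the way is replaced by a map).
  for (const part of parts.slice(0, -1)) {
    const next = node[part];
    if (typeof next !== "object") {
      node[part] = {};
    }
    node = node[part] as Tree;
  }
  // Only the final path segment receives a scalar value.
  node[parts[parts.length - 1]] = value;
}

const settings: Tree = {};
setPath(settings, "authentication.authenticator",
  "org.apache.tinkerpop.gremlin.server.auth.SimpleAuthenticator");
setPath(settings, "authentication.config.credentialsDb",
  "conf/tinkergraph-credentials.properties"); // illustrative path

// The resulting tree corresponds to this YAML:
// authentication: {
//   authenticator: org.apache.tinkerpop.gremlin.server.auth.SimpleAuthenticator,
//   config: { credentialsDb: conf/tinkergraph-credentials.properties } }
console.log(JSON.stringify(settings, null, 2));
```

The design point is that only leaves carry scalar values; every intermediate key, config included, is itself a map.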

Gremlin Injection Attacks?

Is anyone talking about or looking into attacks and mitigations for Gremlin Injection Attacks? That is, just like all the commentary on how to design your PHP-based web frontend with a Postgres backend so it isn't a sucker for an easy SQL Injection Attack, is anyone looking at how to handle users of your Gremlin Server who give you Groovy lambdas that are rich in aggressive behavior?
Solution:
I think this goes back to a different thread we had where I mentioned that security was a reason driving an idea that lambdas should not be allowed outside of embedded use cases and why they should be removed otherwise. For some lightweight security you can try to sandbox the ScriptEngine in the server: https://tinkerpop.apache.org/docs/current/reference/#script-execution but it is not a perfect solution and really just a reference implementation that we have. Some commercial offerings in the...
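The answer focuses on lambdas, but for the SQL-injection analogy in the question, the usual first mitigation is the same as with SQL: never splice user input into script text, and send it as a parameter (a "binding") instead. The library-free TypeScript sketch here only builds the JSON body of a Gremlin Server eval request, following the request format in TinkerPop's driver-development documentation; the function names and the userName binding are illustrative. In practice, the gremlin JavaScript driver's Client.submit(script, bindings) builds this for you. Bindings do nothing about a user who is allowed to submit arbitrary scripts containing lambdas, so the sandboxing and no-lambda points still apply.

```typescript
// Shape of a Gremlin Server "eval" request body (standard processor).
interface EvalRequest {
  requestId: string;
  op: "eval";
  processor: string;
  args: {
    gremlin: string;
    bindings: Record<string, unknown>;
    language: string;
  };
}

// UNSAFE: user text becomes part of the Groovy script and runs as code.
function unsafeScript(userName: string): string {
  return `g.V().has('person','name','${userName}').valueMap()`;
}

// Safer: the script text is constant; the user value travels as data and is
// bound to the script variable `userName` when the server evaluates it.
function parameterizedRequest(requestId: string, userName: string): EvalRequest {
  return {
    requestId, // the protocol expects a UUID here
    op: "eval",
    processor: "",
    args: {
      gremlin: "g.V().has('person','name',userName).valueMap()",
      bindings: { userName },
      language: "gremlin-groovy",
    },
  };
}

// A hostile "name" that closes the string literal and appends a drop().
const hostile = "x').drop();g.V().has('name','y";
console.log(unsafeScript(hostile));
// -> g.V().has('person','name','x').drop();g.V().has('name','y').valueMap()
console.log(JSON.stringify(
  parameterizedRequest("00000000-0000-0000-0000-000000000001", hostile)));
```

Keeping the script text constant also helps performance, since Gremlin Server can reuse the compiled script and only the bindings change between requests.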

Returned vertex properties (JS client)

Hi, I've got a question regarding the returned vertex value when using the JS client. How come non-array properties are parsed & returned as an array of length 1, as seen in the example below? Thank you. ```json { "id": 4104, "label": "account",...
Solution:
Arrays are used so that properties with list or set cardinality can be represented:
gremlin> g.addV('test').property(list,'a','1').property(list,'a','2')
==>v[13]
gremlin> g.V(13).valueMap()
==>[a:[1,2]]...
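If the application only wants plain values for single-valued properties, one option is to unwrap the arrays on the client. This TypeScript sketch assumes the valueMap() result has already been turned into a plain object of key to array (the JavaScript driver may hand back a Map for map results, which Object.fromEntries can convert). The helper name, the sample values, and the idea of passing an explicit list of single-cardinality keys are illustrative choices, not part of the driver. On recent TinkerPop versions, valueMap().by(unfold()) can do the unwrapping on the server instead.

```typescript
// A valueMap()-style result: each property key maps to an array of values,
// because a vertex property may have list or set cardinality.
type ValueMap = Record<string, unknown[]>;

// Unwrap only the keys the caller knows are single-cardinality, so that a
// list/set property which happens to hold one element stays an array.
function unwrapSingleProps(
  vm: ValueMap,
  singleKeys: readonly string[],
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(vm)) {
    const values = vm[key];
    out[key] =
      singleKeys.indexOf(key) !== -1 && values.length === 1 ? values[0] : values;
  }
  return out;
}

// Illustrative data: 'name' and 'age' are single-valued; 'a' is a list
// property like the one in the console example.
const raw: ValueMap = { name: ["alice"], age: [29], a: ["1", "2"] };
const flat = unwrapSingleProps(raw, ["name", "age"]);
console.log(flat);
```

Passing the single-cardinality keys explicitly, rather than unwrapping every array of length 1, avoids silently changing the shape of list or set properties depending on how many values they currently hold.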

Anyone using TinkerPop Docker as a local Cosmos replacement

Running into some random issues. Looking for tips and tricks.
Solution:
One thing to consider is that you would likely use TinkerGraph and Gremlin Server for this local replacement. CosmosDB has a number of limitations and differences that this local environment would not catch, so you could write some Gremlin that works locally but then fails when you try the same query on CosmosDB. That said, if you stay aware of those differences, stick to sending scripts, and prefer the 3.4.x server release, it could give you a basic but not...

Configuring Websockets connection to pass through a proxy server

Hey, I'm working on making G.V() fully proxy-aware, but I can't seem to get the WebSocket connection to pass through a SOCKS/HTTP proxy configuration. I've got all the proxy configuration Java system properties set and working for HTTP connections. Is there any specific configuration I can add to let the Gremlin driver use a configured proxy?...

python goblin vs spring-data-goblin for interactions with gremlin server

I want an OGM to interact with my gremlin server. What would be a good choice?
Solution:
I've not kept up with the latest changes to these libraries. Goblin might be the most currently maintained one. If you're using Python I suppose I'd start there. Not sure if anyone here can chime in with some success stories around using OGMs. Most applications I hear about tend to just use Gremlin directly.

Is there any open source version of data visualizer for aws neptune?

Is there any open source data visualizer for AWS Neptune? I need one since it's essential for my small-scale use of Neptune. I have used G.V(), and it was perfect for my use case, but because of budget constraints I can't afford it. Any solutions?
Solution:
AWS maintains Graph Explorer and Graph Notebook (https://docs.aws.amazon.com/neptune/latest/userguide/visualization-graph-explorer.html and https://github.com/aws/graph-notebook), there's some overlap with what G.V() offers. I was gonna suggest to hit me up re your budget constraints to see if we can work on something there too!

Dynamic select within query not working.

Any insights or help would be greatly appreciated. I have to pass a list of lists in the format below. Hundreds of them which is why I'm trying to iterate in a single query. Please explain why accessing element 0 within the row data works here:...
Solution:
Sorry it took a while for someone to get to this. I think your problem here is that you are trying to use has(String, Traversal) in __.V().hasLabel('UsdValue').has('date', select('row').limit(local, 1).unfold()), but it doesn't work the way you expect. Basically, the result of the traversal you give to has() is not given as the value to the comparator. More generally, P does not take a Traversal, making any such usage impossible. It is designed to work such that the value of "UsdValue" i...

Adding multiple properties to a vertex using gremlin-go

Hello Community, I have a question regarding how multiple properties can be added to a vertex using gremlin-go. I did something like this ...
Solution:
To add all properties from a map to the same vertex, you can use something like `t := g.AddV("Person"); for k, v := range prop { t = t.Property(k, v) }`...