Apache TinkerPop

Apache TinkerPop is an open source graph computing framework and the home of the Gremlin graph query language.

RN 2/15/2024

Is it possible to walk 2 different graphs using custom TraversalStrategy in Gremlin?

I have two different graphs in two different Neptune clusters. Each can contain a few reference vertices that refer to a vertex in the other graph. For example, as we walk through graph A and reach a reference vertex (referring to a vertex in graph B), we should be able to keep traversing normally inside graph B and get the results of the query. Basically, graph A + graph B should act as a single virtual graph.
Solution:
At this time, there is no easy way to do this, and I don't think a custom TraversalStrategy would help in any way I can imagine. The closest thing I can think of would be to subgraph the two graphs with their reference vertices, merge them into a single TinkerGraph in your application, and then run additional queries on that directly. I'm not sure that suits a lot of the use cases we hear about in relation to this feature, though, so that suggestion may not be helpful. cc/ @Dave Bechberg...
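The merge idea in this answer can be sketched in plain Python, with dictionaries standing in for the two subgraphs and for the merged TinkerGraph. The vertex ids, the `merge_graphs` and `reachable` helpers, and the convention that a reference vertex carries the same id as the vertex it points to are all assumptions made for illustration; none of this is Neptune or TinkerPop API.

```python
from collections import deque


def merge_graphs(*graphs):
    """Union several adjacency maps (vertex id -> set of neighbour ids)."""
    merged = {}
    for graph in graphs:
        for vertex, neighbours in graph.items():
            merged.setdefault(vertex, set()).update(neighbours)
            # Make sure every neighbour also exists as a key in the result.
            for neighbour in neighbours:
                merged.setdefault(neighbour, set())
    return merged


def reachable(graph, start):
    """Breadth-first walk returning every vertex id reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbour in graph.get(vertex, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Hypothetical data: "b1" is a reference vertex in graph A that shares
# its id with a real vertex in graph B (an assumption of this sketch).
graph_a = {"a1": {"a2"}, "a2": {"b1"}}
graph_b = {"b1": {"b2"}, "b2": set()}
virtual = merge_graphs(graph_a, graph_b)
```

Walking `graph_a` alone stops at the reference vertex, while walking `virtual` continues into the vertices that came from graph B. In a real application the two inputs would be subgraphs pulled from each cluster, and the merge would have to decide how reference vertices map onto their targets.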
Lonnie VanZandt 2/14/2024

SideEffect a variable, Use it later after BarrierStep?

I seek a query that builds a list and then needs to both sum the list's mapped values and divide the resulting sum by the count of the original list. This would be the mean() step - if the mapped list was still a Gremlin traversal object that offered mean(). However, the mapped list is, by that time, a Groovy list and mean() is no longer available....
Solution:
It can be done with all Gremlin steps in 3.7.1 where you have date functions available. Assuming: ```groovy g.addV().as('a'). addE('link').from('a').to('a').property('createdDate', '2023-01-01T00:00:00Z'). addE('link').from('a').to('a').property('createdDate', '2023-01-01T00:30:00Z')....
M. alhaddad 2/14/2024

Memory issue on repeat

I am traversing all nodes occurring in the same cluster, given one of the nodes in that cluster. Surprisingly, beyond a certain depth limit I'm getting memory issues, as shown in the image: Engine is Neptune 1.2.1.0...
Solution:
If using a t3.medium or t4g.medium instance, the amount of memory available for a query execution thread is very constrained. Memory allocation per thread increases as you go up in instance size until you get to the 4xlarge or 8xlarge sizes (at which point, memory allocation is at maximum per thread).
funki 2/12/2024

Which database should i use for my DJ set planning software?

Hi, I want to develop software that lets DJs plan a set (i.e. a playlist), and I'm wondering whether graph databases are the right way to go and, if so, which one to pick. The workflow is that the software automatically reads all tracks from the DJ software on the computer, and the DJ has to tell the software which tracks mix well together. These links are called "transitions" and contain some data such as difficulty grade, quality rating and notes. The software should then assist DJs when selecting tracks (or specifically the next track) for a playlist by considering the "non-repeating transition depth" of a given track....
Solution:
Sorry we missed this question. I think I could see how a graph could fit here. It seems like a sensible use case. As for the graph database to choose, I tend to almost always suggest that if you are new to graphs you should just start with TinkerGraph. It will help you get started with the least amount of pain. Once you understand it, learn some Gremlin, and get to know the features and capabilities of other graph databases, you can make the switch. For the most part, you should be able to make...
austinjb32 2/12/2024

How do I add unique values to vertex or edge properties in Neptune?

I can't find any documentation on adding unique data through Gremlin. Is there any way to do it, other than the built-in uniqueness that is available only on the id field?
Solution:
By "unique data", I can't help wondering whether you're looking for some mechanism to define constraints on a property key. If so, there is no such feature in Neptune. You would have to contrive some system for ensuring uniqueness on your own. There are graphs with full schema support, like JanusGraph, that can use an index to enforce uniqueness: https://docs.janusgraph.org/schema/index-management/index-performance/#index-uniqueness
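One way to picture "ensuring uniqueness on your own" is an application-side registry that is consulted before each write. The sketch below is plain Python; the `UniquePropertyIndex` class and its `claim` method are hypothetical names, and an in-process dictionary like this does not protect against concurrent writers in other processes.

```python
class UniquePropertyIndex:
    """Tracks which values of one property key are already taken."""

    def __init__(self):
        # Maps a property value to the id of the vertex that owns it.
        self._owner_by_value = {}

    def claim(self, value, vertex_id):
        """Reserve value for vertex_id; raise if another vertex holds it."""
        owner = self._owner_by_value.setdefault(value, vertex_id)
        if owner != vertex_id:
            raise ValueError(f"{value!r} is already used by vertex {owner!r}")
        return vertex_id
```

The application would call `claim` before writing the property to the graph and skip the write if it raises. Claiming the same value again for the same vertex is allowed, so retries stay harmless.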
austinjb32 2/9/2024

Not getting result in hasId() but id().is() works

I don't get any response using g.V().hasId(48), but when I use g.V().id().is(48) it shows output. So how should I use hasId()? I'm a beginner and don't have much experience with it.
Solution:
These are different queries. g.V().hasId(48), or equivalently g.V(48), should return the Vertex with id == 48. g.V().id().is(48) returns 48 if a Vertex with id == 48 is present. A great resource for getting started with Gremlin is https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html...
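The distinction drawn in this answer (one query yields vertices, the other yields id values) can be modelled in plain Python. This is only an analogy: the `vertices` list and the `has_id` and `id_is` helpers are hypothetical and do not call any Gremlin driver.

```python
# Hypothetical vertices: each is a dict with an "id" and a "label".
vertices = [{"id": 48, "label": "person"}, {"id": 49, "label": "person"}]


def has_id(vs, wanted):
    """Filter: keep whole vertices whose id matches, like g.V().hasId(wanted)."""
    return [v for v in vs if v["id"] == wanted]


def id_is(vs, wanted):
    """Map to ids, then filter the ids, like g.V().id().is(wanted)."""
    ids = (v["id"] for v in vs)
    return [i for i in ids if i == wanted]
```

In this model `has_id` hands back the matching vertex itself, while `id_is` hands back only the number 48, which mirrors the difference in result type described above.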
Paule96 2/9/2024

dotnet `Enumeration has not started. Call MoveNext` if I try to enumerate over a result

I recently tried to use Gremlin to create a graph and query it. So far I have managed to push data into the graph. 🎉 But now my problem is getting data back out of the graph. What does work is something like this: ```csharp...
Solution:
You're on .NET 8, right? This is unfortunately a known issue which will be fixed in the next release: https://issues.apache.org/jira/browse/TINKERPOP-3029 I'm afraid you'll probably have to use .NET 7 until the next release...
austinjb32 2/9/2024

I can't create an edge in AWS Neptune using Gremlin. I can create vertices, but not edges.

import { driver, process as gremlinProcess, structure } from "gremlin"; async function checkOut() { const DriverRemoteConnection = driver.DriverRemoteConnection; const Graph = structure.Graph;...
criminosis 2/5/2024

Iterating over responses

I've got a query akin to this in a python application using gremlin-python: ``` t = traversal().with_remote(DriverRemoteConnection(url, "g")) \ .V().has("some_vertex", "some_property", "foo").values()...
Solution:
It is the batch size sent to the client. The server just doesn't wait for the client to tell it to send the next batch. The purpose of the batch is to control the rough size of each response; otherwise you could end up in a situation where the server is serializing too much in memory or sending responses that exceed the max content length for a response.
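The size-bounding role of the batch described here can be illustrated with a small generator. This is a plain-Python sketch, not the Gremlin Server implementation; the `batches` helper is a hypothetical name, and it models only how a result stream is cut into bounded chunks, not the network push to the client.

```python
from itertools import islice


def batches(results, batch_size):
    """Yield successive lists of at most batch_size items from an iterable."""
    iterator = iter(results)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the result stream is exhausted
        yield batch
```

Each yielded list corresponds to one response message, so no single message holds more than `batch_size` results regardless of how large the full result is.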
Haieketoun 2/1/2024

AWS Neptune updating gremlin driver to 3.6.2 introduced many bugs to working queries

After updating the Amazon Neptune engine version from 1.2.0.2 to 1.2.1.0 and the Gremlin.Net (C# NuGet) driver from 3.5.2 to 3.6.2, queries suddenly started throwing exceptions, specifically exceptions about serialization errors. To pinpoint the cause, I downgraded the Gremlin.Net driver to 3.5.2 while leaving the engine version at 1.2.1.0, and the queries started working as before. The problem is that, according to the AWS documentation, the minimum required Gremlin.Net version is 3.6.2. Will there be a problem keeping the Gremlin.Net driver at version 3.5.2? Will there be any side effects?...
mle 1/31/2024

vertex-label-with-given-name-does-not-exist ERROR with JanusGraph 0.5.3

I get the vertex-label-with-given-name-does-not-exist error with JanusGraph 0.5.3 while adding labels to vertices. I get this error only when I enable storage.batch-loading=true. My schema.default is still set to default when checking from the Gremlin Console. mgmt.get("schema.default") => default...
Solution:
Any reason why you need automatic schema creation (schema.default=none)? This feature is mainly intended for cases where you just want to try out JanusGraph and don't want to bother with creating a schema. But it's not intended for production use cases. The docs also discourage its usage in general: ...
criminosis 1/26/2024

Documentation states there should be a mid-traversal .E() step?

Just wondering if I'm missing something, or if the docs are mistaken. It's possible to do a mid-traversal .V() step. But there seems to be a possible copy-paste error in the TinkerPop docs, which assert that a similar capability exists for an .E() step? https://tinkerpop.apache.org/docs/current/reference/#e-step ```...
Solution:
Mid-traversal E() was added to TinkerPop in version 3.7.0. JanusGraph v1.0 is based on TinkerPop 3.7.0, so it has to support it.
Lyndon 1/19/2024

Disabling strategies via string in remote driver

Is there a way to disable a strategy in a provider's implementation without a reference to the class? For example, let's say StrategyA is in the provider's implementation and I am in Python without access to it. Is there no way to do g.withoutStrategies("com.provider.strategies.StrategyA").V().<etc>()?...
Solution:
In Java, I think you can use TraversalStrategyProxy directly inside of withStrategies(), but there is nothing analogous for withoutStrategies(). We probably should have a better way to do both of these things in the Gremlin language, which really doesn't have a notion of classes and such.
cdegroc 1/17/2024

LazyBarrierStrategy/NoOpBarrierStep incompatible with path-tracking

👋🏻 Hi all! In this JanusGraph post (https://discord.com/channels/981533699378135051/1195313165278388334/1195313165278388334), we were investigating if TreeStep could be used jointly with bulked traversers so as to improve traversal time. Based on answers there, it looks like TinkerPop's LazyBarrierStrategy explicitly excludes "path-tracking" traversals (https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/LazyBarrierStrategy.java#L85) and won't insert NoOpBarrierSteps in those cases, preventing us from bulking traversers....
RN 1/11/2024

Is there a way to store the TinkerPop graph in DynamoDB?

AWS provides the Neptune graph database, but the problem with it is that it is not distributed and can't be horizontally scaled like DGraph etc. So I was wondering, as DynamoDB is a distributed database, whether there is a way to store a TinkerPop graph in DynamoDB directly?
Solution:
TinkerPop, in general, can be designed to use nearly any back-end. You just need a storage plugin for it. To make it performant, it would also require overriding many of the underlying query execution operators to make sure they fetch data from the DynamoDB table(s) efficiently. TinkerGraph is a reference implementation of this, where the storage medium is in-memory hashmaps for both vertices and edges. In practice, most people start by reviewing the code for TinkerGraph when creating support for other storage mediums. Once upon a time there was an implementation of TinkerPop called Titan (which later became the basis of DSE Graph) that had a storage plugin that worked with DynamoDB. Someone later forked it and added the same support for JanusGraph (another TinkerPop implementation). The plugin is still out there, but hasn't been supported/maintained. https://github.com/amazon-archives/dynamodb-janusgraph-storage-backend JanusGraph itself supports a Cassandra backend. We have seen a few folks attempt to use JanusGraph with Amazon Keyspaces (for Apache Cassandra). ...
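To make the "in-memory hashmaps for both vertices and edges" description concrete, here is a toy store in plain Python. It is not TinkerGraph's code or API; the `InMemoryGraph` class and its method names are hypothetical, chosen only to show where a storage plugin for another backend would swap in its own reads and writes.

```python
class InMemoryGraph:
    """Toy graph store: one dict for vertices, one dict for edges."""

    def __init__(self):
        self.vertices = {}  # vertex id -> property dict
        self.edges = {}     # edge id -> (out vertex id, label, in vertex id)

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = dict(properties)

    def add_edge(self, edge_id, out_v, label, in_v):
        if out_v not in self.vertices or in_v not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges[edge_id] = (out_v, label, in_v)

    def out(self, vertex_id, label=None):
        """Ids of vertices reached by outgoing edges, optionally by label."""
        # Linear scan over all edges: fine for a toy, but this is the kind
        # of operation a real backend would replace with an indexed lookup.
        return [
            in_v
            for (out_v, edge_label, in_v) in self.edges.values()
            if out_v == vertex_id and (label is None or edge_label == label)
        ]
```

A different storage medium would keep the same three operations but back them with table reads and writes, which is where the efficiency concerns mentioned in the answer come in.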
borgirer 1/10/2024

Connection to Azure cosmos db using Go

Hi all, asking this as a newbie to graph databases in general. I have been trying to connect to an Azure Cosmos graph database using the Apache TinkerPop SDK for Golang, but am unable to proceed because I can't get past the websocket 1011 error while trying to execute Gremlin queries. Any help would be appreciated....
Solution:
Ah OK - yes. Cosmos DB does not support Gremlin bytecode. It might be worth looking at this documentation: https://learn.microsoft.com/en-us/azure/cosmos-db/gremlin/support They also seem to document that newer Gremlin clients will not work with CosmosDB. I suppose all you can really do is try it. Queries will need to be submitted using the Client.Submit.... type of approach though given "byte code" is not supported....
Lonnie VanZandt 1/9/2024

I met a man with seven wives, each of which had seven sacks.

I met a man with seven wives, each of which had seven sacks. Now, suppose I have a shipping container that can hold up to 500 items and I need to inform a number of men that they and their families can board my ship because I know that all the items in their families' sacks will fit in the container. There may be a few empty spaces, but I can't tell a man that he and his family can board if any of their items would overflow the container. How do I construct a query which selects men as long as all the items in the 7 sacks of their 7 wives will fit? Here's the challenge: I don't know how many items are in each sack until the family is considered. I ask the men and their families to line up and then I board men until the container is nearly full or exactly full and any of the items of the next family would assuredly not fit. Let's say we have Vertices for Item, Sack, Wife, and Man and Relations marriedTo, hasSack, and hasItem. Let's say we rank order Man by Lastname....
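The boarding rule in this question can be stated as a running total that stops at the first family that would overflow the container. The sketch below is plain Python rather than a Gremlin query; the `board` function, the tuple layout, and the choice to stop at the first non-fitting family (rather than skipping it and trying later ones) are assumptions about what the post intends.

```python
def board(families, capacity=500):
    """Admit families in order until the next one would overflow capacity.

    families: list of (last_name, item_count) pairs, already sorted by
    last name; item_count is the total across all sacks of all wives.
    Returns (admitted last names, items used).
    """
    admitted, used = [], 0
    for last_name, item_count in families:
        if used + item_count > capacity:
            break  # stop at the first family that does not fit
        admitted.append(last_name)
        used += item_count
    return admitted, used
```

Because each family's item count is only needed when that family reaches the front of the line, the loop never looks further ahead than the first family that fails to fit.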
Lonnie VanZandt 1/7/2024

May I suggest a new topic-channel for us? Like "really-big-data" or "pagination"?

Related to https://discord.com/channels/838910279550238720/1100527694342520963/1100853192922759244 and having read the recommended links on how to paginate the end of a query, I am wondering about how to manage large sets of traversals and large side-effect-collected collections which a query might be encountering or constructing as the graph is visited when the paths offer relatively large datasets after having been wisely filtered. For example, what is advised if one really does need to group by first-name all the followers of Taylor Swift (i.e. some exemplary uber-set) and wants to bag that for a later phase of a query which isn't the final collection that will be consumed by some external REST client? Yes, the final collect step can be easily paginated as advised - but what about all that earlier processing? What should we be thinking when we anticipate having 500,000, or 10x this, traversals heading into a group by - by - bye! barrier / collecting stage? Other than, "Punt" or "Run away!"?...
Solution:
There is no feature in Gremlin that will handle this for you automatically, but the drivers do let you stream back results instead of collecting them all at once, which can help mitigate the cost of transferring large result sets. If you are using Amazon Neptune, it also has a query results cache to assist with paging: https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-results-cache.html#gremlin-results-cache-paginating
salman_walmart 1/3/2024

Integration tests for AWS Neptune DB

Do we have any Testcontainers for AWS Neptune for writing integration tests in Java applications?
Bo 1/2/2024

G.V() IDE can't visualize path().by(valueMap()) query

Hi @G.V() - Gremlin IDE (Arthur), sorry if this is a duplicate question. I am playing around with the G.V() IDE and ran into a problem: if I run a path() query, the G.V() IDE can properly display the nodes and the edges, but since a path query itself does not return properties, G.V() cannot display the properties. If I add .by(valueMap()) to the end of the query, the Gremlin query result includes both the path and properties, but the G.V() IDE cannot visualize it. Is this a known problem? Thx!