Apache TinkerPop

Apache TinkerPop is an open source graph computing framework and the home of the Gremlin graph query language.

I am not sure how to use mergeE and mergeV using gremlin_python

I am using JanusGraph with Cassandra for my project and I want to implement upsert functionality. I couldn't find many resources for that, only this gist (https://gist.github.com/spmallette/5cd448f38d5dae832c67d890b576df31). From that, I wrote some code which does work. The code looks like this ``` For vertex/node g.merge_v({T.id: node.id}).option( Merge.on_create, {...

Gremlin-JavaScript Global Websocket

I’m just wondering if there is a reason the gremlin-JavaScript 3.7 branch does not pass options/headers to the globalThis.WebSocket if defined?
Solution:
Here is my attempt at solving this: https://github.com/apache/tinkerpop/pull/2968...

How to dynamically add custom Steps at runtime?

I am maintaining a graph engine service that provides different products. I have found that some complex scenarios cannot get query results through native Steps. Special business logic operations are required to implement custom special syntax through business code. The current custom syntax is done through @GremlinDsl, which is generated at compile time and the code is written in my own service. I hope that this special syntax can be defined by the product itself and pushed to the specified dir...
Solution:
You can use Groovy for all manner of trickery (AST manipulation, basic metaprogramming APIs, etc), but I'd take care with depending on it given the security risks associated with it. TinkerPop has been slowly moving away from it, first with bytecode based requests and more currently with the Gremlin ANTLR grammar. There is a general consensus that the latter will ultimately replace groovy/bytecode going forward into the future. I'm not sure if any of that impacts your decisions on how to maintai...

Graph computer question

In some cases TinkerGraphComputer removes duplicates from input, is this a bug or a feature? For example gremlin> g.V(1,1).count() ==>2 gremlin> g.withComputer().V(1,1).count()...
Solution:
i'd say this is the bug:
gremlin> g.V(1,1).count()
==>2
gremlin> g.V(1,1).count()
==>2
i wonder when that got introduced..........

TypeScript incomplete declaration of Traverser

This is a bit of a small (and probably dumb) question, as I'm new to TypeScript. I'm having trouble compiling my TypeScript code, translating it from JavaScript. I see that toList returns an instance of Promise<Traverse[]>, which is fine and dandy, but it looks like several bits of documentation have get as a method to retrieve a specific object. ``` const { graph: orgDb, client: nClient } = connectNeptune(); const dept_and_div = (await orgDb.V().has('email', "xxx@yyy.com") .inE('manages').outV().valueMap()...

How to create indexes by Label?

In search of performance improvements, the AWS Neptune experts suggested that I create some indexes. To better contextualize, I have 3 operations in a single POST endpoint with the database. A query of previous data bringing the relationships of a specific ID, a deletion of edges if there is a registration in the database and a registration/update of vertices and edges. Today I am trying to attack two problems. Improve the performance of the creation that takes approximately 150ms and improve the performance of the query that is currently bogging down between 1.2-17 seconds.
Is it possible to create an index for vertices and edges by specifying them by label, since I have vertices and edges with different labels that have different properties? Does anyone know what this implementation would look like? In my current implementation I do it in a simple way as follows: ...
Solution:
just by calling g.V().limit(1) with concurrent calls on an r6g.2xlarge machine, the average time is 250ms
How many concurrent calls? An r6g.2xlarge instance has 8 vCPUs (and 16 available query execution threads). If you're issuing more than 16 requests in parallel, any additional concurrent requests will queue (an instance can queue up to 8000 requests). You can see this with the /gremlin/status API (or the %gremlin_status Jupyter magic), which reports the number of executing queries and the number of "accepted" queries. If you need more concurrency, then you'll need to add more vCPUs (either by scaling up or scaling out read replicas).
But in the query mentioned, the bottleneck starts at the stage where it calls the last otherV() before path(). ...
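The concurrency arithmetic in the reply above can be sketched as a small calculation. This is only an illustration: the function names are hypothetical, and the "2x vCPUs" thread count and the 8000-request queue limit are taken from the figures quoted in these threads, not queried from Neptune.

```python
# Hypothetical helpers; the constants come from the figures quoted above.
MAX_QUEUED_REQUESTS = 8000  # queue limit per instance, as stated in the reply


def query_threads(vcpus: int) -> int:
    """Query execution threads on an instance, assumed to be 2 x vCPUs."""
    return 2 * vcpus


def queued_requests(concurrent_requests: int, vcpus: int) -> int:
    """Number of concurrent requests that wait in the queue instead of executing."""
    waiting = max(0, concurrent_requests - query_threads(vcpus))
    return min(waiting, MAX_QUEUED_REQUESTS)


# An r6g.2xlarge has 8 vCPUs.
print(query_threads(8))        # 16
print(queued_requests(40, 8))  # 24
print(queued_requests(10, 8))  # 0
```

With 40 parallel callers on 8 vCPUs, 24 requests would sit in the queue at any moment, which would inflate the observed average latency of even a trivial query.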

Parameterized edges creation in existing graph

Hi, I'm currently experimenting with JanusGraph. My graph is a directed hierarchical graph coming straight from parsing an XML file. After this first bulk load, I want to add multiple new edges between vertices to create shortcuts or remove property duplication. This was easily done using Cypher and a double MATCH, but I struggle to do the same thing in Gremlin. I created a small dataset in Gremlify https://gremlify.com/jf036ue70jj/4 ...
Solution:
You've stumbled upon a common gap in Gremlin... has() steps cannot currently take a traversal as an argument. It's listed as a roadmap item for a future TinkerPop 4.x release: https://github.com/apache/tinkerpop/blob/087b3070914123055d3e4ededc2550f12715a0b4/docs/src/dev/future/index.asciidoc#has-traversal

Neo4j Cypher conversion into a Gremlin query

We are trying to convert a Neo4j Cypher query into a Gremlin query, but we are stuck on some extract methods in the Cypher query that need to be converted. The Cypher query follows: ``` MATCH (from: Person {title: "John"}), (to: Location {title: "New York City"}) MATCH p = (from)-[rel*..5]->(to)...

How to Work with Transactions with Gremlin Python

I'm trying to implement transactions but I have two scenarios. In the first, I start a transaction, but when I use iterate() on every add_v, it saves to my Gremlin Server before the commit. In the second, if I take out the .iterate() and run a commit(), it doesn't save to Gremlin Server. What am I doing wrong?...
Solution:
If you're looking to optimize for write throughput on Neptune, you want to consider the following:
- For each write request, attempt to batch 100-200 "objects" into a single write request/query. An "object" would be any combination of a vertex, edge, or subsequent vertex/edge properties (vertex with 4 properties == 5 "objects").
- Use parallel write requests. If using Python, consider using multiprocessing to create separate processes. They can share a connection pool to Neptune if you so choose. The number of parallel processes should equal the number of query execution threads available on your Neptune writer instance (which is equal to 2x the number of vCPUs on whatever size instance you're using).
If you follow those guidelines, you should get similar performance to what you would see with Neptune's bulk loader. Note that conditional writes will have overhead. If using mergeV(), you're unlikely to see the same write throughput as Neptune's bulk loader as the bulk loader is not doing conditional writes....
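The batching guideline above can be sketched in plain Python. This is a minimal illustration, not Neptune or gremlin_python API: the dictionary layout with a `properties` key and the helper names are assumptions, and actually building and submitting the Gremlin write for each batch is left out.

```python
from typing import Iterable, Iterator, List


def object_count(element: dict) -> int:
    """One object for the vertex or edge itself, plus one per property."""
    return 1 + len(element.get("properties", {}))


def batch_by_objects(elements: Iterable[dict], max_objects: int = 200) -> Iterator[List[dict]]:
    """Group elements so each batch carries at most max_objects "objects"."""
    batch: List[dict] = []
    size = 0
    for element in elements:
        n = object_count(element)
        # Close the current batch if adding this element would exceed the limit.
        if batch and size + n > max_objects:
            yield batch
            batch, size = [], 0
        batch.append(element)
        size += n
    if batch:
        yield batch


# 100 hypothetical vertices with 4 properties each: 5 "objects" per vertex.
vertices = [
    {"label": "person", "properties": {"a": i, "b": i, "c": i, "d": i}}
    for i in range(100)
]
batch_sizes = [len(b) for b in batch_by_objects(vertices)]
print(batch_sizes)  # [40, 40, 20]
```

Each yielded batch would then become one write query, and the batches could be spread across a number of worker processes equal to twice the writer instance's vCPU count, as the reply suggests.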

mergeV with onMerge when extra properties are unknown

I'm in the following situation: ``` jobId = "spark:bdx_job_1" ...

Using java/gremlin inside python with Jpype!

I recently experimented with using Jpype to give the Python world at my day job access to Sqlg. It seems a very easy and powerful way to give Python code full access to any Java API. In my case I am making SqlgGraph available to Python. It is about 5 lines of setup code and voila, the Python code has the same functionality as native Java. Does anyone use Jpype? Any caveats I should know about?...

Structure Test Suite - Test Data Types and Serialization Types Don't Match?

This issue is based on some assumptions I've made and knowledge from my team. Please correct me if any of it is wrong or misguided. An ongoing effort of ours is to better support the structure test suite and to have a more accurate features list for our Graph. Array types are supported by our Graph, and we handle them by using Lists, since when GLVs serialize property values of type array or list they come in as an ArrayList. However, the structure test suite, namely PropertyTest, sends the property value type directly to the graph as int[]{1, 2, 3}, for example, which breaks our Graph since we only expected ArrayList due to the expectation of serialization....
Solution:
From my understanding, the structure tests were suited for embedded graphs, whereas the feature tests should cover the remote cases, which might imply opting out of those tests. However, do note that the remote feature tests might not cover all cases at this point, though there is intention to make it more well-rounded.

What's the significance of done: false ? (after calling .next())

Hi, I've encountered a query that I execute, and it usually never returns "done" false. But in a specific case, it does. I run 2 queries, and sometimes I'm not calling .next() or any terminal steps. ...

Profiling Neptune from javascript

Hi, I'm looking to profile some existing Gremlin queries via Neptune so I can understand the current performance and then optimise. Looking at the documentation, there is an HTTP endpoint at "<endpoint>:<port>/gremlin/profile". I've been able to access this through curl, sending a serialised string....
Solution:
Profiles for Gremlin queries either require using the HTTPS endpoint: https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-profile-api.html Or you can use the AWS SDK: https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/client/neptunedata/command/ExecuteGremlinProfileQueryCommand/ There is no way to get this profile using the Gremlin drivers. ...
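As a language-neutral illustration of the HTTPS option, the request to the profile endpoint can be sketched with Python's standard library. The host name, port 8182, and the helper name are placeholders, and the `{"gremlin": ...}` JSON body is my understanding of the linked documentation rather than something stated in this thread. The sketch only builds the request; if IAM authentication is enabled on the cluster, the request would also need SigV4 signing, which is omitted here.

```python
import json
import urllib.request


def build_profile_request(endpoint: str, port: int, gremlin: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the /gremlin/profile endpoint."""
    url = f"https://{endpoint}:{port}/gremlin/profile"
    # Assumed body shape: the query string under a "gremlin" key.
    body = json.dumps({"gremlin": gremlin}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint and port; substitute your own cluster values.
req = build_profile_request("my-cluster.example.com", 8182, "g.V().limit(1)")
print(req.get_method(), req.full_url)

# Sending it would look like this (requires network access to the cluster):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

The same POST can be issued from JavaScript with any HTTP client, since the profile is not available through the Gremlin drivers.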

select T.id + optional properties

I am trying to work out how to select the vertex id and some optional properties. select().by() does not work, as it filters out non-productive properties. Here is the sample graph I am testing this on. ...

Is there a way to specify a query execution timeout via the GremlinLangScriptEngine?

I'm adding a way to specify a query timeout when running queries via G.V(). On the G.V() playground which uses TinkerGraph internally, we "submit" queries directly to the in-memory graph via the GremlinLangScriptEngine. Is there an equivalent of adding a timeout as seen in the Client object via the RequestOptions?
Solution:
no, it's just like standard ScriptEngine implementations in that it operates in the current thread without interrupt. we'd wrapped the GremlinScriptEngine up into the GremlinExecutor to try to generalize behavior for timeouts and Future based execution. you would have to use that class to get that sort of behavior and avoid direct use of the GremlinLangScriptEngine.

What algorithms exist for this hypergraph data structure?

This is very minimal, but it hints at a type of ontology structure and software system I want to develop. Does it remind you of any known, studied data structures and algorithms? ```python ontology = set()...

Basic vertex querying does not work in Amazon Neptune but it works with local Gremlin Server

``` const fankode : any = await this.gremlinService.readClientSource .V( profileId ) .hasLabel( 'FAN' ) .next();...
Solution:
The gremlin-javascript driver deserializes the elementMap() step into the Map class. await this.gremlinService.readClientSource.V().elementMap().toList() will return an Array of Maps. JSON.stringify() , which NestJS is likely calling for you, doesn't support Maps so you need to convert them into objects using something like Object.fromEntries()....

CollectingBarrierStep bug

Solution:
Overriding this function seems to fix ``` @Override public Traverser.Admin<Vertex> processNextStart() {...

pymogwai

https://github.com/juupje/pyMogwai is an attempt at a Python-native implementation of the Gremlin query language. There is a demo at: https://mogwai.bitplan.com/ Comments, issues and feedback are welcome!...