AT
Apache TinkerPopLonnie VanZandt

SideEffect a variable, Use it later after BarrierStep?

I seek a query that builds a list and then needs to both sum the list's mapped values and divide the resulting sum by the count of the original list. This would be the mean() step - if the mapped list was still a Gremlin traversal object that offered mean(). However, the mapped list is, by that time, a Groovy list and mean() is no longer available. A groovy list can be averaged by reducing it to it sum and then dividing by the count of the list - but to get the count of the list, a separate reference to the list is needed or the count has to have been cached before. Therefore, is there a way to begin a query, sideEffect a value into a Groovy variable, complete the query, pass through a barrier step, and then divide the saved count?
g.E().
has('LABEL', 'PROP', 'MYPROPVAL').
values('createdDate').
order().by(Order.asc).
// would like to sideEffect cache the count of this set here
map{ ( LocalDateTime.parse( it.get(),
java.time.format.DateTimeFormatter.ISO_LOCAL_DATE_TIME ).atZone( ZoneId.systemDefault() ).toInstant().toEpochMilli() ) }.
toList().
collate(2, true).
collect{ it -> ( it[1]-it[0] ) / 1000 }.
inject( 0 ) { accum,it -> accum + it }
// would like to divide the BigDecimal here by half the count
g.E().
has('LABEL', 'PROP', 'MYPROPVAL').
values('createdDate').
order().by(Order.asc).
// would like to sideEffect cache the count of this set here
map{ ( LocalDateTime.parse( it.get(),
java.time.format.DateTimeFormatter.ISO_LOCAL_DATE_TIME ).atZone( ZoneId.systemDefault() ).toInstant().toEpochMilli() ) }.
toList().
collate(2, true).
collect{ it -> ( it[1]-it[0] ) / 1000 }.
inject( 0 ) { accum,it -> accum + it }
// would like to divide the BigDecimal here by half the count
Solution:
It can be done with all Gremlin steps in 3.7.1 where you have date functions available. Assuming: ```groovy g.addV().as('a'). addE('link').from('a').to('a').property('createdDate', '2023-01-01T00:00:00Z'). addE('link').from('a').to('a').property('createdDate', '2023-01-01T00:30:00Z')....
Lonnie VanZandt
Lonnie VanZandt95d ago
If there is a way to have Gremlin traverse a collection and to operate on adjacent pairs (to get the delta between creations) without having to fall into Groovy, then I could stay in Gremlin and use mean() at the end of that process, directly. I think I could "slide" a sideEffect "cursor" as the traversal gets collected to perform the map() in-line - but it wasn't as pretty as Groovy collate.
Solution
spmallette
spmallette94d ago
It can be done with all Gremlin steps in 3.7.1 where you have date functions available. Assuming:
g.addV().as('a').
addE('link').from('a').to('a').property('createdDate', '2023-01-01T00:00:00Z').
addE('link').from('a').to('a').property('createdDate', '2023-01-01T00:30:00Z').
addE('link').from('a').to('a').property('createdDate', '2023-01-01T01:00:00Z').
addE('link').from('a').to('a').property('createdDate', '2023-01-01T01:30:00Z').iterate()
g.addV().as('a').
addE('link').from('a').to('a').property('createdDate', '2023-01-01T00:00:00Z').
addE('link').from('a').to('a').property('createdDate', '2023-01-01T00:30:00Z').
addE('link').from('a').to('a').property('createdDate', '2023-01-01T01:00:00Z').
addE('link').from('a').to('a').property('createdDate', '2023-01-01T01:30:00Z').iterate()
You can use the "partition" pattern i came up with for a talk I have a long time ago about DSLs. It effectively does the same as a Groovy collate(). Combining that with the neat new date functions I think this gives you the ability to do the mean() you want at the end:
gremlin> g.E().values('createdDate').asDate().fold().
......1> emit().
......2> until(__.not(unfold())).
......3> repeat(skip(local,2)).
......4> filter(unfold()).
......5> limit(local,2).as('dts').limit(local,1).
......6> dateDiff(select('dts').skip(local,1).unfold()).
......7> mean()
==>-1800.0
gremlin> g.E().values('createdDate').asDate().fold().
......1> emit().
......2> until(__.not(unfold())).
......3> repeat(skip(local,2)).
......4> filter(unfold()).
......5> limit(local,2).as('dts').limit(local,1).
......6> dateDiff(select('dts').skip(local,1).unfold()).
......7> mean()
==>-1800.0
Here's a link to the slides for the talk called "Extending Gremlin with Foundational Steps" in case you're interested: https://www.slideshare.net/StephenMallette/extending-gremlin-with-foundational-steps
SlideShare
Extending Gremlin with Foundational Steps
Extending Gremlin with Foundational Steps - Download as a PDF or view online for free
Lonnie VanZandt
Lonnie VanZandt94d ago
Whether or not it works, I love the opportunity to study the pattern. To try to understand fold.emit.until(__.not(unfold()).repeat(skip(local,2) Also, I have to work with older Tinkerpop. The date methods would be nice.
spmallette
spmallette94d ago
nice to see you appreciate some of the finer points of Gremlin :gremlin: 🙂
Lonnie VanZandt
Lonnie VanZandt94d ago
I do have this sense that "local" is like some magical token that can be dropped into more formal settings to solve challenges that would otherwise take substantially more method calls. I might have chosen "doIntendedVsDocumented" as the string for that keyword. I have to go study your query to really grok it.
spmallette
spmallette94d ago
well, local does spare you an unfold() at minimum which is great when you want to preserve a list structure. my example doesn't preserve it but hopefully you get the idea:
gremlin> g.inject([1,2],[3,4]).map(unfold().sum())
==>3
==>7
gremlin> g.inject([1,2],[3,4]).sum(local)
==>3
==>7
gremlin> g.inject([1,2],[3,4]).map(unfold().sum())
==>3
==>7
gremlin> g.inject([1,2],[3,4]).sum(local)
==>3
==>7
spmallette
spmallette94d ago
also, we've overloaded the word "local" a bit more than we should have perhaps. don't confuse local() step with Scope.local. it's worth noting that local() step has some interesting properties: https://stephen.genoprime.com/snippet/2020/04/25/snippet-10.html and https://stephen.genoprime.com/snippet/2023/04/13/snippet-15.html
stephen mallette
Use of local()
One of the more misunderstood, misused or simply unknown parts of the Gremlin language is local()-step. Its purpose is to execute a child traversal on a single element within the stream. In the following example limit(2) is applied to the stream in a global fashion and thus only two results are returned:
stephen mallette
Use of local() Revisited
In recent weeks there have been many questions about the local()-step both in Discord and in TinkerPop’s JIRA. I’d written on this topic once before in Use of local() but it perhaps begs a further revisit. The example that arose from Discord helped inspire this post as it was sufficiently primitive to hopefully clarify usage.
Lonnie VanZandt
Lonnie VanZandt94d ago
Ok, it's a functional beauty. // fetch the sequence of date values and sort them chronologically from earliest to most recent g.E().values('createdDate').asDate().order().by( Order.asc ) // wrap the sequence as a single collection .fold(). // generate a sequence of shortening collections, each collection will be two elements shorter than the prior list emit(). // each iteration, yield the contextual collection until(__.not(unfold())). // just keep going until we run out into the empty set repeat(skip(local,2)). // each iteration, make sure to chop off the head and the penultimate head // remove the last collection that is the empty set filter(unfold()). // now, for each collection in the sequence of collections // fetch the head and penultimate head, memoize the first as dts and step forward one limit(local,2).as('dts').limit(local,1). // given the current date entry, diff it from the prior one excluding the one after it dateDiff(select('dts').skip(local,1).unfold()). // we now have a sequence of deltas, compute their mean mean() I'm intrigued. When this plays out in real time, do we actually see it resolve down to finding a set of dates and then picking from that one set, a pair at a time, to contribute to an accumulating sum? Or, no, we would see it making numerous copies of large (but shrinking) collections over and over only to shed most of the list to obtain the leading pair?
spmallette
spmallette93d ago
I think this better fits the second scenario you describe. You make one large list after the first fold() and then you have the semantics of a do/while with the repeat() where that big list is being winnowed down 2 items at a time, so that when you do limit(local, 2) you are only grabbing the leading pair. not spectacularly efficient, but it was nice for my demonstration in my talk and for cases where you don't have a massive list to partition it could work reasonably well. As I was trying to answer your question I tried to see if It could be done without the initial fold() because I think I did that purposefully for the context of my talk all those years ago. In your case, you don't really need it. I spent some more time thinking on it this morning and can't really come up with a way. I think that in Gremlin collate() (or as i called it in my talk partition() step), I feel like we would probably call it take() that would let you do:
gremlin> g.inject(1,2,3,4).take(2)
==>[1,2]
-->[3,4]
gremlin> g.inject(1,2,3,4).take(2)
==>[1,2]
-->[3,4]
It feels like limit() but that's a filter step so when you limit(2) it just chops out the rest of the stream. take() would let you process the whole stream in chunks. @Kelvin Lawrence do you have any tricky ways to do a "take()" as shown above beyond my earlier pattern that I described with fold()...repeat()? i suppose a Scope.local version of take() would probably exist too, though that sounds weird. partition() or collate() seems to make more sense in a List context. i guess you could "partition" or "collate" a stream too. eh...i dunno: https://issues.apache.org/jira/browse/TINKERPOP-3057
Lonnie VanZandt
Lonnie VanZandt93d ago
Stephen, I think this is not right. And it is my fault for how I proposed the setup. I want to compute the difference in time between each date in the set from its neighbor - and not compute the differences between pairs that form a set of pairs. For example, with A..B..C..D, I want 3 differences, not 2. That is, B-A, C-B, and D-C. Which means, btw, I was pretty dim. The average difference in a long partially ordered sequence like that is just (max - min)/num. Here's that query:
g.E().
has('Relation', 'container', 'MYVAL').
values('createdDate').
map{ ( LocalDateTime.parse( it.get(),
java.time.format.DateTimeFormatter.ISO_LOCAL_DATE_TIME ).atZone( ZoneId.systemDefault() ).toInstant().toEpochMilli() ) }.
order().by(Order.asc).
fold().as( 'list' ).
project( 'oldest', 'newest', 'count' ).by( limit(local, 1) ).by( tail(local, 1) ).by( count(local) ).select( values ).
map{ it -> { t -> ( t[1] - t[0] ) /1000 /t[2] }( it.get() ) }
g.E().
has('Relation', 'container', 'MYVAL').
values('createdDate').
map{ ( LocalDateTime.parse( it.get(),
java.time.format.DateTimeFormatter.ISO_LOCAL_DATE_TIME ).atZone( ZoneId.systemDefault() ).toInstant().toEpochMilli() ) }.
order().by(Order.asc).
fold().as( 'list' ).
project( 'oldest', 'newest', 'count' ).by( limit(local, 1) ).by( tail(local, 1) ).by( count(local) ).select( values ).
map{ it -> { t -> ( t[1] - t[0] ) /1000 /t[2] }( it.get() ) }
I'm sure that the projection and the reduction could be more Gremlin-idiomatic - but it is hard to diagnose why my "by" clauses are not tolerated by the engine. Getting the right class of object to those is not always intuitive. I can see that I resort to "aw heck, give me the collection as a Groovy/Java list and I can revert to that language to do what I want". Related, yes, Scala has a take(n) concept. BTW, does or could Tinkerpop map() come to accept by() clauses such that we could feed multiple arguments into the map lambda? For pseudo-example: something().map{ x,y,z -> x+y+z }.by( method-to-get-x ).by( method-to-get-y ).by( method-to-get-z ) The idea is very similar to how "math()" is implemented. The difference is that "math" is opinionated about the transformation. It is going to take parameterized inputs and try to construct a valid mathematical expression and then compute the result. Map-By-By would allow the users to construct their own parameterized transformations. Without the anonymous applied lamba trick that I use (see above where I subtract and then divide given an incoming member instance).
spmallette
spmallette93d ago
it's not quite the same thing. the number of arguments to the closure need to be understood at a compile time. not at run time. besides i think that would violate the intention of map() which is a 1:1 transformation
Lonnie VanZandt
Lonnie VanZandt93d ago
my intent is still 1-to-1 (many to 1 would be a reduce operation). In the example query above, it only yields 1 result - like a reduction - because the stream feeding into that map has been constructed to yield only 1 member. A 1:1 map with a multiple-argument closure is very useful - and is often easier to analyze than is a single-argument closure that happens to include external state via variable reference where the variables are conveniently available in the closure's context. The number of arguments would be explicit - just as they are for "math". The author must provide a by clause for each parameter - or it must be very intuitive how Gremlin would dynamically resolve "implicit" parameters from arbitrary queries. (Scala loves these nasty implicits.)
spmallette
spmallette93d ago
maybe i'm not following what you're saying exactly, but at least in Java, the map() step signature for a closure is map​(Function<Traverser<E>,​E2> function). To do two arguments you would also define some sort of BiFunction signature. For three arguments a TriFunction and so on. I don't see how n number of by() clauses as you describe can work with this model. map() is meant to apply a map function to the current Traverser object, so I'm not even sure what the second, third, etc argument would be. fwiw, TinkerPop doesn't really promote the use of closures/lambdas. They don't work for every programming language or for every graph database. They also bring performance and security problems for certain cases that relate to Gremlin Server. Given those drawbacks, I don't think we'd be looking to expand their capabilities all that much.
Lonnie VanZandt
Lonnie VanZandt91d ago
I agree with you on how it might look for Java. Because Gremlin's math step is very similar to what I intend, I ask how multi-variate mathematical steps with by clauses are done in Gremlin. Why is it not apparent how convenient it would be to perform an arbitrary transformation to each element of a sequence where the arguments of the applied function include not only the current element but one or more additional arguments which are related to the current element? I appreciate the Beauty of a language but I am not adverse to be practical and to use language features when they exist. Gremlin supports Groovy lambdas and there are times when they get the job done. Sometimes, getting an answer at all is better than not being able to get an answer because it wasn't optimized for performance. I would argue against removing lambda support from Gremlin, simply because performance stinks at times. the secondary arguments would all be the result of traversals which begin at the traversal arriving at the map-by-by step, where each secondary traversal is determined by the corresponding by clause. One could, of course, provide constant() or cap() values.
spmallette
spmallette90d ago
I ask how multi-variate mathematical steps with by clauses are done in Gremlin
no by() operations in Gremlin produce a syntax where there the number of actual arguments to the closure change. math() is the same as map() - it is effectively taking the current Traverser<E> and returning E2 just like map(). The MathStep.java implementation even extends MapStep.java to implement what it does. The by() steps merely apply transformations to the Traverser argument to produce different "variables". These variables for math() are then evaluated through a math expression library to produce the output so you can dynamically determine the number of arguments at run time. I don't see how it is possible to do that for map() in the same way because it requires a compile time closure definition for an argument. i suppose you could add some builtin versions like map(BiFunction), map(TriFunction), etc. to some top number of arguments, but that would limit the number of by() you could have and isn't the same as what you've been talking about with math() from my understanding. Personally, I don't favor those overloads. If the developer has chosen to use a closure and map() because Gremlin couldn't do quite what they wanted, then they have already chosen a non-Gremlin way to handle a task. With the entire JVM available to them for processing their data in that closure, I don't feel like they need any extra help from Gremlin at that point to use that closure.
I would argue against removing lambda support from Gremlin
i wouldn't be in favor of removing them completely, just from languages that don't support them natively, like Python, Go, Javascript and C#. I'd also think it smart to remove them as a default for remote scripting in Gremlin Server.
Lonnie VanZandt
Lonnie VanZandt90d ago
This is precisely what I understand: The MathStep.java implementation even extends MapStep.java to implement what it does. The by() steps merely apply transformations to the Traverser argument to produce different "variables". What I ask is, how about making that generally available to MapStep itself? Let the author of the query decide whether or not that is business useful. It is true that the unpacking of the Traversal, the "it", can be done instead the lambda/closure - but sometimes it is more convenient to use the Gremlin language and method to do that, immediately before entering the Groovy/Java context.
Want results from more Discord servers?
Add your server
More Posts
Memory issue on repeatI am traversing all nodes occuring in the same cluster given one of the nodes in that cluster. SurpWhich database should i use for my DJ set planning software?Hi, i want to develop a software that lets DJs plan a set (i.e. playlist) and i'm wondering if graphHow will i add unique values to the vertices or edge properties in NeptuneI can't get a doc regarding adding unique data through gremlin. Is there any way to do it, other thaNot getting result in hasId() but id().is() worksI don't get any response using g.V().hasId(48). But when i use g.V().id().is(48). it shows output. Sdotnet `Enumeration has not started. Call MoveNext` if I try to enumerate over a resultI recently try to use gremlin to created a graph and query this graph. Currently I get it working toI can't create an edge in aws neptune using gremlin. I can create vertices. but not edge.import { driver, process as gremlinProcess, structure } from "gremlin"; async function checkOut() {Iterating over responsesI've got a query akin to this in a python application using `gremlin-python`: ``` t = traversal().wAWS Neptune updating gremlin driver to 3.6.2 introduced many bugs to working queriesAfter updating Amazon Neptune engine version from 1.2.0.2 to 1.2.1.0 and the Gremlin.Net (C# nuget) vertex-label-with-given-name-does-not-existERROR with Janusgraph 0.5.3vertex-label-with-given-name-does-not-exist ERROR with Janusgraph 0.5.3 while adding labels to vertiDocumentation states there should be a mid-traversal .E() step?Just wondering if I'm missing something, or if the docs are mistaken. It's possible to do a mid-travDisabling strategies via string in remote driverIs there a way to disable a strategy in a providers implementation without a reference to the class?LazyBarrierStrategy/NoOpBarrierStep incompatible with path-tracking👋🏻 Hi all! In this JanusGraph post (https://discord.com/channels/981533699378135051/1195313165278Is there a way to store the tinkerpop graph in DynamoDB?AWS provides Neptue graph database but problem with it is that it is not distributed and can't be hoConnection to Azure cosmos db using GoHi All, Asking this as a newbie to Graphs databases in general. I have been trying to connect to an I met a man with seven wives, each of which had seven sacks.I met a man with seven wives, each of which had seven sacks. Now, suppose I have shipping container May I suggest a new topic-channel for us? Like "really-big-data" or "pagination"?Related to https://discord.com/channels/838910279550238720/1100527694342520963/1100853192922759244 aIntegration tests for AWS Neptune DBdo we have any Testcontainers for AWS Neptune for writing integration tests in java applicationsG.V() IDE can't visualize path().by(valueMap()) queryHi @G.V() - Gremlin IDE (Arthur) sorry if this is a duplicate question. I am playing around with G.VBeginner Gremlin QuestionsHello - I am trying to do an Advent of Code challenge as a graph problem to learn some Gremlin, and G.V Graph Playground: Gremlin client@G.V() - Gremlin IDE (Arthur) Quick question: Does G.V graph playground allow adding vertices & edge