Gremlin Server plugin for running an additional function on each vertex/edge

Dear TinkerPop team,

I'm currently trying to wrap my head around a specific problem I've been trying to solve for a few days.

Basic overview: vertices and edges can have a code field, e.g.:

Vertex: a = True
Edge: a == True

While "travelling" over vertices, a custom script engine runs and saves the resulting variables (name and value) into the current Gremlin sack. When an edge is reached, the script engine evaluates the expression and tells Gremlin to either continue or break the current path.

The script engine is written in Rust and works, and the Java bindings are not a problem either. What I can't figure out, and would really appreciate your help with, is the Gremlin part of this project. As far as I know, I have to create a Gremlin Server plugin like GremlinServerGremlinPlugin.java in the TinkerPop GitHub repository, but whether and how I can inject this custom functionality is beyond me.

Any hint would be highly appreciated.

Thanks,
Volker
34 Replies
spmallette · 2y ago
Interesting. I think I need more context to grasp what you are doing before I can offer any advice. When you say "script engine", are you referring to an actual JSR-223 compliant ScriptEngine, or are you alluding to something else?
Volker · 2y ago
It's a script engine which at the moment handles a small subset of the Python syntax. The actual database is implemented with JanusGraph.
spmallette · 2y ago
But have you explicitly implemented the Java ScriptEngine (JSR-223) interfaces to build that, like we have with GremlinGroovyScriptEngine?
Volker · 2y ago
Ah, no, sorry if I communicated that incorrectly. The idea behind the project is not to extend the current query language (the "developer" side), but to give the "user" side the option to control graph execution with Python/C#/...-like syntax.
Volker · 2y ago
The a = True, a == True, a == False is just a text property stored on the vertex/edge, which the user can change. During a (full) graph traversal, the script engine executes this code and either updates the Gremlin sack with the variables or tells the traversal to continue along the edge or break.
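To make the behaviour described above concrete, here is a small JDK-only model of it. It is not TinkerPop code: the `Vertex`/`Edge` records, `traverse` and `demo` are hypothetical, and the scripts are represented as Java lambdas instead of text run by the Rust engine. Vertex scripts write variables into a per-path sack; edge scripts decide whether the walk may follow that edge.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class ScriptedTraversalModel {
    // Hypothetical stand-ins: a vertex "script" writes into the sack, an edge "script" guards the edge.
    record Vertex(String name, Consumer<Map<String, Boolean>> script) {}
    record Edge(Vertex from, Vertex to, Predicate<Map<String, Boolean>> guard) {}

    // Depth-first walk from start (assumes an acyclic graph); returns vertex names in visit order.
    static List<String> traverse(Vertex start, List<Edge> edges) {
        List<String> visited = new ArrayList<>();
        walk(start, new HashMap<>(), edges, visited);
        return visited;
    }

    private static void walk(Vertex v, Map<String, Boolean> sack, List<Edge> edges, List<String> visited) {
        Map<String, Boolean> local = new HashMap<>(sack); // per-path copy (comparable to a sack split operator)
        v.script().accept(local);                          // e.g. "a = True"
        visited.add(v.name());
        for (Edge e : edges) {
            // e.g. "a == True": a false guard breaks this path instead of following the edge
            if (e.from() == v && e.guard().test(local)) {
                walk(e.to(), local, edges, visited);
            }
        }
    }

    static List<String> demo() {
        Vertex root = new Vertex("root", sack -> sack.put("a", true)); // a = True
        Vertex yes = new Vertex("yes", sack -> {});
        Vertex no = new Vertex("no", sack -> {});
        List<Edge> edges = List.of(
                new Edge(root, yes, sack -> Boolean.TRUE.equals(sack.get("a"))),  // a == True
                new Edge(root, no, sack -> Boolean.FALSE.equals(sack.get("a")))); // a == False
        return traverse(root, edges);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [root, yes]
    }
}
```

Copying the sack at each step keeps sibling branches from seeing each other's variables, which is the property a per-path sack has to provide.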
spmallette · 2y ago
OK, so you just refer to your custom processing there as a "script engine". That script engine evaluates a script stored as a property on the vertex or edge to control where Gremlin navigates? Is that the rough idea?
Volker · 2y ago
Yes.
spmallette · 2y ago
OK, so now I understand the second part of what you originally wrote a bit better:
> The script engine is written in Rust and works, the java bindings are not a problem either.
Could you explain where you hook those Java bindings into TinkerPop?
Volker · 2y ago
With that part, I meant that the bindings work in my demo Java<->Rust project. I can't figure out where and how to hook into TinkerPop. My idea looks like this:
[image not preserved]
Volker · 2y ago
So I should probably hook into the processing of each new vertex/edge?
spmallette · 2y ago
I think you want to do this with a TraversalStrategy: https://tinkerpop.apache.org/docs/current/reference/#traversalstrategy Are you familiar with those at all?
Volker · 2y ago
I've stumbled across them while reading the documentation, but thought that they only mutate/verify the defined steps before execution and thus don't have access to the specific vertex/edge data during execution. Am I mistaken?
Solution
spmallette · 2y ago
You understand them properly, but perhaps you didn't connect their use to your case. You would define a TraversalStrategy that replaces steps that traverse vertex/edge data, like out() or inE(), with your own implementation of those steps. That way you will have access to the vertex/edge Traverser as it passes through the step. I suppose you could also consider adding a special step that wraps those steps or follows them, depending on your needs. I'm not sure offhand which is best.
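The core mechanic of such a decoration strategy is rewriting the traversal's step list before execution. The JDK-only model below illustrates just that rewrite; `Step`, `OutStep` and `ScriptOutStep` are hypothetical stand-ins, not TinkerPop classes. The point it shows is that the list element itself must be swapped (reassigning a local variable changes nothing) and that step labels should be carried over. In TinkerPop itself, the equivalent swap is typically done with `TraversalHelper.replaceStep(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;
import java.util.Set;

public class StepRewriteModel {
    // Hypothetical stand-ins for a step like VertexStep(OUT) and a script-aware replacement.
    interface Step { Set<String> labels(); }
    record OutStep(Set<String> labels) implements Step {}
    record ScriptOutStep(Set<String> labels) implements Step {}

    // Model of a decoration strategy: rewrite the step list before the traversal runs.
    static void apply(List<Step> steps) {
        ListIterator<Step> it = steps.listIterator();
        while (it.hasNext()) {
            Step step = it.next();
            if (step instanceof OutStep out) {
                // Reassigning `step` would leave the list unchanged; swap the element in place
                // and keep its labels (e.g. from as("a")) on the replacement.
                it.set(new ScriptOutStep(out.labels()));
            }
        }
    }

    static List<Step> demo() {
        List<Step> steps = new ArrayList<>(List.of(new OutStep(Set.of()), new OutStep(Set.of("a"))));
        apply(steps);
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```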
spmallette · 2y ago
I hope TraversalStrategy works for you. You might look at examples from JanusGraph or Neo4jGraph for further inspiration. If you have further questions, consider posting in the #implementers channel since you're building extensions to TinkerPop (but here is fine too if you prefer this format).
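Volker · 2y ago
Yes, thanks for the input. I'm currently trying to implement a solution with TraversalStrategy. I've also looked into implementing special steps, but that seems rather complicated because I would have to replace quite a few classes on both the client and the server side. Although I'm still fighting with the actual Java implementation, especially the compilation (my Java experience is limited), maybe you can tell me whether what I have planned is feasible: an AbstractGremlinPlugin that registers my own ScriptDecorationStrategy on the server, which replaces the out() steps with a VertexStep wrapper, which packs the VertexStep.flatMap() result into an Iterator wrapper, which then executes the script during iteration and skips "invalid" edges.

    // Imports shared by the three classes below
    import java.util.Iterator;
    import java.util.NoSuchElementException;
    import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
    import org.apache.tinkerpop.gremlin.process.traversal.TraversalStrategy;
    import org.apache.tinkerpop.gremlin.process.traversal.Traverser;
    import org.apache.tinkerpop.gremlin.process.traversal.step.map.VertexStep;
    import org.apache.tinkerpop.gremlin.process.traversal.strategy.AbstractTraversalStrategy;
    import org.apache.tinkerpop.gremlin.process.traversal.util.TraversalHelper;
    import org.apache.tinkerpop.gremlin.structure.Direction;
    import org.apache.tinkerpop.gremlin.structure.Element;
    import org.apache.tinkerpop.gremlin.structure.Vertex;

    // ScriptDecorationStrategy.java
    public final class ScriptDecorationStrategy
            extends AbstractTraversalStrategy<TraversalStrategy.DecorationStrategy>
            implements TraversalStrategy.DecorationStrategy {
        private static final ScriptDecorationStrategy INSTANCE = new ScriptDecorationStrategy();
        private ScriptDecorationStrategy() {}
        public static ScriptDecorationStrategy instance() { return INSTANCE; }

        @Override
        @SuppressWarnings({"rawtypes", "unchecked"})
        public void apply(final Traversal.Admin<?, ?> traversal) {
            // getStepsOfClass returns a new list, so replacing steps while looping is safe
            for (final VertexStep<?> step : TraversalHelper.getStepsOfClass(VertexStep.class, traversal)) {
                // returnsVertex is only needed because of the current db implementation, which I can't change :C
                if (step.getDirection() == Direction.OUT && step.returnsVertex()) {
                    // out -> script-aware out (direction stays OUT)
                    final ScriptVertexStep<Vertex> scriptStep =
                            new ScriptVertexStep<>(traversal, Vertex.class, Direction.OUT, step.getEdgeLabels());
                    step.getLabels().forEach(scriptStep::addLabel);
                    // assigning to the loop variable would not change the traversal; swap the step itself
                    TraversalHelper.replaceStep((VertexStep) step, scriptStep, traversal);
                }
            }
        }
    }

    // ScriptVertexStep.java
    public class ScriptVertexStep<E extends Element> extends VertexStep<E> {
        public ScriptVertexStep(final Traversal.Admin traversal, final Class<E> returnClass,
                                final Direction direction, final String... edgeLabels) {
            super(traversal, returnClass, direction, edgeLabels);
        }

        @Override
        protected Iterator<E> flatMap(final Traverser.Admin<Vertex> traverser) {
            // wrap the normal adjacency results so the script can filter them lazily
            return new ScriptFilterIterator<>(super.flatMap(traverser));
        }
    }

    // ScriptFilterIterator.java
    public class ScriptFilterIterator<E extends Element> implements Iterator<E> {
        private final Iterator<E> orig;
        private E next; // look-ahead element that passed the script, or null

        public ScriptFilterIterator(final Iterator<E> orig) { this.orig = orig; }

        @Override
        public boolean hasNext() {
            // loop rather than recurse, so long runs of rejected elements cannot overflow the stack
            while (next == null && orig.hasNext()) {
                final E candidate = orig.next();
                if (passesScript(candidate)) next = candidate;
            }
            return next != null;
        }

        @Override
        public E next() {
            if (!hasNext()) throw new NoSuchElementException();
            final E ret = next;
            next = null; // don't pull the following element until it is requested
            return ret;
        }

        // Hypothetical placeholder for the script execution: return false to skip
        // "invalid" elements (e.g. an Edge whose script evaluates to false).
        private boolean passesScript(final E element) { return true; }
    }

If this works, I don't have to add a special step, and the developer still has the option to use out() normally as long as no specific "script" property is defined on the vertex, even if the strategy is enabled.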
spmallette · 2y ago
You seem to be on the right track. I think a decoration strategy makes sense. I wonder whether you should actually extend VertexStep, though; I'm not sure of the implications offhand. In any case, I'd say just follow the path you're taking for now and get everything working before making that choice.

As an aside, I'm curious: is this a personal project you are working on? Will it be made publicly available?
Volker · 2y ago
OK, thanks for the confirmation. I'll keep you updated on the progress. Sadly, this is not a private project. I'm developing it as a working student, but I'm trying to convince my boss that it would be nice to release it as open source to improve the company's outreach and maybe get some free maintainers.

And thank you again for your help. I know how time-consuming managing a community, answering everybody and developing the actual product can be.
spmallette · 2y ago
Well, I will confirm again that this is a feature folks ask about fairly often. In the past, I've experimented with it using Groovy scripts stored on vertices/edges, and it worked reasonably well. The problem with Groovy is the security risk of running arbitrary scripts.
Volker · 2y ago
Yes, this was the number one risk for my boss as well. My solution is a custom parser, compiler and (sandboxed) executor that works on a limited, developer-configured subset of the preferred scripting language, thus giving the developer 100% control over what is possible.
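A minimal Java sketch of the "limited, developer-configured subset" idea (the actual engine is written in Rust and is not shown in the thread). It assumes a hypothetical whitelist grammar of exactly two statement forms, `name = True|False` for vertices and `name == True|False` for edges, plus a developer-supplied set of allowed variable names; anything else is rejected before it can run. `RestrictedScript`, `execute` and `evaluate` are illustrative names only.

```java
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RestrictedScript {
    // Hypothetical whitelist grammar: exactly one statement per script, nothing else is parsed.
    private static final Pattern ASSIGN =
            Pattern.compile("\\s*([A-Za-z_]\\w*)\\s*=\\s*(True|False)\\s*");
    private static final Pattern COMPARE =
            Pattern.compile("\\s*([A-Za-z_]\\w*)\\s*==\\s*(True|False)\\s*");

    private final Set<String> allowedNames; // variable names the developer has enabled

    public RestrictedScript(Set<String> allowedNames) {
        this.allowedNames = Set.copyOf(allowedNames);
    }

    // Vertex script such as "a = True": validate, then store the variable in the sack.
    public void execute(String script, Map<String, Boolean> sack) {
        Matcher m = ASSIGN.matcher(script);
        if (!m.matches()) throw new IllegalArgumentException("unsupported statement: " + script);
        sack.put(checkName(m.group(1)), Boolean.parseBoolean(m.group(2)));
    }

    // Edge script such as "a == True": validate, then evaluate; unset variables count as false.
    public boolean evaluate(String script, Map<String, Boolean> sack) {
        Matcher m = COMPARE.matcher(script);
        if (!m.matches()) throw new IllegalArgumentException("unsupported expression: " + script);
        boolean actual = sack.getOrDefault(checkName(m.group(1)), false);
        return actual == Boolean.parseBoolean(m.group(2));
    }

    private String checkName(String name) {
        if (!allowedNames.contains(name)) throw new IllegalArgumentException("name not allowed: " + name);
        return name;
    }
}
```

Because parsing is a whitelist match rather than a general interpreter, input such as `__import__('os')` never reaches execution; it simply fails to match and is rejected.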
Lonnie VanZandt · 17mo ago
What about this idea? Rather than have your graph engine be your function execution environment, store a call to a serverless function in a lambda step, which the query only "executes" by making an in-flight network request. Then you have the liberty to tailor your function execution environment separately from whatever is hosting and running Gremlin/TinkerPop. Whatever language or class libraries you need, you provide in the serverless function environment.

The cost, of course, is the network latency of the call to the lambda that calls the serverless function. If optimal query performance is your goal, this would be horrible. But if triggering execution of user-supplied code during graph traversal is your goal, it might work conveniently.
spmallette · 17mo ago
Note that there is a call() step for this sort of functionality: https://tinkerpop.apache.org/docs/current/reference/#call-step
Volker · 17mo ago
Thanks for your input. Sadly, the solution has to handle a huge number of script executions in a short time.

The call() step looks promising; I'll look into it once the first version is running.

I'm currently stuck installing the custom Gremlin Server plugin into a Docker installation of JanusGraph. Adding it under scriptEngines:gremlin-groovy:plugins in janusgraph-server.yaml gets the plugin loaded, but the custom strategy is not executed for queries.

Init logs of the plugin and strategy:

    INFO org.apache.tinkerpop.gremlin.server.jsr223.ScriptingGremlinServerPlugin - Plugin loaded
    INFO org.apache.tinkerpop.gremlin.server.jsr223.ScriptDecorationStrategy - Strategy loaded

But this is the output for a simple query:

    Traversal Explanation
    ===============================================================================================
    Original Traversal [GraphStep(vertex,[4224]), VertexStep(OUT,vertex), VertexStep(OUT,vertex)]
    RemoteStrategy [D] [RemoteStep(DriverServerConnection-localhost/127.0.0.1:8182 [graph=g])]
spmallette · 17mo ago
What query are you sending to the server to get that explain?
Volker · 17mo ago
g.V(4224).out().out().explain()
spmallette · 17mo ago
Where are you automatically installing that strategy? Is that in the config for JanusGraph Server or something?
Volker · 17mo ago
In the GremlinPlugin.getCustomizers() function, I build the ImportCustomizer and, before returning it, register the strategy:

    TraversalStrategies.GlobalCache.registerStrategies(Graph.class,
            TraversalStrategies.GlobalCache.getStrategies(Graph.class)
                    .addStrategies(ScriptDecorationStrategy.instance()));

Is this the wrong way to do this?
spmallette · 17mo ago
Does it work if you explicitly use it in your query, like g.withStrategies(ScriptDecorationStrategy.instance()).V(....?
Volker · 17mo ago
That seems to work.
spmallette · 17mo ago
Well, at least it's finding the strategy. I'm not sure offhand why it doesn't work the way you did it. Personally, I would have configured it in the server startup script, in the construction of "g", but it's a bit curious that it won't work through the GlobalCache.

Actually... come to think of it, that explain() you have is a remote one. I think there is a weird thing with remote explains over bytecode. Send a script to the server and see if you get better explain output.
Volker · 17mo ago
OK, I enabled the remote console:

    gremlin> :remote console
    ==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182] - type ':remote console' to return to local mode

Without the strategy explicitly enabled:

    gremlin> g.V(4224).out().explain()
    ==>Traversal Explanation
    ===================================================================================================================
    Original Traversal [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]
    ConnectiveStrategy [D] [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]
    IdentityRemovalStrategy [O] [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]
    MatchPredicateStrategy [O] [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]
    FilterRankingStrategy [O] [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]
    InlineFilterStrategy [O] [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]
    IncidentToAdjacentStrategy [O] [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]
    RepeatUnrollStrategy [O] [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]
    ...

The strategy is not applied, but if I enable it again explicitly:

    gremlin> g.withStrategies(ScriptDecorationStrategy.instance()).V(4224).out().explain()
    ==>Traversal Explanation
    ===================================================================================================================
    Original Traversal [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]
    ScriptDecorationStrategy [D] [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]

it's applied. So it seems like the strategy is not automatically applied for the traversal source on the server, right?
spmallette · 17mo ago
Strange. Are you truncating the output at all? Why does the addition of ScriptDecorationStrategy remove all the other strategies?
Volker · 17mo ago
Ah, sorry, it does not; I just truncated the rest for clarity. It executes the same strategies as without the ScriptDecorationStrategy.

After some debugging, I found the problem: JanusGraph registers its own StandardJanusGraph graph class, which clones the standard Graph strategies, but it is loaded before my plugin. To be safe, I just get the private GRAPH_CACHE from GlobalCache, iterate over all entries and add my strategy to each of them. This seems to work.
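The ordering problem can be reproduced with a small JDK-only model. It assumes, based on the description above, that a provider's graph class gets a copy of the base `Graph` strategy list at the moment it registers, so a strategy added to the base entry afterwards never reaches it. `StrategyCacheModel` and its methods are hypothetical; they model the behaviour of `TraversalStrategies.GlobalCache` rather than reproduce its API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StrategyCacheModel {
    // Per-graph-class strategy lists, keyed by class name for simplicity.
    private final Map<String, List<String>> cache = new LinkedHashMap<>();

    public StrategyCacheModel() {
        cache.put("Graph", new ArrayList<>(List.of("ConnectiveStrategy")));
    }

    // A provider registering its graph class copies the base list as it is right now.
    public void registerGraphClass(String graphClass) {
        cache.put(graphClass, new ArrayList<>(cache.get("Graph")));
    }

    // What the plugin did first: only the base Graph entry is updated.
    public void addToBaseOnly(String strategy) {
        cache.get("Graph").add(strategy);
    }

    // The workaround: add the strategy to every registered entry (without duplicates).
    public void addToAll(String strategy) {
        cache.values().forEach(list -> { if (!list.contains(strategy)) list.add(strategy); });
    }

    public List<String> strategiesFor(String graphClass) {
        return cache.get(graphClass);
    }
}
```

A possibly less invasive variant than reflecting on the private GRAPH_CACHE would be calling the public `TraversalStrategies.GlobalCache.registerStrategies(...)` for the provider's graph class (e.g. StandardJanusGraph.class) as well as for Graph.class; that variant is untested here.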