Apache TinkerPop
criminosis
Iterating over responses
I've got a query akin to this in a Python application using gremlin-python:
`some_property` is indexed by a mixed index served by Elasticsearch behind JanusGraph, which currently holds about 1 million entries. I'm still building up my dataset, so `foo` will actually return about 100k of the million, but future additions will change that.
If I run the query as written above, it times out. Presumably it's trying to send back all 100k results at once? If I add a limit of around 100, it seems like I get all 100 at once (after changing `t.next()` to a `for` loop so I can observe all 100).
The TinkerPop docs mention a server-side setting, `resultIterationBatchSize`, with a default of 64. I'd expect the server to send back just the first part of the result set as a batch of 64, of which I print only one and discard the rest.
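To make the server-side behavior concrete, here is a small stdlib-only simulation (not TinkerPop code) of a server chunking a result iterator into batches of `resultIterationBatchSize` and pushing every batch to the client. The function names `batched` and `push_all` are hypothetical. The point it illustrates is the one made in the answer below: the batch size bounds each response message, but nothing in the loop waits for the client to ask for more.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(results: Iterable[T], batch_size: int = 64) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items.

    The default of 64 mirrors the default resultIterationBatchSize.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(results)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def push_all(results: Iterable[T], send: Callable[[List[T]], None],
             batch_size: int = 64) -> int:
    """Simulated server loop: send every batch back-to-back.

    There is no step where the loop pauses for the client to request the
    next batch; batch_size only limits how large each message is.
    Returns the number of messages sent.
    """
    sent = 0
    for batch in batched(results, batch_size):
        send(batch)
        sent += 1
    return sent
```

In this simulation, roughly 100k matches at the default of 64 come out as 1,563 messages (1,562 full batches plus one of 32), all sent regardless of whether the client has finished with the first one.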
The Gremlin-Python section explicitly calls out a client-side batch setting:
But I'd expect that to matter only if you want to override the server-side default of 64?
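A minimal sketch of consuming a script result one batch at a time with gremlin-python. It assumes a driver version where `Client.submit()` accepts `request_options` (so `batchSize` can be set per request) and where iterating the returned `ResultSet` yields one list per response message. `iter_results` is a hypothetical helper, and the client is passed in so the loop can be exercised without a server. This only makes the *processing* incremental: as the answer below explains, the server keeps streaming and the driver keeps queueing batches, so it does not by itself cap client memory.

```python
from typing import Any, Iterator, Optional


def iter_results(client: Any, script: str,
                 batch_size: Optional[int] = None) -> Iterator[Any]:
    """Yield individual results from a gremlin-python-style Client.

    Assumes client.submit(message, request_options=...) returns a ResultSet
    whose iteration yields lists, one list per response batch.
    """
    # "batchSize" is sent as a per-request option only when given; otherwise
    # the server-side resultIterationBatchSize applies.
    options = {"batchSize": batch_size} if batch_size is not None else None
    result_set = client.submit(script, request_options=options)
    for batch in result_set:
        # Each batch is bounded by the batch size, but later batches may
        # already be buffered in the driver while this one is processed.
        for item in batch:
            yield item
```

Against a real server you would pass something like `Client('ws://localhost:8182/gremlin', 'g')` (URL assumed) together with the query script, for example the hypothetical `"g.V().has('some_property', 'foo')"`.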
Ultimately, what I want is to request a large result set but hold it on the client side only incrementally, in batches, without having to keep the entire result set in memory.
spmallette•100d ago
Sorry, I don't think anyone noticed this question for some reason. Unfortunately, the drivers don't work the way you're hoping. The server will continue to stream results according to the batch size; it doesn't hold and wait for the client to ask for the next batch. I don't like to recommend this, but with JanusGraph I guess you could use a session. You'd send a script to the server like:
You'd process those results. Then on the next request you could do:
and keep doing that until you exhaust `t`. You'd want to do some sort of checking to ensure that your script doesn't end in a `NoSuchElementException`, but you get the idea, I hope. I don't like to recommend it because that approach only works if the server is using Groovy to process Gremlin scripts, and not all graphs do that, so your code loses portability. Furthermore, I think we will be leaning even further away from Groovy in coming versions, and this approach likely won't be available at some point in the future.
criminosis•100d ago
Out of curiosity, what is the batch intended for, then, if not the batch size sent to the client? Is it the batch size the graph provider uses when reading from its backing data store?
Solution
spmallette•100d ago
It is the batch size to the client; the server just doesn't wait for the client to ask for the next batch. The purpose of the batch was to control the rough size of each response. Otherwise you could end up in a situation where the server is serializing too much in memory, or sending responses that exceed the max content length for a response.
criminosis•99d ago
Got it. Thanks @spmallette
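For reference, the session-based workaround spmallette describes could look roughly like the sketch below. The actual scripts from the thread were not preserved, so the Groovy snippets are assumptions: the first request binds the traversal to a session variable `t` and returns `t.next(64)`, and each later request returns the next page from the same `t`. `Traversal.next(int)` returns up to that many results and stops early when the traversal is exhausted, which sidesteps the `NoSuchElementException` that a bare `t.next()` would throw. `page_through_session` and the page size are hypothetical, and `submit` stands for any callable that runs a script within one session and returns its results as a list. As noted in the thread, this only works on servers that evaluate Groovy scripts.

```python
from typing import Any, Callable, Iterator, List

# Hypothetical Groovy scripts; they rely on the server keeping `t` in the
# session's bindings between requests.
FIRST_PAGE = "t = g.V().has('some_property', 'foo'); t.next({n})"
NEXT_PAGE = "t.next({n})"


def page_through_session(submit: Callable[[str], List[Any]],
                         page_size: int = 64) -> Iterator[List[Any]]:
    """Yield pages from a traversal held open in a server-side session.

    Only one page is requested at a time, so the client holds at most one
    page of results (plus whatever the caller keeps).
    """
    script = FIRST_PAGE.format(n=page_size)
    while True:
        page = submit(script)
        if page:
            yield page
        if len(page) < page_size:
            # A short or empty page means `t` is exhausted.
            return
        script = NEXT_PAGE.format(n=page_size)
```

With gremlin-python, `submit` might be `lambda s: client.submit(s).all().result()` on a `Client` created with a session id, e.g. `Client('ws://localhost:8182/gremlin', 'g', session=str(uuid.uuid4()))`; treat the URL and the `session` argument as assumptions to check against your driver version.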
Apache TinkerPop is an open source graph computing framework and the home of the Gremlin graph query language.