Iterating over responses

I've got a query akin to this in a Python application using gremlin-python:
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

t = traversal().with_remote(DriverRemoteConnection(url, "g")) \
    .V().has("some_vertex", "some_property", "foo").values()
print(t.next())
some_property is indexed by a mixed index backed by Elasticsearch behind JanusGraph, with (at least for the moment) about 1 million entries. I'm still building up my dataset, so foo will currently match about 100k of the million, but future additions will change that.

If I run the query as written above it times out; presumably it's trying to send back all 100k at once? If I add a limit of, say, 100, it seems like I get all 100 at once (after changing t.next() to a for loop so I can observe all 100). In the TinkerPop docs there's mention of a server-side setting, resultIterationBatchSize, with a default of 64. I'd expect the server to just send back the first part of the result set as a batch of 64, with me printing only 1 of them and discarding the rest. The Gremlin-Python section explicitly calls out a client-side batch setting:
The following options are allowed on a per-request basis in this fashion: batchSize, requestId, userAgent and evaluationTimeout (formerly scriptEvaluationTimeout which is also supported but now deprecated).
But I'd expect that to just be for cases where you want to override the server side's default of 64? Ultimately what I want to do is request some large result set but only hold it on the client incrementally, in batches, without having to keep the entire result set in client-side memory.
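For reference, the per-request options in that quoted list are applied in gremlin-python via with_() on the traversal source. A minimal sketch of overriding batchSize for a single request, assuming that mechanism is what the docs mean (the URL is a placeholder and the query mirrors the one above):

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

g = traversal().with_remote(DriverRemoteConnection("ws://localhost:8182/gremlin", "g"))

# Per-request override of the server's resultIterationBatchSize (default 64);
# this controls how the server chunks its streamed responses for this traversal.
t = g.with_("batchSize", 128) \
    .V().has("some_vertex", "some_property", "foo").values()
print(t.next())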
spmallette · 100d ago
sorry - i don't think anyone noticed this question for some reason. unfortunately, the drivers don't work the way you're hoping. the server will continue to stream results according to batch size. it doesn't hold and wait for the client to ask for the next batch. i don't like to recommend this, but with JanusGraph, I guess you could use a session. you'd send a script to the server like:
t = g.V();t.next(64)
you'd process those results. then on the next request you could do:
t.next(64)
and keep doing that until you exhaust t. you'd want to do some sort of checking to ensure that your script doesn't end in a NoSuchElementException, but you get the idea I hope. i don't like to recommend it because that approach only works if the server is using Groovy to process Gremlin scripts, and not all graphs do that, so your code loses portability. furthermore, I think we will be leaning even further away from Groovy in coming versions, and this approach likely won't be available at some point in the future.
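A rough Python sketch of that session approach, for illustration only: it assumes the gremlin-python Client can be bound to a session (the session keyword argument below is an assumption; check how your driver version exposes sessions) and that the server evaluates Groovy scripts. The vertex and property names are the ones from the question.

import uuid
from gremlin_python.driver.client import Client

# A sessioned client keeps server-side script variables (like t) alive
# across requests. NOTE: the session argument is an assumption here.
client = Client("ws://localhost:8182/gremlin", "g", session=str(uuid.uuid4()))

# First request: build the traversal server-side and pull the first 64 results.
results = client.submit(
    "t = g.V().has('some_vertex', 'some_property', 'foo');t.next(64)"
).all().result()

# Keep pulling 64 at a time until t is exhausted; the hasNext() guard is the
# "checking" mentioned above, so the script never ends in NoSuchElementException.
while results:
    # ... process this batch of results ...
    results = client.submit("t.hasNext() ? t.next(64) : []").all().result()

client.close()

Each submit here is a separate request, so the client only ever holds one batch of 64 in memory at a time, at the cost of the portability caveats described above.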
criminosis · 100d ago
Out of curiosity, what's the batch intended to be used for then, if not as the batch size to the client? The graph provider fetching from its backing data store?
Solution
spmallette · 100d ago
it is the batch size to the client. it just doesn't wait for the client to tell it to send the next batch. the purpose of the batch was to control the rough size of each response, otherwise you could end up with a situation where the server might be serializing too much in memory or sending responses that exceeded the max content length for a response.
criminosis · 99d ago
Got it. Thanks @spmallette