Help with a Gremlin query

Hi folks, I have a graph where Customer vertices connect to Order vertices via hasOrder edges.

Each Customer has a customerId.
Each Order has:
• orderId
• orderDate

Rules for output:
• I will pass multiple customer IDs as input.
• For each customer, fetch their orders.
• A customer may have multiple entries for the same orderId. In that case, keep only the one with the latest orderDate.
• The selection of orders for one customer must not interfere with another customer's selection, even if they share the same orderId.
• I cannot use group() or any other grouping/aggregation step, only order(), limit(), dedup(), etc.

Example scenario:

customerId | orderId | orderDate
C1 | O1 | 2024-01-01
C1 | O1 | 2024-01-02
C1 | O2 | 2024-01-03
C2 | O1 | 2024-01-01
C2 | O1 | 2024-01-02
C2 | O3 | 2024-01-04
C3 | O4 | 2024-01-02

Expected output:

{c=C1, o={orderId=[O1], orderDate=[2024-01-02]}}
{c=C1, o={orderId=[O2], orderDate=[2024-01-03]}}
{c=C2, o={orderId=[O1], orderDate=[2024-01-02]}}
{c=C2, o={orderId=[O3], orderDate=[2024-01-04]}}
{c=C3, o={orderId=[O4], orderDate=[2024-01-02]}}

Need help: How can I write a Gremlin query that achieves this with only order(), limit(), and dedup(), without using group() or similar, while ensuring that each customer is processed independently?

The query to create the sample data is in the chat.
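Editor's sketch: the selection rule above can be modeled outside of Gremlin as "sort, then keep the first row per (customerId, orderId) pair". The Python below is only a reference model of the expected output for the example scenario, not a Gremlin query and not something proposed in the thread. The names `rows` and `latest_per_customer_order` are made up for illustration.

```python
# Rows mirror the example scenario: (customerId, orderId, orderDate).
rows = [
    ("C1", "O1", "2024-01-01"),
    ("C1", "O1", "2024-01-02"),
    ("C1", "O2", "2024-01-03"),
    ("C2", "O1", "2024-01-01"),
    ("C2", "O1", "2024-01-02"),
    ("C2", "O3", "2024-01-04"),
    ("C3", "O4", "2024-01-02"),
]


def latest_per_customer_order(rows):
    """Keep the latest-dated row for each (customerId, orderId) pair."""
    # ISO-8601 date strings sort chronologically, so string comparison is enough.
    newest_first = sorted(rows, key=lambda r: r[2], reverse=True)
    # sorted() is stable: within each pair the newest row stays first.
    ordered = sorted(newest_first, key=lambda r: (r[0], r[1]))

    seen = set()
    kept = []
    for customer_id, order_id, order_date in ordered:
        key = (customer_id, order_id)
        if key not in seen:  # first occurrence is the latest date
            seen.add(key)
            kept.append((customer_id, order_id, order_date))
    return kept


result = latest_per_customer_order(rows)
for row in result:
    print(row)
```

Because the dedup key includes the customer, C1's O1 and C2's O1 are tracked separately, which is the "must not interfere" rule.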
9 Replies
Andrea, 2w ago
Your requirement "A customer may have multiple entries for the same orderId. In that case: Keep only the one with the latest orderDate." contradicts your expected output. Can you please clarify? For example, there are multiple entries for C1 and O1 with different dates.
eternallaw (OP), 2w ago
My bad, Andrea, thanks a lot for pointing that out. I've corrected the output in the post.
spmallette, 2w ago
Is sharing the same orderId the same as sharing the same order vertex? It's often easiest and fastest if you simply add a Gremlin query (addV/addE) that creates the sample data set for us.
eternallaw (OP), 2w ago
Sample data query:
g.addV('Customer').property('customerId','C1').as('c1').
  addV('Order').property('orderId','O1').property('orderDate','2024-01-01').as('o11').
  addV('Order').property('orderId','O1').property('orderDate','2024-01-02').as('o12').
  addV('Order').property('orderId','O2').property('orderDate','2024-01-03').as('o13').
  addV('Customer').property('customerId','C2').as('c2').
  addV('Order').property('orderId','O1').property('orderDate','2024-01-01').as('o21').
  addV('Order').property('orderId','O1').property('orderDate','2024-01-02').as('o22').
  addV('Order').property('orderId','O3').property('orderDate','2024-01-04').as('o23').
  addV('Customer').property('customerId','C3').as('c3').
  addV('Order').property('orderId','O4').property('orderDate','2024-01-02').as('o31').
  addE('hasOrder').from('c1').to('o11').
  addE('hasOrder').from('c1').to('o12').
  addE('hasOrder').from('c1').to('o13').
  addE('hasOrder').from('c2').to('o21').
  addE('hasOrder').from('c2').to('o22').
  addE('hasOrder').from('c2').to('o23').
  addE('hasOrder').from('c3').to('o31').iterate()

Sure, added in the chat. Orders with the same orderId are still different vertices, so we pick the one with the latest orderDate. Just to reiterate: within a customer we may have the same orderId multiple times, so we pick the latest. But across separate customers the orderId may also be the same, so we don't want to mix those up.
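Editor's sketch: to illustrate the "don't mix customers" point, the Python model below applies the same sort-then-dedup logic in two ways. One pass runs over all orders at once, deduplicating on orderId alone. The other runs separately per customer. It is a plain-Python model of the sample data, not Gremlin, and the names `orders_by_customer` and `sort_then_dedup` are hypothetical.

```python
# Sample data from the thread, keyed by customerId: [(orderId, orderDate), ...]
orders_by_customer = {
    "C1": [("O1", "2024-01-01"), ("O1", "2024-01-02"), ("O2", "2024-01-03")],
    "C2": [("O1", "2024-01-01"), ("O1", "2024-01-02"), ("O3", "2024-01-04")],
    "C3": [("O4", "2024-01-02")],
}


def sort_then_dedup(rows):
    """Sort by orderId ascending and orderDate descending, keep first row per orderId.

    Each row ends with (..., orderId, orderDate), so negative indexes work for
    both 2-tuples and 3-tuples.
    """
    newest_first = sorted(rows, key=lambda r: r[-1], reverse=True)
    ordered = sorted(newest_first, key=lambda r: r[-2])  # stable sort
    seen, kept = set(), []
    for row in ordered:
        if row[-2] not in seen:
            seen.add(row[-2])
            kept.append(row)
    return kept


# Global scope: one dedup over every customer's orders, keyed on orderId only.
all_rows = [(c, o, d) for c, rows in orders_by_customer.items() for o, d in rows]
global_result = sort_then_dedup(all_rows)

# Per-customer scope: the dedup state starts fresh for each customer.
scoped_result = [
    (c, o, d)
    for c, rows in orders_by_customer.items()
    for o, d in sort_then_dedup(rows)
]

print(len(global_result), len(scoped_result))
```

In this model the global pass keeps only one O1 row across all customers, so C2 loses its O1 entry, while the per-customer pass keeps one O1 for each of C1 and C2.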
Andrea, 2w ago
Does this work for you?
g.V().has('customerId', within(['C1', 'C2', 'C3']))
  .as('customer')
  .local(
    out('hasOrder')
      .order()
        .by('orderId')
        .by(values('orderDate').asDate(), desc)
      .fold()
      .unfold()
      .dedup()
        .by('orderId')
  )
  .project('c', 'o')
    .by(in('hasOrder').values('customerId'))
    .by(valueMap('orderId', 'orderDate'))
This outputs for me:
==>[c:C2,o:[orderId:[O1],orderDate:[2024-01-02]]]
==>[c:C2,o:[orderId:[O3],orderDate:[2024-01-04]]]
==>[c:C1,o:[orderId:[O1],orderDate:[2024-01-02]]]
==>[c:C1,o:[orderId:[O2],orderDate:[2024-01-03]]]
==>[c:C3,o:[orderId:[O4],orderDate:[2024-01-02]]]
eternallaw (OP), 2w ago
Thanks, it's working as expected. I'm wondering whether we can work something out without local as well, using just the basics: order and dedup/limit.
Andrea, 2w ago
Curious why you would want to avoid local? You can use barrier() instead of fold().unfold():
g.V().has('customerId', within(['C1', 'C2', 'C3']))
  .as('customer')
  .local(
    out('hasOrder')
      .order()
        .by('orderId')
        .by(values('orderDate').asDate(), desc)
      .barrier()
      .dedup()
        .by('orderId')
  )
  .project('c', 'o')
    .by(in('hasOrder').values('customerId'))
    .by(valueMap('orderId', 'orderDate'))
spmallette, 2w ago
Here's another version that produces a flattened Map:
gremlin> g.V().hasLabel('Customer').as('c').
......1> flatMap(out().
......2> order().by('orderId').by(values('orderDate').asDate(), desc).
......3> barrier().
......4> dedup().by('orderId')).as('orderId','orderDate').
......5> select('c', 'orderId', 'orderDate').
......6> by('customerId').
......7> by('orderId').
......8> by('orderDate')
==>[c:C1,orderId:O1,orderDate:2024-01-02]
==>[c:C1,orderId:O2,orderDate:2024-01-03]
==>[c:C3,orderId:O4,orderDate:2024-01-02]
==>[c:C2,orderId:O1,orderDate:2024-01-02]
==>[c:C2,orderId:O3,orderDate:2024-01-04]
I don't think the local step is necessarily needed; flatMap could suffice. I too am curious, though, why you have all the step restrictions. CosmosDB?
eternallaw (OP), 2w ago
I'm just working on a config-driven thing where Gremlin queries are parsed by a query parser/adaptor that doesn't support much variety of functions. That's the limitation, long story short.
