Help with a Gremlin query
Hi folks,
I have a graph where:
Customer vertices connect to Order vertices via hasOrder edges.
Each Customer has a customerId
Each Order has:
• orderId
• orderDate
Rules for output:
• I will pass multiple customer IDs as input.
• For each customer, fetch their orders.
• A customer may have multiple entries for the same orderId. In that case, keep only the one with the latest orderDate.
• The selection of orders for one customer must not interfere with another customer's selection, even if they share the same orderId.
• I cannot use group() or any other grouping/aggregation step, only order(), limit(), dedup(), etc.
Example scenario:
customerId | orderId | orderDate
C1 | O1 | 2024-01-01
C1 | O1 | 2024-01-02
C1 | O2 | 2024-01-03
C2 | O1 | 2024-01-01
C2 | O1 | 2024-01-02
C2 | O3 | 2024-01-04
C3 | O4 | 2024-01-02
Expected output:
{c=C1, o={orderId=[O1], orderDate=[2024-01-02]}}
{c=C1, o={orderId=[O2], orderDate=[2024-01-03]}}
{c=C2, o={orderId=[O1], orderDate=[2024-01-02]}}
{c=C2, o={orderId=[O3], orderDate=[2024-01-04]}}
{c=C3, o={orderId=[O4], orderDate=[2024-01-02]}}
Need help: How can I write a Gremlin query that achieves this with only order(), limit(), and dedup(), without using group() or similar, while ensuring that each customer is processed independently?
The query to create the sample data is in the chat.
9 Replies
Your requirement "A customer may have multiple entries for the same orderId. In that case, keep only the one with the latest orderDate." contradicts your expected output. Can you please clarify? For example, there are multiple entries for C1 and O1 with different dates.
My bad Andrea, thanks a lot for pointing that out. I've corrected the output in the post.
Is sharing the same orderId the same as sharing the same Order vertex?
It's often easiest and fastest if you simply add a Gremlin query (addV/addE) that creates the sample data set for us.
Sample data query:
g.addV('Customer').property('customerId','C1').as('c1').
addV('Order').property('orderId','O1').property('orderDate','2024-01-01').as('o11').
addV('Order').property('orderId','O1').property('orderDate','2024-01-02').as('o12').
addV('Order').property('orderId','O2').property('orderDate','2024-01-03').as('o13').
addV('Customer').property('customerId','C2').as('c2').
addV('Order').property('orderId','O1').property('orderDate','2024-01-01').as('o21').
addV('Order').property('orderId','O1').property('orderDate','2024-01-02').as('o22').
addV('Order').property('orderId','O3').property('orderDate','2024-01-04').as('o23').
addV('Customer').property('customerId','C3').as('c3').
addV('Order').property('orderId','O4').property('orderDate','2024-01-02').as('o31').
addE('hasOrder').from('c1').to('o11').
addE('hasOrder').from('c1').to('o12').
addE('hasOrder').from('c1').to('o13').
addE('hasOrder').from('c2').to('o21').
addE('hasOrder').from('c2').to('o22').
addE('hasOrder').from('c2').to('o23').
addE('hasOrder').from('c3').to('o31').iterate()
Sure, added in the chat.
Even when the orderId is the same, each entry is a different vertex, so we pick the one with the latest orderDate. Just to reiterate: within a customer, the same orderId may appear multiple times, so we pick the latest. Across separate customers the orderId may also be the same, and we don't want to mix those up.
Does this work for you?
This outputs for me:
Thanks, working as expected.
Wondering if we can work out something without local() as well, using just the basics: order() and dedup()/limit().
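As a rough way to reason about the "just order and dedup" idea, here is a minimal plain-Python model of the logic, not Gremlin and not any query posted in this thread. It assumes the rows are sorted newest-first and then deduplicated on the pair (customerId, orderId); the tuple layout and the function name latest_per_customer_order are made up for this sketch.

```python
# Rows mirror the sample data in the question: (customerId, orderId, orderDate).
rows = [
    ("C1", "O1", "2024-01-01"),
    ("C1", "O1", "2024-01-02"),
    ("C1", "O2", "2024-01-03"),
    ("C2", "O1", "2024-01-01"),
    ("C2", "O1", "2024-01-02"),
    ("C2", "O3", "2024-01-04"),
    ("C3", "O4", "2024-01-02"),
]


def latest_per_customer_order(rows):
    """Sort newest-first, then keep the first row seen per (customer, orderId)."""
    # ISO-8601 date strings sort chronologically when compared as plain strings.
    newest_first = sorted(rows, key=lambda r: r[2], reverse=True)
    seen = set()
    kept = []
    for customer, order_id, order_date in newest_first:
        key = (customer, order_id)  # composite key keeps customers independent
        if key not in seen:
            seen.add(key)
            kept.append((customer, order_id, order_date))
    # Sorted only to make the result easy to compare with the expected output.
    return sorted(kept)


result = latest_per_customer_order(rows)
```

Deduplicating on orderId alone would drop C2's O1 because C1 has the same orderId, which is why the key in this sketch includes the customer. Whether the same composite-key dedup can be expressed within a restricted set of Gremlin steps depends on what the parser supports.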
Curious why you would want to avoid local?
You can use barrier() instead of fold().unfold():
Here's another version that produces a flattened Map:
I don't think the local step is necessarily needed. flatMap could suffice. Though I too am curious why you have all the step restrictions. CosmosDB?
Just working on a config-driven thing, where Gremlin queries are parsed by a query parser/adaptor that doesn't support much variety of functions. That's the limitation, long story short.
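For readers comparing the local/flatMap suggestions above, the per-customer idea can be modelled in plain Python as follows. This is only a sketch of the logic, not Gremlin and not the queries from the replies: each customer's orders are sorted newest-first and deduplicated by orderId on their own, and the per-customer results are then flattened into one list. The names orders_by_customer, latest_orders and select_for are hypothetical.

```python
# Sample data from the question, keyed by customerId: (orderId, orderDate) pairs.
orders_by_customer = {
    "C1": [("O1", "2024-01-01"), ("O1", "2024-01-02"), ("O2", "2024-01-03")],
    "C2": [("O1", "2024-01-01"), ("O1", "2024-01-02"), ("O3", "2024-01-04")],
    "C3": [("O4", "2024-01-02")],
}


def latest_orders(orders):
    """Yield the newest entry per orderId for a single customer's orders."""
    seen = set()
    # ISO-8601 date strings sort chronologically when compared as plain strings.
    for order_id, order_date in sorted(orders, key=lambda o: o[1], reverse=True):
        if order_id not in seen:
            seen.add(order_id)
            yield order_id, order_date


def select_for(customer_ids):
    """Run latest_orders once per customer and flatten the results."""
    return [
        (customer, order_id, order_date)
        for customer in customer_ids
        for order_id, order_date in latest_orders(orders_by_customer[customer])
    ]


flat = select_for(["C1", "C2", "C3"])
```

Because the dedup state (the seen set) is created fresh for every customer, one customer's O1 cannot suppress another customer's O1, which is the independence the question asks for.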