Apache TinkerPop•2mo ago

Help with a Gremlin query

Hi folks, I have a graph where Customer vertices connect to Order vertices via hasOrder edges.

• Each Customer has a customerId.
• Each Order has an orderId and an orderDate.

Rules for the output:

• I will pass multiple customer IDs as input.
• For each customer, fetch their orders.
• A customer may have multiple entries for the same orderId. In that case, keep only the one with the latest orderDate.
• The selection of orders for one customer must not interfere with another customer's selection, even if they share the same orderId.
• I cannot use group() or any other grouping/aggregation step, only order(), limit(), dedup(), etc.

Example scenario:

customerId | orderId | orderDate
C1         | O1      | 2024-01-01
C1         | O1      | 2024-01-02
C1         | O2      | 2024-01-03
C2         | O1      | 2024-01-01
C2         | O1      | 2024-01-02
C2         | O3      | 2024-01-04
C3         | O4      | 2024-01-02

Expected output:

{c=C1, o={orderId=[O1], orderDate=[2024-01-02]}}
{c=C1, o={orderId=[O2], orderDate=[2024-01-03]}}
{c=C2, o={orderId=[O1], orderDate=[2024-01-02]}}
{c=C2, o={orderId=[O3], orderDate=[2024-01-04]}}
{c=C3, o={orderId=[O4], orderDate=[2024-01-02]}}

How can I write a Gremlin query that achieves this using only order(), limit(), and dedup(), without group() or similar, while making sure each customer is processed independently? The query that creates the sample data is in the replies.
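
For reference, the selection rule itself can be sketched in plain Python if you set aside the Gremlin step restrictions. This is a hypothetical illustration, not a Gremlin query. ROWS is an in-memory copy of the example table. The dictionary keyed on (customerId, orderId) is exactly the kind of grouping the query is not allowed to use, so treat it only as a reference result for checking the Gremlin answers.

```python
from datetime import date

# Hypothetical in-memory copy of the example table: (customerId, orderId, orderDate).
ROWS = [
    ("C1", "O1", "2024-01-01"),
    ("C1", "O1", "2024-01-02"),
    ("C1", "O2", "2024-01-03"),
    ("C2", "O1", "2024-01-01"),
    ("C2", "O1", "2024-01-02"),
    ("C2", "O3", "2024-01-04"),
    ("C3", "O4", "2024-01-02"),
]


def latest_per_customer_order(rows):
    """Keep the latest orderDate for each (customerId, orderId) pair."""
    best = {}
    for customer_id, order_id, order_date in rows:
        # Including the customer in the key keeps customers independent.
        key = (customer_id, order_id)
        if key not in best or date.fromisoformat(order_date) > date.fromisoformat(best[key]):
            best[key] = order_date
    # Return rows sorted by customerId, then orderId, for a stable comparison.
    return sorted((c, o, d) for (c, o), d in best.items())


for row in latest_per_customer_order(ROWS):
    print(row)
```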

9 Replies

Andrea•2mo ago

Your requirement "A customer may have multiple entries for the same orderId. In that case, keep only the one with the latest orderDate." contradicts your expected output. Can you please clarify? For example, there are multiple entries for C1 and O1 with different dates.

eternallawOP•2mo ago

My bad, Andrea, and thanks a lot for pointing that out. I've corrected the output in the post.

spmallette•2mo ago

Is sharing the same orderId the same as sharing the same Order vertex? It's often easiest and fastest if you simply add a Gremlin query (addV/addE) that creates the sample data set for us.

eternallawOP•2mo ago

Sure, added it here. Sample data query:

g.addV('Customer').property('customerId','C1').as('c1').
  addV('Order').property('orderId','O1').property('orderDate','2024-01-01').as('o11').
  addV('Order').property('orderId','O1').property('orderDate','2024-01-02').as('o12').
  addV('Order').property('orderId','O2').property('orderDate','2024-01-03').as('o13').
  addV('Customer').property('customerId','C2').as('c2').
  addV('Order').property('orderId','O1').property('orderDate','2024-01-01').as('o21').
  addV('Order').property('orderId','O1').property('orderDate','2024-01-02').as('o22').
  addV('Order').property('orderId','O3').property('orderDate','2024-01-04').as('o23').
  addV('Customer').property('customerId','C3').as('c3').
  addV('Order').property('orderId','O4').property('orderDate','2024-01-02').as('o31').
  addE('hasOrder').from('c1').to('o11').
  addE('hasOrder').from('c1').to('o12').
  addE('hasOrder').from('c1').to('o13').
  addE('hasOrder').from('c2').to('o21').
  addE('hasOrder').from('c2').to('o22').
  addE('hasOrder').from('c2').to('o23').
  addE('hasOrder').from('c3').to('o31').iterate()

Each Order is a separate vertex, even when its orderId repeats, so we pick the one with the latest orderDate. To reiterate: within one customer the same orderId may appear multiple times, and we pick the latest. Across different customers the same orderId may also appear, and we don't want to mix those up.
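
To illustrate that last point, here is a hypothetical plain-Python sketch (not Gremlin). It sorts the rows across all customers at once, by orderId ascending and orderDate descending. It then shows that deduplicating by orderId alone drops one customer's O1, while keying the dedup on (customerId, orderId) keeps both. The row tuples and helper names are made up for illustration. In the Gremlin answers, the per-customer scoping comes from local() or flatMap() rather than a composite key.

```python
from datetime import date

# Hypothetical stand-in for the sample data: (customerId, orderId, orderDate).
ROWS = [
    ("C1", "O1", "2024-01-01"),
    ("C1", "O1", "2024-01-02"),
    ("C1", "O2", "2024-01-03"),
    ("C2", "O1", "2024-01-01"),
    ("C2", "O1", "2024-01-02"),
    ("C2", "O3", "2024-01-04"),
    ("C3", "O4", "2024-01-02"),
]


def order_by_id_then_latest(rows):
    # Sort every customer's rows together: orderId ascending, orderDate descending.
    return sorted(rows, key=lambda r: (r[1], -date.fromisoformat(r[2]).toordinal()))


def dedup_first(rows, key):
    """Keep the first row seen for each key, like dedup().by(...) keeps the first traverser."""
    seen, kept = set(), []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            kept.append(row)
    return kept


ordered = order_by_id_then_latest(ROWS)
# Unscoped: only one O1 survives for the whole input, so one customer's O1 is lost.
unscoped = dedup_first(ordered, key=lambda r: r[1])
# Scoped per customer: each customer keeps its own latest O1.
scoped = dedup_first(ordered, key=lambda r: (r[0], r[1]))
print(len(unscoped), len(scoped))  # 4 5
```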

Andrea•2mo ago

Does this work for you?

g.V().has('customerId', within(['C1', 'C2', 'C3']))
  .as('customer')
  .local(
    out('hasOrder')
    .order()
      .by('orderId')
      .by(values('orderDate').asDate(), desc)
    .fold()
    .unfold()
    .dedup()
      .by('orderId')
  )
  .project('c', 'o')
    .by(in('hasOrder').values('customerId'))
    .by(valueMap('orderId', 'orderDate'))

This outputs for me:

==>[c:C2,o:[orderId:[O1],orderDate:[2024-01-02]]]
==>[c:C2,o:[orderId:[O3],orderDate:[2024-01-04]]]
==>[c:C1,o:[orderId:[O1],orderDate:[2024-01-02]]]
==>[c:C1,o:[orderId:[O2],orderDate:[2024-01-03]]]
==>[c:C3,o:[orderId:[O4],orderDate:[2024-01-02]]]
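
To trace why this works, here is a minimal plain-Python sketch of what the traversal does. It is a simulation, not Gremlin. It assumes the graph is replaced by a hypothetical HAS_ORDER dictionary that maps each customerId to its (orderId, orderDate) pairs. The outer loop plays the role of local(), which evaluates its child traversal separately for each customer, so the dedup by orderId never sees another customer's orders.

```python
from datetime import date

# Hypothetical adjacency list standing in for Customer -hasOrder-> Order edges.
HAS_ORDER = {
    "C1": [("O1", "2024-01-01"), ("O1", "2024-01-02"), ("O2", "2024-01-03")],
    "C2": [("O1", "2024-01-01"), ("O1", "2024-01-02"), ("O3", "2024-01-04")],
    "C3": [("O4", "2024-01-02")],
}


def latest_orders(orders):
    """Body of local(): order by orderId asc, orderDate desc, then dedup by orderId."""
    ordered = sorted(orders, key=lambda o: (o[0], -date.fromisoformat(o[1]).toordinal()))
    seen, kept = set(), []
    for order_id, order_date in ordered:
        if order_id not in seen:  # keep only the first entry per orderId
            seen.add(order_id)
            kept.append((order_id, order_date))
    return kept


def run(customer_ids):
    results = []
    for customer_id in customer_ids:  # one independent pass per customer, like local()
        for order_id, order_date in latest_orders(HAS_ORDER.get(customer_id, [])):
            # Shape mirrors project('c','o') with valueMap(), which wraps values in lists.
            results.append({"c": customer_id,
                            "o": {"orderId": [order_id], "orderDate": [order_date]}})
    return results


for row in run(["C1", "C2", "C3"]):
    print(row)
```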

eternallawOP•2mo ago

Thanks, it's working as expected. I'm wondering whether we can work something out without local() as well, using just the basics: order() and dedup()/limit().

Andrea•2mo ago

Curious why you would want to avoid local()? You can use barrier() instead of fold().unfold():

g.V().has('customerId', within(['C1', 'C2', 'C3']))
  .as('customer')
  .local(
    out('hasOrder')
    .order()
      .by('orderId')
      .by(values('orderDate').asDate(), desc)
    .barrier()
    .dedup()
      .by('orderId')
  )
  .project('c', 'o')
    .by(in('hasOrder').values('customerId'))
    .by(valueMap('orderId', 'orderDate'))
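
In both versions, dedup().by('orderId') keeps the first traverser it sees for each orderId, so the desc sort on orderDate is what makes the latest entry win. The hypothetical plain-Python sketch below uses C1's orders from the sample data and simulates that sort-then-dedup pattern. It shows that flipping the date direction would keep the earliest entry instead. The helper names are illustrative only.

```python
from datetime import date

# Hypothetical orders for one customer (C1 in the sample data): (orderId, orderDate).
C1_ORDERS = [("O1", "2024-01-01"), ("O1", "2024-01-02"), ("O2", "2024-01-03")]


def dedup_first_by_order_id(orders):
    """Keep the first entry seen per orderId and drop later duplicates."""
    seen, kept = set(), []
    for order_id, order_date in orders:
        if order_id not in seen:
            seen.add(order_id)
            kept.append((order_id, order_date))
    return kept


def sort_orders(orders, newest_first):
    # Two stable sorts: secondary key (orderDate) first, then primary key (orderId).
    by_date = sorted(orders, key=lambda o: date.fromisoformat(o[1]), reverse=newest_first)
    return sorted(by_date, key=lambda o: o[0])


latest = dedup_first_by_order_id(sort_orders(C1_ORDERS, newest_first=True))
earliest = dedup_first_by_order_id(sort_orders(C1_ORDERS, newest_first=False))
print(latest)    # [('O1', '2024-01-02'), ('O2', '2024-01-03')]
print(earliest)  # [('O1', '2024-01-01'), ('O2', '2024-01-03')]
```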

spmallette•2mo ago

Here's another version that produces a flattened Map:

gremlin> g.V().hasLabel('Customer').as('c').
......1>   flatMap(out().
......2>           order().by('orderId').by(values('orderDate').asDate(), desc).
......3>           barrier().
......4>           dedup().by('orderId')).as('orderId','orderDate').
......5>   select('c', 'orderId', 'orderDate').
......6>     by('customerId').
......7>     by('orderId').
......8>     by('orderDate')
==>[c:C1,orderId:O1,orderDate:2024-01-02]
==>[c:C1,orderId:O2,orderDate:2024-01-03]
==>[c:C3,orderId:O4,orderDate:2024-01-02]
==>[c:C2,orderId:O1,orderDate:2024-01-02]
==>[c:C2,orderId:O3,orderDate:2024-01-04]

I don't think local() is strictly needed; flatMap() could suffice. That said, I'm also curious why you have all these step restrictions. CosmosDB?
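
As a rough, hypothetical analogy in plain Python (not Gremlin, and not how TinkerPop implements flatMap()), the flatMap() version behaves like this: a function takes one customer and yields zero or more flat rows, and the per-customer streams are then concatenated into one. The HAS_ORDER dictionary and the helper names are made up for illustration. Because the `seen` set is created fresh for each customer, one customer's orderIds never suppress another's.

```python
from datetime import date
from itertools import chain

# Hypothetical adjacency list standing in for Customer -hasOrder-> Order edges.
HAS_ORDER = {
    "C1": [("O1", "2024-01-01"), ("O1", "2024-01-02"), ("O2", "2024-01-03")],
    "C2": [("O1", "2024-01-01"), ("O1", "2024-01-02"), ("O3", "2024-01-04")],
    "C3": [("O4", "2024-01-02")],
}


def latest_orders(orders):
    """Order by orderId asc, orderDate desc, then yield the first entry per orderId."""
    ordered = sorted(orders, key=lambda o: (o[0], -date.fromisoformat(o[1]).toordinal()))
    seen = set()  # fresh dedup state for every call, i.e. for every customer
    for order_id, order_date in ordered:
        if order_id not in seen:
            seen.add(order_id)
            yield order_id, order_date


def rows_for(customer_id):
    # Analogue of the flatMap() child traversal: one customer in, zero or more flat rows out.
    for order_id, order_date in latest_orders(HAS_ORDER.get(customer_id, [])):
        yield {"c": customer_id, "orderId": order_id, "orderDate": order_date}


def flat_rows(customer_ids):
    # Concatenate each customer's output stream into a single flat list.
    return list(chain.from_iterable(rows_for(c) for c in customer_ids))


for row in flat_rows(["C1", "C2", "C3"]):
    print(row)
```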

eternallawOP•2mo ago

I'm just working on a config-driven system where Gremlin queries are parsed by a query parser/adapter that supports only a limited set of steps. That's the limitation, long story short.
