5 replies

MergeV "get or create" performance asymmetry

I'm working on adding the mergeV step, among others, to the Rust Gremlin driver. As part of that I paused to compare its performance against the "traditional" get-or-create pattern.

So the Rust driver is submitting bytecode that's effectively doing:

"Traditional/Reference":

g.V("expected_id").fold().coalesce(unfold(), addV("my_label").property(T.id, "expected_id")).property("other_property", "foo").V("some_other_expected_id")...and so forth

Given a batch of 10k vertices to write, it does this for a chunk of 10 vertices per mutation traversal, with 10 connections working in parallel to split up the batch until all 10k are written. It's well known that very long traversals don't perform well, and in my own trials putting more than 50 vertices in a single traversal caused timeouts for my use case, so I've generally stuck with 10 and called it good.
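For anyone following along, here is a rough sketch of that chunk-and-fan-out scheme. It is written in Python rather than Rust purely for brevity, and `write_batch`, `chunked`, and `fake_submit` are hypothetical names, not part of any Gremlin driver. The real network call is replaced by a stand-in that only records chunk sizes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_batch(
    vertex_ids: Sequence[str],
    submit_chunk: Callable[[Sequence[str]], None],
    chunk_size: int = 10,   # vertices per mutation traversal
    connections: int = 10,  # parallel workers, one per connection (assumption)
) -> int:
    """Fan the chunks out over a worker pool; return the number of calls made."""
    with ThreadPoolExecutor(max_workers=connections) as pool:
        results = list(pool.map(submit_chunk, chunked(vertex_ids, chunk_size)))
    return len(results)


# Stand-in for the real network call: it only records how big each chunk was.
sizes: List[int] = []


def fake_submit(chunk: Sequence[str]) -> None:
    sizes.append(len(chunk))


calls = write_batch([f"v{i}" for i in range(10_000)], fake_submit)
```

With 10k vertices and 10 per traversal, that works out to 1,000 separate network calls for the batch.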

But this puts a ceiling on the amount of work a single network call can do (10 vertices' worth), which is why I started trying out mergeV() to stack more info into a single call without making the traversal prohibitively long.

And then the "mergeV()" way:

g.inject(
  [["lookup": [(T.id):"expected_id"], "properties":["other_property": "foo"]], ["lookup":[(T.id):"some_other_expected_id"], "properties":[other_vertex_properties_here]], ...and so forth]).
  unfold().as("payload").
  mergeV(select('lookup')).
  property(
    "other_property",
    select('payload').select('properties').select("other_property")).

I would run the mergeV call with chunks of 200 vertices in each call.
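To make the shape of the injected list concrete, here is a minimal sketch of building it client-side in chunks of 200. Again this is Python for brevity, not the Rust driver's API. `build_payload` and `payload_chunks` are hypothetical helpers, and `ID_KEY` is a plain-string placeholder for whatever the driver uses to serialize the `T.id` token.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Placeholder for the driver's T.id token (assumption: a real driver would
# use its own enum/token type here, not a string).
ID_KEY = "T.id"

Vertex = Tuple[str, Dict[str, str]]  # (expected id, other properties)


def build_payload(vertices: Sequence[Vertex]) -> List[dict]:
    """Build the list passed to inject(): one lookup/properties map per vertex."""
    return [
        {"lookup": {ID_KEY: vertex_id}, "properties": dict(props)}
        for vertex_id, props in vertices
    ]


def payload_chunks(vertices: Sequence[Vertex], size: int = 200) -> Iterator[List[dict]]:
    """Yield one inject() payload per mergeV call, `size` vertices at a time."""
    for start in range(0, len(vertices), size):
        yield build_payload(vertices[start:start + size])


vertices = [(f"id{i}", {"other_property": "foo"}) for i in range(10_000)]
chunks = list(payload_chunks(vertices))
```

At 200 vertices per call, the same 10k batch needs 50 calls instead of the 1,000 required at 10 per call.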

JanusGraph•2y ago•

criminosis
