I'm trying to reason about the performance of list() operations using the prefix option to make decisions for how to model a many to many relationship in KV. I could implement it several ways and performance test, but I'm trying to avoid that premature optimization work. A Person can be part of several Orgs and each Org can have many Persons (upwards of 10,000). Assume two Persons with ids “p1" and "p2" and two Orgs with ids “o1" and "o2". Also, assume, eventual consistency is fine for my use case.
Option 1 just keys:
"person/p1/org/o1"
“person/p1/org/o2”
"person/p2/org/o1"
“org/o1/person/p1”
“org/o1/person/p2”
“org/o2/person/p1”
Then, when I wanted to find all of Orgs that go with p1, I could use prefix “person/p1”. That would get me back two rows. I could parse those keys and extract the ids for those two orgs. Similarly, if I wanted to find all of the Persons that go with Org o2, I could use prefix “org/o2”, parse that to find the one person that goes with it. The operations for adding or removing a relationship only ever effects two keys so those changes will be cheap but what about the list() operations when there are 10,000 Persons in one Org? If the keys are stored in an index that is kept in memory, I’m guessing that will also be fine.
Option 2 big and ever growing values:
I would have just one key/value pair for each Person and one for each Org. The value would contain a serialized array of the values of the other side of the relationship. Query would be fast, but changes would require reading, deserializing, filtering or adding-to, re-serializing, and finally saving a potentially large array.
Thoughts?