
Sequential IDs in Neptune?

Andys1814 · 11/7/2023
@neptune I'm attempting to implement sequential IDs for the vertices in our AWS Neptune graph. So far, we have added a new property called vertexNumber, which stores the numeric sequential ID for each vertex. Then, before saving a vertex to the database, I run a simple query to retrieve the current highest vertexNumber, increment it, and store the new vertexNumber on that vertex. Pseudo-code is below:
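```
// Calculate the current highest vertexNumber.
// max() produces a traversal, not a number: tryNext() fetches the value,
// and orElse(0) covers the case where max() yields no result (no numbered vertices yet).
maxNumber = g.V().hasLabel('my_vertex_label').has('vertexNumber').values('vertexNumber').max().tryNext().orElse(0)

// Increment the result by 1 for the next vertex we save.
vertexNumber = maxNumber + 1

// Add the new vertex
g.addV('my_vertex_label').property(Cardinality.single, 'vertexNumber', vertexNumber).property(...etc)
```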
My question is: how will Neptune handle this at scale? For context, we have a distributed architecture in which tens or hundreds (in super rare cases, maybe over a thousand?) of new vertices can be created per SECOND, so our DB cluster probably sees a lot of concurrent transactions. We are looking for information on how Neptune will handle the initial read query with, for example, 10 or more concurrent transactions. Will all 10+ transactions return the same vertexNumber, or will Neptune be smart enough to isolate these queries? Thanks!
3x1 · 11/7/2023
Hi, I suggest you have a look at this page: https://docs.aws.amazon.com/neptune/latest/userguide/transactions-neptune.html#transactions-neptune-read-only
More broadly, see this section about transactions: https://docs.aws.amazon.com/neptune/latest/userguide/transactions.html
Neptune is transactional, so as long as that part of your graph isn't modified, the read will keep returning the same value (and will most likely be served from cache to speed things up). In other words, concurrent transactions that run the read before any of them has written will all see the same maximum. As a side note, if you have concurrent threads running the example you show, it's very likely you will face race conditions. To avoid them, you either have to make sure only one thread updates a given part of the graph, or build a single query that achieves what you want (read + write in the same query).
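The race described above can be reproduced without Neptune at all. The following is a plain-Python simulation, not Neptune code: a list stands in for the graph, threads stand in for concurrent transactions, and `allocate_concurrently` is a hypothetical helper. A barrier forces every read to finish before any write, the worst-case interleaving for the read-then-write pattern. It illustrates the failure mode only and does not model Neptune's actual isolation or locking.

```python
import threading
from typing import List


def allocate_concurrently(num_workers: int, atomic: bool) -> List[int]:
    """Simulate `num_workers` concurrent writers that each take max(vertexNumber) + 1.

    A plain list stands in for the graph; nothing here talks to Neptune.
    """
    saved: List[int] = []            # vertexNumber values already "written"
    lock = threading.Lock()
    # The barrier holds every naive worker after its read until all workers have read,
    # which reproduces the worst-case interleaving deterministically.
    barrier = threading.Barrier(num_workers)

    def naive_worker() -> None:
        current_max = max(saved, default=0)  # read step, like ...values('vertexNumber').max()
        barrier.wait()                       # no write happens until every worker has read
        with lock:                           # protects only the append, not read + write
            saved.append(current_max + 1)    # write step, like addV(...).property(...)

    def atomic_worker() -> None:
        with lock:                           # read and write as one indivisible unit
            saved.append(max(saved, default=0) + 1)

    worker = atomic_worker if atomic else naive_worker
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(saved)


print(allocate_concurrently(10, atomic=False))  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(allocate_concurrently(10, atomic=True))   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The atomic variant corresponds to the "read + write in one unit" option; with a real database that unit would be a single query or some other serialization point, not a Python lock.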
triggan · 11/8/2023
Neither the max() nor the min() step in Gremlin currently takes advantage of Neptune's built-in indexes, so whenever you run one of them it effectively requires a full scan of all the values. As your data grows, these queries (even if served from the buffer pool cache) will keep getting slower, because they have more and more data to fetch. What is the purpose of the sequentially valued properties? Also note that each vertex and edge has its own ID. In Neptune, vertex and edge IDs must be unique, which is valuable in the sense that attempting to create another vertex or edge with the same ID throws an exception. (This is one of the few built-in constraints in Neptune.) If you don't supply an ID, Neptune uses a UUID in its place. However, it is a best practice to use a unique and deterministic value for vertex and edge IDs when possible. That makes simple lookup queries (g.V(<id>)) easy to express and also performant.
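To make these points concrete, here is a toy in-memory model in Python. `ToyGraph` is hypothetical and is not a Neptune or Gremlin API; it only mimics the access patterns: a max over a property has to visit every vertex, a lookup by a known ID is one keyed access, and reusing an ID is rejected. Real Neptune costs depend on its storage, indexes, and caching.

```python
from typing import Dict, Optional, Tuple


class ToyGraph:
    """Hypothetical in-memory stand-in used only to illustrate access patterns (not Neptune)."""

    def __init__(self) -> None:
        self.vertices: Dict[str, dict] = {}   # vertex ID -> properties

    def add_v(self, vertex_id: str, **props) -> None:
        # Like Neptune's ID constraint: reusing an existing vertex ID is an error.
        if vertex_id in self.vertices:
            raise ValueError(f"vertex id {vertex_id!r} already exists")
        self.vertices[vertex_id] = props

    def get_v(self, vertex_id: str) -> dict:
        # Lookup by ID is a single keyed access, the analogue of g.V(<id>).
        return self.vertices[vertex_id]

    def max_property(self, key: str) -> Tuple[Optional[int], int]:
        # With no index to use, max() must visit every vertex; also report how many it examined.
        best: Optional[int] = None
        examined = 0
        for props in self.vertices.values():
            examined += 1
            value = props.get(key)
            if value is not None and (best is None or value > best):
                best = value
        return best, examined


g = ToyGraph()
for n in range(1, 1001):
    g.add_v(f"ticket-{n}", vertexNumber=n)   # hypothetical deterministic IDs derived from the number
print(g.max_property("vertexNumber"))        # (1000, 1000): every vertex was examined
print(g.get_v("ticket-42"))                  # {'vertexNumber': 42}
try:
    g.add_v("ticket-42", vertexNumber=42)
except ValueError as err:
    print(err)                               # vertex id 'ticket-42' already exists
```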
Andys1814 · 11/8/2023
Thanks for the information. I've read all the Neptune docs and didn't feel they provided much clarity on whether my use case would work. And yes, we would absolutely have concurrent threads running the provided example. Like I said, we have a distributed architecture in which tens or hundreds of vertices can be ingested per second.

Thanks for the comment. And yes, the performance of max() was absolutely a concern for us, which is why we're putting that query behind a Redis cache. Ideally, the DB won't need to be hit very often, as the Redis cache should hold the most up-to-date sequential ID. We're not replacing the UUID convention that Neptune uses at all; the UUID will still be the primary identifier for vertices and edges. We are simply adding an additional property called vertexNumber. The purpose of the sequentially valued property is user experience and human readability. The best example I can give you of what we're trying to achieve is how ServiceNow does it: https://i.imgur.com/zYOFnE5.png . However, they still use a UUID on the backend, probably for security and scaling purposes: https://i.imgur.com/tXNBNiT.png . This is exactly what we're going for. UUIDs are a nice, easy solution, but we need a way for our users to identify something without displaying a 32-character string of random, non-human-readable junk on the frontend.
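One way to keep a cache layer from reintroducing the race is to have the cache perform the increment itself, rather than caching the max and adding 1 in application code; Redis's INCR command increments a single key atomically. The Python sketch below models that pattern with a lock standing in for Redis. `SequenceCounter` and its `load_current_max` callback (standing in for the max() query) are hypothetical. It assumes one counter shared by all writers and ignores cache eviction and failover, which a real deployment would have to handle, for example by re-seeding carefully from the database.

```python
import threading
from typing import Callable, List


class SequenceCounter:
    """Hypothetical stand-in for a shared counter such as a single Redis key updated with INCR.

    The lock plays the role of Redis applying INCR atomically; this is not a Redis client.
    """

    def __init__(self, load_current_max: Callable[[], int]) -> None:
        # Seed once from the database (e.g. the max() query), then never re-read it.
        self._value = load_current_max()
        self._lock = threading.Lock()

    def next_value(self) -> int:
        # Increment and read in one step, so two callers can never get the same number.
        with self._lock:
            self._value += 1
            return self._value


counter = SequenceCounter(lambda: 41)   # pretend max() returned 41
print(counter.next_value())             # 42

issued: List[int] = []
issued_lock = threading.Lock()


def ingest_batch() -> None:
    numbers = [counter.next_value() for _ in range(100)]
    with issued_lock:
        issued.extend(numbers)


workers = [threading.Thread(target=ingest_batch) for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(issued), len(set(issued)))    # 800 800: no duplicates across threads
```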
triggan · 11/8/2023
Makes sense. You might also consider supplying your own values as the vertex IDs instead of using UUIDs. That would ensure that your IDs are unique across the entire graph.
Dragos Ciupureanu · 11/8/2023
Another approach would be to generate "readable" IDs such as the ones from nanoid: https://github.com/ai/nanoid and https://zelark.github.io/nano-id-cc/ . The second link is a collision-probability calculator: at 1k IDs/s, using only digits and uppercase letters with an ID length of 16 characters, it takes ~13 years to reach a 1% chance of collision. What I'm trying to say is that if the UX is not greatly impacted by not having a sequential number, perhaps you could do without the headache of managing the IDs yourself.
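As a sanity check on the ~13-year figure, here is a standard-library Python sketch. `readable_id` and `years_to_collision_probability` are hypothetical helpers, not part of nanoid; the first draws IDs from the same 36-symbol alphabet, and the second applies the usual birthday-bound approximation. It assumes uniformly random IDs generated at a constant rate; the linked calculator may use a slightly different approximation.

```python
import math
import secrets
import string

ALPHABET = string.digits + string.ascii_uppercase   # 36 symbols: 0-9 and A-Z


def readable_id(length: int = 16, alphabet: str = ALPHABET) -> str:
    # Uniformly random symbols from a CSPRNG, in the spirit of nanoid (not the nanoid package).
    return "".join(secrets.choice(alphabet) for _ in range(length))


def years_to_collision_probability(length: int, ids_per_second: float,
                                   p: float = 0.01, alphabet_size: int = len(ALPHABET)) -> float:
    # Birthday approximation: after n random IDs drawn from N = alphabet_size ** length values,
    # P(collision) ~= 1 - exp(-n^2 / (2N)), so reaching probability p takes n ~= sqrt(2N ln(1/(1-p))).
    space = alphabet_size ** length
    n = math.sqrt(2 * space * math.log(1 / (1 - p)))
    return n / ids_per_second / (365 * 24 * 3600)   # seconds -> years


print(readable_id())                                        # a random 16-character ID
print(round(years_to_collision_probability(16, 1000), 1))   # 12.7
```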
Andys1814 · 11/8/2023
I have considered this, because I think it'd be great to have the built-in uniqueness constraint, but I'm a little worried that vertices with conflicting IDs will cause us to miss some of the data being ingested. We would definitely need to handle whatever exception Neptune throws when we attempt to add vertices with non-unique IDs.
Dragos Ciupureanu · 11/8/2023
Correct, in which case you retry the call with a different ID.
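The retry loop can be sketched in Python as follows. `DuplicateIdError`, `add_vertex`, and `insert_with_retry` are hypothetical names, and a dict stands in for the graph; against Neptune you would catch the specific error your client reports for a duplicate ID instead. Capping the attempts keeps a misbehaving ID generator from looping forever.

```python
from typing import Callable, Dict


class DuplicateIdError(Exception):
    """Hypothetical stand-in for the error the database reports on a duplicate vertex ID."""


def add_vertex(store: Dict[str, dict], vertex_id: str, props: dict) -> None:
    # A dict plays the graph; like Neptune, it refuses a second vertex with the same ID.
    if vertex_id in store:
        raise DuplicateIdError(vertex_id)
    store[vertex_id] = props


def insert_with_retry(store: Dict[str, dict], props: dict,
                      new_id: Callable[[], str], max_attempts: int = 5) -> str:
    """Insert `props` under a freshly generated ID, drawing a new ID after each collision."""
    for _ in range(max_attempts):
        candidate = new_id()
        try:
            add_vertex(store, candidate, props)
            return candidate
        except DuplicateIdError:
            continue   # collision: generate a different ID and try again
    raise RuntimeError(f"gave up after {max_attempts} colliding IDs")


graph = {"7K2Q": {"title": "existing"}}
ids = iter(["7K2Q", "M4XD"])                                          # first candidate collides on purpose
print(insert_with_retry(graph, {"title": "new"}, lambda: next(ids)))  # M4XD
```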
Andys1814 · 11/8/2023
This is a good note. 16 characters might be a little more than I'd hope for, but maybe I can find a balance between the character count and a practically low chance of collision. Of course, I'd still like to do uniqueness validation before saving to the database, but luckily we're still going to use the UUID as the actual primary ID, so it's not the end of the world if we get a very rare collision.
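To explore that balance, you can invert the birthday-bound estimate and ask for the shortest ID that keeps the collision probability under a target for a given ingest rate and time horizon. The Python helper below (`min_id_length`, hypothetical) assumes uniformly random IDs and a constant generation rate; it shows how the ingest rate, the horizon, and the alphabet size each shift the required length.

```python
import math


def min_id_length(alphabet_size: int, ids_per_second: float, years: float, p: float = 0.01) -> int:
    """Shortest random-ID length whose estimated collision probability stays below p for `years`."""
    n = ids_per_second * years * 365 * 24 * 3600   # total IDs generated over the horizon
    # Birthday approximation: need N >= n^2 / (2 ln(1/(1-p))) possible IDs, with N = alphabet_size ** length.
    required_space = n * n / (2 * math.log(1 / (1 - p)))
    return math.ceil(math.log(required_space) / math.log(alphabet_size))


print(min_id_length(36, 1000, 10))   # 16 (digits + uppercase, 1k IDs/s, 10 years)
print(min_id_length(36, 1000, 1))    # 15
print(min_id_length(62, 1000, 10))   # 14 (digits + uppercase + lowercase)
print(min_id_length(36, 100, 10))    # 15
```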
Dragos Ciupureanu · 11/8/2023
If you really want uniqueness, you can use this ID as the vertex/edge ID, and Neptune takes care of that for you.
Dave Bechberger · 11/8/2023
@Andys1814 How much do you care about the IDs being truly sequential, or are some gaps acceptable as long as the IDs are human-readable? I ask because this was a common request when I was working with Cassandra. A common practice was to allot a range of IDs to each client on connection rather than fetching a new ID each time. When a client exhausts its assigned range, it reaches out to get a new range.
This helps minimize the single point of failure and the additional overhead of going to a single coordinator for an ID value on every request. It does, however, mean that inserts will not be in sequential order and that you may have gaps in the numbers. This may or may not be an issue depending on your use case.
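The range-allocation idea can be sketched in Python as below. `RangeCoordinator` and `RangeClient` are hypothetical; in practice the coordinator's counter would live in a shared store (for example, an atomic counter advanced by the block size), not in process memory, and each `RangeClient` is meant to be owned by a single worker or connection. Each client touches shared state once per block instead of once per vertex, at the cost of gaps and out-of-order numbers.

```python
import threading
from typing import Iterator


class RangeCoordinator:
    """Hypothetical coordinator that hands out disjoint blocks of sequence numbers."""

    def __init__(self, block_size: int, start: int = 1) -> None:
        self._next_start = start
        self._block_size = block_size
        self._lock = threading.Lock()   # reserving a block must be atomic across clients

    def reserve_block(self) -> range:
        with self._lock:
            block = range(self._next_start, self._next_start + self._block_size)
            self._next_start += self._block_size
            return block


class RangeClient:
    """Hands out numbers from its own block; contacts the coordinator only when the block runs out."""

    def __init__(self, coordinator: RangeCoordinator) -> None:
        self._coordinator = coordinator
        self._remaining: Iterator[int] = iter(())   # start with an empty block

    def next_number(self) -> int:
        n = next(self._remaining, None)
        if n is None:                                # block exhausted: fetch a new one
            self._remaining = iter(self._coordinator.reserve_block())
            n = next(self._remaining)
        return n


coordinator = RangeCoordinator(block_size=100)
client_a, client_b = RangeClient(coordinator), RangeClient(coordinator)
print([client_a.next_number() for _ in range(3)])   # [1, 2, 3]
print([client_b.next_number() for _ in range(2)])   # [101, 102]
print(client_a.next_number())                       # 4 (numbers are not issued in global order)
```

If client_a stopped at this point, 5 through 100 would never be used, which is the kind of gap described above.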
Andys1814 · 11/10/2023
This is a great idea. We're not really worried about some gaps. The range idea definitely mitigates the negative effects when collisions do happen and retries are necessary.
