Dragos Ciupureanu 10/19/2023
Hi, I'm trying to ingest some data into AWS Neptune and due to its size I'm forced to use a bulk data importer (unless there's a bulk-insert functionality straight from Gremlin - I couldn't find this). Looking at the GraphSON schema/docs I see there are some IDs on the edges that I am not sure how/if I need to generate. I'm doing this mapping in Python (but can be done in other languages if there's better support). Any recommendations/tips for this?
Do you already have data in GraphSON format? Or do you just need to use a bulk importer? If the latter, Neptune has its own bulk load feature:
It appears we've missed this question on bulk loading on @neptune - does anyone have any tips for this?
Dragos Ciupureanu 10/24/2023
Thanks for the suggestion. Yes, in the end I used the bulk loader as it's easy to export from Pandas to the Gremlin CSV format. 👍
So we need to split the data up into edges and vertices, right?
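For anyone landing here with the same edge-ID question, here is a minimal sketch of the two files the bulk loader's Gremlin CSV format expects: one for vertices and one for edges. It uses only the standard library `csv` module; the same columns can be written from a DataFrame with `to_csv(index=False)`. The sample records, the `person`/`knows` labels, and the `src-label-dst` edge ID scheme are all assumptions for illustration. Any ID scheme works as long as the IDs are unique.

```python
import csv
import io

# Hypothetical sample records; replace with your own data.
people = [
    {"id": "p1", "name": "Alice", "age": 34},
    {"id": "p2", "name": "Bob", "age": 41},
]
knows = [("p1", "p2")]  # (source vertex ID, target vertex ID)


def vertices_csv(rows, label):
    """Render vertices with the ~id / ~label system columns plus typed properties."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["~id", "~label", "name:String", "age:Int"])
    for r in rows:
        writer.writerow([r["id"], label, r["name"], r["age"]])
    return buf.getvalue()


def edges_csv(pairs, label):
    """Render edges with the ~id / ~from / ~to / ~label system columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["~id", "~from", "~to", "~label"])
    for src, dst in pairs:
        # Assumed scheme: build a deterministic edge ID from the endpoints and label.
        writer.writerow([f"{src}-{label}-{dst}", src, dst, label])
    return buf.getvalue()


vertex_text = vertices_csv(people, "person")
edge_text = edges_csv(knows, "knows")
print(vertex_text)
print(edge_text)
```

A deterministic edge ID like this makes re-running an export repeatable, which is handy if a load has to be retried.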
@Dragos Ciupureanu - Neptune also has Pandas support through the AWS SDK for Pandas:
Dragos Ciupureanu 10/25/2023
Nice, didn't know about this. Thanks @triggan. Whilst on the same Neptune topic, do you happen to know if I can get graph embeddings out of the graph? I see they use RotatE for link prediction, but I just want the embeddings. From what I've looked at, I couldn't find anything in their examples. Similarly, does a Gremlin response hook into something that can return embeddings for a subgraph?
Neptune doesn't provide embeddings directly. Best I can say, for now, is "watch this space" as there is a lot happening around this at the moment. At present, you would use Gremlin to fetch the subgraphs and then feed them into a separate library or model to generate the embeddings. Two popular ones used for this in the graph arena are GraphSAGE and GraphStorm (
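As a rough sketch of the "fetch, then feed into a separate library" step: once the subgraph's edges have been pulled out of Gremlin as (source ID, target ID) pairs, they usually need remapping to contiguous integer indices before an embedding library will accept them. The helper name `build_edge_index` and the sample edges are hypothetical, and the two-row source/target layout is an assumption about what the downstream library wants; the Gremlin query itself is left out.

```python
def build_edge_index(edges):
    """Map arbitrary vertex IDs to 0..n-1 and return a two-row edge list."""
    node_index = {}
    sources, targets = [], []
    for src, dst in edges:
        for vertex_id in (src, dst):
            # Assign the next free integer the first time a vertex is seen.
            if vertex_id not in node_index:
                node_index[vertex_id] = len(node_index)
        sources.append(node_index[src])
        targets.append(node_index[dst])
    return node_index, [sources, targets]


# Hypothetical edges, as they might look after being fetched from the graph.
edges = [("a", "b"), ("b", "c"), ("a", "c")]
node_index, edge_index = build_edge_index(edges)
print(node_index)
print(edge_index)
```

Keeping `node_index` around lets you map the resulting embedding rows back to the original vertex IDs.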
Dragos Ciupureanu 10/25/2023
That's very useful, thanks @triggan

