GraphSON mapper

Hi, I'm trying to ingest some data into AWS Neptune, and because of its size I'm forced to use a bulk data importer (unless there's a bulk-insert feature available directly from Gremlin; I couldn't find one). Looking at the GraphSON schema/docs, I see there are IDs on the edges, and I'm not sure whether or how I need to generate them: https://tinkerpop.apache.org/docs/current/dev/io/#graphson-3d0. I'm doing this mapping in Python (but it can be done in another language if there's better support). Any recommendations or tips?
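On the edge-ID question: whichever format you end up writing, edge IDs are values you supply yourself (a GraphSON edge carries whatever `id` it has, and Neptune's Gremlin CSV edge files have an `~id` column). Below is a minimal sketch of one approach, which is my assumption rather than anything the TinkerPop or Neptune docs prescribe: derive each ID deterministically from the edge's endpoints and label with `uuid.uuid5`, so re-running the export yields the same IDs. `EDGE_NAMESPACE` and `edge_id` are hypothetical names.

```python
import uuid

# Hypothetical namespace for this dataset. Any fixed UUID works, as long as it
# stays the same between exports so the generated edge IDs are reproducible.
EDGE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "example.com/my-graph/edges")


def edge_id(out_id: str, label: str, in_id: str) -> str:
    """Derive a stable edge ID from (out vertex, label, in vertex).

    The same triple always maps to the same UUID string; swapping direction
    or changing the label gives a different ID.
    """
    key = f"{out_id}|{label}|{in_id}"
    return str(uuid.uuid5(EDGE_NAMESPACE, key))


if __name__ == "__main__":
    print(edge_id("person-1", "knows", "person-2"))
```

One trade-off of this design: two genuinely distinct edges with the same endpoints and label would get the same ID. If your data has such parallel edges, include a distinguishing field (e.g. a row number or timestamp) in `key`.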
spmallette • 8mo ago
it appears we've missed this question on bulk loading on @neptune - anyone have any tips for this?
triggan • 8mo ago (accepted solution)
Do you already have data in GraphSON format? Or do you just need to use a bulk importer? If the latter, Neptune has its own bulk load feature: https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html
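For reference, a bulk load is started by POSTing a JSON request to the cluster's `/loader` endpoint, pointing at files already staged in S3. The sketch below assumes IAM database authentication is disabled (with IAM auth on, the request must be SigV4-signed, which this does not do). The bucket, role ARN, region and endpoint values are placeholders, and `build_load_request` / `start_load` are hypothetical helper names; check the loader command reference for the full parameter list.

```python
import json
import urllib.request


def build_load_request(source: str, iam_role_arn: str, region: str) -> dict:
    """Build the JSON body for a Neptune bulk-load request (Gremlin CSV)."""
    return {
        "source": source,            # S3 URI/prefix holding the CSV files
        "format": "csv",             # Gremlin CSV load format
        "iamRoleArn": iam_role_arn,  # role Neptune assumes to read from S3
        "region": region,            # region of the S3 bucket / cluster
        "failOnError": "FALSE",      # keep loading past bad rows
        "parallelism": "MEDIUM",
    }


def start_load(endpoint: str, body: dict) -> str:
    """POST the request to the loader endpoint and return the load ID.

    Assumes no IAM auth and network access to the cluster (e.g. from
    inside the same VPC).
    """
    req = urllib.request.Request(
        f"https://{endpoint}:8182/loader",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["payload"]["loadId"]
```

The returned load ID can then be polled with a GET on `/loader/<loadId>` to follow progress.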
Dragos Ciupureanu • 8mo ago
Thanks for the suggestion. Yes, in the end I used the bulk loader, as it's easy to export from Pandas to the Gremlin CSV format. 👍
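To illustrate what "the Gremlin CSV format" looks like: the header row uses system columns such as `~id` and `~label`, and property columns carry a type suffix like `name:String`. This is a minimal sketch with a hypothetical `person` schema (`id`, `label`, `name`, `age`), written with the standard `csv` module so it runs without extra dependencies.

```python
import csv
import io


def write_vertices_csv(rows, out):
    """Write vertex rows as Gremlin CSV.

    rows: iterable of dicts with keys id, label, name, age (hypothetical schema).
    out: any text file-like object opened with newline="".
    """
    writer = csv.writer(out)
    # System columns first, then typed property columns.
    writer.writerow(["~id", "~label", "name:String", "age:Int"])
    for r in rows:
        writer.writerow([r["id"], r["label"], r["name"], r["age"]])


if __name__ == "__main__":
    buf = io.StringIO()
    write_vertices_csv([{"id": "p1", "label": "person", "name": "Ada", "age": 36}], buf)
    print(buf.getvalue())
```

With Pandas, the equivalent is mostly a column rename before export, e.g. `df.rename(columns={"id": "~id", "label": "~label", "name": "name:String", "age": "age:Int"}).to_csv("vertices.csv", index=False)`.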
ManabuBeach • 8mo ago
So we need to split the data into separate edge files and vertex files, right?
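As I understand the Gremlin CSV format, each file holds either vertices or edges, and the header row (`~from`/`~to` present or not) tells the loader which. A minimal sketch of splitting a plain edge list of `(source, label, target)` triples into the two row sets; the default vertex label `"node"` and the `src-label-dst` edge-ID scheme are assumptions for illustration, and `split_triples` / `write_files` are hypothetical names.

```python
import csv


def split_triples(triples, vertex_label="node"):
    """Split (src, label, dst) triples into vertex and edge rows.

    Returns (vertices, edges): vertices maps vertex id -> label, and
    edges is a list of (edge_id, from, to, label) tuples.
    """
    vertices = {}
    edges = []
    for src, label, dst in triples:
        # Every endpoint becomes a vertex; setdefault keeps the first label seen.
        vertices.setdefault(src, vertex_label)
        vertices.setdefault(dst, vertex_label)
        edges.append((f"{src}-{label}-{dst}", src, dst, label))
    return vertices, edges


def write_files(triples, vertex_path, edge_path):
    """Write separate vertex and edge CSV files in Gremlin CSV layout."""
    vertices, edges = split_triples(triples)
    with open(vertex_path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["~id", "~label"])
        w.writerows(sorted(vertices.items()))
    with open(edge_path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["~id", "~from", "~to", "~label"])
        w.writerows(edges)
```

Both files can sit under the same S3 prefix and be loaded in one request.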
triggan • 8mo ago
@Dragos Ciupureanu - Neptune also has Pandas support through the AWS SDK for Pandas: https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/033%20-%20Amazon%20Neptune.ipynb
Dragos Ciupureanu • 8mo ago
Nice, I didn't know about this. Thanks @triggan! While we're on the Neptune topic, do you happen to know whether I can get graph embeddings out of the graph? I see they use RotatE for link prediction, but I just want the embeddings, and I couldn't find anything about this in their examples. Similarly, can a Gremlin response be hooked up to something that returns embeddings for a subgraph?
triggan • 8mo ago
Neptune doesn't provide embeddings directly. The best I can say for now is "watch this space", as there is a lot happening around this at the moment. At present, you would use Gremlin to fetch the subgraphs and then feed them into a separate library or model to generate the embeddings. Two popular choices for this in the graph space are GraphSAGE and GraphStorm (https://graphstorm.readthedocs.io/en/latest/tutorials/quick-start.html#generating-embedding).
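A rough sketch of the "fetch with Gremlin, then hand off" step, under several assumptions: gremlinpython is installed, the cluster has no IAM auth, `seed_id` is a trusted value (it is interpolated into the query string), and the downstream model wants an integer edge list in the `[2, num_edges]` `edge_index` layout that, for example, PyTorch Geometric's GraphSAGE layers use. GraphStorm has its own data-processing pipeline, so check its docs for its input format. `fetch_edges` and `to_edge_index` are hypothetical helpers.

```python
def fetch_edges(endpoint, seed_id):
    """Pull the 1-hop edges around seed_id as [{'src': ..., 'dst': ...}, ...].

    Hypothetical helper: needs gremlinpython and network access to Neptune.
    """
    from gremlin_python.driver import client  # imported lazily on purpose

    c = client.Client(f"wss://{endpoint}:8182/gremlin", "g")
    try:
        query = (
            f"g.V('{seed_id}').bothE()"
            ".project('src','dst').by(outV().id()).by(inV().id())"
        )
        return c.submit(query).all().result()
    finally:
        c.close()


def to_edge_index(rows):
    """Map vertex IDs to contiguous ints and build a 2 x num_edges edge list.

    Returns (node_index, [src_indices, dst_indices]).
    """
    node_index = {}
    src, dst = [], []
    for r in rows:
        for key, target in (("src", src), ("dst", dst)):
            nid = r[key]
            if nid not in node_index:
                node_index[nid] = len(node_index)  # next unused integer
            target.append(node_index[nid])
    return node_index, [src, dst]
```

Keeping `node_index` around lets you map the model's per-node embeddings back to Neptune vertex IDs afterwards.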
Dragos Ciupureanu • 8mo ago
That's very useful, thanks @triggan