GraphSON mapper
Hi,
I'm trying to ingest some data into AWS Neptune, and because of its size I'm forced to use a bulk data importer (unless there's a bulk-insert feature straight from Gremlin; I couldn't find one). Looking at the GraphSON schema/docs, I see there are IDs on the edges, and I'm not sure how, or whether, I need to generate them.
https://tinkerpop.apache.org/docs/current/dev/io/#graphson-3d0
I'm doing this mapping in Python (but it can be done in other languages if there's better support). Any recommendations or tips?
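In Neptune, element IDs are strings and can be user-supplied, so you are free to choose an edge-ID scheme. Here is a minimal sketch, assuming you want edge IDs that stay the same across re-runs of your export: derive each ID from the edge's endpoints and label with `uuid.uuid5`. The `edge_id` helper and its `"from|label|to"` key are hypothetical, not something defined by GraphSON or Neptune.

```python
import uuid


def edge_id(from_id: str, label: str, to_id: str) -> str:
    """Derive a stable edge ID from (from, label, to).

    Hypothetical scheme: uuid5 is deterministic, so exporting the same
    data twice yields the same IDs. It assumes at most one edge per
    (from, label, to) triple; parallel edges with the same label would
    collide and need an extra discriminator in the key.
    """
    key = f"{from_id}|{label}|{to_id}"
    return str(uuid.uuid5(uuid.NAMESPACE_URL, key))


if __name__ == "__main__":
    print(edge_id("person-1", "knows", "person-2"))
```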
It appears we've missed this question on bulk loading for @neptune. Does anyone have any tips?
Solution
Do you already have data in GraphSON format? Or do you just need to use a bulk importer? If the latter, Neptune has its own bulk load feature: https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html
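For reference, a bulk load is started with an HTTP POST to the cluster's `/loader` endpoint. The sketch below only builds that request with the standard library; the endpoint, S3 prefix, role ARN and region values are placeholders, and you should check the linked docs for the full parameter list. It assumes IAM database authentication is off; with it on, the request must also be SigV4-signed, which is not shown here.

```python
import json
import urllib.request


def build_load_request(endpoint: str, source: str, iam_role_arn: str,
                       region: str, port: int = 8182) -> urllib.request.Request:
    """Build (but do not send) a Neptune bulk-loader request."""
    payload = {
        "source": source,            # S3 prefix holding the CSV files
        "format": "csv",             # Gremlin CSV load format
        "iamRoleArn": iam_role_arn,  # role Neptune assumes to read from S3
        "region": region,
        "failOnError": "FALSE",      # keep loading past bad rows
    }
    return urllib.request.Request(
        url=f"https://{endpoint}:{port}/loader",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_load_request(
        "example-endpoint",                                  # placeholder
        "s3://example-bucket/graph/",                        # placeholder
        "arn:aws:iam::123456789012:role/ExampleLoaderRole",  # placeholder
        "us-east-1",
    )
    print(req.get_method(), req.full_url)
```

Sending it (e.g. with `urllib.request.urlopen(req)`) has to happen from somewhere that can reach the cluster, typically inside its VPC.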
Thanks for the suggestion. Yes, in the end I used the bulk loader as it's easy to export from Pandas to the Gremlin CSV format. 👍
We need to split the data into edge JSONs and vertex JSONs, right?
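A minimal sketch of the Pandas-to-Gremlin-CSV export mentioned above. As we understand Neptune's Gremlin CSV format, each file is either a vertex file (`~id`, `~label`) or an edge file (`~id`, `~from`, `~to`, `~label`), identified by its header row, so the sketch produces two separate frames. The `people`/`knows` input columns, the `person`/`knows` labels and the `knows-<src>-<dst>` edge-ID scheme are all hypothetical.

```python
import pandas as pd


def to_vertex_frame(people: pd.DataFrame) -> pd.DataFrame:
    # System columns ~id/~label, plus typed properties as "name:Type".
    return pd.DataFrame({
        "~id": people["id"].astype(str),
        "~label": "person",  # scalar is broadcast to every row
        "name:String": people["name"],
        "age:Int": people["age"],
    })


def to_edge_frame(knows: pd.DataFrame) -> pd.DataFrame:
    src = knows["src"].astype(str)
    dst = knows["dst"].astype(str)
    return pd.DataFrame({
        # Hypothetical ID scheme; collides if there are parallel edges.
        "~id": "knows-" + src + "-" + dst,
        "~from": src,
        "~to": dst,
        "~label": "knows",
        "since:Int": knows["since"],
    })


if __name__ == "__main__":
    people = pd.DataFrame({"id": [1, 2], "name": ["Ana", "Bo"], "age": [34, 29]})
    knows = pd.DataFrame({"src": [1], "dst": [2], "since": [2020]})
    print(to_vertex_frame(people).to_csv(index=False))
    print(to_edge_frame(knows).to_csv(index=False))
```

In practice you would call `to_csv("vertices.csv", index=False)` and `to_csv("edges.csv", index=False)` instead, upload both files under one S3 prefix, and point the loader's `source` at that prefix.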
@Dragos Ciupureanu - Neptune also has Pandas support through the AWS SDK for Pandas: https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/033%20-%20Amazon%20Neptune.ipynb
Nice, didn't know about this. Thanks @triggan
Whilst on the same Neptune topic, do you happen to know if I can get graph embeddings out of the graph? I see they use RotatE for link prediction, but I just want the embeddings, and I couldn't find anything for that in their examples. Similarly, can a Gremlin response be hooked into something that returns embeddings for a subgraph?
Neptune doesn't provide embeddings directly. The best I can say for now is "watch this space", as there is a lot happening around this at the moment. At present, you would use Gremlin to fetch the subgraphs and then feed them into a separate library or model to generate the embeddings. Two popular options in the graph space are GraphSAGE and GraphStorm (https://graphstorm.readthedocs.io/en/latest/tutorials/quick-start.html#generating-embedding).
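Between "fetch the subgraph with Gremlin" and "feed it to an embedding library" there is usually a small conversion step: Neptune IDs are strings, while GNN libraries generally want dense integer node indices. A minimal sketch, assuming you already have the subgraph as `(source_id, target_id)` pairs (for example, from a traversal that projects each edge's outV and inV IDs). It returns the node-ID-to-index map plus a 2×E edge list in the layout PyTorch Geometric uses for `edge_index`. GraphStorm has its own data-preparation pipeline, so check its docs for the input it expects.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def to_edge_index(
    edges: Iterable[Tuple[Hashable, Hashable]],
) -> Tuple[Dict[Hashable, int], List[List[int]]]:
    """Map arbitrary node IDs to 0..N-1 and return a 2 x E edge list."""
    node_index: Dict[Hashable, int] = {}

    def idx(node: Hashable) -> int:
        # Assign the next free integer the first time a node is seen.
        if node not in node_index:
            node_index[node] = len(node_index)
        return node_index[node]

    sources: List[int] = []
    targets: List[int] = []
    for src, dst in edges:
        sources.append(idx(src))
        targets.append(idx(dst))
    return node_index, [sources, targets]


if __name__ == "__main__":
    mapping, edge_index = to_edge_index([("a", "b"), ("b", "c")])
    print(mapping)     # {'a': 0, 'b': 1, 'c': 2}
    print(edge_index)  # [[0, 1], [1, 2]]
```

Keep the `mapping` dict around so the resulting embedding rows can be joined back to the original Neptune vertex IDs.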
That's very useful, thanks @triggan