Apache TinkerPop•2y ago

GraphSON mapper

Hi, I'm trying to ingest some data into AWS Neptune and due to its size I'm forced to use a bulk data importer https://tinkerpop.apache.org/docs/current/dev/io/#graphson-3d0 (unless there's a bulk-insert functionality straight from Gremlin - I couldn't find this). Looking at the GraphSON schema/docs I see there are some IDs on the edges that I am not sure how/if I need to generate. https://tinkerpop.apache.org/docs/current/dev/io/#graphson-3d0 I'm doing this mapping in Python (but can be done in other languages if there's better support). Any recommendations/tips for this?

Solution:

Do you already have data in GraphSON format? Or do you just need to use a bulk importer? If the latter, Neptune has its own bulk load feature: https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html

Using the Amazon Neptune Bulk Loader to Ingest Data - Amazon Neptune

Overview of how to load data from external files into a Neptune DB instance using the Neptune bulk loader.


8 Replies

spmallette•2y ago

it appears we've missed this question on bulk loading on @neptune - anyone have any tips for this?

Solution

triggan•2y ago

Using the Amazon Neptune Bulk Loader to Ingest Data - Amazon Neptune

Overview of how to load data from external files into a Neptune DB instance using the Neptune bulk loader.
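
For anyone landing here later: a bulk load job is started by POSTing a JSON body to the cluster's `/loader` endpoint. Below is a minimal sketch that only builds that request with the Python standard library and does not send it. The endpoint host, S3 URI, and IAM role ARN are placeholders (assumptions, not values from this thread), and the parameters shown are a small subset as I understand the loader docs linked above.

```python
import json
import urllib.request


def build_loader_request(endpoint, s3_uri, role_arn, region):
    """Build (but do not send) a POST request for the Neptune bulk loader."""
    payload = {
        "source": s3_uri,        # S3 prefix or object holding the data files
        "format": "csv",         # "csv" selects the Gremlin CSV format
        "iamRoleArn": role_arn,  # role that allows Neptune to read from S3
        "region": region,        # region of the S3 bucket
        "failOnError": "FALSE",  # keep loading past individual bad records
    }
    return urllib.request.Request(
        url=f"https://{endpoint}:8182/loader",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values only; replace with your own cluster details.
request = build_loader_request(
    endpoint="my-cluster.example.neptune.amazonaws.com",
    s3_uri="s3://my-bucket/graph-data/",
    role_arn="arn:aws:iam::123456789012:role/MyNeptuneLoadRole",
    region="eu-west-1",
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Sending it would be `urllib.request.urlopen(request)` from a machine that can reach the cluster inside its VPC. If IAM database authentication is enabled, the request also needs SigV4 signing, which this sketch leaves out.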

Dragos CiupureanuOP•2y ago

Thanks for the suggestion. Yes, in the end I used the bulk loader as it's easy to export from Pandas to the Gremlin CSV format. 👍
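
In case it helps others, here is a rough sketch of what that export looks like. The Gremlin CSV format uses one file for vertices (`~id`, `~label`, then property columns such as `name:String`) and a separate file for edges (`~id`, `~from`, `~to`, `~label`). The sample records, labels, and the edge ID scheme below are made up for illustration; the sketch uses the standard `csv` module so it runs without pandas, but a DataFrame with the same column headers written via `to_csv(index=False)` should give an equivalent file.

```python
import csv
import io


def write_gremlin_csv(rows, header):
    """Serialize a list of dict rows to CSV text with the given header order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical source data: people and "follows" relationships between them.
people = [{"id": "p1", "name": "Alice"}, {"id": "p2", "name": "Bob"}]
follows = [("p1", "p2")]

vertex_rows = [
    {"~id": p["id"], "~label": "person", "name:String": p["name"]}
    for p in people
]

# Edge IDs are derived from the endpoints and label so that re-running the
# export yields the same IDs (an illustrative choice, not a requirement).
edge_rows = [
    {"~id": f"{src}-follows-{dst}", "~from": src, "~to": dst, "~label": "follows"}
    for src, dst in follows
]

vertices_csv = write_gremlin_csv(vertex_rows, ["~id", "~label", "name:String"])
edges_csv = write_gremlin_csv(edge_rows, ["~id", "~from", "~to", "~label"])

print(vertices_csv)
print(edges_csv)
```

Deriving the edge ID from its endpoints is one way to handle the "do I need to generate edge IDs" question from the original post; any scheme that produces unique strings would do.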

ManabuBeach•2y ago

Do we need to split the data up into separate edge and vertex files?

triggan•2y ago

@Dragos Ciupureanu - Neptune also has Pandas support through the AWS SDK for Pandas: https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/033%20-%20Amazon%20Neptune.ipynb

GitHub

aws-sdk-pandas/tutorials/033 - Amazon Neptune.ipynb at main · aws/a...

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (...

Dragos CiupureanuOP•2y ago

Nice, didn't know about this. Thanks @triggan. Whilst on the same Neptune topic, do you happen to know if I can get graph embeddings out of the graph? I see they use RotatE for link prediction, but I just want the embeddings. From what I've looked at, I couldn't find anything in their examples. Similarly, does a Gremlin response hook into something that can return embeddings for a subgraph?

triggan•2y ago

Neptune doesn't provide embeddings directly. Best I can say, for now, is "watch this space" as there is a lot happening around this at the moment. At present, you would use Gremlin to fetch the subgraphs and then feed them into a separate library or model to generate the embeddings. Two popular ones used for this in the graph arena tend to be GraphSAGE and GraphStorm (https://graphstorm.readthedocs.io/en/latest/tutorials/quick-start.html#generating-embedding).
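
As a rough illustration of the "fetch, then feed" step: most embedding libraries want integer node indices and an edge list rather than database IDs. The sketch below assumes you have already pulled `(from_id, to_id)` pairs out of the graph, for example with a traversal along the lines of `g.E().project('from','to').by(outV().id()).by(inV().id())` (that query and the sample IDs are assumptions for illustration, not something Neptune prescribes), and remaps them into a two-row edge index.

```python
def build_edge_index(edges):
    """Map arbitrary vertex IDs to 0..N-1 and return (node_index, edge_index).

    edge_index is [sources, targets], a layout many GNN libraries accept.
    """
    node_index = {}
    sources, targets = [], []
    for src, dst in edges:
        for node_id in (src, dst):
            # Assign the next free integer the first time an ID is seen.
            node_index.setdefault(node_id, len(node_index))
        sources.append(node_index[src])
        targets.append(node_index[dst])
    return node_index, [sources, targets]


# Hypothetical edge pairs as they might come back from a Gremlin query.
edges = [("p1", "p2"), ("p2", "p3"), ("p1", "p3")]
node_index, edge_index = build_edge_index(edges)

print(node_index)
print(edge_index)
```

Keep `node_index` around so the rows of whatever embedding matrix the model produces can be mapped back to the original vertex IDs.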

Dragos CiupureanuOP•2y ago

That's very useful, thanks @triggan
