Neptune Local Development Access Methods

Question for the Neptune folks - I'm trying to cobble together ergonomic local access to our development cluster. Here's what I've got that works:
- SSM session running via an EC2 bastion host, which makes the targeted cluster (using AWS-StartPortForwardingSessionToRemoteHost) available at localhost:8182
- A mapping in /etc/hosts resolving <my_cluster_url> to localhost

In combination, these enable me to use the same SigV4Auth signing approach (with headers) that works from a within-VPC context to access the cluster from a Python script/Jupyter notebook - https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-python.html

I have been trying to come up with an approach for handling the /etc/hosts edit that doesn't require sudo access or changes to system files. In principle, it should be possible to insert custom DNS resolution into the Python script, but I'm having no joy. Has anyone else previously hit and overcome this snag in this fashion/another way?
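For readers wondering what "custom DNS resolution inside the script" could look like, here is a minimal sketch using only the standard library. It temporarily patches `socket.getaddrinfo` so that one hostname resolves to 127.0.0.1 for the current process. The helper name `resolve_locally` and the cluster hostname are hypothetical. Whether a given client honours the patch is an assumption to verify: libraries that ship their own resolver will bypass it, which may be why this route gave the poster no joy.

```python
import contextlib
import socket


@contextlib.contextmanager
def resolve_locally(hostname: str, target_ip: str = "127.0.0.1"):
    """Within the block, make `hostname` resolve to `target_ip` in this process."""
    original_getaddrinfo = socket.getaddrinfo

    def patched_getaddrinfo(host, *args, **kwargs):
        # Swap only the one hostname; every other lookup is untouched.
        if host == hostname:
            host = target_ip
        return original_getaddrinfo(host, *args, **kwargs)

    socket.getaddrinfo = patched_getaddrinfo
    try:
        yield
    finally:
        # Always restore the real resolver, even if the block raises.
        socket.getaddrinfo = original_getaddrinfo


# Hypothetical cluster hostname, standing in for <my_cluster_url>.
cluster_host = "mycluster.eu-west-2.neptune.amazonaws.com"

with resolve_locally(cluster_host):
    addresses = socket.getaddrinfo(cluster_host, 8182, type=socket.SOCK_STREAM)
    print(addresses[0][4])  # the (ip, port) pair the socket layer would dial
```

Because the URL handed to the client would still carry the real hostname, this behaves like the /etc/hosts entry as far as TLS and request signing are concerned, but it is a process-wide monkeypatch and should be treated as a development convenience only.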
triggan•2w ago
Have you thought of just doing all of this within a container? That way you could modify /etc/hosts in the container and not affect your workstation's config.
Captator (OP)•2w ago
I hadn't, and that's an obvious-in-retrospect solution that leaves me somewhat kicking myself. A third approach (that I had all the pieces for in my tinkering, just never all in the right place at the same time... thanks to https://github.com/awslabs/amazon-neptune-tools/tree/master/neptune-python-utils for completing the circuit) is as follows:
```python
import boto3
from botocore.auth import SigV4Auth, _host_from_url
from botocore.awsrequest import AWSRequest
from botocore.credentials import Credentials
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

# # If Jupyter notebook
# import nest_asyncio
# nest_asyncio.apply()


def get_signed_headers(url: str, region: str, creds: Credentials) -> dict:
    request = AWSRequest(method="GET", url=url)
    request.headers.add_header("Host", _host_from_url(str(request.url)))
    SigV4Auth(creds, "neptune-db", region).add_auth(request)
    return dict(request.headers)


true_database_url = "wss://mycluster.eu-west-2.neptune.amazonaws.com:8182/gremlin"
local_database_url = "wss://localhost:8182/gremlin"

creds = boto3.Session(profile_name="my_profile").get_credentials()
headers = get_signed_headers(true_database_url, "eu-west-2", creds)

conn = None

try:
    conn = DriverRemoteConnection(local_database_url, "g", headers=headers, ssl=False)

    g = traversal().withRemote(conn)

    # Fetch 10 vertices
    vertices = g.V().limit(10).toList()

    # Print the vertices
    for v in vertices:
        print(v)
finally:
    if conn:
        conn.close()
```
My understanding is that in this context ssl=False is not a security issue unless localhost itself is somehow compromised, because the only insecure leg of the traffic's journey is from the script making the request to the entry point of the SSM session tunnel at localhost:8182.
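The script above works because the request is signed for the true cluster URL while the socket is opened to localhost, so the signed Host header has to come from the former. Since `_host_from_url` is a private botocore helper, here is a standard-library stand-in that mirrors the behaviour it is understood to have (keep the port unless it is the scheme's default). The name `host_header_for` is hypothetical, and the port handling should be checked against the botocore version in use.

```python
from urllib.parse import urlsplit

# Ports that are implied by the scheme and therefore left out of Host.
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header_for(url: str) -> str:
    """Return the Host value for `url`, keeping any non-default port."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:  # bare IPv6 literal: re-add the brackets
        host = f"[{host}]"
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(parts.scheme):
        host = f"{host}:{parts.port}"
    return host


true_database_url = "wss://mycluster.eu-west-2.neptune.amazonaws.com:8182/gremlin"
local_database_url = "wss://localhost:8182/gremlin"

print(host_header_for(true_database_url))   # the Host the signature should cover
print(host_header_for(local_database_url))  # what would be signed by mistake
```

Signing with the second value would presumably be rejected, since the cluster would be validating a signature computed for a host it does not recognise as its own.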
triggan•2w ago
SSL protects against man-in-the-middle attacks, but it also encrypts the data sent between client and server, so that nothing the traffic passes through can decrypt it without the private key used in the creation of the server-side SSL cert.
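The two properties mentioned here, authenticating the peer and encrypting the traffic, can be switched independently, which a short standard-library sketch makes concrete. It is an assumption, worth checking against the driver and aiohttp documentation, that `ssl=False` in the earlier script corresponds to the second context below (verification skipped, encryption still negotiated) rather than to plain-text traffic.

```python
import ssl

# Verifying context: the peer must present a certificate that chains to a
# trusted CA and matches the requested hostname.
verified = ssl.create_default_context()

# Non-verifying context: traffic is still encrypted, but the peer's identity
# is never checked, so an interposed party could terminate the TLS session.
unverified = ssl.create_default_context()
unverified.check_hostname = False  # must be cleared before verify_mode
unverified.verify_mode = ssl.CERT_NONE

for name, ctx in (("verified", verified), ("unverified", unverified)):
    print(name, ctx.check_hostname, ctx.verify_mode.name)
```

Seen this way, the open question for the tunnel setup is less whether bytes are encrypted and more whether anything other than the SSM session could be answering on localhost:8182.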
Captator (OP)•2w ago
Absolutely - ordinarily I wouldn't consider disabling it. To be explicit, I am only considering this approach because I can assert that the journey from the development machine to the bastion host, and on to the Neptune cluster itself, occurs exclusively through an AWS SSM Session, which is e2e encrypted using TLS.

I have now figured out a fourth approach using dnspython inside the script/notebook and CoreDNS via the CLI (which can be configured by files in the project directory, so is amenable to source control), which doesn't require setting ssl=False. I will edit this to share a minimal reproduction tomorrow 🙂
Solution
Captator•7d ago
Ok, after a bit more tinkering, here's an implementation that doesn't require any configuration or tools outside the script, and doesn't require disabling SSL. It needs one additional import, AiohttpTransport, which is handed to DriverRemoteConnection via transport_factory so that the 'correct' server hostname is passed to the TLS handshake. Without it, the handshake fails with a hostname mismatch, because the connection is made to localhost rather than to the cluster hostname.
```python
import boto3
from botocore.auth import SigV4Auth, _host_from_url
from botocore.awsrequest import AWSRequest
from botocore.credentials import Credentials
from gremlin_python.driver.aiohttp.transport import AiohttpTransport
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

# # If Jupyter notebook
# import nest_asyncio
# nest_asyncio.apply()


def get_signed_headers(url: str, region: str, creds: Credentials) -> dict:
    request = AWSRequest(method="GET", url=url)
    request.headers.add_header("Host", _host_from_url(str(request.url)))
    SigV4Auth(creds, "neptune-db", region).add_auth(request)
    return dict(request.headers)


neptune_hostname = "mycluster.eu-west-2.neptune.amazonaws.com"
true_database_url = f"wss://{neptune_hostname}:8182/gremlin"
local_database_url = "wss://localhost:8182/gremlin"

creds = boto3.Session(profile_name="my_profile").get_credentials()
headers = get_signed_headers(true_database_url, "eu-west-2", creds)

conn = None

try:
    conn = DriverRemoteConnection(
        local_database_url,
        "g",
        headers=headers,
        transport_factory=lambda: AiohttpTransport(server_hostname=neptune_hostname),
    )

    g = traversal().withRemote(conn)

    # Return 1 if connection successful
    print(g.inject(1).toList())
finally:
    if conn:
        conn.close()
```
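To illustrate the idea behind `server_hostname` outside the Gremlin driver, here is a minimal standard-library sketch of the same separation: the TCP connection goes to the tunnel, while the TLS handshake is carried out under the cluster's name. The function name `open_tls_via_tunnel` and its parameters are hypothetical, and this shows the general mechanism rather than what the driver does internally.

```python
import socket
import ssl


def open_tls_via_tunnel(
    tunnel_host: str,
    tunnel_port: int,
    server_hostname: str,
    timeout: float = 10.0,
) -> ssl.SSLSocket:
    """Dial the tunnel, but run the TLS handshake as `server_hostname`."""
    context = ssl.create_default_context()  # certificate verification stays on
    raw_socket = socket.create_connection((tunnel_host, tunnel_port), timeout=timeout)
    try:
        # server_hostname sets SNI and is the name the certificate is checked
        # against; it is independent of the address dialled above.
        return context.wrap_socket(raw_socket, server_hostname=server_hostname)
    except Exception:
        # Do not leak the TCP socket if the handshake fails.
        raw_socket.close()
        raise
```

With the SSM port-forwarding session running, a call such as `open_tls_via_tunnel("localhost", 8182, "mycluster.eu-west-2.neptune.amazonaws.com")` would be expected to complete the handshake with verification intact, whereas passing "localhost" as `server_hostname` should reproduce the hostname mismatch described above.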