Optimizing connection between Python API (FastAPI) and Neptune

Hi guys. I've been working with gremlin-python at my company for the past 4 years, using Neptune as the database. We run a FastAPI server, and Neptune has been its main database since the beginning.
We've always struggled to get good performance out of the API, but recently the pain has become more acute, with endpoints taking more than 10s to respond.

We took some actions to try to improve this performance, such as updating the cluster to the latest engine version, and doing the same for the FastAPI and gremlin-python dependencies.

Right now we're running 3 db.t4g.medium instances (1 writer and 2 read replicas). We also tested with a single db.r6g.large, but we didn't see a significant improvement.

In the process of trying to understand what's causing the slowness, we created a proof-of-concept API; the source code can be found in this repo: https://github.com/aca-so/neptune-poc/.

We also created a new connector to Neptune, different from the one we use in our main application, because there we have a keep-alive mechanism to prevent Neptune from closing the connections. For this PoC we used a different approach: recycling the connections every 5 minutes, based on the instances available in the cluster. A simplified sketch of the recycling logic is shown below.
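For reference, here's roughly what the recycling connector looks like (a heavily simplified sketch; the class name and endpoint are illustrative, and the real code lives in the repo linked above):

```python
import time

from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

RECYCLE_SECONDS = 300  # recycle connections every 5 minutes


class RecyclingConnector:
    """Holds a DriverRemoteConnection and replaces it once it gets too old."""

    def __init__(self, endpoint: str):
        self._endpoint = endpoint  # Neptune cluster/instance endpoint
        self._conn = None
        self._created_at = 0.0

    def g(self):
        """Return a traversal source, recycling the connection if it's stale."""
        now = time.time()
        if self._conn is None or now - self._created_at > RECYCLE_SECONDS:
            if self._conn is not None:
                self._conn.close()
            self._conn = DriverRemoteConnection(
                f"wss://{self._endpoint}:8182/gremlin", "g"
            )
            self._created_at = now
        return traversal().withRemote(self._conn)
```

(The real connector also spreads connections across the instances available in the cluster; that part is omitted here for brevity.)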
So the first question is:
  1. What's the best way to handle these connections? We thought of three approaches: keep-alive (we know it doesn't fit well with connection pooling), using a connection until it's closed and then renewing it, or renewing every X minutes. Is there another way? Which one is best?
Solution
There's a lot to unpack here...

  1. We state in our docs that t4g.medium instances are really not great for production workloads. We support them for initial development so users can keep costs down, but the limited amount of resources available, and the fact that they are burstable instances, really constrain their usability. Once you've used up your CPU credits, you're going to get throttled.
  2. Neptune's concurrency model is based on instance size and the number of vCPUs per instance. For each vCPU, there are two query execution threads. A t4g.medium or an r6g.large instance has 2 vCPUs, so that instance can only be computing 4 concurrent requests at a time. If you need more concurrency, you should look at scaling to a larger instance with more vCPUs. If your workload varies over time, you may want to investigate Neptune Serverless, which can automatically scale vertically to meet the needs of the application. There's a good presentation from last year's re:Invent that discusses when Serverless works best and when not to use it: https://youtu.be/xAdWa0Ahiok?si=OeSe-_L3ErcYH-XU
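If you do look at Serverless, the scaling range is configured on the cluster. A minimal sketch with boto3 (the cluster identifier and the capacity bounds here are placeholder values, not a recommendation):

```python
import boto3

neptune = boto3.client("neptune")

# Capacity is expressed in Neptune Capacity Units (NCUs).
neptune.modify_db_cluster(
    DBClusterIdentifier="my-neptune-cluster",  # placeholder identifier
    ServerlessV2ScalingConfiguration={
        "MinCapacity": 2.5,   # floor, so quiet periods still respond quickly
        "MaxCapacity": 32.0,  # ceiling for peak load
    },
    ApplyImmediately=True,
)
```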
Connection pool size follows the same logic: it should roughly mirror the number of available execution threads on the instance(s). You can send more requests to a Neptune instance than it can currently process, but those additional requests will be queued (up to ~8,000 requests can sit in the request queue, which you can monitor via the MainRequestQueuePendingRequests CloudWatch metric). Sketches of both are below.
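To make that concrete, a 2-vCPU instance has 4 execution threads, so there's little point holding many more pooled connections than that. A sketch using gremlin-python's Client, whose pool_size and max_workers parameters control the driver-side pool (the endpoint is a placeholder):

```python
from gremlin_python.driver.client import Client

# 2 vCPUs x 2 execution threads = 4 requests actually computing at once,
# so size the pool to match rather than far beyond it.
client = Client(
    "wss://my-neptune-endpoint:8182/gremlin",  # placeholder endpoint
    "g",
    pool_size=4,    # connections held in the pool
    max_workers=4,  # worker threads servicing those connections
)
```

And the queue depth can be watched from code as well as from the console; a sketch using boto3 (the instance identifier is a placeholder):

```python
import datetime

import boto3

cloudwatch = boto3.client("cloudwatch")
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Neptune",
    MetricName="MainRequestQueuePendingRequests",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-neptune-instance"}],
    StartTime=datetime.datetime.utcnow() - datetime.timedelta(minutes=30),
    EndTime=datetime.datetime.utcnow(),
    Period=60,
    Statistics=["Maximum"],  # peak queue depth per minute
)
```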