latency

Hello, I am building an app that performs object detection using machine learning. The inference server is hosted on AWS SageMaker (us-east), and I am using Railway to host a Node Express server as a sort of gateway between the client (us-east) and the SageMaker server. The client needs to send a single image with its request.

Invoking the SageMaker server directly takes about 300 milliseconds to get a response back. I know from local testing that the inference time is about 150 milliseconds, so presumably the other 150 milliseconds go to sending the image data and getting the response back from SageMaker. Invoking the Express server hosted on Railway (us-west) takes about 900 milliseconds to 1 second to get a response back.

I am slightly surprised by that, but I imagine most of it comes from passing the image data through an extra hop, i.e. client --> express --> sagemaker instead of just client --> sagemaker. It could also be that the Express server is in us-west while SageMaker is in us-east. There is also the fact that I need to do some authentication stuff on the Express server before passing the request along to SageMaker, but I have tried to run those in parallel rather than sequentially.

I would like to reduce the latency as much as possible. Also, please forgive me, I am mostly a front-end dev diving into territory I know very little about, so any thoughts, ideas, or suggestions are appreciated.
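For readers following along, here is a minimal sketch of one reading of "run those in parallel rather than sequential": several independent auth checks are started together, and the request is only forwarded once all of them pass. Everything named here (`verifyToken`, `checkQuota`, `forwardToInference`, the delays) is a hypothetical stand-in simulated with timers, not the poster's actual code or any Railway or SageMaker API.

```typescript
// Hypothetical stand-ins simulated with timers; the delays are made-up numbers.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function verifyToken(token: string): Promise<boolean> {
  await sleep(40); // pretend token lookup
  return token === "valid-token";
}

async function checkQuota(userId: string): Promise<boolean> {
  await sleep(40); // pretend quota lookup
  return userId.length > 0;
}

async function forwardToInference(image: Uint8Array): Promise<string> {
  await sleep(100); // pretend network + inference time
  return `detections for ${image.length} bytes`;
}

async function handleRequest(
  token: string,
  userId: string,
  image: Uint8Array,
): Promise<string> {
  // Both checks start at once, so together they cost roughly the slower of
  // the two rather than the sum of both.
  const [tokenOk, quotaOk] = await Promise.all([
    verifyToken(token),
    checkQuota(userId),
  ]);
  if (!tokenOk || !quotaOk) {
    throw new Error("unauthorized");
  }
  // Only authorized requests reach the (simulated) inference call.
  return forwardToInference(image);
}
```

This only helps when the checks do not depend on each other's results; a check that needs another check's output still has to wait for it.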
12 Replies
Brody
Brody8mo ago
why not run the express app in us-east as well to cut down on the rtt to your aws service?
CeresMiller
CeresMiller8mo ago
express app is running us-west (oregon)
Brody
Brody8mo ago
my bad
CeresMiller
CeresMiller8mo ago
run sagemaker in us-west you mean? Yeah that could be an option
Brody
Brody8mo ago
I fixed my question
CeresMiller
CeresMiller8mo ago
Railway requires an upgrade of my plan afaik to get access to US-east
Brody
Brody8mo ago
yes they do
CeresMiller
CeresMiller8mo ago
are you surprised that it's triple the latency? do us-west and us-east differ that much?
Brody
Brody8mo ago
I'm sure not all of that latency is coming from the travel time, but you could also run your AWS service in us-west if that's an option. though if this project will have a userbase or clients, then at some point you will need to upgrade to pro anyway
CeresMiller
CeresMiller8mo ago
do you have some thoughts on what it could be or how to approach this? should I just break it down piece by piece and see what is causing the latency?
Brody
Brody8mo ago
I think you should just run the two things in the same region, eliminate that variable completely first
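Once the region variable is out of the way, the "break it down" idea from earlier in the thread could look something like the sketch below: wrap each stage of the gateway handler in a timer and compare the numbers. The stage names and the `delay` calls are hypothetical placeholders for the real auth and SageMaker calls; only the `timed` helper is the point.

```typescript
// Per-stage durations in milliseconds, keyed by a label chosen by the caller.
type Timings = Record<string, number>;

// Runs `work`, and records how long it took under `label`, even if it throws.
async function timed<T>(
  label: string,
  timings: Timings,
  work: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  try {
    return await work();
  } finally {
    timings[label] = Date.now() - start;
  }
}

// Hypothetical placeholder for real work (auth call, SageMaker request, ...).
const delay = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function handleWithTimings(): Promise<Timings> {
  const timings: Timings = {};
  await timed("auth", timings, () => delay(20));
  await timed("sagemaker", timings, () => delay(60));
  return timings;
}

// Example: print the breakdown for one simulated request.
handleWithTimings().then((timings) => console.log(timings));
```

The resulting object could be logged per request, or returned to the client in a `Server-Timing` response header, so the gateway's share of the 900 ms can be compared against the time spent waiting on SageMaker.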