Railway•3mo ago
Prodigga

How to scale up a NodeJS Socket IO server without Session Affinity

Hey there! We have a Socket.IO server and we'd like the option to scale horizontally, but if I understand correctly this isn't possible without session affinity (the handshake process will fail to establish communication?). Are we 'locked into' a single service in this case? Project ID: N/A
Solution:
it's true railway's proxy does not support session affinity (at the time of writing this), but there's nothing stopping you from deploying your own proxy that does. the proxy you deploy would need to support session affinity and dynamic upstreams, so that it can internally proxy to each of your replicas. aka caddy
Jump to solution
35 Replies
Percy
Percy•3mo ago
Project ID: N/A
Prodigga
Prodigga•3mo ago
n/a
Solution
Brody
Brody•3mo ago
it's true railway's proxy does not support session affinity (at the time of writing this), but there's nothing stopping you from deploying your own proxy that does. the proxy you deploy would need to support session affinity and dynamic upstreams, so that it can internally proxy to each of your replicas. aka caddy
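As a rough illustration of that suggestion, a Caddyfile along these lines should cover both requirements (a hedged sketch: the internal hostname, port, and cookie name are placeholders; it relies on Caddy v2's `dynamic a` upstream module and the `cookie` load-balancing policy):

```caddyfile
:8080 {
	reverse_proxy {
		# re-resolve replica IPs from DNS instead of using a static list
		dynamic a {
			name app.railway.internal
			port 3000
			refresh 5s
		}
		# pin each client to one upstream via a cookie
		lb_policy cookie sticky_srv
	}
}
```

The `refresh` interval keeps the upstream list current as replicas are redeployed, and the cookie is what gives each Socket.IO client a stable backend across polling requests.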
Prodigga
Prodigga•3mo ago
How do I proxy requests to a specific replica? I already have a gateway that sits in front of everything. I guess it would mean that all Socket.IO traffic would have to be piped through the gateway too?
Brody
Brody•3mo ago
it wouldn't be a specific replica; the proxy would need to support sticky sessions though. as long as the gateway supports dynamic proxy upstreams that resolve from an AAAA lookup, plus sticky sessions, then you can do it. what is your current gateway? nginx?
Prodigga
Prodigga•3mo ago
homebrew gateway, nodejs server, i am running some custom routing logic 🙃
Brody
Brody•3mo ago
oh I see, then you'd have one hell of a time writing code to proxy sticky sessions to the replicas
Prodigga
Prodigga•3mo ago
http-proxy-middleware to proxy the requests to the appropriate game server (dev/stg/prd). mmm yeah, shame session affinity is not supported out of the box like on heroku, that made this (surely common?) use case just work!
Brody
Brody•3mo ago
for context, this is what a DNS lookup on the internal domain of a service resolves to when that service has multiple replicas: https://utilities.up.railway.app/dns-lookup?value=hello-world.railway.internal&type=ip. doing an AAAA lookup is where you would get your list of upstreams from
Prodigga
Prodigga•3mo ago
right, and i know nothing about what i am about to say, but conceptually i would do my own load balancing at the api gateway level, picking one of those replicas and associating all incoming requests from that client with that specific replica. would that be as 'simple' as proxying requests to ie http://fd12:74d7:7e85::33:1190:a62c/ versus http://hello-world.railway.internal/? (maybe i should move this discussion elsewhere hah)
Brody
Brody•3mo ago
exactly, but it all has to be done dynamically, since all those IPs change on every deployment. you are missing the port, but yes, exactly. hostnames aren't technically needed for routing on the private network like they are on the public network
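Conceptually, the gateway-level affinity described above could be as small as a map from client to replica (a hedged sketch: the client key, the upstream URLs, and the hash are all illustrative, not anything Railway-specific):

```javascript
// Sketch: pin each client to one replica URL. The client key might be
// a cookie or a Socket.IO session id; the upstream list comes from DNS.
const assignments = new Map(); // clientKey -> upstream URL

function pickUpstream(clientKey, upstreams) {
  const existing = assignments.get(clientKey);
  // reuse the previous choice only while that replica still exists
  if (existing && upstreams.includes(existing)) return existing;
  // otherwise pick deterministically by hashing the client key
  let hash = 0;
  for (const ch of clientKey) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  const chosen = upstreams[hash % upstreams.length];
  assignments.set(clientKey, chosen);
  return chosen;
}
```

Because the IPs change on every deploy, the `upstreams.includes` check matters: a stale assignment has to fall through to a fresh pick.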
Prodigga
Prodigga•3mo ago
yeah right right. does that mean a new DNS lookup for every request being proxied, to ensure the replica is still valid? i don't think i will chase this solution down, as it seems pretty involved, just curious. it is pretty scary that we might not be able to handle a large influx of users though
Brody
Brody•3mo ago
you could cache for several seconds, with some extra retry logic; that would save a lot of lookups if you have a lot of traffic. some context on railway's current proxy: they use envoy right now, and eventually it will be thrown out the window for a home-grown http proxy. after that, i can't see adding support for sticky sessions being too challenging relative to writing an http proxy that can handle railway's scale. i assume your gateway already proxies websocket traffic? otherwise you could use a readymade solution that supports everything i've talked about.. caddy
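The "cache for several seconds" idea might look like the sketch below (a hypothetical helper; the TTL is arbitrary, and the lookup function and clock are injectable so the caching behaviour can be tested without DNS):

```javascript
// Sketch: cache a DNS lookup result for a few seconds so the gateway
// isn't resolving on every proxied request.
function cachedResolver(lookup, ttlMs = 5000, now = Date.now) {
  let cache = null; // { ips, expires }
  return async function resolve(host) {
    if (cache && cache.expires > now()) return cache.ips; // still fresh
    const ips = await lookup(host); // e.g. dns.promises.resolve6
    cache = { ips, expires: now() + ttlMs };
    return ips;
  };
}
```

Retry logic would sit on top: if a proxied request to a cached IP fails, drop the cache and re-resolve before retrying.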
Prodigga
Prodigga•3mo ago
no, on heroku the clients would be given the url of the appropriate server to connect directly to. well, not just on heroku, that's my current setup too
Brody
Brody•3mo ago
keep in mind, railway is still growing and improving. compared to an already well-established service like heroku, not everything can be 1:1 feature-wise, so sometimes some manual workarounds are going to be needed
Prodigga
Prodigga•3mo ago
clients send all their REST API requests via the api gateway, and requests get routed to the correct place. there is an endpoint to query for details regarding 'realtime comms' (ie socketio) which returns the socketio server url; clients then bypass the api gateway and connect directly. yeah for sure, it's been mostly good so far. we are at 1 vcpu at peak time at the moment, so i guess plenty of room to grow, but i imagine a lot of concurrent connections may choke out a single machine
Brody
Brody•3mo ago
as long as your code can scale vertically without issues, you have about 31 vCPUs of headroom haha
Prodigga
Prodigga•3mo ago
yeah, i had a bit of a scare last night though. during peak time it didn't scale past 1 vcpu, but our average response time was 1-2 seconds. even querying the health-check endpoint (all it does is return {status: "OK"}) was taking 1-2 seconds :ablobgrimace:
Brody
Brody•3mo ago
maybe it can't scale vertically then?
Prodigga
Prodigga•3mo ago
right now it seems to have resolved. yeah, maybe not, though i am not sure what the bottleneck is; if it was processing power, i imagine railway would have scaled it up past 1.1 vcpu. i am going to keep an eye on it overnight and see how it goes tonight
Brody
Brody•3mo ago
your code would need to be able to scale past 1.1 vcpu, not railway
Prodigga
Prodigga•3mo ago
How so? I don't have any handling for that. I just assumed >1 vcpu = more horsepower, kind of like higher-tier dynos on heroku
Brody
Brody•3mo ago
nope, at any given point your app has access to the full 32 vcpu; it's up to your code to properly utilise that though. so much different than dynos
Prodigga
Prodigga•3mo ago
right, i see
Brody
Brody•3mo ago
I don't know your project architecture, but couldn't you run multiple separate services for the websockets? each service would have only one replica, so no sticky sessions needed. your gateway would just be responsible for having a list of the websocket services and their domains, and handing them out when applicable. and unless I am misunderstanding what you've explained to me thus far, it kinda sounds like it can do this already?
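Since the gateway already has an endpoint that hands out the socketio url, this approach mostly means returning a different url per client. A hedged sketch (the service domains are placeholders, and the plain round robin is just one way to spread clients):

```javascript
// Sketch: one websocket service per "replica"; the gateway hands each
// client a domain from a configured list. Domains below are placeholders.
const wsServices = [
  "wss://ws-1.example.up.railway.app",
  "wss://ws-2.example.up.railway.app",
];

let next = 0;
function realtimeEndpoint(req, res) {
  // plain round robin; no affinity needed since each URL is one replica
  const url = wsServices[next++ % wsServices.length];
  res.setHeader("Content-Type", "application/json");
  res.end(JSON.stringify({ socketUrl: url }));
}
```

Because the client connects directly to the returned domain for the whole session, the stickiness problem disappears: there is nothing behind each domain to balance between.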
Prodigga
Prodigga•3mo ago
yeah that would work, perhaps the best solution until sticky sessions are implemented
Brody
Brody•3mo ago
I also did ask the person who is writing the new http proxy if sticky sessions were on their mind, I'll update you when I have news on that
Prodigga
Prodigga•3mo ago
and once sticky sessions are implemented i can collapse it all down to one service and scale up via replicas. nice, thank you
Brody
Brody•3mo ago
maybe even configurable proxying algorithms, currently it's only round robin
Prodigga
Prodigga•3mo ago
🔥 do you know of any more resources i can look at for utilising the other cores? is this just nodejs clustering?
Brody
Brody•3mo ago
I'm honestly not sure, I'm not even a node dev haha
Prodigga
Prodigga•3mo ago
thats ok! thanks for all the help
Brody
Brody•3mo ago
no problem!
Prodigga
Prodigga•3mo ago
i dont know how to mark as closed lol
Brody
Brody•3mo ago
only mods/admins can use that right now