Have an automatic failover for sharded applications
What are you trying to do?
When my main server goes down and the 100 shards on it become unavailable to users, I'd like an automatic failover to another server running around 20 shards. I added arguments so it can be launched like this: bun start shard=x shardcount=100, with x from 0 to 99
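For illustration, here is a minimal sketch of how the shard=x shardcount=100 arguments could be parsed and validated. The helper names (parseShardArgs, readInt) are hypothetical and not taken from the original bot; the only assumption is that arguments arrive as key=value strings.

```typescript
// Hypothetical helper: parses "key=value" CLI arguments such as
// `shard=3 shardcount=100` into a validated pair of numbers.
interface ShardArgs {
  shardId: number;
  shardCount: number;
}

// Reads a non-negative integer argument, rejecting missing or malformed values.
function readInt(pairs: Map<string, string>, key: string): number {
  const raw = pairs.get(key);
  if (raw === undefined || !/^\d+$/.test(raw)) {
    throw new Error(`missing or invalid "${key}" argument`);
  }
  return Number(raw);
}

function parseShardArgs(argv: string[]): ShardArgs {
  const pairs = new Map<string, string>();
  for (const arg of argv) {
    const idx = arg.indexOf("=");
    if (idx > 0) {
      pairs.set(arg.slice(0, idx).toLowerCase(), arg.slice(idx + 1));
    }
  }

  const shardCount = readInt(pairs, "shardcount");
  const shardId = readInt(pairs, "shard");

  if (shardCount < 1) {
    throw new Error("shardcount must be at least 1");
  }
  // Shard IDs are zero-based, so the valid range is 0..shardCount-1.
  if (shardId >= shardCount) {
    throw new Error(`shard must be between 0 and ${shardCount - 1}`);
  }
  return { shardId, shardCount };
}

// Example call, e.g. with process.argv.slice(2):
// const { shardId, shardCount } = parseShardArgs(["shard=3", "shardcount=100"]);
```

The resulting values could then be passed to the discord.js Client through its shards and shardCount options, so that each process only connects the shard it was started for.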
What's the problem?
The problem is I don't know how to do that, and I'd like some feedback from fellow developers. Since I need the websocket to stay active for voice connections, I'm limited in how I can do it
What have you tried?
I've looked into Docker Swarm and Kubernetes, but I'm not sure either is the best fit
Code & Details
Well, I'm using Discord.js for the client to connect, so there's no specific code here
4 Replies
I have a health check system built on an API, which I can modify if needed. The machines can also communicate with each other.
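As a rough idea of how such a health check could drive the failover decision, here is a sketch of a monitor that runs on the backup machine. Everything in it is an assumption for illustration: the FailoverMonitor class, the probePrimary helper, the health URL, and the threshold are hypothetical and not part of the existing health check API.

```typescript
type FailoverAction = "start-standby" | "stop-standby" | "none";

// Tracks consecutive failed probes and decides when the standby shards
// should be started or stopped. It holds no network logic of its own.
class FailoverMonitor {
  private failures = 0;
  private standbyActive = false;
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  record(healthy: boolean): FailoverAction {
    if (healthy) {
      this.failures = 0;
      if (this.standbyActive) {
        this.standbyActive = false;
        return "stop-standby";
      }
      return "none";
    }
    this.failures += 1;
    // Require several failures in a row so one slow response does not
    // trigger a failover.
    if (!this.standbyActive && this.failures >= this.threshold) {
      this.standbyActive = true;
      return "start-standby";
    }
    return "none";
  }
}

// Hypothetical probe: treats any network error, timeout, or non-2xx
// response from the primary's health endpoint as unhealthy.
async function probePrimary(url: string, timeoutMs: number): Promise<boolean> {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    return res.ok;
  } catch {
    return false;
  }
}

// Example wiring (the URL is a placeholder):
// const monitor = new FailoverMonitor(3);
// setInterval(async () => {
//   const action = monitor.record(await probePrimary("http://primary.internal/health", 2000));
//   if (action === "start-standby") { /* spawn the backup shards */ }
//   if (action === "stop-standby") { /* shut the backup shards down */ }
// }, 5000);
```

Keeping the decision logic separate from the probe makes it easy to test without a network, and the threshold trades detection speed against false positives.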
The bot doesn't need 100 shards by default; I only run more shards to lighten the load on each process. 20-30 are enough for 68k servers
This is really getting into system architectures more than anything discord.js specific
Sharding is already a form of load balancing, but what it doesn't really do is any sort of health monitoring beyond respawning a shard if it dies
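To make the load-balancing point concrete: Discord's gateway documentation gives the guild-to-shard mapping as shard_id = (guild_id >> 22) % num_shards. A small sketch of that formula (the function name is just for illustration):

```typescript
// Computes which shard receives events for a guild, following the
// formula from Discord's gateway docs: (guild_id >> 22) % num_shards.
// Guild IDs are 64-bit snowflakes, so BigInt is used to avoid precision loss.
function shardIdForGuild(guildId: string, shardCount: number): number {
  return Number((BigInt(guildId) >> BigInt(22)) % BigInt(shardCount));
}
```

Because the mapping depends on the shard count, a backup running 20 shards partitions guilds differently from a primary running 100, so the two sets of shards are not interchangeable one for one.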
If you wanted to run additional shards in a sort of hot-swap, you'd need to implement your own event routing to make sure idle secondary shards aren't processing events
The scenario you originally described, with all 100 shards being down, suggests a bigger issue though, not something that can easily be addressed by spinning up 20 others on some other infrastructure
We're talking very custom websocket and shard handling there
Beyond what the library is designed for
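One possible shape for the "idle secondary shards" idea mentioned above is a small gate wrapped around every event handler, so a connected standby drops events until it is promoted. This is only a sketch under that assumption; StandbyGate is a hypothetical class, not a discord.js feature, and it does not address voice connections or session handling.

```typescript
// Wraps event handlers so they only run while this process is the
// active one. An idle standby still receives events but ignores them.
class StandbyGate {
  private active = false;

  activate(): void {
    this.active = true;
  }

  deactivate(): void {
    this.active = false;
  }

  isActive(): boolean {
    return this.active;
  }

  // Returns a handler with the same signature that does nothing while idle.
  wrap<A extends unknown[]>(handler: (...args: A) => void): (...args: A) => void {
    return (...args: A) => {
      if (this.active) {
        handler(...args);
      }
    };
  }
}

// Example wiring with a discord.js client (handleMessage is a placeholder):
// const gate = new StandbyGate();
// client.on("messageCreate", gate.wrap(handleMessage));
// gate.activate(); // called once the health check reports the primary as down
```

The gate only controls what the standby does with events. Deciding when to call activate and deactivate would still fall to whatever health monitoring sits outside the library.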
Yep, definitely, I understand that. I'm mainly asking for ideas on how I could implement such a thing. Mentioning the idle state of events might be what I was looking for. And thanks for that