How to configure a one-to-one mapping of client connection to worker/GPU instance
I am building an application where a client connects to a worker and the worker streams some content to the client over WebSocket. I want to configure this setup to force a one-to-one mapping of client to worker. In other words, I would like precise control over how individual client requests are allocated to workers. I tried setting the request count to 1 to force the endpoint to spin up one worker per client connection, but that didn't work. While the endpoint does spin up one worker per endpoint, it still routes multiple client connections through the same worker at least some of the time, because it handles load balancing with logic that, as far as I've found, isn't accessible.
3 Replies
If WebSocket works, that can be an option.
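If the routing logic can't be configured, one workaround is to enforce the one-to-one rule on the worker itself: the worker accepts the first client and turns away any others while it is busy. Below is a minimal sketch of that idea in Python. It is only an illustration under assumptions: `SingleClientSlot` and `handle_connection` are hypothetical names, `send` stands in for whatever send function your WebSocket library provides, and the `"busy"` message is an invented convention. It does not change how the endpoint balances load. It only makes a busy worker refuse a second client, so that client can reconnect and hopefully land on a free worker.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Optional


class SingleClientSlot:
    """Tracks which single client, if any, currently owns this worker (hypothetical helper)."""

    def __init__(self) -> None:
        self._owner: Optional[str] = None

    def try_acquire(self, client_id: str) -> bool:
        # Refuse if another client already holds the worker.
        if self._owner is not None:
            return False
        self._owner = client_id
        return True

    def release(self, client_id: str) -> None:
        # Only the current owner can free the slot.
        if self._owner == client_id:
            self._owner = None


async def handle_connection(
    slot: SingleClientSlot,
    client_id: str,
    send: Callable[[str], Awaitable[None]],
    frames: Iterable[str],
) -> bool:
    """Stream frames to one client, or reject the client if the worker is busy."""
    if not slot.try_acquire(client_id):
        # Assumed convention: tell the client to retry elsewhere.
        await send("busy")
        return False
    try:
        for frame in frames:
            await send(frame)
        return True
    finally:
        # Free the worker even if streaming fails partway through.
        slot.release(client_id)
```

In a real worker you would call `handle_connection` from your WebSocket handler and close the socket after sending `"busy"`. The client then needs retry logic, since nothing here guarantees the next attempt reaches a different worker.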