Achieving concurrent requests per worker
Hi, I’m new to RunPod and am trying to deploy my own fine-tuned version of an XTTS model. I’ve successfully deployed a container image that runs and returns results as expected. However, I’m unable to process requests concurrently.
I’ve read the RunPod documentation on enabling concurrent requests, but it doesn’t seem to work for me. When I trigger the /run endpoint multiple times, the requests queue up instead of running simultaneously. My expectation is that a single worker should handle multiple requests concurrently, up to the maximum capacity I configure.
I implemented a dynamic concurrency modifier similar to the one in the documentation here: https://docs.runpod.io/serverless/workers/concurrent-handler.
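For context, here is a stripped-down sketch of my setup (the model loading and the fixed return value are simplified placeholders, not my real logic); the handler is async and the modifier is passed to `runpod.serverless.start`, as in the docs:

```python
import asyncio
import runpod

async def handler(job):
    # Placeholder for the actual XTTS inference; the real handler loads
    # the fine-tuned model once at startup and synthesizes audio per request.
    text = job["input"].get("text", "")
    await asyncio.sleep(1)  # stands in for the real TTS work
    return {"audio": f"generated audio for: {text}"}

def concurrency_modifier(current_concurrency):
    # Simplified: return how many requests this worker may run at once.
    # My real version adjusts this value dynamically.
    return 5

runpod.serverless.start({
    "handler": handler,
    "concurrency_modifier": concurrency_modifier,
})
```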
Could you please help me understand how to correctly set up concurrency in my deployment? Thank you!
4 Replies
Unknown User•5mo ago
My concurrency modifier is a dynamic function whose return value ranges from a minimum of 1 to a maximum of 7 based on GPU usage, with logic very similar to the docs page I linked above, roughly like the sketch below.
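Something like this (the pynvml-based GPU query and the utilization thresholds here are illustrative; my actual numbers differ):

```python
import pynvml

# Query the first GPU's utilization via NVML.
pynvml.nvmlInit()
_gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

MIN_CONCURRENCY = 1
MAX_CONCURRENCY = 7

def concurrency_modifier(current_concurrency):
    # Scale per-worker concurrency up or down based on GPU utilization.
    util = pynvml.nvmlDeviceGetUtilizationRates(_gpu).gpu  # percent, 0-100

    if util < 70 and current_concurrency < MAX_CONCURRENCY:
        return current_concurrency + 1
    if util > 90 and current_concurrency > MIN_CONCURRENCY:
        return current_concurrency - 1
    return current_concurrency
```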
Unknown User•5mo ago
That worked, so I guess something is wrong with the way my logic is set up. Thank you!