Achieving concurrent requests per worker
MW — 5mo ago

Hi, I’m new to RunPod and am trying to deploy my own fine-tuned version of an XTTS model. I’ve successfully deployed a container image that runs and returns results as expected. However, I’m unable to process requests concurrently. I’ve read the RunPod documentation on enabling concurrent requests, but it doesn’t seem to work for me. When I trigger the /run endpoint multiple times, the requests queue up instead of running simultaneously. My expectation is that a single worker should handle multiple requests concurrently, up to the maximum capacity I configure. I implemented a dynamic concurrency function similar to the one in the documentation here: https://docs.runpod.io/serverless/workers/concurrent-handler. Could you please help me understand how to correctly set up concurrency in my deployment? Thank you!
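For reference, a minimal sketch of the concurrent-handler pattern from the linked docs page: the worker only interleaves requests when the handler is declared `async`, and a `concurrency_modifier` tells the SDK how many jobs one worker may take at once. The `run_xtts_inference` call is a hypothetical stand-in for the model's actual inference code:

```python
import runpod

async def handler(job):
    # The handler must be async so a single worker can have
    # multiple requests in flight on the event loop.
    text = job["input"]["text"]
    audio = await run_xtts_inference(text)  # hypothetical async inference call
    return {"audio": audio}

def concurrency_modifier(current_concurrency):
    # Called by the SDK to decide how many jobs this worker
    # may run at once; a fixed cap keeps the sketch minimal.
    return 4

runpod.serverless.start({
    "handler": handler,
    "concurrency_modifier": concurrency_modifier,
})
```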
4 Replies
Unknown User — 5mo ago
[message not public]
MW (OP) — 5mo ago
My concurrency modifier is a dynamic function whose value varies between a minimum of 1 and a maximum of 7 based on GPU usage, with logic very similar to the link I sent above.
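For context, a sketch of what such a GPU-usage-based modifier could look like. The 1–7 bounds come from the message above; the use of pynvml and the 80%/30% utilization thresholds are assumptions for illustration, not the OP's actual logic:

```python
import pynvml

pynvml.nvmlInit()
_gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

MIN_CONCURRENCY = 1
MAX_CONCURRENCY = 7

def concurrency_modifier(current_concurrency):
    # Read instantaneous GPU utilization as a percentage (0-100).
    util = pynvml.nvmlDeviceGetUtilizationRates(_gpu).gpu
    if util > 80:
        # GPU is saturated: shed load by one slot.
        target = current_concurrency - 1
    elif util < 30:
        # GPU has headroom: accept one more job.
        target = current_concurrency + 1
    else:
        target = current_concurrency
    # Clamp to the stated 1-7 range.
    return max(MIN_CONCURRENCY, min(MAX_CONCURRENCY, target))
```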
Unknown User — 5mo ago
[message not public]
MW (OP) — 5mo ago
This worked; I guess something is wrong with the way I have my logic set up. Thank you!
