Achieving concurrent requests per worker
Hi, I’m new to RunPod and am trying to deploy my own fine-tuned version of an XTTS model. I’ve successfully deployed a container image that runs and returns results as expected. However, I’m unable to process requests concurrently.
I’ve read the RunPod documentation on enabling concurrent requests, but it doesn’t seem to work for me. When I trigger the /run endpoint multiple times, the requests queue up instead of running simultaneously. My expectation is that a single worker should handle multiple requests concurrently, up to the maximum capacity I configure.
I implemented a dynamic concurrency modifier similar to the one in the documentation here: https://docs.runpod.io/serverless/workers/concurrent-handler.
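For context, here is a stripped-down sketch of my setup (the model loading and the fixed return value are simplified placeholders, not my real logic); the handler is async and the modifier is passed to `runpod.serverless.start`, as in the docs:

```python
import asyncio
import runpod

async def handler(job):
    # Placeholder for the actual XTTS inference; the real handler loads
    # the fine-tuned model once at startup and synthesizes audio per request.
    text = job["input"].get("text", "")
    await asyncio.sleep(1)  # stands in for the real TTS work
    return {"audio": f"generated audio for: {text}"}

def concurrency_modifier(current_concurrency):
    # Simplified: return how many requests this worker may run at once.
    # My real version adjusts this value dynamically.
    return 5

runpod.serverless.start({
    "handler": handler,
    "concurrency_modifier": concurrency_modifier,
})
```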
Could you please help me understand how to correctly set up concurrency in my deployment? Thank you!
4 Replies
Unknown User•5mo ago
My concurrency modifier is a dynamic function whose return value ranges from a minimum of 1 to a maximum of 7 based on GPU usage, with logic very similar to the docs page I linked above, roughly like the sketch below.
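Something like this (the pynvml-based GPU query and the utilization thresholds here are illustrative; my actual numbers differ):

```python
import pynvml

# Query the first GPU's utilization via NVML.
pynvml.nvmlInit()
_gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

MIN_CONCURRENCY = 1
MAX_CONCURRENCY = 7

def concurrency_modifier(current_concurrency):
    # Scale per-worker concurrency up or down based on GPU utilization.
    util = pynvml.nvmlDeviceGetUtilizationRates(_gpu).gpu  # percent, 0-100

    if util < 70 and current_concurrency < MAX_CONCURRENCY:
        return current_concurrency + 1
    if util > 90 and current_concurrency > MIN_CONCURRENCY:
        return current_concurrency - 1
    return current_concurrency
```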
Unknown User•5mo ago
That worked, so I guess something is wrong with the way my logic is set up. Thank you!