Hi, I’m new to RunPod and am trying to deploy my own fine-tuned version of an XTTS model. I’ve successfully deployed a container image that runs and returns results as expected. However, I’m unable to process requests concurrently.
I’ve read the RunPod documentation on enabling concurrent requests, but it doesn’t seem to take effect: when I trigger the /run endpoint multiple times, the requests queue up instead of running simultaneously. My expectation is that a single worker should handle multiple requests concurrently, up to the maximum concurrency I configure.
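For context, here is a minimal sketch of what I understand the documented pattern to be: the handler must be declared `async`, and a `concurrency_modifier` callback is passed to `runpod.serverless.start` to tell the worker how many jobs it may pull at once. `MAX_CONCURRENCY` and the XTTS inference step are placeholders for my actual setup.

```python
# Sketch of a RunPod serverless worker configured for concurrency,
# per the "concurrency_modifier" pattern in the RunPod docs.
# MAX_CONCURRENCY and the fake synthesis step are placeholders.

import asyncio

MAX_CONCURRENCY = 4  # assumed cap; tune to GPU memory


def adjust_concurrency(current_concurrency: int) -> int:
    # Called by the SDK between jobs; returning a value greater than 1
    # lets a single worker run that many requests at the same time.
    return MAX_CONCURRENCY


async def handler(job):
    text = job["input"]["text"]
    # Placeholder for XTTS inference; blocking model calls should be
    # pushed off the event loop (e.g. via run_in_executor) so they
    # don't serialize the other in-flight requests.
    await asyncio.sleep(0)
    return {"audio": f"synthesized: {text}"}


if __name__ == "__main__":
    import runpod  # requires the runpod SDK inside the container

    runpod.serverless.start({
        "handler": handler,
        "concurrency_modifier": adjust_concurrency,
    })
```

Is this the right shape, or is there an endpoint-level setting I’m missing as well? My handler was originally synchronous, so I suspect that alone could force the queueing behavior.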