hanging after 500 concurrent requests
Hi, I loaded llama 8b in serverless with 1 active A100 worker and 1 idle worker. I wanted to benchmark how many requests I can handle at the same time so I can go to production. But when I send 500 requests at once, the server just hangs and I don't get an error. What could be the issue? How do I know how much load 1 GPU can handle, and how do I optimize it for max concurrency?
@Alpay Ariyak any idea?
Yeah, could you please expand on how it hangs?
Also, Max Job Concurrency is 300 by default; you can change it with the `MAX_CONCURRENCY` env var.
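One way to find where a single worker saturates, instead of firing all 500 requests at once, is to cap in-flight requests with a client-side semaphore and ramp the cap up between runs. This is a hedged sketch, not RunPod's client: `fake_request` is a placeholder you would swap for a real HTTP call to your endpoint.

```python
import asyncio

async def fake_request(i: int, delay: float = 0.01) -> int:
    # Stand-in for an HTTP call to the endpoint; replace with a real request.
    await asyncio.sleep(delay)
    return i

async def run_batch(n_requests: int, max_concurrency: int):
    # A semaphore caps in-flight requests so the server isn't flooded at once.
    sem = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def bounded(i: int) -> int:
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_request(i)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(bounded(i) for i in range(n_requests)))
    return results, peak

# 500 total requests, but never more than 50 in flight at once.
results, peak = asyncio.run(run_batch(500, 50))
```

Raising the semaphore limit run by run (50, 100, 200, ...) while watching latency shows roughly where one GPU stops keeping up, which is more informative than a single 500-request burst.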