Runpod 2y ago
Maher

hanging after 500 concurrent requests

Hi, I loaded Llama 8B on serverless with 1 active A100 worker and 1 idle worker. I wanted to benchmark how many requests it can handle at the same time before going to production. But when I send 500 requests at the same time, the server just hangs and I don't get an error. What could be the issue? How can I find out how much load 1 GPU can handle, and how do I optimize it for max concurrency?
3 Replies
digigoblin 2y ago
@Alpay Ariyak any idea?
Alpay Ariyak 2y ago
Yeah, could you please expand on how it hangs? Also, Max Job Concurrency is 300 by default; you can change it with the MAX_CONCURRENCY env var.
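
One way to pin down "how it hangs" is to ramp the load in steps with a per-request timeout, so a stall shows up as a timeout count at a specific concurrency level instead of a silent wait. Below is a rough sketch of that idea, not an official Runpod tool. `send` is a hypothetical stand-in for whatever call hits your endpoint, and `make_fake_endpoint` only simulates a server with a concurrency cap so the script runs without a network; the levels, timeout, and service time are made-up numbers.

```python
import asyncio
import time


async def run_level(send, concurrency, timeout_s):
    """Fire `concurrency` requests at once and count ok / timeout / error."""

    async def one(i):
        try:
            await asyncio.wait_for(send(i), timeout=timeout_s)
            return "ok"
        except asyncio.TimeoutError:
            return "timeout"  # the request never came back in time
        except Exception:
            return "error"

    start = time.perf_counter()
    results = await asyncio.gather(*(one(i) for i in range(concurrency)))
    return {
        "concurrency": concurrency,
        "ok": results.count("ok"),
        "timeout": results.count("timeout"),
        "error": results.count("error"),
        "seconds": round(time.perf_counter() - start, 2),
    }


async def ramp(send, levels, timeout_s):
    """Run each concurrency level in turn, e.g. [50, 100, 200, 300, 500]."""
    return [await run_level(send, n, timeout_s) for n in levels]


def make_fake_endpoint(max_concurrency, service_s):
    """Hypothetical stand-in: at most `max_concurrency` requests are served
    at once, each taking `service_s` seconds; the rest wait in a queue.
    Call this from inside a running event loop."""
    sem = asyncio.Semaphore(max_concurrency)

    async def send(i):
        async with sem:
            await asyncio.sleep(service_s)

    return send


async def main():
    # Replace the fake endpoint with a real async HTTP call to your endpoint.
    send = make_fake_endpoint(max_concurrency=4, service_s=0.1)
    for row in await ramp(send, levels=[4, 8, 12], timeout_s=0.25):
        print(row)


if __name__ == "__main__":
    asyncio.run(main())
```

With the fake endpoint, the levels that fit within the cap complete cleanly and the higher ones start reporting timeouts once queued requests wait longer than `timeout_s`. Against a real endpoint, the level where timeouts first appear is a useful thing to compare with your MAX_CONCURRENCY setting.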