Worker Errors Out When Sending Simultaneous Requests

I was benchmarking a serverless endpoint by sending 10 simultaneous requests to it. The endpoint has two active workers, and one of them keeps erroring out with the attached stack trace. After the error happens, 9 requests get stuck In Progress. If I terminate the errored worker and spin up a new one, I get the same stack trace unless I manually clear out the In Progress requests. The endpoint is running a Llama 2 70B model with the image runpod/worker-vllm:0.2.3.
3 Replies
hexadecibal, 4mo ago
Here is the error stack
Solution
hexadecibal, 4mo ago
Figured my issue out. I needed MAX_CONCURRENCY set to 5, otherwise all requests were going only to one node.
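
For anyone hitting the same thing, here is a rough sketch of why a per-worker cap changes where requests land. It is a toy model, not RunPod's actual scheduler: `assign_requests` and `read_max_concurrency` are hypothetical helpers, the `default=5` fallback is only an illustration (not the image's real default), and the only name taken from this thread is the `MAX_CONCURRENCY` environment variable.

```python
import os


def read_max_concurrency(default: int = 5) -> int:
    """Read the per-worker cap from MAX_CONCURRENCY (hypothetical helper)."""
    value = int(os.environ.get("MAX_CONCURRENCY", str(default)))
    if value < 1:
        raise ValueError("MAX_CONCURRENCY must be a positive integer")
    return value


def assign_requests(num_requests: int, num_workers: int, cap: int) -> list[int]:
    """Toy model (an assumption, not the real scheduler): each worker in turn
    takes jobs from the queue until it reaches its cap. Returns the number of
    jobs each worker ends up holding; anything left over stays queued."""
    counts = [0] * num_workers
    remaining = num_requests
    for i in range(num_workers):
        take = min(cap, remaining)
        counts[i] = take
        remaining -= take
    return counts


if __name__ == "__main__":
    # With a cap of 5, the 10 requests split across both workers.
    print(assign_requests(10, 2, 5))
    # With a cap larger than the burst, the first worker takes everything.
    print(assign_requests(10, 2, 30))
```

In this model, a cap at or above the burst size lets one worker pick up all 10 requests, which matches what I was seeing; a cap of 5 forces the second worker to take the other half.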