RunPod•15mo ago

Worker Errors Out When Sending Simultaneous Requests

I was benchmarking a serverless endpoint by sending it 10 simultaneous requests. The endpoint has two active workers, and one of them keeps erroring out with the attached stack trace. After the error, 9 requests get stuck In Progress. If I terminate the errored worker and spin up a new one, I get the same stack trace unless I manually clear the In Progress requests. The endpoint runs a Llama2 70B model with the image runpod/worker-vllm:0.2.3.
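For anyone trying to reproduce this, here is a rough sketch of how a burst of 10 simultaneous requests could be sent. It is not the script used in the original benchmark. The endpoint ID, API key, prompt text, and the `{"input": {"prompt": ...}}` payload shape are placeholders and assumptions, and the helper names `submit_job`, `fire_simultaneously`, and `run_benchmark` are made up for illustration.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Placeholders (assumptions): substitute your own endpoint ID and API key.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-api-key"
RUN_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run"


def submit_job(prompt):
    """POST one job to the endpoint and return the parsed JSON response."""
    # The payload shape is an assumption; adjust it to what your worker expects.
    body = json.dumps({"input": {"prompt": prompt}}).encode("utf-8")
    request = urllib.request.Request(
        RUN_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def fire_simultaneously(send, prompts):
    """Call send(prompt) for every prompt at once, one thread per prompt.

    Results come back in the same order as the prompts.
    """
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        return list(pool.map(send, prompts))


def run_benchmark(num_requests=10):
    """Send num_requests jobs in one burst and print each response."""
    prompts = [f"Benchmark prompt {i}" for i in range(num_requests)]
    for result in fire_simultaneously(submit_job, prompts):
        print(result)
```

Calling `run_benchmark()` sends the burst. `fire_simultaneously` takes the sender as an argument so it can be tried with a stub function before pointing it at a real endpoint.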


3 Replies

hexadecibalOP•15mo ago

Here is the error stack

hexadecibalOP•15mo ago

stack.txt

Solution

hexadecibal•15mo ago

Figured my issue out. I needed MAX_CONCURRENCY set to 5, otherwise all requests were going only to one node.
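To make the arithmetic concrete, here is a toy model of why a per-worker cap of 5 spreads 10 requests across 2 workers. This is only an illustration of the behavior described above, not RunPod's actual scheduling logic; the function `assign_requests` and the fill-in-order rule are assumptions.

```python
def assign_requests(num_requests, num_workers, max_concurrency):
    """Toy model: fill each worker in order, up to max_concurrency jobs.

    Returns (jobs in flight per worker, jobs left waiting in the queue).
    This is an assumed simplification, not the real dispatcher.
    """
    in_flight = []
    remaining = num_requests
    for _ in range(num_workers):
        taken = min(remaining, max_concurrency)
        in_flight.append(taken)
        remaining -= taken
    return in_flight, remaining


# With a cap of 5, the 10 requests split evenly across both workers.
print(assign_requests(10, 2, 5))   # ([5, 5], 0)

# With a cap of 10 or more, the first worker absorbs the whole burst.
print(assign_requests(10, 2, 10))  # ([10, 0], 0)
```

In the real deployment, MAX_CONCURRENCY is set as an environment variable on the endpoint rather than in code.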
