Huge P98 execution time in EU-RO region endpoint

We have been seeing a very high P98 execution time on one of our EU-RO region endpoints for the past few days. It used to be below 60s in general, but it has now climbed above 40 minutes. We also see no correlation between input text length and inference time, so we wanted to check whether there are any hardware- or driver-related issues in this region. Endpoint ID: 1wfnup871iklus. I suspect this has also drastically increased our number of running workers.
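For anyone who wants to reproduce the check, here is a minimal sketch of how the P98 and the length-versus-time correlation could be computed from exported request stats. The sample values and the helper names `p98` and `pearson` are hypothetical and only illustrate the method; they are not our real measurements.

```python
import math
from statistics import quantiles


def p98(values):
    # quantiles(n=100) returns 99 cut points; index 97 is the 98th percentile.
    return quantiles(values, n=100, method="inclusive")[97]


def pearson(xs, ys):
    # Plain Pearson correlation coefficient; returns 0.0 if either series is constant.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical samples: (input text length in characters, execution time in seconds)
samples = [(120, 41.0), (950, 38.5), (300, 2400.0), (80, 52.3), (700, 47.9)]
lengths = [length for length, _ in samples]
times = [seconds for _, seconds in samples]

print("P98 execution time (s):", p98(times))
print("Pearson r (length vs. time):", pearson(lengths, times))
```

A coefficient close to 0 would be consistent with the slow requests not being explained by input length.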
1 Reply
jg · 3mo ago
Adding on to this issue: we've noticed that there may be messages in the queue that have not been properly handled. In the logs from one of our endpoints, we see KeyError: 'input' even when no requests are being sent to this specific endpoint (1wfnup871iklus):
2024-03-04 16:09:49.500 [co12bhjtlvxjpx] [info] KeyError: 'input'
2024-03-04 16:09:49.500 [co12bhjtlvxjpx] [info] if job["input"] is None:
2024-03-04 16:09:49.500 [co12bhjtlvxjpx] [info] File "/usr/local/envs/venv/lib/python3.9/site-packages/runpod/serverless/work_loop.py", line 43, in start_worker
2024-03-04 16:09:49.500 [co12bhjtlvxjpx] [info] return future.result()
2024-03-04 16:09:49.500 [co12bhjtlvxjpx] [info] KeyError: 'input'
2024-03-04 16:09:49.500 [co12bhjtlvxjpx] [info] if job["input"] is None:
2024-03-04 16:09:49.500 [co12bhjtlvxjpx] [info] File "/usr/local/envs/venv/lib/python3.9/site-packages/runpod/serverless/work_loop.py", line 43, in start_worker
2024-03-04 16:09:49.500 [co12bhjtlvxjpx] [info] return future.result()
We suspect that when this error occurs, the worker restarts, which adds further delay to processing requests. This is speculation on our part, so any help in addressing the issue would be much appreciated, as would any updates. The increase in execution time is causing us to spawn more than three times the number of workers we normally need to handle our regular traffic.
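To illustrate what we think is happening, here is a minimal sketch of the failure mode. It only assumes what the traceback shows, namely that the job payload is indexed with `job["input"]`. The function names and the sample payload are hypothetical and are not taken from the runpod package.

```python
def has_input_strict(job):
    # Same access pattern as in the traceback: raises KeyError if "input" is absent.
    return job["input"] is not None


def has_input_tolerant(job):
    # dict.get returns None for a missing key, so a malformed payload is
    # reported as "no input" instead of raising.
    return job.get("input") is not None


# Hypothetical payload that carries an id but no "input" key.
malformed_job = {"id": "hypothetical-job-id"}

try:
    has_input_strict(malformed_job)
except KeyError as err:
    print("strict check raised KeyError:", err)

print("tolerant check result:", has_input_tolerant(malformed_job))
```

If the queue is delivering payloads shaped like `malformed_job`, that would explain the KeyError appearing even when we are not sending requests.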