Intermittent Slow Performance Issue with GPU Workers
I'm running into an intermittent issue where some GPU workers run significantly slower than others. I measured the time for a specific task on one worker type (4090 24GB): sending the exact same payload to the endpoint normally takes about 1 minute to execute. Occasionally, however, a worker becomes exceptionally slow, and with the same payload, Docker image, tag, and GPU type, execution stretches to a few hours. During these episodes, GPU utilization stays at 0% the entire time.
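In case it helps with reproducing the comparison, here is a minimal sketch of the kind of instrumentation I would add inside the worker handler (PyTorch assumed; `run_inference` is a placeholder for the actual model call, not the real code). A 0% utilization run that still finishes, just hours later, often points to the process not actually seeing the GPU and silently falling back to CPU, which this would surface in the logs:

```python
import time

import torch  # assuming a PyTorch-based worker image


def run_inference(payload):
    # Placeholder for the real model call inside the worker.
    return {"echo": payload}


def handler(payload):
    # Log what the process actually sees; if CUDA is unavailable here,
    # the slow runs are likely executing on CPU despite the GPU being attached.
    print("cuda available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))

    start = time.time()
    result = run_inference(payload)
    print(f"inference took {time.time() - start:.1f}s")
    return result
```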
The output log confirms that inference speed is unusually slow whenever the affected worker is handling the job. Has anyone experienced a similar problem, and if so, how did you resolve it?
Your insights and assistance in addressing this issue would be greatly appreciated. Thank you.
