Inconsistent inference performance across pods (RTX 4090)
Hello. I rent and run 20 RTX 4090 GPUs around the clock.
However, their inference speeds vary significantly.
Each row of the table in the attached image represents a pair of RTX 4090 GPUs.
One pair processes 150 images in 3 minutes, while the rest manage only 50-80 in the same time. On my own 2-way RTX 4090 server, which I purchased outright, throughput is 180 images in 3 minutes. I haven't been able to figure out what is causing these speed differences.
Each inference task generates a single image.
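For anyone hitting the same issue: a common cause of this kind of pod-to-pod variance on rented GPUs is a reduced power cap or thermally throttled clocks on some hosts. Below is a minimal sketch (not specific to any provider) that parses the CSV output of `nvidia-smi --query-gpu=clocks.sm,clocks.max.sm,power.limit,power.max_limit --format=csv,noheader,nounits` and flags GPUs that look throttled. The 0.9 clock-ratio threshold is an assumption, not a vendor spec.

```python
import csv
import io

# Command whose output this parser expects (run it on each pod):
#   nvidia-smi --query-gpu=clocks.sm,clocks.max.sm,power.limit,power.max_limit \
#              --format=csv,noheader,nounits

def flag_throttled(csv_text, clock_ratio=0.9):
    """Return indices of GPUs whose sustained SM clock is well below the
    maximum, or whose power limit has been capped below the board maximum.
    `clock_ratio` is an assumed heuristic threshold, not an NVIDIA spec."""
    flagged = []
    for i, row in enumerate(csv.reader(io.StringIO(csv_text))):
        sm_clock, sm_max, power_limit, power_max = (float(x) for x in row)
        if sm_clock < clock_ratio * sm_max or power_limit < power_max:
            flagged.append(i)
    return flagged

# Canned example: two healthy GPUs and one power-capped GPU.
sample = (
    "2805, 3105, 450.00, 450.00\n"
    "2820, 3105, 450.00, 450.00\n"
    "1350, 3105, 300.00, 450.00\n"
)
print(flag_throttled(sample))  # -> [2]
```

Comparing this output between the fast pod and the slow ones would show quickly whether the difference is a host configuration issue (power cap, cooling) rather than your inference code.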
