"I'm training on a 4090D with 500 images. The speed is 4.4 s/it (using Rank_3_18950MB_9_05_Second_IT). When I train on an A800 (80 GB VRAM), the speed is 3.3 s/it (using Rank_1_29500MB_8_85_Second_IT). The speed difference seems abnormal to me. What could be causing this?"