Training runs 2-5x slower on Runpod pods than on my home system.
Home system: 4090, 7950X, 64GB RAM, M.2 SSD.
Pod comparisons:
1x 4090: 2.5-3x slower on ALL ops
L40: 5x slower
H200: 1.5x slower
The 'slowness' refers to the time taken by each operation. In the attached examples, I show that an nn.Linear module takes around 2x longer on the Runpod 4090 than on mine.
Why might this be?
For extra context, my dataset is MNIST, and it is loaded onto the GPU.
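When comparing per-op times across machines, it helps to make sure both sides are timed the same way. PyTorch launches CUDA kernels asynchronously, so a plain wall-clock measurement without a device synchronize can capture only launch overhead, and the first few calls also include one-off library initialization. Below is a minimal sketch of a timing helper under those assumptions; `benchmark` and its parameters are hypothetical names, not part of PyTorch or Runpod.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(
    fn: Callable[[], object],
    sync: Optional[Callable[[], None]] = None,
    warmup: int = 10,
    iters: int = 100,
) -> float:
    """Return the median wall-clock seconds per call of fn (hypothetical helper).

    sync, if given, is called so that asynchronous GPU work has actually
    finished before the clock is read (e.g. torch.cuda.synchronize).
    """
    # Warm-up calls absorb one-off costs (library init, kernel selection, caching).
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # drain any queued work before starting the timed loop

    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the op itself, not just its launch
        times.append(time.perf_counter() - start)

    # Median is less sensitive than the mean to occasional scheduler hiccups.
    return statistics.median(times)
```

With PyTorch on a CUDA device, usage might look like `benchmark(lambda: layer(x), sync=torch.cuda.synchronize)`, where `layer = torch.nn.Linear(784, 128).cuda()` and `x = torch.randn(64, 784, device="cuda")` are illustrative shapes chosen here, not taken from the original runs. Running the identical call on both the home machine and the pod gives a like-for-like number.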

