Slow Disk Performance on H200 Nodes
The RunPod documentation says: "Container volumes provide fast read and write speeds since they are locally attached to workers." Based on that, I would expect pods to come with local SSDs. However, I'm seeing slow disk speeds on certain H200 nodes. I'd expect 5 GB/s+ from local NVMe SSDs, but I'm getting < 1 GB/s on some pods. Here's the command I use to benchmark:
fio --name=write-test --directory=. --numjobs=8 --size=10G --rw=write --bs=128K --iodepth=1 --time_based --runtime=10s --group_reporting --direct=1
My company saves very large model checkpoints, so we need fast disk speeds.
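For anyone trying to reproduce this, here is a rough cross-check with plain dd, to rule out the fio settings as the cause. This is only a sketch, not a replacement for the fio run: the function names `to_gb_per_s` and `dd_write_test` are made up for illustration, it assumes GNU coreutils (`date +%s.%N`, `dd conv=fdatasync`), and it goes through the page cache with a final flush instead of using direct I/O, so the number will not match fio exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a quick sequential-write sanity check.

# Convert a byte count and elapsed seconds into GB/s (decimal, 1 GB = 10^9 bytes).
to_gb_per_s() {
  awk -v b="$1" -v s="$2" 'BEGIN { printf "%.2f\n", b / s / 1e9 }'
}

# Write SIZE_MIB mebibytes of zeros into DIR and print the throughput in GB/s.
# conv=fdatasync makes dd flush the data to disk before exiting, so the flush
# time is included in the measurement.
dd_write_test() {
  local dir="${1:-.}" size_mib="${2:-1024}"
  local file="$dir/dd-write-test.tmp"
  local start end elapsed
  start=$(date +%s.%N)
  dd if=/dev/zero of="$file" bs=1M count="$size_mib" conv=fdatasync status=none
  end=$(date +%s.%N)
  rm -f "$file"
  elapsed=$(awk -v a="$start" -v b="$end" 'BEGIN { print b - a }')
  to_gb_per_s "$((size_mib * 1024 * 1024))" "$elapsed"
}
```

Running `dd_write_test . 10240` from the container volume writes 10 GiB and prints a single GB/s figure. Zeros can be compressed by some filesystems, so treat an unusually high result with suspicion. If dd and fio both report well under 1 GB/s on the same pod, the benchmark settings are unlikely to be the problem.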
@atanprofluent
Escalated To Zendesk
The thread has been escalated to Zendesk!
Ticket ID: #20509