Slow Disk Performance on H200 Nodes

The RunPod documentation says: "Container volumes provide fast read and write speeds since they are locally attached to workers." Based on that, I would expect the pods to have local SSDs. However, I'm seeing slow disk speeds on certain H200 nodes. With local NVMe SSDs I'd expect 5 GB/s or more, but on some pods I get less than 1 GB/s.

Here's the command I use to benchmark:

fio --name=write-test --directory=. --numjobs=8 --size=10G --rw=write --bs=128K --iodepth=1 --time_based --runtime=10s --group_reporting --direct=1

My company saves very large model checkpoints, so we need high disk throughput.
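To compare pods consistently, the same fio run can be scripted and its result read from fio's JSON output rather than from the text summary. The sketch below is a minimal example, assuming Python 3 and an fio build that supports `--output-format=json`. The function names are hypothetical helpers, not part of fio or RunPod. It sums write bandwidth across the `jobs` entries (with `--group_reporting` there is normally a single aggregated entry), reads `bw_bytes` (bytes/s) when present and otherwise falls back to `bw` (KiB/s), and reports decimal GB/s. It also includes a simple estimate of how long a checkpoint of a given size would take to write at a measured bandwidth.

```python
import json
import subprocess

# The fio invocation from the post, plus JSON output so the result can be parsed.
FIO_ARGS = [
    "fio", "--name=write-test", "--directory=.", "--numjobs=8",
    "--size=10G", "--rw=write", "--bs=128K", "--iodepth=1",
    "--time_based", "--runtime=10s", "--group_reporting", "--direct=1",
    "--output-format=json",
]


def write_bandwidth_gbps(fio_json: str) -> float:
    """Return aggregate write bandwidth in GB/s (10^9 bytes/s) from fio JSON output."""
    report = json.loads(fio_json)
    total_bytes_per_s = 0.0
    for job in report["jobs"]:
        write = job["write"]
        # Assumption: newer fio releases report bytes/s in "bw_bytes";
        # "bw" is in KiB/s, so convert it when "bw_bytes" is absent.
        if "bw_bytes" in write:
            total_bytes_per_s += write["bw_bytes"]
        else:
            total_bytes_per_s += write["bw"] * 1024
    return total_bytes_per_s / 1e9


def checkpoint_seconds(checkpoint_gb: float, bandwidth_gbps: float) -> float:
    """Seconds to write a checkpoint of checkpoint_gb GB at a sustained bandwidth."""
    return checkpoint_gb / bandwidth_gbps


def run_write_benchmark() -> float:
    """Run fio in the current directory and return its write bandwidth in GB/s."""
    result = subprocess.run(FIO_ARGS, capture_output=True, text=True, check=True)
    return write_bandwidth_gbps(result.stdout)
```

For example, calling `run_write_benchmark()` on each pod gives one comparable number per node. For a hypothetical 500 GB checkpoint, `checkpoint_seconds(500, 0.5)` is 1000 seconds, while `checkpoint_seconds(500, 5)` is 100 seconds.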
Poddy (4mo ago)
@atanprofluent
Escalated To Zendesk
The thread has been escalated to Zendesk!
Ticket ID: #20509
Unknown User (4mo ago)
Message Not Public