Multi-node training with torchrun/Slurm

Has anyone here tried multi-node training on RunPod? I'm thinking of setting it up, but if people have run into prohibitively slow network speeds, I don't see a reason to.
6 Replies
flash-singh · 17mo ago
You won't be able to do this without our multi-node feature, since you don't get access to internal IPs.
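Why internal IPs matter: in a static torchrun launch, every node is started with the same `--master_addr`/`--master_port`, and all workers (and NCCL's peer connections) must be able to reach that address directly. Below is a minimal sketch, assuming a homogeneous cluster (same GPU count per node) and a hypothetical private address `10.0.0.1` and script `train.py`. It builds the per-node torchrun commands and rejects a non-private rendezvous address. The private-address check is an illustrative assumption, not something torchrun itself enforces.

```python
import ipaddress
import shlex


def torchrun_commands(master_addr, nnodes, nproc_per_node,
                      script="train.py", master_port=29500):
    """Return one static-rendezvous torchrun command per node.

    Every node must be able to open TCP connections to master_addr (and,
    for NCCL, to each other), so a private/internal address is assumed.
    """
    # Illustrative guard: refuse a public address for the rendezvous host.
    if not ipaddress.ip_address(master_addr).is_private:
        raise ValueError(f"{master_addr} is not a private address")
    cmds = []
    for node_rank in range(nnodes):
        args = [
            "torchrun",
            f"--nnodes={nnodes}",
            f"--nproc_per_node={nproc_per_node}",
            f"--node_rank={node_rank}",
            f"--master_addr={master_addr}",
            f"--master_port={master_port}",
            script,  # hypothetical training script
        ]
        cmds.append(shlex.join(args))
    return cmds


def global_rank(node_rank, local_rank, nproc_per_node):
    # With a static launch and equal GPUs per node, ranks are laid out
    # node by node: node 0 holds ranks 0..7, node 1 holds 8..15, etc.
    return node_rank * nproc_per_node + local_rank


if __name__ == "__main__":
    # Hypothetical 2-node x 8-GPU job; run each line on its own node.
    for cmd in torchrun_commands("10.0.0.1", nnodes=2, nproc_per_node=8):
        print(cmd)
```

Under Slurm, the same commands would typically be launched once per node (for example via `srun`), with `--node_rank` taken from the node's position in the allocation.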
Unknown User · 17mo ago
Message Not Public
flash-singh · 17mo ago
It's a new service we're currently alpha testing. It will let you deploy multi-node clusters for training or other use cases, with 100+ Gbps private networking.
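To gauge whether a given link speed is "prohibitive", one rough check is the bandwidth-only cost of a gradient all-reduce. The sketch below assumes a ring all-reduce, in which each participant sends about 2(N-1)/N times the payload, treats each node as one participant (intra-node reduction done first), and ignores latency and compute/communication overlap. The 7B-parameter bf16 model is a hypothetical example, not something from the thread.

```python
def ring_allreduce_seconds(payload_bytes, num_nodes, link_gbps):
    """Bandwidth-only lower bound for one ring all-reduce across nodes.

    Each node sends (and receives) 2 * (N - 1) / N * payload_bytes;
    latency, protocol overhead, and overlap with compute are ignored.
    """
    if num_nodes < 2:
        return 0.0  # nothing crosses the network
    bytes_on_wire = 2 * (num_nodes - 1) / num_nodes * payload_bytes
    link_bytes_per_s = link_gbps * 1e9 / 8  # Gbit/s -> bytes/s
    return bytes_on_wire / link_bytes_per_s


if __name__ == "__main__":
    # Hypothetical: 7e9 parameters in bf16 (2 bytes each), 2 nodes.
    grads = 7e9 * 2
    for gbps in (10, 100):
        t = ring_allreduce_seconds(grads, num_nodes=2, link_gbps=gbps)
        print(f"{gbps:>4} Gbps: {t:.2f} s per full-gradient all-reduce")
```

For that hypothetical model this works out to roughly 1.1 s per step at 100 Gbps versus roughly 11 s at 10 Gbps, which is why the private interconnect speed largely decides whether multi-node training is worthwhile.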
Unknown User · 17mo ago
Message Not Public
flash-singh · 17mo ago
A100s / H100s. It will likely open for beta next month.
Unknown User · 17mo ago
Message Not Public