SGLang DeepSeek-V3-0324

I have been trying to run DeepSeek-V3-0324 on an Instant Cluster with 2 nodes of 8 x H100 each, so far without success. I am trying to get the model to run multi-node and multi-GPU.

I downloaded the model from Hugging Face onto a persistent volume and attach that volume to my Instant Cluster before launching. After launching, I run the PyTorch demo script from https://docs.runpod.io/instant-clusters/pytorch to make sure the network is working (it is).
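For anyone reproducing this, one quick thing to rule out is an incomplete download on the volume. The following is only a minimal sketch: `check_model_dir` is a hypothetical helper, and it assumes the model directory follows the usual Hugging Face layout with a `config.json` and `*.safetensors` weight shards at the top level.

```python
from pathlib import Path


def check_model_dir(path):
    """Return a list of problems found in a local model directory.

    Assumption: the directory uses the usual Hugging Face layout,
    with config.json and *.safetensors shards at the top level.
    An empty list means nothing obviously wrong was found.
    """
    root = Path(path)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = []
    if not (root / "config.json").is_file():
        problems.append("config.json is missing")
    if not any(root.glob("*.safetensors")):
        problems.append("no *.safetensors weight shards found")
    return problems
```

Calling `check_model_dir("DeepSeek-V3-0324")` on each node, from the same working directory used for the launch command, would show whether both nodes see the same files. It does not verify that every shard is present or intact.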

I then follow the instructions for running DeepSeek-V3-0324 at: https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3

The default instructions there are:

# node 1
python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --tp 16 --dist-init-addr 10.0.0.1:5000 --nnodes 2 --node-rank 0 --trust-remote-code

# node 2
python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --tp 16 --dist-init-addr 10.0.0.1:5000 --nnodes 2 --node-rank 1 --trust-remote-code


Instead, I run the following command on each node:
python3 -m sglang.launch_server --model-path DeepSeek-V3-0324 --tp 16 --dist-init-addr ${MASTER_ADDR}:${MASTER_PORT} --nnodes ${NUM_NODES} --node-rank ${NODE_RANK} --trust-remote-code
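Since the command depends on four environment variables, an unset or inconsistent value on one node is worth ruling out. Below is a rough sketch, not anything from the SGLang or RunPod docs: `build_launch_command` is a hypothetical helper that checks `MASTER_ADDR`, `MASTER_PORT`, `NUM_NODES`, and `NODE_RANK` and then assembles exactly the flags used in the command above.

```python
import shlex

# Variable names taken from the command in the post.
REQUIRED_VARS = ("MASTER_ADDR", "MASTER_PORT", "NUM_NODES", "NODE_RANK")


def build_launch_command(env, model_path="DeepSeek-V3-0324", tp=16):
    """Validate the cluster variables and return the launch command as a list.

    `env` is any mapping of variable names to strings, e.g. os.environ.
    Raises ValueError if a variable is unset, empty, or out of range.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("unset or empty variables: " + ", ".join(missing))

    num_nodes = int(env["NUM_NODES"])
    node_rank = int(env["NODE_RANK"])
    if not 0 <= node_rank < num_nodes:
        raise ValueError(f"NODE_RANK {node_rank} is outside 0..{num_nodes - 1}")

    return [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--tp", str(tp),
        "--dist-init-addr", f"{env['MASTER_ADDR']}:{env['MASTER_PORT']}",
        "--nnodes", str(num_nodes),
        "--node-rank", str(node_rank),
        "--trust-remote-code",
    ]


def format_command(argv):
    """Render the argument list as a single shell-quoted line for logging."""
    return shlex.join(argv)
```

Printing `format_command(build_launch_command(os.environ))` on each node (after `import os`) makes it easy to compare the fully expanded commands: the `--dist-init-addr` and `--nnodes` values should match on both nodes, and only `--node-rank` should differ.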


The problem is that this hangs. Watching nvidia-smi while the model loads, memory usage on each GPU climbs to just under 1 GB and then stops increasing.
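To record the stall rather than eyeballing it, the memory readings can be sampled in a script. This is a minimal sketch: it assumes `nvidia-smi` is on the PATH, uses its `--query-gpu=memory.used --format=csv,noheader,nounits` output (one MiB value per GPU per line), and the helper names and the zero-change threshold are my own choices.

```python
import subprocess


def parse_memory_used(csv_text):
    """Parse one MiB integer per line, as printed by
    nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def read_memory_used():
    """Return the current memory.used reading (MiB) for every visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    )
    return parse_memory_used(result.stdout)


def has_stalled(samples, tolerance_mib=0):
    """Return True if no GPU's usage moved by more than tolerance_mib
    between the first and last sample. Needs at least two samples."""
    if len(samples) < 2:
        return False
    first, last = samples[0], samples[-1]
    return all(abs(b - a) <= tolerance_mib for a, b in zip(first, last))
```

Collecting `read_memory_used()` every few seconds into a list and passing it to `has_stalled` gives a yes/no answer that can be logged on both nodes. A stall at this stage only shows that weight loading has not started or progressed; it does not say why.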

Any help would be greatly appreciated.