blabbercrab

Trying to load a huge model into serverless

https://huggingface.co/cognitivecomputations/dolphin-2.9.2-qwen2-72b

Anyone have any idea how to serve this in vLLM?
I've deployed a serverless endpoint with two 80 GB GPUs and have had no luck getting the model to load.
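For reference, here's roughly what I'm trying (a minimal sketch using vLLM's offline `LLM` API; the parameter values are my guesses, not a known-good config):

```python
# Minimal sketch, assuming vLLM's offline API; the values below are guesses.
from vllm import LLM, SamplingParams

llm = LLM(
    model="cognitivecomputations/dolphin-2.9.2-qwen2-72b",
    tensor_parallel_size=2,        # shard the weights across both 80 GB GPUs
    dtype="bfloat16",
    gpu_memory_utilization=0.95,   # leave a little headroom per GPU
    max_model_len=4096,            # short context to leave room for KV cache
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

My back-of-the-envelope math: 72B params at bf16 is ~144 GB of weights alone, so two 80 GB cards leave only ~16 GB total for KV cache and activations, which is why I'm capping `max_model_len`. Not sure if that's the actual failure mode here though.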