Runpod•13mo ago

Using runpod serverless for HF 72b Qwen model --> seeking help

Hey all, I'm new to this and tried loading a HF Qwen 2.5 72b variant on Runpod serverless, and I'm having issues.

Requesting help from runpod veterans please!

Here's what I did:

Clicked into Runpod serverless

Pasted the HF link for the model: https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2

Chose A100 (80GB) and 2 GPUs (choosing 1 GPU gave me an error message)

Added a MAX_MODEL_LENGTH setting of 20k tokens (I got an error message when I didn't set this at first, because the model's default 128k context was too large)

Clicked deploy

Clicked run ("hello world prompt")

It then started loading. It took about half an hour to download, went through all the checkpoints, and eventually just had a bunch of error messages, and the pod just kept running. Ate up $10 of credits.

Log output was something like the attached.

It just kept running and eating credits, and wouldn't respond to any requests (they would always just sit in the queue), so I shut it down.

I tried Google and YouTube for tutorials, but haven't found much.

Can anyone point me in the right direction to get this going, please?

Thanks!

message.txt (9.36 KB)


Charixfox•12/12/24, 11:24 PM

Have you successfully gotten a smaller model - 3B or 7B - to work first so you understand the process?

Downloading the model takes a large amount of time. If it's downloaded to worker storage, it has to download again every time a new worker starts or otherwise reinitializes. With small models, this is not a big deal. With a 72B model, that is 144GB of download for the full FP16 weights.
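
For reference, here's a rough back-of-envelope sketch of where that 144GB figure comes from, and why a single 80GB card isn't enough. It assumes exactly 72 billion parameters at 2 bytes each (FP16) and counts weights only; the KV cache and activations need extra VRAM on top, so treat the GPU count as a lower bound. The function names are just made up for illustration.

```python
import math


def weight_size_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate size of the raw weights in decimal gigabytes.

    bytes_per_param defaults to 2.0, i.e. FP16/BF16 storage.
    """
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(size_gb: float, vram_per_gpu_gb: float) -> int:
    """Lower bound on GPU count needed just to hold the weights."""
    return math.ceil(size_gb / vram_per_gpu_gb)


# Assumption: a nominal 72B-parameter model stored in FP16.
size = weight_size_gb(72e9)
gpus = min_gpus_for_weights(size, vram_per_gpu_gb=80)

print(f"weights: ~{size:.0f} GB, needs at least {gpus} x 80GB GPUs")
```

So 2x A100 80GB (160GB total) is about the minimum for the unquantized weights, and every cold start has to pull that much data again unless the model is cached somewhere persistent.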

bp (OP)•12/13/24, 2:15 AM

ok lemme start off with a less ambitious project haha thanks
