Runpod · 16mo ago
octopus

Guide to deploy Llama 405B on Serverless?

Hi, can any experts on Serverless advise on how to deploy Llama 405B on Serverless?
33 Replies
Suba · 16mo ago
@octopus - you need to attach a network volume to the endpoint. The volume should have at least 1 TB of space to hold the 405B model (unless you are using quantized models). Then increase the number of workers to match the model's GPU requirement (something like ten 48 GB GPUs). I tried several 405B models on HF but got an error related to rope_scaling. Looks like we need to modify it to null and try; to do this I need to download all the files and upload them again.
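For a rough sense of why the 1 TB volume and that kind of GPU count come up, here is a back-of-envelope sketch (weights only; it ignores KV cache, activations, and runtime overhead, so the real requirement is higher):

```python
# Back-of-envelope sizing for a 405B-parameter model: weights only,
# ignoring KV cache, activations, and runtime overhead.
import math

params = 405e9  # parameter count

for precision, bytes_per_param in [("FP16/BF16", 2), ("FP8", 1), ("INT4", 0.5)]:
    weight_gb = params * bytes_per_param / 1e9
    min_gpus = math.ceil(weight_gb / 48)  # 48 GB cards, weights alone
    print(f"{precision}: ~{weight_gb:.0f} GB of weights, at least {min_gpus} x 48 GB GPUs")
```

An unquantized FP16/BF16 copy of the weights alone is roughly 810 GB, which is why a 1 TB network volume is about the minimum and why ten 48 GB GPUs are not enough without quantization.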
Unknown User · 16mo ago
Message Not Public
Suba · 16mo ago
@nerdylive not sure about this, do we have a document or page that lists vllm's support for a model?
Unknown User · 16mo ago
Message Not Public
Unknown User · 16mo ago
Message Not Public
Suba · 16mo ago
Looks like it supports LlamaForCausalLM (Llama 3.1, Llama 3, Llama 2, LLaMA, Yi): meta-llama/Meta-Llama-3.1-405B-Instruct, meta-llama/Meta-Llama-3.1-70B, meta-llama/Meta-Llama-3-70B-Instruct, meta-llama/Llama-2-70b-hf, 01-ai/Yi-34B, etc.
Unknown User · 16mo ago
Message Not Public
Suba · 16mo ago
I am using runpod/worker-vllm:stable-cuda12.1.0. Since I am using serverless, I am unable to run any command.
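For reference, since there is no shell on a serverless worker, the vLLM worker is configured entirely through environment variables on the endpoint. A minimal sketch of the kind of settings involved, shown as a Python dict purely for illustration (the variable names are from memory of the worker-vllm README and should be double-checked there):

```python
# Illustration only: typical endpoint environment variables for runpod/worker-vllm.
# Verify the exact names against the worker-vllm README before relying on them.
endpoint_env = {
    "MODEL_NAME": "meta-llama/Meta-Llama-3.1-405B-Instruct",  # HF repo to serve
    "HF_TOKEN": "<your-huggingface-token>",                   # required for gated Llama repos
    "TENSOR_PARALLEL_SIZE": "8",                              # shard the model across the worker's GPUs
    "MAX_MODEL_LEN": "8192",                                  # cap context length to reduce memory pressure
}
```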
Unknown User · 16mo ago
Message Not Public
Suba · 16mo ago
No, I get an error related to rope_scaling. Llama 3.1's config.json has lots of params under rope_scaling
Unknown User · 16mo ago
Message Not Public
Suba · 16mo ago
but the current vllm accepts only two params
Unknown User · 16mo ago
Message Not Public
Suba · 16mo ago
2024-07-24T04:42:22.063990694Z engine.py :110 2024-07-24 04:42:22,063 Error initializing vLLM engine: rope_scaling must be a dictionary with two fields, type and factor, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}
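A minimal sketch of the workaround Suba describes (download the repo, rewrite rope_scaling in config.json, upload it again under your own account), assuming config.json is already on disk:

```python
# Sketch of the rope_scaling workaround discussed above: edit config.json so that
# older vLLM builds stop rejecting Llama 3.1's extended rope_scaling block.
import json

with open("config.json") as f:
    config = json.load(f)

# Option A (what "modify it to null" means): drop rope scaling entirely.
config["rope_scaling"] = None

# Option B: keep only the two fields older vLLM accepts ("dynamic" is a guess here;
# the error above only says it wants the keys "type" and "factor").
# config["rope_scaling"] = {"type": "dynamic", "factor": 8.0}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```

Either edit only silences the config check; it does not reproduce Llama 3.1's intended llama3 rope scaling, so a vLLM build that understands the llama3 rope_type is the cleaner fix.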
Unknown User · 16mo ago
Message Not Public
Suba · 16mo ago
ok got it, 405 is not in there
Unknown User · 16mo ago
Message Not Public
Suba · 16mo ago
ok.. is it done automatically or should we raise a ticket etc
Unknown User · 16mo ago
Message Not Public
Suba · 16mo ago
great, thank you very much for your time 🙂
tim · 16mo ago
You could try to use https://docs.runpod.io/tutorials/serverless/cpu/run-ollama-inference, but with a GPU. The ollama worker was updated and now also supports Llama 3.1. We only tested this with 8B, but I don’t see why this shouldn’t also work with 405B 🙏
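If the ollama worker route works, calling the resulting endpoint looks like any other serverless endpoint call. A rough sketch, assuming a prompt-style input (the endpoint ID and API key are placeholders, and the exact "input" fields depend on the ollama worker version, so check the linked tutorial):

```python
# Sketch of calling a serverless endpoint running the ollama worker.
# The /runsync route and Bearer auth are the standard RunPod serverless API;
# the "input" payload below is an assumption, verify it against the tutorial.
import requests

ENDPOINT_ID = "<your-endpoint-id>"
API_KEY = "<your-runpod-api-key>"

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "Why is the sky blue?"}},
    timeout=600,
)
print(resp.json())
```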
Unknown User · 16mo ago
Message Not Public
tim · 16mo ago
I will also test this later today with 70 and 405.
Suba · 16mo ago
@nerdylive would like to know if you got any news on the vllm update for 405
Unknown User · 16mo ago
Message Not Public
Suba · 16mo ago
@NERDDISCO pls let me know if ollama worker worked with 405
yhlong00000 · 16mo ago
RunPod Blog
Run Llama 3.1 405B with Ollama: A Step-by-Step Guide
Meta’s recent release of the Llama 3.1 405B model has made waves in the AI community. This groundbreaking open-source model not only matches but even surpasses the performance of leading closed-source models. With impressive scores on reasoning tasks (96.9 on ARC Challenge and 96.8 on GSM8K)
tim · 16mo ago
That’s super cool! How can we also do this serverless? We can’t add multiple GPUs to a worker, so is there any other way?
yhlong00000 · 16mo ago
Yeah, currently you can’t, 405B needs too much memory. 😂😂😂
Unknown User · 16mo ago
Message Not Public
Madiator2011 · 16mo ago
I suspect about 200+
Unknown User · 16mo ago
Message Not Public
