""Uncaught exception | <class 'torch.OutOfMemoryError'>; CUDA out of memory. Tried to allocate 3.50 GiB. GPU 0 has a total capacity of 44.45 GiB of which 1.42 GiB is free"
Basically I used the model casperhansen/deepseek-r1-distill-qwen-32b-awq with vLLM and RunPod serverless, except I lowered the model max length to 11000. I didn't modify any other settings.
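For reference, here is a minimal sketch of what roughly equivalent settings look like through vLLM's Python API (not the exact RunPod worker config; everything except the model name and max length is assumed to be at vLLM defaults):

```python
# Sketch of the setup described above via vLLM's Python API.
# Only the model and max_model_len come from the post; the other
# values are vLLM defaults, not confirmed RunPod worker settings.
from vllm import LLM, SamplingParams

llm = LLM(
    model="casperhansen/deepseek-r1-distill-qwen-32b-awq",
    quantization="awq",           # AWQ-quantized checkpoint
    max_model_len=11000,          # lowered context length, as in the post
    gpu_memory_utilization=0.90,  # vLLM default fraction of VRAM it may claim
)

out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```

max_model_len and gpu_memory_utilization are the two knobs that usually determine whether the weights plus KV cache fit in VRAM, so they are the most likely place this OOM comes from.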