Llama-3.1-Nemotron-70B-Instruct in Serverless
Hello there,
I've been trying to deploy Nvidia's Llama-3.1-Nemotron-70B-Instruct in serverless using the vLLM template, but I couldn't get it to work no matter what.
I'm trying to deploy it on an endpoint with 2 x H100 GPUs, but in most of my attempts I don't even see the weights being downloaded. Requests start, and after a few minutes the worker terminates.
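For context, a 70B model generally has to be sharded across both H100s with vLLM's tensor parallelism; a minimal launch sketch under that assumption (the flags are standard vLLM CLI options, and the --max-model-len value here is illustrative, not a known-good setting for this endpoint):

```shell
# Shard weights across both GPUs; a 70B model in bf16 is roughly 140 GB,
# so a 2 x 80 GB pair needs tensor parallelism plus a capped context
# length to leave room for the KV cache.
vllm serve nvidia/Llama-3.1-Nemotron-70B-Instruct \
  --tensor-parallel-size 2 \
  --max-model-len 8192
```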
In this scenario I get this error:

Unrecognized model in nvidia/Llama-3.1-Nemotron-70B-Instruct. Should have a model_type key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bark, bart, (and the list goes on)

Even weirder: when I deploy the exact same configuration again, it sometimes downloads the weights but then fails with a different error each time. It's not consistent.
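That "Unrecognized model" message indicates transformers couldn't find a model_type key in the repo's config.json. One way to sanity-check a checkpoint before deploying is to inspect its config.json for that key; a minimal sketch (the sample config text and the helper name are illustrative — in practice you would download the real file, e.g. with huggingface_hub's hf_hub_download):

```python
import json

# Illustrative config.json contents; a transformers-compatible repo
# declares the architecture via the "model_type" key.
sample_config = '{"model_type": "llama", "hidden_size": 8192}'

def has_model_type(config_text: str) -> bool:
    """Return True if a config.json declares the model_type key
    that transformers/vLLM need to resolve the architecture."""
    return "model_type" in json.loads(config_text)

print(has_model_type(sample_config))  # True for a compatible repo
```

If this check fails for the repo you're pointing the template at, vLLM will raise the same "Unrecognized model" error regardless of GPU configuration.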
In fact, I tried a few other popular 70B models but couldn't get any of them to work either.
Has anybody tried and managed to run 70B models in serverless so far?