Out of memory when trying to deploy llama3-70b-8192
Hi, I am trying to work with the model:
llama3-70b-8192
but I can't deploy my serverless endpoint because it runs out of memory. I have attached a screenshot of my config. Please recommend other settings to make it work.
[rank0]: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 896.00 MiB. GPU
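For context, here is a rough back-of-the-envelope estimate of how much VRAM the weights alone need (a sketch I put together, assuming the standard 70 billion parameter count implied by the model name; the helper function is mine, not from any library):

```python
# Rough VRAM estimate for holding the model weights in GPU memory.
# This ignores KV cache, activations, and framework overhead, which add more on top.

def weight_vram_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate GiB needed just to store the weights."""
    return num_params * bytes_per_param / 1024**3

params = 70e9  # "70b" in llama3-70b-8192
for name, bpp in [("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name}: ~{weight_vram_gib(params, bpp):.0f} GiB")
# fp16/bf16: ~130 GiB, int8: ~65 GiB, int4: ~33 GiB
```

So in fp16 the weights alone are around 130 GiB, which is far more than any single GPU offered on typical serverless tiers; this is why the deploy fails with only 896 MiB left to allocate. The usual fixes are sharding across multiple large GPUs (e.g. tensor parallelism) or running a quantized variant.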