I'd like your recommendations on GPU choice for running a 13B language model quantized with AWQ or GPTQ. The workload would be around 200-300 requests per hour. I tried a 48 GB A6000 with pretty good results, but do you think a 24 GB GPU could be up to the task?
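For a rough feel of the numbers behind this question, here is a back-of-envelope memory estimate. It is only a sketch under stated assumptions, not a benchmark: it assumes a Llama-2-13B-style architecture (40 layers, hidden size 5120, no grouped-query attention), 4-bit weights, an FP16 KV cache, 4096 tokens per request, and a guessed 2 GiB of runtime overhead. None of these figures come from the post itself, and real AWQ/GPTQ checkpoints are slightly larger than a flat 4 bits per weight because of per-group scales.

```python
GIB = 2**30

def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Memory for the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB

def kv_cache_gib(n_layers: int, hidden_size: int, tokens: int,
                 bytes_per_value: int = 2) -> float:
    """KV cache for one sequence: K and V tensors per layer, FP16 by default."""
    return 2 * n_layers * hidden_size * bytes_per_value * tokens / GIB

# Assumed model shape (Llama-2-13B-style); adjust for your actual model.
N_PARAMS = 13e9
N_LAYERS = 40
HIDDEN = 5120

# Assumed serving parameters (hypothetical, not from the post).
GPU_GIB = 24          # treat the 24 GB card as 24 GiB
OVERHEAD_GIB = 2      # guessed allowance for activations, CUDA context, etc.
TOKENS_PER_REQ = 4096 # prompt + generated tokens, worst case

weights = weight_gib(N_PARAMS, bits_per_weight=4)
kv_per_seq = kv_cache_gib(N_LAYERS, HIDDEN, TOKENS_PER_REQ)
free_for_kv = GPU_GIB - weights - OVERHEAD_GIB
max_concurrent = int(free_for_kv // kv_per_seq)

# 200-300 requests/hour expressed as a per-second arrival rate.
peak_rate_per_s = 300 / 3600

print(f"weights:            {weights:.2f} GiB")
print(f"KV cache / request: {kv_per_seq:.2f} GiB")
print(f"concurrent 4k seqs: {max_concurrent}")
print(f"peak arrival rate:  {peak_rate_per_s:.3f} req/s")
```

Under these assumptions the quantized weights take roughly a quarter of a 24 GB card, and the remaining memory holds a handful of full-length sequences at once. Whether that is enough depends on how long each request takes to generate, so it is worth measuring latency on the actual card before committing.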