Can we add the minimum GPU configs required for running popular models like Mistral and Mixtral?
I'm trying to find out which serverless GPU configs are required to run Mixtral 8x7B-Instruct, either the GPTQ-quantized version (https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ) or the main model from Mistral. It would be good to have this info in the README of the vLLM Worker repo.
I run into OutOfMemory errors when trying it on a 48 GB GPU.
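For context, here is a back-of-envelope sketch of why 48 GB runs out of memory. It assumes the ~46.7B total-parameter count from Mistral's Mixtral announcement (all experts stay resident in VRAM, not just the two active per token) and ignores KV cache and activation overhead, so treat it as a rough lower bound on weight memory only:

```python
# Rough weight-memory estimate for Mixtral 8x7B-Instruct.
# Assumption: ~46.7B total params (Mistral's published figure);
# 2 bytes/weight for fp16, ~0.5 bytes/weight for 4-bit GPTQ.
GIB = 2**30
total_params = 46.7e9  # all 8 experts are resident, not only the 2 routed to

fp16_weights_gib = total_params * 2 / GIB    # well over 48 GiB -> OOM on a 48 GB card
gptq4_weights_gib = total_params * 0.5 / GIB  # weights fit, leaving headroom for KV cache

print(f"fp16 weights:      {fp16_weights_gib:.1f} GiB")
print(f"GPTQ 4-bit weights: {gptq4_weights_gib:.1f} GiB")
```

So the fp16 weights alone (~87 GiB) cannot fit on 48 GB, which matches the OOM I'm seeing; the GPTQ build (~22 GiB of weights) should fit, but the remaining headroom for the KV cache would depend on `max_model_len` and `gpu_memory_utilization` settings.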