Running on a 48 GB GPU (also tried 80 GB) with the `runpod/worker-vllm:0.3.0-cuda11.8.0` image and the following environment variables:

- `MODEL_NAME=TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ` (also tried: `casperhansen/mixtral-instruct-awq`, `TheBloke/firefly-mixtral-8x7b-GPTQ`, and `mistralai/Mixtral-8x7B-Instruct-v0.1`)
- `TRUST_REMOTE_CODE=1`
- `QUANTIZATION=awq` (or `gptq` for the GPTQ models)
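For reference, a minimal sketch of how these variables could be passed to the worker image when testing locally. On a RunPod serverless endpoint they would instead be set in the endpoint template's environment-variable fields; the `docker run` form below is just an assumption for local reproduction:

```shell
# Hypothetical local run of the worker image with the variables listed above.
# QUANTIZATION must match the checkpoint: gptq for the GPTQ models, awq for the AWQ one.
docker run --gpus all \
  -e MODEL_NAME="TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ" \
  -e TRUST_REMOTE_CODE=1 \
  -e QUANTIZATION=gptq \
  runpod/worker-vllm:0.3.0-cuda11.8.0
```

One thing worth double-checking: pairing `QUANTIZATION=awq` with a GPTQ checkpoint (or vice versa) will cause vLLM to fail at model load, so the value should be switched per model rather than left fixed.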