I used Runpod a year ago and was able to load a Llama3-8B finetune into vLLM and quantize it on the fly to 4-bit with bitsandbytes (BNB). I've been trying the same with a Qwen3-14B finetune recently and can't get it to work. I also merged my finetune to 4-bit BNB safetensors, and that refuses to load as well.
Is there some new configuration I need to use to get this to work now?
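For reference, the invocation I used back then was along these lines (model path is a placeholder for my own finetune; older vLLM versions also wanted `--load-format bitsandbytes` alongside the quantization flag):

```shell
# On-the-fly 4-bit BNB quantization at load time.
# "my-org/llama3-8b-finetune" is a placeholder for the actual model repo.
vllm serve my-org/llama3-8b-finetune \
    --quantization bitsandbytes \
    --load-format bitsandbytes
```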