How do I run a quantized model on Runpod serverless? I'd like to run the 4/8-bit version of this model:
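
For reference, here is a minimal sketch of one way this could work, assuming the Runpod Python serverless SDK and on-the-fly quantization with transformers + bitsandbytes; `MODEL_ID` is a placeholder for whichever model the question refers to, and the handler payload shape (`job["input"]["prompt"]`) is an assumption:

```python
# Minimal Runpod serverless handler sketch: loads a causal LM quantized
# to 4-bit with bitsandbytes (swap in load_in_8bit=True for the 8-bit
# variant). MODEL_ID is a placeholder, not a specific recommendation.
import runpod
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "your-org/your-model"  # placeholder for the model in question

# 4-bit NF4 quantization; for 8-bit use BitsAndBytesConfig(load_in_8bit=True)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load once at module import so warm invocations reuse the weights.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=quant_config,
    device_map="auto",
)

def handler(job):
    """Runpod calls this per request; job["input"] carries the payload."""
    prompt = job["input"]["prompt"]
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    return {"text": tokenizer.decode(output[0], skip_special_tokens=True)}

runpod.serverless.start({"handler": handler})
```

This would be packaged into a Docker image (with torch, transformers, bitsandbytes, and runpod installed) and deployed as a Runpod serverless endpoint; an alternative is to pre-quantize the weights (e.g. GPTQ/AWQ) so cold starts skip the quantization step.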