I'm having trouble consistently running models larger than 70B parameters in the webui. They only load maybe one time in ten. Even when I do get one working, if I keep the pod, put it to sleep, and spin it up again later, I get error messages. Here's an example of the errors I get when trying to load a model that I have successfully loaded before, using the exact same configuration:
Traceback (most recent call last):
  File "/workspace/text-generation-webui/modules/ui_model_menu.py", line 214, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
  File "/workspace/text-generation-webui/modules/models.py", line 90, in load_model
    output = load_func_map[loader](model_name)
  File "/workspace/text-generation-webui/modules/models.py", line 399, in ExLlama_HF_loader
    return ExllamaHF.from_pretrained(model_name)
  File "/workspace/text-generation-webui/modules/exllama_hf.py", line 174, in from_pretrained
    return ExllamaHF(config)
  File "/workspace/text-generation-webui/modules/exllama_hf.py", line 31, in __init__
    self.ex_model = ExLlama(self.ex_config)
  File "/usr/local/lib/python3.10/dist-packages/exllama/model.py", line 852, in __init__