Setting up a serverless endpoint for a custom model
Hi, complete beginner here. I've been trying to set up a custom model endpoint, but my requests (sent from the web application) never get answered. I've seen delays of 40+ minutes, so clearly something is wrong, but I'm having a hard time troubleshooting.
Here are some details:
- trained a model (base model deepseek-ai/DeepSeek-R1-Distill-Llama-8B) and saved it to a Hugging Face repo
- trying to publish the endpoint via the GitHub integration (I already have a folder containing Dockerfile, handler.py, and requirements.txt)
- the Hugging Face repo that the handler function uses contains: four model.safetensors files, a model.gguf file, and model.safetensors.index.json
- errors in logs show:
[info] Saving model to network storage...
[info] Traceback (most recent call last):
[info]   File "//handler.py", line 77, in <module>
[info]     model.save_pretrained(NETWORK_MODEL_PATH)
[info]   File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4107, in save_pretrained
[info]   File "/usr/local/lib/python3.10/site-packages/safetensors/torch.py", line 286, in save_file
[info]     serialize_file(_flatten(tensors), filename, metadata=metadata)
[info] safetensors_rust.SafetensorError: Error while serializing: IoError(Os { code: 122, kind: QuotaExceeded, message: "Disk quota exceeded" })
(line 77: model.save_pretrained(NETWORK_MODEL_PATH))
I've tried increasing the size of the workers and the size of the network storage volume, but neither made a difference. For context, a simplified sketch of what my handler does around line 77 is below.
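This isn't my exact code; the repo name and the network path are placeholders, but the overall flow (download the fine-tuned weights from Hugging Face, then cache them on the network volume) is the same:

```python
# Simplified sketch of the relevant part of handler.py (placeholders, not the exact code).
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo name; the real one holds my fine-tuned DeepSeek-R1-Distill-Llama-8B.
HF_REPO = "my-username/my-finetuned-model"
# Placeholder path; in the real handler this points at the mounted network storage volume.
NETWORK_MODEL_PATH = "/network-volume/model"

# Download the fine-tuned weights and tokenizer from the Hugging Face repo into the worker.
model = AutoModelForCausalLM.from_pretrained(HF_REPO)
tokenizer = AutoTokenizer.from_pretrained(HF_REPO)

print("Saving model to network storage...")
# This is the line 77 call that fails with "Disk quota exceeded".
model.save_pretrained(NETWORK_MODEL_PATH)
tokenizer.save_pretrained(NETWORK_MODEL_PATH)
```

The "Saving model to network storage..." line in the logs corresponds to that print, and the traceback points at the save_pretrained call right after it.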