Setting up a serverless endpoint for a custom model
Hi, complete beginner here. I've been trying to set up a custom model endpoint, but I keep running into issues getting my requests answered (using the web application). I've seen delay times of 40+ minutes, so clearly something is wrong, but I'm having a hard time troubleshooting.
Here are some details:
- Trained a model (base model deepseek-ai/DeepSeek-R1-Distill-Llama-8B) and saved it to a Hugging Face repo.
- Trying to publish the endpoint via the GitHub integration (I already have a folder containing Dockerfile, handler.py, and requirements.txt).
- The Hugging Face repo that the handler function uses contains: (4) model.safetensors files, a model.gguf file, and model.safetensors.index.json.
- The errors in the logs show:
[info] Saving model to network storage...
[info] Traceback (most recent call last):
[info]   File "//handler.py", line 77, in <module>
[info]     model.save_pretrained(NETWORK_MODEL_PATH)
[info]   File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4107, in save_pretrained
[info]   File "/usr/local/lib/python3.10/site-packages/safetensors/torch.py", line 286, in save_file
[info]     serialize_file(_flatten(tensors), filename, metadata=metadata)
[info] safetensors_rust.SafetensorError: Error while serializing: IoError(Os { code: 122, kind: QuotaExceeded, message: "Disk quota exceeded" })
[error] worker exited with exit code 1
(The failing line 77 is model.save_pretrained(NETWORK_MODEL_PATH).) I've tried increasing the worker size and the network storage size, but that didn't change anything.
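In case it helps, here's a minimal diagnostic sketch I could drop into handler.py before the failing save call, to confirm whether the volume really is too small. The 20 GB threshold is my own guess (an 8B model's fp16 safetensors shards are roughly 16 GB, plus headroom); the helper names are hypothetical, not from my actual handler:

```python
import shutil

def free_gb(path: str) -> float:
    """Free space (in GB) on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / 1e9

def check_space(path: str, required_gb: float = 20.0) -> None:
    """Raise a clear error before save_pretrained if the volume is too small.

    required_gb is an assumption: ~16 GB of fp16 shards for an 8B model,
    plus headroom for the index file and temporary writes during saving.
    """
    available = free_gb(path)
    if available < required_gb:
        raise RuntimeError(
            f"Only {available:.1f} GB free at {path}; "
            f"need ~{required_gb} GB to save the model."
        )
```

Calling check_space(NETWORK_MODEL_PATH) right before model.save_pretrained(NETWORK_MODEL_PATH) would at least tell me which filesystem the quota error is coming from.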