Setting up a serverless endpoint for a custom model
Hi, complete beginner here. I've been trying to set up a custom model endpoint, but my requests keep failing to get answered (using the web application). I've seen delay times of 40+ minutes, so clearly something is wrong, but I'm having a hard time troubleshooting.
Here are some details:
- trained a model (base model deepseek-ai/DeepSeek-R1-Distill-Llama-8B) and saved to a huggingface repo
- trying to publish endpoint via GitHub integration (already have folder containing Dockerfile, handler.py, requirements.txt)
- huggingface repo that the handler function uses contains: four model.safetensors shard files, a model.gguf file, and model.safetensors.index.json
- errors in logs show:
[info] Saving model to network storage...
[info] Traceback (most recent call last):
[info]   File "//handler.py", line 77, in <module>
[info]     model.save_pretrained(NETWORK_MODEL_PATH)
[info]   File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4107, in save_pretrained
[info]
[info]   File "/usr/local/lib/python3.10/site-packages/safetensors/torch.py", line 286, in save_file
[info]     serialize_file(_flatten(tensors), filename, metadata=metadata)
[info] safetensors_rust.SafetensorError: Error while serializing: IoError(Os { code: 122, kind: QuotaExceeded, message: "Disk quota exceeded" })
[error] worker exited with exit code 1
(line 77 is: model.save_pretrained(NETWORK_MODEL_PATH))
I've tried increasing the worker size and the network storage size, but that didn't help.
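A quick way to sanity-check free space on the volume right before the save (just a sketch; assumes the network volume is mounted at /runpod-volume, which is where mine shows up):

```python
import shutil

def has_free_space(path="/runpod-volume", needed_gb=40):
    """Return True if the mount at `path` has at least `needed_gb` GB free."""
    usage = shutil.disk_usage(path)  # named tuple: (total, used, free) in bytes
    return usage.free / 1e9 >= needed_gb
```

If this returns False right before save_pretrained, the quota really is the problem; if it returns True and the save still fails, something else is going on.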
23 Replies
Unknown User•4w ago
Message Not Public
Sign In & Join Server To View
yea I just made it a global variable
Unknown User•4w ago
tried that, I'm already at 50 GB
Unknown User•4w ago
sure,
import os
import shutil

import runpod
from transformers import AutoModelForCausalLM, AutoTokenizer, TextGenerationPipeline

# MODEL_ID, NETWORK_MODEL_PATH, HF_CACHE_DIR, and check_model_files_exist()
# are defined earlier in the file

def clean_and_setup_network_storage():
    """Prepare network volume"""
    if os.path.exists(NETWORK_MODEL_PATH):
        shutil.rmtree(NETWORK_MODEL_PATH)
    if os.path.exists(HF_CACHE_DIR):
        shutil.rmtree(HF_CACHE_DIR)
    os.makedirs(NETWORK_MODEL_PATH, exist_ok=True)
    os.makedirs(HF_CACHE_DIR, exist_ok=True)

# Load model from network storage if it exists, else download and save
if check_model_files_exist():
    print("Loading model from network storage...")
    tokenizer = AutoTokenizer.from_pretrained(NETWORK_MODEL_PATH, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        NETWORK_MODEL_PATH,
        trust_remote_code=True,
    )
else:
    print("Model not found on network storage. Downloading...")
    clean_and_setup_network_storage()
    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_ID,
        cache_dir=HF_CACHE_DIR,
        trust_remote_code=True,
    )
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        cache_dir=HF_CACHE_DIR,
        trust_remote_code=True,
    )
    print("Saving model to network storage...")
    tokenizer.save_pretrained(NETWORK_MODEL_PATH)
    model.save_pretrained(NETWORK_MODEL_PATH)
    print("Saved successfully.")

# Create inference pipeline
pipe = TextGenerationPipeline(model=model, tokenizer=tokenizer)

def handler(job):
    job_input = job.get("input", {})
    prompt = job_input.get("prompt", "")
    max_new_tokens = job_input.get("max_new_tokens", 128)
    temperature = job_input.get("temperature", 0.7)
    if not prompt:
        return {"error": "No prompt provided."}
    try:
        output = pipe(prompt, max_new_tokens=max_new_tokens, temperature=temperature)
        return {"output": output[0]["generated_text"]}
    except Exception as e:
        return {"error": f"Generation failed: {str(e)}"}

runpod.serverless.start({"handler": handler})
I download the model from HuggingFace and save it to network storage for future use
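Rough math on what that download-then-save flow writes to the volume, assuming fp16 weights for the 8B model (which is what the four safetensors shards suggest):

```python
# rough disk footprint of download-then-save for an 8B model in fp16
params = 8e9
bytes_per_param = 2                 # fp16
shards = params * bytes_per_param   # ~16 GB of safetensors shards

cache_copy = shards                 # copy in HF_CACHE_DIR from from_pretrained
saved_copy = shards                 # second copy from save_pretrained
total_gb = (cache_copy + saved_copy) / 1e9
print(total_gb)  # 32.0 -- before counting the GGUF file, tokenizer, or temp files
```

So the cache copy plus the saved copy alone is about 32 GB, and the repo also contains a GGUF file, which makes it easy to exceed a 50 GB quota mid-save.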
Unknown User•4w ago

Unknown User•4w ago
HF_CACHE_DIR = "/runpod-volume/hf_cache"
Unknown User•4w ago
sorry, how do I check? it's just showing the size, data center, monthly cost, and ID
Unknown User•4w ago
ok, I'm trying to do that but I'm just not able to start the connection
the one worker that's running isn't connecting and all the other workers are "throttled"
but I see this

also, does this error only relate to the size of the network storage or can there be other causes?
[info]safetensors_rust.SafetensorError: Error while serializing: IoError(Os { code: 122, kind: QuotaExceeded, message: "Disk quota exceeded" })\n
[info] serialize_file(_flatten(tensors), filename, metadata=metadata)\n
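From what I can tell, OS error code 122 is EDQUOT on Linux, i.e. specifically a filesystem quota limit, as opposed to ENOSPC (28) for a plain full disk. Quick check (on Linux):

```python
import errno
import os

# the traceback's Os { code: 122 } maps to EDQUOT on Linux
print(errno.EDQUOT)               # 122
print(os.strerror(errno.EDQUOT))  # 'Disk quota exceeded' -- matches the log message
print(errno.ENOSPC)               # 28 -- what you'd see for an outright full disk
```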
Unknown User•4w ago
Yea I connected it
and I even tried to delete it, update the storage from 30 to 50 GB, and reconnect it, so I don't know why it's not showing anything
@Kush
Escalated To Zendesk
The thread has been escalated to Zendesk!
Ticket ID: #21203
Unknown User•4w ago
In general, btw, is there anywhere people tend to mess up when creating an endpoint? Or common fixes for the issue I have?
Unknown User•4w ago
hm ok