Runpod · 4w ago
Kush

Setting up a serverless endpoint for a custom model

Hi, complete beginner here. I've been trying to set up a custom model endpoint, but I keep running into issues getting my requests answered (using the web application). I've seen delay times of 40+ minutes, so clearly something is wrong, but I'm having a hard time troubleshooting. Some details:
- trained a model (base model `deepseek-ai/DeepSeek-R1-Distill-Llama-8B`) and saved it to a Hugging Face repo
- trying to publish the endpoint via the GitHub integration (I already have a folder containing `Dockerfile`, `handler.py`, `requirements.txt`)
- the Hugging Face repo that the handler function uses contains: 4 `model.safetensors` files, a `model.gguf` file, and `model.safetensors.index.json`
- errors in the logs show (log lines appear newest-first; line 77 is `model.save_pretrained(NETWORK_MODEL_PATH)`):
```
[error] worker exited with exit code 1
[info] safetensors_rust.SafetensorError: Error while serializing: IoError(Os { code: 122, kind: QuotaExceeded, message: "Disk quota exceeded" })
[info]     serialize_file(_flatten(tensors), filename, metadata=metadata)
[info]   File "/usr/local/lib/python3.10/site-packages/safetensors/torch.py", line 286, in save_file
[info]   File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4107, in save_pretrained
[info]     model.save_pretrained(NETWORK_MODEL_PATH)
[info]   File "//handler.py", line 77, in <module>
[info] Traceback (most recent call last):
[info] Saving model to network storage...
```
I've tried increasing the size of the workers and the size of the network storage, but that didn't do anything.
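One quick way to confirm whether the volume really is full at save time is to log its free space just before calling `save_pretrained`. A minimal sketch, assuming the volume is mounted at `/runpod-volume` (the path used later in this thread); the `has_room` helper name and the 20 GiB threshold are illustrative assumptions:

```python
import shutil

def has_room(path="/runpod-volume", needed_gb=20):
    """Print and check free space on the given mount before a large write."""
    usage = shutil.disk_usage(path)
    free_gb = usage.free / 1024**3
    print(f"{path}: {free_gb:.1f} GiB free of {usage.total / 1024**3:.1f} GiB")
    return free_gb >= needed_gb
```

Calling this right before the save (and checking the worker logs) tells you whether the quota error is about this mount or some other filesystem.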
23 Replies
Unknown User · 4w ago
Message Not Public
Kush (OP) · 4w ago
yeah, I just made it a global variable
Unknown User · 4w ago
Message Not Public
Kush (OP) · 4w ago
tried that, I'm already at 50 GB
Unknown User · 4w ago
Message Not Public
Kush (OP) · 4w ago
sure,
```python
import os
import shutil

import runpod
from transformers import AutoModelForCausalLM, AutoTokenizer, TextGenerationPipeline

# MODEL_ID, NETWORK_MODEL_PATH, HF_CACHE_DIR and check_model_files_exist()
# are defined earlier in handler.py (HF_CACHE_DIR is shown further down in the thread).

def clean_and_setup_network_storage():
    """Prepare network volume"""
    if os.path.exists(NETWORK_MODEL_PATH):
        shutil.rmtree(NETWORK_MODEL_PATH)
    if os.path.exists(HF_CACHE_DIR):
        shutil.rmtree(HF_CACHE_DIR)
    os.makedirs(NETWORK_MODEL_PATH, exist_ok=True)
    os.makedirs(HF_CACHE_DIR, exist_ok=True)

# Load model from network storage if it exists, else download and save
if check_model_files_exist():
    print("Loading model from network storage...")
    tokenizer = AutoTokenizer.from_pretrained(NETWORK_MODEL_PATH, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        NETWORK_MODEL_PATH, trust_remote_code=True
    )
else:
    print("Model not found on network storage. Downloading...")
    clean_and_setup_network_storage()
    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_ID, cache_dir=HF_CACHE_DIR, trust_remote_code=True
    )
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, cache_dir=HF_CACHE_DIR, trust_remote_code=True
    )
    print("Saving model to network storage...")
    tokenizer.save_pretrained(NETWORK_MODEL_PATH)
    model.save_pretrained(NETWORK_MODEL_PATH)
    print("Saved successfully.")

# Create inference pipeline
pipe = TextGenerationPipeline(model=model, tokenizer=tokenizer)

def handler(job):
    job_input = job.get("input", {})
    prompt = job_input.get("prompt", "")
    max_new_tokens = job_input.get("max_new_tokens", 128)
    temperature = job_input.get("temperature", 0.7)
    if not prompt:
        return {"error": "No prompt provided."}
    try:
        output = pipe(prompt, max_new_tokens=max_new_tokens, temperature=temperature)
        return {"output": output[0]["generated_text"]}
    except Exception as e:
        return {"error": f"Generation failed: {str(e)}"}

runpod.serverless.start({"handler": handler})
```
I download the model from HuggingFace and save it to network storage for future use
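One thing worth double-checking when testing from the web UI: the handler only reads `job["input"]`, so requests need the `{"input": {"prompt": ...}}` shape or they'll come back with the "No prompt provided" error. A small, runnable extraction of just that parsing logic (the `parse_job_input` helper name is hypothetical; the defaults are copied from the handler above):

```python
def parse_job_input(job):
    """Mirror the handler's input parsing: return either the resolved
    generation parameters or the same error dict the handler returns."""
    job_input = job.get("input", {})
    prompt = job_input.get("prompt", "")
    if not prompt:
        return {"error": "No prompt provided."}
    return {
        "prompt": prompt,
        "max_new_tokens": job_input.get("max_new_tokens", 128),
        "temperature": job_input.get("temperature", 0.7),
    }
```

This lets you sanity-check request payloads locally without loading the model.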
Unknown User · 4w ago
Message Not Public
Kush (OP) · 4w ago
(screenshot attached, no description)
Unknown User · 4w ago
Message Not Public
Kush (OP) · 4w ago
HF_CACHE_DIR = "/runpod-volume/hf_cache"
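Since `HF_CACHE_DIR` sits on `/runpod-volume` alongside `NETWORK_MODEL_PATH`, the Hugging Face download cache and the `save_pretrained` copy both count against the same volume quota, so peak usage during the first run is roughly double the model size (an inference from the code above, not a confirmed diagnosis). A sketch to measure where the space actually goes (`dir_size_gb` is a hypothetical helper):

```python
import os

def dir_size_gb(path):
    """Total size of regular files under `path`, in GiB (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total / 1024**3
```

Printing `dir_size_gb(HF_CACHE_DIR)` and `dir_size_gb(NETWORK_MODEL_PATH)` in the worker logs would show whether both copies are eating the quota at once.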
Unknown User · 4w ago
Message Not Public
Kush (OP) · 4w ago
sorry, how do I check? It's just showing the size, data center, monthly cost, and ID
Unknown User · 4w ago
Message Not Public
Kush (OP) · 4w ago
ok, I'm trying to do that but I'm just not able to start the connection: the one worker that's running isn't connecting, and all the other workers are "throttled", but I see this
Kush (OP) · 4w ago
(screenshot attached, no description)
Kush (OP) · 4w ago
also, does this error only relate to the size of the network storage, or can there be other causes?
```
[info] safetensors_rust.SafetensorError: Error while serializing: IoError(Os { code: 122, kind: QuotaExceeded, message: "Disk quota exceeded" })
[info]     serialize_file(_flatten(tensors), filename, metadata=metadata)
```
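The error code itself can be decoded with Python's stdlib: `code: 122` is the raw OS errno, which on Linux is `EDQUOT`, i.e. a write hit a filesystem quota. So the error is always about running out of allowed space; the open question is *which* filesystem was being written to. If `NETWORK_MODEL_PATH` doesn't actually land on the mounted volume (for example, if the volume isn't attached to the endpoint), the quota being hit would be the worker's container disk instead, which would explain why resizing the network storage changed nothing (an assumption, not confirmed from the logs):

```python
import errno
import os

# "code: 122" from the traceback is the OS errno; on Linux this is EDQUOT.
print(errno.EDQUOT)               # 122 on Linux
print(os.strerror(errno.EDQUOT))  # "Disk quota exceeded" on Linux
```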
Unknown User · 4w ago
Message Not Public
Kush (OP) · 4w ago
Yeah, I connected it and even tried to delete it, update the storage from 30 to 50 GB, and reconnect it, so I don't know why it's not showing anything
Poddy · 4w ago
@Kush
Escalated To Zendesk
The thread has been escalated to Zendesk!
Ticket ID: #21203
Unknown User · 4w ago
Message Not Public
Kush (OP) · 4w ago
In general, btw, is there anywhere people tend to mess up when creating an endpoint? Or common fixes for the issue I'm having?
Unknown User · 4w ago
Message Not Public
Kush (OP) · 4w ago
hm ok