Setting up a serverless endpoint for a custom model
Hi, complete beginner here. I've been trying to set up a custom model endpoint, but my requests keep failing to get answered (using the web application). I've seen delay times of 40+ minutes, so clearly something is wrong, but I'm having a hard time troubleshooting.
Here are some details:
- trained a model (base model deepseek-ai/DeepSeek-R1-Distill-Llama-8B) and saved to a huggingface repo
- trying to publish endpoint via GitHub integration (already have folder containing Dockerfile, handler.py, requirements.txt)
- huggingface repo that the handler function uses contains: four model.safetensors shard files, a model.gguf file, and model.safetensors.index.json
- errors in logs show:
[info] Saving model to network storage...
[info] Traceback (most recent call last):
[info]   File "//handler.py", line 77, in <module>
[info]     model.save_pretrained(NETWORK_MODEL_PATH)
[info]   File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4107, in save_pretrained
[info]
[info]   File "/usr/local/lib/python3.10/site-packages/safetensors/torch.py", line 286, in save_file
[info]     serialize_file(_flatten(tensors), filename, metadata=metadata)
[info] safetensors_rust.SafetensorError: Error while serializing: IoError(Os { code: 122, kind: QuotaExceeded, message: "Disk quota exceeded" })
[error] worker exited with exit code 1
(line 77 is: model.save_pretrained(NETWORK_MODEL_PATH))
I've tried increasing the worker size and the network storage size, but that didn't help.
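A quick way to sanity-check free space on the volume right before the save (just a sketch; assumes the network volume is mounted at /runpod-volume, which is where mine shows up):

```python
import shutil

def has_free_space(path="/runpod-volume", needed_gb=40):
    """Return True if the mount at `path` has at least `needed_gb` GB free."""
    usage = shutil.disk_usage(path)  # named tuple: (total, used, free) in bytes
    return usage.free / 1e9 >= needed_gb
```

If this returns False right before save_pretrained, the quota really is the problem; if it returns True and the save still fails, something else is going on.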
23 Replies
Unknown User•4w ago
Message Not Public
Sign In & Join Server To View
yea I just made it a global variable
Unknown User•4w ago
tried that, I'm already at 50 GB
Unknown User•4w ago
sure,
import os
import shutil

import runpod
from transformers import AutoModelForCausalLM, AutoTokenizer, TextGenerationPipeline

# MODEL_ID, NETWORK_MODEL_PATH, HF_CACHE_DIR, and check_model_files_exist()
# are defined earlier in the file

def clean_and_setup_network_storage():
    """Prepare network volume"""
    if os.path.exists(NETWORK_MODEL_PATH):
        shutil.rmtree(NETWORK_MODEL_PATH)
    if os.path.exists(HF_CACHE_DIR):
        shutil.rmtree(HF_CACHE_DIR)
    os.makedirs(NETWORK_MODEL_PATH, exist_ok=True)
    os.makedirs(HF_CACHE_DIR, exist_ok=True)

# Load model from network storage if it exists, else download and save
if check_model_files_exist():
    print("Loading model from network storage...")
    tokenizer = AutoTokenizer.from_pretrained(NETWORK_MODEL_PATH, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        NETWORK_MODEL_PATH,
        trust_remote_code=True,
    )
else:
    print("Model not found on network storage. Downloading...")
    clean_and_setup_network_storage()
    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_ID,
        cache_dir=HF_CACHE_DIR,
        trust_remote_code=True,
    )
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        cache_dir=HF_CACHE_DIR,
        trust_remote_code=True,
    )
    print("Saving model to network storage...")
    tokenizer.save_pretrained(NETWORK_MODEL_PATH)
    model.save_pretrained(NETWORK_MODEL_PATH)
    print("Saved successfully.")

# Create inference pipeline
pipe = TextGenerationPipeline(model=model, tokenizer=tokenizer)

def handler(job):
    job_input = job.get("input", {})
    prompt = job_input.get("prompt", "")
    max_new_tokens = job_input.get("max_new_tokens", 128)
    temperature = job_input.get("temperature", 0.7)
    if not prompt:
        return {"error": "No prompt provided."}
    try:
        output = pipe(prompt, max_new_tokens=max_new_tokens, temperature=temperature)
        return {"output": output[0]["generated_text"]}
    except Exception as e:
        return {"error": f"Generation failed: {str(e)}"}

runpod.serverless.start({"handler": handler})
I download the model from HuggingFace and save it to network storage for future use
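Rough math on what that download-then-save flow writes to the volume, assuming fp16 weights for the 8B model (which is what the four safetensors shards suggest):

```python
# rough disk footprint of download-then-save for an 8B model in fp16
params = 8e9
bytes_per_param = 2                 # fp16
shards = params * bytes_per_param   # ~16 GB of safetensors shards

cache_copy = shards                 # copy in HF_CACHE_DIR from from_pretrained
saved_copy = shards                 # second copy from save_pretrained
total_gb = (cache_copy + saved_copy) / 1e9
print(total_gb)  # 32.0 -- before counting the GGUF file, tokenizer, or temp files
```

So the cache copy plus the saved copy alone is about 32 GB, and the repo also contains a GGUF file, which makes it easy to exceed a 50 GB quota mid-save.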
Unknown User•4w ago

Unknown User•4w ago
HF_CACHE_DIR = "/runpod-volume/hf_cache"
Unknown User•4w ago
sorry, how do I check? it's just showing the size, data center, monthly cost, and ID
Unknown User•4w ago
ok, I'm trying to do that but I'm just not able to start the connection
the one worker that's running isn't connecting and all the other workers are "throttled"
but I see this

also, does this error only relate to the size of the network storage or can there be other causes?
[info]safetensors_rust.SafetensorError: Error while serializing: IoError(Os { code: 122, kind: QuotaExceeded, message: "Disk quota exceeded" })\n
[info] serialize_file(_flatten(tensors), filename, metadata=metadata)\n
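From what I can tell, OS error code 122 is EDQUOT on Linux, i.e. specifically a filesystem quota limit, as opposed to ENOSPC (28) for a plain full disk. Quick check (on Linux):

```python
import errno
import os

# the traceback's Os { code: 122 } maps to EDQUOT on Linux
print(errno.EDQUOT)               # 122
print(os.strerror(errno.EDQUOT))  # 'Disk quota exceeded' -- matches the log message
print(errno.ENOSPC)               # 28 -- what you'd see for an outright full disk
```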
Unknown User•4w ago
Yea I connected it
and I even tried to delete it, update the storage from 30 to 50 GB, and reconnect it, so I don't know why it's not showing anything
@Kush
Escalated To Zendesk
The thread has been escalated to Zendesk!
Ticket ID: #21203
Unknown User•4w ago
In general, btw, is there anywhere people tend to mess up when creating an endpoint? Or common fixes for the issue I have?
Unknown User•4w ago
hm ok