Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Prompt formatting is weird for my model

Tried making a serverless endpoint with my model from Hugging Face, but it's stuck with this prompt template. The correct one should be the Alpaca prompt template, which is like "instruction", "input", "output". How do I change it to this? It only gives me errors when I change it in this window and try to send a request that way.
Solution:
That field doesn't change based on the container or model you're using on the endpoint; it's just a placeholder. To test it via the RunPod UI like you're trying, which only allows the non-OpenAI paths /run or /runsync, the payload should be: ```json { "input": { "messages": [...
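
For reference, a runnable version of that kind of payload might look like the sketch below. The endpoint ID, API key, and message content are placeholders (assumptions, not values from the thread), and the exact input schema depends on the worker image:

```python
# Minimal sketch of a chat-style payload sent to the non-OpenAI /runsync path.
# ENDPOINT_ID and API_KEY are hypothetical placeholders.
import requests

ENDPOINT_ID = "your-endpoint-id"   # hypothetical
API_KEY = "your-runpod-api-key"    # hypothetical

payload = {
    "input": {
        "messages": [
            {"role": "user", "content": "Write a haiku about GPUs."}
        ]
    }
}

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=120,
)
print(resp.json())
```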

Output is 100%, but still processing

We use RunPod to do a generation that completes without error, but while the progress reaches 100%, the status stays IN_PROGRESS forever. Can someone please help? Our production is suffering. This only occurs sometimes; other times it correctly succeeds. When I check the logs, it even shows 'Finished' and no errors whatsoever...

runpod.serverless has no attribute progress_update (runpod_python 1.7.12)

I've been given a task to look into an existing serverless handler written in Python. I updated the dependencies, and now I'm getting runpod.serverless has no attribute "progress_update". I looked into it, and it seems that it's defined in rp_progress, which is not being exposed as a module. Is this intentional? Is there a different, undocumented feature that exposes the same functionality, or am I doing something wrong? It's been the first time in forever that I did any python aside...
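
For context, the usage pattern such a handler presumably relied on looks roughly like the sketch below; whether the attribute is still re-exported from runpod.serverless depends on the runpod-python version, which is the whole question here:

```python
# Sketch of the documented progress_update pattern from earlier runpod-python
# versions: the handler reports intermediate progress strings for a job.
import runpod


def handler(job):
    for step in range(3):
        # The attribute in dispute: exposed via runpod.serverless in older
        # releases, reportedly missing in 1.7.12.
        runpod.serverless.progress_update(job, f"step {step + 1}/3")
    return {"status": "done"}


runpod.serverless.start({"handler": handler})
```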

How much does it cost? How do I pay?

In serverless, I want to know: how much does it cost, and how do I pay? Monthly? Hourly? Per second?...

Default stable diffusion preset doesn't work on rtx 5090 serverless

Hi! I created SDXL from the preset on RunPod serverless, made one simple request using the requests window, closed the window, and returned to RunPod like two days later to see 5 dollars taken from my account, 5 GPUs in the workers window, and logs showing cards trying to start a few times for no reason, 15-20 hours after the only request I had sent.

In Faster Whisper Serverless, how do I get the transcription result?

In Python code, I sent a request: ``` # S3 bucket and key for the audio file s3 = boto3.client('s3') # RunPod API configuration...
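
In general, an async /run request returns a job ID that you poll on the /status path until the output (here, the transcript) is attached. A minimal sketch, assuming the standard serverless API; the endpoint ID, API key, and audio URL are placeholders, and the exact shape of "output" depends on the faster-whisper worker image:

```python
# Polling sketch for an async serverless job.
import time
import requests

ENDPOINT = "https://api.runpod.ai/v2/your-endpoint-id"      # hypothetical
HEADERS = {"Authorization": "Bearer your-runpod-api-key"}   # hypothetical

# Submit the job; the response contains an "id" to poll with.
job = requests.post(
    f"{ENDPOINT}/run",
    headers=HEADERS,
    json={"input": {"audio": "https://example.com/audio.mp3"}},
).json()

# Poll /status until the job reaches a terminal state.
while True:
    status = requests.get(f"{ENDPOINT}/status/{job['id']}", headers=HEADERS).json()
    if status["status"] in ("COMPLETED", "FAILED"):
        break
    time.sleep(2)

print(status.get("output"))  # the transcription result, if the job completed
```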

Can't deploy Qwen/Qwen2.5-14B-Instruct-1M on serverless

Steps to reproduce:
1. Use Serverless vLLM quick deploy for Qwen/Qwen2.5-14B-Instruct-1M (image attached)
2. Proceed with default config.
3. Try and send a request....

Unhealthy machines

We recently noticed that occasionally we get machines with bad performance: worker startup time is very long, and then runtime performance is really bad. We've seen it with and without Fastboot. We are going to do two things to address it:
1. Crash the worker before giving control back to the RunPod library if we detect bad performance (see the sketch below).
2. Remove bad workers via the control plane.
Is it expected for the tenant (us) to handle machine health issues? What would be the recommendation from the RunPod team?...
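
Item 1 might look roughly like the sketch below: a self-benchmark at container startup that exits non-zero (so the platform replaces the worker) before handing control to runpod.serverless.start. The benchmark and the 2-second threshold are illustrative assumptions, not the poster's actual code:

```python
# Illustrative self-health-check before starting the serverless worker.
import sys
import time

import runpod


def machine_looks_healthy() -> bool:
    """Hypothetical check: time a small amount of work against a budget."""
    start = time.monotonic()
    sum(i * i for i in range(10_000_000))  # stand-in for a small warm-up job
    return (time.monotonic() - start) < 2.0  # made-up threshold


def handler(job):
    return {"echo": job["input"]}


if not machine_looks_healthy():
    # Exiting before start() means this worker never picks up a job,
    # and a replacement is spun up elsewhere.
    sys.exit("worker failed startup health check")

runpod.serverless.start({"handler": handler})
```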

Is /workspace == /runpod-volume ?

I am a bit confused. I use https://github.com/runpod-workers/runpod-worker-comfy to spin up a serverless ComfyUI, and I have my models in /workspace/models as defined by https://github.com/runpod-workers/runpod-worker-comfy/blob/main/src/extra_model_paths.yaml ```yaml runpod_worker_comfy:...
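
One way to sanity-check this from inside the worker is to log which of the two mount points actually exists and whether one is a symlink to the other (any symlink behavior would come from the image itself, not the platform). A purely diagnostic sketch:

```python
# Check which candidate mount points exist inside the worker and whether
# one is a symlink to the other.
import os

for path in ("/workspace", "/runpod-volume"):
    if os.path.islink(path):
        print(f"{path} -> symlink to {os.path.realpath(path)}")
    elif os.path.isdir(path):
        print(f"{path} exists (real directory)")
    else:
        print(f"{path} does not exist")
```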

websocket endpoints on serverless

Are persistent websocket endpoints supported on serverless?

Qwen2.5 0.5b worked out of the box and Qwen3 0.6b failed

with error
ValueError: The checkpoint you are trying to load has model type `qwen3` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
...
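
Qwen3 support landed in newer Transformers releases, so the usual fix is upgrading the library inside the worker image. A quick way to confirm what the worker actually has installed (which release added qwen3 is not stated here, so check the Transformers release notes):

```python
# Print the installed Transformers version and whether it knows the qwen3
# architecture. Run inside the worker image.
from transformers import __version__
from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES

print("transformers", __version__)
print("qwen3 known:", "qwen3" in CONFIG_MAPPING_NAMES)
```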

Not possible to set temperature / top_p using Serverless vLLM via quick deploy?

By default, vLLM loads sampling parameters (e.g. temperature / top_p) from a model's generation_config.json if present (see here: https://github.com/vllm-project/vllm/issues/15241). To override this, you have to pass --generation-config when starting the vLLM server. Because RunPod's worker-vllm (https://github.com/runpod-workers/worker-vllm) doesn't expose an environment variable to pipe through a --generation-config value, does this mean it's not possible to change the temperature or top_p for any model deployed by Serverless vLLM quick deploy that has a generation_config.json file, e.g. all the Meta Llama models? And is the solution a custom Docker image / worker?...
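
If per-request overrides are acceptable, worker-vllm's non-OpenAI input format accepts a sampling_params object alongside the prompt. A sketch with placeholder endpoint details; whether sampling_params takes precedence over generation_config.json defaults should be verified against the worker-vllm version in use:

```python
# Per-request sampling override on the /runsync path.
# Endpoint ID and API key are hypothetical placeholders.
import requests

resp = requests.post(
    "https://api.runpod.ai/v2/your-endpoint-id/runsync",
    headers={"Authorization": "Bearer your-runpod-api-key"},
    json={
        "input": {
            "prompt": "Name three GPUs.",
            "sampling_params": {"temperature": 0.2, "top_p": 0.9, "max_tokens": 64},
        }
    },
    timeout=120,
)
print(resp.json())
```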

Stuck on initializing

I'm trying to set up a serverless instance to act as a ComfyUI backend and am getting tripped up pretty early in the process. This is my first time working with RunPod, and I'm not sure what's going wrong. I've dug through the documentation and passed these files by a couple of LLMs to find errors, but I can't get it worked out. If someone could take a look at my Dockerfile and handler, I'd appreciate it!...
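
For anyone debugging the same thing, the smallest handler that should get past "Initializing" looks roughly like this (a minimal sketch, not the poster's code). It's useful for bisecting: if an image built around only this initializes, the problem is in the ComfyUI layer, not the handler wiring:

```python
# Smallest possible serverless handler, useful as a known-good baseline.
import runpod


def handler(job):
    # job["input"] is whatever was posted under "input" in the request payload.
    return {"echo": job["input"]}


runpod.serverless.start({"handler": handler})
```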

No available workers

All of my workers have been throttled. Is this expected? Only one is left, and it sometimes goes back to throttled, then back to the initialization stage and downloading the image, back and forth....

Is it okay to use 10+ workers on 5090s, or will we experience inconsistencies?

We're using 4090s (EU region), but having 32 GB of VRAM would really help us, along with the performance increase.

RAM and CPU

How much RAM and CPU are available to the workers? Is this equivalent to what the Pods with the same GPUs get, or something different?...

Are Docker images cached?

My Docker image registry is hosted on a VPS. If it were to go down, would the serverless workers no longer be able to start up, or are the images cached inside Runpod?

Why is this taking so long and why didn't RunPod time out the request?

Serverless endpoint: vLLM
Model: meta-llama/Llama-3.1-8B-Instruct
GPU: 48GB A40
...
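
On the timeout half of the question: jobs can carry an execution policy alongside the input, so a per-request cap can be set even if no endpoint-level timeout was configured. A sketch with placeholder details (the 120 s budget is an arbitrary example):

```python
# Attach an execution policy so the job is killed instead of running forever.
# Endpoint ID and API key are hypothetical placeholders.
import requests

resp = requests.post(
    "https://api.runpod.ai/v2/your-endpoint-id/run",
    headers={"Authorization": "Bearer your-runpod-api-key"},
    json={
        "input": {"prompt": "Summarize the plot of Hamlet."},
        "policy": {"executionTimeout": 120_000},  # milliseconds
    },
)
print(resp.json())
```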