Prompt formatting is weird for my model
For /run or /runsync, the payload should be:
```json
{
  "input": {
    "messages": [...]
  }
}
```
Output is 100%, but still processing
runpod.serverless has no attribute progress_update (runpod_python 1.7.12)
runpod.serverless has no attribute "progress_update". I looked into it, and it seems it's defined in rp_progress, which is not being exposed as a module. Is this intentional? Is there a different, undocumented feature that exposes the same functionality, or am I doing something wrong? It's been the first time in forever that I did any Python aside...
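For context, a hedged workaround sketch: since the function appears to be defined in rp_progress, it may be importable from that module directly; the module path and call signature below are assumptions to verify against the installed SDK, not a confirmed fix.

```python
# Hedged sketch only: assumes progress_update(job, message) lives in
# runpod.serverless.modules.rp_progress, as the question suggests. Verify the
# path and signature against the runpod-python version actually installed.
import runpod
from runpod.serverless.modules.rp_progress import progress_update  # assumed path

def handler(job):
    progress_update(job, "50% complete")  # report intermediate progress
    # ... actual work for the job goes here ...
    return {"status": "done"}

runpod.serverless.start({"handler": handler})
```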
How much is the payment? How do I pay?

Default Stable Diffusion preset doesn't work on RTX 5090 serverless

In Faster Whisper Serverless, how do I get the transcription result?
When serverless uses a worker, is that worker shared with other serverless endpoints?
Can't deploy Qwen/Qwen2.5-14B-Instruct-1M on serverless

Unhealthy machines
Is /workspace == /runpod-volume ?
Qwen2.5 0.5B worked out of the box, but Qwen3 0.6B failed
ValueError: The checkpoint you are trying to load has model type `qwen3` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
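A hedged diagnostic sketch one could run inside the failing worker to confirm the suspected cause; it only checks whether the installed Transformers build lists the qwen3 architecture and assumes nothing else about the deployment:

```python
# Hedged diagnostic: does the installed Transformers build know "qwen3"?
# If this prints False, the image's transformers version predates Qwen3
# support and needs to be upgraded/pinned to a release that includes it.
import transformers
from transformers import CONFIG_MAPPING

print("transformers version:", transformers.__version__)
print("qwen3 recognized:", "qwen3" in CONFIG_MAPPING)
```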
Not possible to set temperature / top_p using Serverless vLLM via quick deploy?
By default, vLLM reads sampling parameters (temperature / top_p) from a model's generation_config.json if present (see https://github.com/vllm-project/vllm/issues/15241). To override this, you have to pass --generation-config when starting the vLLM server.
Because RunPod's worker-vllm (https://github.com/runpod-workers/worker-vllm) doesn't expose an environment variable to pipe through a --generation-config value, does this mean it's not possible to change the temperature or top_p for any model deployed by Serverless vLLM quick deploy that has a generation_config.json file e.g. all the Meta Llama models?
And is the solution a custom Docker image / worker?
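One possible per-request workaround, sketched with hedged assumptions below: worker-vllm's input format appears to accept a sampling_params object alongside messages, and per-request values should take precedence over generation_config.json defaults; both points are assumptions to verify against the worker-vllm README, not a confirmed answer.

```python
# Hedged sketch: per-request sampling overrides for a Serverless vLLM endpoint.
# Assumes worker-vllm accepts a "sampling_params" object in the request input;
# verify against the worker-vllm README for the image version in use.
payload = {
    "input": {
        "messages": [{"role": "user", "content": "Hello!"}],
        "sampling_params": {
            "temperature": 0.2,  # intended to override generation_config.json defaults
            "top_p": 0.9,
        },
    }
}
# POST this payload to the endpoint's /run or /runsync URL as in the earlier example.
```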
Stuck on initializing

No available workers
Is it okay to use more than 10 workers on the 5090, or will we experience inconsistencies?
RAM and CPU
Are Docker images cached?
Why is this taking so long and why didn't RunPod time out the request?