Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Some tasks are consistently in the IN_PROGRESS state

I'm trying out Runpod and have simply created a Whisper task, but some tasks stay in the IN_PROGRESS state indefinitely, and I don't know how to fix it.

chat template not supported

I'm using Runpod serverless, initially with DeepSeek R1 from Hugging Face, then switched to Llama 4 Scout for testing, but I got this error. No issues on the Hugging Face side; the model is exactly the same as it was. Output: { "delayTime": 48454, "error": "{'object': 'error', 'message': 'Chat template does not exist for this model, you must provide a single string input instead of a list of messages', 'type': 'BadRequestError', 'param': None, 'code': 400}", "executionTime": 254, "id": "2bee8688-4f7e-4d8d-b198-3ad3054f34d8-u2", "status": "FAILED", "workerId": "9sxpstx8jawche" } ...
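
The error itself points at the workaround: when a model ships without a chat template, the worker rejects a "messages" list and wants a single flat string under "prompt". A minimal sketch of the two payload shapes (the field names follow the vLLM worker's input schema, but verify against your worker version):

```python
# Fails with BadRequestError 400 when the model has no chat template:
messages_payload = {
    "input": {
        "messages": [{"role": "user", "content": "Hello"}]
    }
}

# Works: a single flat string under "prompt" instead of a message list.
prompt_payload = {
    "input": {
        "prompt": "Hello"
    }
}

def to_prompt_payload(payload: dict) -> dict:
    """Flatten a messages-style payload into a single-string prompt.
    The role-prefix formatting here is a naive placeholder, not the
    model's real template."""
    messages = payload["input"]["messages"]
    text = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    return {"input": {"prompt": text}}
```

Alternatively, supply the model's chat template to the worker so "messages" works again.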

Completion-style instead of Instruct-style responses

I'm using the default hello-world code on serverless: { "input": { "prompt": "Hello World" } }, but I'm getting a completion-style response instead of an instruct-style response, despite using a Llama instruct model. To clarify, the response I'm getting is: "! Welcome to my blog about London: the Great City!...". How do I change the prompt format to get instruct-style responses? Where can I find the syntax?...
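
A raw string prompt gets raw text continuation; instruct behavior comes from wrapping the message in the model's chat template. A sketch assuming a Llama-3-style template (if you're on Llama 2 or another model, its special tokens differ — check the model card on Hugging Face):

```python
def llama3_instruct_prompt(user_message: str) -> str:
    """Wrap a user message in the Llama-3 chat format so the model
    answers the message instead of continuing the text."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

payload = {"input": {"prompt": llama3_instruct_prompt("Hello World")}}
```

If the worker supports an OpenAI-style "messages" input, that applies the template for you and is usually the simpler route.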

Serverless workers frequently switch to initializing / throttled

Been playing with serverless workers for a couple of weeks now and love it. But since yesterday I've noticed a huge uptick in workers not sticking to their idle state. Every couple of jobs they either go back to Initializing or are throttled. This was never the case during the weeks I'd been using serverless. Very rarely would they lose their state, but currently it's happening between every couple of jobs....

I have over 100 serverless workers running across different endpoints, and we face this timeout issue.

Most of the time our controller/handler throws this error on various workers. I don't know if it's choking; it happens all of a sudden across all servers/workers and then resumes after some time.
10:51:35 voiceclone gunicorn[1267959]: Bad Request: /api/audio_to_audio/
Jul 22 10:51:35 voiceclone gunicorn[1267959]: Unexpected error: HTTPSConnectionPool(host='api.runpod.ai', port=443): Max retries exceeded with url: /v2/vn2o4vgw0aes0k/runsync (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x74cc5bdf5ba0>, 'Connection to api.runpod.ai timed out. (connect timeout=3600)'))
Jul 22 10:51:35 voiceclone gunicorn[1267959]: - - [22/Jul/2025:05:51:35 +0000] "POST /api/audio_to_audio/ HTTP/1.0" 400 0 "-" "Proxyscotch/1.1"
...
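
One detail in the log stands out: the connect timeout is 3600 s, so a single bad connection can stall a gunicorn worker for an hour before the retry machinery even reports failure. A sketch of a fail-fast client, with a short connect timeout and a long read timeout for `runsync` responses (the endpoint ID and API key are placeholders):

```python
def timeouts(connect: float = 10.0, read: float = 600.0) -> tuple:
    """(connect, read) timeout tuple for requests: fail fast when the
    Runpod API is unreachable, but still allow long-running runsync
    responses to stream back."""
    return (connect, read)

def call_runsync(endpoint_id: str, payload: dict, api_key: str) -> dict:
    import requests  # third-party; kept local so timeouts() has no dependency
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    resp = requests.post(
        url,
        json={"input": payload},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=timeouts(),
    )
    resp.raise_for_status()
    return resp.json()
```

With a 10 s connect timeout, a transient outage surfaces as a quick, retryable error instead of tying up the worker until the hour-long timeout fires.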

P-State issue

Driver Version 550.127.05 CUDA Version 12.4 Power Limit...

Initializing for more than 20+ hours

I am trying to create a ComfyUI endpoint through serverless and have set up my Docker image and all models on the network volume, but when I tried to deploy the endpoint it got stuck in the initializing phase, for more than 20 hours now, and the logs show "worker exited with exit code 1". Any solution?

How to write to network by API?

Hi! Is it possible to make an API request that runs something like wget against the network volume? I want to upload a file to the network volume ( /workspace ). Is that possible without creating a pod? Or is there a pod template that can help with it?...
Solution:
Network volumes are S3-compatible. You can definitely make changes to your network volume without launching a pod. Check this out: https://docs.runpod.io/serverless/storage/s3-api#boto3...
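
A minimal boto3 sketch of that approach. The endpoint URL pattern and the environment-variable names below are assumptions for illustration — the docs page linked above gives the exact endpoint and the S3 API key you need to generate in the console:

```python
import os

def s3_endpoint(datacenter_id: str) -> str:
    """Build the S3-compatible endpoint URL for a datacenter
    (assumed pattern -- confirm against the docs)."""
    return f"https://s3api-{datacenter_id.lower()}.runpod.io"

def upload_to_volume(local_path: str, volume_id: str, key: str,
                     datacenter_id: str = "EU-RO-1") -> None:
    """Upload a local file onto a network volume without any pod.
    The network volume ID acts as the bucket name."""
    import boto3  # third-party; pip install boto3
    client = boto3.client(
        "s3",
        endpoint_url=s3_endpoint(datacenter_id),
        aws_access_key_id=os.environ["RUNPOD_S3_ACCESS_KEY"],      # placeholder name
        aws_secret_access_key=os.environ["RUNPOD_S3_SECRET_KEY"],  # placeholder name
        region_name=datacenter_id,
    )
    client.upload_file(local_path, volume_id, key)
```

Anything uploaded this way appears under /runpod-volume (or /workspace, depending on mount) the next time a worker or pod attaches the volume.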

Geo-Redundant network storage

Hey, I have a Flux and Wan 2.1 generation serverless setup working. Right now every base model, text encoder, and VAE gets baked into the Docker image. Rebuilds and deploys take a lot of time, but that's to be expected with 100 GB+ images. I'm currently busy introducing new functionality for our userbase to train their own LoRAs and use them right away after training. Currently, finished trained LoRAs go to Azure Blob Storage and are loaded through a ComfyUI node that loads LoRAs from a remote URL. The issue is that download times range from 10 s to 60 s depending on the worker's region. It works, but getting billed to download LoRAs on every run isn't ideal. I have a very specific configuration I optimized my builds for: CUDA 12.8 + H100s. Most locations where my workers run don't even offer the option to create a network volume, which is why I'm kind of stuck now. The current implementation of Runpod network volumes, by narrowing you to one region, limits where workers can spin up, even though the read speeds are way better....

How to prevent removal of container

I have deployed a TTS service on a serverless endpoint (24 GB Pro instance), but sometimes it removes the container and starts recreating it again. How do I prevent this?

disk quota exceeded: unknown

I've been routinely pushing updates to my serverless endpoint, and everything was working fine until today. Now I'm encountering the following error during the image push:
error writing layer blob: rpc error: code = Unknown desc = close /runpod-volume/registry.runpod.net/[redacted-project-path]/ingest/[hash...]/ref: disk quota exceeded: unknown
...

Increase workers for serverless

Just wanted to check: what are the rules for, and how do you increase, the max workers you can have across all serverless endpoints?

Platform usage for jailbreak research

Hi there, I want to build a platform that will allow researchers to conduct jailbreak tests against open-source models in a controlled environment. The results would be used to create datasets that can be shared with AI labs to improve their coverage of jailbreak mitigations. Is this something that would be allowed under your terms of service?

Format of runpod.create_endpoint gpu_ids

It's unclear what the format is; the website uses this format: { "data": { "saveEndpoint": { "gpuIds": "['NVIDIA GeForce RTX 4090'],ADA_24", "id": "ncipvwdj6cuz8y",...
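
From the web UI's payload, gpuIds looks like a single comma-separated string listing GPU pool IDs in priority order (with quoted GPU-name lists mixed in by the UI). A small sketch of building that string — note the exact accepted grammar is an assumption here; confirm it against the API docs before relying on it:

```python
def gpu_ids(*pools: str) -> str:
    """Join GPU pool IDs into the comma-separated priority string
    that gpuIds appears to expect (assumed format)."""
    return ",".join(pools)

# e.g. prefer 48 GB Ada cards, fall back to 24 GB Ada:
ids = gpu_ids("ADA_48_PRO", "ADA_24")
```

Passed as `gpu_ids=ids` to `runpod.create_endpoint`, earlier entries would be tried first, matching the "gpuIds": "...,ADA_24" shape the website sends.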

Runpod not detecting dockerfile

Runpod was working fine, but after a small commit to the GitHub repo it suddenly stopped working in the build process and can't detect the Dockerfile anymore. I checked everything: it's not in .gitignore, there's no .dockerignore, and there are no typos. What could the problem be? There isn't even a button to clear the cache.

Error running moonshotai/Kimi-K2-Instruct

I get these errors. This is my first time using Runpod, so I'm not sure how to resolve this. I ran it using the Hugging Face repo, everything at defaults with trust remote code enabled; I don't know much about the other settings ...

Need Help: Stable Diffusion 3.5 large Deployment Error on RunPod

Trying to deploy stabilityai/stable-diffusion-3.5-large on RunPod serverless using ComfyUI, but getting this error: "Missing 'workflow' parameter". Does anyone know if this model works on RunPod? Which option is best: ComfyUI, vLLM, or SGLang? ...
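
That error usually means the request body lacked the exported node graph the ComfyUI worker expects under input.workflow. A sketch of the payload shape — the node graph below is a placeholder, not a runnable SD 3.5 workflow; export your real one via ComfyUI's "Save (API Format)" and paste it in:

```python
payload = {
    "input": {
        # node_id -> {"class_type": ..., "inputs": {...}}, i.e. the
        # API-format graph exported from ComfyUI (placeholder here)
        "workflow": {
            "3": {"class_type": "KSampler", "inputs": {"seed": 0}},
        }
    }
}
```

Sending this as the JSON body to the endpoint's /runsync or /run route satisfies the "workflow" requirement; a bare "prompt" string does not.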

ComfyUI API

Good morning! I have been following the instructions for exposing a ComfyUI workflow as an API, but I'm stuck here, having created the template "with flux.dev" and having a serverless endpoint. ...

Possible bug with environment variables

I've created staging and production serverless endpoints. They both have the same environment variable names, and I used the clone feature to clone the first one (staging) into the second one (production). When I change the value of an environment variable on one, it updates on the other. This seems like a bug on Runpod's end?

Load balancing to death?

I've been sitting and watching your serverless system, and it just doesn't make sense. I have two workers assigned, yet you decide I need "extra" instances spun up. My workers are sitting there idle... oh, there's a request, let's send that to the "extra" queue, not the pod sitting idle... Oh, that last picture was generated with an "extra" pod; we can't use that again, we need to scrap it and use another cold-booting pod... Oh, your workers are still idle... well, we'd better put them to sleep!...