Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

Achieving concurrent requests per worker

Hi, I’m new to RunPod and am trying to deploy my own fine-tuned version of an XTTS model. I’ve successfully deployed a container image that runs and returns results as expected. However, I’m unable to process requests concurrently. I’ve read the RunPod documentation on enabling concurrent requests, but it doesn’t seem to work for me. When I trigger the /run endpoint multiple times, the requests queue up instead of running simultaneously. My expectation is that a single worker should handle multiple requests concurrently, up to the maximum capacity I configure. I’ve implemented a dynamic concurrency function similar to the one in the documentation here: https://docs.runpod.io/serverless/workers/concurrent-handler. ...
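
A minimal sketch of the concurrent-handler pattern from the linked docs, assuming the worker uses the runpod Python SDK: the handler has to be async (a blocking def handler still processes one job at a time, even with a concurrency modifier), and the concurrency_modifier tells the worker how many jobs it may take at once. The synthesize() stub and MAX_CONCURRENCY value are placeholders, not the poster's actual code.

```python
import runpod

MAX_CONCURRENCY = 4  # illustrative cap; tune to what one worker can actually serve

async def synthesize(text):
    # placeholder for an async call into the fine-tuned XTTS model
    return "<base64 audio>"

async def handler(job):
    # Must be async and must not block the event loop; a synchronous handler
    # forces jobs to run one at a time regardless of the concurrency_modifier.
    text = job["input"]["text"]
    audio = await synthesize(text)
    return {"audio_base64": audio}

def concurrency_modifier(current_concurrency):
    # Called by the worker between jobs; return how many jobs it may run at once.
    return MAX_CONCURRENCY

runpod.serverless.start({
    "handler": handler,
    "concurrency_modifier": concurrency_modifier,
})
```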

How to change the GitHub repository in my serverless endpoint

Basically the title: I already changed it via my account settings under the connection settings, but the GitHub repo in my serverless endpoint doesn't change. Please help. Thank you.

Serverless does not cache at all

So because the serverless vLLM worker did not have a feature I needed, I changed it a bit and uploaded my own Docker image of it. But now after each request it has to load the model completely again, and that takes 90 seconds each time. I send a request, the worker loads for 90s, runs the request, then goes offline again after the 5s idle timeout I set; when I send the next request it has to do the 90s loading all over again. ...
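
Not an official fix, just a sketch of the usual pattern: load the model once at module import so a warm worker reuses it between jobs, and only a cold start pays the 90s. Once the worker scales to zero after the 5s idle timeout, the next request is a cold start again, so a longer idle timeout, FlashBoot, or an active worker is what actually avoids the reload. The model path and input fields below are placeholders.

```python
import runpod
from vllm import LLM, SamplingParams

# Loaded once at import time; while the worker stays warm, every job
# reuses this object instead of re-initializing the engine for ~90s.
llm = LLM(model="/runpod-volume/my-model")  # placeholder model path

def handler(job):
    prompt = job["input"]["prompt"]
    params = SamplingParams(max_tokens=job["input"].get("max_tokens", 256))
    outputs = llm.generate([prompt], params)
    return {"text": outputs[0].outputs[0].text}

runpod.serverless.start({"handler": handler})
```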

Updated serverless workers are all unhealthy

I used to have a fully functional deployment of a serverless worker with ComfyUI which gets models from my network volume. I have since deployed an update which has new ComfyUI nodes and models, but my deployment only makes unhealthy workers and API requests are not being processed. I am not sure if the issue is the ComfyUI snapshot file that the worker is building from, the Dockerfile build commands, or my network volume. I need help. The worker that used to work is from this docker image fai...

How to change batch count in serverless ComfyUI?

Hey folks, I am using ComfyUI serverless, where I am loading prompts [more than 10 prompts at a time] from a local txt file and trying to generate a latent image batch size of 4 in one go. That is, one prompt should generate 4 images, but it only works if I change the ComfyUI batch count [attached in image]; otherwise, it only loads one prompt and then goes idle. Since serverless ComfyUI has no such feature to set batch count, is there any way to make it happen, or do I have to request the endpoin...
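There is no endpoint-level batch-count setting that I know of, but the batch size lives inside the workflow JSON sent with each request: the EmptyLatentImage node has a batch_size input. A rough client-side sketch, assuming the worker-comfyui input schema of {"input": {"workflow": ...}} and a placeholder prompt convention in the exported workflow:

```python
import copy
import json

import runpod

runpod.api_key = "YOUR_RUNPOD_API_KEY"
endpoint = runpod.Endpoint("YOUR_ENDPOINT_ID")

# API-format workflow exported from ComfyUI ("Save (API Format)").
with open("workflow_api.json") as f:
    base_workflow = json.load(f)

with open("prompts.txt") as f:
    prompts = [line.strip() for line in f if line.strip()]

for prompt in prompts:
    workflow = copy.deepcopy(base_workflow)
    for node in workflow.values():
        if node.get("class_type") == "EmptyLatentImage":
            node["inputs"]["batch_size"] = 4  # 4 images per prompt
        if node.get("class_type") == "CLIPTextEncode" and node["inputs"].get("text") == "PROMPT_PLACEHOLDER":
            node["inputs"]["text"] = prompt   # hypothetical placeholder convention
    result = endpoint.run_sync({"input": {"workflow": workflow}}, timeout=600)
    print(result)
```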

limit_mm_per_prompt removed again?

I think limit_mm_per_prompt was removed again, because I can't find it in the current branch. This is driving me crazy; I can't run this model without that env var. I'm doing my own fork right now and building the Docker image, but it takes forever. Please, please add limit_mm_per_prompt back...
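
For anyone maintaining such a fork: limit_mm_per_prompt is ultimately just a vLLM engine argument, so the fork's env-var handling has to map onto something like the sketch below. The model name and image limit are placeholders, and the exact worker-vllm env-var wiring is not shown here.

```python
from vllm import LLM

# limit_mm_per_prompt caps how many items of each modality a single prompt may carry;
# a fork of the worker would parse its env var and pass the value through here.
llm = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct",   # placeholder multimodal model
    limit_mm_per_prompt={"image": 4},    # allow up to 4 images per prompt
)
```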

Serverless vLLM batching

Hey, so every hour I have around 10k prompts I want to send to my serverless instance. I'm using vLLM, and my question is: does the batching that vLLM does out of the box work for the serverless instance, given that I send each prompt as a separate request rather than all in one request? I could not find anything about this in the docs or in this chat. Any help would be appreciated, thanks.
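
As I understand it, each /run call becomes one job, and vLLM's continuous batching only kicks in when a single worker is allowed to process several jobs at the same time, so the worker's concurrency settings matter more than anything on the client. On the client side, a rough sketch for queueing the prompts as individual jobs; endpoint ID, API key, and the exact input fields are placeholders:

```python
import asyncio

import aiohttp

API_KEY = "YOUR_RUNPOD_API_KEY"
ENDPOINT_ID = "YOUR_ENDPOINT_ID"
RUN_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run"

async def submit(session, prompt):
    payload = {"input": {"prompt": prompt, "sampling_params": {"max_tokens": 256}}}
    async with session.post(
        RUN_URL, json=payload, headers={"Authorization": f"Bearer {API_KEY}"}
    ) as resp:
        return (await resp.json())["id"]  # job id, to poll later via /status

async def main(prompts):
    async with aiohttp.ClientSession() as session:
        job_ids = await asyncio.gather(*(submit(session, p) for p in prompts))
    print(f"queued {len(job_ids)} jobs")

asyncio.run(main([f"prompt {i}" for i in range(100)]))
```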

Deploy a standard http server?

I have a standard HTTP server packaged in a Docker image. Can I run it inside a RunPod serverless environment?
Solution:
Soon :) We're working on something for this, I don't have a tentative date on my sheet though.
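
Until that ships, one common workaround is a thin queue-based handler that starts the existing HTTP server inside the worker and forwards each job to it. A rough sketch, with the server command, port, and route as placeholders:

```python
import subprocess
import time

import requests
import runpod

# Start the existing HTTP server inside the worker (command is a placeholder).
server = subprocess.Popen(["python", "-m", "my_http_server", "--port", "8000"])
time.sleep(5)  # naive wait; polling a /health route would be more robust

def handler(job):
    # Forward the serverless job payload to the local server and relay its response.
    resp = requests.post("http://127.0.0.1:8000/predict", json=job["input"], timeout=300)
    return resp.json()

runpod.serverless.start({"handler": handler})
```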

Where is the advertised 250ms cold start metric derived from?

I am using the faster-whisper template with FlashBoot and I get ~20s of delay time + 500ms/1s of execution time. How can I achieve a 250ms cold start time?

Load balancing + scaling

Hello, does anyone know how requests are balanced across workers? This is important to understand in the context of autoscaling, especially if I'm using scaling based on queue delay and idle timeout. ...

H100: Replicate vs RunPod

Hi, when I use a Flux model on Replicate, generating 4 images takes about 30 seconds and costs $0.001525 on an H100. On the other hand, with RunPod, generating the same 4 images takes 60 seconds and costs a bit more. ...

Serverless endpoints are BUGGED since yesterday

Serverless endpoints (especially for vLLM) have been completely bugged since yesterday, with no logs showing at all. I created a new serverless endpoint yesterday with the vLLM quick deploy, and it has been initializing for 15 hours with 0 logs. One of my serverless endpoints deployed from GitHub is also hitting weird errors with 0 logs...

Is it possible to change the endpoint ID after deployment?

I have deployed a serverless application on RunPod. I wanted to check if it is possible to change the endpoint ID. I know RunPod has an edit option to change the endpoint name, but I didn’t see an option to change the endpoint ID. Please let me know if it's possible or not.

Help Needed: Chatterbox TTS Server on Runpod Serverless - Jobs Stuck, Handler Not Reached

Hi everyone, we're struggling to get the Chatterbox TTS Server ( https://github.com/resemble-ai/chatterbox ) running correctly on Runpod Serverless. Any insights would be massively appreciated! Current Situation: - We're deploying a Dockerized version of the Chatterbox TTS server. - The goal is to use it as a Runpod Serverless endpoint....
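
One frequent cause of jobs sitting in the queue with the handler never being reached is the image's CMD starting the Chatterbox FastAPI/uvicorn server instead of the RunPod worker loop; the container has to end up running something like this minimal entrypoint. File name and the TTS call are placeholders:

```python
# rp_handler.py - the Docker CMD must run this file (e.g. `python -u rp_handler.py`),
# not the Chatterbox HTTP server, or queued jobs never reach a handler.
import runpod

def handler(job):
    text = job["input"]["text"]
    # call into the Chatterbox TTS code here and return audio; placeholder response
    return {"status": "ok", "echo": text}

runpod.serverless.start({"handler": handler})
```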

How do I run Qwen3 235B Q5_K_M using vLLM?

Hi, I was wondering if there was a simple way to run Qwen3 235B Q5_K_M using vLLM on RunPod. I have two main issues: 1) the Qwen3 235B GGUF repo contains multiple quantizations (e.g., Q6_K, Q5_K_M, Q5_0), and I don't know how to select one...
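
For issue 1, one way to pin a single quantization is to download only the matching files with huggingface_hub before pointing vLLM (or any other engine) at them. The repo id and filename pattern below are assumptions, so check the actual repo layout on the Hub:

```python
from huggingface_hub import snapshot_download

# Fetch only the Q5_K_M shards from the multi-quantization GGUF repo.
local_dir = snapshot_download(
    repo_id="Qwen/Qwen3-235B-A22B-GGUF",           # assumed repo id; verify on the Hub
    allow_patterns=["*Q5_K_M*"],                   # skip Q6_K, Q5_0, etc.
    local_dir="/runpod-volume/qwen3-235b-q5_k_m",
)
print("downloaded to", local_dir)
```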

github.com/Zheng-Chong/CatVTON

I want to install this repository in serverless mode, and we need these requirements: torch==2.4.0 torchvision==0.19.0 accelerate==0.31.0 git+https://github.com/huggingface/diffusers.git...

Need help with serverless Flux LoRA training using ai-toolkit

I built a Docker image using this repo https://github.com/newideas99/flux-training-docker and successfully trained a LoRA using RunPod serverless endpoints. However, when I run the trained LoRA, I get this error: "Exception: Error while deserializing header: HeaderTooLarge." I am no expert, but the LoRA safetensors file might be corrupted, and the reason behind the corruption is the Docker base image "navinhariharan/fluxd-model." Any help is appreciated.
Best,
Jesse...
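
A quick way to test the corruption theory is to try reading the file's header locally; if it is not actually a safetensors file (a truncated upload, an HTML error page, a pickle checkpoint), safe_open fails with exactly this kind of HeaderTooLarge error. The file path below is a placeholder:

```python
import os

from safetensors import safe_open

path = "my_lora.safetensors"  # path to the trained LoRA file (placeholder)
print("size:", os.path.getsize(path), "bytes")

try:
    with safe_open(path, framework="pt") as f:
        keys = list(f.keys())
    print(f"valid safetensors file with {len(keys)} tensors")
except Exception as e:
    # HeaderTooLarge here usually means the file is not really safetensors,
    # e.g. a truncated download or an error page saved under the wrong name.
    print("failed to read header:", e)
```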

Need help with Serverless Dreamshaper XL Worker Unstable/Throttle Issue.

Hey! I’m building a commercial DreamShaper XL image generator and need a RELIABLE serverless container on RunPod (must be able to handle high volume, with no unhealthy workers or crashing for at least several days). Can anyone recommend a specific public container/image name (with version/tag) for DreamShaper XL serverless that’s actually working for you? If you’ve found a stable setup, please share your image, config, and (optionally) GPU/region. Happy to pay it forward with feedback/results! 🙏...

API Documentation of Preset Models like Faster Whisper

Hi, is there any documentation about the API of specific preset models like Faster Whisper? For testing Faster Whisper, it took an hour of searching to find an example like the following. { "input": {...
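
In case it saves someone else the hour, a request sketch for a Faster Whisper quick-deploy endpoint; the input field names are assumptions based on the public worker's README, so verify them against the repo before relying on them:

```python
import requests

API_KEY = "YOUR_RUNPOD_API_KEY"
ENDPOINT_ID = "YOUR_ENDPOINT_ID"

payload = {
    "input": {
        # field names assumed from the worker-faster_whisper README; verify in the repo
        "audio": "https://example.com/sample.mp3",
        "model": "base",
    }
}

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=600,
)
print(resp.json())
```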