Runpod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


How to set max concurrency per worker for a load balancing endpoint?

I'm trying to configure the maximum concurrency for each worker on my serverless load balancing endpoint, but I can't seem to find the setting in the new UI.
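For queue-based serverless endpoints, per-worker concurrency is normally set in code rather than in the UI, via the Python SDK's `concurrency_modifier` hook; whether the same hook governs load-balancing endpoints is an assumption worth verifying against the docs. A minimal sketch:

```python
# Sketch: per-worker concurrency via the RunPod Python SDK's
# concurrency_modifier hook (queue-based endpoints). Whether this also
# applies to load-balancing endpoints is an assumption to verify.

MAX_CONCURRENCY = 4  # hypothetical cap per worker


def handler(job):
    # Hypothetical placeholder for the real job logic.
    return {"echo": job.get("input")}


def concurrency_modifier(current_concurrency: int) -> int:
    # The SDK calls this to ask how many jobs this worker may run at once.
    return MAX_CONCURRENCY


if __name__ == "__main__":
    import runpod  # requires the `runpod` package inside the worker image

    runpod.serverless.start({
        "handler": handler,
        "concurrency_modifier": concurrency_modifier,
    })
```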

Not getting all webhooks from requests

Some requests with webhooks fail and I'm not sure why; I can't see anything in the logs for this. For instance, this one finished perfectly in the worker but did not send the webhook response: https://api.runpod.ai/v2/whpwouwejfjrmq/status/655e8518-11ec-48f1-a2c1-25ca0e6c4ef4-u1 Request (details changed for privacy)...
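As a stopgap while missing webhooks are being investigated, a job's result can also be pulled directly from the `/status` endpoint shown in the URL above. A minimal sketch, using placeholder endpoint and job IDs and a hypothetical API key:

```python
import json
import urllib.request

API_BASE = "https://api.runpod.ai/v2"


def status_url(endpoint_id: str, job_id: str) -> str:
    # Build the /status URL for a job on a queue-based endpoint.
    return f"{API_BASE}/{endpoint_id}/status/{job_id}"


def fetch_status(endpoint_id: str, job_id: str, api_key: str) -> dict:
    # One-shot status check; call it in a loop with backoff as a
    # fallback when a webhook never arrives.
    req = urllib.request.Request(
        status_url(endpoint_id, job_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```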

What are the best practices when working with network volumes and large models

Hi Runpod! We've been using serverless pods for quite a while now. Most of our customer serving ran in the background, on demand, which meant we could tolerate the long warm-up times. However, to meet our customers' demands we have made several key improvements to our generation times. That said, our main bottleneck today is the infrastructure itself. We use quite a few models to do the work for our customers, and have tried 3 different paths: 1. Working with images from a private registry that contained the models - this was intolerable; the images kept re-downloading layers that had not been altered, making it unfeasible to sustain through development unless we separate out only the models. And even then, whenever we need to add a new LoRA etc., it causes a lot of issues....
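One pattern that helps with large models on network volumes is to keep the weights out of the image entirely and load them once at module import, so warm workers reuse them across jobs instead of reloading per request. A sketch, assuming the volume is mounted at `/runpod-volume` (verify for your endpoint) and a hypothetical `load_model` helper:

```python
import os

# Serverless network volumes are typically mounted at /runpod-volume;
# confirm the mount point for your endpoint.
MODEL_DIR = os.environ.get("MODEL_DIR", "/runpod-volume/models")


def resolve_model_path(name: str) -> str:
    # Map a model name to its location on the network volume.
    return os.path.join(MODEL_DIR, name)


# Load once at import time, outside the handler, so a warm worker keeps
# the model in memory between jobs (load_model is a hypothetical helper).
# model = load_model(resolve_model_path("flux1-dev.safetensors"))


def handler(job):
    # Reuse the already-loaded model here instead of loading per request.
    return {"model_path": resolve_model_path(job["input"]["model"])}
```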

Some questions about Serverless workers and custom workflows

Hi all, I'm very new to this; please help me with these questions. 1) How long will it take for a serverless worker to start with models of around ~60 GB, and is it better to store the models on a network volume or bake them into the Docker container? 2) What is the simplest and fastest way to create my own serverless worker if I already have a ComfyUI workflow with custom nodes? ...
Solution:

Update Transformers Library

Hi, I am trying to run Qwen/Qwen3-Embedding-8B via serverless endpoints. 1. I select quick deploy, Infinity Vector Embeddings. 2. Set Qwen/Qwen3-Embedding-8B as the model. 3. Batch size 32, data type auto....

New Serverless UI Issue

Your new UI does not have the worker settings fields ("Max Workers", "Active Workers", etc.). As a result, I am unable to resolve the max-workers issue shown in the screenshot. Please roll back to the previous UI and don't push changes like this to production.

serverless runpod/qwen-image-20b stays in initiating

Hi, I am trying to deploy a serverless endpoint for this image, runpod/qwen-image-20b... it just stays in initiating. I created a template with a 20 GB disk using an NVIDIA GPU. Can anyone please help? Regards, Sameer...

Serverless Load-balancing

Good morning, I recently came across https://docs.runpod.io/serverless/load-balancing/overview and have been following the instructions. Yet, when I attempted to make an external HTTP request using n8n, it simply did not work. I've attached my worker logs below. Please let me know if I've done something wrong, or if it's a possible issue with the documentation. Note) I used the following Container Image: runpod/vllm-loadbalancer:dev...

CPU and GPU Serverless

Is there a way I can choose CPU and GPU at the same time for my serverless endpoint? Or at least know the CPUs available for the GPU selected? It usually gets CPUs on its own, but I do need some CPU power for certain tasks alongside the GPU. Any help?
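One quick way to see what CPU resources a given GPU worker actually receives is to log them once at worker startup. A minimal sketch:

```python
import multiprocessing
import os


def cpu_report() -> dict:
    # os.cpu_count() reports CPUs visible to the process; the container may
    # be further limited by cgroup quotas, so treat it as an upper bound.
    usable = (
        len(os.sched_getaffinity(0))
        if hasattr(os, "sched_getaffinity")  # Linux-only API
        else multiprocessing.cpu_count()
    )
    return {"cpu_count": os.cpu_count(), "usable": usable}


# Printing this once at startup shows what the selected GPU tier pairs with.
print(cpu_report())
```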

Illegal instruction (core dumped)

hi, I'm using a Docker image for serverless ComfyUI on RunPod. I mounted my existing volume to the container, and everything sets up correctly. However, when I try to run my app using: ./venv/bin/python main.py ...

exceed your workers quota

For some reason I can't create an endpoint.

vLLM - How to avoid downloading weights every time?

I have a Serverless Endpoint with vLLM, using this Docker image: runpod/worker-v1-vllm:v2.7.0stable-cuda12.1.0. My ENV vars:
```
MODEL_NAME=Qwen/Qwen2.5-VL-3B-Instruct-AWQ
```
...
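A common way to stop weights re-downloading on every cold start is to attach a network volume and point the Hugging Face cache at it, so downloads persist across workers. A sketch of the endpoint env vars, assuming the volume mounts at `/runpod-volume`; the exact variable the worker image honours is worth checking against the worker-vllm README:

```shell
# Hypothetical endpoint env vars; verify against the worker image docs.
MODEL_NAME=Qwen/Qwen2.5-VL-3B-Instruct-AWQ
# Standard Hugging Face cache variable; keeps downloads on the volume.
HF_HOME=/runpod-volume/huggingface
```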

Delay time is too big. How can I reduce it?

It is important for me that my request executes in 1 second or less, but it currently takes about 1.5 seconds. I have optimized the Docker image a lot; its size is now 300 MB. The cold start time is OK for me (200 ms), but the delay time is not. How can it be optimized?...

How to deal with initialization errors?

I went to sleep and woke up to logs of multiple users trying out image generation, only for 100% of the requests to fail. After a brief investigation I found a machine with this in the logs:
```
Traceback (most recent call last):
```
...

Serverless Job distribution

I have a process that loads 1500 jobs. I have 4 GPUs with 4 running workers. Why are the majority of the jobs being assigned to only 1 of the 4 GPUs?...

How do I set quantization to fp8 in the serverless settings?

If I select 'bitsandbytes', will that automatically change the quant level to fp8?

Store models in VRAM

Is there a way to keep the models in VRAM? They keep loading and offloading each time I run a ComfyUI flow. Any suggestions?

Setting up runpod serverless from scratch

Hello everyone, I am trying to deploy a ComfyUI serverless endpoint to execute a workflow that uses the flux1-dev model. I'm new to this and have tried to create a serverless endpoint using the ComfyUI template from the hub listing. I have already downloaded all the models to a network volume, along with all the custom nodes that I'll be using in the ComfyUI workflow I want to execute. I saw someone suggest making some changes in the handler.py script, but I have no idea how to set that up. Can anyone guide me on how to set up the serverless endpoint? I have tried the documentation, but it didn't help; instead, it confused me....
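For context, the handler.py people mention is just a function that RunPod calls with each job. A minimal sketch of its shape, with a hypothetical `run_workflow` standing in for the actual ComfyUI call:

```python
def run_workflow(workflow: dict) -> dict:
    # Hypothetical placeholder: a real worker would submit the workflow
    # to the local ComfyUI server and collect the generated outputs.
    return {"status": "ok", "nodes": len(workflow)}


def handler(job):
    # RunPod passes each request as a dict; the payload sits under "input".
    workflow = job["input"]["workflow"]
    return run_workflow(workflow)


if __name__ == "__main__":
    import runpod  # available inside the worker image

    runpod.serverless.start({"handler": handler})
```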

RunPod Serverless Endpoint Issue - Jobs Complete But No Output Returned

Problem: My serverless endpoint jobs are completing successfully but returning empty results. Endpoint ID: su6ufhaephnw03 (Stable Diffusion XL) Symptoms:...
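One common cause of "completed but empty output" is a handler that does its work but never returns the result: whatever the handler returns becomes the job's `output` field, and returning nothing yields an empty result. A minimal illustration with hypothetical result values:

```python
def handler_empty(job):
    # Bug pattern: the image is generated but never returned, so the
    # /status response for the job contains no output.
    _image_url = "https://example.com/img.png"  # hypothetical result
    # (missing return statement)


def handler_fixed(job):
    image_url = "https://example.com/img.png"  # hypothetical result
    # Returning a dict makes it appear under "output" in /status.
    return {"image_url": image_url}
```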