Runpod



We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!



ComfyUI API

Good morning! I have been following the instructions for exposing a ComfyUI workflow as an API, but I'm stuck here: I've created the template "with flux.dev" and set up a serverless endpoint. ...

Possible bug with environment variables

I've created staging and production serverless endpoints. Both have the same environment variable names, and I used the clone feature to create the second one (production) from the first (staging). When I change the value of an environment variable on one, it updates on the other. This seems like a bug on Runpod's end?

Load balancing to death?

I've been sitting and watching your serverless system, and it just doesn't make sense. I have two workers assigned, yet you decide I need "extra" instances spun up. My workers are sitting there idle... oh, there's a request, let's send that to the "extra" queue, not the pod sitting idle... Oh, that last picture was done by an "extra" pod; we can't use that one again, we need to scrap it and use another cold-booting pod... Oh, your workers are still idle? Well, we'd better put them to sleep!...

Serverless not running with NIM - NVIDIA custom models

I'm trying to run a serverless instance in Runpod. When I launch the instance, it seems to start running. However, when I send a "Hello World" test request using the instance's own interface on the Runpod screen, the request is placed in a queue and never finishes processing. If I send a second request, it joins the same queue. I'm having trouble getting the serverless instance to run, even though I followed all the steps in the Runpod documentation. Thanks!...

Serverless API - output doesn't match what the response data says

I have some issues with my serverless Runpod. Everything is sent correctly via the API, and I also see it come back correctly, such as the height and width parameters, but when those specify portrait sizing, for example, the result still comes back as a square image. What could be the root cause? The image quality also seems really low, yet when I tested this set-up locally, everything was totally fine....

Serverless Docker AWS ECR: failed image pulls still receive and charge for requests until timeout

One of my endpoints uses an ECR Docker image (pulled on new workers) with the container registry auth from settings. It couldn't load the image when Runpod restarted my workers (probably because the token had expired). But instead of not receiving requests, the workers received them and ran until the timeout, resulting in a lot of money spent (over 1000% of what I normally pay in a single day!). How can I make sure this won't happen again?...
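For what it's worth, AWS ECR authorization tokens are only valid for 12 hours, which fits the "token was probably expired" theory: the registry credential has to be rotated on a schedule, not stored once. A minimal sketch of the refresh decision (the helper name and margin are my own, not a Runpod or AWS API):

```python
from datetime import datetime, timedelta, timezone

# AWS ECR authorization tokens are valid for 12 hours.
ECR_TOKEN_TTL = timedelta(hours=12)

def needs_refresh(issued_at: datetime, now: datetime,
                  margin: timedelta = timedelta(minutes=30)) -> bool:
    """Return True once the token is within `margin` of expiring,
    so new workers never try to pull the image with a dead credential."""
    return now - issued_at >= ECR_TOKEN_TTL - margin
```

A scheduled job could call this and, when it returns True, fetch a fresh token and update the registry auth stored in Runpod's settings.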

Endpoint ID changing after Pulumi deployment

Pulumi output says one thing; Runpod then says another. The endpoint URL from the Pulumi output that I add to LiteLLM is obsolete....

ComfyUI Worker: FLUX.1 dev & Network Volume Setup Questions

A few questions about https://github.com/runpod-workers/worker-comfyui: runpod/worker-comfyui:<version>-flux1-dev includes the checkpoint, text encoders, and VAE for FLUX.1 dev <---- this is using the fp8 version right now and not the full version, is that right? ...

CRITICAL: Runpod charging unfairly

We have been charged every day since June 13th, yet the logs clearly show that on some days the worker is not even called. We also have timeouts of 10 minutes; one 10-minute call would be 0.1668 USD, but Runpod is charging more than 10x that amount every day, as if we called it daily. Meanwhile our logs show, for example:
2025-07-10 20:37:18.701 -> START
2025-07-10 20:37:24.854 -> END
2025-07-10 20:37:18.701 -> START
2025-07-10 20:37:24.854 -> END
...
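Billing claims like this can be sanity-checked by summing the START/END pairs straight from the logs. A quick sketch; the per-second rate below is just the post's 0.1668-USD-per-10-minutes figure divided out, not an official Runpod price:

```python
from datetime import datetime

def billed_seconds(log_lines):
    """Sum the durations between alternating START/END timestamped lines."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    stamps = [datetime.strptime(line.split(" -> ")[0], fmt)
              for line in log_lines]
    # Pair the timestamps up: (start, end), (start, end), ...
    return sum((end - start).total_seconds()
               for start, end in zip(stamps[::2], stamps[1::2]))

RATE = 0.1668 / 600  # USD per second, derived from the post's 10-minute figure

log = [
    "2025-07-10 20:37:18.701 -> START",
    "2025-07-10 20:37:24.854 -> END",
]
cost = billed_seconds(log) * RATE  # a ~6.15 s run costs well under a cent
```

Comparing the summed log durations against the daily invoice makes the gap concrete when raising it with support.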

Unhealthy workers keep sabotaging production

As you can see, somehow 2 of 3 active workers plus all flex workers became unhealthy. I don't know the reason, or whether I have any power to fix it. Without my involvement, Runpod doesn't kill those workers and doesn't automatically replace them with healthy ones, which makes my prod unstable. To resolve this incident I had to kill the unhealthy workers manually. I need some support on how to prevent or handle this situation.

How Do You Speed Up ComfyUI Serverless?

Hi community! I'm starting this thread to gather our collective knowledge on optimizing ComfyUI on RunPod Serverless. My goal is for us to share best practices and solve a tricky performance issue I'm facing. Step 1: The Initial Problem (NORMAL_VRAM mode)...

Akira: Ghosts in KS2 .. spooky...

Hi, so, a little bit of an issue. Using endpoints so far has been relatively OK; as far as GPU allocation goes, it seems fair. But the worker load is another thing altogether. Mostly my generations happen at about 1.3 it/s, but then it will jump, on the same worker with a similar task, to 35 s/it! I'm honestly a little impressed that you've managed to keep your datacenter(s) from burning down. ...

Unable to edit template: container image name reported as invalid even though it is valid

In the Edit Template dialog you cannot save anything, because the container image field complains about a valid, fully qualified Docker image name. Example of an image name that errors (but is valid!!):
us-central1-docker.pkg.dev/ai-platform/speech/speech-to-text:3c7f537
...

RunPod Worker Infinite Loop

The RunPod serverless render worker gets stuck in an infinite loop after successfully completing video rendering tasks. The worker script completes successfully (renders the video, uploads it to GCS), but RunPod immediately restarts it, causing wasted resources and potential cost issues. The worker logs, my render-worker.ts, and the Dockerfile are all attached. ...
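One likely cause: a Runpod serverless worker is expected to stay alive and receive jobs through a handler loop (in the Python SDK, `runpod.serverless.start({"handler": handler})`), so a script that exits after one render looks like a crashed worker and gets restarted. The post's worker is TypeScript, so this Python sketch only shows the shape; the input fields and bucket name are made up:

```python
# Sketch of the long-lived handler pattern Runpod serverless expects.
# runpod.serverless.start({"handler": handler}) would wrap this function
# and keep the process alive between jobs, instead of the script exiting
# (and being restarted) after each render.
def handler(job):
    params = job["input"]
    # ...render the video and upload it to GCS here...
    return {"video_url": f"gs://example-bucket/{params['job_id']}.mp4"}
```

The equivalent fix in the TypeScript worker is to return from the handler rather than letting the entrypoint process terminate.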

Ollama Worker Keeps Redownloading Model

Hi, I’m trying to set up my first serverless Ollama (0.9.6) endpoint with an attached network volume (85 GB) to store the downloaded model (74 GB). Unfortunately, once the model has been fully downloaded, the worker starts downloading it again, and continues to do so indefinitely. Is there something I might be overlooking?...
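A common cause of this: Ollama stores models under its `OLLAMA_MODELS` directory, which defaults to the worker's ephemeral disk, so every fresh worker re-downloads. Pointing it at the network volume (which mounts at /runpod-volume on serverless workers) should make the download persist; the `ollama/models` subpath below is an arbitrary choice, not a required layout:

```python
import os

# Assumption: the network volume mounts at /runpod-volume on serverless
# workers; the ollama/models subpath is an arbitrary choice. This must be
# set before `ollama serve` starts, otherwise Ollama writes the 74 GB
# model to the worker's ephemeral disk and re-downloads it every restart.
os.environ["OLLAMA_MODELS"] = "/runpod-volume/ollama/models"
```

The same effect can be had with an `OLLAMA_MODELS` environment variable on the template, as long as it is in place before the Ollama server process launches.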

max_num_seqs

I'm deploying a Serverless endpoint using deepseek-ai/DeepSeek-Coder-V2-Base, and encountering repeated engine failures due to the environment variable max_num_seqs being interpreted as a string ('1') instead of an integer. This triggers:
TypeError: '<' not supported between instances of 'int' and 'str'
...
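Environment variables always arrive as strings, so numeric engine arguments have to be coerced before they reach vLLM. A small sketch of the defensive cast (the `env_int` helper is my own; only the variable's role matches the post):

```python
import os

def env_int(name: str, default: int) -> int:
    """Environment variables are always strings; cast numeric engine
    arguments like max_num_seqs to int before handing them to vLLM,
    avoiding comparisons between int and str inside the engine."""
    raw = os.environ.get(name)
    return int(raw) if raw is not None else default

os.environ["MAX_NUM_SEQS"] = "1"          # simulate the endpoint's env config
max_num_seqs = env_int("MAX_NUM_SEQS", 8)  # -> 1, as an int
```

If the endpoint uses a prebuilt vLLM worker image, the same idea applies: the value must be cast (or the image patched) before the engine arguments are constructed.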

Is anybody experiencing outages tonight?

For the past 30 minutes, I've been seeing increasing startup times and a higher failure rate on most of my EU nodes.

Choosing GPU based on task data

Is there currently a way to choose which worker configuration handles a given request? I have a serverless endpoint that supports both video and image generation, but since images need much less VRAM, is there any way to select the GPU per request?...
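A single endpoint's GPU pool can't be varied per request, so the usual workaround is two endpoints, one on small GPUs for images and one on large GPUs for video, with the client routing between them. A sketch with made-up endpoint IDs (the `/v2/<endpoint_id>/run` route is the standard serverless request URL):

```python
# Hypothetical endpoint IDs: one endpoint configured with small GPUs for
# image jobs, another with large GPUs for video jobs. The client routes
# per task instead of asking one endpoint to vary its GPU.
ENDPOINTS = {
    "image": "abc123-image-endpoint",
    "video": "xyz789-video-endpoint",
}

def run_url(task_type: str) -> str:
    """Build the /run URL for the endpoint matching the task type."""
    return f"https://api.runpod.ai/v2/{ENDPOINTS[task_type]}/run"
```

Both endpoints can share the same Docker image and network volume; only the GPU configuration differs.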

How do people serve large models on Runpod Serverless?

Hi all, looking for real-world advice on shipping models with large weights (60 GB+) as a Runpod Serverless endpoint. I seem stuck between two awkward choices: 1) Embed the weights in the Docker image. Pros: once the image lands on a worker, cold start only covers copying weights into GPU RAM. Cons: a ~70 GB image is painful to build (most CI runners don't have that much local disk), it usually takes hours to build an image that big, and Runpod support says very large images have a slower initial rollout....
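The usual middle ground is a slim image plus the weights on a network volume: cold start then pays for volume-to-GPU loading instead of a 70 GB image pull, and CI only builds the small image. A sketch of the volume-first check inside the worker (the paths are assumptions; only the /runpod-volume mount point is standard):

```python
import os

def weights_path(model_name: str,
                 volume_root: str = "/runpod-volume/models"):
    """Prefer weights already present on the network volume; return None
    to signal a one-time download (or a baked-image fallback) only when
    they're missing. Avoids shipping 60 GB+ inside the Docker image."""
    path = os.path.join(volume_root, model_name)
    return path if os.path.isdir(path) else None
```

One caveat worth benchmarking: network-volume read throughput is lower than local NVMe, so the volume approach trades image-pull time for weight-load time on each cold start.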

Hunyuan or LTX on Serverless Endpoints?

Hi there! I'm working on a web app where the user uploads an image for the first frame of a video to a React front end and types a text prompt describing what they want to happen in the video. The app then sends an API request to an H100 serverless endpoint, which spins up a GPU running LTX (to start with, for the MVP; eventually I'd like to try Hunyuan as well, probably in batches, or charge more in app credits for immediate processing), saves the generated video to a storage volume on Runpod, and serves it back to the front end when it's ready. I've already deposited some money and created a storage volume, because I'm assuming I'll want to load the model weights off a persistent volume, but I'm not sure how to get started. I see there are lots of templates for pods, but I don't need ComfyUI or anything; more likely something like FastAPI, but I don't know. Can anyone help me out with a list of what I need to learn in order to build this? Perhaps some links to recommended resources and guides you've found helpful? So far I have the React front end on Firebase. Thanks!...
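For an MVP like the one described above, a workable shape is: the backend POSTs a job to the serverless endpoint's `/run` route, gets a job ID back, and polls `/status/<job_id>` until the video is ready. The input schema is whatever your LTX handler defines, so the field names in this payload sketch are assumptions:

```python
def build_ltx_job(image_url: str, prompt: str) -> dict:
    """Payload for POST https://api.runpod.ai/v2/<endpoint_id>/run.
    The field names inside `input` are hypothetical; they must match
    whatever the handler running on the endpoint actually reads."""
    return {"input": {"image_url": image_url, "prompt": prompt}}

job = build_ltx_job("https://example.com/first-frame.png",
                    "a cat runs across the frame")
```

The request needs an `Authorization: Bearer <api key>` header, and since video generation runs for minutes, the async `/run` + `/status` polling pattern fits better than the synchronous `/runsync` route.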