pave7946 · Runpod · 6mo ago

How Do You Speed Up ComfyUI Serverless?

Hi community!

I'm starting this thread to gather our collective knowledge on optimizing ComfyUI on RunPod Serverless. My goal is for us to share best practices and solve a tricky performance issue I'm facing.

Step 1: The Initial Problem (NORMAL_VRAM mode)

I started by checking my logs on both an A100 (80GB) and an L4 (24GB) worker. I noticed both were defaulting to NORMAL_VRAM mode, which seems suboptimal.

--- L4 ---
Total VRAM 22478 MB, total RAM 515498 MB
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA L4 : cudaMallocAsync
ComfyUI version: 0.3.43

--- A100 ---
Total VRAM 81038 MB, total RAM 2051931 MB
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA A100 80GB PCIe : cudaMallocAsync
ComfyUI version: 0.3.43

Step 2: The Attempted Fix and the New Issues

My first action was to add the --highvram flag to my launch command. This worked, and the logs now correctly show Set vram state to: HIGH_VRAM.
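For reference, my current launch line looks roughly like this (the path and the other flags are specific to my image, so treat it as a sketch rather than a drop-in):

```shell
# start.sh -- launch ComfyUI for the serverless worker (sketch of my setup)
python /comfyui/main.py \
    --listen 0.0.0.0 \
    --port 8188 \
    --highvram \
    --gpu-only
```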

However, this is where I'm stuck and need your help. Despite being in HIGH_VRAM mode, the performance is still poor, and new issues have appeared:

CPU usage is constantly pinned at 100%.

On the smaller 24GB L4, my workflow now fails with an OOM (Out Of Memory) error in the UNETLoader node.

GPU utilization (on both the A100 and the L4) sits at only 30-40% for the entire job.

This makes me suspect that other launch arguments I'm using (like --gpu-only) might be conflicting with --highvram. As I understand it, --gpu-only forces all weights and intermediate tensors to stay on the GPU, which would explain the OOM on the 24GB L4, while the work itself seems to be bottlenecked on the CPU rather than running on the GPU.
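If that suspicion is right, the first thing I plan to try is dropping --gpu-only and letting --highvram manage placement on its own. A sketch (not yet verified on my workers; the path is from my setup):

```shell
# Proposed change: keep model weights resident on the GPU via --highvram,
# but drop --gpu-only so ComfyUI can still offload to system RAM when a
# smaller card (like the 24GB L4) runs short of VRAM.
python /comfyui/main.py \
    --listen 0.0.0.0 \
    --port 8188 \
    --highvram
```

Does anyone know if this combination is actually the problem, or have you seen these two flags coexist without issues?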

Let's Turn This Into a Knowledge-Sharing Thread!
To help solve this and create a resource for everyone, would you be willing to share the launch settings you use to run ComfyUI effectively?

I'm especially interested in:

Your full launch command from your start.sh or worker file.

The type of GPU you're running on.

Any key flags you've found essential for good performance (e.g., --preview-method auto, --disable-xformers, etc.).

Any other "secret sauce" for reducing cold start times or speeding up inference.

Thanks for sharing your expertise!