Slow Model Loading - Solutions

Problem Description

Loading HiDream-I1-Fast and Llama-3.1-8B-Instruct on RunPod (A100) takes ~34s (total ~46s) with only 13% CPU usage. Profiler shows aten::copy_ (71% CPU, 21.69s) and cudaMemcpyAsync (24%, 7.42s) as bottlenecks.

(I checked other API providers — they offer this model via API at a much lower cost than what I'm currently testing, and they deliver outputs in under 15 seconds idk how!), but I couldn't optimize loading time despite trying multiprocessing with spawn (caused semaphore leaks). Need to reduce loading time as much as possible. Any solutions or insights?

SSasan **Problem Description** Loading HiDream-I1-Fast and Llama-3.1-8B-Instruct on Ru...

Jason•5/18/25, 9:21 AM

how did you profile those btw

Jason•5/18/25, 9:28 AM

compared to api providers, like what? ( unrelated to this problem)

But what explains they can achieve that durations is that because they use faster gpu's

Jason•5/18/25, 9:29 AM

if it has lower cost then just try to use it, because to optimize this you might need to change the code, the model, or use faster gpu(s)

riverfog7•5/19/25, 4:08 AM

you need to compare speed when the model is loaded

riverfog7•5/19/25, 4:09 AM

other API providers likely have models loaded 24/7

Rriverfog7 you need to compare speed when the model is loaded

SasanOP•5/19/25, 5:13 AM

I know, but loading model by using Diffusers takes little bit time

Jason•5/20/25, 3:24 AM

You mean how to implement it without diffusers or any tricks that magically speed up the diffusers loading?

Jason•5/20/25, 3:25 AM

Try tensorrt

Jason•5/20/25, 3:25 AM

Quantized model if you don't mind (might speedup inference)

Jason•5/20/25, 3:30 AM

Half precision if possible, xformers

Jason•5/20/25, 3:30 AM

Distilled models (harder), less steps for generating img

JJason Distilled models (harder), less steps for generating img

SasanOP•5/21/25, 11:50 AM

Hi,

I saw something called Pruna (I don't know if you have heard of it). It looks promising; I may use it, but I am not sure.

Jason•5/22/25, 11:57 AM

Oh sure feel free to update here about how it goes

Jason•5/22/25, 11:57 AM

I haven't heard of it

Slow Model Loading - Solutions

Similar Threads

Similar Threads

Similar Threads