Slow Model Loading - Solutions
Problem Description
Loading HiDream-I1-Fast and Llama-3.1-8B-Instruct on RunPod (A100) takes ~34 s (~46 s total) with only 13% CPU utilization. The profiler shows `aten::copy_` (71% CPU, 21.69 s) and `cudaMemcpyAsync` (24%, 7.42 s) as the main bottlenecks.
(I checked other API providers; they serve this model at a much lower cost than my current setup and return outputs in under 15 seconds, and I don't know how!) I couldn't reduce the loading time despite trying multiprocessing with the spawn start method (it caused semaphore leaks). I need to cut the loading time as much as possible. Any solutions or insights?
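One pattern that avoids repeated loading inside a single process is to cache the pipeline after the first load and reuse it for every request. A minimal sketch of that load-once pattern; `load_pipeline` and the placeholder dict here are illustrative stand-ins for a real `diffusers.DiffusionPipeline.from_pretrained(...)` call, not code from the original post:

```python
import functools

LOAD_CALLS = 0  # counts how many times the expensive load actually runs

@functools.lru_cache(maxsize=1)
def load_pipeline(model_id: str):
    """Load the model once; later calls return the cached object.

    In the real script the body would be something like
    DiffusionPipeline.from_pretrained(model_id, ...).to("cuda");
    the dict below is only a placeholder for the loaded pipeline.
    """
    global LOAD_CALLS
    LOAD_CALLS += 1
    return {"model_id": model_id, "weights": "..."}

# First call pays the full loading cost; subsequent calls are free.
pipe_a = load_pipeline("HiDream-I1-Fast")
pipe_b = load_pipeline("HiDream-I1-Fast")
print(pipe_a is pipe_b, LOAD_CALLS)  # same cached object, loaded once
```

This does not make a cold start faster, but it means the ~34 s cost is paid once per process rather than once per generation.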
3 Replies
How did you profile those, by the way?
Compared to API providers, like what? (Unrelated to this problem.)
But what might explain how they achieve those durations is that they use faster GPUs.
If it costs less, just use it, because to optimize this yourself you might need to change the code, change the model, or use faster GPU(s).
You need to compare speed with the model already loaded.
Other API providers likely keep their models loaded 24/7.
I know, but loading the model with Diffusers takes a bit of time.