RunPod · 3d ago
Sasan

Slow Model Loading - Solutions

Problem Description

Loading HiDream-I1-Fast and Llama-3.1-8B-Instruct on RunPod (A100) takes ~34 s (~46 s total) with only 13% CPU usage. The profiler shows `aten::copy_` (71% CPU, 21.69 s) and `cudaMemcpyAsync` (24%, 7.42 s) as the main bottlenecks. (I checked other API providers: they offer this model via API at a much lower cost than what I'm currently paying for testing, and they deliver outputs in under 15 seconds, no idea how!) I couldn't reduce the loading time despite trying multiprocessing with the spawn start method, which only caused semaphore leaks. I need to cut loading time as much as possible. Any solutions or insights?
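For reference, my spawn-based attempt looked roughly like this (the worker function and component names are placeholders, not my exact code). Using a context manager around the pool releases its semaphores even if a worker raises, which is what was leaking for me:

```python
import multiprocessing as mp

def load_component(name):
    # Placeholder for loading one model component (e.g. text encoder, UNet).
    return f"loaded:{name}"

def load_all(names):
    # Request the spawn start method explicitly; the "with" block guarantees
    # the pool (and its underlying semaphores) is cleaned up on exit.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=len(names)) as pool:
        results = pool.map(load_component, names)
    return results

if __name__ == "__main__":
    # The __main__ guard is required with spawn: child processes re-import
    # this module and must not re-enter the pool-creating code.
    print(load_all(["text_encoder", "unet", "vae"]))
```

Note that spawn re-imports the main module in every child, so any loading code at module top level runs once per worker, which can itself inflate startup time.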
3 Replies
Jason
Jason · 2d ago
How did you profile those compared to the API providers, by the way? (Unrelated to this problem.) What likely explains their durations is that they use faster GPUs. If their API is cheaper, just use it, because to optimize this yourself you might need to change the code, the model, or move to faster GPU(s).
riverfog7
riverfog7 · 23h ago
You need to compare speed with the model already loaded. Other API providers most likely keep their models loaded 24/7, so their request latency never includes loading time.
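That pattern (pay the load cost once at startup, then serve every request from the resident model) can be sketched like this. Here `slow_load` is a stand-in for the real `DiffusionPipeline.from_pretrained(...)` call, not an actual API:

```python
import time

class ModelServer:
    """Keep an expensive-to-load model resident so later requests skip loading."""

    def __init__(self, load_model):
        # load_model is any zero-argument callable returning the model;
        # it runs exactly once, when the server process starts.
        self.model = load_model()

    def infer(self, prompt):
        # Every request reuses the already-loaded model.
        return self.model(prompt)

def slow_load():
    # Stand-in for the expensive diffusers load (download + CPU->GPU copies).
    time.sleep(0.1)
    return lambda prompt: f"image for {prompt!r}"

server = ModelServer(slow_load)        # load cost paid here, once
first = server.infer("a red fox")      # fast: no reload
second = server.infer("a blue whale")  # fast: no reload
```

This is why a long-running worker (or a serverless setup with warm containers) hides the 34 s you are measuring: it only ever happens on cold start.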
Sasan
Sasan (OP) · 22h ago
I know, but even so, loading the model with Diffusers takes a fair bit of time.
