Slow Model Loading - Solutions
Problem Description
Loading HiDream-I1-Fast and Llama-3.1-8B-Instruct on RunPod (A100) takes ~34 s (~46 s total) with only 13% CPU utilization. The profiler shows aten::copy_ (71% of CPU time, 21.69 s) and cudaMemcpyAsync (24%, 7.42 s) as the main bottlenecks.
(I checked other API providers: they offer this model via API at a much lower cost than my current setup and return outputs in under 15 seconds; I don't know how.) I couldn't reduce loading time despite trying multiprocessing with the spawn start method, which caused semaphore leaks. I need to cut loading time as much as possible. Any solutions or insights?
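One angle worth checking (my sketch, not something stated in the thread): the aten::copy_/cudaMemcpyAsync time is host-side copying, and serialization formats that memory-map the checkpoint (e.g. safetensors, which Diffusers prefers) avoid one full host-memory copy compared with eagerly reading the file into a buffer. A stdlib-only illustration of the difference between an eager read and an mmap view:

```python
import mmap
import os
import tempfile

# Create a throwaway "checkpoint" file (16 MiB of zeros) just for illustration.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * (16 * 1024 * 1024))

# Eager read: the entire file is copied into a bytes object up front.
with open(path, "rb") as f:
    eager = f.read()  # one full host-memory copy before anything is used

# mmap: the file is mapped into the address space; pages are faulted in
# lazily, so no up-front copy of the whole file is made.
with open(path, "rb") as f:
    view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_page = view[:4096]  # only this slice is actually copied
    view.close()

print(len(eager), len(first_page))
```

This is only the deserialization half of the story; the cudaMemcpyAsync time is the host-to-device transfer and would need pinned memory or loading weights directly in the target dtype to shrink.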
6 Replies
Unknown User•6mo ago
Message Not Public
you need to compare speed once the model is already loaded
other API providers likely have models loaded 24/7
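That point can be sketched as a load-once pattern: pay the load cost a single time at process start and reuse the object for every request, which is effectively what a provider keeping the model "loaded 24/7" does. `get_pipeline` below uses a hypothetical stand-in dict where the real code would call `DiffusionPipeline.from_pretrained`:

```python
from functools import lru_cache

LOAD_CALLS = 0  # counts how often the expensive load actually runs


@lru_cache(maxsize=1)
def get_pipeline(model_id: str):
    """Load the model once per process; later calls return the cached object."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    # Stand-in for the expensive part, e.g. (not run here):
    #   pipe = DiffusionPipeline.from_pretrained(model_id).to("cuda")
    return {"model_id": model_id, "weights": "...loaded..."}


def handle_request(prompt: str) -> str:
    # Cached after the first call, so only request #1 pays the ~34 s cost.
    pipe = get_pipeline("HiDream-I1-Fast")
    return f"generated for {prompt!r} with {pipe['model_id']}"


handle_request("a red fox")
handle_request("a blue heron")
```

With this pattern the per-request latency is just inference time; the load time only matters at cold start, which is why comparisons against always-on APIs look so lopsided.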
I know, but loading the model with Diffusers still takes a fair bit of time
Unknown User•6mo ago
Message Not Public
Hi,
I came across something called Pruna (I don't know if you've heard of it). It looks promising; I may use it, but I'm not sure.
Unknown User•6mo ago
Message Not Public