Runpod 6mo ago
Sasan

Slow Model Loading - Solutions

Problem Description: Loading HiDream-I1-Fast and Llama-3.1-8B-Instruct on a RunPod A100 takes ~34 s (~46 s total) with only 13% CPU usage. The profiler shows aten::copy_ (71% CPU, 21.69 s) and cudaMemcpyAsync (24%, 7.42 s) as the bottlenecks. Other API providers offer this model at a much lower cost than what I'm currently testing and deliver outputs in under 15 seconds; I don't know how. I couldn't reduce the loading time myself: multiprocessing with the spawn start method caused semaphore leaks. I need to cut loading time as much as possible. Any solutions or insights?
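A sketch of one common mitigation for the aten::copy_ cost, assuming a Diffusers-compatible checkpoint (the "HiDream-ai/HiDream-I1-Fast" id and the exact arguments are assumptions; adjust for your setup). Requesting the target dtype at load time avoids a separate conversion pass over every tensor, and safetensors files are memory-mapped rather than unpickled:

```python
def load_pipeline(model_id: str = "HiDream-ai/HiDream-I1-Fast"):
    # Imports kept local so a worker only pays them when it actually loads.
    import torch
    from diffusers import DiffusionPipeline

    # torch_dtype=bfloat16 loads weights directly in the inference dtype
    # instead of loading fp32 and converting (an extra aten::copy_ per
    # tensor); use_safetensors=True mmaps the checkpoint instead of
    # unpickling it.
    pipe = DiffusionPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        use_safetensors=True,
    )
    # One bulk host-to-device transfer; cudaMemcpyAsync time remains,
    # but it is bandwidth-bound rather than CPU-bound.
    return pipe.to("cuda")
```

This only trims the load itself; it won't get you to sub-15-second responses on a cold start.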
6 Replies
Unknown User 6mo ago
Message Not Public
riverfog7 6mo ago
You need to compare speeds once the model is already loaded; other API providers likely keep their models loaded 24/7.
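The point above can be sketched as a warm-worker pattern: keep the pipeline resident in the process so the ~34 s load is paid once per container, not once per request. `load_model` here is a hypothetical stand-in for the real Diffusers loading code:

```python
import time
from functools import lru_cache

def load_model():
    # Hypothetical stand-in for the real load, e.g.
    # DiffusionPipeline.from_pretrained(...).to("cuda").
    time.sleep(0.1)  # simulate a slow cold load
    return object()

@lru_cache(maxsize=1)
def get_pipeline():
    # The first call in a worker pays the full load cost; every later
    # request in the same process reuses the resident model.
    return load_model()

if __name__ == "__main__":
    t0 = time.perf_counter(); get_pipeline(); cold = time.perf_counter() - t0
    t0 = time.perf_counter(); get_pipeline(); warm = time.perf_counter() - t0
    print(f"cold load: {cold:.3f}s, warm reuse: {warm:.6f}s")
```

On a serverless platform this is what "models loaded 24/7" amounts to: the provider keeps warm workers around, so requests never see the cold-load path.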
Sasan (OP) 6mo ago
I know, but loading the model with Diffusers still takes quite a bit of time.
Unknown User 6mo ago
Message Not Public
Sasan (OP) 6mo ago
Hi, I came across something called Pruna (not sure if you've heard of it). It looks promising; I may try it, but I'm not sure yet.
Unknown User 6mo ago
Message Not Public