Slow Model Loading - Solutions
Problem Description
Loading HiDream-I1-Fast and Llama-3.1-8B-Instruct on RunPod (A100) takes ~34 s (~46 s total) with only 13% CPU utilization. The profiler shows `aten::copy_` (71% CPU, 21.69 s) and `cudaMemcpyAsync` (24%, 7.42 s) as the main bottlenecks.
(I checked other API providers; they serve this model at a much lower cost than my current setup and return outputs in under 15 seconds, and I don't know how!) I couldn't reduce the loading time despite trying multiprocessing with the spawn start method (it caused semaphore leaks). I need to cut the loading time as much as possible. Any solutions or insights?
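One pattern that avoids repeated loading inside a single process is to cache the pipeline after the first load and reuse it for every request. A minimal sketch of that load-once pattern; `load_pipeline` and the placeholder dict here are illustrative stand-ins for a real `diffusers.DiffusionPipeline.from_pretrained(...)` call, not code from the original post:

```python
import functools

LOAD_CALLS = 0  # counts how many times the expensive load actually runs

@functools.lru_cache(maxsize=1)
def load_pipeline(model_id: str):
    """Load the model once; later calls return the cached object.

    In the real script the body would be something like
    DiffusionPipeline.from_pretrained(model_id, ...).to("cuda");
    the dict below is only a placeholder for the loaded pipeline.
    """
    global LOAD_CALLS
    LOAD_CALLS += 1
    return {"model_id": model_id, "weights": "..."}

# First call pays the full loading cost; subsequent calls are free.
pipe_a = load_pipeline("HiDream-I1-Fast")
pipe_b = load_pipeline("HiDream-I1-Fast")
print(pipe_a is pipe_b, LOAD_CALLS)  # same cached object, loaded once
```

This does not make a cold start faster, but it means the ~34 s cost is paid once per process rather than once per generation.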
3 Replies
How did you profile those, by the way?
Compared to API providers, like what? (Unrelated to this problem.)
But what might explain how they achieve those durations is that they use faster GPUs.
If it costs less, just use it, because to optimize this yourself you might need to change the code, change the model, or use faster GPU(s).
You need to compare speed with the model already loaded.
Other API providers likely keep their models loaded 24/7.
I know, but loading the model with Diffusers takes a bit of time.