Runpod · 3y ago · 21 replies
blistick

Slow model loading

Hi all. I have a serverless endpoint that runs Stable Diffusion inference. It takes about 12 seconds to load the model (Realistic Vision) into the pipeline with `StableDiffusionPipeline.from_pretrained` from a RunPod network volume. Is this normal? Is the load time mostly a function of (possibly slow) transfer speed between the serverless instance and the network volume?
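
For reference, the load in question looks roughly like this, with a timing wrapper around it (the volume path is just an example, not my actual setup):

```python
import time

import torch
from diffusers import StableDiffusionPipeline

MODEL_DIR = "/runpod-volume/realistic-vision"  # example path on the network volume

start = time.time()
# fp16 weights roughly halve the bytes read off the network volume,
# which seems to be the bottleneck rather than CPU/GPU work.
pipe = StableDiffusionPipeline.from_pretrained(
    MODEL_DIR,
    torch_dtype=torch.float16,
    local_files_only=True,  # skip any Hugging Face Hub lookups
)
pipe = pipe.to("cuda")
print(f"Load took {time.time() - start:.1f}s")
```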

The problem is that I'm loading other models as well, so even if I keep the endpoint active there is still a big delay before inference for a job can even begin, and then of course there's the time for inference itself. The total time is too long to provide a good customer experience.
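
For concreteness, here's a rough sketch of the handler structure I have in mind, where each worker loads all the pipelines once at import time so warm requests reuse them and only cold starts pay the load cost (model names and paths below are placeholders):

```python
import torch
import runpod
from diffusers import StableDiffusionPipeline

# Placeholder paths on the attached network volume.
MODEL_PATHS = {
    "realistic-vision": "/runpod-volume/realistic-vision",
    "other-model": "/runpod-volume/other-model",
}

# Built once per worker at import time; warm requests skip the ~12 s loads.
PIPELINES = {
    name: StableDiffusionPipeline.from_pretrained(
        path, torch_dtype=torch.float16, local_files_only=True
    ).to("cuda")
    for name, path in MODEL_PATHS.items()
}

def handler(job):
    params = job["input"]
    pipe = PIPELINES[params.get("model", "realistic-vision")]
    image = pipe(params["prompt"]).images[0]
    image.save("/tmp/out.png")  # placeholder: return/upload however the endpoint expects
    return {"image_path": "/tmp/out.png"}

runpod.serverless.start({"handler": handler})
```

Even with that, every cold start still pays the full multi-model load, which is where the 12 seconds per model hurts.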

I love the easy scaling and cost control of the serverless approach, but if I can't improve the load speed I may have to go a different route.

Any input on other people's experience and ways to improve model loading time would be greatly appreciated!