Slow model loading
Hi all. I have a serverless endpoint designed to run Stable Diffusion inference. It's taking about 12 seconds to load the model (Realistic Vision) into the pipeline (using "StableDiffusionPipeline.from_pretrained") from a RunPod network volume. Is this normal? Is the load time mostly a function of (possibly slow) communication speed between the serverless instance and the network volume?
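For context, here's a minimal sketch of how the load looks in my handler (the path below is just a placeholder, and the fp16/safetensors options are things I've seen suggested for reducing load time, not necessarily what I'm running today):

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder path to the Realistic Vision weights on the RunPod network volume
MODEL_DIR = "/runpod-volume/models/realistic-vision"

# torch_dtype=torch.float16 and use_safetensors=True are assumptions here:
# half-precision safetensors checkpoints are smaller on disk, so pulling them
# across the network volume should take less time than full fp32 .bin files.
pipe = StableDiffusionPipeline.from_pretrained(
    MODEL_DIR,
    torch_dtype=torch.float16,
    use_safetensors=True,
)
pipe = pipe.to("cuda")
```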
The problem is that I'm loading other models as well, so even if I keep the endpoint active, there's still a big delay before inference for a job can even begin, and then there's the inference time on top of that. The total time is too long to provide a good customer experience.
I love the idea of easy scaling and cost control with the serverless approach, but if I can't improve the load times I may have to go a different route.
Any input on other people's experience and ways to improve model loading time would be greatly appreciated!
