Martin

How to load a model into memory before a pod's first run?

In the template worker's handler file, it says:

```python
# If your handler runs inference on a model, load the model here.
# You will want models to be loaded into memory before starting serverless.
```


I am loading my model here.
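
For reference, here is a minimal sketch of the layout I mean, with a Hugging Face text-generation pipeline standing in as a placeholder for my actual model (the model name and framework are just illustrative):

```python
import runpod
from transformers import pipeline

# Placeholder model: in reality this is my own, larger model.
# The load happens at module import time, i.e. when the worker
# process starts, not inside the handler.
model = pipeline("text-generation", model="gpt2")

def handler(job):
    prompt = job["input"]["prompt"]
    # Inference uses the already-loaded model.
    result = model(prompt, max_new_tokens=50)
    return result[0]["generated_text"]

# Start the serverless worker; incoming requests are routed to handler().
runpod.serverless.start({"handler": handler})
```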

But when a new pod starts in my endpoint, its first run systematically takes more than 10 s because it is loading the model.
This results in some requests taking more than 10x longer than the expected latency.

Is there a way to load the model as soon as the new pod is "active"?

Thanks.