Serverless Inference
Hi, I have been using RunPod to train my model, and I am very interested in using serverless computing to deploy it. I have successfully created a Docker image that loads the model and exposes an inference endpoint function. However, the model is rather large, and I am curious whether there is a way to keep the model in RAM so it does not have to be reloaded every time the container is stopped and restarted. If not, could anyone recommend another resource for model deployment? Would a traditional server be a better option here?
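
For context, here is roughly what my handler looks like. This is only a minimal sketch assuming the RunPod Python serverless SDK; the model path and the inference call are placeholders, not my actual code:

```python
# handler.py -- minimal sketch of the current setup (model path and
# inference details are hypothetical placeholders)
import runpod
import torch

# The model is loaded at module scope, so a warm worker only loads it once,
# but every cold start of the container still pays the full load time.
MODEL_PATH = "/models/my_model.pt"  # placeholder path
model = torch.load(MODEL_PATH, map_location="cuda")
model.eval()


def handler(event):
    """Inference endpoint: reads the job input and returns predictions."""
    data = event["input"]["data"]
    with torch.no_grad():
        output = model(torch.tensor(data, device="cuda"))
    return {"output": output.tolist()}


# Register the handler with the RunPod serverless worker.
runpod.serverless.start({"handler": handler})
```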
