Hi, I have been using RunPod to train my model and am very interested in using serverless computing to deploy it. I have successfully created a Docker image that loads the model and exposes an inference endpoint function. However, the model is rather large, and I am curious whether there is a way to hold the model in RAM so it doesn't have to be reloaded every time the container is stopped and restarted. If not, could anyone recommend another resource for model deployment? Would a traditional server be a better option here?
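For context, the usual pattern in serverless handlers is to load the model at module scope (or behind a cache) so the expensive load happens once per warm worker process, and only cold starts pay the full cost. Below is a minimal, framework-agnostic sketch of that pattern; `get_model` and the dictionary "model" are placeholders, not a real RunPod API, and you would swap in your actual loading call (e.g. a `torch.load` or a transformers pipeline):

```python
import functools

@functools.lru_cache(maxsize=1)
def get_model():
    # Stand-in for an expensive model load (e.g. torch.load(...)).
    # lru_cache ensures this body runs once per worker process;
    # subsequent calls return the same in-memory object.
    return {"weights": "loaded"}

def handler(event):
    # On a warm worker this hits the in-memory cache instantly.
    model = get_model()
    # Run inference with `model` on event["input"] here.
    return {"status": "ok"}
```

This keeps the model in RAM for as long as the worker stays alive; it cannot survive the container being fully stopped, so the remaining knob is how long the platform keeps idle workers warm.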