LocalAI Deployment
Hello RunPod Team, I'm considering your platform for deploying an AI model and have some questions.
My project involves using LocalAI (https://localai.io/, https://github.com/mudler/LocalAI), and it's crucial for the deployed model to support JSON-formatted responses; this is the main reason I chose LocalAI.
Could you guide me on how to set up this functionality on your platform?
Is there a feature on RunPod that allows the server or model to automatically shut down or enter a low-resource state if it doesn't receive requests for a certain period, say 15 minutes? This is to optimize costs when the model is not in use.
Thank you!
Documentation for LocalAI
Solution
What you are looking for is RunPod Serverless. You can read their documentation, but the TL;DR is: use an official RunPod template as a base, then build on it with your own handler.py. That also answers your shutdown question: serverless workers scale down to zero when there are no requests (the idle timeout is configurable on the endpoint), so you only pay while requests are being processed.
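To illustrate, here is a minimal handler.py sketch. It assumes your container's entrypoint has already started a LocalAI server on its default port 8080 inside the worker, and that "my-model" and the OpenAI-style response_format option match your LocalAI version and model config; treat those as placeholders to adapt:

    import requests
    import runpod

    # Assumption: LocalAI is launched by the container entrypoint and
    # listens on its default port 8080 inside the same worker.
    LOCALAI_URL = "http://127.0.0.1:8080/v1/chat/completions"

    def handler(job):
        prompt = job["input"]["prompt"]
        resp = requests.post(
            LOCALAI_URL,
            json={
                "model": "my-model",  # placeholder: the model baked into your image
                "messages": [{"role": "user", "content": prompt}],
                # Ask for JSON output via the OpenAI-style response_format
                # field; support depends on your LocalAI version and model.
                "response_format": {"type": "json_object"},
            },
            timeout=300,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    # Hand the function to the RunPod serverless runtime.
    runpod.serverless.start({"handler": handler})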
You must be able to build a Docker image. Bake whatever model you want into the image so it isn't re-downloaded every time a worker starts.
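A rough Dockerfile sketch of that idea; the base image tag, the /models path, and the start.sh script are assumptions to verify against the LocalAI and RunPod docs:

    # Sketch only: verify the image tag and model path against the LocalAI docs.
    FROM quay.io/go-skynet/local-ai:latest

    # Bake the model files into the image so workers don't download them at runtime.
    COPY models/ /models

    # Install the RunPod SDK and add the serverless handler
    # (assumes the base image ships Python/pip; otherwise install them first).
    RUN pip install runpod requests
    COPY handler.py /handler.py

    # start.sh is your own script: it launches LocalAI in the background,
    # waits until it is ready, then runs `python /handler.py`.
    COPY start.sh /start.sh
    CMD ["/bin/sh", "/start.sh"]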
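Once the endpoint is deployed, you call it through RunPod's HTTP API. A quick sketch, with the endpoint ID and API key as placeholders you get from the RunPod console:

    import os
    import requests

    ENDPOINT_ID = "your-endpoint-id"  # placeholder: shown in the RunPod console
    url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
    headers = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

    # The payload arrives as job["input"] inside handler.py.
    payload = {"input": {"prompt": "Return a JSON object with fields name and age."}}

    r = requests.post(url, headers=headers, json=payload, timeout=300)
    print(r.json())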