eldoo7100
eldoo7100•5mo ago

LocalAI Deployment

Hello RunPod Team, I'm considering your platform for deploying an AI model and have some questions. My project uses LocalAI (https://localai.io/, https://github.com/mudler/LocalAI), and it's crucial for the deployed model to support JSON-formatted responses; that is the main reason I chose LocalAI. Could you guide me on how to set up this functionality on your platform?

Also, is there a feature on RunPod that allows the server or the LLM model to automatically shut down or enter a low-resource state if it doesn't receive requests for a certain period, say 15 minutes? This is to optimize costs when the model is not in use. Thank you!
5 Replies
Solution
justin
justin•5mo ago
What you're looking for is RunPod serverless. You can read their documentation, but the TL;DR is: you can use a RunPod official template as a base, then build on it to have your own handler.py. You must be able to build a Docker image. Build whatever model you want into the Docker image so it isn't constantly downloaded at runtime.
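(For readers landing here: a minimal sketch of what such a handler.py might look like, assuming the official `runpod` Python SDK; the echo logic is a placeholder for your actual model inference, not anything from this thread.)

```python
# Minimal RunPod serverless handler sketch (handler.py).
import runpod

def handler(job):
    job_input = job["input"]              # payload sent with the request
    prompt = job_input.get("prompt", "")
    # ... run inference against the model baked into the image here ...
    return {"output": f"echo: {prompt}"}

if __name__ == "__main__":
    # Serverless workers spin up per request and scale to zero when idle,
    # which covers the "shut down when unused for 15 minutes" requirement.
    runpod.serverless.start({"handler": handler})
```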
justin
justin•5mo ago
https://github.com/justinwlin/runpodWhisperx/blob/master/Dockerfile This one isn't using a RunPod image as a base, but you can get the idea from it.
justin
justin•5mo ago
This is me doing another one, divided into two. One is for a persistent GPU pod service RunPod has, so I can debug with the baked-in model; the other is for serverless.

GPU pod:

```dockerfile
# Use the updated base CUDA image
FROM runpod/pytorch:2.1.1-py3.10-cuda12.1.1-devel-ubuntu22.04

WORKDIR /app

# Best practices for minimizing layer size and avoiding cache issues
RUN apt-get update && \
    apt-get install -y --no-install-recommends ffmpeg && \
    rm -rf /var/lib/apt/lists/* && \
    pip install --no-cache-dir torch==2.1.2 torchvision torchaudio xformers \
        audiocraft firebase-rest-api==1.11.0 noisereduce==3.0.0 runpod

COPY preloadModel.py /app/preloadModel.py
COPY handler.py /app/handler.py
COPY firebase_credentials.json /app/firebase_credentials.json
COPY suprepo /app/suprepo

# Bake the model into the image so it isn't downloaded at runtime
RUN python /app/preloadModel.py
```

Then this is the serverless one:

```dockerfile
# Use the GPU pod image above as the base
FROM justinwlin/audiocraft_runpod_gpu:1.0

WORKDIR /app
COPY handler.py /app/handler.py

# Set stop signal and CMD
STOPSIGNAL SIGINT
CMD ["python", "-u", "handler.py"]
```

If you want to, you can build and test your Docker image locally before ever purchasing RunPod credit, to make sure your template works as expected. RunPod has a "test locally" section in the docs.
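(As a rough illustration of that local testing step, assuming the placeholder handler.py sketched earlier in this thread: you can smoke-test the handler function directly, with no RunPod deployment at all.)

```python
# Local smoke test for the handler, run on your own machine.
# `handler` is the hypothetical function from the earlier sketch.
from handler import handler

fake_job = {"input": {"prompt": "hello"}}
print(handler(fake_job))  # expected: {'output': 'echo: hello'}
```

The `runpod` SDK also ships its own local test mode for handler files; the exact invocation is in the "test locally" docs justin mentions.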
eldoo7100
eldoo7100•5mo ago
Oh, I think I get it: I need to build a Docker image which will run the API, it should be built with a model I choose, and then the handler will simply make calls to the API. I was thinking of using LocalAI for that, because it has built-in support for enforcing a grammar (JSON format). Maybe you can advise me: should I use LocalAI or a different tool you know? Thanks 🙂
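(For context, a hedged sketch of what that "handler calls the API" idea could look like: a request from handler.py to a LocalAI server running in the same container. This assumes LocalAI's default port 8080 and a backend that honors OpenAI-style JSON mode; the model name is a placeholder, not something from this thread.)

```python
# Hypothetical request from handler.py to a co-located LocalAI server.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "my-model",  # placeholder for your configured model name
        "messages": [
            {"role": "user", "content": "Describe a cat as a JSON object."}
        ],
        # Ask for JSON-constrained output (the reason LocalAI was chosen)
        "response_format": {"type": "json_object"},
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```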
justin
justin•5mo ago
Really depends what you wanna do; if you have a specific model, usually they have instructions on how to run it. @eldoo7100 So my recommendation, if you want:

1) Deposit 10 bucks on RunPod if you want to risk using it (or test locally if you can).
2) Use a GPU pod and start up a PyTorch template, or use your own locally again.
3) Record the steps you need to get your code running, and then build your stuff from that.

That is how I came up with this audiocraft one: by using a RunPod base image from their website, going into the web terminal / Jupyter Lab, and playing around with it (make sure to terminate the pod when done, or else you'll be charged for running pods). Again, all this can be done locally as long as your computer / model / code supports it. I can't say for sure, though, because I don't know what you're doing, and I probably don't have the specific knowledge since I just use RunPod for my own personal projects.