RunPod · 3mo ago
bigslam

How to run Ollama on RunPod Serverless?

As the title suggests, I'm trying to find a way to deploy Ollama on RunPod as a serverless application. Thank you.
Solution
justin · 3mo ago
Ollama has a way to override where the models get downloaded. So you essentially create a network volume for serverless; it gets mounted under /runpod-volume. And when your Ollama server starts through a background script on startup, you do whatever you want. Overall it's a bit of a pain.
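A minimal sketch of what that startup script could look like, assuming the network volume is mounted at /runpod-volume and the `ollama` CLI is already in the image (the volume subdirectory and model tag below are just examples):

```python
import os
import subprocess
import time

import requests

# Point Ollama's model store at the network volume (mounted at
# /runpod-volume on serverless) instead of the default ~/.ollama.
os.environ["OLLAMA_MODELS"] = "/runpod-volume/ollama/models"
os.makedirs(os.environ["OLLAMA_MODELS"], exist_ok=True)

# Start the Ollama server in the background (listens on :11434 by default).
subprocess.Popen(["ollama", "serve"])

# Wait until the server answers before talking to it.
for _ in range(30):
    try:
        requests.get("http://localhost:11434")
        break
    except requests.ConnectionError:
        time.sleep(1)

# Pull the model; layers already present on the volume are not re-downloaded.
subprocess.run(["ollama", "pull", "mistral"], check=True)
```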
justin · 3mo ago
I recommend using RunPod's vLLM worker if you're looking for a RunPod-supported method. Alpay can help, as he is a staff member working specifically on it.
justin · 3mo ago
GitHub
GitHub - runpod-workers/worker-vllm: The RunPod worker template for...
The RunPod worker template for serving our large language model endpoints. Powered by vLLM. - runpod-workers/worker-vllm
justin · 3mo ago
Option 1: if you have any specific models in mind, @Alpay Ariyak can help. Or use a community project like the one I built, which has everything baked into the Docker container, avoiding network volumes. Network volumes have some downsides, like being locked into a region, plus I already have Docker images ready to go.
justin · 3mo ago
GitHub
GitHub - justinwlin/Runpod-OpenLLM-Pod-and-Serverless: A repo for O...
A repo for OpenLLM to run pod. Contribute to justinwlin/Runpod-OpenLLM-Pod-and-Serverless development by creating an account on GitHub.
justin · 3mo ago
I even have client-side code examples for mine. These aren't "Ollama", but I assume it would achieve your purpose of running your own LLM, maybe something like Mistral 7B.
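Not his exact examples, but as a rough illustration, calling a RunPod serverless endpoint from the client side with the runpod Python SDK looks something like this (the endpoint ID is a placeholder, and the input schema depends entirely on the worker's handler):

```python
import runpod

runpod.api_key = "YOUR_RUNPOD_API_KEY"  # placeholder

endpoint = runpod.Endpoint("YOUR_ENDPOINT_ID")  # placeholder endpoint ID

# run_sync blocks until the job finishes (or the timeout expires).
result = endpoint.run_sync(
    {"input": {"prompt": "Summarize GGUF quantization in one sentence."}},
    timeout=120,
)
print(result)
```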
bigslam · 3mo ago
I want to run quantized LLMs, @justin, e.g. GGUF.
JJonahJ · 3mo ago
vLLM supports AWQ quantization, but yeah, it would be nice to have other options for text inference. Like, I keep seeing this 'grammars' thing mentioned around the place, but AFAIK vLLM doesn't support that either…
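As a point of reference, loading an AWQ checkpoint with vLLM's offline API looks roughly like this (the model repo name is just an example of an AWQ-quantized build):

```python
from vllm import LLM, SamplingParams

# Any AWQ-quantized checkpoint can go here; this repo name is illustrative.
llm = LLM(model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ", quantization="awq")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What does AWQ quantization trade off?"], params)
print(outputs[0].outputs[0].text)
```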
justin · 3mo ago
I mean, as I said, if you want to run it, just attach a network volume and override where Ollama stores the models so they land on the network drive. The problem with Ollama is that it needs to start a background server and check whether the models are there; if not, it downloads a new copy. So the main thing is just overriding the default check path, so that when your worker starts up, it checks whether the model already exists on the network volume. For some reason, I could never get it to work by manually copying the models into my Docker image. I don't know how their hash checking works, and I wanted it built into my Docker image, so I just moved to using OpenLLM.
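Putting those pieces together, a serverless worker along these lines is one way to do it. This is a sketch under the same assumptions as above, not a tested implementation; the MODEL_NAME default and volume path are placeholders:

```python
import os
import subprocess
import time

import requests
import runpod

MODEL = os.environ.get("MODEL_NAME", "mistral")  # example tag

# Keep models on the network volume so warm workers skip the download.
os.environ["OLLAMA_MODELS"] = "/runpod-volume/ollama/models"
os.makedirs(os.environ["OLLAMA_MODELS"], exist_ok=True)

subprocess.Popen(["ollama", "serve"])  # background server on :11434
for _ in range(30):  # wait until the server is reachable
    try:
        requests.get("http://localhost:11434")
        break
    except requests.ConnectionError:
        time.sleep(1)
subprocess.run(["ollama", "pull", MODEL], check=True)  # fetches only missing layers


def handler(job):
    # Forward the prompt to Ollama's local generate endpoint.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": MODEL, "prompt": job["input"]["prompt"], "stream": False},
        timeout=300,
    )
    return resp.json()["response"]


runpod.serverless.start({"handler": handler})
```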
bigslam · 3mo ago
It's easier to run Ollama on a GPU pod, but I'm trying to save time and want a serverless implementation.
Armyk · 2w ago
Any news on this? Did you manage to run Ollama in serverless? I need to run a GGUF model.
giannisan. · 2w ago
I am wondering the same; I'm having trouble with the serverless config for Ollama.
nerdylive · 2w ago
Why not try it on vLLM? You can make the template yourself; check some implementations of worker handler code on GitHub.
digigoblin · 2w ago
Obviously because vLLM does NOT support GGUF.
nerdylive · 2w ago
Oh, right.
PatrickR · 2w ago
We have a tutorial on this. It's for CPU, but you can run it on GPU too: https://docs.runpod.io/tutorials/serverless/cpu/run-ollama-inference
Run an Ollama Server on a RunPod CPU | RunPod Documentation
Learn to set up and run an Ollama server on RunPod CPU for inference with this step-by-step tutorial.
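Once an endpoint from that tutorial is deployed, invoking it over RunPod's standard serverless HTTP API looks something like this (the endpoint ID is a placeholder, and the exact input shape depends on the worker from the tutorial):

```python
import os

import requests

ENDPOINT_ID = "YOUR_ENDPOINT_ID"  # placeholder
url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"

resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
    json={"input": {"prompt": "Hello from serverless Ollama!"}},
    timeout=300,
)
print(resp.json())
```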
nerdylive · 2w ago
Wow
digigoblin · 2w ago
How come some stuff is in blog posts and some in docs?
nerdylive · 2w ago
Hahah, it's a tutorial, right? In my opinion, stuff like that for specific use cases should be in tutorials.
digigoblin · 2w ago
Well, my point is that some tutorials are blog posts and others are docs. It would be nice to have some level of consistency, so you know where to find things.
nerdylive · 2w ago
What do you mean by "level of consistency"?
digigoblin · 2w ago
Put everything that is a tutorial in the same place, not all over the place. I don't want to search docs, blog posts, etc. to find something. I want to go to one place.
nerdylive · 2w ago
Ohh, I see.
PatrickR · 2w ago
@digigoblin It's a good point! Stuff in tutorials is supported: updates will occur, and customer support can answer questions. Blog posts are kind of like a snapshot in time; they don't always get updated and have less quality control. We have a ticket to go back and turn old blog posts into tutorials.