RunPod
3mo ago
acamp

Ollama on RunPod

Hey all, I am attempting to set up Ollama on an NVIDIA GeForce RTX 4090 pod. The commands for that are pretty straightforward (link to article: https://docs.runpod.io/tutorials/pods/run-ollama#:~:text=Set%20up%20Ollama%20on%20your%20GPU%20Pod%201,4%3A%20Interact%20with%20Ollama%20via%20HTTP%20API%20). All I do is run the following two commands in the pod's web terminal after it starts up, and I'm good to go:
1) (curl -fsSL https://ollama.com/install.sh | sh && ollama serve > ollama.log 2>&1) &
2) ollama run [model_name]
However, what I would like to do is have these commands run automatically when the pod starts. My initial thought was to enter the above two commands into the 'Container Start Command' field on the pod deployment page (as seen in the attached image). I'm not sure how to write these start-up commands and would be grateful for any assistance.
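A minimal sketch of how the two commands above could be folded into one start-up script. The file name start_ollama.sh, the MODEL variable with its gemma default, and the fixed 5-second pause are illustrative assumptions, not something from the RunPod article. The sketch uses ollama pull rather than ollama run so that start-up downloads the model without opening an interactive chat prompt.

```shell
# Write a start-up script to the current directory (file name is an assumption).
cat > start_ollama.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

# Model to download at start-up; "gemma" is only an illustrative default.
MODEL="${MODEL:-gemma}"

# 1) Install Ollama only if it is not already present.
if ! command -v ollama >/dev/null 2>&1; then
  curl -fsSL https://ollama.com/install.sh | sh
fi

# 2) Start the server in the background and keep its output in a log file.
ollama serve > ollama.log 2>&1 &

# 3) Crude pause so the server can start listening (assumed to be long enough).
sleep 5

# 4) Download the model without starting an interactive session.
ollama pull "$MODEL"

# 5) Stay in the foreground for as long as the background server runs.
wait
EOF
chmod +x start_ollama.sh
```

The 'Container Start Command' field would then need to invoke this script through a shell, for example bash /path/to/start_ollama.sh, where the path is wherever the script is stored on the pod. Overriding the start command may replace the template's default start-up steps (such as the web terminal), so that is worth checking against the template being used.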
26 Replies
Madiator2011
3mo ago
why not use ollama docker image instead?
acamp
3mo ago
I was just looking into that. Did you have any resources that may be helpful? I was referring to this link: https://hub.docker.com/r/ollama/ollama#!, but I was wondering if there was an approach more suited to RunPod.
Madiator2011
3mo ago
you want something like this: https://runpod.io/console/deploy?template=q5rqanpolz&ref=vfker49t note that this is an API thingy, not chat via the terminal
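Since the template exposes Ollama's HTTP API rather than a terminal chat, inference is done with HTTP requests. A minimal sketch, assuming the server is reachable at Ollama's default port 11434 and the model has already been pulled. The helper name ollama_generate and the OLLAMA_URL variable are hypothetical; on a pod, OLLAMA_URL would be the pod's proxy address for that port (something like https://<pod-id>-11434.proxy.runpod.net, assuming the port is exposed).

```shell
# Base URL of the Ollama server; 11434 is Ollama's default port.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# ollama_generate MODEL PROMPT
# Sends one non-streaming request to Ollama's /api/generate endpoint.
# Note: the prompt is pasted into the JSON as-is, so it must not contain
# double quotes or backslashes in this simple sketch.
ollama_generate() {
  curl -fsS "$OLLAMA_URL/api/generate" \
    -d "{\"model\": \"$1\", \"prompt\": \"$2\", \"stream\": false}"
}

# Example call (requires a running server, so it is left commented out):
# ollama_generate gemma "Why is the sky blue?"
```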
acamp
3mo ago
Thanks for the link. I went ahead and spun up a pod with the ollama/ollama container image. After the pod starts, would you know how to make inferences with a model (e.g. gemma)?
Madiator2011
3mo ago
you could pass gemma in container command
Madiator2011
3mo ago
like this
[attached image, no description]
acamp
3mo ago
I went ahead and tried "run gemma" (image attached), but I get an error message in the container logs that says: Error: could not connect to ollama app, is it running?
Madiator2011
3mo ago
Delete run
acamp
3mo ago
If I just have "gemma", the error message is: Error: unknown command "gemma" for "ollama"
Madiator2011
3mo ago
try gemma:7b
acamp
3mo ago
It seems to be returning the same error.
Madiator2011
3mo ago
maybe try setting the image to ollama/ollama:latest
acamp
3mo ago
Tried this, and the error seems to be the same. It looks like I just have to run two commands, "serve" and "run gemma", after which I should be able to make inferences with gemma, but I'm not sure how to implement that. Thank you for all the support so far, but are there any other fixes I could implement to get this to work?
Madiator2011
3mo ago
what if you put serve run gemma?
acamp
3mo ago
It returns the following error: Error: accepts 0 arg(s), received 2. It looks like the Container Start Command can only take one command.
Madiator2011
3mo ago
the docker container runs the serve command first
acamp
3mo ago
Yes, but in this case I think it's trying to run the command "serve" along with "run" and "gemma" as the arguments.
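The behaviour described here is consistent with how Docker combines an image's ENTRYPOINT with the start command. A minimal illustration, assuming the ollama/ollama image uses /bin/ollama as its entrypoint with serve as the default command, and that the Container Start Command only replaces that default command. The helper name effective_command is hypothetical; the snippet only prints the command lines and does not run Ollama.

```shell
# Assumption: the image's ENTRYPOINT is /bin/ollama and its default CMD is "serve".
entrypoint="/bin/ollama"

# effective_command START_CMD
# Prints the command line that results when the start command is appended
# to the entrypoint as arguments.
effective_command() {
  printf '%s %s\n' "$entrypoint" "$1"
}

# The start commands tried in this thread.
for start_cmd in "serve" "run gemma" "gemma" "serve run gemma"; do
  effective_command "$start_cmd"
done
```

Read this way, "run gemma" runs before any server exists, "gemma" is not an ollama subcommand, and "serve run gemma" hands two unexpected arguments to serve, which lines up with the three errors reported above.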
Madiator2011
3mo ago
yes
acamp
2mo ago
@justin [Not Staff] Hey Justin, I noticed that you were able to provide some valuable advice to other users regarding Ollama on RunPod, so I was hoping to reach out to you regarding this thread, which I have yet to debug.
justin
2mo ago
No, you hit one of the problems I have with Ollama. You need a server running in the background before you can run the ollama run command. I've tried to automate this in the past by adding a simple start.sh script and so on, but I couldn't get it working with the pod. I could get something basic working with serverless, but it still "redownloads" the model every time. Idk, something about their hashing algorithm.
I ended up using OpenLLM instead: https://github.com/bentoml/OpenLLM
In my Dockerfile, I just run this preload.py script, which basically does everything I need it to do: https://github.com/justinwlin/Runpod-OpenLLM-Pod-and-Serverless/blob/main/preload.py
Maybe you can play around with my repo in Pod mode to see if you can get it working with Gemma.
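The ordering problem described here (the server has to be accepting connections before ollama run or ollama pull can work) is commonly handled with a readiness loop. A minimal sketch of that idea, not verified on a RunPod pod. The helper name wait_for is hypothetical, and using ollama list as the probe is an assumption based on it failing while the server is unreachable.

```shell
# wait_for MAX_ATTEMPTS COMMAND [ARGS...]
# Retries COMMAND once per second until it succeeds.
# Returns 0 on success, 1 if MAX_ATTEMPTS is reached first.
wait_for() {
  max_attempts="$1"
  shift
  attempt=1
  until "$@" >/dev/null 2>&1; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  return 0
}

# Intended use inside a start script (commented out because it needs Ollama):
# ollama serve > /tmp/ollama.log 2>&1 &
# wait_for 30 ollama list && ollama pull gemma
# wait   # keep the script alive while the background server runs
```

Polling avoids guessing a fixed sleep duration, at the cost of a slightly longer script.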
justin
2mo ago
but I've included instructions for llama / mistral7b
Madiator2011
2mo ago
@justin [Not Staff] @acamp #Open WebUI (Formerly Ollama WebUI) is something you might like; it has ollama running in the background 🙂
justin
2mo ago
Oo, is there a repo for the Dockerfile? I'd love to see how it works
Madiator2011
2mo ago
GitHub
GitHub - open-webui/open-webui: User-friendly WebUI for LLMs (Forme...
User-friendly WebUI for LLMs (Formerly Ollama WebUI) - open-webui/open-webui
Madiator2011
2mo ago
btw you can use service instead of runpodctl in pods: create a service file and start it with service <name> start
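One reading of this suggestion is a SysV-style init script that the service command can start. A minimal sketch under that assumption; the file name ollama.init, the PID and log file paths, and the service name ollama are all illustrative, and this is not a hardened service definition.

```shell
# Write the init script to the current directory (file name is an assumption).
cat > ollama.init <<'EOF'
#!/bin/sh
# Minimal SysV-style init script for "ollama serve" (illustrative paths).
PIDFILE=/var/run/ollama.pid
LOGFILE=/var/log/ollama.log

case "$1" in
  start)
    # Run the server in the background and remember its PID.
    nohup ollama serve > "$LOGFILE" 2>&1 &
    echo $! > "$PIDFILE"
    ;;
  stop)
    # Stop the server recorded in the PID file, if any.
    [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
    ;;
  status)
    # Exit status 0 only if the recorded process is still alive.
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
    ;;
  *)
    echo "Usage: $0 {start|stop|status}" >&2
    exit 1
    ;;
esac
EOF
chmod +x ollama.init
```

Copied to /etc/init.d/ollama on a Debian- or Ubuntu-based pod, this would be started with service ollama start. Whether that fits a given template depends on the base image providing the service command.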
acamp
2mo ago
@Papa Madiator and @justin [Not Staff] Thank you both for the assistance and resources! Would you happen to know if it's possible to set up llama3 on open-webui and make inferences to it using an API? I was not able to find specific instructions on how to set up an LLM on open-webui.