Llama 3.1 via Ollama
You can now follow the tutorial on running Ollama in serverless environments (https://docs.runpod.io/tutorials/serverless/cpu/run-ollama-inference) to serve Llama 3.1.
We have tested this with Llama 3.1 8B, using a network volume and a 24 GB GPU PRO. Please let us know if this setup also works with other weights and GPUs.

Learn how to set up and run an Ollama server on a RunPod CPU instance for inference with this step-by-step tutorial.
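
Once the endpoint from the tutorial is deployed, a request to it might look like the sketch below. The endpoint ID, the API key variable, and the exact shape of the `input` payload are assumptions here, since they depend on how the handler from the tutorial forwards requests to Ollama, so adjust them to match your deployment.

```python
# Minimal sketch: send a Llama 3.1 prompt to a RunPod serverless endpoint
# that runs Ollama. ENDPOINT_ID is a placeholder; the inner "input" fields
# are assumptions that must match the handler built in the tutorial.
import os
import requests

ENDPOINT_ID = "your-endpoint-id"            # placeholder: your serverless endpoint ID
API_KEY = os.environ["RUNPOD_API_KEY"]      # your RunPod API key, read from the environment

# /runsync waits for the worker to finish and returns the result directly.
response = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {
            # Assumed payload shape: the handler passes these fields to Ollama's
            # generate endpoint. Check the tutorial's handler for the exact schema.
            "model": "llama3.1",
            "prompt": "Why is the sky blue?",
        }
    },
    timeout=300,
)
response.raise_for_status()
print(response.json())
```

If you are testing against a pod where Ollama listens locally instead of a serverless endpoint, the same prompt can be sent straight to Ollama's own API at `http://localhost:11434/api/generate` with a JSON body containing `model` and `prompt`.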



