How to Install Ollama on Mac, Windows, and Linux
Step-by-step guide to installing Ollama on macOS, Windows, or Linux and running your first AI model locally in under five minutes — no GPU required.
Ollama is one of the easiest ways to run AI models on your own computer. This guide covers installation on all major platforms.
Prerequisites
Before you start, make sure you have:
- 8 GB RAM minimum (16 GB recommended)
- 10 GB free disk space
- A modern operating system (macOS 12+, Windows 10+, or Linux)
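If you are unsure whether your machine meets these requirements, you can check from a quick Python script. This is a sketch, not an official sizing tool: the 0.6 GB-per-billion-parameters figure is a rough rule of thumb for 4-bit quantized models, and the helper names are my own.

```python
import os
import shutil

def system_ram_gb():
    """Total physical RAM in GB, via POSIX sysconf (Linux/macOS)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9

def free_disk_gb(path="~"):
    """Free disk space in GB at the given path."""
    return shutil.disk_usage(os.path.expanduser(path)).free / 1e9

def approx_model_ram_gb(params_billion):
    """Very rough RAM estimate for a 4-bit quantized model:
    ~0.6 GB per billion parameters plus ~1 GB of overhead.
    (Rule of thumb, not an official Ollama figure.)"""
    return params_billion * 0.6 + 1.0

print(f"RAM: {system_ram_gb():.1f} GB, free disk: {free_disk_gb():.1f} GB")
print(f"A 7B model needs roughly {approx_model_ram_gb(7):.1f} GB of RAM")
```

By this estimate a 7B model wants around 5 GB of RAM, which is why 8 GB is the practical minimum.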
Installation
macOS
The easiest way to install Ollama on Mac:
- Download Ollama from ollama.com/download
- Open the downloaded .zip file
- Drag Ollama to your Applications folder
- Launch Ollama
Or via Homebrew:
brew install ollama

Linux
Install with one command:
curl -fsSL https://ollama.com/install.sh | sh

For specific distributions, check the official docs.
Windows
- Download Ollama from ollama.com/download
- Run the installer
- Follow the setup wizard
Running Your First Model
Once Ollama is installed, open your terminal and run:
ollama run llama3.2

This will download the default Llama 3.2 model (3B parameters, roughly 2 GB) and start a chat session. The first run takes a few minutes while the model downloads.
Popular Models to Try
Here are some great models to start with, ordered by size:
Small Models (4-8 GB RAM)
# Great for basic tasks
ollama run llama3.2:3b
# Excellent for coding
ollama run qwen2.5-coder:7b
# Best reasoning at this size
ollama run deepseek-r1:8b

Medium Models (16 GB RAM)
# Great for multilingual
ollama run qwen2.5:14b

Large Models (48 GB+ RAM)
# Best all-rounder
ollama run llama3.3:70b

Using the API Server
Ollama automatically starts a local API server at http://localhost:11434. Its native API lives under /api, and it also exposes OpenAI-compatible endpoints under /v1, so most OpenAI client libraries work if you just change the base URL. A chat request looks like this:
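From code, you can hit the native chat endpoint with any HTTP client. Here is a minimal Python sketch using only the standard library; it assumes the default server address, that you have already pulled llama3.2, and the helper names (`build_chat_payload`, `chat`) are my own, not part of Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default native chat endpoint

def build_chat_payload(model, prompt):
    """Build the JSON body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a stream
    }

def chat(model, prompt):
    """Send one user message and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# chat("llama3.2", "Hello, how are you?")  # requires the server to be running
```

The same request with curl: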
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{ "role": "user", "content": "Hello, how are you?" }
]
}'

Common Issues
"Out of Memory" Error
If you get an OOM error:
- Try a smaller model (e.g., llama3.2:3b instead of llama3.2)
- Close other applications to free up RAM
- Check our models for 8GB RAM guide
Slow Inference
If responses are slow:
- Make sure GPU acceleration is enabled
- On Mac: Activity Monitor → check if "GPU" is being used
- On Linux: ensure NVIDIA drivers are up to date
Managing Models
# List downloaded models
ollama list
# Delete a model
ollama rm llama3.2
# Update a model
ollama pull llama3.2

What's Next?
- Install LM Studio for a GUI alternative
- Compare Ollama vs LM Studio
- Deploy Ollama on Runpod for cloud GPU access