Ollama Tutorial for Beginners — From Zero to Chatting with AI
A hands-on beginner tutorial for Ollama. Learn to install, run models, use system prompts, switch between models, and tap into the API for your own projects.
Ollama is the fastest way to start running AI models on your own computer. This tutorial goes beyond basic installation — you'll learn how to have better conversations, use system prompts, switch models, and use the API.
Prerequisites
- 8 GB RAM minimum (16 GB recommended)
- macOS, Windows, or Linux
- Basic terminal familiarity
Install Ollama
```bash
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
```
On Windows, download the installer from https://ollama.com.

Verify the installation:
```bash
ollama --version
```
Run Your First Model
```bash
ollama run llama3.2
```
Ollama downloads the model (about 2 GB) and starts an interactive chat session. Type your message and press Enter. The AI responds directly in your terminal.
To exit the chat, type /bye or press Ctrl+D.
Conversation Tips
Be Specific
Bad:
```
Help me with code
```
Good:
```
Write a Python function that reads a CSV file and returns the
rows where the "price" column is greater than 100. Include error
handling for missing files.
```
Ask for Formats
```
List the top 5 benefits of running AI locally.
Format as a numbered list with one sentence each.
```
Iterate
```
Now make it more concise.
Rewrite that for a non-technical audience.
```
Ollama remembers your conversation context within the same session, so follow-up prompts like these refine the previous answer.
System Prompts
System prompts set the AI's behavior for the entire conversation. They're powerful for customizing output.
Create a file called Modelfile:
```bash
cat > Modelfile << 'EOF'
FROM llama3.2
SYSTEM """
You are a concise technical writer. Always respond in bullet points.
Keep answers under 100 words. Use simple language.
"""
EOF

ollama create tech-writer -f Modelfile
ollama run tech-writer
```
Now every response from this model follows your system prompt rules.
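If you'd rather not bake the prompt into a custom model, the generate API also accepts a per-request system field that overrides the Modelfile's SYSTEM instruction. A minimal Python sketch, assuming an Ollama server on the default port 11434 (the endpoint and field names come from Ollama's REST API; the payload shape is shown for illustration):

```python
import json
import urllib.request

# Per-request system prompt via Ollama's /api/generate endpoint.
# Assumes an Ollama server is running on the default port 11434.
payload = {
    "model": "llama3.2",
    "system": "You are a concise technical writer. Always respond in "
              "bullet points. Keep answers under 100 words.",
    "prompt": "What is a Modelfile?",
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment with a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

This is handy for experimenting with different personas before committing one to a Modelfile.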
Switching Between Models
You don't need to restart Ollama to switch models; the server keeps running in the background. Exit the current chat with /bye, then start a new session with a different model:
```bash
>>> /bye
ollama run qwen2.5:7b
```
Popular models to try:
| Model | Command | Size | Best For |
|---|---|---|---|
| Llama 3.2 | ollama run llama3.2 | 2 GB | General tasks, fast |
| Llama 3.1 8B | ollama run llama3.1 | 4.9 GB | General chat, coding |
| Qwen 2.5 7B | ollama run qwen2.5:7b | 4.7 GB | Coding, multilingual |
| Mistral 7B | ollama run mistral:7b | 4.4 GB | Conversation |
| DeepSeek R1 | ollama run deepseek-r1:8b | 4.9 GB | Reasoning, math |
See the full list: ollama list (installed models) or browse at ollama.com/library.
Managing Models
```bash
# List installed models
ollama list

# Pull a model without running it
ollama pull llama3.1

# Delete a model to free space
ollama rm llama3.2

# Get info about a model
ollama show llama3.1
```
Using the API
Ollama runs an API server automatically. This lets you build applications that use local AI.
The server usually starts with Ollama itself; if it isn't already running, start it manually:
```bash
ollama serve
```
Make a request:
```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain quantum computing in one paragraph",
  "stream": false
}'
```
Chat-style API:
```bash
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is local AI?"}
  ],
  "stream": false
}'
```
The API is also OpenAI-compatible: point an OpenAI client library at http://localhost:11434/v1 and most of them work with Ollama unchanged.
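Because of that compatibility, you can send a standard OpenAI-style chat request to the /v1/chat/completions endpoint. A stdlib-only sketch, assuming the default port; the placeholder token is required by the OpenAI wire format but ignored by Ollama:

```python
import json
import urllib.request

# OpenAI-style chat request aimed at Ollama's compatibility endpoint.
body = {
    "model": "llama3.2",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is local AI?"},
    ],
}

req = urllib.request.Request(
    "http://localhost:11434/v1/chat/completions",
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer ollama",  # any placeholder token works
    },
)
# With a running server, the reply follows the OpenAI response schema:
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```

With the official openai Python package, the equivalent is creating the client with base_url="http://localhost:11434/v1" and any placeholder API key.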
Adding a Web Interface
Prefer a graphical interface? Open WebUI adds a ChatGPT-like web interface on top of Ollama:
```bash
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```
Then open http://localhost:3000 in your browser.
Performance Tips
- Close other apps to free RAM for the model
- Use smaller models (like Llama 3.2) for quick tasks
- Apple M-series Macs get the best performance with Metal acceleration
- NVIDIA GPUs are auto-detected and used for acceleration
- First response is slower — the model loads into memory, then subsequent responses are fast
Summary
You now know how to:
- Install and run Ollama
- Have effective conversations with AI models
- Create custom models with system prompts
- Switch between models for different tasks
- Use the API in your own applications
Next Steps
- Best Models for 8GB RAM — detailed model recommendations
- Ollama vs LM Studio — compare with a GUI alternative
- Deploy Ollama on Runpod — run bigger models on cloud GPU