Ollama Tutorial for Beginners — From Zero to Chatting with AI
2026/04/10
Beginner · 15 min

A hands-on beginner tutorial for Ollama. Learn to install, run models, use system prompts, switch between models, and tap into the API for your own projects.

Ollama is the fastest way to start running AI models on your own computer. This tutorial goes beyond basic installation — you'll learn how to have better conversations, use system prompts, switch models, and use the API.

Prerequisites

  • 8 GB RAM minimum (16 GB recommended)
  • macOS, Windows, or Linux
  • Basic terminal familiarity

Install Ollama

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows — download the installer from https://ollama.com

Verify the installation:

ollama --version

Run Your First Model

ollama run llama3.2

Ollama downloads the model (about 2 GB) and starts an interactive chat session. Type your message and press Enter. The AI responds directly in your terminal.

To exit the chat, type /bye or press Ctrl+D.

Conversation Tips

Be Specific

Bad:

Help me with code

Good:

Write a Python function that reads a CSV file and returns the
rows where the "price" column is greater than 100. Include error
handling for missing files.

Ask for Formats

List the top 5 benefits of running AI locally.
Format as a numbered list with one sentence each.

Iterate

Now make it more concise.
Rewrite that for a non-technical audience.

Ollama remembers your conversation context within the same session.
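Inside a session, slash commands control the chat itself. A few useful ones (availability can vary by Ollama version — run /? to see what your build supports):

```
/show info    # print details about the current model
/clear        # wipe the conversation context and start fresh
/save mychat  # save the current session as a new model
/bye          # exit the session
```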

System Prompts

System prompts set the AI's behavior for the entire conversation. They're powerful for customizing output.

Create a file called Modelfile:

cat > Modelfile << 'EOF'
FROM llama3.2

SYSTEM """
You are a concise technical writer. Always respond in bullet points.
Keep answers under 100 words. Use simple language.
"""
EOF

ollama create tech-writer -f Modelfile
ollama run tech-writer

Now every response from this model follows your system prompt rules.
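Modelfiles can also tune sampling behavior alongside the system prompt via PARAMETER directives. A sketch (the values here are illustrative, not recommendations):

```
FROM llama3.2

PARAMETER temperature 0.3
PARAMETER num_ctx 4096

SYSTEM """
You are a precise code reviewer. Point out bugs before style issues.
"""
```

Lower temperature makes output more deterministic; num_ctx sets the context window size in tokens.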

Switching Between Models

You don't need to stop Ollama to switch models. In a chat session, load a different model with the /load command:

>>> /load qwen2.5:7b

Or start a new session with a different model:

ollama run qwen2.5:7b

Popular models to try:

| Model | Command | Size | Best For |
|---|---|---|---|
| Llama 3.2 | ollama run llama3.2 | 2 GB | General tasks, fast |
| Llama 3.1 8B | ollama run llama3.1 | 4.9 GB | General chat, coding |
| Qwen 2.5 7B | ollama run qwen2.5:7b | 4.7 GB | Coding, multilingual |
| Mistral 7B | ollama run mistral:7b | 4.4 GB | Conversation |
| DeepSeek R1 | ollama run deepseek-r1:8b | 4.9 GB | Reasoning, math |

Run ollama list to see the models installed on your machine, or browse the full catalog at ollama.com/library.

Managing Models

# List installed models
ollama list

# Pull a model without running it
ollama pull llama3.1

# Delete a model to free space
ollama rm llama3.2

# Get info about a model
ollama show llama3.1

Using the API

Ollama runs an API server automatically. This lets you build applications that use local AI.

The server listens on http://localhost:11434 by default and usually starts on its own with the desktop app. If it isn't running, start it manually:

ollama serve

Make a request:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain quantum computing in one paragraph",
  "stream": false
}'

Chat-style API:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is local AI?"}
  ],
  "stream": false
}'

The API is also OpenAI-compatible: point most OpenAI client libraries at http://localhost:11434/v1 and they work with Ollama unchanged.
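If you'd rather skip client libraries entirely, Python's standard library is enough to call the chat endpoint directly. A minimal sketch — the endpoint and response shape match the curl examples above; build_payload and ask are illustrative names, not part of any SDK:

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # default Ollama endpoint

def build_payload(model, user_msg, system_msg="You are a helpful assistant."):
    """Build a non-streaming request body for Ollama's /api/chat."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
        "stream": False,
    }

def ask(model, user_msg):
    """Send one chat turn to a local Ollama server; return the reply text."""
    data = json.dumps(build_payload(model, user_msg)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # non-streaming responses put the reply under message.content
        return json.load(resp)["message"]["content"]

# Inspect the request body without needing a running server:
print(json.dumps(build_payload("llama3.2", "What is local AI?"), indent=2))
```

With Ollama running, ask("llama3.2", "What is local AI?") returns the assistant's reply as a string.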

Adding a Web Interface

Prefer a graphical interface? Open WebUI adds a ChatGPT-like web interface on top of Ollama:

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

Then open http://localhost:3000 in your browser.

Performance Tips

  • Close other apps to free RAM for the model
  • Use smaller models (like Llama 3.2) for quick tasks
  • Apple M-series Macs get the best performance with Metal acceleration
  • NVIDIA GPUs are auto-detected and used for acceleration
  • First response is slower — the model loads into memory, then subsequent responses are fast

Summary

You now know how to:

  • Install and run Ollama
  • Have effective conversations with AI models
  • Create custom models with system prompts
  • Switch between models for different tasks
  • Use the API in your own applications

Next Steps

  • Best Models for 8GB RAM — detailed model recommendations
  • Ollama vs LM Studio — compare with a GUI alternative
  • Deploy Ollama on Runpod — run bigger models on cloud GPU
