Ollama Tutorial for Beginners — From Zero to Chatting with AI
A hands-on beginner tutorial for Ollama. Learn to install, run models, use system prompts, switch between models, and tap into the API for your own projects.
Ollama is the fastest way to start running AI models on your own computer. This tutorial goes beyond basic installation — you'll learn how to have better conversations, use system prompts, switch models, and use the API.
Prerequisites
- 8 GB RAM minimum (16 GB recommended)
- macOS, Windows, or Linux
- Basic terminal familiarity
Install Ollama
```bash
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
```
On Windows, download the installer from https://ollama.com.

Verify the installation:
```bash
ollama --version
```
Run Your First Model
```bash
ollama run llama3.2
```
Ollama downloads the model (about 2 GB) and starts an interactive chat session. Type your message and press Enter. The AI responds directly in your terminal.
To exit the chat, type /bye or press Ctrl+D.
Conversation Tips
Be Specific
Bad:
```
Help me with code
```
Good:
```
Write a Python function that reads a CSV file and returns the
rows where the "price" column is greater than 100. Include error
handling for missing files.
```
Ask for Formats
```
List the top 5 benefits of running AI locally.
Format as a numbered list with one sentence each.
```
Iterate
```
Now make it more concise.
Rewrite that for a non-technical audience.
```
Ollama remembers your conversation context within the same session, so follow-up prompts like these refine the previous answer.
System Prompts
System prompts set the AI's behavior for the entire conversation. They're powerful for customizing output.
Create a file called Modelfile:
```bash
cat > Modelfile << 'EOF'
FROM llama3.2
SYSTEM """
You are a concise technical writer. Always respond in bullet points.
Keep answers under 100 words. Use simple language.
"""
EOF

ollama create tech-writer -f Modelfile
ollama run tech-writer
```
Now every response from this model follows your system prompt rules.
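If you'd rather not bake the prompt into a custom model, the generate API also accepts a per-request system field that overrides the Modelfile's SYSTEM instruction. A minimal Python sketch, assuming an Ollama server on the default port 11434 (the endpoint and field names come from Ollama's REST API; the payload shape is shown for illustration):

```python
import json
import urllib.request

# Per-request system prompt via Ollama's /api/generate endpoint.
# Assumes an Ollama server is running on the default port 11434.
payload = {
    "model": "llama3.2",
    "system": "You are a concise technical writer. Always respond in "
              "bullet points. Keep answers under 100 words.",
    "prompt": "What is a Modelfile?",
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment with a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

This is handy for experimenting with different personas before committing one to a Modelfile.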
Switching Between Models
You don't need to restart Ollama to switch models; the server keeps running in the background. Exit the current chat with /bye, then start a new session with a different model:
```bash
>>> /bye
ollama run qwen2.5:7b
```
Popular models to try:
| Model | Command | Size | Best For |
|---|---|---|---|
| Llama 3.2 | ollama run llama3.2 | 2 GB | General tasks, fast |
| Llama 3.1 8B | ollama run llama3.1 | 4.9 GB | General chat, coding |
| Qwen 2.5 7B | ollama run qwen2.5:7b | 4.7 GB | Coding, multilingual |
| Mistral 7B | ollama run mistral:7b | 4.4 GB | Conversation |
| DeepSeek R1 | ollama run deepseek-r1:8b | 4.9 GB | Reasoning, math |
See the full list: ollama list (installed models) or browse at ollama.com/library.
Managing Models
```bash
# List installed models
ollama list

# Pull a model without running it
ollama pull llama3.1

# Delete a model to free space
ollama rm llama3.2

# Get info about a model
ollama show llama3.1
```
Using the API
Ollama runs an API server automatically. This lets you build applications that use local AI.
The server usually starts with Ollama itself; if it isn't already running, start it manually:
```bash
ollama serve
```
Make a request:
```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain quantum computing in one paragraph",
  "stream": false
}'
```
Chat-style API:
```bash
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is local AI?"}
  ],
  "stream": false
}'
```
The API is also OpenAI-compatible: point an OpenAI client library at http://localhost:11434/v1 and most of them work with Ollama unchanged.
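Because of that compatibility, you can send a standard OpenAI-style chat request to the /v1/chat/completions endpoint. A stdlib-only sketch, assuming the default port; the placeholder token is required by the OpenAI wire format but ignored by Ollama:

```python
import json
import urllib.request

# OpenAI-style chat request aimed at Ollama's compatibility endpoint.
body = {
    "model": "llama3.2",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is local AI?"},
    ],
}

req = urllib.request.Request(
    "http://localhost:11434/v1/chat/completions",
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer ollama",  # any placeholder token works
    },
)
# With a running server, the reply follows the OpenAI response schema:
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```

With the official openai Python package, the equivalent is creating the client with base_url="http://localhost:11434/v1" and any placeholder API key.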
Adding a Web Interface
Prefer a graphical interface? Open WebUI adds a ChatGPT-like web interface on top of Ollama:
```bash
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```
Then open http://localhost:3000 in your browser.
Performance Tips
- Close other apps to free RAM for the model
- Use smaller models (like Llama 3.2) for quick tasks
- Apple M-series Macs get the best performance with Metal acceleration
- NVIDIA GPUs are auto-detected and used for acceleration
- First response is slower — the model loads into memory, then subsequent responses are fast
Summary
You now know how to:
- Install and run Ollama
- Have effective conversations with AI models
- Create custom models with system prompts
- Switch between models for different tasks
- Use the API in your own applications
Next Steps
- Best Models for 8GB RAM — detailed model recommendations
- Ollama vs LM Studio — compare with a GUI alternative
- Deploy Ollama on Runpod — run bigger models on cloud GPU