Run Open WebUI on Runpod — Cloud ChatGPT in 10 Minutes
Deploy Open WebUI with Ollama on Runpod for a private, ChatGPT-like experience on cloud GPU. Access your AI assistant from any device with a web browser.
Want a ChatGPT-like experience running on your own cloud GPU? Open WebUI on Runpod gives you a beautiful browser interface with full model control, document chat, and multi-user support — all private and self-hosted.
What You'll Build
- Open WebUI accessible from any browser
- Powered by Ollama on a cloud GPU
- Persistent storage for models and conversations
- Multi-user accounts (optional)
Step 1: Create a Network Volume
- Go to Storage → Network Volumes in Runpod
- Click Add Network Volume
- Size: 50 GB
- Data Center: Remember which one (must match your GPU)
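As a sanity check on the 50 GB figure, here is a rough tally of the Q4-quantized models pulled later in this guide (the per-model sizes are approximate download sizes, not exact):

```shell
# Approximate on-disk sizes (GB) of the models pulled in Step 3 (rough, Q4 quantized)
awk 'BEGIN {
  llama = 4.7; qwen = 4.7; deepseek = 4.9   # approximate GB each
  total = llama + qwen + deepseek
  printf "models: %.1f GB total, leaving ample headroom in a 50 GB volume\n", total
}'
```

Even with all three models plus conversation data, 50 GB leaves room to experiment with larger models later.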
Step 2: Deploy a GPU Instance with Ollama
- Go to GPU Cloud → Deploy
- Choose a GPU (RTX 4090 recommended for best value)
- Select the same data center as your volume
- Use the Ollama community template
- Attach your network volume at /workspace
- Deploy and wait for it to start
Step 3: Connect and Prepare Ollama
Connect to the pod's terminal (the web terminal under Connect, or SSH):
```shell
# Set persistent model storage
export OLLAMA_MODELS=/workspace/ollama/models
mkdir -p /workspace/ollama/models

# Stop the default service and restart with the correct config
sudo systemctl stop ollama 2>/dev/null || true
OLLAMA_MODELS=/workspace/ollama/models OLLAMA_HOST=0.0.0.0:11434 ollama serve > /workspace/ollama.log 2>&1 &

# Download your preferred models
sleep 5
ollama pull llama3.1:8b
ollama pull qwen2.5:7b
ollama pull deepseek-r1:8b
```

Step 4: Deploy Open WebUI
Run Open WebUI in Docker on the same instance:
```shell
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v /workspace/open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```

This connects Open WebUI to your local Ollama instance and stores its data persistently. The `--add-host` flag maps `host.docker.internal` to the host gateway so the container can reach Ollama on a Linux host, where that hostname is not resolved automatically.
Step 5: Access Open WebUI
- Go to your instance settings
- Expose port 3000
- Open the proxy URL in your browser:
https://your-pod-id-3000.proxy.runpod.net
Alternatively, use the Connect button on your pod, which lists the HTTP service link once port 3000 is exposed.
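The proxy URL follows a simple pattern: your pod ID, then the exposed port, joined with a hyphen. A quick sketch with a made-up pod ID:

```shell
# Construct the Runpod proxy URL for an exposed port (the pod ID here is hypothetical)
POD_ID="abc123xyz"
PORT=3000
echo "https://${POD_ID}-${PORT}.proxy.runpod.net"
# → https://abc123xyz-3000.proxy.runpod.net
```

Your actual pod ID appears in the Runpod dashboard and in the pod's connection details.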
Step 6: Set Up Your Account
- Open WebUI shows a registration page on first visit
- Create your admin account (credentials are stored on your own pod's volume, not with any third-party service)
- You're now in a ChatGPT-like interface powered by your own cloud GPU
Usage Tips
Starting a Chat
- Select a model from the dropdown (the models you pulled in Step 3 appear here)
- Type your message and press Enter
- The response comes from your cloud GPU — private and fast
Document Chat (RAG)
- Click the + button or drag files into the chat
- Upload PDFs, text files, or paste web URLs
- Ask questions about the documents
- Open WebUI searches the documents and provides cited answers
Multi-User Setup
- As admin, go to Settings → Users
- Enable registration or create accounts manually
- Each user gets their own conversation history
- Models and documents can be shared or kept private
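If you want to lock the instance down after creating accounts, Open WebUI supports an `ENABLE_SIGNUP` environment variable; check the Open WebUI documentation for your version before relying on it. A sketch of recreating the container with signup disabled:

```shell
# Disable self-registration after your accounts exist.
# ENABLE_SIGNUP is an Open WebUI environment variable; verify it against
# your version's documentation. User data persists in /workspace/open-webui.
docker rm -f open-webui
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -e ENABLE_SIGNUP=false \
  -v /workspace/open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main
```

Because the data volume is reused, existing accounts and conversations survive the container recreation.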
Cost Management
Recommended Setup for Cost Efficiency
- RTX 4090 at $0.44/hr
- Auto-Stop set to 1 hour of inactivity
- Spot instance for even lower cost (with interruption risk)
Monthly Cost Estimates
| Usage | GPU | Monthly Cost |
|---|---|---|
| 2 hrs/day weekdays | RTX 4090 | ~$18 |
| 4 hrs/day weekdays | RTX 4090 | ~$35 |
| 8 hrs/day weekdays | RTX 4090 | ~$70 |
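These estimates assume roughly 20 weekdays per month at the $0.44/hr rate above; you can reproduce them with a one-liner:

```shell
# Monthly cost = hours/day * ~20 weekdays * hourly rate
awk 'BEGIN {
  rate = 0.44           # $/hr for an RTX 4090 (rate from the table above)
  for (h = 2; h <= 8; h *= 2)
    printf "%d hrs/day: $%.2f/month\n", h, h * 20 * rate
}'
```

With Auto-Stop enabled, you only pay for hours the instance is actually running, so real costs track your usage rather than the calendar.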
Auto-Start Script
Create a script to restart both services after instance restart:
```shell
cat > /workspace/start-all.sh << 'EOF'
#!/bin/bash
export OLLAMA_MODELS=/workspace/ollama/models
export OLLAMA_HOST=0.0.0.0:11434

# Start Ollama
pkill ollama 2>/dev/null || true
sleep 2
ollama serve > /workspace/ollama.log 2>&1 &
sleep 5

# Start Open WebUI (reuse the container if it exists, otherwise create it)
docker start open-webui 2>/dev/null || docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v /workspace/open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

echo "All services started!"
ollama list
EOF
chmod +x /workspace/start-all.sh
```

Run `bash /workspace/start-all.sh` after each instance restart, or set it as the pod's start command so everything comes up automatically.

Troubleshooting
Open WebUI can't connect to Ollama: Verify Ollama is running with `ollama list`. Check that `OLLAMA_HOST=0.0.0.0:11434` is set.
Port 3000 not accessible: Make sure the port is exposed in the Runpod instance settings.
Models not showing: Verify `OLLAMA_MODELS` points to `/workspace/ollama/models` and the models were pulled successfully.
Slow responses: Check whether the model fits in your GPU's VRAM. An RTX 4090 (24 GB) handles models up to about 14B comfortably.
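For the VRAM question, a rough rule of thumb is bytes per parameter times parameter count, plus a few GB for the KV cache and runtime. The coefficients below are ballpark assumptions for Q4 quantization, not exact figures:

```shell
# Ballpark VRAM need (GB) for a Q4-quantized model of a given size
awk 'BEGIN {
  params_b = 14          # model size in billions of parameters
  bytes_per_param = 0.6  # rough average for Q4 quantization (assumption)
  overhead = 2           # KV cache + runtime overhead in GB (assumption)
  printf "~%.1f GB needed; fits comfortably in a 24 GB RTX 4090\n",
         params_b * bytes_per_param + overhead
}'
```

By the same arithmetic, a 32B model at Q4 would push past 20 GB and leave little room for context, which is why 14B is a comfortable ceiling on a 24 GB card.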
Summary
You now have a private, ChatGPT-like experience running on cloud GPU. Open WebUI handles the interface while Ollama runs the models. Data persists between sessions, and you can access it from any browser.
Next Steps
- Deploy Ollama on Runpod — deeper Ollama configuration
- Ollama vs Open WebUI — understand how they work together
- Best GPU Cloud for LLM — compare cloud providers