Run Open WebUI on Runpod — Cloud ChatGPT in 10 Minutes
Deploy Open WebUI with Ollama on Runpod for a private, ChatGPT-like experience on cloud GPU. Access your AI assistant from any device with a web browser.
Want a ChatGPT-like experience running on your own cloud GPU? Open WebUI on Runpod gives you a beautiful browser interface with full model control, document chat, and multi-user support — all private and self-hosted.
What You'll Build
- Open WebUI accessible from any browser
- Powered by Ollama on a cloud GPU
- Persistent storage for models and conversations
- Multi-user accounts (optional)
Step 1: Create a Network Volume
- Go to Storage → Network Volumes in Runpod
- Click Add Network Volume
- Size: 50 GB
- Data Center: Remember which one (must match your GPU)
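As a sanity check on the 50 GB figure, here is a rough tally of the Q4-quantized models pulled later in this guide (the per-model sizes are approximate download sizes, not exact):

```shell
# Approximate on-disk sizes (GB) of the models pulled in Step 3 (rough, Q4 quantized)
awk 'BEGIN {
  llama = 4.7; qwen = 4.7; deepseek = 4.9   # approximate GB each
  total = llama + qwen + deepseek
  printf "models: %.1f GB total, leaving ample headroom in a 50 GB volume\n", total
}'
```

Even with all three models plus conversation data, 50 GB leaves room to experiment with larger models later.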
Step 2: Deploy a GPU Instance with Ollama
- Go to GPU Cloud → Deploy
- Choose a GPU (RTX 4090 recommended for best value)
- Select the same data center as your volume
- Use the Ollama community template
- Attach your network volume at /workspace
- Deploy and wait for it to start
Step 3: Connect and Prepare Ollama
Connect to the pod's terminal (the web terminal under Connect, or SSH):
```shell
# Set persistent model storage
export OLLAMA_MODELS=/workspace/ollama/models
mkdir -p /workspace/ollama/models

# Stop the default service and restart with the correct config
sudo systemctl stop ollama 2>/dev/null || true
OLLAMA_MODELS=/workspace/ollama/models OLLAMA_HOST=0.0.0.0:11434 ollama serve > /workspace/ollama.log 2>&1 &

# Download your preferred models
sleep 5
ollama pull llama3.1:8b
ollama pull qwen2.5:7b
ollama pull deepseek-r1:8b
```

Step 4: Deploy Open WebUI
Run Open WebUI in Docker on the same instance:
```shell
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v /workspace/open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```

This connects Open WebUI to your local Ollama instance and stores its data persistently. The `--add-host` flag maps `host.docker.internal` to the host gateway so the container can reach Ollama on a Linux host, where that hostname is not resolved automatically.
Step 5: Access Open WebUI
- Go to your instance settings
- Expose port 3000
- Open the proxy URL in your browser:
https://your-pod-id-3000.proxy.runpod.net
Alternatively, use the Connect button on your pod, which lists the HTTP service link once port 3000 is exposed.
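The proxy URL follows a simple pattern: your pod ID, then the exposed port, joined with a hyphen. A quick sketch with a made-up pod ID:

```shell
# Construct the Runpod proxy URL for an exposed port (the pod ID here is hypothetical)
POD_ID="abc123xyz"
PORT=3000
echo "https://${POD_ID}-${PORT}.proxy.runpod.net"
# → https://abc123xyz-3000.proxy.runpod.net
```

Your actual pod ID appears in the Runpod dashboard and in the pod's connection details.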
Step 6: Set Up Your Account
- Open WebUI shows a registration page on first visit
- Create your admin account (credentials are stored on your own pod's volume, not with any third-party service)
- You're now in a ChatGPT-like interface powered by your own cloud GPU
Usage Tips
Starting a Chat
- Select a model from the dropdown (the models you pulled in Step 3 appear here)
- Type your message and press Enter
- The response comes from your cloud GPU — private and fast
Document Chat (RAG)
- Click the + button or drag files into the chat
- Upload PDFs, text files, or paste web URLs
- Ask questions about the documents
- Open WebUI searches the documents and provides cited answers
Multi-User Setup
- As admin, go to Settings → Users
- Enable registration or create accounts manually
- Each user gets their own conversation history
- Models and documents can be shared or kept private
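If you want to lock the instance down after creating accounts, Open WebUI supports an `ENABLE_SIGNUP` environment variable; check the Open WebUI documentation for your version before relying on it. A sketch of recreating the container with signup disabled:

```shell
# Disable self-registration after your accounts exist.
# ENABLE_SIGNUP is an Open WebUI environment variable; verify it against
# your version's documentation. User data persists in /workspace/open-webui.
docker rm -f open-webui
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -e ENABLE_SIGNUP=false \
  -v /workspace/open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main
```

Because the data volume is reused, existing accounts and conversations survive the container recreation.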
Cost Management
Recommended Setup for Cost Efficiency
- RTX 4090 at $0.44/hr
- Auto-Stop set to 1 hour of inactivity
- Spot instance for even lower cost (with interruption risk)
Monthly Cost Estimates
| Usage | GPU | Monthly Cost |
|---|---|---|
| 2 hrs/day weekdays | RTX 4090 | ~$18 |
| 4 hrs/day weekdays | RTX 4090 | ~$35 |
| 8 hrs/day weekdays | RTX 4090 | ~$70 |
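These estimates assume roughly 20 weekdays per month at the $0.44/hr rate above; you can reproduce them with a one-liner:

```shell
# Monthly cost = hours/day * ~20 weekdays * hourly rate
awk 'BEGIN {
  rate = 0.44           # $/hr for an RTX 4090 (rate from the table above)
  for (h = 2; h <= 8; h *= 2)
    printf "%d hrs/day: $%.2f/month\n", h, h * 20 * rate
}'
```

With Auto-Stop enabled, you only pay for hours the instance is actually running, so real costs track your usage rather than the calendar.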
Auto-Start Script
Create a script to restart both services after instance restart:
```shell
cat > /workspace/start-all.sh << 'EOF'
#!/bin/bash
export OLLAMA_MODELS=/workspace/ollama/models
export OLLAMA_HOST=0.0.0.0:11434

# Start Ollama
pkill ollama 2>/dev/null || true
sleep 2
ollama serve > /workspace/ollama.log 2>&1 &
sleep 5

# Start Open WebUI (reuse the container if it exists, otherwise create it)
docker start open-webui 2>/dev/null || docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v /workspace/open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

echo "All services started!"
ollama list
EOF
chmod +x /workspace/start-all.sh
```

Run `bash /workspace/start-all.sh` after each instance restart, or set it as the pod's start command so everything comes up automatically.

Troubleshooting
Open WebUI can't connect to Ollama: Verify Ollama is running with `ollama list`. Check that `OLLAMA_HOST=0.0.0.0:11434` is set.
Port 3000 not accessible: Make sure the port is exposed in the Runpod instance settings.
Models not showing: Verify `OLLAMA_MODELS` points to `/workspace/ollama/models` and the models were pulled successfully.
Slow responses: Check whether the model fits in your GPU's VRAM. An RTX 4090 (24 GB) handles models up to about 14B comfortably.
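For the VRAM question, a rough rule of thumb is bytes per parameter times parameter count, plus a few GB for the KV cache and runtime. The coefficients below are ballpark assumptions for Q4 quantization, not exact figures:

```shell
# Ballpark VRAM need (GB) for a Q4-quantized model of a given size
awk 'BEGIN {
  params_b = 14          # model size in billions of parameters
  bytes_per_param = 0.6  # rough average for Q4 quantization (assumption)
  overhead = 2           # KV cache + runtime overhead in GB (assumption)
  printf "~%.1f GB needed; fits comfortably in a 24 GB RTX 4090\n",
         params_b * bytes_per_param + overhead
}'
```

By the same arithmetic, a 32B model at Q4 would push past 20 GB and leave little room for context, which is why 14B is a comfortable ceiling on a 24 GB card.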
Summary
You now have a private, ChatGPT-like experience running on cloud GPU. Open WebUI handles the interface while Ollama runs the models. Data persists between sessions, and you can access it from any browser.
Next Steps
- Deploy Ollama on Runpod — deeper Ollama configuration
- Ollama vs Open WebUI — understand how they work together
- Best GPU Cloud for LLM — compare cloud providers