Local AI Hub
How to Deploy Ollama on Runpod — Run Any Model on Cloud GPU
2026/04/10
Intermediate · 25 min

Step-by-step guide to deploying Ollama on Runpod with persistent storage, API access, and cost optimization. Run models up to 70B parameters on cloud GPU.

Running Ollama on Runpod gives you access to powerful GPUs without buying expensive hardware. This guide walks you through deploying Ollama with persistent storage so your models and data survive restarts.

Prerequisites

  • A Runpod account
  • Basic terminal familiarity
  • A credit card for billing (you only pay while the instance is running)

Step 1: Choose Your GPU

Select a GPU based on the models you want to run:

Model Size | Min VRAM | Recommended GPU | Est. Cost/hr
7-8B params | 8 GB | RTX 4090 | ~$0.44
14B params | 16 GB | RTX 4090 | ~$0.44
32B params | 32 GB | A100 40GB | ~$0.80
70B params | 64 GB | A100 80GB | ~$1.50

For most users, an RTX 4090 offers the best value. It can handle all models up to 14B parameters comfortably.
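Before picking a card, it can help to sanity-check the table with a rough sizing rule. This is an assumption-level heuristic, not an official formula: at Ollama's default ~4-bit quantization, a model takes on the order of 1 GB of VRAM per billion parameters, plus headroom for the KV cache and context.

```shell
# Heuristic VRAM estimate (assumption): ~1 GB per billion parameters
# at default ~4-bit quantization, plus ~2 GB headroom for context.
vram_gb_needed() {
  local params_b=$1
  echo $(( params_b + 2 ))
}

vram_gb_needed 8    # prints 10 -> comfortable on an RTX 4090 (24 GB)
vram_gb_needed 32   # prints 34 -> needs an A100 40GB
```

Longer context windows grow the KV cache, so treat the headroom as a floor, not a constant.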

Step 2: Deploy with the Ollama Template

  1. Go to GPU Cloud in your Runpod dashboard
  2. Click Deploy
  3. Select your GPU (e.g., RTX 4090)
  4. In Customize Deployment, search for "Ollama" in the template list
  5. Select the official Ollama template
  6. Click Deploy

Wait 1-2 minutes for the instance to start. You'll see the status change to "Running."

Step 3: Connect and Verify

Click Connect on your instance, then choose Connect to HTTP Proxy to open a web terminal.

Verify Ollama is running:

ollama --version
ollama list
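If both commands succeed but you want to confirm the HTTP API itself is up, query it on Ollama's default port:

```shell
# Ollama's HTTP API listens on localhost:11434 by default
curl -s http://localhost:11434/api/version
```

A small JSON body with a version field means the server is ready to accept requests.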

Step 4: Pull and Run Models

# Pull a model
ollama pull llama3.1:8b

# Run it
ollama run llama3.1:8b

# Try a larger model (if your GPU has enough VRAM)
ollama pull qwen2.5:14b
ollama run qwen2.5:14b
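The REPL above is handy for testing, but external tools talk to the same models over HTTP. A quick sketch of a one-shot request against the local API:

```shell
# Non-streaming completion via the local API (the model must be pulled first)
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Explain what a GPU does in one sentence.",
  "stream": false
}'
```

For multi-turn conversations, the /api/chat endpoint takes a messages array instead of a single prompt.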

Step 5: Set Up Persistent Storage

By default, everything is lost when you stop the instance. To keep your models:

  1. Go to Storage in your Runpod dashboard
  2. Click Add Network Volume
  3. Choose a size (50 GB is enough for several models)
  4. Select a data center (pick the same region as your GPU)
  5. Attach the volume to your instance at /workspace

Then configure Ollama to use the volume:

# Stop the Ollama service
sudo systemctl stop ollama

# Create the model directory on the volume and point Ollama at it
mkdir -p /workspace/ollama/models
export OLLAMA_MODELS=/workspace/ollama/models

# Persist the setting so future shells pick it up too
echo 'export OLLAMA_MODELS=/workspace/ollama/models' >> ~/.bashrc

# Restart Ollama with the new model path
ollama serve &

Now your downloaded models persist across restarts.
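To double-check that the weights actually landed on the network volume rather than the ephemeral container disk, inspect the directory you pointed OLLAMA_MODELS at:

```shell
# Total size here should roughly match the models you pulled
du -sh /workspace/ollama/models
# Ollama stores weights as content-addressed blobs plus manifests
ls -lh /workspace/ollama/models/blobs
```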

Step 6: Expose the API

To connect external tools (like Open WebUI or your own apps) to your cloud Ollama:

  1. In your instance settings, expose port 11434
  2. Use the Runpod proxy URL as your API endpoint

Your API URL will look like:

https://your-pod-id.proxy.runpod.net

Test it:

curl https://your-pod-id.proxy.runpod.net/api/tags

You can now use this URL as the Ollama endpoint in any application that speaks the Ollama API.
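Ollama also serves an OpenAI-compatible API under the /v1 path of the same host, so OpenAI-style clients can point straight at the pod. A sketch using the placeholder pod ID from above:

```shell
# OpenAI-compatible chat completions (replace your-pod-id with your own)
curl https://your-pod-id.proxy.runpod.net/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Say hello."}]
  }'
```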

Step 7: Connect from Open WebUI

If you have Open WebUI running locally or on another instance:

  1. Go to Open WebUI Settings
  2. Set the Ollama API URL to your Runpod proxy URL
  3. Your cloud models will appear in the model selector
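If Open WebUI runs in Docker, you can also pass the endpoint at startup instead of through the settings page. The OLLAMA_BASE_URL variable below is the one Open WebUI's docs use; the pod ID is a placeholder:

```shell
docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=https://your-pod-id.proxy.runpod.net \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```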

Cost Optimization

Enable Auto-Stop:

  1. Go to your instance settings
  2. Set Auto-Stop to 1 hour of inactivity
  3. This prevents accidental overcharges

Use Spot Instances:

  • Spot instances are up to 70% cheaper
  • They can be interrupted when demand is high
  • Fine for experimentation, not for production use

Estimated monthly costs (10 hours/week usage):

GPU | Monthly Cost
RTX 4090 | ~$17
A100 40GB | ~$32
A100 80GB | ~$60
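These figures follow directly from the hourly rates in Step 1. A quick arithmetic check for the RTX 4090 row, assuming ~4 billed weeks per month:

```shell
hours_per_week=10
rate_cents=44                                  # ~$0.44/hr for an RTX 4090
monthly_cents=$(( hours_per_week * 4 * rate_cents ))
echo "~\$$(( monthly_cents / 100 )).$(( monthly_cents % 100 )) per month"
# prints ~$17.60 per month
```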

Troubleshooting

Ollama not responding: Check whether the server answers with ollama list. If it doesn't, restart it with ollama serve &.

Out of VRAM: Use a model with fewer parameters or a more aggressive quantization. For example, ollama run llama3.1:8b-instruct-q3_K_M pulls a 3-bit build that uses noticeably less VRAM than the default 4-bit tag.

Slow first response: The first run after pulling a model loads it into VRAM. Subsequent responses will be fast.

Port not accessible: Make sure port 11434 is exposed in your instance settings.

Summary

Deploying Ollama on Runpod takes about 5 minutes. The Ollama template handles the setup. Add persistent storage to keep your models, expose the API port for external access, and set auto-stop to control costs.

Next Steps

  • Runpod Beginner Guide — if you're completely new to Runpod
  • Run Open WebUI on Runpod — add a browser interface
  • Best GPU Cloud for LLM — compare Runpod with alternatives