Enterprise Local AI Deployment — Air-Gapped, On-Premise, and Compliant
2026/04/22
Advanced · 18 min read

Deploy local AI for enterprise use. Covers air-gapped setups, on-premise GPU servers, compliance, and multi-user configurations powered by Open WebUI.

Enterprises are adopting local AI for three reasons: data sovereignty, regulatory compliance, and cost control. When you process customer data, proprietary code, or regulated content through cloud APIs, you introduce risk. Local AI eliminates that risk by keeping everything on infrastructure you control.

This guide covers enterprise-grade local AI deployment — from air-gapped environments to multi-user Open WebUI setups with role-based access control.

Enterprise Use Cases for Local AI

| Use Case | Why Local | Compliance Driver |
| --- | --- | --- |
| Legal document analysis | Attorney-client privilege | Attorney-client privilege rules |
| Healthcare record processing | Protected health information | HIPAA |
| Financial data analysis | Sensitive financial records | SOX, GDPR |
| Code assistant for proprietary code | Trade secrets, source code | Trade secret law, IP protection |
| Internal knowledge base | Confidential business data | NDA, internal policies |
| Customer support automation | Customer PII in queries | GDPR, CCPA |

Architecture Overview

An enterprise local AI deployment has four layers:

[Users] → [Reverse Proxy / Auth] → [Open WebUI] → [Ollama] → [GPU Server]
  • GPU Server — runs the inference workload (Ollama + models)
  • Open WebUI — provides the web interface and user management
  • Reverse Proxy — handles TLS, authentication, and access logging
  • Users — access the system through a browser
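The reverse proxy layer described above might look like the following Nginx sketch. The hostname, certificate paths, and upstream address are placeholders for your environment, not values prescribed by this guide:

```nginx
# /etc/nginx/conf.d/local-ai.conf — hostname, cert paths, and upstream are placeholders
server {
    listen 443 ssl;
    server_name ai.internal.example.com;

    ssl_certificate     /etc/nginx/certs/local-ai.crt;   # issued by your internal CA
    ssl_certificate_key /etc/nginx/certs/local-ai.key;

    access_log /var/log/nginx/local-ai-access.log;       # feeds the audit trail

    location / {
        proxy_pass http://10.0.0.10:8080;                # Open WebUI container
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_http_version 1.1;                          # needed for WebSocket streaming
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```

The WebSocket headers matter: Open WebUI streams responses over a persistent connection, and a proxy that drops the Upgrade header will silently break streaming.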

Air-Gapped Deployment

Air-gapped deployments have zero internet connectivity. This is required for classified environments, certain government workloads, and organizations with strict data isolation policies.

Step 1: Prepare on a Connected Machine

Download everything you need on a machine with internet access:

# Download Ollama
curl -fsSL https://ollama.com/install.sh -o ollama-install.sh

# Download Docker images
docker pull ghcr.io/open-webui/open-webui:main
docker save ghcr.io/open-webui/open-webui:main -o open-webui.tar

# Download the Ollama Docker image (used to run Ollama as a container)
docker pull ollama/ollama:latest
docker save ollama/ollama:latest -o ollama.tar

# Pull models
ollama pull llama3.1:70b
ollama pull qwen2.5-coder:32b
ollama pull nomic-embed-text

# Export models for transfer
# Models are stored in ~/.ollama by default
# Archive relative to $HOME so it extracts cleanly on the target
tar -czf ollama-models.tar.gz -C ~ .ollama/models
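Before anything moves to transfer media, it is worth recording checksums so the air-gapped side can detect corruption or tampering in transit. A small sketch (the manifest filename is an arbitrary choice):

```shell
# Write a SHA-256 manifest of the staged files; verify it after transfer.
make_manifest() { sha256sum "$@" > transfer.sha256; }
verify_manifest() { sha256sum -c transfer.sha256; }

# On the connected machine (filenames follow the downloads above):
#   make_manifest ollama-install.sh open-webui.tar ollama.tar ollama-models.tar.gz
# On the air-gapped machine, after copying the files plus transfer.sha256:
#   verify_manifest    # prints "<file>: OK" per file; non-zero exit on any mismatch
```

Copy the manifest alongside the artifacts; many transfer-approval processes require exactly this kind of integrity record anyway.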

Step 2: Transfer to Air-Gapped Network

Use approved transfer media (USB, encrypted drive, secure file transfer):

# Copy these files to transfer media:
# - ollama-install.sh
# - open-webui.tar
# - ollama.tar
# - ollama-models.tar.gz

Step 3: Install on Air-Gapped Machine

# Install Ollama
chmod +x ollama-install.sh
./ollama-install.sh

# Load Docker images
docker load -i ollama.tar
docker load -i open-webui.tar

# Restore models
tar -xzf ollama-models.tar.gz -C ~/

# Verify
ollama list

Step 4: Deploy the Stack

# Create docker-compose.yml
cat > docker-compose.yml << 'EOF'
version: "3.8"
services:
  ollama:
    image: ollama/ollama:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - ollama:/root/.ollama
    ports:
      - "11434:11434"

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "8080:8080"   # plain HTTP; put a TLS-terminating reverse proxy in front
    volumes:
      - open-webui:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=true
      - ENABLE_SIGNUP=false
      - WEBUI_SECRET_KEY=<generate-a-strong-secret>
    depends_on:
      - ollama

volumes:
  ollama:
  open-webui:
EOF

docker compose up -d

The system is now running with no internet dependency. Users on the local network reach Open WebUI through the reverse proxy, which terminates TLS; the container itself serves plain HTTP on its mapped port.

On-Premise GPU Server Hardware

Choosing the right hardware determines what models you can run and how many users you can support.

GPU Recommendations

| Configuration | GPU | VRAM | Max Model Size | Concurrent Users | Est. Cost |
| --- | --- | --- | --- | --- | --- |
| Entry | 1x RTX 4090 | 24 GB | 32B Q4 | 5-10 | $2,000-3,000 |
| Mid-range | 2x RTX 4090 | 48 GB | 70B Q4 | 10-20 | $5,000-7,000 |
| Enterprise | 2x A100 80GB | 160 GB | 70B+ Q8 / multiple models | 20-50 | $25,000-35,000 |
| High-end | 4x A100 80GB | 320 GB | Multiple large models | 50-100 | $60,000-80,000 |
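A rough rule of thumb behind these numbers: a Q4-quantized model's weights take about 0.5 bytes per parameter, plus runtime and KV-cache overhead, estimated here at roughly 20% (an approximation, not a measured figure):

```shell
# Rough Q4 VRAM estimate: params (billions) * 0.5 bytes/weight * 1.2 overhead factor
estimate_q4_vram_gb() {
  awk -v p="$1" 'BEGIN { printf "%.0f\n", p * 0.5 * 1.2 }'
}

estimate_q4_vram_gb 70   # prints 42 -- matches the 2x RTX 4090 (48 GB) row
estimate_q4_vram_gb 32   # prints 19 -- fits a single RTX 4090 (24 GB)
```

Longer context windows inflate the KV cache well beyond this overhead factor, so treat the estimate as a floor when sizing hardware.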

System Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| CPU | 16-core modern x86 | 32+ core (EPYC or Xeon) |
| RAM | 64 GB DDR5 | 128-256 GB DDR5 ECC |
| Storage | 1 TB NVMe SSD | 2-4 TB NVMe SSD (models are large) |
| Network | 1 GbE | 10 GbE (for multi-server setups) |
| Power | 850W | 1600W+ (per GPU server) |

Cooling and Environment

  • Dedicated server room with temperature control (GPU servers generate significant heat)
  • UPS battery backup to prevent data corruption during power events
  • Cable management for maintenance access
  • Noise isolation if the server is near workspaces (GPU fans are loud under load)

Open WebUI Multi-User Configuration

Open WebUI supports multi-user setups with authentication and basic role-based access control.

Enable Authentication

# In docker-compose.yml environment section:
environment:
  - WEBUI_AUTH=true
  - ENABLE_SIGNUP=false           # Disable public signup
  - WEBUI_SECRET_KEY=<your-secret> # Strong random secret
  - DATA_EXPORT_ENABLED=true       # Allow data export for compliance

User Roles

Open WebUI provides three roles:

| Role | Capabilities |
| --- | --- |
| Admin | Full control: manage users, configure models, set system settings, view all chats |
| User | Use models, create chats, upload documents, manage own data |
| Pending | Registered but awaiting admin approval |

Admin Setup Workflow

  1. First user is automatically admin — this is your IT administrator account
  2. Disable public signup after creating the admin account (ENABLE_SIGNUP=false)
  3. Create user accounts manually through the admin panel
  4. Or enable LDAP/SSO for enterprise environments (see below)

LDAP Integration

For Active Directory or LDAP environments, configure Open WebUI to authenticate against your existing directory:

environment:
  - ENABLE_LDAP=true
  - LDAP_SERVER_URL=ldap://your-ad-server:389
  - LDAP_BIND_DN=CN=service-account,OU=ServiceAccounts,DC=company,DC=com
  - LDAP_BIND_PASSWORD=<bind-password>
  - LDAP_SEARCH_BASE=OU=Employees,DC=company,DC=com
  - LDAP_SEARCH_FILTER=(sAMAccountName={username})
  - LDAP_USE_SSL=true

Users authenticate with their existing corporate credentials. No separate passwords to manage.

RBAC and Access Control

Model-Level Access

Restrict which models different user groups can access:

  1. Go to Admin Settings → Models
  2. For each model, set the Access Control list
  3. Assign models to user groups (e.g., "Legal team gets Llama 3.1 70B; Engineering gets Qwen Coder 32B")

This prevents unauthorized users from accessing expensive models and controls GPU costs.

Document Workspace Isolation

Open WebUI stores documents per user by default. For team workspaces:

  1. Create shared workspaces through the admin panel
  2. Assign users to workspaces based on their department
  3. Each workspace maintains its own vector database and document index

This ensures the legal team's documents are separate from engineering's knowledge base.

Compliance Considerations

GDPR Compliance

| Requirement | Implementation |
| --- | --- |
| Data minimization | Only upload documents needed for the task |
| Right to erasure | Open WebUI allows admins to delete user data and chat history |
| Data portability | Enable DATA_EXPORT_ENABLED=true for user data exports |
| Processing records | Enable access logging (see Monitoring section) |
| Lawful basis | Internal business operations typically fall under legitimate interest |

HIPAA Considerations

For organizations handling protected health information:

  1. Encryption at rest — store Open WebUI data volumes on encrypted storage
  2. Encryption in transit — use TLS (configure in your reverse proxy)
  3. Access logging — log all queries and responses for audit trails
  4. BAA (Business Associate Agreement) — because no external vendor processes PHI on your behalf, there is no third-party AI provider to sign a BAA with; your organization remains the sole custodian of the data
  5. Access controls — enforce role-based access so only authorized personnel query health data
  6. Audit trails — retain logs for the required period (typically 6 years)

This is one of the strongest arguments for local AI in healthcare: no data flows to third-party AI providers, eliminating an entire category of HIPAA risk.
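The retention requirement in item 6 can be enforced with a small cleanup function run from cron. A sketch; the log directory and the 2190-day (~6-year) window are assumptions to adapt to your own policy:

```shell
# Purge audit logs older than the retention window.
# Directory and window are policy assumptions, not fixed values.
purge_old_logs() {
  local log_dir="$1"
  local retention_days="${2:-2190}"   # 2190 days is roughly 6 years
  find "$log_dir" -name '*.log' -type f -mtime +"$retention_days" -print -delete
}

# Example cron entry (daily at 03:00), with an illustrative script path:
#   0 3 * * * /opt/scripts/purge-logs.sh /var/log/local-ai
```

Deleting on schedule matters as much as retaining on schedule: keeping logs past the mandated window expands your breach exposure for no compliance benefit.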

SOC 2 Alignment

| Control | How Local AI Helps |
| --- | --- |
| CC6.1 (Logical access) | All access through authenticated Open WebUI |
| CC6.2 (Access removal) | Admin can deactivate users immediately |
| CC6.3 (Encryption) | TLS in transit, encrypted volumes at rest |
| CC7.1 (Detection) | Access logs capture all queries |
| CC7.2 (Monitoring) | Prometheus + Grafana dashboards (see below) |

Monitoring and Logging

Access Logging

Configure Open WebUI to log all user interactions:

environment:
  - ENABLE_AUDIT_LOGGING=true
  - LOG_LEVEL=INFO

Logs include:

  • User authentication events
  • Model access per user
  • Query timestamps
  • Document uploads and deletions

Forward logs to your SIEM (Splunk, Elastic, or similar) for centralized monitoring.
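One low-friction option for that forwarding is Docker's built-in syslog logging driver, pointed at your SIEM's syslog collector. A sketch; the collector address is a placeholder:

```yaml
# docker-compose.yml — per-service logging section
  open-webui:
    logging:
      driver: syslog
      options:
        syslog-address: "tcp://siem.internal.example.com:514"  # placeholder collector
        tag: "open-webui"
```

Note that once a non-default logging driver is set, `docker logs` may no longer show output locally, so configure this after initial debugging is done.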

Performance Monitoring with Prometheus

# Add these services under the existing services: section of docker-compose.yml,
# and declare "grafana:" under the top-level volumes: key
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana
    ports:
      - "3001:3000"
    volumes:
      - grafana:/var/lib/grafana
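The compose fragment mounts a prometheus.yml that still has to be written. A minimal sketch; note that Ollama does not expose Prometheus metrics itself, so GPU metrics here come from NVIDIA's dcgm-exporter, which you would run as an additional container:

```yaml
# prometheus.yml — minimal scrape config (dcgm-exporter must be added to the stack)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: gpu
    static_configs:
      - targets: ["dcgm-exporter:9400"]   # dcgm-exporter's default port
```

Request-level metrics (latency, queue depth, error rate) require either a metrics-capable gateway in front of Ollama or parsing proxy access logs; Prometheus alone will not see them.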

Track these metrics:

| Metric | What It Tells You | Alert Threshold |
| --- | --- | --- |
| GPU utilization | Are GPUs being used efficiently? | < 20% (over-provisioned) or > 95% (bottleneck) |
| GPU memory usage | Are models fitting in VRAM? | > 90% (risk of OOM) |
| Request latency | How fast are responses? | > 30s for chat, > 5s for autocomplete |
| Request queue depth | Are users waiting for GPU time? | > 10 queued requests |
| Error rate | Are requests failing? | > 1% error rate |

Ollama Health Check

# Check Ollama status
curl http://localhost:11434/api/tags

# Monitor running models
curl http://localhost:11434/api/ps

# Simple health check script for cron
#!/bin/bash
if ! curl -sf http://localhost:11434/api/tags > /dev/null; then
  echo "Ollama is down!" | mail -s "Local AI Alert" admin@company.com
fi
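To run the check every five minutes, save it as an executable script and schedule it; the path here is illustrative:

```shell
# Install the script (e.g. chmod +x, copied to /opt/scripts), then in crontab -e:
*/5 * * * * /opt/scripts/ollama-health.sh
```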

Scaling Strategies

Vertical Scaling (More GPU)

The simplest approach: upgrade your GPU server.

# With 4x A100 80GB, you can run:
ollama run llama3.1:70b        # Uses ~40GB VRAM
# Plus simultaneously:
ollama run qwen2.5-coder:32b   # Uses ~20GB VRAM
# Leaving ~260GB for additional models or batch processing

Horizontal Scaling (Multiple Servers)

For 50+ concurrent users, distribute the load:

                    [Nginx / HAProxy]
                   /        |        \
          [Ollama-1]   [Ollama-2]   [Ollama-3]
          (70B model)  (Coder 32B)  (General 8B)

Configure Open WebUI to point to multiple Ollama instances:

  1. Deploy 2-3 Ollama servers, each running different models
  2. Configure Open WebUI with multiple Ollama endpoints
  3. Users select the appropriate model for their task
  4. Load balancing happens at the model-selection level
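If you instead run identical replicas (the same models pulled on every server), balancing can move down into the proxy itself rather than the model-selection level. An Nginx sketch with placeholder backend addresses:

```nginx
# Only valid when every backend serves the same models
upstream ollama_pool {
    least_conn;                  # route each new request to the least-busy backend
    server 10.0.0.11:11434;      # Ollama replica 1 (placeholder address)
    server 10.0.0.12:11434;      # Ollama replica 2
}

server {
    listen 11434;
    location / {
        proxy_pass http://ollama_pool;
        proxy_read_timeout 600s; # long generations need generous timeouts
    }
}
```

`least_conn` suits LLM workloads better than round-robin because request durations vary wildly; a short and a long generation count the same under round-robin.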

Cloud Burst for Peak Loads

For organizations that want on-premise baseline with cloud burst capability:

  1. Run your primary infrastructure on-premise
  2. Configure a secondary Ollama instance on Runpod
  3. During peak usage, route overflow requests to the cloud instance
  4. Cloud instances are destroyed after use — no persistent data in the cloud

This hybrid approach keeps sensitive data on-premise by default while providing elasticity.

Security Checklist

Before going live, verify:

  • TLS configured on the reverse proxy (Nginx/Caddy)
  • Public signup disabled in Open WebUI
  • Strong admin password set
  • LDAP/SSO integrated if available
  • Data volumes encrypted at rest
  • Access logging enabled
  • Firewall rules restrict access to authorized IP ranges
  • Ollama API not exposed to the network (only accessible to Open WebUI)
  • Regular backup schedule for Open WebUI data and configurations
  • Incident response plan for AI-generated content issues

Backup and Recovery

# Backup Open WebUI data
docker cp open-webui:/app/backend/data ./backup-$(date +%Y%m%d)

# Backup Ollama models (if you want to avoid re-downloading)
tar -czf ollama-backup-$(date +%Y%m%d).tar.gz ~/.ollama/models

# Restore from backup (trailing /. copies the directory contents, not the directory itself)
docker cp ./backup-20260422/. open-webui:/app/backend/data
docker restart open-webui

Schedule daily backups with cron. Store backups on encrypted, off-site storage for disaster recovery.
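The backup commands above can be wrapped into one cron-friendly script with simple rotation. A sketch; the backup directory, container name, and 30-day window are assumptions:

```shell
#!/bin/bash
set -euo pipefail

# Delete archives older than keep_days in the given directory
rotate_backups() {
  local dir="$1" keep_days="$2"
  find "$dir" -name 'webui-*.tar.gz' -mtime +"$keep_days" -delete
}

# Full nightly run: copy data out of the container, compress, rotate
nightly_backup() {
  local dir="$1" stamp
  stamp=$(date +%Y%m%d)
  mkdir -p "$dir"
  docker cp open-webui:/app/backend/data "$dir/webui-$stamp"
  tar -czf "$dir/webui-$stamp.tar.gz" -C "$dir" "webui-$stamp"
  rm -rf "$dir/webui-$stamp"       # keep only the compressed archive
  rotate_backups "$dir" 30
}

# Cron would call this with the backup directory, e.g.:
#   0 2 * * * /opt/scripts/backup-webui.sh /srv/backups
if [ "$#" -ge 1 ]; then
  nightly_backup "$1"
fi
```

Test the restore path periodically, not just the backup path; an unrestorable archive is indistinguishable from no backup at all.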

Related Guides

  • How to Deploy Ollama on Runpod
  • Open WebUI vs AnythingLLM
  • Private AI Setup Guide
  • Best GPU Cloud for LLM
  • Local AI vs Cloud AI Cost Comparison
Need enterprise-grade GPU infrastructure? Try Runpod's bare metal servers.
Get started with Runpod for cloud GPU computing. No hardware upgrades needed — run any AI model on powerful remote GPUs.

Partner link. We may earn a commission at no extra cost to you.
