Local AI in VS Code — Continue.dev, Cline, and Twinny Setup Guide
Set up AI-powered coding in VS Code with local models. Complete guide to Continue.dev, Cline, and Twinny extensions running on Ollama — no API keys needed.
AI coding assistants don't have to send your code to remote servers. With local models running on Ollama and the right VS Code extension, you get autocomplete, chat, and code generation — all running on your own machine. No API keys, no usage limits, no code leaving your computer.
This guide covers the three best VS Code extensions for local AI coding: Continue.dev, Cline, and Twinny.
Prerequisites
Before setting up any extension, you need Ollama running with a coding-focused model.
Install Ollama
```bash
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
```

Or download the installer from ollama.com for Windows.

Pull Coding Models
```bash
# Best all-around coding model (needs 16 GB RAM)
ollama pull deepseek-coder-v2:16b

# Great for autocomplete and chat (8 GB RAM)
ollama pull qwen2.5-coder:7b

# Lightweight option for 8 GB machines
ollama pull codellama:7b

# Powerful but needs 16 GB+ RAM
ollama pull qwen2.5-coder:14b
```

Verify everything is running:

```bash
ollama list
```

You should see your downloaded models listed. Ollama must stay running in the background while you use VS Code.
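If you want to check programmatically which models are installed (for example, in a setup script), you can parse the output of `ollama list`. A minimal sketch, assuming Ollama's current tabular output (a header row, then one model per line with the name in the first column):

```python
import subprocess

def installed_models(list_output: str) -> list[str]:
    """Parse `ollama list` output into a list of model names.

    Assumes the first line is a header and the first whitespace-separated
    column of each remaining row is the model name.
    """
    lines = list_output.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]

def check_ollama() -> list[str]:
    """Run `ollama list` and return the installed model names."""
    out = subprocess.run(["ollama", "list"], capture_output=True,
                         text=True, check=True)
    return installed_models(out.stdout)
```

If `check_ollama()` raises `FileNotFoundError`, Ollama is not on your PATH; if it returns an empty list, you still need to pull a model.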
Extension 1: Continue.dev
Continue.dev is the most popular open-source AI coding assistant for VS Code. It supports tab autocomplete, inline edits, and a sidebar chat — all powered by local models.
Installation
- Open VS Code
- Go to Extensions (Cmd+Shift+X / Ctrl+Shift+X)
- Search for Continue and click Install
Configure for Local Models
After installing, Continue creates a config file. Open it:
- macOS / Linux: `~/.continue/config.json`
- Windows: `%USERPROFILE%\.continue\config.json`
Replace the contents with this Ollama configuration:
```json
{
  "models": [
    {
      "title": "Qwen 2.5 Coder 7B",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b"
    },
    {
      "title": "DeepSeek Coder V2",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen Coder Autocomplete",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b"
  },
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text"
  },
  "allowAnonymousTelemetry": false
}
```

This disables telemetry and connects Continue to your local Ollama instance. The embeddings provider powers codebase indexing, so pull that model too: `ollama pull nomic-embed-text`.
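A malformed config.json is a common reason Continue silently falls back to its defaults. A quick sanity check you can run against the file (the key names below match the configuration shown above; this is an illustrative helper, not part of Continue itself):

```python
import json
from pathlib import Path

def validate_continue_config(path: str) -> list[str]:
    """Return a list of problems found in a Continue config file."""
    problems = []
    try:
        config = json.loads(Path(path).read_text())
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    if not config.get("models"):
        problems.append("no models defined")
    for m in config.get("models", []):
        if m.get("provider") != "ollama":
            problems.append(f"model {m.get('title')!r} is not using the ollama provider")
    if "tabAutocompleteModel" not in config:
        problems.append("tabAutocompleteModel not set")
    return problems
```

Run it on `~/.continue/config.json` after editing; an empty list means the structure looks right.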
Features
- Tab autocomplete — starts suggesting code as you type, powered by the model you set in `tabAutocompleteModel`
- Cmd+I inline edit — highlight code, press Cmd+I, describe the change in plain English
- Sidebar chat — click the Continue icon in the sidebar to ask questions about your codebase
- Codebase indexing — Continue indexes your project so it can answer questions about your full codebase
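Codebase indexing works by embedding chunks of your code with the embeddings model (`nomic-embed-text` in the config above) and retrieving the chunks most similar to your question. The retrieval step boils down to cosine similarity between vectors; here is that math with made-up 4-dimensional vectors (real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy example: the query vector is closer to chunk_a than to chunk_b,
# so chunk_a's source code would be pulled into the chat context.
query   = [0.9, 0.1, 0.0, 0.3]
chunk_a = [0.8, 0.2, 0.1, 0.4]   # e.g. a function about the same topic
chunk_b = [0.0, 0.9, 0.8, 0.0]   # unrelated code
```

This is why indexing needs the embeddings model pulled locally: every chunk of your project gets turned into one of these vectors.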
Recommended Setup
Use Qwen 2.5 Coder 7B for autocomplete (fast responses) and DeepSeek Coder V2 16B for chat and complex edits (better reasoning). If you have 16GB RAM, use Qwen 2.5 Coder 14B for everything.
Extension 2: Cline
Cline (formerly Claude Dev) is an autonomous coding agent. Unlike Continue's suggestion-based approach, Cline can create files, run terminal commands, and make multi-step edits on its own.
Installation
- Open VS Code Extensions
- Search for Cline and install it
Configure for Ollama
- Open the Cline sidebar in VS Code
- Click the settings icon
- Set the API Provider to Ollama
- Base URL: `http://localhost:11434` (the default Ollama address)
- Select your model from the dropdown (e.g., `qwen2.5-coder:7b`)
No API key needed — Cline talks directly to Ollama on your machine.
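Under the hood, Cline (and the other extensions) speak Ollama's HTTP API on port 11434. A sketch of the kind of request involved, using Ollama's documented `/api/chat` endpoint — the prompt text is just an example, and the helper below is illustrative, not Cline's actual code:

```python
import json
import urllib.request

def ollama_chat_request(model: str, prompt: str,
                        host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a chat request for a local Ollama server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one complete response instead of streamed chunks
    }
    return urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# With Ollama running, urllib.request.urlopen(req) would send it:
req = ollama_chat_request("qwen2.5-coder:7b", "Explain this regex: ^\\d{4}-\\d{2}$")
```

Because everything goes to localhost, there is no key to leak and nothing leaves your machine.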
How Cline Works
Cline operates as an agent with access to your workspace:
- You describe a task in plain English
- Cline reads your existing files to understand the codebase
- It plans the changes needed
- It creates or edits files, showing you each step
- You approve or reject each change
Example tasks Cline handles well:
- "Create a REST API endpoint for user authentication in Express.js"
- "Add unit tests for the payment processing module"
- "Refactor the database queries to use prepared statements"
- "Fix the TypeScript errors in src/utils/helpers.ts"
Tips for Best Results
- Be specific about what you want — the more context, the better
- Start with smaller tasks — Cline works best with focused requests
- Review each change before approving — local models are good but not perfect
- Use Qwen 2.5 Coder 14B if you have the RAM — larger models follow instructions better
Extension 3: Twinny
Twinny is a lightweight, privacy-focused AI coding extension built specifically for local models. It focuses on fast autocomplete and inline chat.
Installation
- Open VS Code Extensions
- Search for Twinny and install
Configure for Ollama
- Open VS Code Settings (Cmd+, / Ctrl+,)
- Search for "Twinny"
- Set the following:
  - Chat model: `ollama:qwen2.5-coder:7b`
  - FIM model (autocomplete): `ollama:qwen2.5-coder:7b`
  - Ollama host: `http://localhost:11434`
Or use the Twinny settings UI in the sidebar after installation.
Features
- Fast autocomplete — optimized for fill-in-the-middle (FIM) completions
- Inline chat — highlight code and ask questions or request changes
- Sidebar chat — separate chat panel for general coding questions
- Low resource usage — designed to be lightweight alongside your development work
Twinny is the best option if you want fast, unobtrusive autocomplete without the overhead of a full agent.
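Fill-in-the-middle means the model sees the code both before and after your cursor and generates only the gap, which is why FIM completions fit mid-file edits better than plain left-to-right generation. Code models are trained with special sentinel tokens for this; Qwen 2.5 Coder, for instance, uses `<|fim_prefix|>`, `<|fim_suffix|>`, and `<|fim_middle|>` markers (token names vary by model, so treat this as an illustration):

```python
def fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt in Qwen 2.5 Coder's format.

    The model is asked to generate the code that belongs between
    `prefix` (everything before the cursor) and `suffix` (after it).
    """
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

# Cursor sits between the function signature and the return statement;
# the model fills in the body.
prompt = fim_prompt(
    prefix="def add(a, b):\n    ",
    suffix="\n    return result",
)
```

Twinny's "FIM model" setting tells it which model (and therefore which token format) to use for these completions.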
Comparison Table
| Feature | Continue.dev | Cline | Twinny |
|---|---|---|---|
| Tab autocomplete | Yes | No | Yes |
| Inline edit | Yes (Cmd+I) | No | Yes |
| Sidebar chat | Yes | Yes | Yes |
| Agent mode | No | Yes | No |
| File creation | No | Yes | No |
| Terminal access | No | Yes | No |
| Codebase indexing | Yes | Yes | No |
| Setup complexity | Medium | Low | Low |
| Best for | All-around coding | Autonomous tasks | Fast autocomplete |
Which Should You Use?
Install all three. They serve different purposes and work together:
- Twinny for fast autocomplete as you type
- Continue.dev for inline edits and codebase-aware chat
- Cline for multi-step autonomous tasks like scaffolding new features
If you only want one extension, Continue.dev gives you the most complete experience with autocomplete, chat, and codebase indexing.
Recommended Models for Coding
| Model | RAM | Best For | Speed |
|---|---|---|---|
| Qwen 2.5 Coder 7B | 8 GB | Autocomplete, quick chat | Fast |
| CodeLlama 7B | 8 GB | General code generation | Fast |
| DeepSeek Coder V2 16B | 16 GB | Complex reasoning, refactoring | Medium |
| Qwen 2.5 Coder 14B | 16 GB | Best overall coding model | Medium |
Sweet spot: Qwen 2.5 Coder 7B for autocomplete + DeepSeek Coder V2 16B for chat gives you the best balance of speed and quality on a 16GB machine.
Troubleshooting
Autocomplete is slow: Use a smaller model (7B instead of 14B) for autocomplete. The tab completion model needs to respond in under 500ms to feel smooth.
Ollama not detected: Make sure Ollama is running (`ollama serve` in a terminal). Check that port 11434 is accessible.
Poor code suggestions: Try a coding-specific model. General models like Llama 3.1 are fine for chat but coding models like Qwen 2.5 Coder produce much better code completions.
Out of memory: Close other applications. Each model uses RAM proportional to its size — a 7B model needs roughly 5-6GB, a 14B model needs about 10-12GB.
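Those RAM figures come from the quantized model weights plus working memory. A rough back-of-the-envelope estimate, assuming Ollama's default ~4.5-bit quantization and a fixed allowance for the KV cache and runtime (all numbers approximate; larger context windows push real usage higher):

```python
def estimated_ram_gb(params_billion: float, bits_per_weight: float = 4.5,
                     overhead_gb: float = 1.5) -> float:
    """Rough RAM estimate for a quantized model.

    weights = parameters * bits / 8 bytes, plus a fixed overhead for
    the KV cache, context buffers, and the runtime itself.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return round(weights_gb + overhead_gb, 1)

# estimated_ram_gb(7)  -> roughly 5.4 GB, matching the 5-6 GB figure above
# estimated_ram_gb(14) -> roughly 9.4 GB before any extra context buffers
```

The gap between this estimate and the 10-12 GB quoted for 14B models is mostly context: the KV cache grows with the context window you configure.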