# How to Run Qwen Locally — Alibaba's Powerful Multilingual Model
Run Qwen 2.5 models on your own computer — one of the best open model families for coding, multilingual tasks, and general use. The smaller sizes run on as little as 4GB RAM; the recommended 7B model needs 8GB or more.
Qwen 2.5 is Alibaba's family of large language models. They're particularly strong at coding, multilingual tasks (especially Chinese), and general reasoning — making them some of the best open models available for local use.
## Why Qwen?
- Exceptional at coding — often outperforms similarly sized Llama models on programming tasks
- Multilingual — strong Chinese and English support, plus 20+ other languages
- Multiple sizes — from 0.5B to 72B parameters
- Open weights — free to download and run locally
## Available Qwen 2.5 Models
| Model | Size (Q4) | Min RAM | Best For |
|---|---|---|---|
| Qwen 2.5 0.5B | ~0.4 GB | 4 GB | Ultra-light tasks |
| Qwen 2.5 1.5B | ~1.0 GB | 4 GB | Basic tasks |
| Qwen 2.5 3B | ~2.0 GB | 4 GB | Good balance, low RAM |
| Qwen 2.5 7B | ~4.7 GB | 8 GB | Coding, multilingual |
| Qwen 2.5 14B | ~9.0 GB | 16 GB | High quality, all tasks |
| Qwen 2.5 32B | ~20 GB | 32 GB | Professional use |
| Qwen 2.5 72B | ~42 GB | 64 GB | Maximum quality |
**Recommendation:** Qwen 2.5 7B is the best starting point for devices with 8GB RAM.
## Quick Start with Ollama

```bash
# Install Ollama if you haven't
curl -fsSL https://ollama.com/install.sh | sh

# Run Qwen 2.5 7B
ollama run qwen2.5:7b

# Or try other sizes
ollama run qwen2.5:3b    # For 4GB RAM
ollama run qwen2.5:14b   # For 16GB RAM
```

## Quick Start with LM Studio
- Download LM Studio
- Search for "qwen 2.5" in the model browser
- Download the 7B Q4_K_M version
- Start chatting
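Once a model is running, Ollama also exposes a local HTTP API (port 11434 by default), so you can call Qwen from your own scripts. Here is a minimal sketch using only the Python standard library — the endpoint and model tag assume a default Ollama install with `qwen2.5:7b` pulled:

```python
import json
import urllib.request

# Default endpoint for a local Ollama install
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "qwen2.5:7b") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    # stream=False asks Ollama to return one complete JSON response
    return {"model": model, "prompt": prompt, "stream": False}

def ask_qwen(prompt: str, model: str = "qwen2.5:7b") -> str:
    """Send a prompt to the local Ollama server and return the reply text."""
    data = json.dumps(build_payload(prompt, model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires Ollama running locally):
# print(ask_qwen("Explain list comprehensions in one sentence."))
```

The same pattern works for any model tag from the table above — just swap the `model` argument.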
## What Qwen Excels At
### Coding
Qwen 2.5 is one of the best open models for programming:
>>> Write a Python FastAPI endpoint that accepts a file upload,
validates it's a PDF under 10MB, and stores it in S3.
```python
from fastapi import FastAPI, UploadFile, HTTPException
import boto3
from botocore.exceptions import ClientError

app = FastAPI()
s3 = boto3.client("s3")
MAX_SIZE = 10 * 1024 * 1024  # 10MB

@app.post("/upload")
async def upload_pdf(file: UploadFile):
    if not file.filename.endswith(".pdf"):
        raise HTTPException(400, "Only PDF files are accepted")
    content = await file.read()
    if len(content) > MAX_SIZE:
        raise HTTPException(400, "File must be under 10MB")
    try:
        s3.put_object(
            Bucket="my-uploads",
            Key=file.filename,
            Body=content,
            ContentType="application/pdf",
        )
    except ClientError:
        raise HTTPException(500, "Upload failed")
    return {"filename": file.filename, "size": len(content)}
```
### Multilingual
Qwen handles multiple languages well:
>>> Translate to Chinese: "Running AI locally protects your privacy."
在本地运行 AI 可以保护您的隐私。
### General Reasoning
>>> Explain the difference between SQL JOIN types

- INNER JOIN: Returns only rows that match in both tables.
- LEFT JOIN: Returns all rows from the left table, plus matching rows from the right.
- RIGHT JOIN: Returns all rows from the right table, plus matching rows from the left.
- FULL OUTER JOIN: Returns all rows from both tables, matching where possible.
- CROSS JOIN: Returns every combination of rows from both tables.
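To see the INNER vs LEFT distinction concretely, a quick session with Python's built-in `sqlite3` module works well — the table names and rows below are made up purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT);
    INSERT INTO users VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 1, 'laptop');  -- only Ada has an order
""")

# INNER JOIN: only users with a matching order appear
inner = conn.execute(
    "SELECT u.name, o.item FROM users u "
    "JOIN orders o ON o.user_id = u.id ORDER BY u.id"
).fetchall()

# LEFT JOIN: every user appears; unmatched rows get NULL (None) on the right
left = conn.execute(
    "SELECT u.name, o.item FROM users u "
    "LEFT JOIN orders o ON o.user_id = u.id ORDER BY u.id"
).fetchall()

print(inner)  # [('Ada', 'laptop')]
print(left)   # [('Ada', 'laptop'), ('Bob', None)]
```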
## Performance Benchmarks (Q4_K_M)
On an M2 MacBook Pro (16GB):
| Model | Tokens/sec | First token | RAM usage |
|-------|-----------|-------------|-----------|
| Qwen 2.5 3B | ~45 | ~0.3s | ~3 GB |
| Qwen 2.5 7B | ~28 | ~0.5s | ~6 GB |
| Qwen 2.5 14B | ~14 | ~1.0s | ~11 GB |
On an RTX 4090:
| Model | Tokens/sec | First token | VRAM usage |
|-------|-----------|-------------|-----------|
| Qwen 2.5 7B | ~90 | ~0.2s | ~5 GB |
| Qwen 2.5 14B | ~55 | ~0.3s | ~10 GB |
| Qwen 2.5 32B | ~25 | ~0.6s | ~20 GB |
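Your numbers will vary with hardware, quantization, and context length, but you can measure tokens/sec yourself: Ollama's non-streaming `/api/generate` response includes `eval_count` (tokens generated) and `eval_duration` (in nanoseconds), so the rate is a one-line calculation. A sketch, assuming those response fields:

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed from the metrics in an Ollama /api/generate response.

    eval_count: number of tokens generated
    eval_duration_ns: generation wall time in nanoseconds
    """
    return eval_count / (eval_duration_ns / 1e9)

# Example: 280 tokens generated over 10 seconds of eval time
print(tokens_per_second(280, 10_000_000_000))  # 28.0
```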
## When to Use Qwen vs Other Models
| Need | Best Model |
|------|-----------|
| Coding tasks | Qwen 2.5 7B+ or DeepSeek R1 |
| Chinese language | Qwen 2.5 (best choice) |
| General chat | Llama 3.1 or Qwen 2.5 |
| Reasoning/math | DeepSeek R1 |
| Fast responses | Llama 3.2 3B |
| Low RAM (4GB) | Qwen 2.5 3B |
## Summary
Qwen 2.5 is an excellent choice for local AI, especially if you work with code or multiple languages. The 7B model fits on 8GB RAM devices and delivers strong performance for daily tasks.
## Next Steps
- [Best Models for 8GB RAM](/blog/models-for-8gb-ram) — compare Qwen with other models
- [How to Run Llama Locally](/blog/how-to-run-llama-locally) — compare with Meta's Llama
- [How to Run DeepSeek Locally](/blog/how-to-run-deepseek-locally) — another top model