AI Model Comparison 2026
Compare GPT-5, Claude 4.7, Gemini 2.5, Llama 4, and DeepSeek by context window, input/output pricing, modality (text/vision/audio/video), and thinking mode. Updated 2026-05.
- 💰 Cheapest: GPT-4o mini — $0.15 / $0.60 per 1M (OpenAI)
- 📦 Largest context: Gemini 2.5 Pro — 2M tokens (Google)
Prices updated 2026-05. Verify on provider pages before committing budget.
| Model | Provider | Context | Max output | Input $/1M | Output $/1M | Modality |
|---|---|---|---|---|---|---|
| Claude Opus 4.7 ⚡ | Anthropic | 1M | 64K | $15 | $75 | 📝 👁 |
| Claude Sonnet 4.6 ⚡ | Anthropic | 1M | 64K | $3 | $15 | 📝 👁 |
| Claude Haiku 4.5 | Anthropic | 200K | 8K | $0.80 | $4 | 📝 👁 |
| GPT-5 ⚡ | OpenAI | 400K | 16K | $5 | $20 | 📝 👁 🎙 |
| GPT-4o | OpenAI | 128K | 16K | $2.50 | $10 | 📝 👁 🎙 |
| GPT-4o mini | OpenAI | 128K | 16K | $0.15 | $0.60 | 📝 👁 |
| o3 ⚡ | OpenAI | 200K | 100K | $10 | $40 | 📝 👁 |
| Gemini 2.5 Pro ⚡ | Google | 2M | 64K | $1.25 | $10 | 📝 👁 🎙 🎥 |
| Gemini 2.5 Flash | Google | 1M | 64K | $0.30 | $2.50 | 📝 👁 🎙 🎥 |
| Llama 3.3 70B | Meta | 128K | 8K | $0.60 | $0.80 | 📝 |
| Llama 4 Maverick | Meta | 256K | 8K | $0.27 | $0.85 | 📝 👁 |
| DeepSeek V3 | DeepSeek | 128K | 8K | $0.27 | $1.10 | 📝 |
| DeepSeek R1 ⚡ | DeepSeek | 128K | 32K | $0.55 | $2.19 | 📝 |
| Grok 3 | xAI | 256K | 8K | $3 | $15 | 📝 👁 |
| Mistral Large 2 | Mistral | 128K | 8K | $2 | $6 | 📝 |
| Qwen 2.5 72B | Alibaba | 128K | 8K | $0.40 | $1.20 | 📝 👁 |
📝 = text · 👁 = vision · 🎙 = audio · 🎥 = video · ⚡ = thinking mode
How to pick a model
- High-throughput simple tasks (classification, extraction, FAQ bots): pick the cheapest — Haiku 4.5, GPT-4o mini, Gemini Flash, DeepSeek V3.
- Complex coding & reasoning: Claude Opus 4.7 or o3 (thinking mode).
- Long documents (books, full codebases): Gemini 2.5 Pro (2M context) or Claude (1M).
- Native multimodal incl. audio + video: Gemini 2.5 — the only family here that handles all four modalities.
- Self-host / on-prem: Llama 4, DeepSeek (open weights).
- EU/GDPR compliance: Mistral (EU-hosted).
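The decision list above can be sketched as a lookup table. The model names and rationale come from this page; the task categories and the routing function itself are illustrative assumptions, not a real API.

```python
# Hypothetical task-to-model routing based on the guidance above.
# Category keys are made up for this sketch; model names come from the table.
RECOMMENDATIONS = {
    "high_throughput": ["Claude Haiku 4.5", "GPT-4o mini", "Gemini 2.5 Flash", "DeepSeek V3"],
    "coding_reasoning": ["Claude Opus 4.7", "o3"],
    "long_context": ["Gemini 2.5 Pro", "Claude Opus 4.7"],
    "audio_video": ["Gemini 2.5 Pro", "Gemini 2.5 Flash"],
    "self_host": ["Llama 4 Maverick", "DeepSeek V3"],
    "eu_hosting": ["Mistral Large 2"],
}

def pick(task: str) -> list[str]:
    # Fall back to a cheap general-purpose model for unlisted tasks.
    return RECOMMENDATIONS.get(task, ["GPT-4o mini"])

print(pick("coding_reasoning"))  # → ['Claude Opus 4.7', 'o3']
```

In practice, teams often start with the cheapest model that clears their quality bar and escalate to a thinking-mode model only for requests that fail.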
Output vs input cost
Output tokens usually cost 4-5× input tokens. For a 1k-input / 1k-output call at a 4× ratio, ~80% of the bill comes from the response. Tip: ask for terse replies ("respond in under 100 words").
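The arithmetic behind that 80% figure, using GPT-5's rates from the table ($5 in, $20 out — a 4× ratio):

```python
def call_cost(input_tokens: int, output_tokens: int,
              in_price: float, out_price: float) -> float:
    """Cost in dollars; prices are quoted per 1M tokens."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

total = call_cost(1_000, 1_000, in_price=5.0, out_price=20.0)
output_share = (1_000 / 1e6 * 20.0) / total

print(f"${total:.4f} total, {output_share:.0%} from output")
# → $0.0250 total, 80% from output
```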
Pricing notes
- Volume discounts (OpenAI Tier 4-5, Anthropic Enterprise).
- Prompt caching: 50-90% cheaper on repeated prefixes.
- Batch API: 50% off, up to 24h delay.
- Open-weights via DeepInfra / Together / Fireworks can be even cheaper.
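A rough sketch of how the discounts above stack. The exact rates vary by provider; this assumes 90% off cached input tokens and a flat 50% Batch API discount, both within the ranges quoted in the list.

```python
def effective_input_cost(input_tokens: int, cached_fraction: float,
                         in_price: float, cache_discount: float = 0.9,
                         batch: bool = True) -> float:
    """Input cost in dollars after caching and batch discounts (illustrative rates)."""
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    cost = (fresh * in_price + cached * in_price * (1 - cache_discount)) / 1e6
    return cost * 0.5 if batch else cost

# 100k-token prompt, 80% of it a cached shared prefix,
# at Claude Sonnet's input rate ($3/1M) from the table:
print(f"${effective_input_cost(100_000, 0.8, 3.0):.4f}")  # → $0.0420
```

Without either discount the same prompt would cost $0.30, so the two mechanisms together cut this (deliberately cache-friendly) workload by roughly 7×.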
Related tools
See all tools →
🎯
Token Counter
Accurate token count for ChatGPT, Claude, Gemini, Llama. Live input cost.
💰
API Cost Calculator
Estimate monthly/yearly LLM API spend. Compare which model is cheapest.
✨
Prompt Builder
Compose well-structured prompts. 6 templates for common tasks.
📜
Markdown Preview
Render markdown live — paste ChatGPT/Claude output. GFM, tables, code blocks.