🆚

AI Model Comparison 2026

Compare GPT-5, Claude 4.7, Gemini 2.5, Llama 4, and DeepSeek by context window, input/output price, modality (text/vision/audio/video), and thinking mode. Table updated 2026-05.

💰 Cheapest

GPT-4o mini

$0.15 / $0.60 per 1M tokens (input / output) · OpenAI

📦 Largest context

Gemini 2.5 Pro

2M tokens · Google

Prices updated: 2026-05. Check the official pricing pages before committing a budget.

Model	Context (tokens)	Max output (tokens)	Input $/1M tokens	Output $/1M tokens	Modality	Notes
Claude Opus 4.7 · Anthropic · ⚡ thinking	1M	64K	$15	$75	📝 👁	Top choice for coding and complex reasoning
Claude Sonnet 4.6 · Anthropic · ⚡ thinking	1M	64K	$3	$15	📝 👁	Best price/quality balance
Claude Haiku 4.5 · Anthropic	200K	8K	$0.8	$4	📝 👁	Fast and cheap; suited to high-throughput workloads
GPT-5 · OpenAI · ⚡ thinking	400K	16K	$5	$20	📝 👁 🎙	General-purpose, with a thinking mode
GPT-4o · OpenAI	128K	16K	$2.5	$10	📝 👁 🎙	Natively multimodal, fast responses
GPT-4o mini · OpenAI	128K	16K	$0.15	$0.6	📝 👁	Cheapest OpenAI model
o3 · OpenAI · ⚡ thinking	200K	100K	$10	$40	📝 👁	Reasoning model, strong at math and code
Gemini 2.5 Pro · Google · ⚡ thinking	2M	64K	$1.25	$10	📝 👁 🎙 🎥	2M context, the largest in this table
Gemini 2.5 Flash · Google	1M	64K	$0.3	$2.5	📝 👁 🎙 🎥	Full multimodality at a low price
Llama 3.3 70B · Meta	128K	8K	$0.6	$0.8	📝	Open weights, can be self-hosted
Llama 4 Maverick · Meta	256K	8K	$0.27	$0.85	📝 👁	MoE, released in 2025
DeepSeek V3 · DeepSeek	128K	8K	$0.27	$1.1	📝	Good at coding, very cheap
DeepSeek R1 · DeepSeek · ⚡ thinking	128K	32K	$0.55	$2.19	📝	Open-weights reasoning model
Grok 3 · xAI	256K	8K	$3	$15	📝 👁	Real-time web access
Mistral Large 2 · Mistral	128K	8K	$2	$6	📝	EU-hosted, GDPR-friendly
Qwen 2.5 72B · Alibaba	128K	8K	$0.4	$1.2	📝 👁	Strong multilingual support, especially Chinese

📝 = text · 👁 = vision · 🎙 = audio · 🎥 = video
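
To make the per-1M-token prices concrete, here is a minimal Python sketch that turns them into a per-request cost. The three price pairs are copied from the table; the function name `request_cost` and the 2,000/500-token workload are illustrative assumptions, not part of the original tool.

```python
# Prices in USD per 1M tokens as (input, output), copied from the table.
PRICES = {
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-4o mini": (0.15, 0.60),
    "Gemini 2.5 Flash": (0.30, 2.50),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-1M-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 2,000 input tokens and 500 output tokens per request.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 2_000, 500):.6f} per request")
```

Multiplying the per-request figure by the expected request volume gives a rough monthly budget, which should still be checked against the providers' official pricing.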

How to choose a model

Simple, high-throughput tasks (classification, extraction, FAQ chatbots): pick one of the cheapest models, such as Claude Haiku 4.5, GPT-4o mini, Gemini 2.5 Flash, or DeepSeek V3.
Coding and complex reasoning: Claude Opus 4.7 or o3 (both have a thinking mode).
Long documents (books, codebases): Gemini 2.5 Pro (2M context) or Claude Opus 4.7 / Sonnet 4.6 (1M context).
Native multimodality (audio + video): Gemini 2.5, the only family in the table that handles all four modalities.
Self-hosted / on-prem: Llama 4 or DeepSeek (open weights, can run locally).
EU/GDPR compliance: Mistral (hosted in the EU).
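
The rules above amount to filtering on hard requirements and then sorting by price. The following Python sketch illustrates that idea on a subset of the table. The `Model` class, the `shortlist` function, and the choice to rank by input price first are assumptions made for illustration; the open-weights flags follow the self-hosting recommendation above.

```python
from dataclasses import dataclass

ALL_FOUR = frozenset({"text", "vision", "audio", "video"})


@dataclass(frozen=True)
class Model:
    name: str
    context_k: int          # context window, in thousands of tokens
    input_price: float      # USD per 1M input tokens
    output_price: float     # USD per 1M output tokens
    modalities: frozenset
    open_weights: bool = False


# A subset of the table rows; values are copied from the table.
MODELS = [
    Model("Claude Opus 4.7", 1000, 15.0, 75.0, frozenset({"text", "vision"})),
    Model("GPT-4o mini", 128, 0.15, 0.60, frozenset({"text", "vision"})),
    Model("Gemini 2.5 Pro", 2000, 1.25, 10.0, ALL_FOUR),
    Model("Gemini 2.5 Flash", 1000, 0.30, 2.50, ALL_FOUR),
    Model("DeepSeek V3", 128, 0.27, 1.10, frozenset({"text"}), open_weights=True),
    Model("Llama 4 Maverick", 256, 0.27, 0.85, frozenset({"text", "vision"}), open_weights=True),
]


def shortlist(min_context_k=0, needs=frozenset(), open_weights_only=False):
    """Return models that meet every hard requirement, cheapest first."""
    matches = [
        m for m in MODELS
        if m.context_k >= min_context_k
        and set(needs) <= m.modalities
        and (m.open_weights or not open_weights_only)
    ]
    # Rank by input price, breaking ties on output price (an arbitrary choice).
    return sorted(matches, key=lambda m: (m.input_price, m.output_price))


# Example: models that accept both audio and video input.
print([m.name for m in shortlist(needs={"audio", "video"})])
```

Quality-driven choices such as "best for complex reasoning" do not reduce to a filter like this and still need benchmarks or a trial on your own data.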

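How many times more expensive is output than input?

The output/input price ratio is typically 4-5×, although in the table above it ranges from about 1.3× (Llama 3.3 70B) to about 8× (Gemini 2.5). At 4-5×, a 1,000-token prompt with a 1,000-token response means the response accounts for roughly 80-83% of the cost. Cost-saving tip: ask the model to answer concisely (for example, "respond in <100 words").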
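
The arithmetic can be checked with a short Python sketch. The function name `output_share` is an assumption for illustration; the prices are the GPT-4o and Claude Sonnet 4.6 rows from the table.

```python
def output_share(input_price: float, output_price: float,
                 input_tokens: int, output_tokens: int) -> float:
    """Return the fraction of a request's cost that comes from output tokens.

    Prices are in USD per 1M tokens; the unit cancels out in the ratio.
    """
    input_cost = input_tokens * input_price
    output_cost = output_tokens * output_price
    return output_cost / (input_cost + output_cost)


# GPT-4o ($2.5 in, $10 out: ratio 4x), 1,000-token prompt and response.
print(f"GPT-4o, 1000 in / 1000 out: {output_share(2.5, 10, 1000, 1000):.1%}")

# Claude Sonnet 4.6 ($3 in, $15 out: ratio 5x), same token counts.
print(f"Sonnet 4.6, 1000 in / 1000 out: {output_share(3, 15, 1000, 1000):.1%}")

# Same GPT-4o prompt with the response capped at about 100 tokens.
print(f"GPT-4o, 1000 in / 100 out: {output_share(2.5, 10, 1000, 100):.1%}")
```

With equal token counts the share depends only on the price ratio r, as r / (1 + r), so shortening the response is the most direct lever on cost.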