AI Model Cheat Sheet
Every major model's price, context window, and sweet spot — on one page you can bookmark. Last verified July 2026 · printable (Ctrl+P) · updated monthly alongside our price tracker.
| Model | Provider | Input $/1M | Output $/1M | Context | Max output | Best for |
|---|---|---|---|---|---|---|
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M | 64K | Bulk classification, tagging, cheapest possible calls | |
| GPT-5.4 Nano | OpenAI | $0.20 | $1.25 | 400K* | 128K* | High-volume simple tasks in the OpenAI stack |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M | 64K | Budget chat & summarization with big context | |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200K | 64K | Fast production workloads needing Claude quality |
| GPT-5.6 Luna | OpenAI | $1.00 | $6.00 | 400K* | 128K* | Value tier of OpenAI's newest family |
| Gemini 3.5 Flash | $1.50 | $9.00 | 1M | 65K | Speed/intelligence balance, agentic browsing | |
| Gemini 3.1 Pro | $2.00 | $12.00 | 500K | 64K | Strong reasoning at mid-tier price | |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | 400K* | 128K* | Mainstream OpenAI production default |
| GPT-5.6 Terra | OpenAI | $2.50 | $15.00 | 400K* | 128K* | Newest mainstream flagship |
| Claude Sonnet 5 | Anthropic | $3.00† | $15.00† | 1M | 128K | Coding & agents near-frontier quality at flagship price |
| Claude Opus 4.8 | Anthropic | $5.00 | $25.00 | 1M | 128K | Long-horizon agents, hard coding, knowledge work |
| GPT-5.5 | OpenAI | $5.00 | $30.00 | 400K* | 128K* | OpenAI's frontier reasoning |
| GPT-5.6 Sol | OpenAI | $5.00 | $30.00 | 400K* | 128K* | Top of the GPT-5.6 family |
| Claude Fable 5 | Anthropic | $10.00 | $50.00 | 1M | 128K | The hardest reasoning and longest autonomous runs |
Standard-tier list prices, July 2026. †Claude Sonnet 5 intro rate $2/$10 through Aug 31, 2026. *GPT-5.x figures per the GPT-5 family's published specs — confirm on OpenAI's model page for the exact variant. All listed models support vision input, prompt caching (~90% off repeated input), and batch (~50% off).
Quick picks — skip the analysis
Reading the spec sheet
- Context window = the model's working memory per request (your prompt + documents + history + its answer). 200K ≈ a 300-page book; 1M ≈ five of them.
- Max output = the longest single response. You pay output rates for it, so long outputs dominate cost in drafting/codegen workloads.
- Filling the whole context costs real money: 1M input tokens on a $3/1M model = $3 per request — caching is what makes big-context workflows affordable.
- Prices move ~monthly. This page and the price tracker are verified together — the date at the top is your freshness guarantee.
Frequently asked questions
Which single model should I default to?
For most production work in mid-2026: a flagship-tier model (Claude Sonnet 5, GPT-5.6 Terra, or Gemini 3.1 Pro) as the default, a budget model for easy traffic, and a frontier model only where quality visibly pays for itself.
Are benchmark scores on this page?
Deliberately not — leaderboard positions shuffle monthly and rarely predict your task. Price, context, and output limits are the stable facts; test your top 2–3 candidates on your own data.
How do I estimate my cost with these numbers?
Tokens ÷ 1,000,000 × price. Or skip the arithmetic: the AI cost calculator and token calculator do it live.
Can I print or save this?
Yes — Ctrl+P gives a clean printable version (navigation is stripped automatically). Bookmark the page for the monthly-updated version.