AI Model Cheat Sheet

Every major model's price, context window, and sweet spot — on one page you can bookmark. Last verified July 2026 · printable (Ctrl+P) · updated monthly alongside our price tracker.

Model	Provider	Input $/1M	Output $/1M	Context	Max output	Best for
Gemini 2.5 Flash-Lite	Google	$0.10	$0.40	1M	64K	Bulk classification, tagging, cheapest possible calls
GPT-5.4 Nano	OpenAI	$0.20	$1.25	400K*	128K*	High-volume simple tasks in the OpenAI stack
Gemini 2.5 Flash	Google	$0.30	$2.50	1M	64K	Budget chat & summarization with big context
Claude Haiku 4.5	Anthropic	$1.00	$5.00	200K	64K	Fast production workloads needing Claude quality
GPT-5.6 Luna	OpenAI	$1.00	$6.00	400K*	128K*	Value tier of OpenAI's newest family
Gemini 3.5 Flash	Google	$1.50	$9.00	1M	64K	Speed/intelligence balance, agentic browsing
Gemini 3.1 Pro	Google	$2.00	$12.00	500K	64K	Strong reasoning at mid-tier price
GPT-5.4	OpenAI	$2.50	$15.00	400K*	128K*	Mainstream OpenAI production default
GPT-5.6 Terra	OpenAI	$2.50	$15.00	400K*	128K*	Newest mainstream flagship
Claude Sonnet 5	Anthropic	$3.00†	$15.00†	1M	128K	Coding and agents; near-frontier quality at flagship price
Claude Opus 4.8	Anthropic	$5.00	$25.00	1M	128K	Long-horizon agents, hard coding, knowledge work
GPT-5.5	OpenAI	$5.00	$30.00	400K*	128K*	OpenAI's frontier reasoning
GPT-5.6 Sol	OpenAI	$5.00	$30.00	400K*	128K*	Top of the GPT-5.6 family
Claude Fable 5	Anthropic	$10.00	$50.00	1M	128K	The hardest reasoning and longest autonomous runs

Standard-tier list prices, July 2026. †Claude Sonnet 5 intro rate $2/$10 through Aug 31, 2026. *GPT-5.x context and max-output figures follow the GPT-5 family's published specs; confirm on OpenAI's model page for the exact variant. All listed models support vision input, prompt caching (~90% off repeated input), and batch (~50% off).

Quick picks — skip the analysis

Cheapest that's still good: Gemini 2.5 Flash-Lite ($0.10/$0.40) for classification and extraction; GPT-5.4 Nano if you're already on OpenAI.

Best value flagship: Claude Sonnet 5 — near-frontier coding/agents at $3/$15 (and $2/$10 until Aug 31, 2026).

Biggest context window: Claude's 1M-token models (Fable 5, Opus 4.8, Sonnet 5) and Gemini's 1M Flash tier — roughly 1,500 pages of text in one request.

Longest single output: the 128K-output models (Claude Sonnet 5, Opus 4.8, and Fable 5; GPT-5.x), enough for a whole report or codebase-sized diff in one call.

Hardest problems, cost no object: Claude Fable 5 ($10/$50) or GPT-5.6 Sol ($5/$30).

Chatbot at scale: route 80% of traffic to Haiku 4.5 / Gemini Flash, escalate the rest — model it in the chatbot cost simulator.
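
To make the routing idea concrete, here is a rough sketch of the blended cost. The escalation target (Claude Sonnet 5), the monthly volume (100,000 requests), and the per-request token counts (1,500 in, 300 out) are all assumptions chosen for illustration, as are the function names; only the prices come from the table.

```python
# USD per 1M tokens as (input, output), taken from the table on this page.
HAIKU_4_5 = (1.00, 5.00)
SONNET_5 = (3.00, 15.00)  # assumed escalation target; the page does not name one


def request_cost(prices, input_tokens, output_tokens):
    """List-price cost in USD of one request."""
    input_price, output_price = prices
    return (input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price)


def blended_monthly_cost(requests, cheap_share, cheap, strong,
                         input_tokens, output_tokens):
    """Monthly cost when cheap_share of traffic goes to the cheap model
    and the remainder is escalated to the strong one."""
    cheap_cost = request_cost(cheap, input_tokens, output_tokens)
    strong_cost = request_cost(strong, input_tokens, output_tokens)
    per_request = cheap_share * cheap_cost + (1 - cheap_share) * strong_cost
    return requests * per_request


# Hypothetical workload: 100,000 requests/month, 1,500 input + 300 output tokens each.
routed = blended_monthly_cost(100_000, 0.80, HAIKU_4_5, SONNET_5, 1_500, 300)
all_strong = blended_monthly_cost(100_000, 0.0, HAIKU_4_5, SONNET_5, 1_500, 300)
print(f"routed: ${routed:,.2f}   all Sonnet 5: ${all_strong:,.2f}")
```

Under these assumptions the routed setup comes to about $420 a month against about $900 for sending everything to the stronger model. The sketch ignores caching, batch discounts, and any extra tokens spent deciding when to escalate.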

Reading the spec sheet

Context window = the model's working memory per request (your prompt + documents + history + its answer). 200K ≈ a 300-page book; 1M ≈ five of them.
Max output = the longest single response. You pay output rates for it, so long outputs dominate cost in drafting/codegen workloads.
Filling the whole context costs real money: 1M input tokens on a $3/1M model = $3 per request — caching is what makes big-context workflows affordable.
Prices move ~monthly. This page and the price tracker are verified together — the date at the top is your freshness guarantee.
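
As a rough illustration of why caching matters for big-context requests, the sketch below prices 1M input tokens on a $3/1M model with and without a cache hit. It assumes the ~90% discount on repeated input quoted above, a hypothetical 95% cache-hit share, and no cache-write surcharge or storage fee, which a provider may charge; the function name and parameters are made up for this example.

```python
def input_cost(input_tokens, price_per_million,
               cached_fraction=0.0, cache_discount=0.90):
    """Input-side cost in USD for one request.

    cached_fraction: share of input tokens served from the prompt cache.
    cache_discount:  assumed discount on cached tokens (~90% per this page).
    """
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    fresh_cost = fresh / 1_000_000 * price_per_million
    cached_cost = cached / 1_000_000 * price_per_million * (1 - cache_discount)
    return fresh_cost + cached_cost


# 1M input tokens on a $3/1M model, as in the example above.
uncached = input_cost(1_000_000, 3.00)
mostly_cached = input_cost(1_000_000, 3.00, cached_fraction=0.95)
print(f"uncached: ${uncached:.2f}   95% cached: ${mostly_cached:.3f}")
```

With 95% of the prompt cached, the input cost drops from $3 to roughly $0.44 per request under these assumptions. Output tokens are billed separately and are not discounted by caching in this sketch.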

Frequently asked questions

Which single model should I default to?

For most production work in mid-2026: a flagship-tier model (Claude Sonnet 5, GPT-5.6 Terra, or Gemini 3.1 Pro) as the default, a budget model for easy traffic, and a frontier model only where quality visibly pays for itself.

Are benchmark scores on this page?

Deliberately not — leaderboard positions shuffle monthly and rarely predict your task. Price, context, and output limits are the stable facts; test your top 2–3 candidates on your own data.

How do I estimate my cost with these numbers?

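Per request: (input tokens ÷ 1,000,000 × input price) + (output tokens ÷ 1,000,000 × output price), then multiply by your request volume. Or skip the arithmetic: the AI cost calculator and token calculator do it live.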
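
For a quick offline estimate, the arithmetic can be sketched in a few lines. The prices are copied from the table on this page (standard tier, July 2026); the function name and the example token counts (20,000 in, 2,000 out) are illustrative assumptions, and the sketch ignores caching and batch discounts.

```python
# USD per 1M tokens as (input, output), copied from the table on this page.
PRICES = {
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 5": (3.00, 15.00),
}


def request_cost(model, input_tokens, output_tokens):
    """List-price cost in USD of one request: input and output are billed separately."""
    input_price, output_price = PRICES[model]
    return (input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price)


# Hypothetical request: 20,000 input tokens and 2,000 output tokens.
cost = request_cost("Claude Sonnet 5", 20_000, 2_000)
print(f"${cost:.4f} per request")
```

In this example the input and output sides cost about $0.06 and $0.03 respectively, so even a short answer accounts for a third of the bill; that is the output-rate effect described under "Reading the spec sheet".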

Can I print or save this?

Yes — Ctrl+P gives a clean printable version (navigation is stripped automatically). Bookmark the page for the monthly-updated version.