Output tokens cost 3-8x more than input. Here's how to optimize your AI spend.
AI model pricing is confusing by design. Every provider uses different units, different tiers, and different surcharges. Input tokens, output tokens, cache reads, cache writes, reasoning tokens, image tokens — the bill can surprise you. We break down exactly what you're paying for and how to minimize costs without sacrificing quality.
Every AI API charges per token — roughly 0.75 words per token in English. You're charged separately for input tokens (your prompt) and output tokens (the model's response). The critical insight: output tokens are ALWAYS more expensive than input tokens, typically by a factor of 3-8x.
This means the biggest lever for cost optimization is controlling output length. A model that generates 500 tokens of focused output costs dramatically less than one that generates 2,000 tokens of verbose output, even if the input prompt is identical.
Some providers also charge for 'reasoning tokens' — intermediate thinking steps that reasoning models generate but don't show you. These can be substantial: a complex math problem might generate 5,000 reasoning tokens before producing a 100-token answer.
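The billing mechanics above reduce to simple arithmetic. Here's a sketch of a per-request cost calculator using illustrative rates in the same ballpark as this article's frontier pricing (the specific numbers are assumptions; check your provider's rate card). Note how output length, not input length, dominates the bill:

```python
# Hypothetical pricing in $ per million tokens -- adjust to your provider.
PRICE_IN = 2.50    # input rate
PRICE_OUT = 15.00  # output rate (6x input here; typically 3-8x)

def request_cost(input_tokens, output_tokens, reasoning_tokens=0):
    """Cost of one API call in dollars.

    Reasoning tokens are hidden thinking steps; providers that bill
    them generally charge the output rate.
    """
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * PRICE_IN + billed_output * PRICE_OUT) / 1_000_000

# Identical 1,000-token prompt; concise vs. verbose response.
concise = request_cost(1_000, 500)    # $0.0100
verbose = request_cost(1_000, 2_000)  # $0.0325 -- 3.25x the cost

# A reasoning model: 5,000 hidden thinking tokens before a 100-token answer.
reasoned = request_cost(1_000, 100, reasoning_tokens=5_000)  # $0.079
```

The verbose call costs over three times the concise one despite an identical prompt, and the reasoning call is dominated by tokens you never see.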
At the bottom of the price range: Gemma 3n at $0.02/$0.07 (input/output) per million tokens. This Google model costs pennies for light workloads. Its intelligence score is 6.4, though, so it's only suitable for simple classification and extraction tasks.
Mid-range: GPT-5.4 Nano at $0.20/$1.25 is the cheapest model from OpenAI with meaningful intelligence (44.4). DeepSeek V3.2 at $0.28/$0.42 is even cheaper for output and competitive on benchmarks.
Frontier: Gemini 3.1 Pro at $2/$12 and GPT-5.4 at $2.50/$15 are the cheapest top-tier models. Claude Opus 4.6 at $5/$25 is the most expensive frontier model.
The 2.5x gap between Gemini 3.1 Pro's input price ($2) and Claude Opus 4.6's ($5) isn't justified by a 4-point intelligence difference (57.2 vs 53.0) for most tasks.
If your prompts share a common system prompt or context prefix, prompt caching can save 60-90%. Instead of processing the same 10,000-token system prompt every request, the provider caches it after the first call.
Anthropic's cache read costs $0.50/1M (vs $5 standard input for Opus 4.6) — a 90% discount. OpenAI's cache read is similarly discounted. Google's is the cheapest at $0.20/1M for Gemini.
For applications like customer support bots or code assistants that use the same system prompt for every request, caching is the single biggest cost optimization available.
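To see why caching matters so much at scale, here's a rough monthly-cost sketch for a support bot with a large shared system prompt, using the Opus 4.6 rates quoted above. It's a simplification: it covers input tokens only and ignores any cache-write surcharge your provider may add.

```python
# Rates from the article: Opus 4.6 standard input $5/1M, cache read $0.50/1M.
STANDARD_IN = 5.00
CACHE_READ = 0.50

def monthly_input_cost(requests, system_tokens, user_tokens, cached):
    """Dollars per month for input tokens only (output billed separately)."""
    per_request = user_tokens * STANDARD_IN  # the user turn is never cached
    per_request += system_tokens * (CACHE_READ if cached else STANDARD_IN)
    return requests * per_request / 1_000_000

# Support bot: 100k requests/month, 10,000-token system prompt, 200-token queries.
uncached = monthly_input_cost(100_000, 10_000, 200, cached=False)  # $5,100
cached = monthly_input_cost(100_000, 10_000, 200, cached=True)     # $600
savings = 1 - cached / uncached  # ~88% off the input bill
```

The bigger the shared prefix relative to the per-request text, the closer the savings approach the full 90% cache discount.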
A useful metric: how much does each point of intelligence cost?
GLM-5: $1/1M input for 49.8 intelligence = $0.020 per intelligence point. DeepSeek V3.2: $0.28/1M for 41.7 = $0.007 per point. GPT-5.4: $2.50/1M for 57.2 = $0.044 per point. Claude Opus 4.6: $5/1M for 53.0 = $0.094 per point.
DeepSeek offers the best intelligence-per-dollar at the frontier level. GLM-5 is the best in the near-frontier tier. Claude Opus 4.6 is the least cost-efficient by this metric — you're paying a premium for its agentic capabilities, not raw benchmark performance.
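The metric above is easy to compute for any model you're considering; this snippet reproduces the article's figures and ranks the four models by cost efficiency:

```python
# Input price ($/1M tokens) and intelligence score, as given in the article.
MODELS = {
    "GLM-5":           (1.00, 49.8),
    "DeepSeek V3.2":   (0.28, 41.7),
    "GPT-5.4":         (2.50, 57.2),
    "Claude Opus 4.6": (5.00, 53.0),
}

def cost_per_point(price, score):
    """Dollars of input price paid per point of intelligence."""
    return price / score

# Rank from most to least cost-efficient.
ranking = sorted(MODELS, key=lambda m: cost_per_point(*MODELS[m]))
for name in ranking:
    print(f"{name:16s} ${cost_per_point(*MODELS[name]):.3f}/point")
# DeepSeek V3.2    $0.007/point
# GLM-5            $0.020/point
# GPT-5.4          $0.044/point
# Claude Opus 4.6  $0.094/point
```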
The most effective strategies, in order of impact:
1. Use the right model for the task. Don't send simple classification tasks to GPT-5.4 when GPT-5.4 Nano handles them fine. A tiered routing system that sends easy tasks to cheap models and hard tasks to expensive ones can cut costs 60-80%.
2. Control output length. Set max_tokens appropriately. Use system prompts that instruct concise responses. Every unnecessary paragraph costs real money at scale.
3. Enable prompt caching. If you use a system prompt longer than 1,000 tokens, caching pays for itself immediately.
4. Batch requests. Most providers offer batch APIs at 50% discount for non-real-time workloads.
5. Monitor and alert. Set up spending alerts. A bug that loops API calls can burn through budget fast.
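Strategies 1 and 2 can be combined in a few lines. Below is a minimal routing sketch: the model names match this article's examples, but the task-type labels, difficulty heuristic, and request shape are illustrative assumptions, not any provider's real API.

```python
# Hypothetical tiered router: cheap model for easy tasks, frontier for the rest.
CHEAP, FRONTIER = "gpt-5.4-nano", "gpt-5.4"

def pick_model(task_type: str, prompt: str) -> str:
    """Strategy 1: route by task difficulty (heuristic is an assumption)."""
    easy = {"classify", "extract", "translate"}
    if task_type in easy and len(prompt) < 4_000:
        return CHEAP
    return FRONTIER

def request_params(task_type: str, prompt: str) -> dict:
    """Strategy 2: cap max_tokens so verbose output can't run up the bill."""
    return {
        "model": pick_model(task_type, prompt),
        "max_tokens": 256 if task_type in {"classify", "extract"} else 2_048,
        "messages": [{"role": "user", "content": prompt}],
    }
```

In production you'd tune the routing rules against your own accuracy data; the point is that even a crude static router captures most of the savings.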
All pricing data from Artificial Analysis, updated daily. Cost analysis assumes standard API tier pricing without volume discounts. Token estimates based on English text at approximately 0.75 words per token.
For frontier quality, Gemini 3.1 Pro ($2/$12) offers the best price-to-performance ratio. For budget work, DeepSeek V3.2 ($0.28/$0.42) is unbeatable. For cost optimization, prompt caching and model routing have the highest impact. The days of paying $15/1M for input tokens are over — if you're still on Claude Opus 4.5's old pricing, switch.
Published April 1, 2026. Data updated daily from independent benchmarks and API providers.