TokenMix Research Lab · 2026-04-29

Claude API Pricing 2026: Opus, Sonnet, Haiku Costs Compared
Last Updated: 2026-04-30 Author: TokenMix Research Lab Data checked: 2026-04-30
Claude API pricing in 2026 splits cleanly across three tiers: Opus 4.7 at $5/$25 per million input/output tokens, Sonnet 4.6 at $3/$15, and Haiku 4.5 at $1/$5. The newer 4.x models cost the same or less than the deprecated Opus 4.1 and Sonnet 3.7 they replaced.
According to Anthropic's official pricing page, Opus 4.7, 4.6, and 4.5 share identical per-token rates. Sonnet 4.6 has held at $3/$15 since Sonnet 4. Haiku 4.5 doubled the input price of Haiku 3 ($0.25 → $1) but tripled output capability, and it now matches Sonnet 3 on benchmarks like SWE-bench Verified. Cache reads stay at 0.1x base input, batch processing keeps a flat 50% discount, and Opus 4.7's new tokenizer can use up to 35% more tokens for the same English text — a real cost factor that the headline rate card hides.
Table of Contents
- Quick Answer
- Confirmed Facts vs Common Misreads
- Full Claude Pricing Table
- How Did Claude Pricing Change From 2025 to 2026?
- Opus, Sonnet, Haiku: Which Tier Should You Pick?
- Cost Examples Across Real Workloads
- Claude vs GPT-5.5 vs Gemini 3.1 Pro
- How to Cut Claude API Costs by 60-90%
- Hidden Cost Factors Most Posts Miss
- When Should You Use Bedrock or Vertex Instead?
- Final Recommendation
- FAQ
- Related Articles
- Sources
Quick Answer
| Question | Direct Answer |
|---|---|
| What does Claude Opus 4.7 cost per million tokens? | $5 input / $25 output (1M context window) |
| What does Claude Sonnet 4.6 cost? | $3 input / $15 output |
| What does Claude Haiku 4.5 cost? | $1 input / $5 output |
| Cheapest Claude model? | Haiku 3 at $0.25 / $1.25, but capabilities lag Haiku 4.5 |
| How much does prompt caching save? | 90% on cache reads (0.1x base input) |
| Batch API discount? | Flat 50% on input and output |
Confirmed Facts vs Common Misreads
| Claim | Status | Source |
|---|---|---|
| Opus 4.7 = $5/$25 per MTok | Confirmed | platform.claude.com pricing table |
| Opus 4.6 and Opus 4.7 share the same rate card | Confirmed | platform.claude.com pricing table |
| Opus 4.7 may use up to 35% more tokens than 4.6 for same text | Confirmed (caveat) | Anthropic docs note on new tokenizer |
| Sonnet 4.6 supports full 1M context at standard pricing | Confirmed | Long context pricing section |
| Cache hits cost 0.1x base input | Confirmed | Prompt caching multiplier table |
| 5-minute cache write costs 1.25x; 1-hour costs 2x | Confirmed | Anthropic prompt caching docs |
| Claude is always more expensive than OpenAI | False | GPT-5.5 standard is $5/$30; Opus 4.7 is $5/$25 (output cheaper) |
| Anthropic raised Claude prices in 2026 | False | Opus 4.7 launched at the same rate as Opus 4.6/4.5 |
| US-only data residency on Opus 4.7 adds 1.1x | Confirmed | Data residency pricing section |
Full Claude Pricing Table
Based on Anthropic's official pricing page (data checked 2026-04-30, all prices USD per 1M tokens):
| Model | Input | 5m Cache Write | 1h Cache Write | Cache Read | Output | Batch Input | Batch Output | Context |
|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.7 | $5 | $6.25 | $10 | $0.50 | $25 | $2.50 | $12.50 | 1M |
| Claude Opus 4.6 | $5 | $6.25 | $10 | $0.50 | $25 | $2.50 | $12.50 | 1M |
| Claude Opus 4.5 | $5 | $6.25 | $10 | $0.50 | $25 | $2.50 | $12.50 | 200K |
| Claude Opus 4.1 (deprecated) | $15 | $18.75 | $30 | $1.50 | $75 | $7.50 | $37.50 | 200K |
| Claude Sonnet 4.6 | $3 | $3.75 | $6 | $0.30 | $15 | $1.50 | $7.50 | 1M |
| Claude Sonnet 4.5 | $3 | $3.75 | $6 | $0.30 | $15 | $1.50 | $7.50 | 200K |
| Claude Sonnet 4 | $3 | $3.75 | $6 | $0.30 | $15 | $1.50 | $7.50 | 200K |
| Claude Sonnet 3.7 (deprecated) | $3 | $3.75 | $6 | $0.30 | $15 | $1.50 | $7.50 | 200K |
| Claude Haiku 4.5 | $1 | $1.25 | $2 | $0.10 | $5 | $0.50 | $2.50 | 200K |
| Claude Haiku 3.5 | $0.80 | $1 | $1.60 | $0.08 | $4 | $0.40 | $2 | 200K |
| Claude Haiku 3 | $0.25 | $0.30 | $0.50 | $0.03 | $1.25 | $0.125 | $0.625 | 200K |
Three things to take from this table. First, the 4.x generation collapsed Opus pricing from $15/$75 down to $5/$25 — that's a 67% input cut, 67% output cut versus Opus 4.1. Second, Sonnet has held steady at $3/$15 across four releases. Third, the Haiku tier has doubled in price between Haiku 3.5 and Haiku 4.5, but the capability jump is far larger than 25% — Anthropic is pulling Haiku upward into Sonnet 3 territory on coding benchmarks.
How Did Claude Pricing Change From 2025 to 2026?
Three structural shifts hit between mid-2025 and April 2026:
| Change | When | Impact |
|---|---|---|
| Opus 4.1 → Opus 4.5/4.6/4.7 rate cut | Late 2025 | Input/output rate dropped 3x ($15/$75 → $5/$25) |
| 1M context window opened at standard pricing | Sonnet 4.6, Opus 4.6, Opus 4.7 | No 2x premium beyond 200k like some competitors |
| Cache TTL options expanded | 2025-09 | 1-hour cache (2x write) added alongside 5-min (1.25x write) |
The deprecated tier matters for budgeting. According to Anthropic's deprecation policy, Sonnet 3.7 and Opus 3 are scheduled for retirement, which means workloads still pinned to those versions face forced migrations — and the migration target (Sonnet 4.x or Opus 4.x) costs the same or less, which is unusual for an SaaS API in this space.
Opus, Sonnet, Haiku: Which Tier Should You Pick?
Tier choice should follow the workload, not the brand. Here is the decision matrix we use internally at TokenMix.ai:
| Workload type | Recommended tier | Why |
|---|---|---|
| Production code generation, agentic coding | Sonnet 4.6 | Hits ~75% on SWE-bench Verified at 1/5 the Opus 4.7 price |
| Hard reasoning, multi-step research, novel domain | Opus 4.7 | Highest sustained accuracy but 5x the Sonnet output rate |
| Customer support classification, intent detection | Haiku 4.5 | $1/$5 with strong instruction-following |
| Bulk summarization, doc cleanup | Haiku 4.5 + Batch | Drops to $0.50/$2.50 with Batch API |
| Long-context ingestion (>200k) | Sonnet 4.6 | Only 4.x model with 1M at standard rates besides Opus 4.6/4.7 |
| Stateful agents with cacheable system prompts | Sonnet 4.6 + cache | Cache reads at $0.30/MTok make agent loops cheap |
A common mistake is reaching for Opus 4.7 by default. According to DataCamp's Opus 4.7 vs GPT-5.4 benchmark, Opus 4.7 leads Sonnet 4.6 by 2-4 percentage points on most reasoning benchmarks, but the price ratio is 5x. For 90% of production workloads, Sonnet 4.6 is the right call and Opus is reserved for the hard 10% where accuracy gains pay back the spend.
Cost Examples Across Real Workloads
Customer support chatbot — 10,000 tickets/month
Average ticket = 3,700 tokens (per Anthropic's customer support guide). Assume 80/20 input/output split.
| Model | Monthly cost | Per ticket |
|---|---|---|
| Haiku 4.5 | $37.00 | $0.0037 |
| Haiku 4.5 + 5-min cache (60% hit rate) | $19.20 | $0.0019 |
| Haiku 4.5 + Batch | $18.50 | $0.0019 |
| Sonnet 4.6 | $111.00 | $0.0111 |
| Opus 4.7 | $185.00 | $0.0185 |
Inferred: most support classification fits Haiku 4.5; the 90% input savings from cache compound when system prompts are reused across tickets.
Coding agent — 100,000 calls/month with reused tool schemas
Assume 8k input tokens (mostly tool schemas, repo context) and 1k output tokens per call. Tool schemas are a perfect prompt-caching candidate.
| Strategy | Sonnet 4.6 monthly | Opus 4.7 monthly |
|---|---|---|
| No cache | $3,900 | $6,500 |
| 5-min cache (75% hit rate) | $1,275 | $2,125 |
| 5-min cache + Batch (where applicable) | ~$650 | ~$1,065 |
ProjectDiscovery's published case study shows that moving dynamic content out of the cacheable prefix raised their hit rate from 7% to 74% in a single deployment. This is the single largest lever available.
Document summarization — 1M documents/month
Average document = 5k input tokens, output = 500 tokens. Latency-insensitive, perfect for Batch API.
| Model | Standard cost | Batch cost (50% off) |
|---|---|---|
| Haiku 4.5 | $7,500 | $3,750 |
| Sonnet 4.6 | $22,500 | $11,250 |
| Opus 4.7 | $37,500 | $18,750 |
For non-time-sensitive document workloads, Haiku 4.5 + Batch puts cost-per-document at ~$0.0038. That's the floor before you start considering self-hosted inference.
Code review pipeline — 5,000 PRs/month with 80k average context
Long-context PR review is where the 1M context window earns its keep.
| Model | Per-PR cost | Monthly cost |
|---|---|---|
| Sonnet 4.6 (no cache) | $0.27 | $1,350 |
| Sonnet 4.6 + 1-hour cache (repo context cached) | $0.094 | $470 |
| Opus 4.7 + 1-hour cache | $0.156 | $780 |
The 1-hour cache (2x write multiplier) breaks even after just two reads — a bar most code review pipelines clear easily.
Claude vs GPT-5.5 vs Gemini 3.1 Pro
Same workload (10M output tokens/month, agent-heavy):
| Provider | Model | Input $/MTok | Output $/MTok | Monthly cost (10M output) |
|---|---|---|---|---|
| Anthropic | Opus 4.7 | $5 | $25 | $250 |
| OpenAI | GPT-5.5 standard | $5 | $30 | $300 |
| OpenAI | GPT-5.5 pro tier | $30 | $180 | $1,800 |
| Anthropic | Sonnet 4.6 | $3 | $15 | $150 |
| Gemini 3.1 Pro | varies | varies | see Gemini API pricing |
Two caveats. First, llm-stats.com benchmark reports GPT-5.5 uses 72% fewer output tokens than Opus 4.7 on equivalent tasks — meaning the headline gap closes once you measure tokens-to-completion rather than tokens-per-token. Second, Finout's Opus 4.7 pricing analysis flags that Opus 4.7's new tokenizer can swell the same input text by up to 35%, which Anthropic confirmed in their own documentation.
In other words: rate-card comparisons mislead. Real cost comparisons need to measure cost-per-completed-task, not cost-per-token.
How to Cut Claude API Costs by 60-90%
Five levers, ranked by effort vs return:
| Lever | Effort | Typical savings | Notes |
|---|---|---|---|
| Prompt caching with stable prefix | Low | 60-80% on input | See Claude API cache pricing guide |
| Tier downshift (Opus → Sonnet, Sonnet → Haiku) | Low-Medium | 50-80% | Validate quality on golden eval set |
| Batch API for async workloads | Low | 50% flat | Doesn't stack with Fast mode |
| Output token budgeting (max_tokens, stop sequences) | Medium | 20-40% on output | Highest leverage when output is most of cost |
| Multi-model routing (Haiku triage → Sonnet escalate) | High | 40-60% | Use TokenMix.ai routing or LiteLLM |
The prompt caching lever is the most underused. Per Vellum's prompt caching docs, cached tokens run roughly 50% cheaper than non-cached even before you stack on Anthropic's 0.1x cache hit rate. Helicone reports similar savings in their prompt caching observability changelog.
Hidden Cost Factors Most Posts Miss
Inferred from production usage patterns plus Anthropic documentation:
| Factor | Real cost impact |
|---|---|
| Opus 4.7 tokenizer overhead (up to 35%) | Effective rate per English token is closer to $6.75/$33.75 versus the $5/$25 headline |
| US-only data residency on Opus 4.7 | 1.1x multiplier on every token category; matters for compliance workloads |
| Web search tool usage | $10 per 1,000 searches plus standard token costs for retrieved content |
| Tool use system prompt overhead | 313-346 tokens added per request when tools parameter is set |
| Computer use beta tokens | Adds 466-499 tokens per request before screenshots |
| Fast mode on Opus 4.6 | 6x rate ($30/$150) — premium for latency-sensitive flows only |
| Bedrock / Vertex regional endpoints | 10% premium over global on Sonnet 4.5+ and Haiku 4.5+ |
Most cost surprises in production come from these factors, not the headline rate card.
When Should You Use Bedrock or Vertex Instead?
Anthropic exposes Claude through three first-party paths and three cloud platforms:
| Channel | Pricing | Latency | Compliance | Best for |
|---|---|---|---|---|
| Claude API direct | Standard rates | Lowest, global | SOC 2 | Default choice |
| AWS Bedrock | +10% on regional, same on global | Comparable | HIPAA, IL5 available | Workloads already on AWS |
| Google Vertex AI | +10% on regional, same on global | Comparable | EU data residency, ISO | Workloads already on GCP |
| Microsoft Foundry | Per Microsoft pricing | Comparable | Azure compliance umbrella | Enterprise Microsoft shops |
| TokenMix.ai unified gateway | Direct rates + 5% platform fee, OpenAI-compatible | Single endpoint | See official authorized access | Multi-model apps with Alipay/WeChat Pay |
The cleanest decision rule: pick the channel that matches your existing data plane. Migrating to Bedrock just to access Claude rarely pays back the integration tax.
Final Recommendation
For most production workloads in 2026, route Sonnet 4.6 by default with prompt caching enabled and batch processing for any non-real-time job. Reserve Opus 4.7 for the hard 10% where accuracy gains exceed the 5x cost premium. Use Haiku 4.5 as the cheap triage tier in front of Sonnet — the $1/$5 rate undercuts most reasonable alternatives once cache hits compound.
FAQ
Is Claude API cheaper than GPT-5.5 in 2026?
On output, yes — Opus 4.7 is $25 per MTok versus GPT-5.5 standard at $30. On input, both sit at $5 per MTok. The honest answer is workload-dependent: if your app spends most of its budget on output tokens, Claude wins by 17% on the rate card. If you measure cost-per-completed-task, GPT-5.5's 72% lower output token count per task narrows or reverses the gap.
Why is Claude Opus 4.7 priced the same as Opus 4.6?
Anthropic kept the rate card flat across the 4.5/4.6/4.7 generation. The catch is the new tokenizer in 4.7 may use up to 35% more tokens for the same input text, which raises effective cost per English word even though the rate-per-token is unchanged.
What is the cheapest Claude model in 2026?
Haiku 3 at $0.25 / $1.25 per MTok remains the absolute cheapest, but capabilities lag Haiku 4.5 by enough that most teams now route to Haiku 4.5 ($1/$5) for production work. With Batch API, Haiku 4.5 drops to $0.50/$2.50.
Does Anthropic offer volume discounts?
Yes, but only via direct sales contact for enterprise contracts. There is no published volume discount tier. The fastest cost reduction available to all users is the Batch API (flat 50% off) plus prompt caching (90% off cache reads).
How does prompt caching pricing work?
Cache writes cost 1.25x base input for 5-minute TTL or 2x for 1-hour TTL. Cache reads cost 0.1x base input. The 5-minute cache pays off after one cache read; the 1-hour cache pays off after two reads. See our Claude API cache pricing deep-dive for the break-even math.
Can Batch API and prompt caching be combined?
Yes. The discounts stack multiplicatively. According to Anthropic's pricing docs, Batch API "discounts can be combined" with prompt caching. Fast mode on Opus 4.6 is the one feature that does not combine with Batch.
What is the cheapest way to use Claude in production?
Stack three levers: route Haiku 4.5 for triage, enable prompt caching with a stable system prompt prefix, and submit non-time-sensitive jobs through Batch API. Combined effective rate can reach $0.30-$0.50 per MTok input on cached calls — roughly the cost of Haiku 3 with Sonnet-tier capabilities behind it via routing.
How do I avoid surprise costs on Claude API?
Set hard max_tokens budgets on every request, monitor usage.cache_read_input_tokens and usage.cache_creation_input_tokens separately, and run alerts on output-token spikes. Output is where 70-80% of bills live in agentic workloads.
Related Articles
- Claude API Cache Pricing 2026: 90% Input Savings Explained
- DeepSeek API Pricing
- Gemini API Pricing
- GPT-5 API Pricing
- Claude Haiku vs Sonnet: Cost vs Quality 2026
- Claude Opus 4.7 Review: Pricing and Benchmarks
- LLM API Gateway Guide
Sources
- Anthropic — Claude API pricing official documentation
- Anthropic — Prompt caching guide
- Anthropic — Batch processing documentation
- Anthropic — Model deprecations schedule
- DataCamp — Claude Opus 4.7 vs GPT-5.4 benchmark
- Finout — Claude Opus 4.7 real cost story
- llm-stats.com — GPT-5.5 vs Claude Opus 4.7 benchmark
- ProjectDiscovery — How we cut LLM cost with prompt caching
- Helicone — Anthropic prompt caching support
- Vellum — Prompt caching documentation
By TokenMix Research Lab · Updated 2026-04-30