Claude API Pricing 2026: Opus, Sonnet, Haiku Costs Compared
Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30
Claude API pricing in 2026 splits cleanly across three tiers: Opus 4.7 at $5/$25 per million input/output tokens, Sonnet 4.6 at $3/$15, and Haiku 4.5 at $1/$5. The newer 4.x models cost the same or less than the deprecated Opus 4.1 and Sonnet 3.7 they replaced.
According to Anthropic's official pricing page, Opus 4.7, 4.6, and 4.5 share identical per-token rates. Sonnet 4.6 has held at $3/$15 since Sonnet 4. Haiku 4.5 quadrupled Haiku 3's input price ($0.25 → $1), but the capability jump is larger still: it now matches Sonnet 3 on benchmarks like SWE-bench Verified. Cache reads stay at 0.1x base input, batch processing keeps a flat 50% discount, and Opus 4.7's new tokenizer can use up to 35% more tokens for the same English text, a real cost factor that the headline rate card hides.
What does Claude Opus 4.7 cost per million tokens?
$5 input / $25 output (1M context window)
What does Claude Sonnet 4.6 cost?
$3 input / $15 output
What does Claude Haiku 4.5 cost?
$1 input / $5 output
Cheapest Claude model?
Haiku 3 at $0.25 / $1.25, but capabilities lag Haiku 4.5
How much does prompt caching save?
90% on cache reads (0.1x base input)
Batch API discount?
Flat 50% on input and output
Confirmed Facts vs Common Misreads
| Claim | Status | Source |
| --- | --- | --- |
| Opus 4.7 = $5/$25 per MTok | Confirmed | platform.claude.com pricing table |
| Opus 4.6 and Opus 4.7 share the same rate card | Confirmed | platform.claude.com pricing table |
| Opus 4.7 may use up to 35% more tokens than 4.6 for same text | Confirmed (caveat) | Anthropic docs note on new tokenizer |
| Sonnet 4.6 supports full 1M context at standard pricing | Confirmed | Long context pricing section |
| Cache hits cost 0.1x base input | Confirmed | Prompt caching multiplier table |
| 5-minute cache write costs 1.25x; 1-hour costs 2x | Confirmed | Anthropic prompt caching docs |
| Claude is always more expensive than OpenAI | False | GPT-5.5 standard is $5/$30; Opus 4.7 is $5/$25 (output cheaper) |
| Anthropic raised Claude prices in 2026 | False | Opus 4.7 launched at the same rate as Opus 4.6/4.5 |
| US-only data residency on Opus 4.7 adds 1.1x | Confirmed | Data residency pricing section |
Full Claude Pricing Table
Based on Anthropic's official pricing page (data checked 2026-04-30, all prices USD per 1M tokens):
| Model | Input | 5m Cache Write | 1h Cache Write | Cache Read | Output | Batch Input | Batch Output | Context |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.7 | $5 | $6.25 | $10 | $0.50 | $25 | $2.50 | $12.50 | 1M |
| Claude Opus 4.6 | $5 | $6.25 | $10 | $0.50 | $25 | $2.50 | $12.50 | 1M |
| Claude Opus 4.5 | $5 | $6.25 | $10 | $0.50 | $25 | $2.50 | $12.50 | 200K |
| Claude Opus 4.1 (deprecated) | $15 | $18.75 | $30 | $1.50 | $75 | $7.50 | $37.50 | 200K |
| Claude Sonnet 4.6 | $3 | $3.75 | $6 | $0.30 | $15 | $1.50 | $7.50 | 1M |
| Claude Sonnet 4.5 | $3 | $3.75 | $6 | $0.30 | $15 | $1.50 | $7.50 | 200K |
| Claude Sonnet 4 | $3 | $3.75 | $6 | $0.30 | $15 | $1.50 | $7.50 | 200K |
| Claude Sonnet 3.7 (deprecated) | $3 | $3.75 | $6 | $0.30 | $15 | $1.50 | $7.50 | 200K |
| Claude Haiku 4.5 | $1 | $1.25 | $2 | $0.10 | $5 | $0.50 | $2.50 | 200K |
| Claude Haiku 3.5 | $0.80 | $1 | $1.60 | $0.08 | $4 | $0.40 | $2 | 200K |
| Claude Haiku 3 | $0.25 | $0.30 | $0.50 | $0.03 | $1.25 | $0.125 | $0.625 | 200K |
Three things to take from this table. First, the 4.x generation collapsed Opus pricing from $15/$75 down to $5/$25, a 67% cut on both input and output versus Opus 4.1. Second, Sonnet has held steady at $3/$15 across four releases. Third, the Haiku tier rose 25% in price between Haiku 3.5 and Haiku 4.5, but the capability jump is far larger than 25%: Anthropic is pulling Haiku upward into Sonnet 3 territory on coding benchmarks.
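As a quick sanity check on the rate card, the table reduces to a small cost helper. This is an illustrative sketch, not part of any Anthropic SDK; the model keys and the rate pairs are transcribed from the table above.

```python
# Illustrative cost helper; rates transcribed from the table above
# (USD per 1M tokens: input rate, output rate). Not an official SDK feature.
RATES = {
    "claude-opus-4.7":   (5.00, 25.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "claude-haiku-4.5":  (1.00, 5.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Standard (non-cached, non-batch) cost of one call, in USD."""
    rate_in, rate_out = RATES[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

# Same 10k-in / 2k-out call priced on each tier:
for model in RATES:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}")
```

Running this makes the tier spread concrete: the identical call costs five times as much on Opus 4.7 as on Haiku 4.5.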
How Did Claude Pricing Change From 2025 to 2026?
Three structural shifts hit between mid-2025 and April 2026:

- Opus pricing collapsed from $15/$75 (Opus 4.1) to $5/$25 with the 4.5/4.6/4.7 generation.
- Sonnet 4.6 brought the 1M-token context window to standard pricing.
- Haiku 4.5 raised the budget tier to $1/$5 while pushing capability toward Sonnet 3 territory.
The deprecated tier matters for budgeting. According to Anthropic's deprecation policy, Sonnet 3.7 and Opus 3 are scheduled for retirement, which means workloads still pinned to those versions face forced migrations. The migration target (Sonnet 4.x or Opus 4.x) costs the same or less, which is unusual for a SaaS API in this space.
Opus, Sonnet, Haiku: Which Tier Should You Pick?
Tier choice should follow the workload, not the brand. Here is the decision matrix we use internally at TokenMix.ai:
| Workload type | Recommended tier | Why |
| --- | --- | --- |
| Production code generation, agentic coding | Sonnet 4.6 | Hits ~75% on SWE-bench Verified at $3/$15 versus Opus 4.7's $5/$25 |
| Hard reasoning, multi-step research, novel domain | Opus 4.7 | Highest sustained accuracy, but output runs $25/MTok versus Sonnet's $15 |
| Customer support classification, intent detection | Haiku 4.5 | $1/$5 with strong instruction-following |
| Bulk summarization, doc cleanup | Haiku 4.5 + Batch | Drops to $0.50/$2.50 with Batch API |
| Long-context ingestion (>200K) | Sonnet 4.6 | Only 4.x model with 1M at standard rates besides Opus 4.6/4.7 |
| Stateful agents with cacheable system prompts | Sonnet 4.6 + cache | Cache reads at $0.30/MTok make agent loops cheap |
A common mistake is reaching for Opus 4.7 by default. According to DataCamp's Opus 4.7 vs GPT-5.4 benchmark, Opus 4.7 leads Sonnet 4.6 by 2-4 percentage points on most reasoning benchmarks, but its output rate is $25/MTok against Sonnet's $15. For 90% of production workloads, Sonnet 4.6 is the right call, and Opus is reserved for the hard 10% where accuracy gains pay back the spend.
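The matrix above collapses into a default-Sonnet routing rule. Here is a minimal sketch; the workload labels, the function name, and the 200K context check are assumptions invented for illustration, not an Anthropic API.

```python
# Hypothetical tier router mirroring the decision matrix above.
# Workload labels and the 200K-token threshold are example assumptions.
def pick_tier(workload: str, context_tokens: int = 0) -> str:
    if context_tokens > 200_000:
        return "sonnet-4.6"   # 1M context at standard rates
    if workload in {"classification", "intent-detection", "triage"}:
        return "haiku-4.5"    # $1/$5 covers high-volume simple tasks
    if workload in {"hard-reasoning", "multi-step-research", "novel-domain"}:
        return "opus-4.7"     # pay for accuracy only where it matters
    return "sonnet-4.6"       # default tier for production coding and agents

print(pick_tier("classification"))     # haiku-4.5
print(pick_tier("code-gen", 500_000))  # sonnet-4.6
```

The design point is the fall-through: Sonnet is the default and Opus must be opted into, which matches the "hard 10%" guidance above.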
Cost Examples Across Real Workloads
Customer support chatbot — 10,000 tickets/month
Average ticket = 3,700 tokens (per Anthropic's customer support guide). Assume 80/20 input/output split.
| Model | Monthly cost | Per ticket |
| --- | --- | --- |
| Haiku 4.5 | $37.00 | $0.0037 |
| Haiku 4.5 + 5-min cache (60% hit rate) | $19.20 | $0.0019 |
| Haiku 4.5 + Batch | $18.50 | $0.0019 |
| Sonnet 4.6 | $111.00 | $0.0111 |
| Opus 4.7 | $185.00 | $0.0185 |
Inferred: most support classification fits Haiku 4.5; the 90% input savings from cache compound when system prompts are reused across tickets.
Coding agent — 100,000 calls/month with reused tool schemas
Assume 8k input tokens (mostly tool schemas, repo context) and 1k output tokens per call. Tool schemas are a perfect prompt-caching candidate.
| Strategy | Sonnet 4.6 monthly | Opus 4.7 monthly |
| --- | --- | --- |
| No cache | $3,900 | $6,500 |
| 5-min cache (75% hit rate) | $1,275 | $2,125 |
| 5-min cache + Batch (where applicable) | ~$650 | ~$1,065 |
ProjectDiscovery's published case study shows that moving dynamic content out of the cacheable prefix raised their hit rate from 7% to 74% in a single deployment. This is the single largest lever available.
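The hit-rate lever can be made concrete with the caching multipliers from the pricing table (1.25x 5-minute write, 0.1x read). The sketch below uses this scenario's volumes (100k calls, 8k in / 1k out) but treats every input token as cacheable, so its cached figures will differ from the table above, which assumes only part of each prompt sits in the cacheable prefix.

```python
# Monthly cost versus cache hit rate, assuming ALL input tokens are cacheable.
# hit_rate=None means caching disabled (plain base input rate).
def agent_monthly_cost(calls, in_tok, out_tok, rate_in, rate_out, hit_rate=None):
    input_mtok = calls * in_tok / 1e6
    output_mtok = calls * out_tok / 1e6
    if hit_rate is None:
        mult = 1.0
    else:
        # misses pay the 1.25x 5-minute write rate, hits pay the 0.1x read rate
        mult = (1 - hit_rate) * 1.25 + hit_rate * 0.1
    return input_mtok * mult * rate_in + output_mtok * rate_out

no_cache = agent_monthly_cost(100_000, 8_000, 1_000, 3, 15)            # 3900.0
cached75 = agent_monthly_cost(100_000, 8_000, 1_000, 3, 15, hit_rate=0.75)
print(no_cache, cached75)
```

Even under this simplified model, a 75% hit rate cuts the input side of the bill by more than half; output tokens are untouched by caching, which is why agent-heavy bills stay output-dominated.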
Document summarization — 1M documents/month
Average document = 5k input tokens, output = 500 tokens. Latency-insensitive, perfect for Batch API.
| Model | Standard cost | Batch cost (50% off) |
| --- | --- | --- |
| Haiku 4.5 | $7,500 | $3,750 |
| Sonnet 4.6 | $22,500 | $11,250 |
| Opus 4.7 | $37,500 | $18,750 |
For non-time-sensitive document workloads, Haiku 4.5 + Batch puts cost-per-document at ~$0.0038. That's the floor before you start considering self-hosted inference.
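The rows above reduce to one formula: per-document token cost, scaled by volume, halved for Batch. A sketch under this section's assumptions (5k input, 500 output tokens per document); the helper name is invented for illustration.

```python
# Reproduces the summarization table: cost for `docs` documents at
# 5,000 input / 500 output tokens each, with an optional flat 50% Batch discount.
def summarization_cost(docs, rate_in, rate_out, batch=False):
    per_doc_usd = (5_000 * rate_in + 500 * rate_out) / 1e6
    total = docs * per_doc_usd
    return total / 2 if batch else total

print(summarization_cost(1_000_000, 1, 5))              # Haiku 4.5 standard
print(summarization_cost(1_000_000, 1, 5, batch=True))  # Haiku 4.5 + Batch
```

Dividing the batched Haiku figure by one million documents gives the ~$0.0038 per-document floor cited above.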
Code review pipeline — 5,000 PRs/month with 80k average context
Long-context PR review is where the 1M context window earns its keep.
| Model | Per-PR cost | Monthly cost |
| --- | --- | --- |
| Sonnet 4.6 (no cache) | $0.27 | $1,350 |
| Sonnet 4.6 + 1-hour cache (repo context cached) | $0.094 | $470 |
| Opus 4.7 + 1-hour cache | $0.156 | $780 |
The 1-hour cache (2x write multiplier) breaks even after just two reads — a bar most code review pipelines clear easily.
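The two-read break-even claim is easy to verify. A sketch in units of the base input rate, assuming one 1-hour cache write (2x) followed by n reads (0.1x each), compared against paying the base rate on all n+1 of the same calls.

```python
# Cost in multiples of the base input rate for one write plus n reads.
def with_1h_cache(n_reads):
    return 2.0 + 0.1 * n_reads   # 2x write, then 0.1x per read

def without_cache(n_reads):
    return 1.0 + 1.0 * n_reads   # every call pays the base input rate

# One read: caching still loses (2.1 vs 2.0 units).
# Two reads: caching wins (2.2 vs 3.0 units), and the gap widens from there.
for n in (1, 2, 3):
    print(n, with_1h_cache(n), without_cache(n))
```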
Claude vs GPT-5.5 vs Gemini 3.1 Pro
Same workload (10M output tokens/month, agent-heavy): on the rate card alone, 10M output tokens run $250 on Opus 4.7 versus $300 on GPT-5.5 standard.
Two caveats. First, llm-stats.com benchmark reports GPT-5.5 uses 72% fewer output tokens than Opus 4.7 on equivalent tasks — meaning the headline gap closes once you measure tokens-to-completion rather than tokens-per-token. Second, Finout's Opus 4.7 pricing analysis flags that Opus 4.7's new tokenizer can swell the same input text by up to 35%, which Anthropic confirmed in their own documentation.
In other words: rate-card comparisons mislead. Real cost comparisons need to measure cost-per-completed-task, not cost-per-token.
The prompt caching lever is the most underused. Per Vellum's prompt caching docs, cached tokens run roughly 50% cheaper than non-cached even before you stack on Anthropic's 0.1x cache hit rate. Helicone reports similar savings in their prompt caching observability changelog.
Hidden Cost Factors Most Posts Miss
Inferred from production usage patterns plus Anthropic documentation:
| Factor | Real cost impact |
| --- | --- |
| Opus 4.7 tokenizer overhead (up to 35%) | Effective rate per English token is closer to $6.75/$33.75 versus the $5/$25 headline |
| US-only data residency on Opus 4.7 | 1.1x multiplier on every token category; matters for compliance workloads |
| Web search tool usage | $10 per 1,000 searches plus standard token costs for retrieved content |
| Tool use system prompt overhead | 313-346 tokens added per request when tools parameter is set |
| Computer use beta tokens | Adds 466-499 tokens per request before screenshots |
| Fast mode on Opus 4.6 | 6x rate ($30/$150); premium for latency-sensitive flows only |
| Bedrock / Vertex regional endpoints | 10% premium over global on Sonnet 4.5+ and Haiku 4.5+ |
Most cost surprises in production come from these factors, not the headline rate card.
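The tokenizer line deserves its own arithmetic, because it compounds with every other multiplier. The sketch below reproduces the $6.75/$33.75 effective-rate figure under the worst-case 35% inflation assumption; the function is illustrative, not an official calculator.

```python
# Effective per-text rate when the same text tokenizes into more tokens.
# 1.35 models the worst-case 35% inflation noted for the Opus 4.7 tokenizer.
def effective_rate(rate_per_mtok, token_inflation=1.35):
    return round(rate_per_mtok * token_inflation, 4)

print(effective_rate(5.0))   # 6.75  effective input rate, USD per "old" MTok
print(effective_rate(25.0))  # 33.75 effective output rate
```

In practice the inflation varies by language and content type, so the right move is to measure token counts on your own corpus rather than assume the worst case.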
When Should You Use Bedrock or Vertex Instead?
Anthropic exposes Claude through three first-party paths and three cloud platforms:
The cleanest decision rule: pick the channel that matches your existing data plane. Migrating to Bedrock just to access Claude rarely pays back the integration tax.
Final Recommendation
For most production workloads in 2026, route Sonnet 4.6 by default with prompt caching enabled and batch processing for any non-real-time job. Reserve Opus 4.7 for the hard 10% where accuracy gains exceed the output cost premium ($25/MTok versus $15). Use Haiku 4.5 as the cheap triage tier in front of Sonnet: the $1/$5 rate undercuts most reasonable alternatives once cache hits compound.
FAQ
Is Claude API cheaper than GPT-5.5 in 2026?
On output, yes — Opus 4.7 is $25 per MTok versus GPT-5.5 standard at $30. On input, both sit at $5 per MTok. The honest answer is workload-dependent: if your app spends most of its budget on output tokens, Claude wins by 17% on the rate card. If you measure cost-per-completed-task, GPT-5.5's 72% lower output token count per task narrows or reverses the gap.
Why is Claude Opus 4.7 priced the same as Opus 4.6?
Anthropic kept the rate card flat across the 4.5/4.6/4.7 generation. The catch is the new tokenizer in 4.7 may use up to 35% more tokens for the same input text, which raises effective cost per English word even though the rate-per-token is unchanged.
What is the cheapest Claude model in 2026?
Haiku 3 at $0.25 / $1.25 per MTok remains the absolute cheapest, but capabilities lag Haiku 4.5 by enough that most teams now route to Haiku 4.5 ($1/$5) for production work. With Batch API, Haiku 4.5 drops to $0.50/$2.50.
Does Anthropic offer volume discounts?
Yes, but only via direct sales contact for enterprise contracts. There is no published volume discount tier. The fastest cost reduction available to all users is the Batch API (flat 50% off) plus prompt caching (90% off cache reads).
How does prompt caching pricing work?
Cache writes cost 1.25x base input for 5-minute TTL or 2x for 1-hour TTL. Cache reads cost 0.1x base input. The 5-minute cache pays off after one cache read; the 1-hour cache pays off after two reads. See our Claude API cache pricing deep-dive for the break-even math.
Can Batch API and prompt caching be combined?
Yes. The discounts stack multiplicatively. According to Anthropic's pricing docs, Batch API "discounts can be combined" with prompt caching. Fast mode on Opus 4.6 is the one feature that does not combine with Batch.
What is the cheapest way to use Claude in production?
Stack three levers: route Haiku 4.5 for triage, enable prompt caching with a stable system prompt prefix, and submit non-time-sensitive jobs through Batch API. Combined effective rate can reach $0.30-$0.50 per MTok input on cached calls — roughly the cost of Haiku 3 with Sonnet-tier capabilities behind it via routing.
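The stacking claim is just multiplication of independent discounts. A sketch assuming, per the pricing docs quoted above, that the 0.1x cache-read rate and the 0.5x Batch discount combine multiplicatively on input tokens; the function is illustrative, not an SDK call.

```python
# Effective input rate (USD per MTok) after stacking discounts.
def effective_input_rate(base, cache_read=False, batch=False):
    rate = base
    if cache_read:
        rate *= 0.1   # cache hit: 0.1x base input
    if batch:
        rate *= 0.5   # Batch API: flat 50% off
    return round(rate, 4)

print(effective_input_rate(3.0, cache_read=True))              # 0.3  Sonnet cached
print(effective_input_rate(1.0, batch=True))                   # 0.5  Haiku batched
print(effective_input_rate(3.0, cache_read=True, batch=True))  # 0.15 both stacked
```

The first two lines reproduce the $0.30-$0.50 range cited above; the third shows why stacked cached-batch input on Sonnet can land below Haiku's base rate.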
How do I avoid surprise costs on Claude API?
Set hard max_tokens budgets on every request, monitor usage.cache_read_input_tokens and usage.cache_creation_input_tokens separately, and run alerts on output-token spikes. Output is where 70-80% of bills live in agentic workloads.
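Those usage fields can feed a per-call cost monitor directly. A sketch assuming Sonnet 4.6 rates and the 5-minute cache multipliers from the table above; the field names match the usage block returned by Anthropic's Messages API, but the helper functions and the 4,096-token alert threshold are illustrative choices, not part of any SDK.

```python
RATE_IN, RATE_OUT = 3.0, 15.0      # Sonnet 4.6, USD per 1M tokens (assumed)
WRITE_MULT, READ_MULT = 1.25, 0.1  # 5-minute cache write / cache read

def call_cost(usage: dict) -> float:
    """USD cost of one call, computed from the API response's usage block."""
    return (
        usage.get("input_tokens", 0) * RATE_IN
        + usage.get("cache_creation_input_tokens", 0) * RATE_IN * WRITE_MULT
        + usage.get("cache_read_input_tokens", 0) * RATE_IN * READ_MULT
        + usage.get("output_tokens", 0) * RATE_OUT
    ) / 1e6

def output_spike(usage: dict, limit: int = 4_096) -> bool:
    """Flag calls whose output token count exceeds an alerting threshold."""
    return usage.get("output_tokens", 0) > limit

usage = {"input_tokens": 1_000, "cache_creation_input_tokens": 0,
         "cache_read_input_tokens": 7_000, "output_tokens": 900}
print(call_cost(usage), output_spike(usage))
```

Tracking cache-read and cache-creation tokens separately is what makes a falling hit rate visible before it shows up on the invoice.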