TokenMix Research Lab · 2026-04-29

Claude API Pricing 2026: Opus, Sonnet, Haiku Costs Compared

Last Updated: 2026-04-30 · Author: TokenMix Research Lab · Data checked: 2026-04-30

Claude API pricing in 2026 splits cleanly across three tiers: Opus 4.7 at $5/$25 per million input/output tokens, Sonnet 4.6 at $3/$15, and Haiku 4.5 at $1/$5. The newer 4.x models cost the same or less than the deprecated Opus 4.1 and Sonnet 3.7 they replaced.

According to Anthropic's official pricing page, Opus 4.7, 4.6, and 4.5 share identical per-token rates. Sonnet 4.6 has held at $3/$15 since Sonnet 4. Haiku 4.5 quadrupled the input price of Haiku 3 ($0.25 → $1), but the capability gain is larger than the price gap: it now matches Sonnet 3 on benchmarks like SWE-bench Verified. Cache reads stay at 0.1x base input, batch processing keeps a flat 50% discount, and Opus 4.7's new tokenizer can use up to 35% more tokens for the same English text, a real cost factor that the headline rate card hides.

Quick Answer

Question Direct Answer
What does Claude Opus 4.7 cost per million tokens? $5 input / $25 output (1M context window)
What does Claude Sonnet 4.6 cost? $3 input / $15 output
What does Claude Haiku 4.5 cost? $1 input / $5 output
Cheapest Claude model? Haiku 3 at $0.25 / $1.25, but capabilities lag Haiku 4.5
How much does prompt caching save? 90% on cache reads (0.1x base input)
Batch API discount? Flat 50% on input and output

Confirmed Facts vs Common Misreads

Claim Status Source
Opus 4.7 = $5/$25 per MTok Confirmed platform.claude.com pricing table
Opus 4.6 and Opus 4.7 share the same rate card Confirmed platform.claude.com pricing table
Opus 4.7 may use up to 35% more tokens than 4.6 for same text Confirmed (caveat) Anthropic docs note on new tokenizer
Sonnet 4.6 supports full 1M context at standard pricing Confirmed Long context pricing section
Cache hits cost 0.1x base input Confirmed Prompt caching multiplier table
5-minute cache write costs 1.25x; 1-hour costs 2x Confirmed Anthropic prompt caching docs
Claude is always more expensive than OpenAI False GPT-5.5 standard is $5/$30; Opus 4.7 is $5/$25 (output cheaper)
Anthropic raised Claude prices in 2026 False Opus 4.7 launched at the same rate as Opus 4.6/4.5
US-only data residency on Opus 4.7 adds 1.1x Confirmed Data residency pricing section

Full Claude Pricing Table

Based on Anthropic's official pricing page (data checked 2026-04-30, all prices USD per 1M tokens):

Model Input 5m Cache Write 1h Cache Write Cache Read Output Batch Input Batch Output Context
Claude Opus 4.7 $5 $6.25 $10 $0.50 $25 $2.50 $12.50 1M
Claude Opus 4.6 $5 $6.25 $10 $0.50 $25 $2.50 $12.50 1M
Claude Opus 4.5 $5 $6.25 $10 $0.50 $25 $2.50 $12.50 200K
Claude Opus 4.1 (deprecated) $15 $18.75 $30 $1.50 $75 $7.50 $37.50 200K
Claude Sonnet 4.6 $3 $3.75 $6 $0.30 $15 $1.50 $7.50 1M
Claude Sonnet 4.5 $3 $3.75 $6 $0.30 $15 $1.50 $7.50 200K
Claude Sonnet 4 $3 $3.75 $6 $0.30 $15 $1.50 $7.50 200K
Claude Sonnet 3.7 (deprecated) $3 $3.75 $6 $0.30 $15 $1.50 $7.50 200K
Claude Haiku 4.5 $1 $1.25 $2 $0.10 $5 $0.50 $2.50 200K
Claude Haiku 3.5 $0.80 $1 $1.60 $0.08 $4 $0.40 $2 200K
Claude Haiku 3 $0.25 $0.30 $0.50 $0.03 $1.25 $0.125 $0.625 200K

Three things to take from this table. First, the 4.x generation collapsed Opus pricing from $15/$75 down to $5/$25, a 67% cut on both input and output versus Opus 4.1. Second, Sonnet has held steady at $3/$15 across four releases. Third, the Haiku tier rose 25% in price between Haiku 3.5 ($0.80/$4) and Haiku 4.5 ($1/$5), but the capability jump is far larger than 25%: Anthropic is pulling Haiku upward into Sonnet 3 territory on coding benchmarks.
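
The per-call arithmetic behind this table is simple enough to script. A minimal sketch, with rates hard-coded from the table above (re-check platform.claude.com before relying on them):

```python
# Per-call cost from the rate card above (USD per 1M tokens).
# Rates are illustrative hard-codes; verify against the official pricing page.
RATES = {
    "opus-4.7":   {"input": 5.00, "output": 25.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "haiku-4.5":  {"input": 1.00, "output": 5.00},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one uncached, non-batch call."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# A 3,700-token request split 80/20 input/output on Haiku 4.5:
print(round(call_cost("haiku-4.5", 2960, 740), 5))
```

Extending the dict with cache and batch columns turns this into a full what-if calculator for the scenarios below.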

How Did Claude Pricing Change From 2025 to 2026?

Three structural shifts hit between mid-2025 and April 2026:

Change When Impact
Opus 4.1 → Opus 4.5/4.6/4.7 rate cut Late 2025 Input/output rate dropped 3x ($15/$75 → $5/$25)
1M context window opened at standard pricing Sonnet 4.6, Opus 4.6, Opus 4.7 No 2x premium beyond 200k like some competitors
Cache TTL options expanded 2025-09 1-hour cache (2x write) added alongside 5-min (1.25x write)

The deprecated tier matters for budgeting. According to Anthropic's deprecation policy, Sonnet 3.7 and Opus 3 are scheduled for retirement, which means workloads still pinned to those versions face forced migrations. The migration target (Sonnet 4.x or Opus 4.x) costs the same or less, which is unusual for a SaaS API in this space.

Opus, Sonnet, Haiku: Which Tier Should You Pick?

Tier choice should follow the workload, not the brand. Here is the decision matrix we use internally at TokenMix.ai:

Workload type Recommended tier Why
Production code generation, agentic coding Sonnet 4.6 Hits ~75% on SWE-bench Verified at roughly 60% of the Opus 4.7 price
Hard reasoning, multi-step research, novel domain Opus 4.7 Highest sustained accuracy, but ~1.7x the Sonnet output rate
Customer support classification, intent detection Haiku 4.5 $1/$5 with strong instruction-following
Bulk summarization, doc cleanup Haiku 4.5 + Batch Drops to $0.50/$2.50 with Batch API
Long-context ingestion (>200k) Sonnet 4.6 Cheapest model with 1M context at standard rates; the only others are Opus 4.6/4.7
Stateful agents with cacheable system prompts Sonnet 4.6 + cache Cache reads at $0.30/MTok make agent loops cheap

A common mistake is reaching for Opus 4.7 by default. According to DataCamp's Opus 4.7 vs GPT-5.4 benchmark, Opus 4.7 leads Sonnet 4.6 by 2-4 percentage points on most reasoning benchmarks, but it costs roughly 1.7x as much per token. For 90% of production workloads, Sonnet 4.6 is the right call; reserve Opus for the hard 10% where accuracy gains pay back the spend.

Cost Examples Across Real Workloads

Customer support chatbot — 10,000 tickets/month

Average ticket = 3,700 tokens (per Anthropic's customer support guide). Assume 80/20 input/output split.

Model Monthly cost Per ticket
Haiku 4.5 $37.00 $0.0037
Haiku 4.5 + 5-min cache (60% hit rate) $19.20 $0.0019
Haiku 4.5 + Batch $18.50 $0.0019
Sonnet 4.6 $111.00 $0.0111
Opus 4.7 $185.00 $0.0185

Inferred: most support classification fits Haiku 4.5; the 90% input savings from cache compound when system prompts are reused across tickets.
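
How those cache savings compound can be sketched as a blended input rate. The assumptions here are ours, not Anthropic's: the hit rate applies token-wise, and (worst case) every missed token pays the 5-minute cache-write premium:

```python
def blended_input_rate(base: float, hit_rate: float,
                       write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Effective input $/MTok when hit_rate of input tokens are cache reads.
    Pessimistic assumption: every missed token is re-written at write_mult."""
    return hit_rate * base * read_mult + (1 - hit_rate) * base * write_mult

# Haiku 4.5 input ($1/MTok) at the 60% hit rate used in the table above:
print(round(blended_input_rate(1.00, 0.60), 2))
```

Even under the pessimistic write assumption, the blended rate lands well under the $1 base, which is why the cache line beats the no-cache line above.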

Coding agent — 100,000 calls/month with reused tool schemas

Assume 8k input tokens (mostly tool schemas, repo context) and 1k output tokens per call. Tool schemas are a perfect prompt-caching candidate.

Strategy Sonnet 4.6 monthly Opus 4.7 monthly
No cache $3,900 $6,500
5-min cache (75% hit rate) $1,275 $2,125
5-min cache + Batch (where applicable) ~$650 ~$1,065

ProjectDiscovery's published case study shows that moving dynamic content out of the cacheable prefix raised their hit rate from 7% to 74% in a single deployment. This is the single largest lever available.
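
ProjectDiscovery's fix amounts to ordering the request so stable content forms the cacheable prefix. A sketch of that shape, with field names taken from Anthropic's prompt-caching docs (treat the model id and exact structure as assumptions and check the current API reference):

```python
def build_request(stable_context: str, user_input: str) -> dict:
    """Stable content first and marked cacheable; dynamic content last."""
    return {
        "model": "claude-sonnet-4-6",  # hypothetical model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": stable_context,  # identical across calls -> cache hit
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Anything that changes per call goes AFTER the cached prefix.
        "messages": [{"role": "user", "content": user_input}],
    }

req = build_request("<tool schemas and repo context>", "Review this diff: ...")
```

Moving even one dynamic value (a timestamp, a request id) into the prefix invalidates the cache for every call, which is exactly the 7%-hit-rate failure mode described above.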

Document summarization — 1M documents/month

Average document = 5k input tokens, output = 500 tokens. Latency-insensitive, perfect for Batch API.

Model Standard cost Batch cost (50% off)
Haiku 4.5 $7,500 $3,750
Sonnet 4.6 $22,500 $11,250
Opus 4.7 $37,500 $18,750

For non-time-sensitive document workloads, Haiku 4.5 + Batch puts cost-per-document at ~$0.0038. That's the floor before you start considering self-hosted inference.

Code review pipeline — 5,000 PRs/month with 80k average context

Long-context PR review is where the 1M context window earns its keep.

Model Per-PR cost Monthly cost
Sonnet 4.6 (no cache) $0.27 $1,350
Sonnet 4.6 + 1-hour cache (repo context cached) $0.094 $470
Opus 4.7 + 1-hour cache $0.156 $780

The 1-hour cache (2x write multiplier) breaks even after just two reads — a bar most code review pipelines clear easily.
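
That break-even claim can be checked directly. A small helper, assuming the 0.1x read multiplier from the pricing table above:

```python
def cache_breakeven_reads(write_mult: float, read_mult: float = 0.10) -> int:
    """Smallest read count n where one write plus n reads
    (write_mult + n * read_mult) beats paying the base 1.0x rate
    on all n + 1 requests."""
    n = 1
    while write_mult + n * read_mult >= (1 + n) * 1.0:
        n += 1
    return n

print(cache_breakeven_reads(1.25))  # 5-minute cache -> 1
print(cache_breakeven_reads(2.00))  # 1-hour cache -> 2
```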

Claude vs GPT-5.5 vs Gemini 3.1 Pro

Same workload (10M output tokens/month, agent-heavy):

Provider Model Input $/MTok Output $/MTok Monthly cost (10M output)
Anthropic Opus 4.7 $5 $25 $250
OpenAI GPT-5.5 standard $5 $30 $300
OpenAI GPT-5.5 pro tier $30 $180 $1,800
Anthropic Sonnet 4.6 $3 $15 $150
Google Gemini 3.1 Pro varies varies see Gemini API pricing

Two caveats. First, an llm-stats.com benchmark report finds GPT-5.5 uses 72% fewer output tokens than Opus 4.7 on equivalent tasks, meaning the headline gap closes once you measure tokens-to-completion rather than rate per token. Second, Finout's Opus 4.7 pricing analysis flags that Opus 4.7's new tokenizer can swell the token count for the same input text by up to 35%, which Anthropic confirmed in their own documentation.

In other words: rate-card comparisons mislead. Real cost comparisons need to measure cost-per-completed-task, not cost-per-token.
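
Here is what that looks like numerically, under illustrative assumptions of ours: a task needing 20k input tokens on either provider, 10k output tokens on Opus 4.7, and 72% fewer on GPT-5.5 (the llm-stats figure above):

```python
def task_cost(in_rate: float, out_rate: float,
              in_tokens: int, out_tokens: int) -> float:
    """USD cost of one completed task at the given $/MTok rates."""
    return (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000

opus_47 = task_cost(5, 25, 20_000, 10_000)              # Opus 4.7 rate card
gpt_55 = task_cost(5, 30, 20_000, int(10_000 * 0.28))   # 72% fewer output tokens
print(round(opus_47, 3), round(gpt_55, 3))
```

In this toy scenario GPT-5.5 comes out cheaper per task despite its higher output rate, which is the whole point: the rate card alone cannot settle the comparison.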

How to Cut Claude API Costs by 60-90%

Five levers, ranked by effort vs return:

Lever Effort Typical savings Notes
Prompt caching with stable prefix Low 60-80% on input See Claude API cache pricing guide
Tier downshift (Opus → Sonnet, Sonnet → Haiku) Low-Medium 50-80% Validate quality on golden eval set
Batch API for async workloads Low 50% flat Doesn't stack with Fast mode
Output token budgeting (max_tokens, stop sequences) Medium 20-40% on output Highest leverage when output is most of cost
Multi-model routing (Haiku triage → Sonnet escalate) High 40-60% Use TokenMix.ai routing or LiteLLM

The prompt caching lever is the most underused. Per Vellum's prompt caching docs, cached tokens run roughly 50% cheaper than non-cached even before you stack on Anthropic's 0.1x cache hit rate. Helicone reports similar savings in their prompt caching observability changelog.
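
A rough model of stacking the first three levers, assuming the discounts combine multiplicatively as Anthropic's "discounts can be combined" note suggests (cache-write premiums ignored for simplicity):

```python
def stacked_input_rate(base: float, cache_hit_rate: float,
                       use_batch: bool) -> float:
    """Effective input $/MTok after caching, then the flat 50% batch discount.
    Simplification: cache-write premiums are ignored."""
    after_cache = cache_hit_rate * base * 0.10 + (1 - cache_hit_rate) * base
    return after_cache * (0.5 if use_batch else 1.0)

# Sonnet 4.6 input ($3/MTok) at a 75% hit rate, submitted via Batch:
print(round(stacked_input_rate(3.00, 0.75, True), 4))
```

Stacked this way, Sonnet-tier input lands under $0.50/MTok, i.e. below Haiku 4.5's uncached base rate, which is why the levers are worth combining rather than choosing between.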

Hidden Cost Factors Most Posts Miss

Inferred from production usage patterns plus Anthropic documentation:

Factor Real cost impact
Opus 4.7 tokenizer overhead (up to 35%) Effective rate per English token is closer to $6.75/$33.75 versus the $5/$25 headline
US-only data residency on Opus 4.7 1.1x multiplier on every token category; matters for compliance workloads
Web search tool usage $10 per 1,000 searches plus standard token costs for retrieved content
Tool use system prompt overhead 313-346 tokens added per request when tools parameter is set
Computer use beta tokens Adds 466-499 tokens per request before screenshots
Fast mode on Opus 4.6 6x rate ($30/$150); premium for latency-sensitive flows only
Bedrock / Vertex regional endpoints 10% premium over global on Sonnet 4.5+ and Haiku 4.5+

Most cost surprises in production come from these factors, not the headline rate card.
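
The tokenizer overhead in the first row translates directly into an effective rate. A one-liner makes the adjustment explicit:

```python
def effective_rate(base: float, token_inflation: float) -> float:
    """$/MTok measured in old-tokenizer tokens, when the new tokenizer
    emits token_inflation (e.g. 0.35 = 35%) more tokens for the same text."""
    return base * (1 + token_inflation)

# Opus 4.7 at the worst-case 35% inflation noted above:
print(round(effective_rate(5.0, 0.35), 2), round(effective_rate(25.0, 0.35), 2))
```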

When Should You Use Bedrock or Vertex Instead?

Anthropic exposes Claude through three first-party paths and three cloud platforms:

Channel Pricing Latency Compliance Best for
Claude API direct Standard rates Lowest, global SOC 2 Default choice
AWS Bedrock +10% on regional, same on global Comparable HIPAA, IL5 available Workloads already on AWS
Google Vertex AI +10% on regional, same on global Comparable EU data residency, ISO Workloads already on GCP
Microsoft Foundry Per Microsoft pricing Comparable Azure compliance umbrella Enterprise Microsoft shops
TokenMix.ai unified gateway Direct rates + 5% platform fee, OpenAI-compatible Single endpoint See official authorized access Multi-model apps with Alipay/WeChat Pay

The cleanest decision rule: pick the channel that matches your existing data plane. Migrating to Bedrock just to access Claude rarely pays back the integration tax.

Final Recommendation

For most production workloads in 2026, route Sonnet 4.6 by default with prompt caching enabled and batch processing for any non-real-time job. Reserve Opus 4.7 for the hard 10% where accuracy gains exceed its roughly 1.7x cost premium. Use Haiku 4.5 as the cheap triage tier in front of Sonnet: the $1/$5 rate undercuts most reasonable alternatives once cache hits compound.

FAQ

Is Claude API cheaper than GPT-5.5 in 2026?

On output, yes — Opus 4.7 is $25 per MTok versus GPT-5.5 standard at $30. On input, both sit at $5 per MTok. The honest answer is workload-dependent: if your app spends most of its budget on output tokens, Claude wins by 17% on the rate card. If you measure cost-per-completed-task, GPT-5.5's 72% lower output token count per task narrows or reverses the gap.

Why is Claude Opus 4.7 priced the same as Opus 4.6?

Anthropic kept the rate card flat across the 4.5/4.6/4.7 generation. The catch is the new tokenizer in 4.7 may use up to 35% more tokens for the same input text, which raises effective cost per English word even though the rate-per-token is unchanged.

What is the cheapest Claude model in 2026?

Haiku 3 at $0.25 / $1.25 per MTok remains the absolute cheapest, but capabilities lag Haiku 4.5 by enough that most teams now route to Haiku 4.5 ($1/$5) for production work. With Batch API, Haiku 4.5 drops to $0.50/$2.50.

Does Anthropic offer volume discounts?

Yes, but only via direct sales contact for enterprise contracts. There is no published volume discount tier. The fastest cost reduction available to all users is the Batch API (flat 50% off) plus prompt caching (90% off cache reads).

How does prompt caching pricing work?

Cache writes cost 1.25x base input for 5-minute TTL or 2x for 1-hour TTL. Cache reads cost 0.1x base input. The 5-minute cache pays off after one cache read; the 1-hour cache pays off after two reads. See our Claude API cache pricing deep-dive for the break-even math.

Can Batch API and prompt caching be combined?

Yes. The discounts stack multiplicatively. According to Anthropic's pricing docs, Batch API "discounts can be combined" with prompt caching. Fast mode on Opus 4.6 is the one feature that does not combine with Batch.

What is the cheapest way to use Claude in production?

Stack three levers: route Haiku 4.5 for triage, enable prompt caching with a stable system prompt prefix, and submit non-time-sensitive jobs through Batch API. Combined effective rate can reach $0.30-$0.50 per MTok input on cached calls — roughly the cost of Haiku 3 with Sonnet-tier capabilities behind it via routing.

How do I avoid surprise costs on Claude API?

Set hard max_tokens budgets on every request, monitor usage.cache_read_input_tokens and usage.cache_creation_input_tokens separately, and run alerts on output-token spikes. Output is where 70-80% of bills live in agentic workloads.
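
That monitoring advice can be sketched as a small post-call check. The usage field names follow Anthropic's Messages API response shape; confirm them against the current API reference before wiring up real alerts:

```python
def usage_alerts(usage: dict, output_budget: int = 2_000) -> list:
    """Flag the two most common cost surprises: output-token spikes
    and cache writes that never get read back."""
    alerts = []
    out = usage.get("output_tokens", 0)
    if out > output_budget:
        alerts.append("output spike: %d tokens" % out)
    reads = usage.get("cache_read_input_tokens", 0)
    writes = usage.get("cache_creation_input_tokens", 0)
    if writes > 0 and reads < 2 * writes:
        alerts.append("cache churn: writes outpacing reads")
    return alerts

print(usage_alerts({"output_tokens": 5_000,
                    "cache_read_input_tokens": 1_000,
                    "cache_creation_input_tokens": 80_000}))
```

Run this on every response and ship the alerts to whatever observability pipeline you already have; the point is to catch output and cache drift per request, not at invoice time.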

