TokenMix Research Lab · 2026-04-01

Claude API Pricing 2026: Opus, Sonnet, Haiku Costs Compared

Last Updated: 2026-05-14 Author: TokenMix Research Lab Data checked: 2026-05-14

Claude API pricing in 2026 starts at $1/$5 per 1M tokens for Haiku 4.5, $3/$15 for Sonnet 4.6, and $5/$25 for Opus 4.7, Opus 4.6, and Opus 4.5. The real bill depends on cache reads, batch processing, tokenizer changes, data residency, tools, and whether you route every request to the right Claude tier.

Update 2026-05-14: Every price in this article was re-verified today against Anthropic's official API pricing docs. No changes since the 2026-04-30 publication. A new section below compares Claude Opus 4.7 directly against the just-released GPT-5.5 ($5/$30 per MTok, codename Spud) — Opus 4.7 is now within 10% of GPT-5.5's input price and 17% cheaper on output.

Anthropic's official pricing page gives the current structure: cache reads cost 10% of base input, 5-minute cache writes cost 1.25x input, 1-hour cache writes cost 2x input, and the Batch API gives 50% off input and output tokens. It also says Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6 include the full 1M token context window at standard pricing. That corrects an older misconception: the current official pricing page does not show a 2x long-context surcharge for Sonnet 4.6.

My judgement: Sonnet 4.6 is the default Claude API model for most production work. Haiku 4.5 is the high-volume cost tier. Opus 4.7 is for high-value reasoning and coding, but its new tokenizer may use up to 35% more tokens for the same fixed text, so measure before migrating.

Quick Pricing Table
Confirmed Facts, Inferences, and Risks
Claude Model Pricing
Prompt Caching Costs
Batch API Discount
1M Context Pricing
Opus 4.7 Tokenizer Risk
Tool and Feature Costs
Cost Scenarios
Claude Direct API vs TokenMix.ai Routing
Which Claude Model Should You Pick?
Related Articles
FAQ
Sources

Quick Pricing Table

All prices are per 1M tokens from Anthropic's official pricing page, checked on 2026-04-30.

Claude model	Base input	5m cache write	1h cache write	Cache read	Output	Best use
Claude Opus 4.7	$5.00	$6.25	$10.00	$0.50	$25.00	Hard reasoning, coding, premium agents
Claude Opus 4.6	$5.00	$6.25	$10.00	$0.50	$25.00	Stable Opus workloads, 1M context
Claude Opus 4.5	$5.00	$6.25	$10.00	$0.50	$25.00	Older Opus production routes
Claude Sonnet 4.6	$3.00	$3.75	$6.00	$0.30	$15.00	Default production model
Claude Sonnet 4.5	$3.00	$3.75	$6.00	$0.30	$15.00	Stable Sonnet routes
Claude Haiku 4.5	$1.00	$1.25	$2.00	$0.10	$5.00	High-volume chat and extraction
Claude Haiku 3.5	$0.80	$1.00	$1.60	$0.08	$4.00	Lower-cost legacy Haiku
Claude Haiku 3	$0.25	$0.30	$0.50	$0.03	$1.25	Cheapest legacy tier

The key pricing line: Sonnet 4.6 costs 40% less than Opus 4.7 on both input and output, while Haiku 4.5 costs 80% less than Opus 4.7.

Confirmed Facts, Inferences, and Risks

Claim	Status	What it means	Source
Opus 4.7 costs $5 input and $25 output per 1M tokens	Confirmed	Opus 4.7 kept the lower Opus 4.5/4.6 price tier.	Anthropic pricing
Sonnet 4.6 costs $3 input and $15 output per 1M tokens	Confirmed	It remains the default cost-quality tier.	Anthropic pricing
Haiku 4.5 costs $1 input and $5 output per 1M tokens	Confirmed	It is the current cheap Claude tier.	Anthropic pricing
Cache reads cost 10% of base input	Confirmed	Repeated prompts can cut input cost by 90%.	Anthropic pricing
Batch API gives 50% off input and output	Confirmed	Async workloads should use batch where latency allows.	Anthropic pricing
1M context is standard-priced for Opus 4.7, Opus 4.6, and Sonnet 4.6	Confirmed	No current 2x Sonnet long-context surcharge on the official pricing page.	Anthropic pricing
Opus 4.7 may use up to 35% more tokens for the same fixed text	Confirmed caveat	Migration can raise bills even when per-token price is unchanged.	Anthropic pricing

For GEO, this is the extractable answer: Claude API cost is no longer just Opus vs Sonnet vs Haiku. Cache, batch, tokenizer behavior, and routing decide the effective price.

Claude Model Pricing

Use this table when choosing a production tier.

Model family	Current production model	Input/output	Relative cost vs Sonnet 4.6	Best default
Opus	Opus 4.7 or 4.6	$5/$25	1.67x Sonnet	Hard tasks only
Sonnet	Sonnet 4.6	$3/$15	1.00x	Default model
Haiku	Haiku 4.5	$1/$5	0.33x Sonnet	High-volume simple tasks
Legacy Haiku	Haiku 3.5	$0.80/$4	0.27x Sonnet	Cheap legacy fallback
Legacy Opus	Opus 4.1 / 4	$15/$75	5.00x Sonnet	Avoid unless required

The simple rule: start with Sonnet 4.6, route simple work to Haiku 4.5, and escalate only the expensive mistakes to Opus 4.7.

Prompt Caching Costs

For detailed cache-hit, 5-minute write, 1-hour write, and Batch stacking math, see Claude API cache pricing 2026.

Prompt caching is the strongest Claude API cost lever for repeated context.

Cache operation	Multiplier	Sonnet 4.6 cost	Opus 4.7 cost	When it matters
Standard input	1.0x	$3.00/M	$5.00/M	First uncached request
5-minute cache write	1.25x	$3.75/M	$6.25/M	Short repeated sessions
1-hour cache write	2.0x	$6.00/M	$10.00/M	Long repeated sessions
Cache read	0.1x	$0.30/M	$0.50/M	Reused prompts, documents, tool schemas

Anthropic says caching pays off after one cache read for 5-minute cache writes, or after two cache reads for 1-hour cache writes.

Workload	Cache target	Why
Coding agent	Repo summary, instructions, tool schema	Reused every step.
Support bot	Policy, product docs, style guide	Same context across tickets.
RAG app	Retrieved document chunks	Repeated questions over same corpus.
Legal review	Contract template, rubric	Same structure across documents.
Data extraction	JSON schema and examples	Stable prompt prefix.

Caching reduces input cost. It does not reduce output cost, so verbose models can still be expensive.

Batch API Discount

The Batch API gives 50% off input and output tokens for asynchronous workloads.

Model	Batch input	Batch output	Best batch use
Opus 4.7	$2.50/M	$12.50/M	High-value offline reasoning
Opus 4.6	$2.50/M	$12.50/M	Long-context offline review
Sonnet 4.6	$1.50/M	$7.50/M	Bulk coding review, summarization, extraction
Haiku 4.5	$0.50/M	$2.50/M	Bulk classification and support preprocessing
Haiku 3.5	$0.40/M	$2.00/M	Legacy cheap batch

Batch is the right answer when latency does not matter. It is the wrong answer for interactive chat, live coding, or agent loops that need immediate responses.

1M Context Pricing

Anthropic's current pricing page says Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6 include the full 1M token context window at standard pricing.

Model	1M context status	Pricing implication
Claude Opus 4.7	Included at standard pricing	900K tokens use the same per-token rate as 9K tokens.
Claude Opus 4.6	Included at standard pricing	Strong option for long-context stable workflows.
Claude Sonnet 4.6	Included at standard pricing	Current official page does not show a 2x long-context premium.
Claude Haiku 4.5	200K tier in pricing table	Use for shorter, cheaper workflows.
Claude Haiku 3.5	200K tier in pricing table	Legacy cheaper route.

Cost example:

Request	Model	Input	Output	Cost
Long document review	Sonnet 4.6	900K	20K	$3.00
Long document review	Opus 4.7	900K	20K	$5.00
Cached second pass	Sonnet 4.6	900K cache read	20K	$0.57
Cached second pass	Opus 4.7	900K cache read	20K	$0.95

Calculation for Sonnet second pass: 0.9M input at $0.30/M plus 0.02M output at $15/M equals $0.57.

Opus 4.7 Tokenizer Risk

Opus 4.7 has the same $5/$25 per-token rate as Opus 4.6, but Anthropic says its new tokenizer may use up to 35% more tokens for the same fixed text.

Migration	Per-token price	Token count	Effective bill
Opus 4.6 baseline	$5 input / $25 output	100M input / 30M output	$1,250
Opus 4.7, same token count	$5 input / $25 output	100M input / 30M output	$1,250
Opus 4.7, 20% more tokens	Same price	120M input / 36M output	$1,500
Opus 4.7, 35% more tokens	Same price	135M input / 40.5M output	$1,687.50

This does not mean every Opus 4.7 workload costs 35% more. It means you must run a tokenizer check before replacing Opus 4.6 in high-volume routes.

Tool and Feature Costs

Claude API pricing now includes several feature-specific costs that matter for agents.

Feature	Pricing note	Practical impact
Web search	$10 per 1,000 searches plus standard token costs	Search-heavy agents need a separate budget line.
Web fetch	No additional charge beyond standard token costs	Fetched content still becomes input tokens.
Code execution	1,550 free hours/month, then $0.05/hour per container	Long-running code tools need monitoring.
Text editor tool	Adds input tokens depending on Claude 4.x tool definition	Coding agents pay for tool schemas.
Computer use	Standard tool pricing plus screenshots and tool results	GUI agents can grow input quickly.
Data residency	1.1x multiplier for Opus 4.7, Opus 4.6, and newer models with US-only inference	Compliance routing increases cost.
Fast mode	Opus 4.6 beta at 6x standard rates, not available with Batch API	Only for speed-critical premium workflows.

For agent workloads, token price is only the base layer. Tool schemas, screenshots, search results, and long histories can dominate.

Cost Scenarios

Assume 100M input tokens and 30M output tokens per month.

Model	No cache	70% input cache read	Batch only	Batch + 70% cache read
Haiku 4.5	$250.00	$187.00	$125.00	$93.50
Sonnet 4.6	$750.00	$561.00	$375.00	$280.50
Opus 4.7	$1,250.00	$935.00	$625.00	$467.50

The most important takeaway: caching helps most when input dominates. Batch helps both input and output.

Workflow	Recommended route	Cost logic
High-volume support triage	Haiku 4.5	Cheap enough for first-pass classification.
Customer support final answer	Sonnet 4.6	Better quality for user-visible responses.
Coding agent planning	Sonnet 4.6 first, Opus 4.7 fallback	Escalate only hard failures.
Long contract review	Sonnet 4.6 or Opus 4.6 with cache	1M context at standard pricing plus cache reads.
Bulk document summarization	Batch Sonnet or Batch Haiku	Latency can wait, so take the 50% discount.
Premium reasoning	Opus 4.7	Use only when value per answer justifies cost.

This is why TokenMix.ai recommends routing by workflow rather than choosing one Claude model globally.

Claude Direct API vs TokenMix.ai Routing

Direct Claude API is clean when Claude is the only model family. A gateway becomes useful when Claude is one route inside a multi-model system.

Access path	Best for	Trade-off
Anthropic direct API	Full native Claude features and direct billing	You manage fallback, billing, and multi-provider routing.
AWS Bedrock / Vertex / Microsoft Foundry	Cloud procurement and regional controls	Third-party platform pricing and endpoint types can differ.
TokenMix.ai	Multi-model production routing	Less native-provider control, but simpler cross-model access.
LiteLLM self-hosted	Internal gateway teams	You operate the proxy and provider keys.

Use Anthropic direct for Claude-native apps. Use TokenMix.ai when you want Claude, DeepSeek, Gemini, OpenAI-compatible models, and fallback routing behind one access layer.

Claude Opus 4.7 vs GPT-5.5: Same Price Tier Now

OpenAI shipped GPT-5.5 on 2026-04-23 at $5 input / $30 output per MTok. That puts Opus 4.7 and GPT-5.5 in the same price tier on input ($5 vs $5) and makes Opus 4.7 17% cheaper on output ($25 vs $30). Any older "Opus is 3x the price of GPT" comparison is out of date as of Anthropic's Opus 4.5 reset.

Dimension	Claude Opus 4.7	GPT-5.5
Standard input	$5.00 / MTok	$5.00 / MTok
Standard output	$25.00 / MTok	$30.00 / MTok
Batch input	$2.50 / MTok	$2.50 / MTok
Batch output	$12.50 / MTok	$15.00 / MTok
Context window	200K	1M
Cache-read input	$0.50 / MTok (10% of base)	TBD
SWE-Bench Pro Public	64.3%	58.6%
Terminal-Bench 2.0	69.4%	82.7%

The headline: pick Opus 4.7 when context fits 200K and SWE-Bench-style code tasks dominate. Pick GPT-5.5 when you need 1M context or agentic coding workflows (Terminal-Bench). Cost is no longer the deciding factor for output-heavy workloads — Opus 4.7 now wins on per-token output price.

Through TokenMix.ai both claude-opus-4-7 and gpt-5.5 are available with one key and OpenAI-compatible SDK, so the choice can stay per-workload rather than per-account.

Which Claude Model Should You Pick?

If your workload is...	Start with	Escalate to	Avoid
Support classification	Haiku 4.5	Sonnet 4.6	Opus by default
User-facing support answer	Sonnet 4.6	Opus 4.7 for edge cases	Haiku for final high-stakes replies
Coding assistant	Sonnet 4.6	Opus 4.7	Legacy Opus 4.1
Long-context review	Sonnet 4.6	Opus 4.6 or Opus 4.7	Uncached repeated context
Bulk summarization	Batch Haiku or Batch Sonnet	Batch Opus for high value	Synchronous Opus
Agent workflow	Sonnet 4.6 router	Opus 4.7 for hard steps	One-model routing

Default recommendation: Haiku for cheap first pass, Sonnet for normal production, Opus for high-value escalation.

FAQ

How much does Claude API cost in 2026?

Claude API pricing starts at $1/$5 per 1M tokens for Haiku 4.5, $3/$15 for Sonnet 4.6, and $5/$25 for Opus 4.7, Opus 4.6, and Opus 4.5. Legacy models may use different prices.

What is the cheapest Claude API model?

Claude Haiku 3 is the cheapest listed legacy model at $0.25 input and $1.25 output per 1M tokens. Among current-generation models, Claude Haiku 4.5 is the cheap production tier at $1 input and $5 output.

What is the best Claude API model for production?

Claude Sonnet 4.6 is the best default for most production workloads. It is cheaper than Opus and stronger than Haiku for user-facing answers, coding, and agent steps.

Does Claude API prompt caching save 90%?

Yes for cache reads. Anthropic prices cache reads at 10% of base input price. Cache writes cost more than standard input, so caching works best when the same prefix is reused.

Does the Claude Batch API save 50%?

Yes. Anthropic lists Batch API input and output prices at 50% of standard token prices. Batch is best for asynchronous workloads where latency does not matter.

Does Claude Sonnet 4.6 charge extra for 1M context?

The current Anthropic pricing page says Sonnet 4.6 includes the full 1M token context window at standard pricing. Older claims about a 2x Sonnet long-context premium should be rechecked against the current official page.

Why can Claude Opus 4.7 cost more than Opus 4.6 if both are $5/$25?

Anthropic says Opus 4.7 uses a new tokenizer that may use up to 35% more tokens for the same fixed text. The per-token price is the same, but token count can change the bill.

Should I use Claude direct API or TokenMix.ai?

Use Claude direct API when Claude is your only model family and you need native Anthropic features. Use TokenMix.ai when Claude is part of a broader multi-model routing strategy with DeepSeek, Gemini, OpenAI-compatible models, and fallback policies.