TokenMix Research Lab · 2026-04-01

Claude API Pricing 2026: Opus, Sonnet, Haiku Costs Compared

Claude API Pricing 2026: Opus, Sonnet, Haiku Costs Compared

Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30

Claude API pricing in 2026 starts at $1/$5 per 1M tokens for Haiku 4.5, $3/$15 for Sonnet 4.6, and $5/$25 for Opus 4.7, Opus 4.6, and Opus 4.5. The real bill depends on cache reads, batch processing, tokenizer changes, data residency, tools, and whether you route every request to the right Claude tier.

Anthropic's official pricing page gives the current structure: cache reads cost 10% of base input, 5-minute cache writes cost 1.25x input, 1-hour cache writes cost 2x input, and the Batch API gives 50% off input and output tokens. It also says Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6 include the full 1M token context window at standard pricing. That corrects an older misconception: the current official pricing page does not show a 2x long-context surcharge for Sonnet 4.6.

My judgement: Sonnet 4.6 is the default Claude API model for most production work. Haiku 4.5 is the high-volume cost tier. Opus 4.7 is for high-value reasoning and coding, but its new tokenizer may use up to 35% more tokens for the same fixed text, so measure before migrating.

Table of Contents

Quick Pricing Table

All prices are per 1M tokens from Anthropic's official pricing page, checked on 2026-04-30.

Claude model Base input 5m cache write 1h cache write Cache read Output Best use
Claude Opus 4.7 $5.00 $6.25 $10.00 $0.50 $25.00 Hard reasoning, coding, premium agents
Claude Opus 4.6 $5.00 $6.25 $10.00 $0.50 $25.00 Stable Opus workloads, 1M context
Claude Opus 4.5 $5.00 $6.25 $10.00 $0.50 $25.00 Older Opus production routes
Claude Sonnet 4.6 $3.00 $3.75 $6.00 $0.30 $15.00 Default production model
Claude Sonnet 4.5 $3.00 $3.75 $6.00 $0.30 $15.00 Stable Sonnet routes
Claude Haiku 4.5 $1.00 $1.25 $2.00 $0.10 $5.00 High-volume chat and extraction
Claude Haiku 3.5 $0.80 $1.00 $1.60 $0.08 $4.00 Lower-cost legacy Haiku
Claude Haiku 3 $0.25 $0.30 $0.50 $0.03 $1.25 Cheapest legacy tier

The key pricing line: Sonnet 4.6 costs 40% less than Opus 4.7 on both input and output, while Haiku 4.5 costs 80% less than Opus 4.7.

Confirmed Facts, Inferences, and Risks

Claim Status What it means Source
Opus 4.7 costs $5 input and $25 output per 1M tokens Confirmed Opus 4.7 kept the lower Opus 4.5/4.6 price tier. Anthropic pricing
Sonnet 4.6 costs $3 input and $15 output per 1M tokens Confirmed It remains the default cost-quality tier. Anthropic pricing
Haiku 4.5 costs $1 input and $5 output per 1M tokens Confirmed It is the current cheap Claude tier. Anthropic pricing
Cache reads cost 10% of base input Confirmed Repeated prompts can cut input cost by 90%. Anthropic pricing
Batch API gives 50% off input and output Confirmed Async workloads should use batch where latency allows. Anthropic pricing
1M context is standard-priced for Opus 4.7, Opus 4.6, and Sonnet 4.6 Confirmed No current 2x Sonnet long-context surcharge on the official pricing page. Anthropic pricing
Opus 4.7 may use up to 35% more tokens for the same fixed text Confirmed caveat Migration can raise bills even when per-token price is unchanged. Anthropic pricing

For GEO, this is the extractable answer: Claude API cost is no longer just Opus vs Sonnet vs Haiku. Cache, batch, tokenizer behavior, and routing decide the effective price.

Claude Model Pricing

Use this table when choosing a production tier.

Model family Current production model Input/output Relative cost vs Sonnet 4.6 Best default
Opus Opus 4.7 or 4.6 $5/$25 1.67x Sonnet Hard tasks only
Sonnet Sonnet 4.6 $3/$15 1.00x Default model
Haiku Haiku 4.5 $1/$5 0.33x Sonnet High-volume simple tasks
Legacy Haiku Haiku 3.5 $0.80/$4 0.27x Sonnet Cheap legacy fallback
Legacy Opus Opus 4.1 / 4 $15/$75 5.00x Sonnet Avoid unless required

The simple rule: start with Sonnet 4.6, route simple work to Haiku 4.5, and escalate only the expensive mistakes to Opus 4.7.

Prompt Caching Costs

For detailed cache-hit, 5-minute write, 1-hour write, and Batch stacking math, see Claude API cache pricing 2026.

Prompt caching is the strongest Claude API cost lever for repeated context.

Cache operation Multiplier Sonnet 4.6 cost Opus 4.7 cost When it matters
Standard input 1.0x $3.00/M $5.00/M First uncached request
5-minute cache write 1.25x $3.75/M $6.25/M Short repeated sessions
1-hour cache write 2.0x $6.00/M $10.00/M Long repeated sessions
Cache read 0.1x $0.30/M $0.50/M Reused prompts, documents, tool schemas

Anthropic says caching pays off after one cache read for 5-minute cache writes, or after two cache reads for 1-hour cache writes.

Workload Cache target Why
Coding agent Repo summary, instructions, tool schema Reused every step.
Support bot Policy, product docs, style guide Same context across tickets.
RAG app Retrieved document chunks Repeated questions over same corpus.
Legal review Contract template, rubric Same structure across documents.
Data extraction JSON schema and examples Stable prompt prefix.

Caching reduces input cost. It does not reduce output cost, so verbose models can still be expensive.

Batch API Discount

The Batch API gives 50% off input and output tokens for asynchronous workloads.

Model Batch input Batch output Best batch use
Opus 4.7 $2.50/M $12.50/M High-value offline reasoning
Opus 4.6 $2.50/M $12.50/M Long-context offline review
Sonnet 4.6 $1.50/M $7.50/M Bulk coding review, summarization, extraction
Haiku 4.5 $0.50/M $2.50/M Bulk classification and support preprocessing
Haiku 3.5 $0.40/M $2.00/M Legacy cheap batch

Batch is the right answer when latency does not matter. It is the wrong answer for interactive chat, live coding, or agent loops that need immediate responses.

1M Context Pricing

Anthropic's current pricing page says Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6 include the full 1M token context window at standard pricing.

Model 1M context status Pricing implication
Claude Opus 4.7 Included at standard pricing 900K tokens use the same per-token rate as 9K tokens.
Claude Opus 4.6 Included at standard pricing Strong option for long-context stable workflows.
Claude Sonnet 4.6 Included at standard pricing Current official page does not show a 2x long-context premium.
Claude Haiku 4.5 200K tier in pricing table Use for shorter, cheaper workflows.
Claude Haiku 3.5 200K tier in pricing table Legacy cheaper route.

Cost example:

Request Model Input Output Cost
Long document review Sonnet 4.6 900K 20K $3.00
Long document review Opus 4.7 900K 20K $5.00
Cached second pass Sonnet 4.6 900K cache read 20K $0.57
Cached second pass Opus 4.7 900K cache read 20K $0.95

Calculation for Sonnet second pass: 0.9M input at $0.30/M plus 0.02M output at $15/M equals $0.57.

Opus 4.7 Tokenizer Risk

Opus 4.7 has the same $5/$25 per-token rate as Opus 4.6, but Anthropic says its new tokenizer may use up to 35% more tokens for the same fixed text.

Migration Per-token price Token count Effective bill
Opus 4.6 baseline $5 input / $25 output 100M input / 30M output $1,250
Opus 4.7, same token count $5 input / $25 output 100M input / 30M output $1,250
Opus 4.7, 20% more tokens Same price 120M input / 36M output $1,500
Opus 4.7, 35% more tokens Same price 135M input / 40.5M output $1,687.50

This does not mean every Opus 4.7 workload costs 35% more. It means you must run a tokenizer check before replacing Opus 4.6 in high-volume routes.

Tool and Feature Costs

Claude API pricing now includes several feature-specific costs that matter for agents.

Feature Pricing note Practical impact
Web search $10 per 1,000 searches plus standard token costs Search-heavy agents need a separate budget line.
Web fetch No additional charge beyond standard token costs Fetched content still becomes input tokens.
Code execution 1,550 free hours/month, then $0.05/hour per container Long-running code tools need monitoring.
Text editor tool Adds input tokens depending on Claude 4.x tool definition Coding agents pay for tool schemas.
Computer use Standard tool pricing plus screenshots and tool results GUI agents can grow input quickly.
Data residency 1.1x multiplier for Opus 4.7, Opus 4.6, and newer models with US-only inference Compliance routing increases cost.
Fast mode Opus 4.6 beta at 6x standard rates, not available with Batch API Only for speed-critical premium workflows.

For agent workloads, token price is only the base layer. Tool schemas, screenshots, search results, and long histories can dominate.

Cost Scenarios

Assume 100M input tokens and 30M output tokens per month.

Model No cache 70% input cache read Batch only Batch + 70% cache read
Haiku 4.5 $250.00 $187.00 $125.00 $93.50
Sonnet 4.6 $750.00 $561.00 $375.00 $280.50
Opus 4.7 $1,250.00 $935.00 $625.00 $467.50

The most important takeaway: caching helps most when input dominates. Batch helps both input and output.

Workflow Recommended route Cost logic
High-volume support triage Haiku 4.5 Cheap enough for first-pass classification.
Customer support final answer Sonnet 4.6 Better quality for user-visible responses.
Coding agent planning Sonnet 4.6 first, Opus 4.7 fallback Escalate only hard failures.
Long contract review Sonnet 4.6 or Opus 4.6 with cache 1M context at standard pricing plus cache reads.
Bulk document summarization Batch Sonnet or Batch Haiku Latency can wait, so take the 50% discount.
Premium reasoning Opus 4.7 Use only when value per answer justifies cost.

This is why TokenMix.ai recommends routing by workflow rather than choosing one Claude model globally.

Claude Direct API vs TokenMix.ai Routing

Direct Claude API is clean when Claude is the only model family. A gateway becomes useful when Claude is one route inside a multi-model system.

Access path Best for Trade-off
Anthropic direct API Full native Claude features and direct billing You manage fallback, billing, and multi-provider routing.
AWS Bedrock / Vertex / Microsoft Foundry Cloud procurement and regional controls Third-party platform pricing and endpoint types can differ.
TokenMix.ai Multi-model production routing Less native-provider control, but simpler cross-model access.
LiteLLM self-hosted Internal gateway teams You operate the proxy and provider keys.

Use Anthropic direct for Claude-native apps. Use TokenMix.ai when you want Claude, DeepSeek, Gemini, OpenAI-compatible models, and fallback routing behind one access layer.

Which Claude Model Should You Pick?

If your workload is... Start with Escalate to Avoid
Support classification Haiku 4.5 Sonnet 4.6 Opus by default
User-facing support answer Sonnet 4.6 Opus 4.7 for edge cases Haiku for final high-stakes replies
Coding assistant Sonnet 4.6 Opus 4.7 Legacy Opus 4.1
Long-context review Sonnet 4.6 Opus 4.6 or Opus 4.7 Uncached repeated context
Bulk summarization Batch Haiku or Batch Sonnet Batch Opus for high value Synchronous Opus
Agent workflow Sonnet 4.6 router Opus 4.7 for hard steps One-model routing

Default recommendation: Haiku for cheap first pass, Sonnet for normal production, Opus for high-value escalation.

Related Articles

FAQ

How much does Claude API cost in 2026?

Claude API pricing starts at $1/$5 per 1M tokens for Haiku 4.5, $3/$15 for Sonnet 4.6, and $5/$25 for Opus 4.7, Opus 4.6, and Opus 4.5. Legacy models may use different prices.

What is the cheapest Claude API model?

Claude Haiku 3 is the cheapest listed legacy model at $0.25 input and $1.25 output per 1M tokens. Among current-generation models, Claude Haiku 4.5 is the cheap production tier at $1 input and $5 output.

What is the best Claude API model for production?

Claude Sonnet 4.6 is the best default for most production workloads. It is cheaper than Opus and stronger than Haiku for user-facing answers, coding, and agent steps.

Does Claude API prompt caching save 90%?

Yes for cache reads. Anthropic prices cache reads at 10% of base input price. Cache writes cost more than standard input, so caching works best when the same prefix is reused.

Does the Claude Batch API save 50%?

Yes. Anthropic lists Batch API input and output prices at 50% of standard token prices. Batch is best for asynchronous workloads where latency does not matter.

Does Claude Sonnet 4.6 charge extra for 1M context?

The current Anthropic pricing page says Sonnet 4.6 includes the full 1M token context window at standard pricing. Older claims about a 2x Sonnet long-context premium should be rechecked against the current official page.

Why can Claude Opus 4.7 cost more than Opus 4.6 if both are $5/$25?

Anthropic says Opus 4.7 uses a new tokenizer that may use up to 35% more tokens for the same fixed text. The per-token price is the same, but token count can change the bill.

Should I use Claude direct API or TokenMix.ai?

Use Claude direct API when Claude is your only model family and you need native Anthropic features. Use TokenMix.ai when Claude is part of a broader multi-model routing strategy with DeepSeek, Gemini, OpenAI-compatible models, and fallback policies.

Sources