TokenMix Research Lab · 2026-04-29

Claude API Pricing 2026: Opus 4.8, Sonnet 4.8, Haiku 4.5 Compared

Claude API Pricing 2026: Opus 4.8, Sonnet 4.8, Haiku 4.5 Compared

Last Updated: 2026-06-01 Author: TokenMix Research Lab Data checked: 2026-06-01

Claude API pricing in 2026 splits cleanly across three current tiers plus an upcoming Mythos-class tier: Opus 4.8 at $5/$25 per million input/output tokens (Fast Mode $10/$50), Sonnet 4.8 at $3/$15, Haiku 4.5 at $1/$5, and Mythos-class models confirmed by Anthropic for public release "in the coming weeks" at a projected premium tier above Opus.

According to Anthropic's official Opus 4.8 launch docs, Opus 4.8, 4.7, and 4.6 share identical standard per-token rates ($5/$25). Opus 4.8 adds a Fast Mode research preview at $10/$50 for 2.5x output speed. Sonnet 4.8 holds at $3/$15 across the 4.x line. Haiku 4.5 stays at $1/$5. Cache reads remain 0.1x base input (90% discount), batch processing keeps the flat 50% discount, and Opus 4.7's tokenizer change (carried to 4.8) can use up to 35% more tokens for the same English text — the headline rate card hides this.

Table of Contents

Quick Answer

Question Direct Answer
What does Claude Opus 4.8 cost per million tokens? $5 input / $25 output (1M context window, standard mode)
What does Opus 4.8 Fast Mode cost? $10 input / $50 output (2.5x output speed, research preview)
What does Claude Sonnet 4.8 cost? $3 input / $15 output (1M context window)
What does Claude Haiku 4.5 cost? $1 input / $5 output (200K context window)
What will Mythos-class cost? Projected ~$25 input / ~$125 output per MTok at premium tier (likely, not confirmed)
Cheapest Claude model? Haiku 3 at $0.25 / $1.25, but capabilities lag Haiku 4.5
How much does prompt caching save? 90% on cache reads (0.1x base input)
Batch API discount? Flat 50% on input and output

Confirmed Facts vs Common Misreads

Claim Status Source
Opus 4.8 = $5/$25 per MTok, identical to 4.7 Confirmed Anthropic Opus 4.8 launch docs
Opus 4.8 Fast Mode = $10/$50 per MTok Confirmed Anthropic Opus 4.8 launch docs
Sonnet 4.8 = $3/$15 per MTok Confirmed Anthropic pricing page
Sonnet 4.8 supports 1M context at standard pricing Confirmed Long context pricing section
Opus 4.8 cache write minimum dropped to 1,024 tokens Confirmed Opus 4.8 release notes
Cache hits cost 0.1x base input Confirmed Prompt caching multiplier table
5-minute cache write costs 1.25x; 1-hour costs 2x Confirmed Anthropic prompt caching docs
Mythos-class public release coming "in the coming weeks" (as of May 28, 2026) Confirmed Anthropic statement via BleepingComputer
Mythos-class will cost ~$25/$125 per MTok at launch Likely Project Glasswing partner reference pricing
Claude is always more expensive than OpenAI False GPT-5.5 standard is $3/$15; Opus 4.8 is $5/$25 — depends on workload mix
Anthropic raised Claude prices in 2026 False Opus 4.8 launched at the same $5/$25 as Opus 4.7
US-only data residency on Opus 4.x adds 1.1x Confirmed Data residency pricing section

Full Claude Pricing Table

Based on Anthropic's official pricing page (data checked 2026-06-01, all prices USD per 1M tokens):

Model Input 5m Cache Write 1h Cache Write Cache Read Output Batch Input Batch Output Context
Claude Opus 4.8 $5 $6.25 $10 $0.50 $25 $2.50 $12.50 1M
Claude Opus 4.8 Fast Mode $10 $12.50 $20 $1.00 $50 n/a n/a 1M
Claude Opus 4.7 $5 $6.25 $10 $0.50 $25 $2.50 $12.50 1M
Claude Opus 4.6 $5 $6.25 $10 $0.50 $25 $2.50 $12.50 1M
Claude Opus 4.5 $5 $6.25 $10 $0.50 $25 $2.50 $12.50 200K
Claude Opus 4.1 (deprecated) $15 $18.75 $30 $1.50 $75 $7.50 $37.50 200K
Claude Sonnet 4.8 $3 $3.75 $6 $0.30 $15 $1.50 $7.50 1M
Claude Sonnet 4.7 $3 $3.75 $6 $0.30 $15 $1.50 $7.50 1M
Claude Sonnet 4.6 $3 $3.75 $6 $0.30 $15 $1.50 $7.50 1M
Claude Sonnet 4.5 $3 $3.75 $6 $0.30 $15 $1.50 $7.50 200K
Claude Sonnet 3.7 (deprecated) $3 $3.75 $6 $0.30 $15 $1.50 $7.50 200K
Claude Haiku 4.5 $1 $1.25 $2 $0.10 $5 $0.50 $2.50 200K
Claude Haiku 3.5 $0.80 $1 $1.60 $0.08 $4 $0.40 $2 200K
Claude Haiku 3 $0.25 $0.30 $0.50 $0.03 $1.25 $0.125 $0.625 200K
Claude Mythos-class (projected) ~$25 TBD TBD TBD ~$125 TBD TBD TBD

Four things to take from this table. First, Opus 4.8 introduced Fast Mode at $10/$50 — 2x rate for 2.5x output speed, suited to user-facing interactive workloads. Second, the entire 4.x generation collapsed Opus pricing from $15/$75 down to $5/$25 — that's a 67% input cut, 67% output cut versus Opus 4.1. Third, Sonnet has held steady at $3/$15 across five releases (4 → 4.5 → 4.6 → 4.7 → 4.8). Fourth, the upcoming Mythos-class tier sits at a projected 5x Opus premium — a specialty tier for security audit and autonomous research workloads, not a general-purpose default.

Mythos-Class Pricing (Coming Weeks)

Anthropic confirmed on May 28, 2026 that Claude Mythos-class models will "roll out to all our customers in the coming weeks." Public Mythos pricing has not been announced, but signals point to a premium tier above Opus:

Signal Implied pricing
Project Glasswing partner reference rate $25 input / $125 output per MTok
5x Opus pricing pattern (matches Opus-over-Sonnet jump) $20-30 input, $100-150 output
Likely Fast Mode addition at launch 2-2.5x premium for speed
Capability differential (90x Firefox exploit gap vs Opus 4.6) Justifies security-focused premium

The realistic public release window is mid-June through end of July 2026. For a complete breakdown of what Mythos-class means at the capability layer, see Claude Mythos vs Opus 4.8 and the Project Glasswing data covering 23,019 software flaws discovered.

For most production workloads, Mythos is not the right default. Stay on Opus 4.8 unless your workload is security audit, vulnerability research, or autonomous long-horizon research where the 5x premium maps to measurable revenue or cost reduction.

How Did Claude Pricing Change From 2025 to 2026?

Four structural shifts hit between mid-2025 and June 2026:

Change When Impact
Opus 4.1 → Opus 4.5/4.6/4.7/4.8 rate cut Late 2025 - May 2026 Input/output rate dropped 3x ($15/$75 → $5/$25)
1M context window opened at standard pricing Sonnet 4.6, Opus 4.6, all later No 2x premium beyond 200k like some competitors
Cache TTL options expanded 2025-09 1-hour cache (2x write) added alongside 5-min (1.25x write)
Opus 4.8 Fast Mode introduced 2026-05-28 First Anthropic premium-speed tier on Opus, $10/$50

The deprecated tier matters for budgeting. According to Anthropic's deprecation policy, Sonnet 3.7 and Opus 3 are scheduled for retirement. The migration target (Sonnet 4.x or Opus 4.x) costs the same or less, which is unusual for an SaaS API in this space.

Opus, Sonnet, Haiku: Which Tier Should You Pick?

Tier choice should follow the workload, not the brand. Decision matrix:

Workload type Recommended tier Why
Production code generation, agentic coding Sonnet 4.8 Hits ~75% on SWE-bench Verified at 1/5 the Opus 4.8 price
Hard reasoning, multi-step research, novel domain Opus 4.8 Highest sustained accuracy, 88.6% SWE-bench Verified
Interactive copilots / user-facing chat Opus 4.8 Fast Mode 2.5x output speed; cost premium correctly priced for revenue-sensitive UX
Customer support classification, intent detection Haiku 4.5 $1/$5 with strong instruction-following
Bulk summarization, doc cleanup Haiku 4.5 + Batch Drops to $0.50/$2.50 with Batch API
Long-context ingestion (>200k) Sonnet 4.8 1M context at standard rates
Stateful agents with cacheable system prompts Sonnet 4.8 + cache Cache reads at $0.30/MTok make agent loops cheap
Security audit / vulnerability research Mythos-class (when public) 90x capability multiplier on offensive security workloads

A common mistake is reaching for Opus 4.8 by default. Opus 4.8 leads Sonnet 4.8 by 2-4 percentage points on most reasoning benchmarks, but the price ratio is 5x. For 90% of production workloads, Sonnet 4.8 is the right call and Opus is reserved for the hard 10% where accuracy gains pay back the spend. The Claude Sonnet vs Opus routing guide covers when to escalate.

Cost Examples Across Real Workloads

Customer support chatbot — 10,000 tickets/month

Average ticket = 3,700 tokens (per Anthropic's customer support guide). Assume 80/20 input/output split.

Model Monthly cost Per ticket
Haiku 4.5 $37.00 $0.0037
Haiku 4.5 + 5-min cache (60% hit rate) $19.20 $0.0019
Haiku 4.5 + Batch $18.50 $0.0019
Sonnet 4.8 $111.00 $0.0111
Opus 4.8 $185.00 $0.0185

Most support classification fits Haiku 4.5; the 90% input savings from cache compound when system prompts are reused across tickets.

Coding agent — 100,000 calls/month with reused tool schemas

Assume 8k input tokens (mostly tool schemas, repo context) and 1k output tokens per call. Tool schemas are a perfect prompt-caching candidate.

Strategy Sonnet 4.8 monthly Opus 4.8 monthly
No cache $3,900 $6,500
5-min cache (75% hit rate) $1,275 $2,125
5-min cache + Batch (where applicable) ~$650 ~$1,065
Opus 4.8 Fast Mode (no cache, user-facing) n/a $13,000

ProjectDiscovery's published case study shows that moving dynamic content out of the cacheable prefix raised their hit rate from 7% to 74% in a single deployment. This is the single largest lever available. The mid-conversation system message support added in Opus 4.8 preserves cache hits across longer agentic loops than 4.7 allowed.

Document summarization — 1M documents/month

Average document = 5k input tokens, output = 500 tokens. Latency-insensitive, perfect for Batch API.

Model Standard cost Batch cost (50% off)
Haiku 4.5 $7,500 $3,750
Sonnet 4.8 $22,500 $11,250
Opus 4.8 $37,500 $18,750

For non-time-sensitive document workloads, Haiku 4.5 + Batch puts cost-per-document at ~$0.0038. That's the floor before you start considering self-hosted inference.

Code review pipeline — 5,000 PRs/month with 80k average context

Long-context PR review is where the 1M context window earns its keep. Opus 4.8 lands a 4x reduction in code-flaw rate vs Opus 4.7, which matters more at this workload than the per-token cost.

Model Per-PR cost Monthly cost
Sonnet 4.8 (no cache) $0.27 $1,350
Sonnet 4.8 + 1-hour cache (repo context cached) $0.094 $470
Opus 4.8 + 1-hour cache $0.156 $780

The 1-hour cache (2x write multiplier) breaks even after just two reads — a bar most code review pipelines clear easily.

Claude vs GPT-5.5 vs Gemini 3.5 Pro

Same workload (10M output tokens/month, agent-heavy):

Provider Model Input $/MTok Output $/MTok Monthly cost (10M output)
Anthropic Opus 4.8 $5 $25 $250
Anthropic Opus 4.8 Fast Mode $10 $50 $500
OpenAI GPT-5.5 standard $3 $15 $150
OpenAI GPT-5.5 Pro tier $30 $180 $1,800
Anthropic Sonnet 4.8 $3 $15 $150
Google Gemini 3.5 Pro $2.50 $10 $100
Anthropic Mythos-class (projected) ~$25 ~$125 ~$1,250

Two caveats. First, llm-stats.com benchmark reports Opus 4.8 wins SWE-Bench Pro by 10.6 pts over GPT-5.5 and GDPval-AA by 121 Elo — meaning the headline cost gap is misleading when output quality matters. Second, Finout's Opus 4.7 pricing analysis flags that the Opus 4.7/4.8 tokenizer can swell the same input text by up to 35%, which Anthropic confirmed in their own documentation.

Rate-card comparisons mislead. Real cost comparisons need to measure cost-per-completed-task, not cost-per-token. See the Frontier Pro Tier comparison for cost-per-task math across all major frontier models.

How to Cut Claude API Costs by 60-90%

Five levers, ranked by effort vs return:

Lever Effort Typical savings Notes
Prompt caching with stable prefix Low 60-80% on input See Claude API cache pricing guide
Tier downshift (Opus → Sonnet, Sonnet → Haiku) Low-Medium 50-80% Validate quality on golden eval set
Batch API for async workloads Low 50% flat Doesn't stack with Fast mode
Output token budgeting (max_tokens, stop sequences) Medium 20-40% on output Highest leverage when output is most of cost
Multi-model routing (Haiku triage → Sonnet escalate → Opus 4.8 reasoning) High 40-60% Use TokenMix routing or LiteLLM

The prompt caching lever is the most underused. Per Vellum's prompt caching docs, cached tokens run roughly 50% cheaper than non-cached even before you stack on Anthropic's 0.1x cache hit rate. Helicone reports similar savings in their prompt caching observability changelog. Opus 4.8's lower cache minimum (1,024 tokens vs Opus 4.7's higher floor) makes more system prompts cacheable without code changes.

Hidden Cost Factors Most Posts Miss

Inferred from production usage patterns plus Anthropic documentation:

Factor Real cost impact
Opus 4.7/4.8 tokenizer overhead (up to 35%) Effective rate per English token is closer to $6.75/$33.75 versus the $5/$25 headline
Opus 4.8 effort default = high Default raised from medium to high — costs go up unless you set effort: "medium" explicitly
US-only data residency on Opus 4.x 1.1x multiplier on every token category; matters for compliance workloads
Web search tool usage $10 per 1,000 searches plus standard token costs for retrieved content
Tool use system prompt overhead 313-346 tokens added per request when tools parameter is set
Computer use beta tokens Adds 466-499 tokens per request before screenshots
Fast mode pricing premium Opus 4.8 Fast Mode is $10/$50 (2x rate) — premium for latency-sensitive flows only
Bedrock / Vertex regional endpoints 10% premium over global on Sonnet 4.5+ and Haiku 4.5+
Mythos-class gating Premium tier likely gated behind Cyber Verification Program for offensive capabilities

Most cost surprises in production come from these factors, not the headline rate card.

When Should You Use Bedrock or Vertex Instead?

Anthropic exposes Claude through three first-party paths and three cloud platforms:

Channel Pricing Latency Compliance Best for
Claude API direct Standard rates Lowest, global SOC 2 Default choice
AWS Bedrock +10% on regional, same on global Comparable HIPAA, IL5 available Workloads already on AWS
Google Vertex AI +10% on regional, same on global Comparable EU data residency, ISO Workloads already on GCP
Microsoft Foundry Per Microsoft pricing Comparable Azure compliance umbrella Enterprise Microsoft shops
TokenMix.ai unified gateway Direct rates pass-through, OpenAI-compatible Single endpoint See official authorized access Multi-model apps with Alipay/WeChat Pay

The cleanest decision rule: pick the channel that matches your existing data plane. Migrating to Bedrock just to access Claude rarely pays back the integration tax. For Mythos public release, AWS Bedrock US East is the only currently-deployed channel — Project Glasswing partners access there.

Final Recommendation

For most production workloads in June 2026, route Sonnet 4.8 by default with prompt caching enabled and batch processing for any non-real-time job. Reserve Opus 4.8 for the hard 10% where accuracy gains exceed the 5x cost premium — and use Opus 4.8 Fast Mode only for user-facing interactive flows where latency maps to revenue. Use Haiku 4.5 as the cheap triage tier in front of Sonnet. Stay alert for Mythos-class public release in mid-June through end of July 2026 — useful only if your workload is security audit, vulnerability research, or autonomous long-horizon research.

FAQ

Is Claude API cheaper than GPT-5.5 in 2026?

Mixed. Opus 4.8 is $5/$25 versus GPT-5.5 at $3/$15 — GPT-5.5 wins on rate card. But Opus 4.8 wins SWE-Bench Pro by 10.6 pts and GDPval-AA by 121 Elo, so cost-per-completed-task often favors Opus on coding workloads. Sonnet 4.8 at $3/$15 matches GPT-5.5 on price and is the closer apples-to-apples comparison.

Why is Claude Opus 4.8 priced the same as Opus 4.7?

Anthropic kept the rate card flat across the 4.5/4.6/4.7/4.8 generation. Opus 4.8 adds Fast Mode at $10/$50 (2.5x output speed) and a 4x reduction in code-flaw rate, but the standard mode rate is unchanged. The catch is the tokenizer change (introduced in 4.7, carried to 4.8) may use up to 35% more tokens for the same input text, which raises effective cost per English word even though the rate-per-token is unchanged.

What is the cheapest Claude model in 2026?

Haiku 3 at $0.25 / $1.25 per MTok remains the absolute cheapest, but capabilities lag Haiku 4.5 by enough that most teams now route to Haiku 4.5 ($1/$5) for production work. With Batch API, Haiku 4.5 drops to $0.50/$2.50.

When will Claude Mythos be publicly available?

Anthropic confirmed "in the coming weeks" as of May 28, 2026. Realistic window: mid-June through end of July 2026. Projected pricing is ~$25 input / ~$125 output per MTok at a premium tier above Opus. Full timeline and prep checklist in our Mythos coming weeks article.

Does Anthropic offer volume discounts?

Yes, but only via direct sales contact for enterprise contracts. There is no published volume discount tier. The fastest cost reduction available to all users is the Batch API (flat 50% off) plus prompt caching (90% off cache reads).

How does prompt caching pricing work?

Cache writes cost 1.25x base input for 5-minute TTL or 2x for 1-hour TTL. Cache reads cost 0.1x base input. The 5-minute cache pays off after one cache read; the 1-hour cache pays off after two reads. Opus 4.8 lowered the minimum cacheable prompt length to 1,024 tokens, so more short system prompts now build cache entries. See our Claude API cache pricing deep-dive for the break-even math.

Can Batch API and prompt caching be combined?

Yes. The discounts stack multiplicatively. According to Anthropic's pricing docs, Batch API "discounts can be combined" with prompt caching. Fast mode on Opus 4.8 is the one feature that does not combine with Batch.

What is the cheapest way to use Claude in production?

Stack three levers: route Haiku 4.5 for triage, enable prompt caching with a stable system prompt prefix, and submit non-time-sensitive jobs through Batch API. Combined effective rate can reach $0.30-$0.50 per MTok input on cached calls — roughly the cost of Haiku 3 with Sonnet-tier capabilities behind it via routing.

How do I avoid surprise costs on Claude API?

Set hard max_tokens budgets on every request, monitor usage.cache_read_input_tokens and usage.cache_creation_input_tokens separately, and run alerts on output-token spikes. Output is where 70-80% of bills live in agentic workloads. On Opus 4.8 specifically, set effort: "medium" explicitly to avoid the new default high raising your costs unexpectedly.

Related Articles

Sources

By TokenMix Research Lab · Updated 2026-06-01