TokenMix Research Lab · 2026-04-29

Claude API Pricing 2026: Opus 4.8, Sonnet 4.8, Haiku 4.5 Compared
Last Updated: 2026-06-01 Author: TokenMix Research Lab Data checked: 2026-06-01
Claude API pricing in 2026 splits cleanly across three current tiers plus an upcoming Mythos-class tier: Opus 4.8 at $5/$25 per million input/output tokens (Fast Mode $10/$50), Sonnet 4.8 at $3/$15, Haiku 4.5 at $1/$5, and Mythos-class models confirmed by Anthropic for public release "in the coming weeks" at a projected premium tier above Opus.
According to Anthropic's official Opus 4.8 launch docs, Opus 4.8, 4.7, and 4.6 share identical standard per-token rates ($5/$25). Opus 4.8 adds a Fast Mode research preview at $10/$50 for 2.5x output speed. Sonnet 4.8 holds at $3/$15 across the 4.x line. Haiku 4.5 stays at $1/$5. Cache reads remain 0.1x base input (90% discount), batch processing keeps the flat 50% discount, and Opus 4.7's tokenizer change (carried to 4.8) can use up to 35% more tokens for the same English text — the headline rate card hides this.
Table of Contents
- Quick Answer
- Confirmed Facts vs Common Misreads
- Full Claude Pricing Table
- Mythos-Class Pricing (Coming Weeks)
- How Did Claude Pricing Change From 2025 to 2026?
- Opus, Sonnet, Haiku: Which Tier Should You Pick?
- Cost Examples Across Real Workloads
- Claude vs GPT-5.5 vs Gemini 3.5 Pro
- How to Cut Claude API Costs by 60-90%
- Hidden Cost Factors Most Posts Miss
- When Should You Use Bedrock or Vertex Instead?
- Final Recommendation
- FAQ
- Related Articles
- Sources
Quick Answer
| Question | Direct Answer |
|---|---|
| What does Claude Opus 4.8 cost per million tokens? | $5 input / $25 output (1M context window, standard mode) |
| What does Opus 4.8 Fast Mode cost? | $10 input / $50 output (2.5x output speed, research preview) |
| What does Claude Sonnet 4.8 cost? | $3 input / $15 output (1M context window) |
| What does Claude Haiku 4.5 cost? | $1 input / $5 output (200K context window) |
| What will Mythos-class cost? | Projected ~$25 input / ~$125 output per MTok at premium tier (likely, not confirmed) |
| Cheapest Claude model? | Haiku 3 at $0.25 / $1.25, but capabilities lag Haiku 4.5 |
| How much does prompt caching save? | 90% on cache reads (0.1x base input) |
| Batch API discount? | Flat 50% on input and output |
Confirmed Facts vs Common Misreads
| Claim | Status | Source |
|---|---|---|
| Opus 4.8 = $5/$25 per MTok, identical to 4.7 | Confirmed | Anthropic Opus 4.8 launch docs |
| Opus 4.8 Fast Mode = $10/$50 per MTok | Confirmed | Anthropic Opus 4.8 launch docs |
| Sonnet 4.8 = $3/$15 per MTok | Confirmed | Anthropic pricing page |
| Sonnet 4.8 supports 1M context at standard pricing | Confirmed | Long context pricing section |
| Opus 4.8 cache write minimum dropped to 1,024 tokens | Confirmed | Opus 4.8 release notes |
| Cache hits cost 0.1x base input | Confirmed | Prompt caching multiplier table |
| 5-minute cache write costs 1.25x; 1-hour costs 2x | Confirmed | Anthropic prompt caching docs |
| Mythos-class public release coming "in the coming weeks" (as of May 28, 2026) | Confirmed | Anthropic statement via BleepingComputer |
| Mythos-class will cost ~$25/$125 per MTok at launch | Likely | Project Glasswing partner reference pricing |
| Claude is always more expensive than OpenAI | False | GPT-5.5 standard is $3/$15; Opus 4.8 is $5/$25 — depends on workload mix |
| Anthropic raised Claude prices in 2026 | False | Opus 4.8 launched at the same $5/$25 as Opus 4.7 |
| US-only data residency on Opus 4.x adds 1.1x | Confirmed | Data residency pricing section |
Full Claude Pricing Table
Based on Anthropic's official pricing page (data checked 2026-06-01, all prices USD per 1M tokens):
| Model | Input | 5m Cache Write | 1h Cache Write | Cache Read | Output | Batch Input | Batch Output | Context |
|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.8 | $5 | $6.25 | $10 | $0.50 | $25 | $2.50 | $12.50 | 1M |
| Claude Opus 4.8 Fast Mode | $10 | $12.50 | $20 | $1.00 | $50 | n/a | n/a | 1M |
| Claude Opus 4.7 | $5 | $6.25 | $10 | $0.50 | $25 | $2.50 | $12.50 | 1M |
| Claude Opus 4.6 | $5 | $6.25 | $10 | $0.50 | $25 | $2.50 | $12.50 | 1M |
| Claude Opus 4.5 | $5 | $6.25 | $10 | $0.50 | $25 | $2.50 | $12.50 | 200K |
| Claude Opus 4.1 (deprecated) | $15 | $18.75 | $30 | $1.50 | $75 | $7.50 | $37.50 | 200K |
| Claude Sonnet 4.8 | $3 | $3.75 | $6 | $0.30 | $15 | $1.50 | $7.50 | 1M |
| Claude Sonnet 4.7 | $3 | $3.75 | $6 | $0.30 | $15 | $1.50 | $7.50 | 1M |
| Claude Sonnet 4.6 | $3 | $3.75 | $6 | $0.30 | $15 | $1.50 | $7.50 | 1M |
| Claude Sonnet 4.5 | $3 | $3.75 | $6 | $0.30 | $15 | $1.50 | $7.50 | 200K |
| Claude Sonnet 3.7 (deprecated) | $3 | $3.75 | $6 | $0.30 | $15 | $1.50 | $7.50 | 200K |
| Claude Haiku 4.5 | $1 | $1.25 | $2 | $0.10 | $5 | $0.50 | $2.50 | 200K |
| Claude Haiku 3.5 | $0.80 | $1 | $1.60 | $0.08 | $4 | $0.40 | $2 | 200K |
| Claude Haiku 3 | $0.25 | $0.30 | $0.50 | $0.03 | $1.25 | $0.125 | $0.625 | 200K |
| Claude Mythos-class (projected) | ~$25 | TBD | TBD | TBD | ~$125 | TBD | TBD | TBD |
Four things to take from this table. First, Opus 4.8 introduced Fast Mode at $10/$50 — 2x rate for 2.5x output speed, suited to user-facing interactive workloads. Second, the entire 4.x generation collapsed Opus pricing from $15/$75 down to $5/$25 — that's a 67% input cut, 67% output cut versus Opus 4.1. Third, Sonnet has held steady at $3/$15 across five releases (4 → 4.5 → 4.6 → 4.7 → 4.8). Fourth, the upcoming Mythos-class tier sits at a projected 5x Opus premium — a specialty tier for security audit and autonomous research workloads, not a general-purpose default.
Mythos-Class Pricing (Coming Weeks)
Anthropic confirmed on May 28, 2026 that Claude Mythos-class models will "roll out to all our customers in the coming weeks." Public Mythos pricing has not been announced, but signals point to a premium tier above Opus:
| Signal | Implied pricing |
|---|---|
| Project Glasswing partner reference rate | $25 input / $125 output per MTok |
| 5x Opus pricing pattern (matches Opus-over-Sonnet jump) | $20-30 input, $100-150 output |
| Likely Fast Mode addition at launch | 2-2.5x premium for speed |
| Capability differential (90x Firefox exploit gap vs Opus 4.6) | Justifies security-focused premium |
The realistic public release window is mid-June through end of July 2026. For a complete breakdown of what Mythos-class means at the capability layer, see Claude Mythos vs Opus 4.8 and the Project Glasswing data covering 23,019 software flaws discovered.
For most production workloads, Mythos is not the right default. Stay on Opus 4.8 unless your workload is security audit, vulnerability research, or autonomous long-horizon research where the 5x premium maps to measurable revenue or cost reduction.
How Did Claude Pricing Change From 2025 to 2026?
Four structural shifts hit between mid-2025 and June 2026:
| Change | When | Impact |
|---|---|---|
| Opus 4.1 → Opus 4.5/4.6/4.7/4.8 rate cut | Late 2025 - May 2026 | Input/output rate dropped 3x ($15/$75 → $5/$25) |
| 1M context window opened at standard pricing | Sonnet 4.6, Opus 4.6, all later | No 2x premium beyond 200k like some competitors |
| Cache TTL options expanded | 2025-09 | 1-hour cache (2x write) added alongside 5-min (1.25x write) |
| Opus 4.8 Fast Mode introduced | 2026-05-28 | First Anthropic premium-speed tier on Opus, $10/$50 |
The deprecated tier matters for budgeting. According to Anthropic's deprecation policy, Sonnet 3.7 and Opus 3 are scheduled for retirement. The migration target (Sonnet 4.x or Opus 4.x) costs the same or less, which is unusual for an SaaS API in this space.
Opus, Sonnet, Haiku: Which Tier Should You Pick?
Tier choice should follow the workload, not the brand. Decision matrix:
| Workload type | Recommended tier | Why |
|---|---|---|
| Production code generation, agentic coding | Sonnet 4.8 | Hits ~75% on SWE-bench Verified at 1/5 the Opus 4.8 price |
| Hard reasoning, multi-step research, novel domain | Opus 4.8 | Highest sustained accuracy, 88.6% SWE-bench Verified |
| Interactive copilots / user-facing chat | Opus 4.8 Fast Mode | 2.5x output speed; cost premium correctly priced for revenue-sensitive UX |
| Customer support classification, intent detection | Haiku 4.5 | $1/$5 with strong instruction-following |
| Bulk summarization, doc cleanup | Haiku 4.5 + Batch | Drops to $0.50/$2.50 with Batch API |
| Long-context ingestion (>200k) | Sonnet 4.8 | 1M context at standard rates |
| Stateful agents with cacheable system prompts | Sonnet 4.8 + cache | Cache reads at $0.30/MTok make agent loops cheap |
| Security audit / vulnerability research | Mythos-class (when public) | 90x capability multiplier on offensive security workloads |
A common mistake is reaching for Opus 4.8 by default. Opus 4.8 leads Sonnet 4.8 by 2-4 percentage points on most reasoning benchmarks, but the price ratio is 5x. For 90% of production workloads, Sonnet 4.8 is the right call and Opus is reserved for the hard 10% where accuracy gains pay back the spend. The Claude Sonnet vs Opus routing guide covers when to escalate.
Cost Examples Across Real Workloads
Customer support chatbot — 10,000 tickets/month
Average ticket = 3,700 tokens (per Anthropic's customer support guide). Assume 80/20 input/output split.
| Model | Monthly cost | Per ticket |
|---|---|---|
| Haiku 4.5 | $37.00 | $0.0037 |
| Haiku 4.5 + 5-min cache (60% hit rate) | $19.20 | $0.0019 |
| Haiku 4.5 + Batch | $18.50 | $0.0019 |
| Sonnet 4.8 | $111.00 | $0.0111 |
| Opus 4.8 | $185.00 | $0.0185 |
Most support classification fits Haiku 4.5; the 90% input savings from cache compound when system prompts are reused across tickets.
Coding agent — 100,000 calls/month with reused tool schemas
Assume 8k input tokens (mostly tool schemas, repo context) and 1k output tokens per call. Tool schemas are a perfect prompt-caching candidate.
| Strategy | Sonnet 4.8 monthly | Opus 4.8 monthly |
|---|---|---|
| No cache | $3,900 | $6,500 |
| 5-min cache (75% hit rate) | $1,275 | $2,125 |
| 5-min cache + Batch (where applicable) | ~$650 | ~$1,065 |
| Opus 4.8 Fast Mode (no cache, user-facing) | n/a | $13,000 |
ProjectDiscovery's published case study shows that moving dynamic content out of the cacheable prefix raised their hit rate from 7% to 74% in a single deployment. This is the single largest lever available. The mid-conversation system message support added in Opus 4.8 preserves cache hits across longer agentic loops than 4.7 allowed.
Document summarization — 1M documents/month
Average document = 5k input tokens, output = 500 tokens. Latency-insensitive, perfect for Batch API.
| Model | Standard cost | Batch cost (50% off) |
|---|---|---|
| Haiku 4.5 | $7,500 | $3,750 |
| Sonnet 4.8 | $22,500 | $11,250 |
| Opus 4.8 | $37,500 | $18,750 |
For non-time-sensitive document workloads, Haiku 4.5 + Batch puts cost-per-document at ~$0.0038. That's the floor before you start considering self-hosted inference.
Code review pipeline — 5,000 PRs/month with 80k average context
Long-context PR review is where the 1M context window earns its keep. Opus 4.8 lands a 4x reduction in code-flaw rate vs Opus 4.7, which matters more at this workload than the per-token cost.
| Model | Per-PR cost | Monthly cost |
|---|---|---|
| Sonnet 4.8 (no cache) | $0.27 | $1,350 |
| Sonnet 4.8 + 1-hour cache (repo context cached) | $0.094 | $470 |
| Opus 4.8 + 1-hour cache | $0.156 | $780 |
The 1-hour cache (2x write multiplier) breaks even after just two reads — a bar most code review pipelines clear easily.
Claude vs GPT-5.5 vs Gemini 3.5 Pro
Same workload (10M output tokens/month, agent-heavy):
| Provider | Model | Input $/MTok | Output $/MTok | Monthly cost (10M output) |
|---|---|---|---|---|
| Anthropic | Opus 4.8 | $5 | $25 | $250 |
| Anthropic | Opus 4.8 Fast Mode | $10 | $50 | $500 |
| OpenAI | GPT-5.5 standard | $3 | $15 | $150 |
| OpenAI | GPT-5.5 Pro tier | $30 | $180 | $1,800 |
| Anthropic | Sonnet 4.8 | $3 | $15 | $150 |
| Gemini 3.5 Pro | $2.50 | $10 | $100 | |
| Anthropic | Mythos-class (projected) | ~$25 | ~$125 | ~$1,250 |
Two caveats. First, llm-stats.com benchmark reports Opus 4.8 wins SWE-Bench Pro by 10.6 pts over GPT-5.5 and GDPval-AA by 121 Elo — meaning the headline cost gap is misleading when output quality matters. Second, Finout's Opus 4.7 pricing analysis flags that the Opus 4.7/4.8 tokenizer can swell the same input text by up to 35%, which Anthropic confirmed in their own documentation.
Rate-card comparisons mislead. Real cost comparisons need to measure cost-per-completed-task, not cost-per-token. See the Frontier Pro Tier comparison for cost-per-task math across all major frontier models.
How to Cut Claude API Costs by 60-90%
Five levers, ranked by effort vs return:
| Lever | Effort | Typical savings | Notes |
|---|---|---|---|
| Prompt caching with stable prefix | Low | 60-80% on input | See Claude API cache pricing guide |
| Tier downshift (Opus → Sonnet, Sonnet → Haiku) | Low-Medium | 50-80% | Validate quality on golden eval set |
| Batch API for async workloads | Low | 50% flat | Doesn't stack with Fast mode |
| Output token budgeting (max_tokens, stop sequences) | Medium | 20-40% on output | Highest leverage when output is most of cost |
| Multi-model routing (Haiku triage → Sonnet escalate → Opus 4.8 reasoning) | High | 40-60% | Use TokenMix routing or LiteLLM |
The prompt caching lever is the most underused. Per Vellum's prompt caching docs, cached tokens run roughly 50% cheaper than non-cached even before you stack on Anthropic's 0.1x cache hit rate. Helicone reports similar savings in their prompt caching observability changelog. Opus 4.8's lower cache minimum (1,024 tokens vs Opus 4.7's higher floor) makes more system prompts cacheable without code changes.
Hidden Cost Factors Most Posts Miss
Inferred from production usage patterns plus Anthropic documentation:
| Factor | Real cost impact |
|---|---|
| Opus 4.7/4.8 tokenizer overhead (up to 35%) | Effective rate per English token is closer to $6.75/$33.75 versus the $5/$25 headline |
| Opus 4.8 effort default = high | Default raised from medium to high — costs go up unless you set effort: "medium" explicitly |
| US-only data residency on Opus 4.x | 1.1x multiplier on every token category; matters for compliance workloads |
| Web search tool usage | $10 per 1,000 searches plus standard token costs for retrieved content |
| Tool use system prompt overhead | 313-346 tokens added per request when tools parameter is set |
| Computer use beta tokens | Adds 466-499 tokens per request before screenshots |
| Fast mode pricing premium | Opus 4.8 Fast Mode is $10/$50 (2x rate) — premium for latency-sensitive flows only |
| Bedrock / Vertex regional endpoints | 10% premium over global on Sonnet 4.5+ and Haiku 4.5+ |
| Mythos-class gating | Premium tier likely gated behind Cyber Verification Program for offensive capabilities |
Most cost surprises in production come from these factors, not the headline rate card.
When Should You Use Bedrock or Vertex Instead?
Anthropic exposes Claude through three first-party paths and three cloud platforms:
| Channel | Pricing | Latency | Compliance | Best for |
|---|---|---|---|---|
| Claude API direct | Standard rates | Lowest, global | SOC 2 | Default choice |
| AWS Bedrock | +10% on regional, same on global | Comparable | HIPAA, IL5 available | Workloads already on AWS |
| Google Vertex AI | +10% on regional, same on global | Comparable | EU data residency, ISO | Workloads already on GCP |
| Microsoft Foundry | Per Microsoft pricing | Comparable | Azure compliance umbrella | Enterprise Microsoft shops |
| TokenMix.ai unified gateway | Direct rates pass-through, OpenAI-compatible | Single endpoint | See official authorized access | Multi-model apps with Alipay/WeChat Pay |
The cleanest decision rule: pick the channel that matches your existing data plane. Migrating to Bedrock just to access Claude rarely pays back the integration tax. For Mythos public release, AWS Bedrock US East is the only currently-deployed channel — Project Glasswing partners access there.
Final Recommendation
For most production workloads in June 2026, route Sonnet 4.8 by default with prompt caching enabled and batch processing for any non-real-time job. Reserve Opus 4.8 for the hard 10% where accuracy gains exceed the 5x cost premium — and use Opus 4.8 Fast Mode only for user-facing interactive flows where latency maps to revenue. Use Haiku 4.5 as the cheap triage tier in front of Sonnet. Stay alert for Mythos-class public release in mid-June through end of July 2026 — useful only if your workload is security audit, vulnerability research, or autonomous long-horizon research.
FAQ
Is Claude API cheaper than GPT-5.5 in 2026?
Mixed. Opus 4.8 is $5/$25 versus GPT-5.5 at $3/$15 — GPT-5.5 wins on rate card. But Opus 4.8 wins SWE-Bench Pro by 10.6 pts and GDPval-AA by 121 Elo, so cost-per-completed-task often favors Opus on coding workloads. Sonnet 4.8 at $3/$15 matches GPT-5.5 on price and is the closer apples-to-apples comparison.
Why is Claude Opus 4.8 priced the same as Opus 4.7?
Anthropic kept the rate card flat across the 4.5/4.6/4.7/4.8 generation. Opus 4.8 adds Fast Mode at $10/$50 (2.5x output speed) and a 4x reduction in code-flaw rate, but the standard mode rate is unchanged. The catch is the tokenizer change (introduced in 4.7, carried to 4.8) may use up to 35% more tokens for the same input text, which raises effective cost per English word even though the rate-per-token is unchanged.
What is the cheapest Claude model in 2026?
Haiku 3 at $0.25 / $1.25 per MTok remains the absolute cheapest, but capabilities lag Haiku 4.5 by enough that most teams now route to Haiku 4.5 ($1/$5) for production work. With Batch API, Haiku 4.5 drops to $0.50/$2.50.
When will Claude Mythos be publicly available?
Anthropic confirmed "in the coming weeks" as of May 28, 2026. Realistic window: mid-June through end of July 2026. Projected pricing is ~$25 input / ~$125 output per MTok at a premium tier above Opus. Full timeline and prep checklist in our Mythos coming weeks article.
Does Anthropic offer volume discounts?
Yes, but only via direct sales contact for enterprise contracts. There is no published volume discount tier. The fastest cost reduction available to all users is the Batch API (flat 50% off) plus prompt caching (90% off cache reads).
How does prompt caching pricing work?
Cache writes cost 1.25x base input for 5-minute TTL or 2x for 1-hour TTL. Cache reads cost 0.1x base input. The 5-minute cache pays off after one cache read; the 1-hour cache pays off after two reads. Opus 4.8 lowered the minimum cacheable prompt length to 1,024 tokens, so more short system prompts now build cache entries. See our Claude API cache pricing deep-dive for the break-even math.
Can Batch API and prompt caching be combined?
Yes. The discounts stack multiplicatively. According to Anthropic's pricing docs, Batch API "discounts can be combined" with prompt caching. Fast mode on Opus 4.8 is the one feature that does not combine with Batch.
What is the cheapest way to use Claude in production?
Stack three levers: route Haiku 4.5 for triage, enable prompt caching with a stable system prompt prefix, and submit non-time-sensitive jobs through Batch API. Combined effective rate can reach $0.30-$0.50 per MTok input on cached calls — roughly the cost of Haiku 3 with Sonnet-tier capabilities behind it via routing.
How do I avoid surprise costs on Claude API?
Set hard max_tokens budgets on every request, monitor usage.cache_read_input_tokens and usage.cache_creation_input_tokens separately, and run alerts on output-token spikes. Output is where 70-80% of bills live in agentic workloads. On Opus 4.8 specifically, set effort: "medium" explicitly to avoid the new default high raising your costs unexpectedly.
Related Articles
- Claude Opus 4.8 Review 2026: Pricing, Benchmarks, vs 4.7 and GPT-5.5
- Claude Mythos Public Release Coming Weeks 2026: Anthropic Confirms
- Claude Mythos vs Opus 4.8: What Makes a Model Mythos-Class 2026
- Claude Sonnet vs Opus 2026: Pricing, Quality, Routing Guide
- Claude API Cache Pricing 2026: 90% Input Savings Explained
- Frontier Pro Tier 2026: GPT-5.5 vs Opus 4.7 vs Gemini 3.x
- Claude Haiku vs Sonnet: Cost vs Quality 2026
Sources
- Anthropic — What's new in Claude Opus 4.8
- Anthropic — Claude API pricing official documentation
- Anthropic — Prompt caching guide
- Anthropic — Batch processing documentation
- Anthropic — Model deprecations schedule
- BleepingComputer — Anthropic confirms Mythos-class models will roll out
- Artificial Analysis — Claude Opus 4.8 Intelligence Index
- Finout — Claude Opus 4.7 real cost story
- llm-stats.com — Claude Opus 4.8 Launch, Benchmarks
- ProjectDiscovery — How we cut LLM cost with prompt caching
- Helicone — Anthropic prompt caching support
- Vellum — Prompt caching documentation
By TokenMix Research Lab · Updated 2026-06-01