Claude API Pricing in 2026: Every Model, Real Costs, and How to Save Up to 95% · 2026-04-01
Anthropic's Claude API costs $1-$25 per million tokens depending on which model you pick — but the sticker price is only the starting point. Prompt caching cuts input costs by 90%, batch processing halves everything, and stacking both gets you 95% off list price. Meanwhile, Sonnet 4.6 quietly charges premium rates on requests over 200K tokens — a detail most pricing guides skip. This guide breaks down every Claude model's real cost with production scenarios, compares Claude head-to-head with GPT-5.4 and DeepSeek V4, and shows you exactly how to structure your setup for minimum spend. All pricing data verified against Anthropic's official docs and tracked by TokenMix.ai as of April 2026.
All prices per 1M tokens, Anthropic direct API, as of April 2026:
| Model | Input | Cache Hit | Output | Batch Input | Batch Output | Context Window |
|---|---|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $0.50 | $25.00 | $2.50 | $12.50 | 1M |
| Claude Sonnet 4.6 | $3.00 | $0.30 | $15.00 | $1.50 | $7.50 | 1M* |
| Claude Haiku 4.5 | $1.00 | $0.10 | $5.00 | $0.50 | $2.50 | 200K |
| Claude Opus 4.5 | $5.00 | $0.50 | $25.00 | $2.50 | $12.50 | 1M |
| Claude Sonnet 4.5 | $3.00 | $0.30 | $15.00 | $1.50 | $7.50 | 200K |
| Claude Haiku 3.5 | $0.80 | $0.08 | $4.00 | $0.40 | $2.00 | 200K |
| Claude Haiku 3 | $0.25 | $0.03 | $1.25 | $0.125 | $0.625 | 200K |
*Sonnet 4.6 charges premium long-context rates (2x input, 1.5x output) on requests exceeding 200K input tokens. See next section.
The headline numbers: Haiku 4.5 at $1/$5 is Anthropic's budget option. Sonnet 4.6 at $3/$15 is the production workhorse. Opus 4.6 at $5/$25 is the flagship — and at 67% cheaper than the previous Opus 4.1 ($15/$75), it's now within reach for more teams.
Previous-generation Opus 4.1 and Opus 4 remain available at $15/$75 — but there's no reason to use them. Opus 4.6 is both better and cheaper.
This is the pricing detail most guides bury or miss entirely.
Claude Sonnet 4.6 has a 1M token context window — but any request with more than 200K input tokens is billed at premium long-context rates:
| Sonnet 4.6 | ≤200K Input | >200K Input |
|---|---|---|
| Input | $3.00/M | $6.00/M |
| Output | $15.00/M | $22.50/M |
A 300K-token input request on Sonnet 4.6 costs $1.80 — but the same request on Opus 4.6 costs $1.50 at flat $5/M pricing across the full 1M window.
Opus is cheaper than Sonnet for long-context requests. This is counterintuitive and most teams don't realize it until they see the bill.
TokenMix.ai tracking data confirms this pattern: teams processing documents over 200K tokens consistently see lower costs on Opus 4.6 than on Sonnet 4.6, despite Opus being the "premium" model.
Rule of thumb: If your average input exceeds 150K tokens, run the numbers on Opus before defaulting to Sonnet.
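The crossover is easy to check directly. Here's a minimal cost sketch using the rates above — function names are illustrative, and it assumes (per the table) that Sonnet's premium rates apply to the whole request once input exceeds 200K tokens:

```python
def request_cost(in_tok: int, out_tok: int, in_rate: float, out_rate: float) -> float:
    """Dollar cost of one request at per-million-token rates."""
    return (in_tok * in_rate + out_tok * out_rate) / 1_000_000

def sonnet_46_cost(in_tok: int, out_tok: int) -> float:
    # Premium long-context rates kick in above 200K input tokens
    if in_tok > 200_000:
        return request_cost(in_tok, out_tok, 6.00, 22.50)
    return request_cost(in_tok, out_tok, 3.00, 15.00)

def opus_46_cost(in_tok: int, out_tok: int) -> float:
    # Flat $5/$25 across the full 1M window
    return request_cost(in_tok, out_tok, 5.00, 25.00)

print(f"${sonnet_46_cost(300_000, 0):.2f}")  # $1.80
print(f"${opus_46_cost(300_000, 0):.2f}")    # $1.50
```

Below 200K input tokens the ranking flips back: Sonnet at $3/M is cheaper than Opus at $5/M, which is why the break-even sits around your typical input size, not a fixed model choice.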
Prompt caching is the single most impactful cost optimization for Claude API users. Cache hits cost just 10% of the standard input price.
When you mark parts of your prompt for caching, Anthropic stores those tokens server-side. Subsequent requests that reuse the same prefix pay the cache-hit rate instead of full input price.
| Cache Operation | Multiplier | What It Means for Sonnet 4.6 |
|---|---|---|
| Standard input | 1x | $3.00/M |
| 5-minute cache write | 1.25x | $3.75/M (first request pays more) |
| 1-hour cache write | 2x | $6.00/M (first request pays more) |
| Cache hit (read) | 0.1x | $0.30/M (90% savings) |
Break-even math: A 5-minute cache write costs 1.25x on the first request. One cache hit at 0.1x pays it back entirely. For the 1-hour cache, you need two cache hits to break even — still trivial for any production workload.
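That break-even arithmetic can be sketched as follows (Sonnet 4.6 rates; a toy model that counts only the cached prefix and ignores the non-cached suffix of each request):

```python
BASE = 3.00               # Sonnet 4.6 standard input, $/M tokens
WRITE_5MIN = 1.25 * BASE  # $3.75/M, paid once on the first request
WRITE_1HR = 2.00 * BASE   # $6.00/M
HIT = 0.10 * BASE         # $0.30/M on every reuse

def net_savings(write_rate: float, hits: int) -> float:
    """Dollars saved per 1M cached tokens versus paying full price every time."""
    uncached = BASE * (1 + hits)      # initial request plus each reuse at full rate
    cached = write_rate + HIT * hits  # one cache write, then cache-hit pricing
    return uncached - cached

print(net_savings(WRITE_5MIN, 1))  # positive after a single hit
print(net_savings(WRITE_1HR, 1))   # still negative — needs a second hit
print(net_savings(WRITE_1HR, 2))   # positive from the second hit on
```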
| Content Type | Typical Size | Cache Savings per Request |
|---|---|---|
| System prompt | 500-2,000 tokens | $0.00135-0.0054 saved per call (Sonnet) |
| Few-shot examples | 2,000-10,000 tokens | $0.0054-0.027 saved per call |
| Document context (RAG) | 10,000-100,000 tokens | $0.027-0.27 saved per call |
| Conversation history | Grows per turn | Compounds — later turns mostly cached |
Implementation is simple. Add cache_control to your request and Anthropic handles the rest automatically. No separate cache management, no TTL configuration beyond choosing 5-minute or 1-hour duration.
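As a sketch, the request body looks like this following Anthropic's Python SDK conventions — the model ID and prompt are illustrative, and the cache_control marker goes on the last block of the stable prefix:

```python
long_system_prompt = "You are a support agent for Acme Corp. ..."  # imagine 2,000+ tokens

request = {
    "model": "claude-sonnet-4-6",  # illustrative model ID
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": long_system_prompt,
            # Everything up to and including this block gets cached.
            # "ephemeral" defaults to the 5-minute tier; the 1-hour
            # duration is selected via the cache TTL option.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Where is my order?"}],
}
# Sent with the official SDK as: client.messages.create(**request)
```

Every subsequent request that reuses the same system prefix is billed at the cache-hit rate for those tokens; only the changing user turn pays full input price.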
Any workload that tolerates up to 24 hours of latency should use the Batch API. It's a flat 50% discount on both input and output tokens.
| Model | Standard Output | Batch Output | Savings |
|---|---|---|---|
| Opus 4.6 | $25.00/M | $12.50/M | 50% |
| Sonnet 4.6 | $15.00/M | $7.50/M | 50% |
| Haiku 4.5 | $5.00/M | $2.50/M | 50% |
Best batch use cases: bulk classification, large-scale extraction, and offline analysis — any job where no user is waiting on the response.
Limitation: Batch API cannot be combined with Fast Mode (Opus 4.6's 6x-priced speed mode). But it stacks with prompt caching — which is where the real savings happen.
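A minimal submission sketch, assuming the Message Batches request shape from Anthropic's Python SDK (document IDs, model ID, and prompt are all illustrative):

```python
docs = {
    "doc-001": "First document text ...",
    "doc-002": "Second document text ...",
}

# Each item is an ordinary Messages request plus a custom_id,
# used to match results back to inputs when the batch finishes.
batch_requests = [
    {
        "custom_id": doc_id,
        "params": {
            "model": "claude-sonnet-4-6",  # illustrative model ID
            "max_tokens": 256,
            "messages": [
                {"role": "user", "content": f"Classify this document:\n\n{text}"}
            ],
        },
    }
    for doc_id, text in docs.items()
]
# Submitted as: client.messages.batches.create(requests=batch_requests),
# then polled until the batch reports processing has ended (up to 24h).
```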
Prompt caching (90% off input) and batch processing (50% off everything) stack multiplicatively. Here's what that looks like on Sonnet 4.6:
| Optimization Level | Input Cost/M | Output Cost/M | Total Discount |
|---|---|---|---|
| Standard pricing | $3.00 | $15.00 | 0% |
| Cache hits only | $0.30 | $15.00 | ~90% on input |
| Batch only | $1.50 | $7.50 | 50% across the board |
| Cache + Batch | $0.15 | $7.50 | 95% on input, 50% on output |
Sonnet 4.6 input drops from $3.00 to $0.15 per million tokens — cheaper than DeepSeek V4's cache-miss price ($0.30/M). That's Anthropic's premium model at budget-model pricing.
The catch: you need both conditions — cacheable prompts AND tolerance for async processing. In practice, this fits batch classification, bulk extraction, and offline analysis workflows perfectly.
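The multiplication itself is one line (Sonnet 4.6 list rates from the table above):

```python
SONNET_INPUT = 3.00  # $/M tokens, list price
CACHE_HIT = 0.10     # cache hits pay 10% of the input rate
BATCH = 0.50         # batch requests pay 50% across the board

stacked = SONNET_INPUT * CACHE_HIT * BATCH
discount = 1 - stacked / SONNET_INPUT
print(f"${stacked:.2f}/M input at a {discount:.0%} discount")
```

The discounts multiply rather than add because each applies to the already-reduced rate: 10% of list, then half of that.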
All prices per 1M tokens, official API pricing, April 2026:
| Model | Input | Output | Cache Hit | Context | Batch Input | Batch Output |
|---|---|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | $0.50 | 1M | $2.50 | $12.50 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | 1M* | $1.50 | $7.50 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 | 200K | $0.50 | $2.50 |
| GPT-5.4 | $2.50 | $15.00 | $0.625 | 1M | $1.25 | $7.50 |
| GPT-5.4 Mini | $0.75 | $4.50 | $0.1875 | 400K | $0.375 | $2.25 |
| DeepSeek V4 | $0.30 | $0.50 | $0.03 | 1M | N/A | N/A |
| Gemini 3.1 Pro | $2.00 | $12.00 | $0.50 | 1M | N/A | N/A |
Key insights from TokenMix.ai cross-provider tracking:
Claude Sonnet 4.6 vs GPT-5.4: Sonnet is 20% more expensive on input ($3 vs $2.50) but identical on output ($15). Cache hits favor Claude: $0.30 vs $0.625 — Claude caching is 2x cheaper.
Claude vs DeepSeek V4: DeepSeek is 10x cheaper on input and 30x cheaper on output at list price. With Claude's maximum stacked discounts (cache + batch), Claude's input ($0.15/M) actually undercuts DeepSeek's list price — though DeepSeek's own cache hits ($0.03/M) remain 5x cheaper, and the output gap stays massive.
Haiku 4.5 vs GPT-5.4 Mini: Similar pricing tier ($1/$5 vs $0.75/$4.50). GPT Mini is 25% cheaper on input, 10% cheaper on output. Haiku's cache hit ($0.10/M) beats GPT Mini's ($0.1875/M).
Opus 4.6 is a pricing anomaly. At $5/$25, it's barely more expensive than Sonnet ($3/$15) while being significantly more capable. The old Opus 4.1 at $15/$75 made Sonnet the obvious choice — that math has changed.
| Model | Monthly Cost (No Cache) | Monthly Cost (Cached) |
|---|---|---|
| Claude Sonnet 4.6 | $126.00 | $46.80 |
| Claude Haiku 4.5 | $42.00 | $15.60 |
| GPT-5.4 Mini | $36.00 | $27.00 |
| DeepSeek V4 | $6.60 | $4.10 |
At this scale, Haiku 4.5 with caching ($15.60/mo) is the sweet spot — Claude-level quality for chatbots at a price that's effectively free for a funded startup.
| Model | Standard | Cached | Cached + Batch |
|---|---|---|---|
| Claude Sonnet 4.6 | $4,725 | $1,713 | $923 |
| Claude Opus 4.6 | $7,875 | $2,419 | $1,294 |
| GPT-5.4 | $4,500 | $2,484 | $1,350 |
| DeepSeek V4 | $248 | $122 | N/A |
Sonnet 4.6 cached + batch at $923/mo delivers the best balance of quality and cost among premium models. DeepSeek V4 is still 7.5x cheaper — the decision hinges on whether Claude's quality premium justifies the gap for your use case.
| Model | Standard | Cached + Batch |
|---|---|---|
| Claude Sonnet 4.6 | $112,500 | $13,875 |
| Claude Opus 4.6 | $187,500 | $18,750 |
| GPT-5.4 | $105,000 | $23,438 |
At enterprise scale, Claude's stacked discounts beat GPT-5.4 even though GPT has a lower list price. Sonnet cached + batch ($13,875) vs GPT-5.4 cached + batch ($23,438) — Claude saves $9,563/month.
Claude is available through multiple channels with different pricing and features:
| Provider | Sonnet 4.6 Input/M | Added Value | Trade-off |
|---|---|---|---|
| Anthropic Direct | $3.00 | Full feature access, fastest updates | Single provider |
| AWS Bedrock (Global) | $3.00 | AWS integration, IAM, VPC | Regional endpoints +10% |
| AWS Bedrock (Regional) | $3.30 | Data residency guarantee | 10% premium |
| Google Vertex AI | $3.00 | GCP integration | Regional +10% |
| TokenMix.ai | $2.85 | 155+ models, unified API, auto-failover | Third-party dependency |
When to use each: go direct to Anthropic for full feature access and the fastest model updates; choose Bedrock or Vertex AI when you're already standardized on that cloud's IAM/VPC stack or need its compliance footprint and data residency; use a router like TokenMix.ai when you want multi-model flexibility with auto-failover.
Data residency note: Anthropic's direct API now charges 1.1x for US-only inference (via inference_geo parameter) on Opus 4.6+ models. If you need guaranteed US data processing, factor in the 10% premium.
| Your Situation | Recommended Model | Why |
|---|---|---|
| Budget-first, simple tasks | Haiku 4.5 ($1/$5) | 3x cheaper than Sonnet, sufficient for classification/extraction |
| General production workload | Sonnet 4.6 ($3/$15) | Best quality/price ratio for most use cases |
| Long documents (>200K tokens) | Opus 4.6 ($5/$25) | Flat pricing — actually cheaper than Sonnet's 2x long-context markup |
| Maximum quality, cost secondary | Opus 4.6 ($5/$25) | Best Claude model, 67% cheaper than previous Opus generation |
| High-volume async processing | Sonnet 4.6 + Batch ($1.50/$7.50) | 50% off, tolerate 24h latency |
| Cost-sensitive, quality-flexible | Haiku 4.5 + Cache + Batch | $0.05/M input — cheapest Claude option possible |
| Multi-provider strategy | Any model via TokenMix.ai | Unified API, auto-failover, additional 3-5% savings |
The biggest mistake teams make: Defaulting to Sonnet for everything. At current pricing, Opus 4.6 is only 67% more expensive than Sonnet but significantly more capable. For complex tasks, the quality improvement often reduces retry costs and manual review, making Opus the cheaper option in total cost of ownership.
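The decision table above can be collapsed into a rough routing sketch — thresholds, flags, and model IDs are illustrative, not an official selection algorithm:

```python
def pick_model(avg_input_tokens: int, simple_task: bool, quality_critical: bool) -> str:
    """Rough model routing based on the decision table above (illustrative)."""
    if avg_input_tokens > 200_000 or quality_critical:
        # Flat Opus pricing beats Sonnet's long-context markup, and on complex
        # tasks Opus often wins on total cost once retries and review are counted.
        return "claude-opus-4-6"
    if simple_task:
        return "claude-haiku-4-5"  # classification / extraction tier
    return "claude-sonnet-4-6"     # production default

print(pick_model(300_000, simple_task=False, quality_critical=False))  # claude-opus-4-6
print(pick_model(2_000, simple_task=True, quality_critical=False))     # claude-haiku-4-5
```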
Related: Compare all model pricing in our complete LLM API pricing comparison
Claude API pricing in 2026 is more competitive than it's ever been. Opus 4.6 at $5/$25 is a 67% price cut from the previous generation. Sonnet 4.6 remains the default choice at $3/$15 — but watch for the long-context markup above 200K tokens.
The real cost story isn't in the price table. It's in the optimization stack: prompt caching (90% off input) + batch processing (50% off everything) = 95% savings on input tokens. Sonnet 4.6 input drops from $3.00 to $0.15/M when you stack both — cheaper than DeepSeek V4's list price.
Three actionable moves: cache aggressively (every production app should), batch anything that isn't real-time, and check whether Opus beats Sonnet for your context length. For multi-model setups, TokenMix.ai gives you Claude alongside 155+ other models through a single API with additional 3-5% savings.
Compare real-time Claude pricing against every major provider at tokenmix.ai/pricing — updated daily.
Claude's three recommended models cost: Haiku 4.5 at $1 input / $5 output, Sonnet 4.6 at $3 input / $15 output, and Opus 4.6 at $5 input / $25 output — all per million tokens. With prompt caching, input costs drop by 90%. With batch processing, all costs drop 50%. Stacking both gives 95% off input.
Claude Haiku 3 at $0.25/$1.25 per million tokens is the absolute cheapest. Among current-generation models, Haiku 4.5 at $1/$5 is the budget pick. For maximum savings, use Haiku 4.5 with prompt caching and batch processing: $0.05/M input, $2.50/M output.
It depends on the tier. Sonnet 4.6 ($3/$15) is 20% more expensive on input than GPT-5.4 ($2.50/$15) but identical on output. Claude's cache hit rate ($0.30/M) is cheaper than GPT's ($0.625/M). At scale with caching and batching, Claude can actually be cheaper than GPT.
Sonnet 4.6 charges premium rates for requests with over 200K input tokens: input doubles to $6/M and output rises to $22.50/M. This makes Opus 4.6 ($5/$25 flat) cheaper for long-context workloads — a counterintuitive but important detail for teams processing large documents.
Cache hits cost 10% of standard input price. A 5-minute cache write costs 1.25x upfront but pays back after just one cache hit. Most production workloads see 60-85% cache hit rates, reducing effective input costs by 55-77% overall.
Yes. AWS Bedrock offers Claude at the same base pricing as Anthropic's direct API for global endpoints. Regional endpoints (guaranteed data residency) add a 10% premium. Bedrock provides IAM integration, VPC connectivity, and AWS compliance certifications.
DeepSeek V4 is dramatically cheaper: $0.30/$0.50 vs Claude Sonnet's $3/$15 — that's 10x on input and 30x on output. With Claude's maximum stacked discounts (cache + batch), stacked input ($0.15/M) actually dips below DeepSeek's list price, but output remains 15x more expensive. Choose Claude for quality-critical tasks, DeepSeek for cost-sensitive high-volume processing.
Anthropic provides a small amount of free credits for new accounts to test the API. There's no ongoing free tier. For extended evaluation, contact Anthropic's sales team. Alternatively, TokenMix.ai offers pay-as-you-go access to Claude with no minimum commitment.
Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: Anthropic Official Pricing and TokenMix.ai real-time model tracking