TokenMix Research Lab · 2026-04-04

Anthropic API Pricing 2026: Cache, Batch, Data Residency Fees
Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30
Anthropic API pricing is not just the Claude model table. The real cost comes from nine modifiers: prompt cache writes, cache reads, Batch API, 1M context, data residency, fast mode, web search, code execution, and tool-token overhead.
The base Claude API rates are simple: Opus 4.7 and Opus 4.6 are $5/$25 per 1M tokens, Sonnet 4.6 is $3/$15, and Haiku 4.5 is $1/$5. The cost modifiers are where bills diverge. According to Anthropic's official pricing page, cache reads cost 10% of base input, 5-minute cache writes cost 1.25x input, and 1-hour cache writes cost 2x input. Batch API gives 50% off input and output. US-only data residency adds a 1.1x multiplier for Opus 4.7, Opus 4.6, and newer models, and Opus 4.6 fast mode costs 6x standard rates.
My judgement: use the claude-api-cost page for the main Claude model price table. Use this page as the operational cost checklist before you put Anthropic into production.
Table of Contents
- Quick Answer
- Base Claude API Pricing
- Confirmed Cost Modifiers
- Prompt Caching
- Batch API
- 1M Context Pricing
- Data Residency
- Fast Mode
- Tool and Server Feature Costs
- Cost Scenarios
- Direct Anthropic vs TokenMix.ai
- Related Articles
- FAQ
- Sources
Quick Answer
For most teams, Anthropic API cost optimization follows one order: choose the right Claude tier, cache repeated input, batch offline jobs, and route hard tasks instead of sending everything to Opus.
| Cost lever | Official pricing rule | Best use |
|---|---|---|
| Model choice | Haiku $1/$5, Sonnet $3/$15, Opus $5/$25 | Match model strength to workflow value. |
| Cache read | 0.1x base input | Repeated prompts, docs, schemas, agent context. |
| 5-minute cache write | 1.25x base input | Short sessions with immediate reuse. |
| 1-hour cache write | 2x base input | Longer workflows with repeated context. |
| Batch API | 50% off input and output | Offline jobs and bulk processing. |
| Data residency | 1.1x for US-only inference on newer models | Compliance-driven workloads. |
| Fast mode | Opus 4.6 at 6x standard rates | Speed-critical premium tasks only. |
| Web search | $10 per 1,000 searches plus token costs | Research agents and web-grounded answers. |
The largest safe saving is not a secret discount. It is avoiding the wrong model for the wrong workflow.
Base Claude API Pricing
All prices are per 1M tokens from Anthropic's official pricing page, checked on 2026-04-30.
| Model | Base input | Cache read | Output | Batch input | Batch output |
|---|---|---|---|---|---|
| Claude Opus 4.7 | $5.00 | $0.50 | $25.00 | $2.50 | $12.50 |
| Claude Opus 4.6 | $5.00 | $0.50 | $25.00 | $2.50 | $12.50 |
| Claude Opus 4.5 | $5.00 | $0.50 | $25.00 | $2.50 | $12.50 |
| Claude Sonnet 4.6 | $3.00 | $0.30 | $15.00 | $1.50 | $7.50 |
| Claude Sonnet 4.5 | $3.00 | $0.30 | $15.00 | $1.50 | $7.50 |
| Claude Haiku 4.5 | $1.00 | $0.10 | $5.00 | $0.50 | $2.50 |
| Claude Haiku 3.5 | $0.80 | $0.08 | $4.00 | $0.40 | $2.00 |
| Claude Haiku 3 | $0.25 | $0.03 | $1.25 | $0.125 | $0.625 |
The main Claude API pricing guide covers model choice in more detail. This page focuses on the modifiers around that table.
Confirmed Cost Modifiers
| Modifier | Status | Pricing impact | Source |
|---|---|---|---|
| Prompt cache read | Confirmed | 0.1x base input | Anthropic pricing |
| 5-minute cache write | Confirmed | 1.25x base input | Anthropic pricing |
| 1-hour cache write | Confirmed | 2x base input | Anthropic pricing |
| Batch API | Confirmed | 50% off input and output | Anthropic pricing |
| 1M context standard pricing | Confirmed for Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6 | No current long-context surcharge listed | Anthropic pricing |
| US-only data residency | Confirmed | 1.1x multiplier for Opus 4.7, Opus 4.6, and newer models | Anthropic pricing |
| Opus 4.6 fast mode | Confirmed beta | 6x standard rates | Anthropic pricing |
| Opus 4.7 tokenizer | Confirmed caveat | May use up to 35% more tokens for same fixed text | Anthropic pricing |
This is why a Claude cost forecast needs more than one price table.
Prompt Caching
For the cache-only math, including break-even points and Batch stacking estimates, see Claude API cache pricing 2026.
Prompt caching reduces repeated input cost. It does not reduce output cost.
| Operation | Multiplier | Haiku 4.5 | Sonnet 4.6 | Opus 4.7 |
|---|---|---|---|---|
| Standard input | 1x | $1.00/M | $3.00/M | $5.00/M |
| 5-minute cache write | 1.25x | $1.25/M | $3.75/M | $6.25/M |
| 1-hour cache write | 2x | $2.00/M | $6.00/M | $10.00/M |
| Cache read | 0.1x | $0.10/M | $0.30/M | $0.50/M |
Anthropic says caching pays off after one cache read for 5-minute cache writes and after two cache reads for 1-hour cache writes.
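The break-even claim above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it works in units of base input price, uses the multipliers quoted on this page, and the function names are hypothetical rather than part of any Anthropic SDK.

```python
# Multipliers relative to base input price, as quoted on this page.
CACHE_READ = 0.10
WRITE_5M = 1.25
WRITE_1H = 2.00


def cached_cost(write_multiplier: float, reads: int) -> float:
    """One cache write plus `reads` cache reads, in units of base input price."""
    return write_multiplier + reads * CACHE_READ


def uncached_cost(reads: int) -> float:
    """The same prefix sent uncached: the first request plus `reads` repeats."""
    return 1.0 + reads


def break_even_reads(write_multiplier: float) -> int:
    """Smallest number of reads at which caching is strictly cheaper."""
    reads = 0
    while cached_cost(write_multiplier, reads) >= uncached_cost(reads):
        reads += 1
    return reads


print(break_even_reads(WRITE_5M))  # 1
print(break_even_reads(WRITE_1H))  # 2
```

With a 1-hour write, one read costs 2.1x against 2.0x uncached, so the saving only appears on the second read (2.2x against 3.0x).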
| Cache target | Typical workflow | Cost signal |
|---|---|---|
| System prompt | Support bot or agent policy | Cache almost always. |
| Tool schema | Coding and browser agents | Cache when schema repeats. |
| Few-shot examples | Extraction and classification | Cache if examples are stable. |
| Large document prefix | RAG and contract review | Cache when users query the same corpus. |
| Conversation history | Multi-turn agents | Cache when history is reused across turns. |
Prompt caching is the first optimization to implement on any high-input Claude workload.
Batch API
Batch API is a clean 50% discount for async jobs.
| Workload | Batch fit | Why |
|---|---|---|
| Nightly summarization | Strong | Users are not waiting. |
| Bulk classification | Strong | Large volume, low urgency. |
| Test suite evaluation | Strong | Latency is acceptable. |
| Dataset enrichment | Strong | Cost matters more than real-time response. |
| Live chat | Weak | User waits for response. |
| Interactive coding agent | Weak | Agent steps need immediate feedback. |
Batch can stack with caching according to Anthropic's pricing page, but the best implementation depends on request structure. Measure rather than assuming 95% savings everywhere.
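To see where the 95% figure comes from, and why it is a ceiling rather than a typical result, here is a minimal sketch. It assumes the Batch discount and the cache-read rate multiply together and it ignores cache write costs; the function name is hypothetical.

```python
CACHE_READ_MULTIPLIER = 0.10  # cache reads cost 0.1x base input
BATCH_MULTIPLIER = 0.50       # Batch API is 50% off


def effective_input_multiplier(cache_read_share: float, batch: bool) -> float:
    """Blended input price as a fraction of base input price.

    cache_read_share is the fraction of input tokens served as cache reads.
    Cache write costs are deliberately left out of this estimate.
    """
    if not 0.0 <= cache_read_share <= 1.0:
        raise ValueError("cache_read_share must be between 0 and 1")
    blended = (1.0 - cache_read_share) + cache_read_share * CACHE_READ_MULTIPLIER
    return blended * (BATCH_MULTIPLIER if batch else 1.0)


for share in (1.0, 0.7, 0.3):
    saving = 1.0 - effective_input_multiplier(share, batch=True)
    print(f"cache-read share {share:.0%}: input saving {saving:.1%}")
```

Only a workload where every input token is a cache read reaches 95% off input. At a 70% cache-read share the input saving is 81.5%, and output tokens only ever get the 50% Batch discount.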
1M Context Pricing
Anthropic's current pricing page says Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6 include the full 1M token context window at standard pricing.
| Model | 1M context status | Cost implication |
|---|---|---|
| Claude Opus 4.7 | Included | Standard $5/$25 rates across full context. |
| Claude Opus 4.6 | Included | Standard $5/$25 rates across full context. |
| Claude Sonnet 4.6 | Included | Standard $3/$15 rates across full context. |
| Claude Haiku 4.5 | 200K tier shown in table | Use for shorter, cheaper work. |
The old "long context automatically doubles Sonnet cost" shortcut should not be used without checking the current official page.
Data Residency
Data residency is a compliance feature, not a free toggle.
| Setting | Pricing impact | Applies to |
|---|---|---|
| Global routing | Standard pricing | Default Claude API behavior |
| US-only inference via inference_geo | 1.1x multiplier | Opus 4.7, Opus 4.6, and newer models |
| Third-party regional endpoints | Platform-specific pricing | AWS Bedrock, Google Vertex AI, Microsoft Foundry |
If compliance requires US-only inference, budget the 10% increase before launch. If compliance does not require it, do not turn it on accidentally.
Fast Mode
Fast mode is an Opus 4.6 beta priced at 6x standard rates.
| Mode | Input | Output | Batch support | Best use |
|---|---|---|---|---|
| Opus 4.6 standard | $5/M | $25/M | Yes | Normal premium reasoning |
| Opus 4.6 fast mode | $30/M | $150/M | No | Speed-critical premium tasks |
Fast mode should be treated like an emergency lane. It is not a default production route.
Tool and Server Feature Costs
Agents make pricing harder because tools add tokens and sometimes separate meters.
| Feature | Anthropic pricing note | Cost risk |
|---|---|---|
| Web search | $10 per 1,000 searches plus token costs | Search-heavy agents can exceed token spend. |
| Web fetch | No additional charge beyond standard token costs | Fetched content still adds input tokens. |
| Code execution | 1,550 free hours/month, then $0.05/hour/container | Long sessions need runtime monitoring. |
| Text editor tool | Adds input tokens for Claude 4.x tool definition | Coding agents pay for tool schema. |
| Bash tool | Adds input tokens and command outputs | Long logs can inflate input. |
| Computer use | Standard tool pricing plus screenshots/results | Screenshots and GUI state can grow context. |
For Claude agents, monitor four numbers: input tokens, cache-read tokens, output tokens, and server tool use.
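As an illustration, those four numbers can be turned into a rough per-period cost estimate. This is a sketch under stated assumptions: the function and parameter names are hypothetical, uncached input is counted separately from cache reads, web search is the only server tool metered, and cache writes and code execution hours are left out.

```python
WEB_SEARCH_USD_PER_1K = 10.00  # $10 per 1,000 searches, as listed above
CACHE_READ_MULTIPLIER = 0.10   # cache reads cost 0.1x base input


def agent_cost_usd(
    uncached_input_tokens: int,
    cache_read_tokens: int,
    output_tokens: int,
    web_searches: int,
    input_usd_per_m: float,
    output_usd_per_m: float,
) -> float:
    """Estimate spend from the four monitored numbers (prices are per 1M tokens)."""
    token_cost = (
        uncached_input_tokens / 1e6 * input_usd_per_m
        + cache_read_tokens / 1e6 * input_usd_per_m * CACHE_READ_MULTIPLIER
        + output_tokens / 1e6 * output_usd_per_m
    )
    search_cost = web_searches / 1000 * WEB_SEARCH_USD_PER_1K
    return round(token_cost + search_cost, 2)


# Hypothetical Sonnet 4.6 agent: 2M uncached input, 8M cache reads,
# 1M output, 500 web searches.
print(agent_cost_usd(2_000_000, 8_000_000, 1_000_000, 500, 3.00, 15.00))  # 28.4
```

In this made-up example, 500 searches cost $5.00, almost as much as the $6.00 of uncached input, which is why server tool use deserves its own line in monitoring.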
Cost Scenarios
Assume 100M input tokens and 30M output tokens per month.
| Route | Standard | 70% input cache read | Batch only | Batch + 70% cache read |
|---|---|---|---|---|
| Haiku 4.5 | $250.00 | $187.00 | $125.00 | $93.50 |
| Sonnet 4.6 | $750.00 | $561.00 | $375.00 | $280.50 |
| Opus 4.7 | $1,250.00 | $935.00 | $625.00 | $467.50 |
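The table above can be reproduced with a short calculation. The sketch below assumes what the figures imply: cached tokens are billed at the 0.1x read rate, cache write costs are not included, and Batch halves the whole bill. Names such as `monthly_cost` are illustrative.

```python
# (input, output) price in USD per 1M tokens, from the base pricing table.
PRICES = {
    "Haiku 4.5": (1.00, 5.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.7": (5.00, 25.00),
}
INPUT_M = 100   # million input tokens per month
OUTPUT_M = 30   # million output tokens per month


def monthly_cost(model: str, cache_read_share: float = 0.0, batch: bool = False) -> float:
    """Monthly USD cost for the scenario; cache write costs are not modeled."""
    input_price, output_price = PRICES[model]
    uncached = INPUT_M * (1 - cache_read_share) * input_price
    cached = INPUT_M * cache_read_share * input_price * 0.10
    total = uncached + cached + OUTPUT_M * output_price
    return round(total * (0.5 if batch else 1.0), 2)


for model in PRICES:
    row = [
        monthly_cost(model),
        monthly_cost(model, cache_read_share=0.7),
        monthly_cost(model, batch=True),
        monthly_cost(model, cache_read_share=0.7, batch=True),
    ]
    print(model, row)
```

Because cache writes are excluded, real bills with caching will land somewhat above these figures, depending on how often the cached prefix has to be rewritten.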
Modifier examples, using the $1,250.00 Opus standard scenario as the base:
| Modifier | Example base | New cost | Difference |
|---|---|---|---|
| US-only data residency on Opus 4.7 | $1,250.00 | $1,375.00 | +$125.00 |
| Opus 4.6 fast mode | $1,250.00 | $7,500.00 | +$6,250.00 |
| Opus 4.7 tokenizer +20% tokens | $1,250.00 | $1,500.00 | +$250.00 |
| Web search 50K searches/month | N/A | $500.00 plus tokens | Separate line item |
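The modifier rows are simple multipliers on the base figure, plus a separate per-search meter. A minimal sketch, assuming each modifier scales the whole token bill uniformly (for the tokenizer row, that input and output tokens both grow by 20%); the helper names are hypothetical.

```python
BASE_USD = 1250.00  # Opus standard scenario: 100M input, 30M output at $5/$25


def with_multiplier(base_usd: float, multiplier: float) -> float:
    """Apply a uniform multiplier to a token bill."""
    return round(base_usd * multiplier, 2)


def web_search_cost(searches: int) -> float:
    """Web search fee at $10 per 1,000 searches, excluding token costs."""
    return searches / 1000 * 10.00


modifiers = {
    "US-only data residency (1.1x)": 1.1,
    "Opus 4.6 fast mode (6x)": 6.0,
    "Tokenizer +20% tokens (1.2x)": 1.2,
}
for name, multiplier in modifiers.items():
    new_cost = with_multiplier(BASE_USD, multiplier)
    print(f"{name}: ${new_cost:,.2f} (+${new_cost - BASE_USD:,.2f})")

print(f"Web search, 50K searches: ${web_search_cost(50_000):,.2f} plus tokens")
```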
The biggest avoidable bill shock is not the model price. It is tool usage and accidental premium modes.
Direct Anthropic vs TokenMix.ai
| Access path | Best for | Cost control pattern |
|---|---|---|
| Anthropic direct API | Full Claude-native features | Use cache, batch, model routing, and usage monitoring. |
| AWS Bedrock / Vertex / Foundry | Cloud procurement and regional controls | Check third-party endpoint premiums and billing rules. |
| TokenMix.ai | Multi-model production routing | Route Claude only where it beats cheaper models. |
| LiteLLM self-hosted | Internal platform teams | Centralize keys and policies, but operate the gateway. |
TokenMix.ai fits when Claude is one model family inside a broader model router. A common production route is Haiku for cheap Claude tasks, Sonnet for quality, Opus for high-value escalation, and DeepSeek/Gemini/OpenAI-compatible models for workloads where Claude is not cost-efficient.
Related Articles
- Claude API Cache Pricing 2026: 90% Input Savings Explained
- Claude API Pricing 2026: Opus, Sonnet, Haiku Costs Compared
- Claude Haiku vs Sonnet 2026: The Cost-Quality Line
- Claude Sonnet vs Opus 2026: Which to Pick for What
- AI API Pricing 2026: 16 Models, Cache, Batch, Routing Hub
- DeepSeek API Pricing 2026: V4 Costs, Cache Hits, R1 Changes
- AI API Gateway 2026: 7 LLM Routing and Fallback Options
- OpenAI-Compatible API Gateway: 9 Providers, One SDK Guide
FAQ
What is Anthropic API pricing in 2026?
Anthropic API pricing starts at $1/$5 per 1M tokens for Claude Haiku 4.5, $3/$15 for Claude Sonnet 4.6, and $5/$25 for Claude Opus 4.7 and Opus 4.6. Feature modifiers such as cache, batch, data residency, tools, and fast mode can change the final bill.
How much does Anthropic prompt caching save?
Cache reads cost 10% of base input price, so they cut repeated input cost by 90%. Cache writes cost more than standard input, so caching works best when the same prefix is reused.
Does Anthropic Batch API save 50%?
Yes. Anthropic lists Batch API prices at 50% of standard input and output rates. Batch is best for offline jobs where latency is acceptable.
Does Claude 1M context cost extra?
The current official pricing page says Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6 include the full 1M token context window at standard pricing.
What is Anthropic data residency pricing?
For Opus 4.7, Opus 4.6, and newer models, US-only inference through inference_geo applies a 1.1x multiplier to token pricing. Third-party platforms have their own regional pricing.
What is Claude fast mode pricing?
Fast mode is a beta for Claude Opus 4.6 at 6x standard rates: $30 input and $150 output per 1M tokens. It is not available with the Batch API.
Why did my Claude API bill increase after moving to Opus 4.7?
One reason may be tokenization. Anthropic says Opus 4.7 uses a new tokenizer that may use up to 35% more tokens for the same fixed text. Measure token counts before a full migration.
Should I use Anthropic direct API or TokenMix.ai?
Use Anthropic direct API when Claude-native features are the priority. Use TokenMix.ai when Claude is part of a broader multi-model routing strategy and you want centralized access across providers.