TokenMix Research Lab · 2026-04-04

Anthropic API Pricing 2026: Cache, Batch, Data Residency Fees

Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30

Anthropic API pricing is not just the Claude model table. The real cost comes from nine modifiers: prompt cache writes, cache reads, Batch API, 1M context, data residency, fast mode, web search, code execution, and tool-token overhead.

The base Claude API rates are simple: Opus 4.7 and Opus 4.6 are $5/$25 per 1M tokens (input/output), Sonnet 4.6 is $3/$15, and Haiku 4.5 is $1/$5. The cost modifiers are where bills diverge. According to Anthropic's official pricing page, cache reads cost 10% of base input, 5-minute cache writes cost 1.25x input, and 1-hour cache writes cost 2x input. Batch API gives 50% off input and output. US-only data residency adds a 1.1x multiplier for Opus 4.7, Opus 4.6, and newer models, and Opus 4.6 fast mode costs 6x standard rates.

My judgement: use the claude-api-cost page for the main Claude model price table. Use this page as the operational cost checklist before you put Anthropic into production.

Quick Answer
Base Claude API Pricing
Confirmed Cost Modifiers
Prompt Caching
Batch API
1M Context Pricing
Data Residency
Fast Mode
Tool and Server Feature Costs
Cost Scenarios
Direct Anthropic vs TokenMix.ai
Related Articles
FAQ
Sources

Quick Answer

For most teams, Anthropic API cost optimization follows one order: choose the right Claude tier, cache repeated input, batch offline jobs, and route hard tasks instead of sending everything to Opus.

Cost lever	Official pricing rule	Best use
Model choice	Haiku $1/$5, Sonnet $3/$15, Opus $5/$25	Match model strength to workflow value.
Cache read	0.1x base input	Repeated prompts, docs, schemas, agent context.
5-minute cache write	1.25x base input	Short sessions with immediate reuse.
1-hour cache write	2x base input	Longer workflows with repeated context.
Batch API	50% off input and output	Offline jobs and bulk processing.
Data residency	1.1x for US-only inference on newer models	Compliance-driven workloads.
Fast mode	Opus 4.6 at 6x standard rates	Speed-critical premium tasks only.
Web search	$10 per 1,000 searches plus token costs	Research agents and web-grounded answers.

The largest safe saving is not a secret discount. It is avoiding the wrong model for the workflow.

Base Claude API Pricing

All prices are in USD per 1M tokens from Anthropic's official pricing page, checked on 2026-04-30.

Model	Base input	Cache read	Output	Batch input	Batch output
Claude Opus 4.7	$5.00	$0.50	$25.00	$2.50	$12.50
Claude Opus 4.6	$5.00	$0.50	$25.00	$2.50	$12.50
Claude Opus 4.5	$5.00	$0.50	$25.00	$2.50	$12.50
Claude Sonnet 4.6	$3.00	$0.30	$15.00	$1.50	$7.50
Claude Sonnet 4.5	$3.00	$0.30	$15.00	$1.50	$7.50
Claude Haiku 4.5	$1.00	$0.10	$5.00	$0.50	$2.50
Claude Haiku 3.5	$0.80	$0.08	$4.00	$0.40	$2.00
Claude Haiku 3	$0.25	$0.03	$1.25	$0.125	$0.625

The main Claude API pricing guide covers model choice in more detail. This page focuses on the modifiers around that table.

Confirmed Cost Modifiers

Modifier	Status	Pricing impact	Source
Prompt cache read	Confirmed	0.1x base input	Anthropic pricing
5-minute cache write	Confirmed	1.25x base input	Anthropic pricing
1-hour cache write	Confirmed	2x base input	Anthropic pricing
Batch API	Confirmed	50% off input and output	Anthropic pricing
1M context standard pricing	Confirmed for Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6	No current long-context surcharge listed	Anthropic pricing
US-only data residency	Confirmed	1.1x multiplier for Opus 4.7, Opus 4.6, and newer models	Anthropic pricing
Opus 4.6 fast mode	Confirmed beta	6x standard rates	Anthropic pricing
Opus 4.7 tokenizer	Confirmed caveat	May use up to 35% more tokens for same fixed text	Anthropic pricing

This is why a Claude cost forecast needs more than one price table.

Prompt Caching

For the cache-only math, including break-even points and Batch stacking estimates, see Claude API cache pricing 2026.

Prompt caching reduces repeated input cost. It does not reduce output cost.

Operation	Multiplier	Haiku 4.5	Sonnet 4.6	Opus 4.7
Standard input	1x	$1.00/M	$3.00/M	$5.00/M
5-minute cache write	1.25x	$1.25/M	$3.75/M	$6.25/M
1-hour cache write	2x	$2.00/M	$6.00/M	$10.00/M
Cache read	0.1x	$0.10/M	$0.30/M	$0.50/M

Anthropic says caching pays off after one cache read for 5-minute cache writes and after two cache reads for 1-hour cache writes.
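
The break-even claim can be checked with a few lines of arithmetic. This is a minimal sketch, not Anthropic's own calculation: it compares one cache write plus N cache reads against sending the same prefix uncached N+1 times, using only the multipliers from the table above. The function names are illustrative.

```python
def cached_cost(write_multiplier: float, reads: int, read_multiplier: float = 0.1) -> float:
    """Relative input cost of one cache write followed by `reads` cache reads."""
    return write_multiplier + reads * read_multiplier


def uncached_cost(reads: int) -> float:
    """Relative cost of sending the prefix uncached: the first request plus each reuse."""
    return 1.0 + reads


def break_even_reads(write_multiplier: float, read_multiplier: float = 0.1, limit: int = 100) -> int:
    """Smallest number of cache reads at which caching is strictly cheaper."""
    for reads in range(1, limit + 1):
        if cached_cost(write_multiplier, reads, read_multiplier) < uncached_cost(reads):
            return reads
    raise ValueError("caching never pays off within the limit")


print("5-minute write:", break_even_reads(1.25))
print("1-hour write:", break_even_reads(2.0))
```

With a 1-hour write, a single read costs 2.1x against 2.0x uncached, which is why the second read is the one that tips the balance.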

Cache target	Typical workflow	Cost signal
System prompt	Support bot or agent policy	Cache almost always.
Tool schema	Coding and browser agents	Cache when schema repeats.
Few-shot examples	Extraction and classification	Cache if examples are stable.
Large document prefix	RAG and contract review	Cache when users query the same corpus.
Conversation history	Multi-turn agents	Cache when history is reused across turns.

Prompt caching is the first optimization to implement on any high-input Claude workload.

Batch API

Batch API is a clean 50% discount for async jobs.

Workload	Batch fit	Why
Nightly summarization	Strong	Users are not waiting.
Bulk classification	Strong	Large volume, low urgency.
Test suite evaluation	Strong	Latency is acceptable.
Dataset enrichment	Strong	Cost matters more than real-time response.
Live chat	Weak	User waits for response.
Interactive coding agent	Weak	Agent steps need immediate feedback.

Batch can stack with caching according to Anthropic's pricing page, but the best implementation depends on request structure. Measure rather than assuming 95% savings everywhere.
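
To see where the 95% figure comes from and when it does not apply, here is a rough sketch of the blended input multiplier. It assumes the Batch discount multiplies the cache-read rate (0.5 x 0.1 = 0.05) and it ignores cache-write charges; `cache_hit_share` is a hypothetical parameter for the fraction of input tokens billed as cache reads.

```python
BATCH_MULTIPLIER = 0.5       # Batch API: 50% off input and output
CACHE_READ_MULTIPLIER = 0.1  # cache read: 0.1x base input


def input_multiplier(cache_hit_share: float, batch: bool) -> float:
    """Blended multiplier on the base input price, ignoring cache writes."""
    if not 0.0 <= cache_hit_share <= 1.0:
        raise ValueError("cache_hit_share must be between 0 and 1")
    blended = (1.0 - cache_hit_share) + cache_hit_share * CACHE_READ_MULTIPLIER
    return blended * BATCH_MULTIPLIER if batch else blended


for share in (1.0, 0.7, 0.3):
    saving = 1.0 - input_multiplier(share, batch=True)
    print(f"cache hit share {share:.0%}: input saving {saving:.1%}")
```

Under these assumptions the 95% saving appears only when every input token is a cache read. At a 70% hit share the input saving is about 81.5%, and output tokens never get more than the 50% Batch discount.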

1M Context Pricing

Anthropic's current pricing page says Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6 include the full 1M token context window at standard pricing.

Model	1M context status	Cost implication
Claude Opus 4.7	Included	Standard $5/$25 rates across full context.
Claude Opus 4.6	Included	Standard $5/$25 rates across full context.
Claude Sonnet 4.6	Included	Standard $3/$15 rates across full context.
Claude Haiku 4.5	200K tier shown in table	Use for shorter, cheaper work.

The old "long context automatically doubles Sonnet cost" shortcut should not be used without checking the current official page.

Data Residency

Data residency is a compliance feature, not a free toggle.

Setting	Pricing impact	Applies to
Global routing	Standard pricing	Default Claude API behavior
US-only inference via `inference_geo`	1.1x multiplier	Opus 4.7, Opus 4.6, and newer models
Third-party regional endpoints	Platform-specific pricing	AWS Bedrock, Google Vertex AI, Microsoft Foundry

If compliance requires US-only inference, budget the 10% increase before launch. If compliance does not require it, do not turn it on accidentally.

Fast Mode

Fast mode is an Opus 4.6 beta priced at 6x standard rates.

Mode	Input	Output	Batch support	Best use
Opus 4.6 standard	$5/M	$25/M	Yes	Normal premium reasoning
Opus 4.6 fast mode	$30/M	$150/M	No	Speed-critical premium tasks

Fast mode should be treated like an emergency lane. It is not a default production route.

Tool and Server Feature Costs

Agents make pricing harder because tools add tokens and sometimes separate meters.

Feature	Anthropic pricing note	Cost risk
Web search	$10 per 1,000 searches plus token costs	Search-heavy agents can exceed token spend.
Web fetch	No additional charge beyond standard token costs	Fetched content still adds input tokens.
Code execution	1,550 free hours/month, then $0.05/hour/container	Long sessions need runtime monitoring.
Text editor tool	Adds input tokens for Claude 4.x tool definition	Coding agents pay for tool schema.
Bash tool	Adds input tokens and command outputs	Long logs can inflate input.
Computer use	Standard tool pricing plus screenshots/results	Screenshots and GUI state can grow context.

For Claude agents, monitor four numbers: input tokens, cache-read tokens, output tokens, and server tool use.
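
As an illustration, the four numbers can be turned into a monthly line-item report. This sketch uses the Sonnet 4.6 and web search rates from this page; the `AgentUsage` class and its field names are hypothetical and do not mirror any Anthropic response schema. Web search stands in for server tool use, and cache writes and code execution hours are left out for brevity.

```python
from dataclasses import dataclass
from decimal import Decimal

MILLION = Decimal(1_000_000)

# Sonnet 4.6 rates from this page, in USD.
INPUT_PER_M = Decimal("3.00")
CACHE_READ_PER_M = Decimal("0.30")
OUTPUT_PER_M = Decimal("15.00")
WEB_SEARCH_PER_1K = Decimal("10.00")


@dataclass
class AgentUsage:
    """Hypothetical monthly counters; the field names are illustrative."""
    input_tokens: int
    cache_read_tokens: int
    output_tokens: int
    web_searches: int


def monthly_cost(usage: AgentUsage) -> dict[str, Decimal]:
    """Split a month's spend into the four monitored line items plus a total."""
    lines = {
        "input": usage.input_tokens / MILLION * INPUT_PER_M,
        "cache_read": usage.cache_read_tokens / MILLION * CACHE_READ_PER_M,
        "output": usage.output_tokens / MILLION * OUTPUT_PER_M,
        "web_search": usage.web_searches / Decimal(1000) * WEB_SEARCH_PER_1K,
    }
    lines["total"] = sum(lines.values(), Decimal(0))
    return lines


report = monthly_cost(AgentUsage(2_000_000, 5_000_000, 500_000, 300))
for name, cost in report.items():
    print(f"{name:>10}: ${cost:.2f}")
```

Keeping the line items separate makes it easier to spot when server tool use, not tokens, is driving the bill.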

Cost Scenarios

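Assume 100M input tokens and 30M output tokens per month. The cached columns bill 70% of input at the cache-read rate and leave out cache-write charges, so treat them as a lower bound.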

Route	Standard	70% input cache read	Batch only	Batch + 70% cache read
Haiku 4.5	$250.00	$187.00	$125.00	$93.50
Sonnet 4.6	$750.00	$561.00	$375.00	$280.50
Opus 4.7	$1,250.00	$935.00	$625.00	$467.50
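
The table can be reproduced with a short script, which is also a convenient starting point for plugging in your own volumes. This is a sketch under the same assumptions as the table: 100M input tokens, 30M output tokens, no cache-write charges, and the Batch discount applied to the whole bill. The `scenario` function is illustrative.

```python
from decimal import Decimal

# (input, cache read, output) in USD per 1M tokens, from the base price table.
RATES = {
    "Haiku 4.5": (Decimal("1.00"), Decimal("0.10"), Decimal("5.00")),
    "Sonnet 4.6": (Decimal("3.00"), Decimal("0.30"), Decimal("15.00")),
    "Opus 4.7": (Decimal("5.00"), Decimal("0.50"), Decimal("25.00")),
}
INPUT_M = Decimal(100)   # millions of input tokens per month
OUTPUT_M = Decimal(30)   # millions of output tokens per month
BATCH = Decimal("0.5")   # Batch API: 50% off input and output


def scenario(model: str, cache_share: Decimal = Decimal(0), batch: bool = False) -> Decimal:
    """Monthly cost for one route; cache writes are not modelled."""
    input_rate, cache_read_rate, output_rate = RATES[model]
    fresh = INPUT_M * (1 - cache_share) * input_rate
    cached = INPUT_M * cache_share * cache_read_rate
    total = fresh + cached + OUTPUT_M * output_rate
    return total * BATCH if batch else total


share = Decimal("0.7")
for model in RATES:
    row = (
        scenario(model),
        scenario(model, share),
        scenario(model, batch=True),
        scenario(model, share, batch=True),
    )
    print(model, *(f"${value:.2f}" for value in row))
```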

Add modifier examples:

Modifier	Example base	New cost	Difference
US-only data residency on Opus 4.7	$1,250.00	$1,375.00	+$125.00
Opus 4.6 fast mode	$1,250.00	$7,500.00	+$6,250.00
Opus 4.7 tokenizer +20% tokens	$1,250.00	$1,500.00	+$250.00
Web search 50K searches/month	N/A	$500.00 plus tokens	Separate line item
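
The modifier rows follow from simple multiplication on the $1,250.00 Opus baseline. The sketch below recomputes them; the +20% tokenizer figure is the illustrative value from the table, not a measured one, and `with_multiplier` is a hypothetical helper.

```python
from decimal import Decimal

BASE = Decimal("1250.00")  # Opus standard scenario: 100M input, 30M output


def with_multiplier(base: Decimal, multiplier: Decimal) -> tuple[Decimal, Decimal]:
    """Return (new cost, difference from base) for a flat multiplier."""
    new_cost = base * multiplier
    return new_cost, new_cost - base


us_only = with_multiplier(BASE, Decimal("1.1"))    # US-only data residency
fast_mode = with_multiplier(BASE, Decimal("6"))    # Opus 4.6 fast mode
tokenizer = with_multiplier(BASE, Decimal("1.2"))  # assumed +20% tokens

# Web search is metered separately: $10 per 1,000 searches, plus tokens.
web_search = Decimal(50_000) / Decimal(1000) * Decimal("10")

for label, (new_cost, diff) in (
    ("US-only residency", us_only),
    ("Fast mode", fast_mode),
    ("Tokenizer +20%", tokenizer),
):
    print(f"{label}: ${new_cost:.2f} (+${diff:.2f})")
print(f"Web search: ${web_search:.2f} plus tokens")
```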

The biggest avoidable bill shock is not the model price. It is tool usage and accidental premium modes.

Direct Anthropic vs TokenMix.ai

Access path	Best for	Cost control pattern
Anthropic direct API	Full Claude-native features	Use cache, batch, model routing, and usage monitoring.
AWS Bedrock / Vertex / Foundry	Cloud procurement and regional controls	Check third-party endpoint premiums and billing rules.
TokenMix.ai	Multi-model production routing	Route Claude only where it beats cheaper models.
LiteLLM self-hosted	Internal platform teams	Centralize keys and policies, but operate the gateway.

TokenMix.ai fits when Claude is one model family inside a broader model router. A common production route is Haiku for cheap Claude tasks, Sonnet for quality, Opus for high-value escalation, and DeepSeek/Gemini/OpenAI-compatible models for workloads where Claude is not cost-efficient.

FAQ

What is Anthropic API pricing in 2026?

Anthropic API pricing starts at $1/$5 per 1M tokens for Claude Haiku 4.5, $3/$15 for Claude Sonnet 4.6, and $5/$25 for Claude Opus 4.7 and Opus 4.6. Feature modifiers such as cache, batch, data residency, tools, and fast mode can change the final bill.

How much does Anthropic prompt caching save?

Cache reads cost 10% of base input price, so they cut repeated input cost by 90%. Cache writes cost more than standard input, so caching works best when the same prefix is reused.

Does Anthropic Batch API save 50%?

Yes. Anthropic lists Batch API prices at 50% of standard input and output rates. Batch is best for offline jobs where latency is acceptable.

Does Claude 1M context cost extra?

The current official pricing page says Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6 include the full 1M token context window at standard pricing.

What is Anthropic data residency pricing?

For Opus 4.7, Opus 4.6, and newer models, US-only inference through inference_geo applies a 1.1x multiplier to token pricing. Third-party platforms set their own regional pricing.

What is Claude fast mode pricing?

Fast mode is a beta for Claude Opus 4.6 at 6x standard rates: $30 input and $150 output per 1M tokens. It is not available with the Batch API.

Why did my Claude API bill increase after moving to Opus 4.7?

One reason may be tokenization. Anthropic says Opus 4.7 uses a new tokenizer that may use up to 35% more tokens for the same fixed text. Measure token counts before a full migration.
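
One way to do that measurement is to count tokens for a sample of real prompts under both models and project the ratio onto the current bill. The sketch below covers only the arithmetic. The token counts are made-up placeholders that you would replace with counts gathered from your own traffic, and the projection assumes both models share the same per-token prices, as Opus 4.6 and Opus 4.7 do in the table on this page.

```python
def token_inflation(old_counts: list[int], new_counts: list[int]) -> float:
    """Ratio of total tokens under the new tokenizer to the old one."""
    if len(old_counts) != len(new_counts):
        raise ValueError("count lists must cover the same prompts")
    if sum(old_counts) == 0:
        raise ValueError("old_counts must contain at least one token")
    return sum(new_counts) / sum(old_counts)


def projected_cost(current_monthly_cost: float, inflation: float) -> float:
    """Scale the current bill by the measured inflation, at unchanged prices."""
    return current_monthly_cost * inflation


# Placeholder counts for three sample prompts (not real measurements).
old_sample = [1000, 2000, 1000]
new_sample = [1200, 2400, 1200]

ratio = token_inflation(old_sample, new_sample)
print(f"inflation: {ratio:.2f}x, projected bill: ${projected_cost(1250.0, ratio):,.2f}")
```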

Should I use Anthropic direct API or TokenMix.ai?

Use Anthropic direct API when Claude-native features are the priority. Use TokenMix.ai when Claude is part of a broader multi-model routing strategy and you want centralized access across providers.