TokenMix Research Lab · 2026-04-04

Anthropic API Pricing 2026: Cache, Batch, Data Residency Fees

Anthropic API Pricing 2026: Cache, Batch, Data Residency Fees

Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30

Anthropic API pricing is not just the Claude model table. The real cost comes from eight modifiers: prompt cache writes, cache reads, Batch API, 1M context, data residency, fast mode, web search, code execution, and tool-token overhead.

The base Claude API rates are simple: Opus 4.7 and Opus 4.6 are $5/$25 per 1M tokens, Sonnet 4.6 is $3/$15, and Haiku 4.5 is $1/$5. The cost modifiers are where bills diverge. Anthropic's official pricing page says cache reads cost 10% of base input, 5-minute cache writes cost 1.25x input, 1-hour cache writes cost 2x input, Batch API gives 50% off input and output, US-only data residency adds a 1.1x multiplier for Opus 4.7, Opus 4.6, and newer models, and Opus 4.6 fast mode costs 6x standard rates.

My judgement: use the claude-api-cost page for the main Claude model price table. Use this page as the operational cost checklist before you put Anthropic into production.

Table of Contents

Quick Answer

For most teams, Anthropic API cost optimization follows one order: choose the right Claude tier, cache repeated input, batch offline jobs, and route hard tasks instead of sending everything to Opus.

Cost lever Official pricing rule Best use
Model choice Haiku $1/$5, Sonnet $3/$15, Opus $5/$25 Match model strength to workflow value.
Cache read 0.1x base input Repeated prompts, docs, schemas, agent context.
5-minute cache write 1.25x base input Short sessions with immediate reuse.
1-hour cache write 2x base input Longer workflows with repeated context.
Batch API 50% off input and output Offline jobs and bulk processing.
Data residency 1.1x for US-only inference on newer models Compliance-driven workloads.
Fast mode Opus 4.6 at 6x standard rates Speed-critical premium tasks only.
Web search $10 per 1,000 searches plus token costs Research agents and web-grounded answers.

The largest safe saving is not a secret discount. It is avoiding the wrong model for the wrong workflow.

Base Claude API Pricing

All prices are per 1M tokens from Anthropic's official pricing page, checked on 2026-04-29.

Model Base input Cache read Output Batch input Batch output
Claude Opus 4.7 $5.00 $0.50 $25.00 $2.50 $12.50
Claude Opus 4.6 $5.00 $0.50 $25.00 $2.50 $12.50
Claude Opus 4.5 $5.00 $0.50 $25.00 $2.50 $12.50
Claude Sonnet 4.6 $3.00 $0.30 $15.00 $1.50 $7.50
Claude Sonnet 4.5 $3.00 $0.30 $15.00 $1.50 $7.50
Claude Haiku 4.5 $1.00 $0.10 $5.00 $0.50 $2.50
Claude Haiku 3.5 $0.80 $0.08 $4.00 $0.40 $2.00
Claude Haiku 3 $0.25 $0.03 $1.25 $0.125 $0.625

The main Claude API pricing guide covers model choice in more detail. This page focuses on the modifiers around that table.

Confirmed Cost Modifiers

Modifier Status Pricing impact Source
Prompt cache read Confirmed 0.1x base input Anthropic pricing
5-minute cache write Confirmed 1.25x base input Anthropic pricing
1-hour cache write Confirmed 2x base input Anthropic pricing
Batch API Confirmed 50% off input and output Anthropic pricing
1M context standard pricing Confirmed for Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6 No current long-context surcharge listed Anthropic pricing
US-only data residency Confirmed 1.1x multiplier for Opus 4.7, Opus 4.6, and newer models Anthropic pricing
Opus 4.6 fast mode Confirmed beta 6x standard rates Anthropic pricing
Opus 4.7 tokenizer Confirmed caveat May use up to 35% more tokens for same fixed text Anthropic pricing

This is why a Claude cost forecast needs more than one price table.

Prompt Caching

For the cache-only math, including break-even points and Batch stacking estimates, see Claude API cache pricing 2026.

Prompt caching reduces repeated input cost. It does not reduce output cost.

Operation Multiplier Haiku 4.5 Sonnet 4.6 Opus 4.7
Standard input 1x $1.00/M $3.00/M $5.00/M
5-minute cache write 1.25x $1.25/M $3.75/M $6.25/M
1-hour cache write 2x $2.00/M $6.00/M $10.00/M
Cache read 0.1x $0.10/M $0.30/M $0.50/M

Anthropic says caching pays off after one cache read for 5-minute cache writes and after two cache reads for 1-hour cache writes.

Cache target Typical workflow Cost signal
System prompt Support bot or agent policy Cache almost always.
Tool schema Coding and browser agents Cache when schema repeats.
Few-shot examples Extraction and classification Cache if examples are stable.
Large document prefix RAG and contract review Cache when users query the same corpus.
Conversation history Multi-turn agents Cache when history is reused across turns.

Prompt caching is the first optimization to implement on any high-input Claude workload.

Batch API

Batch API is a clean 50% discount for async jobs.

Workload Batch fit Why
Nightly summarization Strong Users are not waiting.
Bulk classification Strong Large volume, low urgency.
Test suite evaluation Strong Latency is acceptable.
Dataset enrichment Strong Cost matters more than real-time response.
Live chat Weak User waits for response.
Interactive coding agent Weak Agent steps need immediate feedback.

Batch can stack with caching according to Anthropic's pricing page, but the best implementation depends on request structure. Measure rather than assuming 95% savings everywhere.

1M Context Pricing

Anthropic's current pricing page says Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6 include the full 1M token context window at standard pricing.

Model 1M context status Cost implication
Claude Opus 4.7 Included Standard $5/$25 rates across full context.
Claude Opus 4.6 Included Standard $5/$25 rates across full context.
Claude Sonnet 4.6 Included Standard $3/$15 rates across full context.
Claude Haiku 4.5 200K tier shown in table Use for shorter, cheaper work.

The old "long context automatically doubles Sonnet cost" shortcut should not be used without checking the current official page.

Data Residency

Data residency is a compliance feature, not a free toggle.

Setting Pricing impact Applies to
Global routing Standard pricing Default Claude API behavior
US-only inference via inference_geo 1.1x multiplier Opus 4.7, Opus 4.6, and newer models
Third-party regional endpoints Platform-specific pricing AWS Bedrock, Google Vertex AI, Microsoft Foundry

If compliance requires US-only inference, budget the 10% increase before launch. If compliance does not require it, do not turn it on accidentally.

Fast Mode

Fast mode is an Opus 4.6 beta priced at 6x standard rates.

Mode Input Output Batch support Best use
Opus 4.6 standard $5/M $25/M Yes Normal premium reasoning
Opus 4.6 fast mode $30/M $150/M No Speed-critical premium tasks

Fast mode should be treated like an emergency lane. It is not a default production route.

Tool and Server Feature Costs

Agents make pricing harder because tools add tokens and sometimes separate meters.

Feature Anthropic pricing note Cost risk
Web search $10 per 1,000 searches plus token costs Search-heavy agents can exceed token spend.
Web fetch No additional charge beyond standard token costs Fetched content still adds input tokens.
Code execution 1,550 free hours/month, then $0.05/hour/container Long sessions need runtime monitoring.
Text editor tool Adds input tokens for Claude 4.x tool definition Coding agents pay for tool schema.
Bash tool Adds input tokens and command outputs Long logs can inflate input.
Computer use Standard tool pricing plus screenshots/results Screenshots and GUI state can grow context.

For Claude agents, monitor four numbers: input tokens, cache-read tokens, output tokens, and server tool use.

Cost Scenarios

Assume 100M input tokens and 30M output tokens per month.

Route Standard 70% input cache read Batch only Batch + 70% cache read
Haiku 4.5 $250.00 $187.00 $125.00 $93.50
Sonnet 4.6 $750.00 $561.00 $375.00 $280.50
Opus 4.7 $1,250.00 $935.00 $625.00 $467.50

Add modifier examples:

Modifier Example base New cost Difference
US-only data residency on Opus 4.7 $1,250.00 $1,375.00 +$125.00
Opus 4.6 fast mode $1,250.00 $7,500.00 +$6,250.00
Opus 4.7 tokenizer +20% tokens $1,250.00 $1,500.00 +$250.00
Web search 50K searches/month N/A $500.00 plus tokens Separate line item

The biggest avoidable bill shock is not the model price. It is tool usage and accidental premium modes.

Direct Anthropic vs TokenMix.ai

Access path Best for Cost control pattern
Anthropic direct API Full Claude-native features Use cache, batch, model routing, and usage monitoring.
AWS Bedrock / Vertex / Foundry Cloud procurement and regional controls Check third-party endpoint premiums and billing rules.
TokenMix.ai Multi-model production routing Route Claude only where it beats cheaper models.
LiteLLM self-hosted Internal platform teams Centralize keys and policies, but operate the gateway.

TokenMix.ai fits when Claude is one model family inside a broader model router. A common production route is Haiku for cheap Claude tasks, Sonnet for quality, Opus for high-value escalation, and DeepSeek/Gemini/OpenAI-compatible models for workloads where Claude is not cost-efficient.

Related Articles

FAQ

What is Anthropic API pricing in 2026?

Anthropic API pricing starts at $1/$5 per 1M tokens for Claude Haiku 4.5, $3/$15 for Claude Sonnet 4.6, and $5/$25 for Claude Opus 4.7 and Opus 4.6. Feature modifiers such as cache, batch, data residency, tools, and fast mode can change the final bill.

How much does Anthropic prompt caching save?

Cache reads cost 10% of base input price, so they cut repeated input cost by 90%. Cache writes cost more than standard input, so caching works best when the same prefix is reused.

Does Anthropic Batch API save 50%?

Yes. Anthropic lists Batch API prices at 50% of standard input and output rates. Batch is best for offline jobs where latency is acceptable.

Does Claude 1M context cost extra?

The current official pricing page says Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6 include the full 1M token context window at standard pricing.

What is Anthropic data residency pricing?

For Opus 4.7, Opus 4.6, and newer models, US-only inference through inference_geo has a 1.1x multiplier on token pricing categories. Third-party platforms have their own regional pricing.

What is Claude fast mode pricing?

Fast mode is a beta for Claude Opus 4.6 at 6x standard rates: $30 input and $150 output per 1M tokens. It is not available with the Batch API.

Why did my Claude API bill increase after moving to Opus 4.7?

One reason may be tokenization. Anthropic says Opus 4.7 uses a new tokenizer that may use up to 35% more tokens for the same fixed text. Measure token counts before a full migration.

Should I use Anthropic direct API or TokenMix.ai?

Use Anthropic direct API when Claude-native features are the priority. Use TokenMix.ai when Claude is part of a broader multi-model routing strategy and you want centralized access across providers.

Sources