TokenMix Research Lab · 2026-06-08

Claude API Cost Calculator 2026: Opus, Sonnet, Haiku Math

Last Updated: 2026-06-08
Author: TokenMix Research Lab
Data verified: 2026-06-08 against Anthropic Claude pricing docs, prompt caching docs, token counting docs, batch pricing table, and TokenMix Claude cluster

Claude API cost depends on model tier and cache behavior. Opus, Sonnet, and Haiku have different input, output, cache write, cache hit, and batch rates.

Anthropic publishes Claude Opus 4.8 at $5 input and $25 output per MTok, Sonnet 4.6 at $3 and $15, and Haiku 4.5 at $1 and $5. Prompt caching docs also list 5-minute writes, 1-hour writes, cache hits, and batch rates. The calculator below treats cache miss and cache hit as different events, because that is where Claude bills can surprise teams.

Quick Verdict
Core Formula
Claude Price Inputs
5 Workload Calculator
Prompt Cache Math
Python Formula
Where Claude Loses
Search Intent Map
Cost Per Task Calculator
Decision Matrix
Monitoring Checklist
Non-Claims and Caveats
Final Recommendation
FAQ
Sources
Related Articles

Quick Verdict

Claim	Status	Source
Claude Opus 4.8 prompt caching table lists $5 input and $25 output per MTok	Confirmed	Claude prompt caching
Claude Sonnet 4.6 prompt caching table lists $3 input and $15 output per MTok	Confirmed	Claude prompt caching
Claude Haiku 4.5 prompt caching table lists $1 input and $5 output per MTok	Confirmed	Claude prompt caching
Claude cache read tokens are 0.1x base input price	Confirmed	Claude prompt caching
All active Claude models support token counting	Confirmed	Claude token counting
Claude cache hits are automatic magic for every prompt	False	Cache rules depend on prompt structure and supported models
Sonnet is the default cost-performance tier for many agent workloads	Likely	It sits between Opus and Haiku price tiers
Claude cache hit rate can be predicted without logs	Speculation	Requires workload-specific prompt stability

Core Formula

The calculator logic for Claude API cost starts provider-neutral: count monthly token volume, split it into uncached input, cached input, and output, and account for retries and tool calls. Only then apply the provider's current per-million-token rates and add non-token infrastructure. The model-specific price belongs in the final step, not in the mental model.

Input	Meaning	Status
`input_mtok`	Monthly input tokens divided by 1,000,000	Confirmed
`output_mtok`	Monthly output tokens divided by 1,000,000	Confirmed
`cache_hit_mtok`	Cached or reused input tokens where provider exposes a lower price	Confirmed
`retry_rate`	Extra (retried) calls as a fraction of original calls, so 0.10 means a 1.1x cost multiplier	Likely
`tool_calls`	Search, retrieval, shell, SQL, or other tool calls per task	Likely
`cache_write_mtok`	Tokens written into 5m or 1h cache	Confirmed
`batch_mode`	Anthropic batch input/output columns	Confirmed

from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenPrice:
    # All prices are USD per million tokens (MTok).
    input_per_m: float
    output_per_m: float
    cached_input_per_m: Optional[float] = None  # None: no cache discount available


def llm_cost(input_tokens, output_tokens, price: TokenPrice, cached_input_tokens=0, retry_rate=0.0):
    # cached_input_tokens is the cached subset of input_tokens, so clamp it to that range.
    cached = min(max(cached_input_tokens, 0), input_tokens)
    uncached = input_tokens - cached
    # Without a cached rate, cached tokens bill at the normal input rate.
    cached_rate = price.input_per_m if price.cached_input_per_m is None else price.cached_input_per_m
    input_cost = (uncached * price.input_per_m + cached * cached_rate) / 1_000_000
    output_cost = output_tokens / 1_000_000 * price.output_per_m
    # Scales cost by (1 + retry_rate), assuming each retried call costs as much as a full call.
    return (input_cost + output_cost) * (1 + retry_rate)

Use Claude Opus/Sonnet/Haiku rates only after you have measured average input, average output, retries, cache hit rate, and tool calls. A model that is cheap per token can still lose if it causes extra retries or longer output.

Claude Price Inputs

Claude tier	Base input	Cache hit	Output	Status
Opus 4.8	$5/MTok	$0.50/MTok	$25/MTok	Confirmed
Sonnet 4.6	$3/MTok	$0.30/MTok	$15/MTok	Confirmed
Haiku 4.5	$1/MTok	$0.10/MTok	$5/MTok	Confirmed
Opus 4.8 batch	$2.50/MTok	N/A	$12.50/MTok	Confirmed
Sonnet 4.6 batch	$1.50/MTok	N/A	$7.50/MTok	Confirmed

For broader cost comparison, pair this with LLM API Cost Calculator, Claude CLI Pricing, and AI Chatbot Cost Calculator.

5 Workload Calculator

These five workloads are intentionally concrete. Replace the numbers with your own logs before procurement.

Workload	Monthly volume	Token/tool shape	Calculator output	Status
Claude chat UI	30,000 calls	2K input / 600 output	Compare Sonnet vs Haiku first	Likely workload
Coding agent	10,000 tasks	8 turns x 8K input	Cache hit rate dominates	Likely workload
Long-doc QA	5,000 docs	80K stable prefix / 1K output	Cache can change economics	Likely workload
Batch eval	100,000 prompts	1K input / 200 output	Use batch if latency can wait	Confirmed route
High-stakes analysis	2,000 tasks	10K input / 2K output	Opus only if quality wins	Likely workload

Scenario math should be written as tokens first and dollars second. That keeps the estimate portable across OpenAI, Claude, Gemini, DeepSeek, Groq, or an OpenAI-compatible gateway.
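As a rough illustration of "tokens first, dollars second", the sketch below converts two rows of the workload table into monthly token volume and then applies the list prices quoted in this article. The names `PRICES`, `BATCH_PRICES`, `monthly_tokens`, and `monthly_cost` are hypothetical helpers, not part of any SDK, and the sketch assumes no caching and no retries.

```python
# Per-MTok (input, output) prices as quoted in this article; re-check before use.
PRICES = {
    "opus": (5.00, 25.00),
    "sonnet": (3.00, 15.00),
    "haiku": (1.00, 5.00),
}
# Batch rates as quoted in the price table (only Opus and Sonnet are listed there).
BATCH_PRICES = {
    "opus": (2.50, 12.50),
    "sonnet": (1.50, 7.50),
}


def monthly_tokens(calls, input_per_call, output_per_call):
    """Return (input_mtok, output_mtok) for one month of calls."""
    return calls * input_per_call / 1_000_000, calls * output_per_call / 1_000_000


def monthly_cost(tier, calls, input_per_call, output_per_call, batch=False):
    """Tokens first, dollars second. Assumes no caching and no retries."""
    input_mtok, output_mtok = monthly_tokens(calls, input_per_call, output_per_call)
    input_price, output_price = (BATCH_PRICES if batch else PRICES)[tier]
    return input_mtok * input_price + output_mtok * output_price


# Claude chat UI row: 30,000 calls, 2K input, 600 output.
chat_sonnet = monthly_cost("sonnet", 30_000, 2_000, 600)
chat_haiku = monthly_cost("haiku", 30_000, 2_000, 600)
# Batch eval row: 100,000 prompts, 1K input, 200 output.
eval_batch = monthly_cost("sonnet", 100_000, 1_000, 200, batch=True)

print(f"chat UI: Sonnet ${chat_sonnet:,.2f} vs Haiku ${chat_haiku:,.2f}")
print(f"batch eval on Sonnet batch rates: ${eval_batch:,.2f}")
```

Under these assumptions the chat UI row is 60M input and 18M output tokens per month, so the Sonnet versus Haiku gap is visible before any cache or retry effects are added.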

Prompt Cache Math

Cache event	Price logic	Practical meaning	Status
Base input	Full input rate	First uncached prompt path	Confirmed
5m write	1.25x base input	Write cost for short cache TTL	Confirmed
1h write	2x base input	Higher write cost for longer TTL	Confirmed
Cache hit	0.1x base input	Cheap reuse if prompt prefix matches	Confirmed
Output	Normal output rate	Cache does not discount generated text	Confirmed

A Claude calculator must split cache write, cache hit, and output. If it only shows total input tokens, it hides the most important cost lever.
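To see why the split matters, here is a minimal break-even sketch built only from the multipliers in the table above (1.25x or 2x for a write, 0.1x for a hit). It assumes the first call pays the cache write and every later call lands inside the TTL, which real traffic may not do. The function names are illustrative.

```python
def cached_prefix_cost(uses, write_multiplier=1.25, hit_multiplier=0.1):
    """Cost of sending the same prefix `uses` times, in units of one uncached send.

    Assumes the first call pays the cache write and all later calls are cache hits.
    """
    if uses < 1:
        return 0.0
    return write_multiplier + hit_multiplier * (uses - 1)


def break_even_uses(write_multiplier=1.25, hit_multiplier=0.1):
    """Smallest number of uses at which caching is cheaper than never caching."""
    uses = 1
    # Without caching, `uses` sends cost exactly `uses` units.
    while cached_prefix_cost(uses, write_multiplier, hit_multiplier) >= uses:
        uses += 1
    return uses


print("5m write breaks even at use:", break_even_uses(1.25))
print("1h write breaks even at use:", break_even_uses(2.0))
```

A prefix that is written and never reused costs more than an uncached call, which is the "cache miss" surprise the calculator needs to surface.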

Python Formula

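def claude_cost(base_input, cache_write, cache_hit, output, base_price, output_price, cache_write_multiplier=1.25):
    # Token arguments are raw token counts; prices are USD per million tokens (MTok).
    return (
        base_input / 1_000_000 * base_price  # uncached input at the full rate
        + cache_write / 1_000_000 * base_price * cache_write_multiplier  # 1.25x for a 5m write
        + cache_hit / 1_000_000 * base_price * 0.1  # cache reads at 0.1x base input
        + output / 1_000_000 * output_price  # output is never discounted by caching
    )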

Use cache_write_multiplier=2.0 for a 1-hour cache write where that path is selected.
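As a usage sketch, the formula can be swept over cache hit rates for the long-doc QA row (5,000 calls, 80K stable prefix, 1K output) at the Sonnet rates quoted above. The formula is repeated here only so the snippet runs on its own. `long_doc_month` is a hypothetical helper, and it assumes every miss pays a 5-minute cache write and that per-question tokens outside the prefix are negligible.

```python
def claude_cost(base_input, cache_write, cache_hit, output, base_price, output_price,
                cache_write_multiplier=1.25):
    # Same formula as above, repeated so this sketch is self-contained.
    return (
        base_input / 1_000_000 * base_price
        + cache_write / 1_000_000 * base_price * cache_write_multiplier
        + cache_hit / 1_000_000 * base_price * 0.1
        + output / 1_000_000 * output_price
    )


def long_doc_month(hit_rate, calls=5_000, prefix_tokens=80_000, output_tokens=1_000,
                   base_price=3.0, output_price=15.0):
    """Monthly cost when `hit_rate` of prefix tokens are cache hits and the rest are writes."""
    prefix_total = calls * prefix_tokens
    hit_tokens = prefix_total * hit_rate
    write_tokens = prefix_total - hit_tokens
    return claude_cost(0, write_tokens, hit_tokens, calls * output_tokens,
                       base_price, output_price)


for hit_rate in (0.0, 0.5, 0.9):
    print(f"hit rate {hit_rate:.0%}: ${long_doc_month(hit_rate):,.2f}")
```

Under these assumptions the same workload costs about $1,575 at a 0% hit rate and about $333 at 90%, so the hit rate has to come from logs rather than from a guess.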

Where Claude Loses

The calculator is only useful if it catches the hidden multipliers. These are the traps that turn cheap demo calls into expensive production months.

Trap	Cost symptom	Fix	Status
Cache miss	Expected cheap prefix bills at full input/write rate	Log cache hits	Confirmed
Output-heavy answers	Sonnet/Opus output dominates	Short response policy	Confirmed
Using Opus for routine tasks	Blended cost spikes	Route routine tasks to Sonnet/Haiku	Likely
Batch ignored	Offline evals overpay	Use batch for async work	Confirmed
Token count guessed	Invoice differs from estimate	Use Claude count_tokens	Confirmed

A cost calculator should show cost per successful task, not only cost per API call. Failed calls, retries, cache misses, and long outputs are still part of the bill.
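One way to express that metric is to divide total spend, including failed and retried attempts, by the number of tasks that actually succeeded. The function name and the sample numbers below are hypothetical and only illustrate the arithmetic.

```python
def cost_per_successful_task(cost_per_attempt, attempts, successful_tasks):
    """Total spend across all attempts divided by tasks that succeeded."""
    if successful_tasks <= 0:
        raise ValueError("no successful tasks: cost per success is undefined")
    return cost_per_attempt * attempts / successful_tasks


# Hypothetical month: 1,250 attempts at $0.01 each produced 1,000 successful tasks.
per_call = 0.01
per_success = cost_per_successful_task(per_call, attempts=1_250, successful_tasks=1_000)
print(f"cost per call ${per_call:.4f}, cost per successful task ${per_success:.4f}")
```

In this made-up case the per-call price understates the real unit cost by 25%, which is the gap a per-call calculator hides.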

Search Intent Map

Search query	What the user really needs	Best answer	Status
`claude api cost calculator`	A current, non-marketing answer	Compare official limits and cost controls	Confirmed
`claude api cost calculator pricing`	Whether this becomes a monthly bill	Use per-task math, not sticker price	Confirmed
`claude api cost calculator free`	Whether a no-cost path exists	Treat free quota as testing capacity	Likely
`claude api cost calculator error`	Why setup fails	Check auth, quota, region, and model access	Likely
`claude api cost calculator alternative`	Whether another route is safer	Compare direct API, gateway, and self-hosting	Likely

This is the reason the article is structured around tables instead of a narrative review. Search traffic for these terms usually comes from blocked developers, not readers browsing AI news.

Cost Per Task Calculator

Cost component	Formula	Why it matters	Status
Input tokens	input MTok x input price	Long prompts dominate retrieval and agents	Confirmed
Output tokens	output MTok x output price	Reasoning and verbose answers compound cost	Confirmed
Retry waste	failed calls x average cost	429 and timeout loops become real spend	Likely
Human review	minutes saved or added x hourly rate	Tooling can shift, not remove, labor cost	Likely
Infrastructure	storage, runners, or hosted platform cost	Non-token cost often appears later	Confirmed

Use this minimum calculator before choosing a provider: (30 days x calls per day x average input tokens / 1,000,000) x input price per MTok, plus (30 days x calls per day x average output tokens / 1,000,000) x output price per MTok. Then add retries. If 10% of calls are retried at full cost, your effective price is already 1.1x before latency or support cost.

Monthly calls	Avg input	Avg output	Token volume	Operational reading
1,000	1K	300	1M in / 0.3M out	Prototype
10,000	2K	600	20M in / 6M out	Small app
100,000	4K	1K	400M in / 100M out	Production workload
1,000,000	2K	500	2B in / 500M out	Procurement problem
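The minimum calculator described before this table can be sketched in a few lines. The function name is illustrative, prices are per MTok, and the retry handling assumes each retried call costs as much as a full call.

```python
def minimum_monthly_cost(calls_per_day, avg_input_tokens, avg_output_tokens,
                         input_price, output_price, retry_rate=0.0, days=30):
    """Days x calls x tokens / 1e6 x per-MTok price, then a flat retry multiplier."""
    calls = days * calls_per_day
    input_cost = calls * avg_input_tokens / 1_000_000 * input_price
    output_cost = calls * avg_output_tokens / 1_000_000 * output_price
    return (input_cost + output_cost) * (1 + retry_rate)


# 1,000 calls/day, 2K input, 600 output, at the Sonnet rates quoted in this article.
no_retries = minimum_monthly_cost(1_000, 2_000, 600, 3.0, 15.0)
with_retries = minimum_monthly_cost(1_000, 2_000, 600, 3.0, 15.0, retry_rate=0.10)
print(f"no retries: ${no_retries:,.2f}, with 10% retries: ${with_retries:,.2f}")
```

The retry multiplier applies to the whole bill, so it should be measured from logs before the estimate is used for procurement.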

Decision Matrix

If your situation is...	Default move	Why	Confidence
You are still prototyping	Use the lowest-friction official route	Learning speed beats premature optimization	Likely
You have user-facing traffic	Add fallback and spend caps before launch	Users feel quota failures immediately	Confirmed
You have compliance constraints	Prefer direct vendor, cloud marketplace, or audited gateway	Procurement trail matters	Likely
You have high volume but flexible latency	Test batch or async processing	Batch discounts can beat realtime routes	Confirmed where documented
You have unknown token shape	Run a 7-day sample before committing	Average prompts hide tail risk	Likely
You need newest model features	Check direct provider docs first	Gateways and clouds may lag direct release	Likely

The durable rule: do not optimize for the cheapest successful demo. Optimize for the cheapest successful month with logs, retries, fallback, and support.

def pick_route(stage, traffic, compliance, latency_flexible):
    # traffic is call volume (assumed monthly here). Compliance is a hard constraint, so check it first.
    if compliance == "strict":
        return "direct_vendor_or_cloud_marketplace"
    if stage == "prototype" and traffic < 1_000:
        return "official_free_or_low_cost_route"
    if latency_flexible and traffic > 100_000:
        return "batch_or_async_route"
    if traffic > 10_000:
        return "gateway_with_budget_caps"
    return "direct_api_with_monitoring"

Monitoring Checklist

Metric	Alert threshold	Why	Status
429 rate	>2% sustained	Quota is now user-visible	Confirmed
Retry multiplier	>1.1x	Hidden cost leak	Likely
Fallback rate	>10%	Primary route is unstable	Likely
Output/input ratio	Sudden 2x jump	Prompt or model behavior changed	Likely
Cost per successful task	Week-over-week increase	Real business KPI	Confirmed
Error by model	Any model-specific spike	Route or provider issue	Confirmed
User-level spend	Outlier user >5x median	Abuse or runaway workflow	Likely

The operational test is simple: if you cannot answer which model, user, route, or retry loop created the cost, you are not ready to scale that workflow.
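The checklist thresholds can be turned into a simple alert check. This is a sketch under assumptions: the metric key names are hypothetical, "sustained" is not modeled (the caller is expected to pass already-averaged values), and the per-model error row is left out because it has no numeric threshold in the table.

```python
def triggered_alerts(m):
    """Return the checklist rows whose threshold is exceeded for a metrics dict `m`."""
    alerts = []
    if m["rate_429"] > 0.02:
        alerts.append("429 rate")
    if m["retry_multiplier"] > 1.1:
        alerts.append("Retry multiplier")
    if m["fallback_rate"] > 0.10:
        alerts.append("Fallback rate")
    # "Sudden 2x jump" is read here as current ratio at least double the baseline.
    if m["output_input_ratio"] >= 2 * m["baseline_output_input_ratio"]:
        alerts.append("Output/input ratio")
    if m["cost_per_success"] > m["cost_per_success_last_week"]:
        alerts.append("Cost per successful task")
    if m["max_user_spend"] > 5 * m["median_user_spend"]:
        alerts.append("User-level spend")
    return alerts


# Hypothetical weekly snapshot.
sample = {
    "rate_429": 0.035,
    "retry_multiplier": 1.05,
    "fallback_rate": 0.04,
    "output_input_ratio": 0.30,
    "baseline_output_input_ratio": 0.28,
    "cost_per_success": 0.021,
    "cost_per_success_last_week": 0.019,
    "max_user_spend": 40.0,
    "median_user_spend": 12.0,
}
print(triggered_alerts(sample))
```

Each triggered row should map back to a model, user, route, or retry loop in the logs; if it cannot, the logging is the first thing to fix.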

Non-Claims and Caveats

Not claimed	Reason	Label
Universal benchmark superiority	No single benchmark covers every workload and provider route	False as a broad claim
Permanent free availability	Free tiers and previews can change	Speculation
Guaranteed model access in every region	Providers gate by region, tier, quota, or account status	False as a broad claim
Refund availability without official text	Refund terms must come from provider policy or support	Speculation
Identical pricing across direct API, cloud, and gateway	Routing layer, region, priority, and batch mode can change cost	False as a broad claim
Production safety from docs alone	Real workloads need logs and failure drills	Confirmed

This article uses official docs for hard numbers and marks forward-looking guidance as Likely or Speculation. If a provider changes a price, model name, rate limit, or credit rule after the data verification date, the conclusion should be rechecked before procurement.

Final Recommendation

Claude cost calculators must separate base input, cache writes, cache hits, output, and batch. Use Sonnet as the default benchmark, Haiku for cheap routine paths, and Opus only when quality offsets the price.

FAQ

How do I calculate Claude API cost?

Split base input, cache write tokens, cache hit tokens, and output tokens, then apply the model's current Claude price table.

What is Claude cache hit pricing?

Anthropic states cache read tokens are 0.1x the base input token price.

Is Sonnet cheaper than Opus?

Yes. The current prompt caching table lists Sonnet 4.6 below Opus 4.8 for input and output rates.

Should I use Haiku for everything?

No. Haiku is cheaper, but quality failures and retries can erase savings.

Can Claude count tokens before a call?

Yes. Anthropic documents a token counting API supported by active models.

Does Batch API help Claude cost?

Yes for async work. Anthropic publishes batch input and output rates.

What should I log?

Log base input, cache write, cache hit, output, model, retries, and task success.
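A minimal sketch of such a log row follows. It assumes the usage fields reported by the Anthropic Messages API (`input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`, `output_tokens`); verify those names against the current API reference. `CallLog`, `log_from_usage`, and `cache_hit_rate` are hypothetical helpers, and the usage data is passed in as a plain dict rather than fetched from the API.

```python
from dataclasses import dataclass


@dataclass
class CallLog:
    model: str
    base_input: int    # usage["input_tokens"]: input not read from or written to cache
    cache_write: int   # usage["cache_creation_input_tokens"]
    cache_hit: int     # usage["cache_read_input_tokens"]
    output: int        # usage["output_tokens"]
    retries: int
    task_success: bool


def log_from_usage(model, usage, retries, task_success):
    """Build a log row from a usage mapping; missing or null fields count as 0."""
    return CallLog(
        model=model,
        base_input=usage.get("input_tokens") or 0,
        cache_write=usage.get("cache_creation_input_tokens") or 0,
        cache_hit=usage.get("cache_read_input_tokens") or 0,
        output=usage.get("output_tokens") or 0,
        retries=retries,
        task_success=task_success,
    )


def cache_hit_rate(logs):
    """Share of all input tokens that were served from cache."""
    total = sum(row.base_input + row.cache_write + row.cache_hit for row in logs)
    hits = sum(row.cache_hit for row in logs)
    return hits / total if total else 0.0


# Hypothetical usage payloads for two calls; "example-model" is a placeholder name.
logs = [
    log_from_usage("example-model", {"input_tokens": 100, "cache_creation_input_tokens": 900,
                                     "cache_read_input_tokens": 0, "output_tokens": 50}, 0, True),
    log_from_usage("example-model", {"input_tokens": 100, "cache_creation_input_tokens": None,
                                     "cache_read_input_tokens": 900, "output_tokens": 60}, 1, True),
]
print(f"measured cache hit rate: {cache_hit_rate(logs):.0%}")
```

With rows like these, the cache hit rate is measured from real traffic and can be fed back into the cost formula.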