TokenMix Research Lab · 2026-06-08

Claude API Cost Calculator 2026: Opus, Sonnet, Haiku Math
Last Updated: 2026-06-08
Author: TokenMix Research Lab
Data verified: 2026-06-08, against Anthropic Claude pricing docs, prompt caching docs, token counting docs, batch pricing table, and TokenMix Claude cluster
Claude API cost depends on model tier and cache behavior. Opus, Sonnet, and Haiku have different input, output, cache write, cache hit, and batch rates.
Anthropic publishes Claude Opus 4.8 at $5 input and $25 output per MTok, Sonnet 4.6 at $3 and $15, and Haiku 4.5 at $1 and $5. Prompt caching docs also list 5-minute writes, 1-hour writes, cache hits, and batch rates. The calculator below treats cache miss and cache hit as different events, because that is where Claude bills can surprise teams.
Table of Contents
- Quick Verdict
- Core Formula
- Claude Price Inputs
- 5 Workload Calculator
- Prompt Cache Math
- Python Formula
- Where Claude Loses
- Search Intent Map
- Cost Per Task Calculator
- Decision Matrix
- Monitoring Checklist
- Non-Claims and Caveats
- Final Recommendation
- FAQ
- Sources
- Related Articles
Quick Verdict
| Claim | Status | Source |
|---|---|---|
| Claude Opus 4.8 prompt caching table lists $5 input and $25 output per MTok | Confirmed | Claude prompt caching |
| Claude Sonnet 4.6 prompt caching table lists $3 input and $15 output per MTok | Confirmed | Claude prompt caching |
| Claude Haiku 4.5 prompt caching table lists $1 input and $5 output per MTok | Confirmed | Claude prompt caching |
| Claude cache read tokens are 0.1x base input price | Confirmed | Claude prompt caching |
| All active Claude models support token counting | Confirmed | Claude token counting |
| Claude cache hits are automatic magic for every prompt | False | Cache rules depend on prompt structure and supported models |
| Sonnet is the default cost-performance tier for many agent workloads | Likely | It sits between Opus and Haiku price tiers |
| Claude cache hit rate can be predicted without logs | Speculation | Requires workload-specific prompt stability |
Core Formula
The calculator logic for Claude API cost is provider-neutral first: count monthly token volume, apply the provider's current per-million-token rates, then add retries, cache effects, tool calls, and non-token infrastructure. The model-specific price belongs in the final step, not in the mental model.
| Input | Meaning | Status |
|---|---|---|
| input_mtok | Monthly input tokens divided by 1,000,000 | Confirmed |
| output_mtok | Monthly output tokens divided by 1,000,000 | Confirmed |
| cache_hit_mtok | Cached or reused input tokens where provider exposes a lower price | Confirmed |
| retry_rate | Failed calls divided by total attempted calls | Likely |
| tool_calls | Search, retrieval, shell, SQL, or other tool calls per task | Likely |
| cache_write_mtok | Tokens written into 5m or 1h cache | Confirmed |
| batch_mode | Anthropic batch input/output columns | Confirmed |

    from dataclasses import dataclass


    @dataclass
    class TokenPrice:
        # All prices are USD per million tokens (MTok).
        input_per_m: float
        output_per_m: float
        cached_input_per_m: float | None = None  # None: no discounted cache rate


    def llm_cost(input_tokens, output_tokens, price: TokenPrice, cached_input_tokens=0, retry_rate=0.0):
        # cached_input_tokens is the subset of input_tokens billed at the cached rate.
        uncached_input = max(input_tokens - cached_input_tokens, 0)
        input_cost = uncached_input / 1_000_000 * price.input_per_m
        if price.cached_input_per_m is not None:
            input_cost += cached_input_tokens / 1_000_000 * price.cached_input_per_m
        else:
            # No cache discount available: bill cached tokens at the full input rate.
            input_cost += cached_input_tokens / 1_000_000 * price.input_per_m
        output_cost = output_tokens / 1_000_000 * price.output_per_m
        # Retries are modeled as a flat multiplier on the cost of one attempt.
        return (input_cost + output_cost) * (1 + retry_rate)

Use Claude Opus/Sonnet/Haiku rates only after you have measured average input, average output, retries, cache hit rate, and tool calls. A model that is cheap per token can still lose if it causes extra retries or longer output.
Claude Price Inputs
| Claude tier | Base input | Cache hit | Output | Status |
|---|---|---|---|---|
| Opus 4.8 | $5/MTok | $0.50/MTok | $25/MTok | Confirmed |
| Sonnet 4.6 | $3/MTok | $0.30/MTok | $15/MTok | Confirmed |
| Haiku 4.5 | $1/MTok | $0.10/MTok | $5/MTok | Confirmed |
| Opus 4.8 batch | $2.50/MTok | N/A | $12.50/MTok | Confirmed |
| Sonnet 4.6 batch | $1.50/MTok | N/A | $7.50/MTok | Confirmed |
For broader cost comparison, pair this with LLM API Cost Calculator, Claude CLI Pricing, and AI Chatbot Cost Calculator.
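The price table above can be carried into code as a small lookup. This is a minimal sketch that assumes the rates listed in this article are still current. The names `CLAUDE_PRICES` and `price_for` and the tier keys such as `"sonnet-4.6"` are illustrative labels, not Anthropic model IDs or SDK calls. The cache hit rate is derived from the 0.1x rule rather than stored separately.

```python
# USD per million tokens (MTok), copied from the table in this article.
# Re-check against Anthropic's pricing page before using for procurement.
CLAUDE_PRICES = {
    "opus-4.8": {"input": 5.00, "output": 25.00, "batch_input": 2.50, "batch_output": 12.50},
    "sonnet-4.6": {"input": 3.00, "output": 15.00, "batch_input": 1.50, "batch_output": 7.50},
    "haiku-4.5": {"input": 1.00, "output": 5.00},  # batch rates are not listed in the table above
}

CACHE_HIT_MULTIPLIER = 0.1  # cache reads are billed at 0.1x base input


def price_for(tier: str, kind: str) -> float:
    """Return the USD-per-MTok rate for a tier.

    kind is one of 'input', 'output', 'cache_hit', 'batch_input', 'batch_output'.
    """
    rates = CLAUDE_PRICES[tier]
    if kind == "cache_hit":
        return rates["input"] * CACHE_HIT_MULTIPLIER
    if kind not in rates:
        raise KeyError(f"{kind!r} is not listed for {tier!r}")
    return rates[kind]
```

Keeping the cache hit rate derived means a base price change only has to be edited in one place.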
5 Workload Calculator
These five workloads are intentionally concrete. Replace the numbers with your own logs before procurement.
| Workload | Monthly volume | Token/tool shape | Calculator output | Status |
|---|---|---|---|---|
| Claude chat UI | 30,000 calls | 2K input / 600 output | Compare Sonnet vs Haiku first | Likely workload |
| Coding agent | 10,000 tasks | 8 turns x 8K input | Cache hit rate dominates | Likely workload |
| Long-doc QA | 5,000 docs | 80K stable prefix / 1K output | Cache can change economics | Likely workload |
| Batch eval | 100,000 prompts | 1K input / 200 output | Use batch if latency can wait | Confirmed route |
| High-stakes analysis | 2,000 tasks | 10K input / 2K output | Opus only if quality wins | Likely workload |
Scenario math should be written as tokens first and dollars second. That keeps the estimate portable across OpenAI, Claude, Gemini, DeepSeek, Groq, or an OpenAI-compatible gateway.
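As an illustration of "tokens first, dollars second", the sketch below works through two rows of the table. It assumes "2K" means 2,000 tokens, ignores caching and retries, and uses the Sonnet 4.6 and Haiku 4.5 list rates quoted in this article. The helper names `monthly_tokens` and `dollars` are hypothetical.

```python
def monthly_tokens(calls: int, input_per_call: int, output_per_call: int) -> tuple[int, int]:
    """Tokens first: total monthly input and output tokens."""
    return calls * input_per_call, calls * output_per_call


def dollars(input_tokens: int, output_tokens: int, input_price: float, output_price: float) -> float:
    """Dollars second: apply USD-per-MTok rates to the token totals."""
    return input_tokens / 1_000_000 * input_price + output_tokens / 1_000_000 * output_price


# Claude chat UI row: 30,000 calls, 2K input / 600 output per call.
chat_in, chat_out = monthly_tokens(30_000, 2_000, 600)
chat_sonnet = dollars(chat_in, chat_out, 3.00, 15.00)  # Sonnet 4.6 list rates
chat_haiku = dollars(chat_in, chat_out, 1.00, 5.00)    # Haiku 4.5 list rates

# Batch eval row: 100,000 prompts, 1K input / 200 output per prompt.
eval_in, eval_out = monthly_tokens(100_000, 1_000, 200)
eval_realtime = dollars(eval_in, eval_out, 3.00, 15.00)  # Sonnet 4.6 realtime rates
eval_batch = dollars(eval_in, eval_out, 1.50, 7.50)      # Sonnet 4.6 batch rates

print(f"chat: {chat_in:,} in / {chat_out:,} out -> Sonnet ${chat_sonnet:.2f}, Haiku ${chat_haiku:.2f}")
print(f"eval: {eval_in:,} in / {eval_out:,} out -> realtime ${eval_realtime:.2f}, batch ${eval_batch:.2f}")
```

Under these assumptions the chat workload is 60M input and 18M output tokens, which comes to $450 on Sonnet and $150 on Haiku. The batch eval is 100M input and 20M output tokens, or $600 realtime versus $300 on batch rates.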
Prompt Cache Math
| Cache event | Price logic | Practical meaning | Status |
|---|---|---|---|
| Base input | Full input rate | First uncached prompt path | Confirmed |
| 5m write | 1.25x base input | Write cost for short cache TTL | Confirmed |
| 1h write | 2x base input | Higher write cost for longer TTL | Confirmed |
| Cache hit | 0.1x base input | Cheap reuse if prompt prefix matches | Confirmed |
| Output | Normal output rate | Cache does not discount generated text | Confirmed |
A Claude calculator must split cache write, cache hit, and output. If it only shows total input tokens, it hides the most important cost lever.
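The multipliers in the table imply a break-even point for caching a stable prefix. The sketch below is a simplified model, not Anthropic's billing logic: it assumes the first request pays the write multiplier and every later request is a cache hit inside the TTL. Costs are expressed in units of one uncached send at the base input price, and both function names are illustrative.

```python
def prefix_cost_units(requests: int, write_multiplier: float, hit_multiplier: float = 0.1) -> float:
    """Cost of sending one stable prefix `requests` times with caching.

    Units: 1.0 equals one uncached send at the base input price.
    Assumes one cache write followed by hits only (no TTL expiry).
    """
    if requests < 1:
        return 0.0
    return write_multiplier + hit_multiplier * (requests - 1)


def break_even_requests(write_multiplier: float, hit_multiplier: float = 0.1) -> int:
    """Smallest request count at which caching is strictly cheaper than not caching."""
    if hit_multiplier >= 1.0:
        raise ValueError("caching never pays off when hits cost as much as base input")
    n = 1
    # Without caching, n requests cost n units.
    while prefix_cost_units(n, write_multiplier, hit_multiplier) >= n:
        n += 1
    return n


print(break_even_requests(1.25))  # 5-minute write -> 2
print(break_even_requests(2.0))   # 1-hour write -> 3
```

In this model a 5-minute write pays for itself on the second request and a 1-hour write on the third. If requests arrive after the TTL has lapsed, each one pays the write rate again, so real logs are what confirm the savings.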
Python Formula
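
    def claude_cost(base_input, cache_write, cache_hit, output, base_price, output_price, cache_write_multiplier=1.25):
        # Token arguments are raw token counts; prices are USD per MTok.
        return (
            base_input / 1_000_000 * base_price  # uncached input at the full rate
            + cache_write / 1_000_000 * base_price * cache_write_multiplier  # 1.25x for a 5m write
            + cache_hit / 1_000_000 * base_price * 0.1  # cache reads at 0.1x base input
            + output / 1_000_000 * output_price  # output is not discounted by caching
        )
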
Use cache_write_multiplier=2.0 for a 1-hour cache write where that path is selected.
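To show the formula in use, the sketch below restates it so the block runs on its own and applies it to the long-doc QA shape (80K stable prefix, 1K output) at the Sonnet 4.6 rates quoted in this article. It assumes the whole 80K prefix is cacheable and ignores the few tokens of the question itself.

```python
def claude_cost(base_input, cache_write, cache_hit, output, base_price, output_price,
                cache_write_multiplier=1.25):
    # Same formula as above: token counts in, USD out, prices per MTok.
    return (
        base_input / 1_000_000 * base_price
        + cache_write / 1_000_000 * base_price * cache_write_multiplier
        + cache_hit / 1_000_000 * base_price * 0.1
        + output / 1_000_000 * output_price
    )


SONNET_IN, SONNET_OUT = 3.00, 15.00  # USD per MTok, from this article's table

# Same request shape, three billing paths.
uncached = claude_cost(80_000, 0, 0, 1_000, SONNET_IN, SONNET_OUT)    # no caching at all
first_call = claude_cost(0, 80_000, 0, 1_000, SONNET_IN, SONNET_OUT)  # 5m cache write
later_call = claude_cost(0, 0, 80_000, 1_000, SONNET_IN, SONNET_OUT)  # cache hit

print(f"uncached ${uncached:.3f} | write ${first_call:.3f} | hit ${later_call:.3f}")
```

Under these assumptions the uncached call costs about $0.255, the cache-writing call about $0.315, and each cache hit about $0.039. On a hit the 1K output is a meaningful share of the bill, which is why output length still matters after caching.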
Where Claude Loses
The calculator is only useful if it catches the hidden multipliers. These are the traps that turn cheap demo calls into expensive production months.
| Trap | Cost symptom | Fix | Status |
|---|---|---|---|
| Cache miss | Expected cheap prefix bills at full input/write rate | Log cache hits | Confirmed |
| Output-heavy answers | Sonnet/Opus output dominates | Short response policy | Confirmed |
| Using Opus for routine tasks | Blended cost spikes | Route routine tasks to Sonnet/Haiku | Likely |
| Batch ignored | Offline evals overpay | Use batch for async work | Confirmed |
| Token count guessed | Invoice differs from estimate | Use Claude count_tokens | Confirmed |
A cost calculator should show cost per successful task, not only cost per API call. Failed calls, retries, cache misses, and long outputs are still part of the bill.
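One way to express cost per successful task is sketched below. It assumes each failed attempt is retried until one succeeds and that attempts succeed independently at a fixed rate, so the expected number of attempts is 1 / success_rate. The function name and the example numbers are hypothetical, not measured figures.

```python
def cost_per_successful_task(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per successful task under retry-until-success.

    Assumes independent attempts with a constant success rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical comparison: a cheaper route that fails more often
# versus a pricier route that usually succeeds on the first try.
cheap_route = cost_per_successful_task(cost_per_attempt=0.010, success_rate=0.60)
strong_route = cost_per_successful_task(cost_per_attempt=0.015, success_rate=0.95)

print(f"cheap route:  ${cheap_route:.4f} per successful task")
print(f"strong route: ${strong_route:.4f} per successful task")
```

With these made-up inputs the route that is cheaper per attempt ends up more expensive per successful task, which is the effect the traps table describes.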
Search Intent Map
| Search query | What the user really needs | Best answer | Status |
|---|---|---|---|
| claude api cost calculator | A current, non-marketing answer | Compare official limits and cost controls | Confirmed |
| claude api cost calculator pricing | Whether this becomes a monthly bill | Use per-task math, not sticker price | Confirmed |
| claude api cost calculator free | Whether a no-cost path exists | Treat free quota as testing capacity | Likely |
| claude api cost calculator error | Why setup fails | Check auth, quota, region, and model access | Likely |
| claude api cost calculator alternative | Whether another route is safer | Compare direct API, gateway, and self-hosting | Likely |
This is the reason the article is structured around tables instead of a narrative review. Search traffic for these terms usually comes from blocked developers, not readers browsing AI news.
Cost Per Task Calculator
| Cost component | Formula | Why it matters | Status |
|---|---|---|---|
| Input tokens | input MTok x input price | Long prompts dominate retrieval and agents | Confirmed |
| Output tokens | output MTok x output price | Reasoning and verbose answers compound cost | Confirmed |
| Retry waste | failed calls x average cost | 429 and timeout loops become real spend | Likely |
| Human review | minutes saved or added x hourly rate | Tooling can shift, not remove, labor cost | Likely |
| Infrastructure | storage, runners, or hosted platform cost | Non-token cost often appears later | Confirmed |
Use this minimum calculator before choosing a provider: (30 days x calls per day x average input tokens / 1,000,000) x input price per MTok, plus (30 days x calls per day x average output tokens / 1,000,000) x output price per MTok. Then add retries. If the retry rate is 10%, your effective price is already 1.1x the sticker price, before latency or support cost.
| Monthly calls | Avg input | Avg output | Token volume | Operational reading |
|---|---|---|---|---|
| 1,000 | 1K | 300 | 1M in / 0.3M out | Prototype |
| 10,000 | 2K | 600 | 20M in / 6M out | Small app |
| 100,000 | 4K | 1K | 400M in / 100M out | Production workload |
| 1,000,000 | 2K | 500 | 2B in / 500M out | Procurement problem |
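The minimum calculator described above can be sketched as a single function. Rates are USD per MTok, and retries are modeled as a flat multiplier as in the text. The function name and the example workload (1,000 calls per day at 4K input / 1K output on Sonnet 4.6 rates) are illustrative assumptions.

```python
def minimum_monthly_cost(calls_per_day: float, avg_input_tokens: float, avg_output_tokens: float,
                         input_price: float, output_price: float,
                         retry_rate: float = 0.0, days: int = 30) -> float:
    """Monthly USD cost from average token shape, per-MTok prices, and a retry rate."""
    input_mtok = days * calls_per_day * avg_input_tokens / 1_000_000
    output_mtok = days * calls_per_day * avg_output_tokens / 1_000_000
    base = input_mtok * input_price + output_mtok * output_price
    return base * (1 + retry_rate)


# 1,000 calls/day, 4K input / 1K output, Sonnet 4.6 rates ($3 in / $15 out).
no_retries = minimum_monthly_cost(1_000, 4_000, 1_000, 3.00, 15.00)
with_retries = minimum_monthly_cost(1_000, 4_000, 1_000, 3.00, 15.00, retry_rate=0.10)

print(f"no retries: ${no_retries:.2f}, with 10% retries: ${with_retries:.2f}")
```

This example is 120M input and 30M output tokens per month: $810 before retries and $891 with a 10% retry rate.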
Decision Matrix
| If your situation is... | Default move | Why | Confidence |
|---|---|---|---|
| You are still prototyping | Use the lowest-friction official route | Learning speed beats premature optimization | Likely |
| You have user-facing traffic | Add fallback and spend caps before launch | Users feel quota failures immediately | Confirmed |
| You have compliance constraints | Prefer direct vendor, cloud marketplace, or audited gateway | Procurement trail matters | Likely |
| You have high volume but flexible latency | Test batch or async processing | Batch discounts can beat realtime routes | Confirmed where documented |
| You have unknown token shape | Run a 7-day sample before committing | Average prompts hide tail risk | Likely |
| You need newest model features | Check direct provider docs first | Gateways and clouds may lag direct release | Likely |
The durable rule: do not optimize for the cheapest successful demo. Optimize for the cheapest successful month with logs, retries, fallback, and support.

    def pick_route(stage, traffic, compliance, latency_flexible):
        # Rules are checked top to bottom; the first match wins.
        # traffic is a call count (for example, monthly calls).
        if stage == "prototype" and traffic < 1000:
            return "official_free_or_low_cost_route"
        if compliance == "strict":
            return "direct_vendor_or_cloud_marketplace"
        if latency_flexible and traffic > 100000:
            return "batch_or_async_route"
        if traffic > 10000:
            return "gateway_with_budget_caps"
        return "direct_api_with_monitoring"

Monitoring Checklist
| Metric | Alert threshold | Why | Status |
|---|---|---|---|
| 429 rate | >2% sustained | Quota is now user-visible | Confirmed |
| Retry multiplier | >1.1x | Hidden cost leak | Likely |
| Fallback rate | >10% | Primary route is unstable | Likely |
| Output/input ratio | Sudden 2x jump | Prompt or model behavior changed | Likely |
| Cost per successful task | Week-over-week increase | Real business KPI | Confirmed |
| Error by model | Any model-specific spike | Route or provider issue | Confirmed |
| User-level spend | Outlier user >5x median | Abuse or runaway workflow | Likely |
The operational test is simple: if you cannot answer which model, user, route, or retry loop created the cost, you are not ready to scale that workflow.
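Several of the thresholds in the checklist can be evaluated from simple counters. The sketch below is one possible shape, not a monitoring product: it assumes "retry multiplier" means total attempts divided by logical calls, and it does not model "sustained" windows or week-over-week trends. The function and parameter names are hypothetical.

```python
from statistics import median


def check_alerts(total_calls: int, rate_limited_calls: int, attempts: int,
                 fallback_calls: int, spend_by_user: dict[str, float]) -> list[str]:
    """Return alert messages for the checklist thresholds that are exceeded."""
    alerts = []
    if total_calls > 0:
        if rate_limited_calls / total_calls > 0.02:
            alerts.append("429 rate above 2%")
        if attempts / total_calls > 1.1:
            alerts.append("retry multiplier above 1.1x")
        if fallback_calls / total_calls > 0.10:
            alerts.append("fallback rate above 10%")
    if spend_by_user:
        typical = median(spend_by_user.values())
        for user, spend in spend_by_user.items():
            # Flag users spending more than 5x the median user.
            if typical > 0 and spend > 5 * typical:
                alerts.append(f"user {user} spend above 5x median")
    return alerts


print(check_alerts(
    total_calls=1_000, rate_limited_calls=30, attempts=1_050, fallback_calls=50,
    spend_by_user={"a": 1.0, "b": 1.2, "c": 9.0},
))
```

With these sample counters the 429 rate (3%) and user "c" (9.0 against a median of 1.2) trigger alerts, while the retry multiplier (1.05x) and fallback rate (5%) stay under their thresholds.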
Non-Claims and Caveats
| Not claimed | Reason | Label |
|---|---|---|
| Universal benchmark superiority | No single benchmark covers every workload and provider route | False as a broad claim |
| Permanent free availability | Free tiers and previews can change | Speculation |
| Guaranteed model access in every region | Providers gate by region, tier, quota, or account status | False as a broad claim |
| Refund availability without official text | Refund terms must come from provider policy or support | Speculation |
| Identical pricing across direct API, cloud, and gateway | Routing layer, region, priority, and batch mode can change cost | False as a broad claim |
| Production safety from docs alone | Real workloads need logs and failure drills | Confirmed |
This article uses official docs for hard numbers and marks forward-looking guidance as Likely or Speculation. If a provider changes a price, model name, rate limit, or credit rule after the data verification date, the conclusion should be rechecked before procurement.
Final Recommendation
Claude cost calculators must separate base input, cache writes, cache hits, output, and batch. Use Sonnet as the default benchmark, Haiku for cheap routine paths, and Opus only when quality offsets the price.
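The routing recommendation can be turned into a blended rate. The sketch below assumes the same traffic split applies to input and output tokens and uses the list rates quoted in this article. The 60/30/10 split is a made-up example, not a recommended mix, and `blended_rates` is a hypothetical helper.

```python
# (input, output) in USD per MTok, from this article's price table.
PRICES = {"haiku": (1.00, 5.00), "sonnet": (3.00, 15.00), "opus": (5.00, 25.00)}


def blended_rates(token_share: dict[str, float]) -> tuple[float, float]:
    """Weighted-average input and output price for a routing mix.

    token_share maps tier name to its fraction of total tokens; fractions must sum to 1.
    """
    if abs(sum(token_share.values()) - 1.0) > 1e-9:
        raise ValueError("token shares must sum to 1.0")
    blended_in = sum(share * PRICES[tier][0] for tier, share in token_share.items())
    blended_out = sum(share * PRICES[tier][1] for tier, share in token_share.items())
    return blended_in, blended_out


mix_in, mix_out = blended_rates({"haiku": 0.6, "sonnet": 0.3, "opus": 0.1})
print(f"blended: ${mix_in:.2f} in / ${mix_out:.2f} out per MTok")
```

For this example mix the blended rate is about $2 input and $10 output per MTok, below Sonnet-only pricing even with 10% of tokens on Opus.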
FAQ
How do I calculate Claude API cost?
Split base input, cache write tokens, cache hit tokens, and output tokens, then apply the model's current Claude price table.
What is Claude cache hit pricing?
Anthropic states cache read tokens are 0.1x the base input token price.
Is Sonnet cheaper than Opus?
Yes. The current prompt caching table lists Sonnet 4.6 below Opus 4.8 for input and output rates.
Should I use Haiku for everything?
No. Haiku is cheaper, but quality failures and retries can erase savings.
Can Claude count tokens before a call?
Yes. Anthropic documents a token counting API supported by active models.
Does Batch API help Claude cost?
Yes for async work. Anthropic publishes batch input and output rates.
What should I log?
Log base input, cache write, cache hit, output, model, retries, and task success.
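The logging fields in the last answer can be captured in one record per call. The sketch below assumes the usage field names returned by the Anthropic Messages API (`input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`, `output_tokens`) and that `input_tokens` excludes cached tokens. Verify both against the current API reference. `CallLog` and `from_usage` are illustrative names, and the usage payload is passed as a plain dict to keep the example free of SDK dependencies.

```python
from dataclasses import dataclass


@dataclass
class CallLog:
    model: str
    base_input: int
    cache_write: int
    cache_hit: int
    output: int
    retries: int
    task_success: bool

    @property
    def cache_hit_ratio(self) -> float:
        """Share of all input tokens that were served from cache."""
        total = self.base_input + self.cache_write + self.cache_hit
        return self.cache_hit / total if total else 0.0


def from_usage(model: str, usage: dict, retries: int, task_success: bool) -> CallLog:
    """Build a log record from a usage dict; missing or null fields count as 0."""
    return CallLog(
        model=model,
        base_input=usage.get("input_tokens") or 0,
        cache_write=usage.get("cache_creation_input_tokens") or 0,
        cache_hit=usage.get("cache_read_input_tokens") or 0,
        output=usage.get("output_tokens") or 0,
        retries=retries,
        task_success=task_success,
    )


# Example payload shaped like a cache-hit response; the model label is a placeholder.
record = from_usage(
    "sonnet-4.6",
    {"input_tokens": 500, "cache_creation_input_tokens": 0,
     "cache_read_input_tokens": 80_000, "output_tokens": 1_000},
    retries=0,
    task_success=True,
)
print(record, f"hit ratio {record.cache_hit_ratio:.3f}")
```

Storing the four token buckets separately is what makes it possible to recompute cost later if a rate changes.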
Sources
- Claude Pricing
- Claude Prompt Caching
- Claude Token Counting
- Claude Messages API
- TokenMix Claude CLI Pricing
- TokenMix Claude Rate Exceeded
- TokenMix AI API Gateway
- TokenMix LLM Cost Calculator