TokenMix Research Lab · 2026-04-03

AI API Pricing 2026: 16 Models, Cache, Batch, Routing Hub
Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30
AI API pricing in 2026 is no longer a single input/output table. The real bill depends on cache reads, batch processing, long-context rules, data residency, provider fees, and whether your app routes simple LLM requests away from premium models.
This AI API pricing hub uses official pricing pages checked on 2026-04-30. OpenAI's API pricing page lists GPT-5.5 at $5/$30, GPT-5.4 at $2.50/$15, and GPT-5.4 mini at $0.75/$4.50 per 1M tokens. Anthropic lists Claude Opus 4.7 at $5/$25, Sonnet 4.6 at $3/$15, and Haiku 4.5 at $1/$5. DeepSeek lists V4-Flash at $0.14/$0.28 and discounted V4-Pro at $0.435/$0.87. Google lists Gemini 3.1 Pro at $2/$12 under 200K prompt tokens and Gemini 3.1 Flash-Lite at $0.25/$1.50.
My judgement: DeepSeek-V4-Flash is the current price floor for serious text workloads. Sonnet 4.6 is the balanced Claude default. GPT-5.4 is the OpenAI default. GPT-5.5 and Claude Opus 4.7 should be escalation routes, not the model you send every request to.
Table of Contents
- Quick AI API Price Table
- Confirmed Facts, Inferences, and Risks
- Price Tiers
- Cache and Batch Discounts
- Long Context Rules
- Cost per Task
- Monthly Cost Scenarios
- Best Model by Use Case
- Gateway and Routing Effects
- How to Choose
- Related Articles
- FAQ
- Sources
Quick AI API Price Table
All prices are per 1M tokens in USD. Rows marked "preview" can change faster than stable models.
| Model | Provider | Input | Cache read | Output | Context / prompt rule | Best use |
|---|---|---|---|---|---|---|
| DeepSeek-V4-Flash | DeepSeek | $0.14 | $0.0028 | $0.28 | 1M context | Low-cost reasoning, agents, RAG |
| Gemini 3.1 Flash-Lite Preview | $0.25 | $0.025 | $1.50 | Preview, text/image/video rate | High-volume simple tasks | |
| Claude Haiku 3 | Anthropic | $0.25 | $0.03 | $1.25 | Legacy cheap tier | Cheapest Claude route |
| Claude Haiku 3.5 | Anthropic | $0.80 | $0.08 | $4.00 | Legacy Haiku | Cheap Claude compatibility |
| GPT-5.4 mini | OpenAI | $0.75 | $0.075 | $4.50 | Under 270K standard rate | Budget OpenAI coding and agents |
| Claude Haiku 4.5 | Anthropic | $1.00 | $0.10 | $5.00 | Current Haiku | High-volume Claude tasks |
| Gemini 3 Flash Preview | $0.50 | $0.05 | $3.00 | Preview | Fast Gemini default | |
| Gemini 2.5 Pro | $1.25 | $0.125 | $10.00 | Prompts <= 200K | Stable Gemini Pro route | |
| Gemini 3.1 Pro Preview | $2.00 | $0.20 | $12.00 | Prompts <= 200K | Premium Gemini reasoning | |
| GPT-5.4 | OpenAI | $2.50 | $0.25 | $15.00 | Standard flagship | OpenAI default production |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $0.30 | $15.00 | 1M context at standard pricing | Balanced Claude default |
| DeepSeek-V4-Pro | DeepSeek | $0.435 | $0.003625 | $0.87 | 75% discount until 2026-05-31 | Discounted premium DeepSeek |
| Claude Opus 4.6 | Anthropic | $5.00 | $0.50 | $25.00 | 1M context at standard pricing | Stable premium Claude |
| Claude Opus 4.7 | Anthropic | $5.00 | $0.50 | $25.00 | New tokenizer caveat | Premium coding and reasoning |
| GPT-5.5 | OpenAI | $5.00 | $0.50 | $30.00 | Frontier OpenAI model | Hardest OpenAI workloads |
| DeepSeek-V4-Pro full listed | DeepSeek | $1.74 | $0.0145 | $3.48 | Crossed-out official full price | Post-discount planning |
The table has one strange-looking row: discounted DeepSeek-V4-Pro is cheaper than several mid-tier models while the promotion lasts. Do not build a permanent cost model without checking the discount deadline.
Confirmed Facts, Inferences, and Risks
| Claim | Status | What it means | Source |
|---|---|---|---|
| OpenAI GPT-5.5 is $5 input and $30 output per 1M tokens | Confirmed | It is a premium escalation route. | OpenAI pricing |
| Claude Sonnet 4.6 is $3 input and $15 output per 1M tokens | Confirmed | It remains the balanced Claude default. | Anthropic pricing |
| DeepSeek-V4-Flash is $0.14 input and $0.28 output per 1M tokens | Confirmed | It is the lowest-cost serious route in this table. | DeepSeek pricing |
| Gemini 3.1 Pro doubles input price above 200K prompt tokens | Confirmed | Long prompts change Gemini cost. | Google pricing |
| Cache reads reduce input cost but not output cost | Confirmed across major providers | Output-heavy workflows still need model routing. | Provider pricing pages |
| Batch usually gives 50% off | Confirmed for OpenAI, Anthropic, and Gemini paid batch routes | Great for offline jobs. | Provider pricing pages |
| Cheapest per-token model is always best | False | Quality, latency, data policy, and reliability still matter. | Architecture judgement |
For GEO, the extractable answer is: AI API cost is now a routing problem, not a static model leaderboard.
Price Tiers
| Tier | Input range | Output range | Models |
|---|---|---|---|
| Budget | $0.14-$0.80 | $0.28-$4.50 | DeepSeek-V4-Flash, Gemini Flash-Lite, Claude Haiku 3/3.5, GPT-5.4 mini |
| Mid | $1.00-$2.50 | $3.00-$15.00 | Haiku 4.5, Gemini 3 Flash, Gemini 2.5 Pro, Gemini 3.1 Pro, GPT-5.4 |
| Premium | $3.00-$5.00 | $15.00-$30.00 | Sonnet 4.6, Opus 4.6/4.7, GPT-5.5 |
| Discount anomaly | $0.435 | $0.87 | DeepSeek-V4-Pro during the 75% promotion |
DeepSeek breaks the normal tier logic. V4-Flash sits in the budget tier, and discounted V4-Pro sits below many mid-tier models.
Cache and Batch Discounts
Cache reads are the first cost lever when input repeats. Batch is the first cost lever when latency does not matter.
| Provider | Cache read example | Batch discount | Important caveat |
|---|---|---|---|
| OpenAI | GPT-5.4 cache read: $0.25/M | Batch API saves 50% on input and output | Data residency adds 10% where used. |
| Anthropic | Sonnet 4.6 cache read: $0.30/M | Batch API saves 50% on input and output | Cache writes cost 1.25x or 2x before reads. |
| DeepSeek | V4-Flash cache read: $0.0028/M | No general batch discount in the official price page checked | Cache-hit input is extremely cheap. |
| Google Gemini | Gemini 3.1 Pro cache read: $0.20/M under 200K | Batch and Flex cut Gemini 3.1 Pro to $1/$6 under 200K | Context cache storage has hourly pricing. |
Effective input price with 70% cache reads:
| Model | Base input | Cache read | Effective input at 70% cache |
|---|---|---|---|
| DeepSeek-V4-Flash | $0.14 | $0.0028 | $0.04396 |
| Gemini 3.1 Flash-Lite | $0.25 | $0.025 | $0.0925 |
| GPT-5.4 mini | $0.75 | $0.075 | $0.2775 |
| Claude Sonnet 4.6 | $3.00 | $0.30 | $1.11 |
| GPT-5.4 | $2.50 | $0.25 | $0.925 |
| Claude Opus 4.7 | $5.00 | $0.50 | $1.85 |
| GPT-5.5 | $5.00 | $0.50 | $1.85 |
Formula:
effective input = uncached_share * base_input + cached_share * cache_read
Long Context Rules
Long context changes the price table in non-obvious ways.
| Provider/model | Long-context rule | Pricing effect |
|---|---|---|
| OpenAI GPT-5.4 mini | Pricing page says standard rates are for context lengths under 270K | Check detailed pricing for larger contexts. |
| Anthropic Sonnet 4.6 | Full 1M context included at standard pricing | No current 2x long-context surcharge on official page. |
| Anthropic Opus 4.7/4.6 | Full 1M context included at standard pricing | Strong for long-context premium workloads. |
| DeepSeek V4 | 1M context across official DeepSeek services | Cache hits matter heavily for repeated prefixes. |
| Gemini 3.1 Pro | Price doubles above 200K prompt tokens | $2/$12 becomes $4/$18 for long prompts. |
| Gemini 2.5 Pro | Price doubles above 200K prompt tokens | $1.25/$10 becomes $2.50/$15. |
The key mistake is using one "per 1M tokens" number for every context size.
Cost per Task
These are illustrative costs using standard non-batch pricing and no cache unless stated.
| Task | Token shape | DeepSeek V4 Flash | Gemini Flash-Lite | GPT-5.4 mini | Sonnet 4.6 | GPT-5.4 | Opus 4.7 |
|---|---|---|---|---|---|---|---|
| Simple chat reply | 500 in / 200 out | $0.000126 | $0.000425 | $0.001275 | $0.004500 | $0.004250 | $0.007500 |
| Support ticket answer | 2K in / 800 out | $0.000504 | $0.001700 | $0.005100 | $0.018000 | $0.017000 | $0.030000 |
| RAG answer | 8K in / 500 out | $0.001260 | $0.002750 | $0.008250 | $0.031500 | $0.027500 | $0.052500 |
| Code review | 20K in / 3K out | $0.003640 | $0.009500 | $0.028500 | $0.105000 | $0.095000 | $0.175000 |
| Long document pass | 900K in / 20K out | $0.131600 | Not ideal | Not shown | $3.000000 | Check context rule | $5.000000 |
At 10,000 simple chat replies per day, monthly cost is roughly:
| Model | Monthly cost |
|---|---|
| DeepSeek-V4-Flash | $37.80 |
| Gemini 3.1 Flash-Lite | $127.50 |
| GPT-5.4 mini | $382.50 |
| GPT-5.4 | $1,275.00 |
| Claude Sonnet 4.6 | $1,350.00 |
| Claude Opus 4.7 | $2,250.00 |
This is why routing matters. A premium model can be 35x to 60x more expensive than a budget route for simple chat.
Monthly Cost Scenarios
Assume 100M input tokens and 30M output tokens per month.
| Model | No cache | 70% input cache | Batch only | Batch + 70% cache where available |
|---|---|---|---|---|
| DeepSeek-V4-Flash | $22.40 | $12.80 | Not listed | Not listed |
| Gemini 3.1 Flash-Lite | $70.00 | $54.25 | $35.00 | $27.13 |
| GPT-5.4 mini | $210.00 | $162.75 | $105.00 | $81.38 |
| Claude Haiku 4.5 | $250.00 | $187.00 | $125.00 | $93.50 |
| Gemini 2.5 Pro | $425.00 | $346.25 | $212.50 | $177.50 |
| GPT-5.4 | $700.00 | $542.50 | $350.00 | $271.25 |
| Claude Sonnet 4.6 | $750.00 | $561.00 | $375.00 | $280.50 |
| Claude Opus 4.7 | $1,250.00 | $935.00 | $625.00 | $467.50 |
| GPT-5.5 | $1,400.00 | $1,085.00 | $700.00 | $542.50 |
The best cost reduction stack is: cheaper model first, cache repeated input, batch offline work, then escalate only hard requests.
Best Model by Use Case
| Use case | Best first pick | Escalation pick | Why |
|---|---|---|---|
| Simple support classification | DeepSeek-V4-Flash or Gemini Flash-Lite | Haiku 4.5 | Price matters more than frontier reasoning. |
| User-facing support answer | GPT-5.4 mini or Sonnet 4.6 | GPT-5.4 or Opus 4.7 | Balance quality and cost. |
| Coding assistant | GPT-5.4 mini or Sonnet 4.6 | GPT-5.5 or Opus 4.7 | Use premium only for hard edits. |
| Long-context document review | Sonnet 4.6 or DeepSeek V4 | Opus 4.7 | 1M context and cache reads matter. |
| Bulk summarization | Batch Gemini, Batch OpenAI, Batch Claude | Stronger batch model for failures | Batch cuts cost in half. |
| Low-cost reasoning agent | DeepSeek-V4-Flash | DeepSeek-V4-Pro discounted | Cost per agent step stays low. |
Do not pick a global model. Pick a routing policy.
Gateway and Routing Effects
Direct provider pricing is only one part of the real bill.
| Access path | Best for | Cost impact |
|---|---|---|
| Direct OpenAI / Anthropic / Google / DeepSeek | Native features and direct contracts | Lowest visible platform overhead, more integrations. |
| TokenMix.ai | Managed multi-model routing | Helps route by task and centralize access across model families. |
| OpenRouter | Broad model marketplace | Published 5.5% platform fee on credit purchases; strong for discovery. |
| LiteLLM self-hosted | Internal LLM platform teams | Direct provider prices plus infrastructure and engineering time. |
TokenMix.ai fits when you want one OpenAI-compatible access layer across OpenAI, Claude, Gemini, DeepSeek, and other model families. Direct APIs fit when you need one provider's native features and direct billing.
How to Choose
| If your main constraint is... | Choose |
|---|---|
| Lowest per-token cost | DeepSeek-V4-Flash first |
| OpenAI ecosystem compatibility | GPT-5.4 mini, GPT-5.4, then GPT-5.5 escalation |
| Claude quality at sane cost | Sonnet 4.6 |
| Premium Claude coding/reasoning | Opus 4.7 after tokenizer test |
| Google multimodal and long-context ecosystem | Gemini 3.1 Pro or Gemini 3 Flash |
| Offline batch cost | Batch on OpenAI, Anthropic, or Gemini |
| Multi-model production | TokenMix.ai or another gateway |
Final recommendation: build a three-tier router. Budget model first, balanced model second, premium model last. That is the only AI API pricing strategy that survives 2026 model churn.
Related Articles
- DeepSeek API Pricing 2026: V4 Costs, Cache Hits, R1 Changes
- Claude API Pricing 2026: Opus, Sonnet, Haiku Costs Compared
- Anthropic API Pricing 2026: Cache, Batch, Data Residency Fees
- Google Gemini API Pricing 2026: 3.1 Pro, Flash, Batch Costs
- GPT-5 API Pricing 2026: 5.5, 5.4, Mini Costs, Batch Math
- Claude API Cache Pricing 2026: 90% Input Savings Explained
- DeepSeek Cache Hit Pricing 2026: V4 98% Input Savings Guide
- OpenAI-Compatible API Gateway: 9 Providers, One SDK Guide
- AI API Gateway 2026: 7 LLM Routing and Fallback Options
- Best Unified AI API Gateways 2026: 7 Tools, Scores, Costs
- OpenAI API No Credit Card 2026: 5 Legal Ways To Get Access
- OpenAI API With Alipay 2026: 4 Legal Payment Routes Guide
- AI API With WeChat Pay 2026: 5 Gateway Setup Options Guide
- Official Authorized AI API Access 2026: 7 Verification Checks
FAQ
What is the cheapest AI API in 2026?
Among the major official providers in this comparison, DeepSeek-V4-Flash has the lowest serious text-model price at $0.14 input and $0.28 output per 1M tokens. Gemini 3.1 Flash-Lite is also low-cost at $0.25 input and $1.50 output.
What is the best AI API for production?
There is no single best model. GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Pro, and DeepSeek-V4-Flash each fit different production constraints. Most teams should route by workflow instead of using one model globally.
Which AI API has the best cache pricing?
DeepSeek-V4-Flash has the lowest cache-read input price in this table at $0.0028 per 1M tokens. Anthropic, OpenAI, and Gemini also offer cache-read pricing that can cut repeated input cost sharply.
Does Batch API reduce AI API cost?
Yes for providers that support it. OpenAI says Batch saves 50% on inputs and outputs, Anthropic lists 50% batch rates, and Gemini offers batch or flex pricing for several models. Use batch only when latency can wait.
Why is output price more important than input price?
Output often costs several times more than input. GPT-5.5 output is $30 per 1M tokens, Claude Opus output is $25, and Sonnet output is $15. Long, verbose answers can dominate the bill even with cached input.
Is DeepSeek cheaper than Claude and OpenAI?
Yes on official token pricing in this comparison. DeepSeek-V4-Flash is much cheaper than Claude Sonnet 4.6, Claude Opus 4.7, GPT-5.4, and GPT-5.5. Reliability, compliance, routing, and quality still need testing.
Should I use direct APIs or an LLM API gateway?
Use direct APIs when you need native provider features and direct billing. Use a gateway such as TokenMix.ai when your app needs multiple model families, routing, fallback, and centralized usage tracking.
How often should AI API pricing be updated?
For competitive pricing pages, update weekly or whenever a major provider changes a model, cache rule, batch rate, or context pricing rule. In 2026, stale pricing can become wrong within days.