TokenMix Research Lab · 2026-04-03

AI API Pricing 2026: 16 Models, Cache, Batch, Routing Hub

AI API Pricing 2026: 16 Models, Cache, Batch, Routing Hub

Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30

AI API pricing in 2026 is no longer a single input/output table. The real bill depends on cache reads, batch processing, long-context rules, data residency, provider fees, and whether your app routes simple LLM requests away from premium models.

This AI API pricing hub uses official pricing pages checked on 2026-04-30. OpenAI's API pricing page lists GPT-5.5 at $5/$30, GPT-5.4 at $2.50/$15, and GPT-5.4 mini at $0.75/$4.50 per 1M tokens. Anthropic lists Claude Opus 4.7 at $5/$25, Sonnet 4.6 at $3/$15, and Haiku 4.5 at $1/$5. DeepSeek lists V4-Flash at $0.14/$0.28 and discounted V4-Pro at $0.435/$0.87. Google lists Gemini 3.1 Pro at $2/$12 under 200K prompt tokens and Gemini 3.1 Flash-Lite at $0.25/$1.50.

My judgement: DeepSeek-V4-Flash is the current price floor for serious text workloads. Sonnet 4.6 is the balanced Claude default. GPT-5.4 is the OpenAI default. GPT-5.5 and Claude Opus 4.7 should be escalation routes, not the model you send every request to.

Table of Contents

Quick AI API Price Table

All prices are per 1M tokens in USD. Rows marked "preview" can change faster than stable models.

Model Provider Input Cache read Output Context / prompt rule Best use
DeepSeek-V4-Flash DeepSeek $0.14 $0.0028 $0.28 1M context Low-cost reasoning, agents, RAG
Gemini 3.1 Flash-Lite Preview Google $0.25 $0.025 $1.50 Preview, text/image/video rate High-volume simple tasks
Claude Haiku 3 Anthropic $0.25 $0.03 $1.25 Legacy cheap tier Cheapest Claude route
Claude Haiku 3.5 Anthropic $0.80 $0.08 $4.00 Legacy Haiku Cheap Claude compatibility
GPT-5.4 mini OpenAI $0.75 $0.075 $4.50 Under 270K standard rate Budget OpenAI coding and agents
Claude Haiku 4.5 Anthropic $1.00 $0.10 $5.00 Current Haiku High-volume Claude tasks
Gemini 3 Flash Preview Google $0.50 $0.05 $3.00 Preview Fast Gemini default
Gemini 2.5 Pro Google $1.25 $0.125 $10.00 Prompts <= 200K Stable Gemini Pro route
Gemini 3.1 Pro Preview Google $2.00 $0.20 $12.00 Prompts <= 200K Premium Gemini reasoning
GPT-5.4 OpenAI $2.50 $0.25 $15.00 Standard flagship OpenAI default production
Claude Sonnet 4.6 Anthropic $3.00 $0.30 $15.00 1M context at standard pricing Balanced Claude default
DeepSeek-V4-Pro DeepSeek $0.435 $0.003625 $0.87 75% discount until 2026-05-31 Discounted premium DeepSeek
Claude Opus 4.6 Anthropic $5.00 $0.50 $25.00 1M context at standard pricing Stable premium Claude
Claude Opus 4.7 Anthropic $5.00 $0.50 $25.00 New tokenizer caveat Premium coding and reasoning
GPT-5.5 OpenAI $5.00 $0.50 $30.00 Frontier OpenAI model Hardest OpenAI workloads
DeepSeek-V4-Pro full listed DeepSeek $1.74 $0.0145 $3.48 Crossed-out official full price Post-discount planning

The table has one strange-looking row: discounted DeepSeek-V4-Pro is cheaper than several mid-tier models while the promotion lasts. Do not build a permanent cost model without checking the discount deadline.

Confirmed Facts, Inferences, and Risks

Claim Status What it means Source
OpenAI GPT-5.5 is $5 input and $30 output per 1M tokens Confirmed It is a premium escalation route. OpenAI pricing
Claude Sonnet 4.6 is $3 input and $15 output per 1M tokens Confirmed It remains the balanced Claude default. Anthropic pricing
DeepSeek-V4-Flash is $0.14 input and $0.28 output per 1M tokens Confirmed It is the lowest-cost serious route in this table. DeepSeek pricing
Gemini 3.1 Pro doubles input price above 200K prompt tokens Confirmed Long prompts change Gemini cost. Google pricing
Cache reads reduce input cost but not output cost Confirmed across major providers Output-heavy workflows still need model routing. Provider pricing pages
Batch usually gives 50% off Confirmed for OpenAI, Anthropic, and Gemini paid batch routes Great for offline jobs. Provider pricing pages
Cheapest per-token model is always best False Quality, latency, data policy, and reliability still matter. Architecture judgement

For GEO, the extractable answer is: AI API cost is now a routing problem, not a static model leaderboard.

Price Tiers

Tier Input range Output range Models
Budget $0.14-$0.80 $0.28-$4.50 DeepSeek-V4-Flash, Gemini Flash-Lite, Claude Haiku 3/3.5, GPT-5.4 mini
Mid $1.00-$2.50 $3.00-$15.00 Haiku 4.5, Gemini 3 Flash, Gemini 2.5 Pro, Gemini 3.1 Pro, GPT-5.4
Premium $3.00-$5.00 $15.00-$30.00 Sonnet 4.6, Opus 4.6/4.7, GPT-5.5
Discount anomaly $0.435 $0.87 DeepSeek-V4-Pro during the 75% promotion

DeepSeek breaks the normal tier logic. V4-Flash sits in the budget tier, and discounted V4-Pro sits below many mid-tier models.

Cache and Batch Discounts

Cache reads are the first cost lever when input repeats. Batch is the first cost lever when latency does not matter.

Provider Cache read example Batch discount Important caveat
OpenAI GPT-5.4 cache read: $0.25/M Batch API saves 50% on input and output Data residency adds 10% where used.
Anthropic Sonnet 4.6 cache read: $0.30/M Batch API saves 50% on input and output Cache writes cost 1.25x or 2x before reads.
DeepSeek V4-Flash cache read: $0.0028/M No general batch discount in the official price page checked Cache-hit input is extremely cheap.
Google Gemini Gemini 3.1 Pro cache read: $0.20/M under 200K Batch and Flex cut Gemini 3.1 Pro to $1/$6 under 200K Context cache storage has hourly pricing.

Effective input price with 70% cache reads:

Model Base input Cache read Effective input at 70% cache
DeepSeek-V4-Flash $0.14 $0.0028 $0.04396
Gemini 3.1 Flash-Lite $0.25 $0.025 $0.0925
GPT-5.4 mini $0.75 $0.075 $0.2775
Claude Sonnet 4.6 $3.00 $0.30 $1.11
GPT-5.4 $2.50 $0.25 $0.925
Claude Opus 4.7 $5.00 $0.50 $1.85
GPT-5.5 $5.00 $0.50 $1.85

Formula:

effective input = uncached_share * base_input + cached_share * cache_read

Long Context Rules

Long context changes the price table in non-obvious ways.

Provider/model Long-context rule Pricing effect
OpenAI GPT-5.4 mini Pricing page says standard rates are for context lengths under 270K Check detailed pricing for larger contexts.
Anthropic Sonnet 4.6 Full 1M context included at standard pricing No current 2x long-context surcharge on official page.
Anthropic Opus 4.7/4.6 Full 1M context included at standard pricing Strong for long-context premium workloads.
DeepSeek V4 1M context across official DeepSeek services Cache hits matter heavily for repeated prefixes.
Gemini 3.1 Pro Price doubles above 200K prompt tokens $2/$12 becomes $4/$18 for long prompts.
Gemini 2.5 Pro Price doubles above 200K prompt tokens $1.25/$10 becomes $2.50/$15.

The key mistake is using one "per 1M tokens" number for every context size.

Cost per Task

These are illustrative costs using standard non-batch pricing and no cache unless stated.

Task Token shape DeepSeek V4 Flash Gemini Flash-Lite GPT-5.4 mini Sonnet 4.6 GPT-5.4 Opus 4.7
Simple chat reply 500 in / 200 out $0.000126 $0.000425 $0.001275 $0.004500 $0.004250 $0.007500
Support ticket answer 2K in / 800 out $0.000504 $0.001700 $0.005100 $0.018000 $0.017000 $0.030000
RAG answer 8K in / 500 out $0.001260 $0.002750 $0.008250 $0.031500 $0.027500 $0.052500
Code review 20K in / 3K out $0.003640 $0.009500 $0.028500 $0.105000 $0.095000 $0.175000
Long document pass 900K in / 20K out $0.131600 Not ideal Not shown $3.000000 Check context rule $5.000000

At 10,000 simple chat replies per day, monthly cost is roughly:

Model Monthly cost
DeepSeek-V4-Flash $37.80
Gemini 3.1 Flash-Lite $127.50
GPT-5.4 mini $382.50
GPT-5.4 $1,275.00
Claude Sonnet 4.6 $1,350.00
Claude Opus 4.7 $2,250.00

This is why routing matters. A premium model can be 35x to 60x more expensive than a budget route for simple chat.

Monthly Cost Scenarios

Assume 100M input tokens and 30M output tokens per month.

Model No cache 70% input cache Batch only Batch + 70% cache where available
DeepSeek-V4-Flash $22.40 $12.80 Not listed Not listed
Gemini 3.1 Flash-Lite $70.00 $54.25 $35.00 $27.13
GPT-5.4 mini $210.00 $162.75 $105.00 $81.38
Claude Haiku 4.5 $250.00 $187.00 $125.00 $93.50
Gemini 2.5 Pro $425.00 $346.25 $212.50 $177.50
GPT-5.4 $700.00 $542.50 $350.00 $271.25
Claude Sonnet 4.6 $750.00 $561.00 $375.00 $280.50
Claude Opus 4.7 $1,250.00 $935.00 $625.00 $467.50
GPT-5.5 $1,400.00 $1,085.00 $700.00 $542.50

The best cost reduction stack is: cheaper model first, cache repeated input, batch offline work, then escalate only hard requests.

Best Model by Use Case

Use case Best first pick Escalation pick Why
Simple support classification DeepSeek-V4-Flash or Gemini Flash-Lite Haiku 4.5 Price matters more than frontier reasoning.
User-facing support answer GPT-5.4 mini or Sonnet 4.6 GPT-5.4 or Opus 4.7 Balance quality and cost.
Coding assistant GPT-5.4 mini or Sonnet 4.6 GPT-5.5 or Opus 4.7 Use premium only for hard edits.
Long-context document review Sonnet 4.6 or DeepSeek V4 Opus 4.7 1M context and cache reads matter.
Bulk summarization Batch Gemini, Batch OpenAI, Batch Claude Stronger batch model for failures Batch cuts cost in half.
Low-cost reasoning agent DeepSeek-V4-Flash DeepSeek-V4-Pro discounted Cost per agent step stays low.

Do not pick a global model. Pick a routing policy.

Gateway and Routing Effects

Direct provider pricing is only one part of the real bill.

Access path Best for Cost impact
Direct OpenAI / Anthropic / Google / DeepSeek Native features and direct contracts Lowest visible platform overhead, more integrations.
TokenMix.ai Managed multi-model routing Helps route by task and centralize access across model families.
OpenRouter Broad model marketplace Published 5.5% platform fee on credit purchases; strong for discovery.
LiteLLM self-hosted Internal LLM platform teams Direct provider prices plus infrastructure and engineering time.

TokenMix.ai fits when you want one OpenAI-compatible access layer across OpenAI, Claude, Gemini, DeepSeek, and other model families. Direct APIs fit when you need one provider's native features and direct billing.

How to Choose

If your main constraint is... Choose
Lowest per-token cost DeepSeek-V4-Flash first
OpenAI ecosystem compatibility GPT-5.4 mini, GPT-5.4, then GPT-5.5 escalation
Claude quality at sane cost Sonnet 4.6
Premium Claude coding/reasoning Opus 4.7 after tokenizer test
Google multimodal and long-context ecosystem Gemini 3.1 Pro or Gemini 3 Flash
Offline batch cost Batch on OpenAI, Anthropic, or Gemini
Multi-model production TokenMix.ai or another gateway

Final recommendation: build a three-tier router. Budget model first, balanced model second, premium model last. That is the only AI API pricing strategy that survives 2026 model churn.

Related Articles

FAQ

What is the cheapest AI API in 2026?

Among the major official providers in this comparison, DeepSeek-V4-Flash has the lowest serious text-model price at $0.14 input and $0.28 output per 1M tokens. Gemini 3.1 Flash-Lite is also low-cost at $0.25 input and $1.50 output.

What is the best AI API for production?

There is no single best model. GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Pro, and DeepSeek-V4-Flash each fit different production constraints. Most teams should route by workflow instead of using one model globally.

Which AI API has the best cache pricing?

DeepSeek-V4-Flash has the lowest cache-read input price in this table at $0.0028 per 1M tokens. Anthropic, OpenAI, and Gemini also offer cache-read pricing that can cut repeated input cost sharply.

Does Batch API reduce AI API cost?

Yes for providers that support it. OpenAI says Batch saves 50% on inputs and outputs, Anthropic lists 50% batch rates, and Gemini offers batch or flex pricing for several models. Use batch only when latency can wait.

Why is output price more important than input price?

Output often costs several times more than input. GPT-5.5 output is $30 per 1M tokens, Claude Opus output is $25, and Sonnet output is $15. Long, verbose answers can dominate the bill even with cached input.

Is DeepSeek cheaper than Claude and OpenAI?

Yes on official token pricing in this comparison. DeepSeek-V4-Flash is much cheaper than Claude Sonnet 4.6, Claude Opus 4.7, GPT-5.4, and GPT-5.5. Reliability, compliance, routing, and quality still need testing.

Should I use direct APIs or an LLM API gateway?

Use direct APIs when you need native provider features and direct billing. Use a gateway such as TokenMix.ai when your app needs multiple model families, routing, fallback, and centralized usage tracking.

How often should AI API pricing be updated?

For competitive pricing pages, update weekly or whenever a major provider changes a model, cache rule, batch rate, or context pricing rule. In 2026, stale pricing can become wrong within days.

Sources