TokenMix Research Lab · 2026-04-03

AI API Pricing 2026: 16 Models, Cache, Batch, Routing Hub

Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30

AI API pricing in 2026 is no longer a single input/output table. The real bill depends on cache reads, batch processing, long-context rules, data residency, provider fees, and whether your app routes simple LLM requests away from premium models.

This AI API pricing hub uses official pricing pages checked on 2026-04-30. OpenAI's API pricing page lists GPT-5.5 at $5/$30, GPT-5.4 at $2.50/$15, and GPT-5.4 mini at $0.75/$4.50 per 1M tokens. Anthropic lists Claude Opus 4.7 at $5/$25, Sonnet 4.6 at $3/$15, and Haiku 4.5 at $1/$5. DeepSeek lists V4-Flash at $0.14/$0.28 and discounted V4-Pro at $0.435/$0.87. Google lists Gemini 3.1 Pro at $2/$12 under 200K prompt tokens and Gemini 3.1 Flash-Lite at $0.25/$1.50.

My judgement: DeepSeek-V4-Flash is the current price floor for serious text workloads. Sonnet 4.6 is the balanced Claude default. GPT-5.4 is the OpenAI default. GPT-5.5 and Claude Opus 4.7 should be escalation routes, not the model you send every request to.

Quick AI API Price Table
Confirmed Facts, Inferences, and Risks
Price Tiers
Cache and Batch Discounts
Long Context Rules
Cost per Task
Monthly Cost Scenarios
Best Model by Use Case
Gateway and Routing Effects
How to Choose
Related Articles
FAQ
Sources

Quick AI API Price Table

All prices are per 1M tokens in USD. Rows marked "preview" can change faster than stable models.

Model	Provider	Input	Cache read	Output	Context / prompt rule	Best use
DeepSeek-V4-Flash	DeepSeek	$0.14	$0.0028	$0.28	1M context	Low-cost reasoning, agents, RAG
Gemini 3.1 Flash-Lite Preview	Google	$0.25	$0.025	$1.50	Preview, text/image/video rate	High-volume simple tasks
Claude Haiku 3	Anthropic	$0.25	$0.03	$1.25	Legacy cheap tier	Cheapest Claude route
Claude Haiku 3.5	Anthropic	$0.80	$0.08	$4.00	Legacy Haiku	Cheap Claude compatibility
GPT-5.4 mini	OpenAI	$0.75	$0.075	$4.50	Under 270K standard rate	Budget OpenAI coding and agents
Claude Haiku 4.5	Anthropic	$1.00	$0.10	$5.00	Current Haiku	High-volume Claude tasks
Gemini 3 Flash Preview	Google	$0.50	$0.05	$3.00	Preview	Fast Gemini default
Gemini 2.5 Pro	Google	$1.25	$0.125	$10.00	Prompts <= 200K	Stable Gemini Pro route
Gemini 3.1 Pro Preview	Google	$2.00	$0.20	$12.00	Prompts <= 200K	Premium Gemini reasoning
GPT-5.4	OpenAI	$2.50	$0.25	$15.00	Standard flagship	OpenAI default production
Claude Sonnet 4.6	Anthropic	$3.00	$0.30	$15.00	1M context at standard pricing	Balanced Claude default
DeepSeek-V4-Pro	DeepSeek	$0.435	$0.003625	$0.87	75% discount until 2026-05-31	Discounted premium DeepSeek
Claude Opus 4.6	Anthropic	$5.00	$0.50	$25.00	1M context at standard pricing	Stable premium Claude
Claude Opus 4.7	Anthropic	$5.00	$0.50	$25.00	New tokenizer caveat	Premium coding and reasoning
GPT-5.5	OpenAI	$5.00	$0.50	$30.00	Frontier OpenAI model	Hardest OpenAI workloads
DeepSeek-V4-Pro full listed	DeepSeek	$1.74	$0.0145	$3.48	Crossed-out official full price	Post-discount planning

The table has one strange-looking row: discounted DeepSeek-V4-Pro is cheaper than several mid-tier models while the promotion lasts. Do not build a permanent cost model without checking the discount deadline.

Confirmed Facts, Inferences, and Risks

Claim	Status	What it means	Source
OpenAI GPT-5.5 is $5 input and $30 output per 1M tokens	Confirmed	It is a premium escalation route.	OpenAI pricing
Claude Sonnet 4.6 is $3 input and $15 output per 1M tokens	Confirmed	It remains the balanced Claude default.	Anthropic pricing
DeepSeek-V4-Flash is $0.14 input and $0.28 output per 1M tokens	Confirmed	It is the lowest-cost serious route in this table.	DeepSeek pricing
Gemini 3.1 Pro doubles input price above 200K prompt tokens	Confirmed	Long prompts change Gemini cost.	Google pricing
Cache reads reduce input cost but not output cost	Confirmed across major providers	Output-heavy workflows still need model routing.	Provider pricing pages
Batch usually gives 50% off	Confirmed for OpenAI, Anthropic, and Gemini paid batch routes	Great for offline jobs.	Provider pricing pages
Cheapest per-token model is always best	False	Quality, latency, data policy, and reliability still matter.	Architecture judgement

For GEO, the extractable answer is: AI API cost is now a routing problem, not a static model leaderboard.

Price Tiers

Tier	Input range	Output range	Models
Budget	$0.14-$0.80	$0.28-$4.50	DeepSeek-V4-Flash, Gemini Flash-Lite, Claude Haiku 3/3.5, GPT-5.4 mini
Mid	$1.00-$2.50	$3.00-$15.00	Haiku 4.5, Gemini 3 Flash, Gemini 2.5 Pro, Gemini 3.1 Pro, GPT-5.4
Premium	$3.00-$5.00	$15.00-$30.00	Sonnet 4.6, Opus 4.6/4.7, GPT-5.5
Discount anomaly	$0.435	$0.87	DeepSeek-V4-Pro during the 75% promotion

DeepSeek breaks the normal tier logic. V4-Flash sits in the budget tier, and discounted V4-Pro sits below many mid-tier models.

Cache and Batch Discounts

Cache reads are the first cost lever when input repeats. Batch is the first cost lever when latency does not matter.

Provider	Cache read example	Batch discount	Important caveat
OpenAI	GPT-5.4 cache read: $0.25/M	Batch API saves 50% on input and output	Data residency adds 10% where used.
Anthropic	Sonnet 4.6 cache read: $0.30/M	Batch API saves 50% on input and output	Cache writes cost 1.25x or 2x before reads.
DeepSeek	V4-Flash cache read: $0.0028/M	No general batch discount in the official price page checked	Cache-hit input is extremely cheap.
Google Gemini	Gemini 3.1 Pro cache read: $0.20/M under 200K	Batch and Flex cut Gemini 3.1 Pro to $1/$6 under 200K	Context cache storage has hourly pricing.

Effective input price with 70% cache reads:

Model	Base input	Cache read	Effective input at 70% cache
DeepSeek-V4-Flash	$0.14	$0.0028	$0.04396
Gemini 3.1 Flash-Lite	$0.25	$0.025	$0.0925
GPT-5.4 mini	$0.75	$0.075	$0.2775
Claude Sonnet 4.6	$3.00	$0.30	$1.11
GPT-5.4	$2.50	$0.25	$0.925
Claude Opus 4.7	$5.00	$0.50	$1.85
GPT-5.5	$5.00	$0.50	$1.85

Formula:

effective input = uncached_share * base_input + cached_share * cache_read

Long Context Rules

Long context changes the price table in non-obvious ways.

Provider/model	Long-context rule	Pricing effect
OpenAI GPT-5.4 mini	Pricing page says standard rates are for context lengths under 270K	Check detailed pricing for larger contexts.
Anthropic Sonnet 4.6	Full 1M context included at standard pricing	No current 2x long-context surcharge on official page.
Anthropic Opus 4.7/4.6	Full 1M context included at standard pricing	Strong for long-context premium workloads.
DeepSeek V4	1M context across official DeepSeek services	Cache hits matter heavily for repeated prefixes.
Gemini 3.1 Pro	Price doubles above 200K prompt tokens	$2/$12 becomes $4/$18 for long prompts.
Gemini 2.5 Pro	Price doubles above 200K prompt tokens	$1.25/$10 becomes $2.50/$15.

The key mistake is using one "per 1M tokens" number for every context size.

Cost per Task

These are illustrative costs using standard non-batch pricing and no cache unless stated.

Task	Token shape	DeepSeek V4 Flash	Gemini Flash-Lite	GPT-5.4 mini	Sonnet 4.6	GPT-5.4	Opus 4.7
Simple chat reply	500 in / 200 out	$0.000126	$0.000425	$0.001275	$0.004500	$0.004250	$0.007500
Support ticket answer	2K in / 800 out	$0.000504	$0.001700	$0.005100	$0.018000	$0.017000	$0.030000
RAG answer	8K in / 500 out	$0.001260	$0.002750	$0.008250	$0.031500	$0.027500	$0.052500
Code review	20K in / 3K out	$0.003640	$0.009500	$0.028500	$0.105000	$0.095000	$0.175000
Long document pass	900K in / 20K out	$0.131600	Not ideal	Not shown	$3.000000	Check context rule	$5.000000

At 10,000 simple chat replies per day, monthly cost is roughly:

Model	Monthly cost
DeepSeek-V4-Flash	$37.80
Gemini 3.1 Flash-Lite	$127.50
GPT-5.4 mini	$382.50
GPT-5.4	$1,275.00
Claude Sonnet 4.6	$1,350.00
Claude Opus 4.7	$2,250.00

This is why routing matters. A premium model can be 35x to 60x more expensive than a budget route for simple chat.

Monthly Cost Scenarios

Assume 100M input tokens and 30M output tokens per month.

Model	No cache	70% input cache	Batch only	Batch + 70% cache where available
DeepSeek-V4-Flash	$22.40	$12.80	Not listed	Not listed
Gemini 3.1 Flash-Lite	$70.00	$54.25	$35.00	$27.13
GPT-5.4 mini	$210.00	$162.75	$105.00	$81.38
Claude Haiku 4.5	$250.00	$187.00	$125.00	$93.50
Gemini 2.5 Pro	$425.00	$346.25	$212.50	$177.50
GPT-5.4	$700.00	$542.50	$350.00	$271.25
Claude Sonnet 4.6	$750.00	$561.00	$375.00	$280.50
Claude Opus 4.7	$1,250.00	$935.00	$625.00	$467.50
GPT-5.5	$1,400.00	$1,085.00	$700.00	$542.50

The best cost reduction stack is: cheaper model first, cache repeated input, batch offline work, then escalate only hard requests.

Best Model by Use Case

Use case	Best first pick	Escalation pick	Why
Simple support classification	DeepSeek-V4-Flash or Gemini Flash-Lite	Haiku 4.5	Price matters more than frontier reasoning.
User-facing support answer	GPT-5.4 mini or Sonnet 4.6	GPT-5.4 or Opus 4.7	Balance quality and cost.
Coding assistant	GPT-5.4 mini or Sonnet 4.6	GPT-5.5 or Opus 4.7	Use premium only for hard edits.
Long-context document review	Sonnet 4.6 or DeepSeek V4	Opus 4.7	1M context and cache reads matter.
Bulk summarization	Batch Gemini, Batch OpenAI, Batch Claude	Stronger batch model for failures	Batch cuts cost in half.
Low-cost reasoning agent	DeepSeek-V4-Flash	DeepSeek-V4-Pro discounted	Cost per agent step stays low.

Do not pick a global model. Pick a routing policy.

Gateway and Routing Effects

Direct provider pricing is only one part of the real bill.

Access path	Best for	Cost impact
Direct OpenAI / Anthropic / Google / DeepSeek	Native features and direct contracts	Lowest visible platform overhead, more integrations.
TokenMix.ai	Managed multi-model routing	Helps route by task and centralize access across model families.
OpenRouter	Broad model marketplace	Published 5.5% platform fee on credit purchases; strong for discovery.
LiteLLM self-hosted	Internal LLM platform teams	Direct provider prices plus infrastructure and engineering time.

TokenMix.ai fits when you want one OpenAI-compatible access layer across OpenAI, Claude, Gemini, DeepSeek, and other model families. Direct APIs fit when you need one provider's native features and direct billing.

How to Choose

If your main constraint is...	Choose
Lowest per-token cost	DeepSeek-V4-Flash first
OpenAI ecosystem compatibility	GPT-5.4 mini, GPT-5.4, then GPT-5.5 escalation
Claude quality at sane cost	Sonnet 4.6
Premium Claude coding/reasoning	Opus 4.7 after tokenizer test
Google multimodal and long-context ecosystem	Gemini 3.1 Pro or Gemini 3 Flash
Offline batch cost	Batch on OpenAI, Anthropic, or Gemini
Multi-model production	TokenMix.ai or another gateway

Final recommendation: build a three-tier router. Budget model first, balanced model second, premium model last. That is the only AI API pricing strategy that survives 2026 model churn.

FAQ

What is the cheapest AI API in 2026?

Among the major official providers in this comparison, DeepSeek-V4-Flash has the lowest serious text-model price at $0.14 input and $0.28 output per 1M tokens. Gemini 3.1 Flash-Lite is also low-cost at $0.25 input and $1.50 output.

What is the best AI API for production?

There is no single best model. GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Pro, and DeepSeek-V4-Flash each fit different production constraints. Most teams should route by workflow instead of using one model globally.

Which AI API has the best cache pricing?

DeepSeek-V4-Flash has the lowest cache-read input price in this table at $0.0028 per 1M tokens. Anthropic, OpenAI, and Gemini also offer cache-read pricing that can cut repeated input cost sharply.

Does Batch API reduce AI API cost?

Yes for providers that support it. OpenAI says Batch saves 50% on inputs and outputs, Anthropic lists 50% batch rates, and Gemini offers batch or flex pricing for several models. Use batch only when latency can wait.

Why is output price more important than input price?

Output often costs several times more than input. GPT-5.5 output is $30 per 1M tokens, Claude Opus output is $25, and Sonnet output is $15. Long, verbose answers can dominate the bill even with cached input.

Is DeepSeek cheaper than Claude and OpenAI?

Yes on official token pricing in this comparison. DeepSeek-V4-Flash is much cheaper than Claude Sonnet 4.6, Claude Opus 4.7, GPT-5.4, and GPT-5.5. Reliability, compliance, routing, and quality still need testing.

Should I use direct APIs or an LLM API gateway?

Use direct APIs when you need native provider features and direct billing. Use a gateway such as TokenMix.ai when your app needs multiple model families, routing, fallback, and centralized usage tracking.

How often should AI API pricing be updated?

For competitive pricing pages, update weekly or whenever a major provider changes a model, cache rule, batch rate, or context pricing rule. In 2026, stale pricing can become wrong within days.