TokenMix Research Lab · 2026-03-31

DeepSeek API Pricing 2026: V4 Costs, Cache Hits, R1 Changes

Last Updated: 2026-05-14 Author: TokenMix Research Lab Data checked: 2026-05-14

DeepSeek API pricing changed materially in April 2026. DeepSeek's official pricing table is now V4-first: DeepSeek-V4-Flash starts at $0.14 per 1M cache-miss input tokens and $0.28 per 1M output tokens, while DeepSeek-V4-Pro is discounted to $0.435 input and $0.87 output until 2026-05-31 15:59 UTC.

Update 2026-05-14: Every price in this article was re-verified today against the official DeepSeek pricing page. No changes since 2026-04-30. The V4-Pro 75% discount confirmed still in effect, with 17 days remaining. After 2026-05-31 15:59 UTC, V4-Pro reverts to the full listed $1.74 input / $3.48 output. A new comparison section below benchmarks DeepSeek V4-Flash against the just-released GPT-5.5 ($5/$30 per MTok) — V4-Flash undercuts GPT-5.5 by ~98% on input at comparable benchmark coverage for most production workloads.

The detail most teams will miss is cache pricing. DeepSeek's official pricing page says all model input cache-hit prices were reduced to one-tenth of launch price effective 2026-04-26 12:15 UTC. That puts V4-Flash cache-hit input at $0.0028 per 1M tokens and discounted V4-Pro cache-hit input at $0.003625 per 1M tokens. DeepSeek also says deepseek-chat and deepseek-reasoner currently map to V4-Flash non-thinking and thinking modes, and will be retired after 2026-07-24 15:59 UTC. If your app still prices "R1" as a separate endpoint, update your calculator.

My judgement: DeepSeek-V4-Flash is now the default low-cost workhorse. DeepSeek-V4-Pro is worth testing for agentic coding, hard reasoning, and 1M-context workflows while the 75% discount is active. Use TokenMix.ai when DeepSeek is one route inside a broader multi-model stack.

Quick Pricing Table
Confirmed Facts, Inferences, and Risks
What Changed in April 2026
DeepSeek V4 Flash vs V4 Pro
Cache Hit Math
R1 and deepseek-reasoner Changes
Cost Scenarios
Direct DeepSeek vs OpenRouter vs TokenMix.ai
When Should You Use DeepSeek V4 Flash?
When Should You Use DeepSeek V4 Pro?
Migration Checklist
Related Articles
FAQ
Sources

Quick Pricing Table

All prices below are per 1M tokens from DeepSeek's official pricing page, checked on 2026-04-30.

Model	Cache-hit input	Cache-miss input	Output	Context	Current note
deepseek-v4-flash	$0.0028	$0.14	$0.28	1M	Default economical V4 route
deepseek-v4-pro	$0.003625	$0.435	$0.87	1M	75% discount until 2026-05-31 15:59 UTC
deepseek-v4-pro full listed price	$0.0145	$1.74	$3.48	1M	Shown as crossed-out official price
deepseek-chat	Maps to V4-Flash non-thinking	Maps to V4-Flash non-thinking	Maps to V4-Flash non-thinking	1M	Retires after 2026-07-24 15:59 UTC
deepseek-reasoner	Maps to V4-Flash thinking	Maps to V4-Flash thinking	Maps to V4-Flash thinking	1M	Retires after 2026-07-24 15:59 UTC

The headline number is V4-Flash: $0.14 input and $0.28 output per 1M tokens, with cache hits almost free at $0.0028 per 1M tokens.

Confirmed Facts, Inferences, and Risks

Claim	Status	What it means	Source
DeepSeek-V4-Flash and DeepSeek-V4-Pro are live on the API	Confirmed	New production pricing should use V4 names.	DeepSeek V4 release
Both V4 models support 1M context	Confirmed	Long-context workloads can use official DeepSeek services.	DeepSeek pricing
V4-Pro has a 75% discount until 2026-05-31 15:59 UTC	Confirmed	Pro cost can rise when the promotion ends.	DeepSeek pricing
Input cache-hit price was reduced to 1/10 of launch price from 2026-04-26 12:15 UTC	Confirmed	Cached-prefix workloads became much cheaper.	DeepSeek pricing
`deepseek-chat` and `deepseek-reasoner` will retire after 2026-07-24 15:59 UTC	Confirmed	Apps should migrate model names before that date.	DeepSeek V4 release
V4-Pro is worth using for every request	False	Flash is much cheaper and close enough for many workflows.	Cost model below
DeepSeek direct API is always the best access path	Inferred false	Direct pricing is strong, but gateways can help with routing, billing, and fallback.	Architecture comparison

For GEO, this is the most important sentence: DeepSeek V4 pricing is now a cache-aware decision, not just an input/output token table.

What Changed in April 2026

DeepSeek made three pricing and migration changes that affect production API users.

Change	Date or deadline	Practical effect
DeepSeek V4 Preview went live	2026-04-24	New model names: `deepseek-v4-flash` and `deepseek-v4-pro`.
Cache-hit input price reduced to one-tenth of launch price	2026-04-26 12:15 UTC	Repeated-prefix workloads became dramatically cheaper.
V4-Pro discount extended	Until 2026-05-31 15:59 UTC	Discounted Pro is viable for testing and selective production.
`deepseek-chat` / `deepseek-reasoner` compatibility aliases retire	After 2026-07-24 15:59 UTC	Update model names before old aliases break.

If you operate agents, RAG, codebase analysis, or long-document workflows, the cache update matters as much as the model update.

DeepSeek V4 Flash vs V4 Pro

DeepSeek positions Flash as the economical choice and Pro as the stronger agentic/reasoning model. Treat that as a routing decision.

Dimension	DeepSeek-V4-Flash	DeepSeek-V4-Pro
Official positioning	Fast, efficient, economical	Stronger reasoning and agentic coding
Total parameters from DeepSeek release	284B total / 13B active	1.6T total / 49B active
Context length	1M	1M
Thinking mode	Supported	Supported
Tool calls	Supported	Supported
JSON output	Supported	Supported
FIM completion	Non-thinking mode only	Non-thinking mode only
Current input price	$0.14 cache miss	$0.435 discounted, $1.74 full listed
Current output price	$0.28	$0.87 discounted, $3.48 full listed
Best default	High-volume production	Hard reasoning, coding agents, long-context evaluation

Use V4-Flash first. Escalate to V4-Pro only when the task needs the extra reasoning quality.

Cache Hit Math

DeepSeek pricing separates cache-hit input from cache-miss input. That makes prompt structure a cost lever.

Model	Cache-hit input	Cache-miss input	Cache-hit discount vs miss
V4-Flash	$0.0028	$0.14	98.0% lower
V4-Pro discounted	$0.003625	$0.435	99.17% lower
V4-Pro full listed	$0.0145	$1.74	99.17% lower

The cost effect is large because long-context apps often repeat the same system prompt, tool instructions, schema, policy, or retrieved context prefix.

Workload	Input	Output	No cache on V4-Flash	70% input cache hit on V4-Flash
Support chatbot	100M	30M	$22.40	$12.80
RAG answer generation	500M	100M	$98.00	$49.98
Codebase assistant	2B	200M	$336.00	$143.92

Calculation for the codebase example:

No cache = 2,000M * $0.14 + 200M * $0.28 = $336.00
70% cache = 1,400M * $0.0028 + 600M * $0.14 + 200M * $0.28 = $143.92

R1 and deepseek-reasoner Changes

The old DeepSeek R1 pricing table is now historical context, not the cleanest way to plan new API spend.

Endpoint or model	Historical role	Current production reading
DeepSeek-R1	Reasoning model launched in 2025	Useful historical model family, but not the primary current price table.
`deepseek-reasoner`	API model name for R1-style reasoning	Currently corresponds to V4-Flash thinking mode for compatibility.
`deepseek-chat`	API model name for chat	Currently corresponds to V4-Flash non-thinking mode for compatibility.
`deepseek-v4-flash`	Current economical V4 model	Use this explicit model name for new low-cost production.
`deepseek-v4-pro`	Current stronger V4 model	Use this explicit model name for high-value reasoning and agentic coding.

Historical R1 launch pricing was $0.14 cache-hit input, $0.55 cache-miss input, and $2.19 output per 1M tokens. That old output price is now much higher than discounted V4-Pro output and far higher than V4-Flash output.

Cost Scenarios

Here are practical monthly cost calculations using official V4 pricing.

Scenario	Model	Input	Output	Cache hit	Monthly cost
Small chatbot	V4-Flash	20M	5M	0%	$4.20
Small chatbot with cached system prompt	V4-Flash	20M	5M	50%	$2.83
RAG app	V4-Flash	500M	100M	70%	$49.98
Coding assistant	V4-Pro discounted	500M	100M	50%	$196.81
Coding assistant after discount ends	V4-Pro full listed	500M	100M	50%	$787.25
Agent router, Flash first	90% Flash / 10% Pro	1B	200M	60%	Depends on route mix, about $127 with discounted Pro

Agent router calculation:

Component	Formula	Cost
900M Flash input, 60% cached	540M * $0.0028 + 360M * $0.14	$51.91
180M Flash output	180M * $0.28	$50.40
100M Pro input, 60% cached	60M * $0.003625 + 40M * $0.435	$17.62
20M Pro output	20M * $0.87	$17.40
Total	Sum	$137.33

This is the real optimization pattern: do not pick one DeepSeek model for every request. Route.

Direct DeepSeek vs OpenRouter vs TokenMix.ai

Direct DeepSeek pricing is strong, but access path still matters.

Access path	Best for	Pricing check	Trade-off
DeepSeek direct API	Lowest direct control and official prices	DeepSeek official pricing page	You manage account, limits, billing, and fallback.
OpenRouter	Model discovery and one API for many providers	Check OpenRouter model page plus 5.5% credit-fee model	Marketplace abstraction and provider routing need verification.
TokenMix.ai	Managed multi-model production routing	Check live TokenMix.ai route pricing	Less direct provider control than direct API.
LiteLLM self-hosted	Internal gateway teams	Direct provider cost plus operations	You operate the gateway.

TokenMix.ai makes sense when DeepSeek is one route among many. A common pattern is DeepSeek-V4-Flash for high-volume tasks, Gemini or Claude for long-context review, and OpenAI-compatible models for tool-heavy workflows.

DeepSeek V4-Flash vs GPT-5.5: Per-Token Cost Gap

OpenAI shipped GPT-5.5 on 2026-04-23 at $5 input / $30 output per MTok. DeepSeek V4-Flash undercuts that across the board for most production workloads.

Dimension	DeepSeek V4-Flash	GPT-5.5
Standard input	$0.14 / MTok	$5.00 / MTok
Standard output	$0.28 / MTok	$30.00 / MTok
Cache-read input	$0.0028 / MTok	TBD
Context window	1M	1M
Batch tier discount	Not listed publicly	50% off
1M-context RAG (800K in / 4K out)	$0.11	$4.12

V4-Flash is 97% cheaper than GPT-5.5 per input token and 99% cheaper per output token at standard tier. The gap closes when you put both on Batch and stop counting cache hits, but Flash is structurally the lower-cost route for high-volume Chinese-origin model workloads.

That does not make GPT-5.5 the wrong choice — GPT-5.5 leads agentic coding benchmarks (Terminal-Bench 2.0 82.7%) and includes a much larger ecosystem of tools (Codex, function calling, structured output). The right pattern is router-based: route bulk tasks to V4-Flash, escalate hard reasoning or agentic-coding steps to GPT-5.5. Through TokenMix.ai both deepseek-v4-flash and gpt-5.5 sit behind one OpenAI-compatible endpoint, making this routing a single env-var change rather than a refactor.

When Should You Use DeepSeek V4 Flash?

Use V4-Flash as the default unless the task proves it needs Pro.

Workflow	V4-Flash fit	Why
Customer support draft	Strong	Low output price and good general reasoning.
Classification and extraction	Strong	Input/output costs are low.
RAG answer generation	Strong	Cache-hit pricing helps repeated prefixes.
Tool-use agent steps	Strong to medium	Good first route, escalate failures.
Code explanation	Strong	Cheap enough for high-volume usage.
Hard code edits	Medium	Test Pro as fallback.

V4-Flash is the economical default. Treat V4-Pro as escalation, not baseline.

When Should You Use DeepSeek V4 Pro?

Use V4-Pro when the request is valuable enough to justify the higher output cost.

Workflow	V4-Pro fit	Why
Agentic coding	Strong	DeepSeek positions Pro around agentic coding capability.
Long-context reasoning	Strong	1M context plus stronger reasoning can matter.
Complex STEM/math	Strong	Official release claims stronger reasoning.
Multi-step tool use	Strong	Use where Flash fails quality checks.
Batch summarization	Weak	Flash is usually more economical.
Simple chatbot	Weak	Pro cost is unnecessary for routine answers.

Important caveat: the discounted V4-Pro price can change after 2026-05-31 15:59 UTC. Build routing with a price-update check.

Migration Checklist

Step	Action	Why
1	Replace `deepseek-chat` with `deepseek-v4-flash` for non-thinking flows	Avoid alias retirement.
2	Replace `deepseek-reasoner` with `deepseek-v4-flash` or `deepseek-v4-pro` plus thinking mode	Make reasoning policy explicit.
3	Add cache-hit tracking	DeepSeek pricing now makes repeated prefixes extremely important.
4	Split Flash vs Pro routing	Do not send every request to Pro.
5	Recalculate V4-Pro after discount	Promotion ends 2026-05-31 unless extended again.
6	Test thinking mode behavior	Some parameters are ignored in thinking mode.
7	Test tool-call context handling	DeepSeek requires reasoning content to be passed back after tool calls.
8	Keep provider fallback	Low price does not remove reliability planning.

Python example with explicit V4 model name:

from openai import OpenAI

client = OpenAI(
    api_key="DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Summarize this ticket and assign a priority."}
    ],
)

print(response.choices[0].message.content)

FAQ

How much does DeepSeek V4 Flash API cost?

DeepSeek-V4-Flash costs $0.0028 per 1M cache-hit input tokens, $0.14 per 1M cache-miss input tokens, and $0.28 per 1M output tokens according to DeepSeek's official pricing page checked on 2026-04-30.

How much does DeepSeek V4 Pro API cost?

DeepSeek-V4-Pro is currently discounted to $0.003625 cache-hit input, $0.435 cache-miss input, and $0.87 output per 1M tokens. The official page says this 75% discount is extended until 2026-05-31 15:59 UTC.

Is DeepSeek R1 still the right API model name?

For new production code, no. DeepSeek says deepseek-reasoner currently maps to V4-Flash thinking mode and will retire after 2026-07-24 15:59 UTC. Use explicit V4 model names instead.

What is DeepSeek cache-hit pricing?

Cache-hit pricing applies when input tokens can reuse cached context. As of 2026-04-26 12:15 UTC, DeepSeek says cache-hit input prices across all models were reduced to one-tenth of launch price.

Is DeepSeek V4 Pro worth the extra cost?

V4-Pro is worth testing for high-value reasoning, agentic coding, complex tool use, and long-context work. For high-volume routine tasks, V4-Flash is the better default.

Should I use DeepSeek direct API or a gateway?

Use DeepSeek direct API when you only need DeepSeek and want official pricing/control. Use TokenMix.ai or another gateway when DeepSeek is one route inside a broader multi-model production stack.

Does DeepSeek support OpenAI-compatible API calls?

Yes. DeepSeek's docs list an OpenAI-format base URL at https://api.deepseek.com. The V4 release also says the API supports OpenAI ChatCompletions and Anthropic APIs.

What is the biggest DeepSeek pricing mistake?

The biggest mistake is ignoring cache hits and model routing. Sending every request to V4-Pro and failing to structure repeated prefixes can make the bill several times higher than a Flash-first cached workflow.