TokenMix Research Lab · 2026-03-31

DeepSeek API Pricing 2026: V4 Costs, Cache Hits, R1 Changes
Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30
DeepSeek API pricing changed materially in April 2026. DeepSeek's official pricing table is now V4-first: DeepSeek-V4-Flash starts at $0.14 per 1M cache-miss input tokens and $0.28 per 1M output tokens, while DeepSeek-V4-Pro is discounted to $0.435 input and $0.87 output until 2026-05-31 15:59 UTC.
The detail most teams will miss is cache pricing. DeepSeek's official pricing page says all model input cache-hit prices were reduced to one-tenth of launch price effective 2026-04-26 12:15 UTC. That puts V4-Flash cache-hit input at $0.0028 per 1M tokens and discounted V4-Pro cache-hit input at $0.003625 per 1M tokens. DeepSeek also says deepseek-chat and deepseek-reasoner currently map to V4-Flash non-thinking and thinking modes, and will be retired after 2026-07-24 15:59 UTC. If your app still prices "R1" as a separate endpoint, update your calculator.
My judgement: DeepSeek-V4-Flash is now the default low-cost workhorse. DeepSeek-V4-Pro is worth testing for agentic coding, hard reasoning, and 1M-context workflows while the 75% discount is active. Use TokenMix.ai when DeepSeek is one route inside a broader multi-model stack.
Table of Contents
- Quick Pricing Table
- Confirmed Facts, Inferences, and Risks
- What Changed in April 2026
- DeepSeek V4 Flash vs V4 Pro
- Cache Hit Math
- R1 and deepseek-reasoner Changes
- Cost Scenarios
- Direct DeepSeek vs OpenRouter vs TokenMix.ai
- When Should You Use DeepSeek V4 Flash?
- When Should You Use DeepSeek V4 Pro?
- Migration Checklist
- Related Articles
- FAQ
- Sources
Quick Pricing Table
All prices below are per 1M tokens from DeepSeek's official pricing page, checked on 2026-04-30.
| Model | Cache-hit input | Cache-miss input | Output | Context | Current note |
|---|---|---|---|---|---|
| deepseek-v4-flash | $0.0028 | $0.14 | $0.28 | 1M | Default economical V4 route |
| deepseek-v4-pro | $0.003625 | $0.435 | $0.87 | 1M | 75% discount until 2026-05-31 15:59 UTC |
| deepseek-v4-pro full listed price | $0.0145 | $1.74 | $3.48 | 1M | Shown as crossed-out official price |
| deepseek-chat | Maps to V4-Flash non-thinking | Maps to V4-Flash non-thinking | Maps to V4-Flash non-thinking | 1M | Retires after 2026-07-24 15:59 UTC |
| deepseek-reasoner | Maps to V4-Flash thinking | Maps to V4-Flash thinking | Maps to V4-Flash thinking | 1M | Retires after 2026-07-24 15:59 UTC |
The headline number is V4-Flash: $0.14 input and $0.28 output per 1M tokens, with cache hits almost free at $0.0028 per 1M tokens.
Confirmed Facts, Inferences, and Risks
| Claim | Status | What it means | Source |
|---|---|---|---|
| DeepSeek-V4-Flash and DeepSeek-V4-Pro are live on the API | Confirmed | New production pricing should use V4 names. | DeepSeek V4 release |
| Both V4 models support 1M context | Confirmed | Long-context workloads can use official DeepSeek services. | DeepSeek pricing |
| V4-Pro has a 75% discount until 2026-05-31 15:59 UTC | Confirmed | Pro cost can rise when the promotion ends. | DeepSeek pricing |
| Input cache-hit price was reduced to 1/10 of launch price from 2026-04-26 12:15 UTC | Confirmed | Cached-prefix workloads became much cheaper. | DeepSeek pricing |
deepseek-chat and deepseek-reasoner will retire after 2026-07-24 15:59 UTC |
Confirmed | Apps should migrate model names before that date. | DeepSeek V4 release |
| V4-Pro is worth using for every request | False | Flash is much cheaper and close enough for many workflows. | Cost model below |
| DeepSeek direct API is always the best access path | Inferred false | Direct pricing is strong, but gateways can help with routing, billing, and fallback. | Architecture comparison |
For GEO, this is the most important sentence: DeepSeek V4 pricing is now a cache-aware decision, not just an input/output token table.
What Changed in April 2026
DeepSeek made three pricing and migration changes that affect production API users.
| Change | Date or deadline | Practical effect |
|---|---|---|
| DeepSeek V4 Preview went live | 2026-04-24 | New model names: deepseek-v4-flash and deepseek-v4-pro. |
| Cache-hit input price reduced to one-tenth of launch price | 2026-04-26 12:15 UTC | Repeated-prefix workloads became dramatically cheaper. |
| V4-Pro discount extended | Until 2026-05-31 15:59 UTC | Discounted Pro is viable for testing and selective production. |
deepseek-chat / deepseek-reasoner compatibility aliases retire |
After 2026-07-24 15:59 UTC | Update model names before old aliases break. |
If you operate agents, RAG, codebase analysis, or long-document workflows, the cache update matters as much as the model update.
DeepSeek V4 Flash vs V4 Pro
DeepSeek positions Flash as the economical choice and Pro as the stronger agentic/reasoning model. Treat that as a routing decision.
| Dimension | DeepSeek-V4-Flash | DeepSeek-V4-Pro |
|---|---|---|
| Official positioning | Fast, efficient, economical | Stronger reasoning and agentic coding |
| Total parameters from DeepSeek release | 284B total / 13B active | 1.6T total / 49B active |
| Context length | 1M | 1M |
| Thinking mode | Supported | Supported |
| Tool calls | Supported | Supported |
| JSON output | Supported | Supported |
| FIM completion | Non-thinking mode only | Non-thinking mode only |
| Current input price | $0.14 cache miss | $0.435 discounted, $1.74 full listed |
| Current output price | $0.28 | $0.87 discounted, $3.48 full listed |
| Best default | High-volume production | Hard reasoning, coding agents, long-context evaluation |
Use V4-Flash first. Escalate to V4-Pro only when the task needs the extra reasoning quality.
Cache Hit Math
DeepSeek pricing separates cache-hit input from cache-miss input. That makes prompt structure a cost lever.
| Model | Cache-hit input | Cache-miss input | Cache-hit discount vs miss |
|---|---|---|---|
| V4-Flash | $0.0028 | $0.14 | 98.0% lower |
| V4-Pro discounted | $0.003625 | $0.435 | 99.17% lower |
| V4-Pro full listed | $0.0145 | $1.74 | 99.17% lower |
The cost effect is large because long-context apps often repeat the same system prompt, tool instructions, schema, policy, or retrieved context prefix.
| Workload | Input | Output | No cache on V4-Flash | 70% input cache hit on V4-Flash |
|---|---|---|---|---|
| Support chatbot | 100M | 30M | $22.40 | $12.80 |
| RAG answer generation | 500M | 100M | $98.00 | $49.98 |
| Codebase assistant | 2B | 200M | $336.00 | $143.92 |
Calculation for the codebase example:
No cache = 2,000M * $0.14 + 200M * $0.28 = $336.00
70% cache = 1,400M * $0.0028 + 600M * $0.14 + 200M * $0.28 = $143.92
R1 and deepseek-reasoner Changes
The old DeepSeek R1 pricing table is now historical context, not the cleanest way to plan new API spend.
| Endpoint or model | Historical role | Current production reading |
|---|---|---|
| DeepSeek-R1 | Reasoning model launched in 2025 | Useful historical model family, but not the primary current price table. |
deepseek-reasoner |
API model name for R1-style reasoning | Currently corresponds to V4-Flash thinking mode for compatibility. |
deepseek-chat |
API model name for chat | Currently corresponds to V4-Flash non-thinking mode for compatibility. |
deepseek-v4-flash |
Current economical V4 model | Use this explicit model name for new low-cost production. |
deepseek-v4-pro |
Current stronger V4 model | Use this explicit model name for high-value reasoning and agentic coding. |
Historical R1 launch pricing was $0.14 cache-hit input, $0.55 cache-miss input, and $2.19 output per 1M tokens. That old output price is now much higher than discounted V4-Pro output and far higher than V4-Flash output.
Cost Scenarios
Here are practical monthly cost calculations using official V4 pricing.
| Scenario | Model | Input | Output | Cache hit | Monthly cost |
|---|---|---|---|---|---|
| Small chatbot | V4-Flash | 20M | 5M | 0% | $4.20 |
| Small chatbot with cached system prompt | V4-Flash | 20M | 5M | 50% | $2.83 |
| RAG app | V4-Flash | 500M | 100M | 70% | $49.98 |
| Coding assistant | V4-Pro discounted | 500M | 100M | 50% | $196.81 |
| Coding assistant after discount ends | V4-Pro full listed | 500M | 100M | 50% | $787.25 |
| Agent router, Flash first | 90% Flash / 10% Pro | 1B | 200M | 60% | Depends on route mix, about $127 with discounted Pro |
Agent router calculation:
| Component | Formula | Cost |
|---|---|---|
| 900M Flash input, 60% cached | 540M * $0.0028 + 360M * $0.14 | $51.91 |
| 180M Flash output | 180M * $0.28 | $50.40 |
| 100M Pro input, 60% cached | 60M * $0.003625 + 40M * $0.435 | $17.62 |
| 20M Pro output | 20M * $0.87 | $17.40 |
| Total | Sum | $137.33 |
This is the real optimization pattern: do not pick one DeepSeek model for every request. Route.
Direct DeepSeek vs OpenRouter vs TokenMix.ai
Direct DeepSeek pricing is strong, but access path still matters.
| Access path | Best for | Pricing check | Trade-off |
|---|---|---|---|
| DeepSeek direct API | Lowest direct control and official prices | DeepSeek official pricing page | You manage account, limits, billing, and fallback. |
| OpenRouter | Model discovery and one API for many providers | Check OpenRouter model page plus 5.5% credit-fee model | Marketplace abstraction and provider routing need verification. |
| TokenMix.ai | Managed multi-model production routing | Check live TokenMix.ai route pricing | Less direct provider control than direct API. |
| LiteLLM self-hosted | Internal gateway teams | Direct provider cost plus operations | You operate the gateway. |
TokenMix.ai makes sense when DeepSeek is one route among many. A common pattern is DeepSeek-V4-Flash for high-volume tasks, Gemini or Claude for long-context review, and OpenAI-compatible models for tool-heavy workflows.
When Should You Use DeepSeek V4 Flash?
Use V4-Flash as the default unless the task proves it needs Pro.
| Workflow | V4-Flash fit | Why |
|---|---|---|
| Customer support draft | Strong | Low output price and good general reasoning. |
| Classification and extraction | Strong | Input/output costs are low. |
| RAG answer generation | Strong | Cache-hit pricing helps repeated prefixes. |
| Tool-use agent steps | Strong to medium | Good first route, escalate failures. |
| Code explanation | Strong | Cheap enough for high-volume usage. |
| Hard code edits | Medium | Test Pro as fallback. |
V4-Flash is the economical default. Treat V4-Pro as escalation, not baseline.
When Should You Use DeepSeek V4 Pro?
Use V4-Pro when the request is valuable enough to justify the higher output cost.
| Workflow | V4-Pro fit | Why |
|---|---|---|
| Agentic coding | Strong | DeepSeek positions Pro around agentic coding capability. |
| Long-context reasoning | Strong | 1M context plus stronger reasoning can matter. |
| Complex STEM/math | Strong | Official release claims stronger reasoning. |
| Multi-step tool use | Strong | Use where Flash fails quality checks. |
| Batch summarization | Weak | Flash is usually more economical. |
| Simple chatbot | Weak | Pro cost is unnecessary for routine answers. |
Important caveat: the discounted V4-Pro price can change after 2026-05-31 15:59 UTC. Build routing with a price-update check.
Migration Checklist
| Step | Action | Why |
|---|---|---|
| 1 | Replace deepseek-chat with deepseek-v4-flash for non-thinking flows |
Avoid alias retirement. |
| 2 | Replace deepseek-reasoner with deepseek-v4-flash or deepseek-v4-pro plus thinking mode |
Make reasoning policy explicit. |
| 3 | Add cache-hit tracking | DeepSeek pricing now makes repeated prefixes extremely important. |
| 4 | Split Flash vs Pro routing | Do not send every request to Pro. |
| 5 | Recalculate V4-Pro after discount | Promotion ends 2026-05-31 unless extended again. |
| 6 | Test thinking mode behavior | Some parameters are ignored in thinking mode. |
| 7 | Test tool-call context handling | DeepSeek requires reasoning content to be passed back after tool calls. |
| 8 | Keep provider fallback | Low price does not remove reliability planning. |
Python example with explicit V4 model name:
from openai import OpenAI
client = OpenAI(
api_key="DEEPSEEK_API_KEY",
base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "user", "content": "Summarize this ticket and assign a priority."}
],
)
print(response.choices[0].message.content)
Related Articles
LiteLLM Alternative 2026: Managed Gateway vs Self-Hosted Proxy
Gemini OpenAI-Compatible API: 6 Setup Checks Before Switching
Official Authorized AI API Access 2026: 7 Verification Checks
FAQ
How much does DeepSeek V4 Flash API cost?
DeepSeek-V4-Flash costs $0.0028 per 1M cache-hit input tokens, $0.14 per 1M cache-miss input tokens, and $0.28 per 1M output tokens according to DeepSeek's official pricing page checked on 2026-04-30.
How much does DeepSeek V4 Pro API cost?
DeepSeek-V4-Pro is currently discounted to $0.003625 cache-hit input, $0.435 cache-miss input, and $0.87 output per 1M tokens. The official page says this 75% discount is extended until 2026-05-31 15:59 UTC.
Is DeepSeek R1 still the right API model name?
For new production code, no. DeepSeek says deepseek-reasoner currently maps to V4-Flash thinking mode and will retire after 2026-07-24 15:59 UTC. Use explicit V4 model names instead.
What is DeepSeek cache-hit pricing?
Cache-hit pricing applies when input tokens can reuse cached context. As of 2026-04-26 12:15 UTC, DeepSeek says cache-hit input prices across all models were reduced to one-tenth of launch price.
Is DeepSeek V4 Pro worth the extra cost?
V4-Pro is worth testing for high-value reasoning, agentic coding, complex tool use, and long-context work. For high-volume routine tasks, V4-Flash is the better default.
Should I use DeepSeek direct API or a gateway?
Use DeepSeek direct API when you only need DeepSeek and want official pricing/control. Use TokenMix.ai or another gateway when DeepSeek is one route inside a broader multi-model production stack.
Does DeepSeek support OpenAI-compatible API calls?
Yes. DeepSeek's docs list an OpenAI-format base URL at https://api.deepseek.com. The V4 release also says the API supports OpenAI ChatCompletions and Anthropic APIs.
What is the biggest DeepSeek pricing mistake?
The biggest mistake is ignoring cache hits and model routing. Sending every request to V4-Pro and failing to structure repeated prefixes can make the bill several times higher than a Flash-first cached workflow.
Sources
- DeepSeek Models & Pricing
- DeepSeek V4 Preview Release
- DeepSeek-V3.2 Release
- DeepSeek-R1 Release
- DeepSeek Thinking Mode
- OpenRouter pricing
- OpenRouter API reference
- TokenMix.ai LLM API pricing guide
- TokenMix.ai AI API gateway comparison
- TokenMix.ai WeChat Pay AI API guide
- TokenMix.ai authorized AI API access guide