DeepSeek Cache Hit Pricing 2026: V4 98% Input Savings Guide
Last Updated: 2026-04-30 | Author: TokenMix Research Lab | Data checked: 2026-04-30
DeepSeek cache hit pricing is the hidden cost lever in V4. On V4 Flash, cache-hit input costs $0.0028 per 1M tokens, while cache-miss input costs $0.14. That is a 98% input-cost reduction.
According to DeepSeek's official Models & Pricing page, V4 Flash is $0.0028 cache-hit input, $0.14 cache-miss input, and $0.28 output per 1M tokens. V4 Pro is discounted to $0.003625 cache-hit input, $0.435 cache-miss input, and $0.87 output until 2026-05-31 15:59 UTC. DeepSeek's context caching guide says cache hits are reported through prompt_cache_hit_tokens and cache misses through prompt_cache_miss_tokens.
| Item | Value |
|---|---|
| Cache-hit pricing | A lower input price for tokens that reuse cached context |
| V4 Flash cache-hit price | $0.0028 per 1M input tokens |
| V4 Flash cache-miss price | $0.14 per 1M input tokens |
| V4 Flash input savings | 98% on cache-hit input |
| V4 Pro discounted cache-hit price | $0.003625 per 1M input tokens |
| Fields to log | prompt_cache_hit_tokens, prompt_cache_miss_tokens |
| Best workloads | RAG, agents, repeated system prompts, long-document workflows |
The practical rule: if your DeepSeek workload repeats a long prefix, cache-hit pricing can matter more than the base model price.
Confirmed Facts
| Claim | Status | Source |
|---|---|---|
| DeepSeek V4 Flash cache-hit input is $0.0028/M | Confirmed | DeepSeek pricing page |
| DeepSeek V4 Flash cache-miss input is $0.14/M | Confirmed | DeepSeek pricing page |
| DeepSeek V4 Flash output is $0.28/M | Confirmed | DeepSeek pricing page |
| DeepSeek V4 Pro discounted cache-hit input is $0.003625/M | Confirmed | DeepSeek pricing page |
| DeepSeek V4 Pro discount ends 2026-05-31 15:59 UTC | Confirmed | DeepSeek pricing page |
| DeepSeek exposes hit and miss token fields | Confirmed | DeepSeek context caching docs |
| Cache hits are guaranteed for every repeated prefix | False | Cache is an optimization, not a contract |
Cache Hit vs Cache Miss Pricing
All prices are per 1M tokens, checked on 2026-04-30.
| Model | Cache-hit input | Cache-miss input | Output | Hit savings vs miss |
|---|---|---|---|---|
| DeepSeek V4 Flash | $0.0028 | $0.14 | $0.28 | 98.0% |
| DeepSeek V4 Pro (discounted) | $0.003625 | $0.435 | $0.87 | 99.17% |
| DeepSeek V4 Pro (full listed) | $0.0145 | $1.74 | $3.48 | 99.17% |
Two things matter. First, V4 Flash is already cheap on cache misses. Second, cache hits make repeated input almost free.
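The savings percentages in the table above fall directly out of the hit and miss rates. A minimal sketch of that arithmetic, using the per-1M-token prices quoted in this article (the dict keys are illustrative names, not official model identifiers):

```python
# Per-1M-token input rates quoted above (USD). Keys are illustrative labels.
RATES = {
    "v4-flash":          {"hit": 0.0028,   "miss": 0.14},
    "v4-pro-discounted": {"hit": 0.003625, "miss": 0.435},
    "v4-pro-full":       {"hit": 0.0145,   "miss": 1.74},
}

def hit_savings_pct(model: str) -> float:
    """Percent input-cost reduction when tokens bill at the cache-hit rate."""
    r = RATES[model]
    return (1 - r["hit"] / r["miss"]) * 100

for name in RATES:
    print(f"{name}: {hit_savings_pct(name):.2f}% cheaper on cache hits")
```

Note that both V4 Pro rows come out to the same 99.17% because the discount scales hit and miss prices by the same factor.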
Cost Math
Scenario 1: 1M repeated input tokens
| Model | Cache miss cost | Cache hit cost | Savings |
|---|---|---|---|
| V4 Flash | $0.14 | $0.0028 | $0.1372 |
| V4 Pro discounted | $0.435 | $0.003625 | $0.431375 |
| V4 Pro full listed | $1.74 | $0.0145 | $1.7255 |
Scenario 2: 100M repeated input tokens
| Model | Cache miss cost | Cache hit cost | Savings |
|---|---|---|---|
| V4 Flash | $14.00 | $0.28 | $13.72 |
| V4 Pro discounted | $43.50 | $0.3625 | $43.1375 |
| V4 Pro full listed | $174.00 | $1.45 | $172.55 |
Scenario 3: RAG app with repeated system prompt
| Variable | Value |
|---|---|
| System and policy prefix | 8,000 tokens |
| Requests per month | 200,000 |
| Repeated input volume | 1.6B tokens |
| V4 Flash cache-miss cost | $224 |
| V4 Flash cache-hit cost | $4.48 |
| Input savings if fully cached | $219.52 |
The savings look negligible per request but substantial per month. That is exactly why cache-hit accounting belongs in your usage dashboard.
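The Scenario 3 arithmetic generalizes: multiply prefix size by request volume, then price the hit and miss fractions separately. A sketch, parameterized so you can plug in your own numbers (rates are the V4 Flash figures quoted above; in practice your hit fraction will sit somewhere between the two extremes shown):

```python
HIT_RATE_PER_M = 0.0028   # USD per 1M cache-hit input tokens (V4 Flash)
MISS_RATE_PER_M = 0.14    # USD per 1M cache-miss input tokens (V4 Flash)

def monthly_prefix_cost(prefix_tokens: int, requests: int, hit_fraction: float) -> float:
    """Monthly cost of the repeated prefix at a given cache-hit fraction (0.0 to 1.0)."""
    volume_m = prefix_tokens * requests / 1_000_000  # millions of tokens
    hit_cost = volume_m * hit_fraction * HIT_RATE_PER_M
    miss_cost = volume_m * (1 - hit_fraction) * MISS_RATE_PER_M
    return hit_cost + miss_cost

# 8,000-token prefix * 200,000 requests/month = 1.6B tokens of repeated input
print(round(monthly_prefix_cost(8_000, 200_000, 0.0), 2))  # all misses -> 224.0
print(round(monthly_prefix_cost(8_000, 200_000, 1.0), 2))  # all hits   -> 4.48
```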
How DeepSeek Cache Hits Work
DeepSeek context caching is designed for overlapping input prefixes.
| Pattern | Cache chance | Why |
|---|---|---|
| Stable system prompt | High | Same prefix repeats |
| Agent tool instructions | High | Tool schema and rules repeat |
| RAG with fixed policy wrapper | Medium to high | Wrapper repeats, retrieved chunks vary |
| Chat with changing first message | Lower | Prefix changes too early |
| Randomized prompt templates | Lower | Cacheable prefix becomes unstable |
If you want cache hits, keep the stable part of the prompt first. Put changing user content later.
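In message terms, that ordering rule looks like the sketch below. The system prompt text and message shapes are illustrative placeholders (DeepSeek's chat API follows the OpenAI-compatible messages format), not official values:

```python
# Stable, byte-identical prefix on every request -> eligible for prefix caching.
STABLE_SYSTEM_PROMPT = (
    "You are a support assistant.\n"
    "Policy rules: always cite the source document.\n"
    "Formatting rules: answer in markdown."
)

def build_messages(user_question: str, retrieved_chunks: list[str]) -> list[dict]:
    """Put the cacheable prefix first, the per-request content last."""
    return [
        # Stable prefix first: identical system prompt on every call.
        {"role": "system", "content": STABLE_SYSTEM_PROMPT},
        # Variable content last: retrieved context and the user's question.
        {"role": "user",
         "content": "\n\n".join(retrieved_chunks) + "\n\n" + user_question},
    ]
```

Anything that varies per request, even a timestamp or request ID injected into the system prompt, breaks the shared prefix and turns would-be hits into misses.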
API Fields To Log
DeepSeek's caching docs describe usage fields that separate cache hits from misses.
| Field | What to store | Why |
|---|---|---|
| prompt_cache_hit_tokens | Cached input tokens | Proves hit volume |
| prompt_cache_miss_tokens | Non-cached input tokens | Shows expensive input |
| completion_tokens | Output tokens | Output can still dominate |
| total_tokens | Total token count | Basic usage reconciliation |
| Model name | V4 Flash or V4 Pro | Prevents mixed-rate confusion |
| Workflow ID | App task name | Lets you rank cost by workflow |
Do not only log total tokens. Total tokens hide the difference between a $0.14/M input path and a $0.0028/M input path.
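A minimal sketch of per-request cost accounting from those usage fields. The hit/miss field names are the ones DeepSeek's context caching docs describe; the overall `usage` dict shape follows the OpenAI-compatible response format, and the example numbers are hypothetical, so verify field names against your actual responses:

```python
# V4 Flash rates quoted above, USD per 1M tokens.
FLASH = {"hit": 0.0028, "miss": 0.14, "out": 0.28}

def request_cost(usage: dict, rates: dict = FLASH) -> float:
    """Price one request by splitting input into cache-hit and cache-miss tokens."""
    hit = usage.get("prompt_cache_hit_tokens", 0)
    miss = usage.get("prompt_cache_miss_tokens", 0)
    out = usage.get("completion_tokens", 0)
    return (hit * rates["hit"] + miss * rates["miss"] + out * rates["out"]) / 1_000_000

# Hypothetical usage payload from one response.
usage = {
    "prompt_cache_hit_tokens": 7_800,
    "prompt_cache_miss_tokens": 1_200,
    "completion_tokens": 500,
    "total_tokens": 9_500,
}
print(f"${request_cost(usage):.6f}")
```

Logged this way, the dashboard can show hit rate and blended input price per workflow, not just a total token count that hides which rate applied.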
Best Workloads For Cache Hits
| Workload | Cache value | Reason |
|---|---|---|
| RAG answer generation | High | System prompt and citation rules repeat |
| Coding agents | High | Tool schemas and repo instructions repeat |
| Customer support | High | Policy and tone rules repeat |
| Long-document analysis | Medium to high | Large document prefixes may repeat across tasks |
| Batch classification | Medium | Template repeats, payload changes |
| Open-ended chat | Low to medium | User history varies quickly |
DeepSeek cache-hit pricing is most valuable when the app has a stable prefix and high request volume.
Routing Strategy
| Step | Route | Reason |
|---|---|---|
| 1 | V4 Flash with stable prompt prefix | Lowest cost default |
| 2 | V4 Flash with cache-hit monitoring | Confirm real savings |
| 3 | V4 Pro for failed eval cases | Escalate only hard tasks |
| 4 | GPT or Claude fallback | Handle provider or quality risk |
| 5 | TokenMix.ai gateway | Centralize routing, billing, and fallback |
TokenMix.ai fits when DeepSeek cache-hit savings are one part of a broader router. A common production stack is V4 Flash for cached low-cost traffic, V4 Pro for hard DeepSeek tasks, and GPT or Claude fallback for high-stakes failures.
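The escalation ladder above can be sketched as a routing function. Model names and the task flags are illustrative placeholders; in practice the flags would come from your eval results and task metadata, not hardcoded fields:

```python
def route(task: dict) -> str:
    """Flash-first routing: cheap cached default, Pro for known-hard tasks,
    external fallback for high-stakes or provider-risk traffic."""
    if task.get("high_stakes"):
        return "gpt-or-claude-fallback"    # step 4: provider or quality risk
    if task.get("failed_flash_eval"):
        return "deepseek-v4-pro"           # step 3: escalate only hard tasks
    return "deepseek-v4-flash"             # steps 1-2: lowest-cost default

print(route({}))                           # -> deepseek-v4-flash
print(route({"failed_flash_eval": True}))  # -> deepseek-v4-pro
print(route({"high_stakes": True}))        # -> gpt-or-claude-fallback
```

The key design choice is that escalation is driven by measured eval failures, not by guessing task difficulty up front.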
Common Mistakes
| Mistake | Result | Fix |
|---|---|---|
| Counting all input at cache-miss price | Overstates cost | Split hit and miss tokens |
| Assuming every repeat becomes a hit | Understates cost risk | Track actual hit rate |
| Randomizing the prompt prefix | Reduces cache hits | Keep stable instructions first |
| Sending everything to V4 Pro | Pays extra for easy tasks | Use Flash-first routing |
| Ignoring output length | Misses another cost driver | Cap output tokens |
| Using old aliases forever | Migration risk | Use explicit V4 model names |
Final Recommendation
Use V4 Flash as the default DeepSeek route. Structure prompts so stable system instructions come first. Log cache-hit and cache-miss tokens separately. Escalate to V4 Pro only when evals show that Flash is not good enough.
For production, do not treat cache as magic. Treat it as a measured cost lever. If your hit rate is high, DeepSeek becomes much more economical than a simple input/output table suggests.
FAQ
What is DeepSeek cache hit pricing?
DeepSeek cache hit pricing is the discounted input rate for tokens that reuse cached context. For V4 Flash, cache-hit input is $0.0028 per 1M tokens.
How much cheaper is a DeepSeek cache hit?
For V4 Flash, a cache hit is 98% cheaper than a cache miss on input tokens. V4 Flash cache miss is $0.14/M, while cache hit is $0.0028/M.
Does DeepSeek cache output tokens?
No. Cache pricing applies to input tokens. Output tokens are still billed at the model output rate.
Which DeepSeek model has the best cache-hit price?
V4 Flash has the lowest absolute cache-hit input price at $0.0028 per 1M tokens. Discounted V4 Pro is slightly higher at $0.003625 per 1M tokens.
Is cache hit pricing guaranteed?
No. Cache hits depend on whether the system can reuse prior context. You should log actual hit and miss tokens instead of assuming a fixed hit rate.
How do I improve cache hit rate?
Keep stable system prompts, policy text, tool definitions, and formatting rules at the beginning of the prompt. Put changing user content after the stable prefix.
Should I use V4 Pro for cached workloads?
Only if V4 Pro wins your quality eval. V4 Pro is still more expensive than V4 Flash, even though its cache-hit input price is very low during the discount window.
Should I route DeepSeek through TokenMix.ai?
Use TokenMix.ai when you need DeepSeek plus other models, fallback routing, unified usage logs, and payment options such as Alipay or WeChat Pay.