TokenMix Research Lab · 2026-04-30

DeepSeek Cache Hit Pricing 2026: V4 98% Input Savings Guide
Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30
DeepSeek cache hit pricing is the hidden cost lever in V4. On V4 Flash, cache-hit input costs $0.0028 per 1M tokens, while cache-miss input costs $0.14. That is a 98% input-cost reduction.
According to DeepSeek's official Models & Pricing page, V4 Flash is $0.0028 cache-hit input, $0.14 cache-miss input, and $0.28 output per 1M tokens. V4 Pro is discounted to $0.003625 cache-hit input, $0.435 cache-miss input, and $0.87 output until 2026-05-31 15:59 UTC. DeepSeek's context caching guide says cache hits are reported through prompt_cache_hit_tokens and cache misses through prompt_cache_miss_tokens.
Table of Contents
- Quick Answer
- Confirmed Facts
- Cache Hit vs Cache Miss Pricing
- Cost Math
- How DeepSeek Cache Hits Work
- API Fields To Log
- Best Workloads For Cache Hits
- Routing Strategy
- Common Mistakes
- Final Recommendation
- FAQ
- Related Articles
- Sources
Quick Answer
| Question | Answer |
|---|---|
| What is DeepSeek cache hit pricing? | A lower input price for tokens that reuse cached context |
| V4 Flash cache-hit price | $0.0028 per 1M input tokens |
| V4 Flash cache-miss price | $0.14 per 1M input tokens |
| V4 Flash input savings | 98% on cache-hit input |
| V4 Pro discounted cache-hit price | $0.003625 per 1M input tokens |
| Fields to log | prompt_cache_hit_tokens, prompt_cache_miss_tokens |
| Best workloads | RAG, agents, repeated system prompts, long-document workflows |
The practical rule: if your DeepSeek workload repeats a long prefix, cache-hit pricing can matter more than the base model price.
Confirmed Facts
| Claim | Status | Source |
|---|---|---|
| DeepSeek V4 Flash cache-hit input is $0.0028/M | Confirmed | DeepSeek pricing page |
| DeepSeek V4 Flash cache-miss input is $0.14/M | Confirmed | DeepSeek pricing page |
| DeepSeek V4 Flash output is $0.28/M | Confirmed | DeepSeek pricing page |
| DeepSeek V4 Pro discounted cache-hit input is $0.003625/M | Confirmed | DeepSeek pricing page |
| DeepSeek V4 Pro discount ends 2026-05-31 15:59 UTC | Confirmed | DeepSeek pricing page |
| DeepSeek exposes hit and miss token fields | Confirmed | DeepSeek context caching docs |
| Cache hits are guaranteed for every repeated prefix | False | Cache is an optimization, not a contract |
Cache Hit vs Cache Miss Pricing
All prices are per 1M tokens, checked on 2026-04-30.
| Model | Cache-hit input | Cache-miss input | Output | Hit savings vs miss |
|---|---|---|---|---|
| DeepSeek V4 Flash | $0.0028 | $0.14 | $0.28 | 98.0% |
| DeepSeek V4 Pro discounted | $0.003625 | $0.435 | $0.87 | 99.17% |
| DeepSeek V4 Pro full listed | $0.0145 | $1.74 | $3.48 | 99.17% |
Two things matter. First, V4 Flash is already cheap on cache misses. Second, cache hits make repeated input almost free.
Cost Math
Scenario 1: 1M repeated input tokens
| Model | Cache miss cost | Cache hit cost | Savings |
|---|---|---|---|
| V4 Flash | $0.14 | $0.0028 | $0.1372 |
| V4 Pro discounted | $0.435 | $0.003625 | $0.431375 |
| V4 Pro full listed | $1.74 | $0.0145 | $1.7255 |
Scenario 2: 100M repeated input tokens
| Model | Cache miss cost | Cache hit cost | Savings |
|---|---|---|---|
| V4 Flash | $14.00 | $0.28 | $13.72 |
| V4 Pro discounted | $43.50 | $0.3625 | $43.1375 |
| V4 Pro full listed | $174.00 | $1.45 | $172.55 |
Scenario 3: RAG app with repeated system prompt
| Variable | Value |
|---|---|
| System and policy prefix | 8,000 tokens |
| Requests per month | 200,000 |
| Repeated input volume | 1.6B tokens |
| V4 Flash cache-miss cost | $224 |
| V4 Flash cache-hit cost | $4.48 |
| Input savings if fully cached | $219.52 |
The savings can look small per request and large per month. That is exactly why cache-hit accounting belongs in your usage dashboard.
How DeepSeek Cache Hits Work
DeepSeek context caching is designed for overlapping input prefixes.
| Pattern | Cache chance | Why |
|---|---|---|
| Stable system prompt | High | Same prefix repeats |
| Agent tool instructions | High | Tool schema and rules repeat |
| RAG with fixed policy wrapper | Medium to high | Wrapper repeats, retrieved chunks vary |
| Chat with changing first message | Lower | Prefix changes too early |
| Randomized prompt templates | Lower | Cacheable prefix becomes unstable |
If you want cache hits, keep the stable part of the prompt first. Put changing user content later.
API Fields To Log
DeepSeek's caching docs describe usage fields that separate cache hits from misses.
| Field | What to store | Why |
|---|---|---|
prompt_cache_hit_tokens |
Cached input tokens | Proves hit volume |
prompt_cache_miss_tokens |
Non-cached input tokens | Shows expensive input |
completion_tokens |
Output tokens | Output can still dominate |
total_tokens |
Total token count | Basic usage reconciliation |
| Model name | V4 Flash or V4 Pro | Prevents mixed-rate confusion |
| Workflow ID | App task name | Lets you rank cost by workflow |
Do not only log total tokens. Total tokens hide the difference between a $0.14/M input path and a $0.0028/M input path.
Best Workloads For Cache Hits
| Workload | Cache value | Reason |
|---|---|---|
| RAG answer generation | High | System prompt and citation rules repeat |
| Coding agents | High | Tool schemas and repo instructions repeat |
| Customer support | High | Policy and tone rules repeat |
| Long-document analysis | Medium to high | Large document prefixes may repeat across tasks |
| Batch classification | Medium | Template repeats, payload changes |
| Open-ended chat | Low to medium | User history varies quickly |
DeepSeek cache-hit pricing is most valuable when the app has a stable prefix and high request volume.
Routing Strategy
| Step | Route | Reason |
|---|---|---|
| 1 | V4 Flash with stable prompt prefix | Lowest cost default |
| 2 | V4 Flash with cache-hit monitoring | Confirm real savings |
| 3 | V4 Pro for failed eval cases | Escalate only hard tasks |
| 4 | GPT or Claude fallback | Handle provider or quality risk |
| 5 | TokenMix.ai gateway | Centralize routing, billing, and fallback |
TokenMix.ai fits when DeepSeek cache-hit savings are one part of a broader router. A common production stack is V4 Flash for cached low-cost traffic, V4 Pro for hard DeepSeek tasks, and GPT or Claude fallback for high-stakes failures.
Common Mistakes
| Mistake | Result | Fix |
|---|---|---|
| Counting all input at cache-miss price | Overstates cost | Split hit and miss tokens |
| Assuming every repeat becomes a hit | Understates cost risk | Track actual hit rate |
| Randomizing the prompt prefix | Reduces cache hits | Keep stable instructions first |
| Sending everything to V4 Pro | Pays extra for easy tasks | Use Flash-first routing |
| Ignoring output length | Misses another cost driver | Cap output tokens |
| Using old aliases forever | Migration risk | Use explicit V4 model names |
Final Recommendation
Use V4 Flash as the default DeepSeek route. Structure prompts so stable system instructions come first. Log cache-hit and cache-miss tokens separately. Escalate to V4 Pro only when evals show that Flash is not good enough.
For production, do not treat cache as magic. Treat it as a measured cost lever. If your hit rate is high, DeepSeek becomes much more economical than a simple input/output table suggests.
FAQ
What is DeepSeek cache hit pricing?
DeepSeek cache hit pricing is the discounted input rate for tokens that reuse cached context. For V4 Flash, cache-hit input is $0.0028 per 1M tokens.
How much cheaper is a DeepSeek cache hit?
For V4 Flash, a cache hit is 98% cheaper than a cache miss on input tokens. V4 Flash cache miss is $0.14/M, while cache hit is $0.0028/M.
Does DeepSeek cache output tokens?
No. Cache pricing applies to input tokens. Output tokens are still billed at the model output rate.
Which DeepSeek model has the best cache-hit price?
V4 Flash has the lowest absolute cache-hit input price at $0.0028 per 1M tokens. Discounted V4 Pro is slightly higher at $0.003625 per 1M tokens.
Is cache hit pricing guaranteed?
No. Cache hits depend on whether the system can reuse prior context. You should log actual hit and miss tokens instead of assuming a fixed hit rate.
How do I improve cache hit rate?
Keep stable system prompts, policy text, tool definitions, and formatting rules at the beginning of the prompt. Put changing user content after the stable prefix.
Should I use V4 Pro for cached workloads?
Only if V4 Pro wins your quality eval. V4 Pro is still more expensive than V4 Flash, even though its cache-hit input price is very low during the discount window.
Should I route DeepSeek through TokenMix.ai?
Use TokenMix.ai when you need DeepSeek plus other models, fallback routing, unified usage logs, and payment options such as Alipay or WeChat Pay.
Related Articles
- DeepSeek API Pricing 2026: V4 Costs, Cache Hits, R1 Changes
- DeepSeek API Tutorial 2026
- DeepSeek API Key 2026
- OpenAI vs DeepSeek Cost 2026
- DeepSeek V4 Review 2026
- LLM API Pricing 2026
- AI Gateway Caching 2026
- LLM API Gateway Guide