TokenMix Research Lab · 2026-04-30

DeepSeek Cache Hit Pricing 2026: V4 98% Input Savings Guide

Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30

DeepSeek cache hit pricing is the hidden cost lever in V4. On V4 Flash, cache-hit input costs $0.0028 per 1M tokens, while cache-miss input costs $0.14. That is a 98% input-cost reduction.

According to DeepSeek's official Models & Pricing page, V4 Flash is $0.0028 cache-hit input, $0.14 cache-miss input, and $0.28 output per 1M tokens. V4 Pro is discounted to $0.003625 cache-hit input, $0.435 cache-miss input, and $0.87 output until 2026-05-31 15:59 UTC. DeepSeek's context caching guide says cache hits are reported through prompt_cache_hit_tokens and cache misses through prompt_cache_miss_tokens.

Quick Answer

| Question | Answer |
| --- | --- |
| What is DeepSeek cache hit pricing? | A lower input price for tokens that reuse cached context |
| V4 Flash cache-hit price | $0.0028 per 1M input tokens |
| V4 Flash cache-miss price | $0.14 per 1M input tokens |
| V4 Flash input savings | 98% on cache-hit input |
| V4 Pro discounted cache-hit price | $0.003625 per 1M input tokens |
| Fields to log | prompt_cache_hit_tokens, prompt_cache_miss_tokens |
| Best workloads | RAG, agents, repeated system prompts, long-document workflows |

The practical rule: if your DeepSeek workload repeats a long prefix, cache-hit pricing can matter more than the base model price.

Confirmed Facts

| Claim | Status | Source |
| --- | --- | --- |
| DeepSeek V4 Flash cache-hit input is $0.0028/M | Confirmed | DeepSeek pricing page |
| DeepSeek V4 Flash cache-miss input is $0.14/M | Confirmed | DeepSeek pricing page |
| DeepSeek V4 Flash output is $0.28/M | Confirmed | DeepSeek pricing page |
| DeepSeek V4 Pro discounted cache-hit input is $0.003625/M | Confirmed | DeepSeek pricing page |
| DeepSeek V4 Pro discount ends 2026-05-31 15:59 UTC | Confirmed | DeepSeek pricing page |
| DeepSeek exposes hit and miss token fields | Confirmed | DeepSeek context caching docs |
| Cache hits are guaranteed for every repeated prefix | False | Cache is an optimization, not a contract |

Cache Hit vs Cache Miss Pricing

All prices are per 1M tokens, checked on 2026-04-30.

| Model | Cache-hit input | Cache-miss input | Output | Hit savings vs miss |
| --- | --- | --- | --- | --- |
| DeepSeek V4 Flash | $0.0028 | $0.14 | $0.28 | 98.0% |
| DeepSeek V4 Pro (discounted) | $0.003625 | $0.435 | $0.87 | 99.17% |
| DeepSeek V4 Pro (full listed) | $0.0145 | $1.74 | $3.48 | 99.17% |

Two things matter. First, V4 Flash is already cheap on cache misses. Second, cache hits make repeated input almost free.

Cost Math

Scenario 1: 1M repeated input tokens

| Model | Cache-miss cost | Cache-hit cost | Savings |
| --- | --- | --- | --- |
| V4 Flash | $0.14 | $0.0028 | $0.1372 |
| V4 Pro (discounted) | $0.435 | $0.003625 | $0.431375 |
| V4 Pro (full listed) | $1.74 | $0.0145 | $1.7255 |

Scenario 2: 100M repeated input tokens

| Model | Cache-miss cost | Cache-hit cost | Savings |
| --- | --- | --- | --- |
| V4 Flash | $14.00 | $0.28 | $13.72 |
| V4 Pro (discounted) | $43.50 | $0.3625 | $43.1375 |
| V4 Pro (full listed) | $174.00 | $1.45 | $172.55 |

Scenario 3: RAG app with repeated system prompt

| Variable | Value |
| --- | --- |
| System and policy prefix | 8,000 tokens |
| Requests per month | 200,000 |
| Repeated input volume | 1.6B tokens |
| V4 Flash cache-miss cost | $224.00 |
| V4 Flash cache-hit cost | $4.48 |
| Input savings if fully cached | $219.52 |

The savings can look small per request and large per month. That is exactly why cache-hit accounting belongs in your usage dashboard.
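The scenarios above reduce to one formula: price each input token at the hit or miss rate, then add output. A minimal sketch, using the V4 Flash prices quoted above (checked 2026-04-30); the function name and structure are illustrative, not part of any DeepSeek SDK:

```python
# V4 Flash rates from the pricing table above, in dollars per 1M tokens.
FLASH_HIT_PER_M = 0.0028   # cache-hit input
FLASH_MISS_PER_M = 0.14    # cache-miss input
FLASH_OUT_PER_M = 0.28     # output

def request_cost(hit_tokens: int, miss_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request, splitting cached and uncached input."""
    return (hit_tokens * FLASH_HIT_PER_M
            + miss_tokens * FLASH_MISS_PER_M
            + output_tokens * FLASH_OUT_PER_M) / 1_000_000

# Scenario 3: an 8,000-token prefix, 200,000 requests/month, fully cached.
monthly = 200_000 * request_cost(8_000, 0, 0)
print(f"${monthly:.2f}")  # $4.48, matching the table above
```

Swap in the V4 Pro rates to reproduce the other rows.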

How DeepSeek Cache Hits Work

DeepSeek context caching is designed for overlapping input prefixes.

| Pattern | Cache chance | Why |
| --- | --- | --- |
| Stable system prompt | High | Same prefix repeats |
| Agent tool instructions | High | Tool schema and rules repeat |
| RAG with fixed policy wrapper | Medium to high | Wrapper repeats, retrieved chunks vary |
| Chat with changing first message | Lower | Prefix changes too early |
| Randomized prompt templates | Lower | Cacheable prefix becomes unstable |

If you want cache hits, keep the stable part of the prompt first. Put changing user content later.
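In code, that ordering rule looks like the sketch below. The OpenAI-style message format is an assumption here, and the system/tool strings are placeholders; the point is that the identical prefix comes first and variable content comes last:

```python
# Illustrative prompt layout for prefix caching (message format assumed
# OpenAI-style; strings are placeholders, not real policy text).
STABLE_SYSTEM = "You are a support agent. Follow policy v12."   # never changes
TOOL_RULES = "Tools: search(query), escalate(ticket_id)."        # rarely changes

def build_messages(user_query: str, retrieved_chunks: list[str]) -> list[dict]:
    return [
        # Cache-friendly: the stable prefix is byte-identical across requests.
        {"role": "system", "content": STABLE_SYSTEM + "\n" + TOOL_RULES},
        # Variable content goes last so it cannot break the shared prefix.
        {"role": "user", "content": "\n".join(retrieved_chunks) + "\n\n" + user_query},
    ]
```

Randomizing or timestamping anything inside the system message would invalidate the shared prefix on every request.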

API Fields To Log

DeepSeek's caching docs describe usage fields that separate cache hits from misses.

| Field | What to store | Why |
| --- | --- | --- |
| prompt_cache_hit_tokens | Cached input tokens | Proves hit volume |
| prompt_cache_miss_tokens | Non-cached input tokens | Shows expensive input |
| completion_tokens | Output tokens | Output can still dominate |
| total_tokens | Total token count | Basic usage reconciliation |
| Model name | V4 Flash or V4 Pro | Prevents mixed-rate confusion |
| Workflow ID | App task name | Lets you rank cost by workflow |

Do not only log total tokens. Total tokens hide the difference between a $0.14/M input path and a $0.0028/M input path.
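A per-request logging sketch: the field names prompt_cache_hit_tokens and prompt_cache_miss_tokens come from DeepSeek's caching docs as cited above, but the dict-shaped usage payload and the record layout are assumptions for illustration:

```python
# Sketch of per-request usage logging. Field names follow DeepSeek's caching
# docs; the usage-dict shape and record layout are assumed for illustration.
def log_usage(usage: dict, model: str, workflow_id: str) -> dict:
    hits = usage.get("prompt_cache_hit_tokens", 0)
    misses = usage.get("prompt_cache_miss_tokens", 0)
    return {
        "model": model,                 # prevents mixed-rate confusion
        "workflow_id": workflow_id,     # lets you rank cost by workflow
        "hit_tokens": hits,
        "miss_tokens": misses,
        "completion_tokens": usage.get("completion_tokens", 0),
        # Hit rate over input tokens: the number totals alone would hide.
        "hit_rate": hits / (hits + misses) if (hits + misses) else 0.0,
    }
```

Ship these records to the same dashboard as your spend data so hit rate and cost move together.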

Best Workloads For Cache Hits

| Workload | Cache value | Reason |
| --- | --- | --- |
| RAG answer generation | High | System prompt and citation rules repeat |
| Coding agents | High | Tool schemas and repo instructions repeat |
| Customer support | High | Policy and tone rules repeat |
| Long-document analysis | Medium to high | Large document prefixes may repeat across tasks |
| Batch classification | Medium | Template repeats, payload changes |
| Open-ended chat | Low to medium | User history varies quickly |

DeepSeek cache-hit pricing is most valuable when the app has a stable prefix and high request volume.

Routing Strategy

| Step | Route | Reason |
| --- | --- | --- |
| 1 | V4 Flash with stable prompt prefix | Lowest-cost default |
| 2 | V4 Flash with cache-hit monitoring | Confirm real savings |
| 3 | V4 Pro for failed eval cases | Escalate only hard tasks |
| 4 | GPT or Claude fallback | Handle provider or quality risk |
| 5 | TokenMix.ai gateway | Centralize routing, billing, and fallback |

TokenMix.ai fits when DeepSeek cache-hit savings are one part of a broader router. A common production stack is V4 Flash for cached low-cost traffic, V4 Pro for hard DeepSeek tasks, and GPT or Claude fallback for high-stakes failures.
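The escalation logic in that stack can be sketched in a few lines. The route names and the two predicates are placeholders; in production the decision would hinge on your own eval results, not hardcoded flags:

```python
# Minimal Flash-first routing sketch. Route names and predicates are
# placeholders, not real model identifiers or gateway APIs.
def pick_route(failed_flash_eval: bool = False, high_stakes: bool = False) -> str:
    if high_stakes:
        return "gpt-or-claude-fallback"   # non-DeepSeek fallback for risky cases
    if failed_flash_eval:
        return "deepseek-v4-pro"          # escalate only tasks Flash failed
    return "deepseek-v4-flash"            # cheap, cache-friendly default
```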

Common Mistakes

| Mistake | Result | Fix |
| --- | --- | --- |
| Counting all input at cache-miss price | Overstates cost | Split hit and miss tokens |
| Assuming every repeat becomes a hit | Understates cost risk | Track actual hit rate |
| Randomizing the prompt prefix | Reduces cache hits | Keep stable instructions first |
| Sending everything to V4 Pro | Pays extra for easy tasks | Use Flash-first routing |
| Ignoring output length | Misses another cost driver | Cap output tokens |
| Using old aliases forever | Migration risk | Use explicit V4 model names |

Final Recommendation

Use V4 Flash as the default DeepSeek route. Structure prompts so stable system instructions come first. Log cache-hit and cache-miss tokens separately. Escalate to V4 Pro only when evals show that Flash is not good enough.

For production, do not treat cache as magic. Treat it as a measured cost lever. If your hit rate is high, DeepSeek becomes much more economical than a simple input/output table suggests.
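"Much more economical" is easy to quantify: the effective input price is a weighted average of the hit and miss rates. A quick sensitivity check, pure arithmetic using the V4 Flash prices quoted above:

```python
# Effective V4 Flash input price per 1M tokens as a function of cache hit rate.
def effective_input_price(hit_rate: float, hit: float = 0.0028, miss: float = 0.14) -> float:
    return hit_rate * hit + (1 - hit_rate) * miss

# Print the blended $/1M input price at a few hit rates.
for r in (0.0, 0.5, 0.9, 0.99):
    print(f"hit rate {r:.0%}: ${effective_input_price(r):.4f}/M")
```

At a 90% hit rate the blended input price is already well under $0.02/M, which is the gap a flat input/output table never shows.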

FAQ

What is DeepSeek cache hit pricing?

DeepSeek cache hit pricing is the discounted input rate for tokens that reuse cached context. For V4 Flash, cache-hit input is $0.0028 per 1M tokens.

How much cheaper is a DeepSeek cache hit?

For V4 Flash, a cache hit is 98% cheaper than a cache miss on input tokens. V4 Flash cache miss is $0.14/M, while cache hit is $0.0028/M.

Does DeepSeek cache output tokens?

No. Cache pricing applies to input tokens. Output tokens are still billed at the model output rate.

Which DeepSeek model has the best cache-hit price?

V4 Flash has the lowest absolute cache-hit input price at $0.0028 per 1M tokens. Discounted V4 Pro is slightly higher at $0.003625 per 1M tokens.

Is cache hit pricing guaranteed?

No. Cache hits depend on whether the system can reuse prior context. You should log actual hit and miss tokens instead of assuming a fixed hit rate.

How do I improve cache hit rate?

Keep stable system prompts, policy text, tool definitions, and formatting rules at the beginning of the prompt. Put changing user content after the stable prefix.

Should I use V4 Pro for cached workloads?

Only if V4 Pro wins your quality eval. V4 Pro is still more expensive than V4 Flash, even though its cache-hit input price is very low during the discount window.

Should I route DeepSeek through TokenMix.ai?

Use TokenMix.ai when you need DeepSeek plus other models, fallback routing, unified usage logs, and payment options such as Alipay or WeChat Pay.
