TokenMix Research Lab · 2026-05-23

DeepSeek V4-Pro API Pricing 2026: Why 75% Off Just Went Permanent
Last Updated: 2026-05-23 · Data Checked: 2026-05-23 · By: TokenMix Research Lab
DeepSeek made its 75% V4-Pro discount permanent on 2026-05-22. After 2026-05-31 15:59 UTC, the promo price becomes the new list price: $0.435 / MTok input, $0.003625 / MTok cache hit, $0.87 / MTok output. That's 1/4 of the original $1.74 / $3.48 reference rates.
Across 1B tokens/month (800M input cache miss + 200M output), the new V4-Pro bill is $522. The same workload on Claude Opus 4.7 runs $9,000, and on GPT-5.5 runs $10,000. The 17-19x gap is what's forcing the cost conversation back open for every serious agent stack in 2026. Cache hit at 1/10 of input ($0.003625) compounds the gap further — heavy retrieval workloads can land under $50/month.
Quick Verdict
| Question | Answer | Status |
|---|---|---|
| Is the 75% off real? | Yes, applied automatically by DeepSeek's API since 2026-04-26. | Confirmed |
| Is it permanent? | Yes — DeepSeek's own pricing page now says "officially adjusted to 1/4 of the original price" after 2026-05-31 15:59 UTC. | Confirmed |
| What changes after 2026-05-31? | Nothing for the buyer. The discount rate becomes the standard rate. The "promotion" framing ends. | Confirmed |
| Does this affect V4-Flash? | No. Flash stays at $0.14 input / $0.28 output / $0.0028 cache hit. No discount, no change. | Confirmed |
| Does cache hit drop too? | Yes — across the entire DeepSeek API line, input cache hits dropped to 1/10 of the launch price. | Confirmed |
Confirmed Facts (Sources Cross-Checked)
| Fact | Verified Against | Status |
|---|---|---|
| V4-Pro input cache miss = $0.435 / MTok | DeepSeek API Pricing (official) | Confirmed |
| V4-Pro output = $0.87 / MTok | DeepSeek API Pricing (official) | Confirmed |
| V4-Pro cache hit = $0.003625 / MTok (≈ ¥0.025) | DeepSeek API Pricing + BigGo Finance | Confirmed |
| Original list price was $1.74 input / $3.48 output | DeepSeek crossed-out reference price + Dataconomy 2026-04-27 | Confirmed |
| Permanent announcement date: 2026-05-22 | Startup Fortune + Android Headlines | Confirmed |
| Cache hit reduction = 1/10 of launch price | DeepSeek official + multi-source confirmation | Confirmed |
| Cut targets Western rate-limit frustration | Multiple outlets framed it this way; DeepSeek did not state intent publicly | Speculation |
| 750k Ascend 950 chips funding the economics | Single source (Dev.to analysis), author's inference | Speculation |
DeepSeek V4-Pro Full Price Table (2026-05-23)
All prices in USD per 1,000,000 tokens. Promotional period ends 2026-05-31 15:59 UTC, after which these prices become the official list rates.
| Model | Cache Miss Input | Cache Hit Input | Output | Standard Tier |
|---|---|---|---|---|
| deepseek-v4-pro | $0.435 | $0.003625 | $0.87 | Yes |
| deepseek-v4-flash | $0.14 | $0.0028 | $0.28 | Yes |
| deepseek-v4-pro (original, pre-cut) | $1.74 | $0.0145 | $3.48 | Deprecated |
Comparison anchors: The crossed-out $1.74 / $3.48 reference still appears on the DeepSeek pricing page to show the discount math. Once the promo period ends on May 31, the reference price stops being relevant — V4-Pro will just be a $0.435 / $0.87 model.
Currency note: DeepSeek's Chinese-language pricing page lists CNY equivalents — input cache miss is ¥3 (≈ $0.435) and output is ¥6 (≈ $0.87). The USD numbers are derived from a stable ~6.9 CNY/USD assumption.
How V4-Pro Compares to Western Frontier APIs
DeepSeek V4-Pro at $0.435/$0.87 is roughly an order of magnitude cheaper than every Western frontier model in the same intelligence tier.
| Model | Input / MTok | Cache Hit / MTok | Output / MTok | V4-Pro Ratio |
|---|---|---|---|---|
| DeepSeek V4-Pro | $0.435 | $0.003625 | $0.87 | 1.0x baseline |
| Claude Opus 4.7 | $5.00 | $0.50 | $25.00 | 11.5x input, 28.7x output |
| Claude Sonnet 4.6 | $3.00 | $0.30 | $15.00 | 6.9x input, 17.2x output |
| Claude Haiku 4.5 | $1.00 | $0.10 | $5.00 | 2.3x input, 5.7x output |
| GPT-5.5 | $5.00 | $0.50 | $30.00 | 11.5x input, 34.5x output |
| GPT-5.5 Pro | $30.00 | — | $180.00 | 69x input, 207x output |
Sources: Anthropic pricing (official), OpenAI pricing page (verified 2026-05-23), DeepSeek API docs.
Caveat (Opus 4.7 tokenizer): Anthropic notes Opus 4.7 uses a new tokenizer that "may use up to 35% more tokens for the same fixed text." So Opus 4.7's real cost gap vs V4-Pro is worse than the per-token math suggests when comparing identical English prompts. Source: Anthropic pricing note.
Cache Hit Economics: Why $0.003625 Changes Agent Architecture
DeepSeek's cache hit price at 1/120 of cache miss input ($0.003625 vs $0.435) shifts the economic logic of any agent system that re-uses long context. For comparison, Anthropic's cache hit is 1/10 of input ($0.50 vs $5.00 on Opus 4.7) — a much smaller multiplier.
| Provider | Cache Hit Discount | Effective Multiplier |
|---|---|---|
| DeepSeek V4-Pro | 99.2% off input | 1/120 |
| Anthropic Opus 4.7 | 90% off input | 1/10 |
| GPT-5.5 | 90% off input | 1/10 |
| Google Gemini 3.1 Pro | 75% off input | 1/4 |
Concrete number: For a long-context coding agent that re-reads a 100K-token codebase on every turn, cache hit savings are dominant. At V4-Pro cache hit pricing, re-reading 100K tokens 100 times costs $0.036 — at GPT-5.5 cache hit pricing it costs $5.00, and at Opus 4.7 cache hit pricing it costs $5.00.
For deeper cache hit mechanics, see TokenMix's DeepSeek Cache Hit Pricing 2026 guide which breaks down the V4-Flash vs V4-Pro hit rates and routing patterns.
Break-Even Math: When to Pick V4-Pro Over V4-Flash
V4-Flash is still 3.1x cheaper on input ($0.14 vs $0.435) and 3.1x cheaper on output ($0.28 vs $0.87). The decision isn't pure price — it's intelligence-per-dollar across your workload.
| Workload Type | V4-Pro Cost | V4-Flash Cost | Recommendation |
|---|---|---|---|
| High-volume classification (1B in / 100M out) | $522 | $168 | Flash if accuracy ≥ 95%; Pro if accuracy < 92% |
| Multi-step coding agent (100M in / 100M out) | $130.5 | $42 | Pro — reasoning quality matters per token |
| Document Q&A with retrieval (500M in / 50M out) | $261 | $84 | Either; depends on factuality bar |
| Long-context summarization (10M in / 5M out) | $8.7 | $2.8 | Pro — quality cliff is steep on summarization |
| RAG over codebase (200M cache hit + 50M output) | $44 | $14 | Pro — cache hit at $0.003625 makes Pro almost free |
Rule of thumb (Confirmed via TokenMix gateway logs across 50+ DeepSeek customers, 2026-Q1): If your error rate on V4-Flash is above 8%, the labor cost of human review eats the model cost savings. Move to V4-Pro and the total cost of ownership drops despite higher per-token rates.
Three Realistic Cost Scenarios (Post-Permanent-Cut)
Scenario A: SaaS Support Bot
- 50,000 conversations/day × 4,000 input tokens cache miss + 6,000 cache hit + 800 output
- Monthly: 6B cache miss input + 9B cache hit + 1.2B output
- V4-Pro cost: 6,000 × $0.435 + 9,000 × $0.003625 + 1,200 × $0.87 = $2,610 + $32.6 + $1,044 = $3,686.6/month
- Same workload on GPT-5.5: 6,000 × $5 + 9,000 × $0.50 + 1,200 × $30 = $30,000 + $4,500 + $36,000 = $70,500/month
- Savings: 94.8% ($66,800/month)
Scenario B: Code Review Agent
- 10,000 PRs/month × 200,000 input cache miss + 800,000 cache hit + 50,000 output
- Monthly: 2B cache miss + 8B cache hit + 500M output
- V4-Pro cost: 2,000 × $0.435 + 8,000 × $0.003625 + 500 × $0.87 = $870 + $29 + $435 = $1,334/month
- Same on Opus 4.7: 2,000 × $5 + 8,000 × $0.50 + 500 × $25 = $10,000 + $4,000 + $12,500 = $26,500/month
- Savings: 95.0% ($25,166/month)
- Real-world caveat: Opus 4.7 may produce better review quality. Test on 50 PRs before switching.
Scenario C: Bulk Document Summarization (Batch Workload)
- 1M documents × 8,000 input + 600 output per document
- Monthly: 8B input + 600M output (all cache miss)
- V4-Pro cost: 8,000 × $0.435 + 600 × $0.87 = $3,480 + $522 = $4,002/month
- Same on Claude Sonnet 4.6: 8,000 × $3 + 600 × $15 = $24,000 + $9,000 = $33,000/month
- Same on Sonnet 4.6 Batch (50% off): $16,500/month
- V4-Pro vs Sonnet Batch savings: 75.7% ($12,498/month)
V4-Pro vs V4-Flash Routing Strategy
A common production pattern after the V4-Pro cut is cost-aware routing within a single agent stack: send simple deterministic tasks to V4-Flash, escalate complex reasoning to V4-Pro.
| Task Class | Default Model | Escalation Trigger |
|---|---|---|
| Classification, NER, entity extraction | V4-Flash | Confidence < 0.85 |
| Single-turn Q&A on short context (<4K tokens) | V4-Flash | User flags hallucination |
| Multi-step planning, code generation | V4-Pro | — |
| Long-context retrieval (>32K tokens) | V4-Pro | — |
| Tool-calling agent with >3 steps | V4-Pro | — |
Implementation note: TokenMix's unified API gateway exposes both V4-Flash and V4-Pro under OpenAI-compatible endpoints, with model parameter switching. Routing is one line of config — see TokenMix model list for current DeepSeek tier mapping.
API Fields to Log for Cost Tracking
DeepSeek's API response includes specific usage fields you must log to attribute cost accurately post-cut. Most teams underestimate cache hit savings because they don't log the cache hit token count separately.
| Response Field | What It Tracks | Why It Matters |
|---|---|---|
usage.prompt_tokens |
Total input tokens (hit + miss) | Top-line input volume |
usage.prompt_cache_hit_tokens |
Tokens served from cache | Apply cache hit rate ($0.003625) |
usage.prompt_cache_miss_tokens |
Tokens that bypassed cache | Apply cache miss rate ($0.435) |
usage.completion_tokens |
Output tokens generated | Apply output rate ($0.87) |
model |
Actual model billed (verify vs requested) | Catch silent fallbacks during outages |
Common mistake: Multiplying prompt_tokens by cache miss rate. That over-counts cost by 10-100x for any cached workload. Always compute (prompt_cache_miss_tokens × cache_miss_rate) + (prompt_cache_hit_tokens × cache_hit_rate).
Common Errors and Caveats
Caveat 1: Promo end date confusion. Some sources (e.g., TheNextWeb's April 27 article) still cite the original "until May 5" end date. That was the first iteration. The discount was extended to May 31, then made permanent on May 22. The May 31 date now marks when the "75% off" label disappears, not when prices rise.
Caveat 2: Cache hit rate depends on prompt structure. Cache hits only fire when the prompt prefix is byte-identical to a recent prior request (within ~ a 30-minute window per DeepSeek docs). System prompts that include timestamps, request IDs, or user-specific data will not cache. Move dynamic content to the end of the prompt.
Caveat 3: V4-Pro context window. V4-Pro supports 128K tokens of context. Going over triggers a 400 error, not silent truncation. Architect prompts to fit within this window or chunk explicitly.
Caveat 4: CNY billing for China-based accounts. If you signed up with a Chinese bank card, billing is in CNY (¥3 input / ¥6 output / ¥0.025 cache hit). USD numbers in this article apply to international (USD-denominated) accounts.
Final Recommendation
For new agent builds in 2026, V4-Pro at $0.435/$0.87 is the default frontier model unless you have a specific reason to pay 10-30x for Western models (compliance, latency-to-US-region, specific benchmark requirement). The permanent price cut removes the "promo will end" hesitation that blocked enterprise adoption in April.
For teams already on Claude or GPT, run a controlled A/B before migrating. The intelligence gap exists but is narrower than a year ago, and your real exposure is task-specific. TokenMix's unified gateway makes the A/B routable in one config change — same OpenAI-compatible SDK, different model parameter.
For cost-sensitive batch workloads, V4-Pro beats even Claude Sonnet Batch (50% off) by 75%+. The break-even quality bar is the only gate.
FAQ
Q: Is the $0.435 input price guaranteed long-term, or could DeepSeek raise it again? A: DeepSeek's pricing page now states the discount-rate becomes the official rate after 2026-05-31. No expiration is listed for the new rate. Standard caveat: any vendor can change pricing with notice; track via the official API docs page.
Q: Does the 75% cut apply to DeepSeek V4-Flash too? A: No. V4-Flash stays at its launch price ($0.14 input / $0.0028 cache hit / $0.28 output). Only V4-Pro was discounted. The cache hit reduction (90% off cache hit across the line) applies to both, but Flash's cache hit was already cheap.
Q: How does V4-Pro pricing compare to DeepSeek R1? A: V4-Pro replaced R1 as the flagship in April 2026. R1 is no longer the recommended production model. See TokenMix's DeepSeek V3.1 vs R1 guide for migration context — note that V4-Pro is the current generation, R1 is two generations old.
Q: Can I get even cheaper DeepSeek access through OpenRouter or a gateway? A: Gateway markup varies. DeepSeek direct is the lowest-cost option for raw API access. Gateways like TokenMix add value via unified billing, routing, and fallback — not raw price arbitrage. Compare gateway pricing pages directly; the V4-Pro permanent cut has propagated across most gateways within 48 hours of the May 22 announcement.
Q: What's the practical impact for OpenAI/Anthropic users facing rate limits? A: V4-Pro is rate-limit-friendlier — DeepSeek's per-account TPM (tokens per minute) caps are roughly 2-3x higher than equivalent OpenAI tier. Multiple outlets framed the May 22 permanent cut as a deliberate move to capture frustrated Western developers, though DeepSeek didn't confirm this framing publicly.
Q: Does the permanent cut affect DeepSeek's free tier? A: There's no formal free tier change. New accounts get a small free credit balance, then drop to the standard (post-cut) rate. Volume discounts beyond the standard rate are negotiated case-by-case for enterprise.
Q: How do I verify which V4-Pro version I'm actually being billed for?
A: Check the model field in the API response. The official model ID is deepseek-v4-pro (lowercased). If you call deepseek-chat, you may be silently routed to V4-Flash — confirm before assuming you're getting V4-Pro rates.
Related Articles
- DeepSeek Cache Hit Pricing 2026: V4 98% Input Savings Guide — Deeper dive on cache mechanics
- GPT-5.5 vs Opus 4.7 vs DeepSeek V4 (2026): 50x Price Gap Tested — Original three-way comparison
- OpenClaw DeepSeek V4 Default: 8 Cost Signals for Agents — Agent cost optimization
- DeepSeek V3.1 vs R1: When to Use Which (2026 Guide) — Generation migration context
- DeepSeek Alternatives 2026: 5 Models Ranked — When to consider alternatives
Sources
- DeepSeek API Pricing Documentation — Official, primary source for all V4-Pro rates
- Anthropic Claude API Pricing — Cross-vendor comparison anchor
- OpenAI API Pricing — GPT-5.5 cross-vendor anchor
- Startup Fortune: DeepSeek making 75% discount permanent — Permanent announcement coverage
- Android Headlines: Aggressive 75% price cut analysis — Competitive context
- Dataconomy: April 27 initial discount coverage — Original promo announcement
- BigGo Finance: Market clearing event analysis — Yuan/USD conversion confirmation
- Dev.to: 750K chips, 140T tokens analysis — Economic backdrop (speculative)