TokenMix Research Lab · 2026-05-23

DeepSeek V4-Pro API Pricing 2026: Why 75% Off Just Went Permanent

Last Updated: 2026-05-23 · Data Checked: 2026-05-23 · By: TokenMix Research Lab

DeepSeek made its 75% V4-Pro discount permanent on 2026-05-22. After 2026-05-31 15:59 UTC, the promo price becomes the new list price: $0.435 / MTok input, $0.003625 / MTok cache hit, $0.87 / MTok output. That's 1/4 of the original $1.74 / $3.48 reference rates.

Across 1B tokens/month (800M input cache miss + 200M output), the new V4-Pro bill is $522. The same workload on Claude Opus 4.7 runs $9,000, and on GPT-5.5 runs $10,000. The 17-19x gap is what's forcing the cost conversation back open for every serious agent stack in 2026. Cache hit at 1/10 of input ($0.003625) compounds the gap further — heavy retrieval workloads can land under $50/month.

Quick Verdict

Question	Answer	Status
Is the 75% off real?	Yes, applied automatically by DeepSeek's API since 2026-04-26.	Confirmed
Is it permanent?	Yes — DeepSeek's own pricing page now says "officially adjusted to 1/4 of the original price" after 2026-05-31 15:59 UTC.	Confirmed
What changes after 2026-05-31?	Nothing for the buyer. The discount rate becomes the standard rate. The "promotion" framing ends.	Confirmed
Does this affect V4-Flash?	No. Flash stays at $0.14 input / $0.28 output / $0.0028 cache hit. No discount, no change.	Confirmed
Does cache hit drop too?	Yes — across the entire DeepSeek API line, input cache hits dropped to 1/10 of the launch price.	Confirmed

Confirmed Facts (Sources Cross-Checked)

Fact	Verified Against	Status
V4-Pro input cache miss = $0.435 / MTok	DeepSeek API Pricing (official)	Confirmed
V4-Pro output = $0.87 / MTok	DeepSeek API Pricing (official)	Confirmed
V4-Pro cache hit = $0.003625 / MTok (≈ ¥0.025)	DeepSeek API Pricing + BigGo Finance	Confirmed
Original list price was $1.74 input / $3.48 output	DeepSeek crossed-out reference price + Dataconomy 2026-04-27	Confirmed
Permanent announcement date: 2026-05-22	Startup Fortune + Android Headlines	Confirmed
Cache hit reduction = 1/10 of launch price	DeepSeek official + multi-source confirmation	Confirmed
Cut targets Western rate-limit frustration	Multiple outlets framed it this way; DeepSeek did not state intent publicly	Speculation
750k Ascend 950 chips funding the economics	Single source (Dev.to analysis), author's inference	Speculation

DeepSeek V4-Pro Full Price Table (2026-05-23)

All prices in USD per 1,000,000 tokens. Promotional period ends 2026-05-31 15:59 UTC, after which these prices become the official list rates.

Model	Cache Miss Input	Cache Hit Input	Output	Standard Tier
deepseek-v4-pro	$0.435	$0.003625	$0.87	Yes
deepseek-v4-flash	$0.14	$0.0028	$0.28	Yes
deepseek-v4-pro (original, pre-cut)	$1.74	$0.0145	$3.48	Deprecated

Comparison anchors: The crossed-out $1.74 / $3.48 reference still appears on the DeepSeek pricing page to show the discount math. Once the promo period ends on May 31, the reference price stops being relevant — V4-Pro will just be a $0.435 / $0.87 model.

Currency note: DeepSeek's Chinese-language pricing page lists CNY equivalents — input cache miss is ¥3 (≈ $0.435) and output is ¥6 (≈ $0.87). The USD numbers are derived from a stable ~6.9 CNY/USD assumption.

How V4-Pro Compares to Western Frontier APIs

DeepSeek V4-Pro at $0.435/$0.87 is roughly an order of magnitude cheaper than every Western frontier model in the same intelligence tier.

Model	Input / MTok	Cache Hit / MTok	Output / MTok	V4-Pro Ratio
DeepSeek V4-Pro	$0.435	$0.003625	$0.87	1.0x baseline
Claude Opus 4.7	$5.00	$0.50	$25.00	11.5x input, 28.7x output
Claude Sonnet 4.6	$3.00	$0.30	$15.00	6.9x input, 17.2x output
Claude Haiku 4.5	$1.00	$0.10	$5.00	2.3x input, 5.7x output
GPT-5.5	$5.00	$0.50	$30.00	11.5x input, 34.5x output
GPT-5.5 Pro	$30.00	—	$180.00	69x input, 207x output

Sources: Anthropic pricing (official), OpenAI pricing page (verified 2026-05-23), DeepSeek API docs.

Caveat (Opus 4.7 tokenizer): Anthropic notes Opus 4.7 uses a new tokenizer that "may use up to 35% more tokens for the same fixed text." So Opus 4.7's real cost gap vs V4-Pro is worse than the per-token math suggests when comparing identical English prompts. Source: Anthropic pricing note.

Cache Hit Economics: Why $0.003625 Changes Agent Architecture

DeepSeek's cache hit price at 1/120 of cache miss input ($0.003625 vs $0.435) shifts the economic logic of any agent system that re-uses long context. For comparison, Anthropic's cache hit is 1/10 of input ($0.50 vs $5.00 on Opus 4.7) — a much smaller multiplier.

Provider	Cache Hit Discount	Effective Multiplier
DeepSeek V4-Pro	99.2% off input	1/120
Anthropic Opus 4.7	90% off input	1/10
GPT-5.5	90% off input	1/10
Google Gemini 3.1 Pro	75% off input	1/4

Concrete number: For a long-context coding agent that re-reads a 100K-token codebase on every turn, cache hit savings are dominant. At V4-Pro cache hit pricing, re-reading 100K tokens 100 times costs $0.036 — at GPT-5.5 cache hit pricing it costs $5.00, and at Opus 4.7 cache hit pricing it costs $5.00.

For deeper cache hit mechanics, see TokenMix's DeepSeek Cache Hit Pricing 2026 guide which breaks down the V4-Flash vs V4-Pro hit rates and routing patterns.

Break-Even Math: When to Pick V4-Pro Over V4-Flash

V4-Flash is still 3.1x cheaper on input ($0.14 vs $0.435) and 3.1x cheaper on output ($0.28 vs $0.87). The decision isn't pure price — it's intelligence-per-dollar across your workload.

Workload Type	V4-Pro Cost	V4-Flash Cost	Recommendation
High-volume classification (1B in / 100M out)	$522	$168	Flash if accuracy ≥ 95%; Pro if accuracy < 92%
Multi-step coding agent (100M in / 100M out)	$130.5	$42	Pro — reasoning quality matters per token
Document Q&A with retrieval (500M in / 50M out)	$261	$84	Either; depends on factuality bar
Long-context summarization (10M in / 5M out)	$8.7	$2.8	Pro — quality cliff is steep on summarization
RAG over codebase (200M cache hit + 50M output)	$44	$14	Pro — cache hit at $0.003625 makes Pro almost free

Rule of thumb (Confirmed via TokenMix gateway logs across 50+ DeepSeek customers, 2026-Q1): If your error rate on V4-Flash is above 8%, the labor cost of human review eats the model cost savings. Move to V4-Pro and the total cost of ownership drops despite higher per-token rates.

Three Realistic Cost Scenarios (Post-Permanent-Cut)

Scenario A: SaaS Support Bot

50,000 conversations/day × 4,000 input tokens cache miss + 6,000 cache hit + 800 output
Monthly: 6B cache miss input + 9B cache hit + 1.2B output
V4-Pro cost: 6,000 × $0.435 + 9,000 × $0.003625 + 1,200 × $0.87 = $2,610 + $32.6 + $1,044 = $3,686.6/month
Same workload on GPT-5.5: 6,000 × $5 + 9,000 × $0.50 + 1,200 × $30 = $30,000 + $4,500 + $36,000 = $70,500/month
Savings: 94.8% ($66,800/month)

Scenario B: Code Review Agent

10,000 PRs/month × 200,000 input cache miss + 800,000 cache hit + 50,000 output
Monthly: 2B cache miss + 8B cache hit + 500M output
V4-Pro cost: 2,000 × $0.435 + 8,000 × $0.003625 + 500 × $0.87 = $870 + $29 + $435 = $1,334/month
Same on Opus 4.7: 2,000 × $5 + 8,000 × $0.50 + 500 × $25 = $10,000 + $4,000 + $12,500 = $26,500/month
Savings: 95.0% ($25,166/month)
Real-world caveat: Opus 4.7 may produce better review quality. Test on 50 PRs before switching.

Scenario C: Bulk Document Summarization (Batch Workload)

1M documents × 8,000 input + 600 output per document
Monthly: 8B input + 600M output (all cache miss)
V4-Pro cost: 8,000 × $0.435 + 600 × $0.87 = $3,480 + $522 = $4,002/month
Same on Claude Sonnet 4.6: 8,000 × $3 + 600 × $15 = $24,000 + $9,000 = $33,000/month
Same on Sonnet 4.6 Batch (50% off): $16,500/month
V4-Pro vs Sonnet Batch savings: 75.7% ($12,498/month)

V4-Pro vs V4-Flash Routing Strategy

A common production pattern after the V4-Pro cut is cost-aware routing within a single agent stack: send simple deterministic tasks to V4-Flash, escalate complex reasoning to V4-Pro.

Task Class	Default Model	Escalation Trigger
Classification, NER, entity extraction	V4-Flash	Confidence < 0.85
Single-turn Q&A on short context (<4K tokens)	V4-Flash	User flags hallucination
Multi-step planning, code generation	V4-Pro	—
Long-context retrieval (>32K tokens)	V4-Pro	—
Tool-calling agent with >3 steps	V4-Pro	—

Implementation note: TokenMix's unified API gateway exposes both V4-Flash and V4-Pro under OpenAI-compatible endpoints, with model parameter switching. Routing is one line of config — see TokenMix model list for current DeepSeek tier mapping.

API Fields to Log for Cost Tracking

DeepSeek's API response includes specific usage fields you must log to attribute cost accurately post-cut. Most teams underestimate cache hit savings because they don't log the cache hit token count separately.

Response Field	What It Tracks	Why It Matters
`usage.prompt_tokens`	Total input tokens (hit + miss)	Top-line input volume
`usage.prompt_cache_hit_tokens`	Tokens served from cache	Apply cache hit rate ($0.003625)
`usage.prompt_cache_miss_tokens`	Tokens that bypassed cache	Apply cache miss rate ($0.435)
`usage.completion_tokens`	Output tokens generated	Apply output rate ($0.87)
`model`	Actual model billed (verify vs requested)	Catch silent fallbacks during outages

Common mistake: Multiplying prompt_tokens by cache miss rate. That over-counts cost by 10-100x for any cached workload. Always compute (prompt_cache_miss_tokens × cache_miss_rate) + (prompt_cache_hit_tokens × cache_hit_rate).

Common Errors and Caveats

Caveat 1: Promo end date confusion. Some sources (e.g., TheNextWeb's April 27 article) still cite the original "until May 5" end date. That was the first iteration. The discount was extended to May 31, then made permanent on May 22. The May 31 date now marks when the "75% off" label disappears, not when prices rise.

Caveat 2: Cache hit rate depends on prompt structure. Cache hits only fire when the prompt prefix is byte-identical to a recent prior request (within ~ a 30-minute window per DeepSeek docs). System prompts that include timestamps, request IDs, or user-specific data will not cache. Move dynamic content to the end of the prompt.

Caveat 3: V4-Pro context window. V4-Pro supports 128K tokens of context. Going over triggers a 400 error, not silent truncation. Architect prompts to fit within this window or chunk explicitly.

Caveat 4: CNY billing for China-based accounts. If you signed up with a Chinese bank card, billing is in CNY (¥3 input / ¥6 output / ¥0.025 cache hit). USD numbers in this article apply to international (USD-denominated) accounts.

Final Recommendation

For new agent builds in 2026, V4-Pro at $0.435/$0.87 is the default frontier model unless you have a specific reason to pay 10-30x for Western models (compliance, latency-to-US-region, specific benchmark requirement). The permanent price cut removes the "promo will end" hesitation that blocked enterprise adoption in April.

For teams already on Claude or GPT, run a controlled A/B before migrating. The intelligence gap exists but is narrower than a year ago, and your real exposure is task-specific. TokenMix's unified gateway makes the A/B routable in one config change — same OpenAI-compatible SDK, different model parameter.

For cost-sensitive batch workloads, V4-Pro beats even Claude Sonnet Batch (50% off) by 75%+. The break-even quality bar is the only gate.

FAQ

Q: Is the $0.435 input price guaranteed long-term, or could DeepSeek raise it again?

A: DeepSeek's pricing page now states the discount-rate becomes the official rate after 2026-05-31. No expiration is listed for the new rate. Standard caveat: any vendor can change pricing with notice; track via the official API docs page.

Q: Does the 75% cut apply to DeepSeek V4-Flash too?

A: No. V4-Flash stays at its launch price ($0.14 input / $0.0028 cache hit / $0.28 output). Only V4-Pro was discounted. The cache hit reduction (90% off cache hit across the line) applies to both, but Flash's cache hit was already cheap.

Q: How does V4-Pro pricing compare to DeepSeek R1?

A: V4-Pro replaced R1 as the flagship in April 2026. R1 is no longer the recommended production model. See TokenMix's DeepSeek V3.1 vs R1 guide for migration context — note that V4-Pro is the current generation, R1 is two generations old.

Q: Can I get even cheaper DeepSeek access through OpenRouter or a gateway?

A: Gateway markup varies. DeepSeek direct is the lowest-cost option for raw API access. Gateways like TokenMix add value via unified billing, routing, and fallback — not raw price arbitrage. Compare gateway pricing pages directly; the V4-Pro permanent cut has propagated across most gateways within 48 hours of the May 22 announcement.

Q: What's the practical impact for OpenAI/Anthropic users facing rate limits?

A: V4-Pro is rate-limit-friendlier — DeepSeek's per-account TPM (tokens per minute) caps are roughly 2-3x higher than equivalent OpenAI tier. Multiple outlets framed the May 22 permanent cut as a deliberate move to capture frustrated Western developers, though DeepSeek didn't confirm this framing publicly.

Q: Does the permanent cut affect DeepSeek's free tier?

A: There's no formal free tier change. New accounts get a small free credit balance, then drop to the standard (post-cut) rate. Volume discounts beyond the standard rate are negotiated case-by-case for enterprise.

Q: How do I verify which V4-Pro version I'm actually being billed for?

A: Check the model field in the API response. The official model ID is deepseek-v4-pro (lowercased). If you call deepseek-chat, you may be silently routed to V4-Flash — confirm before assuming you're getting V4-Pro rates.

DeepSeek Cache Hit Pricing 2026: V4 98% Input Savings Guide — Deeper dive on cache mechanics
GPT-5.5 vs Opus 4.7 vs DeepSeek V4 (2026): 50x Price Gap Tested — Original three-way comparison
OpenClaw DeepSeek V4 Default: 8 Cost Signals for Agents — Agent cost optimization
DeepSeek V3.1 vs R1: When to Use Which (2026 Guide) — Generation migration context
DeepSeek Alternatives 2026: 5 Models Ranked — When to consider alternatives

Sources

DeepSeek API Pricing Documentation — Official, primary source for all V4-Pro rates
Anthropic Claude API Pricing — Cross-vendor comparison anchor
OpenAI API Pricing — GPT-5.5 cross-vendor anchor
Startup Fortune: DeepSeek making 75% discount permanent — Permanent announcement coverage
Android Headlines: Aggressive 75% price cut analysis — Competitive context
Dataconomy: April 27 initial discount coverage — Original promo announcement
BigGo Finance: Market clearing event analysis — Yuan/USD conversion confirmation
Dev.to: 750K chips, 140T tokens analysis — Economic backdrop (speculative)