TokenMix Research Lab · 2026-03-31

DeepSeek API Pricing 2026: V4 Costs, Cache Hits, R1 Changes

Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30

DeepSeek API pricing changed materially in April 2026. DeepSeek's official pricing table is now V4-first: DeepSeek-V4-Flash starts at $0.14 per 1M cache-miss input tokens and $0.28 per 1M output tokens, while DeepSeek-V4-Pro is discounted to $0.435 per 1M cache-miss input tokens and $0.87 per 1M output tokens until 2026-05-31 15:59 UTC.

The detail most teams will miss is cache pricing. DeepSeek's official pricing page says all model input cache-hit prices were reduced to one-tenth of launch price effective 2026-04-26 12:15 UTC. That puts V4-Flash cache-hit input at $0.0028 per 1M tokens and discounted V4-Pro cache-hit input at $0.003625 per 1M tokens. DeepSeek also says deepseek-chat and deepseek-reasoner currently map to V4-Flash non-thinking and thinking modes, and will be retired after 2026-07-24 15:59 UTC. If your app still prices "R1" as a separate endpoint, update your calculator.

My judgement: DeepSeek-V4-Flash is now the default low-cost workhorse. DeepSeek-V4-Pro is worth testing for agentic coding, hard reasoning, and 1M-context workflows while the 75% discount is active. Use TokenMix.ai when DeepSeek is one route inside a broader multi-model stack.

Quick Pricing Table

All prices below are per 1M tokens from DeepSeek's official pricing page, checked on 2026-04-30.

| Model | Cache-hit input | Cache-miss input | Output | Context | Current note |
|---|---|---|---|---|---|
| deepseek-v4-flash | $0.0028 | $0.14 | $0.28 | 1M | Default economical V4 route |
| deepseek-v4-pro | $0.003625 | $0.435 | $0.87 | 1M | 75% discount until 2026-05-31 15:59 UTC |
| deepseek-v4-pro (full listed price) | $0.0145 | $1.74 | $3.48 | 1M | Shown as crossed-out official price |
| deepseek-chat | Maps to V4-Flash non-thinking | Maps to V4-Flash non-thinking | Maps to V4-Flash non-thinking | 1M | Retires after 2026-07-24 15:59 UTC |
| deepseek-reasoner | Maps to V4-Flash thinking | Maps to V4-Flash thinking | Maps to V4-Flash thinking | 1M | Retires after 2026-07-24 15:59 UTC |

The headline number is V4-Flash: $0.14 cache-miss input and $0.28 output per 1M tokens, with cache hits almost free at $0.0028 per 1M tokens.

Confirmed Facts, Inferences, and Risks

| Claim | Status | What it means | Source |
|---|---|---|---|
| DeepSeek-V4-Flash and DeepSeek-V4-Pro are live on the API | Confirmed | New production pricing should use V4 names. | DeepSeek V4 release |
| Both V4 models support 1M context | Confirmed | Long-context workloads can use official DeepSeek services. | DeepSeek pricing |
| V4-Pro has a 75% discount until 2026-05-31 15:59 UTC | Confirmed | Pro cost can rise when the promotion ends. | DeepSeek pricing |
| Input cache-hit price was reduced to 1/10 of launch price from 2026-04-26 12:15 UTC | Confirmed | Cached-prefix workloads became much cheaper. | DeepSeek pricing |
| deepseek-chat and deepseek-reasoner will retire after 2026-07-24 15:59 UTC | Confirmed | Apps should migrate model names before that date. | DeepSeek V4 release |
| V4-Pro is worth using for every request | False | Flash is much cheaper and close enough for many workflows. | Cost model below |
| DeepSeek direct API is always the best access path | Inferred false | Direct pricing is strong, but gateways can help with routing, billing, and fallback. | Architecture comparison |

The most important point: DeepSeek V4 pricing is now a cache-aware decision, not just an input/output token table.

What Changed in April 2026

DeepSeek made four pricing and migration changes that affect production API users.

| Change | Date or deadline | Practical effect |
|---|---|---|
| DeepSeek V4 Preview went live | 2026-04-24 | New model names: deepseek-v4-flash and deepseek-v4-pro. |
| Cache-hit input price reduced to one-tenth of launch price | 2026-04-26 12:15 UTC | Repeated-prefix workloads became dramatically cheaper. |
| V4-Pro discount extended | Until 2026-05-31 15:59 UTC | Discounted Pro is viable for testing and selective production. |
| deepseek-chat / deepseek-reasoner compatibility aliases retire | After 2026-07-24 15:59 UTC | Update model names before old aliases break. |

If you operate agents, RAG, codebase analysis, or long-document workflows, the cache update matters as much as the model update.

DeepSeek V4 Flash vs V4 Pro

DeepSeek positions Flash as the economical choice and Pro as the stronger agentic/reasoning model. Treat that as a routing decision.

| Dimension | DeepSeek-V4-Flash | DeepSeek-V4-Pro |
|---|---|---|
| Official positioning | Fast, efficient, economical | Stronger reasoning and agentic coding |
| Parameters (per DeepSeek release) | 284B total / 13B active | 1.6T total / 49B active |
| Context length | 1M | 1M |
| Thinking mode | Supported | Supported |
| Tool calls | Supported | Supported |
| JSON output | Supported | Supported |
| FIM completion | Non-thinking mode only | Non-thinking mode only |
| Current input price (per 1M tokens) | $0.14 cache miss | $0.435 discounted, $1.74 full listed |
| Current output price (per 1M tokens) | $0.28 | $0.87 discounted, $3.48 full listed |
| Best default | High-volume production | Hard reasoning, coding agents, long-context evaluation |

Use V4-Flash first. Escalate to V4-Pro only when the task needs the extra reasoning quality.
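The Flash-first rule can be sketched as a small escalation wrapper. This is a minimal illustration, not DeepSeek's or TokenMix's routing logic: `answer_with_escalation`, `call_model`, and `passes_check` are hypothetical names, and the quality check is whatever validation your application already runs (schema validation, tests, a grader).

```python
from typing import Callable

FLASH = "deepseek-v4-flash"
PRO = "deepseek-v4-pro"


def answer_with_escalation(
    prompt: str,
    call_model: Callable[[str, str], str],
    passes_check: Callable[[str], bool],
) -> tuple[str, str]:
    """Try Flash first; retry on Pro only if the Flash answer fails the check."""
    draft = call_model(FLASH, prompt)
    if passes_check(draft):
        return FLASH, draft
    return PRO, call_model(PRO, prompt)


# Stand-in model call so the sketch runs without network access:
# Flash returns an empty answer for prompts containing "hard".
def fake_call(model: str, prompt: str) -> str:
    return "" if model == FLASH and "hard" in prompt else f"{model}: ok"


# bool() serves as a trivial check here: an empty answer fails.
easy = answer_with_escalation("easy ticket", fake_call, bool)
hard = answer_with_escalation("hard refactor", fake_call, bool)
print(easy[0], hard[0])
```

In production the check matters more than the wrapper: a weak check sends too little traffic to Pro, and an overly strict one erases the Flash savings.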

Cache Hit Math

DeepSeek pricing separates cache-hit input from cache-miss input. That makes prompt structure a cost lever.

| Model | Cache-hit input | Cache-miss input | Cache-hit discount vs miss |
|---|---|---|---|
| V4-Flash | $0.0028 | $0.14 | 98.0% lower |
| V4-Pro discounted | $0.003625 | $0.435 | 99.17% lower |
| V4-Pro full listed | $0.0145 | $1.74 | 99.17% lower |

The cost effect is large because long-context apps often repeat the same system prompt, tool instructions, schema, policy, or retrieved context prefix.
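One way to act on this is to keep the repeated material at the front of every request and push per-request content to the end. The sketch below assumes caching matches on a shared leading prefix, which is how the "repeated-prefix" wording reads; `SYSTEM_PROMPT`, `POLICY`, and `build_messages` are illustrative names, not part of any DeepSeek SDK.

```python
SYSTEM_PROMPT = "You are a support assistant. Follow the policy below."
POLICY = "Policy: refunds within 30 days; escalate billing disputes."


def build_messages(question: str, retrieved: str = "") -> list[dict]:
    """Put byte-identical content first and per-request content last."""
    stable_prefix = f"{SYSTEM_PROMPT}\n\n{POLICY}"
    variable_tail = f"{retrieved}\n\nQuestion: {question}".strip()
    return [
        {"role": "system", "content": stable_prefix},
        {"role": "user", "content": variable_tail},
    ]


a = build_messages("Where is my refund?")
b = build_messages("How do I change my plan?", retrieved="Doc: plan changes")

# The first message is identical across requests, so it is the part
# that is eligible for reuse; only the second message differs.
print(a[0] == b[0], a[1] == b[1])
```

Anything that varies per request (timestamps, user IDs, retrieved chunks) placed ahead of the stable block would change the prefix and likely defeat the cache.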

| Workload | Input tokens | Output tokens | No cache on V4-Flash | 70% input cache hit on V4-Flash |
|---|---|---|---|---|
| Support chatbot | 100M | 30M | $22.40 | $12.80 |
| RAG answer generation | 500M | 100M | $98.00 | $49.98 |
| Codebase assistant | 2B | 200M | $336.00 | $143.92 |

Calculation for the codebase example:

No cache = 2,000M * $0.14 + 200M * $0.28 = $336.00
70% cache = 1,400M * $0.0028 + 600M * $0.14 + 200M * $0.28 = $143.92
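The same arithmetic as a reusable function, for plugging in your own volumes. This is a minimal sketch: the `PRICES` keys and the `monthly_cost` helper are illustrative names, token counts are in millions, and the prices are the ones listed in this article.

```python
# USD per 1M tokens, as listed in this article (checked 2026-04-30).
PRICES = {
    "v4-flash": {"hit": 0.0028, "miss": 0.14, "out": 0.28},
    "v4-pro-discounted": {"hit": 0.003625, "miss": 0.435, "out": 0.87},
    "v4-pro-full": {"hit": 0.0145, "miss": 1.74, "out": 3.48},
}


def monthly_cost(model: str, input_m: float, output_m: float, hit_rate: float = 0.0) -> float:
    """Cost in USD for input_m / output_m million tokens at a given input cache-hit rate."""
    p = PRICES[model]
    hit = input_m * hit_rate
    miss = input_m - hit
    return hit * p["hit"] + miss * p["miss"] + output_m * p["out"]


# Codebase assistant example: 2B input, 200M output on V4-Flash.
print(f"{monthly_cost('v4-flash', 2000, 200):.2f}")        # 336.00
print(f"{monthly_cost('v4-flash', 2000, 200, 0.70):.2f}")  # 143.92
```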

R1 and deepseek-reasoner Changes

The old DeepSeek R1 pricing table is now historical context, not the cleanest way to plan new API spend.

| Endpoint or model | Historical role | Current production reading |
|---|---|---|
| DeepSeek-R1 | Reasoning model launched in 2025 | Useful historical model family, but not the primary current price table. |
| deepseek-reasoner | API model name for R1-style reasoning | Currently corresponds to V4-Flash thinking mode for compatibility. |
| deepseek-chat | API model name for chat | Currently corresponds to V4-Flash non-thinking mode for compatibility. |
| deepseek-v4-flash | Current economical V4 model | Use this explicit model name for new low-cost production. |
| deepseek-v4-pro | Current stronger V4 model | Use this explicit model name for high-value reasoning and agentic coding. |

Historical R1 launch pricing was $0.14 cache-hit input, $0.55 cache-miss input, and $2.19 output per 1M tokens. That old output price is now much higher than discounted V4-Pro output and far higher than V4-Flash output.

Cost Scenarios

Here are practical monthly cost calculations using official V4 pricing.

| Scenario | Model | Input tokens | Output tokens | Cache hit | Monthly cost |
|---|---|---|---|---|---|
| Small chatbot | V4-Flash | 20M | 5M | 0% | $4.20 |
| Small chatbot with cached system prompt | V4-Flash | 20M | 5M | 50% | $2.83 |
| RAG app | V4-Flash | 500M | 100M | 70% | $49.98 |
| Coding assistant | V4-Pro discounted | 500M | 100M | 50% | $196.66 |
| Coding assistant after discount ends | V4-Pro full listed | 500M | 100M | 50% | $786.63 |
| Agent router, Flash first | 90% Flash / 10% Pro | 1B | 200M | 60% | Depends on route mix, about $137 with discounted Pro |

Agent router calculation:

| Component | Formula | Cost |
|---|---|---|
| 900M Flash input, 60% cached | 540M * $0.0028 + 360M * $0.14 | $51.91 |
| 180M Flash output | 180M * $0.28 | $50.40 |
| 100M Pro input, 60% cached | 60M * $0.003625 + 40M * $0.435 | $17.62 |
| 20M Pro output | 20M * $0.87 | $17.40 |
| Total | Sum | $137.33 |

This is the real optimization pattern: do not pick one DeepSeek model for every request. Route.
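To see how sensitive the bill is to the route mix, the router calculation can be parameterized by the share of traffic sent to Pro. This is a rough sketch that assumes both routes see the same cache-hit rate and the same input/output ratio, as in the table above; `route_cost` and `mixed_cost` are hypothetical helper names.

```python
# USD per 1M tokens: (cache-hit input, cache-miss input, output)
FLASH = (0.0028, 0.14, 0.28)
PRO_DISCOUNTED = (0.003625, 0.435, 0.87)


def route_cost(prices, input_m, output_m, hit_rate):
    """Cost of one route for input_m / output_m million tokens."""
    hit_price, miss_price, out_price = prices
    cached = input_m * hit_rate
    return cached * hit_price + (input_m - cached) * miss_price + output_m * out_price


def mixed_cost(input_m, output_m, hit_rate, pro_share):
    """Total cost when pro_share of all traffic goes to Pro and the rest to Flash."""
    flash_share = 1.0 - pro_share
    flash = route_cost(FLASH, input_m * flash_share, output_m * flash_share, hit_rate)
    pro = route_cost(PRO_DISCOUNTED, input_m * pro_share, output_m * pro_share, hit_rate)
    return flash + pro


# 1B input, 200M output, 60% cache hit, at several Pro shares.
for share in (0.0, 0.1, 0.3, 1.0):
    print(f"{share:.0%} Pro: ${mixed_cost(1000, 200, 0.60, share):.2f}")
```

At a 10% Pro share this reproduces the $137.33 total from the table; the other rows show how quickly cost climbs as more traffic escalates.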

Direct DeepSeek vs OpenRouter vs TokenMix.ai

Direct DeepSeek pricing is strong, but the access path still matters.

| Access path | Best for | Pricing check | Trade-off |
|---|---|---|---|
| DeepSeek direct API | Most direct control and official prices | DeepSeek official pricing page | You manage account, limits, billing, and fallback. |
| OpenRouter | Model discovery and one API for many providers | Check the OpenRouter model page plus its 5.5% credit fee | Marketplace abstraction and provider routing need verification. |
| TokenMix.ai | Managed multi-model production routing | Check live TokenMix.ai route pricing | Less direct provider control than direct API. |
| LiteLLM self-hosted | Internal gateway teams | Direct provider cost plus operations | You operate the gateway. |

TokenMix.ai makes sense when DeepSeek is one route among many. A common pattern is DeepSeek-V4-Flash for high-volume tasks, Gemini or Claude for long-context review, and OpenAI-compatible models for tool-heavy workflows.

When Should You Use DeepSeek V4 Flash?

Use V4-Flash as the default unless the task proves it needs Pro.

| Workflow | V4-Flash fit | Why |
|---|---|---|
| Customer support draft | Strong | Low output price and good general reasoning. |
| Classification and extraction | Strong | Input/output costs are low. |
| RAG answer generation | Strong | Cache-hit pricing helps repeated prefixes. |
| Tool-use agent steps | Strong to medium | Good first route, escalate failures. |
| Code explanation | Strong | Cheap enough for high-volume usage. |
| Hard code edits | Medium | Test Pro as fallback. |

V4-Flash is the economical default. Treat V4-Pro as escalation, not baseline.

When Should You Use DeepSeek V4 Pro?

Use V4-Pro when the request is valuable enough to justify the higher output cost.

| Workflow | V4-Pro fit | Why |
|---|---|---|
| Agentic coding | Strong | DeepSeek positions Pro around agentic coding capability. |
| Long-context reasoning | Strong | 1M context plus stronger reasoning can matter. |
| Complex STEM/math | Strong | Official release claims stronger reasoning. |
| Multi-step tool use | Strong | Use where Flash fails quality checks. |
| Batch summarization | Weak | Flash is usually more economical. |
| Simple chatbot | Weak | Pro cost is unnecessary for routine answers. |

Important caveat: the discounted V4-Pro price can change after 2026-05-31 15:59 UTC. Build routing with a price-update check.
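A price-update check can be as simple as switching price sets on the promotion deadline. The sketch below assumes the full listed price applies once the promotion ends and that the promotion is not extended again; `pro_prices` is a hypothetical helper, and a production version should re-read the official pricing page rather than rely on hard-coded values.

```python
from datetime import datetime, timezone

# Promotion end as stated on the pricing page (checked 2026-04-30).
PROMO_END = datetime(2026, 5, 31, 15, 59, tzinfo=timezone.utc)

# USD per 1M tokens.
PRO_DISCOUNTED = {"hit": 0.003625, "miss": 0.435, "out": 0.87}
PRO_FULL = {"hit": 0.0145, "miss": 1.74, "out": 3.48}


def pro_prices(now=None):
    """Return the V4-Pro price set to budget with at a given moment (UTC)."""
    now = now or datetime.now(timezone.utc)
    # Assumption: full listed prices apply immediately after the deadline.
    return PRO_DISCOUNTED if now <= PROMO_END else PRO_FULL


before = pro_prices(datetime(2026, 5, 1, tzinfo=timezone.utc))
after = pro_prices(datetime(2026, 6, 1, tzinfo=timezone.utc))
print(before["out"], after["out"])  # 0.87 3.48
```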

Migration Checklist

| Step | Action | Why |
|---|---|---|
| 1 | Replace deepseek-chat with deepseek-v4-flash for non-thinking flows | Avoid alias retirement. |
| 2 | Replace deepseek-reasoner with deepseek-v4-flash or deepseek-v4-pro plus thinking mode | Make reasoning policy explicit. |
| 3 | Add cache-hit tracking | DeepSeek pricing now makes repeated prefixes extremely important. |
| 4 | Split Flash vs Pro routing | Do not send every request to Pro. |
| 5 | Recalculate V4-Pro after discount | Promotion ends 2026-05-31 unless extended again. |
| 6 | Test thinking mode behavior | Some parameters are ignored in thinking mode. |
| 7 | Test tool-call context handling | DeepSeek requires reasoning content to be passed back after tool calls. |
| 8 | Keep provider fallback | Low price does not remove reliability planning. |
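Steps 1 and 2 can be handled in one place with an alias map. This is a minimal sketch: `ALIAS_MAP`, `migrate`, and `days_until_retirement` are illustrative names. How thinking mode is switched on in the request is not covered in this article, so the helper only returns a flag for your own request builder to act on.

```python
from datetime import datetime, timezone

# Alias retirement deadline as stated by DeepSeek.
ALIAS_RETIREMENT = datetime(2026, 7, 24, 15, 59, tzinfo=timezone.utc)

# Legacy alias -> (explicit V4 model name, thinking mode wanted?)
ALIAS_MAP = {
    "deepseek-chat": ("deepseek-v4-flash", False),
    "deepseek-reasoner": ("deepseek-v4-flash", True),
}


def migrate(name: str, thinking: bool = False) -> tuple[str, bool]:
    """Map a legacy alias to an explicit V4 name; pass other names through."""
    if name in ALIAS_MAP:
        return ALIAS_MAP[name]
    return name, thinking


def days_until_retirement(now: datetime) -> int:
    """Whole days left before the aliases retire (negative once past)."""
    return (ALIAS_RETIREMENT - now).days


print(migrate("deepseek-chat"))
print(migrate("deepseek-reasoner"))
print(migrate("deepseek-v4-pro", thinking=True))
print(days_until_retirement(datetime(2026, 4, 30, tzinfo=timezone.utc)))
```

Mapping deepseek-reasoner to Flash mirrors the current alias behavior; if a reasoning flow needs Pro, change that one map entry rather than scattering model names through the codebase.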

Python example with explicit V4 model name:

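```python
import os

from openai import OpenAI

# Read the key from the environment rather than hard-coding it.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

# Use the explicit V4 model name instead of the deepseek-chat alias.
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Summarize this ticket and assign a priority."}
    ],
)

print(response.choices[0].message.content)
```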
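For checklist step 3, a cache-hit rate can be derived from the usage block of each response. The sketch below assumes the usage data exposes `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens`, the field names DeepSeek's earlier API documentation used; verify them against the current docs before relying on this. `cache_hit_rate` is a hypothetical helper and works on a plain dict so it runs offline.

```python
def cache_hit_rate(usage: dict) -> float:
    """Share of input tokens served from cache, between 0.0 and 1.0."""
    # Field names are an assumption; missing fields count as zero.
    hit = usage.get("prompt_cache_hit_tokens", 0)
    miss = usage.get("prompt_cache_miss_tokens", 0)
    total = hit + miss
    return hit / total if total else 0.0


# Hand-written sample standing in for a real response's usage data.
sample_usage = {
    "prompt_cache_hit_tokens": 1400,
    "prompt_cache_miss_tokens": 600,
    "completion_tokens": 200,
}
print(f"{cache_hit_rate(sample_usage):.0%}")  # 70%
```

With the OpenAI SDK, converting `response.usage` to a dict (for example via `model_dump()`) should produce input of this shape, though that is worth confirming against the SDK version in use. Logging this rate per route shows whether prompt restructuring is actually producing cache hits.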

FAQ

How much does DeepSeek V4 Flash API cost?

DeepSeek-V4-Flash costs $0.0028 per 1M cache-hit input tokens, $0.14 per 1M cache-miss input tokens, and $0.28 per 1M output tokens according to DeepSeek's official pricing page checked on 2026-04-30.

How much does DeepSeek V4 Pro API cost?

DeepSeek-V4-Pro is currently discounted to $0.003625 cache-hit input, $0.435 cache-miss input, and $0.87 output per 1M tokens. The official page says this 75% discount is extended until 2026-05-31 15:59 UTC.

Is DeepSeek R1 still the right API model name?

For new production code, no. DeepSeek says deepseek-reasoner currently maps to V4-Flash thinking mode and will retire after 2026-07-24 15:59 UTC. Use explicit V4 model names instead.

What is DeepSeek cache-hit pricing?

Cache-hit pricing applies when input tokens can reuse cached context. As of 2026-04-26 12:15 UTC, DeepSeek says cache-hit input prices across all models were reduced to one-tenth of launch price.

Is DeepSeek V4 Pro worth the extra cost?

V4-Pro is worth testing for high-value reasoning, agentic coding, complex tool use, and long-context work. For high-volume routine tasks, V4-Flash is the better default.

Should I use DeepSeek direct API or a gateway?

Use DeepSeek direct API when you only need DeepSeek and want official pricing/control. Use TokenMix.ai or another gateway when DeepSeek is one route inside a broader multi-model production stack.

Does DeepSeek support OpenAI-compatible API calls?

Yes. DeepSeek's docs list an OpenAI-format base URL at https://api.deepseek.com. The V4 release also says the API supports OpenAI ChatCompletions and Anthropic APIs.

What is the biggest DeepSeek pricing mistake?

The biggest mistake is ignoring cache hits and model routing. Sending every request to V4-Pro and failing to structure repeated prefixes can make the bill several times higher than a Flash-first cached workflow.

Sources