TokenMix Research Lab · 2026-04-12

DeepSeek API Tutorial 2026: V4 Flash, Pro, Cache Setup Guide
Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30
Use deepseek-v4-flash as the default DeepSeek API model in 2026. Use deepseek-v4-pro only when a workflow needs harder reasoning, agentic coding, or long-context quality.
The official DeepSeek Models & Pricing page lists V4 Flash at $0.14 per 1M cache-miss input tokens, $0.0028 per 1M cache-hit input tokens, and $0.28 per 1M output tokens. V4 Pro is discounted to $0.435 cache-miss input, $0.003625 cache-hit input, and $0.87 output until 2026-05-31 15:59 UTC. DeepSeek also says deepseek-chat and deepseek-reasoner currently map to V4 Flash compatibility modes and will be deprecated.
Table of Contents
- Quick Setup
- Confirmed Facts
- Current Models And Prices
- Python Setup
- Node.js Setup
- Thinking Mode
- Context Caching
- Migration From Old Model Names
- Production Checks
- When TokenMix.ai Fits
- FAQ
- Related Articles
- Sources
Quick Setup
| Step | Action | Current recommendation |
|---|---|---|
| 1 | Create API key | Use DeepSeek Platform or a verified gateway |
| 2 | Install SDK | Use the standard OpenAI SDK |
| 3 | Set base URL | https://api.deepseek.com |
| 4 | Pick model | deepseek-v4-flash first |
| 5 | Escalate model | Use deepseek-v4-pro for hard tasks |
| 6 | Track cache fields | Read prompt_cache_hit_tokens and prompt_cache_miss_tokens |
For most apps, the first production route is V4 Flash. It is the economical route, supports 1M context, and uses the same OpenAI Chat Completions pattern.
Confirmed Facts
| Claim | Status | Source |
|---|---|---|
| V4 Flash and V4 Pro are current API models | Confirmed | DeepSeek V4 release |
| Both V4 models support 1M context | Confirmed | DeepSeek pricing page |
| OpenAI-format base URL is https://api.deepseek.com | Confirmed | DeepSeek pricing page |
| Anthropic-format base URL is https://api.deepseek.com/anthropic | Confirmed | DeepSeek pricing page |
| deepseek-chat maps to V4 Flash non-thinking mode | Confirmed | DeepSeek pricing page |
| deepseek-reasoner maps to V4 Flash thinking mode | Confirmed | DeepSeek pricing page |
| Cache is automatic and best-effort | Confirmed | DeepSeek context caching docs |
Current Models And Prices
All prices are per 1M tokens, checked on 2026-04-30.
| Model | Cache-hit input | Cache-miss input | Output | Context | Best use |
|---|---|---|---|---|---|
| deepseek-v4-flash | $0.0028 | $0.14 | $0.28 | 1M | Default chat, RAG, agents, low-cost reasoning |
| deepseek-v4-pro | $0.003625 | $0.435 | $0.87 | 1M | Hard reasoning, agentic coding, long-context quality |
| deepseek-v4-pro full listed price | $0.0145 | $1.74 | $3.48 | 1M | Post-discount planning |
| deepseek-chat | Alias to V4 Flash non-thinking | Alias | Alias | 1M | Legacy compatibility only |
| deepseek-reasoner | Alias to V4 Flash thinking | Alias | Alias | 1M | Legacy compatibility only |
Do not price new DeepSeek projects with old R1 or V3.2 assumptions. The current official table is V4-first.
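The price table can be turned into a small estimator for budgeting. The following is a minimal sketch, not an official calculator: the function names `price_row` and `estimate_cost` and the dictionary key `deepseek-v4-pro-full` are hypothetical labels introduced here (the latter is not an API model name), and the prices and discount end time are copied from this article as checked on 2026-04-30.

```python
from datetime import datetime, timezone

# USD per 1M tokens, copied from the table above (checked 2026-04-30).
# "deepseek-v4-pro-full" is a hypothetical key for the post-discount row,
# not a model name accepted by the API.
PRICES = {
    "deepseek-v4-flash": {"hit": 0.0028, "miss": 0.14, "output": 0.28},
    "deepseek-v4-pro": {"hit": 0.003625, "miss": 0.435, "output": 0.87},
    "deepseek-v4-pro-full": {"hit": 0.0145, "miss": 1.74, "output": 3.48},
}

# End of the V4 Pro discount window as stated in this article.
PRO_DISCOUNT_ENDS = datetime(2026, 5, 31, 15, 59, tzinfo=timezone.utc)


def price_row(model: str, when: datetime) -> dict:
    """Return the price row that applies to a model at a given UTC time."""
    if model == "deepseek-v4-pro" and when > PRO_DISCOUNT_ENDS:
        return PRICES["deepseek-v4-pro-full"]
    return PRICES[model]


def estimate_cost(model, hit_tokens, miss_tokens, output_tokens, when):
    """Estimate the USD cost of one request from its token counts."""
    row = price_row(model, when)
    return (
        hit_tokens * row["hit"]
        + miss_tokens * row["miss"]
        + output_tokens * row["output"]
    ) / 1_000_000
```

Passing the timestamp explicitly keeps the estimate reproducible and makes it easy to compare a Pro budget before and after the discount ends.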
Python Setup
Install the OpenAI SDK:
pip install openai
Use DeepSeek with an OpenAI-compatible client:
import os

from openai import OpenAI

# Read the key from the environment instead of hard-coding it.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a precise technical assistant."},
        {"role": "user", "content": "Summarize this bug report in 3 bullets."},
    ],
)
print(response.choices[0].message.content)
Use V4 Pro only after the Flash result fails a real quality threshold:
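# Reuses the client created above; only the model name changes.
response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "user", "content": "Analyze this refactor plan and identify the riskiest dependency."},
    ],
)
print(response.choices[0].message.content)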
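The "Flash first, Pro only on failure" rule can be expressed as a small wrapper. This is a sketch under stated assumptions: `answer_with_escalation`, `ask`, and `passes_threshold` are hypothetical names introduced here. `ask(model, prompt)` stands in for your own wrapper around `client.chat.completions.create`, and `passes_threshold(answer)` stands in for whatever quality check your workflow uses.

```python
from typing import Callable, Tuple

FLASH = "deepseek-v4-flash"
PRO = "deepseek-v4-pro"


def answer_with_escalation(
    prompt: str,
    ask: Callable[[str, str], str],
    passes_threshold: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try Flash first; call Pro only if the Flash answer fails the check.

    ask(model, prompt) is a hypothetical wrapper that returns the message text.
    passes_threshold(answer) is your own quality check.
    Returns (model_used, answer).
    """
    answer = ask(FLASH, prompt)
    if passes_threshold(answer):
        return FLASH, answer
    # Flash failed the check, so pay for one Pro call.
    return PRO, ask(PRO, prompt)
```

Injecting `ask` and `passes_threshold` as callables keeps the routing logic testable without network access and makes the escalation rate easy to log.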
Node.js Setup
Install the SDK:
npm install openai
Call V4 Flash:
// Top-level await requires an ES module (for example, a .mjs file).
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: "https://api.deepseek.com"
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [
    { role: "user", content: "Classify this support ticket as low, medium, or high priority." }
  ]
});
console.log(response.choices[0].message.content);
Thinking Mode
DeepSeek V4 supports thinking and non-thinking modes. The old model name deepseek-reasoner currently maps to V4 Flash thinking mode, but DeepSeek says old aliases will be deprecated. New code should choose explicit V4 model names and mode settings according to the current docs.
| Workload | Start with | Escalate when |
|---|---|---|
| Support classification | V4 Flash non-thinking | Policy edge cases fail |
| RAG answer drafting | V4 Flash | Source synthesis fails |
| Coding agent step | V4 Flash | Planning or debugging is weak |
| Long-context review | V4 Flash | The document has high-stakes reasoning |
| Complex math or planning | V4 Pro | Keep Pro if it wins in evals |
Context Caching
DeepSeek context caching is enabled by default. The official docs say overlapping prefixes can be fetched from disk cache and billed as cache hits.
| Field | Meaning |
|---|---|
| prompt_cache_hit_tokens | Input tokens billed at cache-hit price |
| prompt_cache_miss_tokens | Input tokens billed at cache-miss price |
| V4 Flash cache hit price | $0.0028 per 1M input tokens |
| V4 Flash cache miss price | $0.14 per 1M input tokens |
| Cache caveat | Best-effort, not guaranteed |
Example cost for 1M repeated input tokens on V4 Flash:
| Cache state | Cost |
|---|---|
| Cache miss | $0.14 |
| Cache hit | $0.0028 |
| Savings | 98% lower input cost |
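The savings figure follows from the two V4 Flash input prices: 1 - 0.0028 / 0.14 = 0.98. The sketch below computes the same numbers from the two usage fields. It assumes the usage data arrives as a plain dict; with the OpenAI Python SDK, `response.usage.model_dump()` is one likely way to get one, but verify that against your SDK version. The function name `cache_report` is hypothetical.

```python
FLASH_HIT_PRICE = 0.0028  # USD per 1M cache-hit input tokens
FLASH_MISS_PRICE = 0.14   # USD per 1M cache-miss input tokens


def cache_report(usage: dict) -> dict:
    """Summarize cache behavior and input cost for one V4 Flash request."""
    hit = usage.get("prompt_cache_hit_tokens", 0)
    miss = usage.get("prompt_cache_miss_tokens", 0)
    total = hit + miss

    # Actual input cost versus the cost if every token had been a miss.
    actual = (hit * FLASH_HIT_PRICE + miss * FLASH_MISS_PRICE) / 1_000_000
    uncached = total * FLASH_MISS_PRICE / 1_000_000

    return {
        "hit_rate": hit / total if total else 0.0,
        "input_cost_usd": actual,
        "savings_pct": 100 * (1 - actual / uncached) if uncached else 0.0,
    }
```

Because caching is best-effort, the hit rate is worth tracking over time rather than assuming the full 98% on every request.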
Migration From Old Model Names
| Old pattern | Current action |
|---|---|
| deepseek-chat | Replace with deepseek-v4-flash non-thinking |
| deepseek-reasoner | Replace with explicit V4 thinking mode when needed |
| Old R1 price table | Treat as historical, not current production default |
| V3.2 calculator | Update to V4 Flash/Pro and cache-hit fields |
| 128K context assumption | Update to 1M context for current V4 models |
DeepSeek says deepseek-chat and deepseek-reasoner will be retired after 2026-07-24 15:59 UTC. Do not wait for the deadline if your app is in production.
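One way to find and replace legacy names before the retirement date is a small lookup at the point where requests are built. This is a sketch only: `LEGACY_ALIASES` and `resolve_model` are hypothetical names, and the mode label is a descriptive string, not an API parameter. How thinking mode is actually switched on in a request is not covered in this article and should follow the current DeepSeek docs.

```python
from typing import Optional, Tuple

# Legacy alias -> (explicit V4 model, mode label), per the mapping above.
# The mode label is descriptive only; it is not sent to the API.
LEGACY_ALIASES = {
    "deepseek-chat": ("deepseek-v4-flash", "non-thinking"),
    "deepseek-reasoner": ("deepseek-v4-flash", "thinking"),
}


def resolve_model(name: str) -> Tuple[str, Optional[str]]:
    """Map a legacy alias to its V4 target; pass other names through."""
    if name in LEGACY_ALIASES:
        return LEGACY_ALIASES[name]
    return name, None
```

A non-`None` mode in the result signals that the caller was still using a legacy alias, which is a useful thing to log during migration.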
Production Checks
| Check | Why it matters |
|---|---|
| Log model name | Avoid silent alias confusion |
| Log cache hit/miss tokens | Cache drives real cost |
| Set max output tokens | Output still dominates many bills |
| Add retry limits | Retries multiply token spend |
| Run Flash vs Pro eval | Pro is cheap during promo, but still pricier than Flash |
| Add fallback | Use TokenMix.ai or another gateway when uptime matters |
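The retry-limit check can be sketched as a bounded wrapper with exponential backoff. The names `call_with_retry_limit`, `request`, and `sleep` are hypothetical, and catching every `Exception` is a simplification: in production you would likely retry only on transient errors such as timeouts or rate limits.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry_limit(
    request: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request() up to max_attempts times, doubling the delay each retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Since every retry resends the full prompt, a hard cap on attempts also caps the worst-case token spend per request.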
When TokenMix.ai Fits
Use direct DeepSeek when you only need DeepSeek and want the official account relationship. Use TokenMix.ai when DeepSeek is one route inside a broader production stack with OpenAI, Claude, Gemini, Qwen, Grok, or Kimi.
| Need | Best route |
|---|---|
| DeepSeek-only experiments | Direct DeepSeek API |
| Multi-model routing | TokenMix.ai |
| Alipay or WeChat Pay billing | TokenMix.ai |
| Fallback from DeepSeek to Claude/OpenAI | TokenMix.ai or another gateway |
| One OpenAI-compatible endpoint | TokenMix.ai |
FAQ
What is the current DeepSeek API model to use?
Use deepseek-v4-flash first. It is the economical V4 route and supports 1M context, tool calls, JSON output, and thinking or non-thinking modes.
How much does DeepSeek V4 Flash cost?
DeepSeek V4 Flash costs $0.0028 per 1M cache-hit input tokens, $0.14 per 1M cache-miss input tokens, and $0.28 per 1M output tokens.
How much does DeepSeek V4 Pro cost?
V4 Pro is discounted to $0.003625 cache-hit input, $0.435 cache-miss input, and $0.87 output per 1M tokens until 2026-05-31 15:59 UTC.
Should I still use deepseek-chat?
Not for new code. DeepSeek says deepseek-chat currently maps to V4 Flash non-thinking mode and will be deprecated. Use explicit V4 model names.
Should I still use deepseek-reasoner?
Not for new code. deepseek-reasoner currently maps to V4 Flash thinking mode, but new applications should use explicit V4 naming and mode controls.
Does DeepSeek cache require code changes?
Not for basic usage. DeepSeek says context caching is enabled by default. You should still log prompt_cache_hit_tokens and prompt_cache_miss_tokens.
Is DeepSeek OpenAI-compatible?
Yes. DeepSeek lists an OpenAI-format base URL at https://api.deepseek.com, so the standard OpenAI SDK pattern works.
When should I use TokenMix.ai instead of direct DeepSeek?
Use TokenMix.ai when you need DeepSeek plus other providers, fallback routing, unified billing, or local payment methods such as Alipay and WeChat Pay.