TokenMix Research Lab · 2026-04-12

DeepSeek API Tutorial 2026: V4 Flash, Pro, Cache Setup Guide

Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30

Use deepseek-v4-flash as the default DeepSeek API model in 2026. Use deepseek-v4-pro only when a workflow needs harder reasoning, agentic coding, or long-context quality.

The official DeepSeek Models & Pricing page lists V4 Flash at $0.14 per 1M cache-miss input tokens, $0.0028 per 1M cache-hit input tokens, and $0.28 per 1M output tokens. V4 Pro is discounted to $0.435 per 1M cache-miss input tokens, $0.003625 per 1M cache-hit input tokens, and $0.87 per 1M output tokens until 2026-05-31 15:59 UTC. DeepSeek also says deepseek-chat and deepseek-reasoner currently map to V4 Flash compatibility modes and will be deprecated.

Quick Setup
Confirmed Facts
Current Models And Prices
Python Setup
Node.js Setup
Thinking Mode
Context Caching
Migration From Old Model Names
Production Checks
When TokenMix.ai Fits
FAQ
Related Articles
Sources

Quick Setup

Step	Action	Current recommendation
1	Create API key	Use DeepSeek Platform or a verified gateway
2	Install SDK	Use the standard OpenAI SDK
3	Set base URL	`https://api.deepseek.com`
4	Pick model	`deepseek-v4-flash` first
5	Escalate model	Use `deepseek-v4-pro` for hard tasks
6	Track cache fields	Read `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens`

For most apps, the first production route is V4 Flash. It is the economical route, supports 1M context, and uses the same OpenAI Chat Completions pattern.

Confirmed Facts

Claim	Status	Source
V4 Flash and V4 Pro are current API models	Confirmed	DeepSeek V4 release
Both V4 models support 1M context	Confirmed	DeepSeek pricing page
OpenAI-format base URL is `https://api.deepseek.com`	Confirmed	DeepSeek pricing page
Anthropic-format base URL is `https://api.deepseek.com/anthropic`	Confirmed	DeepSeek pricing page
`deepseek-chat` maps to V4 Flash non-thinking mode	Confirmed	DeepSeek pricing page
`deepseek-reasoner` maps to V4 Flash thinking mode	Confirmed	DeepSeek pricing page
Cache is automatic and best-effort	Confirmed	DeepSeek context caching docs

Current Models And Prices

All prices are per 1M tokens, checked on 2026-04-30.

Model	Cache-hit input	Cache-miss input	Output	Context	Best use
deepseek-v4-flash	$0.0028	$0.14	$0.28	1M	Default chat, RAG, agents, low-cost reasoning
deepseek-v4-pro	$0.003625	$0.435	$0.87	1M	Hard reasoning, agentic coding, long-context quality
deepseek-v4-pro full listed price	$0.0145	$1.74	$3.48	1M	Post-discount planning
deepseek-chat	Alias to V4 Flash non-thinking	Alias	Alias	1M	Legacy compatibility only
deepseek-reasoner	Alias to V4 Flash thinking	Alias	Alias	1M	Legacy compatibility only

Do not price new DeepSeek projects with old R1 or V3.2 assumptions. The current official table is V4-first.
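The arithmetic behind the table can be sketched as a small estimator. This is a minimal illustration, assuming the prices listed above; the function name `estimate_cost_usd` and the `deepseek-v4-pro-full` key are hypothetical labels for this sketch, not API model names.

```python
# Prices in USD per 1M tokens, copied from the table above (checked 2026-04-30).
PRICES_PER_MILLION = {
    "deepseek-v4-flash": {"hit": 0.0028, "miss": 0.14, "output": 0.28},
    # Promotional V4 Pro rates, listed as valid until 2026-05-31 15:59 UTC.
    "deepseek-v4-pro": {"hit": 0.003625, "miss": 0.435, "output": 0.87},
    # Full listed V4 Pro rates for post-discount planning.
    # This key is a hypothetical label for the sketch, not an API model name.
    "deepseek-v4-pro-full": {"hit": 0.0145, "miss": 1.74, "output": 3.48},
}


def estimate_cost_usd(price_key, hit_tokens, miss_tokens, output_tokens):
    """Return the estimated USD cost for the given token counts."""
    rates = PRICES_PER_MILLION[price_key]
    total = (
        hit_tokens * rates["hit"]
        + miss_tokens * rates["miss"]
        + output_tokens * rates["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # 800K cached input, 200K uncached input, 100K output on V4 Flash.
    cost = estimate_cost_usd("deepseek-v4-flash", 800_000, 200_000, 100_000)
    print(f"${cost:.5f}")
```

Keeping the rates in one dictionary makes it easy to swap the promotional V4 Pro row for the full listed row once the discount ends.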

Python Setup

Install the OpenAI SDK:

pip install openai

Use DeepSeek with an OpenAI-compatible client:

import os

from openai import OpenAI

# Read the key from the environment instead of hard-coding it in source.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a precise technical assistant."},
        {"role": "user", "content": "Summarize this bug report in 3 bullets."}
    ]
)

print(response.choices[0].message.content)

Use V4 Pro only after the Flash result fails a real quality threshold:

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "user", "content": "Analyze this refactor plan and identify the riskiest dependency."}
    ]
)
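One way to encode the "Flash first, Pro only on failure" rule is a small wrapper. This is a sketch, not an official pattern: `answer_with_escalation`, `call_model`, and `passes_quality` are hypothetical names, and the quality check is whatever eval your workflow already trusts.

```python
from typing import Callable, Tuple

FLASH_MODEL = "deepseek-v4-flash"
PRO_MODEL = "deepseek-v4-pro"


def answer_with_escalation(
    prompt: str,
    call_model: Callable[[str, str], str],
    passes_quality: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try V4 Flash first; call V4 Pro only if the Flash answer fails the check.

    call_model(model, prompt) returns the answer text.
    passes_quality(answer) returns True when the answer is good enough.
    Returns (model_used, answer).
    """
    flash_answer = call_model(FLASH_MODEL, prompt)
    if passes_quality(flash_answer):
        return FLASH_MODEL, flash_answer
    # Flash failed the threshold, so pay for one Pro call.
    return PRO_MODEL, call_model(PRO_MODEL, prompt)
```

In practice, `call_model` could wrap `client.chat.completions.create(...)` from the setup above and return `response.choices[0].message.content`. Passing it in as a parameter keeps the escalation logic testable without network access.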

Node.js Setup

Install the SDK:

npm install openai

Call V4 Flash:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: "https://api.deepseek.com"
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [
    { role: "user", content: "Classify this support ticket as low, medium, or high priority." }
  ]
});

console.log(response.choices[0].message.content);

Thinking Mode

DeepSeek V4 supports thinking and non-thinking modes. The old model name deepseek-reasoner currently maps to V4 Flash thinking mode, but DeepSeek says old aliases will be deprecated. New code should choose explicit V4 model names and mode settings according to the current docs.

Workload	Start with	Escalate when
Support classification	V4 Flash non-thinking	Policy edge cases fail
RAG answer drafting	V4 Flash	Source synthesis fails
Coding agent step	V4 Flash	Planning or debugging is weak
Long-context review	V4 Flash	The review involves high-stakes reasoning
Complex math or planning	V4 Pro	Keep Pro if it wins in evals
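The table above can be kept in code as a simple routing map, so the starting model is a reviewed configuration value rather than a string scattered across call sites. The workload keys and the `pick_start_model` helper are hypothetical names for this sketch.

```python
# Starting model per workload, mirroring the table above.
# The workload keys are hypothetical labels chosen for this sketch.
START_MODEL = {
    "support_classification": "deepseek-v4-flash",
    "rag_answer_drafting": "deepseek-v4-flash",
    "coding_agent_step": "deepseek-v4-flash",
    "long_context_review": "deepseek-v4-flash",
    "complex_math_or_planning": "deepseek-v4-pro",
}


def pick_start_model(workload: str) -> str:
    """Return the starting model for a workload, defaulting to V4 Flash."""
    return START_MODEL.get(workload, "deepseek-v4-flash")
```

Defaulting unknown workloads to V4 Flash matches the article's recommendation to treat Flash as the first route and escalate only on evidence.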

Context Caching

DeepSeek context caching is enabled by default. The official docs say that when a request shares a prefix with an earlier request, the overlapping prefix can be fetched from disk cache and billed at the cache-hit price.

Item	Detail
`prompt_cache_hit_tokens`	Input tokens billed at cache-hit price
`prompt_cache_miss_tokens`	Input tokens billed at cache-miss price
V4 Flash cache hit price	$0.0028 per 1M input tokens
V4 Flash cache miss price	$0.14 per 1M input tokens
Cache caveat	Best-effort, not guaranteed

Example cost for 1M repeated input tokens on V4 Flash:

Cache state	Cost
Cache miss	$0.14
Cache hit	$0.0028
Savings	98% lower input cost
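The 98% figure and a per-request hit ratio can be checked with a few lines. This is a minimal sketch, assuming the two cache fields are exposed on the response's `usage` object; the helper names are hypothetical, and `SimpleNamespace` stands in for a real response so the example runs offline.

```python
from types import SimpleNamespace


def cache_hit_ratio(usage) -> float:
    """Share of input tokens served from cache, between 0.0 and 1.0."""
    # getattr with a default keeps this safe if a field is absent or None.
    hit = getattr(usage, "prompt_cache_hit_tokens", 0) or 0
    miss = getattr(usage, "prompt_cache_miss_tokens", 0) or 0
    total = hit + miss
    return hit / total if total else 0.0


def input_savings_fraction(hit_price: float, miss_price: float) -> float:
    """Fraction saved per input token when billed as a hit instead of a miss."""
    return 1 - hit_price / miss_price


if __name__ == "__main__":
    # Stand-in for response.usage; these token counts are made up.
    usage = SimpleNamespace(prompt_cache_hit_tokens=900, prompt_cache_miss_tokens=100)
    print(cache_hit_ratio(usage))
    # V4 Flash prices from the table above: 1 - 0.0028 / 0.14
    print(round(input_savings_fraction(0.0028, 0.14), 2))
```

Because caching is best-effort, logging the ratio per request is more reliable than assuming a fixed hit rate in cost forecasts.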

Migration From Old Model Names

Old pattern	Current action
`deepseek-chat`	Replace with `deepseek-v4-flash` non-thinking
`deepseek-reasoner`	Replace with explicit V4 thinking mode when needed
Old R1 price table	Treat as historical, not current production default
V3.2 calculator	Update to V4 Flash/Pro and cache-hit fields
128K context assumption	Update to 1M context for current V4 models

DeepSeek says deepseek-chat and deepseek-reasoner will be retired after 2026-07-24 15:59 UTC. Do not wait for the deadline if your app is in production.
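A migration shim can translate legacy names at one choke point and show how much time is left. This is a sketch under stated assumptions: the helper names are hypothetical, and the boolean only records whether the old alias implied thinking mode. How thinking mode is actually switched on should be taken from the current DeepSeek docs.

```python
from datetime import datetime, timezone

# Retirement time for the legacy aliases, as stated above.
ALIAS_RETIREMENT = datetime(2026, 7, 24, 15, 59, tzinfo=timezone.utc)

# Legacy alias -> (explicit model name, whether the alias implied thinking mode).
LEGACY_ALIASES = {
    "deepseek-chat": ("deepseek-v4-flash", False),
    "deepseek-reasoner": ("deepseek-v4-flash", True),
}


def migrate_model_name(name):
    """Return (explicit_model, wants_thinking).

    wants_thinking is None when the name is not a legacy alias.
    """
    if name in LEGACY_ALIASES:
        return LEGACY_ALIASES[name]
    return name, None


def days_until_retirement(now):
    """Whole days left before the aliases are retired (negative once past)."""
    return (ALIAS_RETIREMENT - now).days


if __name__ == "__main__":
    print(migrate_model_name("deepseek-reasoner"))
    print(days_until_retirement(datetime.now(timezone.utc)))
```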

Production Checks

Check	Why it matters
Log model name	Avoid silent alias confusion
Log cache hit/miss tokens	Cache drives real cost
Set max output tokens	Output still dominates many bills
Add retry limits	Retries multiply token spend
Run Flash vs Pro eval	Pro is cheap during promo, but still pricier than Flash
Add fallback	Use TokenMix.ai or another gateway when uptime matters
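The retry-limit check can be sketched as a bounded wrapper with exponential backoff. This is an illustration rather than a recommended policy: `call_with_retry_limit` and `send_request` are hypothetical names, and real code would catch the SDK's specific error types instead of `Exception`.

```python
import logging
import time

logger = logging.getLogger("deepseek_calls")


def call_with_retry_limit(send_request, max_attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Call send_request() up to max_attempts times with exponential backoff.

    Re-raises the last exception once the attempt budget is spent, so retries
    cannot multiply token spend without bound. Assumes max_attempts >= 1.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send_request()
        except Exception as exc:  # narrow to the SDK's error types in real code
            logger.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise
            # Delays grow as base, 2x base, 4x base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
```

Here `send_request` could be a lambda around `client.chat.completions.create(...)` with `max_tokens` set, which covers the output cap from the table. Logging `response.model` and the cache fields on success covers the first two rows.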

When TokenMix.ai Fits

Use direct DeepSeek when you only need DeepSeek and want the official account relationship. Use TokenMix.ai when DeepSeek is one route inside a broader production stack with OpenAI, Claude, Gemini, Qwen, Grok, or Kimi.

Need	Best route
DeepSeek-only experiments	Direct DeepSeek API
Multi-model routing	TokenMix.ai
Alipay or WeChat Pay billing	TokenMix.ai
Fallback from DeepSeek to Claude/OpenAI	TokenMix.ai or another gateway
One OpenAI-compatible endpoint	TokenMix.ai

FAQ

What is the current DeepSeek API model to use?

Use deepseek-v4-flash first. It is the economical V4 route and supports 1M context, tool calls, JSON output, and thinking or non-thinking modes.

How much does DeepSeek V4 Flash cost?

DeepSeek V4 Flash costs $0.0028 per 1M cache-hit input tokens, $0.14 per 1M cache-miss input tokens, and $0.28 per 1M output tokens.

How much does DeepSeek V4 Pro cost?

V4 Pro is discounted to $0.003625 cache-hit input, $0.435 cache-miss input, and $0.87 output per 1M tokens until 2026-05-31 15:59 UTC.

Should I still use deepseek-chat?

Not for new code. DeepSeek says deepseek-chat currently maps to V4 Flash non-thinking mode and will be deprecated. Use explicit V4 model names.

Should I still use deepseek-reasoner?

Not for new code. deepseek-reasoner currently maps to V4 Flash thinking mode, but new applications should use explicit V4 naming and mode controls.

Does DeepSeek cache require code changes?

Not for basic usage. DeepSeek says context caching is enabled by default. You should still log prompt_cache_hit_tokens and prompt_cache_miss_tokens.

Is DeepSeek OpenAI-compatible?

Yes. DeepSeek lists an OpenAI-format base URL at https://api.deepseek.com, so the standard OpenAI SDK pattern works.

When should I use TokenMix.ai instead of direct DeepSeek?

Use TokenMix.ai when you need DeepSeek plus other providers, fallback routing, unified billing, or local payment methods such as Alipay and WeChat Pay.