TokenMix Research Lab · 2026-04-12

DeepSeek API Tutorial 2026: V4 Flash, Pro, Cache Setup Guide

Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30

Use deepseek-v4-flash as the default DeepSeek API model in 2026. Use deepseek-v4-pro only when a workflow needs harder reasoning, agentic coding, or long-context quality.

The official DeepSeek Models & Pricing page lists V4 Flash at $0.14 per 1M cache-miss input tokens, $0.0028 per 1M cache-hit input tokens, and $0.28 per 1M output tokens. V4 Pro is discounted to $0.435 per 1M cache-miss input tokens, $0.003625 per 1M cache-hit input tokens, and $0.87 per 1M output tokens until 2026-05-31 15:59 UTC; that is 25% of its full listed prices ($1.74, $0.0145, and $3.48). DeepSeek also says the legacy names deepseek-chat and deepseek-reasoner currently map to V4 Flash compatibility modes and will be deprecated.

Quick Setup

| Step | Action | Current recommendation |
| --- | --- | --- |
| 1 | Create API key | Use DeepSeek Platform or a verified gateway |
| 2 | Install SDK | Use the standard OpenAI SDK |
| 3 | Set base URL | https://api.deepseek.com |
| 4 | Pick model | deepseek-v4-flash first |
| 5 | Escalate model | Use deepseek-v4-pro for hard tasks |
| 6 | Track cache fields | Read prompt_cache_hit_tokens and prompt_cache_miss_tokens |

For most apps, start production traffic on V4 Flash. It is the cheaper of the two V4 models, supports 1M context, and uses the standard OpenAI Chat Completions request format.

Confirmed Facts

| Claim | Status | Source |
| --- | --- | --- |
| V4 Flash and V4 Pro are current API models | Confirmed | DeepSeek V4 release |
| Both V4 models support 1M context | Confirmed | DeepSeek pricing page |
| OpenAI-format base URL is https://api.deepseek.com | Confirmed | DeepSeek pricing page |
| Anthropic-format base URL is https://api.deepseek.com/anthropic | Confirmed | DeepSeek pricing page |
| deepseek-chat maps to V4 Flash non-thinking mode | Confirmed | DeepSeek pricing page |
| deepseek-reasoner maps to V4 Flash thinking mode | Confirmed | DeepSeek pricing page |
| Cache is automatic and best-effort | Confirmed | DeepSeek context caching docs |

Current Models And Prices

All prices are per 1M tokens, checked on 2026-04-30.

| Model | Cache-hit input | Cache-miss input | Output | Context | Best use |
| --- | --- | --- | --- | --- | --- |
| deepseek-v4-flash | $0.0028 | $0.14 | $0.28 | 1M | Default chat, RAG, agents, low-cost reasoning |
| deepseek-v4-pro | $0.003625 | $0.435 | $0.87 | 1M | Hard reasoning, agentic coding, long-context quality |
| deepseek-v4-pro (full listed price) | $0.0145 | $1.74 | $3.48 | 1M | Post-discount planning |
| deepseek-chat | Alias to V4 Flash non-thinking | Alias | Alias | 1M | Legacy compatibility only |
| deepseek-reasoner | Alias to V4 Flash thinking | Alias | Alias | 1M | Legacy compatibility only |

Do not price new DeepSeek projects with old R1 or V3.2 assumptions. The current official table is V4-first.
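To make the table easier to apply, here is a minimal cost-estimator sketch built only from the prices listed above. The function name estimate_cost and the key "deepseek-v4-pro@full" are hypothetical planning labels introduced for this example; the latter is not an API model name.

```python
# USD per 1M tokens, copied from the price table above (checked 2026-04-30).
PRICES = {
    "deepseek-v4-flash": {"cache_hit": 0.0028, "cache_miss": 0.14, "output": 0.28},
    # Promotional V4 Pro price, valid until 2026-05-31 15:59 UTC.
    "deepseek-v4-pro": {"cache_hit": 0.003625, "cache_miss": 0.435, "output": 0.87},
    # Hypothetical planning key, NOT an API model name: full listed V4 Pro price.
    "deepseek-v4-pro@full": {"cache_hit": 0.0145, "cache_miss": 1.74, "output": 3.48},
}


def estimate_cost(price_key, hit_tokens, miss_tokens, output_tokens):
    """Return the estimated USD cost of one request."""
    p = PRICES[price_key]
    total = (
        hit_tokens * p["cache_hit"]
        + miss_tokens * p["cache_miss"]
        + output_tokens * p["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example request: 200K cached prompt tokens, 50K uncached, 2K output tokens.
    for key in PRICES:
        print(key, f"${estimate_cost(key, 200_000, 50_000, 2_000):.5f}")
```

Running the same token counts against the promotional and full Pro rows shows how much a budget would change once the discount ends.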

Python Setup

Install the OpenAI SDK:

pip install openai

Use DeepSeek with an OpenAI-compatible client:

import os

from openai import OpenAI

# Read the key from the environment rather than hardcoding it in source.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a precise technical assistant."},
        {"role": "user", "content": "Summarize this bug report in 3 bullets."}
    ]
)

print(response.choices[0].message.content)

Use V4 Pro only after the Flash result fails a real quality threshold:

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "user", "content": "Analyze this refactor plan and identify the riskiest dependency."}
    ]
)
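
The "Flash first, Pro on failure" rule can be sketched as a small wrapper. This is one possible shape, not an official pattern: complete_with_escalation and passes_threshold are hypothetical names, and the quality check is whatever your own evals define. The client argument is assumed to be an OpenAI SDK client configured as shown earlier.

```python
FLASH = "deepseek-v4-flash"
PRO = "deepseek-v4-pro"


def complete_with_escalation(client, messages, passes_threshold):
    """Try Flash first; re-run on Pro only if the Flash answer fails the check.

    client: any object exposing the OpenAI SDK's chat.completions.create().
    passes_threshold: hypothetical, app-specific callable (str -> bool).
    Returns (model_used, answer_text).
    """
    answer = ""
    for model in (FLASH, PRO):
        response = client.chat.completions.create(model=model, messages=messages)
        answer = response.choices[0].message.content or ""
        if passes_threshold(answer):
            return model, answer
    # Pro also failed the check: return its answer so the caller can decide.
    return PRO, answer
```

A call might look like complete_with_escalation(client, messages, lambda text: len(text) > 200), although a real threshold would normally come from an eval set rather than answer length. Logging the returned model name shows how often requests actually escalate.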

Node.js Setup

Install the SDK:

npm install openai

Call V4 Flash:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: "https://api.deepseek.com"
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [
    { role: "user", content: "Classify this support ticket as low, medium, or high priority." }
  ]
});

console.log(response.choices[0].message.content);

Thinking Mode

DeepSeek V4 supports thinking and non-thinking modes. The old model name deepseek-reasoner currently maps to V4 Flash thinking mode, but DeepSeek says old aliases will be deprecated. New code should choose explicit V4 model names and mode settings according to the current docs.

| Workload | Start with | Escalate when |
| --- | --- | --- |
| Support classification | V4 Flash non-thinking | Policy edge cases fail |
| RAG answer drafting | V4 Flash | Source synthesis fails |
| Coding agent step | V4 Flash | Planning or debugging is weak |
| Long-context review | V4 Flash | The document has high-stakes reasoning |
| Complex math or planning | V4 Pro | Keep Pro if it wins in evals |
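
The workload table above can be encoded as a static routing map so the starting model is chosen in one place. The workload keys and the pick_model helper are hypothetical names made up for this sketch; only the model names come from the article.

```python
# Starting model per workload, taken from the table above.
# The dictionary keys are hypothetical labels for this sketch.
START_MODEL = {
    "support_classification": "deepseek-v4-flash",
    "rag_answer_drafting": "deepseek-v4-flash",
    "coding_agent_step": "deepseek-v4-flash",
    "long_context_review": "deepseek-v4-flash",
    "complex_math_or_planning": "deepseek-v4-pro",
}


def pick_model(workload, escalate=False):
    """Return the starting model for a workload, or Pro when escalating."""
    if escalate:
        return "deepseek-v4-pro"
    # Unknown workloads fall back to Flash, the article's default.
    return START_MODEL.get(workload, "deepseek-v4-flash")
```

Keeping the map in configuration rather than scattered across call sites makes it cheap to move a workload back to Flash if an eval shows Pro is not winning.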

Context Caching

DeepSeek context caching is enabled by default. The official docs say overlapping prefixes can be fetched from disk cache and billed as cache hits.

| Field | Meaning |
| --- | --- |
| prompt_cache_hit_tokens | Input tokens billed at cache-hit price |
| prompt_cache_miss_tokens | Input tokens billed at cache-miss price |
| V4 Flash cache hit price | $0.0028 per 1M input tokens |
| V4 Flash cache miss price | $0.14 per 1M input tokens |
| Cache caveat | Best-effort, not guaranteed |

Example cost for 1M repeated input tokens on V4 Flash:

| Cache state | Cost |
| --- | --- |
| Cache miss | $0.14 |
| Cache hit | $0.0028 |
| Savings | 98% lower input cost |
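
To see the same arithmetic on real traffic, the two cache fields can be read from each response's usage object. The sketch below assumes the fields appear as attributes on response.usage when using the OpenAI SDK; cache_report is a hypothetical helper name, and the prices are the V4 Flash figures from the table above.

```python
FLASH_HIT_PER_M = 0.0028   # USD per 1M cache-hit input tokens
FLASH_MISS_PER_M = 0.14    # USD per 1M cache-miss input tokens


def cache_report(usage):
    """Summarize cache behaviour from a response's usage object.

    getattr() with a default of 0 keeps this safe if a field is absent,
    which is an assumption about some gateways, not documented behaviour.
    """
    hit = getattr(usage, "prompt_cache_hit_tokens", 0) or 0
    miss = getattr(usage, "prompt_cache_miss_tokens", 0) or 0
    total = hit + miss
    actual = (hit * FLASH_HIT_PER_M + miss * FLASH_MISS_PER_M) / 1_000_000
    all_miss = total * FLASH_MISS_PER_M / 1_000_000
    return {
        "hit_tokens": hit,
        "miss_tokens": miss,
        "hit_ratio": hit / total if total else 0.0,
        "input_cost_usd": actual,
        # Fraction saved compared with paying the cache-miss price for everything.
        "saved_vs_all_miss": 1 - actual / all_miss if all_miss else 0.0,
    }
```

Calling cache_report(response.usage) after each request and logging the result gives a running hit ratio. Because caching is best-effort, a falling ratio is the first sign that a prompt prefix has stopped being reused.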

Migration From Old Model Names

| Old pattern | Current action |
| --- | --- |
| deepseek-chat | Replace with deepseek-v4-flash non-thinking |
| deepseek-reasoner | Replace with explicit V4 thinking mode when needed |
| Old R1 price table | Treat as historical, not current production default |
| V3.2 calculator | Update to V4 Flash/Pro and cache-hit fields |
| 128K context assumption | Update to 1M context for current V4 models |

DeepSeek says deepseek-chat and deepseek-reasoner will be retired after 2026-07-24 15:59 UTC. Do not wait for the deadline if your app is in production.
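
One low-risk way to start the migration is a small shim that rewrites legacy names before a request is sent. This is a sketch under the alias mapping stated above; migrate_model_name is a hypothetical helper, and how thinking mode is switched on is deliberately left out because it depends on the current DeepSeek docs.

```python
# Legacy alias -> (explicit V4 model, whether the alias implied thinking mode).
LEGACY_ALIASES = {
    "deepseek-chat": ("deepseek-v4-flash", False),
    "deepseek-reasoner": ("deepseek-v4-flash", True),
}


def migrate_model_name(name):
    """Return (explicit_model, wants_thinking) for a possibly legacy name.

    wants_thinking is None when the name is not a legacy alias, meaning
    the caller's existing mode setting should be left unchanged.
    """
    if name in LEGACY_ALIASES:
        return LEGACY_ALIASES[name]
    return name, None
```

Logging every time the shim rewrites a name helps locate the remaining callers that still send an alias.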

Production Checks

| Check | Why it matters |
| --- | --- |
| Log model name | Avoid silent alias confusion |
| Log cache hit/miss tokens | Cache drives real cost |
| Set max output tokens | Output still dominates many bills |
| Add retry limits | Retries multiply token spend |
| Run Flash vs Pro eval | Pro is cheap during promo, but still pricier than Flash |
| Add fallback | Use TokenMix.ai or another gateway when uptime matters |
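
The retry-limit check can be illustrated with a bounded retry wrapper. This is a generic sketch rather than DeepSeek-specific guidance: call_with_retry_limit and its parameters are hypothetical, and production code should catch the SDK's specific error types instead of a bare Exception.

```python
import time


def call_with_retry_limit(send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() at most max_attempts times, with exponential backoff.

    send: zero-argument callable that performs one API request.
    sleep: injectable for tests; defaults to time.sleep.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except Exception as exc:  # narrow this to the SDK's error types in real code
            last_error = exc
            if attempt < max_attempts:
                # Waits base_delay, then 2x, 4x, ... between attempts.
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

A call could look like call_with_retry_limit(lambda: client.chat.completions.create(model="deepseek-v4-flash", messages=messages, max_tokens=512)), which also caps output tokens. The OpenAI SDK client retries on its own by default, so when wrapping it this way consider constructing the client with max_retries=0 to keep the two retry layers from multiplying.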

When TokenMix.ai Fits

Use direct DeepSeek when you only need DeepSeek and want the official account relationship. Use TokenMix.ai when DeepSeek is one route inside a broader production stack with OpenAI, Claude, Gemini, Qwen, Grok, or Kimi.

| Need | Best route |
| --- | --- |
| DeepSeek-only experiments | Direct DeepSeek API |
| Multi-model routing | TokenMix.ai |
| Alipay or WeChat Pay billing | TokenMix.ai |
| Fallback from DeepSeek to Claude/OpenAI | TokenMix.ai or another gateway |
| One OpenAI-compatible endpoint | TokenMix.ai |

FAQ

What is the current DeepSeek API model to use?

Use deepseek-v4-flash first. It is the economical V4 route and supports 1M context, tool calls, JSON output, and thinking or non-thinking modes.

How much does DeepSeek V4 Flash cost?

DeepSeek V4 Flash costs $0.0028 per 1M cache-hit input tokens, $0.14 per 1M cache-miss input tokens, and $0.28 per 1M output tokens.

How much does DeepSeek V4 Pro cost?

V4 Pro is discounted to $0.003625 cache-hit input, $0.435 cache-miss input, and $0.87 output per 1M tokens until 2026-05-31 15:59 UTC.

Should I still use deepseek-chat?

Not for new code. DeepSeek says deepseek-chat currently maps to V4 Flash non-thinking mode and will be deprecated. Use explicit V4 model names.

Should I still use deepseek-reasoner?

Not for new code. deepseek-reasoner currently maps to V4 Flash thinking mode, but new applications should use explicit V4 naming and mode controls.

Does DeepSeek cache require code changes?

Not for basic usage. DeepSeek says context caching is enabled by default. You should still log prompt_cache_hit_tokens and prompt_cache_miss_tokens to confirm the cache is actually being hit.

Is DeepSeek OpenAI-compatible?

Yes. DeepSeek lists an OpenAI-format base URL at https://api.deepseek.com, so the standard OpenAI SDK pattern works.

When should I use TokenMix.ai instead of direct DeepSeek?

Use TokenMix.ai when you need DeepSeek plus other providers, fallback routing, unified billing, or local payment methods such as Alipay and WeChat Pay.
