TokenMix Research Lab · 2026-06-27

DeepSeek Response API 2026: reasoning_content, JSON, TokenMix


Last Updated: 2026-06-27 · Author: TokenMix Research Lab
Data verified: 2026-06-27 against the DeepSeek API quick start, chat completion reference, thinking mode, JSON output, models and pricing, TokenMix docs, and TokenMix model catalog API

DeepSeek's response protocol is Chat Completions plus reasoning_content, not the OpenAI Responses API. TokenMix supports DeepSeek V4 Pro and Flash through one OpenAI-compatible endpoint.

The important change for developers is precise: DeepSeek's official docs now list deepseek-v4-flash and deepseek-v4-pro, with old deepseek-chat and deepseek-reasoner names deprecated on 2026-07-24 15:59 UTC (DeepSeek quick start). The API uses an OpenAI/Anthropic-compatible format, but the response shape is still /chat/completions: choices[0].message.content for the final answer, choices[0].message.reasoning_content for thinking output, tool_calls for function calls, and usage.completion_tokens_details.reasoning_tokens for reasoning-token accounting (DeepSeek chat completion reference). TokenMix's live model catalog confirms DeepSeek V4 Pro and DeepSeek V4 Flash with streaming, reasoning, JSON, tools, structured output, and prompt caching support, exposed through https://api.tokenmix.ai/v1 (TokenMix models, TokenMix API model catalog).


Quick Verdict

DeepSeek response compatibility is real, but the exact protocol is narrower than the marketing phrase: use Chat Completions, preserve reasoning_content when tools are involved, and do not assume OpenAI /responses endpoint parity.

| Claim | Status | Source |
| --- | --- | --- |
| DeepSeek uses an OpenAI/Anthropic-compatible API format | Confirmed | DeepSeek quick start |
| DeepSeek V4 supports deepseek-v4-flash and deepseek-v4-pro model IDs | Confirmed | DeepSeek quick start, DeepSeek pricing |
| deepseek-chat and deepseek-reasoner are deprecated on 2026-07-24 15:59 UTC | Confirmed | DeepSeek quick start |
| DeepSeek's response protocol is OpenAI /responses API compatible | False | Official endpoint is /chat/completions, not /responses |
| DeepSeek returns thinking output through reasoning_content | Confirmed | DeepSeek chat completion reference, Thinking Mode |
| DeepSeek V4 supports JSON Output with response_format: {"type":"json_object"} | Confirmed | DeepSeek JSON Output |
| DeepSeek V4 supports tool calls | Confirmed | DeepSeek pricing / model features, chat completion reference |
| TokenMix supports DeepSeek V4 Pro and Flash | Confirmed | TokenMix model catalog, TokenMix API model catalog |
| TokenMix exposes DeepSeek through one OpenAI-compatible base URL | Confirmed | TokenMix docs |
| TokenMix currently lists DeepSeek V4 Pro at $0.419118/M input and $0.838235/M output | Confirmed | TokenMix model catalog |
| TokenMix currently lists DeepSeek V4 Flash at $0.132353/M input and $0.264706/M output | Confirmed | TokenMix model catalog |
| reasoning_content can always be dropped in multi-turn tool workflows | False | DeepSeek says it must be passed back after tool calls in thinking mode |

What Changed in DeepSeek V4

The biggest DeepSeek API change is model naming and thinking-mode control: V4 Pro/Flash are now the primary IDs, while old chat and reasoner names are compatibility aliases heading toward deprecation.

DeepSeek's quick start lists deepseek-v4-flash, deepseek-v4-pro, deepseek-chat, and deepseek-reasoner, but adds that the old names will be deprecated on 2026-07-24 15:59 UTC. For compatibility, deepseek-chat maps to non-thinking mode of V4 Flash, while deepseek-reasoner maps to thinking mode of V4 Flash (DeepSeek quick start).

| Change | Old pattern | 2026 DeepSeek V4 pattern | Status |
| --- | --- | --- | --- |
| General model ID | deepseek-chat | deepseek-v4-flash or V4 non-thinking mode | Confirmed |
| Reasoning model ID | deepseek-reasoner | deepseek-v4-flash / deepseek-v4-pro with thinking enabled | Confirmed |
| Endpoint | /chat/completions | /chat/completions | Confirmed |
| Thinking toggle | Separate model name | thinking: {"type":"enabled"} or disabled | Confirmed |
| Reasoning effort | Limited older behavior | high or max; low/medium mapped to high | Confirmed |
| Context | Smaller prior-generation assumptions | 1M context | Confirmed |
| Max output | Older R1/V3 limits varied | Maximum 384K | Confirmed |
| Output protocol | content plus optional CoT | content + reasoning_content + tool calls + usage details | Confirmed |

The practical migration advice: stop hardcoding deepseek-reasoner in new apps. Use V4 model IDs and explicitly control thinking mode.
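A minimal sketch of that migration, assuming the compatibility mapping quoted above (deepseek-chat to V4 Flash in non-thinking mode, deepseek-reasoner to V4 Flash in thinking mode). The helper name migrate_model and the shape of the returned kwargs are illustrative, not part of any SDK; extra_body follows the OpenAI Python SDK convention for passing provider-specific fields.

```python
# Legacy name -> (V4 model ID, thinking type), per the compatibility note above.
LEGACY_MODEL_MAP = {
    "deepseek-chat": ("deepseek-v4-flash", "disabled"),
    "deepseek-reasoner": ("deepseek-v4-flash", "enabled"),
}


def migrate_model(model: str, default_thinking: str = "enabled") -> dict:
    """Return request kwargs with a V4 model ID and an explicit thinking toggle.

    Hypothetical helper: names that are not legacy aliases pass through
    unchanged and get `default_thinking`.
    """
    if model in LEGACY_MODEL_MAP:
        new_model, thinking = LEGACY_MODEL_MAP[model]
    else:
        new_model, thinking = model, default_thinking
    return {"model": new_model, "extra_body": {"thinking": {"type": thinking}}}


print(migrate_model("deepseek-reasoner"))
print(migrate_model("deepseek-v4-pro"))
```

The returned dictionary can be splatted into a chat completion call, so the thinking setting is always visible at the call site instead of being implied by the model name.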

Confirmed Protocol Map

DeepSeek's response protocol is best described as "OpenAI Chat Completions-compatible with DeepSeek-specific thinking fields."

That phrasing matters. Many developers search for "DeepSeek response API" because they expect OpenAI's newer /responses endpoint. DeepSeek's official API reference says the endpoint creates a model response for a chat conversation at /chat/completions, and returns a chat completion object (DeepSeek chat completion reference).

| Protocol surface | DeepSeek behavior | OpenAI compatibility level | Status |
| --- | --- | --- | --- |
| Base URL | https://api.deepseek.com | SDK-compatible by changing base URL | Confirmed |
| Endpoint | /chat/completions | Chat Completions-compatible | Confirmed |
| Anthropic format | https://api.deepseek.com/anthropic | Anthropic-compatible surface | Confirmed |
| Final answer | choices[0].message.content | OpenAI-style | Confirmed |
| Thinking output | choices[0].message.reasoning_content | DeepSeek-specific extension | Confirmed |
| Tool calls | choices[0].message.tool_calls | OpenAI-style function tool calls | Confirmed |
| JSON mode | response_format: {"type":"json_object"} | OpenAI-style JSON mode | Confirmed |
| Streaming | SSE chunks when stream: true | OpenAI-style SSE pattern | Confirmed |
| Usage | prompt_tokens, completion_tokens, cache hit/miss, reasoning tokens | OpenAI-like plus DeepSeek fields | Confirmed |
| OpenAI /responses | No official support found | Not confirmed | False / Unknown |

If your SDK only parses message.content, it may silently ignore the most important DeepSeek field: reasoning_content. That is where many "OpenAI-compatible" integrations break in subtle ways.

TokenMix Support Map

TokenMix support is confirmed at the catalog and documentation layer: the site exposes DeepSeek through https://api.tokenmix.ai/v1 and lists V4 Pro/Flash with reasoning, streaming, JSON and tools enabled.

The TokenMix docs say developers can use one TokenMix API key through an OpenAI-compatible base URL. The live TokenMix model catalog lists DeepSeek V4 Pro and DeepSeek V4 Flash as top models, while the public API model catalog marks both with support_reasoning=true, support_streaming=true, support_json=true, support_tools=true, support_structured_output=true, and support_prompt_caching=true.

| TokenMix model | Context | Max output | Input / 1M | Output / 1M | Reasoning | JSON | Tools | Streaming |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| deepseek/deepseek-v4-pro | 1,000,000 | 384,000 | $0.419118 | $0.838235 | Yes | Yes | Yes | Yes |
| deepseek/deepseek-v4-flash | 1,000,000 | 384,000 | $0.132353 | $0.264706 | Yes | Yes | Yes | Yes |

These are TokenMix catalog prices and feature flags, not a replacement for DeepSeek's direct billing table. DeepSeek direct pricing separates cache-hit and cache-miss input tokens, while TokenMix currently presents model catalog input/output rates for routing through TokenMix.
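The feature flags named above can be checked programmatically before routing traffic. The sketch below filters a list of catalog entries by required flags. The flag names come from the catalog description above, but the entry shape (a list of dictionaries with an "id" key) and the function name are assumptions for illustration; the real catalog response may be structured differently.

```python
REQUIRED_FLAGS = (
    "support_reasoning",
    "support_streaming",
    "support_json",
    "support_tools",
)


def models_supporting(catalog, required=REQUIRED_FLAGS):
    """Return the IDs of catalog entries where every required flag is True.

    `catalog` is assumed to be a list of dicts with an "id" key plus
    boolean support_* flags (hypothetical shape).
    """
    return [
        entry["id"]
        for entry in catalog
        if all(entry.get(flag) is True for flag in required)
    ]


# Hand-written sample entries, not a real API response.
sample_catalog = [
    {"id": "deepseek/deepseek-v4-pro", "support_reasoning": True,
     "support_streaming": True, "support_json": True, "support_tools": True},
    {"id": "example/no-tools-model", "support_reasoning": True,
     "support_streaming": True, "support_json": True, "support_tools": False},
]

print(models_supporting(sample_catalog))
```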

For broader routing strategy, use AI API Gateway 2026 and TokenMix vs OpenRouter vs Portkey vs LiteLLM. DeepSeek response support is one protocol detail; production reliability still needs fallback, logging, and budget controls.

Response Object Anatomy

The DeepSeek response object has four fields developers should parse deliberately: content, reasoning_content, tool_calls, and usage details.

The official schema shows message.content for the final answer, message.reasoning_content for thinking-mode output, message.tool_calls for function calls, and usage.completion_tokens_details.reasoning_tokens for reasoning-token accounting (DeepSeek chat completion reference).

| Field | Where it appears | What it means | Parse it? |
| --- | --- | --- | --- |
| choices[0].message.content | Response message | Final answer shown to user | Always |
| choices[0].message.reasoning_content | Response message | Thinking output before final answer | Yes, if using thinking mode |
| choices[0].message.tool_calls | Response message | Function/tool call payloads | Yes, if tools enabled |
| choices[0].finish_reason | Choice | Why generation stopped | Always |
| usage.prompt_tokens | Usage | Input tokens billed | Always |
| usage.prompt_cache_hit_tokens | Usage | Input tokens served from cache | For cost accounting |
| usage.prompt_cache_miss_tokens | Usage | Input tokens not served from cache | For cost accounting |
| usage.completion_tokens_details.reasoning_tokens | Usage | Reasoning tokens inside output | For reasoning cost/debug |

Minimal response parser:

def parse_deepseek_chat_completion(response):
    """Pull the final answer, reasoning, tool calls, and usage from one response."""
    choice = response.choices[0]
    message = choice.message
    # getattr with a default guards against SDK objects that omit optional fields.
    return {
        "answer": getattr(message, "content", None),
        "reasoning": getattr(message, "reasoning_content", None),
        "tool_calls": getattr(message, "tool_calls", None),
        "finish_reason": choice.finish_reason,
        "usage": getattr(response, "usage", None),
    }

The safe rule: never assume a response has only one text field. In thinking mode, DeepSeek V4 can return both reasoning_content (the model's reasoning) and content (the final answer).
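The usage fields in the table above are worth summarizing per request as well. The following sketch works on a plain dictionary using the field names listed above; the function name and the derived cache_hit_rate and answer_tokens values are illustrative additions, and it assumes reasoning tokens are counted inside completion_tokens.

```python
def summarize_usage(usage: dict) -> dict:
    """Condense a usage dict into the numbers needed for cost accounting."""
    details = usage.get("completion_tokens_details") or {}
    hit = usage.get("prompt_cache_hit_tokens") or 0
    miss = usage.get("prompt_cache_miss_tokens") or 0
    completion = usage.get("completion_tokens") or 0
    reasoning = details.get("reasoning_tokens") or 0
    return {
        "cache_hit_tokens": hit,
        "cache_miss_tokens": miss,
        # Share of cache-accounted input tokens that were served from cache.
        "cache_hit_rate": hit / (hit + miss) if (hit + miss) else 0.0,
        "reasoning_tokens": reasoning,
        # Assumption: reasoning tokens are a subset of completion tokens.
        "answer_tokens": max(completion - reasoning, 0),
    }


example = {
    "prompt_tokens": 1000,
    "completion_tokens": 300,
    "prompt_cache_hit_tokens": 700,
    "prompt_cache_miss_tokens": 300,
    "completion_tokens_details": {"reasoning_tokens": 200},
}
print(summarize_usage(example))
```

Logging this summary next to each request makes it easier to see whether cost is driven by cache misses or by reasoning output.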

Thinking Mode and reasoning_content

Thinking mode is enabled by default in DeepSeek V4, and reasoning_content handling changes when tool calls are involved.

DeepSeek's thinking-mode docs say thinking: {"type":"enabled"} controls thinking mode, while reasoning_effort accepts high or max; compatibility mappings convert low and medium to high, and xhigh to max (Thinking Mode). The docs also say temperature, top_p, presence_penalty, and frequency_penalty have no effect in thinking mode.
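As a hedged illustration of those rules, the sketch below normalizes the effort value on the client side and strips the sampling parameters that have no effect in thinking mode. The API is described as applying the same effort mapping itself, so this only makes the effective settings explicit in your own logs; the function name and the decision to drop (rather than warn about) sampling parameters are assumptions.

```python
# Effort mapping as described in the thinking-mode docs cited above.
EFFORT_ALIASES = {
    "low": "high",
    "medium": "high",
    "high": "high",
    "xhigh": "max",
    "max": "max",
}

# Parameters reported to have no effect in thinking mode.
NO_EFFECT_IN_THINKING = ("temperature", "top_p", "presence_penalty", "frequency_penalty")


def thinking_request_kwargs(effort: str = "high", **kwargs) -> dict:
    """Build chat-completion kwargs for thinking mode (hypothetical helper)."""
    if effort not in EFFORT_ALIASES:
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    cleaned = {k: v for k, v in kwargs.items() if k not in NO_EFFECT_IN_THINKING}
    cleaned["reasoning_effort"] = EFFORT_ALIASES[effort]
    # extra_body is the OpenAI Python SDK hook for provider-specific fields.
    cleaned["extra_body"] = {"thinking": {"type": "enabled"}}
    return cleaned


print(thinking_request_kwargs("medium", model="deepseek-v4-pro", temperature=0.2))
```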

| Workflow | What to do with reasoning_content | Status |
| --- | --- | --- |
| Single-turn reasoning | Read it if you want visible reasoning/debug | Confirmed |
| Multi-turn, no tool call | Do not need to pass previous CoT back | Confirmed |
| Multi-turn with tool call | Pass the intermediate reasoning_content back in subsequent turns | Confirmed |
| Legacy deepseek-reasoner | Use only for compatibility before deprecation | Confirmed |
| V4 thinking mode | Use deepseek-v4-pro or Flash plus thinking toggle | Confirmed |
| Thinking + temperature tuning | Do not rely on temperature/top_p | Confirmed |

This is the most important implementation caveat. A generic OpenAI Chat Completions wrapper may drop unknown fields between turns. That can break DeepSeek thinking-mode tool workflows because DeepSeek explicitly requires reasoning content to be passed back after tool calls.
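One way to encode the rule is a small helper that builds the assistant entry for the message history. This is a sketch under the rules stated above (keep reasoning_content when the turn produced tool calls, omit it otherwise); the function name is hypothetical and the message is treated as a plain dictionary rather than an SDK object.

```python
def assistant_history_message(message: dict) -> dict:
    """Build the assistant entry to append to `messages` for the next turn."""
    entry = {"role": "assistant", "content": message.get("content")}
    tool_calls = message.get("tool_calls")
    if tool_calls:
        entry["tool_calls"] = tool_calls
        # Tool-call turn: the intermediate reasoning must travel with the history.
        reasoning = message.get("reasoning_content")
        if reasoning is not None:
            entry["reasoning_content"] = reasoning
    # No tool call: previous reasoning does not need to be sent back.
    return entry


tool_turn = {
    "content": None,
    "reasoning_content": "I need the current weather before answering.",
    "tool_calls": [{"id": "call_1", "type": "function",
                    "function": {"name": "get_weather", "arguments": "{}"}}],
}
print(assistant_history_message(tool_turn))
```

A wrapper that rebuilds history from content alone would silently fail the tool-call case, which is why the branch is explicit here.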

JSON, Tools, Streaming, and Cache

DeepSeek V4 supports JSON output, function tools, SSE streaming, and cache accounting, but each feature has one catch.

JSON Output uses response_format: {"type":"json_object"} and requires you to tell the model to output JSON in the prompt; DeepSeek warns that JSON output can occasionally return empty content and should be mitigated through prompt changes (DeepSeek JSON Output). Streaming uses data-only SSE chunks and can include a final usage chunk when stream_options.include_usage is set (DeepSeek chat completion reference).
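Because JSON Output can occasionally come back empty, it helps to treat an empty or unparsable body as a recoverable case rather than letting json.loads raise in the request path. A minimal sketch, with a hypothetical function name and a None return chosen here to signal "retry or adjust the prompt":

```python
import json


def parse_json_output(content):
    """Decode JSON-mode content; return None when it is empty or invalid."""
    if content is None or not content.strip():
        return None  # empty content: caller can retry with a clearer prompt
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        return None


print(parse_json_output('{"checklist": ["content", "reasoning_content"]}'))
print(parse_json_output(""))
```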

| Feature | Request parameter | Catch | Status |
| --- | --- | --- | --- |
| JSON Output | response_format: {"type":"json_object"} | Prompt must include JSON instruction | Confirmed |
| Tool calls | tools, tool_choice | Validate generated arguments before executing | Confirmed |
| Strict tool schema | strict: true in tool definition | Beta feature | Confirmed |
| Streaming | stream: true | Parse SSE deltas and [DONE] | Confirmed |
| Stream usage | stream_options.include_usage | Usage appears in an extra chunk | Confirmed |
| Prompt caching | cache hit/miss usage fields | Cost depends on cache hit rate | Confirmed |
| FIM completion | non-thinking mode only | Not for thinking mode | Confirmed |
| Chat prefix completion | beta base URL required direct to DeepSeek | Route support may vary | Likely |
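For the tool-call row, validation can be as simple as decoding the arguments string and checking names and types before anything runs. The sketch below assumes OpenAI-style tool calls, where function.arguments is a JSON string; the helper name and the flat name-to-type schema are simplifications for illustration, not a full JSON Schema validator.

```python
import json


def validate_tool_arguments(raw_arguments: str, expected: dict) -> dict:
    """Decode tool-call arguments and check them against `expected`.

    `expected` maps parameter name -> Python type (hypothetical, flat schema).
    Raises ValueError on invalid JSON, unknown, missing, or mistyped parameters.
    """
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must decode to a JSON object")
    unknown = set(args) - set(expected)
    if unknown:
        raise ValueError(f"unexpected parameters: {sorted(unknown)}")
    for name, expected_type in expected.items():
        if name not in args:
            raise ValueError(f"missing parameter: {name}")
        if not isinstance(args[name], expected_type):
            raise ValueError(f"wrong type for parameter: {name}")
    return args


print(validate_tool_arguments('{"city": "Paris", "days": 3}', {"city": str, "days": int}))
```

Rejecting unknown parameters is a deliberate choice here, since hallucinated arguments are one of the failure modes the docs warn about.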

For Node.js streaming patterns, see Node.js AI API 2026. The same OpenAI-compatible client shape works, but the DeepSeek-specific fields still need explicit parsing.
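In Python, the same explicit parsing for a stream might look like the sketch below, which consumes raw SSE lines and accumulates the answer, the reasoning, and the optional usage chunk. It assumes streamed deltas carry reasoning_content under the same name as the non-streamed message and that the usage chunk arrives with an empty choices list; both should be checked against the chat completion reference. The function name is hypothetical.

```python
import json


def accumulate_sse(lines):
    """Collect answer text, reasoning text, and usage from data-only SSE lines."""
    answer, reasoning, usage = [], [], None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("reasoning_content"):
                reasoning.append(delta["reasoning_content"])
            if delta.get("content"):
                answer.append(delta["content"])
    return {"answer": "".join(answer), "reasoning": "".join(reasoning), "usage": usage}


# Hand-written sample stream, not captured from a live API.
sample_stream = [
    'data: {"choices":[{"delta":{"reasoning_content":"Think "}}]}',
    'data: {"choices":[{"delta":{"reasoning_content":"hard."}}]}',
    'data: {"choices":[{"delta":{"content":"Done"}}]}',
    'data: {"choices":[],"usage":{"completion_tokens":5}}',
    "data: [DONE]",
]
print(accumulate_sse(sample_stream))
```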

Cost Math

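DeepSeek direct pricing is cache-sensitive: input tokens are billed at different rates for cache hits and cache misses. TokenMix pricing is a single catalog input rate and output rate for traffic routed through TokenMix. Do not mix the two tables without labeling which bill you are estimating.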

DeepSeek direct prices per 1M tokens are: Flash cache hit $0.0028, Flash cache miss $0.14, Flash output $0.28; Pro cache hit $0.003625, Pro cache miss $0.435, Pro output $0.87 (DeepSeek pricing). TokenMix currently lists Flash at $0.132353 input and $0.264706 output, and Pro at $0.419118 input and $0.838235 output in the live model catalog.

| Workload | Direct DeepSeek Flash, no cache | Direct DeepSeek Pro, no cache | TokenMix Flash catalog | TokenMix Pro catalog |
| --- | --- | --- | --- | --- |
| 10M input + 2M output | $1.96 | $6.09 | $1.85 | $5.87 |
| 100M input + 20M output | $19.60 | $60.90 | $18.53 | $58.68 |
| 1B input + 200M output | $196.00 | $609.00 | $185.29 | $586.76 |

Cost calculation 1: a 10M input / 2M output monthly workload on direct DeepSeek Flash with no cache costs 10 x $0.14 + 2 x $0.28 = $1.96.

Cost calculation 2: the same workload on TokenMix Flash catalog rates costs 10 x $0.132353 + 2 x $0.264706 = $1.85. The difference is small; the main value is unified routing and one API key, not a magical free tier.

Cost calculation 3: if 70% of direct DeepSeek Flash input hits cache, 10M input becomes 7M cache-hit and 3M cache-miss. Input cost is 7 x $0.0028 + 3 x $0.14 = $0.4396, plus 2 x $0.28 = $0.56 output, so total is $0.9996. Cache hit rate can cut the direct bill roughly in half for repeat-heavy workloads.
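The three calculations above follow one formula, sketched here so other workloads can be plugged in. Rates are the per-1M-token figures quoted in this section; the function name and parameter names are illustrative.

```python
def monthly_cost(input_m, output_m, input_rate, output_rate,
                 cache_hit_rate=0.0, cache_hit_input_rate=None):
    """Estimate cost in USD for token volumes given in millions of tokens.

    input_rate is the cache-miss (or flat) input price per 1M tokens;
    cache_hit_input_rate applies to the cached share of input, if any.
    """
    if cache_hit_input_rate is None:
        cache_hit_input_rate = input_rate
    hit_m = input_m * cache_hit_rate
    miss_m = input_m - hit_m
    return hit_m * cache_hit_input_rate + miss_m * input_rate + output_m * output_rate


direct_flash = monthly_cost(10, 2, 0.14, 0.28)
tokenmix_flash = monthly_cost(10, 2, 0.132353, 0.264706)
cached_flash = monthly_cost(10, 2, 0.14, 0.28,
                            cache_hit_rate=0.7, cache_hit_input_rate=0.0028)

print(f"direct Flash, no cache: ${direct_flash:.2f}")
print(f"TokenMix Flash catalog: ${tokenmix_flash:.2f}")
print(f"direct Flash, 70% cache hits: ${cached_flash:.4f}")
```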

| Cost lever | Direct DeepSeek impact | TokenMix impact | Confidence |
| --- | --- | --- | --- |
| Prompt caching | Large if repeated prompts hit cache | Prompt caching supported in catalog | Confirmed |
| Flash vs Pro | Flash is ~3.1x cheaper than Pro no-cache direct | Flash is ~3.17x cheaper than Pro in catalog | Confirmed |
| Thinking mode | Reasoning tokens can increase output usage | Track output and reasoning tokens | Confirmed |
| Streaming | No price change by itself | UX/latency benefit | Confirmed |
| Tool loops | Can multiply calls and tokens | Add budget caps | Confirmed |

For broader token planning, use Token Counting Guide 2026 and DeepSeek 5M Free Tokens.

Code Examples

Use V4 model IDs and explicitly parse reasoning_content. That is the difference between basic compatibility and correct DeepSeek support.

cURL through TokenMix (in a raw request, thinking is a top-level JSON field; extra_body is an OpenAI Python SDK argument that merges its keys into the request body, not a field of its own):

curl https://api.tokenmix.ai/v1/chat/completions \
  -H "Authorization: Bearer $TOKENMIX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-pro",
    "messages": [
      {"role": "user", "content": "Explain why cache hit tokens change DeepSeek API cost."}
    ],
    "reasoning_effort": "high",
    "thinking": {"type": "enabled"},
    "stream": false
  }'

Python with OpenAI SDK through TokenMix:

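import os

from openai import OpenAI

# Read the key from the environment instead of hardcoding it.
client = OpenAI(
    api_key=os.environ["TOKENMIX_API_KEY"],
    base_url="https://api.tokenmix.ai/v1",
)

response = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "Return concise answers."},
        # JSON Output needs a JSON instruction in the prompt as well as response_format.
        {"role": "user", "content": "Give me a JSON checklist for DeepSeek response parsing."},
    ],
    response_format={"type": "json_object"},
    reasoning_effort="high",
    # thinking is not a standard OpenAI parameter, so it is passed through extra_body.
    extra_body={"thinking": {"type": "enabled"}},
)

message = response.choices[0].message
# reasoning_content is a DeepSeek extension, so read it defensively.
print("reasoning:", getattr(message, "reasoning_content", None))
print("answer:", message.content)
print("usage:", response.usage)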

Decision helper:

def choose_deepseek_route(workload):
    """Return a routing suggestion for a workload dict of boolean flags.

    Checks run in priority order; the first matching flag wins.
    """
    if workload.get("needs_best_reasoning"):
        return "Use deepseek/deepseek-v4-pro with thinking enabled."
    # Reached only when best-in-class reasoning is not required.
    if workload.get("high_volume"):
        return "Use deepseek/deepseek-v4-flash and track cache hit rate."
    if workload.get("needs_tools"):
        return "Use V4 with tools, but preserve reasoning_content after tool calls."
    if workload.get("legacy_model_name"):
        return "Migrate away from deepseek-chat/deepseek-reasoner before 2026-07-24."
    return "Start with Flash, then benchmark Pro only where answers fail."

Decision Matrix

Use DeepSeek V4 Flash for cost-sensitive production, V4 Pro for hard reasoning, and TokenMix when one OpenAI-compatible endpoint matters more than direct-provider wiring.

| If your priority is... | Pick | Why |
| --- | --- | --- |
| Lowest cost general chat | DeepSeek V4 Flash | Lowest confirmed V4 rate |
| Complex reasoning | DeepSeek V4 Pro | Higher capability tier |
| Agent tool loops | V4 Pro or Flash with tools | Tool calls supported |
| JSON extraction | V4 Flash with JSON Output | Cheap and structured |
| One API key across vendors | TokenMix route | Same endpoint for DeepSeek plus other models |
| OpenAI SDK reuse | TokenMix or direct DeepSeek | Both use OpenAI-compatible client shape |
| Cache-heavy workloads | Direct DeepSeek or route with cache awareness | Cache hit/miss changes input cost |
| Legacy app using deepseek-reasoner | Migrate to V4 IDs | Old names deprecate on 2026-07-24 |

If you are comparing provider gateways rather than DeepSeek itself, read TokenMix vs OpenRouter vs Portkey vs LiteLLM. DeepSeek protocol support is one row in a larger routing decision.

Risks and Caveats

The biggest integration risk is silent field loss: generic OpenAI clients can parse the final answer while dropping reasoning_content, cache details, or reasoning-token usage.

| Risk | Likelihood | Status | Fix |
| --- | --- | --- | --- |
| Treating DeepSeek as OpenAI /responses | High | False assumption | Use /chat/completions |
| Dropping reasoning_content | High | Confirmed risk | Extend parser and message history logic |
| Keeping old model IDs after deprecation | Medium | Confirmed deadline | Migrate before 2026-07-24 |
| Passing CoT back incorrectly | Medium | Confirmed caveat | Follow no-tool vs tool-call rules |
| Ignoring cache hit/miss | High | Confirmed cost issue | Log both usage fields |
| Relying on temperature in thinking mode | Medium | Confirmed no-op | Use reasoning effort instead |
| Assuming JSON mode always returns content | Medium | Confirmed caveat | Prompt for JSON and handle empty content |
| Executing tool arguments unvalidated | High | Confirmed model caveat | Validate JSON and schema before execution |

The trust rule: OpenAI-compatible does not mean field-identical. Compatibility gets the request through the SDK; correctness requires parsing DeepSeek's extra fields.

Final Recommendation

Use deepseek-v4-flash for cheap high-volume work, deepseek-v4-pro for harder reasoning, and TokenMix when you want DeepSeek plus other vendors behind one OpenAI-compatible endpoint. Parse reasoning_content, track cache hit/miss tokens, migrate away from deepseek-chat and deepseek-reasoner before 2026-07-24, and do not call this OpenAI /responses compatibility until DeepSeek documents that endpoint.

FAQ

Does DeepSeek support the OpenAI Responses API?

No official DeepSeek /responses endpoint is documented. DeepSeek uses /chat/completions and returns a chat completion object with DeepSeek-specific reasoning fields.

What is DeepSeek reasoning_content?

reasoning_content is the thinking-mode output the model produces before its final answer. It sits at the same level of the message object as content, which holds the final answer.

Does TokenMix support DeepSeek response protocol?

TokenMix supports DeepSeek V4 Pro and DeepSeek V4 Flash through its OpenAI-compatible /v1/chat/completions endpoint. The live model catalog marks both as supporting reasoning, streaming, JSON, tools, structured output, and prompt caching.

Should I use deepseek-chat or deepseek-v4-flash?

Use deepseek-v4-flash for new work. DeepSeek says deepseek-chat and deepseek-reasoner will be deprecated on 2026-07-24 15:59 UTC, with compatibility mappings to V4 Flash modes.

How do I enable DeepSeek thinking mode?

Pass thinking: {"type":"enabled"} and use reasoning_effort: "high" or "max". In OpenAI SDKs, DeepSeek shows the thinking parameter inside extra_body.

Does DeepSeek JSON mode guarantee strict JSON?

DeepSeek JSON Output uses response_format: {"type":"json_object"} and is designed to produce valid JSON strings. DeepSeek still tells users to include JSON instructions in the prompt and notes that empty content can occasionally occur.

Does DeepSeek support function calling?

Yes, DeepSeek V4 supports tool calls. Validate the generated function arguments before executing tools, because the docs warn that models may produce invalid JSON or hallucinated parameters.

How much does DeepSeek V4 cost through TokenMix?

As of 2026-06-27, TokenMix lists DeepSeek V4 Flash at $0.132353/M input and $0.264706/M output, and DeepSeek V4 Pro at $0.419118/M input and $0.838235/M output. Always check the live model catalog before shipping.

About TokenMix

TokenMix.ai is an AI API relay and model access platform for teams that need one endpoint across OpenAI, Anthropic, Google, DeepSeek, Qwen, GLM and 300+ other models. Use TokenMix models to compare available models, TokenMix pricing to plan spend, and TokenMix docs to connect through an OpenAI-compatible API.
