TokenMix Research Lab · 2026-06-27

DeepSeek Response API 2026: reasoning_content, JSON, TokenMix

Last Updated: 2026-06-27 · Author: TokenMix Research Lab · Data verified: 2026-06-27 against the DeepSeek API quick start, chat completion reference, thinking mode, JSON output, models and pricing, TokenMix docs, and TokenMix model catalog API

DeepSeek's response protocol is Chat Completions plus reasoning_content, not the OpenAI Responses API. TokenMix supports DeepSeek V4 Pro and Flash through one OpenAI-compatible endpoint.

The key change for developers is specific: DeepSeek's official docs now list deepseek-v4-flash and deepseek-v4-pro, and the old deepseek-chat and deepseek-reasoner names will be deprecated on 2026-07-24 15:59 UTC (DeepSeek quick start). The API uses an OpenAI/Anthropic-compatible format, but the response shape is still that of /chat/completions: choices[0].message.content for the final answer, choices[0].message.reasoning_content for thinking output, tool_calls for function calls, and usage.completion_tokens_details.reasoning_tokens for reasoning-token accounting (DeepSeek chat completion reference). TokenMix's live model catalog confirms DeepSeek V4 Pro and DeepSeek V4 Flash with streaming, reasoning, JSON, tools, structured output, and prompt caching support, exposed through https://api.tokenmix.ai/v1 (TokenMix models, TokenMix API model catalog).

Quick Verdict
What Changed in DeepSeek V4
Confirmed Protocol Map
TokenMix Support Map
Response Object Anatomy
Thinking Mode and reasoning_content
JSON, Tools, Streaming, and Cache
Cost Math
Code Examples
Decision Matrix
Risks and Caveats
Final Recommendation
FAQ
About TokenMix
Sources
Related Articles

Quick Verdict

DeepSeek response compatibility is real, but the exact protocol is narrower than the marketing phrase: use Chat Completions, preserve reasoning_content when tools are involved, and do not assume OpenAI /responses endpoint parity.

Claim	Status	Source
DeepSeek uses an OpenAI/Anthropic-compatible API format	Confirmed	DeepSeek quick start
DeepSeek V4 supports `deepseek-v4-flash` and `deepseek-v4-pro` model IDs	Confirmed	DeepSeek quick start, DeepSeek pricing
`deepseek-chat` and `deepseek-reasoner` are deprecated on 2026-07-24 15:59 UTC	Confirmed	DeepSeek quick start
DeepSeek's response protocol is OpenAI `/responses` API compatible	False	Official endpoint is `/chat/completions`, not `/responses`
DeepSeek returns thinking output through `reasoning_content`	Confirmed	DeepSeek chat completion reference, Thinking Mode
DeepSeek V4 supports JSON Output with `response_format: {"type":"json_object"}`	Confirmed	DeepSeek JSON Output
DeepSeek V4 supports tool calls	Confirmed	DeepSeek pricing / model features, chat completion reference
TokenMix supports DeepSeek V4 Pro and Flash	Confirmed	TokenMix model catalog, TokenMix API model catalog
TokenMix exposes DeepSeek through one OpenAI-compatible base URL	Confirmed	TokenMix docs
TokenMix currently lists DeepSeek V4 Pro at $0.419118/M input and $0.838235/M output	Confirmed	TokenMix model catalog
TokenMix currently lists DeepSeek V4 Flash at $0.132353/M input and $0.264706/M output	Confirmed	TokenMix model catalog
`reasoning_content` can always be dropped in multi-turn tool workflows	False	DeepSeek says it must be passed back after tool calls in thinking mode

What Changed in DeepSeek V4

The biggest DeepSeek API change is model naming and thinking-mode control: V4 Pro/Flash are now the primary IDs, while old chat and reasoner names are compatibility aliases heading toward deprecation.

DeepSeek's quick start lists deepseek-v4-flash, deepseek-v4-pro, deepseek-chat, and deepseek-reasoner, but adds that the old names will be deprecated on 2026-07-24 15:59 UTC. For compatibility, deepseek-chat maps to non-thinking mode of V4 Flash, while deepseek-reasoner maps to thinking mode of V4 Flash (DeepSeek quick start).

Change	Old pattern	2026 DeepSeek V4 pattern	Status
General model ID	`deepseek-chat`	`deepseek-v4-flash` with thinking disabled	Confirmed
Reasoning model ID	`deepseek-reasoner`	`deepseek-v4-flash` / `deepseek-v4-pro` with thinking enabled	Confirmed
Endpoint	`/chat/completions`	`/chat/completions`	Confirmed
Thinking toggle	Separate model name	`thinking: {"type":"enabled"}` or disabled	Confirmed
Reasoning effort	Limited older behavior	`high` or `max`; low/medium mapped to high	Confirmed
Context	Smaller prior-generation assumptions	1M context	Confirmed
Max output	Older R1/V3 limits varied	Maximum 384K	Confirmed
Output protocol	`content` plus optional CoT	`content` + `reasoning_content` + tool calls + usage details	Confirmed

The practical migration advice: stop hardcoding deepseek-reasoner in new apps. Use V4 model IDs and explicitly control thinking mode.
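The migration can be sketched as a small lookup, using the compatibility mapping described above (deepseek-chat to V4 Flash non-thinking, deepseek-reasoner to V4 Flash thinking). The helper name `migrate_model_config` is hypothetical, and the `"disabled"` toggle value is assumed from the change table rather than copied from a verified request.

```python
# Legacy-name mapping taken from the compatibility note above; the
# "disabled" value for the thinking toggle is an assumption based on the
# change table, not a verified request payload.
LEGACY_MODEL_MAP = {
    "deepseek-chat": ("deepseek-v4-flash", "disabled"),
    "deepseek-reasoner": ("deepseek-v4-flash", "enabled"),
}


def migrate_model_config(model, thinking="enabled"):
    """Return (model_id, thinking_param) with legacy names replaced.

    Hypothetical helper, not part of any SDK. Non-legacy model IDs pass
    through unchanged with the requested thinking setting.
    """
    if model in LEGACY_MODEL_MAP:
        model, thinking = LEGACY_MODEL_MAP[model]
    return model, {"type": thinking}


print(migrate_model_config("deepseek-reasoner"))
print(migrate_model_config("deepseek-v4-pro", thinking="disabled"))
```

Keeping the mapping in one table makes it easy to delete once no caller still sends a legacy name.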

Confirmed Protocol Map

DeepSeek's response protocol is best described as "OpenAI Chat Completions-compatible with DeepSeek-specific thinking fields."

That phrasing matters. Many developers search for "DeepSeek response API" because they expect OpenAI's newer /responses endpoint. DeepSeek's official API reference says the endpoint creates a model response for a chat conversation at /chat/completions, and returns a chat completion object (DeepSeek chat completion reference).

Protocol surface	DeepSeek behavior	OpenAI compatibility level	Status
Base URL	`https://api.deepseek.com`	SDK-compatible by changing base URL	Confirmed
Endpoint	`/chat/completions`	Chat Completions-compatible	Confirmed
Anthropic format	`https://api.deepseek.com/anthropic`	Anthropic-compatible surface	Confirmed
Final answer	`choices[0].message.content`	OpenAI-style	Confirmed
Thinking output	`choices[0].message.reasoning_content`	DeepSeek-specific extension	Confirmed
Tool calls	`choices[0].message.tool_calls`	OpenAI-style function tool calls	Confirmed
JSON mode	`response_format: {"type":"json_object"}`	OpenAI-style JSON mode	Confirmed
Streaming	SSE chunks when `stream: true`	OpenAI-style SSE pattern	Confirmed
Usage	`prompt_tokens`, `completion_tokens`, cache hit/miss, reasoning tokens	OpenAI-like plus DeepSeek fields	Confirmed
OpenAI `/responses`	No official support found	Not confirmed	False / Unknown

If your SDK only parses message.content, it may silently ignore the most important DeepSeek field: reasoning_content. That is where many "OpenAI-compatible" integrations break in subtle ways.

TokenMix Support Map

TokenMix support is confirmed at the catalog and documentation layer: the site exposes DeepSeek through https://api.tokenmix.ai/v1 and lists V4 Pro/Flash with reasoning, streaming, JSON and tools enabled.

The TokenMix docs say developers can use one TokenMix API key through an OpenAI-compatible base URL. The live TokenMix model catalog lists DeepSeek V4 Pro and DeepSeek V4 Flash as top models, while the public API model catalog marks both with support_reasoning=true, support_streaming=true, support_json=true, support_tools=true, support_structured_output=true, and support_prompt_caching=true.

TokenMix model	Context	Max output	Input / 1M	Output / 1M	Reasoning	JSON	Tools	Streaming
`deepseek/deepseek-v4-pro`	1,000,000	384,000	$0.419118	$0.838235	Yes	Yes	Yes	Yes
`deepseek/deepseek-v4-flash`	1,000,000	384,000	$0.132353	$0.264706	Yes	Yes	Yes	Yes

These are TokenMix catalog prices and feature flags, not a replacement for DeepSeek's direct billing table. DeepSeek direct pricing separates cache-hit and cache-miss input tokens, while TokenMix currently presents model catalog input/output rates for routing through TokenMix.

For broader routing strategy, use AI API Gateway 2026 and TokenMix vs OpenRouter vs Portkey vs LiteLLM. DeepSeek response support is one protocol detail; production reliability still needs fallback, logging, and budget controls.

Response Object Anatomy

The DeepSeek response object has four fields developers should parse deliberately: content, reasoning_content, tool_calls, and usage details.

The official schema shows message.content for the final answer, message.reasoning_content for thinking-mode output, message.tool_calls for function calls, and usage.completion_tokens_details.reasoning_tokens for reasoning-token accounting (DeepSeek chat completion reference).

Field	Where it appears	What it means	Parse it?
`choices[0].message.content`	Response message	Final answer shown to user	Always
`choices[0].message.reasoning_content`	Response message	Thinking output before final answer	Yes, if using thinking mode
`choices[0].message.tool_calls`	Response message	Function/tool call payloads	Yes, if tools enabled
`choices[0].finish_reason`	Choice	Why generation stopped	Always
`usage.prompt_tokens`	Usage	Input tokens billed	Always
`usage.prompt_cache_hit_tokens`	Usage	Input tokens served from cache	For cost accounting
`usage.prompt_cache_miss_tokens`	Usage	Input tokens not served from cache	For cost accounting
`usage.completion_tokens_details.reasoning_tokens`	Usage	Reasoning tokens inside output	For reasoning cost/debug

Minimal response parser:

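def parse_deepseek_chat_completion(response):
    """Extract the DeepSeek-relevant fields from an SDK chat completion object."""
    choice = response.choices[0]
    message = choice.message
    # getattr with a default keeps the parser safe when a field is absent,
    # e.g. reasoning_content in non-thinking mode.
    return {
        "answer": getattr(message, "content", None),
        "reasoning": getattr(message, "reasoning_content", None),
        "tool_calls": getattr(message, "tool_calls", None),
        "finish_reason": choice.finish_reason,
        "usage": getattr(response, "usage", None),
    }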

The safe rule: never assume a response has only one text field. In thinking mode, DeepSeek can return both a reasoning field (reasoning_content) and a final-answer field (content).
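The usage fields in the table above can be reduced to the two numbers most teams log: cache hit rate and reasoning tokens. The following is a minimal sketch that assumes the usage payload has already been converted to a plain dict (with the OpenAI Python SDK, `response.usage.model_dump()` should do that). The helper name and the example numbers are illustrative, not real API output.

```python
def summarize_usage(usage):
    """Summarize a usage payload given as a plain dict.

    Field names follow the table above. Missing fields are treated as 0,
    which is an assumption about how to handle partial payloads.
    """
    hit = usage.get("prompt_cache_hit_tokens", 0) or 0
    miss = usage.get("prompt_cache_miss_tokens", 0) or 0
    details = usage.get("completion_tokens_details") or {}
    total_input = hit + miss
    return {
        "cache_hit_rate": hit / total_input if total_input else 0.0,
        "reasoning_tokens": details.get("reasoning_tokens", 0) or 0,
        "completion_tokens": usage.get("completion_tokens", 0) or 0,
    }


# Illustrative numbers only.
example_usage = {
    "prompt_tokens": 1000,
    "prompt_cache_hit_tokens": 700,
    "prompt_cache_miss_tokens": 300,
    "completion_tokens": 400,
    "completion_tokens_details": {"reasoning_tokens": 250},
}
print(summarize_usage(example_usage))
```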

Thinking Mode and reasoning_content

Thinking mode is enabled by default in DeepSeek V4, and reasoning_content handling changes when tool calls are involved.

DeepSeek's thinking-mode docs say thinking: {"type":"enabled"} controls thinking mode, while reasoning_effort accepts high or max; compatibility mappings convert low and medium to high, and xhigh to max (Thinking Mode). The docs also say temperature, top_p, presence_penalty, and frequency_penalty have no effect in thinking mode.
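The effort mapping and the no-op sampling parameters described above can be applied client-side before a request is sent. This is a sketch under stated assumptions: `build_thinking_params` is a hypothetical helper, and it places `thinking` inside `extra_body` because its output is meant to be passed as keyword arguments to the OpenAI Python SDK.

```python
# Mapping as described in the paragraph above: low/medium -> high, xhigh -> max.
EFFORT_ALIASES = {"low": "high", "medium": "high", "xhigh": "max"}
# Parameters the docs describe as having no effect in thinking mode.
NO_OP_IN_THINKING = ("temperature", "top_p", "presence_penalty", "frequency_penalty")


def build_thinking_params(reasoning_effort="high", **sampling):
    """Build request kwargs for thinking mode (hypothetical helper).

    No-op sampling parameters are dropped so callers do not rely on them
    by accident; every other keyword passes through unchanged.
    """
    effort = EFFORT_ALIASES.get(reasoning_effort, reasoning_effort)
    if effort not in ("high", "max"):
        raise ValueError(f"unsupported reasoning_effort: {reasoning_effort!r}")
    kept = {k: v for k, v in sampling.items() if k not in NO_OP_IN_THINKING}
    return {
        "reasoning_effort": effort,
        "extra_body": {"thinking": {"type": "enabled"}},
        **kept,
    }


print(build_thinking_params("medium", temperature=0.2, max_tokens=512))
```

Normalizing locally mirrors what the server-side mapping is documented to do, so logs show the effort level that actually applies.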

Workflow	What to do with `reasoning_content`	Status
Single-turn reasoning	Read it if you want visible reasoning/debug	Confirmed
Multi-turn, no tool call	Do not need to pass previous CoT back	Confirmed
Multi-turn with tool call	Pass the intermediate `reasoning_content` back in subsequent turns	Confirmed
Legacy `deepseek-reasoner`	Use only for compatibility before deprecation	Confirmed
V4 thinking mode	Use `deepseek-v4-pro` or Flash plus thinking toggle	Confirmed
Thinking + temperature tuning	Do not rely on temperature/top_p	Confirmed

This is the most important implementation caveat. A generic OpenAI Chat Completions wrapper may drop unknown fields between turns. That can break DeepSeek thinking-mode tool workflows because DeepSeek explicitly requires reasoning content to be passed back after tool calls.
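One way to avoid that field loss is to build history entries explicitly instead of relying on a generic wrapper. The sketch below works on plain dicts and follows the table above: keep reasoning_content when the assistant turn contains tool calls, drop it otherwise. The helper name is hypothetical, and the exact request shape DeepSeek expects should be checked against its Thinking Mode docs.

```python
def assistant_turn_for_history(message):
    """Convert an assistant message (plain dict) into a history entry.

    reasoning_content is kept only when the turn contains tool calls;
    for turns without tool calls it is dropped, per the table above.
    """
    entry = {"role": "assistant", "content": message.get("content")}
    tool_calls = message.get("tool_calls")
    if tool_calls:
        entry["tool_calls"] = tool_calls
        if message.get("reasoning_content") is not None:
            entry["reasoning_content"] = message["reasoning_content"]
    return entry


# Illustrative message; the tool name and arguments are made up.
tool_turn = {
    "content": None,
    "reasoning_content": "I need the weather before answering.",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
    }],
}
print(assistant_turn_for_history(tool_turn))
```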

JSON, Tools, Streaming, and Cache

DeepSeek V4 supports JSON output, function tools, SSE streaming, and cache accounting, but each feature has one catch.

JSON Output uses response_format: {"type":"json_object"} and requires you to tell the model to output JSON in the prompt; DeepSeek warns that JSON output can occasionally return empty content and should be mitigated through prompt changes (DeepSeek JSON Output). Streaming uses data-only SSE chunks and can include a final usage chunk when stream_options.include_usage is set (DeepSeek chat completion reference).

Feature	Request parameter	Catch	Status
JSON Output	`response_format: {"type":"json_object"}`	Prompt must include JSON instruction	Confirmed
Tool calls	`tools`, `tool_choice`	Validate generated arguments before executing	Confirmed
Strict tool schema	`strict: true` in tool definition	Beta feature	Confirmed
Streaming	`stream: true`	Parse SSE deltas and `[DONE]`	Confirmed
Stream usage	`stream_options.include_usage`	Usage appears in an extra chunk	Confirmed
Prompt caching	cache hit/miss usage fields	Cost depends on cache hit rate	Confirmed
FIM completion	non-thinking mode only	Not for thinking mode	Confirmed
Chat prefix completion	beta base URL required direct to DeepSeek	Route support may vary	Likely
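Because JSON Output can occasionally return empty content, it helps to treat the empty case as a distinct outcome rather than letting the JSON parser raise. A minimal sketch, with a hypothetical helper name, that returns an error string the caller can use to retry or adjust the prompt:

```python
import json


def parse_json_output(content):
    """Return (data, error) for a JSON-mode message.content string."""
    # Empty or whitespace-only content is the caveat DeepSeek warns about.
    if content is None or not content.strip():
        return None, "empty content"
    try:
        return json.loads(content), None
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"


print(parse_json_output('{"steps": ["parse content", "parse reasoning_content"]}'))
print(parse_json_output(""))
```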

For Node.js streaming patterns, see Node.js AI API 2026. The same OpenAI-compatible client shape works, but the DeepSeek-specific fields still need explicit parsing.
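In Python, the explicit parsing for streams can be sketched as an accumulator over chunks that have already been decoded from SSE into dicts. It assumes OpenAI-style chunks where text arrives in `choices[0].delta`, thinking output arrives in `delta.reasoning_content`, and the optional usage chunk carries a `usage` object with an empty `choices` list. Those shapes are assumptions to verify against a real stream, and the helper name is hypothetical.

```python
def accumulate_stream(chunks):
    """Merge already-decoded stream chunks (plain dicts) into one result."""
    reasoning, answer, usage = [], [], None
    for chunk in chunks:
        # The usage chunk, if requested, is assumed to have no choices.
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("reasoning_content"):
                reasoning.append(delta["reasoning_content"])
            if delta.get("content"):
                answer.append(delta["content"])
    return {
        "reasoning": "".join(reasoning),
        "answer": "".join(answer),
        "usage": usage,
    }


# Illustrative chunks only, not captured API output.
sample_chunks = [
    {"choices": [{"delta": {"reasoning_content": "Compare "}}]},
    {"choices": [{"delta": {"reasoning_content": "rates."}}]},
    {"choices": [{"delta": {"content": "Flash is cheaper."}}]},
    {"choices": [], "usage": {"completion_tokens": 12}},
]
print(accumulate_stream(sample_chunks))
```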

Cost Math

DeepSeek direct pricing is cache-sensitive; TokenMix pricing is a catalog rate for requests routed through TokenMix. Do not mix the two tables without labeling which bill you are estimating.

DeepSeek direct prices per 1M tokens are: Flash cache hit $0.0028, Flash cache miss $0.14, Flash output $0.28; Pro cache hit $0.003625, Pro cache miss $0.435, Pro output $0.87 (DeepSeek pricing). TokenMix currently lists Flash at $0.132353 input and $0.264706 output, and Pro at $0.419118 input and $0.838235 output in the live model catalog.

Workload	Direct DeepSeek Flash, no cache	Direct DeepSeek Pro, no cache	TokenMix Flash catalog	TokenMix Pro catalog
10M input + 2M output	$1.96	$6.09	$1.85	$5.87
100M input + 20M output	$19.60	$60.90	$18.53	$58.68
1B input + 200M output	$196.00	$609.00	$185.29	$586.76

Cost calculation 1: a 10M input / 2M output monthly workload on direct DeepSeek Flash with no cache costs 10 x $0.14 + 2 x $0.28 = $1.96.

Cost calculation 2: the same workload on TokenMix Flash catalog rates costs 10 x $0.132353 + 2 x $0.264706 = $1.85. The difference is small; the main value is unified routing and one API key, not a magical free tier.

Cost calculation 3: if 70% of direct DeepSeek Flash input hits cache, 10M input becomes 7M cache-hit and 3M cache-miss. Input cost is 7 x $0.0028 + 3 x $0.14 = $0.4396, plus 2 x $0.28 = $0.56 output, so total is $0.9996. Cache hit rate can cut the direct bill roughly in half for repeat-heavy workloads.
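The three calculations above follow one formula, sketched here as a function. Token counts are in millions and prices are per 1M tokens, taken from the figures quoted in this section; the function and parameter names are illustrative.

```python
def estimate_cost(input_m, output_m, input_price, output_price,
                  cache_hit_price=None, cache_hit_share=0.0):
    """Estimate cost in USD.

    input_m / output_m are token counts in millions; prices are per 1M
    tokens. input_price is the cache-miss (or flat catalog) input rate.
    When cache_hit_price is None, all input is billed at input_price.
    """
    if cache_hit_price is None:
        cache_hit_price, cache_hit_share = 0.0, 0.0
    hit_m = input_m * cache_hit_share
    miss_m = input_m - hit_m
    return hit_m * cache_hit_price + miss_m * input_price + output_m * output_price


# Calculation 1: direct DeepSeek Flash, no cache.
print(round(estimate_cost(10, 2, 0.14, 0.28), 4))
# Calculation 2: TokenMix Flash catalog rates.
print(round(estimate_cost(10, 2, 0.132353, 0.264706), 4))
# Calculation 3: direct DeepSeek Flash with 70% of input hitting cache.
print(round(estimate_cost(10, 2, 0.14, 0.28, cache_hit_price=0.0028, cache_hit_share=0.7), 4))
```

Keeping the cache-hit price optional makes it harder to apply cache math to a bill that only has flat catalog rates.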

Cost lever	Direct DeepSeek impact	TokenMix impact	Confidence
Prompt caching	Large if repeated prompts hit cache	Prompt caching supported in catalog	Confirmed
Flash vs Pro	Pro costs ~3.1x Flash at no-cache direct rates	Pro costs ~3.17x Flash at catalog rates	Confirmed
Thinking mode	Reasoning tokens can increase output usage	Track output and reasoning tokens	Confirmed
Streaming	No price change by itself	UX/latency benefit	Confirmed
Tool loops	Can multiply calls and tokens	Add budget caps	Confirmed

For broader token planning, use Token Counting Guide 2026 and DeepSeek 5M Free Tokens.

Code Examples

Use V4 model IDs and explicitly parse reasoning_content. That is the difference between basic compatibility and correct DeepSeek support.

cURL through TokenMix:

curl https://api.tokenmix.ai/v1/chat/completions \
  -H "Authorization: Bearer $TOKENMIX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-pro",
    "messages": [
      {"role": "user", "content": "Explain why cache hit tokens change DeepSeek API cost."}
    ],
    "reasoning_effort": "high",
    "thinking": {"type": "enabled"},
    "stream": false
  }'

Python with OpenAI SDK through TokenMix:

import os

from openai import OpenAI

# Read the key from the environment instead of hardcoding it.
client = OpenAI(
    api_key=os.environ["TOKENMIX_API_KEY"],
    base_url="https://api.tokenmix.ai/v1",
)

response = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "Return concise answers."},
        {"role": "user", "content": "Give me a JSON checklist for DeepSeek response parsing."},
    ],
    response_format={"type": "json_object"},
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)

message = response.choices[0].message
print("reasoning:", getattr(message, "reasoning_content", None))
print("answer:", message.content)
print("usage:", response.usage)

Decision helper:

def choose_deepseek_route(workload):
    """Return a one-line routing recommendation for a dict of boolean flags."""
    # Legacy names have a hard deadline, so flag them before anything else.
    if workload.get("legacy_model_name"):
        return "Migrate away from deepseek-chat/deepseek-reasoner before 2026-07-24."
    if workload.get("needs_best_reasoning"):
        return "Use deepseek/deepseek-v4-pro with thinking enabled."
    # Reached only when top-tier reasoning is not required.
    if workload.get("high_volume"):
        return "Use deepseek/deepseek-v4-flash and track cache hit rate."
    if workload.get("needs_tools"):
        return "Use V4 with tools, but preserve reasoning_content after tool calls."
    return "Start with Flash, then benchmark Pro only where answers fail."

Decision Matrix

Use DeepSeek V4 Flash for cost-sensitive production, V4 Pro for hard reasoning, and TokenMix when one OpenAI-compatible endpoint matters more than direct-provider wiring.

If your priority is...	Pick	Why
Lowest cost general chat	DeepSeek V4 Flash	Lowest confirmed V4 rate
Complex reasoning	DeepSeek V4 Pro	Higher capability tier
Agent tool loops	V4 Pro or Flash with tools	Tool calls supported
JSON extraction	V4 Flash with JSON Output	Cheap and structured
One API key across vendors	TokenMix route	Same endpoint for DeepSeek plus other models
OpenAI SDK reuse	TokenMix or direct DeepSeek	Both use OpenAI-compatible client shape
Cache-heavy workloads	Direct DeepSeek or route with cache awareness	Cache hit/miss changes input cost
Legacy app using `deepseek-reasoner`	Migrate to V4 IDs	Old names deprecate on 2026-07-24

If you are comparing provider gateways rather than DeepSeek itself, read TokenMix vs OpenRouter vs Portkey vs LiteLLM. DeepSeek protocol support is one row in a larger routing decision.

Risks and Caveats

The biggest integration risk is silent field loss: generic OpenAI clients can parse the final answer while dropping reasoning_content, cache details, or reasoning-token usage.

Risk	Likelihood	Status	Fix
Treating DeepSeek as OpenAI `/responses`	High	False assumption	Use `/chat/completions`
Dropping `reasoning_content`	High	Confirmed risk	Extend parser and message history logic
Keeping old model IDs after deprecation	Medium	Confirmed deadline	Migrate before 2026-07-24
Passing CoT back incorrectly	Medium	Confirmed caveat	Follow no-tool vs tool-call rules
Ignoring cache hit/miss	High	Confirmed cost issue	Log both usage fields
Relying on temperature in thinking mode	Medium	Confirmed no-op	Use reasoning effort instead
Assuming JSON mode always returns content	Medium	Confirmed caveat	Prompt for JSON and handle empty content
Executing tool arguments unvalidated	High	Confirmed model caveat	Validate JSON and schema before execution

The trust rule: OpenAI-compatible does not mean field-identical. Compatibility gets the request through the SDK; correctness requires parsing DeepSeek's extra fields.
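The last row of the risk table can be sketched as a pre-execution check on the tool call's argument string. This is a minimal sketch, not a full JSON Schema validator: the helper name is hypothetical, and the caller supplies the `required` and `allowed` parameter names.

```python
import json


def validate_tool_arguments(raw_arguments, required, allowed):
    """Check a tool call's JSON argument string before executing the tool.

    Raises ValueError on invalid JSON, non-object payloads, missing
    required parameters, or parameters the tool does not declare.
    """
    try:
        args = json.loads(raw_arguments)
    except (json.JSONDecodeError, TypeError) as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    missing = set(required) - set(args)
    unknown = set(args) - set(allowed)
    if missing or unknown:
        raise ValueError(f"missing={sorted(missing)} unknown={sorted(unknown)}")
    return args


# Illustrative tool signature: one required and one optional parameter.
print(validate_tool_arguments('{"city": "Paris"}', {"city"}, {"city", "unit"}))
```

Rejecting unknown parameters catches hallucinated arguments before they reach the tool; a stricter setup would also check value types.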

Final Recommendation

Use deepseek-v4-flash for cheap high-volume work, deepseek-v4-pro for harder reasoning, and TokenMix when you want DeepSeek plus other vendors behind one OpenAI-compatible endpoint. Parse reasoning_content, track cache hit/miss tokens, migrate away from deepseek-chat and deepseek-reasoner before 2026-07-24, and do not call this OpenAI /responses compatibility until DeepSeek documents that endpoint.

FAQ

Does DeepSeek support the OpenAI Responses API?

No official DeepSeek /responses endpoint is documented. DeepSeek uses /chat/completions and returns a chat completion object with DeepSeek-specific reasoning fields.

What is DeepSeek `reasoning_content`?

reasoning_content is the thinking-mode output the model produces before its final answer. It sits at the same message level as content, which holds the final answer.

Does TokenMix support DeepSeek response protocol?

TokenMix supports DeepSeek V4 Pro and DeepSeek V4 Flash through its OpenAI-compatible /v1/chat/completions endpoint. The live model catalog marks both as supporting reasoning, streaming, JSON, tools, structured output, and prompt caching.

Should I use `deepseek-chat` or `deepseek-v4-flash`?

Use deepseek-v4-flash for new work. DeepSeek says deepseek-chat and deepseek-reasoner will be deprecated on 2026-07-24 15:59 UTC, with compatibility mappings to V4 Flash modes.

How do I enable DeepSeek thinking mode?

Pass thinking: {"type":"enabled"} and use reasoning_effort: "high" or "max". In OpenAI SDKs, DeepSeek shows the thinking parameter inside extra_body.

Does DeepSeek JSON mode guarantee strict JSON?

DeepSeek JSON Output uses response_format: {"type":"json_object"} and is designed to produce valid JSON strings. DeepSeek still tells users to include JSON instructions in the prompt and notes that empty content can occasionally occur.

Does DeepSeek support function calling?

Yes, DeepSeek V4 supports tool calls. Validate the generated function arguments before executing tools, because the docs warn that models may produce invalid JSON or hallucinated parameters.

How much does DeepSeek V4 cost through TokenMix?

As of 2026-06-27, TokenMix lists DeepSeek V4 Flash at $0.132353/M input and $0.264706/M output, and DeepSeek V4 Pro at $0.419118/M input and $0.838235/M output. Always check the live model catalog before shipping.

About TokenMix

TokenMix.ai is an AI API relay and model access platform for teams that need one endpoint across OpenAI, Anthropic, Google, DeepSeek, Qwen, GLM and 300+ other models. Use TokenMix models to compare available models, TokenMix pricing to plan spend, and TokenMix docs to connect through an OpenAI-compatible API.

Sources

DeepSeek API Quick Start - official OpenAI/Anthropic-compatible format, model IDs, deprecation note, base URLs and first-call examples
DeepSeek Chat Completion Reference - official request parameters, response object, streaming, tool calls, JSON mode, usage fields and reasoning_content
DeepSeek Thinking Mode - official thinking toggle, reasoning effort, no-op sampling parameters and reasoning_content multi-turn rules
DeepSeek Reasoning Model Guide - official legacy deepseek-reasoner behavior and CoT output field
DeepSeek JSON Output - official response_format setup and JSON caveats
DeepSeek Models and Pricing - official V4 model details, context, max output, cache hit/miss pricing and output pricing
DeepSeek List Models Reference - official model listing endpoint
TokenMix OpenAI Compatibility Docs - official TokenMix base URL and OpenAI-compatible client guidance
TokenMix Chat Completions Docs - official TokenMix chat completion endpoint guidance
TokenMix Streaming Docs - official TokenMix SSE streaming guidance
TokenMix Structured Output Docs - official TokenMix JSON / structured output guidance
TokenMix Model Catalog - live model list and displayed DeepSeek V4 prices
TokenMix API Model Catalog - public JSON model feature flags for DeepSeek V4 Pro and Flash