TokenMix Research Lab · 2026-06-27

DeepSeek Response API 2026: reasoning_content, JSON, TokenMix
Last Updated: 2026-06-27
Author: TokenMix Research Lab
Data verified: 2026-06-27 - DeepSeek API quick start, chat completion reference, thinking mode, JSON output, models and pricing, TokenMix docs, TokenMix model catalog API
DeepSeek's response protocol is Chat Completions plus reasoning_content, not the OpenAI Responses API. TokenMix supports DeepSeek V4 Pro and Flash through one OpenAI-compatible endpoint.
For developers, the concrete changes are these. DeepSeek's official docs now list deepseek-v4-flash and deepseek-v4-pro, and the old deepseek-chat and deepseek-reasoner names will be deprecated on 2026-07-24 15:59 UTC (DeepSeek quick start). The API uses an OpenAI/Anthropic-compatible format, but the response shape is still that of /chat/completions: choices[0].message.content for the final answer, choices[0].message.reasoning_content for thinking output, tool_calls for function calls, and usage.completion_tokens_details.reasoning_tokens for reasoning-token accounting (DeepSeek chat completion reference). TokenMix's live model catalog confirms DeepSeek V4 Pro and DeepSeek V4 Flash with streaming, reasoning, JSON, tools, structured output, and prompt caching support, exposed through https://api.tokenmix.ai/v1 (TokenMix models, TokenMix API model catalog).
Table of Contents
- Quick Verdict
- What Changed in DeepSeek V4
- Confirmed Protocol Map
- TokenMix Support Map
- Response Object Anatomy
- Thinking Mode and reasoning_content
- JSON, Tools, Streaming, and Cache
- Cost Math
- Code Examples
- Decision Matrix
- Risks and Caveats
- Final Recommendation
- FAQ
- About TokenMix
- Sources
- Related Articles
Quick Verdict
DeepSeek response compatibility is real, but the exact protocol is narrower than the marketing phrase: use Chat Completions, preserve reasoning_content when tools are involved, and do not assume OpenAI /responses endpoint parity.
| Claim | Status | Source |
|---|---|---|
| DeepSeek uses an OpenAI/Anthropic-compatible API format | Confirmed | DeepSeek quick start |
| DeepSeek V4 supports deepseek-v4-flash and deepseek-v4-pro model IDs | Confirmed | DeepSeek quick start, DeepSeek pricing |
| deepseek-chat and deepseek-reasoner are deprecated on 2026-07-24 15:59 UTC | Confirmed | DeepSeek quick start |
| DeepSeek's response protocol is OpenAI /responses API compatible | False | Official endpoint is /chat/completions, not /responses |
| DeepSeek returns thinking output through reasoning_content | Confirmed | DeepSeek chat completion reference, Thinking Mode |
| DeepSeek V4 supports JSON Output with response_format: {"type":"json_object"} | Confirmed | DeepSeek JSON Output |
| DeepSeek V4 supports tool calls | Confirmed | DeepSeek pricing / model features, chat completion reference |
| TokenMix supports DeepSeek V4 Pro and Flash | Confirmed | TokenMix model catalog, TokenMix API model catalog |
| TokenMix exposes DeepSeek through one OpenAI-compatible base URL | Confirmed | TokenMix docs |
| TokenMix currently lists DeepSeek V4 Pro at $0.419118/M input and $0.838235/M output | Confirmed | TokenMix model catalog |
| TokenMix currently lists DeepSeek V4 Flash at $0.132353/M input and $0.264706/M output | Confirmed | TokenMix model catalog |
| reasoning_content can always be dropped in multi-turn tool workflows | False | DeepSeek says it must be passed back after tool calls in thinking mode |
What Changed in DeepSeek V4
The biggest DeepSeek API change is model naming and thinking-mode control: V4 Pro/Flash are now the primary IDs, while old chat and reasoner names are compatibility aliases heading toward deprecation.
DeepSeek's quick start lists deepseek-v4-flash, deepseek-v4-pro, deepseek-chat, and deepseek-reasoner, but adds that the old names will be deprecated on 2026-07-24 15:59 UTC. For compatibility, deepseek-chat maps to non-thinking mode of V4 Flash, while deepseek-reasoner maps to thinking mode of V4 Flash (DeepSeek quick start).
| Change | Old pattern | 2026 DeepSeek V4 pattern | Status |
|---|---|---|---|
| General model ID | deepseek-chat | deepseek-v4-flash or V4 non-thinking mode | Confirmed |
| Reasoning model ID | deepseek-reasoner | deepseek-v4-flash / deepseek-v4-pro with thinking enabled | Confirmed |
| Endpoint | /chat/completions | /chat/completions | Confirmed |
| Thinking toggle | Separate model name | thinking: {"type":"enabled"} or disabled | Confirmed |
| Reasoning effort | Limited older behavior | high or max; low/medium mapped to high | Confirmed |
| Context | Smaller prior-generation assumptions | 1M context | Confirmed |
| Max output | Older R1/V3 limits varied | Maximum 384K | Confirmed |
| Output protocol | content plus optional CoT | content + reasoning_content + tool calls + usage details | Confirmed |
The practical migration advice: stop hardcoding deepseek-reasoner in new apps. Use V4 model IDs and explicitly control thinking mode.
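A minimal sketch of that migration, assuming the legacy mapping described above (deepseek-chat to V4 Flash non-thinking, deepseek-reasoner to V4 Flash thinking). The helper name `migrate_model_request` and the returned dict shape are hypothetical, not part of any SDK. The IDs shown are the direct DeepSeek ones; through TokenMix the catalog IDs carry a `deepseek/` prefix.

```python
# Legacy-name mapping as summarized from the DeepSeek quick start above.
LEGACY_MODEL_MAP = {
    "deepseek-chat": ("deepseek-v4-flash", "disabled"),
    "deepseek-reasoner": ("deepseek-v4-flash", "enabled"),
}


def migrate_model_request(model: str, thinking: str = "enabled") -> dict:
    """Return request fields with a V4 model ID and an explicit thinking toggle.

    Hypothetical helper for illustration only.
    """
    if model in LEGACY_MODEL_MAP:
        # A legacy name implies both the V4 model and the thinking mode.
        model, thinking = LEGACY_MODEL_MAP[model]
    return {"model": model, "thinking": {"type": thinking}}


print(migrate_model_request("deepseek-reasoner"))
print(migrate_model_request("deepseek-v4-pro", thinking="disabled"))
```

Keeping the mapping in one table makes it easy to delete once no caller sends the old names.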
Confirmed Protocol Map
DeepSeek's response protocol is best described as "OpenAI Chat Completions-compatible with DeepSeek-specific thinking fields."
That phrasing matters. Many developers search for "DeepSeek response API" because they expect OpenAI's newer /responses endpoint. DeepSeek's official API reference says the endpoint creates a model response for a chat conversation at /chat/completions, and returns a chat completion object (DeepSeek chat completion reference).
| Protocol surface | DeepSeek behavior | OpenAI compatibility level | Status |
|---|---|---|---|
| Base URL | https://api.deepseek.com | SDK-compatible by changing base URL | Confirmed |
| Endpoint | /chat/completions | Chat Completions-compatible | Confirmed |
| Anthropic format | https://api.deepseek.com/anthropic | Anthropic-compatible surface | Confirmed |
| Final answer | choices[0].message.content | OpenAI-style | Confirmed |
| Thinking output | choices[0].message.reasoning_content | DeepSeek-specific extension | Confirmed |
| Tool calls | choices[0].message.tool_calls | OpenAI-style function tool calls | Confirmed |
| JSON mode | response_format: {"type":"json_object"} | OpenAI-style JSON mode | Confirmed |
| Streaming | SSE chunks when stream: true | OpenAI-style SSE pattern | Confirmed |
| Usage | prompt_tokens, completion_tokens, cache hit/miss, reasoning tokens | OpenAI-like plus DeepSeek fields | Confirmed |
| OpenAI /responses | No official support found | Not confirmed | False / Unknown |
If your SDK only parses message.content, it may silently ignore the most important DeepSeek field: reasoning_content. That is where many "OpenAI-compatible" integrations break in subtle ways.
TokenMix Support Map
TokenMix support is confirmed at the catalog and documentation layer: the site exposes DeepSeek through https://api.tokenmix.ai/v1 and lists V4 Pro/Flash with reasoning, streaming, JSON and tools enabled.
The TokenMix docs say developers can use one TokenMix API key through an OpenAI-compatible base URL. The live TokenMix model catalog lists DeepSeek V4 Pro and DeepSeek V4 Flash as top models, while the public API model catalog marks both with support_reasoning=true, support_streaming=true, support_json=true, support_tools=true, support_structured_output=true, and support_prompt_caching=true.
| TokenMix model | Context | Max output | Input / 1M | Output / 1M | Reasoning | JSON | Tools | Streaming |
|---|---|---|---|---|---|---|---|---|
| deepseek/deepseek-v4-pro | 1,000,000 | 384,000 | $0.419118 | $0.838235 | Yes | Yes | Yes | Yes |
| deepseek/deepseek-v4-flash | 1,000,000 | 384,000 | $0.132353 | $0.264706 | Yes | Yes | Yes | Yes |
These are TokenMix catalog prices and feature flags, not a replacement for DeepSeek's direct billing table. DeepSeek direct pricing separates cache-hit and cache-miss input tokens, while TokenMix currently presents model catalog input/output rates for routing through TokenMix.
For broader routing strategy, use AI API Gateway 2026 and TokenMix vs OpenRouter vs Portkey vs LiteLLM. DeepSeek response support is one protocol detail; production reliability still needs fallback, logging, and budget controls.
Response Object Anatomy
The DeepSeek response object has four fields developers should parse deliberately: content, reasoning_content, tool_calls, and usage details.
The official schema shows message.content for the final answer, message.reasoning_content for thinking-mode output, message.tool_calls for function calls, and usage.completion_tokens_details.reasoning_tokens for reasoning-token accounting (DeepSeek chat completion reference).
| Field | Where it appears | What it means | Parse it? |
|---|---|---|---|
| choices[0].message.content | Response message | Final answer shown to user | Always |
| choices[0].message.reasoning_content | Response message | Thinking output before final answer | Yes, if using thinking mode |
| choices[0].message.tool_calls | Response message | Function/tool call payloads | Yes, if tools enabled |
| choices[0].finish_reason | Choice | Why generation stopped | Always |
| usage.prompt_tokens | Usage | Input tokens billed | Always |
| usage.prompt_cache_hit_tokens | Usage | Input tokens served from cache | For cost accounting |
| usage.prompt_cache_miss_tokens | Usage | Input tokens not served from cache | For cost accounting |
| usage.completion_tokens_details.reasoning_tokens | Usage | Reasoning tokens inside output | For reasoning cost/debug |
Minimal response parser:
```python
def parse_deepseek_chat_completion(response):
    """Extract the fields a DeepSeek chat completion can carry."""
    choice = response.choices[0]
    message = choice.message
    return {
        "answer": getattr(message, "content", None),
        # DeepSeek-specific extension; may be absent outside thinking mode.
        "reasoning": getattr(message, "reasoning_content", None),
        "tool_calls": getattr(message, "tool_calls", None),
        "finish_reason": choice.finish_reason,
        "usage": getattr(response, "usage", None),
    }
```
The safe rule: never assume a response has only one text field. In thinking mode, DeepSeek V4 can return both reasoning_content (the model's reasoning) and content (the final answer) in the same message.
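The usage fields in the table above can be turned into a cache hit rate and a reasoning/answer split. This is a rough sketch that assumes the usage object has already been converted to a plain dict and that reasoning tokens are counted inside completion_tokens, as the table suggests; `summarize_usage` is a hypothetical helper name.

```python
def summarize_usage(usage: dict) -> dict:
    """Summarize cache and reasoning accounting from a usage dict.

    Assumes the field names listed in the table above; missing fields count as 0.
    """
    hit = usage.get("prompt_cache_hit_tokens") or 0
    miss = usage.get("prompt_cache_miss_tokens") or 0
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens") or 0
    completion = usage.get("completion_tokens") or 0

    accounted_input = hit + miss
    return {
        # None when the response carries no cache accounting at all.
        "cache_hit_rate": hit / accounted_input if accounted_input else None,
        "reasoning_tokens": reasoning,
        # Output tokens left over once reasoning tokens are subtracted.
        "answer_tokens": max(completion - reasoning, 0),
    }


sample_usage = {
    "prompt_tokens": 1000,
    "prompt_cache_hit_tokens": 700,
    "prompt_cache_miss_tokens": 300,
    "completion_tokens": 500,
    "completion_tokens_details": {"reasoning_tokens": 350},
}
print(summarize_usage(sample_usage))
```

Logging these three numbers per request is usually enough to see whether cache behavior or reasoning length is driving the bill.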
Thinking Mode and reasoning_content
Thinking mode is enabled by default in DeepSeek V4, and reasoning_content handling changes when tool calls are involved.
DeepSeek's thinking-mode docs say thinking: {"type":"enabled"} controls thinking mode, while reasoning_effort accepts high or max; compatibility mappings convert low and medium to high, and xhigh to max (Thinking Mode). The docs also say temperature, top_p, presence_penalty, and frequency_penalty have no effect in thinking mode.
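A small sketch of those rules as request-building code. It normalizes reasoning_effort using the compatibility mapping described above and separates out the sampling parameters that have no effect in thinking mode. `build_thinking_params` is a hypothetical helper, and the choice to reject unknown effort values is an assumption made for this illustration.

```python
# Compatibility mapping as described above: low/medium -> high, xhigh -> max.
EFFORT_ALIASES = {
    "low": "high",
    "medium": "high",
    "high": "high",
    "xhigh": "max",
    "max": "max",
}
# Sampling parameters the docs describe as having no effect in thinking mode.
NO_EFFECT_IN_THINKING = ("temperature", "top_p", "presence_penalty", "frequency_penalty")


def build_thinking_params(effort: str = "high", **params) -> tuple:
    """Return (request_params, ignored_names) for a thinking-mode request."""
    if effort not in EFFORT_ALIASES:
        raise ValueError(f"unsupported reasoning effort: {effort!r}")
    ignored = sorted(name for name in params if name in NO_EFFECT_IN_THINKING)
    kept = {k: v for k, v in params.items() if k not in NO_EFFECT_IN_THINKING}
    request = {
        "reasoning_effort": EFFORT_ALIASES[effort],
        "thinking": {"type": "enabled"},
        **kept,
    }
    return request, ignored


request, ignored = build_thinking_params("medium", temperature=0.2, max_tokens=512)
print(request)
print("no effect in thinking mode:", ignored)
```

With the OpenAI Python SDK, the `thinking` entry would be moved into `extra_body` rather than passed as a keyword argument.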
| Workflow | What to do with reasoning_content | Status |
|---|---|---|
| Single-turn reasoning | Read it if you want visible reasoning/debug | Confirmed |
| Multi-turn, no tool call | Do not need to pass previous CoT back | Confirmed |
| Multi-turn with tool call | Pass the intermediate reasoning_content back in subsequent turns | Confirmed |
| Legacy deepseek-reasoner | Use only for compatibility before deprecation | Confirmed |
| V4 thinking mode | Use deepseek-v4-pro or Flash plus thinking toggle | Confirmed |
| Thinking + temperature tuning | Do not rely on temperature/top_p | Confirmed |
This is the most important implementation caveat. A generic OpenAI Chat Completions wrapper may drop unknown fields between turns. That can break DeepSeek thinking-mode tool workflows because DeepSeek explicitly requires reasoning content to be passed back after tool calls.
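One way to avoid that field loss is to build the assistant history entry yourself instead of relying on a wrapper. The sketch below works on plain dicts and assumes reasoning_content is passed back on the assistant message under the same key it arrived with; check the Thinking Mode docs for the exact shape. `assistant_turn_for_history` is a hypothetical helper.

```python
def assistant_turn_for_history(message: dict) -> dict:
    """Build the assistant entry to append to `messages` for the next turn.

    Follows the table above: keep reasoning_content only when the turn
    contains tool calls; otherwise previous reasoning is not sent back.
    """
    turn = {"role": "assistant", "content": message.get("content")}
    tool_calls = message.get("tool_calls")
    if tool_calls:
        turn["tool_calls"] = tool_calls
        reasoning = message.get("reasoning_content")
        if reasoning is not None:
            # Assumption: the field is echoed back under the same key.
            turn["reasoning_content"] = reasoning
    return turn


tool_turn = assistant_turn_for_history({
    "content": None,
    "reasoning_content": "I need the current price before answering.",
    "tool_calls": [{"id": "call_1", "type": "function",
                    "function": {"name": "get_price", "arguments": "{}"}}],
})
plain_turn = assistant_turn_for_history({
    "content": "Flash is cheaper.",
    "reasoning_content": "Compare the two rates.",
})
print(sorted(tool_turn))
print(sorted(plain_turn))
```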
JSON, Tools, Streaming, and Cache
DeepSeek V4 supports JSON output, function tools, SSE streaming, and cache accounting, but each feature has one catch.
JSON Output uses response_format: {"type":"json_object"} and requires you to tell the model to output JSON in the prompt; DeepSeek warns that JSON output can occasionally return empty content and should be mitigated through prompt changes (DeepSeek JSON Output). Streaming uses data-only SSE chunks and can include a final usage chunk when stream_options.include_usage is set (DeepSeek chat completion reference).
| Feature | Request parameter | Catch | Status |
|---|---|---|---|
| JSON Output | response_format: {"type":"json_object"} | Prompt must include JSON instruction | Confirmed |
| Tool calls | tools, tool_choice | Validate generated arguments before executing | Confirmed |
| Strict tool schema | strict: true in tool definition | Beta feature | Confirmed |
| Streaming | stream: true | Parse SSE deltas and [DONE] | Confirmed |
| Stream usage | stream_options.include_usage | Usage appears in an extra chunk | Confirmed |
| Prompt caching | cache hit/miss usage fields | Cost depends on cache hit rate | Confirmed |
| FIM completion | non-thinking mode only | Not for thinking mode | Confirmed |
| Chat prefix completion | beta base URL required direct to DeepSeek | Route support may vary | Likely |
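The JSON Output catch (occasional empty content) can be handled with a small guard before the result reaches business logic. This is a sketch, not DeepSeek's recommended code: `parse_json_output` and its error labels are hypothetical, and what to do on failure (retry, adjust the prompt) is left to the caller.

```python
import json


def parse_json_output(content):
    """Return (data, error) for the `content` string of a JSON-mode response."""
    if not content or not content.strip():
        # The empty-content case the docs warn about.
        return None, "empty_content"
    try:
        return json.loads(content), None
    except json.JSONDecodeError as exc:
        return None, f"invalid_json: {exc.msg}"


print(parse_json_output('{"steps": ["parse content", "parse reasoning_content"]}'))
print(parse_json_output(""))
print(parse_json_output("{oops")[1])
```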
For Node.js streaming patterns, see Node.js AI API 2026. The same OpenAI-compatible client shape works, but the DeepSeek-specific fields still need explicit parsing.
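In Python, the same explicit parsing might look like the sketch below. It assumes each SSE `data:` payload has already been decoded into a dict (with the `[DONE]` sentinel handled by the caller), and that streamed deltas carry reasoning_content alongside content, mirroring the non-streaming message; treat that field placement as an assumption to verify against the chat completion reference. `accumulate_stream` is a hypothetical helper.

```python
def accumulate_stream(chunks):
    """Collect reasoning text, answer text, and usage from decoded stream chunks."""
    reasoning_parts, answer_parts, usage = [], [], None
    for chunk in chunks:
        # With stream_options.include_usage, usage arrives in an extra chunk.
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("reasoning_content"):
                reasoning_parts.append(delta["reasoning_content"])
            if delta.get("content"):
                answer_parts.append(delta["content"])
    return {
        "reasoning": "".join(reasoning_parts),
        "answer": "".join(answer_parts),
        "usage": usage,
    }


sample_chunks = [
    {"choices": [{"delta": {"reasoning_content": "Cache hits "}}]},
    {"choices": [{"delta": {"reasoning_content": "are cheaper."}}]},
    {"choices": [{"delta": {"content": "Cached input "}}]},
    {"choices": [{"delta": {"content": "costs less."}}]},
    {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 9}},
]
print(accumulate_stream(sample_chunks))
```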
Cost Math
DeepSeek direct pricing is cache-sensitive; TokenMix pricing is catalog-rate routing through TokenMix. Do not mix the two tables without labeling which bill you are estimating.
DeepSeek direct prices per 1M tokens are: Flash cache hit $0.0028, Flash cache miss $0.14, Flash output $0.28; Pro cache hit $0.003625, Pro cache miss $0.435, Pro output $0.87 (DeepSeek pricing). TokenMix currently lists Flash at $0.132353 input and $0.264706 output, and Pro at $0.419118 input and $0.838235 output in the live model catalog.
| Workload | Direct DeepSeek Flash, no cache | Direct DeepSeek Pro, no cache | TokenMix Flash catalog | TokenMix Pro catalog |
|---|---|---|---|---|
| 10M input + 2M output | $1.96 | $6.09 | $1.85 | $5.87 |
| 100M input + 20M output | $19.60 | $60.90 | $18.53 | $58.68 |
| 1B input + 200M output | $196.00 | $609.00 | $185.29 | $586.76 |
Cost calculation 1: a 10M input / 2M output monthly workload on direct DeepSeek Flash with no cache costs 10 x $0.14 + 2 x $0.28 = $1.96.
Cost calculation 2: the same workload on TokenMix Flash catalog rates costs 10 x $0.132353 + 2 x $0.264706 = $1.85. The difference is small; the main value is unified routing and one API key, not a magical free tier.
Cost calculation 3: if 70% of direct DeepSeek Flash input hits cache, 10M input becomes 7M cache-hit and 3M cache-miss. Input cost is 7 x $0.0028 + 3 x $0.14 = $0.4396, plus 2 x $0.28 = $0.56 output, so total is $0.9996. Cache hit rate can cut the direct bill roughly in half for repeat-heavy workloads.
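The three calculations above follow one formula, sketched here with the direct DeepSeek per-1M prices quoted in this section. `direct_cost` is a hypothetical helper, and it assumes a single blended cache hit rate applies to all input tokens.

```python
# Direct DeepSeek prices per 1M tokens, as quoted in this section.
DIRECT_PRICES = {
    "flash": {"cache_hit": 0.0028, "cache_miss": 0.14, "output": 0.28},
    "pro": {"cache_hit": 0.003625, "cache_miss": 0.435, "output": 0.87},
}


def direct_cost(model: str, input_m: float, output_m: float,
                cache_hit_rate: float = 0.0) -> float:
    """Estimate direct DeepSeek cost in USD; token counts are in millions."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    price = DIRECT_PRICES[model]
    hit_m = input_m * cache_hit_rate
    miss_m = input_m - hit_m
    return (hit_m * price["cache_hit"]
            + miss_m * price["cache_miss"]
            + output_m * price["output"])


print(round(direct_cost("flash", 10, 2), 4))                      # 1.96
print(round(direct_cost("flash", 10, 2, cache_hit_rate=0.7), 4))  # 0.9996
print(round(direct_cost("pro", 10, 2), 4))                        # 6.09
```

TokenMix catalog rates do not split cache hit and miss in the figures quoted here, so they would need a separate two-price calculation rather than this function.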
| Cost lever | Direct DeepSeek impact | TokenMix impact | Confidence |
|---|---|---|---|
| Prompt caching | Large if repeated prompts hit cache | Prompt caching supported in catalog | Confirmed |
| Flash vs Pro | Flash is ~3.1x cheaper than Pro no-cache direct | Flash is ~3.17x cheaper than Pro in catalog | Confirmed |
| Thinking mode | Reasoning tokens can increase output usage | Track output and reasoning tokens | Confirmed |
| Streaming | No price change by itself | UX/latency benefit | Confirmed |
| Tool loops | Can multiply calls and tokens | Add budget caps | Confirmed |
For broader token planning, use Token Counting Guide 2026 and DeepSeek 5M Free Tokens.
Code Examples
Use V4 model IDs and explicitly parse reasoning_content. That is the difference between basic compatibility and correct DeepSeek support.
cURL through TokenMix:
```bash
# In raw JSON, thinking is a top-level field; extra_body is an OpenAI SDK argument.
curl https://api.tokenmix.ai/v1/chat/completions \
  -H "Authorization: Bearer $TOKENMIX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-pro",
    "messages": [
      {"role": "user", "content": "Explain why cache hit tokens change DeepSeek API cost."}
    ],
    "reasoning_effort": "high",
    "thinking": {"type": "enabled"},
    "stream": false
  }'
```
Python with OpenAI SDK through TokenMix:
```python
import os

from openai import OpenAI

# Read the key from the environment instead of hardcoding it.
client = OpenAI(
    api_key=os.environ["TOKENMIX_API_KEY"],
    base_url="https://api.tokenmix.ai/v1",
)

response = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "Return concise answers."},
        {"role": "user", "content": "Give me a JSON checklist for DeepSeek response parsing."},
    ],
    response_format={"type": "json_object"},
    reasoning_effort="high",
    # thinking is not a standard SDK argument, so it goes through extra_body.
    extra_body={"thinking": {"type": "enabled"}},
)

message = response.choices[0].message
# reasoning_content is a DeepSeek extension, so read it defensively.
print("reasoning:", getattr(message, "reasoning_content", None))
print("answer:", message.content)
print("usage:", response.usage)
```
Decision helper:
```python
def choose_deepseek_route(workload):
    """Return a routing suggestion; the first matching rule wins."""
    if workload.get("needs_best_reasoning"):
        return "Use deepseek/deepseek-v4-pro with thinking enabled."
    if workload.get("high_volume") and not workload.get("needs_max_reasoning"):
        return "Use deepseek/deepseek-v4-flash and track cache hit rate."
    if workload.get("needs_tools"):
        return "Use V4 with tools, but preserve reasoning_content after tool calls."
    if workload.get("legacy_model_name"):
        return "Migrate away from deepseek-chat/deepseek-reasoner before 2026-07-24."
    # Default: start cheap and escalate only where quality falls short.
    return "Start with Flash, then benchmark Pro only where answers fail."
```
Decision Matrix
Use DeepSeek V4 Flash for cost-sensitive production, V4 Pro for hard reasoning, and TokenMix when one OpenAI-compatible endpoint matters more than direct-provider wiring.
| If your priority is... | Pick | Why |
|---|---|---|
| Lowest cost general chat | DeepSeek V4 Flash | Lowest confirmed V4 rate |
| Complex reasoning | DeepSeek V4 Pro | Higher capability tier |
| Agent tool loops | V4 Pro or Flash with tools | Tool calls supported |
| JSON extraction | V4 Flash with JSON Output | Cheap and structured |
| One API key across vendors | TokenMix route | Same endpoint for DeepSeek plus other models |
| OpenAI SDK reuse | TokenMix or direct DeepSeek | Both use OpenAI-compatible client shape |
| Cache-heavy workloads | Direct DeepSeek or route with cache awareness | Cache hit/miss changes input cost |
| Legacy app using deepseek-reasoner | Migrate to V4 IDs | Old names deprecate on 2026-07-24 |
If you are comparing provider gateways rather than DeepSeek itself, read TokenMix vs OpenRouter vs Portkey vs LiteLLM. DeepSeek protocol support is one row in a larger routing decision.
Risks and Caveats
The biggest integration risk is silent field loss: generic OpenAI clients can parse the final answer while dropping reasoning_content, cache details, or reasoning-token usage.
| Risk | Likelihood | Status | Fix |
|---|---|---|---|
| Treating DeepSeek as OpenAI /responses | High | False assumption | Use /chat/completions |
| Dropping reasoning_content | High | Confirmed risk | Extend parser and message history logic |
| Keeping old model IDs after deprecation | Medium | Confirmed deadline | Migrate before 2026-07-24 |
| Passing CoT back incorrectly | Medium | Confirmed caveat | Follow no-tool vs tool-call rules |
| Ignoring cache hit/miss | High | Confirmed cost issue | Log both usage fields |
| Relying on temperature in thinking mode | Medium | Confirmed no-op | Use reasoning effort instead |
| Assuming JSON mode always returns content | Medium | Confirmed caveat | Prompt for JSON and handle empty content |
| Executing tool arguments unvalidated | High | Confirmed model caveat | Validate JSON and schema before execution |
The trust rule: OpenAI-compatible does not mean field-identical. Compatibility gets the request through the SDK; correctness requires parsing DeepSeek's extra fields.
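For the last row of the risk table, a minimal validation sketch is shown below. It assumes OpenAI-style tool calls, where the function arguments arrive as a JSON string, and uses a deliberately simple name-to-type map instead of a full JSON Schema validator. `validate_tool_arguments` and the example `city` parameter are hypothetical.

```python
import json


def validate_tool_arguments(raw_arguments: str, expected: dict) -> dict:
    """Parse and check model-generated tool arguments before executing a tool.

    `expected` maps each required parameter name to its Python type.
    Raises ValueError on invalid JSON, unknown, missing, or mistyped parameters.
    """
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"tool arguments are not valid JSON: {exc.msg}") from exc
    if not isinstance(args, dict):
        raise ValueError("tool arguments must be a JSON object")
    unknown = sorted(set(args) - set(expected))
    if unknown:
        # Catches hallucinated parameters.
        raise ValueError(f"unexpected parameters: {unknown}")
    for name, expected_type in expected.items():
        if name not in args:
            raise ValueError(f"missing parameter: {name}")
        if not isinstance(args[name], expected_type):
            raise ValueError(f"parameter {name} has the wrong type")
    return args


print(validate_tool_arguments('{"city": "Paris"}', {"city": str}))
```

Rejecting unknown parameters is a strict choice; a more lenient integration could drop them and log a warning instead.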
Final Recommendation
Use deepseek-v4-flash for cheap high-volume work, deepseek-v4-pro for harder reasoning, and TokenMix when you want DeepSeek plus other vendors behind one OpenAI-compatible endpoint. Parse reasoning_content, track cache hit/miss tokens, migrate away from deepseek-chat and deepseek-reasoner before 2026-07-24, and do not call this OpenAI /responses compatibility until DeepSeek documents that endpoint.
FAQ
Does DeepSeek support the OpenAI Responses API?
No official DeepSeek /responses endpoint is documented. DeepSeek uses /chat/completions and returns a chat completion object with DeepSeek-specific reasoning fields.
What is DeepSeek reasoning_content?
reasoning_content is the thinking-mode output produced before the final answer. It sits on the same message object as content, which holds the final answer.
Does TokenMix support DeepSeek response protocol?
TokenMix supports DeepSeek V4 Pro and DeepSeek V4 Flash through its OpenAI-compatible /v1/chat/completions endpoint. The live model catalog marks both as supporting reasoning, streaming, JSON, tools, structured output, and prompt caching.
Should I use deepseek-chat or deepseek-v4-flash?
Use deepseek-v4-flash for new work. DeepSeek says deepseek-chat and deepseek-reasoner will be deprecated on 2026-07-24 15:59 UTC, with compatibility mappings to V4 Flash modes.
How do I enable DeepSeek thinking mode?
Pass thinking: {"type":"enabled"} and use reasoning_effort: "high" or "max". In OpenAI SDKs, DeepSeek shows the thinking parameter inside extra_body.
Does DeepSeek JSON mode guarantee strict JSON?
DeepSeek JSON Output uses response_format: {"type":"json_object"} and is designed to produce valid JSON strings. DeepSeek still tells users to include JSON instructions in the prompt and notes that empty content can occasionally occur.
Does DeepSeek support function calling?
Yes, DeepSeek V4 supports tool calls. Validate the generated function arguments before executing tools, because the docs warn that models may produce invalid JSON or hallucinated parameters.
How much does DeepSeek V4 cost through TokenMix?
As of 2026-06-27, TokenMix lists DeepSeek V4 Flash at $0.132353/M input and $0.264706/M output, and DeepSeek V4 Pro at $0.419118/M input and $0.838235/M output. Always check the live model catalog before shipping.
About TokenMix
TokenMix.ai is an AI API relay and model access platform for teams that need one endpoint across OpenAI, Anthropic, Google, DeepSeek, Qwen, GLM and 300+ other models. Use TokenMix models to compare available models, TokenMix pricing to plan spend, and TokenMix docs to connect through an OpenAI-compatible API.
Sources
- DeepSeek API Quick Start - official OpenAI/Anthropic-compatible format, model IDs, deprecation note, base URLs and first-call examples
- DeepSeek Chat Completion Reference - official request parameters, response object, streaming, tool calls, JSON mode, usage fields and reasoning_content
- DeepSeek Thinking Mode - official thinking toggle, reasoning effort, no-op sampling parameters and reasoning_content multi-turn rules
- DeepSeek Reasoning Model Guide - official legacy deepseek-reasoner behavior and CoT output field
- DeepSeek JSON Output - official response_format setup and JSON caveats
- DeepSeek Models and Pricing - official V4 model details, context, max output, cache hit/miss pricing and output pricing
- DeepSeek List Models Reference - official model listing endpoint
- TokenMix OpenAI Compatibility Docs - official TokenMix base URL and OpenAI-compatible client guidance
- TokenMix Chat Completions Docs - official TokenMix chat completion endpoint guidance
- TokenMix Streaming Docs - official TokenMix SSE streaming guidance
- TokenMix Structured Output Docs - official TokenMix JSON / structured output guidance
- TokenMix Model Catalog - live model list and displayed DeepSeek V4 prices
- TokenMix API Model Catalog - public JSON model feature flags for DeepSeek V4 Pro and Flash