DeepSeek V4 Flash
by DeepSeek · chat
DeepSeek V4 Flash is the efficient DeepSeek V4 model for general chat, coding, analysis, and high-throughput workloads. It supports thinking and non-thinking modes, a 1M-token context window, up to 384K output tokens, JSON output, tool calls, and chat prefix completion. Fill-in-the-middle (FIM) completion is available in non-thinking mode.
Pricing
Input: $0.1358/M tokens · Output: $0.2716/M tokens
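As a rough illustration of how the per-million-token rates translate into a per-request cost, here is a minimal sketch. The function name `estimate_cost_usd` is hypothetical, and the calculation assumes flat rates with no caching discounts, tiers, or minimum charges, none of which this listing specifies.

```python
# Rates copied from the listing above (USD per 1M tokens).
INPUT_USD_PER_M = 0.1358
OUTPUT_USD_PER_M = 0.2716


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumes flat per-token pricing (an assumption, not stated by the listing).
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 200K-token prompt with a 50K-token response.
    print(f"${estimate_cost_usd(200_000, 50_000):.5f}")
```

Under these assumptions, a 200K-token prompt with a 50K-token response comes to about $0.0407.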
Capabilities
Function Calling, JSON Mode, Streaming, Reasoning
Context: 1M tokens
Max output: 384K tokens
Routes: 1/1 healthy
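The context and output limits above can be checked before sending a request. The sketch below is only illustrative: the helper `fits_limits` is hypothetical, and it assumes that the requested output tokens count toward the context window together with the prompt, which this listing does not confirm.

```python
# Limits copied from the listing above, with "K" read as 1,000 tokens.
CONTEXT_LIMIT_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 384_000


def fits_limits(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within the listed limits.

    Assumption: prompt and output tokens share the same context window.
    """
    if prompt_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        return False
    return prompt_tokens + requested_output_tokens <= CONTEXT_LIMIT_TOKENS


if __name__ == "__main__":
    print(fits_limits(600_000, 384_000))  # within both limits
    print(fits_limits(700_000, 384_000))  # exceeds the shared window
```

If the provider counts the window differently, only the final comparison needs to change.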
Performance
TTFT: 1391ms · Latency: 3284ms · Throughput: 46.6 tok/s
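To get a feel for what these figures imply for long responses, the following sketch estimates wall-clock time as time to first token plus generation time at the listed throughput. It assumes TTFT means time to first token, that throughput is a steady decoding rate, and that both stay constant across requests. The listing does not define how its metrics were measured, so treat the result as a ballpark only. The function name is hypothetical.

```python
# Figures copied from the listing above.
TTFT_SECONDS = 1.391            # 1391 ms
THROUGHPUT_TOK_PER_S = 46.6


def estimate_response_seconds(output_tokens: int) -> float:
    """Rough wall-clock estimate: TTFT plus tokens divided by throughput."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_SECONDS + output_tokens / THROUGHPUT_TOK_PER_S


if __name__ == "__main__":
    # Example: a 1,000-token response.
    print(f"{estimate_response_seconds(1_000):.2f} s")
```

Under these assumptions, a 1,000-token response takes roughly 23 seconds.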