TokenMix Research Lab · 2026-04-23

Step 3.5 Flash Review: StepFun's 196B MoE Outruns DeepSeek V3.2 at $0.10/MTok (2026)


Shanghai-based StepFun open-sourced Step 3.5 Flash on February 1, 2026 — a 196-billion-parameter MoE with only 11B active per token, shipped under Apache 2.0. Headline: it outscored DeepSeek V3.2 (671B) and Moonshot's Kimi K2.5 (1T) on agentic, reasoning, and coding benchmarks despite being 3-5× smaller. Hard numbers: 97.3 AIME 2025, 86.4% LiveCodeBench-V6, 74.4% SWE-Bench Verified, 88.2 τ²-Bench, 262K context, 100-300 tok/s generation, and API pricing at $0.10 input / $0.30 output per MTok — the cheapest Chinese frontier tier on the market. TokenMix.ai tracks Step 3.5 Flash alongside 300+ other models, and this review covers who should actually use it, where it beats DeepSeek, and why the "small is the new big" thesis suddenly looks credible.


Confirmed vs Speculation

| Claim | Status |
|---|---|
| Released February 1, 2026 | Confirmed (StepFun GitHub) |
| 196B total parameters, 11B active (sparse MoE) | Confirmed |
| Apache 2.0 license | Confirmed |
| 262,144 context / 65,536 max output | Confirmed (HuggingFace) |
| 74.4% SWE-Bench Verified | Confirmed |
| 97.3 AIME 2025 | Confirmed |
| 86.4% LiveCodeBench-V6 | Confirmed |
| 88.2 τ²-Bench (agentic) | Confirmed |
| $0.10 input / $0.30 output per MTok (OpenRouter) | Confirmed (OpenRouter) |
| Beats DeepSeek V3.2 and Kimi K2.5 on several benchmarks | Confirmed (self-reported + third-party) |
| Drop-in replacement for GPT-4 class workloads | Partial (depends on task) |
| Safe for production English-language customer-facing work | No (still English-second-language quirks) |

Why 196B Matters: The Sparse MoE Bet

The 2026 arms race has mostly been a contest of scale: DeepSeek V3.2 at 671B total parameters, Moonshot's Kimi K2.5 at 1T.

Step 3.5 Flash sits in a different camp: small sparse. Only 196B total, only 11B active per token, less than a fifth the total size of Kimi K2.5. By conventional scaling-law wisdom, it should get crushed.

Instead it beats both on several benchmarks. Why:

  1. Curated training data — StepFun emphasized quality over quantity (following Phi-style methodology but scaled up)
  2. Expert routing efficiency — active params get used well, not averaged
  3. Agent-first training objective — trained with tool use and multi-step coherence as first-class objectives, not afterthoughts

This matters because inference economics favor sparse-small. 11B active params means per-token decode compute on par with an 11B dense model, which is how Step 3.5 Flash sustains 100-300 tok/s. Kimi K2.5 at 32B active needs 2-4× more silicon per request.
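As a back-of-envelope check, if decode compute scales roughly linearly with active parameters, the relative silicon cost falls straight out of the active-param counts in this review's comparison table (a sketch under that linear-FLOPs assumption, ignoring memory bandwidth and batching effects):

```python
# Active parameter counts (billions) from this review's comparison table.
ACTIVE_PARAMS_B = {"step-3.5-flash": 11, "deepseek-v3.2": 37, "kimi-k2.5": 32}

def relative_silicon(model: str, baseline: str = "step-3.5-flash") -> float:
    """Approximate FLOPs-per-token multiple vs the baseline, assuming
    decode compute scales linearly with active parameters."""
    return ACTIVE_PARAMS_B[model] / ACTIVE_PARAMS_B[baseline]

for model in ACTIVE_PARAMS_B:
    print(f"{model}: {relative_silicon(model):.2f}x")
```

Kimi K2.5 lands at about 2.9× under this assumption, inside the 2-4× range quoted above.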

Benchmark Breakdown: Where Step 3.5 Flash Wins

| Benchmark | Step 3.5 Flash | DeepSeek V3.2 | Kimi K2.5 | Claude Opus 4.6 |
|---|---|---|---|---|
| SWE-Bench Verified | 74.4% | ~68% | 80.2% | ~83% |
| AIME 2025 | 97.3 | ~94 | ~93 | ~95 |
| LiveCodeBench-V6 | 86.4% | ~78% | ~82% | ~88% |
| τ²-Bench (agentic) | 88.2 | ~75 | ~80 | ~85 |
| GPQA Diamond | ~72% | ~68% | ~70% | ~78% |
| Context | 262K | 128K | 256K | 200K |

Sources: StepFun self-reported + LLMBase / Design for Online

The real story: Step 3.5 Flash matches or beats DeepSeek V3.2 (a 671B model) on code and reasoning — despite being 3.4× smaller and ~2× cheaper per token. On math specifically (AIME 2025 at 97.3), it's near-saturation and ahead of every Chinese open-source peer.

Step 3.5 Flash vs DeepSeek V3.2 vs Kimi K2.5

| Dimension | Step 3.5 Flash | DeepSeek V3.2 | Kimi K2.5 |
|---|---|---|---|
| Total params | 196B | 671B | 1T |
| Active params | 11B | 37B | 32B |
| Context | 262K | 128K | 256K |
| License | Apache 2.0 | DeepSeek Model License | Modified MIT |
| API input ($/MTok) | $0.10 | $0.14 | ~$0.28 |
| API output ($/MTok) | $0.30 | $0.28 | ~$1.00 |
| Throughput (tok/s) | 100-300 | 60-150 | 50-120 |
| Best-in-class at | Math + cost/param efficiency | Balance | Code (until K2.6) |
| Tool-use maturity | B+ | B | A- |

Decision heuristic:

  1. Math-heavy or long-context (beyond 128K) workloads on a budget: Step 3.5 Flash
  2. Broad tool and ecosystem support at low cost: DeepSeek V3.2
  3. Long-horizon agent runs and heavyweight code tasks: Kimi K2.5/K2.6

Pricing: $0.10/MTok Is a Wedge

| Provider | Input ($/MTok) | Output ($/MTok) | Notes |
|---|---|---|---|
| OpenRouter | $0.10 | $0.30 | Standard tier |
| OpenRouter free tier | $0.00 | $0.00 | Limited RPM |
| StepFun direct | $0.08-0.10 | $0.25-0.30 | Discount for high volume |
| NVIDIA NIM | Varies | Varies | Enterprise path |
| TokenMix.ai unified API | Tracking | Tracking | See model page |

Source: OpenRouter / NVIDIA NIM

Why $0.10/MTok matters: For a workload that spends $1,000/month on GPT-4o (at ~$2.50/MTok input), the same token volume on Step 3.5 Flash runs ~$40/month. That's a 25× cost compression. Even if Step 3.5 Flash needs 2× more iterations to hit the same quality on your specific task, you still pay 12.5× less.
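The arithmetic is simple enough to sanity-check; at ~$2.50/MTok, the review's 25× figure implies roughly 400 MTok/month of input, so a throwaway helper reproduces the numbers (the volume is derived from the quoted prices, not a measured workload):

```python
def monthly_cost(volume_mtok: float, price_per_mtok: float,
                 iterations: float = 1.0) -> float:
    """Monthly spend in dollars: token volume x unit price x retry overhead."""
    return volume_mtok * price_per_mtok * iterations

VOLUME = 400  # MTok/month implied by ~$1,000 at GPT-4o's ~$2.50/MTok input

gpt4o = monthly_cost(VOLUME, 2.50)        # ~$1,000
flash = monthly_cost(VOLUME, 0.10)        # ~$40
flash_2x = monthly_cost(VOLUME, 0.10, 2)  # ~$80, still 12.5x cheaper

print(round(gpt4o / flash, 1), round(gpt4o / flash_2x, 1))
```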

The catch: at $0.10/MTok you're competing with DeepSeek V3.2 at $0.14 — and DeepSeek has broader tool ecosystem support. The wedge for Step 3.5 Flash is math-heavy workloads (where it pulls ahead) and large context (where its 262K beats DeepSeek's 128K).

Agentic Capabilities & Tool Use

τ²-Bench at 88.2 puts Step 3.5 Flash in the top tier of agent-capable models, and the score holds up in practical tool-use testing.

Where it's weak: long-horizon autonomous runs (12+ hours) — Step 3.5 Flash isn't trained for 4,000-step coordinated swarms like K2.6 is. For short-burst agent tasks (under 50 steps), it's competitive. For overnight refactors, K2.6 or Claude Opus is better.
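Short-burst agent tasks go through the standard OpenAI-compatible `tools` parameter that OpenRouter exposes for `stepfun/step-3.5-flash`. A minimal sketch of the request payload follows; the `run_tests` tool is a hypothetical example for illustration, and actually sending the payload requires a live key, so only the offline construction is shown:

```python
import json

def build_agent_request(user_msg: str) -> dict:
    """Assemble an OpenAI-format chat request with one function tool attached."""
    return {
        "model": "stepfun/step-3.5-flash",
        "messages": [{"role": "user", "content": user_msg}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "run_tests",  # hypothetical tool, for illustration only
                "description": "Run the project's test suite and return failures.",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }],
    }

payload = build_agent_request("Fix the failing test in utils/parse.py")
print(json.dumps(payload, indent=2))
```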

How to Run Step 3.5 Flash (API + Self-Host)

Option 1 — OpenRouter:

```python
from openai import OpenAI

client = OpenAI(api_key="your-openrouter-key", base_url="https://openrouter.ai/api/v1")
resp = client.chat.completions.create(
    model="stepfun/step-3.5-flash",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)
print(resp.choices[0].message.content)
```

Option 2 — TokenMix.ai unified API (one key across providers):

```python
client = OpenAI(api_key="your-tokenmix-key", base_url="https://api.tokenmix.ai/v1")
resp = client.chat.completions.create(model="step-3.5-flash", messages=[...])
```

Option 3 — Self-host from Hugging Face (stepfun-ai/Step-3.5-Flash): At 196B params, you need ~400GB VRAM for FP16, or ~200GB for FP8. Realistically that's 4× H100 or 2× H200. vLLM + expert parallelism works. Throughput: 150-250 tok/s/request at batch 4.
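The memory figures follow directly from bytes per parameter; a quick sketch (weights only; KV cache, activations, and framework overhead are why realistic GPU counts run higher than the raw weight math):

```python
def weights_vram_gb(total_params_b: float, bytes_per_param: float) -> float:
    """VRAM for the weights alone: 1B params at 1 byte/param = 1 GB.
    KV cache, activations, and runtime overhead come on top."""
    return total_params_b * bytes_per_param

fp16 = weights_vram_gb(196, 2)  # 392 GB, the "~400GB" figure above
fp8 = weights_vram_gb(196, 1)   # 196 GB, the "~200GB" figure
# 196 GB of FP8 weights spread across 80 GB H100s or 141 GB H200s, plus
# cache headroom, is what drives the 4x H100 / 2x H200 sizing above.
print(fp16, fp8)
```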

Option 4 — NVIDIA NIM for enterprise hosting (docs)

Where Step 3.5 Flash Falls Short

Three honest weaknesses:

  1. English fluency: Written output has occasional ESL patterns — fine for internal tooling, rough for customer-facing prose. Use Claude or GPT for final polish.
  2. Instruction following on edge cases: Complex system prompts with 10+ constraints sometimes drop a constraint or two. Verify with structured output validators.
  3. Ecosystem lag: Fewer third-party fine-tunes, tutorials, and MCP integrations than DeepSeek or Kimi. You'll be more on your own.
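One cheap mitigation for weakness #2 is mechanical post-checking of outputs against the hard constraints in your system prompt. A minimal sketch, where the constraint list and checks are illustrative rather than any specific validator library:

```python
import re

# Illustrative hard constraints you might extract from a system prompt.
CONSTRAINTS = [
    ("max 3 sentences", lambda t: t.count(".") <= 3),
    ("no first person", lambda t: not re.search(r"\bI\b", t)),
    ("mentions the ticket id", lambda t: "TKT-" in t),
]

def dropped_constraints(text: str) -> list[str]:
    """Return the names of constraints the model output violates."""
    return [name for name, check in CONSTRAINTS if not check(text)]

out = "Restart the service. Then clear the cache for TKT-1042."
print(dropped_constraints(out))  # an empty list means all constraints held
```

Outputs that fail any check get one retry with the violated constraints restated, which converts silent drops into a bounded extra-iteration cost.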

Who Should Actually Use This Model

Use Step 3.5 Flash when:

  1. The workload is math- or code-heavy and cost per token dominates
  2. You need more than 128K context without paying frontier prices
  3. Agent tasks are short-burst (under 50 steps) with straightforward tool calls

Don't use Step 3.5 Flash when:

  1. Output is customer-facing English prose that ships without review
  2. You need long-horizon autonomous runs (overnight refactors, coordinated swarms)
  3. You depend on a deep ecosystem of fine-tunes, tutorials, and MCP integrations

TokenMix.ai lets you A/B test Step 3.5 Flash vs DeepSeek V3.2 and Kimi K2.6 on the same prompt — cheapest way to settle "which open model wins for my workload" without running three separate API accounts.

FAQ

Q: Is Step 3.5 Flash truly free to run commercially? A: Yes under Apache 2.0 — you can self-host and use it in commercial products without royalty or attribution beyond standard Apache requirements. Via OpenRouter's free tier there are RPM limits but no per-token charges.

Q: How does StepFun make money if the model is free? A: Paid API tier for high-volume use, enterprise licensing, and the flagship Step series (closed-weight) for premium customers. StepFun also raised ~$719M USD in 2026 and is pursuing a Hong Kong IPO.

Q: Is Step 3.5 Flash reasoning-model or traditional chat? A: Traditional chat with strong reasoning. No explicit <thinking> tokens like o-series; it produces chain-of-thought inline when prompted.

Q: Why 11B active instead of the more common 32-37B active? A: StepFun's bet is that tighter expert routing beats brute-forcing more active params. The benchmarks vindicate this for math and code; it's less clear-cut for creative writing and long-form English.

Q: Can Step 3.5 Flash handle 262K context reliably? A: In needle-in-haystack tests, recall is strong up to ~200K and degrades noticeably past that. For production use we'd recommend staying under 180K if accuracy is critical.

Q: When will StepFun release Step 4 or a larger flagship? A: No official date as of April 23, 2026. Given the $719M raise and IPO timeline, expect new flagship announcements in Q3/Q4 2026.

Q: Is StepFun blocked or restricted for non-Chinese users? A: No. OpenRouter, HuggingFace, and NVIDIA NIM distribution is global. Direct StepFun API may require additional KYC for Chinese-only billing paths, but international hosted options are unrestricted.



By TokenMix Research Lab · Updated 2026-04-23