TokenMix Research Lab · 2026-04-22
Qwen3.6-Max-Preview Review: 6 Benchmark #1s, Closed-Weights Shift (2026)
Alibaba released Qwen3.6-Max-Preview on April 20, 2026, and for the first time in Qwen's history the flagship ships closed-weights only. The model claims the top rank on six major coding/agent benchmarks: SWE-Bench Pro, Terminal-Bench 2.0, SkillsBench, QwenClawBench, QwenWebBench, and SciCode. Third-party evaluator Artificial Analysis gives it an Intelligence Index of 52, well above the median of 14 for reasoning models in the same price tier. Specs: 260K context window, OpenAI and Anthropic API compatibility, and a preserve_thinking feature for multi-turn agents. This review covers what the model actually wins, what the closed-weights shift signals for Qwen's ecosystem, and how it compares to GLM-5.1, Claude Opus 4.7, and GPT-5.4. TokenMix.ai routes Qwen3.6-Max-Preview through an OpenAI-compatible gateway for teams comparing Chinese and international flagships.
Table of Contents
- Confirmed vs Speculation: The Release Facts
- The 6 Benchmark #1s Explained
- Closed-Weights Shift: Why It Matters
- Specs & API Compatibility
- Qwen3.6-Max vs GLM-5.1 vs Claude Opus 4.7
- Pricing & Access
- Who Should Use Qwen3.6-Max-Preview
- FAQ
Confirmed vs Speculation: The Release Facts
| Claim | Status | Source |
|---|---|---|
| Released April 20, 2026 | Confirmed | Decrypt |
| #1 on SWE-Bench Pro, Terminal-Bench 2.0, SkillsBench, SciCode | Confirmed | Alibaba benchmark report |
| Closed-weights only | Confirmed | Model card |
| 260K context window | Confirmed | API docs |
| OpenAI + Anthropic API compatible | Confirmed | Developer docs |
| Intelligence Index 52 (Artificial Analysis) | Confirmed | Artificial Analysis |
| Beats Claude Opus 4.7 on all benchmarks | Mixed — yes on SWE-Bench Pro; Opus leads on others | — |
| Permanent closed-weights direction | Speculation — the "preview" label implies the policy is not final | — |
Bottom line: genuine #1 SOTA on specific coding benchmarks at release. Closed-weights is a policy shift to watch.
The 6 Benchmark #1s Explained
| Benchmark | What it tests | Qwen3.6-Max | Δ vs Qwen3.6-Plus (pp) |
|---|---|---|---|
| SWE-Bench Pro | Real-world software engineering (multi-file) | #1 SOTA | +3 to +5 |
| Terminal-Bench 2.0 | Shell commands + build systems | #1 | +3.8 |
| SkillsBench | General problem-solving | #1 | +9.9 |
| SciCode | Scientific programming | #1 | +10.8 |
| QwenClawBench | Tool use & function calling | #1 | — |
| QwenWebBench | Web interaction / browsing | #1 | — |
On generalist benchmarks (non-first-party):
- SuperGPQA: +2.3 vs 3.6-Plus
- QwenChineseBench: +5.3 vs 3.6-Plus
Reality check on "#1 SOTA": SWE-Bench Pro leadership matters — that is where GLM-5.1 held #1 in April 2026 before being unseated. Terminal-Bench 2.0 was Claude Opus 4.7's stronghold at 69.4%. Qwen3.6-Max-Preview taking the top spot on both in a single release is significant.
Where it does not win: GPQA Diamond (Gemini 3.1 Pro still leads at 94.3%), MMLU (saturated around 92% across all frontier models), vision-language tasks (Claude Opus 4.7 + 3.75MP dominates).
Closed-Weights Shift: Why It Matters
Historically, Alibaba published Qwen weights under Apache 2.0 or similar permissive licenses. Qwen3.6-Max-Preview ships closed-weights only, with API-only access via Alibaba Cloud's dashscope and bailian platforms.
Why Alibaba did this (strategic reading):
- Protect compute moat — training a model at 3.6-Max scale requires compute most labs can't match, and keeping the weights closed protects that investment from cheap replication
- Monetize API-first — following OpenAI/Anthropic playbook where closed flagship drives revenue
- Regulatory hedge — after the April 2026 US-China AI distillation battle, closed weights reduce distillation risk from both directions
- Research-to-prod pipeline — Qwen team can iterate faster without publishing every checkpoint
What remains open from Qwen: Qwen3.5-Plus, Qwen3-Coder-Plus, Qwen3-VL series, and all prior versions remain open-weight. Closed-weights is flagship-specific, not a brand-wide policy.
Implications for developers:
- Cannot fine-tune on your own data (use API only)
- Cannot self-host for privacy-sensitive workloads
- Pricing is set by Alibaba, not competition with fine-tuners
- Open Qwen variants (Qwen3.5-Plus, Qwen3-Coder-Plus, Qwen3-VL) remain fully accessible
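Since access is API-only, the practical integration path is an OpenAI-compatible chat-completions call. Below is a minimal stdlib-only sketch; the model ID, gateway URL, and endpoint path are illustrative assumptions, not confirmed values — check your provider's docs for the real identifiers.

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "qwen3.6-max-preview") -> dict:
    """Build an OpenAI-compatible chat.completions payload.

    The model ID here is an assumption for illustration.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }


def send(payload: dict, base_url: str, api_key: str) -> dict:
    """POST the payload to an OpenAI-compatible gateway endpoint."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_chat_request("Summarize this diff.")
```

Because the request shape is the standard OpenAI one, the same payload works unchanged against DashScope, a self-hosted gateway, or a router like TokenMix — only `base_url` and the key change.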
Specs & API Compatibility
| Spec | Value |
|---|---|
| Context window | 260,000 tokens |
| Max output | 32,768 tokens |
| Languages | 100+ |
| API compatibility | OpenAI + Anthropic (both endpoints supported) |
| New feature | preserve_thinking — retains reasoning tokens across multi-turn requests |
| Tool calling | Native support, competitive with Claude |
| Vision | Not in this release (text-only) |
preserve_thinking is the interesting new primitive: for agentic workflows, the model's thinking tokens from turn N can be preserved into turn N+1 context. This mirrors Claude's "extended thinking" pattern and lets agents maintain reasoning trajectories across tool calls.
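A sketch of how an agent loop might use this pattern. Only the feature name `preserve_thinking` comes from the release notes; the `reasoning_content` field name and the response shape are assumptions made for illustration.

```python
def next_turn_messages(messages: list, response: dict,
                       preserve_thinking: bool = True) -> list:
    """Fold a model response back into the conversation history.

    When preserve_thinking is set, the reasoning tokens from this turn
    are kept on the assistant message so the next request sees the full
    reasoning trajectory. The 'reasoning_content' key is an assumed name.
    """
    assistant_msg = {"role": "assistant", "content": response["content"]}
    if preserve_thinking and response.get("reasoning_content"):
        assistant_msg["reasoning_content"] = response["reasoning_content"]
    return messages + [assistant_msg]


# Simulated turn N: the model produced both an answer and reasoning.
history = [{"role": "user", "content": "Fix the failing test."}]
fake_response = {
    "content": "Running pytest to locate the failure.",
    "reasoning_content": "The traceback points at test_io; inspect fixtures first.",
}
# Turn N+1 now carries the turn-N reasoning forward.
history = next_turn_messages(history, fake_response)
```

The design point is that without this, each turn's chain of thought is discarded and the model re-derives its plan from scratch on every tool call; preserving it keeps long agentic sessions coherent at the cost of extra context tokens.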
Qwen3.6-Max vs GLM-5.1 vs Claude Opus 4.7
| Dimension | Qwen3.6-Max-Preview | GLM-5.1 | Claude Opus 4.7 |
|---|---|---|---|
| SWE-Bench Pro | #1 | #2 (70%) | ~54% |
| SWE-Bench Verified | ~82-85% (est) | ~78% | 87.6% |
| Terminal-Bench 2.0 | #1 | ~60% | 69.4% |
| Context window | 260K | 128K | 200K |
| License | Closed | MIT | Commercial |
| Price (input $/M) | ~$0.78-3 (est) | ~$0.45 | $5.00 |
| Best use case | Agentic coding | Open-source coding | Enterprise coding |
Positioning summary:
- Qwen3.6-Max-Preview: agentic coding SOTA; choose it when API access, price, and tool use are the priorities
- GLM-5.1: still better for self-hosting, fine-tuning, redistributing (MIT license)
- Claude Opus 4.7: still better for SWE-Bench Verified + vision + Anthropic ecosystem
Pricing & Access
Qwen3-Max pricing (base): $0.78/M input, $3.90/M output per OpenRouter/DashScope rates. Qwen3.6-Max-Preview pricing not yet public as of April 22, 2026 — expect a modest premium above 3-Max.
Estimated pricing band:
- Input: $0.80-