TokenMix Research Lab · 2026-05-25

Qwen 3.6 Tier Picker 2026: Max-Preview vs Plus vs Flash vs 35B
Last Updated: 2026-05-25 · Data Checked: 2026-05-25
Alibaba shipped four Qwen 3.6 SKUs in 30 days. Picking the wrong tier can cost up to ~7x more per task or burn benchmark headroom you never use. This is the workload-to-tier map, with verified pricing, context, and benchmark numbers from OpenRouter, Hugging Face, and Alibaba Cloud's own announcement pages.
The series spans $0.15/M (open-source 35B-A3B input) to $6.24/M (Max-Preview output), a ~41x spread from the cheapest input price to the most expensive output price. Like for like, Max-Preview costs about 6.9x what 35B-A3B does on both input and output. SWE-Bench Verified ranges from 73.4 (35B) to 78.8 (Plus). Context tops out at 1M for Plus and Flash, 262K for Max-Preview and 35B native. The right pick depends on three things only: workload class, throughput needs, and whether you can self-host. Below is the decision tree.
Quick Verdict
| Workload | Pick | Why | Verified |
|---|---|---|---|
| Repo-level agentic coding, long context | Qwen 3.6-Plus | 1M context + 78.8 SWE-Bench Verified, $0.325/$1.95 OpenRouter | 2026-05-25 |
| Hardest coding/agent tasks, willing to pay | Qwen 3.6-Max-Preview | Tops 6 benchmarks incl SWE-Bench Pro 57.3 / Terminal-Bench 2.0 65.4 | 2026-05-25 |
| High-volume routing, cost-sensitive | Qwen 3.6-Flash | $0.1875/$1.125 OpenRouter, 1M context | 2026-05-25 |
| Self-host, air-gapped, fine-tune | Qwen 3.6-35B-A3B | Apache-2.0, MoE 35B/3B active, 262K → 1M ctx | 2026-05-25 |
| Math/reasoning only, low budget | Qwen 3.6-35B-A3B | AIME26 92.7 / GPQA 86.0 at $0.15/$0.90 | 2026-05-25 |
Skip Qwen 3.6-Plus if your context never crosses 128K: Flash gives you the same family quality at roughly 58% of the cost (about 42% cheaper on both input and output). Skip Max-Preview unless your eval shows the +8 to +14 point benchmark gain over 35B-A3B matters for your task class.
Pricing Reality: The 41x Spread
Confirmed pricing (USD per million tokens, verified 2026-05-25 via OpenRouter and pricepertoken.com):
| Model | Input | Output | Context | Max Output | Source |
|---|---|---|---|---|---|
| Qwen 3.6-Max-Preview | $1.04 | $6.24 | 262K | not specified (text-only model) | OpenRouter |
| Qwen 3.6-Plus | $0.325 | $1.95 | 1M | 65,536 | OpenRouter |
| Qwen 3.6-Flash | $0.1875 | $1.125 | 1M | 65,536 | OpenRouter |
| Qwen 3.6-35B-A3B | $0.150 | $0.900 | 262K (1M YaRN) | 81,920 | pricepertoken |
Caveat — Confirmed: OpenRouter shows Plus, Flash, and Max-Preview with platform discounts (35%, 25%, 20% respectively). Direct DashScope pricing may differ; Alibaba Cloud's Model Studio pricing page (last updated 2026-04-01) does not yet list the 3.6 family. Treat the OpenRouter numbers as the going market rate as of this verification date.
Caveat — Likely: Plus and Flash both advertise 1M context, but tiered pricing reportedly kicks in above 256K. Below 256K you get the headline rate; above, costs scale per a separate sheet not yet harmonized across providers.
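The spread figures can be recomputed from the table. The sketch below is illustrative only: it hard-codes the headline rates listed above and uses hypothetical names (`PRICES`, `ratio`). It shows that the 41x headline compares the cheapest input price with the most expensive output price, while like-for-like ratios are much smaller.

```python
# Headline prices from the table above, in USD per million tokens.
PRICES = {
    "Max-Preview": {"input": 1.04, "output": 6.24},
    "Plus": {"input": 0.325, "output": 1.95},
    "Flash": {"input": 0.1875, "output": 1.125},
    "35B-A3B": {"input": 0.15, "output": 0.90},
}


def ratio(tier_a, tier_b, kind):
    """How many times more tier_a costs than tier_b for 'input' or 'output'."""
    return PRICES[tier_a][kind] / PRICES[tier_b][kind]


# Cheapest input price versus most expensive output price (the headline figure).
headline_spread = PRICES["Max-Preview"]["output"] / PRICES["35B-A3B"]["input"]

# Like-for-like comparisons.
output_spread = ratio("Max-Preview", "35B-A3B", "output")
input_spread = ratio("Max-Preview", "35B-A3B", "input")
flash_vs_plus = ratio("Flash", "Plus", "input")

print(f"headline spread: {headline_spread:.1f}x")
print(f"output spread:   {output_spread:.1f}x")
print(f"input spread:    {input_spread:.1f}x")
print(f"Flash / Plus:    {flash_vs_plus:.2f}")
```

These ratios use headline rates only; any tiered pricing above 256K would change them.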
Benchmark Reality: Where Each Tier Earns Its Price
Verified scores from each model's release announcement and Hugging Face card:
| Benchmark | Plus | Max-Preview | 35B-A3B | Source |
|---|---|---|---|---|
| SWE-Bench Verified | 78.8 | — | 73.4 | OpenRouter / HF |
| SWE-Bench Pro | — | 57.30 | 49.5 | Alibaba blog / HF |
| Terminal-Bench 2.0 | — | 65.40 | 51.5 | Alibaba blog / HF |
| AIME 2026 | — | — | 92.7 | HF |
| MMLU-Pro | — | — | 85.2 | HF |
| GPQA | — | — | 86.0 | HF |
| LiveCodeBench v6 | — | — | 80.4 | HF |
The reading: Max-Preview's premium ($6.24/M output vs Plus's $1.95) buys you ~+8 SWE-Bench Pro points and ~+14 Terminal-Bench 2.0 points over 35B-A3B. The table has no benchmark on which both Max-Preview and Plus report a score, so the gap between those two tiers is not directly measured here. Plus's value lies in the 1M context plus SWE-Bench Verified headroom: if your repo fits in 1M but you don't need bleeding-edge frontier benchmarks, Plus dominates.
The 35B-A3B open-source variant is the dark horse. AIME26 92.7 beats most proprietary models, MMLU-Pro 85.2 is competitive with Claude Opus 4.7's published numbers, and self-hosting eliminates per-token cost entirely after hardware amortization.
Workload Decision Tree
Branch 1 — Software engineering agents
- Context > 200K (whole repos): Plus; drop to Flash if cost matters more than quality
- Context < 200K, hardest tasks: Max-Preview
- Context < 200K, normal tasks: Plus or 35B-A3B (self-hosted)
- Budget-constrained, can tolerate SWE-Verified 73.4: 35B-A3B
Branch 2 — Document/long-context Q&A
- Million-token PDFs/codebases: Plus or Flash
- Cost-sensitive at scale: Flash ($0.1875 input is about 1.7x cheaper than Plus's $0.325)
- Air-gapped: 35B-A3B with YaRN to 1M
Branch 3 — Math/scientific reasoning
- 35B-A3B wins outright (AIME26 92.7 / GPQA 86.0 at $0.15/$0.90)
- Max-Preview if you need text-only frontier reasoning with one API call
Branch 4 — High-volume classification, summarization, retrieval
- Flash. Period. $0.1875/$1.125 with 1M context is the right tool.
- 35B-A3B if you have GPU capacity to spare.
Branch 5 — Multimodal (vision/video)
- 35B-A3B (the only open-source variant with vision; MMMU 81.7 / VideoMMU 86.6)
- Plus/Max-Preview are text-only at this writing.
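The five branches above can be sketched as a small routing function. This is a minimal illustration, not a production router: the function name `pick_tier`, the workload labels, and the boolean flags are all hypothetical, and mapping the math branch's Max-Preview case onto a `hardest` flag is an interpretation of the text.

```python
def pick_tier(workload, context_tokens=0, hardest=False,
              cost_sensitive=False, self_host=False):
    """Return a Qwen 3.6 tier name by following the decision-tree branches."""
    if workload == "multimodal":
        # Branch 5: only 35B-A3B is listed with vision support.
        return "35B-A3B"
    if workload == "math":
        # Branch 3: 35B-A3B by default, Max-Preview for frontier text-only reasoning.
        return "Max-Preview" if hardest else "35B-A3B"
    if workload == "high_volume":
        # Branch 4: Flash, or 35B-A3B when spare GPU capacity exists.
        return "35B-A3B" if self_host else "Flash"
    if workload == "long_context_qa":
        # Branch 2: air-gapped goes to 35B-A3B; otherwise Flash or Plus.
        if self_host:
            return "35B-A3B"
        return "Flash" if cost_sensitive else "Plus"
    if workload == "coding":
        # Branch 1: the 200K context threshold decides first.
        if context_tokens > 200_000:
            return "Flash" if cost_sensitive else "Plus"
        if hardest:
            return "Max-Preview"
        if self_host or cost_sensitive:
            return "35B-A3B"
        return "Plus"
    raise ValueError(f"unknown workload class: {workload!r}")


print(pick_tier("coding", context_tokens=500_000))
print(pick_tier("coding", context_tokens=50_000, hardest=True))
print(pick_tier("high_volume"))
```

Keeping the rules in one function makes the thresholds easy to swap once your own evals suggest different cutoffs.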
Cost-Per-Task Math (Three Workloads)
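Assume realistic token counts. All numbers are in USD at the headline rates verified 2026-05-25, with a 30-day month. Workload C's 500K-token inputs exceed the 256K mark where tiered pricing reportedly applies, so read those rows as lower bounds.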
Workload A — Repo-level code review, 100K in / 5K out, 100 tasks/day
| Tier | Per-task | Daily (100 tasks) | Monthly |
|---|---|---|---|
| Max-Preview | $0.104 + $0.0312 = $0.135 | $13.52 | $405.60 |
| Plus | $0.0325 + $0.00975 = $0.0423 | $4.23 | $126.75 |
| Flash | $0.01875 + $0.005625 = $0.0244 | $2.44 | $73.13 |
| 35B-A3B (API) | $0.015 + $0.0045 = $0.0195 | $1.95 | $58.50 |
Workload B — Math tutor, 2K in / 8K out, 10K tasks/day
| Tier | Per-task | Daily | Monthly |
|---|---|---|---|
| Max-Preview | $0.00208 + $0.0499 = $0.0520 | $520.00 | $15,600 |
| Plus | $0.00065 + $0.0156 = $0.0163 | $162.50 | $4,875 |
| Flash | $0.000375 + $0.009 = $0.00938 | $93.75 | $2,813 |
| 35B-A3B | $0.0003 + $0.0072 = $0.0075 | $75.00 | $2,250 |
Workload C — Long-PDF QA, 500K in / 2K out, 200 tasks/day
| Tier | Per-task | Monthly |
|---|---|---|
| Plus (1M ctx) | $0.1625 + $0.0039 = $0.166 | $998 |
| Flash (1M ctx) | $0.0938 + $0.00225 = $0.0960 | $576 |
| Max-Preview (262K — won't fit!) | n/a | n/a |
| 35B-A3B (262K native, YaRN) | $0.075 + $0.0018 = $0.0768 | $461 |
The pattern: among the hosted tiers, Flash wins when large inputs dominate the bill; Plus wins when you need both long context and benchmark quality; Max-Preview wins only on the hardest tasks, where the extra ~10 benchmark points pay for themselves. 35B-A3B is the cheapest row in every table at API rates, and self-hosting improves its economics further at sustained volume.
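The per-task and monthly figures in the three tables follow from one formula: tokens times the per-million rate, summed over input and output. A minimal sketch, assuming the headline rates above and a 30-day month; the names `PRICES`, `cost_per_task`, and `monthly_cost` are illustrative.

```python
# (input, output) headline prices in USD per million tokens.
PRICES = {
    "Max-Preview": (1.04, 6.24),
    "Plus": (0.325, 1.95),
    "Flash": (0.1875, 1.125),
    "35B-A3B": (0.15, 0.90),
}


def cost_per_task(tier, tokens_in, tokens_out):
    """USD cost of a single task at headline rates (no tiered pricing)."""
    price_in, price_out = PRICES[tier]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


def monthly_cost(tier, tokens_in, tokens_out, tasks_per_day, days=30):
    """USD cost per month, assuming a constant daily task count."""
    return cost_per_task(tier, tokens_in, tokens_out) * tasks_per_day * days


# Workload A: 100K in / 5K out, 100 tasks per day.
for tier in PRICES:
    per_task = cost_per_task(tier, 100_000, 5_000)
    per_month = monthly_cost(tier, 100_000, 5_000, 100)
    print(f"{tier:12s} per task ${per_task:.4f}  per month ${per_month:.2f}")
```

Swapping in your own token counts and task volume is usually more informative than the three reference workloads, since the input/output mix drives which tier is cheapest.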
Open-Source vs Proprietary — The 35B-A3B Question
Qwen 3.6-35B-A3B is the only Apache-2.0 model in the family. Configuration: 35B total parameters, 3B active per token (MoE), 256 total experts, 8 routed + 1 shared activated. Native context 262K, extensible to ~1M via YaRN/RoPE scaling. Vision encoder included.
Why this matters: with 3B active parameters, per-token compute scales like a 3B dense model, not 35B, although all 35B parameters still have to sit in memory. On a single H100, you can run real workloads at meaningful throughput. The benchmark scores (SWE-Verified 73.4, AIME26 92.7, MMLU-Pro 85.2, GPQA 86.0) are competitive with proprietary mid-tier offerings.
Compared to Max-Preview: 35B-A3B loses ~8 points on SWE-Bench Pro and ~14 on Terminal-Bench 2.0. If your task class doesn't need that delta, self-hosting Qwen 3.6-35B-A3B is the lowest TCO option at sustained volume. The break-even vs API depends on hardware utilization — at 50%+ utilization on owned hardware, 35B-A3B beats every API tier on cost.
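The break-even logic can be sketched as follows. The hardware cost and peak throughput below are placeholders chosen for illustration, not measurements, and the sketch ignores power, operations staff, and idle time. The function names are hypothetical. Only the 35B-A3B API rates and the 2K in / 8K out mix come from the tables above.

```python
def blended_price_per_m(price_in, price_out, tokens_in, tokens_out):
    """Average USD per million tokens for a given input/output mix."""
    total = tokens_in + tokens_out
    return (tokens_in * price_in + tokens_out * price_out) / total


def breakeven_utilization(hardware_usd_per_month, peak_m_tokens_per_month,
                          api_price_per_m):
    """Utilization fraction above which self-hosting undercuts the API."""
    return hardware_usd_per_month / (peak_m_tokens_per_month * api_price_per_m)


# 35B-A3B API rates with Workload B's mix (2K in / 8K out).
api_price = blended_price_per_m(0.15, 0.90, 2_000, 8_000)

# PLACEHOLDERS: substitute your own amortized cost and measured throughput.
HARDWARE_USD_PER_MONTH = 3_000.0
PEAK_M_TOKENS_PER_MONTH = 10_000.0  # millions of tokens at 100% utilization

utilization = breakeven_utilization(
    HARDWARE_USD_PER_MONTH, PEAK_M_TOKENS_PER_MONTH, api_price
)
print(f"blended API price: ${api_price:.2f}/M")
print(f"break-even utilization: {utilization:.0%}")
```

A result above 100% would mean the hardware cannot pay for itself against the API at that price and throughput, whatever the utilization.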
Caveat — Speculation: We have not independently verified Max-Preview's parameter count (reported "~1T"). Treat that as marketing characterization, not a confirmed spec.
Reliability and Update Cadence
| Variant | Released | Status | Notes |
|---|---|---|---|
| Qwen 3.6-Plus | 2026-04-02 | GA | Primary flagship per Alibaba positioning |
| Qwen 3.6-35B-A3B | 2026-04-16 | GA, Apache-2.0 | Open weights, full multimodal |
| Qwen 3.6-27B | 2026-04-22 | GA | Smaller open variant |
| Qwen 3.6-Max-Preview | 2026-04-20 | Preview | "Work in progress" per Alibaba |
| Qwen 3.6-Flash | 2026-04 | GA | Speed/cost tier |
The "Preview" tag on Max-Preview is non-trivial. Alibaba's own press materials describe further improvements expected, which means production behavior could shift. Plus and Flash are stable; 35B-A3B is open-weights, frozen by definition.
If you're picking for a production system, Plus or 35B-A3B are the safest. Max-Preview is fine for evaluation and asymmetric high-value tasks, not for stable agent loops.
Frequently Asked Questions
Q: Which Qwen 3.6 tier matches Claude Opus 4.7 on coding? A: Plus at 78.8 SWE-Bench Verified is in the same band as Opus 4.7's published number. Max-Preview scores 57.3 on SWE-Bench Pro and 65.4 on Terminal-Bench 2.0; the latter trails Opus 4.7's Terminal-Bench 2.0 score of 69.4. Per Alibaba's own claims, Max-Preview reclaimed the top spot on SkillsBench, QwenClawBench, QwenWebBench, and SciCode. Independent verification is ongoing.
Q: Why is Max-Preview text-only? A: It launched as text-only per the announcement. Vision input is on the family roadmap. For multimodal today, 35B-A3B is the option.
Q: Can I use Qwen 3.6-Plus's 1M context without paying premium? A: Up to 256K, you pay the headline rate ($0.325/$1.95), confirmed on OpenRouter. Above 256K, tiered pricing reportedly applies, and exact multipliers vary by provider. DashScope direct pricing is not yet published for the 3.6 family.
Q: Is Qwen 3.6-35B-A3B actually competitive at 3B active params? A: For its size class, exceptional. AIME26 92.7 and MMLU-Pro 85.2 are competitive with proprietary mid-tier. SWE-Bench Verified 73.4 trails Plus by ~5 points but beats most open-source coding models.
Q: What's the cache-hit pricing for Qwen 3.6? A: Not consistently published across providers as of 2026-05-25. OpenRouter does not break out cache pricing for these variants. If you rely on cache discounts for cost modeling, validate with the specific endpoint before committing.
Q: When does the Max-Preview "Preview" tag come off? A: No public timeline. Alibaba's release describes ongoing improvements. Assume Preview behavior could change weekly.
Q: Are there fine-tunes available for Qwen 3.6-35B-A3B? A: As of 2026-05-25, community fine-tunes are appearing on Hugging Face. The Apache-2.0 license permits commercial use including fine-tunes.
Q: How does Qwen 3.6-Flash compare to DeepSeek V4-Pro on cost? A: V4-Pro post-permanent-cut is $0.435/$0.87 per million; Flash is $0.1875/$1.125. Flash wins on input cost (2.3x cheaper), V4-Pro wins on output cost (~23% cheaper). The crossover sits at roughly 1.03 input tokens per output token: above that ratio Flash is cheaper, below it V4-Pro is.
Q: Does Qwen 3.6-Plus support function calling and tool use? A: Yes, native function calling and agentic workflows are supported across the family. 35B-A3B documents this explicitly.
Q: What's the max output token limit per tier? A: Plus/Flash: 65,536 per OpenRouter spec. 35B-A3B: 32,768 general, 81,920 for math/coding per Qwen's recommendation. Max-Preview: not specified in available documentation.
Q: Can Qwen 3.6 models be deployed on Azure or AWS? A: 35B-A3B (open weights) yes, via standard deployment paths. Plus/Max-Preview/Flash are accessible via DashScope, OpenRouter, and various API aggregators including TokenMix.ai. Direct availability on AWS Bedrock or Azure for the proprietary tiers is not confirmed as of 2026-05-25.
Q: What's the realistic throughput for Qwen 3.6-Plus in production? A: OpenRouter and aggregator-reported throughput numbers vary 20-80 tok/s depending on routing and load. For SLA-bound workloads, run your own benchmark before committing.
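For the Flash versus DeepSeek V4-Pro question above, the crossover point follows from setting the two cost formulas equal and solving for the input/output token ratio. A minimal sketch, using the prices quoted in that answer; the function name is illustrative.

```python
def crossover_input_output_ratio(a_in, a_out, b_in, b_out):
    """Input/output token ratio at which models A and B cost the same.

    cost_A = a_in * i + a_out * o and cost_B = b_in * i + b_out * o are
    equal when i / o = (a_out - b_out) / (b_in - a_in). This assumes A is
    cheaper on input and B is cheaper on output.
    """
    return (a_out - b_out) / (b_in - a_in)


# A = Qwen 3.6-Flash, B = DeepSeek V4-Pro (USD per million tokens).
ratio = crossover_input_output_ratio(0.1875, 1.125, 0.435, 0.87)
print(f"Flash is cheaper when input/output tokens exceed {ratio:.2f}")
```

Input-heavy workloads such as repo review or long-document QA sit far above this ratio, while generation-heavy workloads such as the math tutor example sit below it.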
Sources
- OpenRouter — Qwen 3.6 Plus
- OpenRouter — Qwen 3.6 Max Preview
- OpenRouter — Qwen 3.6 Flash
- Hugging Face — Qwen 3.6-35B-A3B model card
- Alibaba Cloud Community — Qwen3.6-Plus: Towards Real World Agents
- Alibaba Cloud Community — Qwen3.6-Plus agentic AI announcement
- Caixin Global — Alibaba Releases Qwen 3.6-Plus
- Decrypt — Alibaba Drops Qwen 3.6 Max Preview
- pricepertoken — Qwen pricing
- Alibaba Cloud Model Studio — model pricing reference
TokenMix Take
Editorial note: TokenMix.ai routes traffic across 300+ models including all four Qwen 3.6 variants discussed above. The pricing above was checked independently against public listings rather than supplied by vendors; benchmark scores are as reported in the vendor announcements and model cards cited above.
The Qwen 3.6 family is the sharpest tier ladder we've seen in 2026 so far. The spread from 35B-A3B's $0.15/M input to Max-Preview's $6.24/M output (about 41x, or roughly 6.9x like for like) maps cleanly to four distinct workload classes, whereas most Chinese model families collapse to 2-3 tiers that overlap heavily. Alibaba's pricing discipline here suggests they read the same workload-routing playbook the cost-conscious operators have been writing.
Our routing recommendation for a fresh agent stack today: default to Plus for stateful coding tasks, fall back to Flash when context fits comfortably in 128K, escalate to Max-Preview only for hardest evals that demonstrably benefit, and reserve 35B-A3B for the self-host path when GPU economics work out. The math/reasoning advantage of 35B-A3B specifically (AIME26 92.7, GPQA 86.0) makes it the surprise pick for low-volume high-precision tasks where the API tax doesn't justify itself.
The biggest open question is what Alibaba does with the "Preview" tag on Max-Preview. If it stabilizes within Q3 2026, the four-tier ladder becomes the most coherent Chinese flagship lineup. If Max-Preview keeps shifting, production teams will gravitate to Plus and the open-source 35B-A3B for stability.
For workflow-by-workflow cost math against Western frontier models, our cheapest frontier LLM API analysis covers cross-vendor cost-per-task numbers including the DeepSeek V4-Pro post-cut shift that landed last week.