TokenMix Research Lab · 2026-05-25

Qwen 3.6 Tier Picker 2026: Max-Preview vs Plus vs Flash vs 35B

Last Updated: 2026-05-25 · Data Checked: 2026-05-25

Alibaba shipped four Qwen 3.6 SKUs in 30 days. Picking the wrong tier costs 6-32x more or burns benchmark headroom you never use. This is the workload-to-tier map, with verified pricing, context, and benchmark numbers from OpenRouter, Hugging Face, and Alibaba Cloud's own announcement pages.

The series spans $0.15/M (35B-A3B input) to $6.24/M (Max-Preview output), a 41x gap between the cheapest and the most expensive token in the family. Like for like, Max-Preview costs about 6.9x what 35B-A3B does on both input and output. SWE-Bench Verified ranges from 73.4 (35B-A3B) to 78.8 (Plus). Context tops out at 1M for Plus and Flash, at 262K for Max-Preview, and at 262K native (about 1M with YaRN) for 35B-A3B. The right pick depends on only three things: workload class, throughput needs, and whether you can self-host. Below is the decision tree.


Quick Verdict

| Workload | Pick | Why | Verified |
| --- | --- | --- | --- |
| Repo-level agentic coding, long context | Qwen 3.6-Plus | 1M context + 78.8 SWE-Bench Verified, $0.325/$1.95 OpenRouter | 2026-05-25 |
| Hardest coding/agent tasks, willing to pay | Qwen 3.6-Max-Preview | Tops 6 benchmarks incl. SWE-Bench Pro 57.3 / Terminal-Bench 2.0 65.4 | 2026-05-25 |
| High-volume routing, cost-sensitive | Qwen 3.6-Flash | $0.1875/$1.125 OpenRouter, 1M context | 2026-05-25 |
| Self-host, air-gapped, fine-tune | Qwen 3.6-35B-A3B | Apache-2.0, MoE 35B/3B active, 262K → 1M ctx | 2026-05-25 |
| Math/reasoning only, low budget | Qwen 3.6-35B-A3B | AIME26 92.7 / GPQA 86.0 at $0.15/$0.90 | 2026-05-25 |

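Skip Qwen 3.6-Plus if your context never crosses 128K: Flash gives you the same family quality at roughly 58% of the cost (about 42% cheaper). Skip Max-Preview unless your eval shows that its +8 to +14 point benchmark gain (measured against 35B-A3B on SWE-Bench Pro and Terminal-Bench 2.0) matters for your task class.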


Pricing Reality: The 41x Spread

Confirmed pricing (USD per million tokens, verified 2026-05-25 via OpenRouter and pricepertoken.com):

| Model | Input | Output | Context | Max Output | Source |
| --- | --- | --- | --- | --- | --- |
| Qwen 3.6-Max-Preview | $1.04 | $6.24 | 262K (text-only) | not specified | OpenRouter |
| Qwen 3.6-Plus | $0.325 | $1.95 | 1M | 65,536 | OpenRouter |
| Qwen 3.6-Flash | $0.1875 | $1.125 | 1M | 65,536 | OpenRouter |
| Qwen 3.6-35B-A3B | $0.150 | $0.900 | 262K (1M YaRN) | 81,920 | pricepertoken |

Caveat — Confirmed: OpenRouter shows Plus, Flash, and Max-Preview with platform discounts (35%, 25%, 20% respectively). Direct DashScope pricing may differ; Alibaba Cloud's Model Studio pricing page (last updated 2026-04-01) does not yet list the 3.6 family. Treat the OpenRouter numbers as the going market rate as of this verification date.

Caveat — Likely: Plus and Flash both advertise 1M context, but tiered pricing reportedly kicks in above 256K. Below 256K you get the headline rate; above, costs scale per a separate sheet not yet harmonized across providers.
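To make the spread figures concrete, here is a minimal sketch that recomputes them from the table above. The `PRICES` dictionary and the `spread` helper are illustrative names, not part of any provider SDK. The 41x figure only appears when the most expensive output price is divided by the cheapest input price; on like-for-like tokens the gap is about 6.9x.

```python
# Prices in USD per million tokens, copied from the pricing table above.
PRICES = {
    "Max-Preview": {"input": 1.04, "output": 6.24},
    "Plus": {"input": 0.325, "output": 1.95},
    "Flash": {"input": 0.1875, "output": 1.125},
    "35B-A3B": {"input": 0.15, "output": 0.90},
}


def spread(kind_hi: str, kind_lo: str) -> float:
    """Ratio of the highest `kind_hi` price to the lowest `kind_lo` price."""
    highest = max(p[kind_hi] for p in PRICES.values())
    lowest = min(p[kind_lo] for p in PRICES.values())
    return highest / lowest


headline = spread("output", "input")      # Max-Preview output vs 35B-A3B input
output_only = spread("output", "output")  # like-for-like output prices
input_only = spread("input", "input")     # like-for-like input prices

print(f"headline {headline:.1f}x, output {output_only:.1f}x, input {input_only:.1f}x")
```

Which ratio matters depends on your token mix: output-heavy workloads feel the output spread, input-heavy ones the input spread.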


Benchmark Reality: Where Each Tier Earns Its Price

Verified scores from each model's release announcement and Hugging Face card:

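| Benchmark | Plus | Max-Preview | 35B-A3B | Source |
| --- | --- | --- | --- | --- |
| SWE-Bench Verified | 78.8 | - | 73.4 | OpenRouter / HF |
| SWE-Bench Pro | - | 57.3 | 49.5 | Alibaba blog / HF |
| Terminal-Bench 2.0 | - | 65.4 | 51.5 | Alibaba blog / HF |
| AIME 2026 | - | - | 92.7 | HF |
| MMLU-Pro | - | - | 85.2 | HF |
| GPQA | - | - | 86.0 | HF |
| LiveCodeBench v6 | - | - | 80.4 | HF |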

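The reading: Max-Preview's premium ($6.24/M output vs $1.95 for Plus and $0.90 for 35B-A3B) buys about +8 SWE-Bench Pro points (57.3 vs 49.5) and about +14 Terminal-Bench 2.0 points (65.4 vs 51.5) over 35B-A3B. The table above has no Plus scores on those two benchmarks, so the Max-Preview vs Plus gap is not quantified here. Plus's value lies in the 1M context plus SWE-Bench Verified headroom: if your repo fits in 1M and you don't need frontier-level benchmark scores, Plus dominates.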

The 35B-A3B open-source variant is the dark horse. AIME26 92.7 beats most proprietary models, MMLU-Pro 85.2 is competitive with Claude Opus 4.7's published numbers, and self-hosting eliminates per-token cost entirely after hardware amortization.


Workload Decision Tree

Branch 1 — Software engineering agents

Branch 2 — Document/long-context Q&A

Branch 3 — Math/scientific reasoning

Branch 4 — High-volume classification, summarization, retrieval

Branch 5 — Multimodal (vision/video)
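The five branches can be sketched as a small lookup function. This is an illustrative mapping, not Alibaba's or TokenMix's routing logic: the picks are taken from the Quick Verdict table and the FAQ, while the `Workload` enum, the `pick_tier` function, and the `hardest_tasks` and `cost_sensitive` flags are hypothetical names introduced for the example. The Flash-or-Plus choice for long-context Q&A in particular is an assumption.

```python
from enum import Enum


class Workload(Enum):
    CODING_AGENT = "software engineering agents"
    LONG_CONTEXT_QA = "document/long-context Q&A"
    MATH_REASONING = "math/scientific reasoning"
    HIGH_VOLUME = "high-volume classification, summarization, retrieval"
    MULTIMODAL = "multimodal (vision/video)"


def pick_tier(workload: Workload, hardest_tasks: bool = False,
              cost_sensitive: bool = False) -> str:
    """Return a tier name for a workload class (illustrative mapping only)."""
    if workload is Workload.CODING_AGENT:
        # Max-Preview only when your evals show the extra points matter.
        return "Qwen 3.6-Max-Preview" if hardest_tasks else "Qwen 3.6-Plus"
    if workload is Workload.LONG_CONTEXT_QA:
        # Both 1M-context tiers fit; Flash is the cheaper of the two (assumption).
        return "Qwen 3.6-Flash" if cost_sensitive else "Qwen 3.6-Plus"
    if workload is Workload.MATH_REASONING:
        return "Qwen 3.6-35B-A3B"
    if workload is Workload.HIGH_VOLUME:
        return "Qwen 3.6-Flash"
    if workload is Workload.MULTIMODAL:
        # Max-Preview is text-only; the FAQ names 35B-A3B for multimodal.
        return "Qwen 3.6-35B-A3B"
    raise ValueError(f"unknown workload: {workload!r}")


print(pick_tier(Workload.CODING_AGENT, hardest_tasks=True))
```

Treat the output as a starting point for your own evals rather than a final answer.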


Cost-Per-Task Math (Three Workloads)

Assume realistic token counts. All numbers in USD, verified pricing 2026-05-25.

Workload A — Repo-level code review, 100K in / 5K out, 100 tasks/day

| Tier | Per-task | Daily (100 tasks) | Monthly |
| --- | --- | --- | --- |
| Max-Preview | $0.104 + $0.0312 = $0.135 | $13.52 | $405.60 |
| Plus | $0.0325 + $0.00975 = $0.0423 | $4.23 | $126.75 |
| Flash | $0.01875 + $0.005625 = $0.0244 | $2.44 | $73.13 |
| 35B-A3B (API) | $0.015 + $0.0045 = $0.0195 | $1.95 | $58.50 |

Workload B — Math tutor, 2K in / 8K out, 10K tasks/day

| Tier | Per-task | Daily | Monthly |
| --- | --- | --- | --- |
| Max-Preview | $0.00208 + $0.0499 = $0.0520 | $520.00 | $15,600 |
| Plus | $0.00065 + $0.0156 = $0.0163 | $162.50 | $4,875 |
| Flash | $0.000375 + $0.009 = $0.00938 | $93.75 | $2,813 |
| 35B-A3B | $0.0003 + $0.0072 = $0.0075 | $75.00 | $2,250 |

Workload C — Long-PDF QA, 500K in / 2K out, 200 tasks/day

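| Tier | Per-task | Monthly |
| --- | --- | --- |
| Plus (1M ctx) | $0.1625 + $0.0039 = $0.166 | $998 |
| Flash (1M ctx) | $0.0938 + $0.00225 = $0.0960 | $576 |
| Max-Preview (262K context, input won't fit) | n/a | n/a |
| 35B-A3B (262K native, needs YaRN) | $0.075 + $0.0018 = $0.0768 | $461 |

Note: these figures use the headline rates. At 500K input, the tiered pricing that reportedly applies above 256K would raise the Plus and Flash numbers.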

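The pattern: on API list price alone, 35B-A3B is the cheapest tier in all three workloads and Flash is next. Among the proprietary tiers, Flash wins when the workload is high-volume and cost dominates, Plus wins when you need both 1M context and its benchmark quality, and Max-Preview wins only on the hardest tasks, where the extra ~10 benchmark points pay for themselves. Self-hosted 35B-A3B wins on economics for any task at sustained volume.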
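The cost tables can be reproduced with a few lines of arithmetic. The sketch below assumes a 30-day month (which matches the monthly figures above) and the headline per-million rates. It does not model tiered pricing above 256K or check whether the input fits a tier's context window. `PRICES`, `cost_per_task`, and `monthly_cost` are illustrative names.

```python
# (input, output) prices in USD per million tokens, from the pricing table.
PRICES = {
    "Max-Preview": (1.04, 6.24),
    "Plus": (0.325, 1.95),
    "Flash": (0.1875, 1.125),
    "35B-A3B": (0.15, 0.90),
}


def cost_per_task(tier: str, tokens_in: int, tokens_out: int) -> float:
    """Headline-rate cost of one task in USD (no tiered or cache pricing)."""
    price_in, price_out = PRICES[tier]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


def monthly_cost(tier: str, tokens_in: int, tokens_out: int,
                 tasks_per_day: int, days: int = 30) -> float:
    """Monthly cost in USD, assuming a constant daily task count."""
    return cost_per_task(tier, tokens_in, tokens_out) * tasks_per_day * days


# Workload A: 100K in / 5K out, 100 tasks/day.
for tier in PRICES:
    print(f"{tier:12s} ${monthly_cost(tier, 100_000, 5_000, 100):,.2f}/month")
```

Swapping in your own token counts and task volume is usually more informative than any of the three reference workloads.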


Open-Source vs Proprietary — The 35B-A3B Question

Qwen 3.6-35B-A3B is the only Apache-2.0 model in the family. Configuration: 35B total parameters, 3B active per token (MoE), 256 total experts, 8 routed + 1 shared activated. Native context 262K, extensible to ~1M via YaRN/RoPE scaling. Vision encoder included.

Why this matters: with 3B active parameters, per-token compute scales like a 3B dense model rather than a 35B one, although all 35B parameters still have to sit in GPU memory. On a single H100, you can run real workloads at meaningful throughput. The benchmark scores (SWE-Verified 73.4, AIME26 92.7, MMLU-Pro 85.2, GPQA 86.0) are competitive with proprietary mid-tier offerings.

Compared to Max-Preview: 35B-A3B loses ~8 points on SWE-Bench Pro and ~14 on Terminal-Bench 2.0. If your task class doesn't need that delta, self-hosting Qwen 3.6-35B-A3B is the lowest TCO option at sustained volume. The break-even vs API depends on hardware utilization — at 50%+ utilization on owned hardware, 35B-A3B beats every API tier on cost.
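The break-even logic can be sketched as follows. Every input here is a placeholder: the $2.50 hourly hardware cost and the 2,000 tokens/second peak throughput are hypothetical values chosen only to show the arithmetic, not measurements of 35B-A3B on any GPU. The $0.90 comparison point is the 35B-A3B output rate from the pricing table. Function names are illustrative.

```python
def self_host_cost_per_million(hourly_cost_usd: float,
                               peak_tokens_per_second: float,
                               utilization: float) -> float:
    """Effective USD per million tokens on owned or rented hardware."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = peak_tokens_per_second * 3600 * utilization
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def break_even_utilization(hourly_cost_usd: float,
                           peak_tokens_per_second: float,
                           api_price_per_million: float) -> float:
    """Utilization at which self-hosting costs the same as the API price."""
    cost_at_full_load = self_host_cost_per_million(
        hourly_cost_usd, peak_tokens_per_second, 1.0)
    return cost_at_full_load / api_price_per_million


# Hypothetical inputs: $2.50/hour all-in, 2,000 tok/s aggregate at full load.
HOURLY_COST = 2.50
PEAK_TPS = 2000.0
API_PRICE = 0.90  # 35B-A3B output rate, USD per million tokens

needed = break_even_utilization(HOURLY_COST, PEAK_TPS, API_PRICE)
print(f"break-even utilization: {needed:.1%}")
```

A result above 100% would mean the API is cheaper at any load under those assumptions, so measure your own throughput and all-in hourly cost before relying on a utilization threshold.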

Caveat - Speculation: We have not independently verified Max-Preview's parameter count (reported as "~1T"). Treat that as a marketing characterization, not a confirmed spec.


Reliability and Update Cadence

| Variant | Released | Status | Notes |
| --- | --- | --- | --- |
| Qwen 3.6-Plus | 2026-04-02 | GA | Primary flagship per Alibaba positioning |
| Qwen 3.6-35B-A3B | 2026-04-16 | GA, Apache-2.0 | Open weights, full multimodal |
| Qwen 3.6-27B | 2026-04-22 | GA | Smaller open variant |
| Qwen 3.6-Max-Preview | 2026-04-20 | Preview | "Work in progress" per Alibaba |
| Qwen 3.6-Flash | 2026-04 | GA | Speed/cost tier |

The "Preview" tag on Max-Preview is non-trivial. Alibaba's own press materials describe further improvements expected, which means production behavior could shift. Plus and Flash are stable; 35B-A3B is open-weights, frozen by definition.

If you're picking for a production system, Plus or 35B-A3B are the safest. Max-Preview is fine for evaluation and asymmetric high-value tasks, not for stable agent loops.


Frequently Asked Questions

Q: Which Qwen 3.6 tier matches Claude Opus 4.7 on coding? A: Plus at 78.8 SWE-Bench Verified is in the same band as Opus 4.7's published number. On Terminal-Bench 2.0, Max-Preview's 65.4 trails Opus 4.7's 69.4. Per Alibaba's own claims, Max-Preview reclaimed the top spot on SkillsBench, QwenClawBench, QwenWebBench, and SciCode. Independent verification is ongoing.

Q: Why is Max-Preview text-only? A: It launched as text-only per the announcement. Vision input is on the family roadmap. For multimodal today, 35B-A3B is the option.

Q: Can I use Qwen 3.6-Plus's 1M context without paying premium? A: Up to 256K, you pay the headline rate ($0.325/$1.95). Above 256K, tiered pricing applies — exact multipliers vary by provider. Confirmed with OpenRouter; DashScope direct pricing not yet published for the 3.6 family.

Q: Is Qwen 3.6-35B-A3B actually competitive at 3B active params? A: For its size class, exceptional. AIME26 92.7 and MMLU-Pro 85.2 are competitive with proprietary mid-tier. SWE-Bench Verified 73.4 trails Plus by ~5 points but beats most open-source coding models.

Q: What's the cache-hit pricing for Qwen 3.6? A: Not consistently published across providers as of 2026-05-25. OpenRouter does not break out cache pricing for these variants. If you rely on cache discounts for cost modeling, validate with the specific endpoint before committing.

Q: When does the Max-Preview "Preview" tag come off? A: No public timeline. Alibaba's release describes ongoing improvements. Assume Preview behavior could change weekly.

Q: Are there fine-tunes available for Qwen 3.6-35B-A3B? A: As of 2026-05-25, community fine-tunes are appearing on Hugging Face. The Apache-2.0 license permits commercial use including fine-tunes.

Q: How does Qwen 3.6-Flash compare to DeepSeek V4-Pro on cost? A: V4-Pro post-permanent-cut is $0.435/$0.87 per million; Flash is $0.1875/$1.125. Flash wins on input cost (2.3x cheaper), V4-Pro wins on output cost (~23% cheaper). The crossover point depends on your input/output ratio: at these prices, Flash is cheaper whenever a task emits fewer than about 0.97 output tokens per input token.
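The crossover can be worked out directly from the two price pairs quoted in this answer. This is a sketch under one assumption: cost is linear in input and output tokens, with no cache or tiered pricing. `crossover_output_per_input` is an illustrative helper name, and it only applies when model A has the cheaper input and the more expensive output.

```python
def crossover_output_per_input(a: tuple[float, float],
                               b: tuple[float, float]) -> float:
    """Output/input token ratio at which models A and B cost the same.

    Each argument is (input_price, output_price) per million tokens.
    Assumes A has cheaper input and pricier output than B; below the
    returned ratio A is cheaper, above it B is cheaper.
    """
    (a_in, a_out), (b_in, b_out) = a, b
    if not (a_in < b_in and a_out > b_out):
        raise ValueError("expected A cheaper on input and pricier on output")
    return (b_in - a_in) / (a_out - b_out)


FLASH = (0.1875, 1.125)
V4_PRO = (0.435, 0.87)

ratio = crossover_output_per_input(FLASH, V4_PRO)
print(f"Flash is cheaper below {ratio:.2f} output tokens per input token")
```

For example, a summarization job with 10K tokens in and 1K out (ratio 0.1) favors Flash, while a generation-heavy job with 2K in and 8K out (ratio 4) favors V4-Pro.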

Q: Does Qwen 3.6-Plus support function calling and tool use? A: Yes, native function calling and agentic workflows are supported across the family. 35B-A3B documents this explicitly.

Q: What's the max output token limit per tier? A: Plus/Flash: 65,536 per OpenRouter spec. 35B-A3B: 32,768 general, 81,920 for math/coding per Qwen's recommendation. Max-Preview: not specified in available documentation.

Q: Can Qwen 3.6 models be deployed on Azure or AWS? A: 35B-A3B (open weights) yes, via standard deployment paths. Plus/Max-Preview/Flash are accessible via DashScope, OpenRouter, and various API aggregators including TokenMix.ai. Direct availability on AWS Bedrock or Azure for the proprietary tiers is not confirmed as of 2026-05-25.

Q: What's the realistic throughput for Qwen 3.6-Plus in production? A: OpenRouter and aggregator-reported throughput numbers vary 20-80 tok/s depending on routing and load. For SLA-bound workloads, run your own benchmark before committing.


TokenMix Take

Editorial note: TokenMix.ai routes traffic across 300+ models including all four Qwen 3.6 variants discussed above. The pricing and benchmarks above are independent verifications, not vendor-supplied.

The Qwen 3.6 family is the sharpest tier ladder we've seen in 2026 so far. The 41x spread between 35B-A3B's input price and Max-Preview's output price maps cleanly to four distinct workload classes, whereas most Chinese model families collapse to 2-3 tiers that overlap heavily. Alibaba's pricing discipline here suggests they read the same workload-routing playbook the cost-conscious operators have been writing.

Our routing recommendation for a fresh agent stack today: default to Plus for stateful coding tasks, fall back to Flash when context fits comfortably in 128K, escalate to Max-Preview only for hardest evals that demonstrably benefit, and reserve 35B-A3B for the self-host path when GPU economics work out. The math/reasoning advantage of 35B-A3B specifically (AIME26 92.7, GPQA 86.0) makes it the surprise pick for low-volume high-precision tasks where the API tax doesn't justify itself.
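As a rough illustration, the recommendation above could be written as a per-request router. The precedence order (self-host first, then escalation, then the 128K fallback) is an assumption, as are the `Request` fields and the reading of "262K" as 262,000 tokens. None of this is a TokenMix or OpenRouter API.

```python
from dataclasses import dataclass

MAX_PREVIEW_CONTEXT = 262_000   # approximate reading of the "262K" window
FLASH_CONTEXT_CUTOFF = 128_000  # "fits comfortably in 128K"


@dataclass
class Request:
    context_tokens: int
    hardest_eval: bool = False  # task class shown to benefit from Max-Preview
    self_host: bool = False     # GPU economics already work out


def route(req: Request) -> str:
    """Pick a tier for one request (illustrative precedence, see lead-in)."""
    if req.self_host:
        return "Qwen 3.6-35B-A3B"
    if req.hardest_eval and req.context_tokens <= MAX_PREVIEW_CONTEXT:
        return "Qwen 3.6-Max-Preview"
    if req.context_tokens <= FLASH_CONTEXT_CUTOFF:
        return "Qwen 3.6-Flash"
    return "Qwen 3.6-Plus"  # default for larger, stateful contexts


print(route(Request(context_tokens=400_000)))
```

Note that a hardest-eval request whose context exceeds the Max-Preview window falls through to Plus, since Max-Preview cannot hold the input.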

The biggest open question is what Alibaba does with the "Preview" tag on Max-Preview. If it stabilizes within Q3 2026, the four-tier ladder becomes the most coherent Chinese flagship lineup. If Max-Preview keeps shifting, production teams will gravitate to Plus and the open-source 35B-A3B for stability.

For workflow-by-workflow cost math against Western frontier models, our cheapest frontier LLM API analysis covers cross-vendor cost-per-task numbers including the DeepSeek V4-Pro post-cut shift that landed last week.