TokenMix Research Lab · 2026-04-24

Kimi K3 Preview: 4T Params, 1M Context, May 2026 Release Odds
Moonshot AI shipped Kimi K2.6 on April 20, 2026 — and the AI community read it as a harness built for the next drop. Kimi K3 is next, with Moonshot publicly teasing a 3-4 trillion parameter target, Kimi Linear attention, and a 1M-token context window. Prediction markets put the odds of a May 2026 release at 74%, and the five-month K2.5→K2.6 turnaround suggests K3 lands on an equally compressed schedule. This preview separates what Moonshot has confirmed from community speculation, sizes K3 against GPT-5.5 and DeepSeek V4, and explains why the US labs should be watching. TokenMix.ai tracks live pricing and benchmarks across 300+ models, including every Moonshot release from day one.
Table of Contents
- Confirmed vs Speculation
- Why K2.6 Is the K3 Harness
- The 3-4T Parameter Bet
- Kimi Linear Attention: What We Know
- 1M Context Window: Real or Marketing
- Continual Learning: The Moonshot Differentiator
- K3 vs GPT-5.5 vs DeepSeek V4
- Release Timing: Why Prediction Markets Favor May
- Pricing Prediction
- Who Should Wait for K3
- FAQ
Confirmed vs Speculation
| Claim | Status |
|---|---|
| Next-gen model named K3 | Confirmed — Moonshot public tease |
| Target 3-4T total parameters | Confirmed — founder commentary |
| Kimi Linear attention ships in K3 | Likely — Reddit AMA December 2025 |
| 1M-token context window | Confirmed — Moonshot roadmap |
| Continual learning across sessions | Signaled — framed as agentic advantage |
| "10× better than K2.5" claim | Speculation — co-host quote, not engineering spec |
| May 2026 release | 74% market odds — Manifold Markets, April 2026 |
| Open-weight license | Expected — Moonshot has shipped every model open-weight |
| Pricing well under US frontier rates | Likely — K2.6's $0.60/$2.50 already undercuts GPT-5.5's $5.00 input, though DeepSeek V4's $0.30/$0.50 sets a lower bar |
| Native multimodal | Unclear — K2.6 added this; K3 likely extends |
Why K2.6 Is the K3 Harness
Read Moonshot's release cadence, not the press releases. K2.5 shipped in November 2025 as a 1T-parameter MoE aimed at agent swarms. K2.6 added long-horizon coding, orchestration of up to 300 sub-agents, 4,000 coordinated execution steps, and caught up to Claude Opus 4.6 on agent benchmarks. That's a massive capability jump in five months.
The pattern matters: K2.6 isn't a terminal product. It's infrastructure. Moonshot built the serving stack, the agent swarm orchestrator, and the long-horizon execution framework so K3 has somewhere to land. When DeepSeek V4 dropped on April 24, 2026 at $0.30/$0.50 per MTok with 1M context and an Apache 2.0 license, the pressure on Moonshot compressed further. Shipping K3 within weeks, not quarters, is the only competitive response.
Translation: K2.6's production release in April signals K3 arrives before Q3, most likely inside the May-June window the prediction markets are pricing.
The 3-4T Parameter Bet
Moonshot's founder has publicly framed K3 as the company's answer to whether Chinese open-weight models can match US frontier parameter scale. Current points of comparison:
| Model | Total params | Active params | Release |
|---|---|---|---|
| Kimi K2.5 | 1T | 32B | 2025-11 |
| Kimi K2.6 | 1T | 32B | 2026-04-20 |
| DeepSeek V4-Pro | ~671B | ~37B | 2026-04-24 |
| GPT-5.5 | Not disclosed (~2-3T estimated) | N/A | 2026-04-23 |
| Claude Opus 4.7 | Not disclosed | N/A | 2026-04-16 |
| Kimi K3 (target) | 3T-4T | ~96-128B | 2026-Q2 projected |
A 3-4T parameter MoE puts K3 in direct parameter parity with OpenAI's rumored internal scale. The active-parameter ratio Moonshot has been running (3.2% for K2.x) suggests K3 at 4T would activate ~120-130B per token — roughly where DeepSeek V3.2 and GPT-5.4 sit. Training cost at that scale is ~$50-80M on current H200/H20 fleets; Moonshot's funding and Alibaba Cloud partnership make this reachable.
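The arithmetic behind those numbers is worth making explicit. A back-of-envelope check, using only the figures above (none of these are Moonshot specs):

```python
# Back-of-envelope: active parameters per token if K3 keeps the K2.x
# activation ratio. All inputs are this article's figures, not specs.
K2_TOTAL, K2_ACTIVE = 1.0e12, 32e9        # K2.x: 1T total, 32B active
ratio = K2_ACTIVE / K2_TOTAL              # 0.032 -> 3.2%

for total in (3e12, 4e12):                # the 3T and 4T K3 scenarios
    print(f"{total/1e12:.0f}T total -> ~{total*ratio/1e9:.0f}B active/token")
# 3T total -> ~96B active/token
# 4T total -> ~128B active/token
```

The same ratio at 3T lands near 96B active, which is why the projection table above spans roughly 96-128B.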
The real question isn't whether Moonshot can build a 4T MoE. It's whether the next training run gets diminishing returns or breaks through. K2.6 already matches Opus 4.6 on agent benchmarks at 1T. Scaling to 4T should reach Opus 4.7 / GPT-5.5 tier on reasoning and coding — unless the data wall everyone's been whispering about is real.
Kimi Linear Attention: What We Know
During a December 2025 Reddit AMA, Moonshot's research lead confirmed Kimi Linear attention will ship in K3. The architectural premise:
- Replace the O(n²) softmax attention in the standard Transformer with a linear-complexity variant
- Maintain the multi-head structure and retrieval quality for up to 1M tokens
- Target 2-3× long-context inference throughput on equivalent hardware
Linear attention isn't new — Mamba, RWKV, and Gated Linear Attention have shipped variants. What Moonshot is betting on is a hybrid design that keeps softmax attention on short-range dependencies (where it matters for reasoning) and switches to linear attention beyond a context threshold (where cost blows up).
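Kimi Linear's internals are unpublished, so the sketch below uses the generic kernelized linear attention recipe (Katharopoulos et al., 2020) with a context-length router bolted on. The threshold and routing rule are our assumptions, not Moonshot's design:

```python
import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # Standard O(n^2) scaled dot-product attention.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    # O(n) attention via the (elu+1) kernel feature map. Associativity
    # lets us build a d x d summary (kv) instead of an n x n score matrix.
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = k.transpose(-2, -1) @ v                               # (d, d) summary
    z = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1) + eps
    return (q @ kv) / z

def hybrid_attention(q, k, v, threshold=8192):
    # Hypothetical router: full softmax below the threshold (where it
    # matters for reasoning), linear beyond it (where cost blows up).
    if q.shape[-2] <= threshold:
        return softmax_attention(q, k, v)
    return linear_attention(q, k, v)

q = k = v = torch.randn(1, 8, 16_384, 64)   # batch, heads, tokens, head_dim
print(hybrid_attention(q, k, v).shape)      # torch.Size([1, 8, 16384, 64])
```

The `kv` summary matrix is the whole trick: it never materializes the n × n score matrix, which is where the quadratic cost lives.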
The payoff if it works: K3 could serve 1M-context requests at the per-token cost of a 128K model. That's a structural cost advantage no US lab has shipped yet. Combined with open weights, it compresses the long-context market further than DeepSeek V4's $0.30 input already did.
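That claim is easy to sanity-check. Per generated token, softmax attention does work proportional to context length while kernelized linear attention does constant work, so, ignoring MLP FLOPs and KV-cache memory:

```python
# Per-token attention compute, 1M context vs 128K (attention only).
ctx_short, ctx_long = 128_000, 1_000_000
print(f"softmax per-token cost: {ctx_long / ctx_short:.1f}x the 128K cost")  # 7.8x
print("linear  per-token cost: ~1.0x (context-independent)")
```

A flat per-token cost curve is exactly what would let Moonshot price 1M-context requests like 128K ones.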
The risk: Linear attention variants consistently lose 2-5% on retrieval benchmarks vs full softmax. Moonshot's published research on Kimi Linear claims parity, but the industry has learned to discount first-party long-context claims after Llama 4 Scout's 10M context collapsed to ~15% accuracy at 128K in third-party testing. Expect K3's true long-context behavior to need independent verification.
1M Context Window: Real or Marketing
Moonshot has confirmed K3 ships with 1M-token context, matching DeepSeek V4 and Gemini 2.5 Pro. The confirmation matters because — as we saw with Llama 4 Scout — claimed context and useful context are rarely the same number.
What Moonshot has going for it:
- K2.6 already ships 1M context in production with verified retrieval quality
- Kimi Linear attention is specifically designed for long-context inference economics
- Moonshot has shipped long-context models since K1 (128K was a 2024 differentiator)
What to watch:
- Retrieval vs synthesis gap. K3 will almost certainly ace needle-in-haystack at 1M. The hard question is whether it holds multi-hop reasoning quality past 500K. That's the benchmark cluster where Scout failed.
- Price at 1M vs at 128K. DeepSeek V4 charges a flat rate across the context window. If K3 charges a premium above 128K, it signals the long-context economics aren't solved.
Probable outcome: K3 ships legit 1M context for retrieval and summarization workloads, but production teams should stress-test multi-hop reasoning past 300K before betting agent pipelines on it.
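A minimal version of that stress test, assuming K3 exposes an OpenAI-compatible endpoint (the base URL and model name below are placeholders, not announced Moonshot values): plant two linked facts far apart in filler text and check whether the model can chain them.

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.moonshot.ai/v1",  # placeholder
                api_key="YOUR_KEY")

filler = "The sky was gray that morning. " * 40_000     # roughly 300K tokens
fact_a = " Note: project OSPREY is led by Dana Voss. "
fact_b = " Note: Dana Voss works out of the Osaka office. "
tenth = len(filler) // 10
# Bury the two hops at ~10% and ~90% depth so the chain spans the window.
doc = filler[:tenth] + fact_a + filler[tenth:-tenth] + fact_b + filler[-tenth:]

resp = client.chat.completions.create(
    model="kimi-k3",                                     # hypothetical name
    messages=[{"role": "user",
               "content": doc + "\n\nWhich office hosts project OSPREY's lead?"}],
)
print(resp.choices[0].message.content)                   # expect: Osaka
```

Sweep the fact depths and the filler length; a model that aces single-fact retrieval can still degrade sharply on the two-hop chain past a few hundred thousand tokens.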
Continual Learning: The Moonshot Differentiator
This is the underappreciated K3 claim. Moonshot has signaled K3 will "improve agency and allow the agents to work effectively for much longer durations" through continual learning.
Two interpretations:
Interpretation A (conservative): K3 ships with stronger in-context learning — agents get better at recovering from errors within a single 1M-context session. This is incremental.
Interpretation B (aggressive): K3 ships with online weight updates for agent memory — the model literally learns from agent execution traces without requiring a full fine-tune. This would be the first production deployment of persistent learning in a frontier model.
Industry betting: Interpretation A is near-certain; Interpretation B sits around 20% probability. Either way, Moonshot is signaling that K3 is the first Kimi model whose agent performance compounds with usage rather than resetting per session.
If Interpretation B ships, every agent framework — LangGraph, CrewAI, OpenAI Agents SDK — becomes a wrapper around a Moonshot-specific capability that no OpenAI or Anthropic model matches. That's a serious moat bet.
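For concreteness, a sketch of what an Interpretation B integration could look like from the framework side. Nothing like this has shipped; every endpoint and function name below is invented to make the persistent-learning idea tangible:

```python
import json

def run_task(task: str) -> dict:
    """Stand-in for a normal agent episode (plan -> act -> observe)."""
    return {"task": task, "steps": ["..."], "outcome": "success"}

def submit_trace(trace: dict) -> None:
    """Hypothetical endpoint: hand the execution trace back so the
    serving stack can fold it into persistent agent memory or online
    weight updates. No such Moonshot API exists publicly."""
    print("would POST /v1/agent/traces:", json.dumps(trace)[:60], "...")

for task in ["fix flaky test", "migrate config"]:
    submit_trace(run_task(task))  # performance compounds across sessions
```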
K3 vs GPT-5.5 vs DeepSeek V4
Projected K3 positioning based on Moonshot's roadmap and K2.6 baseline:
| Dimension | Kimi K3 (projected) | GPT-5.5 | DeepSeek V4-Pro |
|---|---|---|---|
| Release | 2026 Q2 (projected) | 2026-04-23 | 2026-04-24 |
| License | Open-weight (expected) | Closed | Apache 2.0 |
| Parameters | 3-4T MoE | Undisclosed | 671B MoE |
| Context | 1M | 1M | 1M |
| SWE-Bench Verified | ~85% projected | 88.7% | ~85% |
| Input price/MTok | ~$0.60-1.00 projected | $5.00 | $0.30 |
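At those rates, the spread is easy to put in dollar terms for a sustained workload. An illustrative monthly input bill at 500M tokens/month, using the table's prices (K3 uses the midpoint of its projected range):

```python
# Monthly input-token bill at the prices above; K3 is a projection.
TOKENS_PER_MONTH = 500e6
prices = {"Kimi K3 (projected)": 0.80, "GPT-5.5": 5.00, "DeepSeek V4-Pro": 0.30}
for model, per_mtok in prices.items():
    print(f"{model:20s} ${TOKENS_PER_MONTH / 1e6 * per_mtok:>7,.0f}/mo")
# Kimi K3 (projected)  $    400/mo
# GPT-5.5              $  2,500/mo
# DeepSeek V4-Pro      $    150/mo
```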