TokenMix Research Lab · 2026-04-24

Kimi K3 Preview: 4T Params, 1M Context, May 2026 Release Odds

Moonshot AI shipped Kimi K2.6 on April 20, 2026, and the AI community read it as a harness built for the next drop. Kimi K3 is next, with Moonshot publicly teasing a 3-4 trillion parameter target, Kimi Linear attention, and a 1M-token context window. Prediction markets put the odds at 74% for a pre-May 2026 release, and the K2.6 preview-to-final turnaround (six days, not months) suggests K3 lands on the same compressed schedule. This preview separates what Moonshot has confirmed from community speculation, sizes K3 against GPT-5.5 and DeepSeek V4, and explains why the US labs should be watching. TokenMix.ai tracks live pricing and benchmarks across 300+ models including every Moonshot release from day one.


Confirmed vs Speculation

| Claim | Status |
|---|---|
| Next-gen model named K3 | Confirmed (Moonshot public tease) |
| Target 3-4T total parameters | Confirmed (founder commentary) |
| Kimi Linear attention ships in K3 | Likely (Reddit AMA, December 2025) |
| 1M-token context window | Confirmed (Moonshot roadmap) |
| Continual learning across sessions | Signaled (framed as agentic advantage) |
| "10× better than K2.5" claim | Speculation (co-host quote, not engineering spec) |
| Pre-May 2026 release | 74% market odds (Manifold Markets, April 2026) |
| Open-weight license | Expected (Moonshot has shipped every model open-weight) |
| Pricing under DeepSeek V4-Pro | Likely (K2.6 at $0.60/$2.50 is already below) |
| Native multimodal | Unclear (K2.6 added this; K3 likely extends) |

Why K2.6 Is the K3 Harness

Read Moonshot's release cadence, not the press releases. K2.5 shipped as a 1T-parameter MoE aimed at agent swarms. K2.6 added long-horizon coding, 300 sub-agent orchestration, 4,000 coordinated steps, and caught up to Claude Opus 4.6 on agent benchmarks. That's a massive capability jump in roughly five months (K2.5 in 2025-11, K2.6 on 2026-04-20).

The pattern matters: K2.6 isn't a terminal product. It's infrastructure. Moonshot built the serving stack, the agent swarm orchestrator, and the long-horizon execution framework so K3 has somewhere to land. When DeepSeek V4 dropped on April 24, 2026 at $0.30/$0.50 per MTok with 1M context and Apache 2.0 license, the pressure on Moonshot compressed further. Shipping K3 in days-not-months is the only competitive response.

Translation: K2.6's production release in April signals K3 arrives before Q3, most likely inside the May-June window the prediction markets are pricing.


The 3-4T Parameter Bet

Moonshot's founder has publicly framed K3 as Moonshot's answer to whether Chinese open-weight models can match US frontier parameter scale. Current points of comparison:

| Model | Total params | Active params | Release |
|---|---|---|---|
| Kimi K2.5 | 1T | 32B | 2025-11 |
| Kimi K2.6 | 1T | 32B | 2026-04-20 |
| DeepSeek V4-Pro | ~671B | ~37B | 2026-04-24 |
| GPT-5.5 | Not disclosed (~2-3T estimated) | N/A | 2026-04-23 |
| Claude Opus 4.7 | Not disclosed | N/A | 2026-04-16 |
| Kimi K3 (target) | 3T-4T | ~60-80B | 2026-Q2 projected |

A 3-4T parameter MoE puts K3 in direct parameter parity with OpenAI's rumored internal scale. The active-parameter ratio Moonshot has been running (3.2% for K2.x) suggests K3 at 4T would activate ~120-130B per token — roughly where DeepSeek V3.2 and GPT-5.4 sit. Training cost at that scale is ~$50-80M on current H200/H20 fleets; Moonshot's funding and Alibaba Cloud partnership make this reachable.
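The active-parameter estimate above is simple proportional scaling. A minimal sketch of that arithmetic, assuming K3 keeps the K2.x activation ratio (an assumption; Moonshot has not published K3's expert configuration):

```python
# Figures quoted in the text: K2.x has 1T total and 32B active parameters.
K2_TOTAL = 1e12
K2_ACTIVE = 32e9


def active_params(total_params: float, ratio: float) -> float:
    """Active parameters per token for an MoE at a given activation ratio."""
    return total_params * ratio


# 32B / 1T = 0.032, i.e. the 3.2% ratio cited for K2.x.
ratio = K2_ACTIVE / K2_TOTAL

# Assumption: K3 reuses the K2.x ratio at 3T and 4T total parameters.
for total in (3e12, 4e12):
    active_b = active_params(total, ratio) / 1e9
    print(f"{total / 1e12:.0f}T total -> {active_b:.0f}B active")
```

At the K2.x ratio, 3T to 4T total works out to roughly 96B to 128B active per token. The ~60-80B target listed in the comparison table would instead imply a ratio of about 2%.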

The real question isn't whether Moonshot can build a 4T MoE. It's whether the next training run gets diminishing returns or breaks through. K2.6 already matches Opus 4.6 on agent benchmarks at 1T. Scaling to 4T should reach Opus 4.7 / GPT-5.5 tier on reasoning and coding — unless the data wall everyone's been whispering about is real.


Kimi Linear Attention: What We Know

During a December 2025 Reddit AMA, Moonshot's research lead confirmed Kimi Linear attention will ship in K3. The architectural premise:

Linear attention isn't new — Mamba, RWKV, and Gated Linear Attention have shipped variants. What Moonshot is betting on is a hybrid design that keeps softmax attention on short-range dependencies (where it matters for reasoning) and switches to linear attention beyond a context threshold (where cost blows up).

The payoff if it works: K3 could serve 1M-context requests at the per-token cost of a 128K model. That's a structural cost advantage no US lab has shipped yet. Combined with open weights, it compresses the long-context market further than DeepSeek V4's $0.30 input already did.
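To see why a hybrid design could hold per-token cost flat past a threshold, here is a toy cost model. It is not Kimi Linear's actual architecture: the window size, head dimension, and cost formulas are illustrative assumptions that only count attention multiply-adds for a single head during one decode step.

```python
def softmax_decode_cost(context_len: int, head_dim: int) -> int:
    """Relative cost of one decode step with full softmax attention:
    the new token scores against every cached key and mixes every value."""
    return 2 * context_len * head_dim


def hybrid_decode_cost(context_len: int, head_dim: int, window: int) -> int:
    """Relative cost with softmax attention inside a local window plus a
    fixed-size linear-attention state (head_dim x head_dim) for older tokens."""
    local = 2 * min(context_len, window) * head_dim
    linear_state = 2 * head_dim * head_dim  # update and read of the state
    return local + linear_state


HEAD_DIM = 128        # assumed head dimension
WINDOW = 128 * 1024   # assumed softmax window (the "context threshold")

for n in (128 * 1024, 1024 * 1024):
    full = softmax_decode_cost(n, HEAD_DIM)
    hyb = hybrid_decode_cost(n, HEAD_DIM, WINDOW)
    print(f"context={n:>8}  full={full:>10}  hybrid={hyb:>10}  full/hybrid={full / hyb:.2f}")
```

In this model the hybrid cost at 1M tokens equals the hybrid cost at 128K, while full softmax attention costs 8× more at 1M than at 128K. Real serving cost also depends on MLP and expert compute, KV-cache memory, and batching, none of which this sketch captures.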

The risk: Linear attention variants consistently lose 2-5% on retrieval benchmarks vs full softmax. Moonshot's published research on Kimi Linear claims parity, but the industry has learned to discount first-party long-context claims after Llama 4 Scout's 10M context collapsed to ~15% accuracy at 128K in third-party testing. Expect K3's true long-context behavior to need independent verification.


1M Context Window: Real or Marketing

Moonshot has confirmed K3 ships with 1M-token context, matching DeepSeek V4 and Gemini 2.5 Pro. The confirmation matters because — as we saw with Llama 4 Scout — claimed context and useful context are rarely the same number.

What Moonshot has going for it:

  1. K2.6 already ships 1M context in production with verified retrieval quality
  2. Kimi Linear attention is specifically designed for long-context inference economics
  3. Moonshot has shipped long-context models since K1 (128K was a 2024 differentiator)

What to watch:

Probable outcome: K3 ships legit 1M context for retrieval and summarization workloads, but production teams should stress-test multi-hop reasoning past 300K before betting agent pipelines on it.
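One way to stress-test multi-hop reasoning at depth is a chained needle-in-a-haystack prompt, where answering requires following several facts buried at different positions. The sketch below only builds the prompt and the expected answer; the filler text, fact format, and the rough 4-characters-per-token sizing are assumptions, and sending the prompt to a model is left to whatever client you already use.

```python
import random

FILLER = "The quick brown fox jumps over the lazy dog. "


def build_multihop_prompt(total_chars: int, depths: list[float], seed: int = 0):
    """Return (prompt, expected_answer) with a chain of facts buried at the
    given relative depths (0.0 = start, 1.0 = end) inside filler text."""
    rng = random.Random(seed)
    # Unique random keys; fact i links key i to key i+1.
    ids = rng.sample(range(10**6), len(depths) + 1)
    keys = [f"K{i:06d}" for i in ids]
    facts = [f"[FACT] {keys[i]} points to {keys[i + 1]}. " for i in range(len(depths))]

    n_fill = max(total_chars // len(FILLER), 1)
    chunks = [FILLER] * n_fill
    # Insert deepest-first so each index still refers to the original filler layout.
    for depth, fact in sorted(zip(depths, facts), reverse=True):
        chunks.insert(int(depth * n_fill), fact)

    question = (
        f"\nStart at {keys[0]} and follow 'points to' {len(depths)} times. "
        "Which key do you reach?"
    )
    return "".join(chunks) + question, keys[-1]


# Roughly 300K tokens if one token is about 4 characters (a loose assumption).
prompt, expected = build_multihop_prompt(1_200_000, depths=[0.1, 0.5, 0.9])
print(len(prompt), expected)
```

Sweeping `total_chars` and `depths` and scoring exact-match on `expected` gives a simple accuracy-versus-depth curve to compare against vendor claims.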


Continual Learning: The Moonshot Differentiator

This is the underappreciated K3 claim. Moonshot has signaled K3 will "improve agency and allow the agents to work effectively for much longer durations" through continual learning.

Two interpretations:

Interpretation A (conservative): K3 ships with stronger in-context learning — agents get better at recovering from errors within a single 1M-context session. This is incremental.

Interpretation B (aggressive): K3 ships with online weight updates for agent memory — the model literally learns from agent execution traces without requiring a full fine-tune. This would be the first production deployment of persistent learning in a frontier model.

Industry betting: Interpretation A is almost certain, Interpretation B is 20% probability. Either way, Moonshot is signaling K3 is the first Kimi model whose agent performance compounds with usage rather than resetting per session.

If Interpretation B ships, every agent framework — LangGraph, CrewAI, OpenAI Agents SDK — becomes a wrapper around a Moonshot-specific capability that no OpenAI or Anthropic model matches. That's a serious moat bet.


K3 vs GPT-5.5 vs DeepSeek V4

Projected K3 positioning based on Moonshot's roadmap and K2.6 baseline:

| Dimension | Kimi K3 (projected) | GPT-5.5 | DeepSeek V4-Pro |
|---|---|---|---|
| Release | 2026 Q2 (projected) | 2026-04-23 | 2026-04-24 |
| License | Open-weight (expected) | Closed | Apache 2.0 |
| Parameters | 3-4T MoE | Undisclosed | 671B MoE |
| Context | 1M | 1M | 1M |
| SWE-Bench Verified | ~85% (projected) | 88.7% | ~85% |
| Input price/MTok | ~$0.60-1.00 (projected) | $5.00 | $1.74 |
| Output price/MTok | ~$2.50-4.00 (projected) | $30.00 | $3.48 |
| Agent swarm support | Native (inherited from K2.6) | API-level | API-level |
| Long-horizon coding | Native | Strong | Strong |
| Continual learning | Signaled | No | No |

The bet Moonshot is placing: K3 matches GPT-5.5 on benchmarks at 1/10th the price with open weights and native agent swarm support. If it lands within 10% of GPT-5.5 on SWE-Bench Verified, it becomes the obvious default for any team building agents at scale — the same play DeepSeek V3 ran successfully against GPT-4o in 2024.

Where GPT-5.5 still wins (and will keep winning): Zero-shot reasoning edge cases, long-horizon planning without explicit agent frameworks, and the "just works" enterprise onboarding. K3 won't dislodge OpenAI from enterprise accounts that value predictability over cost.

Where DeepSeek V4 pressures K3: Both ship open-weight 1M context. DeepSeek V4's $0.30/$0.50 undercuts anything K3 will plausibly price at. If K3 launches above $1.00 input, expect DeepSeek to eat the price-sensitive end of the market and Moonshot to own the agent-orchestration premium.


Release Timing: Why Prediction Markets Favor May

Three signals line up:

  1. K2.6 shipped April 20, 2026 — finished harness infrastructure
  2. DeepSeek V4 shipped April 24, 2026 — competitive pressure forcing response
  3. Manifold prediction markets show 74% probability of K3 before May 2026 — crowd-sourced expectation synthesizing insider signals

Moonshot's historical pattern:

| Model | Announced | Released | Gap |
|---|---|---|---|
| Kimi K1 | 2024-10 | | |
| Kimi K2 | 2025-06 | 2025-07 | ~4 weeks |
| Kimi K2.5 | 2025-10 | 2025-11 | ~2 weeks |
| Kimi K2.6 | 2026-04-14 (preview) | 2026-04-20 (final) | 6 days |

The trend is compression. If K3 preview drops any week now, the final release is 1-2 weeks behind it.

Highest-probability release window: May 10-31, 2026. That puts the announcement during Google I/O week or immediately after — maximum press attention cycle.


Pricing Prediction

Moonshot's K2.6 sits at $0.60 input / $2.50 output per MTok. K3 at 3-4× the parameters but with Kimi Linear attention economics should land at:

Projected K3 API pricing (per MTok): $0.80-1.20 input, $3.00-4.50 output, $0.16-0.25 cache hit.

Comparison anchors:

| Model | Input | Output | Per MTok total (even mix) |
|---|---|---|---|
| DeepSeek V4-Flash | $0.14 | $0.28 | $0.21 |
| Kimi K2.6 | $0.60 | $2.50 | $1.55 |
| Kimi K3 (projected) | ~$1.00 | ~$3.50 | ~$2.25 |
| DeepSeek V4-Pro | $1.74 | $3.48 | $2.61 |
| GPT-5.5 | $5.00 | $30.00 | $17.50 |

Why this pricing bracket is probable: Moonshot needs to stay below DeepSeek V4-Pro to maintain open-weight appeal, but needs to price above K2.6 to reflect real capability gains. The $1.00/$3.50 bracket threads that needle.
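The "even mix" column is just the average of input and output price. A short sketch that reproduces it and lets you change the mix; the K3 figures are this article's projection, not announced pricing, and `input_share` is an illustrative parameter:

```python
# (input, output) in USD per million tokens, as quoted in this article.
PRICES = {
    "DeepSeek V4-Flash": (0.14, 0.28),
    "Kimi K2.6": (0.60, 2.50),
    "Kimi K3 (projected)": (1.00, 3.50),  # projection, not an announced price
    "DeepSeek V4-Pro": (1.74, 3.48),
    "GPT-5.5": (5.00, 30.00),
}


def blended(input_price: float, output_price: float, input_share: float = 0.5) -> float:
    """Blended USD per MTok when input_share of all tokens are input tokens."""
    return input_price * input_share + output_price * (1 - input_share)


for model, (inp, out) in PRICES.items():
    print(f"{model:<22} ${blended(inp, out):.2f}")

ratio = blended(*PRICES["GPT-5.5"]) / blended(*PRICES["Kimi K3 (projected)"])
print(f"GPT-5.5 / K3 projected at an even mix: {ratio:.1f}x")
```

At an even mix the projected K3 price is about 7.8× below GPT-5.5. Agent workloads are often input-heavy, so it is worth rerunning with an `input_share` that matches your own traffic.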

If Moonshot surprises everyone with K2.6-tier pricing ($0.60 input), K3 becomes the open-weight default for agent production overnight.


Who Should Wait for K3

| Your situation | Wait for K3? |
|---|---|
| Building agent system for launch in 4-6 weeks | Yes: ship on K2.6 now, migrate when K3 drops |
| Running production agents on GPT-5.4 at scale | Hedge: start K3 evaluation plan, don't migrate yet |
| Cost-sensitive RAG, 1M context needed | No: DeepSeek V4 ships today at $0.14/$0.28 |
| Enterprise with compliance requirements | No: Claude Opus 4.7 / GPT-5.5 already validated |
| Testing frontier reasoning ceiling | Yes: K3 will likely redefine the open-weight ceiling |
| Agent swarm orchestration (100+ sub-agents) | Either: K2.6 already supports 300 sub-agents |
| Fine-tuning own model on-prem | Yes: K3 open weights likely |

Pragmatic framework: If your launch timeline is >6 weeks out and you don't need 1M context right now, K2.6 is the hold-and-watch baseline. If you need production capacity this week, ship on K2.6 or DeepSeek V4 and treat K3 as a migration path, not a dependency. Teams running on closed-model APIs can monitor TokenMix.ai for live K3 availability — we'll list it the day it ships.


FAQ

When will Kimi K3 actually release?

Prediction markets assign 74% probability to a release before May 2026, with the highest-probability window being May 10-31, 2026. Moonshot has not confirmed an official date. K2.6's production release on April 20 is the strongest signal that K3's infrastructure is ready.

Will Kimi K3 be open-weight?

Moonshot has shipped every Kimi model so far as open-weight (K1 through K2.6). K3 is expected to follow the same licensing pattern, almost certainly Apache 2.0 or MIT. This is a core Moonshot differentiator vs OpenAI and Anthropic.

How does Kimi Linear attention differ from standard Transformer attention?

Kimi Linear replaces O(n²) softmax attention with a linear-complexity variant, similar to Mamba and RWKV but hybridized to retain softmax attention for short-range reasoning. The target is 2-3× throughput on 1M-context inference at equivalent hardware. Independent verification will be needed to confirm quality parity with standard attention.

Will Kimi K3 beat GPT-5.5?

On most benchmarks: probably no. On specific agent-orchestration and open-weight price-performance benchmarks: very likely yes. GPT-5.5 still holds the frontier ceiling on zero-shot reasoning (88.7% SWE-Bench Verified). K3's edge will be 8-10× cheaper API pricing with open weights while landing within 5-10% of GPT-5.5 capability. That's the exact value proposition that made DeepSeek V3 a runaway success.

How much will Kimi K3 cost per MTok?

Not announced. Projected range based on K2.6 baseline and 3-4T parameter scale-up: $0.80-1.20 input / $3.00-4.50 output per MTok, with cache hit at $0.16-0.25. This would keep K3 priced below DeepSeek V4-Pro ($1.74/$3.48) and far below GPT-5.5 ($5.00/$30.00). TokenMix.ai will publish live pricing the moment the API goes live.

Should I migrate my K2.6 agents to K3 on day one?

No. Wait 2-4 weeks after release for third-party benchmark verification, especially on long-context retention past 500K tokens. K2.6 is stable and production-grade; premature migration risks regressions. The real migration trigger is when K3's independent agent-benchmark results beat K2.6 by >15% on your specific workload class.
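The >15% trigger can be written down as an explicit check so the migration decision is made on your own eval data rather than launch-day headlines. This is a minimal sketch that reads "beat by >15%" as relative gain in mean score (an assumption); the scores below are made-up placeholders, not benchmark results.

```python
from statistics import mean


def relative_gain(baseline_scores, candidate_scores) -> float:
    """Relative improvement of the candidate's mean score over the baseline's."""
    base = mean(baseline_scores)
    return (mean(candidate_scores) - base) / base


def should_migrate(baseline_scores, candidate_scores, threshold: float = 0.15) -> bool:
    """True when the candidate beats the baseline by more than `threshold`."""
    return relative_gain(baseline_scores, candidate_scores) > threshold


# Placeholder per-task success rates on one workload class (hypothetical data).
k26_scores = [0.62, 0.58, 0.60]
k3_scores = [0.72, 0.70, 0.71]

print(f"gain={relative_gain(k26_scores, k3_scores):.1%}",
      "migrate" if should_migrate(k26_scores, k3_scores) else "hold")
```

Running the check per workload class, rather than on one pooled score, keeps a regression on long-context tasks from being averaged away by gains elsewhere.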

What happens to Kimi K2.6 after K3 releases?

K2.6 will remain supported as a cheaper tier, similar to how DeepSeek kept V3.2 available at $0.14/$0.28 alongside V4-Pro. Expect K2.6 pricing to drop 20-40% when K3 lands, making it an even more attractive option for routine agent workloads.

Is Kimi K3 available on TokenMix.ai at launch?

Yes — TokenMix.ai aggregates 300+ models and adds new Moonshot releases the same day they ship production APIs. Kimi K2.6 was available on the platform within 24 hours of official release. K3 will follow the same pattern. One API key gets you K3 plus GPT-5.5, Claude Opus 4.7, DeepSeek V4, and every other frontier model with unified billing in RMB, USD, Alipay, or WeChat.


By TokenMix Research Lab · Updated 2026-04-24

Sources: Moonshot AI official, Kimi K2.6 release coverage — MarkTechPost, Manifold Markets K3 release odds, SiliconANGLE K2.6 coverage, Latent Space - Kimi K2.6 analysis, TokenMix.ai live model tracker