
Kimi API Pricing 2026: K2.6 $0.95/M, K2.5 $0.60/M, K2 Family Guide
Last Updated: 2026-05-14 · Author: TokenMix Research Lab · Data checked: 2026-05-14
Moonshot's Kimi K2.6 shipped April 20, 2026 at $0.16 cache-hit / $0.95 cache-miss input and $4.00 output per million tokens.
The full Kimi family on TokenMix now spans 5 active SKUs: K2.6 (new flagship, multimodal, 256K context), K2.5 (Jan 2026, multimodal), K2 (MoE coding model, being deprecated), K2-thinking, and K2-thinking-turbo. According to Moonshot AI's official K2.6 pricing page, cache-hit pricing is roughly 6× cheaper than cache-miss across the K2 family, which makes prompt caching the single most important cost lever. The TokenMix model registry exposes all five Kimi models through one OpenAI-compatible endpoint with no Chinese phone number or mainland verification gate, which removes the biggest onboarding blocker for most developers outside China. Pricing in this article was re-verified against Moonshot's official docs and the TokenMix registry on 2026-05-14.
Table of Contents
- Quick Answer: Kimi Pricing in 60 Seconds
- Confirmed Facts, Caveats, and Deprecations
- Full Kimi Family Pricing Table (Moonshot Direct + TokenMix)
- Cache Hit vs Miss: 6× Price Cut Explained
- K2.6 vs K2.5 vs K2: Which One to Pick?
- Direct Moonshot vs TokenMix: Which Access Path?
- Cost Examples: 4 Realistic Kimi Workloads
- Kimi vs DeepSeek vs Doubao: Chinese Trio Compared
- Migration Checklist (K2 Deprecation)
- Final Recommendation
- FAQ
- Related Articles
- Sources
Quick Answer: Kimi Pricing in 60 Seconds
| Question | Answer |
|---|---|
| Newest Kimi model? | Kimi K2.6 (released 2026-04-20). Multimodal, 256K context. |
| Cheapest cache-hit input? | $0.10 / MTok on K2.5; $0.15 / MTok on K2 series; $0.16 / MTok on K2.6. |
| Output price range? | $2.50 / MTok (K2 series) → $8.00 / MTok (turbo variants); K2.5's $3.00 / MTok is the multimodal sweet spot. |
| Direct or via TokenMix? | TokenMix removes the Chinese-phone signup gate. Use direct Moonshot only if Kimi is the only model family you ship. |
| K2 deprecation? | The kimi-k2 series will be officially discontinued — Moonshot says so on the K2 pricing page. Plan to migrate to K2.5 or K2.6. |
Confirmed Facts, Caveats, and Deprecations
Every price below is from a 2026-05-14 fetch of Moonshot's official pricing pages plus the TokenMix admin model registry. No estimates.
| Claim | Status | What it means | Source |
|---|---|---|---|
| K2.6 cache-miss input $0.95 / output $4.00 per MTok | Confirmed | Most expensive Kimi tier; highest output cost. | Moonshot K2.6 pricing |
| K2.5 cache-miss input $0.60 / output $3.00 per MTok | Confirmed | Lowest-cost multimodal Kimi tier. | Moonshot K2.5 pricing |
| K2 series (0905-preview, 0711-preview): $0.60 input / $2.50 output cache-miss | Confirmed | Cheapest output across the family — but being deprecated. | Moonshot K2 pricing |
| K2 turbo / thinking-turbo: $1.15 input / $8.00 output cache-miss | Confirmed | "Turbo" variants charge premium output for higher throughput. | Moonshot K2 pricing page |
| Cache-hit pricing is roughly 6× cheaper than cache-miss across the family | Confirmed | K2.5 cache hit is $0.10 vs miss $0.60 — 83% saving on cached input. | Moonshot pricing pages |
| Kimi K2 series will be officially discontinued | Confirmed (deprecation warning) | New code should target K2.5 or K2.6. | Moonshot K2 pricing page footer |
| TokenMix exposes 5 Kimi models via OpenAI-compatible endpoint | Confirmed | One key for all variants, no Chinese phone required. | TokenMix admin registry (2026-05-14) |
| K2.6 is multimodal with vision input | Confirmed | Text + image + video input supported. | Moonshot K2.6 model description |
| K2.6 supports thinking mode and tool calls | Confirmed | Standard agentic-loop support. | Moonshot K2.6 model description |
The headline numbers: K2.6 cache-miss input is $0.95 per million tokens and output is $4.00 per million, with cache-hit input at $0.16 per million — roughly a 6× input saving when prompts share a stable prefix.
Full Kimi Family Pricing Table (Moonshot Direct + TokenMix)
Both columns are USD per 1M tokens. Moonshot prices come from the official pricing pages cited above; TokenMix prices are from the admin model registry on 2026-05-14.
Direct Moonshot Pricing
| Model | Cache-Hit Input | Cache-Miss Input | Output | Context |
|---|---|---|---|---|
| `kimi-k2.6` | $0.16 | $0.95 | $4.00 | 262K |
| `kimi-k2.5` | $0.10 | $0.60 | $3.00 | 262K |
| `kimi-k2-0905-preview` | $0.15 | $0.60 | $2.50 | 262K |
| `kimi-k2-0711-preview` | $0.15 | $0.60 | $2.50 | 131K |
| `kimi-k2-turbo-preview` | $0.15 | $1.15 | $8.00 | 262K |
| `kimi-k2-thinking` | $0.15 | $0.60 | $2.50 | 262K |
| `kimi-k2-thinking-turbo` | $0.15 | $1.15 | $8.00 | 262K |
TokenMix Routed Pricing
| short_id | Input ($/MTok) | Output ($/MTok) | Context | Vision | Tools | Reasoning | Released |
|---|---|---|---|---|---|---|---|
| `kimi-k2.6` | $0.836 | $3.471 | 262K | ✓ | ✓ | ✓ | 2026-04-20 |
| `kimi-k2.5` | $0.584 | $3.066 | 262K | ✓ | ✓ | ✓ | 2026-01-27 |
| `kimi-k2` | $0.531 | $2.126 | 262K | ✗ | ✓ | ✗ | 2025-07-10 |
| `kimi-k2-thinking` | $0.531 | $2.126 | 262K | ✗ | ✓ | ✓ | 2026-01-26 |
| `kimi-k2-thinking-turbo` | $1.070 | $7.440 | 262K | ✗ | ✓ | ✓ | 2025-06-30 |
The TokenMix figures represent the blended rate you pay per million tokens through the unified API. They roughly correspond to Moonshot's cache-miss tier — actual cost will be lower when cache hits apply on prefix-stable workloads (Moonshot direct only at this time; TokenMix-side cache handling varies by upstream).
Cache Hit vs Miss: 6× Price Cut Explained
Moonshot's automatic context caching cuts repeated-prefix input cost by 75-87% across the K2 family (75% on the K2 series, ~83% on K2.5 and K2.6, 87% on the turbo variants). This is the strongest cost lever, ahead of model selection itself.
| Workload pattern | Cache-hit rate | Effective input price (K2.5) | vs uncached |
|---|---|---|---|
| Stateless one-shot Q&A | 0% | $0.60 / MTok | baseline |
| RAG with shared retrieval prefix | 50% | $0.35 / MTok | 42% lower |
| Coding agent with stable system prompt | 70% | $0.25 / MTok | 58% lower |
| Support bot with policy + style cached | 85% | $0.175 / MTok | 71% lower |
| Persistent agent loop with cached scaffolding | 95% | $0.125 / MTok | 79% lower |
Formula: effective_input_price = hit_rate × cache_hit_price + (1 - hit_rate) × cache_miss_price
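The formula can be run directly to reproduce the K2.5 column of the table above:

```python
def effective_input_price(hit_rate: float, cache_hit_price: float,
                          cache_miss_price: float) -> float:
    """Blended $/MTok input price for a given cache-hit ratio."""
    return hit_rate * cache_hit_price + (1 - hit_rate) * cache_miss_price

# K2.5 rates from the table above: $0.10 cache-hit / $0.60 cache-miss per MTok
for hit_rate in (0.0, 0.5, 0.7, 0.85, 0.95):
    price = effective_input_price(hit_rate, 0.10, 0.60)
    print(f"{hit_rate:.0%} hit rate -> ${price:.3f}/MTok")
```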
At a 70% cache-hit rate, K2.5's effective input price drops to $0.25/MTok, but that is still above DeepSeek V4-Flash's $0.14/MTok cache-miss input, and DeepSeek's $0.28/MTok output remains far below K2.5's $3.00. Caching narrows the gap without closing it on text-only workloads; see DeepSeek API Pricing 2026: V4 Costs, Cache Hits, R1 Changes for the head-to-head math.
K2.6 vs K2.5 vs K2: Which One to Pick?
Pick K2.6 only when multimodal vision or the latest long-context coding stability matters; otherwise K2.5 delivers ~37% lower cache-miss input and ~25% lower output for similar text quality.
| Dimension | K2.6 (new) | K2.5 | K2 (deprecating) |
|---|---|---|---|
| Cache-miss input | $0.95 | $0.60 | $0.60 |
| Output | $4.00 | $3.00 | $2.50 |
| Context | 262K | 262K | 262K (K2-0711: 131K) |
| Vision | ✓ | ✓ | ✗ |
| Tool calls | ✓ | ✓ | ✓ |
| Thinking mode | ✓ | ✓ | Only on kimi-k2-thinking |
| Released | 2026-04-20 | 2026-01-27 | 2025-07 - 2026-01 |
| Best for | Multimodal agents, fresh long-context coding | Default multimodal Kimi tier | Text-only agentic coding (migrate soon) |
The headline: K2.6 is the freshness anchor but priced 58% above K2.5 on input. Use K2.6 when you actually need vision or the K2.6-specific long-context coding improvements Moonshot flagged in the release banner. Use K2.5 for everything else multimodal. Use the K2 series only for legacy code while you migrate; Moonshot's K2 pricing page explicitly says the K2 series will be officially discontinued.
Direct Moonshot vs TokenMix: Which Access Path?
| Dimension | Moonshot Direct | TokenMix Unified API |
|---|---|---|
| Account requirement | Moonshot account + Chinese mainland phone for full registration | Single TokenMix signup |
| Models available | Full Kimi catalog (K2.6, K2.5, K2 series, Moonshot V1) | 5 active Kimi models alongside 150+ models from Claude, GPT, Gemini, DeepSeek, Doubao, Qwen |
| SDK | OpenAI-compatible via `api.moonshot.ai/v1` | OpenAI-compatible via `api.tokenmix.ai/v1` (drop-in SDK swap) |
| Cache pricing | Automatic context caching at cache-hit rates | Routed rate (verify cache-hit behavior with TokenMix support before architecting for it) |
| Billing | CNY invoices typical | USD card or unified credit across all models |
| Free credits | Limited free tier with rate caps | Pay-as-you-go |
| Where it wins | Lowest theoretical per-token cost (cache-hit tier) | Anyone outside mainland China; multi-model production routing |
The simple decision: Moonshot Direct only if Kimi is your only model and you have a Chinese-mainland phone. TokenMix wins for everyone else — and lets kimi-k2.6 sit alongside Claude Opus 4.7 and GPT-5.5 under one API key.
Cost Examples: 4 Realistic Kimi Workloads
All calculations use Moonshot's direct pricing verified 2026-05-14, blending the cache-hit and cache-miss input rates at each scenario's stated hit ratio.
Scenario 1: Multimodal support chatbot (K2.5, 130M tokens / month)
100M text+image input, 30M output, 50% cache-hit rate on the retrieval prefix:
50M input cache-hit × $0.10/M = $5.00
50M input cache-miss × $0.60/M = $30.00
30M output × $3.00/M = $90.00
Total = $125.00/month
Scenario 2: Coding agent on K2 (text-only, 500M / month)
400M input + 100M output on kimi-k2-0905-preview, 70% cache hit on the project context prefix:
280M input cache-hit × $0.15/M = $42.00
120M input cache-miss × $0.60/M = $72.00
100M output × $2.50/M = $250.00
Total = $364.00/month
Scenario 3: K2.6 multimodal premium (200M / month)
160M input + 40M output on K2.6, 30% cache hit:
48M input cache-hit × $0.16/M = $7.68
112M input cache-miss × $0.95/M = $106.40
40M output × $4.00/M = $160.00
Total = $274.08/month
Scenario 4: Same workload, K2.5 instead (160M in / 40M out, 30% cache)
48M input cache-hit × $0.10/M = $4.80
112M input cache-miss × $0.60/M = $67.20
40M output × $3.00/M = $120.00
Total = $192.00/month
K2.6 over K2.5 costs ~43% more for the same workload at 30% cache. The premium is only justified when vision or the K2.6-specific coding improvements actually move accuracy.
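All four scenarios above reduce to a single blended-rate function; a minimal sketch, with prices in $/MTok as quoted:

```python
def monthly_cost(input_mtok: float, output_mtok: float, hit_rate: float,
                 hit_price: float, miss_price: float, output_price: float) -> float:
    """Total USD/month: cache-hit and cache-miss input shares plus output."""
    hit_cost = input_mtok * hit_rate * hit_price
    miss_cost = input_mtok * (1 - hit_rate) * miss_price
    return hit_cost + miss_cost + output_mtok * output_price

# Scenario 3 (K2.6) vs Scenario 4 (K2.5): 160M in / 40M out, 30% cache hit
k26 = monthly_cost(160, 40, 0.30, 0.16, 0.95, 4.00)  # ~$274.08
k25 = monthly_cost(160, 40, 0.30, 0.10, 0.60, 3.00)  # ~$192.00
print(f"K2.6 ${k26:.2f} vs K2.5 ${k25:.2f} -> {k26 / k25 - 1:.0%} premium")
```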
Kimi vs DeepSeek vs Doubao: Chinese Trio Compared
On these numbers Kimi K2.5 sits at the top of the Chinese-origin pricing band. DeepSeek V4-Flash undercuts both rivals on cheap text; Doubao Seed 2.0 Pro lands between the two despite its premium agentic positioning.
| Dimension | Kimi K2.5 | DeepSeek V4-Flash | Doubao Seed 2.0 Pro |
|---|---|---|---|
| Cache-miss input ($/MTok) | $0.60 | $0.14 | $0.514 |
| Cache-hit input ($/MTok) | $0.10 | $0.0028 | not exposed via TokenMix |
| Output ($/MTok) | $3.00 | $0.28 | $2.57 |
| Context | 262K | 1M | 256K |
| Vision | ✓ | ✗ | ✓ |
| Tool calls | ✓ | ✓ | ✓ |
| Best for | Multimodal agents, long-doc coding | Bulk cheap text, RAG, cache-stable prefixes | Premium agentic + multimodal |
| Available on TokenMix | ✓ (5 SKUs) | ✓ | ✓ (19 SKUs) |
The right pattern is mixed routing through a unified gateway: DeepSeek for bulk text, Kimi when long-context coding stability matters, Doubao when premium multimodal is the value driver. See the Doubao API Setup Guide for the Doubao-specific tier table.
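The routing decision above can be sketched as a small dispatch function. This is an illustrative rule only, and the DeepSeek and Doubao short_ids are hypothetical placeholders, not confirmed registry IDs:

```python
def route_model(needs_vision: bool, long_context_coding: bool,
                premium_multimodal: bool) -> str:
    """Illustrative mixed-routing rule from the trio comparison above."""
    if premium_multimodal:
        return "doubao-seed-2.0-pro"   # hypothetical short_id
    if needs_vision or long_context_coding:
        return "kimi-k2.5"
    return "deepseek-v4-flash"         # hypothetical short_id

# Bulk text goes to the cheapest tier; long-context coding routes to Kimi
print(route_model(needs_vision=False, long_context_coding=True,
                  premium_multimodal=False))
```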
Migration Checklist (K2 Deprecation)
The kimi-k2 series will be officially discontinued per Moonshot's K2 pricing page. Plan migration now — code targeting kimi-k2-0905-preview, kimi-k2-0711-preview, or kimi-k2-turbo-preview will break when those endpoints retire.
| Step | Action | Why |
|---|---|---|
| 1 | Audit codebases for `kimi-k2-*-preview` model IDs | Find every place pinned to deprecating models |
| 2 | Decide K2.5 vs K2.6 per workload | K2.5 is cheaper; K2.6 only when vision or fresh long-context coding matters |
| 3 | Re-test prompts on K2.5 / K2.6 | Tokenizer and behavior may differ from K2 |
| 4 | Replace `kimi-k2-thinking-turbo` with `kimi-k2.6` if thinking mode required | K2.6 supports thinking natively |
| 5 | Verify cache-hit ratio still applies | New model = new cache namespace |
| 6 | Add cost monitoring per model ID | Catch routing regressions before the next bill |
For TokenMix users, the migration is a single env var change — both new and old model IDs are addressable from the same key while the old ones remain live.
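Step 1 of the checklist can be scripted with the Python standard library. The regex covers the five deprecating IDs this article names; the file extensions scanned are an assumption, so widen them for your stack:

```python
import re
from pathlib import Path

# The five kimi-k2-series IDs Moonshot has flagged for discontinuation
DEPRECATED = re.compile(
    r"kimi-k2-(?:0905-preview|0711-preview|turbo-preview|thinking(?:-turbo)?)"
)

def find_pinned_k2_ids(root: str = "."):
    """Yield (path, line_no, line) for every pinned deprecating Kimi model ID."""
    for path in Path(root).rglob("*"):
        # Assumed extensions; adjust to match your codebase
        if not path.is_file() or path.suffix not in {".py", ".ts", ".yaml", ".env"}:
            continue
        for no, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if DEPRECATED.search(line):
                yield path, no, line.strip()
```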
Final Recommendation
Default to kimi-k2.5 at $0.60 cache-miss input / $3.00 output for multimodal Kimi workloads. Escalate to kimi-k2.6 ($0.95 / $4.00) only when vision or the latest long-context coding improvements demonstrably move accuracy. Avoid the kimi-k2-*-preview series for new builds — they are scheduled for deprecation.
FAQ
How much does the Kimi API cost in 2026?
Moonshot's direct API charges $0.10-$0.16 per million cache-hit input tokens, $0.60-$1.15 per million cache-miss input tokens, and $2.50-$8.00 per million output tokens depending on model. K2.5 sits at the affordable end ($0.10/$0.60/$3.00) and K2-turbo variants at the premium end ($0.15/$1.15/$8.00). Prices verified 2026-05-14.
What is Kimi K2.6?
Kimi K2.6 is Moonshot AI's latest multimodal model, released April 20, 2026. It supports text, image, and video input, has a 256K context window, supports thinking mode and tool calls, and replaces K2.5 as the flagship Kimi tier. The Moonshot release banner specifically calls out "improved long-context coding stability" as the K2.6 differentiator.
Is Kimi K2 being discontinued?
Yes. Moonshot's official K2 pricing page states the kimi-k2 series models (kimi-k2-0905-preview, kimi-k2-0711-preview, kimi-k2-turbo-preview, kimi-k2-thinking, kimi-k2-thinking-turbo) will be officially discontinued. Plan migration to K2.5 or K2.6 now.
How does Kimi pricing compare to DeepSeek and Doubao?
Kimi K2.5 cache-miss input ($0.60/MTok) is ~4× more expensive than DeepSeek V4-Flash ($0.14/MTok) and ~17% more than Doubao Seed 2.0 Pro ($0.514/MTok). Kimi's edge over DeepSeek is multimodal support; its edge over Doubao is per-token cost for vision tasks at K2.5 tier.
Do I need a Chinese phone number to use Kimi API?
For direct Moonshot signup, typically yes. Via TokenMix, no — TokenMix routes Kimi calls through its OpenAI-compatible endpoint without requiring a Moonshot account or Chinese mainland verification.
Is Kimi OpenAI-compatible?
Yes. Both Moonshot's direct API (api.moonshot.ai/v1) and TokenMix (api.tokenmix.ai/v1) speak the OpenAI Chat Completions protocol. Switching to Kimi from OpenAI requires only changing base_url and model.
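A minimal stdlib sketch of that switch, using the Chat Completions request shape; the API key is a placeholder and the message content is illustrative:

```python
import json
from urllib import request

BASE_URL = "https://api.tokenmix.ai/v1"  # or https://api.moonshot.ai/v1 for direct access

payload = {
    "model": "kimi-k2.5",
    "messages": [{"role": "user", "content": "One-sentence summary of MoE routing."}],
}
req = request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder key
        "Content-Type": "application/json",
    },
    method="POST",
)
# request.urlopen(req) would send it; switching providers changes only BASE_URL and "model".
```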
Does Kimi support tools and JSON output?
All Kimi models support tool calls (function calling), JSON mode, and streaming. K2.5 and K2.6 additionally support thinking mode (long internal reasoning). K2.6 adds vision input on top.
What is Kimi K2.5's context window?
256K tokens (262,144 exact). All current Kimi models except kimi-k2-0711-preview (131K) share the 262K context length. This is large enough for full-codebase analysis and long-document review without manual chunking.
Related Articles
- Kimi K2.5 Review 2026: $0.57/M, 256K Context, Multimodal
- DeepSeek API Pricing 2026: V4 Costs, Cache Hits, R1 Changes
- Doubao API Setup 2026: 19 Models, $0.022/M Floor, Python Guide
- Claude API Pricing 2026: Opus, Sonnet, Haiku Costs Compared
- GPT-5.5 (Spud) Released: $5/$30 API Pricing & Benchmarks 2026
- LLM API Pricing Comparison 2026: 16 Models
- OpenAI-Compatible API Gateway: 9 Providers, One SDK Guide
Sources
- Moonshot AI — Kimi K2.6 Pricing — official cache-hit/miss/output rates, model description.
- Moonshot AI — Kimi K2.5 Pricing — K2.5 pricing and capabilities.
- Moonshot AI — Kimi K2 Pricing — K2 series pricing plus deprecation note.
- TokenMix model registry (admin API, retrieved 2026-05-14) — canonical source for TokenMix-routed Kimi prices and capability flags.