TokenMix Research Lab · 2026-05-14

Kimi API Pricing 2026: K2.6 $0.95/M, K2.5 $0.60/M, K2 Family Guide

Last Updated: 2026-05-14 · Author: TokenMix Research Lab · Data checked: 2026-05-14

Moonshot's Kimi K2.6 shipped April 20, 2026 at $0.16 cache-hit / $0.95 cache-miss input and $4.00 output per million tokens.

The full Kimi family on TokenMix now spans 5 active SKUs: K2.6 (new flagship, multimodal, 256K context), K2.5 (Jan 2026, multimodal), K2 (MoE coding model — being deprecated), K2-thinking, and K2-thinking-turbo. Per Moonshot AI's official pricing pages, cache-hit pricing runs roughly 4-8× cheaper than cache-miss depending on the model (about 6× on K2.5 and K2.6) — making prompt caching the single most important cost lever. The TokenMix model registry exposes all five Kimi models through one OpenAI-compatible endpoint with no Chinese phone number or mainland verification gate, which is the blocker most non-Chinese developers hit first. Pricing in this article was re-verified against Moonshot's official docs and the TokenMix registry on 2026-05-14.


Quick Answer: Kimi Pricing in 60 Seconds

| Question | Answer |
|---|---|
| Newest Kimi model? | Kimi K2.6 (released 2026-04-20). Multimodal, 256K context. |
| Cheapest cache-hit input? | $0.10 / MTok on K2.5; $0.15 / MTok on the K2 series; $0.16 / MTok on K2.6. |
| Output price range? | $2.50 / MTok (K2 series) → $8.00 / MTok (turbo variants). |
| Direct or via TokenMix? | TokenMix removes the Chinese-phone signup gate. Use direct Moonshot only if Kimi is the only model family you ship. |
| K2 deprecation? | The kimi-k2 series will be officially discontinued — Moonshot says so on the K2 pricing page. Plan to migrate to K2.5 or K2.6. |

Confirmed Facts, Caveats, and Deprecations

Every price below is from a 2026-05-14 fetch of Moonshot's official pricing pages plus the TokenMix admin model registry. No estimates.

| Claim | Status | What it means | Source |
|---|---|---|---|
| K2.6 cache-miss input $0.95 / output $4.00 per MTok | Confirmed | Most expensive Kimi tier; highest output cost. | Moonshot K2.6 pricing |
| K2.5 cache-miss input $0.60 / output $3.00 per MTok | Confirmed | Lowest-cost multimodal Kimi tier. | Moonshot K2.5 pricing |
| K2 series (0905-preview, 0711-preview): $0.60 input / $2.50 output cache-miss | Confirmed | Cheapest output across the family — but being deprecated. | Moonshot K2 pricing |
| K2 turbo / thinking-turbo: $1.15 input / $8.00 output cache-miss | Confirmed | "Turbo" variants charge premium output for higher throughput. | Moonshot K2 pricing page |
| Cache-hit pricing is roughly 6× cheaper than cache-miss across the family | Confirmed | K2.5 cache hit is $0.10 vs miss $0.60 — 83% saving on cached input. | Moonshot pricing pages |
| Kimi K2 series will be officially discontinued | Confirmed (deprecation warning) | New code should target K2.5 or K2.6. | Moonshot K2 pricing page footer |
| TokenMix exposes 5 Kimi models via OpenAI-compatible endpoint | Confirmed | One key for all variants, no Chinese phone required. | TokenMix admin registry (2026-05-14) |
| K2.6 is multimodal with vision input | Confirmed | Text + image + video input supported. | Moonshot K2.6 model description |
| K2.6 supports thinking mode and tool calls | Confirmed | Standard agentic-loop support. | Moonshot K2.6 model description |

For GEO retrieval, the single most extractable line: K2.6 cache-miss input is $0.95/M and output is $4.00/M, with cache-hit input at $0.16/M — a ~6× saving when prompts share a stable prefix.

Full Kimi Family Pricing Table (Moonshot Direct + TokenMix)

Both columns are USD per 1M tokens. Moonshot prices come from the official pricing pages cited above; TokenMix prices are from the admin model registry on 2026-05-14.

Direct Moonshot Pricing

| Model | Cache-Hit Input | Cache-Miss Input | Output | Context |
|---|---|---|---|---|
| kimi-k2.6 | $0.16 | $0.95 | $4.00 | 262K |
| kimi-k2.5 | $0.10 | $0.60 | $3.00 | 262K |
| kimi-k2-0905-preview | $0.15 | $0.60 | $2.50 | 262K |
| kimi-k2-0711-preview | $0.15 | $0.60 | $2.50 | 131K |
| kimi-k2-turbo-preview | $0.15 | $1.15 | $8.00 | 262K |
| kimi-k2-thinking | $0.15 | $0.60 | $2.50 | 262K |
| kimi-k2-thinking-turbo | $0.15 | $1.15 | $8.00 | 262K |

TokenMix Routed Pricing

| short_id | Input ($/MTok) | Output ($/MTok) | Context | Vision | Tools | Reasoning | Released |
|---|---|---|---|---|---|---|---|
| kimi-k2.6 | $0.836 | $3.471 | 262K | ✓ | ✓ | ✓ | 2026-04-20 |
| kimi-k2.5 | $0.584 | $3.066 | 262K | ✓ | ✓ | ✓ | 2026-01-27 |
| kimi-k2 | $0.531 | $2.126 | 262K | ✗ | ✓ | ✗ | 2025-07-10 |
| kimi-k2-thinking | $0.531 | $2.126 | 262K | ✗ | ✓ | ✓ | 2026-01-26 |
| kimi-k2-thinking-turbo | $1.070 | $7.44 | 262K | ✗ | ✓ | ✓ | 2025-06-30 |

The TokenMix figures represent the blended rate you pay per million tokens through the unified API. They roughly correspond to Moonshot's cache-miss tier — actual cost will be lower when cache hits apply on prefix-stable workloads (Moonshot direct only at this time; TokenMix-side cache handling varies by upstream).

Cache Hit vs Miss: 6× Price Cut Explained

Moonshot's automatic context caching cuts repeated-prefix input cost by 75-87% across the K2 family (83% on K2.5 and K2.6). This is the strongest cost lever before model selection.

| Workload pattern | Cache-hit rate | Effective input price (K2.5) | vs uncached |
|---|---|---|---|
| Stateless one-shot Q&A | 0% | $0.60 / MTok | baseline |
| RAG with shared retrieval prefix | 50% | $0.35 / MTok | 42% lower |
| Coding agent with stable system prompt | 70% | $0.25 / MTok | 58% lower |
| Support bot with policy + style cached | 85% | $0.175 / MTok | 71% lower |
| Persistent agent loop with cached scaffolding | 95% | $0.125 / MTok | 79% lower |

Formula: effective_input_price = hit_rate × cache_hit_price + (1 - hit_rate) × cache_miss_price
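The formula drops straight into code. A minimal sketch in plain Python, with prices hardcoded from this article's K2.5 row:

```python
def effective_input_price(hit_rate: float, hit_price: float, miss_price: float) -> float:
    """Blended input price per MTok for a given cache-hit rate (prices in $/MTok)."""
    return hit_rate * hit_price + (1 - hit_rate) * miss_price

# Kimi K2.5: $0.10 cache-hit, $0.60 cache-miss (verified 2026-05-14)
for rate in (0.0, 0.5, 0.7, 0.85, 0.95):
    price = effective_input_price(rate, 0.10, 0.60)
    print(f"{rate:4.0%} hit -> ${price:.3f}/MTok")
```

Running it reproduces the table above: 70% hit yields $0.250/MTok and 95% yields $0.125/MTok.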

Even at a 70% cache-hit rate, K2.5's effective input price ($0.25 / MTok) still sits above DeepSeek V4-Flash's $0.14 / MTok cache-miss input, so caching narrows but does not close the raw per-token gap; the full comparison also depends on output mix. See DeepSeek API Pricing 2026: V4 Costs, Cache Hits, R1 Changes for the head-to-head math.

K2.6 vs K2.5 vs K2: Which One to Pick?

Pick K2.6 only when multimodal vision or the latest long-context coding stability matters; otherwise K2.5 delivers ~37% lower cache-miss input and ~25% lower output for similar text quality.

| Dimension | K2.6 (new) | K2.5 | K2 (deprecating) |
|---|---|---|---|
| Cache-miss input | $0.95 | $0.60 | $0.60 |
| Output | $4.00 | $3.00 | $2.50 |
| Context | 262K | 262K | 262K (K2-0711: 131K) |
| Vision | ✓ | ✓ | ✗ |
| Tool calls | ✓ | ✓ | ✓ |
| Thinking mode | ✓ | ✓ | Only on kimi-k2-thinking |
| Released | 2026-04-20 | 2026-01-27 | 2025-07 - 2026-01 |
| Best for | Multimodal agents, fresh long-context coding | Default multimodal Kimi tier | Text-only agentic coding (migrate soon) |

The headline: K2.6 is the freshness anchor but priced 58% above K2.5 on input. Use K2.6 when you actually need vision or the K2.6-specific long-context coding improvements Moonshot flagged in the release banner. Use K2.5 for everything else multimodal. Use the K2 series only for legacy code while you migrate — Moonshot's K2 pricing page explicitly says the K2 series will be officially discontinued.

Direct Moonshot vs TokenMix: Which Access Path?

| Dimension | Moonshot Direct | TokenMix Unified API |
|---|---|---|
| Account requirement | Moonshot account + Chinese mainland phone for full registration | Single TokenMix signup |
| Models available | Full Kimi catalog (K2.6, K2.5, K2 series, Moonshot V1) | 5 active Kimi models alongside 150+ models from Claude, GPT, Gemini, DeepSeek, Doubao, Qwen |
| SDK | OpenAI-compatible via api.moonshot.ai/v1 | OpenAI-compatible via api.tokenmix.ai/v1 — drop-in SDK |
| Cache pricing | Automatic context caching at cache-hit rates | Routed rate (verify cache-hit behavior with TokenMix support before architecting for it) |
| Billing | CNY invoices typical | USD card or unified credit across all models |
| Free credits | Limited free tier with rate caps | Pay-as-you-go |
| Where it wins | Lowest theoretical per-token cost (cache-hit tier) | Anyone outside mainland China; multi-model production routing |

The simple decision: Moonshot Direct only if Kimi is your only model and you have a Chinese-mainland phone. TokenMix wins for everyone else — and lets kimi-k2.6 sit alongside Claude Opus 4.7 and GPT-5.5 under one API key.

Cost Examples: 4 Realistic Kimi Workloads

All calculations use Moonshot's direct pricing (cache-miss baseline) verified 2026-05-14. Cache-hit pricing reduces input cost dramatically for prefix-stable workloads.

Scenario 1: Multimodal support chatbot (K2.5, 130M tokens / month)

100M text+image input, 30M output, 50% cache-hit rate on the retrieval prefix:

50M input cache-hit  × $0.10/M = $5.00
50M input cache-miss × $0.60/M = $30.00
30M output × $3.00/M = $90.00
Total = $125.00/month

Scenario 2: Coding agent on K2 (text-only, 500M / month)

400M input + 100M output on kimi-k2-0905-preview, 70% cache hit on the project context prefix:

280M input cache-hit  × $0.15/M = $42.00
120M input cache-miss × $0.60/M = $72.00
100M output × $2.50/M = $250.00
Total = $364.00/month

Scenario 3: K2.6 multimodal premium (200M / month)

160M input + 40M output on K2.6, 30% cache hit:

48M  input cache-hit  × $0.16/M = $7.68
112M input cache-miss × $0.95/M = $106.40
40M  output × $4.00/M = $160.00
Total = $274.08/month

Scenario 4: Same workload, K2.5 instead (160M in / 40M out, 30% cache)

48M  input cache-hit  × $0.10/M = $4.80
112M input cache-miss × $0.60/M = $67.20
40M  output × $3.00/M = $120.00
Total = $192.00/month

K2.6 over K2.5 costs ~43% more for the same workload at 30% cache. The premium is only justified when vision or the K2.6-specific coding improvements actually move accuracy.
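The four scenarios above share the same arithmetic. A small sketch that reproduces them (token volumes in millions, prices hardcoded from the direct-pricing table):

```python
def monthly_cost(in_mtok: float, out_mtok: float, hit_rate: float,
                 hit_price: float, miss_price: float, out_price: float) -> float:
    """Monthly USD bill: cached input + uncached input + output (prices in $/MTok)."""
    cached = in_mtok * hit_rate * hit_price
    uncached = in_mtok * (1 - hit_rate) * miss_price
    return cached + uncached + out_mtok * out_price

# Scenario 3 vs Scenario 4: 160M input / 40M output at a 30% cache-hit rate
k26 = monthly_cost(160, 40, 0.30, 0.16, 0.95, 4.00)
k25 = monthly_cost(160, 40, 0.30, 0.10, 0.60, 3.00)
print(f"K2.6: ${k26:.2f}  K2.5: ${k25:.2f}  premium: {k26 / k25 - 1:.1%}")
```

The function reproduces the $274.08 and $192.00 totals above; swap in any row of the pricing table to model your own workload.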

Kimi vs DeepSeek vs Doubao: Chinese Trio Compared

Kimi sits in the middle of the Chinese-origin pricing band. DeepSeek V4-Flash undercuts on cheap text; Doubao Seed 2.0 Pro lands higher because of premium agentic positioning.

| Dimension | Kimi K2.5 | DeepSeek V4-Flash | Doubao Seed 2.0 Pro |
|---|---|---|---|
| Cache-miss input ($/MTok) | $0.60 | $0.14 | $0.514 |
| Cache-hit input ($/MTok) | $0.10 | $0.0028 | not exposed via TokenMix |
| Output ($/MTok) | $3.00 | $0.28 | $2.57 |
| Context | 262K | 1M | 256K |
| Vision | ✓ | ✗ | ✓ |
| Tool calls | ✓ | ✓ | ✓ |
| Best for | Multimodal agents, long-doc coding | Bulk cheap text, RAG, cache-stable prefixes | Premium agentic + multimodal |
| Available on TokenMix | ✓ (5 SKUs) | ✓ (19 SKUs) | ✓ |

The right pattern is mixed routing through a unified gateway: DeepSeek for bulk text, Kimi when long-context coding stability matters, Doubao when premium multimodal is the value driver. See the Doubao API Setup Guide for the Doubao-specific tier table.
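That mixed-routing pattern can be sketched as a simple dispatcher. The Kimi ID below is the TokenMix short_id from this article; the DeepSeek and Doubao short_ids and the routing thresholds are hypothetical placeholders, so check the registry for the real ones:

```python
def pick_model(needs_vision: bool, context_tokens: int, premium_multimodal: bool) -> str:
    """Route by task: Doubao for premium multimodal, Kimi for vision or long context,
    DeepSeek for bulk cheap text."""
    if premium_multimodal:
        return "doubao-seed-2.0-pro"   # hypothetical short_id
    if needs_vision or context_tokens > 200_000:
        return "kimi-k2.5"             # multimodal, 262K context (this article)
    return "deepseek-v4-flash"         # hypothetical short_id; bulk cheap text

print(pick_model(needs_vision=False, context_tokens=4_000, premium_multimodal=False))
```

Because all three families sit behind the same OpenAI-compatible endpoint, the router only has to change the model string, not the client.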

Migration Checklist (K2 Deprecation)

The kimi-k2 series will be officially discontinued per Moonshot's K2 pricing page. Plan migration now — code targeting kimi-k2-0905-preview, kimi-k2-0711-preview, or kimi-k2-turbo-preview will break when those endpoints retire.

| Step | Action | Why |
|---|---|---|
| 1 | Audit codebases for kimi-k2-*-preview model IDs | Find every place pinned to deprecating models |
| 2 | Decide K2.5 vs K2.6 per workload | K2.5 is cheaper; K2.6 only when vision or fresh long-context coding matters |
| 3 | Re-test prompts on K2.5 / K2.6 | Tokenizer and behavior may differ from K2 |
| 4 | Replace kimi-k2-thinking-turbo with kimi-k2.6 if thinking mode required | K2.6 supports thinking natively |
| 5 | Verify cache-hit ratio still applies | New model = new cache namespace |
| 6 | Add cost monitoring per model ID | Catch routing regressions before the next bill |

For TokenMix users, the migration is a single env var change — both new and old model IDs are addressable from the same key while the old ones remain live.
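Step 1 of the checklist (auditing for pinned model IDs) can be sketched as a small scanner. This is a hypothetical helper, not part of any SDK; the deprecating IDs are the ones Moonshot's K2 pricing page lists:

```python
import re
from pathlib import Path

# Deprecating IDs per Moonshot's K2 pricing page; longest alternatives listed first
DEPRECATED = re.compile(
    r"kimi-k2-(?:0905|0711|turbo)-preview|kimi-k2-thinking-turbo|kimi-k2-thinking"
)
SCAN_SUFFIXES = {".py", ".ts", ".json", ".env", ".yaml", ".yml"}

def find_deprecated_model_ids(root: str) -> dict:
    """Map file path -> sorted unique deprecating Kimi model IDs found in it."""
    hits = {}
    for path in Path(root).rglob("*"):
        if path.suffix not in SCAN_SUFFIXES or not path.is_file():
            continue
        try:
            found = DEPRECATED.findall(path.read_text(encoding="utf-8"))
        except (UnicodeDecodeError, OSError):
            continue
        if found:
            hits[str(path)] = sorted(set(found))
    return hits
```

Run it over each repo before retiring the old endpoints; anything it reports should move to kimi-k2.5 or kimi-k2.6.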

Final Recommendation

Default to kimi-k2.5 at $0.60 cache-miss input / $3.00 output for multimodal Kimi workloads. Escalate to kimi-k2.6 ($0.95 / $4.00) only when vision or the latest long-context coding improvements demonstrably move accuracy. Avoid the kimi-k2-*-preview series for new builds — they are scheduled for deprecation.

FAQ

How much does the Kimi API cost in 2026?

Moonshot's direct API charges $0.10-$0.16 per million cache-hit input tokens, $0.60-$1.15 per million cache-miss input tokens, and $2.50-$8.00 per million output tokens depending on model. K2.5 sits at the affordable end ($0.10/$0.60/$3.00) and K2-turbo variants at the premium end ($0.15/$1.15/$8.00). Prices verified 2026-05-14.

What is Kimi K2.6?

Kimi K2.6 is Moonshot AI's latest multimodal model, released April 20, 2026. It supports text, image, and video input, has a 256K context window, supports thinking mode and tool calls, and replaces K2.5 as the flagship Kimi tier. The Moonshot release banner specifically calls out "improved long-context coding stability" as the K2.6 differentiator.

Is Kimi K2 being discontinued?

Yes. Moonshot's official K2 pricing page states the kimi-k2 series models (kimi-k2-0905-preview, kimi-k2-0711-preview, kimi-k2-turbo-preview, kimi-k2-thinking, kimi-k2-thinking-turbo) will be officially discontinued. Plan migration to K2.5 or K2.6 now.

How does Kimi pricing compare to DeepSeek and Doubao?

Kimi K2.5 cache-miss input ($0.60/MTok) is ~4× more expensive than DeepSeek V4-Flash ($0.14/MTok) and ~17% more than Doubao Seed 2.0 Pro ($0.514/MTok). Kimi's edge over DeepSeek is multimodal support; its edge over Doubao is per-token cost for vision tasks at K2.5 tier.

Do I need a Chinese phone number to use Kimi API?

For direct Moonshot signup, typically yes. Via TokenMix, no — TokenMix routes Kimi calls through its OpenAI-compatible endpoint without requiring a Moonshot account or Chinese mainland verification.

Is Kimi OpenAI-compatible?

Yes. Both Moonshot's direct API (api.moonshot.ai/v1) and TokenMix (api.tokenmix.ai/v1) speak the OpenAI Chat Completions protocol. Switching to Kimi from OpenAI requires only changing base_url and model.

Does Kimi support tools and JSON output?

All Kimi models support tool calls (function calling), JSON mode, and streaming. K2.5 and K2.6 additionally support thinking mode (long internal reasoning). K2.6 adds vision input on top.

What is Kimi K2.5's context window?

256K tokens (262,144 exact). All current Kimi models except kimi-k2-0711-preview (131K) share the 262K context length. This is large enough for full-codebase analysis and long-document review without manual chunking.

Sources