TokenMix Research Lab · 2026-06-15

AI API Pricing Index 2026: 123 LLM Models Compared (Live)

AI API Pricing Index 2026: 123 Models Compared (Live)

Last Updated: 2026-06-26
Author: TokenMix Research Lab
Data verified: 2026-06-26 — live TokenMix.ai gateway, 172 models across 17 vendors

The cheapest production LLM API on the TokenMix gateway right now is Qwen Turbo at $0.040 input / $0.079 output per 1M tokens (Qwen). This page tracks live, verified gateway prices for 123 chat models across 17 vendors — and refreshes regularly.

All numbers below are real TokenMix.ai gateway prices in USD per 1M tokens, pulled directly from api.tokenmix.ai/api/models and reproducible by anyone. Context windows run from a few thousand tokens up to 2,000,000 (Grok 4.20 Non-Reasoning). We rank by a blended 3:1 input:output cost so a single number reflects realistic agent/chat workloads. For model-specific deep dives see our MiniMax M3, Qwen 3.7 Max and Tencent Hunyuan breakdowns.

How This Index Is Built

Prices are the TokenMix.ai unified-gateway rate — the actual price you pay routing through one OpenAI-compatible endpoint, not vendor list prices. We pull the full catalog from the public /api/models endpoint, keep only chat models with a non-zero token price, and rank by blended cost = (3 × input + 1 × output) / 4. Image/video/request-priced and zero-price models are excluded. Vendor list prices exist in the data but are inconsistent across providers, so this index reports gateway prices only — the figure that determines your real bill. See the cheapest-frontier-LLM cost-per-task guide for task-level math.

1. Cheapest Chat Models (blended 3:1)

#	Model	Vendor	In $/1M	Out $/1M	Blended
1	Qwen Turbo	Qwen	0.040	0.079	0.050
2	Doubao 1.5 Lite	ByteDance	0.044	0.088	0.055
3	Qwen Flash	Qwen	0.020	0.197	0.064
4	Qwen3 VL Flash	Qwen	0.020	0.204	0.066
5	Doubao Seed 1.6 Flash	ByteDance	0.022	0.221	0.072
6	Qwen3.5 Flash	Qwen	0.026	0.263	0.085
7	Doubao Seed 2.0 Mini	ByteDance	0.029	0.294	0.096
8	GPT-5 Nano	OpenAI	0.049	0.388	0.133
9	Qwen VL Plus	Qwen	0.106	0.265	0.146
10	Doubao Seed 1.8	ByteDance	0.118	0.294	0.162

2. Cheapest Model Per Vendor

Vendor	Cheapest Model	In $/1M	Out $/1M	Blended
Qwen	Qwen Turbo	0.040	0.079	0.050
ByteDance	Doubao 1.5 Lite	0.044	0.088	0.055
OpenAI	GPT-5 Nano	0.049	0.388	0.133
DeepSeek	DeepSeek V4 Flash	0.132	0.265	0.165
Google	Gemini 2.5 Flash Lite	0.097	0.388	0.170
Microsoft	Phi-4	0.100	0.400	0.175
Tencent	YT-VITA	0.164	0.479	0.243
xAI	Grok 4.1 Fast Reasoning	0.190	0.475	0.261
Mistral	Codestral	0.279	0.837	0.418
MiniMax	MiniMax M2.5	0.324	1.297	0.567
Meta	Llama 4 Maverick	0.372	1.581	0.674
Moonshot	Kimi K2 Thinking	0.529	2.118	0.926
Zhipu	GLM-4.7	0.558	2.046	0.930
Anthropic	Claude Haiku 4.5	1.000	5.000	2.000
Cohere	Command A	2.350	9.400	4.113

3. Cheapest Reasoning-Capable Models

#	Model	Vendor	In $/1M	Out $/1M	Context
1	Qwen3 VL Flash	Qwen	0.020	0.204	262,144
2	Doubao Seed 1.6 Flash	ByteDance	0.022	0.221	262,144
3	Doubao Seed 2.0 Mini	ByteDance	0.029	0.294	256,000
4	GPT-5 Nano	OpenAI	0.049	0.388	400,000
5	Doubao Seed 1.8	ByteDance	0.118	0.294	262,144
6	Doubao Seed 1.6	ByteDance	0.118	0.294	262,144
7	DeepSeek V4 Flash	DeepSeek	0.132	0.265	1,000,000
8	Gemini 2.5 Flash Lite	Google	0.097	0.388	1,048,576

4. Best-Value Long-Context (≥200K) Models

#	Model	Vendor	Context	In $/1M	Out $/1M
1	Qwen Turbo	Qwen	1,000,000	0.040	0.079
2	Qwen Flash	Qwen	1,000,000	0.020	0.197
3	Qwen3 VL Flash	Qwen	262,144	0.020	0.204
4	Doubao Seed 1.6 Flash	ByteDance	262,144	0.022	0.221
5	Qwen3.5 Flash	Qwen	1,000,000	0.026	0.263
6	Doubao Seed 2.0 Mini	ByteDance	256,000	0.029	0.294
7	GPT-5 Nano	OpenAI	400,000	0.049	0.388
8	Doubao Seed 1.8	ByteDance	262,144	0.118	0.294

5. Cost-Per-Task Example — 1M input + 0.5M output

Model	Vendor	Task cost
Qwen Turbo	Qwen	$0.079
Doubao 1.5 Lite	ByteDance	$0.088
Qwen Flash	Qwen	$0.118
Qwen3 VL Flash	Qwen	$0.122
Doubao Seed 1.6 Flash	ByteDance	$0.132
Qwen3.5 Flash	Qwen	$0.158

6. Premium Tier (for reference)

Model	Vendor	In $/1M	Out $/1M	Blended
GPT-5.4 Pro	OpenAI	29.100	174.600	65.475
GPT-5 Pro	OpenAI	14.550	116.400	40.013
o3 Pro	OpenAI	19.400	77.600	33.950
GPT-5.5	OpenAI	5.000	30.000	11.250
Claude Opus 4.8	Anthropic	5.000	25.000	10.000
Claude Opus 4.7	Anthropic	5.000	25.000	10.000

FAQ

What is the cheapest AI API in 2026?

On the TokenMix gateway, the cheapest chat model is Qwen Turbo at $0.040 input / $0.079 output per 1M tokens. Among major vendors, Qwen, ByteDance (Doubao) and OpenAI's nano tier hold the lowest blended costs. See table 1 for the live top 10.

Are these prices the same as the official vendor prices?

No. These are TokenMix.ai unified-gateway prices — what you pay through a single OpenAI-compatible endpoint. They can be at, below, or above a vendor's list price depending on routing and volume. The gateway price is the figure that actually determines your bill.

How often does this pricing index update?

The underlying data is polled from the live gateway every 2 hours, and this page is refreshed regularly. The "Last Updated" date at the top reflects the latest verified pull. Prices on this page are baked into the HTML so they are reliably machine-readable, not loaded by JavaScript.

How are the models ranked?

By blended cost = (3 × input price + 1 × output price) / 4, in USD per 1M tokens. The 3:1 weighting approximates typical chat and agent workloads, where input tokens dominate. You can re-rank by raw input or output price using the columns in each table.

Which model has the longest context window?

In the current snapshot, Grok 4.20 Non-Reasoning (xAI) leads with a 2,000,000-token context window. See table 4 for the best-value long-context options ranked by price.

Can I access all these models from one API?

Yes. Every model in this index is reachable through the single TokenMix.ai OpenAI-compatible endpoint — no separate vendor accounts, and it works for models that are otherwise hard to reach outside their home region.

Source: live TokenMix.ai gateway, 172 models across 17 vendors, verified 2026-06-26. Reproduce: GET https://api.tokenmix.ai/api/models.