TokenMix Research Lab · 2026-06-15

AI API Pricing Index 2026: 122 LLM Models Compared (Live)

Last Updated: 2026-06-15
Author: TokenMix Research Lab
Data verified: 2026-06-15 — live TokenMix.ai gateway, 171 models across 17 vendors

The cheapest production LLM API on the TokenMix gateway right now is Qwen Turbo (vendor: Qwen) at $0.040 input / $0.079 output per 1M tokens. This page tracks live, verified gateway prices for 122 chat models across 17 vendors; the underlying data is polled from the gateway every 2 hours.

All numbers below are real TokenMix.ai gateway prices in USD per 1M tokens, pulled directly from api.tokenmix.ai/api/models and reproducible by anyone. Context windows run from a few thousand tokens up to 2,000,000 (Grok 4.20 Non-Reasoning). We rank by a blended 3:1 input:output cost so a single number reflects realistic agent/chat workloads. For model-specific deep dives see our MiniMax M3, Qwen 3.7 Max and Tencent Hunyuan breakdowns.

How This Index Is Built

Prices are the TokenMix.ai unified-gateway rate — the actual price you pay routing through one OpenAI-compatible endpoint, not vendor list prices. We pull the full catalog from the public /api/models endpoint, keep only chat models with a non-zero token price, and rank by blended cost = (3 × input + 1 × output) / 4. Image/video/request-priced and zero-price models are excluded. Vendor list prices exist in the data but are inconsistent across providers, so this index reports gateway prices only — the figure that determines your real bill. See the cheapest-frontier-LLM cost-per-task guide for task-level math.
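The filter-and-rank step described above can be sketched as follows. This is a minimal illustration, not the gateway's actual code: the `ModelPrice` record and its field names (`kind`, `input_usd`, `output_usd`) are hypothetical, since the response schema of /api/models is not documented on this page, and reading "non-zero token price" as "both prices above zero" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ModelPrice:
    name: str
    kind: str          # hypothetical field: "chat", "image", "video", ...
    input_usd: float   # USD per 1M input tokens
    output_usd: float  # USD per 1M output tokens


def blended_cost(input_usd: float, output_usd: float) -> float:
    """Blended cost = (3 * input + 1 * output) / 4, in USD per 1M tokens."""
    return (3 * input_usd + output_usd) / 4


def rank_chat_models(catalog: list[ModelPrice]) -> list[tuple[str, float]]:
    """Keep chat models with a non-zero token price, cheapest blended cost first."""
    priced = [
        m for m in catalog
        # Assumption: "non-zero token price" means both prices are above zero.
        if m.kind == "chat" and m.input_usd > 0 and m.output_usd > 0
    ]
    priced.sort(key=lambda m: blended_cost(m.input_usd, m.output_usd))
    return [(m.name, round(blended_cost(m.input_usd, m.output_usd), 3)) for m in priced]


# Three rows from table 1 plus two made-up entries that the filter should drop.
sample = [
    ModelPrice("Qwen Flash", "chat", 0.020, 0.197),
    ModelPrice("example-image-model", "image", 0.500, 0.500),  # hypothetical
    ModelPrice("Qwen Turbo", "chat", 0.040, 0.079),
    ModelPrice("example-free-model", "chat", 0.0, 0.0),        # hypothetical
    ModelPrice("Doubao 1.5 Lite", "chat", 0.044, 0.088),
]

for name, cost in rank_chat_models(sample):
    print(f"{name}: {cost:.3f}")
```

Run on this sample, the sketch drops the two made-up entries and orders the rest as Qwen Turbo (0.050), Doubao 1.5 Lite (0.055), Qwen Flash (0.064), matching their relative order in table 1.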

1. Cheapest Chat Models (blended 3:1)

| # | Model | Vendor | Input ($/1M) | Output ($/1M) | Blended ($/1M) |
| --- | --- | --- | --- | --- | --- |
| 1 | Qwen Turbo | Qwen | 0.040 | 0.079 | 0.050 |
| 2 | Doubao 1.5 Lite | ByteDance | 0.044 | 0.088 | 0.055 |
| 3 | Qwen Flash | Qwen | 0.020 | 0.197 | 0.064 |
| 4 | Qwen3 VL Flash | Qwen | 0.020 | 0.204 | 0.066 |
| 5 | Doubao Seed 1.6 Flash | ByteDance | 0.022 | 0.219 | 0.071 |
| 6 | Qwen3.5 Flash | Qwen | 0.026 | 0.263 | 0.085 |
| 7 | Doubao Seed 2.0 Mini | ByteDance | 0.029 | 0.292 | 0.095 |
| 8 | GPT-5 Nano | OpenAI | 0.049 | 0.388 | 0.133 |
| 9 | Qwen VL Plus | Qwen | 0.106 | 0.265 | 0.146 |
| 10 | Doubao Seed 1.8 | ByteDance | 0.117 | 0.292 | 0.161 |

2. Cheapest Model Per Vendor

| Vendor | Cheapest Model | Input ($/1M) | Output ($/1M) | Blended ($/1M) |
| --- | --- | --- | --- | --- |
| Qwen | Qwen Turbo | 0.040 | 0.079 | 0.050 |
| ByteDance | Doubao 1.5 Lite | 0.044 | 0.088 | 0.055 |
| OpenAI | GPT-5 Nano | 0.049 | 0.388 | 0.133 |
| DeepSeek | DeepSeek V4 Flash | 0.132 | 0.265 | 0.165 |
| Google | Gemini 2.5 Flash Lite | 0.097 | 0.388 | 0.170 |
| Microsoft | Phi-4 | 0.100 | 0.400 | 0.175 |
| Tencent | YT-VITA | 0.164 | 0.479 | 0.243 |
| xAI | Grok 4.1 Fast Reasoning | 0.190 | 0.475 | 0.261 |
| Mistral | Codestral | 0.279 | 0.837 | 0.418 |
| MiniMax | MiniMax M2.5 | 0.324 | 1.297 | 0.567 |
| Meta | Llama 4 Maverick | 0.372 | 1.581 | 0.674 |
| Moonshot | Kimi K2 Thinking | 0.529 | 2.118 | 0.926 |
| Zhipu | GLM-4.7 | 0.558 | 2.046 | 0.930 |
| Anthropic | Claude Haiku 4.5 | 1.000 | 5.000 | 2.000 |
| Cohere | Command A | 2.350 | 9.400 | 4.113 |

3. Cheapest Reasoning-Capable Models

| # | Model | Vendor | Input ($/1M) | Output ($/1M) | Context (tokens) |
| --- | --- | --- | --- | --- | --- |
| 1 | Qwen3 VL Flash | Qwen | 0.020 | 0.204 | 262,144 |
| 2 | Doubao Seed 1.6 Flash | ByteDance | 0.022 | 0.219 | 262,144 |
| 3 | Doubao Seed 2.0 Mini | ByteDance | 0.029 | 0.292 | 256,000 |
| 4 | GPT-5 Nano | OpenAI | 0.049 | 0.388 | 400,000 |
| 5 | Doubao Seed 1.8 | ByteDance | 0.117 | 0.292 | 262,144 |
| 6 | Doubao Seed 1.6 | ByteDance | 0.117 | 0.292 | 262,144 |
| 7 | DeepSeek V4 Flash | DeepSeek | 0.132 | 0.265 | 1,000,000 |
| 8 | Gemini 2.5 Flash Lite | Google | 0.097 | 0.388 | 1,048,576 |

4. Best-Value Long-Context (≥200K) Models

| # | Model | Vendor | Context (tokens) | Input ($/1M) | Output ($/1M) |
| --- | --- | --- | --- | --- | --- |
| 1 | Qwen Turbo | Qwen | 1,000,000 | 0.040 | 0.079 |
| 2 | Qwen Flash | Qwen | 1,000,000 | 0.020 | 0.197 |
| 3 | Qwen3 VL Flash | Qwen | 262,144 | 0.020 | 0.204 |
| 4 | Doubao Seed 1.6 Flash | ByteDance | 262,144 | 0.022 | 0.219 |
| 5 | Qwen3.5 Flash | Qwen | 1,000,000 | 0.026 | 0.263 |
| 6 | Doubao Seed 2.0 Mini | ByteDance | 256,000 | 0.029 | 0.292 |
| 7 | GPT-5 Nano | OpenAI | 400,000 | 0.049 | 0.388 |
| 8 | Doubao Seed 1.8 | ByteDance | 262,144 | 0.117 | 0.292 |

5. Cost-Per-Task Example — 1M input + 0.5M output

| Model | Vendor | Task cost (USD) |
| --- | --- | --- |
| Qwen Turbo | Qwen | $0.079 |
| Doubao 1.5 Lite | ByteDance | $0.088 |
| Qwen Flash | Qwen | $0.118 |
| Qwen3 VL Flash | Qwen | $0.122 |
| Doubao Seed 1.6 Flash | ByteDance | $0.131 |
| Qwen3.5 Flash | Qwen | $0.158 |
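
The task costs in this section follow from the per-1M-token prices in table 1. As a rough sketch (the function name `task_cost` and its parameters are ours, chosen for illustration), the arithmetic for a 1M-input + 0.5M-output workload looks like this:

```python
def task_cost(input_usd_per_m: float, output_usd_per_m: float,
              input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one task, given prices in USD per 1M tokens."""
    per_million = 1_000_000
    return (input_tokens * input_usd_per_m + output_tokens * output_usd_per_m) / per_million


# Workload from this section's heading: 1M input tokens + 0.5M output tokens.
INPUT_TOKENS = 1_000_000
OUTPUT_TOKENS = 500_000

# Prices taken from table 1 (USD per 1M tokens).
doubao_lite = task_cost(0.044, 0.088, INPUT_TOKENS, OUTPUT_TOKENS)
qwen3_vl_flash = task_cost(0.020, 0.204, INPUT_TOKENS, OUTPUT_TOKENS)

print(f"Doubao 1.5 Lite: ${doubao_lite:.3f}")
print(f"Qwen3 VL Flash:  ${qwen3_vl_flash:.3f}")
```

For Doubao 1.5 Lite this is 1 × 0.044 + 0.5 × 0.088 = $0.088, and for Qwen3 VL Flash 1 × 0.020 + 0.5 × 0.204 = $0.122, in line with the listed task costs.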

6. Premium Tier (for reference)

| Model | Vendor | Input ($/1M) | Output ($/1M) | Blended ($/1M) |
| --- | --- | --- | --- | --- |
| GPT-5.4 Pro | OpenAI | 29.100 | 174.600 | 65.475 |
| GPT-5 Pro | OpenAI | 14.550 | 116.400 | 40.013 |
| o3 Pro | OpenAI | 19.400 | 77.600 | 33.950 |
| GPT-5.5 | OpenAI | 5.000 | 30.000 | 11.250 |
| Claude Opus 4.8 | Anthropic | 5.000 | 25.000 | 10.000 |
| Claude Opus 4.7 | Anthropic | 5.000 | 25.000 | 10.000 |

FAQ

What is the cheapest AI API in 2026?

On the TokenMix gateway, the cheapest chat model is Qwen Turbo at $0.040 input / $0.079 output per 1M tokens. Among major vendors, Qwen, ByteDance (Doubao) and OpenAI's nano tier hold the lowest blended costs. See table 1 for the live top 10.

Are these prices the same as the official vendor prices?

No. These are TokenMix.ai unified-gateway prices — what you pay through a single OpenAI-compatible endpoint. They can be at, below, or above a vendor's list price depending on routing and volume. The gateway price is the figure that actually determines your bill.

How often does this pricing index update?

The underlying data is polled from the live gateway every 2 hours, and this page is refreshed regularly. The "Last Updated" date at the top reflects the latest verified pull. Prices on this page are baked into the HTML so they are reliably machine-readable, not loaded by JavaScript.

How are the models ranked?

By blended cost = (3 × input price + 1 × output price) / 4, in USD per 1M tokens. The 3:1 weighting approximates typical chat and agent workloads, where input tokens dominate. You can re-rank by raw input or output price using the columns in each table.
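
If your workload is not close to 3:1, the same idea generalizes to any input:output ratio. The sketch below is an illustration only: the index itself always uses 3:1, and `weighted_cost`, `cheapest`, and the 9:1 ratio are hypothetical names and values chosen for the example. It re-ranks two rows from table 1 under a different weighting.

```python
def weighted_cost(input_usd: float, output_usd: float, input_weight: float = 3.0) -> float:
    """Weighted average price for an input:output token ratio of input_weight:1."""
    if input_weight < 0:
        raise ValueError("input_weight must be non-negative")
    return (input_weight * input_usd + output_usd) / (input_weight + 1)


# (input, output) prices in USD per 1M tokens, taken from table 1.
prices = {
    "Qwen Turbo": (0.040, 0.079),
    "Qwen Flash": (0.020, 0.197),
}


def cheapest(input_weight: float) -> str:
    """Name of the model with the lowest weighted cost at the given ratio."""
    return min(prices, key=lambda name: weighted_cost(*prices[name], input_weight))


print(cheapest(3.0))  # the index's default 3:1 weighting
print(cheapest(9.0))  # a hypothetical input-heavy 9:1 workload
```

With these two rows, the default 3:1 weighting favors Qwen Turbo, while the input-heavy 9:1 weighting favors Qwen Flash, whose lower input price outweighs its higher output price.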

Which model has the longest context window?

In the current snapshot, Grok 4.20 Non-Reasoning (xAI) leads with a 2,000,000-token context window. See table 4 for the best-value long-context options ranked by price.

Can I access all these models from one API?

Yes. Every model in this index is reachable through the single TokenMix.ai OpenAI-compatible endpoint — no separate vendor accounts, and it works for models that are otherwise hard to reach outside their home region.

Source: live TokenMix.ai gateway, 171 models across 17 vendors, verified 2026-06-15. Reproduce: GET https://api.tokenmix.ai/api/models.