TokenMix Research Lab · 2026-06-15

AI API Pricing Index 2026: 122 Models Compared (Live)
Last Updated: 2026-06-15
Author: TokenMix Research Lab
Data verified: 2026-06-15 — live TokenMix.ai gateway, 171 models across 17 vendors
The cheapest production LLM API on the TokenMix gateway right now, ranked by blended 3:1 cost, is Qwen Turbo (Qwen) at $0.040 input / $0.079 output per 1M tokens. This page tracks live, verified gateway prices for 122 chat models across 17 vendors and is refreshed regularly.
All numbers below are real TokenMix.ai gateway prices in USD per 1M tokens, pulled directly from api.tokenmix.ai/api/models and reproducible by anyone. Context windows run from a few thousand tokens up to 2,000,000 (Grok 4.20 Non-Reasoning). We rank by a blended 3:1 input:output cost so that a single number reflects realistic agent and chat workloads. For model-specific deep dives, see our MiniMax M3, Qwen 3.7 Max and Tencent Hunyuan breakdowns.
How This Index Is Built
Prices are the TokenMix.ai unified-gateway rate — the actual price you pay routing through one OpenAI-compatible endpoint, not vendor list prices. We pull the full catalog from the public /api/models endpoint, keep only chat models with a non-zero token price, and rank by blended cost = (3 × input + 1 × output) / 4. Image/video/request-priced and zero-price models are excluded. Vendor list prices exist in the data but are inconsistent across providers, so this index reports gateway prices only — the figure that determines your real bill. See the cheapest-frontier-LLM cost-per-task guide for task-level math.
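The filter-and-rank step described above can be sketched in a few lines of Python. This is a minimal sketch, not TokenMix's own pipeline: the record fields `kind`, `input_price` and `output_price` are hypothetical, because the exact schema of the /api/models response is not documented here, and request-priced models are assumed to carry zero token prices so that they fall out of the same non-zero check. Prices come from table 1; the two `example-*` rows are hypothetical filler that the filter should drop.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    kind: str            # hypothetical field: "chat", "image", "video", ...
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens


def blended(m: Model) -> float:
    """Blended cost = (3 x input + 1 x output) / 4, USD per 1M tokens."""
    return (3 * m.input_price + m.output_price) / 4


def rank_chat_models(catalog: list) -> list:
    """Keep chat models with a non-zero token price, cheapest blended first."""
    eligible = [
        m for m in catalog
        if m.kind == "chat" and (m.input_price + m.output_price) > 0
    ]
    eligible.sort(key=blended)
    return [(m.name, blended(m)) for m in eligible]


# Prices from table 1; the two "example-*" rows are hypothetical and
# are expected to be filtered out (not chat / zero price).
catalog = [
    Model("Qwen Flash", "chat", 0.020, 0.197),
    Model("Qwen Turbo", "chat", 0.040, 0.079),
    Model("Doubao 1.5 Lite", "chat", 0.044, 0.088),
    Model("example-image-model", "image", 0.0, 0.0),
    Model("example-free-chat", "chat", 0.0, 0.0),
]

for name, cost in rank_chat_models(catalog):
    print(f"{name:16s} {cost:.4f}")
```

Sorting on the blended value alone means ties keep their catalog order, since Python's sort is stable.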
1. Cheapest Chat Models (blended 3:1)
| # | Model | Vendor | In $/1M | Out $/1M | Blended |
|---|---|---|---|---|---|
| 1 | Qwen Turbo | Qwen | 0.040 | 0.079 | 0.050 |
| 2 | Doubao 1.5 Lite | ByteDance | 0.044 | 0.088 | 0.055 |
| 3 | Qwen Flash | Qwen | 0.020 | 0.197 | 0.064 |
| 4 | Qwen3 VL Flash | Qwen | 0.020 | 0.204 | 0.066 |
| 5 | Doubao Seed 1.6 Flash | ByteDance | 0.022 | 0.219 | 0.071 |
| 6 | Qwen3.5 Flash | Qwen | 0.026 | 0.263 | 0.085 |
| 7 | Doubao Seed 2.0 Mini | ByteDance | 0.029 | 0.292 | 0.095 |
| 8 | GPT-5 Nano | OpenAI | 0.049 | 0.388 | 0.133 |
| 9 | Qwen VL Plus | Qwen | 0.106 | 0.265 | 0.146 |
| 10 | Doubao Seed 1.8 | ByteDance | 0.117 | 0.292 | 0.161 |
2. Cheapest Model Per Vendor
| Vendor | Cheapest Model | In $/1M | Out $/1M | Blended |
|---|---|---|---|---|
| Qwen | Qwen Turbo | 0.040 | 0.079 | 0.050 |
| ByteDance | Doubao 1.5 Lite | 0.044 | 0.088 | 0.055 |
| OpenAI | GPT-5 Nano | 0.049 | 0.388 | 0.133 |
| DeepSeek | DeepSeek V4 Flash | 0.132 | 0.265 | 0.165 |
| Google | Gemini 2.5 Flash Lite | 0.097 | 0.388 | 0.170 |
| Microsoft | Phi-4 | 0.100 | 0.400 | 0.175 |
| Tencent | YT-VITA | 0.164 | 0.479 | 0.243 |
| xAI | Grok 4.1 Fast Reasoning | 0.190 | 0.475 | 0.261 |
| Mistral | Codestral | 0.279 | 0.837 | 0.418 |
| MiniMax | MiniMax M2.5 | 0.324 | 1.297 | 0.567 |
| Meta | Llama 4 Maverick | 0.372 | 1.581 | 0.674 |
| Moonshot | Kimi K2 Thinking | 0.529 | 2.118 | 0.926 |
| Zhipu | GLM-4.7 | 0.558 | 2.046 | 0.930 |
| Anthropic | Claude Haiku 4.5 | 1.000 | 5.000 | 2.000 |
| Cohere | Command A | 2.350 | 9.400 | 4.113 |
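Table 2 is a group-by-then-minimum over the same blended metric. A minimal sketch, assuming each catalog row carries a vendor label; the `Row` tuple and its field names are illustrative, not the gateway's schema, and the sample rows reuse prices quoted elsewhere on this page.

```python
from collections import namedtuple

Row = namedtuple("Row", "vendor model input_price output_price")


def blended(r: Row) -> float:
    """(3 x input + 1 x output) / 4, USD per 1M tokens."""
    return (3 * r.input_price + r.output_price) / 4


def cheapest_per_vendor(rows):
    """Map each vendor to its lowest-blended-cost row, cheapest vendor first."""
    best = {}
    for r in rows:
        current = best.get(r.vendor)
        if current is None or blended(r) < blended(current):
            best[r.vendor] = r
    # Order vendors by the blended cost of their cheapest model.
    return dict(sorted(best.items(), key=lambda kv: blended(kv[1])))


rows = [
    Row("Qwen", "Qwen Flash", 0.020, 0.197),
    Row("Qwen", "Qwen Turbo", 0.040, 0.079),
    Row("ByteDance", "Doubao Seed 1.8", 0.117, 0.292),
    Row("ByteDance", "Doubao 1.5 Lite", 0.044, 0.088),
    Row("OpenAI", "GPT-5.5", 5.000, 30.000),
    Row("OpenAI", "GPT-5 Nano", 0.049, 0.388),
    Row("Anthropic", "Claude Opus 4.7", 5.000, 25.000),
    Row("Anthropic", "Claude Haiku 4.5", 1.000, 5.000),
]

for vendor, row in cheapest_per_vendor(rows).items():
    print(f"{vendor:10s} {row.model}")
```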
3. Cheapest Reasoning-Capable Models (blended 3:1)
| # | Model | Vendor | In $/1M | Out $/1M | Context |
|---|---|---|---|---|---|
| 1 | Qwen3 VL Flash | Qwen | 0.020 | 0.204 | 262,144 |
| 2 | Doubao Seed 1.6 Flash | ByteDance | 0.022 | 0.219 | 262,144 |
| 3 | Doubao Seed 2.0 Mini | ByteDance | 0.029 | 0.292 | 256,000 |
| 4 | GPT-5 Nano | OpenAI | 0.049 | 0.388 | 400,000 |
| 5 | Doubao Seed 1.8 | ByteDance | 0.117 | 0.292 | 262,144 |
| 6 | Doubao Seed 1.6 | ByteDance | 0.117 | 0.292 | 262,144 |
| 7 | DeepSeek V4 Flash | DeepSeek | 0.132 | 0.265 | 1,000,000 |
| 8 | Gemini 2.5 Flash Lite | Google | 0.097 | 0.388 | 1,048,576 |
4. Best-Value Long-Context Models (≥200K context, blended 3:1)
| # | Model | Vendor | Context | In $/1M | Out $/1M |
|---|---|---|---|---|---|
| 1 | Qwen Turbo | Qwen | 1,000,000 | 0.040 | 0.079 |
| 2 | Qwen Flash | Qwen | 1,000,000 | 0.020 | 0.197 |
| 3 | Qwen3 VL Flash | Qwen | 262,144 | 0.020 | 0.204 |
| 4 | Doubao Seed 1.6 Flash | ByteDance | 262,144 | 0.022 | 0.219 |
| 5 | Qwen3.5 Flash | Qwen | 1,000,000 | 0.026 | 0.263 |
| 6 | Doubao Seed 2.0 Mini | ByteDance | 256,000 | 0.029 | 0.292 |
| 7 | GPT-5 Nano | OpenAI | 400,000 | 0.049 | 0.388 |
| 8 | Doubao Seed 1.8 | ByteDance | 262,144 | 0.117 | 0.292 |
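Table 4 combines a context-window filter with the blended ranking. A sketch under stated assumptions: the `Entry` record and the `best_value_long_context` helper are hypothetical names, context sizes and prices are copied from tables 3 and 4, and the `example-short-context` row is made-up filler that the 200K cut-off should exclude.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    model: str
    context: int         # context window, tokens
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens


MIN_CONTEXT = 200_000  # the ">= 200K" cut-off used by table 4


def best_value_long_context(entries, min_context=MIN_CONTEXT, top_n=8):
    """Keep entries with context >= min_context, rank by blended 3:1 cost."""
    long_ctx = [e for e in entries if e.context >= min_context]
    long_ctx.sort(key=lambda e: (3 * e.input_price + e.output_price) / 4)
    return long_ctx[:top_n]


entries = [
    Entry("Gemini 2.5 Flash Lite", 1_048_576, 0.097, 0.388),
    Entry("GPT-5 Nano", 400_000, 0.049, 0.388),
    Entry("example-short-context", 32_000, 0.010, 0.020),  # hypothetical, filtered out
    Entry("DeepSeek V4 Flash", 1_000_000, 0.132, 0.265),
    Entry("Qwen3 VL Flash", 262_144, 0.020, 0.204),
    Entry("Qwen Turbo", 1_000_000, 0.040, 0.079),
]

for e in best_value_long_context(entries):
    print(f"{e.model:22s} {e.context:>9,d}")
```

Filtering before sorting means a cheap short-context model can never crowd out a long-context one, which is why the hypothetical row never appears despite its low prices.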
5. Cost-Per-Task Example — 1M input + 0.5M output
| Model | Vendor | Task cost |
|---|---|---|
| Qwen Turbo | Qwen | $0.079 |
| Doubao 1.5 Lite | ByteDance | $0.088 |
| Qwen Flash | Qwen | $0.118 |
| Qwen3 VL Flash | Qwen | $0.122 |
| Doubao Seed 1.6 Flash | ByteDance | $0.131 |
| Qwen3.5 Flash | Qwen | $0.158 |
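Because prices are quoted per 1M tokens, the task cost in table 5 is simply input price × 1 + output price × 0.5. The sketch below parameterizes the token volumes (the function and parameter names are illustrative) and reproduces the table's $0.088 and $0.122 rows from the table 1 prices. Other rows may differ in the last digit, since the per-1M prices shown on this page are themselves rounded to three decimals.

```python
def task_cost(input_price, output_price, input_mtok=1.0, output_mtok=0.5):
    """USD cost of one task.

    input_price / output_price are USD per 1M tokens; input_mtok / output_mtok
    are the task's token volumes in millions (defaults: 1M in, 0.5M out).
    """
    return input_price * input_mtok + output_price * output_mtok


# Per-1M prices from table 1.
for model, inp, out in [
    ("Doubao 1.5 Lite", 0.044, 0.088),
    ("Qwen3 VL Flash", 0.020, 0.204),
]:
    print(f"{model:16s} ${task_cost(inp, out):.3f}")
```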
6. Premium Tier (for reference)
| Model | Vendor | In $/1M | Out $/1M | Blended |
|---|---|---|---|---|
| GPT-5.4 Pro | OpenAI | 29.100 | 174.600 | 65.475 |
| GPT-5 Pro | OpenAI | 14.550 | 116.400 | 40.013 |
| o3 Pro | OpenAI | 19.400 | 77.600 | 33.950 |
| GPT-5.5 | OpenAI | 5.000 | 30.000 | 11.250 |
| Claude Opus 4.8 | Anthropic | 5.000 | 25.000 | 10.000 |
| Claude Opus 4.7 | Anthropic | 5.000 | 25.000 | 10.000 |
FAQ
What is the cheapest AI API in 2026?
On the TokenMix gateway, the cheapest chat model by blended 3:1 cost is Qwen Turbo at $0.040 input / $0.079 output per 1M tokens. Among major vendors, Qwen, ByteDance (Doubao) and OpenAI's nano tier hold the lowest blended costs. See table 1 for the live top 10.
Are these prices the same as the official vendor prices?
No. These are TokenMix.ai unified-gateway prices — what you pay through a single OpenAI-compatible endpoint. They can be at, below, or above a vendor's list price depending on routing and volume. The gateway price is the figure that actually determines your bill.
How often does this pricing index update?
The underlying data is polled from the live gateway every 2 hours, and this page is refreshed regularly. The "Last Updated" date at the top reflects the latest verified pull. Prices on this page are baked into the HTML so they are reliably machine-readable, not loaded by JavaScript.
How are the models ranked?
By blended cost = (3 × input price + 1 × output price) / 4, in USD per 1M tokens. The 3:1 weighting approximates typical chat and agent workloads, where input tokens dominate. You can re-rank by raw input or output price using the columns in each table.
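Re-ranking amounts to changing the input:output weights in the same formula. A minimal sketch with illustrative names (`weighted_cost`, `rank`, `PRICES`), using three models from table 1; the default weights reproduce the index's 3:1 blend.

```python
def weighted_cost(input_price, output_price, input_weight=3, output_weight=1):
    """Weighted input/output blend in USD per 1M tokens (defaults: 3:1)."""
    total = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total


# (input, output) USD per 1M tokens, from table 1.
PRICES = {
    "Qwen Turbo": (0.040, 0.079),
    "Qwen Flash": (0.020, 0.197),
    "Doubao 1.5 Lite": (0.044, 0.088),
}


def rank(input_weight, output_weight):
    """Model names sorted cheapest first under the given weighting."""
    return sorted(
        PRICES,
        key=lambda m: weighted_cost(*PRICES[m], input_weight, output_weight),
    )


print("3:1 blend  ", rank(3, 1))
print("input only ", rank(1, 0))
print("output only", rank(0, 1))
```

With these displayed prices, Qwen Flash overtakes Qwen Turbo once the input weight w exceeds about 5.9 times the output weight (0.020·w + 0.197 < 0.040·w + 0.079 when w > 5.9), so very input-heavy workloads can favour a different model than the 3:1 ranking suggests.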
Which model has the longest context window?
In the current snapshot, Grok 4.20 Non-Reasoning (xAI) leads with a 2,000,000-token context window. See table 4 for the best-value long-context options ranked by price.
Can I access all these models from one API?
Yes. Every model in this index is reachable through the single TokenMix.ai OpenAI-compatible endpoint — no separate vendor accounts, and it works for models that are otherwise hard to reach outside their home region.
Related Articles
- MiniMax M3 API: Pricing, Benchmarks & Access
- Qwen 3.7 Max API Pricing: vs Claude Opus 4.8 & GPT
- Tencent Hunyuan API Pricing: HY3 & HY2.0 English Access
- Cheapest Frontier LLM API by Cost per Task
- Best Chinese AI Models 2026: Comparison Guide
Source: live TokenMix.ai gateway, 171 models across 17 vendors, verified 2026-06-15. Reproduce: GET https://api.tokenmix.ai/api/models.
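The reproduction step can be scripted with only the Python standard library. This is a sketch under assumptions: the source names the endpoint but not its response shape, so the hypothetical `extract_models` helper accepts either a bare JSON list or an OpenAI-style `{"data": [...]}` envelope. `fetch_catalog` needs network access and applies no filtering; the chat-only, non-zero-price filter from the methodology section still has to run before ranking.

```python
import json
import urllib.request

CATALOG_URL = "https://api.tokenmix.ai/api/models"  # endpoint named in the source


def extract_models(payload):
    """Return the list of model records from a decoded catalog payload.

    Assumed shapes: a bare JSON list, or an OpenAI-style {"data": [...]}.
    """
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict) and isinstance(payload.get("data"), list):
        return payload["data"]
    raise ValueError("unrecognised catalog shape")


def fetch_catalog(url=CATALOG_URL, timeout=30):
    """GET the catalog and decode it as JSON (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_models(json.load(resp))
```

Calling `fetch_catalog()` returns the raw records; inspect one record first to see which fields actually hold the per-1M input and output prices.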