TokenMix Research Lab · 2026-06-15

Last Updated: 2026-06-15 Author: TokenMix Research Lab Data verified: 2026-06-15 — MiniMax official pricing (platform.minimax.io), MiniMax-M3 model card (Hugging Face), TokenMix.ai model tracker
MiniMax M3 API: Pricing, Benchmarks & How to Access (2026)
MiniMax M3 lands at $0.30 per 1M input / $1.20 per 1M output on the standard tier — roughly 6% of what GPT-5.5 charges, for an open-weight agentic model that MiniMax claims beats Claude Opus on web-browsing tasks. That is the headline. The catch: every benchmark you will see today is self-reported, and first-token latency is slow. This guide gives you the real numbers, the access paths, and where M3 actually makes sense.
Released June 1, 2026, M3 is MiniMax's first model on its new MSA ("Mixture of Sparse Attention") architecture — a ~428B-total / ~23B-active Mixture-of-Experts design with a native 1M-token context window. Based on MiniMax's official pricing, the launch price is a "permanent 50% off" rate, and the weights ship under the MiniMax community license, so you can self-host or rent it through a gateway. We pulled the live numbers and ran the cost math below.
Table of Contents
- Quick Verdict
- Quick Comparison: M3 vs the Field
- What MiniMax M3 Actually Is
- MiniMax M3 API Pricing, Tier by Tier
- Benchmarks: What's Claimed vs What's Verified
- Cost per Task: Real Math
- Latency and Reliability Caveats
- How to Access MiniMax M3
- Who Should Use M3 (Decision Guide)
- Final Recommendation
- FAQ
- Related Articles
Quick Verdict
| Status | Finding |
|---|---|
| ✅ Confirmed | Released 2026-06-01; open weights under MiniMax community license; native 1M context; standard price $0.30/$1.20 per 1M tokens (50%-off launch rate) |
| ✅ Confirmed | OpenAI-compatible API; available direct (platform.minimax.io) and through third-party gateways |
| 🟡 Likely | Strong agentic/tool-use performance — MiniMax reports SWE-Bench Pro 59.0% and BrowseComp 83.5, but these are vendor numbers |
| ⚠️ Risk | No independent benchmark exists yet. Treat every quality score as self-reported until third parties confirm |
| ⚠️ Risk | High first-token latency (~37–38s observed on long-context agentic runs) — bad fit for interactive chat, fine for batch/agent loops |
The 40-60 word takeaway for anyone skimming: MiniMax M3 is the cheapest serious agentic model on the market right now — about 6% of GPT-5.5's per-token price — with open weights and a 1M context. The trade-offs are unverified benchmarks and slow first-token latency. Use it for cost-sensitive agent and batch workloads, not low-latency chat.
Quick Comparison: M3 vs the Field
| Model | Input $/1M | Output $/1M | Context | Weights | Best for |
|---|---|---|---|---|---|
| MiniMax M3 (≤512K) | $0.30 | $1.20 | 1M | Open | Cheap agents, batch |
| MiniMax M3 (>512K) | $0.60 | $2.40 | 1M | Open | Long-context jobs |
| GPT-5.5 | $5.00 | $30.00 | — | Closed | Frontier reasoning |
| Claude Opus 4.8 | $5.00 | $25.00 | — | Closed | Agentic coding |
| Qwen 3.7 Max | $2.50 | $7.50 | 1M | Closed | Multilingual reasoning |
Pricing per MiniMax, the GPT-5.5 release pricing we tracked, and Alibaba/Anthropic list rates. M3's standard input price is 1/16th of GPT-5.5's and 1/8th of Qwen 3.7 Max's. On output the gap is wider still.
What MiniMax M3 Actually Is
M3 is a sparse Mixture-of-Experts model. Per the MiniMax-M3 model card on Hugging Face, it carries roughly 428B total parameters with about 23B active per token — a much sparser activation ratio than M2.5/M2.7, which means lower inference cost for a given capacity.
The architectural headline is MSA (Mixture of Sparse Attention). Instead of full attention across the whole context, M3 routes attention sparsely, which is how MiniMax keeps a 1M-token native window affordable. One practical note: the API enforces a minimum context allocation of 512K, which is why the pricing splits at the 512K boundary (more on that below).
Three things matter for buyers:
- Open weights. M3 ships under the MiniMax community license. You can self-host on your own GPUs, or rent it from any provider that hosts the weights — you are not locked to MiniMax's endpoint.
- Text-only, agent-tuned. M3 is a text/tool model, not multimodal. MiniMax positions it squarely at agentic workloads: long tool-call chains, browser automation, terminal tasks.
- New architecture = thin track record. MSA is new. There is no independent latency or quality data outside MiniMax's own reports yet, so early adopters are also early testers.
On TokenMix.ai's model tracker, M3 shows up with its live gateway price and a 1M-token context flag, alongside its M2.x predecessors for side-by-side comparison.
MiniMax M3 API Pricing, Tier by Tier
Based on MiniMax's official pricing page, M3 uses a two-tier structure keyed to context size. The rates below are the current "permanent 50% off" launch prices.
| Tier | Input $/1M | Cached input $/1M | Output $/1M |
|---|---|---|---|
| Standard (≤512K context) | $0.30 | $0.06 | $1.20 |
| Extended (>512K context) | $0.60 | $0.12 | $2.40 |
A few things to read carefully:
- The split is by context window, not by usage volume. If your request fits in 512K, you pay the standard rate. Push past 512K and the whole request bills at the extended rate.
- Cached input is 5x cheaper ($0.06 vs $0.30). For agent loops that re-send the same system prompt and tool definitions every turn, prompt caching is the single biggest lever on your bill.
- Output is 4x input, which is normal — but because M3's absolute output price ($1.20) is so low, output-heavy workloads stay cheap in absolute terms.
For comparison, Anthropic lists Claude Opus 4.8 at $5/$25 and OpenAI's GPT-5.5 sits at $5/$30. On output, M3's standard tier is ~25x cheaper than GPT-5.5 and ~21x cheaper than Opus 4.8.
What you pay through a gateway
Direct pricing assumes you hold a MiniMax account and top up its wallet. If you route M3 through a unified gateway instead, the rate differs slightly. On TokenMix.ai, M3 is listed at $0.587 per 1M input / $2.35 per 1M output — which tracks MiniMax's extended (>512K) tier and folds 300+ other models into one OpenAI-compatible key and one invoice. If your jobs stay under 512K and you only need MiniMax, going direct is cheapest; if you switch models often or want one bill, the gateway saves the operational overhead. We say which is cheaper for which case in the decision guide.
Benchmarks: What's Claimed vs What's Verified
This is the section to read slowly. As of 2026-06-15, no independent lab has published quality benchmarks for M3. Everything in the table below comes from MiniMax's own launch materials. We are reporting them as claims, not facts.
| Benchmark | MiniMax M3 (claimed) | Reference point |
|---|---|---|
| SWE-Bench Pro | 59.0% | Hard tier of SWE-Bench |
| Terminal-Bench 2.1 | 66.0% | Terminal/shell agent tasks |
| MCP Atlas | 74.2% | Tool-use via Model Context Protocol |
| OSWorld | 70.06% | Computer-use / GUI agents |
| BrowseComp | 83.5 | Web browsing — MiniMax cites Opus 4.7 at 79.3 |
The BrowseComp claim is the one that will travel: MiniMax says M3 outscores Claude Opus 4.7 on web browsing. Maybe. But "vendor benchmark beats competitor's older model" is exactly the kind of claim that needs third-party replication before you bet a production agent on it.
Our position, consistent with how TokenMix.ai treats every launch: discount self-reported scores by default. The numbers suggest M3 is genuinely capable at agentic tasks — the architecture and pricing back that up — but until an independent SWE-Bench or Terminal-Bench run lands, treat the percentages as direction, not magnitude. When independent data appears, it will show on the M3 model page.
Cost per Task: Real Math
Per-token prices are abstract. Here is what M3 actually costs on three common workloads, using the standard ($0.30/$1.20) tier and comparing to GPT-5.5 ($5/$30).
Workload 1 — a single coding-agent task (50K input, 10K output):
| Model | Calculation | Cost | vs GPT-5.5 |
|---|---|---|---|
| MiniMax M3 (standard) | (50K×$0.30 + 10K×$1.20)/1M | $0.027 | 4.9% |
| MiniMax M3 (via TokenMix) | (50K×$0.587 + 10K×$2.35)/1M | $0.053 | 9.6% |
| Claude Opus 4.8 | (50K×$5 + 10K×$25)/1M | $0.50 | 91% |
| GPT-5.5 | (50K×$5 + 10K×$30)/1M | $0.55 | 100% |
On this shape of task, M3's standard tier is ~4.9% of the GPT-5.5 cost — a 20x saving.
Workload 2 — a high-volume agent fleet (per day: 200M input, 40M output):
- M3: (200M × $0.30 + 40M × $1.20) / 1M = $60 + $48 = $108/day (~$3,240/month)
- GPT-5.5: (200M × $5 + 40M × $30) / 1M = $1,000 + $1,200 = $2,200/day (~$66,000/month)
- With prompt caching on the repeated system prompt (say 60% of input cached at $0.06), M3 drops further to roughly $79/day.
Workload 3 — a long-context document job (600K input, 50K output, extended tier):
- M3 extended: (600K × $0.60 + 50K × $2.40) / 1M = $0.36 + $0.12 = $0.48 per document
- Even at the doubled extended rate, a full 600K-token analysis costs under fifty cents.
The pattern is consistent: M3 turns workloads that were "too expensive to run at scale" on frontier models into routine line items. That is the whole pitch. For a structured way to model this across providers, see our AI API pricing calculator guide.
Latency and Reliability Caveats
Cheap tokens are worthless if the model is too slow for your use case. The honest caveat on M3: first-token latency is high. On long-context agentic runs, observed time-to-first-token has been in the ~37–38 second range. That is fine for a background agent grinding through a task queue. It is unacceptable for a chat UI where a user is watching a cursor blink.
Two implications:
- Good fit: batch processing, overnight agent runs, document pipelines, tool-call chains where total task time matters more than per-token responsiveness.
- Bad fit: interactive chatbots, autocomplete, anything with a human waiting on the first token.
Because M3 is new, throughput-under-load and uptime data are still thin. TokenMix.ai monitors live availability and latency across providers, and M3's reliability profile will firm up over the coming weeks on the models dashboard. Until then, build in fallback routing — if M3's endpoint stalls, you want a second model ready.
How to Access MiniMax M3
There are three practical paths:
- Direct via MiniMax. Create an account at platform.minimax.io, top up the wallet, and call the OpenAI-compatible endpoint. Cheapest per token if you stay under 512K and only need MiniMax. Requires managing a separate vendor account and balance.
- Self-host the weights. Because M3 is open, you can run it on your own infrastructure. Realistic only if you have serious GPU capacity — a 428B-param MoE is not a laptop model — but it removes per-token cost entirely and keeps data in-house.
- Through a unified gateway. Route M3 alongside other models with one key. Via TokenMix.ai's unified API, M3 is reachable through the same OpenAI-compatible interface as GPT, Claude, Gemini, Qwen, and DeepSeek — useful if you want fallback routing or already bill multiple models in one place. See the OpenAI-compatible API gateway guide for the SDK pattern.
All three speak the OpenAI Chat Completions format, so switching is usually a base-URL and model-name change, not a rewrite.
Who Should Use M3 (Decision Guide)
| If you need... | Choose | Why |
|---|---|---|
| Cheapest agentic tokens, latency-tolerant | MiniMax M3 (direct) | $0.30/$1.20 standard tier, unbeatable on price |
| One key across many models + fallback | M3 via TokenMix.ai | Unified billing, automatic failover |
| Low first-token latency for chat | Not M3 — use a fast model | M3's ~37s TTFT kills interactive UX |
| Long-context document analysis on a budget | M3 extended tier | 1M context, <$0.50 per 600K-token job |
| Verified, independently-benchmarked quality | Wait, or use Opus/GPT | M3 scores are self-reported as of 2026-06-15 |
| Data fully in-house | Self-host M3 weights | Open license permits it |
Final Recommendation
Based on the data as of 2026-06-15: MiniMax M3 is the best price-per-capability bet on the market for cost-sensitive agentic workloads — provided you can tolerate slow first-token latency and you treat its benchmarks as unverified. At $0.30/$1.20 standard, it undercuts every frontier model by an order of magnitude, ships open weights, and offers a native 1M context. For batch agents, document pipelines, and high-volume tool-use, it is the obvious value pick.
Two hard caveats keep it from being a blanket recommendation: there is no independent benchmark yet, and it is too slow for interactive chat. Run a pilot on your real workload, measure latency against your SLA, and keep a fallback model wired in. You can compare M3's live price and availability against 300+ other models on TokenMix.ai before committing.
FAQ
How much does the MiniMax M3 API cost?
On the standard tier (≤512K context), M3 costs $0.30 per 1M input tokens and $1.20 per 1M output tokens, with cached input at $0.06. Requests over 512K bill at the extended rate of $0.60/$2.40. These are MiniMax's "permanent 50% off" launch prices per platform.minimax.io.
Is MiniMax M3 cheaper than GPT-5.5 or Claude Opus?
Dramatically. M3's standard output price ($1.20) is about 4% of GPT-5.5's ($30) and roughly 5% of Claude Opus 4.8's ($25). On a typical 50K-input/10K-output coding task, M3 costs about $0.027 versus $0.55 on GPT-5.5.
Are MiniMax M3's benchmarks reliable?
Not yet. As of 2026-06-15, all published scores (SWE-Bench Pro 59.0%, BrowseComp 83.5, etc.) are self-reported by MiniMax. No independent lab has replicated them. Treat them as directional until third-party data lands.
Does MiniMax M3 have open weights?
Yes. M3 ships under the MiniMax community license, so you can self-host it or rent it from third-party providers, not just MiniMax's own endpoint.
What is the MiniMax M3 context window?
M3 supports a native 1M-token context. The API enforces a minimum 512K allocation, and pricing splits at that 512K boundary into standard and extended tiers.
Is MiniMax M3 good for chatbots?
No. M3's first-token latency runs around 37–38 seconds on long agentic runs, which is too slow for interactive chat. It is built for batch and agent workloads where total task time matters more than responsiveness.
How do I access MiniMax M3 from outside China?
Three ways: directly via platform.minimax.io (OpenAI-compatible), self-hosting the open weights, or through a unified gateway like TokenMix.ai that exposes M3 via the same OpenAI-compatible interface as GPT, Claude, and Gemini.
Related Articles
- MiniMax M2.7 Review: Latest Flagship After M2.5's SWE-Bench Win (2026)
- MiniMax M2.5 Review: 80.2% SWE-Bench Verified at $0.28/M (2026)
- Best Chinese AI Models 2026: Kimi, DeepSeek, Qwen, GLM Compared
- GPT-5.5 (Spud) Released: $5/$30 API Pricing & Benchmarks 2026
- Cheapest LLM API 2026: Real Cost per Task (Not Per Token)
About TokenMix
TokenMix.ai is a neutral AI model intelligence platform. We independently track pricing, benchmarks, and API reliability for 300+ models, and provide a single OpenAI-compatible gateway to access them — often well below official rates. We don't represent any model vendor; our job is to tell developers what's actually true when they pick a model or an API. Check live prices on /models, compare plans on /pricing, or read the integration docs at /docs.