TokenMix Research Lab · 2026-06-15

Last Updated: 2026-06-15 Author: TokenMix Research Lab Data verified: 2026-06-15 — MiniMax official pricing (platform.minimax.io), MiniMax-M3 model card (Hugging Face), TokenMix.ai model tracker

MiniMax M3 API: Pricing, Benchmarks & How to Access (2026)

MiniMax M3 lands at $0.30 per 1M input / $1.20 per 1M output on the standard tier — roughly 6% of what GPT-5.5 charges, for an open-weight agentic model that MiniMax claims beats Claude Opus on web-browsing tasks. That is the headline. The catch: every benchmark you will see today is self-reported, and first-token latency is slow. This guide gives you the real numbers, the access paths, and where M3 actually makes sense.

Released June 1, 2026, M3 is MiniMax's first model on its new MSA ("Mixture of Sparse Attention") architecture — a ~428B-total / ~23B-active Mixture-of-Experts design with a native 1M-token context window. Based on MiniMax's official pricing, the launch price is a "permanent 50% off" rate, and the weights ship under the MiniMax community license, so you can self-host or rent it through a gateway. We pulled the live numbers and ran the cost math below.

Quick Verdict
Quick Comparison: M3 vs the Field
What MiniMax M3 Actually Is
MiniMax M3 API Pricing, Tier by Tier
Benchmarks: What's Claimed vs What's Verified
Cost per Task: Real Math
Latency and Reliability Caveats
How to Access MiniMax M3
Who Should Use M3 (Decision Guide)
Final Recommendation
FAQ
Related Articles

Quick Verdict

Status	Finding
✅ Confirmed	Released 2026-06-01; open weights under MiniMax community license; native 1M context; standard price $0.30/$1.20 per 1M tokens (50%-off launch rate)
✅ Confirmed	OpenAI-compatible API; available direct (platform.minimax.io) and through third-party gateways
🟡 Likely	Strong agentic/tool-use performance — MiniMax reports SWE-Bench Pro 59.0% and BrowseComp 83.5, but these are vendor numbers
⚠️ Risk	No independent benchmark exists yet. Treat every quality score as self-reported until third parties confirm
⚠️ Risk	High first-token latency (~37–38s observed on long-context agentic runs) — bad fit for interactive chat, fine for batch/agent loops

The 40-60 word takeaway for anyone skimming: MiniMax M3 is the cheapest serious agentic model on the market right now — about 6% of GPT-5.5's per-token price — with open weights and a 1M context. The trade-offs are unverified benchmarks and slow first-token latency. Use it for cost-sensitive agent and batch workloads, not low-latency chat.

Quick Comparison: M3 vs the Field

Model	Input $/1M	Output $/1M	Context	Weights	Best for
MiniMax M3 (≤512K)	$0.30	$1.20	1M	Open	Cheap agents, batch
MiniMax M3 (>512K)	$0.60	$2.40	1M	Open	Long-context jobs
GPT-5.5	$5.00	$30.00	—	Closed	Frontier reasoning
Claude Opus 4.8	$5.00	$25.00	—	Closed	Agentic coding
Qwen 3.7 Max	$2.50	$7.50	1M	Closed	Multilingual reasoning

Pricing per MiniMax, the GPT-5.5 release pricing we tracked, and Alibaba/Anthropic list rates. M3's standard input price is 1/16th of GPT-5.5's and 1/8th of Qwen 3.7 Max's. On output the gap is wider still.

What MiniMax M3 Actually Is

M3 is a sparse Mixture-of-Experts model. Per the MiniMax-M3 model card on Hugging Face, it carries roughly 428B total parameters with about 23B active per token — a much sparser activation ratio than M2.5/M2.7, which means lower inference cost for a given capacity.

The architectural headline is MSA (Mixture of Sparse Attention). Instead of full attention across the whole context, M3 routes attention sparsely, which is how MiniMax keeps a 1M-token native window affordable. One practical note: the API enforces a minimum context allocation of 512K, which is why the pricing splits at the 512K boundary (more on that below).

Three things matter for buyers:

Open weights. M3 ships under the MiniMax community license. You can self-host on your own GPUs, or rent it from any provider that hosts the weights — you are not locked to MiniMax's endpoint.
Text-only, agent-tuned. M3 is a text/tool model, not multimodal. MiniMax positions it squarely at agentic workloads: long tool-call chains, browser automation, terminal tasks.
New architecture = thin track record. MSA is new. There is no independent latency or quality data outside MiniMax's own reports yet, so early adopters are also early testers.

On TokenMix.ai's model tracker, M3 shows up with its live gateway price and a 1M-token context flag, alongside its M2.x predecessors for side-by-side comparison.

MiniMax M3 API Pricing, Tier by Tier

Based on MiniMax's official pricing page, M3 uses a two-tier structure keyed to context size. The rates below are the current "permanent 50% off" launch prices.

Tier	Input $/1M	Cached input $/1M	Output $/1M
Standard (≤512K context)	$0.30	$0.06	$1.20
Extended (>512K context)	$0.60	$0.12	$2.40

A few things to read carefully:

The split is by context window, not by usage volume. If your request fits in 512K, you pay the standard rate. Push past 512K and the whole request bills at the extended rate.
Cached input is 5x cheaper ($0.06 vs $0.30). For agent loops that re-send the same system prompt and tool definitions every turn, prompt caching is the single biggest lever on your bill.
Output is 4x input, which is normal — but because M3's absolute output price ($1.20) is so low, output-heavy workloads stay cheap in absolute terms.

For comparison, Anthropic lists Claude Opus 4.8 at $5/$25 and OpenAI's GPT-5.5 sits at $5/$30. On output, M3's standard tier is ~25x cheaper than GPT-5.5 and ~21x cheaper than Opus 4.8.

What you pay through a gateway

Direct pricing assumes you hold a MiniMax account and top up its wallet. If you route M3 through a unified gateway instead, the rate differs slightly. On TokenMix.ai, M3 is listed at $0.587 per 1M input / $2.35 per 1M output — which tracks MiniMax's extended (>512K) tier and folds 300+ other models into one OpenAI-compatible key and one invoice. If your jobs stay under 512K and you only need MiniMax, going direct is cheapest; if you switch models often or want one bill, the gateway saves the operational overhead. We say which is cheaper for which case in the decision guide.

Benchmarks: What's Claimed vs What's Verified

This is the section to read slowly. As of 2026-06-15, no independent lab has published quality benchmarks for M3. Everything in the table below comes from MiniMax's own launch materials. We are reporting them as claims, not facts.

Benchmark	MiniMax M3 (claimed)	Reference point
SWE-Bench Pro	59.0%	Hard tier of SWE-Bench
Terminal-Bench 2.1	66.0%	Terminal/shell agent tasks
MCP Atlas	74.2%	Tool-use via Model Context Protocol
OSWorld	70.06%	Computer-use / GUI agents
BrowseComp	83.5	Web browsing — MiniMax cites Opus 4.7 at 79.3

The BrowseComp claim is the one that will travel: MiniMax says M3 outscores Claude Opus 4.7 on web browsing. Maybe. But "vendor benchmark beats competitor's older model" is exactly the kind of claim that needs third-party replication before you bet a production agent on it.

Our position, consistent with how TokenMix.ai treats every launch: discount self-reported scores by default. The numbers suggest M3 is genuinely capable at agentic tasks — the architecture and pricing back that up — but until an independent SWE-Bench or Terminal-Bench run lands, treat the percentages as direction, not magnitude. When independent data appears, it will show on the M3 model page.

Cost per Task: Real Math

Per-token prices are abstract. Here is what M3 actually costs on three common workloads, using the standard ($0.30/$1.20) tier and comparing to GPT-5.5 ($5/$30).

Workload 1 — a single coding-agent task (50K input, 10K output):

Model	Calculation	Cost	vs GPT-5.5
MiniMax M3 (standard)	(50K×$0.30 + 10K×$1.20)/1M	$0.027	4.9%
MiniMax M3 (via TokenMix)	(50K×$0.587 + 10K×$2.35)/1M	$0.053	9.6%
Claude Opus 4.8	(50K×$5 + 10K×$25)/1M	$0.50	91%
GPT-5.5	(50K×$5 + 10K×$30)/1M	$0.55	100%

On this shape of task, M3's standard tier is ~4.9% of the GPT-5.5 cost — a 20x saving.

Workload 2 — a high-volume agent fleet (per day: 200M input, 40M output):

M3: (200M × $0.30 + 40M × $1.20) / 1M = $60 + $48 = $108/day (~$3,240/month)
GPT-5.5: (200M × $5 + 40M × $30) / 1M = $1,000 + $1,200 = $2,200/day (~$66,000/month)
With prompt caching on the repeated system prompt (say 60% of input cached at $0.06), M3 drops further to roughly $79/day.

Workload 3 — a long-context document job (600K input, 50K output, extended tier):

M3 extended: (600K × $0.60 + 50K × $2.40) / 1M = $0.36 + $0.12 = $0.48 per document
Even at the doubled extended rate, a full 600K-token analysis costs under fifty cents.

The pattern is consistent: M3 turns workloads that were "too expensive to run at scale" on frontier models into routine line items. That is the whole pitch. For a structured way to model this across providers, see our AI API pricing calculator guide.

Latency and Reliability Caveats

Cheap tokens are worthless if the model is too slow for your use case. The honest caveat on M3: first-token latency is high. On long-context agentic runs, observed time-to-first-token has been in the ~37–38 second range. That is fine for a background agent grinding through a task queue. It is unacceptable for a chat UI where a user is watching a cursor blink.

Two implications:

Good fit: batch processing, overnight agent runs, document pipelines, tool-call chains where total task time matters more than per-token responsiveness.
Bad fit: interactive chatbots, autocomplete, anything with a human waiting on the first token.

Because M3 is new, throughput-under-load and uptime data are still thin. TokenMix.ai monitors live availability and latency across providers, and M3's reliability profile will firm up over the coming weeks on the models dashboard. Until then, build in fallback routing — if M3's endpoint stalls, you want a second model ready.

How to Access MiniMax M3

There are three practical paths:

Direct via MiniMax. Create an account at platform.minimax.io, top up the wallet, and call the OpenAI-compatible endpoint. Cheapest per token if you stay under 512K and only need MiniMax. Requires managing a separate vendor account and balance.
Self-host the weights. Because M3 is open, you can run it on your own infrastructure. Realistic only if you have serious GPU capacity — a 428B-param MoE is not a laptop model — but it removes per-token cost entirely and keeps data in-house.
Through a unified gateway. Route M3 alongside other models with one key. Via TokenMix.ai's unified API, M3 is reachable through the same OpenAI-compatible interface as GPT, Claude, Gemini, Qwen, and DeepSeek — useful if you want fallback routing or already bill multiple models in one place. See the OpenAI-compatible API gateway guide for the SDK pattern.

All three speak the OpenAI Chat Completions format, so switching is usually a base-URL and model-name change, not a rewrite.

Who Should Use M3 (Decision Guide)

If you need...	Choose	Why
Cheapest agentic tokens, latency-tolerant	MiniMax M3 (direct)	$0.30/$1.20 standard tier, unbeatable on price
One key across many models + fallback	M3 via TokenMix.ai	Unified billing, automatic failover
Low first-token latency for chat	Not M3 — use a fast model	M3's ~37s TTFT kills interactive UX
Long-context document analysis on a budget	M3 extended tier	1M context, <$0.50 per 600K-token job
Verified, independently-benchmarked quality	Wait, or use Opus/GPT	M3 scores are self-reported as of 2026-06-15
Data fully in-house	Self-host M3 weights	Open license permits it

Final Recommendation

Based on the data as of 2026-06-15: MiniMax M3 is the best price-per-capability bet on the market for cost-sensitive agentic workloads — provided you can tolerate slow first-token latency and you treat its benchmarks as unverified. At $0.30/$1.20 standard, it undercuts every frontier model by an order of magnitude, ships open weights, and offers a native 1M context. For batch agents, document pipelines, and high-volume tool-use, it is the obvious value pick.

Two hard caveats keep it from being a blanket recommendation: there is no independent benchmark yet, and it is too slow for interactive chat. Run a pilot on your real workload, measure latency against your SLA, and keep a fallback model wired in. You can compare M3's live price and availability against 300+ other models on TokenMix.ai before committing.

FAQ

How much does the MiniMax M3 API cost?

On the standard tier (≤512K context), M3 costs $0.30 per 1M input tokens and $1.20 per 1M output tokens, with cached input at $0.06. Requests over 512K bill at the extended rate of $0.60/$2.40. These are MiniMax's "permanent 50% off" launch prices per platform.minimax.io.

Is MiniMax M3 cheaper than GPT-5.5 or Claude Opus?

Dramatically. M3's standard output price ($1.20) is about 4% of GPT-5.5's ($30) and roughly 5% of Claude Opus 4.8's ($25). On a typical 50K-input/10K-output coding task, M3 costs about $0.027 versus $0.55 on GPT-5.5.

Are MiniMax M3's benchmarks reliable?

Not yet. As of 2026-06-15, all published scores (SWE-Bench Pro 59.0%, BrowseComp 83.5, etc.) are self-reported by MiniMax. No independent lab has replicated them. Treat them as directional until third-party data lands.

Does MiniMax M3 have open weights?

Yes. M3 ships under the MiniMax community license, so you can self-host it or rent it from third-party providers, not just MiniMax's own endpoint.

What is the MiniMax M3 context window?

M3 supports a native 1M-token context. The API enforces a minimum 512K allocation, and pricing splits at that 512K boundary into standard and extended tiers.

Is MiniMax M3 good for chatbots?

No. M3's first-token latency runs around 37–38 seconds on long agentic runs, which is too slow for interactive chat. It is built for batch and agent workloads where total task time matters more than responsiveness.

How do I access MiniMax M3 from outside China?

Three ways: directly via platform.minimax.io (OpenAI-compatible), self-hosting the open weights, or through a unified gateway like TokenMix.ai that exposes M3 via the same OpenAI-compatible interface as GPT, Claude, and Gemini.

About TokenMix

TokenMix.ai is a neutral AI model intelligence platform. We independently track pricing, benchmarks, and API reliability for 300+ models, and provide a single OpenAI-compatible gateway to access them — often well below official rates. We don't represent any model vendor; our job is to tell developers what's actually true when they pick a model or an API. Check live prices on /models, compare plans on /pricing, or read the integration docs at /docs.