TokenMix Research Lab · 2026-06-15

MiniMax M3 API: Pricing, Benchmarks & How to Access (2026)

Last Updated: 2026-06-15 Author: TokenMix Research Lab Data verified: 2026-06-15 — MiniMax official pricing (platform.minimax.io), MiniMax-M3 model card (Hugging Face), TokenMix.ai model tracker

MiniMax M3 API: Pricing, Benchmarks & How to Access (2026)

MiniMax M3 lands at $0.30 per 1M input / $1.20 per 1M output on the standard tier — roughly 6% of what GPT-5.5 charges, for an open-weight agentic model that MiniMax claims beats Claude Opus on web-browsing tasks. That is the headline. The catch: every benchmark you will see today is self-reported, and first-token latency is slow. This guide gives you the real numbers, the access paths, and where M3 actually makes sense.

Released June 1, 2026, M3 is MiniMax's first model on its new MSA ("Mixture of Sparse Attention") architecture — a ~428B-total / ~23B-active Mixture-of-Experts design with a native 1M-token context window. Based on MiniMax's official pricing, the launch price is a "permanent 50% off" rate, and the weights ship under the MiniMax community license, so you can self-host or rent it through a gateway. We pulled the live numbers and ran the cost math below.

Table of Contents

Quick Verdict

Status Finding
✅ Confirmed Released 2026-06-01; open weights under MiniMax community license; native 1M context; standard price $0.30/$1.20 per 1M tokens (50%-off launch rate)
✅ Confirmed OpenAI-compatible API; available direct (platform.minimax.io) and through third-party gateways
🟡 Likely Strong agentic/tool-use performance — MiniMax reports SWE-Bench Pro 59.0% and BrowseComp 83.5, but these are vendor numbers
⚠️ Risk No independent benchmark exists yet. Treat every quality score as self-reported until third parties confirm
⚠️ Risk High first-token latency (~37–38s observed on long-context agentic runs) — bad fit for interactive chat, fine for batch/agent loops

The 40-60 word takeaway for anyone skimming: MiniMax M3 is the cheapest serious agentic model on the market right now — about 6% of GPT-5.5's per-token price — with open weights and a 1M context. The trade-offs are unverified benchmarks and slow first-token latency. Use it for cost-sensitive agent and batch workloads, not low-latency chat.

Quick Comparison: M3 vs the Field

Model Input $/1M Output $/1M Context Weights Best for
MiniMax M3 (≤512K) $0.30 $1.20 1M Open Cheap agents, batch
MiniMax M3 (>512K) $0.60 $2.40 1M Open Long-context jobs
GPT-5.5 $5.00 $30.00 Closed Frontier reasoning
Claude Opus 4.8 $5.00 $25.00 Closed Agentic coding
Qwen 3.7 Max $2.50 $7.50 1M Closed Multilingual reasoning

Pricing per MiniMax, the GPT-5.5 release pricing we tracked, and Alibaba/Anthropic list rates. M3's standard input price is 1/16th of GPT-5.5's and 1/8th of Qwen 3.7 Max's. On output the gap is wider still.

What MiniMax M3 Actually Is

M3 is a sparse Mixture-of-Experts model. Per the MiniMax-M3 model card on Hugging Face, it carries roughly 428B total parameters with about 23B active per token — a much sparser activation ratio than M2.5/M2.7, which means lower inference cost for a given capacity.

The architectural headline is MSA (Mixture of Sparse Attention). Instead of full attention across the whole context, M3 routes attention sparsely, which is how MiniMax keeps a 1M-token native window affordable. One practical note: the API enforces a minimum context allocation of 512K, which is why the pricing splits at the 512K boundary (more on that below).

Three things matter for buyers:

On TokenMix.ai's model tracker, M3 shows up with its live gateway price and a 1M-token context flag, alongside its M2.x predecessors for side-by-side comparison.

MiniMax M3 API Pricing, Tier by Tier

Based on MiniMax's official pricing page, M3 uses a two-tier structure keyed to context size. The rates below are the current "permanent 50% off" launch prices.

Tier Input $/1M Cached input $/1M Output $/1M
Standard (≤512K context) $0.30 $0.06 $1.20
Extended (>512K context) $0.60 $0.12 $2.40

A few things to read carefully:

For comparison, Anthropic lists Claude Opus 4.8 at $5/$25 and OpenAI's GPT-5.5 sits at $5/$30. On output, M3's standard tier is ~25x cheaper than GPT-5.5 and ~21x cheaper than Opus 4.8.

What you pay through a gateway

Direct pricing assumes you hold a MiniMax account and top up its wallet. If you route M3 through a unified gateway instead, the rate differs slightly. On TokenMix.ai, M3 is listed at $0.587 per 1M input / $2.35 per 1M output — which tracks MiniMax's extended (>512K) tier and folds 300+ other models into one OpenAI-compatible key and one invoice. If your jobs stay under 512K and you only need MiniMax, going direct is cheapest; if you switch models often or want one bill, the gateway saves the operational overhead. We say which is cheaper for which case in the decision guide.

Benchmarks: What's Claimed vs What's Verified

This is the section to read slowly. As of 2026-06-15, no independent lab has published quality benchmarks for M3. Everything in the table below comes from MiniMax's own launch materials. We are reporting them as claims, not facts.

Benchmark MiniMax M3 (claimed) Reference point
SWE-Bench Pro 59.0% Hard tier of SWE-Bench
Terminal-Bench 2.1 66.0% Terminal/shell agent tasks
MCP Atlas 74.2% Tool-use via Model Context Protocol
OSWorld 70.06% Computer-use / GUI agents
BrowseComp 83.5 Web browsing — MiniMax cites Opus 4.7 at 79.3

The BrowseComp claim is the one that will travel: MiniMax says M3 outscores Claude Opus 4.7 on web browsing. Maybe. But "vendor benchmark beats competitor's older model" is exactly the kind of claim that needs third-party replication before you bet a production agent on it.

Our position, consistent with how TokenMix.ai treats every launch: discount self-reported scores by default. The numbers suggest M3 is genuinely capable at agentic tasks — the architecture and pricing back that up — but until an independent SWE-Bench or Terminal-Bench run lands, treat the percentages as direction, not magnitude. When independent data appears, it will show on the M3 model page.

Cost per Task: Real Math

Per-token prices are abstract. Here is what M3 actually costs on three common workloads, using the standard ($0.30/$1.20) tier and comparing to GPT-5.5 ($5/$30).

Workload 1 — a single coding-agent task (50K input, 10K output):

Model Calculation Cost vs GPT-5.5
MiniMax M3 (standard) (50K×$0.30 + 10K×$1.20)/1M $0.027 4.9%
MiniMax M3 (via TokenMix) (50K×$0.587 + 10K×$2.35)/1M $0.053 9.6%
Claude Opus 4.8 (50K×$5 + 10K×$25)/1M $0.50 91%
GPT-5.5 (50K×$5 + 10K×$30)/1M $0.55 100%

On this shape of task, M3's standard tier is ~4.9% of the GPT-5.5 cost — a 20x saving.

Workload 2 — a high-volume agent fleet (per day: 200M input, 40M output):

Workload 3 — a long-context document job (600K input, 50K output, extended tier):

The pattern is consistent: M3 turns workloads that were "too expensive to run at scale" on frontier models into routine line items. That is the whole pitch. For a structured way to model this across providers, see our AI API pricing calculator guide.

Latency and Reliability Caveats

Cheap tokens are worthless if the model is too slow for your use case. The honest caveat on M3: first-token latency is high. On long-context agentic runs, observed time-to-first-token has been in the ~37–38 second range. That is fine for a background agent grinding through a task queue. It is unacceptable for a chat UI where a user is watching a cursor blink.

Two implications:

Because M3 is new, throughput-under-load and uptime data are still thin. TokenMix.ai monitors live availability and latency across providers, and M3's reliability profile will firm up over the coming weeks on the models dashboard. Until then, build in fallback routing — if M3's endpoint stalls, you want a second model ready.

How to Access MiniMax M3

There are three practical paths:

  1. Direct via MiniMax. Create an account at platform.minimax.io, top up the wallet, and call the OpenAI-compatible endpoint. Cheapest per token if you stay under 512K and only need MiniMax. Requires managing a separate vendor account and balance.
  2. Self-host the weights. Because M3 is open, you can run it on your own infrastructure. Realistic only if you have serious GPU capacity — a 428B-param MoE is not a laptop model — but it removes per-token cost entirely and keeps data in-house.
  3. Through a unified gateway. Route M3 alongside other models with one key. Via TokenMix.ai's unified API, M3 is reachable through the same OpenAI-compatible interface as GPT, Claude, Gemini, Qwen, and DeepSeek — useful if you want fallback routing or already bill multiple models in one place. See the OpenAI-compatible API gateway guide for the SDK pattern.

All three speak the OpenAI Chat Completions format, so switching is usually a base-URL and model-name change, not a rewrite.

Who Should Use M3 (Decision Guide)

If you need... Choose Why
Cheapest agentic tokens, latency-tolerant MiniMax M3 (direct) $0.30/$1.20 standard tier, unbeatable on price
One key across many models + fallback M3 via TokenMix.ai Unified billing, automatic failover
Low first-token latency for chat Not M3 — use a fast model M3's ~37s TTFT kills interactive UX
Long-context document analysis on a budget M3 extended tier 1M context, <$0.50 per 600K-token job
Verified, independently-benchmarked quality Wait, or use Opus/GPT M3 scores are self-reported as of 2026-06-15
Data fully in-house Self-host M3 weights Open license permits it

Final Recommendation

Based on the data as of 2026-06-15: MiniMax M3 is the best price-per-capability bet on the market for cost-sensitive agentic workloads — provided you can tolerate slow first-token latency and you treat its benchmarks as unverified. At $0.30/$1.20 standard, it undercuts every frontier model by an order of magnitude, ships open weights, and offers a native 1M context. For batch agents, document pipelines, and high-volume tool-use, it is the obvious value pick.

Two hard caveats keep it from being a blanket recommendation: there is no independent benchmark yet, and it is too slow for interactive chat. Run a pilot on your real workload, measure latency against your SLA, and keep a fallback model wired in. You can compare M3's live price and availability against 300+ other models on TokenMix.ai before committing.

FAQ

How much does the MiniMax M3 API cost?

On the standard tier (≤512K context), M3 costs $0.30 per 1M input tokens and $1.20 per 1M output tokens, with cached input at $0.06. Requests over 512K bill at the extended rate of $0.60/$2.40. These are MiniMax's "permanent 50% off" launch prices per platform.minimax.io.

Is MiniMax M3 cheaper than GPT-5.5 or Claude Opus?

Dramatically. M3's standard output price ($1.20) is about 4% of GPT-5.5's ($30) and roughly 5% of Claude Opus 4.8's ($25). On a typical 50K-input/10K-output coding task, M3 costs about $0.027 versus $0.55 on GPT-5.5.

Are MiniMax M3's benchmarks reliable?

Not yet. As of 2026-06-15, all published scores (SWE-Bench Pro 59.0%, BrowseComp 83.5, etc.) are self-reported by MiniMax. No independent lab has replicated them. Treat them as directional until third-party data lands.

Does MiniMax M3 have open weights?

Yes. M3 ships under the MiniMax community license, so you can self-host it or rent it from third-party providers, not just MiniMax's own endpoint.

What is the MiniMax M3 context window?

M3 supports a native 1M-token context. The API enforces a minimum 512K allocation, and pricing splits at that 512K boundary into standard and extended tiers.

Is MiniMax M3 good for chatbots?

No. M3's first-token latency runs around 37–38 seconds on long agentic runs, which is too slow for interactive chat. It is built for batch and agent workloads where total task time matters more than responsiveness.

How do I access MiniMax M3 from outside China?

Three ways: directly via platform.minimax.io (OpenAI-compatible), self-hosting the open weights, or through a unified gateway like TokenMix.ai that exposes M3 via the same OpenAI-compatible interface as GPT, Claude, and Gemini.

Related Articles


About TokenMix

TokenMix.ai is a neutral AI model intelligence platform. We independently track pricing, benchmarks, and API reliability for 300+ models, and provide a single OpenAI-compatible gateway to access them — often well below official rates. We don't represent any model vendor; our job is to tell developers what's actually true when they pick a model or an API. Check live prices on /models, compare plans on /pricing, or read the integration docs at /docs.