MiniMax M3 Ships a Million-Token Context in Open Weights

On June 7, 2026, the weights for MiniMax M3 landed on Hugging Face, about a week after the API opened. That lag matters: model launches often happen in layers (teaser, API availability, router listings, benchmarks, and then open weights), and a lot of the early coverage treated the API launch as the full story. Now that the weights are actually there, the release is worth a proper look.

MiniMax, the Shanghai-based AI lab, released M3 on June 1, 2026. It is the first open-weight model to combine frontier-level coding performance, a 1-million-token context window, and native multimodal capabilities - image and video understanding - in a single architecture. That combination had previously been the exclusive territory of closed APIs.

What is actually new

The genuine engineering story in M3 is the attention mechanism. MiniMax itself dropped sparse attention for the entire M2 generation in favor of full attention, calling the infrastructure "not yet mature" at the time. That it has now returned to sparse attention with a new design called MiniMax Sparse Attention (MSA), and is reporting order-of-magnitude speedups, suggests the technology has caught up.

The mechanism works like this: MSA uses a lightweight index branch to scan incoming tokens and select which blocks of past tokens require attention, then runs attention only on those relevant blocks. At a 1-million-token context length, MSA reduces per-token compute to one-twentieth of the prior generation's and delivers more than 9x faster prefill and more than 15x faster decoding.

That is not a marginal improvement. A million-token window has existed on paper in various models, but the cost of actually using it has made it a spec-sheet number rather than a practical tool. The more precise figures, roughly 15.6x faster decoding and 9.7x faster prefill at 1M context compared to MiniMax M2, are what make the window affordable to use in practice.
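To make the block-selection idea concrete, here is a minimal pure-Python sketch of block-sparse attention for a single query. It is not MiniMax's implementation: the function names, the choice to score each block by its mean key, and the `block_size` and `top_k` parameters are all assumptions made for illustration. The article only says that an index branch picks blocks and attention runs on those blocks.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_blocks(query, keys, block_size, top_k):
    """Assumed index branch: score each block by its mean key, keep the top_k blocks."""
    n_blocks = math.ceil(len(keys) / block_size)
    scored = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scored.append((dot(query, mean_key), b))
    scored.sort(reverse=True)
    return sorted(b for _, b in scored[:top_k])

def sparse_attention(query, keys, values, block_size, top_k):
    """Run softmax attention only over tokens inside the selected blocks."""
    blocks = select_blocks(query, keys, block_size, top_k)
    idx = [i for b in blocks
           for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in idx])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) for d in range(dim)]
    return out, blocks
```

In this sketch the expensive attention step touches at most `top_k * block_size` tokens however long the context grows; only the cheap per-block scoring scales with context length. That is the general reason a design of this kind can cut per-token compute, though the real index branch is presumably learned rather than a simple mean.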

The model card on Hugging Face confirms the architecture: MiniMax-M3 is a native multimodal model with 1M context, approximately 428B total parameters, and approximately 23B activated parameters. So it is a Mixture-of-Experts design - a large model that activates a small fraction of its weights per token, which is how you make something this size run at anything close to a reasonable speed.
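A quick back-of-the-envelope calculation shows what those two parameter counts imply. The parameter figures come from the model card as quoted above; the bytes-per-parameter values are assumptions, since the precision the weights ship in is not stated here.

```python
TOTAL_PARAMS = 428e9   # approximate total parameters, from the model card
ACTIVE_PARAMS = 23e9   # approximate activated parameters per token, from the model card

def active_fraction(total, active):
    """Share of the weights that participate in computing one token."""
    return active / total

def weight_memory_gb(params, bytes_per_param):
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return params * bytes_per_param / 1e9

frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"active per token: {frac:.1%}")

# Assumed storage precisions; every expert must stay resident even if rarely used.
for name, nbytes in [("16-bit", 2), ("8-bit", 1)]:
    print(f"{name} weights: {weight_memory_gb(TOTAL_PARAMS, nbytes):.0f} GB")
```

Roughly 5.4% of the weights are active per token, which is where the speed comes from. The memory line is the catch for self-hosting: under the assumed precisions the full set of weights still needs on the order of 428 to 856 GB before counting the KV cache.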

The real capability claim is long-horizon persistence. In one of MiniMax's internal reproduction experiments, M3 worked for nearly twelve hours without intervention, produced 18 commits and 23 figures, and confirmed the key findings of the paper it was reproducing. Whether that translates to your actual workflows is a different question.

The benchmark numbers, treated honestly

The model scores 59.0% on SWE-Bench Pro, surpassing both OpenAI GPT-5.5 and Google Gemini 3.1 Pro on this widely used software engineering benchmark. That is the headline, and it earned attention.

Here is the caveat: several benchmark results were obtained on MiniMax's own infrastructure using agent scaffolding such as Claude Code and Mini-SWE-Agent. Independent third-party verification is still pending, and M3 has not yet appeared on the DeepSWE board for long-horizon software tasks.

There is a second, quieter caveat. MiniMax compared M3 against Claude Opus 4.7, even though Anthropic had already shipped Opus 4.8 a week earlier. Against the newer Opus 4.8, the coding gap is wider.

One known soft spot: abstract, fluid reasoning. The whole family of Chinese models has lagged here, and ARC-AGI-2 results earlier this year had the MiniMax line scoring in the low single digits. M3 is a strong coder and a strong agent. It is not, on the available evidence, a great abstract reasoner.

The pricing gap is real

The cost gap is substantial: MiniMax M3 API input pricing starts at approximately $0.30 per million tokens, while Claude Opus 4.7 charges $5.00 per million input tokens and $25.00 per million output tokens.

For agentic coding tasks - the use case M3 is designed for - that difference compounds fast. A coding agent that makes thousands of tool calls, re-reads growing context, and retries failed steps is not a chat app. Token counts multiply. At a tenth to a twentieth of the per-token cost of a closed frontier model, M3's economics look genuinely different, not just marginally better.
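The compounding is easy to see with a rough calculation. The sketch below uses the two input prices quoted above and a made-up workload: the base context, growth per step, and step count are hypothetical, and it ignores output tokens and any prompt caching, since M3's output price is not given here.

```python
M3_INPUT_PER_M = 0.30    # USD per million input tokens (approximate, from the text)
OPUS_INPUT_PER_M = 5.00  # USD per million input tokens (from the text)

def agent_input_tokens(base_context, tokens_added_per_step, steps):
    """Total input tokens when every step re-reads the whole context so far."""
    return sum(base_context + i * tokens_added_per_step for i in range(steps))

def input_cost_usd(tokens, price_per_million):
    return tokens / 1_000_000 * price_per_million

# Hypothetical agent run: 20k-token starting context, 2k tokens added per step, 200 steps.
tokens = agent_input_tokens(base_context=20_000, tokens_added_per_step=2_000, steps=200)
m3_cost = input_cost_usd(tokens, M3_INPUT_PER_M)
opus_cost = input_cost_usd(tokens, OPUS_INPUT_PER_M)

print(f"input tokens: {tokens:,}")
print(f"M3: ${m3_cost:.2f}  Opus 4.7: ${opus_cost:.2f}  ratio: {opus_cost / m3_cost:.1f}x")
```

With these invented numbers, a single 200-step run consumes 43.8 million input tokens: about $13 at M3's listed input price versus $219 at Opus 4.7's. The ratio is fixed by the prices, but the absolute gap grows with the square of the step count because each step re-reads everything before it.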

The license situation

This is where you need to slow down. MiniMax's relationship with licensing has been contentious. MiniMax-M2 shipped under a modified-MIT license that blocked commercial use without written permission. If M3 follows that precedent, expect downloadable weights with a non-commercial default and enterprise licensing available through direct sales.

A Hugging Face discussion on the M3 model card shows the same friction playing out again: the definition of commercial use is still ambiguous, with the examples saying one thing and the definition another. Under a strict interpretation, almost every use requires permission and triggers a "Built with MiniMax M3" display requirement.

If you want to self-host for commercial use, read the actual license text before you build anything around it. The weights being downloadable and the weights being usable for your product are not the same thing.

What to test before you commit

The practical question for any team looking at M3 is whether the 1M context actually holds up on their workloads. Long context helps only if the model can use it well. A 1M-token window is not valuable if the model loses the key instruction or misses the relevant file.

Agent teams should test context stability, multimodal grounding, tool-use reliability, thinking-versus-non-thinking economics, and API-versus-self-host tradeoffs before swapping production workloads.
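For the context-stability item, one common approach is a depth sweep: plant a known fact at different positions in a long prompt and check whether the model retrieves it. The harness below is a minimal sketch under stated assumptions. `ask_model` is a hypothetical callable you would replace with your own API or self-hosted client, and `stub_model` is a deliberately weak stand-in (it only "sees" the last 2,000 characters) so the example runs without any network access.

```python
def build_prompt(needle, filler_sentence, total_sentences, depth):
    """Place `needle` at a relative depth (0.0 = start, 1.0 = end) inside filler text."""
    position = round(depth * total_sentences)
    sentences = [filler_sentence] * total_sentences
    sentences.insert(position, needle)
    return " ".join(sentences) + "\n\nQuestion: What is the deployment passphrase?"

def run_depth_sweep(ask_model, secret, depths, total_sentences=1000):
    """Return {depth: bool} recording whether each reply contains the secret."""
    needle = f"The deployment passphrase is {secret}."
    filler = "The build finished without warnings."
    results = {}
    for depth in depths:
        prompt = build_prompt(needle, filler, total_sentences, depth)
        results[depth] = secret in ask_model(prompt)
    return results

def stub_model(prompt):
    """Stand-in for a real model call: only 'reads' the last 2,000 characters."""
    visible = prompt[-2000:]
    marker = "passphrase is "
    if marker in visible:
        start = visible.index(marker) + len(marker)
        return visible[start:].split(".")[0]
    return "I could not find it."

results = run_depth_sweep(stub_model, "amber-falcon-42", depths=[0.0, 0.5, 1.0])
print(results)
```

The stub fails at every depth except the very end, which is the failure pattern you are looking for in a real model, just exaggerated. For a meaningful test, scale `total_sentences` toward your actual context sizes and use needles drawn from your own codebase or documents rather than a toy passphrase.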

Three things would settle M3's real standing: independent benchmark runs that strip away MiniMax's own scaffolding; the model appearing on neutral leaderboards like DeepSWE; and the license terms being clarified so that commercial deployments have a clear path.

The architecture story is real. The cost story is real. The benchmark story needs external corroboration. The license story needs a lawyer's eye if you are building commercially. That is an honest summary of where M3 stands today.