MiniMax M3 is a MiniMax model available through Merge Gateway. Use it with Gateway routing policies, spend controls, request logs, and a 512,000 token context window. It supports streaming, tool calling, vision through at least one Gateway vendor route.

MiniMax M3 pricing
Test MiniMax M3 with Merge Gateway’s Simulator

Ready to try it out?
Start routing requests to hundreds of large language models in your product within minutes.

Route requests to MiniMax M3 with Merge Gateway
1{
2 "mcpServers": {
3 "agent-handler": {
4 "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5 "headers": {
6 "Authorization": "Bearer yMt*****"
7 }
8 }
9 }
10}
111{
2 "mcpServers": {
3 "agent-handler": {
4 "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5 "headers": {
6 "Authorization": "Bearer yMt*****"
7 }
8 }
9 }
10}
111{
2 "mcpServers": {
3 "agent-handler": {
4 "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5 "headers": {
6 "Authorization": "Bearer yMt*****"
7 }
8 }
9 }
10}
111{
2 "mcpServers": {
3 "agent-handler": {
4 "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5 "headers": {
6 "Authorization": "Bearer yMt*****"
7 }
8 }
9 }
10}
111{
2 "mcpServers": {
3 "agent-handler": {
4 "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5 "headers": {
6 "Authorization": "Bearer yMt*****"
7 }
8 }
9 }
10}
111{
2 "mcpServers": {
3 "agent-handler": {
4 "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5 "headers": {
6 "Authorization": "Bearer yMt*****"
7 }
8 }
9 }
10}
111{
2 "mcpServers": {
3 "agent-handler": {
4 "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5 "headers": {
6 "Authorization": "Bearer yMt*****"
7 }
8 }
9 }
10}
111{
2 "mcpServers": {
3 "agent-handler": {
4 "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5 "headers": {
6 "Authorization": "Bearer yMt*****"
7 }
8 }
9 }
10}
111{
2 "mcpServers": {
3 "agent-handler": {
4 "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5 "headers": {
6 "Authorization": "Bearer yMt*****"
7 }
8 }
9 }
10}
111{
2 "mcpServers": {
3 "agent-handler": {
4 "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5 "headers": {
6 "Authorization": "Bearer yMt*****"
7 }
8 }
9 }
10}
111{
2 "mcpServers": {
3 "agent-handler": {
4 "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5 "headers": {
6 "Authorization": "Bearer yMt*****"
7 }
8 }
9 }
10}
111{
2 "mcpServers": {
3 "agent-handler": {
4 "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5 "headers": {
6 "Authorization": "Bearer yMt*****"
7 }
8 }
9 }
10}
11Explore other models available in Merge Gateway
MiniMax M3 FAQ
Heading
What other models does MiniMax offer?
MiniMax M3 is the company's latest flagship model, released June 1, 2026, and represents a major step up from the earlier M-series. Here are some other models MiniMax supports:
- MiniMax M2.7: MiniMax M2.7 is the predecessor to M3, offering standard and high-speed inference variants (M2.7 and M2.7-highspeed) for text generation tasks, with lower capability than M3 but more broadly available across inference providers at launch
- MiniMax M2.7-Highspeed: MiniMax M2.7-Highspeed is the faster-inference variant of M2.7, optimized for latency-sensitive workloads where output speed takes priority over maximum capability, useful for real-time applications that do not require M3's frontier-level coding or long-context performance
- MiniMax M2.5: MiniMax M2.5 is an earlier M-series model, available in standard and high-speed variants, suited for teams on older integrations or cost-constrained deployments that do not yet require the 1M-token context or multimodal capabilities introduced in M3
How does MiniMax M3 differ from MiniMax's other models?
MiniMax M3 is a generational leap over prior M-series models, combining frontier-level coding, a 1-million-token context window, and native multimodality in a single open-weight model for the first time.
- Context window: MiniMax M3 supports a 1M-token context window with a guaranteed minimum of 512k tokens. Earlier models in the M-series did not offer 1M-token context, making M3 the only MiniMax model suitable for full-document processing, large codebase ingestion, or extended multi-turn agent sessions
- Multimodal input: M3 accepts text, image, and video inputs and produces text output. Prior M-series models were primarily text-focused, requiring separate pipelines or third-party models for any visual input processing
- Architecture efficiency: M3 is built on MiniMax Sparse Attention (MSA), which reduces per-token compute at 1M context to one-twentieth of the prior generation while delivering over 9x faster prefill and more than 15x faster decoding compared to M2.x at equivalent context lengths
- Coding and agentic benchmarks: M3 scores 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, and 83.5 on BrowseComp. These benchmarks were not a focus area for earlier M-series models, which were positioned as general-purpose chat and instruction-following models rather than agentic coding tools
- Pricing: M3 launched at $0.60 per 1M input tokens and $2.40 per 1M output tokens, with a 50% promotional discount active at launch. For input contexts exceeding 512k tokens, pricing increases to $1.20 input and $4.80 output per 1M tokens, reflecting the higher compute cost of very long contexts
M3 is best suited for teams building agentic systems, coding pipelines, or long-context retrieval applications that previously required multiple separate models to handle coding, vision, and extended-context tasks simultaneously.
What models should I consider using alongside MiniMax M3?
No single model is optimal for every task. Here are models worth pairing with MiniMax M3 depending on what your product needs:
- Claude Opus 4.7 (Anthropic): For tasks requiring the absolute highest reasoning accuracy, particularly in science, mathematics, or multi-step logic where MiniMax M3 approaches but has not yet surpassed Opus 4.7 on independent benchmarks, routing to Opus 4.7 provides a well-established reliability reference point from a different provider
- Qwen3.7 Max (Alibaba): For cost-sensitive agentic and coding tasks where M3's output verbosity at long context increases costs, Qwen3.7 Max at $2.50 per 1M input and $7.50 per 1M output provides comparable coding-era benchmark performance at a different pricing structure. Use it as a swap when M3's >512k context pricing tier applies
- GPT-4o mini (OpenAI): For high-frequency, lightweight tasks such as short-form generation, classification, or simple extraction where M3's 1M-token architecture is more than the task needs, GPT-4o mini offers consistent, low-latency output at a fraction of the per-token cost
- Gemini 3.1 Pro (Google): For multimodal tasks involving real-time audio or complex video understanding at scale, Gemini 3.1 Pro brings Google's production-grade multimodal infrastructure. MiniMax M3's video support is strong but newer and less independently verified at enterprise-scale multimodal throughput
- Llama 3.3 70B (Meta): For self-hosted, license-unrestricted inference on text-only workloads where privacy requirements prevent sending data to any hosted API, Llama 3.3 70B provides a mature, widely-deployed open-weight alternative that complements M3 for the subset of requests you must keep fully on-premise
What are the challenges of using MiniMax M3 in my product?
Like any production LLM, MiniMax M3 comes with tradeoffs worth planning for:
- Independent benchmark verification pending: Several of M3's headline benchmark scores, including the SWE-Bench Pro and Terminal-Bench results, were run on MiniMax's own infrastructure at launch. Independent third-party verification of these figures is still underway, which means production planning should treat some benchmark claims as preliminary until external evaluation confirms them
- Pricing step-up at long context: Input contexts exceeding 512k tokens trigger a pricing increase to $1.20 per 1M input and $4.80 per 1M output. For workflows that regularly use the full 1M-token window, this can more than double the effective input cost compared to the base tier. Budget planning must account for context length distribution in your traffic
- New provider maturity: MiniMax is a less established provider than Anthropic, OpenAI, or Google in terms of API uptime history, rate limit documentation, and enterprise SLA availability. Teams requiring guaranteed throughput commitments should verify current SLA terms before relying on MiniMax M3 as a sole production endpoint
- Provider dependency: Relying on MiniMax as a single provider creates fragility when the provider has an outage or deprecates a model version. As a newer entrant, MiniMax's deprecation cadence and migration support processes are less well-documented than those of more established providers
- Cost at scale: At $2.40 per 1M output tokens at standard pricing, M3 output costs compound quickly on high-volume workloads with long completions. Without output length controls or per-project budget limits, large agentic sessions with many tool calls can generate unexpectedly high spend
Why should I use Merge Gateway to route LLM requests with MiniMax M3 and every other model?
Using MiniMax M3 through Merge Gateway gives you access to the model itself and the infrastructure layer around it:
- Intelligent routing and automatic failover: Merge routes around MiniMax outages automatically. Given MiniMax's newer provider status, having automatic failover to an equivalent model, such as Qwen3.7 Max or Claude Sonnet 4.5, without code changes is particularly valuable for production uptime
- One API, every provider: Access MiniMax M3 and every other major LLM through a single endpoint and API key. Change providers by swapping the model string, with no application code changes required
- Cost governance: Set hard or soft project budgets so MiniMax M3 spend stays within plan, particularly important given the pricing step-up for contexts over 512k tokens. Every request is attributed to a model, project, and tag in a unified billing dashboard across all providers
- Build Your Own Router: Define what "best" means for your traffic by selecting from curated ML benchmarks or adding your own eval scores. The router scores each available model against your weights and picks the winner per request, with a plain-language explanation of every decision
- Security and compliance controls: Apply DLP rules and prompt injection protection before every request reaches MiniMax. Enforce per-project model and region policies without adding that logic to your application
How can I start using Merge Gateway to route requests with MiniMax M3?
Getting MiniMax M3 running through Merge Gateway takes a few minutes:
1. Create an account and get your API key from the dashboard.
2. Install the Merge Gateway SDK: run pip install merge-gateway-sdk (Python) or npm install merge-gateway-sdk (Node). Alternatively, if you're already using the OpenAI SDK, set base_url = "https://api-gateway.merge.dev/v1/openai" and your existing code works as-is.
3. Make your first request using the provider/model format. For MiniMax M3, the model string is minimax/minimax-m3. Swap the model string to route to any other provider without changing anything else.
4. Configure a routing policy in the dashboard to set failover behavior, cost limits, and optimization strategy. Your first policy can be as simple as naming MiniMax M3 as primary with one fallback.
Full setup instructions and SDK references are in the Merge Gateway docs.
Try MiniMax M3 through Merge Gateway
Route, observe, and control AI requests across providers from one API.







