MiniMax M2:
Everything you need to know about the model

MiniMax M2 is a MiniMax model available through Merge Gateway via MiniMax. Use it with Gateway routing policies, spend controls, request logs, and a 204,800 token context window. It supports streaming, tool calling through at least one Gateway vendor route.

MiniMax M2 performance*

Intelligence - general reasoning and knowledge
36%
Coding - code generation and problem-solving
29%
*Performance data is provided by Artificial Analysis and is subject to change.

MiniMax M2 pricing

| Vendor | Input / 1M tokens | Output / 1M tokens | Zero data retention | | --- | ---: | ---: | --- | | Amazon Bedrock | $0.3000 | $1.20 | Yes | | MiniMax | $0.3000 | $1.20 | No |

Test MiniMax M2 with Merge Gateway’s Simulator

MiniMax M2
Synced
Synced
Run simulation to see response

Ready to try it out?

Start routing requests to hundreds of large language models in your product within minutes.

Route requests to MiniMax M2 with Merge Gateway

Merge Gateway is a unified LLM API that lets your product route requests to MiniMax M2 and every other major model through a single endpoint. You get built-in fallback routing, per-request cost tracking, zero data retention support, and observability without changing your application architecture.
To get started in seconds, add our Gateway Implementation skill to your project, or pick your preferred SDK below. Check out our other quick start skills here.
Install the Merge Gateway SDK
Python
1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11
Make your first API call
Python
1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11
Try a diffrent model
Python
1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11
Install the Merge Gateway SDK
Python
1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11
Make your first API call
Python
1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11
Try a diffrent model
Python
1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11
Install the Merge Gateway SDK
Python
1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11
Make your first API call
Python
1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11
Try a diffrent model
Python
1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11
Install the Merge Gateway SDK
Python
1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11
Make your first API call
Python
1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11
Try a diffrent model
Python
1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11

Explore other models available in Merge Gateway

model logo
Kimi K2 Thinking
model logo
Kimi K2 Thinking Turbo
model logo
Kimi K2 Turbo Preview
model logo
Llama 3.1 70B
model logo
Llama 3.1 8B
model logo
Llama 3.2 11B
model logo
Llama 3.2 1B
model logo
Llama 3.2 90B
model logo
Llama 33 70B Fp8
model logo
Llama 3 8B
model logo
Llama 4 Maverick 17B
model logo
Llama 4 Maverick Instruct Fp8
model logo
Llama 4 Scout 17B
model logo
Magistral Medium Latest
model logo
Magistral Small Latest
model logo
Meta.Llama3 70B Instruct V1:0
model logo
meta-llama/Llama-3.3-70B-Instruct
model logo
meta-llama/Llama-3.3-70B-Instruct-Turbo
model logo
meta-llama/Llama-4-Maverick-17B-128E-Instruct
model logo
MiniMax M2.1
model logo
MiniMax M2.5
model logo
MiniMax M2.5 Highspeed
model logo
MiniMax M2.7
model logo
MiniMax M2.7 Highspeed

MiniMax M2 FAQ

If you have additional questions about MiniMax M2, we've addressed several more below. Keep in mind that this information was written in June, 2026 and may change over time.

Heading

What other models does MiniMax offer?

MiniMax's M-series is a line of open-weight, reasoning-capable mixture-of-experts models designed to combine large-scale architecture with low active-parameter inference costs. Here are some other models MiniMax supports:

  • MiniMax-M2.1: An updated October 2025 checkpoint of the M2 architecture sharing the same 205k token context window, 230 billion total parameters, and 10 billion active parameters. It improves on M2's Intelligence Index score of 36 to 39 out of 100 on Artificial Analysis, ranking #24 of 88 comparable models, and generates output at 194.8 tokens per second, placing it in the top 5 for speed across all tracked models
  • MiniMax-M2.5: A February 2026 release further advancing the M-series reasoning capability to an Intelligence Index score of 42 out of 100 on Artificial Analysis, ranked #17 of 88 comparable reasoning models. It maintains the 205k token context window and generates 187.7 tokens per second at a blended rate of $0.29 per 1M tokens
  • MiniMax-M2.7: The current top-tier model from MiniMax, recommended by Artificial Analysis as the successor to M2.5. It represents MiniMax's most capable reasoning checkpoint and targets workloads requiring the highest accuracy available in the M-series lineup

How does MiniMax M2 differ from MiniMax's other models?

MiniMax M2 is the original production release of MiniMax's reasoning-capable MoE architecture, serving as the entry point to the M-series before subsequent updates improved intelligence scores and generation speed.

  • Intelligence ranking: MiniMax M2 achieves an Intelligence Index score of 36 out of 100 on Artificial Analysis, ranked #32 of 88 comparable reasoning models. MiniMax-M2.1 improves this to 39 at rank #24, MiniMax-M2.5 reaches 42 at rank #17, and MiniMax-M2.7 advances further still. M2 is described as "above average among comparable open weight models" on the platform
  • Speed: MiniMax M2 generates 96.6 output tokens per second, ranking #16 of 88 models on Artificial Analysis. MiniMax-M2.1 substantially outpaces it at 194.8 tokens per second (rank #4) and M2.5 at 187.7 tokens per second (rank #5). M2 is fast for its intelligence tier but lags significantly behind its successors in raw throughput
  • Pricing: MiniMax M2 is priced at $0.30 per 1M input tokens and $1.20 per 1M output tokens, with a blended rate of $0.39 per 1M tokens. MiniMax-M2.1 is priced identically. MiniMax-M2.5 offers a lower blended rate of $0.29 per 1M tokens with a 48% cache discount, making the newer model both more capable and cheaper to run at scale
  • Reasoning capability: All M-series models including MiniMax M2 are reasoning models with extended thinking support. MiniMax M2 delivers this at the lowest intelligence benchmark score in the current M-series lineup, making it the most appropriate entry point for workloads where chain-of-thought output is needed but top-tier accuracy is not
  • Context window: MiniMax M2 supports a 205k token context window, consistent across all current M-series models. This large context capacity is a shared strength of the lineup and suitable for long-document, multi-turn, and retrieval-augmented workloads

MiniMax M2 is best suited for teams that want an open-weight, reasoning-capable model with a large context window at a competitive price point, and where M2's intelligence benchmarks are sufficient for the target task without needing the accuracy or throughput gains of M2.1 or M2.5.

What models should I consider using alongside MiniMax M2?

No single model is optimal for every task. Here are models worth pairing with MiniMax M2 depending on what your product needs:

  • MiniMax-M2.1 when higher reasoning accuracy or significantly faster output speed is required within the MiniMax family. At the same price point but with an Intelligence Index score of 39 versus M2's 36, and nearly double the tokens-per-second throughput, M2.1 is a natural upgrade for requests where M2's benchmarks or speed fall short
  • Claude Sonnet 4.5 for instruction-following-heavy tasks, structured output generation, and multi-turn conversations where consistent format adherence under varied prompts is critical. It complements MiniMax M2 in multi-provider pipelines where Anthropic's reliability track record adds confidence
  • Gemini 2.0 Flash for high-volume, low-complexity requests where inference speed and cost per token are the binding constraints. Its rapid tokens-per-second output pairs well with MiniMax M2's reasoning depth: route simple tasks to Gemini 2.0 Flash and reasoning-required requests to MiniMax M2
  • GPT-4o when multimodal input is required. MiniMax M2 is text-only, so workloads involving image understanding, screenshot analysis, or document parsing must route to a vision-capable model like GPT-4o before or alongside MiniMax M2 for any downstream reasoning step
  • Llama 3.3 70B for teams that want a self-hostable open-weight alternative alongside MiniMax M2's hosted API. Llama 3.3 70B handles general-purpose text tasks with greater deployment flexibility, useful as a failover or on-premises complement in privacy-sensitive environments

What are the challenges of using MiniMax M2 in my product?

Like any production LLM, MiniMax M2 comes with tradeoffs worth planning for:

  • Rapid model iteration: MiniMax has already released M2.1, M2.5, and M2.7 after the original M2. Artificial Analysis actively recommends considering the newer models instead of M2. Teams that build on M2 should plan for migration work as MiniMax continues to release updates and may eventually deprecate earlier checkpoints
  • Speed ceiling relative to successors: At 96.6 tokens per second, MiniMax M2 is fast relative to many models but generates output at roughly half the rate of MiniMax-M2.1 (194.8 tokens per second). For high-throughput streaming applications, this gap becomes significant at scale
  • Provider dependency: Relying on MiniMax as a single provider creates fragility when the provider has an outage or deprecates a model version. MiniMax is a smaller provider relative to OpenAI or Google, and its uptime history and SLA documentation are less publicly established
  • Cost at scale: At $1.20 per 1M output tokens, output costs compound at high request volumes. While M2's pricing is competitive at current rates, the high verbosity of the model class means that long or reasoning-heavy responses amplify per-request costs faster than expected
  • No multimodal input: MiniMax M2 is text-only and cannot process images, documents, or other media types. Any pipeline that requires visual or document understanding must introduce a separate multimodal model, adding latency and routing complexity.

Why should I use Merge Gateway to route LLM requests with MiniMax M2 and every other model?

Using MiniMax M2 through Merge Gateway gives you access to the model itself and the infrastructure layer around it:

  • One API, every provider: Access MiniMax M2 and every other major LLM through a single endpoint and API key. Change providers by swapping the model string—no application code changes required
  • Intelligent routing and automatic failover: Merge routes around MiniMax outages automatically. Routing policies based on cost, latency, or quality can reduce spend by 40–60% without touching your application code, which is particularly valuable given MiniMax's smaller footprint compared to tier-one providers
  • Cost governance: Set hard or soft project budgets so MiniMax M2 spend stays within plan. Every request is attributed to a model, project, and tag in a unified billing dashboard across all providers
  • Build Your Own Router: Define what "best" means for your traffic by selecting from curated ML benchmarks or adding your own eval scores. The router scores each available model against your weights and picks the winner per request, with a plain-language explanation of every decision
  • Security and compliance controls: Apply DLP rules and prompt injection protection before every request reaches MiniMax. Enforce per-project model and region policies without adding that logic to your application

How can I start using Merge Gateway to route requests with MiniMax M2?

Getting MiniMax M2 running through Merge Gateway takes a few minutes:

1. Create an account and get your API key from the dashboard.

2. Install the Merge Gateway SDK: run pip install merge-gateway-sdk (Python) or npm install merge-gateway-sdk (Node). Alternatively, if you're already using the OpenAI SDK, set base_url = "https://api-gateway.merge.dev/v1/openai" and your existing code works as-is.

3. Make your first request using the provider/model format. For MiniMax M2, the model string is minimax/minimax-m2. Swap the model string to route to any other provider without changing anything else.

4. Configure a routing policy in the dashboard to set failover behavior, cost limits, and optimization strategy. Your first policy can be as simple as naming MiniMax M2 as primary with one fallback.

Full setup instructions and SDK references are in the Merge Gateway docs.

Try MiniMax M2 through Merge Gateway

Route, observe, and control AI requests across providers from one API.