Merge Landing Page

MiniMax M2.5:
Everything you need to know about the model

MiniMax M2.5 is a MiniMax model available through Merge Gateway via Parasail. Use it with Gateway routing policies, spend controls, request logs, and a 196,608 token context window. It supports streaming through at least one Gateway vendor route.

MiniMax M2.5 pricing

| Vendor | Input / 1M tokens | Output / 1M tokens | Zero data retention | | --- | ---: | ---: | --- | | Parasail | $0.3000 | $1.20 | Yes |

Test MiniMax M2.5 with Merge Gateway’s Simulator

MiniMax M2.5

Model

System prompt

Synced

User message

Synced

Response

Run simulation to see response

Cost

—

Tokens

—

Latency

—

Ready to try it out?

Start routing requests to hundreds of large language models in your product within minutes.

Start building for free

Get a demo

Route requests to MiniMax M2.5 with Merge Gateway

Merge Gateway is a unified LLM API that lets your product route requests to MiniMax M2.5 and every other major model through a single endpoint. You get built-in fallback routing, per-request cost tracking, zero data retention support, and observability without changing your application architecture.

To get started in seconds, add our Gateway Implementation skill to your project, or pick your preferred SDK below. Check out our other quick start skills here.

Install the Merge Gateway SDK

Python

1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11

Make your first API call

Python

1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11

Try a diffrent model

Python

1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11

Install the Merge Gateway SDK

Python

1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11

Make your first API call

Python

1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11

Try a diffrent model

Python

1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11

Install the Merge Gateway SDK

Python

1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11

Make your first API call

Python

1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11

Try a diffrent model

Python

1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11

Install the Merge Gateway SDK

Python

1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11

Make your first API call

Python

1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11

Try a diffrent model

Python

1{
2  "mcpServers": {
3    "agent-handler": {
4      "url": "https://ah-api-develop.merge.dev/api/v1/tool-packs/{TOOL_PACK_ID}/registered-users/{REGISTERED_USER_ID}/mcp",
5      "headers": {
6        "Authorization": "Bearer yMt*****"
7      }
8    }
9  }
10}
11

Explore other models available in Merge Gateway

Kimi K2 Thinking

Kimi K2 Thinking Turbo

Kimi K2 Turbo Preview

Llama 3.1 70B

Llama 3.1 8B

Llama 3.2 11B

Llama 3.2 1B

Llama 3.2 90B

Llama 33 70B Fp8

Llama 3 8B

Llama 4 Maverick 17B

Llama 4 Maverick Instruct Fp8

Llama 4 Scout 17B

Magistral Medium Latest

Magistral Small Latest

Meta.Llama3 70B Instruct V1:0

meta-llama/Llama-3.3-70B-Instruct

meta-llama/Llama-3.3-70B-Instruct-Turbo

meta-llama/Llama-4-Maverick-17B-128E-Instruct

MiniMax M2

MiniMax M2.1

MiniMax M2.5 Highspeed

MiniMax M2.7

MiniMax M2.7 Highspeed

MiniMax M2.5 FAQ

If you have additional questions about MiniMax M2.5, we've addressed several more below. Keep in mind that this information was written in June, 2026 and may change over time.

Heading

What other models does MiniMax offer?

MiniMax has developed a compact family of large-scale mixture-of-experts models, prioritizing high throughput and cost-efficient reasoning across its M2 series. Here are some other models MiniMax supports:

MiniMax M2.7: MiniMax M2.7 is MiniMax's more capable reasoning model, released approximately five weeks after M2.5 in March 2026 with the same 230-billion-parameter MoE architecture, a 205k-token context window, and an improved Intelligence Index score of 50 out of 88 comparable models, ranking seventh overall and making it the higher-capability option in MiniMax's current lineup

MiniMax Text-01: MiniMax Text-01 is MiniMax's earlier general-purpose model, a large-scale text model positioned before the M2 reasoning series, suited for applications that do not require chain-of-thought processing and prefer a more established model checkpoint

How does MiniMax M2.5 differ from MiniMax's other models?

MiniMax M2.5 is MiniMax's first publicly available reasoning model in the M2 series, optimized for speed and cost efficiency rather than peak intelligence among the M2 family.

Intelligence ranking: MiniMax M2.5 scores 42 on the Artificial Analysis Intelligence Index, ranking #17 of 88 comparable models. MiniMax M2.7 scores 50 and ranks #7, representing a meaningful quality gap for complex reasoning tasks

Speed: MiniMax M2.5 generates 191.0 tokens per second, ranking #5 of 88 comparable models and making it one of the fastest available reasoning models. MiniMax M2.7 outputs 60.7 tokens per second, placing M2.5 at more than three times M2.7's throughput

Pricing: MiniMax M2.5 is priced at $0.30 per 1M input tokens and $1.20 per 1M output tokens, with a blended rate of $0.29 per 1M tokens. MiniMax M2.7 carries the same input and output pricing but a lower blended rate of $0.22 per 1M tokens at the 7:2:1 ratio

Context window: Both M2.5 and M2.7 support a 205k-token context window, offering comparable long-context capacity

Modality: MiniMax M2.5 accepts text input only and outputs text only. It does not support image or multimodal inputs

License: MiniMax M2.5 is released under the MIT license, making it fully permissive for commercial use without a separate agreement. MiniMax M2.7 requires a commercial licensing agreement for commercial deployments

MiniMax M2.5 is the preferred choice when throughput and cost efficiency are top priorities and peak reasoning quality is secondary. M2.7 is the upgrade path when reasoning accuracy matters more than speed.

What models should I consider using alongside MiniMax M2.5?

No single model is optimal for every task. Here are models worth pairing with MiniMax M2.5 depending on what your product needs:

MiniMax M2.7 (MiniMax): For requests that require deeper reasoning quality, route to MiniMax M2.7. It scores 8 points higher on the Intelligence Index and is the natural escalation path within the MiniMax family when M2.5's output quality is insufficient

Gemini 2.5 Flash (Google): For high-volume, latency-sensitive workloads involving both text and image inputs, Gemini 2.5 Flash provides multimodal support that MiniMax M2.5 lacks, along with competitive throughput for mixed-modality pipelines

DeepSeek R1 (DeepSeek): For open-weight reasoning tasks where self-hosting or provider redundancy is a goal, DeepSeek R1 competes directly with MiniMax M2.5 on reasoning benchmarks and provides additional deployment flexibility for teams that want to control their own inference infrastructure

Claude Sonnet 4 (Anthropic): For instruction-following-intensive tasks, structured data extraction, or use cases where consistent formatting under varied prompts is critical, Claude Sonnet 4 offers strong performance and broad benchmark transparency that complements MiniMax M2.5's speed advantage

GPT-4o mini (OpenAI): For lightweight classification, summarization, or short-form generation at the lowest cost tier, GPT-4o mini is a well-supported fallback option that handles simple tasks without drawing on MiniMax M2.5's reasoning capacity

What are the challenges of using MiniMax M2.5 in my product?

Like any production LLM, MiniMax M2.5 comes with tradeoffs worth planning for:

Text-only modality: MiniMax M2.5 does not support image or video inputs. Any workflow that includes visual content will require routing those requests to a separate multimodal model, which adds integration and routing complexity

Provider dependency: Relying on MiniMax as a single provider creates fragility when the provider has an outage or deprecates a model version. MiniMax's inference infrastructure is less broadly distributed than providers like OpenAI or Anthropic, which can amplify the impact of any availability disruption

Cost at scale: At $1.20 per 1M output tokens, costs compound at high volumes. A high-throughput application generating 50 million output tokens per month exceeds $60,000 per month without optimization strategies like prompt caching or selective routing to cheaper models for simpler tasks

Limited benchmark transparency: MiniMax M2.5 does not have widely published MMLU, HumanEval, GSM8K, or Arena Elo scores, which makes pre-deployment capability evaluation harder for teams accustomed to comparing models on those standard benchmarks

High latency to first token: MiniMax M2.5 has a time-to-first-token of 3.03 seconds, which is relatively high. For streaming applications where users see a visible delay before any output appears, this can negatively affect perceived responsiveness

Why should I use Merge Gateway to route LLM requests with MiniMax M2.5 and every other model?

Using MiniMax M2.5 through Merge Gateway gives you access to the model itself and the infrastructure layer around it:

One API, every provider: Access MiniMax M2.5 and every other major LLM through a single endpoint and API key. Change providers by swapping the model string, with no application code changes required

Intelligent routing and automatic failover: Merge routes around MiniMax outages automatically. Routing policies based on cost, latency, or quality can reduce spend by 40 to 60% without touching your application code

Cost governance: Set hard or soft project budgets so MiniMax M2.5 spend stays within plan. Every request is attributed to a model, project, and tag in a unified billing dashboard across all providers

Build Your Own Router: Define what "best" means for your traffic by selecting from curated ML benchmarks or adding your own eval scores. The router scores each available model against your weights and picks the winner per request, with a plain-language explanation of every decision

Security and compliance controls: Apply DLP rules and prompt injection protection before every request reaches MiniMax. Enforce per-project model and region policies without adding that logic to your application

How can I start using Merge Gateway to route requests with MiniMax M2.5?

Getting MiniMax M2.5 running through Merge Gateway takes a few minutes:

1. Create an account and get your API key from the dashboard.

2. Install the Merge Gateway SDK: run pip install merge-gateway-sdk (Python) or npm install merge-gateway-sdk (Node). Alternatively, if you're already using the OpenAI SDK, set base_url = "https://api-gateway.merge.dev/v1/openai" and your existing code works as-is.

3. Make your first request using the provider/model format. For MiniMax M2.5, the model string is minimax/minimax-m2.5. Swap the model string to route to any other provider without changing anything else.

4. Configure a routing policy in the dashboard to set failover behavior, cost limits, and optimization strategy. Your first policy can be as simple as naming MiniMax M2.5 as primary with MiniMax M2.7 as a quality escalation path and a lower-cost model as a cost fallback.

Full setup instructions and SDK references are in the Merge Gateway docs.

Try MiniMax M2.5 through Merge Gateway

Route, observe, and control AI requests across providers from one API.

Start building for free

Get a demo