GPT-4.1: Everything you need to know about the model

GPT-4.1 is an OpenAI model available through Merge Gateway, with a 1,047,576-token context window. Use it with Gateway routing policies, spend controls, and request logs. It supports streaming, structured outputs, tool calling, and vision through at least one Gateway vendor route.

GPT-4.1 pricing

| Vendor | Input / 1M tokens | Output / 1M tokens | Zero data retention |
| --- | ---: | ---: | --- |
| OpenAI | $2.00 | $8.00 | Yes |
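
As a rough illustration of how the per-token rates above translate into the cost of a single request, the sketch below multiplies token counts by the listed OpenAI rates. The function name `estimate_cost_usd` and the sample token counts are hypothetical; treat the result as an estimate and rely on Gateway's request logs for actual spend.

```python
# Rates taken from the pricing table above (USD per 1M tokens, OpenAI route).
INPUT_USD_PER_1M = 2.00
OUTPUT_USD_PER_1M = 8.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one GPT-4.1 request from its token counts."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_1M
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_1M
    return input_cost + output_cost


# Hypothetical request: 50,000 input tokens and 2,000 output tokens.
print(f"${estimate_cost_usd(50_000, 2_000):.4f}")  # $0.1160
```

Because output tokens cost four times as much as input tokens at these rates, long generations tend to dominate the bill even when prompts are large.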

Ready to try it out?

Start routing requests to hundreds of large language models in your product within minutes.

Route requests to GPT-4.1 with Merge Gateway

Merge Gateway is a unified LLM API that lets your product route requests to GPT-4.1 and every other major model through a single endpoint. You get built-in fallback routing, per-request cost tracking, zero data retention support, and observability without changing your application architecture.
To get started in seconds, add our Gateway Implementation skill to your project, or pick your preferred SDK below. Check out our other quick start skills here.
Install the Merge Gateway SDK
Python
$ pip install merge-gateway-sdk
Send a request
Python
from merge_gateway import MergeGateway

client = MergeGateway(api_key="YOUR_API_KEY")

# Route to GPT-4.1 using the provider/model format.
response = client.responses.create(
    model="openai/gpt-4.1",
    input=[
        {"type": "message", "role": "system", "content": "You are a helpful programming tutor. Explain the concepts clearly with practical examples."},
        {"type": "message", "role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

print(response.output[0].content[0].text)
Try a different model
Swap the model string to route to a different provider. No other code changes needed.
Anthropic
response = client.responses.create(
    model="anthropic/claude-sonnet-4-20250514",
    input=[
        {"type": "message", "role": "system", "content": "You are a helpful programming tutor. Explain the concepts clearly with practical examples."},
        {"type": "message", "role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)
Point the OpenAI SDK to Gateway
Python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api-gateway.merge.dev/v1/openai",
)
Send a request
Use the standard chat.completions.create method. No provider prefix needed on the model name.
Python
# No provider prefix on the model name when using the OpenAI-compatible endpoint.
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a helpful programming tutor. Explain the concepts clearly with practical examples."},
        {"role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

print(response.choices[0].message.content)
Install packages
npm install merge-gateway-ai-sdk-provider ai
Create the provider
TypeScript
import { createMergeGateway } from "merge-gateway-ai-sdk-provider";

const gateway = createMergeGateway({
  apiKey: "YOUR_API_KEY",
});
Send a request
Use generateText to send a request. Model names use the provider/model format.
TypeScript
import { generateText } from "ai";

// Route to GPT-4.1 using the provider/model format.
const { text } = await generateText({
  model: gateway("openai/gpt-4.1"),
  prompt: "Explain the concept of recursion in programming with a simple set of examples.",
});

console.log(text);
If you already have @ai-sdk/openai installed, point it at Gateway with a base URL change:
TypeScript
import { createOpenAI } from "@ai-sdk/openai";

const gateway = createOpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api-gateway.merge.dev/v1/ai-sdk",
});

// All generateText/streamText calls work unchanged
Point the Anthropic SDK to Gateway
Python
from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_API_KEY",
    base_url="https://api-gateway.merge.dev/v1/anthropic",
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

print(message.content[0].text)

Explore other models available in Merge Gateway

Amazon Nova 2 Lite
Amazon Nova 2 Sonic
Amazon Nova Premier
Amazon Nova Pro
Claude Opus 4.6
Claude Opus 4.7
Claude Opus 4.8
Claude Sonnet 4.5
Claude Sonnet 4.6
Codestral
Codestral 25.08
DeepSeek V3
DeepSeek V3.2
DeepSeek V4 Flash
DeepSeek V4 Pro
Devstral 2512
Dola Seed 2.0 Code (preview)
Dola Seed 2.0 Lite
Dola Seed 2.0 Mini
Dola Seed 2.0 Pro
Gemini 2.5 Flash
Gemini 2.5 Flash Lite
Gemini 2.5 Pro
Gemini 3.1 Flash Lite

GPT-4.1 FAQ

If you still have questions about GPT-4.1, we've answered a few more below. Keep in mind that this was written in June 2026 and may not reflect the latest changes.

What other models does OpenAI offer?

OpenAI offers models across multiple capability and price tiers, from lightweight cost-efficient options to full-scale reasoning systems. Here are some other models OpenAI supports:

  • GPT-4o mini is OpenAI's most affordable model, priced at $0.15 per 1M input tokens. It's designed for high-volume tasks like classification and summarization where cost per request is the primary constraint
  • GPT-4.1 mini is a cost-efficient counterpart to GPT-4.1 with the same 1M token context window but at roughly one-fifth the input cost. It's suited for budget-conscious workflows that still need above-average intelligence
  • GPT-4o is OpenAI's multimodal flagship from the 4o series, optimized for vision tasks and general instruction following at a mid-tier price point
  • o3 is OpenAI's large-scale reasoning model built for complex multi-step problems, including scientific analysis and advanced coding, at the same $2.00 per 1M input price point as GPT-4.1
  • o4-mini is a compact reasoning model with a fast output speed of 159.9 tokens per second, suited for high-volume reasoning tasks where cost and throughput both matter
  • GPT-5 is OpenAI's most capable model with extended thinking, intended for the most demanding tasks where intelligence quality outweighs cost

How does GPT-4.1 differ from OpenAI's other models?

GPT-4.1 is a non-reasoning model in the upper-mid tier of OpenAI's lineup, distinguished by its very large context window and strong throughput.

  • Context window: GPT-4.1 supports a 1M token context window, matching GPT-4.1 mini and enabling use cases like full-codebase analysis, long conversation memory, and multi-document summarization that smaller context models cannot handle
  • Pricing: At $2.00 per 1M input tokens and $8.00 per 1M output tokens, GPT-4.1 costs 5x more on input than GPT-4.1 mini ($0.40) but is priced identically to o3, which offers deeper reasoning at the same price point
  • Speed: GPT-4.1 outputs 119.1 tokens per second, ranking 14th out of 71 evaluated models and making it one of the faster non-reasoning models in its tier
  • Intelligence: GPT-4.1 ranks 26/71 on the Artificial Analysis Intelligence Index, assessed as above average among non-reasoning models, placing it above GPT-4o mini but below the reasoning-capable o-series models
  • Modalities: Supports text and image input with text output, the same multimodal profile as GPT-4o and GPT-4.1 mini

GPT-4.1 is the right fit for long-context, latency-sensitive agentic workflows that need a 1M token window and above-average intelligence without the extended latency of a reasoning model.

What models should I consider using alongside GPT-4.1?

No single model is optimal for every task. Here are models worth pairing with GPT-4.1 depending on what your product needs:

  • o3: Route to o3 for requests requiring deep multi-step reasoning, such as complex debugging or scientific analysis, where GPT-4.1's non-reasoning architecture would produce shallower results despite similar pricing
  • GPT-4.1 mini: Use GPT-4.1 mini for the same long-context workloads when cost is the priority, at $0.40 per 1M input tokens, and escalate to GPT-4.1 only for tasks that require higher intelligence
  • Claude Sonnet 4 (Anthropic): Claude Sonnet 4 competes in a similar price and capability tier and serves as a reliable cross-provider fallback, reducing dependency on OpenAI's availability for critical production workloads
  • Gemini 1.5 Pro (Google): For tasks requiring very long context at lower cost, Gemini 1.5 Pro is worth evaluating as a parallel option alongside GPT-4.1 for document-heavy retrieval or summarization pipelines
  • Mistral Large (Mistral AI): Mistral Large provides strong general-purpose performance at a lower price point and is worth using as a cost-saving fallback for tasks where GPT-4.1's full capability isn't required
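
The OpenAI pairings above can be read as a simple escalation rule. The sketch below is one hypothetical way to express that rule in application code; the `Task` fields and the `pick_model` function are illustrative names, and the `openai/o3` and `openai/gpt-4.1-mini` model strings are assumptions that follow the provider/model format rather than confirmed Gateway identifiers.

```python
from dataclasses import dataclass


@dataclass
class Task:
    needs_deep_reasoning: bool  # e.g. complex debugging or scientific analysis
    cost_sensitive: bool        # cost matters more than peak quality


def pick_model(task: Task) -> str:
    """Choose a model string following the pairing guidance above."""
    if task.needs_deep_reasoning:
        return "openai/o3"            # assumed model string
    if task.cost_sensitive:
        return "openai/gpt-4.1-mini"  # assumed model string
    return "openai/gpt-4.1"


print(pick_model(Task(needs_deep_reasoning=False, cost_sensitive=True)))
# openai/gpt-4.1-mini
```

Checking the reasoning flag first means a hard task escalates to o3 even when the request is also marked cost sensitive; swap the order if budget should win instead.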

What are the challenges of using GPT-4.1 in my product?

Like any production LLM, GPT-4.1 comes with tradeoffs worth planning for:

  • Provider dependency: Concentrating production traffic on a single OpenAI model means a deprecation cycle or availability incident can disrupt your product without a pre-configured fallback
  • Cost at scale: At $8.00 per 1M output tokens, output costs accumulate quickly in verbose generation workloads, such as long-form drafting or detailed code explanations, without active budget controls
  • Non-reasoning ceiling: GPT-4.1 ranks above average among non-reasoning models but sits below the o-series on complex analytical tasks. Pipelines that mix simple and hard queries benefit from an escalation route to a reasoning model rather than sending everything through GPT-4.1
  • Latency for long outputs: At 119.1 tokens per second, GPT-4.1 is fast for its tier, but generating large outputs over a 1M token context still takes meaningful wall-clock time that must be accounted for in user-facing response budgets
  • Knowledge cutoff: With a knowledge cutoff of May 31, 2024, GPT-4.1 may produce stale results for queries about recent events, requiring retrieval augmentation for time-sensitive workloads
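
To make the latency point concrete, the sketch below turns the 119.1 tokens per second figure quoted above into a rough generation-time estimate. The function names and the sample numbers are hypothetical, and the estimate covers output streaming only: it ignores the time spent processing the prompt, which can be significant for very long contexts.

```python
OUTPUT_TOKENS_PER_SECOND = 119.1  # throughput figure quoted above


def estimate_generation_seconds(output_tokens: int) -> float:
    """Estimate streaming time for the output only (excludes prompt processing)."""
    return output_tokens / OUTPUT_TOKENS_PER_SECOND


def fits_latency_budget(output_tokens: int, budget_seconds: float) -> bool:
    """Return True if the estimated generation time fits the response budget."""
    return estimate_generation_seconds(output_tokens) <= budget_seconds


# Hypothetical case: a 4,000-token answer against a 10-second budget.
print(f"{estimate_generation_seconds(4_000):.1f} s")
print(fits_latency_budget(4_000, budget_seconds=10.0))
```

A check like this can help decide when to cap output length or stream partial results in user-facing flows.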

Why should I use Merge Gateway to route LLM requests with GPT-4.1 and every other model?

Using GPT-4.1 through Merge Gateway gives you access to the model itself and the infrastructure layer around it:

  • One API, every provider: Access GPT-4.1 and every other major LLM through a single endpoint and API key. Change providers by swapping the model string; no other application code changes are required
  • Intelligent routing and automatic failover: Merge routes around OpenAI outages automatically. Routing policies based on cost, latency, or quality can reduce spend by 40-60% without touching your application code
  • Cost governance: Set hard or soft project budgets so GPT-4.1 spend stays within plan. Every request is attributed to a model, project, and tag in a unified billing dashboard across all providers
  • Build Your Own Router: Define what "best" means for your traffic by selecting from curated ML benchmarks or adding your own eval scores. The router scores each available model against your weights and picks the winner per request, with a plain-language explanation of every decision
  • Security and compliance controls: Apply DLP rules and prompt injection protection before every request reaches OpenAI. Enforce per-project model and region policies without adding that logic to your application
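
The weighted scoring idea behind Build Your Own Router can be sketched as a weighted sum. This is only an illustration of the concept, not Gateway's actual scoring method, which the text above does not specify. The model names, metric names, scores, and weights below are all made up, and every metric is assumed to be normalized so that higher is better (for example, a higher `cost` score means a cheaper model).

```python
def score(model_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of a model's scores; missing metrics count as 0."""
    return sum(weight * model_scores.get(metric, 0.0) for metric, weight in weights.items())


def pick_best(candidates: dict[str, dict[str, float]], weights: dict[str, float]) -> str:
    """Return the name of the candidate with the highest weighted score."""
    return max(candidates, key=lambda name: score(candidates[name], weights))


# Illustrative scores only (0 to 1, higher is better on every metric).
candidates = {
    "model-a": {"quality": 0.9, "speed": 0.5, "cost": 0.3},
    "model-b": {"quality": 0.7, "speed": 0.8, "cost": 0.8},
}
weights = {"quality": 0.5, "speed": 0.2, "cost": 0.3}

print(pick_best(candidates, weights))  # model-b
```

With these weights, model-a scores 0.64 and model-b scores 0.75, so the cheaper and faster model wins; putting all of the weight on quality would select model-a instead.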

How can I start using Merge Gateway to route requests with GPT-4.1?

Getting GPT-4.1 running through Merge Gateway takes a few minutes:

1. Create an account and get your API key from the dashboard.

2. Install the Merge Gateway SDK: run pip install merge-gateway-sdk (Python) or npm install merge-gateway-sdk (Node). Alternatively, if you're already using the OpenAI SDK, set base_url = "https://api-gateway.merge.dev/v1/openai" and your existing code works as-is.

3. Make your first request using the provider/model format. For GPT-4.1, the model string is openai/gpt-4.1. Swap the model string to route to any other provider without changing anything else.

4. Configure a routing policy in the dashboard to set failover behavior, cost limits, and optimization strategy. Your first policy can be as simple as naming GPT-4.1 as primary with one fallback.

Full setup instructions and SDK references are in the Merge Gateway docs.
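
Gateway applies failover through the routing policy you configure in the dashboard, so your application does not need its own retry logic. Purely to illustrate what "GPT-4.1 as primary with one fallback" means, here is a minimal client-side sketch of the same idea. The `complete_with_fallback` helper and the `fake_send` stub are hypothetical; in real code, `send` would wrap a Gateway call such as `client.responses.create(model=model, ...)`.

```python
from typing import Callable, Optional, Sequence


def complete_with_fallback(send: Callable[[str], str], models: Sequence[str]) -> str:
    """Try each model string in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return send(model)
        except Exception as exc:  # a real client would catch narrower error types
            last_error = exc
    raise RuntimeError("all models failed") from last_error


def fake_send(model: str) -> str:
    """Stand-in for a real Gateway call; simulates an outage on the primary."""
    if model == "openai/gpt-4.1":
        raise ConnectionError("simulated outage")
    return f"response from {model}"


result = complete_with_fallback(
    fake_send,
    ["openai/gpt-4.1", "anthropic/claude-sonnet-4-20250514"],
)
print(result)  # response from anthropic/claude-sonnet-4-20250514
```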

Try GPT-4.1 through Merge Gateway

Route, observe, and control AI requests across providers from one API.