GPT-5 API: pricing, performance, and how to route requests

GPT-5: Everything you need to know about the model

GPT-5 is an OpenAI model available through Merge Gateway. Use it with Gateway routing policies, spend controls, and request logs. It has a 400,000-token context window, with up to 272,000 input tokens. It supports streaming, structured outputs, tool calling, and vision through at least one Gateway vendor route.

GPT-5 performance*

Intelligence (general reasoning and knowledge): 45%

Coding (code generation and problem-solving): 36%

*Performance data is provided by Artificial Analysis and is subject to change.

GPT-5 pricing

| Vendor | Input / 1M tokens | Output / 1M tokens | Zero data retention |
| --- | ---: | ---: | --- |
| OpenAI | $1.25 | $10.00 | Yes |
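To estimate what a single request costs at these list prices, multiply each token count by its per-million rate. The following is a minimal Python sketch that assumes the OpenAI rates in the table ($1.25 input, $10.00 output per 1M tokens). The function and constant names are illustrative and are not part of any Merge Gateway SDK.

```python
# List prices from the pricing table above, in USD per 1M tokens.
GPT5_INPUT_PER_M = 1.25
GPT5_OUTPUT_PER_M = 10.00


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_per_m: float = GPT5_INPUT_PER_M,
                  output_per_m: float = GPT5_OUTPUT_PER_M) -> float:
    """Return the estimated USD cost of one request at per-1M-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Example: a 20,000-token prompt that produces a 2,000-token answer.
print(f"${estimate_cost(20_000, 2_000):.4f}")  # $0.0450
```

OpenAI bills the hidden reasoning tokens of reasoning models such as GPT-5 as output tokens, so include them in `output_tokens` when you estimate costs.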

Ready to try it out?

Start routing requests to hundreds of large language models in your product within minutes.

Route requests to GPT-5 with Merge Gateway

Merge Gateway is a unified LLM API that lets your product route requests to GPT-5 and every other major model through a single endpoint. You get built-in fallback routing, per-request cost tracking, zero data retention support, and observability without changing your application architecture.

To get started in seconds, add our Gateway Implementation skill to your project, or pick your preferred SDK below. Check out our other quick start skills here.

Install the Merge Gateway SDK

Python

$ pip install merge-gateway-sdk

Send a request

Python

from merge_gateway import MergeGateway

client = MergeGateway(api_key="YOUR_API_KEY")

response = client.responses.create(
    model="openai/gpt-5",
    input=[
        {"type": "message", "role": "system", "content": "You are a helpful programming tutor. Explain the concepts clearly with practical examples."},
        {"type": "message", "role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

# Print the generated text (assumes the first output item is the assistant message)
print(response.output[0].content[0].text)

Try a different model

Swap the model string to route to a different provider. No other code changes needed.

Anthropic

response = client.responses.create(
    model="anthropic/claude-sonnet-4-20250514",
    input=[
        {"type": "message", "role": "system", "content": "You are a helpful programming tutor. Explain the concepts clearly with practical examples."},
        {"type": "message", "role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

Point the OpenAI SDK to Gateway

Python

from openai import OpenAI

# Send OpenAI SDK requests to Merge Gateway instead of directly to OpenAI
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api-gateway.merge.dev/v1/openai",
)

Send a request

Use the standard chat.completions.create method. No provider prefix needed on the model name.

Python

response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "You are a helpful programming tutor. Explain the concepts clearly with practical examples."},
        {"role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

print(response.choices[0].message.content)

Install packages for the Vercel AI SDK

npm install merge-gateway-ai-sdk-provider ai

Create the provider

TypeScript

import { createMergeGateway } from "merge-gateway-ai-sdk-provider";

const gateway = createMergeGateway({
  apiKey: "YOUR_API_KEY",
});

Send a request

Use generateText to send a request. Model names use the provider/model format.

TypeScript

import { generateText } from "ai";

const { text } = await generateText({
  model: gateway("openai/gpt-5"),
  prompt: "Explain the concept of recursion in programming with a simple set of examples.",
});

console.log(text);

If you already have @ai-sdk/openai installed, point it at Gateway with a base URL change:

TypeScript

import { createOpenAI } from "@ai-sdk/openai";

const gateway = createOpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api-gateway.merge.dev/v1/ai-sdk",
});

// All generateText/streamText calls work unchanged

Point the Anthropic SDK to Gateway

Anthropic SDK

from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_API_KEY",
    base_url="https://api-gateway.merge.dev/v1/anthropic",
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

print(message.content[0].text)

Explore other models available in Merge Gateway

Amazon Nova 2 Lite

Amazon Nova 2 Sonic

Amazon Nova Premier

Amazon Nova Pro

Claude Opus 4.6

Claude Opus 4.7

Claude Opus 4.8

Claude Sonnet 4.5

Claude Sonnet 4.6

Codestral

Codestral 25.08

DeepSeek V3

DeepSeek V3.2

DeepSeek V4 Flash

DeepSeek V4 Pro

Devstral 2512

Dola Seed 2.0 Code (preview)

Dola Seed 2.0 Lite

Dola Seed 2.0 Mini

Dola Seed 2.0 Pro

Gemini 2.5 Flash

Gemini 2.5 Flash Lite

Gemini 2.5 Pro

Gemini 3.1 Flash Lite

GPT-5 FAQ

If you have additional questions about GPT-5, we've answered several more below. Note that this information was written in June 2026 and is subject to change.

What other models does OpenAI offer?

OpenAI's model lineup covers a wide range of price and capability tiers, from budget inference to advanced reasoning. Here are some other models OpenAI supports:

GPT-4o mini: GPT-4o mini is one of OpenAI's most affordable models at $0.15 per 1M input tokens. It is designed for cost-sensitive, high-volume tasks such as classification and extraction, where advanced reasoning is not required

GPT-4.1 mini: GPT-4.1 mini is a cost-efficient model with a 1M token context window and above-average intelligence among non-reasoning models, suited for long-context tasks at a low price point

GPT-4.1: GPT-4.1 is a non-reasoning model with a 1M token context window, positioned as a strong general-purpose option for agentic and long-document workflows at mid-tier pricing

o3: o3 is OpenAI's large-scale reasoning model, built for complex multi-step analytical tasks, and shares the same $2.00 per 1M input price as GPT-4.1 while offering deeper reasoning capability

o4-mini: o4-mini is a compact reasoning model delivering 159.9 tokens per second, optimized for high-throughput reasoning workloads at a lower price than full reasoning models

GPT-4o: GPT-4o is OpenAI's multimodal flagship from the 4o series, suited for vision tasks and general instruction following at a mid-tier price point without requiring the full GPT-5 cost tier

How does GPT-5 differ from OpenAI's other models?

GPT-5 is OpenAI's most capable model, positioned at the top of the lineup with extended thinking capabilities and the highest output cost.

Intelligence: GPT-5 scores 45 on the Artificial Analysis Intelligence Index, ranking 40/150, making it OpenAI's highest-scoring model on this benchmark and placing it well above GPT-4.1 (26/71, as of 06/01/2026) and the o-series models

Context window: GPT-5 supports a 400k-token context window. This is smaller than the 1M-token window of GPT-4.1 and GPT-4.1 mini but larger than o3's 200k limit, and it is sufficient for most long-document tasks

Pricing: Input costs $1.25 per 1M tokens, which is cheaper than the $2.00 charged by o3 and GPT-4.1. Output costs $10.00 per 1M tokens, which is 1.25x the $8.00 output rate of o3 and GPT-4.1

Latency: Time to first token averages 76.88 seconds, reflecting extended reasoning computation. This is the highest latency of any OpenAI model and makes GPT-5 unsuitable for real-time or interactive applications

Reasoning capability: GPT-5 includes extended thinking. This places it in the same reasoning-model category as o3, not among non-reasoning models like GPT-4.1
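The pricing comparison cuts both ways: GPT-5's input is cheaper than that of o3 and GPT-4.1, but its output costs more. Setting 1.25i + 10o equal to 2i + 8o (where i and o are input and output tokens) gives 0.75i = 2o, so i/o = 8/3 ≈ 2.67. Above that input-to-output ratio GPT-5 is the cheaper option, and below it o3 and GPT-4.1 are. The following Python sketch makes this comparison. It assumes the $2.00 input rate stated in this FAQ and the $8.00 output rate implied by the 1.25x figure. The dictionary and function names are illustrative.

```python
# Per-1M-token USD prices: (input, output). GPT-5 comes from the pricing table;
# the o3 / GPT-4.1 figures are assumptions derived from this FAQ. Verify
# current rates before relying on them.
PRICES = {
    "gpt-5": (1.25, 10.00),
    "o3": (2.00, 8.00),
    "gpt-4.1": (2.00, 8.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request for the given model."""
    input_per_m, output_per_m = PRICES[model]
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def cheaper_model(a: str, b: str, input_tokens: int, output_tokens: int) -> str:
    """Return whichever of two models is cheaper for this token mix (ties go to a)."""
    if request_cost(a, input_tokens, output_tokens) <= request_cost(b, input_tokens, output_tokens):
        return a
    return b


# Input-heavy request (10:1 ratio): GPT-5 is cheaper.
print(cheaper_model("gpt-5", "o3", 100_000, 10_000))  # gpt-5
# Balanced request (1:1 ratio): o3 is cheaper.
print(cheaper_model("gpt-5", "o3", 10_000, 10_000))   # o3
```

This comparison covers price only. It ignores the intelligence and latency differences described above. Reasoning tokens count as output, so they shift real workloads toward the output-heavy side.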

GPT-5 is the right choice for tasks where accuracy and intelligence quality are the primary constraints, such as high-stakes document analysis, research synthesis, or complex agentic workflows where slower responses are acceptable.

What models should I consider using alongside GPT-5?

No single model is optimal for every task. Here are models worth pairing with GPT-5 depending on what your product needs:

GPT-4.1 mini: Use GPT-4.1 mini for the high-volume, lower-complexity requests in the same pipeline, keeping GPT-5 reserved for tasks that genuinely require its intelligence tier and avoiding unnecessary cost on simpler queries

o4-mini: Route to o4-mini for structured reasoning tasks that need faster responses. o4-mini generates 159.9 tokens per second versus GPT-5's 90.0 and has significantly lower latency

Claude Opus 4 (Anthropic): Claude Opus 4 competes at the top of Anthropic's lineup and provides a cross-provider alternative for flagship-tier tasks, reducing reliance on a single provider for your most critical requests

Gemini 2.5 Pro (Google): For long-document tasks where GPT-5's 400k context is a ceiling, Gemini 2.5 Pro's 1M-token context window provides an alternative for workloads that exceed GPT-5's window

o3 (OpenAI): For purely reasoning-heavy tasks that don't need GPT-5's full intelligence level, o3 can handle complex multi-step problems at a lower output price ($8.00 vs. $10.00 per 1M tokens). Its input price is higher ($2.00 vs. $1.25 per 1M tokens, as of 06/01/2026)
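One way to apply these pairings is a small routing rule that picks a model for each request based on a few request traits. The following is a minimal Python sketch. The trait names are hypothetical, and so are all model strings except `openai/gpt-5`, which appears elsewhere on this page. The strings follow the provider/model format, but you should confirm the exact identifiers in the Gateway model list.

```python
from dataclasses import dataclass

# Hypothetical Gateway model strings in the provider/model format.
GPT_5 = "openai/gpt-5"
GPT_41_MINI = "openai/gpt-4.1-mini"
O4_MINI = "openai/o4-mini"
GEMINI_25_PRO = "google/gemini-2.5-pro"

GPT5_CONTEXT_TOKENS = 400_000  # total context window cited in this FAQ


@dataclass
class RequestTraits:
    """Illustrative traits an application might attach to each request."""
    complex_task: bool       # needs GPT-5-level intelligence?
    latency_sensitive: bool  # is a user waiting on the answer?
    estimated_tokens: int    # rough prompt size plus expected output


def pick_model(t: RequestTraits) -> str:
    """Map request traits to a model, following the pairings above."""
    if t.estimated_tokens > GPT5_CONTEXT_TOKENS:
        return GEMINI_25_PRO   # request would not fit in GPT-5's window
    if not t.complex_task:
        return GPT_41_MINI     # cheap, high-volume work
    if t.latency_sensitive:
        return O4_MINI         # faster reasoning model
    return GPT_5               # default for high-stakes tasks


print(pick_model(RequestTraits(complex_task=True, latency_sensitive=False, estimated_tokens=50_000)))
# openai/gpt-5
```

The order of the checks matters. Context size comes first because a request that exceeds the window fails regardless of its other traits.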

What are the challenges of using GPT-5 in my product?

Like any production LLM, GPT-5 comes with tradeoffs worth planning for:

Provider dependency: Routing your most demanding workloads exclusively to OpenAI creates a fragile dependency. A GPT-5 availability incident or deprecation event directly impacts your highest-stakes pipelines without a fallback

Cost at scale: At $10.00 per 1M output tokens, costs for verbose generation tasks accumulate quickly. Without active budget controls, a pipeline generating 10B output tokens per month costs $100,000 in output alone (10,000 × $10.00)

High latency for interactive use: A 76.88-second time to first token makes GPT-5 a poor fit for real-time chat, streaming interfaces, or any latency-sensitive user-facing feature. It is better suited to batch and background workloads

Smaller context window than peers: At 400k tokens, GPT-5's context window is smaller than GPT-4.1's 1M token limit. Applications that need to process very long documents in a single call may need to combine GPT-5 with a larger-context fallback

Verbosity: GPT-5 generated 76M output tokens during the Intelligence Index evaluation, making it among the most verbose models evaluated. Verbose outputs drive up both cost and latency in production without additional output-length controls
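The verbosity and cost concerns both point to capping output length. If you call GPT-5 through the OpenAI SDK pointed at Gateway, as in the OpenAI SDK example earlier on this page, one option is to use the standard Chat Completions parameters `max_completion_tokens` and `reasoning_effort`. The sketch below only builds the request arguments. It assumes that Gateway forwards both parameters to OpenAI unchanged, which you should confirm in the Gateway docs. The helper name `capped_request` is hypothetical.

```python
# Effort levels GPT-5 accepts on the OpenAI API ("minimal" was added with GPT-5).
ALLOWED_EFFORTS = {"minimal", "low", "medium", "high"}


def capped_request(prompt: str, max_completion_tokens: int = 2_000,
                   reasoning_effort: str = "low") -> dict:
    """Build chat.completions.create kwargs that bound GPT-5's output length."""
    if reasoning_effort not in ALLOWED_EFFORTS:
        raise ValueError(f"unsupported reasoning_effort: {reasoning_effort}")
    return {
        "model": "gpt-5",  # no provider prefix on the OpenAI-compatible route
        "messages": [{"role": "user", "content": prompt}],
        # Hard cap on generated tokens; for reasoning models this budget
        # covers hidden reasoning tokens as well as visible output.
        "max_completion_tokens": max_completion_tokens,
        # Lower effort generally means fewer reasoning tokens and faster replies.
        "reasoning_effort": reasoning_effort,
    }


kwargs = capped_request("Summarize this contract in five bullet points.")
print(kwargs["max_completion_tokens"], kwargs["reasoning_effort"])  # 2000 low
```

To use these arguments, pass them as `client.chat.completions.create(**kwargs)`. If a response ends with `finish_reason` set to `"length"`, the request hit the cap. Be careful with tight caps: reasoning tokens may use up the whole budget before any visible text is produced.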

Why should I use Merge Gateway to route LLM requests with GPT-5 and every other model?

Using GPT-5 through Merge Gateway gives you access to the model itself and the infrastructure layer around it:

One API, every provider: Access GPT-5 and every other major LLM through a single endpoint and API key. To change providers, swap the model string; no other application code changes are required

Intelligent routing and automatic failover: Merge routes around OpenAI outages automatically. Routing policies based on cost, latency, or quality can reduce spend by 40-60% without touching your application code

Cost governance: Set hard or soft project budgets so GPT-5 spend stays within plan. Every request is attributed to a model, project, and tag in a unified billing dashboard across all providers

Build Your Own Router: Define what "best" means for your traffic by selecting from curated ML benchmarks or adding your own eval scores. The router scores each available model against your weights and picks the winner per request, with a plain-language explanation of every decision

Security and compliance controls: Apply DLP rules and prompt injection protection before every request reaches OpenAI. Enforce per-project model and region policies without adding that logic to your application
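Build Your Own Router scores each model against weights you choose. A weighted sum is one simple way to implement that kind of scoring: multiply each normalized metric by its weight, add the results, and pick the highest total. The following Python sketch shows this general technique. It is not Merge's scoring method, which this page does not describe. The candidate list, metric names, and all scores are invented placeholders.

```python
# Placeholder scores on a 0-1 scale, where higher is better. For "cost",
# higher means cheaper. These numbers are invented for illustration;
# substitute your own benchmark or eval results.
CANDIDATES = {
    "openai/gpt-5":        {"quality": 0.90, "speed": 0.30, "cost": 0.40},
    "openai/o4-mini":      {"quality": 0.70, "speed": 0.80, "cost": 0.80},
    "openai/gpt-4.1-mini": {"quality": 0.55, "speed": 0.90, "cost": 0.95},
}


def score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of a model's metrics; missing weights count as zero."""
    return sum(weights.get(name, 0.0) * value for name, value in metrics.items())


def choose(weights: dict[str, float]) -> tuple[str, str]:
    """Return (winning model, plain-language explanation of the choice)."""
    ranked = sorted(CANDIDATES, key=lambda m: score(CANDIDATES[m], weights), reverse=True)
    best, runner_up = ranked[0], ranked[1]
    explanation = (
        f"{best} scored {score(CANDIDATES[best], weights):.2f} with weights {weights}, "
        f"ahead of {runner_up} at {score(CANDIDATES[runner_up], weights):.2f}."
    )
    return best, explanation


model, why = choose({"quality": 0.8, "speed": 0.1, "cost": 0.1})
print(model)  # openai/gpt-5
print(why)
```

With cost-heavy weights such as `{"quality": 0.2, "speed": 0.2, "cost": 0.6}`, the same placeholder data picks `openai/gpt-4.1-mini`. The weights alone determine the outcome.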

How can I start using Merge Gateway to route requests with GPT-5?

Getting GPT-5 running through Merge Gateway takes a few minutes:

1. Create an account and get your API key from the dashboard.

2. Install the Merge Gateway SDK: run pip install merge-gateway-sdk (Python) or npm install merge-gateway-sdk (Node). Alternatively, if you're already using the OpenAI SDK, set base_url = "https://api-gateway.merge.dev/v1/openai" and your existing code works as-is.

3. Make your first request using the provider/model format. For GPT-5, the model string is openai/gpt-5. Swap the model string to route to any other provider without changing anything else.

4. Configure a routing policy in the dashboard to set failover behavior, cost limits, and optimization strategy. Your first policy can be as simple as naming GPT-5 as primary with one fallback.
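Step 4 configures failover as a Gateway routing policy, so your application does not need its own retry loop. To illustrate the idea of a primary model with one fallback, here is a client-side Python sketch. The `send` callable is a hypothetical stand-in for whatever client call you use. The fallback reuses the Anthropic model string from the examples above. This sketch does not show how Merge implements failover internally.

```python
from typing import Callable, Sequence


class AllModelsFailed(Exception):
    """Raised when every model in the chain fails."""


def complete_with_fallback(
    send: Callable[[str, str], str],
    prompt: str,
    models: Sequence[str] = ("openai/gpt-5", "anthropic/claude-sonnet-4-20250514"),
) -> tuple[str, str]:
    """Try each model in order and return (model_used, response_text).

    `send(model, prompt)` is a hypothetical callable that performs one request
    and raises an exception on failure, for example during a provider outage.
    """
    errors: list[str] = []
    for model in models:
        try:
            return model, send(model, prompt)
        except Exception as exc:  # in real code, catch your client's error types
            errors.append(f"{model}: {exc}")
    raise AllModelsFailed("; ".join(errors))


def flaky_send(model: str, prompt: str) -> str:
    """Fake transport that simulates an outage on the primary model."""
    if model == "openai/gpt-5":
        raise TimeoutError("simulated outage")
    return f"[{model}] answer to: {prompt}"


model_used, text = complete_with_fallback(flaky_send, "Explain recursion.")
print(model_used)  # anthropic/claude-sonnet-4-20250514
```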

Full setup instructions and SDK references are in the Merge Gateway docs.

Try GPT-5 through Merge Gateway

Route, observe, and control AI requests across providers from one API.
