Claude Sonnet 4.6 API: pricing, performance, and how to route requests

Claude Sonnet 4.6:
Everything you need to know about the model

Claude Sonnet 4.6 is an Anthropic model available through Merge Gateway. Use it with Gateway routing policies, spend controls, request logs, and a 1,000,000 token context window. It supports streaming, structured outputs, tool calling, and vision through at least one Gateway vendor route.

Claude Sonnet 4.6 performance*

Intelligence - general reasoning and knowledge

44%

Coding - code generation and problem-solving

46%

*Performance data is provided by Artificial Analysis and is subject to change.

Claude Sonnet 4.6 pricing

| Vendor | Input / 1M tokens | Output / 1M tokens | Zero data retention |
| --- | ---: | ---: | --- |
| Amazon Bedrock | $3.00 | $15.00 | Yes |
| Anthropic | $3.00 | $15.00 | No |
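To make the per-token rates concrete, here is a minimal sketch that estimates the cost of a single request from the prices in the table above. The helper name `estimate_cost` and the example token counts are illustrative assumptions, not part of any Merge Gateway SDK.

```python
from decimal import Decimal

# Prices per 1M tokens (USD) from the pricing table; both vendor routes list the same rates.
INPUT_PRICE_PER_M = Decimal("3.00")
OUTPUT_PRICE_PER_M = Decimal("15.00")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper)."""
    input_cost = Decimal(input_tokens) / TOKENS_PER_M * INPUT_PRICE_PER_M
    output_cost = Decimal(output_tokens) / TOKENS_PER_M * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Example: a 2,000-token prompt that produces an 800-token response.
cost = estimate_cost(2_000, 800)
print(f"${cost:.4f}")  # prints "$0.0180"
```

In this example the 800 output tokens cost twice as much as the 2,000 input tokens, which is why response length tends to dominate spend at these rates.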


Ready to try it out?

Start routing requests to hundreds of large language models in your product within minutes.

Start building for free

Get a demo

Route requests to Claude Sonnet 4.6 with Merge Gateway

Merge Gateway is a unified LLM API that lets your product route requests to Claude Sonnet 4.6 and every other major model through a single endpoint. You get built-in fallback routing, per-request cost tracking, zero data retention support, and observability without changing your application architecture.

To get started in seconds, add our Gateway Implementation skill to your project, or pick your preferred SDK below. Check out our other quick start skills here.

Install the Merge Gateway SDK

Python

$ pip install merge-gateway-sdk

Send a request

Python

from merge_gateway import MergeGateway

client = MergeGateway(api_key="YOUR_API_KEY")

response = client.responses.create(
    model="openai/gpt-5.2",
    input=[
        {"type": "message", "role": "system", "content": "You are a helpful programming tutor. Explain the concepts clearly with practical examples."},
        {"type": "message", "role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

print(response.output[0].content[0].text)

Try a different model

Swap the model string to route to a different provider. No other code changes needed.

Anthropic

response = client.responses.create(
    model="anthropic/claude-sonnet-4-20250514",
    input=[
        {"type": "message", "role": "system", "content": "You are a helpful programming tutor. Explain the concepts clearly with practical examples."},
        {"type": "message", "role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

Point to Gateway

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api-gateway.merge.dev/v1/openai",
)

Send a request

Use the standard chat.completions.create method. No provider prefix needed on the model name.

Python

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[
        {"role": "system", "content": "You are a helpful programming tutor. Explain the concepts clearly with practical examples."},
        {"role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

print(response.choices[0].message.content)

Install packages

npm install merge-gateway-ai-sdk-provider ai

Create the provider

TypeScript

import { createMergeGateway } from "merge-gateway-ai-sdk-provider";

const gateway = createMergeGateway({
  apiKey: "YOUR_API_KEY",
});

Send a request

Use generateText to send a request. Model names use the provider/model format.

TypeScript

import { generateText } from "ai";

const { text } = await generateText({
  model: gateway("openai/gpt-4o"),
  prompt: "Explain the concept of recursion in programming with a simple set of examples.",
});

console.log(text);

If you already have @ai-sdk/openai installed, point it at Gateway with a base URL change:

TypeScript

import { createOpenAI } from "@ai-sdk/openai";

const gateway = createOpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api-gateway.merge.dev/v1/ai-sdk",
});

// All generateText/streamText calls work unchanged

Point the Anthropic SDK to Gateway

Anthropic SDK

from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_API_KEY",
    base_url="https://api-gateway.merge.dev/v1/anthropic",
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

print(message.content[0].text)

Explore other models available in Merge Gateway

Qwen3-VL Flash

Qwen3-VL Plus

Qwen Flash

Qwen Plus

Titan Embed Text V2

Titan Text Large

UI-TARS-1.5-7B

Claude Sonnet 4.6 FAQ

If you have more questions about Claude Sonnet 4.6, we've addressed several below. This information was written in June 2026 and is subject to change.


What other models does Anthropic offer?

Anthropic structures its lineup across three tiers: a fast cost-efficient model, a general-purpose mid-tier, and a flagship reasoning model. Here are some other models Anthropic supports:

Claude Haiku 4.5: Claude Haiku 4.5 is Anthropic's fastest and most cost-efficient model, priced at $1.00 input and $5.00 output per million tokens. It is designed for high-volume, low-latency tasks where output quality is less critical than throughput and cost

Claude Sonnet 4.5: Claude Sonnet 4.5 is the predecessor to Claude Sonnet 4.6, sitting in the same mid-tier position. Teams currently on Sonnet 4.5 will find Sonnet 4.6 to be the natural upgrade with improved intelligence benchmarks at a similar price point

Claude Opus 4.6: Claude Opus 4.6 is Anthropic's higher-tier Opus model, positioned above Sonnet for tasks requiring deeper reasoning and more capable output. It carries a higher price per token in exchange for stronger benchmark performance

Claude Opus 4.8: Claude Opus 4.8 is Anthropic's current flagship model, priced at $5.00 input and $25.00 output per million tokens. It ranks at the top of Anthropic's lineup and is built for the most complex, high-stakes use cases where maximum intelligence is the priority

How does Claude Sonnet 4.6 differ from Anthropic's other models?

Claude Sonnet 4.6 occupies the mid-tier of Anthropic's lineup, sitting above Haiku on capability and below Opus on cost and raw intelligence score.

Pricing: Claude Sonnet 4.6 is priced at $3.00 per million input tokens and $15.00 per million output tokens. That is three times the input cost of Claude Haiku 4.5 but 40% lower input cost than Claude Opus 4.8 at $5.00 per million tokens

Intelligence ranking: Claude Sonnet 4.6 scores 44 on the Artificial Analysis Intelligence Index, placing it #3 out of 71 evaluated models in its class. This puts it well above Claude Haiku 4.5 and close to Opus-tier performance at a meaningfully lower price

Context window: Claude Sonnet 4.6 supports a 1,000,000 token context window, matching the context capacity of Opus 4.8 and making it suitable for very long document processing without chunking

Speed: Output speed is 46.8 tokens per second with a time to first token of 1.49 seconds. This is slower than Claude Haiku 4.5 but faster than most reasoning-tier models, making it viable for near-real-time use cases

Output verbosity: Claude Sonnet 4.6 generates approximately 14 million output tokens during evaluation versus an average of 7.9 million across comparable models. At $15.00 per million output tokens, this verbosity has meaningful cost implications at scale

Capabilities: Claude Sonnet 4.6 accepts text and image inputs and produces text output. It is a non-reasoning model, meaning it does not expose a chain-of-thought before answering

Claude Sonnet 4.6 is the right choice when you need near-Opus intelligence at a lower price point and your use cases involve multimodal input or long context without requiring extended reasoning traces.
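The price and verbosity comparisons above can be checked with a few lines of arithmetic. This sketch uses only figures quoted in this article; the variable names are illustrative.

```python
# Figures quoted in this article.
SONNET_INPUT = 3.00  # USD per 1M input tokens, Claude Sonnet 4.6
HAIKU_INPUT = 1.00   # USD per 1M input tokens, Claude Haiku 4.5
OPUS_INPUT = 5.00    # USD per 1M input tokens, Claude Opus 4.8
SONNET_EVAL_OUTPUT_M = 14.0   # millions of output tokens generated during evaluation
AVERAGE_EVAL_OUTPUT_M = 7.9   # average across comparable models

haiku_multiple = SONNET_INPUT / HAIKU_INPUT           # how many times Haiku's input price
opus_discount = 1 - SONNET_INPUT / OPUS_INPUT         # fraction below Opus's input price
verbosity_ratio = SONNET_EVAL_OUTPUT_M / AVERAGE_EVAL_OUTPUT_M

print(f"{haiku_multiple:.1f}x Haiku input price")      # 3.0x
print(f"{opus_discount:.0%} below Opus input price")   # 40%
print(f"{verbosity_ratio:.2f}x average output volume") # 1.77x
```

The verbosity ratio of about 1.77 is the source of the "roughly 1.8x" figure used elsewhere on this page.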

What models should I consider using alongside Claude Sonnet 4.6?

No single model is optimal for every task. Here are models worth pairing with Claude Sonnet 4.6 depending on what your product needs:

Claude Haiku 4.5: Route high-volume, low-complexity requests, such as classification, extraction, or short-form summarization, to Haiku 4.5 at $1.00 per million input tokens. Reserving Sonnet 4.6 for tasks that genuinely need its intelligence tier can cut costs significantly

Claude Opus 4.8: When a task requires the deepest available reasoning from Anthropic, such as complex multi-step analysis or difficult code generation, route it to Opus 4.8. Use Sonnet 4.6 as the default and escalate to Opus only when output quality falls short

Gemini 3 Flash: For latency-sensitive workflows where near-instant responses matter more than intelligence depth, Gemini 3 Flash offers very high output speed at a lower cost tier. Route streaming interfaces or live completions there while Sonnet 4.6 handles heavier workloads

GPT-5 Mini: For budget-conscious, high-throughput workloads that need OpenAI's architecture, GPT-5 Mini provides a cost-efficient alternative at roughly $0.25 per million input tokens. Useful as a fallback when Sonnet 4.6 usage pushes against budget limits

Llama 4 Scout: For workloads that require very long context retrieval at low cost, Llama 4 Scout's 10 million token context window and open-weight pricing make it a strong complement for document retrieval pipelines where Sonnet 4.6 handles the generation step
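As an illustration of the pairing strategy above, the following sketch picks a model per task type: cheap tasks go to Haiku, the hardest go to Opus, and everything else defaults to Sonnet 4.6. It is a client-side illustration only, not how Gateway routing policies are implemented. The task labels and the Haiku and Opus model strings are assumptions; confirm the real identifiers in the Merge Gateway docs.

```python
# Model strings are assumptions for illustration; confirm exact slugs before use.
SONNET = "anthropic/claude-sonnet-4-6"
HAIKU = "anthropic/claude-haiku-4-5"   # hypothetical slug
OPUS = "anthropic/claude-opus-4-8"     # hypothetical slug

# Hypothetical task labels that an application might attach to each request.
TASK_ROUTES = {
    "classification": HAIKU,
    "extraction": HAIKU,
    "short_summary": HAIKU,
    "multi_step_analysis": OPUS,
    "hard_codegen": OPUS,
}


def pick_model(task_type: str) -> str:
    """Return the model for a task type, defaulting to Sonnet 4.6."""
    return TASK_ROUTES.get(task_type, SONNET)


print(pick_model("classification"))  # anthropic/claude-haiku-4-5
print(pick_model("chat"))            # anthropic/claude-sonnet-4-6
```

Keeping Sonnet 4.6 as the default means unlabeled traffic still gets mid-tier quality, while only explicitly tagged requests move to a cheaper or more expensive model.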

What are the challenges of using Claude Sonnet 4.6 in my product?

Like any production LLM, Claude Sonnet 4.6 comes with tradeoffs worth planning for:

Verbose output inflates costs: Claude Sonnet 4.6 generates roughly 1.8x the output tokens of the average model in its class. At $15.00 per million output tokens, this verbosity compounds quickly in production workloads where response length directly drives spend

Output pricing at scale: The $15.00 per million output token rate is high relative to cost-efficient alternatives. Workloads that generate long responses at volume, such as document drafting or detailed code explanations, need careful budget controls to avoid runaway costs

No reasoning traces: Claude Sonnet 4.6 is a non-reasoning model. For tasks that benefit from visible chain-of-thought, such as complex math, multi-step logic, or audit-friendly outputs, you will need to route to a reasoning-capable model or prompt for explicit step-by-step output

Provider dependency: Running exclusively on Anthropic creates fragility when the provider experiences an outage, rate limits your traffic, or deprecates a model version. Anthropic has moved through model versions quickly in the Claude 4 series

Cost at scale: At $3.00 input and $15.00 output per million tokens, Sonnet 4.6 is not a budget model. Without project-level spend caps and active routing controls, costs can scale unexpectedly as request volume grows
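To show how a spend cap interacts with these rates, here is a minimal in-process sketch of a budget tracker with a soft limit (warn) and a hard limit (block). The class name, status strings, and limits are hypothetical; this is not the Gateway's budget feature, only an illustration of the idea.

```python
from dataclasses import dataclass

INPUT_PRICE_PER_M = 3.00    # USD per 1M input tokens (from this page)
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output tokens (from this page)


@dataclass
class BudgetTracker:
    """Hypothetical tracker: warns past the soft limit, blocks past the hard limit."""
    soft_limit_usd: float
    hard_limit_usd: float
    spent_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> str:
        cost = (input_tokens / 1e6 * INPUT_PRICE_PER_M
                + output_tokens / 1e6 * OUTPUT_PRICE_PER_M)
        if self.spent_usd + cost > self.hard_limit_usd:
            return "blocked"  # request would exceed the hard cap; spend is unchanged
        self.spent_usd += cost
        return "warn" if self.spent_usd > self.soft_limit_usd else "ok"


tracker = BudgetTracker(soft_limit_usd=0.05, hard_limit_usd=0.08)
# Each request: 1,000 input tokens and 2,000 output tokens, about $0.033.
statuses = [tracker.record(1_000, 2_000) for _ in range(3)]
print(statuses)  # ['ok', 'warn', 'blocked']
```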

Why should I use Merge Gateway to route LLM requests with Claude Sonnet 4.6 and every other model?

Using Claude Sonnet 4.6 through Merge Gateway gives you access to the model itself and the infrastructure layer around it:

One API, every provider: Access Claude Sonnet 4.6 and every other major LLM through a single endpoint and API key. Change providers by swapping the model string, with no application code changes required

Intelligent routing and automatic failover: Merge routes around Anthropic outages automatically. Routing policies based on cost, latency, or quality can reduce spend by 40–60% without touching your application code. Given Sonnet 4.6's output pricing, routing simpler tasks to Haiku or another lower-cost model has a direct impact on spend

Cost governance: Set hard or soft project budgets so Claude Sonnet 4.6 spend stays within plan. Every request is attributed to a model, project, and tag in a unified billing dashboard across all providers

Build Your Own Router: Define what "best" means for your traffic by selecting from curated ML benchmarks or adding your own eval scores. The router scores each available model against your weights and picks the winner per request, with a plain-language explanation of every decision

Security and compliance controls: Apply DLP rules and prompt injection protection before every request reaches Anthropic. Enforce per-project model and region policies without adding that logic to your application
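The failover idea above can be sketched generically: try each model in priority order and return the first successful response. This is only an illustration of the pattern under stated assumptions, not Merge Gateway's implementation. The function `call_with_failover`, the `send` callback, and the model strings are all hypothetical.

```python
from typing import Callable, Sequence


def call_with_failover(models: Sequence[str], send: Callable[[str], str]) -> tuple[str, str]:
    """Try each model in order; return (model, response) for the first success."""
    errors = []
    for model in models:
        try:
            return model, send(model)
        except Exception as exc:  # broad catch is deliberate for this sketch
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Stand-in for a real API call: the primary model simulates an outage.
def fake_send(model: str) -> str:
    if model == "anthropic/claude-sonnet-4-6":
        raise ConnectionError("simulated provider outage")
    return f"response from {model}"


used, reply = call_with_failover(
    ["anthropic/claude-sonnet-4-6", "anthropic/claude-haiku-4-5"],  # hypothetical slugs
    fake_send,
)
print(used)   # anthropic/claude-haiku-4-5
print(reply)  # response from anthropic/claude-haiku-4-5
```

Handling this in a gateway rather than in application code keeps the retry order configurable without a redeploy, which is the design choice the section above describes.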

How can I start using Merge Gateway to route requests with Claude Sonnet 4.6?

Getting Claude Sonnet 4.6 running through Merge Gateway takes a few minutes:

1. Create an account and get your API key from the dashboard.

2. Install the Merge Gateway SDK: run pip install merge-gateway-sdk (Python) or npm install merge-gateway-sdk (Node). Alternatively, if you're already using the OpenAI SDK, set base_url = "https://api-gateway.merge.dev/v1/openai" and your existing code works as-is.

3. Make your first request using the provider/model format. For Claude Sonnet 4.6, the model string is anthropic/claude-sonnet-4-6 (confirm the exact dated slug with Anthropic's API docs). Swap the model string to route to any other provider without changing anything else.

4. Configure a routing policy in the dashboard to set failover behavior, cost limits, and optimization strategy. A practical starting point: name Claude Sonnet 4.6 as primary, with Claude Haiku 4.5 as a fallback for simpler tasks and Claude Opus 4.8 for ceiling-level requests.

Full setup instructions and SDK references are in the Merge Gateway docs.

Try Claude Sonnet 4.6 through Merge Gateway

Route, observe, and control AI requests across providers from one API.
