Gemini 3 Flash Preview API: pricing, performance, and how to route requests

Gemini 3 Flash Preview: Everything you need to know about the model

Gemini 3 Flash Preview is a Google model available through Merge Gateway. Use it with Gateway routing policies, spend controls, request logs, and a 1,048,576-token context window. It supports streaming, structured outputs, tool calling, and vision through at least one Gateway vendor route.

Gemini 3 Flash Preview pricing

| Vendor | Input / 1M tokens | Output / 1M tokens | Zero data retention |
| --- | ---: | ---: | --- |
| Google | $0.50 | $3.00 | No |
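
As a rough illustration of how the per-token prices above translate into per-request cost, the sketch below applies the listed Google rates to a pair of token counts. The helper name `estimate_cost_usd` is hypothetical and not part of any Merge SDK; the token counts are assumed inputs.

```python
# Listed prices for Gemini 3 Flash Preview via the Google route (USD per 1M tokens).
INPUT_USD_PER_MILLION = 0.50
OUTPUT_USD_PER_MILLION = 3.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD from token counts (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: 20,000 prompt tokens and 1,000 completion tokens.
    print(f"${estimate_cost_usd(20_000, 1_000):.4f}")
```

For that example, the input side costs $0.010 and the output side $0.003, so output can be a meaningful share of the bill even when completions are short.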

Ready to try it out?

Start routing requests to hundreds of large language models in your product within minutes.

Start building for free

Get a demo

Route requests to Gemini 3 Flash Preview with Merge Gateway

Merge Gateway is a unified LLM API that lets your product route requests to Gemini 3 Flash Preview and every other major model through a single endpoint. You get built-in fallback routing, per-request cost tracking, zero data retention support, and observability without changing your application architecture.

To get started in seconds, add our Gateway Implementation skill to your project, or pick your preferred SDK below. Check out our other quick start skills here.

Install the Merge Gateway SDK

Python

$ pip install merge-gateway-sdk

Send a request

Python

from merge_gateway import MergeGateway

client = MergeGateway(api_key="YOUR_API_KEY")

response = client.responses.create(
    model="openai/gpt-5.2",
    input=[
        {"type": "message", "role": "system", "content": "You are a helpful programming tutor. Explain the concepts clearly with practical examples."},
        {"type": "message", "role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

print(response.output[0].content[0].text)

Try a different model

Swap the model string to route to a different provider. No other code changes needed.

Anthropic

response = client.responses.create(
    model="anthropic/claude-sonnet-4-20250514",
    input=[
        {"type": "message", "role": "system", "content": "You are a helpful programming tutor. Explain the concepts clearly with practical examples."},
        {"type": "message", "role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

Point the OpenAI SDK to Gateway

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api-gateway.merge.dev/v1/openai",
)

Send a request

Use the standard chat.completions.create method. No provider prefix needed on the model name.

Python

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[
        {"role": "system", "content": "You are a helpful programming tutor. Explain the concepts clearly with practical examples."},
        {"role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

print(response.choices[0].message.content)
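
Since the model is listed as supporting streaming, the same OpenAI-style call can in principle be consumed incrementally. The sketch below is a minimal illustration, not confirmed Gateway behavior: it assumes the Gateway's OpenAI-compatible endpoint honors the standard `stream=True` flag, and that `gemini-3-flash-preview` (no provider prefix) is the right model name on that endpoint. `stream_text` and `fake_client` are hypothetical helpers; the fake client only mimics the chunk shape so the example runs offline.

```python
from types import SimpleNamespace


def stream_text(client, model: str, prompt: str):
    """Yield text fragments as they arrive from a streaming chat completion."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # standard OpenAI SDK flag; Gateway support is assumed here
    )
    for chunk in stream:
        # Some chunks carry no choices or no text (e.g. role or finish markers).
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


def fake_client(fragments):
    """Offline stand-in that mimics the OpenAI client's streaming chunk shape."""
    def create(**kwargs):
        return iter(
            SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=f))])
            for f in fragments
        )
    return SimpleNamespace(chat=SimpleNamespace(completions=SimpleNamespace(create=create)))


if __name__ == "__main__":
    demo = fake_client(["Recursion is ", None, "a function calling itself."])
    # Assumed model name: the un-prefixed form of google/gemini-3-flash-preview.
    for piece in stream_text(demo, "gemini-3-flash-preview", "Explain recursion."):
        print(piece, end="", flush=True)
    print()
```

To try it against the Gateway, pass the `OpenAI` client configured with the Gateway base URL in place of `fake_client(...)`.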

Install packages

npm install merge-gateway-ai-sdk-provider ai

Create the provider

TypeScript

import { createMergeGateway } from "merge-gateway-ai-sdk-provider";

const gateway = createMergeGateway({
  apiKey: "YOUR_API_KEY",
});

Send a request

Use generateText to send a request. Model names use the provider/model format.

TypeScript

import { generateText } from "ai";

const { text } = await generateText({
  model: gateway("openai/gpt-4o"),
  prompt: "Explain the concept of recursion in programming with a simple set of examples.",
});

console.log(text);

If you already have @ai-sdk/openai installed, point it at Gateway with a base URL change:

TypeScript

import { createOpenAI } from "@ai-sdk/openai";

const gateway = createOpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api-gateway.merge.dev/v1/ai-sdk",
});

// All generateText/streamText calls work unchanged

Point the Anthropic SDK to Gateway

Anthropic SDK

from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_API_KEY",
    base_url="https://api-gateway.merge.dev/v1/anthropic",
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

print(message.content[0].text)

Explore other models available in Merge Gateway

Gemini 3.5 Flash

Gemini 3 Flash

Gemini 3 Pro

Gemini 3 Pro Preview

Gemma 3 12B

Gemma 3 27B

Gemma 3 4B

GLM-4.5

GLM-4.5-Air

GLM-4.5-AirX

GLM-4.6

GLM 4.7

GLM-4.7

GLM 4.7 Flash

GLM-4.7 FlashX

GLM-5

GLM-5-Turbo

GPT-3.5 Turbo

GPT-3.5 Turbo 16K

GPT-4

GPT-4.1

GPT-4.1 Mini

GPT-4.1 Nano

Gemini 3 Flash Preview FAQ

In case you have other questions about Gemini 3 Flash Preview, we've answered a few more below. The information below was written in June 2026 and is subject to change.

What other models does Google offer?

Google's Gemini lineup spans from lightweight flash-tier models optimized for throughput to flagship reasoning models built for complex analytical tasks. Here are some other models Google supports:

Gemini 3 Flash: Gemini 3 Flash is the stable release counterpart to this preview, offering the same cost-efficient Gemini 3 Flash tier with the reliability guarantees and API stability that production deployments require, making it the natural graduation path once this preview reaches general availability

Gemini 3 Pro: Gemini 3 Pro is Google's flagship model in the Gemini 3 generation, priced at $2.00 per million input tokens and $12.00 per million output tokens, suited for the most demanding analytical, agentic, and multi-step reasoning tasks where accuracy outweighs cost

Gemini 2.5 Pro: Gemini 2.5 Pro is Google's proven flagship from the prior generation, supporting a 1M-token context window with strong reasoning performance and offering a stable production alternative for teams that aren't yet ready to migrate to Gemini 3 models

Gemini 2.5 Flash: Gemini 2.5 Flash is a stable mid-tier model producing 203.2 tokens per second at $0.30 per million input tokens and $2.50 per million output tokens, well-suited for general-purpose instruction-following tasks at predictable cost

Gemini 2.5 Flash Lite: Gemini 2.5 Flash Lite is the fastest and least expensive option in Google's current lineup at $0.10 per million input tokens and $0.40 per million output tokens, delivering the highest output speed across evaluated models and designed for high-volume workloads where throughput dominates

How does Gemini 3 Flash Preview differ from Google's other models?

Gemini 3 Flash Preview is an early-access release of the Gemini 3 Flash model, offering the cost-efficient tier of the next-generation Gemini family before it reaches stable availability.

Preview status: Gemini 3 Flash Preview is a pre-release version, which means rate limits, API behavior, and availability guarantees differ from stable models. The stable Gemini 3 Flash has the same pricing structure but without the uncertainty that comes with preview-stage access

Pricing: Input is priced at $0.50 per million tokens and output at $3.00 per million tokens. That's one-quarter of Gemini 3 Pro's input cost ($2.00 per million) and one-quarter of its output cost ($12.00 per million), placing it firmly in the cost-efficient tier

Speed: Output speed is 169.3 tokens per second (as of 06/04/2026), which is faster than many mid-tier models but slower than Gemini 2.5 Flash (203.2 t/s) and Gemini 2.5 Flash Lite (259.8 t/s). Time to first token is 39.41 seconds (as of 06/04/2026), which is notably high and affects latency-sensitive applications

Intelligence Index: Ranked #20 out of 71 non-reasoning models on the Artificial Analysis Intelligence Index (as of 06/04/2026), placing it above the median for its tier and reflecting the intelligence gains that the Gemini 3 generation brings at this price point

Context window: Supports 1M tokens of input context, matching Gemini 3 Pro and Gemini 2.5 Pro, and far exceeding Gemini 2.5 Flash Lite for long-document workflows

Gemini 3 Flash Preview is worth evaluating for teams that want early access to Gemini 3 generation intelligence at flash pricing, but it isn't a production-ready substitute for stable models until it reaches general availability.
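
To see what the speed figures above imply for end-to-end latency, the sketch below adds the quoted time to first token to the time needed to generate a reply at the quoted output speed. It assumes a steady generation rate and ignores network overhead, so treat the result as a rough estimate; `estimate_response_seconds` is a hypothetical helper.

```python
# Figures quoted above (as of 06/04/2026).
TIME_TO_FIRST_TOKEN_S = 39.41
OUTPUT_TOKENS_PER_S = 169.3


def estimate_response_seconds(output_tokens: int) -> float:
    """Approximate wall-clock time: wait for the first token, then generate at a steady rate."""
    return TIME_TO_FIRST_TOKEN_S + output_tokens / OUTPUT_TOKENS_PER_S


if __name__ == "__main__":
    for n in (100, 500, 2000):
        print(f"{n} output tokens: about {estimate_response_seconds(n):.1f} s")
```

Under these assumptions a 500-token reply takes roughly 42 seconds, of which about 3 seconds is generation. The time to first token dominates, so shortening completions does little for perceived latency.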

What models should I consider using alongside Gemini 3 Flash Preview?

No single model is optimal for every task. Here are models worth pairing with Gemini 3 Flash Preview depending on what your product needs:

Gemini 3 Flash (Google): For any traffic that requires stable API guarantees or predictable rate limits, routing to the stable Gemini 3 Flash as a fallback ensures continuity if the preview endpoint hits restrictions or is temporarily unavailable

Gemini 3 Pro (Google): For the subset of requests requiring deep multi-step reasoning, scientific analysis, or complex agentic behavior, routing those specifically to Gemini 3 Pro while keeping Gemini 3 Flash Preview as the default for standard tasks separates cost tiers cleanly

Claude Sonnet 4.5 (Anthropic): When structured output adherence, low hallucination rates on document-heavy prompts, or strict JSON schema compliance are required, Claude Sonnet 4.5 provides strong cross-provider redundancy and consistent formatting behavior

GPT-4.1 Mini (OpenAI): For high-volume, lower-complexity tasks like classification, entity extraction, or summarization where Google's API is unavailable or rate-limited, GPT-4.1 Mini serves as a reliable low-cost fallback with broad regional availability

Gemini 2.5 Flash Lite (Google): For bulk preprocessing steps before a Gemini 3 Flash Preview call, such as input filtering or topic classification, Gemini 2.5 Flash Lite at $0.10 per million input tokens reduces pipeline cost without affecting the quality of the main inference step

What are the challenges of using Gemini 3 Flash Preview in my product?

Like any LLM used in production, Gemini 3 Flash Preview comes with tradeoffs worth planning for:

Preview instability: As a pre-release model, Gemini 3 Flash Preview isn't subject to the same SLA, rate limit, or deprecation timeline commitments that apply to stable Google models. Shipping it as the primary model in a production path without a fallback creates real availability risk

Very high time to first token: At 39.41 seconds time to first token (as of 06/04/2026), Gemini 3 Flash Preview is among the slowest models for response initiation. Interactive use cases like chat interfaces will feel sluggish, and streaming alone does not shorten the wait for the first token, so plan for explicit loading states alongside token streaming

Provider dependency: Routing all traffic to a single Google endpoint means that quota restrictions, regional outages, or preview-specific rate limits affect every workflow at once. A cross-provider fallback is especially important for a preview-stage model

Cost at scale: At $3.00 per million output tokens, output costs accumulate quickly in high-throughput deployments. Applications with long completions should monitor output token volume closely, and those with multi-turn conversation histories or verbose system prompts should also track input tokens, which are billed at $0.50 per million

No extended reasoning: Gemini 3 Flash Preview doesn't include a reasoning or extended thinking mode. Tasks that benefit from chain-of-thought decomposition, such as competitive math or complex debugging, may underperform compared to Gemini 3 Pro or other reasoning-capable models
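
The fallback advice above can be illustrated with a small client-side sketch. This is an illustration of the general pattern only, not how Merge Gateway implements failover: `call_with_fallback` is a hypothetical helper, `send` is any callable you supply that sends a prompt to the given model, and `google/gemini-3-flash` is an assumed model string for the stable release.

```python
from typing import Callable, Sequence


def call_with_fallback(models: Sequence[str], send: Callable[[str], str]) -> tuple[str, str]:
    """Try each model in order; return (model_used, response_text) from the first success."""
    errors = []
    for model in models:
        try:
            return model, send(model)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


if __name__ == "__main__":
    def flaky_send(model: str) -> str:
        # Stand-in for a real request: pretend the preview endpoint is rate limited.
        if model.endswith("-preview"):
            raise TimeoutError("rate limited")
        return f"response from {model}"

    chain = ["google/gemini-3-flash-preview", "google/gemini-3-flash"]  # second ID is assumed
    print(call_with_fallback(chain, flaky_send))
```

In practice a Gateway routing policy is meant to handle this for you; the sketch only shows the ordering logic such a policy expresses.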

Why should I use Merge Gateway to route LLM requests with Gemini 3 Flash Preview and every other model?

Using Gemini 3 Flash Preview through Merge Gateway gives you access to the model itself and the infrastructure layer around it:

One API, every provider: Access Gemini 3 Flash Preview and every other major LLM through a single endpoint and API key. Swap the model string to change providers without any application code changes, which is especially useful when graduating from preview to the stable Gemini 3 Flash release

Intelligent routing and automatic failover: Merge routes around Google outages and preview-tier availability constraints automatically. Routing policies based on cost, latency, or quality can reduce spend by 40-60% without touching your application code

Cost governance: Set hard or soft project budgets so Gemini 3 Flash Preview spend stays within plan. Every request is attributed to a model, project, and tag in a unified billing dashboard across all providers

Build Your Own Router: Define what "best" means for your traffic by selecting from curated ML benchmarks or adding your own eval scores. The router scores each available model against your weights and picks the winner per request, with a plain-language explanation of every decision

Security and compliance controls: Apply DLP rules and prompt injection protection before every request reaches Google. Enforce per-project model and region policies without adding that logic to your application
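
The weighted scoring idea behind Build Your Own Router can be sketched in a few lines. This is an assumed, simplified version and not Merge's actual router: `pick_model` is a hypothetical function, the metric names are placeholders, and the scores are made-up values normalized so that higher is better.

```python
def pick_model(
    candidates: dict[str, dict[str, float]],
    weights: dict[str, float],
) -> tuple[str, dict[str, float]]:
    """Return the highest-scoring model and all scores; each score is a weighted sum of metrics."""
    scores = {
        name: sum(weights.get(metric, 0.0) * value for metric, value in metrics.items())
        for name, metrics in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Made-up, normalized scores (0 to 1, higher is better) for two placeholder models.
    candidates = {
        "model-a": {"quality": 0.8, "cost": 0.7, "latency": 0.2},
        "model-b": {"quality": 0.6, "cost": 0.9, "latency": 0.9},
    }
    weights = {"quality": 0.6, "cost": 0.2, "latency": 0.2}
    best, scores = pick_model(candidates, weights)
    print(best, scores)
```

Shifting weight toward quality favors model-a in this toy data, while giving latency and cost more weight favors model-b, which is the kind of tradeoff the weights are meant to encode.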

How can I start using Merge Gateway to route requests with Gemini 3 Flash Preview?

Getting Gemini 3 Flash Preview running through Merge Gateway takes a few minutes:

1. Create an account and get your API key from the dashboard.

2. Install the Merge Gateway SDK: run pip install merge-gateway-sdk (Python) or npm install merge-gateway-sdk (Node). Alternatively, if you're already using the OpenAI SDK, set base_url = "https://api-gateway.merge.dev/v1/openai" and your existing code works as-is.

3. Make your first request using the provider/model format. For Gemini 3 Flash Preview, the model string is google/gemini-3-flash-preview. Swap the model string to route to any other provider without changing anything else.

4. Configure a routing policy in the dashboard to set failover behavior, cost limits, and optimization strategy. Your first policy can be as simple as naming Gemini 3 Flash Preview as primary with the stable Gemini 3 Flash as fallback.
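
Steps 2 and 3 can be sketched as follows, reusing the `client.responses.create` call shape from the SDK example earlier on this page with the model string from step 3. The helper names `build_request` and `send` are hypothetical, and the default system prompt is a placeholder. The SDK import is deferred so the request builder can be inspected without the package installed.

```python
def build_request(user_prompt: str, system_prompt: str = "You are a helpful assistant.") -> dict:
    """Build keyword arguments in the shape of the client.responses.create call shown earlier."""
    return {
        "model": "google/gemini-3-flash-preview",  # model string given in step 3
        "input": [
            {"type": "message", "role": "system", "content": system_prompt},
            {"type": "message", "role": "user", "content": user_prompt},
        ],
    }


def send(api_key: str, user_prompt: str) -> str:
    """Send the request via the Merge Gateway SDK (requires `pip install merge-gateway-sdk`)."""
    from merge_gateway import MergeGateway  # imported lazily so build_request works without the SDK

    client = MergeGateway(api_key=api_key)
    response = client.responses.create(**build_request(user_prompt))
    return response.output[0].content[0].text


if __name__ == "__main__":
    # Inspect the payload without making a network call.
    print(build_request("Explain recursion with a simple example."))
```

Calling `send("YOUR_API_KEY", "...")` would issue the actual request; switching providers then only requires changing the `model` value in `build_request`.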

Full setup instructions and SDK references are in the Merge Gateway docs.

Try Gemini 3 Flash Preview through Merge Gateway

Route, observe, and control AI requests across providers from one API.
