Gemini 3 Flash Preview is a Google model available through Merge Gateway. Use it with Gateway routing policies, spend controls, request logs, and a 1,048,576-token context window. It supports streaming, structured outputs, tool calling, and vision through at least one Gateway vendor route.



Route requests to Gemini 3 Flash Preview with Merge Gateway
With the Merge Gateway SDK, requests use the provider/model format; to target Gemini 3 Flash Preview, pass google/gemini-3-flash-preview as the model string.

Merge Gateway SDK (Python):

```shell
pip install merge-gateway-sdk
```

```python
from merge_gateway import MergeGateway

client = MergeGateway(api_key="YOUR_API_KEY")

response = client.responses.create(
    model="openai/gpt-5.2",
    input=[
        {"type": "message", "role": "system", "content": "You are a helpful programming tutor. Explain the concepts clearly with practical examples."},
        {"type": "message", "role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

print(response.output[0].content[0].text)
```

Switching providers only changes the model string:

```python
# Same client and input; only the model string differs.
response = client.responses.create(
    model="anthropic/claude-sonnet-4-20250514",
    input=[
        {"type": "message", "role": "system", "content": "You are a helpful programming tutor. Explain the concepts clearly with practical examples."},
        {"type": "message", "role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)
```

OpenAI SDK, pointed at the Gateway:

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api-gateway.merge.dev/v1/openai",
)

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[
        {"role": "system", "content": "You are a helpful programming tutor. Explain the concepts clearly with practical examples."},
        {"role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

print(response.choices[0].message.content)
```

Vercel AI SDK with the Merge Gateway provider:

```shell
npm install merge-gateway-ai-sdk-provider ai
```

```typescript
import { createMergeGateway } from "merge-gateway-ai-sdk-provider";
import { generateText } from "ai";

const gateway = createMergeGateway({
  apiKey: "YOUR_API_KEY",
});

const { text } = await generateText({
  model: gateway("openai/gpt-4o"),
  prompt: "Explain the concept of recursion in programming with a simple set of examples.",
});

console.log(text);
```

Or reuse the AI SDK's OpenAI provider with the Gateway's base URL:

```typescript
import { createOpenAI } from "@ai-sdk/openai";

const gateway = createOpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api-gateway.merge.dev/v1/ai-sdk",
});

// All generateText/streamText calls work unchanged
```

Anthropic SDK, pointed at the Gateway:

```python
from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_API_KEY",
    base_url="https://api-gateway.merge.dev/v1/anthropic",
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

print(message.content[0].text)
```
Gemini 3 Flash Preview FAQ
What other models does Google offer?
Google's Gemini lineup spans from lightweight flash-tier models optimized for throughput to flagship reasoning models built for complex analytical tasks. Here are some other models Google supports:
- Gemini 3 Flash: Gemini 3 Flash is the stable release counterpart to this preview. It offers the same cost-efficient Gemini 3 Flash tier with the reliability guarantees and API stability that production deployments require, which makes it the natural migration path for production traffic currently on the preview
- Gemini 3 Pro: Gemini 3 Pro is Google's flagship model in the Gemini 3 generation, priced at $2.00 per million input tokens and $12.00 per million output tokens, suited for the most demanding analytical, agentic, and multi-step reasoning tasks where accuracy outweighs cost
- Gemini 2.5 Pro: Gemini 2.5 Pro is Google's proven flagship from the prior generation, supporting a 1M-token context window with strong reasoning performance and offering a stable production alternative for teams that aren't yet ready to migrate to Gemini 3 models
- Gemini 2.5 Flash: Gemini 2.5 Flash is a stable mid-tier model producing 203.2 tokens per second at $0.30 per million input tokens and $2.50 per million output tokens, well-suited for general-purpose instruction-following tasks at predictable cost
- Gemini 2.5 Flash Lite: Gemini 2.5 Flash Lite is the fastest and least expensive option in Google's current lineup at $0.10 per million input tokens and $0.40 per million output tokens, delivering the highest output speed across evaluated models and designed for high-volume workloads where throughput dominates
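To see how these price points play out on a concrete workload, here is a minimal Python sketch that applies the per-million-token rates quoted on this page, including the $0.50 input / $3.00 output rates for Gemini 3 Flash Preview. The model names are plain labels rather than Gateway model strings, the workload is hypothetical, and the prices are snapshots that may change.

```python
# Per-million-token prices (USD) as quoted on this page; check current rates before relying on them.
PRICES_PER_MILLION = {
    "Gemini 3 Flash Preview": (0.50, 3.00),
    "Gemini 3 Pro": (2.00, 12.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 2.5 Flash Lite": (0.10, 0.40),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request: each token count times its per-million price, divided by 1M."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def rank_by_cost(input_tokens: int, output_tokens: int, requests: int = 1) -> list[tuple[str, float]]:
    """Total cost of a workload for every model, cheapest first."""
    totals = [
        (model, request_cost(model, input_tokens, output_tokens) * requests)
        for model in PRICES_PER_MILLION
    ]
    return sorted(totals, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical workload: 1,000,000 requests, each with 2,000 input and 500 output tokens.
    for model, total in rank_by_cost(2_000, 500, requests=1_000_000):
        print(f"{model:<24} ${total:,.2f}")
```

Because both of Gemini 3 Flash Preview's rates are one-quarter of Gemini 3 Pro's, the preview costs a quarter as much as Pro for any mix of input and output tokens. The Gemini 2.5 Flash tiers are cheaper still on both rates, so the choice comes down to whether the quality difference justifies the gap on your own evals.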
How does Gemini 3 Flash Preview differ from Google's other models?
Gemini 3 Flash Preview is an early-access release of the Gemini 3 Flash model, offering the cost-efficient tier of the next-generation Gemini family before it reaches stable availability.
- Preview status: Gemini 3 Flash Preview is a pre-release version, which means rate limits, API behavior, and availability guarantees differ from stable models. The stable Gemini 3 Flash offers the same pricing structure without the uncertainty that comes with preview-stage access
- Pricing: Input is priced at $0.50 per million tokens and output at $3.00 per million tokens. That's one-quarter of Gemini 3 Pro's price on both input ($2.00 per million) and output ($12.00 per million), placing it firmly in the cost-efficient tier
- Speed: Output speed is 169.3 tokens per second (as of 06/04/2026), which is faster than many mid-tier models but slower than Gemini 2.5 Flash (203.2 t/s) and Gemini 2.5 Flash Lite (259.8 t/s). Time to first token is 39.41 seconds (as of 06/04/2026), which is notably high and affects latency-sensitive applications
- Intelligence Index: Ranked #20 out of 71 non-reasoning models on the Artificial Analysis Intelligence Index (as of 06/04/2026), placing it above the median for its tier and reflecting the intelligence gains that the Gemini 3 generation brings at this price point
- Context window: Supports 1M tokens (1,048,576) of input context, matching Gemini 3 Pro and Gemini 2.5 Pro and leaving ample room for long-document workflows
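The speed and latency figures above combine into a rough end-to-end estimate: total time is about the time to first token plus output tokens divided by output speed. This sketch assumes a constant generation rate and reuses the quoted snapshot figures; real latency will vary by request and route.

```python
# Figures quoted above (as of 06/04/2026); they are measurement snapshots, not guarantees.
TTFT_SECONDS = 39.41
OUTPUT_TOKENS_PER_SECOND = 169.3


def estimated_response_seconds(output_tokens: int,
                               ttft: float = TTFT_SECONDS,
                               tokens_per_second: float = OUTPUT_TOKENS_PER_SECOND) -> float:
    """Rough end-to-end time: wait for the first token, then generate at a constant rate."""
    return ttft + output_tokens / tokens_per_second


def ttft_share(output_tokens: int) -> float:
    """Fraction of the estimated total spent waiting for the first token."""
    return TTFT_SECONDS / estimated_response_seconds(output_tokens)


if __name__ == "__main__":
    for tokens in (100, 500, 2_000):
        total = estimated_response_seconds(tokens)
        print(f"{tokens:>5} output tokens: ~{total:.1f} s total, "
              f"{ttft_share(tokens):.0%} of it before the first token")
```

Under these assumptions even a 500-token answer spends over 90% of its estimated time waiting for the first token, so time to first token, not throughput, is the number to watch for interactive features.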
Gemini 3 Flash Preview is worth evaluating for teams that want early access to Gemini 3 generation intelligence at flash pricing, but it isn't a production-ready substitute for stable models until it reaches general availability.
What models should I consider using alongside Gemini 3 Flash Preview?
No single model is optimal for every task. Here are models worth pairing with Gemini 3 Flash Preview depending on what your product needs:
- Gemini 3 Flash (Google): For any traffic that requires stable API guarantees or predictable rate limits, routing to the stable Gemini 3 Flash as a fallback ensures continuity if the preview endpoint hits restrictions or is temporarily unavailable
- Gemini 3 Pro (Google): For the subset of requests requiring deep multi-step reasoning, scientific analysis, or complex agentic behavior, routing those specifically to Gemini 3 Pro while keeping Gemini 3 Flash Preview as the default for standard tasks separates cost tiers cleanly
- Claude Sonnet 4.5 (Anthropic): When structured output adherence, low hallucination rates on document-heavy prompts, or strict JSON schema compliance are required, Claude Sonnet 4.5 provides strong cross-provider redundancy and consistent formatting behavior
- GPT-4.1 Mini (OpenAI): For high-volume, lower-complexity tasks like classification, entity extraction, or summarization where Google's API is unavailable or rate-limited, GPT-4.1 Mini serves as a reliable low-cost fallback with broad regional availability
- Gemini 2.5 Flash Lite (Google): For bulk preprocessing steps before a Gemini 3 Flash Preview call, such as input filtering or topic classification, Gemini 2.5 Flash Lite at $0.10 per million input tokens reduces pipeline cost without affecting the quality of the main inference step
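One way to encode these pairings is an ordered list of candidate models per task type, chosen by a cheap classification step. In this sketch everything is hypothetical apart from google/gemini-3-flash-preview: the other model strings are assumed identifiers in the same provider/model format, the bucket names are illustrative, and classify stands in for whatever labels your prompts (for example, a Gemini 2.5 Flash Lite call).

```python
from typing import Callable

# Ordered candidates per task bucket. Only "google/gemini-3-flash-preview" is a
# documented model string; the others are hypothetical placeholders in the same
# provider/model format. Check the Gateway's model list for the real identifiers.
ROUTES: dict[str, list[str]] = {
    "default": ["google/gemini-3-flash-preview", "google/gemini-3-flash"],
    "deep_reasoning": ["google/gemini-3-pro", "google/gemini-3-flash-preview"],
    "strict_json": ["anthropic/claude-sonnet-4-5", "google/gemini-3-flash-preview"],
    "bulk_extraction": ["google/gemini-2.5-flash-lite", "openai/gpt-4.1-mini"],
}


def candidates_for(prompt: str, classify: Callable[[str], str]) -> list[str]:
    """Label the prompt with a cheap classifier, then return that bucket's model order.

    Labels without a route fall back to the default bucket.
    """
    label = classify(prompt)
    return ROUTES.get(label, ROUTES["default"])


if __name__ == "__main__":
    # Stand-in classifier: a keyword rule in place of a real Gemini 2.5 Flash Lite call.
    def keyword_classifier(prompt: str) -> str:
        return "strict_json" if "JSON" in prompt else "default"

    print(candidates_for("Return the invoice fields as JSON.", keyword_classifier))
    print(candidates_for("Summarize this meeting.", keyword_classifier))
```

Keeping the fallback order inside each bucket means a rate-limited preview endpoint degrades to a second model rather than failing the request.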
What are the challenges of using Gemini 3 Flash Preview in my product?
Like any production LLM, Gemini 3 Flash Preview comes with tradeoffs worth planning for:
- Preview instability: As a pre-release model, Gemini 3 Flash Preview isn't subject to the same SLA, rate limit, or deprecation timeline commitments that apply to stable Google models. Shipping it as the primary model in a production path without a fallback creates real availability risk
- Very high time to first token: At 39.41 seconds time to first token (as of 06/04/2026), Gemini 3 Flash Preview is among the slowest models for response initiation. Streaming shows tokens as soon as they arrive but cannot shorten the wait before the first one, so interactive use cases like chat interfaces need explicit loading states, and possibly timeouts, to avoid feeling sluggish
- Provider dependency: Routing all traffic to a single Google endpoint means that quota restrictions, regional outages, or preview-specific rate limits affect every workflow at once. A cross-provider fallback is especially important for a preview-stage model
- Cost at scale: At $3.00 per million output tokens, output costs accumulate quickly in high-throughput deployments. Long completions drive output spend, while multi-turn conversation histories and verbose system prompts inflate input tokens, so monitor both input and output token volume closely
- No extended reasoning: Gemini 3 Flash Preview doesn't include a reasoning or extended thinking mode. Tasks that benefit from chain-of-thought decomposition, such as competitive math or complex debugging, may underperform compared to Gemini 3 Pro or other reasoning-capable models
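To act on the cost point above, it helps to tally input and output spend separately as responses come back. This is a minimal client-side sketch built around a hypothetical SpendTracker class and the quoted Gemini 3 Flash Preview prices; it reports soft- and hard-limit crossings in the spirit of Gateway budgets, but it is not Merge Gateway's budget API.

```python
from dataclasses import dataclass

# Gemini 3 Flash Preview prices quoted on this page (USD per million tokens).
INPUT_PRICE_PER_MILLION = 0.50
OUTPUT_PRICE_PER_MILLION = 3.00


@dataclass
class SpendTracker:
    """Hypothetical client-side tally; not Merge Gateway's budget API."""
    soft_limit_usd: float  # crossing this should trigger an alert
    hard_limit_usd: float  # at or above this, stop sending requests
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def input_spend_usd(self) -> float:
        return self.input_tokens * INPUT_PRICE_PER_MILLION / 1_000_000

    @property
    def output_spend_usd(self) -> float:
        return self.output_tokens * OUTPUT_PRICE_PER_MILLION / 1_000_000

    @property
    def spend_usd(self) -> float:
        return self.input_spend_usd + self.output_spend_usd

    def status(self) -> str:
        """'blocked' at the hard limit, 'warn' past the soft limit, otherwise 'ok'."""
        if self.spend_usd >= self.hard_limit_usd:
            return "blocked"
        if self.spend_usd >= self.soft_limit_usd:
            return "warn"
        return "ok"

    def record(self, input_tokens: int, output_tokens: int) -> str:
        """Add one response's token usage and return the updated status."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        return self.status()


if __name__ == "__main__":
    tracker = SpendTracker(soft_limit_usd=1.00, hard_limit_usd=2.00)
    # A long multi-turn history: many input tokens, modest output.
    print(tracker.record(input_tokens=1_000_000, output_tokens=100_000))
    print(f"input ${tracker.input_spend_usd:.2f}, output ${tracker.output_spend_usd:.2f}")
```

Calling status() before each request and refusing to send once it returns "blocked" turns the hard limit into an actual stop; the separate input and output totals show whether long histories or long completions are driving spend.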
Why should I use Merge Gateway to route LLM requests with Gemini 3 Flash Preview and every other model?
Using Gemini 3 Flash Preview through Merge Gateway gives you access to the model itself and the infrastructure layer around it:
- One API, every provider: Access Gemini 3 Flash Preview and every other major LLM through a single endpoint and API key. Swap the model string to change providers without any other application code changes, which is especially useful when graduating from preview to the stable Gemini 3 Flash release
- Intelligent routing and automatic failover: Merge routes around Google outages and preview-tier availability constraints automatically. Routing policies based on cost, latency, or quality can reduce spend by 40-60% without touching your application code
- Cost governance: Set hard or soft project budgets so Gemini 3 Flash Preview spend stays within plan. Every request is attributed to a model, project, and tag in a unified billing dashboard across all providers
- Build Your Own Router: Define what "best" means for your traffic by selecting from curated ML benchmarks or adding your own eval scores. The router scores each available model against your weights and picks the winner per request, with a plain-language explanation of every decision
- Security and compliance controls: Apply DLP rules and prompt injection protection before every request reaches Google. Enforce per-project model and region policies without adding that logic to your application
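The Build Your Own Router idea can be illustrated with a simple weighted sum. This page doesn't describe Merge's actual scoring method, so treat this as an assumption-laden sketch: each metric is min-max normalized across models, multiplied by your weight, and summed, and the explanation names the metric that contributed most to the winner. The speed and price inputs are figures quoted on this page; quality or eval scores would come from your own test set.

```python
def min_max(raw: dict[str, float], higher_is_better: bool = True) -> dict[str, float]:
    """Rescale one metric to 0..1 across models so metrics with different units can be combined."""
    lo, hi = min(raw.values()), max(raw.values())
    span = (hi - lo) or 1.0  # all values equal: avoid dividing by zero
    return {
        model: (value - lo) / span if higher_is_better else (hi - value) / span
        for model, value in raw.items()
    }


def choose_model(metrics: dict[str, dict[str, float]],
                 weights: dict[str, float]) -> tuple[str, str]:
    """Return the model with the highest weighted sum of normalized metrics, plus a reason.

    metrics: metric name -> {model: normalized score}; every metric must cover the same models.
    weights: metric name -> weight (only weighted metrics are used).
    """
    models = list(next(iter(metrics.values())))
    totals = {m: sum(w * metrics[name][m] for name, w in weights.items()) for m in models}
    winner = max(totals, key=totals.get)
    driver = max(weights, key=lambda name: weights[name] * metrics[name][winner])
    reason = (f"{winner} scored {totals[winner]:.2f}; "
              f"its largest weighted contribution came from {driver}.")
    return winner, reason


if __name__ == "__main__":
    # Output speed (tokens/s) and output price (USD per 1M tokens) as quoted on this page.
    metrics = {
        "speed": min_max({"gemini-3-flash-preview": 169.3, "gemini-2.5-flash": 203.2,
                          "gemini-2.5-flash-lite": 259.8}),
        "cost": min_max({"gemini-3-flash-preview": 3.00, "gemini-2.5-flash": 2.50,
                         "gemini-2.5-flash-lite": 0.40}, higher_is_better=False),
        # Add your own eval scores here as another normalized metric, e.g. "quality".
    }
    print(choose_model(metrics, {"speed": 0.5, "cost": 0.5}))
```

Min-max normalization is only one choice: it gives the best model on a metric a 1 and the worst a 0, which can exaggerate small gaps when only a few models are compared.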
How can I start using Merge Gateway to route requests with Gemini 3 Flash Preview?
Getting Gemini 3 Flash Preview running through Merge Gateway takes a few minutes:
1. Create an account and get your API key from the dashboard.
2. Install the Merge Gateway SDK: run pip install merge-gateway-sdk (Python) or npm install merge-gateway-sdk (Node). Alternatively, if you're already using the OpenAI SDK, set base_url = "https://api-gateway.merge.dev/v1/openai" and your existing code works as-is.
3. Make your first request using the provider/model format. For Gemini 3 Flash Preview, the model string is google/gemini-3-flash-preview. Swap the model string to route to any other provider without changing anything else.
4. Configure a routing policy in the dashboard to set failover behavior, cost limits, and optimization strategy. Your first policy can be as simple as naming Gemini 3 Flash Preview as primary with the stable Gemini 3 Flash as fallback.
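Step 4's primary-plus-fallback policy can be approximated client-side as a sketch. call_model is a hypothetical function you would write around an SDK call, and google/gemini-3-flash is an assumed model string for the stable release; only google/gemini-3-flash-preview is documented on this page. A routing policy configured in the dashboard is meant to handle failover for you; the loop only illustrates the behavior being configured.

```python
from typing import Callable, Sequence

# Primary first, fallback second. "google/gemini-3-flash-preview" is the documented
# model string; "google/gemini-3-flash" is an assumed identifier for the stable release.
POLICY: Sequence[str] = ("google/gemini-3-flash-preview", "google/gemini-3-flash")


def complete_with_fallback(prompt: str,
                           call_model: Callable[[str, str], str],
                           models: Sequence[str] = POLICY) -> tuple[str, str]:
    """Try each model in order and return (model, text) from the first call that succeeds."""
    errors: list[str] = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # e.g. a rate limit, timeout, or unavailable preview endpoint
            errors.append(f"{model}: {exc!r}")
    raise RuntimeError("every model in the policy failed: " + "; ".join(errors))


# A real call_model might wrap the Merge Gateway SDK request from the examples above:
#     response = client.responses.create(
#         model=model,
#         input=[{"type": "message", "role": "user", "content": prompt}],
#     )
#     return response.output[0].content[0].text

if __name__ == "__main__":
    def flaky_call(model: str, prompt: str) -> str:
        """Stand-in that simulates the preview endpoint being rate-limited."""
        if model.endswith("-preview"):
            raise TimeoutError("preview endpoint rate-limited")
        return f"[{model}] {prompt}"

    print(complete_with_fallback("Explain recursion in one sentence.", flaky_call))
```

Collecting every error before raising keeps the failure message useful when both the preview and the fallback are unavailable.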
Full setup instructions and SDK references are in the Merge Gateway docs.