Qwen3.5 Flash API: pricing, performance, and how to route requests

Qwen3.5 Flash: Everything you need to know about the model

Qwen3.5 Flash is a Qwen model available through Merge Gateway. Use it with Gateway routing policies, spend controls, request logs, and a 1,000,000 token context window. It supports streaming through at least one Gateway vendor route.

Qwen3.5 Flash pricing

| Vendor | Input / 1M tokens | Output / 1M tokens | Zero data retention |
| --- | ---: | ---: | --- |
| Alibaba | $0.0290 | $0.2870 | No |
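As a rough illustration of how the per-million rates in the table translate into per-request cost, the sketch below multiplies token counts by the listed Alibaba rates. The function name and the example token counts are hypothetical, and actual billing may differ from this simple arithmetic.

```python
# Rates copied from the pricing table above (USD per 1M tokens, Alibaba route).
INPUT_USD_PER_M = 0.0290
OUTPUT_USD_PER_M = 0.2870


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    return (
        input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000


# Hypothetical request: 2,000 prompt tokens and 500 completion tokens.
cost = estimate_cost_usd(2_000, 500)
print(f"Estimated cost: ${cost:.7f}")
```

At these rates a request of this size costs a small fraction of a cent, so per-request tracking matters mostly in aggregate.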


Ready to try it out?

Start routing requests to hundreds of large language models in your product within minutes.

Start building for free

Get a demo

Route requests to Qwen3.5 Flash with Merge Gateway

Merge Gateway is a unified LLM API that lets your product route requests to Qwen3.5 Flash and every other major model through a single endpoint. You get built-in fallback routing, per-request cost tracking, zero data retention support, and observability without changing your application architecture.

To get started in seconds, add our Gateway Implementation skill to your project, or pick your preferred SDK below. Check out our other quick start skills here.

Install the Merge Gateway SDK

Python

$ pip install merge-gateway-sdk

Send a request

Python

from merge_gateway import MergeGateway

client = MergeGateway(api_key="YOUR_API_KEY")

response = client.responses.create(
    model="openai/gpt-5.2",
    input=[
        {"type": "message", "role": "system", "content": "You are a helpful programming tutor. Explain the concepts clearly with practical examples."},
        {"type": "message", "role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

print(response.output[0].content[0].text)

Try a different model

Swap the model string to route to a different provider. No other code changes needed.

Anthropic

response = client.responses.create(
    model="anthropic/claude-sonnet-4-20250514",
    input=[
        {"type": "message", "role": "system", "content": "You are a helpful programming tutor. Explain the concepts clearly with practical examples."},
        {"type": "message", "role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)
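For this page's model, the same call can presumably target Qwen3.5 Flash. The sketch below is a minimal illustration that reuses the `client.responses.create` call and response shape from the examples above, together with the `alibaba/qwen3.5-flash` model string given in the FAQ. The `ask_qwen` helper name is hypothetical and not part of the SDK.

```python
def ask_qwen(client, question: str) -> str:
    """Send one user message to Qwen3.5 Flash and return the reply text.

    `client` is expected to be a MergeGateway client like the one created above.
    """
    response = client.responses.create(
        model="alibaba/qwen3.5-flash",
        input=[
            {"type": "message", "role": "user", "content": question},
        ],
    )
    # Same response shape as in the examples above.
    return response.output[0].content[0].text
```

With the client from the earlier snippet, `print(ask_qwen(client, "Explain recursion in one paragraph."))` would print the model's reply.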

Point to Gateway

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api-gateway.merge.dev/v1/openai",
)

Send a request

Use the standard chat.completions.create method. No provider prefix needed on the model name.

Python

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[
        {"role": "system", "content": "You are a helpful programming tutor. Explain the concepts clearly with practical examples."},
        {"role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

print(response.choices[0].message.content)
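The overview notes that Qwen3.5 Flash supports streaming through at least one Gateway vendor route. A streaming variant of the request above might look like the sketch below. It uses the OpenAI SDK's standard `stream=True` option and assumes the Gateway's OpenAI-compatible endpoint forwards it. The `stream_text` helper and the unprefixed `qwen3.5-flash` model name are assumptions for illustration, so check the Gateway docs for the exact identifier.

```python
from typing import Iterator


def stream_text(client, prompt: str, model: str = "qwen3.5-flash") -> Iterator[str]:
    """Yield text fragments as they arrive from a streaming chat completion.

    `client` is an OpenAI SDK client pointed at Gateway, as configured above.
    The default model name is an assumption, not a confirmed identifier.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no choices or no text (for example, role or final chunks).
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


# Usage with the client created above:
# for piece in stream_text(client, "Explain recursion briefly."):
#     print(piece, end="", flush=True)
```

Yielding fragments from a generator keeps the calling code free to print, buffer, or forward tokens as they arrive.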

Install packages

npm install merge-gateway-ai-sdk-provider ai

Create the provider

TypeScript

import { createMergeGateway } from "merge-gateway-ai-sdk-provider";

const gateway = createMergeGateway({
  apiKey: "YOUR_API_KEY",
});

Send a request

Use generateText to send a request. Model names use the provider/model format.

TypeScript

import { generateText } from "ai";

const { text } = await generateText({
  model: gateway("openai/gpt-4o"),
  prompt: "Explain the concept of recursion in programming with a simple set of examples.",
});

console.log(text);

If you already have @ai-sdk/openai installed, point it at Gateway with a base URL change:

TypeScript

import { createOpenAI } from "@ai-sdk/openai";

const gateway = createOpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api-gateway.merge.dev/v1/ai-sdk",
});

// All generateText/streamText calls work unchanged

Point to Gateway

Anthropic SDK

from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_API_KEY",
    base_url="https://api-gateway.merge.dev/v1/anthropic",
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain the concept of recursion in programming with a simple set of examples."},
    ],
)

print(message.content[0].text)

Explore other models available in Merge Gateway

Qwen3 30B A3B

Qwen3 30B A3B Instruct 2507

Qwen3-32B

Qwen3.5 122B A10B

Qwen3.5 27B

Qwen3.5-35B-A3B

Qwen3.5-397B-A17B

Qwen3.5 Plus

Qwen3.6 35B A3B

Qwen3.6 Flash

Qwen3.6 Plus

Qwen3.7 Max

Qwen3 8B

Qwen3-Coder-30B-A3B-Instruct

Qwen3-Coder-480B

Qwen3 Coder Flash

Qwen3 Coder Plus

Qwen3 Max

Qwen3 Next 80B A3B Thinking

Qwen 3 Next 80B Instruct

Qwen3-VL 235B A22B Thinking

Qwen3-VL 30B-A3B Instruct

Qwen3-VL 32B Instruct

Qwen3-VL-8B-Instruct

Qwen3.5 Flash FAQ

If you have other questions about Qwen3.5 Flash, we've answered a few more below. The information below was written in June 2026 and is subject to change.


What other models does Alibaba offer?

Qwen3.5 Flash is the faster, cost-optimized API tier in Alibaba's Qwen3.5 generation, which sits within a broader Qwen lineup that spans reasoning flagships, coding specialists, and multimodal models. Here are some other models Alibaba supports:

Qwen3.7 Max: Alibaba's current proprietary flagship reasoning model, scoring 57 on the Artificial Analysis Intelligence Index and ranking in the top 10 of over 150 evaluated models, with a 1M-token context window and 182.7 tokens per second output speed at $2.50 per 1M input and $7.50 per 1M output

Qwen3.6 Plus: A proprietary mid-tier reasoning model released in April 2026, with multimodal input support and a 1M-token context window, scoring 50 on the Intelligence Index at $0.50 per 1M input and $3.00 per 1M output

Qwen3.5 Plus: The higher-capability sibling in the Qwen3.5 API tier, built on the 397B A17B open-weight model, scoring 45 on the Intelligence Index and ranked 13th of 89 comparable models at $0.60 per 1M input and $3.60 per 1M output

Qwen3 Coder Next: Alibaba's open-weight coding-specialized model, released under Apache 2.0, with 79.7B total parameters and a 256k-token context window, optimized for code generation and agentic programming tasks at $0.35 per 1M input and $1.20 per 1M output

Qwen3.5 Omni Flash: A high-speed multimodal model from Alibaba that accepts text, image, speech, and video as input and produces text and speech output, designed for low-cost, high-throughput multimodal applications with a 256k-token context window

How does Qwen3.5 Flash differ from Alibaba's other models?

Qwen3.5 Flash is the throughput-optimized API tier in the Qwen3.5 generation, built on the 122B A10B open-weight MoE model, and is positioned as the cost-efficient high-speed option before the step up to Qwen3.5 Plus or the proprietary Qwen3.6 and Qwen3.7 tiers.

Speed: At 140.6 tokens per second (as of 06/04/2026), Qwen3.5 Flash is about 2.7 times as fast as Qwen3.5 Plus (52.2 t/s) and competitive with the fastest models in Alibaba's lineup. This makes it the right choice when response latency matters more than peak benchmark quality

Intelligence Index: Qwen3.5 Flash scores 42 on the Artificial Analysis Intelligence Index, ranking 1st of 61 models in its class. It trails Qwen3.5 Plus (score: 45) and the proprietary tiers, but leads its speed class, meaning it offers the best quality-per-latency tradeoff among comparable fast models

Pricing: At $0.40 per 1M input and $3.20 per 1M output, Qwen3.5 Flash is modestly cheaper than Qwen3.5 Plus on input. Output pricing is close between the two tiers, so teams optimizing primarily for cost should also consider the overall latency profile before choosing between them

Context window: Qwen3.5 Flash offers a 262k-token context window, the same as Qwen3.5 Plus. Both are shorter than the 1M-token windows on Qwen3.6 Plus and Qwen3.7 Max, which matters for applications processing very long documents or multi-turn conversations approaching that ceiling

Modalities: Qwen3.5 Flash accepts text and image inputs and produces text only. It doesn't support audio or video, unlike the Qwen3.5 Omni Flash, which covers the full multimodal spectrum at a comparable speed tier

Qwen3.5 Flash is best suited for latency-sensitive, high-concurrency applications where 140+ tokens per second matters and the task complexity doesn't require Qwen3.5 Plus's higher benchmark quality. It's also a natural cost-conscious fallback route from Plus within a multi-model routing policy.
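To make the speed comparison concrete, the following sketch works through the arithmetic using the two output speeds quoted above. It assumes a steady output rate and ignores time to first token and network overhead, so treat the results as rough estimates. The function name and the 1,000-token completion length are illustrative.

```python
# Output speeds quoted above, in tokens per second.
FLASH_TPS = 140.6
PLUS_TPS = 52.2


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to emit a completion at a steady output speed."""
    return output_tokens / tokens_per_second


speedup = FLASH_TPS / PLUS_TPS
print(f"Speed ratio: {speedup:.2f}x")
for name, tps in [("Qwen3.5 Flash", FLASH_TPS), ("Qwen3.5 Plus", PLUS_TPS)]:
    print(f"{name}: {generation_seconds(1_000, tps):.1f} s for 1,000 output tokens")
```

Under these assumptions a 1,000-token completion takes roughly 7 seconds on Flash versus roughly 19 seconds on Plus, a ratio of about 2.7.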

What models should I consider using alongside Qwen3.5 Flash?

No single model is optimal for every task. Here are models worth pairing with Qwen3.5 Flash depending on what your product needs:

Qwen3.5 Plus (Alibaba): For tasks that require higher reasoning accuracy than Qwen3.5 Flash's Intelligence Index score of 42 can reliably deliver, routing to Qwen3.5 Plus within the same context window and model family adds quality headroom at a modest cost increase, without switching providers

Qwen3.7 Max (Alibaba): When a task demands peak reasoning quality, such as complex multi-step analysis, hard math, or long agentic chains, routing to Qwen3.7 Max gives access to Alibaba's best available model at the cost of higher latency and per-token price

GPT-4.1 Mini (OpenAI): For simple, high-volume text tasks where Qwen3.5 Flash is more capable than needed, GPT-4.1 Mini offers broad regional availability and a lower blended cost across short-output workloads like classification, tagging, or simple extraction

Claude Haiku 3.5 (Anthropic): When very fast response times and low-cost per-turn are both critical and the task calls for reliable instruction adherence on short prompts, Claude Haiku 3.5 provides a battle-tested cross-provider option for simple structured generation tasks

Gemini 2.0 Flash (Google): When requests include audio or video content that Qwen3.5 Flash can't process natively, routing to Gemini 2.0 Flash covers those modalities at a comparable speed tier without requiring a separate high-cost model
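The pairing guidance above can be sketched as a simple selection function. This is only a client-side illustration of the decision logic, since a Gateway routing policy would normally make this choice for you. Apart from `alibaba/qwen3.5-flash`, every model string below is a hypothetical identifier in the provider/model format, and the `Task` fields are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a request, used only for this sketch."""
    has_audio_or_video: bool = False
    complexity: str = "normal"  # one of "simple", "normal", "hard", "peak"


def pick_model(task: Task) -> str:
    """Map a task to a model following the pairing guidance above."""
    if task.has_audio_or_video:
        # Qwen3.5 Flash accepts only text and image input.
        return "google/gemini-2.0-flash"  # hypothetical model string
    if task.complexity == "peak":
        return "alibaba/qwen3.7-max"  # hypothetical model string
    if task.complexity == "hard":
        return "alibaba/qwen3.5-plus"  # hypothetical model string
    if task.complexity == "simple":
        return "openai/gpt-4.1-mini"  # hypothetical model string
    return "alibaba/qwen3.5-flash"


print(pick_model(Task()))
print(pick_model(Task(has_audio_or_video=True)))
```

The modality check comes first because an unsupported input type rules out Qwen3.5 Flash regardless of task complexity.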

What are the challenges of using Qwen3.5 Flash in my product?

Like any production LLM, Qwen3.5 Flash comes with tradeoffs worth planning for:

Quality ceiling at complex tasks: Scoring 42 on the Artificial Analysis Intelligence Index, Qwen3.5 Flash doesn't match the reasoning quality of Qwen3.5 Plus (score: 45) or proprietary tiers. For complex analytical, legal, or code-heavy tasks, routing to a higher-capability model is worth the added latency and cost

Output cost relative to input cost: At $3.20 per 1M output tokens, the output cost is disproportionately high relative to the $0.40 input rate. Workloads with long completions, iterative generation, or open-ended responses will accumulate costs faster than workloads with short outputs, so output length controls are important at production scale

Text and image input only: Qwen3.5 Flash doesn't support audio or video inputs. Pipelines that occasionally receive those content types need separate routing logic, which adds branching complexity and maintenance overhead

Provider dependency: Relying on Alibaba's DashScope API as a single provider creates fragility when the provider has an outage or deprecates a model version. Alibaba releases successive Qwen generations at a rapid pace, so model ID changes and deprecation timelines require active monitoring

Cost at scale: As request volume grows, token costs compound quickly without active cost management. Even at the Flash tier's pricing, high-concurrency deployments with long completions can exceed budget expectations without per-project spend limits in place
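The output-cost and cost-at-scale points above are easier to judge with numbers. The sketch below estimates monthly spend from the $0.40 input and $3.20 output rates quoted in this list and compares it with a budget. The workload figures, the budget, and the function name are all hypothetical.

```python
# Rates quoted in the list above (USD per 1M tokens).
INPUT_USD_PER_M = 0.40
OUTPUT_USD_PER_M = 3.20


def monthly_cost(requests: int, input_tokens: int, output_tokens: int) -> dict:
    """Split estimated monthly spend into input and output components."""
    input_usd = requests * input_tokens * INPUT_USD_PER_M / 1_000_000
    output_usd = requests * output_tokens * OUTPUT_USD_PER_M / 1_000_000
    total = input_usd + output_usd
    return {
        "input_usd": input_usd,
        "output_usd": output_usd,
        "total_usd": total,
        "output_share": output_usd / total if total else 0.0,
    }


# Hypothetical workload: 1M requests per month,
# each with 1,000 prompt tokens and 500 completion tokens.
estimate = monthly_cost(1_000_000, 1_000, 500)
budget_usd = 1_500  # hypothetical project budget
print(estimate)
print("over budget:", estimate["total_usd"] > budget_usd)
```

Under these assumptions output tokens account for about 80% of the roughly $2,000 total even though completions are half the length of prompts, which is why capping output length is usually the first lever to pull.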

Why should I use Merge Gateway to route LLM requests with Qwen3.5 Flash and every other model?

Using Qwen3.5 Flash through Merge Gateway gives you access to the model itself and the infrastructure layer around it:

Build Your Own Router: Define what "best" means for your traffic by selecting from curated ML benchmarks or adding your own eval scores. The router scores each available model against your weights and picks the winner per request, with a plain-language explanation of every decision. This is particularly useful for routing between Qwen3.5 Flash and higher-capability tiers based on task complexity

One API, every provider: Access Qwen3.5 Flash and every other major LLM through a single endpoint and API key. Change providers by swapping the model string, with no application code changes required

Intelligent routing and automatic failover: Merge routes around Alibaba outages automatically. Routing policies based on cost, latency, or quality can reduce spend by 40 to 60% without touching your application code

Cost governance: Set hard or soft project budgets so Qwen3.5 Flash spend stays within plan. Every request is attributed to a model, project, and tag in a unified billing dashboard across all providers

Security and compliance controls: Apply DLP rules and prompt injection protection before every request reaches Alibaba. Enforce per-project model and region policies without adding that logic to your application

How can I start using Merge Gateway to route requests with Qwen3.5 Flash?

Getting Qwen3.5 Flash running through Merge Gateway takes a few minutes:

1. Create an account and get your API key from the dashboard.

2. Install the Merge Gateway SDK: run pip install merge-gateway-sdk (Python) or npm install merge-gateway-sdk (Node). Alternatively, if you're already using the OpenAI SDK, set base_url = "https://api-gateway.merge.dev/v1/openai" and your existing code works as-is.

3. Make your first request using the provider/model format. For Qwen3.5 Flash, the model string is alibaba/qwen3.5-flash. Swap the model string to route to any other provider without changing anything else.

4. Configure a routing policy in the dashboard to set failover behavior, cost limits, and optimization strategy. Your first policy can be as simple as naming Qwen3.5 Flash as primary with one fallback.
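Step 4 describes a policy with one primary model and one fallback, which Gateway applies on its side once configured in the dashboard. To show the idea, here is a minimal client-side sketch of the same pattern using the OpenAI SDK's `chat.completions.create` call. The `create_with_fallback` helper and the unprefixed model names in the usage comment are assumptions for illustration, not Gateway's implementation.

```python
def create_with_fallback(client, models, messages):
    """Try each model in order and return (model_used, response).

    `client` is an OpenAI SDK client pointed at Gateway via base_url.
    """
    last_error = None
    for model in models:
        try:
            response = client.chat.completions.create(model=model, messages=messages)
            return model, response
        except Exception as exc:  # sketch only: catch narrower errors in real code
            last_error = exc
    raise RuntimeError("all models failed") from last_error


# Usage (model names are hypothetical):
# model_used, response = create_with_fallback(
#     client,
#     ["qwen3.5-flash", "qwen3.5-plus"],
#     [{"role": "user", "content": "Explain recursion briefly."}],
# )
# print(model_used, response.choices[0].message.content)
```

A dashboard policy is the better home for this logic in production, since it keeps failover rules out of application code.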

Full setup instructions and SDK references are in the Merge Gateway docs.

Try Qwen3.5 Flash through Merge Gateway

Route, observe, and control AI requests across providers from one API.
