Inside Merge: how we’re building the leading sync engine
Our sync jobs move millions of records every day for frontier LLM providers, leading banks, and thousands of other B2B SaaS companies.
To power this scale, our engineering team is constantly rethinking how we can deliver faster, more reliable, and more resilient integrations.
To that end, here are some of the measures we’ve recently taken to raise the bar for sync performance.
Evolving concurrency from batching to dynamic scheduling
Our initial approach to concurrency used fixed-size batches, with a "Sync Issuer" coordinating the work. In practice, this meant:
- Processing API requests sequentially
- Grouping substeps into fixed batches (e.g., batch size of 2)
- Waiting for an entire batch to complete before proceeding
This approach came with a few drawbacks. Notably, performance was constrained by the slowest batch member, and sync issuers were left waiting instead of making more API requests—leading to wasted time.
This led us to adopt a fundamentally different approach: “Dynamic Node Scheduling.”
Here’s a snapshot of how it works:
1. The Sync Issuer makes all API requests as quickly as possible.
2. Each result becomes a `QUEUED` sync node.
3. Up to batch-size nodes run simultaneously as `RUNNING`.
4. Completed nodes automatically trigger queued nodes.
This eliminated the bottleneck caused by slow batch members and prevented idle sync issuer time. Taken together, these changes have sped up syncs by up to 15x.
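To make this concrete, here's a minimal sketch of the scheduling pattern. It isn't our production code, and the class and method names are illustrative; it just shows queued nodes starting as soon as a running node finishes, rather than waiting for a batch boundary:

```python
import threading
from queue import Queue

# Sketch of dynamic node scheduling: results are enqueued as QUEUED nodes,
# at most `max_concurrency` run at once, and each completion immediately
# starts the next queued node (no fixed batches).
class DynamicNodeScheduler:
    def __init__(self, max_concurrency: int):
        self.max_concurrency = max_concurrency
        self.queued: Queue = Queue()
        self.lock = threading.Lock()
        self.running = 0

    def submit(self, node):
        """Called by the Sync Issuer as soon as an API result is available."""
        self.queued.put(node)          # node enters the QUEUED state
        self._maybe_start_next()

    def _maybe_start_next(self):
        with self.lock:
            if self.running >= self.max_concurrency or self.queued.empty():
                return
            node = self.queued.get()
            self.running += 1          # node enters the RUNNING state
        threading.Thread(target=self._run, args=(node,), daemon=True).start()

    def _run(self, node):
        try:
            node()                     # process the sync node
        finally:
            with self.lock:
                self.running -= 1
            self._maybe_start_next()   # completion triggers the next queued node
```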
Adopting intelligent rate limit management
Careful rate limit management makes syncs faster, as it eliminates the delays and retries associated with hitting actual rate limits.
With this in mind, we use a shared Redis cache to track API request activity across all concurrent processes.
This allows us to:
- Monitor usage across different rate limit types (we’ve catalogued these for each integration)
- Coordinate between multiple processing jobs
- Trigger exceptions when usage approaches 80% of a given rate limit
- Schedule optimal retry timing based on encoded cooloff periods
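As an illustration, here's a minimal sketch of how a shared counter in Redis can coordinate usage across concurrent workers. The key names, window math, threshold, and exception class are assumptions for the example, not our exact implementation:

```python
import time
import redis

class RateLimitApproaching(Exception):
    """Raised when usage crosses the safety threshold so callers can back off."""

def record_request(r: redis.Redis, integration: str, limit: int,
                   window_seconds: int, threshold: float = 0.8) -> int:
    # Fixed-window counter keyed by integration and window index; every
    # concurrent worker increments the same shared key.
    window = int(time.time()) // window_seconds
    key = f"ratelimit:{integration}:{window}"

    count = r.incr(key)
    if count == 1:
        r.expire(key, window_seconds * 2)  # let stale windows expire on their own

    if count >= limit * threshold:
        # Approaching the provider's limit: signal callers to pause and
        # retry after the window's cooloff period instead of hitting a 429.
        raise RateLimitApproaching(
            f"{integration}: {count}/{limit} requests in current window"
        )
    return count
```

A worker would call something like `record_request(redis.Redis(), "hris_provider", limit=100, window_seconds=60)` before each API request and back off when the exception fires.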
This approach lets us sustain throughput close to each provider's limits at scale without actually tripping them.
For example, we recently synced 1.3 million objects for a frontier LLM provider and operated within 3% of their theoretical maximum throughput by dynamically managing rate limits and backing off at the right times.
Engineering fault-tolerant infrastructure at scale
We’ve introduced fault-tolerant state persistence to ensure sync jobs survive interruptions.
When AWS issues a termination notice—or our memory monitoring detects trouble—the system immediately serializes the job’s entire state into a JSON snapshot. This snapshot, capturing hundreds of variables, is written to Elastic File System (EFS) within the two-minute window available.
When a replacement server comes online, it retrieves the state file, reconstructs the sync environment with complete fidelity, and resumes execution without losing progress. No manual intervention required.
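As a rough illustration of the checkpoint/restore flow (paths, state fields, and the SIGTERM hook are assumptions; the real snapshot captures hundreds of variables, and the termination notice may be surfaced differently):

```python
import json
import os
import signal

SNAPSHOT_PATH = "/mnt/efs/sync-snapshots/job-state.json"  # hypothetical EFS mount

class SyncJob:
    def __init__(self, state: dict):
        self.state = state  # cursors, processed counts, retry queues, etc.

    def snapshot(self):
        """Serialize the job's state so a replacement worker can resume it."""
        tmp_path = SNAPSHOT_PATH + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.state, f)
        os.replace(tmp_path, SNAPSHOT_PATH)  # atomic rename avoids torn writes

    @classmethod
    def resume_or_start(cls, initial_state: dict) -> "SyncJob":
        """On startup, prefer an existing snapshot over a fresh run."""
        if os.path.exists(SNAPSHOT_PATH):
            with open(SNAPSHOT_PATH) as f:
                return cls(json.load(f))
        return cls(initial_state)

job = SyncJob.resume_or_start({"cursor": None, "synced": 0})

# On a termination notice (or a memory alarm), persist state before exit.
signal.signal(signal.SIGTERM, lambda signum, frame: job.snapshot())
```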
This breakthrough lets us run jobs of any duration with confidence in their completion.
We’ve also realized significant business benefits: by leaning further into spot instances, we’ve cut daily compute costs by 40%, and our engineers no longer need to repeatedly intervene in large account syncs.
Final thoughts
We’re proud of the progress so far, but our mission isn’t to be better than competitors. It’s to deliver the best sync performance possible for our customers.
With customer feedback and ongoing experimentation—whether it’s in scheduling, retry logic, or infrastructure resilience—we’ll continue to push the limits of what’s possible in data synchronization.