
How to test MCP servers effectively (6 best practices)

Yash Gogri
Product Manager
@Merge

Model Context Protocol (MCP) servers are often poorly built.

Their tools (the functionality they expose) may lack clear descriptions, their error messages may be ambiguous, and their authentication mechanisms can vary from tool to tool.

These issues can lead your AI agents to underperform, whether that means calling the wrong tools or taking incorrect steps when helping users remediate issues.

To prevent many, if not most, issues, you can test an MCP server comprehensively before your AI agent uses it in production.

Here are a few tips to help you do just that.

Use sandbox data for every test

This best practice is the most intuitive. 

Your agentic use cases for an MCP server likely involve sensitive data, such as personally identifiable information (PII) on your customers’ employees. And if you haven’t tested the MCP server carefully before using real-world data, you’re putting this data at risk of leaking to unauthorized individuals, which could have long-lasting consequences for your business.

To that end, only use sandbox data when you’re testing an MCP server.
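
To enforce this in practice, a simple guard in your test harness can fail fast whenever the configuration points anywhere other than a sandbox. Here’s a minimal sketch; the environment variable names and the URL and key conventions are hypothetical, so adapt them to your own setup:

```python
import os

# Hypothetical configuration -- swap in whatever your environment actually uses.
MCP_SERVER_URL = os.environ.get("MCP_SERVER_URL", "")
MCP_API_KEY = os.environ.get("MCP_API_KEY", "")

def assert_sandbox_only() -> None:
    """Abort the test run unless it's clearly pointed at sandbox infrastructure."""
    if "sandbox" not in MCP_SERVER_URL:
        raise RuntimeError(
            f"Refusing to test against a non-sandbox URL: {MCP_SERVER_URL!r}"
        )
    if MCP_API_KEY.startswith("prod_"):
        raise RuntimeError("Refusing to test with a production API key")

# Call this once, before any test touches the MCP server
assert_sandbox_only()
```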

Related: How to use MCP servers successfully

Set up a wide range of test scenarios 

Since your AI agents will likely support a diverse set of use cases, you’ll want to define a broad range of expected behaviors.

For each behavior, you should include the relevant prompt(s), the expected tool call(s), and the parameter assertions.

The table below lays out just one scenario.

A test scenario for an MCP server
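
In code, a scenario like this can be captured as a small data structure that your test harness iterates over. Here’s a minimal sketch; the create_employee tool and its parameters are hypothetical stand-ins for whatever your MCP server actually exposes:

```python
from dataclasses import dataclass, field

@dataclass
class ToolExpectation:
    tool_name: str         # the tool the agent is expected to call
    required_params: dict  # parameter assertions: values the call must include

@dataclass
class TestScenario:
    prompt: str  # what the user asks the agent
    expected_calls: list[ToolExpectation] = field(default_factory=list)

# One scenario: the agent should call a (hypothetical) create_employee tool
scenario = TestScenario(
    prompt="Add a new hire named Jane Doe starting on 2025-07-01",
    expected_calls=[
        ToolExpectation(
            tool_name="create_employee",
            required_params={
                "first_name": "Jane",
                "last_name": "Doe",
                "start_date": "2025-07-01",
            },
        )
    ],
)
```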

The more time and effort you put into brainstorming different scenarios and outlining the expected behaviors, the better you can test the MCP server. So it’s worth making this a comprehensive exercise that gathers input from everyone who’s building against the MCP server and planning to use its tools.

Evaluate the tools’ hit rates

As you begin testing different scenarios, you should measure the tools against an increasingly important metric: a tool’s “hit rate”.

This metric refers to how often your AI agents make the appropriate tool calls.

How to calculate a tool’s hit rate
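
The calculation itself is simple: divide the number of runs where the agent selected the expected tool by the total number of runs. A minimal sketch, assuming each test result records the expected and actual tool names:

```python
def hit_rate(results: list[dict]) -> float:
    """Fraction of test runs where the agent picked the expected tool.

    Each result is assumed to look like:
    {"expected_tool": "create_employee", "actual_tool": "create_employee"}
    """
    if not results:
        return 0.0
    hits = sum(1 for r in results if r["actual_tool"] == r["expected_tool"])
    return hits / len(results)

# e.g., 9 correct tool selections across 10 prompts -> 0.9, a 90% hit rate
```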

An AI agent’s ability to consistently determine whether to call a tool from a given prompt, and to then call the correct one, signals that the MCP server’s tools have comprehensive coverage, clear descriptions, and proper parameter schemas: all signs of a high-quality MCP server.

The ability to fine-tune an MCP server’s tool descriptions and schemas is also a strong indicator of quality, as it enables further improvements in a tool’s hit rate (though most out-of-the-box MCP servers, unfortunately, won’t provide this level of flexibility).
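
To make that concrete, here’s roughly what such a refinement can look like on a tool’s definition. The tool, its fields, and the sibling update_employee tool are all hypothetical; the point is that a specific description and a constrained schema give the agent far more to go on:

```python
# Before: a vague definition the agent can easily misapply
vague_tool = {
    "name": "create_employee",
    "description": "Creates an employee.",
    "inputSchema": {
        "type": "object",
        "properties": {"date": {"type": "string"}},
    },
}

# After: explicit purpose, disambiguation from similar tools, and format constraints
refined_tool = {
    "name": "create_employee",
    "description": (
        "Create a new employee record in the HR system. Use this only for "
        "new hires; use update_employee to modify existing records."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "start_date": {
                "type": "string",
                "format": "date",
                "description": "First day of employment, in YYYY-MM-DD format",
            },
        },
        "required": ["start_date"],
    },
}
```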

Related: How to observe your AI agent

Assess the tools’ success rates

Calling the correct tool is just half the battle. The other half is calling that tool successfully.

This includes both the initial tool call and, if there are errors, any follow-up attempts.

How to calculate a tool’s success rate
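
As with the hit rate, the math is straightforward: divide successful executions by total execution attempts, counting any retries. A minimal sketch, assuming each attempt records whether the tool returned an error:

```python
def success_rate(attempts: list[dict]) -> float:
    """Fraction of tool call attempts (including retries) that succeeded.

    Each attempt is assumed to look like:
    {"tool": "create_employee", "is_error": False}
    """
    if not attempts:
        return 0.0
    successes = sum(1 for a in attempts if not a["is_error"])
    return successes / len(attempts)

# e.g., 7 successes across 10 attempts (3 errored) -> 0.7, a 70% success rate
```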

It’s important to measure this separately from a tool’s hit rate.

As mentioned earlier, a tool’s hit rate reflects things like tool coverage and the quality of a tool’s description. The success rate, on the other hand, shows how well a tool manages authentication, surfaces issues (e.g., hitting a rate limit), handles the parameters it’s passed, and more.

In other words, by separating the two metrics, you can better home in on the strengths and weaknesses of a given tool.

For instance, a tool with a high hit rate but a low success rate suggests that the MCP server provides clear descriptions and schemas but suffers from poor execution quality, while a high success rate and a low hit rate can mean that the MCP server’s tools aren’t comprehensive.

Related: MCP vs AI agents

Track unnecessary tool calls 

For any scenario you test, you’ll expect certain tool calls. But your AI agents may make additional ones that are not only unnecessary but also costly.

Here are just some of the issues unneeded tool calls can cause:

  • The workflow experiences additional latency, and for time-sensitive processes, this can meaningfully hurt performance
  • You may face additional costs over time
  • You can erode customers’ trust in your product if they notice your AI agent making unnecessary tool calls to achieve a given outcome
  • Troubleshooting a workflow issue becomes more complex, as there are more variables to isolate and review

Fortunately, if you explicitly lay out all of the expected tool calls across your test scenarios (as highlighted in our second best practice), this issue will be easy to identify and address over time.
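
One straightforward way to flag them is to diff the agent’s actual call sequence against the scenario’s expected calls. A minimal sketch, using plain lists of tool names (the tools shown are hypothetical):

```python
from collections import Counter

def unnecessary_calls(expected: list[str], actual: list[str]) -> list[str]:
    """Return the tool calls the agent made beyond what the scenario expected."""
    remaining = Counter(expected)
    extras = []
    for tool in actual:
        if remaining[tool] > 0:
            remaining[tool] -= 1  # consume one expected call
        else:
            extras.append(tool)   # anything left over is unnecessary
    return extras

# Example: the agent listed all employees before creating one
print(unnecessary_calls(
    expected=["create_employee"],
    actual=["list_employees", "create_employee"],
))  # -> ['list_employees']
```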

Use a 3rd-party solution to run the tests

Retrieving sandbox data, performing tests across an MCP server’s tools, and analyzing the results is an extremely complex, time-intensive, and error-prone process.

Multiply this by all of the MCP servers you want to test, and the scope of work quickly balloons.

To manage this effectively at scale, you can look into 3rd-party solutions that are purpose-built to test MCP servers’ tools.

Merge, for example, is building an offering that lets you define expected agent behaviors across scenarios, includes a validation engine that checks whether agents called the correct tools with the proper parameters, compares expected versus actual behaviors, and more.

Learn more by scheduling a demo with one of Merge's integration experts.
