MCP · June 28, 2026

Why Are AI Agents Calling Your MCP Tools Wrong?

97% of audited MCP tool descriptions contain at least one quality defect that can lead AI agents to pick the wrong tool or pass the wrong parameters. Here is what is actually going wrong and how to fix it.

Here's a number that should make you stop and think if you've shipped an MCP server: 97.1%. That's the share of MCP tool descriptions, across 856 tools from 103 major servers audited in a February 2026 study, that contain at least one quality defect serious enough to cause an AI assistant to pick the wrong tool, pass the wrong parameters, or burn extra round-trips trying to figure out what you meant. If your server resembles that sample, the odds are high that something in your descriptions is actively working against the agents you're trying to serve.

So what's actually going wrong, and does any of it matter enough to fix?

Where does this data come from?

This post draws on two separate analyses of the MCP ecosystem. The first, published on arXiv in February 2026, audited tool descriptions from 103 public MCP servers using a structured taxonomy of eight distinct quality defects. The second, published in March 2026, tracked 177,436 agent tools created between November 2024 and February 2026 in public MCP repositories, giving a 16-month picture of how tool usage is shifting as the ecosystem matures. Together they provide both a quality snapshot and a usage-trend view.

What does a "smelly" tool description actually look like?

The most common defect, affecting 56% of the 856 tools sampled, is what the researchers call Unclear Purpose: a description so vague that the model cannot reliably distinguish the tool from another one on the same server. It isn't wildly wrong, just ambiguous enough that the agent ends up guessing.

Imagine you've published two tools: get_product and fetch_item. Both could plausibly retrieve product data. If neither description explains the difference, the agent picks whichever name sounds most relevant to the task at hand: sometimes correctly, often not, always by accident. The model carries nothing about your server from one session to the next; each new session starts from your descriptions alone. If a description doesn't make the distinction explicit, the model won't infer it.
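One rough way to spot Unclear Purpose before an agent does is to compare sibling descriptions word by word. The sketch below is a crude proxy, not the audit's method: it tokenizes each description, scores every pair by Jaccard overlap (shared words divided by all distinct words), and flags pairs at or above a threshold. The 0.5 threshold and the sample descriptions are assumptions for illustration.

```python
import itertools
import re


def tokens(text: str) -> set[str]:
    """Lowercased word tokens, skipping words of one or two characters such as 'a' or 'id'."""
    return {w for w in re.findall(r"[a-z0-9_]+", text.lower()) if len(w) > 2}


def overlap(a: str, b: str) -> float:
    """Jaccard overlap of two descriptions: shared tokens / all distinct tokens."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def similar_pairs(tools: dict[str, str], threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Return (tool, tool, score) for every pair whose descriptions overlap at or above threshold."""
    flagged = []
    for (name_a, desc_a), (name_b, desc_b) in itertools.combinations(tools.items(), 2):
        score = overlap(desc_a, desc_b)
        if score >= threshold:
            flagged.append((name_a, name_b, round(score, 2)))
    return flagged


# Hypothetical descriptions: nothing tells the model which product tool to prefer.
vague = {
    "get_product": "Retrieves product data from the catalog.",
    "fetch_item": "Fetches product data from the catalog.",
    "list_orders": "Lists recent orders for the signed-in customer.",
}

print(similar_pairs(vague))  # [('get_product', 'fetch_item', 0.71)]
```

A high score doesn't prove two tools are confusable, and a low one doesn't clear them; it only tells you which pairs to read side by side.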

The next most frequent problems are Unstated Limitations (the description doesn't say what the tool won't do, so agents try using it for out-of-scope tasks and get confusing results), Missing Usage Guidelines (no signal about when to call this tool vs a different one), and Opaque Parameters — fields named q, type, mode, or filters with no indication of valid values or expected format.
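Because MCP tools declare their arguments as a JSON Schema (the tool's inputSchema), opaque parameters are easy to catch mechanically. Below is a minimal sketch assuming a hypothetical search tool with q and mode fields: the first schema is the opaque version, the second adds a description, an enum, and a default, and opaque_params (also hypothetical) lists every property that ships without a description.

```python
# Hypothetical search tool arguments, written the way an MCP tool's inputSchema is (JSON Schema).
OPAQUE_SCHEMA = {
    "type": "object",
    "properties": {
        "q": {"type": "string"},
        "mode": {"type": "string"},
    },
}

DOCUMENTED_SCHEMA = {
    "type": "object",
    "properties": {
        "q": {
            "type": "string",
            "description": "Free-text keywords matched against product titles, e.g. 'waterproof hiking boots'.",
        },
        "mode": {
            "type": "string",
            "enum": ["exact", "fuzzy"],
            "default": "fuzzy",
            "description": "'exact' matches whole words only; 'fuzzy' also tolerates typos.",
        },
    },
    "required": ["q"],
}


def opaque_params(schema: dict) -> list[str]:
    """Names of properties that have no non-empty description."""
    return [
        name
        for name, prop in schema.get("properties", {}).items()
        if not prop.get("description", "").strip()
    ]


print(opaque_params(OPAQUE_SCHEMA))      # ['q', 'mode']
print(opaque_params(DOCUMENTED_SCHEMA))  # []
```

A description alone doesn't guarantee clarity, but a missing one guarantees the agent is guessing; enum and default pin down valid values wherever the field allows it.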

Chart: MCP Tool Description Quality, Audit of 856 Tools Across 103 Servers. Percentage of tools exhibiting each quality characteristic, from a structured audit of public MCP servers published February 2026.

Source: arxiv.org/abs/2602.14878, "MCP Tool Descriptions Are Smelly!" (Feb 2026)

The real cost isn't just a wrong answer; it's extra round-trips. A model that calls the wrong tool, gets an unexpected response, and retries with a different tool consumes context window and adds latency, and in multi-step agent workflows that translates into money. For action tools (ones that write, send, or modify something), a speculative wrong call is worse than slow: its effects may not be reversible.

Are action tools growing faster than your quality practices can keep up with?

This is where the usage-trend data gets important. Looking at the 177,000-tool dataset: in November 2024, action tools — those that modify external state — accounted for 27% of agent tool invocations. By February 2026, just 16 months later, that share had grown to 65%.

Chart: MCP Tool Type Shift, Action vs. Read Invocations (Nov 2024 to Feb 2026). Share of agent tool invocations by type across 177,436 tools tracked in public MCP repositories over 16 months.

Source: arxiv.org/abs/2603.23802, "How are AI agents used? Evidence from 177,000 MCP tools" (Mar 2026)

Read tools (the study calls them "perception tools") fell from 73% to 35% over the same period. Because the overall ecosystem grew substantially, a shrinking share doesn't necessarily mean read tools are used less in absolute terms; it means action tools are being added and called at a considerably faster rate.

Why does the description quality problem get worse as action tool usage rises? Because the consequences of a bad description scale with what the tool actually does. If an agent reads the wrong document due to a vague description, that's a recoverable error. If it creates the wrong record, sends the wrong notification, or submits a form with guessed parameters, the recovery cost is much higher. The ecosystem is moving toward higher-stakes tooling at the same time most servers have yet to update their descriptions to reflect it.
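Beyond the description itself, recent revisions of the MCP specification let a tool declare behavioral hints in an annotations object: readOnlyHint, destructiveHint, idempotentHint, and openWorldHint. The sketch below assumes a hypothetical send_notification tool (and a hypothetical preview_notification sibling); its description spells out the side effect in prose, and the annotations restate it in machine-readable form. The spec treats annotations as untrusted hints, so they complement a clear description rather than replace it.

```python
# Hypothetical action tool, as the plain dict an MCP server would return from tools/list.
SEND_NOTIFICATION = {
    "name": "send_notification",
    "description": (
        "Sends a push notification to one customer. Every call sends a new message, "
        "and sent messages cannot be recalled. Call only after the user has confirmed "
        "the exact text. To show the user a draft first, use preview_notification instead."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string", "description": "ID of the customer to notify."},
            "message": {
                "type": "string",
                "maxLength": 200,
                "description": "Exact text to send, at most 200 characters.",
            },
        },
        "required": ["customer_id", "message"],
    },
    "annotations": {
        "readOnlyHint": False,     # modifies external state
        "destructiveHint": False,  # additive: sends something new, deletes nothing
        "idempotentHint": False,   # calling twice sends two messages
        "openWorldHint": True,     # reaches people outside the server
    },
}


def may_change_state(tool: dict) -> bool:
    """True unless the tool explicitly declares readOnlyHint: true (the spec default is false)."""
    return not tool.get("annotations", {}).get("readOnlyHint", False)


print(may_change_state(SEND_NOTIFICATION))  # True
```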

What does the broader landscape look like?

For context: as of June 2026, there are roughly 73,500 MCP servers across the major public registries — the official MCP registry plus Smithery, Glama, and others. Monthly SDK downloads crossed 97 million in March 2026, up from approximately 2 million at launch in November 2024. That's 4,750% growth in 16 months.
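As a quick check on that growth figure, using the approximate download counts quoted above:

```latex
\text{growth} = \frac{97\ \text{M} - 2\ \text{M}}{2\ \text{M}} \times 100\% = 47.5 \times 100\% = 4{,}750\%
```

Equivalently, monthly downloads in March 2026 were about 48.5 times the launch figure.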

Software development still dominates heavily — 67% of the tools tracked in the 177,000-tool dataset are developer tooling, and software accounts for 90% of MCP server downloads. If your site serves non-developer users (e-commerce, content, professional services), the agents that will discover and interact with your MCP server are only starting to be built. The quality bar you set now will shape how those agents work with your site when they arrive.

So how do you actually fix this?

The most direct approach is to treat your tool descriptions as the primary interface — not the API schema underneath them. When an agent decides whether to call your tool, it's reading your description. It doesn't bring contextual knowledge about your product or domain. It has only what you've written.

For every tool you publish, ask one question: could someone who has never seen your site figure out, from the description alone, exactly when to call this tool instead of the other ones you've published? If the answer is no, the description isn't done yet.

Many descriptions fail on cross-referencing. If you have search_products and get_product_by_id, both descriptions need to draw the distinction explicitly: "Use this when you have an exact product SKU. For keyword or category search, use search_products instead." That's two short sentences, and they replace the agent's guess between two plausible tools with an explicit rule.
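Here is a sketch of that pairing as tool definitions, plus a hypothetical check (missing_cross_references) that flags any description that never names a sibling tool. The SKU field, the 20-result limit, and the check itself are assumptions for illustration; a plain substring test is crude and will miss a sibling referred to by paraphrase.

```python
# Hypothetical sibling tools whose descriptions each point at the other.
TOOLS = [
    {
        "name": "search_products",
        "description": (
            "Searches the catalog by keyword or category and returns up to 20 matches, "
            "each with its SKU. Use this when you don't have an exact SKU. "
            "If you already have one, use get_product_by_id instead."
        ),
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "Keywords or a category name."}},
            "required": ["query"],
        },
    },
    {
        "name": "get_product_by_id",
        "description": (
            "Returns the full record for one product. Use this when you have an exact "
            "product SKU. For keyword or category search, use search_products instead."
        ),
        "inputSchema": {
            "type": "object",
            "properties": {"sku": {"type": "string", "description": "Exact product SKU, e.g. 'BT-1042'."}},
            "required": ["sku"],
        },
    },
]


def missing_cross_references(tools: list[dict]) -> list[str]:
    """Names of tools whose description mentions no other tool by name (substring match)."""
    if len(tools) < 2:
        return []  # a lone tool has no siblings to point at
    names = {tool["name"] for tool in tools}
    return [
        tool["name"]
        for tool in tools
        if not any(other in tool["description"] for other in names - {tool["name"]})
    ]


print(missing_cross_references(TOOLS))  # []
```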

For action tools specifically, your error responses matter almost as much as your descriptions. Many MCP servers return a generic error object such as {"error": "invalid parameter"}, which tells the calling agent nothing about what to try next. A structured error that explains what was wrong and what a valid input looks like ("quantity must be a positive integer between 1 and 99") lets the agent correct the call and retry instead of abandoning the task or guessing again.
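The MCP specification reports tool execution failures inside the tool result, with isError set to true, rather than as protocol-level errors, which is what lets the model read the message and try again. A minimal sketch, assuming a hypothetical add_to_cart action tool: the error text is a small JSON object naming the field, what was wrong, what a valid value looks like, and an example. The JSON field names are a convention invented for this sketch, not part of the spec.

```python
import json


def validation_error(field: str, problem: str, expected: str, example: object) -> dict:
    """Tool result in the MCP CallToolResult shape (content + isError) describing one bad argument."""
    detail = {
        "error": "invalid_argument",
        "field": field,
        "problem": problem,
        "expected": expected,
        "example": example,
    }
    return {"content": [{"type": "text", "text": json.dumps(detail)}], "isError": True}


def add_to_cart(sku: str, quantity: object) -> dict:
    """Hypothetical action tool: validate before changing anything, then act."""
    # bool is a subclass of int in Python, so reject True/False explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int) or not 1 <= quantity <= 99:
        return validation_error(
            field="quantity",
            problem=f"got {quantity!r}",
            expected="a positive integer between 1 and 99",
            example=2,
        )
    # A real server would update the cart here; this sketch only reports success.
    return {"content": [{"type": "text", "text": f"Added {quantity} x {sku} to the cart."}], "isError": False}


print(add_to_cart("BT-1042", 0)["content"][0]["text"])
```

Running the check before any side effect matters for action tools: the agent gets a precise correction, and nothing has changed yet.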

With roughly 73,500 servers, the ecosystem is still heavily developer-focused. If you're in a non-developer vertical and take the time to write genuinely clear, well-structured tool descriptions, you'll stand out in the registries where AI assistants discover what's available. Agents that can reliably use your tools will use them. Those that can't will route around you, and you won't have any visibility into which is happening.