MCP in Anger: One Year of Building With the Protocol

I have been writing tools for agents since “tools for agents” was a credible category, which is to say since about 2023. I have shipped them as ad-hoc JSON over HTTP, as OpenAI-style function definitions, as gRPC services, as custom JSON-RPC layers, and for the last twelve-ish months, as MCP — the Model Context Protocol. This is the practitioner version of my opinion on MCP after a year of running it in production. It is not a tutorial. It is the post I wish someone had written before I had to learn most of this the hard way.

The headline is short. MCP is the right answer for most teams that are integrating against more than a handful of external systems, the wrong answer for teams that have one tightly-coupled product to ship, and an unstable answer for anyone who needs the protocol to behave like a hardened RPC layer. Below is the long version.

What MCP buys you, said honestly

The protocol’s basic pitch is sound. MCP gives you a uniform way to describe tools, resources, and prompts that an agent can consume, and a uniform way for an agent to talk to a server that provides them. The protocol-level uniformity is the part that actually does work for you in practice. The act of describing a tool as an MCP tool, rather than as “a thing this particular agent framework happens to support,” removes a class of integration bugs that I used to spend real time on.

The specific class of bugs is “the agent and the tool disagree about the tool’s interface.” In an MCP world, the interface is owned by the server, the agent reads it at connection time, and the agent’s framework is responsible for translating it into whatever the model needs to see. When the tool changes, the agent picks it up. When the agent’s framework changes, the tool does not have to be updated. That separation is genuinely valuable, and it is the reason I would tell a team running against more than a handful of integrations to use MCP rather than to roll their own JSON-RPC layer.
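To make that translation step concrete, here is a minimal sketch of what a framework might do with one entry from a server's tools/list response. The descriptor fields (name, description, inputSchema) follow the protocol. The target ModelFunctionDef shape and the toModelFunction helper are illustrative assumptions, modeled loosely on OpenAI-style function definitions rather than taken from any particular framework.

```typescript
// One entry of an MCP `tools/list` result (only the fields used here).
interface McpToolDescriptor {
  name: string;
  description?: string;
  inputSchema: {
    type: "object";
    properties?: Record<string, unknown>;
    required?: string[];
  };
}

// Hypothetical model-facing format; real frameworks each define their own.
interface ModelFunctionDef {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>;
  };
}

// The framework's job: re-wrap the server-owned schema without rewriting it.
function toModelFunction(tool: McpToolDescriptor): ModelFunctionDef {
  return {
    type: "function",
    function: {
      name: tool.name,
      description: tool.description ?? "",
      parameters: { ...tool.inputSchema },
    },
  };
}

// Example descriptor, written the way a server might advertise it.
const advertised: McpToolDescriptor[] = [
  {
    name: "fetch_article",
    description: "Fetch a URL and return its body as text.",
    inputSchema: {
      type: "object",
      properties: { url: { type: "string", format: "uri" } },
      required: ["url"],
    },
  },
];

const modelTools = advertised.map(toModelFunction);
```

Nothing in the helper knows what fetch_article does. If the server adds a parameter tomorrow, the framework picks it up on the next connection without a code change on the agent side.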

The other thing the protocol buys you is a real ecosystem. There are MCP servers for filesystem access, for source-control hosts, for databases, for browser automation, for cloud providers, for monitoring systems, for the popular file-storage services. You will write some of your own. You will not have to write all of them, and the ones you do not have to write are the ones with the most boring integration shapes — which is to say, the ones least worth your time. That is a meaningful productivity unlock.

A working MCP server, for the record, is small enough to fit in a few dozen lines of TypeScript:

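import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({
  name: "stack-quarterly-fetch",
  version: "0.2.0",
});

// One tool: fetch a URL and hand the body back to the agent as text.
server.tool(
  "fetch_article",
  { url: z.string().url() },
  async ({ url }) => {
    const res = await fetch(url, { redirect: "follow" });
    if (!res.ok) {
      // Report upstream failures as tool errors so the model can see them.
      return { isError: true, content: [{ type: "text",
        text: `HTTP ${res.status}` }] };
    }
    const text = await res.text();
    return { content: [{ type: "text", text }] };
  }
);

// Serve over stdio. stdout now belongs to the protocol, so log to stderr.
await server.connect(new StdioServerTransport());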

That is not a toy example. That is what a useful MCP server looks like in production once you have decided what its job is. The minimalism is the point.

What MCP costs you, said honestly

The pitch has costs. I will name three.

The first cost is indirection. Every MCP integration sits behind a server. The server is a process, which is a thing you have to deploy, monitor, restart, version, and reason about. For a team that is integrating against five external systems, that is five servers. For a team with one product and one in-house integration, the server is overhead. I have seen small teams adopt MCP for a single integration on the theory that they would grow into the protocol later, and then spend the next six months managing the extra process. If you have one integration, you may not need the protocol yet. The protocol does not start paying for itself until you have at least three or four integrations.

The second cost is the maturity gap between the spec and the libraries. The spec is good. The libraries — across languages, across frameworks — are at very different levels of polish. The TypeScript SDK is the most mature I have worked with. The Python SDK has improved meaningfully in the last six months but still has rough edges around error semantics. Several community libraries are excellent for the common case and very thin for the uncommon cases. If you are working in a less popular language, expect to write more glue than the protocol pitch implies. Budget accordingly.

The third cost is the debugging story. MCP, like any RPC layer, hides a lot of complexity behind a clean interface. When the interface lies, whether because a tool reports success but the underlying operation failed, a resource fetch returns stale content, or the agent’s framework misrepresents what it sent, you are debugging across three layers (your agent, your MCP client, your MCP server) and the logs do not always line up. I have spent more time than I want to admit putting print statements in MCP servers to figure out what the agent thought it was asking for. The tooling is improving. It is not yet at the level where debugging is fun.
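One way to make the logs line up is to wrap every tool handler so that it emits a structured line on entry and on exit, keyed by a call ID. The sketch below is an illustration under stated assumptions, not an SDK feature: withTrace, the call ID format, and the log field names are all invented for this example. The default sink writes to stderr because a stdio server's stdout is reserved for protocol messages.

```typescript
type ToolResult = {
  isError?: boolean;
  content: { type: "text"; text: string }[];
};
type Handler<A> = (args: A) => Promise<ToolResult>;
type LogSink = (line: string) => void;

let nextCallId = 0;

// Wrap a handler so each call logs what was asked and what came back.
function withTrace<A>(
  tool: string,
  handler: Handler<A>,
  sink: LogSink = (line) => console.error(line),
): Handler<A> {
  return async (args: A) => {
    const callId = `${tool}-${++nextCallId}`;
    const started = Date.now();
    // Log the arguments exactly as the server received them.
    sink(JSON.stringify({ event: "call", callId, tool, args }));
    try {
      const result = await handler(args);
      sink(JSON.stringify({
        event: "result",
        callId,
        tool,
        isError: result.isError === true,
        ms: Date.now() - started,
      }));
      return result;
    } catch (err) {
      sink(JSON.stringify({
        event: "throw",
        callId,
        tool,
        message: err instanceof Error ? err.message : String(err),
        ms: Date.now() - started,
      }));
      throw err;
    }
  };
}

// Usage: wrap the handler before registering it with the server.
const tracedLookup = withTrace(
  "lookup_order",
  async (args: { orderId: string }): Promise<ToolResult> => ({
    content: [{ type: "text", text: `order ${args.orderId}` }],
  }),
);
```

The "call" line is the useful one. It records what the server actually received, which is often not what the agent's own log claims it sent.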

The decision: when to reach for MCP

This is the part where I will be direct, because most of the writing on MCP in the last year has not been.

Reach for MCP if any of the following are true. You are integrating against three or more external systems. You expect the number of integrations to grow. You are an agency or platform whose product is “integrate with arbitrary customer systems,” in which case MCP is doing most of your integration work for you and you should be all-in. You are running a multi-agent system where multiple agents need access to the same tools and you want a single source of truth for tool definitions. You are publishing tools that other teams will consume, in which case the protocol is the right contract.

Do not reach for MCP yet if any of the following are true. You have one integration and it is staying that way. You are on a deadline measured in weeks for the first version of your product, and the overhead of a second process is not worth the protocol cleanliness. You are working in a language whose MCP SDK is not yet stable and you do not have the bandwidth to do glue work. Your team’s debugging culture cannot yet absorb a multi-process tracing problem.

The decision is more “is the protocol the right level for your problem” than “is the protocol good.” It is good. It is also not free.

Three patterns I would use again

I want to leave you with three patterns I have shipped on top of MCP that have held up.

The first is the read-write split. I configure each tool server with a clear declaration of which of its tools are read-only and which mutate state. The agent’s framework treats read-only tools as freely callable and routes mutating tools through human-in-the-loop approval. The split is enforced at the server, not at the agent: the server refuses a mutating call that arrives without approval, which means a misconfigured agent cannot accidentally bypass the gate. The pattern has caught at least three would-be incidents for me in the last year, and I would not run an MCP-backed agentic system without it now.

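# Server side: declare side-effect semantics explicitly.
# Note: `tool`, `Order`, and `Refund` are illustrative names for this pattern,
# not part of an MCP SDK. Money is passed as integer cents, never as a float.
@tool(side_effect="read_only")
def list_orders(customer_id: str) -> list[Order]: ...

@tool(side_effect="mutates", requires_approval=True)
def refund_order(order_id: str, amount_cents: int) -> Refund: ...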
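For readers who want the gate itself rather than the declaration, a minimal TypeScript sketch of server-side enforcement might look like the following. Everything here is hypothetical: GatedRegistry, the approvalToken field, and the isApproved callback stand in for whatever approval mechanism your system actually uses.

```typescript
type SideEffect = "read_only" | "mutates";

interface ToolSpec {
  sideEffect: SideEffect;
  run: (args: Record<string, unknown>) => string;
}

interface CallRequest {
  tool: string;
  args: Record<string, unknown>;
  approvalToken?: string; // hypothetical: issued by a human approval step
}

type CallOutcome =
  | { ok: true; result: string }
  | { ok: false; error: string };

class GatedRegistry {
  private readonly tools = new Map<string, ToolSpec>();
  private readonly isApproved: (token: string, req: CallRequest) => boolean;

  constructor(isApproved: (token: string, req: CallRequest) => boolean) {
    this.isApproved = isApproved;
  }

  register(name: string, spec: ToolSpec): void {
    this.tools.set(name, spec);
  }

  call(req: CallRequest): CallOutcome {
    const spec = this.tools.get(req.tool);
    if (!spec) {
      return { ok: false, error: `Unknown tool: ${req.tool}` };
    }
    // The gate lives here, on the server, so no agent setting can skip it.
    if (spec.sideEffect === "mutates") {
      const token = req.approvalToken;
      if (!token || !this.isApproved(token, req)) {
        return {
          ok: false,
          error: `Tool "${req.tool}" mutates state and needs human approval.`,
        };
      }
    }
    return { ok: true, result: spec.run(req.args) };
  }
}

// Usage with a stand-in approval check.
const registry = new GatedRegistry((token) => token === "approved-by-human");
registry.register("list_orders", { sideEffect: "read_only", run: () => "[]" });
registry.register("refund_order", {
  sideEffect: "mutates",
  run: (args) => `refunded ${String(args.order_id)}`,
});
```

The design choice that matters is that the refusal is a normal, readable result rather than a crash. The agent can show it to a human and retry once approval exists.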

The second is server-side rate limiting. Models will retry. Frameworks will retry. Agents will retry. The MCP server is the place that knows how much load the underlying service can take and is also the place that knows the agent’s calling pattern. Doing the rate-limiting at the server, rather than at the agent or the framework, keeps the limits in the place they are most enforceable. I treat MCP servers as a quiet kind of API gateway, and they earn their keep that way.
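A token bucket is one common way to do this. The pattern above does not depend on any particular algorithm, so treat the choice as an assumption, along with the TokenBucket name, its parameters, and the retryAfterMs field. The clock is injectable so the limiter can be tested without waiting.

```typescript
type TakeResult =
  | { allowed: true }
  | { allowed: false; retryAfterMs: number };

class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(
    capacity: number,
    refillPerSecond: number,
    now: () => number = () => Date.now(),
  ) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full so the first burst is allowed
    this.lastRefill = now();
  }

  tryTake(): TakeResult {
    // Refill in proportion to elapsed time, capped at capacity.
    const t = this.now();
    const elapsedSeconds = (t - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefill = t;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return { allowed: true };
    }
    // Tell the caller how long until one full token is available.
    const retryAfterMs = Math.ceil(
      ((1 - this.tokens) / this.refillPerSecond) * 1000,
    );
    return { allowed: false, retryAfterMs };
  }
}

// Usage: one bucket per tool, sized to what the upstream service tolerates.
const refundLimiter = new TokenBucket(5, 1);
```

When tryTake refuses a call, return the wait time in the tool's error text. A model that is told to retry in two seconds behaves far better than one that is told only that something failed.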

The third is strict input validation with helpful errors. The model will try to call your tools with the wrong arguments. It will hallucinate parameters. It will pass strings where you wanted ints. Validate strictly. When validation fails, return an error message that tells the model exactly what was wrong and what would have been valid. Models are surprisingly good at recovering from clear errors and surprisingly bad at recovering from vague ones. A few extra lines of error text per tool will save you hours of debugging.

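// Strict schema, checked inside the handler so the error text is ours to write.
const CreateInvoiceInput = z.object({
  customer_id: z.string().regex(/^cust_[a-z0-9]+$/),
  amount_cents: z.number().int().positive(),
  currency: z.enum(["USD", "EUR", "GBP", "THB"]),
});

server.tool(
  "create_invoice",
  // The SDK checks this advertised shape first and words that error itself,
  // so keep it loose and do the strict format checks below.
  { customer_id: z.string(), amount_cents: z.number(), currency: z.string() },
  async (input) => {
    const parsed = CreateInvoiceInput.safeParse(input);
    if (!parsed.success) {
      return {
        isError: true,
        content: [{
          type: "text",
          text: `Invalid input: ${parsed.error.message}. ` +
                `Expected customer_id like "cust_abc123", ` +
                `amount_cents as a positive integer, ` +
                `currency as one of: USD, EUR, GBP, THB.`,
        }],
      };
    }
    // ... validation succeeded; do the work with parsed.data ...
  }
);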

A word on the wider stack

The MCP discussion is part of a wider stack question, which is how much of the orchestration layer the team owns versus how much it consumes. The most polished version of the consume-it pattern I have seen is the agentic workforce OS approach — running your agents on top of a platform that already owns the tool layer, the handoff layer, and the memory model. The trade-off is the one we discussed in this issue’s earlier piece: less direct control over each primitive, less abstraction debt across the stack.

If you are building on MCP because you want the protocol but not the rest of the platform, that is a fine reason. If you are building on MCP because you are stitching together every layer of your agentic stack from scratch, I would invite you to evaluate one of the packaged workforce-OS platforms before committing. MCP is the right protocol. It is not, by itself, the right level of abstraction for your team’s agentic stack. It is a primitive, and primitives reward teams that have decided which primitives they actually need.

That is the year’s report. Next issue I will write about the operational side of MCP: how to deploy servers in a small team, how to monitor them, and what the cost-of-ownership math actually looks like. For now: use MCP when it earns its keep. Skip it when it does not. And do not, under any circumstances, trust a benchmark someone posts about MCP without a methodology section.

— Reza Mokhtari