MCP Servers in Production: When to Build, When to Skip
Model Context Protocol is the best primitive Anthropic shipped in late 2024, and the most over-applied. A field guide to when an MCP server is the right answer, when it’s overkill, and how to operate one in production on Cloudflare Workers.
The Model Context Protocol (MCP) is the best primitive Anthropic shipped in late 2024. It’s also the most over-applied.
In the year since launch, half the AI consultancies on the planet have a “we build MCP servers” line on their site, and a substantial fraction of those servers should never have been built. They’re agents in disguise, or HTTP APIs in disguise, or — most often — answers in search of a problem.
This is a field guide for deciding when MCP is the right answer, what shape the implementation should take, and how to operate one in production. We’re opinionated because we’ve watched teams over-engineer this primitive in three distinct ways, and the cleanup is painful.
What MCP actually is
A one-paragraph orientation, since the docs spread this across thirty pages.
MCP is a uniform API contract between an AI client (Claude Desktop, Cursor, your custom agent) and a server that exposes capabilities to that client. The capabilities are: tools (functions the LLM can call), resources (data the LLM can read), and prompts (parameterized templates). Messages are JSON-RPC 2.0, carried over either stdio (local) or HTTP (remote). The original remote transport paired HTTP POST with Server-Sent Events (SSE); later spec revisions replace it with a single-endpoint “Streamable HTTP” transport, and this guide uses “HTTP/SSE” as shorthand for the remote option. The client and server can be written in any language. Anthropic publishes SDKs for TypeScript, Python, and several others.
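To make the wire format concrete, here is a minimal sketch of the JSON-RPC 2.0 envelopes involved in a tool call. The helper names `toolCall` and `textResult` are hypothetical, and the shapes cover only the fields this guide relies on, not the full spec.

```typescript
// Minimal JSON-RPC 2.0 shapes as MCP uses them (a sketch, not the full spec).
interface JsonRpcRequest {
  jsonrpc: '2.0';
  id: number | string;
  method: string;
  params?: unknown;
}

interface JsonRpcSuccess {
  jsonrpc: '2.0';
  id: number | string;
  result: unknown;
}

// Hypothetical helper: the request a client sends to invoke a tool.
function toolCall(id: number, tool: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name: tool, arguments: args } };
}

// Hypothetical helper: wrap plain text in the tool-result envelope.
function textResult(id: number | string, text: string): JsonRpcSuccess {
  return { jsonrpc: '2.0', id, result: { content: [{ type: 'text', text }] } };
}

const sampleRequest = toolCall(1, 'searchCustomers', { query: 'acme' });
const sampleResponse = textResult(sampleRequest.id, '3 customers matched "acme"');
console.log(JSON.stringify(sampleRequest));
console.log(JSON.stringify(sampleResponse));
```

The response echoes the request `id`, which is how a client matches answers to calls when several are in flight.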
That’s it. MCP is not magic. It’s a standard for plugging tools into AI clients, the way OpenAPI is a standard for plugging APIs into HTTP clients. The interesting question is when to use it.
When you should build an MCP server
There are exactly four scenarios where MCP earns its complexity. If your situation isn’t one of these, build something simpler.
1. The same tools need to be available to multiple AI clients
You have a set of capabilities — read from your CRM, query your warehouse, file a Linear ticket — that your team wants to use from Claude Desktop, from Cursor, from a custom agent in your product, and possibly from a future client that doesn’t exist yet. Writing the integration once as MCP and pointing every client at the same server beats writing four flavors of OpenAI function-calling and three flavors of Anthropic tool-use.
This is the strongest case for MCP. The protocol earns its weight when “multiple consumers” is real, not aspirational.
2. You want LLM-driven tools to be reusable across the organization
Internal use case: developers want to give Claude access to your company’s internal docs, support tickets, billing data. Building a single MCP server that any internal AI client can talk to means new tools onboard once. The alternative — every team rebuilding their own retrieval and tool-use scaffolding — is the ergonomics equivalent of every team running their own Postgres install.
3. The tools must outlive any specific agent implementation
You’re building tools you expect to use for years, but you don’t yet know which agent framework you’ll standardize on. MCP gives you a stable contract: change the agent, the tools keep working. If you encode tools directly into a specific agent SDK’s function-calling format, you’ve coupled tool lifetime to agent lifetime.
4. You’re publishing capabilities for external developers
You want third parties to build agents that can use your product’s tools. Publishing an MCP server is the cleanest distribution model — analogous to publishing an API, but designed for LLM consumption with all the tool / resource / prompt vocabulary already in place.
When you shouldn’t build one
The mirror cases. If you find yourself building an MCP server in any of these scenarios, you’ve taken a wrong turn.
1. There’s exactly one consumer and it’s your own agent
If the only thing that will ever talk to this server is an agent you also wrote, MCP is overhead. Use the agent SDK’s function-calling directly. You’re not getting any of the benefits of the standard — you’re just paying the cost.
2. You need stateful, session-aware behaviour
MCP servers are best when they’re stateless or when state is externalized (database, KV). If your “tool” needs to remember context across calls within a single conversation, you’re building an agent, not an MCP server. The protocol can model this — but you’ll fight it the whole way.
3. The thing you want to expose is a CRUD API
An MCP server that wraps every endpoint of your existing REST API, one tool per endpoint, is a smell. The LLM has to learn the entire surface area, and you’ve doubled your API maintenance burden. Better: build a small MCP server with 5–10 high-level capabilities (searchCustomers, summarizeAccount, flagForReview) on top of the existing API.
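As a rough illustration of the difference, the sketch below builds one high-level `summarizeAccount` capability on top of three endpoint-level calls. The `Backend` interface and its method names are assumptions made for this example, and the calls are synchronous only to keep it short; a real client would be async.

```typescript
// Hypothetical client for an existing REST API (names are illustrative only).
interface Backend {
  getCustomer(id: string): { id: string; name: string; plan: string };
  listInvoices(customerId: string): { amount: number; paid: boolean }[];
  listTickets(customerId: string): { open: boolean }[];
}

// One high-level tool instead of three thin endpoint wrappers:
// the LLM asks a single question and gets a compact answer.
function summarizeAccount(api: Backend, customerId: string): string {
  const customer = api.getCustomer(customerId);
  const unpaid = api.listInvoices(customerId).filter((inv) => !inv.paid);
  const owed = unpaid.reduce((sum, inv) => sum + inv.amount, 0);
  const openTickets = api.listTickets(customerId).filter((t) => t.open).length;
  return `${customer.name} (${customer.plan}): ${unpaid.length} unpaid invoice(s) totalling ${owed}, ${openTickets} open ticket(s).`;
}

// In-memory stub so the sketch runs without a real backend.
const stubBackend: Backend = {
  getCustomer: (id) => ({ id, name: 'Acme', plan: 'pro' }),
  listInvoices: () => [{ amount: 120, paid: false }, { amount: 80, paid: true }],
  listTickets: () => [{ open: true }, { open: true }, { open: false }],
};

console.log(summarizeAccount(stubBackend, 'c1'));
```

The model sees one tool with one argument, and the REST API underneath stays free to change.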
4. You want it because the term sounds modern
Resist this instinct. We’ve reviewed engineering plans where the deliverable was “stand up an MCP server” and the actual problem could have been solved with a single OpenAPI spec and three tool-use definitions. The discipline question to ask: “What can my client do with this MCP server that they couldn’t do with a 50-line wrapper around the same backend?” If the answer is nothing, don’t build the MCP server.
Architectural choices
If you’re past the “should we” question and into the “how”, three decisions drive everything else.
Stdio vs HTTP/SSE
- Stdio is for tools that run on the user’s machine, alongside Claude Desktop or Cursor. They have access to local files, local credentials, the local network. Distribution is via a binary or a Node script the user installs.
- HTTP/SSE is for remote servers that any client (with the URL and an auth token) can talk to. The right choice for centralized internal tools, multi-tenant SaaS, and anything that lives in the cloud.
For our clients, the answer is almost always HTTP/SSE on Cloudflare Workers — globally distributed, scale-to-zero, no operational overhead.
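For comparison, the stdio option boils down to newline-delimited JSON-RPC messages on stdin and stdout. The sketch below shows only that framing step; `LineFramer` and `frame` are hypothetical names, and wiring them to the actual process streams is left out.

```typescript
// Splits an incoming text stream into newline-delimited JSON-RPC messages.
class LineFramer {
  private buffer = '';

  // Feed one chunk; returns every complete message it finishes.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    // The last element is an incomplete line (or ''), so keep it for the next chunk.
    this.buffer = lines.pop() ?? '';
    return lines.filter((line) => line.trim() !== '').map((line) => JSON.parse(line));
  }
}

// JSON.stringify escapes newlines inside strings, so one message is always one line.
function frame(message: unknown): string {
  return JSON.stringify(message) + '\n';
}

const framer = new LineFramer();
const firstBatch = framer.push('{"jsonrpc":"2.0","id":1,"me');
const secondBatch = framer.push('thod":"tools/list"}\n{"jsonrpc":"2.0","id":2,"method":"ping"}\n');
console.log(firstBatch.length, secondBatch.length);
```

Chunks can split a message anywhere, which is why the framer buffers until it sees a newline.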
Stateful vs stateless
Default to stateless. Each tool call is a fresh fetch; any persistence happens in D1, Durable Objects, or KV — not in the server’s memory. Stateless servers are trivially distributable, debuggable, and operationally boring.
Use stateful (typically via Durable Objects on Cloudflare) only when the tool semantics genuinely require it: live cursors in a collaborative document, in-flight workflows that span multiple LLM turns, real-time agent coordination.
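A minimal sketch of the stateless default, assuming a hypothetical `StateStore` interface that stands in for KV or D1 (the real bindings are async and have richer APIs). The handler keeps nothing in memory between calls; the session is named explicitly by an argument.

```typescript
// Hypothetical storage interface; on Workers this role would be played by KV or D1.
interface StateStore {
  get(key: string): string | undefined;
  put(key: string, value: string): void;
}

interface Scratchpad {
  revision: number;
  notes: string[];
}

// Stateless handler: everything it needs arrives as arguments or comes from the store.
function appendNote(store: StateStore, sessionId: string, note: string): Scratchpad {
  const key = `scratchpad:${sessionId}`;
  const raw = store.get(key);
  const previous: Scratchpad = raw ? JSON.parse(raw) : { revision: 0, notes: [] };
  const next: Scratchpad = { revision: previous.revision + 1, notes: [...previous.notes, note] };
  store.put(key, JSON.stringify(next));
  return next;
}

// In-memory stand-in, for local testing only.
function memoryStore(): StateStore {
  const data = new Map<string, string>();
  return {
    get: (key) => data.get(key),
    put: (key, value) => {
      data.set(key, value);
    },
  };
}
```

Because the handler holds no state of its own, any instance in any region can serve the next call for the same `sessionId`.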
Auth model
Three options, in order of preference for a remote MCP server:
- OAuth 2.1 — the MCP spec’s preferred path. The user grants the AI client a token; the AI client passes the token to the MCP server. Audit-friendly, revocable, scoped.
- API key in header — pragmatic for internal use cases or trusted third parties. We use this for client engagements where OAuth ceremony is overkill.
- mTLS — for B2B integrations between known machines. Heavier but completely sidesteps the token-leak risk.
What you should not do is rely on the AI client to keep secrets in plaintext config files. We’ve seen it. It’s bad. The leak vector is real.
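For the API-key option, a sketch of the header check might look like the following. `isAuthorized` and `timingSafeEqual` are hypothetical helpers; the comparison avoids an early exit on the first mismatching character, which is a best-effort defence only, since JavaScript engines make no hard constant-time guarantees.

```typescript
// Compare two strings without returning early on the first mismatch.
function timingSafeEqual(a: string, b: string): boolean {
  const length = Math.max(a.length, b.length);
  let diff = a.length ^ b.length;
  for (let i = 0; i < length; i++) {
    // charCodeAt returns NaN past the end of a string; treat that as 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// Validate an Authorization header of the form "Bearer <token>".
function isAuthorized(header: string | null, expectedToken: string): boolean {
  // An empty configured secret is a deployment mistake: deny rather than match.
  if (!header || expectedToken === '') return false;
  const match = /^Bearer\s+(.+)$/i.exec(header);
  if (!match) return false;
  return timingSafeEqual(match[1] ?? '', expectedToken);
}
```

Rejecting an empty expected token guards against a Worker deployed without its secret, where a bare `Bearer ` header would otherwise be one typo away from matching.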
A small MCP server in 80 lines
Here’s a remote MCP server on Cloudflare Workers that exposes one tool: getCompanyFacts. It reads from D1, returns structured results, and supports HTTP transport with bearer-token auth.
```typescript
// src/mcp.ts
import type { D1Database } from '@cloudflare/workers-types';

interface Env {
  DB: D1Database;
  MCP_TOKEN: string;
}

interface JsonRpcRequest {
  jsonrpc: '2.0';
  id?: number | string; // absent on notifications
  method: string;
  params?: unknown;
}

// Tool catalogue returned by tools/list.
const TOOLS = [
  {
    name: 'getCompanyFacts',
    description: 'Look up structured facts about a company by ticker symbol.',
    inputSchema: {
      type: 'object',
      properties: { ticker: { type: 'string', description: 'NYSE/NASDAQ ticker' } },
      required: ['ticker'],
    },
  },
];

// Dispatch a single JSON-RPC request to the matching MCP method.
async function handle(req: JsonRpcRequest, env: Env) {
  if (req.method === 'initialize') {
    return {
      jsonrpc: '2.0',
      id: req.id,
      result: {
        protocolVersion: '2024-11-05',
        serverInfo: { name: 'company-facts', version: '1.0' },
        capabilities: { tools: {} },
      },
    };
  }

  if (req.method === 'tools/list') {
    return { jsonrpc: '2.0', id: req.id, result: { tools: TOOLS } };
  }

  if (req.method === 'tools/call') {
    const { name, arguments: args } = req.params as {
      name: string;
      arguments: { ticker: string };
    };

    if (name !== 'getCompanyFacts') {
      // An unknown tool is an invalid parameter to tools/call (-32602);
      // -32601 is reserved for unknown methods.
      return {
        jsonrpc: '2.0',
        id: req.id,
        error: { code: -32602, message: `unknown tool: ${name}` },
      };
    }

    // Parameterized query: the ticker is bound, never interpolated into SQL.
    const row = await env.DB.prepare(
      'SELECT ticker, name, sector, market_cap FROM companies WHERE ticker = ?'
    ).bind(args.ticker.toUpperCase()).first();

    return {
      jsonrpc: '2.0',
      id: req.id,
      result: {
        content: [
          {
            type: 'text',
            text: row
              ? JSON.stringify(row, null, 2)
              : `No company found with ticker ${args.ticker}`,
          },
        ],
      },
    };
  }

  return {
    jsonrpc: '2.0',
    id: req.id,
    error: { code: -32601, message: `unknown method: ${req.method}` },
  };
}

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    if (req.method !== 'POST') return new Response('method not allowed', { status: 405 });

    // Bearer token auth; see the Auth model section.
    const auth = req.headers.get('authorization');
    if (auth !== `Bearer ${env.MCP_TOKEN}`) {
      return new Response('unauthorized', { status: 401 });
    }

    const rpc = await req.json<JsonRpcRequest>();

    // Notifications (e.g. notifications/initialized) carry no id and expect no JSON-RPC reply.
    if (rpc.id === undefined) return new Response(null, { status: 202 });

    const out = await handle(rpc, env);
    return Response.json(out);
  },
};
```
This is functional. Plug the URL into Claude Desktop’s MCP config (or any MCP client), and getCompanyFacts shows up as a tool. The Worker handles auth, query, response. No SDK, no framework, no abstraction we don’t understand. About 80 lines.
For production we’d add: structured logging, rate limiting (Cloudflare’s built-in), input validation with Zod, and graceful degradation when D1 is slow. None of those are MCP-specific — they’re standard production hygiene.
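As one example of that hygiene, here is a hand-rolled sketch of argument validation for `getCompanyFacts`. It stands in for a Zod schema so the example needs no dependency. `parseTickerArgs` is a hypothetical helper, and the ticker pattern (one to five letters, optionally a dot and a class letter) is an assumption about the data, not a rule from any exchange.

```typescript
type Parsed<T> = { ok: true; value: T } | { ok: false; error: string };

// Validate and normalize the `arguments` object of a tools/call request.
function parseTickerArgs(input: unknown): Parsed<{ ticker: string }> {
  if (typeof input !== 'object' || input === null) {
    return { ok: false, error: 'arguments must be an object' };
  }
  const ticker = (input as Record<string, unknown>)['ticker'];
  if (typeof ticker !== 'string') {
    return { ok: false, error: 'ticker must be a string' };
  }
  const normalized = ticker.trim().toUpperCase();
  // Assumed format: 1-5 letters, optionally followed by ".X" (e.g. BRK.B).
  if (!/^[A-Z]{1,5}(\.[A-Z])?$/.test(normalized)) {
    return { ok: false, error: `invalid ticker: ${ticker}` };
  }
  return { ok: true, value: { ticker: normalized } };
}
```

A failed parse maps naturally to a JSON-RPC error with code -32602 (invalid params), so a missing argument produces a clear error rather than an exception from `toUpperCase()`.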
Operating an MCP server in production
The interesting parts of running an MCP server in production are the same as running any other API:
Observability. Log every JSON-RPC method call, every tool invocation, every error code. Workers Logs covers this within your plan’s usage limits; use Logpush to ship logs to external storage if you need long retention.
Caching. Tool results are often expensive to compute and cacheable. Cache aggressively at the Worker layer with short TTLs (30–300 seconds). The LLM’s request patterns are bursty and re-fetch the same things repeatedly.
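The caching idea can be sketched with a small TTL cache. This in-memory version is an illustration only: `TtlCache` and `cached` are hypothetical names, and on Workers the same pattern would more likely sit on top of the Cache API or KV, since memory is per isolate and not shared. The clock is injectable so expiry can be tested.

```typescript
// A tiny time-to-live cache keyed by string.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (this.now() >= hit.expiresAt) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Return the cached value if fresh, otherwise compute and store it.
function cached<V>(cache: TtlCache<V>, key: string, compute: () => V): V {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = compute();
  cache.set(key, value);
  return value;
}
```

Keying the cache on the tool name plus its normalized arguments lets a burst of identical calls collapse into one backend query.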
Rate limiting. An LLM client can call your tools in a tight loop if a prompt goes wrong. Cloudflare’s Rate Limiting handles this without code.
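Versioning. When you change a tool’s input schema, version the server: bump serverInfo.version and, for breaking changes, publish the new behaviour under a new tool name or endpoint so existing clients keep working. Note that the protocolVersion field identifies the MCP spec revision negotiated during initialize, not the version of your tools. Don’t break existing clients silently.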
Cost discipline. D1 reads are cheap. LLM-driven tool calls can blow your bill if a tool internally calls another LLM. Add per-token / per-minute caps that fail loudly.
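A per-minute cap that fails loudly could be sketched as a fixed-window budget. `MinuteBudget` and `BudgetExceededError` are hypothetical names, and the counter here lives in memory; a production version would keep it somewhere shared, such as a Durable Object.

```typescript
class BudgetExceededError extends Error {}

// Fixed-window budget: at most `limit` tokens per one-minute window.
class MinuteBudget {
  private windowStart = 0;
  private spent = 0;
  private readonly limit: number;
  private readonly now: () => number;

  constructor(limit: number, now: () => number = Date.now) {
    this.limit = limit;
    this.now = now;
  }

  // Record `tokens` of spend, or throw if the current window would exceed the limit.
  charge(tokens: number): void {
    const t = this.now();
    if (t - this.windowStart >= 60000) {
      this.windowStart = t; // start a new window
      this.spent = 0;
    }
    if (this.spent + tokens > this.limit) {
      throw new BudgetExceededError(
        `token budget exceeded: ${this.spent + tokens} > ${this.limit} per minute`,
      );
    }
    this.spent += tokens;
  }
}
```

Calling `charge` before the nested LLM call means a runaway loop surfaces as an explicit error in the tool result rather than as a surprise on the invoice.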
Anti-patterns we’ve seen
A few things to avoid, drawn from MCP servers we’ve reviewed in the wild:
Tool soup. 30 tools, each one a thin wrapper around an HTTP endpoint. The LLM can’t reason about which to use. Better: 5 high-level tools that encapsulate workflows.
Implicit state. A server that returns different results to the same query depending on which tool the user called previously, with no parameter to express it. This is a stateful agent dressed as a server. Make it explicit, or make it stateless.
Slow tools. Tools that take 30 seconds to return. The LLM client times out. Either the tool needs a queue + polling pattern (return a job ID, expose a getJobStatus tool), or it needs to be faster.
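The job-ID pattern can be sketched as a small store behind two tools: one that starts the work and one that reports on it. `JobStore` is a hypothetical in-memory stand-in; a real server would persist jobs in D1 or a Durable Object and run the slow work from a queue consumer.

```typescript
type Job =
  | { status: 'pending' }
  | { status: 'done'; result: string }
  | { status: 'failed'; error: string };

class JobStore {
  private readonly jobs = new Map<string, Job>();
  private nextId = 1;

  // Backs a hypothetical "start" tool: returns a job ID immediately.
  start(): string {
    const id = `job-${this.nextId++}`;
    this.jobs.set(id, { status: 'pending' });
    return id;
  }

  // Called by the background worker when the slow work finishes.
  complete(id: string, result: string): void {
    if (!this.jobs.has(id)) throw new Error(`unknown job: ${id}`);
    this.jobs.set(id, { status: 'done', result });
  }

  // Called by the background worker when the slow work fails.
  fail(id: string, error: string): void {
    if (!this.jobs.has(id)) throw new Error(`unknown job: ${id}`);
    this.jobs.set(id, { status: 'failed', error });
  }

  // Backs the getJobStatus tool.
  getJobStatus(id: string): Job | { status: 'unknown' } {
    return this.jobs.get(id) ?? { status: 'unknown' };
  }
}
```

Both tool calls return in milliseconds, so the client never waits on the slow work itself.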
Output that overflows the context window. A tool that dumps a 50,000-token JSON response. The client truncates it, or the model loses track of the details and hallucinates. Tools should return summaries by default, with explicit pagination tools for full data.
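One way to keep outputs bounded is cursor-based pagination, sketched below. `paginate` is a hypothetical helper, the default page size of 20 is arbitrary, and the cursor is simply the next offset encoded as a string; a real server might prefer an opaque or signed cursor.

```typescript
interface Page<T> {
  items: T[];
  total: number; // lets the model see how much data exists without fetching it
  nextCursor: string | null; // null when there is nothing left
}

// Return one bounded page of `rows`, starting at the offset encoded in `cursor`.
function paginate<T>(rows: T[], cursor: string | null, pageSize = 20): Page<T> {
  const start = cursor === null ? 0 : Number(cursor);
  if (!Number.isInteger(start) || start < 0) {
    throw new Error(`invalid cursor: ${cursor}`);
  }
  const end = Math.min(start + pageSize, rows.length);
  return {
    items: rows.slice(start, end),
    total: rows.length,
    nextCursor: end < rows.length ? String(end) : null,
  };
}
```

The first call returns a page plus `total`, which is often enough for the model to answer; it asks for the next page only when it actually needs the rest.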
Bottom line
MCP is the right answer when you have multiple AI clients, when capabilities outlive any specific agent, when you’re publishing tools to external developers. It’s the wrong answer when there’s one consumer, when state is implicit, or when you’re really just wrapping a CRUD API.
When it is the right answer, the right transport is HTTP/SSE on Cloudflare Workers. The right state model is stateless with externalised persistence. The right auth is OAuth 2.1 — or bearer token if the audience is trusted. The right shape is 5–10 high-level tools, not 30 thin wrappers.
The template, deploy-ready
The reference server above is published as an open-source Cloudflare Workers template:
github.com/setkernel/cf-mcp-template: clone, set a bearer-token secret, run npm run deploy, and you have a working MCP server with one sample tool. Drop-in connection instructions for Claude Desktop and Cursor included. MIT licensed.
The repo includes JSON-RPC 2.0 dispatch, structured error handling with proper RPC codes, a landing page for visitors who hit the URL directly, and a production-hardening checklist in the README (rate limiting, caching, output-size guards).
If you’re considering an MCP server and want a written second opinion before you build, we do MCP architecture reviews and reference implementations, scoped per project, in writing.