platform engineering · advanced

MCP in production: transports, server economics, and security patterns

A practical decision guide for deploying MCP servers covering transport selection, who pays for the LLM calls, filesystem security, and the mental model behind the three primitives.

claude · llm · mcp · production · protocol-design

A developer deploying an MCP server faces three decisions that most tutorials don’t cover: which transport to use, who pays for the LLM calls, and how filesystem access gets secured. This guide walks through each one.

The three-primitive mental model

Before diving into transport decisions, it helps to understand MCP’s three primitives through the lens of who controls invocation:

MCP primitives by controller

| Primitive | Controlled by | Use case |
| --- | --- | --- |
| Tools | Model (Claude) | Give Claude new autonomous capabilities |
| Resources | Application code | Feed data into prompts or UI elements |
| Prompts | User | User-triggered, pre-built workflow templates |

This matters because it determines where logic lives. Tools need robust schemas (the model decides when to invoke them). Resources benefit from caching (the app controls fetch timing). Prompts are UX surfaces (users choose when to apply them).
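
A minimal FastMCP sketch makes the ownership split concrete. The server name, resource URI, and function bodies here are illustrative, not taken from a real deployment:

primitive-ownership.py python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("docs-server")  # illustrative server name

# Tool: the model decides when to invoke it, so the signature and docstring are the schema
@mcp.tool()
def search_docs(query: str) -> str:
    """Search the documentation index for a query."""
    return f"Results for {query}"

# Resource: the application decides when to fetch it, so it can be cached on the app side
@mcp.resource("docs://changelog")
def changelog() -> str:
    return "v1.2.0: added streamable HTTP examples"

# Prompt: the user selects it from the client UI as a pre-built workflow template
@mcp.prompt()
def summarize_release() -> str:
    return "Summarize the latest changelog entry for a non-technical audience."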

The transport decision matrix

MCP supports three transports. Each trades off bidirectional communication against deployment simplicity.

MCP transport comparison

| Transport | Bidirectional | Deployment | Best for |
| --- | --- | --- | --- |
| STDIO | Full (client ↔ server) | Same machine only | Development, local tools |
| StreamableHTTP (stateful) | Full (via SSE workaround) | Remote, single instance | Remote servers needing full features |
| StreamableHTTP (stateless) | Client → server only | Remote, horizontally scaled | High-scale deployments that can sacrifice sampling/progress |

STDIO: the baseline

The client launches the server as a subprocess and communicates over stdin/stdout. After a three-message handshake (Initialize Request, Initialize Result, Initialized Notification), both sides can freely send requests and responses. This is the ideal transport — full bidirectional communication with no HTTP workarounds — but it only works when client and server run on the same machine.
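
As a rough sketch of that flow with the Python SDK's client helpers (the server command and script name below are placeholders):

stdio-client.py python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # The client launches the server as a subprocess; stdin/stdout carry JSON-RPC both ways
    params = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize Request -> Initialize Result -> Initialized Notification
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())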

Stateful StreamableHTTP: the SSE workaround

HTTP is designed for client-to-server requests, but MCP needs the server to initiate communication too (for sampling, logging, and progress notifications). StreamableHTTP solves this with Server-Sent Events over a long-lived GET connection.

The stateful transport maintains two SSE connections:

  • Primary SSE connection: Stays open indefinitely for server-initiated requests
  • Tool-specific SSE connection: Created per tool call for progress and results, closes automatically

A session ID, returned in the Mcp-Session-Id header alongside the Initialize Result, ties the two together and must be included in every subsequent request.
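
On the server side, this stateful behavior is the default when a FastMCP server runs over the streamable HTTP transport. A sketch, with an illustrative server name:

streamable-http-server.py python
from mcp.server.fastmcp import FastMCP

# Stateful by default: the server keeps per-session state and streams
# server-initiated messages (sampling, progress, logging) back over SSE.
mcp = FastMCP("docs-server")

if __name__ == "__main__":
    mcp.run(transport="streamable-http")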

Stateless StreamableHTTP: when scaling forces tradeoffs

When you put the server behind a load balancer, all requests must route to the same instance (session affinity) because the SSE connections are tied to a specific server. Setting stateless_http=True eliminates session affinity requirements — but disables sampling, progress notifications, logging, and subscriptions.
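
A sketch of the stateless configuration, using the FastMCP constructor flag named above:

stateless-server.py python
from mcp.server.fastmcp import FastMCP

# Any instance behind the load balancer can answer any request,
# at the cost of sampling, progress notifications, logging, and subscriptions.
mcp = FastMCP("docs-server", stateless_http=True)

if __name__ == "__main__":
    mcp.run(transport="streamable-http")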

Sampling economics: who pays for the LLM call?

Sampling is the mechanism that lets an MCP server request an LLM call through the client, rather than holding its own API key.

On the server side:

server-sampling.py python
from mcp.types import SamplingMessage, TextContent

# Ask the connected client to run the LLM call; the server never holds an API key
result = await ctx.session.create_message(
    messages=[SamplingMessage(role="user",
        content=TextContent(type="text", text="Summarize this document"))],
    max_tokens=500,
)

On the client side, a sampling callback handles the actual API call:

client-sampling-callback.py python
from mcp.types import CreateMessageResult, TextContent

async def sampling_callback(context, params):
    # The client holds the API key, so the client (not the server) pays for these tokens
    response = client.messages.create(
        model="claude-sonnet-4-6",
        messages=[{"role": m.role, "content": m.content.text} for m in params.messages],
        max_tokens=params.max_tokens,
    )
    # Wrap the completion in the result type the MCP session expects
    return CreateMessageResult(role="assistant", model=response.model,
        content=TextContent(type="text", text=response.content[0].text))

This pattern exists for a concrete economic reason: public MCP servers should not foot the bill for every user’s LLM calls. The server delegates token costs to the client, where each user or organization manages their own API budget. For internal tools, this keeps cost attribution clean; for public servers, it makes the economics sustainable.

Roots: filesystem security as a UX pattern

Roots let an MCP server access specific directories on the user’s machine. The key insight: roots are a UX concept, not just a permission mechanism.

A user can say “convert biking.mp4” without providing a full path. Claude uses list_roots to discover allowed directories, then uses file-reading tools to locate the file within those boundaries. This bridges the gap between how users describe files and where files actually live.
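
The protocol does not enforce those boundaries for you, though; the server still needs an application-level path guard (the takeaways below return to this). A sketch, assuming the SDK's ctx.session.list_roots() request and file:// root URIs; the server and tool names are illustrative:

roots-path-guard.py python
from pathlib import Path
from mcp.server.fastmcp import FastMCP, Context

mcp = FastMCP("file-server")  # illustrative server name

@mcp.tool()
async def read_file(path: str, ctx: Context) -> str:
    """Read a file, but only if it lives inside a client-granted root."""
    # Ask the client which directories it has granted (a roots/list request)
    roots = await ctx.session.list_roots()
    allowed = [Path(str(root.uri).removeprefix("file://")) for root in roots.roots]

    resolved = Path(path).resolve()
    # Application-level guard: roots are advisory, so the server enforces them here
    if not any(resolved.is_relative_to(base) for base in allowed):
        raise ValueError(f"{path} is outside the granted roots")
    return resolved.read_text()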

Notifications: real-time feedback for long-running tools

The Context object in the Python SDK provides two feedback channels:

progress-notifications.py python
@mcp.tool()
async def process_large_dataset(file_path: str, ctx: Context):
    await ctx.info("Starting analysis...")         # log message: routed to the session-level logging handler
    for i, chunk in enumerate(chunks):             # chunks: the dataset split into work units
        await ctx.report_progress(i, len(chunks))  # progress: routed to the per-call progress handler
        # process chunk
    await ctx.info("Analysis complete")

On the client side, logging and progress use separate callbacks:

  • Logging handler: Set once at the session level
  • Progress handler: Set per individual tool call

This granularity matters for production UIs — you might want global logging but per-request progress bars.
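
A sketch of how the two callbacks attach on the client, assuming a recent Python SDK where ClientSession accepts a logging_callback and call_tool accepts a progress_callback; the streams, tool name, and file path are placeholders:

client-feedback-handlers.py python
from mcp import ClientSession

async def on_log(params):
    # Session-level handler: receives every log message the server emits
    print(f"[{params.level}] {params.data}")

async def on_progress(progress: float, total: float | None, message: str | None):
    # Per-call handler: drives a progress bar for one specific tool invocation
    print(f"progress: {progress}/{total}")

async def call_with_feedback(read, write):
    # read/write come from whichever transport the client established
    async with ClientSession(read, write, logging_callback=on_log) as session:
        await session.initialize()
        await session.call_tool(
            "process_large_dataset",
            {"file_path": "data.csv"},
            progress_callback=on_progress,
        )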

Takeaways

Transport choice sets the feature envelope

STDIO fits local development, stateful HTTP preserves full remote features, and stateless HTTP trades features for horizontal scale.

Sampling moves cost to the client

Public MCP servers should request model calls through the client so each user or organization pays for its own tokens.

Roots need explicit enforcement

Roots improve discoverability and permissions UX, but every file operation still needs an application-level path guard.

Primitive ownership matters

Tools are model-controlled, resources are app-controlled, and prompts are user-controlled, so each primitive carries different design obligations.