What is MCP?

The Model Context Protocol (MCP) is Anthropic's open specification for connecting AI models to external data sources, tools, and services. If you've used Claude Desktop with file access or GitHub integration, you've used MCP — it's the underlying protocol that lets models interact with real-world systems beyond their training data.

MCP has become the de facto standard for AI tool integration in 2026, with adoption accelerating across the industry. Major players including Block, Cedar, and GitHub have built MCP servers for their platforms. The result: production AI systems are increasingly built on MCP connections, and when those connections fail, your AI assistant goes dark.

Monitoring MCP is now a critical infrastructure concern.

This guide covers what MCP monitoring looks like in production, what metrics matter, and how to build observability into your MCP-powered stack.


Why MCP Changes Observability Requirements

Traditional AI applications are self-contained. You send a prompt, the model responds, you log the interaction. Simple.

MCP breaks that model. When an LLM uses an MCP server to fetch real-time data, execute tools, or query databases, the response latency and reliability depend on external systems you don't directly control.

Consider what can go wrong:

  • MCP server goes down → The model loses access to tools and starts failing or returning degraded answers
  • Server response latency spikes → Every user query becomes slow, and you won't know until complaints roll in
  • Rate limit exceeded → Your AI assistant silently starts refusing requests
  • Server returns malformed data → The model receives corrupted context and produces wrong outputs
  • Network partition → The model appears to hang indefinitely

Without monitoring, you have no visibility into any of this. Your users experience a "broken AI" without you knowing why.

The MCP Architecture

A typical MCP setup has three components:

  1. MCP Host — The AI application (Claude Desktop, an AI agent framework, your custom app)
  2. MCP Client — The client library that manages connections to servers
  3. MCP Server — The server exposing tools/resources via the MCP specification

When a model calls a tool (like github.create_issue or filesystem.read_file), the request flows through this chain. Monitoring must cover each hop.
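
To make the chain concrete, here is a minimal sketch of what a tool call looks like as it leaves the client. MCP messages are JSON-RPC 2.0 and a tool invocation uses the tools/call method. The helper names build_tool_call and metric_labels, the server name, and the arguments are illustrative assumptions, not part of any SDK.

```python
import json

def build_tool_call(request_id, tool_name, arguments):
    """Build the JSON-RPC 2.0 envelope an MCP client sends for a tool call."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

def metric_labels(message, server_name):
    """Derive metric labels from a request (hypothetical helper)."""
    params = message.get("params", {})
    return {
        "server": server_name,
        "method": message.get("method", "unknown"),
        "tool": params.get("name", "n/a"),
    }

request = build_tool_call(1, "read_file", {"path": "README.md"})
labels = metric_labels(request, server_name="filesystem")
print(json.dumps(request))
print(labels)
```

Labelling every metric and span with the same server and tool names at each hop is what lets you later tell a client-side failure from a server-side one.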


Core Metrics for MCP Observability

1. Server Availability and Uptime

Track whether your MCP servers are reachable. This sounds obvious, but MCP servers are often small, stateless services that can crash or become unreachable without anyone being alerted.

What to monitor:

  • HTTP health endpoint (/health or similar), for servers exposed over HTTP
  • Connection success rate from MCP client
  • Server process health (if running as a local server)

Alerting threshold: If a server is unreachable for > 30 seconds, alert on-call.
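
One way to turn that threshold into logic is a small tracker that remembers when the current outage started. This is a sketch under assumptions: AvailabilityTracker is a hypothetical name, the probe itself (HTTP check, process check) is left to you, and timestamps are passed in so the logic is easy to test. In production you would likely feed it time.monotonic().

```python
class AvailabilityTracker:
    """Tracks consecutive downtime for one server and decides when to alert."""

    def __init__(self, alert_after_seconds=30.0):
        self.alert_after_seconds = alert_after_seconds
        self.down_since = None  # timestamp of the first failed probe in a streak

    def record(self, reachable, now):
        """Record one probe result; return True if on-call should be alerted."""
        if reachable:
            self.down_since = None  # any success ends the outage
            return False
        if self.down_since is None:
            self.down_since = now
        return (now - self.down_since) > self.alert_after_seconds

tracker = AvailabilityTracker(alert_after_seconds=30.0)
for t, reachable in [(0, False), (20, False), (40, False), (50, True)]:
    print(t, tracker.record(reachable, now=t))
```

Only the probe at t=40 triggers an alert here, because the server has then been down for more than 30 seconds without a successful probe in between.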

2. Request Volume and Error Rates

Count MCP calls and categorize outcomes:

Metric                Description
mcp.requests.total    Total MCP requests (by server, by tool)
mcp.errors.total      Failed requests (4xx, 5xx, timeouts)
mcp.error_rate        errors / requests ratio
mcp.tool_usage        Per-tool call frequency

High error rates on specific tools indicate problems — either the tool is broken or the upstream service (e.g., GitHub API) is having issues.
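
As a rough illustration of how the ratio is derived, the sketch below keeps two in-memory counters keyed by server and tool. The function names and the sample server and tool are assumptions; a real deployment would normally let a metrics library hold the counters and compute the ratio in the query layer.

```python
from collections import Counter

requests_total = Counter()  # (server, tool) -> number of calls
errors_total = Counter()    # (server, tool) -> number of failed calls

def record_call(server, tool, ok):
    """Count one call and, if it failed, one error."""
    key = (server, tool)
    requests_total[key] += 1
    if not ok:
        errors_total[key] += 1

def error_rate(server, tool):
    """errors / requests for one tool; 0.0 when the tool has not been called."""
    key = (server, tool)
    total = requests_total[key]
    return errors_total[key] / total if total else 0.0

for ok in [True, True, False, True]:
    record_call("github", "create_issue", ok)

print(error_rate("github", "create_issue"))  # 0.25
```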

3. Response Latency

MCP tool calls add latency to every AI response. If a tool call takes 5 seconds, the user's AI response is delayed by at least 5 seconds.

What to monitor:

  • mcp.latency.p50, p95, p99 by server and tool
  • Slow tool calls (> 2s) flagged separately
  • Latency breakdown: network vs server processing

Target: p99 MCP tool call latency < 1 second. Anything above 3s is a user experience problem.
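
If you want to sanity-check what your backend reports, percentiles are easy to compute by hand. The sketch below uses the nearest-rank definition (backends may interpolate and give slightly different numbers) and counts slow calls with the 2-second threshold mentioned above. The function names and the sample data are made up for illustration.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the value at rank ceil(pct% of N) in sorted order."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def summarize(latencies_s, slow_threshold_s=2.0):
    """Summarize tool-call latencies given in seconds."""
    return {
        "p50": percentile(latencies_s, 50),
        "p95": percentile(latencies_s, 95),
        "p99": percentile(latencies_s, 99),
        "slow_calls": sum(1 for v in latencies_s if v > slow_threshold_s),
    }

# 98 fast calls and two slow ones
samples = [0.1] * 98 + [2.5, 4.0]
print(summarize(samples))
```

Note how p50 and p95 look perfectly healthy in this sample while p99 exposes the slow tail, which is why the tail percentiles are the ones worth alerting on.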

4. Tool Availability

When an MCP server is up but certain tools are throttled or returning errors, that's different from server-down. You need tool-level availability tracking:

  • Which tools are throwing errors?
  • Which tools are returning rate limit errors (429)?
  • Are there tools that have been completely removed from a server?

5. Context Size and Token Usage

MCP resources add context to LLM requests. A single file read might add 10,000 tokens to your context. Monitor:

  • Average context size per MCP request
  • Token cost attribution by MCP server and tool
  • Context size outliers (> 50k tokens per request)

Large context = higher LLM costs. Know what's driving your token bills.
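
A sketch of per-tool token attribution might look like the following. The 4-characters-per-token ratio is only a rough heuristic and an assumption here; use your model's actual tokenizer if the numbers feed into billing. ContextUsage and its fields are hypothetical names.

```python
from collections import defaultdict

CHARS_PER_TOKEN = 4  # rough heuristic (assumption), not a real tokenizer

def estimate_tokens(text):
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN

class ContextUsage:
    """Attributes estimated tokens to (server, tool) and records outliers."""

    def __init__(self, outlier_threshold=50_000):
        self.outlier_threshold = outlier_threshold
        self.tokens_by_source = defaultdict(int)
        self.outliers = []

    def record(self, server, tool, payload_text):
        tokens = estimate_tokens(payload_text)
        self.tokens_by_source[(server, tool)] += tokens
        if tokens > self.outlier_threshold:
            self.outliers.append((server, tool, tokens))
        return tokens

usage = ContextUsage()
usage.record("filesystem", "read_file", "x" * 40_000)   # about 10k tokens
usage.record("filesystem", "read_file", "x" * 240_000)  # about 60k tokens, an outlier
print(dict(usage.tokens_by_source))
print(usage.outliers)
```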


Implementing MCP Observability

Option 1: Custom Instrumentation with OpenTelemetry

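The cleanest approach is to wrap your MCP client's tool calls in OpenTelemetry spans. Some MCP SDKs and frameworks ship their own instrumentation hooks, so check yours first; if it has none, a thin wrapper like the following is enough.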

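import time

from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

# Uses whatever TracerProvider your application configured at startup
tracer = trace.get_tracer(__name__)

class ObservedMCPClient:
    """Wraps an MCP client so that every tool call produces a span.

    `client` is any object with an async call_tool(tool_name, params)
    method; adapt this to the call signature of the SDK you use.
    """

    def __init__(self, client, server_url):
        self.client = client
        self.server_url = server_url  # kept so spans can be tagged with it

    async def call_tool(self, tool_name, params):
        with tracer.start_as_current_span(f"mcp.{tool_name}") as span:
            span.set_attribute("mcp.server", self.server_url)
            span.set_attribute("mcp.tool", tool_name)

            start = time.perf_counter()
            try:
                result = await self.client.call_tool(tool_name, params)
                span.set_attribute("mcp.success", True)
                return result
            except Exception as e:
                span.set_attribute("mcp.success", False)
                span.set_attribute("mcp.error", str(e))
                span.set_status(Status(StatusCode.ERROR, str(e)))
                raise
            finally:
                # Runs on success and failure, so every span gets a duration
                duration = time.perf_counter() - start
                span.set_attribute("mcp.duration_ms", duration * 1000)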

Send spans to your observability backend (Grafana, Datadog, Honeycomb).

Option 2: MCP Server Middleware

If you control the MCP server, add middleware that emits metrics for every request:

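# Example MCP server middleware (Python)
# Assumes a framework whose handlers take (request, ctx), with request.tool
# and ctx.server_name available; adapt the names to your server framework.
import time
from functools import wraps

from prometheus_client import Counter, Histogram

METRICS = {
    "requests_total": Counter(
        "mcp_requests_total", "MCP requests handled",
        ["server", "tool", "status"],
    ),
    "request_duration": Histogram(
        "mcp_request_duration_seconds", "MCP request duration in seconds",
        ["server", "tool"],
    ),
}

def metrics_middleware(next_handler):
    @wraps(next_handler)
    async def handler(request, ctx):
        start = time.perf_counter()
        try:
            result = await next_handler(request, ctx)
            METRICS["requests_total"].labels(
                server=ctx.server_name,
                tool=request.tool,
                status="success"
            ).inc()
            return result
        except Exception:
            METRICS["requests_total"].labels(
                server=ctx.server_name,
                tool=request.tool,
                status="error"
            ).inc()
            raise
        finally:
            # Observe duration for both successful and failed requests
            METRICS["request_duration"].labels(
                server=ctx.server_name,
                tool=request.tool
            ).observe(time.perf_counter() - start)
    return handler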

Option 3: Use an MCP-Powered Observability Platform

Several platforms have already built MCP servers for their own tools — meaning you can use an LLM to query your observability data via MCP. This is meta but useful:

  • Grafana MCP Server — Query dashboards and alerts via AI
  • PagerDuty MCP Server — Manage incidents via AI
  • Datadog MCP Server — Query metrics and logs conversationally

If you're already using these platforms, their MCP servers let you build AI-powered incident investigation workflows — with the MCP server itself being something to monitor.


Common MCP Failure Patterns and How to Detect Them

Pattern 1: Cascade Failure from Upstream API

An MCP server wraps a third-party API (GitHub, Slack, a database). When that API goes down or rate-limits, the MCP server starts returning errors. The LLM keeps calling it, generating error logs but no useful output.

Detection:

  • mcp.errors spikes with rate_limit or upstream_unavailable labels
  • Server is up (HTTP 200) but tools return 429 or 503

Response:

  • Alert on rate limit errors specifically
  • Consider circuit-breaking the MCP server when upstream is degraded
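
A circuit breaker for this case can be quite small. The sketch below is one possible shape, not a reference implementation: the class name, thresholds, and the decision to pass timestamps in explicitly are all assumptions made to keep it testable.

```python
class CircuitBreaker:
    """Stops calling a failing MCP server until a cool-down has passed."""

    def __init__(self, failure_threshold=5, reset_after_seconds=30.0):
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self.consecutive_failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self, now):
        """Return True if a call may proceed."""
        if self.opened_at is None:
            return True
        # After the cool-down, let trial calls through (half-open state).
        return (now - self.opened_at) >= self.reset_after_seconds

    def record_success(self):
        self.consecutive_failures = 0
        self.opened_at = None

    def record_failure(self, now):
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.opened_at = now  # open, or re-open after a failed trial call

breaker = CircuitBreaker(failure_threshold=3, reset_after_seconds=30.0)
for t in (0, 1, 2):
    breaker.record_failure(now=t)
print(breaker.allow(now=10))  # False: circuit is open
print(breaker.allow(now=32))  # True: cool-down elapsed, trial call allowed
```

While the circuit is open, the client can return an immediate "tool temporarily unavailable" result to the model instead of hammering the degraded upstream.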

Pattern 2: Stale Context from Cached Resources

MCP resources like file systems or database queries can return stale data. If a resource is cached and the underlying data changes, the model operates on outdated information.

Detection:

  • Monitor the mcp.resource.age metric — how old is the cached resource data?
  • Log resource refresh events and compare with data change events in the upstream system

Response:

  • Implement TTL-based cache invalidation on MCP servers
  • Alert when resource.age exceeds threshold for critical resources
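
As an illustration, a TTL cache that also exposes resource age could be sketched like this. TTLResourceCache, the URI, and the 60-second TTL are hypothetical; timestamps are passed in so the behaviour is deterministic.

```python
class TTLResourceCache:
    """Caches resource contents and reports how old each entry is."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # uri -> (value, fetched_at)

    def put(self, uri, value, now):
        self._entries[uri] = (value, now)

    def age(self, uri, now):
        """Seconds since the resource was fetched; None if it is not cached."""
        entry = self._entries.get(uri)
        return None if entry is None else now - entry[1]

    def get(self, uri, now):
        """Return the cached value, or None if missing or older than the TTL."""
        entry = self._entries.get(uri)
        if entry is None:
            return None
        value, fetched_at = entry
        if now - fetched_at > self.ttl_seconds:
            del self._entries[uri]  # expired: caller must refetch
            return None
        return value

cache = TTLResourceCache(ttl_seconds=60)
cache.put("file:///config.yaml", "v1", now=0)
print(cache.age("file:///config.yaml", now=45))  # 45
print(cache.get("file:///config.yaml", now=45))  # v1
print(cache.get("file:///config.yaml", now=90))  # None
```

The value returned by age is what you would export as the resource age metric before serving a cached entry.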

Pattern 3: Tool Call Storms

A model with retry logic might repeatedly call a failing tool, generating a traffic spike on the MCP server and the upstream API.

Detection:

  • mcp.tool_calls_per_minute spikes unusually
  • Same tool called > 10 times in a 60-second window

Response:

  • Implement exponential backoff in the MCP client
  • Add per-tool rate limiting with feedback to the model ("rate limited, retry after 30s")
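
Both the detection rule and the backoff response fit in a few lines. This is a sketch with assumed names and defaults (StormDetector, backoff_delays, a 30-second cap); production clients usually add random jitter to the delays as well.

```python
from collections import deque

def backoff_delays(base_seconds=1.0, factor=2.0, max_seconds=30.0, attempts=5):
    """Delays before each retry: base, base*factor, ... capped at max_seconds."""
    return [min(max_seconds, base_seconds * factor ** i) for i in range(attempts)]

class StormDetector:
    """Flags a tool that is called more than max_calls times within the window."""

    def __init__(self, max_calls=10, window_seconds=60.0):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.calls = {}  # tool -> deque of call timestamps

    def record(self, tool, now):
        """Record a call; return True if the tool is over the limit."""
        stamps = self.calls.setdefault(tool, deque())
        stamps.append(now)
        # Drop timestamps that have fallen out of the sliding window
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        return len(stamps) > self.max_calls

print(backoff_delays(attempts=6))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]

detector = StormDetector()
flags = [detector.record("create_issue", now=t) for t in range(11)]
print(flags[-1])  # True: 11th call within 60 seconds
```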

Pattern 4: Latency Regression After Server Update

MCP servers are often independently deployed services. An update might introduce a regression that slows down all tool calls by 2-3x.

Detection:

  • Track mcp.latency.p95 over time per server version
  • Alert when p95 latency increases > 50% week-over-week

Response:

  • Tag MCP server deployments with version numbers in metrics
  • Use canary deployments for MCP servers
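
The week-over-week check itself is a one-line comparison once p95 is tracked per version. The function name, version numbers, and latency values below are made up for illustration.

```python
def latency_regressed(baseline_p95, current_p95, max_increase=0.5):
    """True if current p95 exceeds the baseline by more than max_increase (0.5 = 50%)."""
    if baseline_p95 <= 0:
        return False  # no usable baseline to compare against
    return (current_p95 - baseline_p95) / baseline_p95 > max_increase

# Hypothetical p95 latencies in seconds, keyed by server version
p95_by_version = {"1.4.0": 0.40, "1.5.0": 0.70}

print(latency_regressed(p95_by_version["1.4.0"], p95_by_version["1.5.0"]))  # True
```

Comparing by version rather than only by calendar week makes it clear whether a slowdown arrived with a specific deployment.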

Building an MCP Monitoring Dashboard

Here's the minimum viable dashboard for MCP observability:

Row 1 — Overview

  • Total MCP requests/minute
  • Error rate (% of requests that failed)
  • p99 latency

Row 2 — Per-Server Metrics

  • Requests per server (bar chart)
  • Error rate per server
  • Latency per server

Row 3 — Tool-Level Breakdown

  • Top 10 most-used MCP tools
  • Tools with highest error rates
  • Slowest tools (p99 > 2s)

Row 4 — Context Usage

  • Average tokens per MCP request
  • Servers/tools driving highest token usage

Alerting Rules for MCP

Alert                Condition                                Severity
MCP server down      Server unreachable for > 30s             P1
High error rate      mcp.error_rate > 5% over 5 min           P2
Slow response        mcp.latency.p99 > 5s over 10 min         P2
Rate limit hit       Any 429 response from MCP server         P2
Tool unavailable     Tool returns error for > 10% of calls    P2
Token usage spike    mcp.tokens_per_hour > 2x baseline        P3
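
If your alerting tool supports rules as code, a few of these conditions can be expressed as data. The sketch below covers three of the rules and assumes the time windows (over 5 min, over 10 min) have already been applied when the snapshot was aggregated. The snapshot keys are hypothetical names.

```python
RULES = [
    # (alert name, severity, predicate over a metrics snapshot)
    ("High error rate", "P2", lambda m: m["error_rate"] > 0.05),
    ("Slow response", "P2", lambda m: m["latency_p99_s"] > 5.0),
    ("Token usage spike", "P3",
     lambda m: m["tokens_per_hour"] > 2 * m["tokens_baseline"]),
]

def evaluate(snapshot):
    """Return (name, severity) for every rule whose condition holds."""
    return [(name, severity) for name, severity, check in RULES if check(snapshot)]

snapshot = {
    "error_rate": 0.08,          # 8% of requests failed
    "latency_p99_s": 1.2,
    "tokens_per_hour": 900_000,
    "tokens_baseline": 400_000,
}
print(evaluate(snapshot))
```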

The MCP Observability Stack

Most teams building serious MCP infrastructure use:

  • Prometheus + Grafana — Open-source metrics and dashboards
  • OpenTelemetry — Trace instrumentation for MCP calls
  • Grafana Tempo — Distributed tracing for cross-service requests
  • PagerDuty — Alert routing for MCP-related incidents

If you're using Datadog, check its current documentation for MCP-specific integrations before relying on automatic discovery. Failing that, the OpenTelemetry and Prometheus instrumentation described earlier can feed Datadog like any other backend.


Conclusion

MCP is moving from novelty to production infrastructure in 2026. As AI systems become more deeply integrated with external tools and data sources, the reliability of those integrations becomes critical.

The monitoring patterns are straightforward — availability, latency, error rates, and context size — but most teams haven't implemented them yet. Building MCP observability now means you're ahead of the curve when MCP becomes as standard as REST APIs in production AI systems.


