What is MCP?
The Model Context Protocol (MCP) is Anthropic's open specification for connecting AI models to external data sources, tools, and services. If you've used Claude Desktop with file access or GitHub integration, you've used MCP — it's the underlying protocol that lets models interact with real-world systems beyond their training data.
MCP has become the de facto standard for AI tool integration in 2026, with adoption accelerating across the industry. Major players including Block, Cedar, and GitHub have built MCP servers for their platforms. The result: production AI systems are increasingly built on MCP connections, and when those connections fail, your AI assistant goes dark.
Monitoring MCP is now a critical infrastructure concern.
This guide covers what MCP monitoring looks like in production, what metrics matter, and how to build observability into your MCP-powered stack.
Why MCP Changes Observability Requirements
Traditional AI applications are self-contained. You send a prompt, the model responds, you log the interaction. Simple.
MCP breaks that model. When an LLM uses an MCP server to fetch real-time data, execute tools, or query databases, the response latency and reliability depend on external systems you don't directly control.
Consider what can go wrong:
- MCP server goes down → The model loses access to tools and starts failing or returning degraded answers
- Server response latency spikes → Every user query becomes slow, and you won't know until complaints roll in
- Rate limit exceeded → Your AI assistant silently starts refusing requests
- Server returns malformed data → The model receives corrupted context and produces wrong outputs
- Network partition → The model appears to hang indefinitely
Without monitoring, you have no visibility into any of this. Your users experience a "broken AI" without you knowing why.
The MCP Architecture
A typical MCP setup has three components:
- MCP Host — The AI application (Claude Desktop, an AI agent framework, your custom app)
- MCP Client — The client library that manages connections to servers
- MCP Server — The server exposing tools/resources via the MCP specification
When a model calls a tool (like github.create_issue or filesystem.read_file), the request flows through this chain. Monitoring must cover each hop.
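To make the chain concrete: MCP messages are JSON-RPC 2.0, and a tool invocation travels as a `tools/call` request. The sketch below builds such a message. The helper name `build_tool_call` and the example tool arguments are illustrative assumptions, not part of any SDK.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build the JSON-RPC 2.0 request an MCP client sends to invoke a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Hypothetical tool name and arguments, for illustration only.
message = build_tool_call(1, "read_file", {"path": "/tmp/example.txt"})
print(json.dumps(message, indent=2))
```

Logging the `id`, `method`, and tool name at each hop is one way to correlate a single call across host, client, and server.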
Core Metrics for MCP Observability
1. Server Availability and Uptime
Track whether your MCP servers are reachable. This sounds obvious, but MCP servers are often stateless services that can crash or become unreachable without alerting anyone.
What to monitor:
- HTTP health endpoint (/health or similar)
- Connection success rate from MCP client
- Server process health (if running as a local server)
Alerting threshold: If a server is unreachable for > 30 seconds, alert on-call.
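The 30-second rule can be sketched as a small tracker that is fed the result of each probe. This is a minimal illustration: the class name `AvailabilityTracker` is hypothetical, and the probe itself (for example an HTTP GET against the health endpoint) is assumed to run elsewhere.

```python
import time
from typing import Optional


class AvailabilityTracker:
    """Tracks how long a server has been continuously unreachable."""

    def __init__(self, alert_after_s: float = 30.0):
        self.alert_after_s = alert_after_s
        self.down_since: Optional[float] = None

    def record(self, reachable: bool, now: Optional[float] = None) -> bool:
        """Record one probe result; return True when an alert should fire."""
        now = time.monotonic() if now is None else now
        if reachable:
            self.down_since = None  # any success resets the outage window
            return False
        if self.down_since is None:
            self.down_since = now  # first failed probe of this outage
        return (now - self.down_since) > self.alert_after_s


tracker = AvailabilityTracker()
# Four failed probes at t = 0, 10, 20 and 31 seconds.
results = [tracker.record(False, now=t) for t in (0, 10, 20, 31)]
print(results)  # [False, False, False, True]
```

Requiring a continuous outage, rather than alerting on a single failed probe, keeps one dropped packet from paging anyone.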
2. Request Volume and Error Rates
Count MCP calls and categorize outcomes:
| Metric | Description |
|---|---|
| mcp.requests.total | Total MCP requests (by server, by tool) |
| mcp.errors.total | Failed requests (4xx, 5xx, timeouts) |
| mcp.error_rate | errors / requests ratio |
| mcp.tool_usage | Per-tool call frequency |
High error rates on specific tools indicate problems — either the tool is broken or the upstream service (e.g., GitHub API) is having issues.
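As a rough sketch of how these counters relate, the in-memory class below keeps request and error counts per (server, tool) pair and derives the error rate from them. The class name `RequestStats` is an assumption for illustration; in production a metrics library would hold these counters.

```python
from collections import Counter


class RequestStats:
    """In-memory request and error counters keyed by (server, tool)."""

    def __init__(self):
        self.requests = Counter()
        self.errors = Counter()

    def record(self, server: str, tool: str, ok: bool) -> None:
        key = (server, tool)
        self.requests[key] += 1
        if not ok:
            self.errors[key] += 1

    def error_rate(self, server: str, tool: str) -> float:
        """errors / requests for one tool; 0.0 when there were no requests."""
        key = (server, tool)
        total = self.requests[key]
        return self.errors[key] / total if total else 0.0


stats = RequestStats()
for ok in (True, True, True, False):
    stats.record("github", "create_issue", ok)
print(stats.error_rate("github", "create_issue"))  # 0.25
```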
3. Response Latency
MCP tool calls add latency to every AI response. If a tool call takes 5 seconds, the user's AI response is delayed by at least 5 seconds.
What to monitor:
- mcp.latency.p50, p95, p99 by server and tool
- Slow tool calls (> 2s) flagged separately
- Latency breakdown: network vs server processing
Target: p99 MCP tool call latency < 1 second. Anything above 3s is a user experience problem.
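For intuition about what the percentile metrics report, here is a small nearest-rank calculation over raw latency samples. The sample values are made up, and a real deployment would normally rely on histogram metrics in the monitoring backend rather than computing percentiles by hand.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical tool-call latencies in milliseconds.
latencies_ms = [120, 95, 210, 3400, 180, 150, 99, 130, 160, 175]
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
slow_calls = [ms for ms in latencies_ms if ms > 2000]  # the "> 2s" bucket

print(summary)     # {'p50': 150, 'p95': 3400, 'p99': 3400}
print(slow_calls)  # [3400]
```

Note how a single slow call dominates p95 and p99 while leaving p50 untouched, which is why the tail percentiles are the ones worth alerting on.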
4. Tool Availability
When an MCP server is up but certain tools are throttled or returning errors, that's different from server-down. You need tool-level availability tracking:
- Which tools are throwing errors?
- Which tools are returning rate limit errors (429)?
- Are there tools that have been completely removed from a server?
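These questions can be answered with two small checks, sketched below. It assumes you keep a list of tools you expect each server to expose and compare it with what the server currently advertises (for instance via the protocol's `tools/list` request). The function names and the call-log shape are illustrative assumptions.

```python
def diff_tools(expected, advertised):
    """Compare the tools you rely on with the tools a server currently advertises."""
    expected, advertised = set(expected), set(advertised)
    return {
        "missing": sorted(expected - advertised),     # removed or renamed tools
        "unexpected": sorted(advertised - expected),  # new tools you are not tracking
    }


def rate_limited_tools(call_log):
    """Return tools with at least one 429; call_log holds (tool, status_code) pairs."""
    return sorted({tool for tool, status in call_log if status == 429})


# Hypothetical tool names and call log.
report = diff_tools(
    expected=["create_issue", "list_issues", "add_comment"],
    advertised=["create_issue", "list_issues", "search_code"],
)
throttled = rate_limited_tools(
    [("create_issue", 200), ("list_issues", 429), ("list_issues", 429), ("create_issue", 500)]
)
print(report)     # {'missing': ['add_comment'], 'unexpected': ['search_code']}
print(throttled)  # ['list_issues']
```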
5. Context Size and Token Usage
MCP resources add context to LLM requests. A single file read might add 10,000 tokens to your context. Monitor:
- Average context size per MCP request
- Token cost attribution by MCP server and tool
- Context size outliers (> 50k tokens per request)
Large context = higher LLM costs. Know what's driving your token bills.
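A rough sketch of token attribution follows. It uses a crude heuristic of about four characters per token, which is an assumption and not how any particular model counts tokens; use the model's own tokenizer or the usage figures returned by your LLM provider for billing-grade numbers. Function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def attribute_tokens(payloads, outlier_threshold=50_000):
    """Sum estimated tokens per (server, tool) and collect single-request outliers.

    payloads is an iterable of (server, tool, text) tuples.
    """
    totals = {}
    outliers = []
    for server, tool, text in payloads:
        tokens = estimate_tokens(text)
        totals[(server, tool)] = totals.get((server, tool), 0) + tokens
        if tokens > outlier_threshold:
            outliers.append((server, tool, tokens))
    return totals, outliers


# Synthetic payloads standing in for MCP resource contents.
totals, outliers = attribute_tokens([
    ("filesystem", "read_file", "x" * 40_000),
    ("filesystem", "read_file", "x" * 240_000),
    ("github", "list_issues", "x" * 400),
])
print(totals)    # {('filesystem', 'read_file'): 70000, ('github', 'list_issues'): 100}
print(outliers)  # [('filesystem', 'read_file', 60000)]
```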
Implementing MCP Observability
Option 1: Custom Instrumentation with OpenTelemetry
The cleanest approach is instrumenting your MCP client with OpenTelemetry. Most MCP SDKs support this.
    import time

    from opentelemetry import trace

    # Obtain a tracer. A TracerProvider and exporter must be configured
    # elsewhere for spans to reach your backend.
    tracer = trace.get_tracer(__name__)

    class ObservedMCPClient:
        """Wraps an MCP client so every tool call emits a span."""

        def __init__(self, server_url):
            self.server_url = server_url
            # MCPClient is a placeholder for the client class your MCP SDK provides.
            self.client = MCPClient(server_url)

        async def call_tool(self, tool_name, params):
            with tracer.start_as_current_span(f"mcp.{tool_name}") as span:
                span.set_attribute("mcp.server", self.server_url)
                span.set_attribute("mcp.tool", tool_name)
                start = time.perf_counter()
                try:
                    result = await self.client.call_tool(tool_name, params)
                    span.set_attribute("mcp.success", True)
                    return result
                except Exception as e:
                    span.set_attribute("mcp.success", False)
                    span.set_attribute("mcp.error", str(e))
                    raise
                finally:
                    # Runs on success and failure, so duration is always recorded.
                    duration = time.perf_counter() - start
                    span.set_attribute("mcp.duration_ms", duration * 1000)
Send spans to your observability backend (Grafana, Datadog, Honeycomb).
Option 2: MCP Server Middleware
If you control the MCP server, add middleware that emits metrics for every request:
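    # Example MCP server middleware (Python)
    import time
    from functools import wraps

    # Assumes the prometheus_client library; metric names are illustrative.
    from prometheus_client import Counter, Histogram

    METRICS = {
        "requests_total": Counter(
            "mcp_requests_total", "MCP requests by outcome", ["server", "tool", "status"]
        ),
        "request_duration": Histogram(
            "mcp_request_duration_seconds", "MCP request duration", ["server", "tool"]
        ),
    }

    def metrics_middleware(next_handler):
        @wraps(next_handler)
        async def handler(request, ctx):
            # request.tool and ctx.server_name depend on your server framework.
            start = time.perf_counter()
            status = "success"
            try:
                return await next_handler(request, ctx)
            except Exception:
                status = "error"
                raise
            finally:
                # Record outcome and duration for both successful and failed calls.
                METRICS["requests_total"].labels(
                    server=ctx.server_name,
                    tool=request.tool,
                    status=status
                ).inc()
                METRICS["request_duration"].labels(
                    server=ctx.server_name,
                    tool=request.tool
                ).observe(time.perf_counter() - start)
        return handler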
Option 3: Use an MCP-Powered Observability Platform
Several platforms have already built MCP servers for their own tools — meaning you can use an LLM to query your observability data via MCP. This is meta but useful:
- Grafana MCP Server — Query dashboards and alerts via AI
- PagerDuty MCP Server — Manage incidents via AI
- Datadog MCP Server — Query metrics and logs conversationally
If you're already using these platforms, their MCP servers let you build AI-powered incident investigation workflows — with the MCP server itself being something to monitor.
Common MCP Failure Patterns and How to Detect Them
Pattern 1: Cascade Failure from Upstream API
An MCP server wraps a third-party API (GitHub, Slack, a database). When that API goes down or rate-limits, the MCP server starts returning errors. The LLM keeps calling it, generating error logs but no useful output.
Detection:
- mcp.errors spikes with rate_limit or upstream_unavailable labels
- Server is up (HTTP 200) but tools return 429 or 503
Response:
- Alert on rate limit errors specifically
- Consider circuit-breaking the MCP server when upstream is degraded
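A minimal circuit breaker might look like the sketch below. The thresholds, the class name `CircuitBreaker`, and the simplified half-open behaviour (calls are allowed again after the cool-down, and the next failure reopens the circuit) are assumptions for illustration, not a prescribed design.

```python
import time


class CircuitBreaker:
    """Stops calling a failing MCP server until a cool-down period has passed."""

    def __init__(self, failure_threshold=5, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock  # injectable so the breaker can be tested without sleeping
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        """Return True if a call may proceed."""
        if self.opened_at is None:
            return True
        # After the cool-down, let trial calls through; a failure reopens the circuit.
        return self.clock() - self.opened_at >= self.reset_after_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


breaker = CircuitBreaker(failure_threshold=2)
breaker.record_failure()
breaker.record_failure()
print(breaker.allow())  # False: the circuit is open
```

The caller checks `allow()` before each tool call and reports the outcome with `record_success()` or `record_failure()`. When the circuit is open, returning a clear "tool temporarily unavailable" result to the model is usually better than letting it retry blindly.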
Pattern 2: Stale Context from Long-Poll Resources
MCP resources like file systems or database queries can return stale data. If a resource is cached and the underlying data changes, the model operates on outdated information.
Detection:
- Monitor the mcp.resource.age metric: how old is the cached resource data?
- Log resource refresh events and compare with data change events in the upstream system
Response:
- Implement TTL-based cache invalidation on MCP servers
- Alert when mcp.resource.age exceeds its threshold for critical resources
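TTL-based invalidation and the resource-age metric can come from the same small cache, sketched here. The class name `TTLResourceCache` and the example URI are hypothetical, and the clock is injectable only so the behaviour can be demonstrated without waiting.

```python
import time


class TTLResourceCache:
    """Caches resource contents and reports how old each entry is."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries = {}  # uri -> (value, fetched_at)

    def put(self, uri, value):
        self._entries[uri] = (value, self.clock())

    def age(self, uri):
        """Seconds since the resource was fetched, or None if it is not cached."""
        entry = self._entries.get(uri)
        return None if entry is None else self.clock() - entry[1]

    def get(self, uri):
        """Return the cached value, or None if it is missing or older than the TTL."""
        age = self.age(uri)
        if age is None or age > self.ttl_s:
            self._entries.pop(uri, None)  # expired entries are evicted on read
            return None
        return self._entries[uri][0]


fake_now = [0.0]
cache = TTLResourceCache(ttl_s=60, clock=lambda: fake_now[0])
cache.put("file:///config.yaml", "v1")
fake_now[0] = 45.0
print(cache.age("file:///config.yaml"))  # 45.0
print(cache.get("file:///config.yaml"))  # v1
```

Exporting the value of `age()` as a gauge gives you the resource-age metric described above.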
Pattern 3: Tool Call Storms
A model with retry logic might repeatedly call a failing tool, generating a traffic spike on the MCP server and the upstream API.
Detection:
- mcp.tool_calls_per_minute spikes unusually
- Same tool called > 10 times in a 60-second window
Response:
- Implement exponential backoff in the MCP client
- Add per-tool rate limiting with feedback to the model ("rate limited, retry after 30s")
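Both the detection rule and the backoff response fit in a few lines. The sketch below uses a sliding window per tool and a capped exponential delay without jitter. The names `StormDetector` and `backoff_delay` and the default values are assumptions for illustration.

```python
from collections import defaultdict, deque


class StormDetector:
    """Flags a tool that is called more than max_calls times within window_s seconds."""

    def __init__(self, max_calls=10, window_s=60.0):
        self.max_calls = max_calls
        self.window_s = window_s
        self._calls = defaultdict(deque)  # tool -> timestamps of recent calls

    def record(self, tool, now):
        """Record a call at time `now`; return True if the tool is over the limit."""
        calls = self._calls[tool]
        calls.append(now)
        # Drop timestamps that have fallen out of the window.
        while calls and now - calls[0] >= self.window_s:
            calls.popleft()
        return len(calls) > self.max_calls


def backoff_delay(attempt, base_s=1.0, cap_s=30.0):
    """Exponential backoff without jitter: base * 2^attempt, capped at cap_s."""
    return min(cap_s, base_s * (2 ** attempt))


detector = StormDetector()
# Twelve calls to the same tool, one per second.
flags = [detector.record("create_issue", now=float(i)) for i in range(12)]
print(flags.index(True))  # 10: the eleventh call trips the detector
print([backoff_delay(n) for n in range(6)])  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

In practice you would likely add random jitter to the delay so that many clients do not retry in lockstep.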
Pattern 4: Latency Regression After Server Update
MCP servers are often independently deployed services. An update might introduce a regression that slows down all tool calls by 2-3x.
Detection:
- Track mcp.latency.p95 over time per server version
- Alert when p95 latency increases > 50% week-over-week
Response:
- Tag MCP server deployments with version numbers in metrics
- Use canary deployments for MCP servers
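The 50% rule reduces to a one-line comparison once p95 is tagged by version. The sketch below uses made-up version numbers and latencies, and the function name `p95_regressed` is hypothetical.

```python
def p95_regressed(baseline_p95_ms, current_p95_ms, max_increase=0.5):
    """Return True if p95 latency grew by more than max_increase (0.5 means 50%)."""
    if baseline_p95_ms <= 0:
        return False  # no usable baseline, so nothing to compare against
    return (current_p95_ms - baseline_p95_ms) / baseline_p95_ms > max_increase


# Hypothetical p95 latencies (ms) keyed by MCP server version.
p95_by_version = {"1.4.0": 400.0, "1.5.0": 900.0}
regressed = p95_regressed(p95_by_version["1.4.0"], p95_by_version["1.5.0"])
print(regressed)  # True: 900 ms is a 125% increase over 400 ms
```

The same check works for a canary: compare the canary's p95 against the stable version's p95 over the same time window before promoting the release.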
Building an MCP Monitoring Dashboard
Here's the minimum viable dashboard for MCP observability:
Row 1 — Overview
- Total MCP requests/minute
- Error rate (% of requests that failed)
- p99 latency
Row 2 — Per-Server Metrics
- Requests per server (bar chart)
- Error rate per server
- Latency per server
Row 3 — Tool-Level Breakdown
- Top 10 most-used MCP tools
- Tools with highest error rates
- Slowest tools (p99 > 2s)
Row 4 — Context Usage
- Average tokens per MCP request
- Servers/tools driving highest token usage
Alerting Rules for MCP
| Alert | Condition | Severity |
|---|---|---|
| MCP server down | Server unreachable for > 60s | P1 |
| High error rate | mcp.error_rate > 5% over 5 min | P2 |
| Slow response | mcp.latency.p99 > 5s over 10 min | P2 |
| Rate limit hit | Any 429 response from MCP server | P2 |
| Tool unavailable | Tool returns error for > 10% of calls | P2 |
| Token usage spike | mcp.tokens_per_hour > 2x baseline | P3 |
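Several of these rules can be expressed as data and evaluated against a metrics snapshot, as in the sketch below. It assumes the snapshot has already been aggregated over the relevant time window, covers only four of the rules, and uses hypothetical field names; in practice the rules would usually live in your alerting system rather than in application code.

```python
# Each rule: (alert name, predicate over a metrics snapshot, severity).
RULES = [
    ("High error rate", lambda m: m["error_rate"] > 0.05, "P2"),
    ("Slow response", lambda m: m["latency_p99_s"] > 5.0, "P2"),
    ("Rate limit hit", lambda m: m["rate_limit_responses"] > 0, "P2"),
    ("Token usage spike", lambda m: m["tokens_per_hour"] > 2 * m["tokens_baseline"], "P3"),
]


def evaluate(metrics):
    """Return (alert name, severity) for every rule whose condition holds."""
    return [(name, severity) for name, check, severity in RULES if check(metrics)]


# Hypothetical snapshot, pre-aggregated over the alert windows.
snapshot = {
    "error_rate": 0.08,
    "latency_p99_s": 1.2,
    "rate_limit_responses": 0,
    "tokens_per_hour": 500_000,
    "tokens_baseline": 200_000,
}
firing = evaluate(snapshot)
print(firing)  # [('High error rate', 'P2'), ('Token usage spike', 'P3')]
```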
The MCP Observability Stack
Most teams building serious MCP infrastructure use:
- Prometheus + Grafana — Open-source metrics and dashboards
- OpenTelemetry — Trace instrumentation for MCP calls
- Grafana Tempo — Distributed tracing for cross-service requests
- PagerDuty — Alert routing for MCP-related incidents
If you're using Datadog, check whether the Datadog Agent's Autodiscovery picks up the MCP servers running on your infrastructure, and confirm which metrics it actually collects before relying on it in place of explicit instrumentation.
Conclusion
MCP is moving from novelty to production infrastructure in 2026. As AI systems become more deeply integrated with external tools and data sources, the reliability of those integrations becomes critical.
The monitoring patterns are straightforward — availability, latency, error rates, and context size — but most teams haven't implemented them yet. Building MCP observability now means you're ahead of the curve when MCP becomes as standard as REST APIs in production AI systems.