Why Dashboard Templates Matter for LLM Monitoring
Your LLM application is live. Requests are flowing. But can you answer these questions at 3 AM during an incident?
- Which model is consuming 80% of your GPU budget right now?
- What's the p95 time-to-first-token for your RAG pipeline?
- Which prompt version is causing hallucination spikes?
- What's your actual cost per 1,000 successful completions?
If you can't, you need a monitoring dashboard. Not a generic APM dashboard retrofitted for AI — a purpose-built LLM observability dashboard that tracks what matters: tokens, latency breakdowns, cost attribution, and quality signals.
This article gives you production-ready Grafana dashboard templates and Prometheus query libraries for LLM monitoring. Copy the JSON, import it into your Grafana instance, and have a working LLM dashboard in under 30 minutes.
The Five Panels Every LLM Dashboard Needs
Before diving into templates, here's the metric taxonomy that every LLM monitoring dashboard should cover:
1. Token Throughput Panel
Tracks tokens processed per second, split by input and output. This shows how much load you are serving and makes traffic anomalies easy to spot.
Prometheus metric: llm_tokens_total (counter, labels: model, provider, direction)
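To make the throughput numbers easier to interpret, here is a rough sketch of what a per-second rate over a counter means. It is a simplification, not Prometheus's actual implementation: the real rate() also extrapolates to the edges of the window. The sample data and the simple_rate name are made up for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix_timestamp_seconds, counter_value)


def simple_rate(samples: List[Sample]) -> float:
    """Approximate per-second increase of a counter across the samples.

    Simplified on purpose: no extrapolation to the window boundaries.
    """
    if len(samples) < 2:
        # Prometheus would return no data here; 0.0 keeps the sketch simple.
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset (e.g. a process restart), so the
        # new value itself is the increase since the reset.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Hypothetical scrapes 15 s apart, with a restart before the fourth sample.
samples = [(0, 1000), (15, 1600), (30, 2200), (45, 300), (60, 900)]
tokens_per_second = simple_rate(samples)
```

Because resets are absorbed this way, a rate-based panel stays meaningful across restarts, whereas the raw counter value does not.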
Query for tokens per second (5-minute window):
rate(llm_tokens_total[5m])
2. Latency Breakdown Panel
LLM latency has three components that must be measured separately:
- Time to First Token (TTFT): How fast the model starts responding
- Time per Output Token (TPOT): Generation speed per token
- Total Request Duration: End-to-end latency
Prometheus metrics: llm_ttft_seconds, llm_tpot_seconds, llm_request_duration_seconds
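One way to obtain the three latency components is to time the token stream on the client side. The sketch below assumes the provider SDK hands you an iterable of tokens; StreamTiming and time_stream are hypothetical names, and TPOT is taken as the mean gap between output tokens after the first one, which is a common convention rather than something this article defines.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class StreamTiming:
    ttft_seconds: float   # request start -> first token
    tpot_seconds: float   # mean time per output token after the first
    total_seconds: float  # request start -> stream exhausted
    output_tokens: int


def time_stream(tokens: Iterable[str],
                clock: Callable[[], float] = time.perf_counter) -> StreamTiming:
    """Consume a token stream and split its latency into TTFT, TPOT and total."""
    start = clock()
    first_token_at = None
    count = 0
    for _ in tokens:
        if first_token_at is None:
            first_token_at = clock()  # timestamp of the first token only
        count += 1
    end = clock()
    if first_token_at is None:
        raise ValueError("stream produced no tokens")
    tpot = (end - first_token_at) / (count - 1) if count > 1 else 0.0
    return StreamTiming(first_token_at - start, tpot, end - start, count)


# Deterministic demo with a fake clock (start, first token, end).
ticks = iter([10.0, 10.5, 11.25])
timing = time_stream(["Hello", ",", " world", "!"], clock=lambda: next(ticks))
```

Each field would then be recorded on the matching histogram, so that slow starts and slow generation show up as separate signals.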
Query for p95 TTFT:
histogram_quantile(0.95, rate(llm_ttft_seconds_bucket[5m]))
3. Cost Attribution Panel
LLM providers bill per token. You need cost visibility by model, provider, and endpoint.
Prometheus metric: llm_cost_total (counter, labels: model, provider)
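As a sketch of where the values in llm_cost_total could come from, the snippet below converts token counts into USD and projects a per-second spend rate to longer periods. The model names and prices are placeholders, not real provider pricing, and both function names are hypothetical.

```python
from typing import Dict, Tuple

# Placeholder prices in USD per 1M tokens: (input, output).
# Replace with your provider's current price list.
PRICES_PER_MILLION: Dict[str, Tuple[float, float]] = {
    "example-small": (0.50, 1.50),
    "example-large": (5.00, 15.00),
}


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request; this is the amount to add to the cost counter."""
    in_price, out_price = PRICES_PER_MILLION[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def project_cost(usd_per_second: float) -> Dict[str, float]:
    """Scale a per-second spend rate (what rate() returns) to longer periods."""
    return {
        "hour": usd_per_second * 3600,
        "day": usd_per_second * 86400,
        "month_30d": usd_per_second * 86400 * 30,
    }


cost = request_cost_usd("example-large", input_tokens=2000, output_tokens=500)
projection = project_cost(0.001)  # a hypothetical spend of $0.001 per second
```

The multipliers matter because rate() on a counter is always per second: 3600 gives an hourly figure and 86400 a daily one.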
Query — cost per hour by model:
sum by (model) (rate(llm_cost_total[1h])) * 3600
4. Error Rate Panel
LLM errors include rate limit responses (429), context length exceeded (400), API errors (500), and malformed outputs.
Prometheus metric: llm_requests_total (counter, labels: status_code)
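The error-rate calculation can be mirrored in a few lines of Python, which is handy for sanity-checking a panel against raw counts. The counts below are invented. The sketch uses re.fullmatch because Prometheus regex matchers are fully anchored, so a pattern like 4..|5.. only matches three-character codes.

```python
import re
from typing import Mapping

# Same pattern as a status_code=~"4..|5.." matcher.
ERROR_PATTERN = re.compile(r"4..|5..")


def error_rate(requests_by_status: Mapping[str, float]) -> float:
    """Fraction of requests whose status code matches the error pattern."""
    total = sum(requests_by_status.values())
    if total == 0:
        return 0.0
    errors = sum(
        count
        for code, count in requests_by_status.items()
        if ERROR_PATTERN.fullmatch(code)  # anchored, like a PromQL matcher
    )
    return errors / total


# Hypothetical request counts over one window.
counts = {"200": 940, "429": 40, "400": 5, "500": 15}
rate = error_rate(counts)
```

Malformed outputs usually arrive with a 200 status, so they will not appear in this ratio unless the application records them under a separate label value.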
Query — error rate:
sum(rate(llm_requests_total{status_code=~"4..|5.."}[5m])) / sum(rate(llm_requests_total[5m]))
5. Quality Signals Panel
Track semantic quality indicators: hallucination rate (if measured), retrieval precision (for RAG systems), and output validation pass rates.
Prometheus metric: llm_quality_score (gauge, labels: check_type)
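Of the three signals, the output validation pass rate is the easiest to compute without a judge model. The sketch below assumes the model is expected to return a JSON object with certain keys; the key names, the sample outputs, and the idea of publishing the result under a check_type value such as "validation" are illustrative assumptions.

```python
import json
from typing import Iterable, Sequence


def passes_validation(raw_output: str, required_keys: Sequence[str]) -> bool:
    """True if the output parses as a JSON object containing every required key."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and all(key in parsed for key in required_keys)


def validation_pass_rate(outputs: Iterable[str], required_keys: Sequence[str]) -> float:
    """Share of outputs that pass validation, in the range 0.0 to 1.0."""
    results = [passes_validation(output, required_keys) for output in outputs]
    return sum(results) / len(results) if results else 0.0


# Hypothetical model outputs collected over one evaluation window.
sample_outputs = [
    '{"answer": "42", "sources": []}',
    '{"answer": "missing the sources key"}',
    "not json at all",
    '{"answer": "ok", "sources": ["doc-1"]}',
]
pass_rate = validation_pass_rate(sample_outputs, ["answer", "sources"])
```

The resulting value is what the gauge would be set to at the end of each evaluation window.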
Query for the 7-day hallucination rate trend (1-hour average, viewed over a 7-day time range):
avg by (check_type) (avg_over_time(llm_quality_score{check_type="hallucination"}[1h]))
Grafana Dashboard JSON Template
This is a complete Grafana dashboard JSON for LLM monitoring. It includes:
- Row 1: Token throughput (input vs output, by model)
- Row 2: Latency percentiles (p50, p95, p99) for TTFT and total duration
- Row 3: Cost attribution by model and provider
- Row 4: Error rate and status code distribution
- Row 5: Context window utilization (for models with context length limits)
How to Import
- Copy the JSON below
- In Grafana: Dashboards → Import → Paste JSON
- Set your Prometheus data source
- The template ships with only a $datasource variable; if you add filter variables such as $model or $provider, make sure they match your metric label names
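If you would rather script the import than paste into the UI, Grafana's HTTP API accepts a dashboard at POST /api/dashboards/db. The sketch below only builds the request so it can be inspected before sending; the URL, the token, and the build_import_request helper are placeholders, and a service account token with dashboard write permission is assumed.

```python
import json
import urllib.request


def build_import_request(dashboard: dict, grafana_url: str,
                         api_token: str) -> urllib.request.Request:
    """Build (but do not send) a dashboard create/update request."""
    payload = {
        # A null id lets Grafana match an existing dashboard by its uid.
        "dashboard": {**dashboard, "id": None},
        "overwrite": True,
    }
    return urllib.request.Request(
        url=grafana_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


# Stand-in for the full dashboard JSON, loaded with json.load() in practice.
dashboard = {"title": "LLM Monitoring Dashboard",
             "uid": "llm-monitoring-dashboard", "panels": []}
request = build_import_request(dashboard, "http://localhost:3000/", "REPLACE_ME")
# urllib.request.urlopen(request)  # uncomment to actually send it
```

Scripting the import this way also makes it straightforward to keep the dashboard JSON in version control and redeploy it from CI.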
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 1,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 0
},
"id": 1,
"panels": [],
"title": "Token Throughput",
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "Tokens/sec",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 20,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "normal"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "short"
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "Input Tokens"
},
"properties": [
{
"id": "color",
"value": {
"fixedColor": "blue",
"mode": "fixed"
}
}
]
},
{
"matcher": {
"id": "byName",
"options": "Output Tokens"
},
"properties": [
{
"id": "color",
"value": {
"fixedColor": "green",
"mode": "fixed"
}
}
]
}
]
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 1
},
"id": 2,
"options": {
"legend": {
"calcs": ["mean", "max"],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"expr": "sum by (model) (rate(llm_tokens_total{direction=\"input\"}[5m]))",
"legendFormat": "{{model}} - Input",
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"expr": "sum by (model) (rate(llm_tokens_total{direction=\"output\"}[5m]))",
"legendFormat": "{{model}} - Output",
"refId": "B"
}
],
"title": "Token Throughput by Model (5m rate)",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 50000
},
{
"color": "red",
"value": 100000
}
]
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 6,
"x": 12,
"y": 1
},
"id": 3,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": ["lastNotNull"],
"fields": "",
"values": false
},
"textMode": "auto"
},
"pluginVersion": "10.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"expr": "sum(increase(llm_tokens_total[$__range]))",
"legendFormat": "Total Tokens",
"refId": "A"
}
],
"title": "Total Tokens Processed (Selected Time Range)",
"type": "stat"
},
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 9
},
"id": 5,
"panels": [],
"title": "Latency",
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "Seconds",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "line"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 2
},
{
"color": "red",
"value": 5
}
]
},
"unit": "s"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 10
},
"id": 6,
"options": {
"legend": {
"calcs": ["mean", "max"],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"expr": "histogram_quantile(0.50, sum by (model, le) (rate(llm_ttft_seconds_bucket[5m])))",
"legendFormat": "{{model}} p50 TTFT",
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"expr": "histogram_quantile(0.95, sum by (model, le) (rate(llm_ttft_seconds_bucket[5m])))",
"legendFormat": "{{model}} p95 TTFT",
"refId": "B"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"expr": "histogram_quantile(0.99, sum by (model, le) (rate(llm_ttft_seconds_bucket[5m])))",
"legendFormat": "{{model}} p99 TTFT",
"refId": "C"
}
],
"title": "Time to First Token (TTFT) Percentiles by Model",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "Seconds",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "s"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 10
},
"id": 7,
"options": {
"legend": {
"calcs": ["mean", "max"],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"expr": "histogram_quantile(0.50, sum by (model, le) (rate(llm_request_duration_seconds_bucket[5m])))",
"legendFormat": "{{model}} p50 Total",
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"expr": "histogram_quantile(0.95, sum by (model, le) (rate(llm_request_duration_seconds_bucket[5m])))",
"legendFormat": "{{model}} p95 Total",
"refId": "B"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"expr": "histogram_quantile(0.99, sum by (model, le) (rate(llm_request_duration_seconds_bucket[5m])))",
"legendFormat": "{{model}} p99 Total",
"refId": "C"
}
],
"title": "Total Request Duration Percentiles by Model",
"type": "timeseries"
},
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 18
},
"id": 8,
"panels": [],
"title": "Cost Attribution",
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "$/hour",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "bars",
"fillOpacity": 80,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "normal"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "currencyUSD"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 19
},
"id": 9,
"options": {
"legend": {
"calcs": ["sum"],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"expr": "sum by (model, provider) (rate(llm_cost_total[1h])) * 3600",
"legendFormat": "{{model}} ({{provider}})",
"refId": "A"
}
],
"title": "Cost per Hour by Model and Provider",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 50
},
{
"color": "red",
"value": 200
}
]
},
"unit": "currencyUSD"
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 6,
"x": 12,
"y": 19
},
"id": 10,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": ["lastNotNull"],
"fields": "",
"values": false
},
"textMode": "auto"
},
"pluginVersion": "10.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"expr": "sum(rate(llm_cost_total[1h])) * 86400",
"legendFormat": "Est. Daily Cost",
"refId": "A"
}
],
"title": "Estimated Daily Cost",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 1000
},
{
"color": "red",
"value": 5000
}
]
},
"unit": "currencyUSD"
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 6,
"x": 18,
"y": 19
},
"id": 11,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": ["lastNotNull"],
"fields": "",
"values": false
},
"textMode": "auto"
},
"pluginVersion": "10.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"expr": "sum(rate(llm_cost_total[1h])) * 86400 * 30",
"legendFormat": "Est. Monthly Cost",
"refId": "A"
}
],
"title": "Estimated Monthly Cost",
"type": "stat"
},
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 27
},
"id": 13,
"panels": [],
"title": "Error Rates & Health",
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "Error Rate (%)",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 20,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "line"
}
},
"mappings": [],
"max": 100,
"min": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 1
},
{
"color": "red",
"value": 5
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 28
},
"id": 14,
"options": {
"legend": {
"calcs": ["mean", "max"],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"expr": "100 * sum by (model) (rate(llm_requests_total{status_code=~\"4..|5..\"}[5m])) / sum by (model) (rate(llm_requests_total[5m]))",
"legendFormat": "{{model}} Error Rate",
"refId": "A"
}
],
"title": "Error Rate by Model (%)",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
}
},
"mappings": []
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 28
},
"id": 15,
"options": {
"displayLabels": ["name", "value"],
"legend": {
"displayMode": "table",
"placement": "right",
"showLegend": true,
"values": ["value"]
},
"pieType": "pie",
"reduceOptions": {
"calcs": ["lastNotNull"],
"fields": "",
"values": false
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"expr": "sum by (status_code) (rate(llm_requests_total[5m]))",
"legendFormat": "{{status_code}}",
"refId": "A"
}
],
"title": "Status Code Distribution (5m rate)",
"type": "piechart"
},
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 36
},
"id": 16,
"panels": [],
"title": "Context Window Utilization",
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "% of Max Context",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "line"
}
},
"mappings": [],
"max": 100,
"min": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 80
},
{
"color": "red",
"value": 95
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 37
},
"id": 17,
"options": {
"legend": {
"calcs": ["mean", "max"],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"expr": "100 * avg by (model) (llm_context_tokens_used / llm_context_tokens_max)",
"legendFormat": "{{model}} - Avg Context %",
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"expr": "100 * max by (model) (llm_context_tokens_used / llm_context_tokens_max)",
"legendFormat": "{{model}} - Max Context %",
"refId": "B"
}
],
"title": "Context Window Utilization by Model (%)",
"type": "timeseries"
}
],
"refresh": "30s",
"schemaVersion": 38,
"style": "dark",
"tags": ["llm", "observability", "grafana", "prometheus"],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "Prometheus",
"value": "prometheus"
},
"hide": 0,
"includeAll": false,
"label": "Datasource",
"multi": false,
"name": "datasource",
"options": [],
"query": "prometheus",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"type": "datasource"
}
]
},
"time": {
"from": "now-6h",
"to": "now"
},
"timepicker": {
"refresh_intervals": ["5s", "10s", "30s", "1m", "5m", "15m", "30m", "1h"]
},
"timezone": "browser",
"title": "LLM Monitoring Dashboard",
"uid": "llm-monitoring-dashboard",
"version": 1,
"weekStart": ""
}
Prometheus Metrics Library for LLM Monitoring
The dashboard above requires these Prometheus metrics. Here's the metrics library you need to instrument your LLM application, written with the Python prometheus_client package:
Core Metrics (from prometheus-client or OpenTelemetry)
from prometheus_client import Counter, Histogram, Gauge

# Token counters
llm_tokens_total = Counter(
    'llm_tokens_total',
    'Total tokens processed',
    ['model', 'provider', 'direction']  # direction: input | output
)

# Latency histograms
llm_ttft_seconds = Histogram(
    'llm_ttft_seconds',
    'Time to first token in seconds',
    ['model', 'provider'],
    buckets=[0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
)
llm_request_duration_seconds = Histogram(
    'llm_request_duration_seconds',
    'Total request duration in seconds',
    ['model', 'provider'],
    buckets=[0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0, 120.0]
)

# Cost counter (in USD)
llm_cost_total = Counter(
    'llm_cost_total',
    'Total API cost in USD',
    ['model', 'provider']
)

# Request counter
llm_requests_total = Counter(
    'llm_requests_total',
    'Total API requests',
    ['model', 'provider', 'status_code']
)

# Context window utilization
llm_context_tokens_used = Gauge(
    'llm_context_tokens_used',
    'Average tokens used per request in the last bucket',
    ['model', 'provider']
)
llm_context_tokens_max = Gauge(
    'llm_context_tokens_max',
    'Maximum context window size for the model',
    ['model', 'provider']
)

# Quality metrics (optional, for RAG/LLM-as-judge setups)
llm_quality_score = Gauge(
    'llm_quality_score',
    'Quality score from evaluation checks',
    ['model', 'provider', 'check_type']  # check_type: hallucination | relevance | coherence
)
Complete Prometheus Scrape Config
# /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: 'llm-monitoring'
    static_configs:
      - targets: ['localhost:9091']  # your metrics endpoint
    metrics_path: '/metrics'
    scrape_interval: 15s
    scrape_timeout: 10s
OpenTelemetry Collector Config for LLM Metrics
If you're using the OpenTelemetry SDK (recommended for production), here's the Collector pipeline that ships LLM metrics to Prometheus:
# otel-collector-config.yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'llm-apps'
          static_configs:
            - targets: ['llm-app:9091']
          metrics_path: '/metrics'
processors:
  batch:
    timeout: 10s
    send_batch_size: 1000
exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    # Prepended to every exported metric name (llm_tokens_total becomes
    # llm_llm_tokens_total); drop it if your dashboards query the original names.
    namespace: 'llm'
    const_labels:
      service: 'llm-monitoring'
  prometheusremotewrite:
    endpoint: "https://your-prometheus-remote-write-endpoint/api/v1/write"
    # Add remote write auth here
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [prometheus, prometheusremotewrite]
Grafana Cloud's free tier includes 10K series and 50GB of logs, enough for small production LLM deployments.
Alerting Rules for LLM Production
# /etc/prometheus/alerts/llm-alerts.yml
groups:
  - name: llm-alerts
    rules:
      - alert: LLMP95LatencyHigh
        expr: histogram_quantile(0.95, sum by (model, le) (rate(llm_ttft_seconds_bucket[5m]))) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "LLM p95 TTFT above 5s for {{ $labels.model }}"
      - alert: LLMErrorRateHigh
        expr: 100 * sum(rate(llm_requests_total{status_code=~"5.."}[5m])) / sum(rate(llm_requests_total[5m])) > 1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "LLM 5xx error rate above 1%"
      - alert: LLMContextWindowNearLimit
        expr: 100 * avg(llm_context_tokens_used / llm_context_tokens_max) > 90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Context window utilization above 90%: consider truncation or smarter chunking"
      - alert: LLMDailyCostSpike
        expr: sum(rate(llm_cost_total[1h])) * 86400 > 500
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Estimated daily LLM cost projected above $500"
What to Monitor Beyond These Templates
These dashboard templates cover the fundamentals. As your LLM application matures, add panels for:
- Prompt version tracking: Label requests with prompt_version to correlate output quality changes with prompt changes
- Retrieval relevance scores: For RAG pipelines, track llm_rag_relevance_score to monitor embedding model degradation
- Model rollouts: Track llm_model_version to detect behavioral regressions when updating to new model versions
- Cache hit rates: If using semantic caching, track llm_cache_hit_rate to measure cache efficiency
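For the cache panel, one possible approach is to count hits and misses separately and derive the hit rate per window, rather than exporting a precomputed ratio. The sketch below is a minimal illustration under that assumption; the snapshot values and the windowed_hit_rate name are made up.

```python
from typing import Tuple

Snapshot = Tuple[int, int]  # (cache_hits_total, cache_misses_total)


def windowed_hit_rate(previous: Snapshot, current: Snapshot) -> float:
    """Hit rate for the lookups that happened between two counter snapshots."""
    hits = current[0] - previous[0]
    misses = current[1] - previous[1]
    lookups = hits + misses
    # No lookups in the window: report 0.0 rather than dividing by zero.
    return hits / lookups if lookups > 0 else 0.0


# Hypothetical counter values at the start and end of a 5-minute window.
hit_rate = windowed_hit_rate(previous=(1200, 800), current=(1500, 900))
```

Deriving the ratio from two counters keeps it correct when you aggregate across replicas, which an averaged per-instance ratio would not guarantee.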