Why Dashboard Templates Matter for LLM Monitoring

Your LLM application is live. Requests are flowing. But can you answer these questions at 3 AM during an incident?

  • Which model is consuming 80% of your GPU budget right now?
  • What's the p95 time-to-first-token for your RAG pipeline?
  • Which prompt version is causing hallucination spikes?
  • What's your actual cost per 1,000 successful completions?

If you can't, you need a monitoring dashboard. Not a generic APM dashboard retrofitted for AI — a purpose-built LLM observability dashboard that tracks what matters: tokens, latency breakdowns, cost attribution, and quality signals.

This article gives you production-ready Grafana dashboard templates and Prometheus query libraries for LLM monitoring. Copy the JSON, import it into your Grafana instance, and have a working LLM dashboard in under 30 minutes.

The Five Panels Every LLM Dashboard Needs

Before diving into templates, here's the metric taxonomy that every LLM monitoring dashboard should cover:

1. Token Throughput Panel

Tracks tokens processed per second (rate() returns a per-second value), split by input and output. This shows your load volume and helps detect traffic anomalies.

Prometheus metric: llm_tokens_total (counter, labels: model, provider, direction)

sum by (direction) (rate(llm_tokens_total[5m]))
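
To make the per-second semantics concrete, here is a simplified Python sketch of what rate() computes from raw counter samples. It is not Prometheus's implementation: it handles counter resets but skips the extrapolation to window edges that the real function applies, and the sample values are made up for illustration.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: Sequence[Sample]) -> float:
    """Approximate per-second rate of a counter over a window of samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the process restarted and the counter reset to zero,
        # so the whole current value counts as new increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    return increase / elapsed


# Hypothetical scrapes of llm_tokens_total taken 60 s apart (one reset at t=120).
scrapes = [(0.0, 1_000.0), (60.0, 4_000.0), (120.0, 500.0), (180.0, 2_000.0)]
tokens_per_second = simple_rate(scrapes)
tokens_per_minute = tokens_per_second * 60
```

Multiply the result by 60, as in the last line, if you prefer to display tokens per minute.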

2. Latency Breakdown Panel

LLM latency has three components that must be measured separately:

  • Time to First Token (TTFT): How fast the model starts responding
  • Time per Output Token (TPOT): Generation speed per token
  • Total Request Duration: End-to-end latency

Prometheus metrics: llm_ttft_seconds, llm_tpot_seconds, llm_request_duration_seconds

Query for p95 TTFT:

histogram_quantile(0.95, sum by (le) (rate(llm_ttft_seconds_bucket[5m])))
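
The quantile is an estimate interpolated from cumulative bucket counts, not an exact percentile. The Python sketch below mirrors that interpolation for classic histograms under simplifying assumptions (0 < q <= 1, non-negative bucket bounds); the bucket counts are invented for illustration.

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float]  # (upper bound "le", cumulative observation count)


def histogram_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile by linear interpolation inside one bucket."""
    if not 0 < q <= 1:
        raise ValueError("this sketch only supports 0 < q <= 1")
    ordered = sorted(buckets)
    if len(ordered) < 2 or not math.isinf(ordered[-1][0]):
        return math.nan  # need at least one finite bucket plus +Inf
    total = ordered[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    lower_bound, count_below = 0.0, 0.0
    for upper_bound, cumulative in ordered:
        if cumulative >= rank:
            if math.isinf(upper_bound):
                # Quantile lies past the last finite bucket: report its bound.
                return ordered[-2][0]
            fraction = (rank - count_below) / (cumulative - count_below)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, count_below = upper_bound, cumulative
    return math.nan


# Hypothetical cumulative counts for llm_ttft_seconds_bucket.
ttft_buckets = [
    (0.1, 10), (0.25, 40), (0.5, 70), (1.0, 90),
    (2.5, 98), (5.0, 100), (10.0, 100), (math.inf, 100),
]
p95 = histogram_quantile(0.95, ttft_buckets)  # 1.9375 with these counts
```

Because the result is interpolated within a single bucket, its accuracy depends on how closely the bucket boundaries bracket the latencies you care about.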

3. Cost Attribution Panel

LLM providers bill per token. You need cost visibility by model, provider, and endpoint.

Prometheus metric: llm_cost_total (counter, labels: model, provider)

Query — cost per hour by model:

sum by (model) (rate(llm_cost_total[1h])) * 3600
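
The counter only carries meaning if each request adds its own cost in USD. A minimal sketch of that calculation follows; the provider name, model name, and prices are hypothetical placeholders, so substitute your provider's actual price list.

```python
from typing import Dict, Tuple

# Hypothetical prices in USD per 1M tokens, keyed by (provider, model).
PRICES_PER_MILLION: Dict[Tuple[str, str], Dict[str, float]] = {
    ("example-provider", "example-model"): {"input": 3.00, "output": 15.00},
}


def request_cost_usd(provider: str, model: str,
                     input_tokens: int, output_tokens: int) -> float:
    """Cost of one request; this is the amount to add to llm_cost_total."""
    price = PRICES_PER_MILLION[(provider, model)]
    return (input_tokens * price["input"]
            + output_tokens * price["output"]) / 1_000_000


def hourly_cost(rate_usd_per_second: float) -> float:
    """Mirror of the query above: a per-second rate times 3600 seconds."""
    return rate_usd_per_second * 3600


cost = request_cost_usd("example-provider", "example-model", 1_000, 500)
```

Input and output tokens are priced separately here because providers commonly charge different rates for each direction.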

4. Error Rate Panel

LLM errors include rate limit responses (429), context length exceeded (400), API errors (500), and malformed outputs.

Prometheus metric: llm_requests_total (counter, labels: model, provider, status_code)

Query — error rate:

sum(rate(llm_requests_total{status_code=~"4..|5.."}[5m])) / sum(rate(llm_requests_total[5m]))
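
As a sanity check on the matcher, the same ratio can be reproduced in Python from per-status request counts. This is a sketch with invented counts; it uses a full match because PromQL regex matchers are anchored at both ends.

```python
import re
from typing import Mapping

# Same pattern as the PromQL matcher; fullmatch() mimics PromQL's anchoring.
ERROR_PATTERN = re.compile(r"4..|5..")


def error_ratio(requests_by_status: Mapping[str, float]) -> float:
    """Fraction of requests whose status_code label is 4xx or 5xx."""
    total = sum(requests_by_status.values())
    if total == 0:
        return 0.0  # PromQL would yield NaN for 0/0; the sketch returns 0
    errors = sum(count for code, count in requests_by_status.items()
                 if ERROR_PATTERN.fullmatch(code))
    return errors / total


# Hypothetical request counts over one window.
counts = {"200": 950.0, "400": 10.0, "429": 30.0, "500": 10.0}
ratio = error_ratio(counts)  # 50 errors out of 1000 requests
```

Note that a malformed output returned with HTTP 200 is not caught by this matcher, so it needs its own signal, for example an output validation check.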

5. Quality Signals Panel

Track semantic quality indicators: hallucination rate (if measured), retrieval precision (for RAG systems), and output validation pass rates.

Prometheus metric: llm_quality_score (gauge, labels: model, provider, check_type)

Query for the hallucination score trend (1-hour averages; set the dashboard time range to 7 days to see the weekly trend):

avg by (check_type) (avg_over_time(llm_quality_score{check_type="hallucination"}[1h]))
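
The gauge needs a value from somewhere. One option, sketched below in Python, is to set it to the share of evaluated responses that a checker flagged. The verdict list and helper names are illustrative assumptions, not part of any library.

```python
from typing import Iterable, Sequence


def hallucination_rate(flags: Iterable[bool]) -> float:
    """Share of evaluated responses flagged as hallucinations (0.0 to 1.0)."""
    verdicts = list(flags)
    if not verdicts:
        raise ValueError("no evaluated responses in this batch")
    return sum(verdicts) / len(verdicts)


def average_over_window(samples: Sequence[float]) -> float:
    """Plain mean of gauge samples, analogous to PromQL's avg_over_time()."""
    if not samples:
        raise ValueError("no samples in window")
    return sum(samples) / len(samples)


# Hypothetical verdicts from an evaluator for one batch of responses.
batch_rate = hallucination_rate([False, True, False, False])
# Hypothetical gauge samples scraped during one hour.
hourly_average = average_over_window([0.25, 0.10, 0.15])
```

Averaging over a window smooths out small batches, where a single flagged response can swing the gauge sharply.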

Grafana Dashboard JSON Template

This is a complete Grafana dashboard JSON for LLM monitoring. It includes:

  • Row 1: Token throughput (input vs output, by model)
  • Row 2: Latency percentiles (p50, p95, p99) for TTFT and total duration
  • Row 3: Cost attribution by model and provider
  • Row 4: Error rate and status code distribution
  • Row 5: Context window utilization (for models with context length limits)

How to Import

  1. Copy the JSON below
  2. In Grafana: Dashboards → Import → Paste JSON
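  3. Load and import it, then choose your Prometheus data source in the Datasource dropdown at the top of the dashboard
  4. If your metric or label names differ from the ones used here (model, provider, direction, status_code), edit the panel queries; the template defines only the $datasource variable, not $model or $provider
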
{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 1,
  "id": null,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 1,
      "panels": [],
      "title": "Token Throughput",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "Tokens/sec",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 20,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "smooth",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "normal"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "short"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byRegexp",
              "options": "/ - Input$/"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "blue",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byRegexp",
              "options": "/ - Output$/"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "green",
                  "mode": "fixed"
                }
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 1
      },
      "id": 2,
      "options": {
        "legend": {
          "calcs": ["mean", "max"],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "sum by (model) (rate(llm_tokens_total{direction=\"input\"}[5m]))",
          "legendFormat": "{{model}} - Input",
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "sum by (model) (rate(llm_tokens_total{direction=\"output\"}[5m]))",
          "legendFormat": "{{model}} - Output",
          "refId": "B"
        }
      ],
      "title": "Token Throughput by Model (5m rate)",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 50000
              },
              {
                "color": "red",
                "value": 100000
              }
            ]
          },
          "unit": "short"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 6,
        "x": 12,
        "y": 1
      },
      "id": 3,
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": {
          "calcs": ["lastNotNull"],
          "fields": "",
          "values": false
        },
        "textMode": "auto"
      },
      "pluginVersion": "10.0.0",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "sum(increase(llm_tokens_total[$__range]))",
          "legendFormat": "Total Tokens",
          "refId": "A"
        }
      ],
      "title": "Total Tokens Processed (Selected Time Range)",
      "type": "stat"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 9
      },
      "id": 5,
      "panels": [],
      "title": "Latency",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "Seconds",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "smooth",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "line"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 2
              },
              {
                "color": "red",
                "value": 5
              }
            ]
          },
          "unit": "s"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 10
      },
      "id": 6,
      "options": {
        "legend": {
          "calcs": ["mean", "max"],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "histogram_quantile(0.50, sum by (model, le) (rate(llm_ttft_seconds_bucket[5m])))",
          "legendFormat": "{{model}} p50 TTFT",
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "histogram_quantile(0.95, sum by (model, le) (rate(llm_ttft_seconds_bucket[5m])))",
          "legendFormat": "{{model}} p95 TTFT",
          "refId": "B"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "histogram_quantile(0.99, sum by (model, le) (rate(llm_ttft_seconds_bucket[5m])))",
          "legendFormat": "{{model}} p99 TTFT",
          "refId": "C"
        }
      ],
      "title": "Time to First Token (TTFT) Percentiles by Model",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "Seconds",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "smooth",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "s"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 10
      },
      "id": 7,
      "options": {
        "legend": {
          "calcs": ["mean", "max"],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "histogram_quantile(0.50, sum by (model, le) (rate(llm_request_duration_seconds_bucket[5m])))",
          "legendFormat": "{{model}} p50 Total",
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "histogram_quantile(0.95, sum by (model, le) (rate(llm_request_duration_seconds_bucket[5m])))",
          "legendFormat": "{{model}} p95 Total",
          "refId": "B"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "histogram_quantile(0.99, sum by (model, le) (rate(llm_request_duration_seconds_bucket[5m])))",
          "legendFormat": "{{model}} p99 Total",
          "refId": "C"
        }
      ],
      "title": "Total Request Duration Percentiles by Model",
      "type": "timeseries"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 18
      },
      "id": 8,
      "panels": [],
      "title": "Cost Attribution",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "$/hour",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "bars",
            "fillOpacity": 80,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "normal"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "currencyUSD"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 19
      },
      "id": 9,
      "options": {
        "legend": {
          "calcs": ["sum"],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "sum by (model, provider) (rate(llm_cost_total[1h])) * 3600",
          "legendFormat": "{{model}} ({{provider}})",
          "refId": "A"
        }
      ],
      "title": "Cost per Hour by Model and Provider",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 50
              },
              {
                "color": "red",
                "value": 200
              }
            ]
          },
          "unit": "currencyUSD"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 4,
        "w": 6,
        "x": 12,
        "y": 19
      },
      "id": 10,
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": {
          "calcs": ["lastNotNull"],
          "fields": "",
          "values": false
        },
        "textMode": "auto"
      },
      "pluginVersion": "10.0.0",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "sum(rate(llm_cost_total[1h])) * 86400",
          "legendFormat": "Est. Daily Cost",
          "refId": "A"
        }
      ],
      "title": "Estimated Daily Cost",
      "type": "stat"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 1000
              },
              {
                "color": "red",
                "value": 5000
              }
            ]
          },
          "unit": "currencyUSD"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 4,
        "w": 6,
        "x": 18,
        "y": 19
      },
      "id": 11,
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": {
          "calcs": ["lastNotNull"],
          "fields": "",
          "values": false
        },
        "textMode": "auto"
      },
      "pluginVersion": "10.0.0",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "sum(rate(llm_cost_total[1h])) * 86400 * 30",
          "legendFormat": "Est. Monthly Cost",
          "refId": "A"
        }
      ],
      "title": "Estimated Monthly Cost",
      "type": "stat"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 27
      },
      "id": 13,
      "panels": [],
      "title": "Error Rates & Health",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "Error Rate (%)",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 20,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "smooth",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "line"
            }
          },
          "mappings": [],
          "max": 100,
          "min": 0,
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 1
              },
              {
                "color": "red",
                "value": 5
              }
            ]
          },
          "unit": "percent"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 28
      },
      "id": 14,
      "options": {
        "legend": {
          "calcs": ["mean", "max"],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "100 * sum by (model) (rate(llm_requests_total{status_code=~\"4..|5..\"}[5m])) / sum by (model) (rate(llm_requests_total[5m]))",
          "legendFormat": "{{model}} Error Rate",
          "refId": "A"
        }
      ],
      "title": "Error Rate by Model (%)",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            }
          },
          "mappings": []
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 28
      },
      "id": 15,
      "options": {
        "displayLabels": ["name", "value"],
        "legend": {
          "displayMode": "table",
          "placement": "right",
          "showLegend": true,
          "values": ["value"]
        },
        "pieType": "pie",
        "reduceOptions": {
          "calcs": ["lastNotNull"],
          "fields": "",
          "values": false
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "sum by (status_code) (rate(llm_requests_total[5m]))",
          "legendFormat": "{{status_code}}",
          "refId": "A"
        }
      ],
      "title": "Status Code Distribution (5m rate)",
      "type": "piechart"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 36
      },
      "id": 16,
      "panels": [],
      "title": "Context Window Utilization",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "% of Max Context",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "smooth",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "line"
            }
          },
          "mappings": [],
          "max": 100,
          "min": 0,
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 80
              },
              {
                "color": "red",
                "value": 95
              }
            ]
          },
          "unit": "percent"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 24,
        "x": 0,
        "y": 37
      },
      "id": 17,
      "options": {
        "legend": {
          "calcs": ["mean", "max"],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "100 * avg by (model) (llm_context_tokens_used / llm_context_tokens_max)",
          "legendFormat": "{{model}} - Avg Context %",
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "100 * max by (model) (llm_context_tokens_used / llm_context_tokens_max)",
          "legendFormat": "{{model}} - Max Context %",
          "refId": "B"
        }
      ],
      "title": "Context Window Utilization by Model (%)",
      "type": "timeseries"
    }
  ],
  "refresh": "30s",
  "schemaVersion": 38,
  "style": "dark",
  "tags": ["llm", "observability", "grafana", "prometheus"],
  "templating": {
    "list": [
      {
        "current": {
          "selected": false,
          "text": "Prometheus",
          "value": "prometheus"
        },
        "hide": 0,
        "includeAll": false,
        "label": "Datasource",
        "multi": false,
        "name": "datasource",
        "options": [],
        "query": "prometheus",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      }
    ]
  },
  "time": {
    "from": "now-6h",
    "to": "now"
  },
  "timepicker": {
    "refresh_intervals": ["5s", "10s", "30s", "1m", "5m", "15m", "30m", "1h"]
  },
  "timezone": "browser",
  "title": "LLM Monitoring Dashboard",
  "uid": "llm-monitoring-dashboard",
  "version": 1,
  "weekStart": ""
}
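
If you would rather script the import than paste into the UI, the sketch below builds a request for Grafana's dashboard HTTP API (POST /api/dashboards/db) using only the Python standard library. The Grafana URL, the token, and the file name in the usage note are assumptions to adapt. Parsing the file first also surfaces JSON syntax errors, and the helper flags duplicate panel ids before anything is uploaded.

```python
import json
import urllib.request
from typing import Any, Dict, List


def duplicate_panel_ids(dashboard: Dict[str, Any]) -> List[int]:
    """Return panel ids that occur more than once among top-level panels."""
    seen, dupes = set(), []
    for panel in dashboard.get("panels", []):
        panel_id = panel.get("id")
        if panel_id in seen:
            dupes.append(panel_id)
        seen.add(panel_id)
    return dupes


def build_import_request(dashboard: Dict[str, Any], grafana_url: str,
                         api_token: str) -> urllib.request.Request:
    """Build (but do not send) the request that creates or updates the dashboard."""
    # "id": None lets Grafana assign the numeric id; the uid identifies the dashboard.
    payload = {"dashboard": {**dashboard, "id": None}, "overwrite": True}
    return urllib.request.Request(
        url=grafana_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
```

To use it, load the saved template with json.load (for example from a file you named llm-dashboard.json), check that duplicate_panel_ids returns an empty list, and pass the built request to urllib.request.urlopen.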

Prometheus Metrics Library for LLM Monitoring

The dashboard above requires these Prometheus metrics. Here's the metrics library you need to instrument your LLM application, defined with the Python prometheus_client library:

Core Metrics (prometheus_client)

from prometheus_client import Counter, Histogram, Gauge

# Token counters
llm_tokens_total = Counter(
    'llm_tokens_total',
    'Total tokens processed',
    ['model', 'provider', 'direction']  # direction: input | output
)

# Latency histograms
llm_ttft_seconds = Histogram(
    'llm_ttft_seconds',
    'Time to first token in seconds',
    ['model', 'provider'],
    buckets=[0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
)

llm_request_duration_seconds = Histogram(
    'llm_request_duration_seconds',
    'Total request duration in seconds',
    ['model', 'provider'],
    buckets=[0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0, 120.0]
)

# Cost counter (in USD)
llm_cost_total = Counter(
    'llm_cost_total',
    'Total API cost in USD',
    ['model', 'provider']
)

# Request counter
llm_requests_total = Counter(
    'llm_requests_total',
    'Total API requests',
    ['model', 'provider', 'status_code']
)

# Context window utilization
llm_context_tokens_used = Gauge(
    'llm_context_tokens_used',
    'Average tokens used per request in the last bucket',
    ['model', 'provider']
)

llm_context_tokens_max = Gauge(
    'llm_context_tokens_max',
    'Maximum context window size for the model',
    ['model', 'provider']
)

# Quality metrics (optional, for RAG/LLM-as-judge setups)
llm_quality_score = Gauge(
    'llm_quality_score',
    'Quality score from evaluation checks',
    ['model', 'provider', 'check_type']  # check_type: hallucination | relevance | coherence
)
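
Defining metrics is only half the job; each request also has to update them. The sketch below shows one way to do that around a provider call with prometheus_client. The record_call wrapper, the fake_call stub, and the model and provider names are hypothetical, and a private CollectorRegistry is used only so the example does not clash with metrics already registered under the same names.

```python
import time
from typing import Callable, Tuple

from prometheus_client import CollectorRegistry, Counter, Histogram

registry = CollectorRegistry()  # private registry for this sketch only
tokens_total = Counter(
    "llm_tokens_total", "Total tokens processed",
    ["model", "provider", "direction"], registry=registry)
requests_total = Counter(
    "llm_requests_total", "Total API requests",
    ["model", "provider", "status_code"], registry=registry)
request_duration = Histogram(
    "llm_request_duration_seconds", "Total request duration in seconds",
    ["model", "provider"], registry=registry,
    buckets=[0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0, 120.0])

# Hypothetical call shape: returns (status_code, input_tokens, output_tokens).
LLMCall = Callable[[], Tuple[int, int, int]]


def record_call(model: str, provider: str, call: LLMCall) -> int:
    """Run one provider call and update duration, request and token metrics."""
    start = time.perf_counter()
    status_code, input_tokens, output_tokens = call()
    elapsed = time.perf_counter() - start
    request_duration.labels(model=model, provider=provider).observe(elapsed)
    requests_total.labels(model=model, provider=provider,
                          status_code=str(status_code)).inc()
    tokens_total.labels(model=model, provider=provider,
                        direction="input").inc(input_tokens)
    tokens_total.labels(model=model, provider=provider,
                        direction="output").inc(output_tokens)
    return status_code


def fake_call() -> Tuple[int, int, int]:
    """Stand-in for a real provider request."""
    return 200, 120, 45


record_call("example-model", "example-provider", fake_call)
```

In a real service you would register the metrics once at import time and expose them with prometheus_client.start_http_server(9091), so that the port matches the scrape target.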

Complete Prometheus Scrape Config

# /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: 'llm-monitoring'
    static_configs:
      - targets: ['localhost:9091']  # your metrics endpoint
    metrics_path: '/metrics'
    scrape_interval: 15s
    scrape_timeout: 10s

OpenTelemetry Collector Config for LLM Metrics

If you route metrics through an OpenTelemetry Collector (recommended for production), here's a pipeline that scrapes your application's /metrics endpoint and ships the LLM metrics to Prometheus:

# otel-collector-config.yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'llm-apps'
          static_configs:
            - targets: ['llm-app:9091']
          metrics_path: '/metrics'

processors:
  batch:
    timeout: 10s
    send_batch_size: 1000

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
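    # No namespace here: the metric names already start with llm_, and a
    # namespace of 'llm' would export them as llm_llm_* and break the queries above.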
    const_labels:
      service: 'llm-monitoring'

  prometheusremotewrite:
    endpoint: "https://your-prometheus-remote-write-endpoint/api/v1/write"
    # Add remote write auth here

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [prometheus, prometheusremotewrite]

Tool Spotlight: Grafana Cloud (Pre-built LLM Monitoring Dashboards)

Grafana Cloud free tier includes 10K series and 50GB logs — enough for small production LLM deployments.

Alerting Rules for LLM Production

# /etc/prometheus/alerts/llm-alerts.yml
groups:
  - name: llm-alerts
    rules:
      - alert: LLMP95LatencyHigh
        expr: histogram_quantile(0.95, sum by (model, le) (rate(llm_ttft_seconds_bucket[5m]))) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "LLM p95 TTFT above 5s for {{ $labels.model }}"
          
      - alert: LLMErrorRateHigh
        expr: 100 * sum(rate(llm_requests_total{status_code=~"5.."}[5m])) / sum(rate(llm_requests_total[5m])) > 1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "LLM 5xx error rate above 1%"
          
      - alert: LLMContextWindowNearLimit
        expr: 100 * avg(llm_context_tokens_used / llm_context_tokens_max) > 90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Context window utilization above 90% — consider truncation or smarter chunking"
          
      - alert: LLMDailyCostSpike
        expr: sum(rate(llm_cost_total[1h])) * 86400 > 500
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Estimated daily LLM cost projected above $500"

What to Monitor Beyond These Templates

These dashboard templates cover the fundamentals. As your LLM application matures, add panels for:

  1. Prompt version tracking — Label requests with prompt_version to correlate output quality changes with prompt changes
  2. Retrieval relevance scores — For RAG pipelines, track llm_rag_relevance_score to monitor embedding model degradation
  3. Model rollouts — Track llm_model_version to detect behavioral regressions when updating to new model versions
  4. Cache hit rates — If using semantic caching, track llm_cache_hit_rate to measure cache efficiency
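
As a rough illustration of two items in the list above, the Python sketch below groups quality scores by prompt version and computes a cache hit rate. The record shape and function names are assumptions for the example, not an established schema.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def mean_score_by_prompt_version(
        records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average quality score per prompt_version label value."""
    grouped = defaultdict(list)
    for prompt_version, score in records:
        grouped[prompt_version].append(score)
    return {version: mean(scores) for version, scores in grouped.items()}


def cache_hit_rate(hits: int, misses: int) -> float:
    """Share of lookups served from the semantic cache (0.0 when idle)."""
    total = hits + misses
    return hits / total if total else 0.0


# Hypothetical (prompt_version, quality score) pairs from an evaluator.
scores = mean_score_by_prompt_version([("v1", 0.9), ("v1", 0.7), ("v2", 0.5)])
hit_rate = cache_hit_rate(hits=30, misses=10)
```

A sustained gap between versions in the first result is the kind of signal that points to a prompt change rather than a model change.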