LLM Monitoring Dashboard Templates: Grafana + Prometheus

Why Dashboard Templates Matter for LLM Monitoring

Your LLM application is live. Requests are flowing. But can you answer these questions at 3 AM during an incident?

Which model is consuming 80% of your GPU budget right now?
What's the p95 time-to-first-token for your RAG pipeline?
Which prompt version is causing hallucination spikes?
What's your actual cost per 1,000 successful completions?

If you can't, you need a monitoring dashboard. Not a generic APM dashboard retrofitted for AI — a purpose-built LLM observability dashboard that tracks what matters: tokens, latency breakdowns, cost attribution, and quality signals.

This article gives you production-ready Grafana dashboard templates and Prometheus query libraries for LLM monitoring. Copy the JSON, import it into your Grafana instance, and have a working LLM dashboard in under 30 minutes.

The Five Panels Every LLM Dashboard Needs

Before diving into templates, here's the metric taxonomy that every LLM monitoring dashboard should cover:

1. Token Throughput Panel

Tracks tokens processed per second, split by input and output. This shows load volume and surfaces traffic anomalies.

Prometheus metric: llm_tokens_total (counter, labels: model, provider, direction)

sum by (direction) (rate(llm_tokens_total[5m]))
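To make the per-second unit concrete, here is a minimal Python sketch of what rate() computes from raw counter samples. It is a simplification under stated assumptions: real Prometheus also extrapolates to the window boundaries, and the name simple_rate and the sample values are illustrative, not part of any library.

```python
from typing import List, Tuple


def simple_rate(samples: List[Tuple[float, float]]) -> float:
    """Approximate PromQL rate(): per-second increase of a counter.

    samples is a time-ordered list of (unix_seconds, counter_value).
    A drop in value is treated as a counter reset (process restart).
    Unlike Prometheus, this sketch does not extrapolate to window edges.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the whole
        # current value counts as new increase.
        increase += cur if cur < prev else cur - prev
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical llm_tokens_total samples scraped every 60 s, with one restart
# between the second and third scrape.
samples = [(0, 100), (60, 160), (120, 40), (180, 100)]
tokens_per_second = simple_rate(samples)
print(f"{tokens_per_second:.3f} tokens/sec")
```

The reset handling is why a raw counter value is a poor dashboard number on its own: the counter drops to zero on every restart, while the rate stays meaningful.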

2. Latency Breakdown Panel

LLM latency has three components that must be measured separately:

Time to First Token (TTFT): How fast the model starts responding
Time per Output Token (TPOT): Generation speed per token
Total Request Duration: End-to-end latency

Prometheus metrics: llm_ttft_seconds, llm_tpot_seconds, llm_request_duration_seconds

Query for p95 TTFT:

histogram_quantile(0.95, sum by (le) (rate(llm_ttft_seconds_bucket[5m])))
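Because histogram_quantile() estimates the percentile from bucket counts, the result is only as precise as the bucket boundaries. The following Python sketch mimics the interpolation in simplified form (it assumes positive bucket bounds and 0 < q <= 1). The function name bucket_quantile and the bucket values are hypothetical.

```python
import math
from typing import List, Tuple


def bucket_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative histogram buckets.

    buckets holds (upper_bound, cumulative_count) pairs, including the
    +Inf bucket, mirroring the le label on *_bucket series.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[idx]
    if math.isinf(upper):
        # The rank lies beyond the largest finite bound: report that bound.
        return buckets[-2][0]
    lower, prev_count = buckets[idx - 1] if idx > 0 else (0.0, 0.0)
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)


# Hypothetical TTFT distribution: 3% of requests took longer than 10 s.
ttft_buckets = [(0.1, 0), (1.0, 50), (10.0, 97), (math.inf, 100)]
p50 = bucket_quantile(0.50, ttft_buckets)
p99 = bucket_quantile(0.99, ttft_buckets)
print(p50, p99)
```

In this example the p99 is reported as 10.0 even though the real value is higher, because everything above the last finite bucket is indistinguishable. If your slow requests land in the +Inf bucket, add larger bucket bounds.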

3. Cost Attribution Panel

LLM providers bill per token. You need cost visibility by model and provider; add an endpoint label to the metric if you also need per-endpoint attribution.

Prometheus metric: llm_cost_total (counter, labels: model, provider)

Query — cost per hour by model:

sum by (model) (rate(llm_cost_total[1h])) * 3600
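The cost counter has to be fed by your application, since providers return token counts rather than dollar amounts. A minimal sketch of that calculation follows. The model names and prices are hypothetical placeholders, and request_cost_usd is an illustrative helper, not part of any SDK.

```python
from typing import Dict, Tuple

# Hypothetical prices in USD per 1,000,000 tokens: (input, output).
# Replace with your provider's current price sheet.
PRICES_PER_MILLION: Dict[str, Tuple[float, float]] = {
    "example-small": (0.50, 1.50),
    "example-large": (5.00, 15.00),
}


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request: the amount to add to llm_cost_total."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


cost = request_cost_usd("example-large", input_tokens=1200, output_tokens=300)
print(f"${cost:.4f}")

# With prometheus_client, the value would then be recorded as:
#   llm_cost_total.labels(model="example-large", provider="example").inc(cost)
```

Keeping prices in one table means a price change is a one-line edit, though historical values already written to the counter are not recomputed.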

4. Error Rate Panel

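LLM errors include rate-limit responses (429), context-length-exceeded errors (400), server errors (5xx), and malformed outputs. Malformed outputs usually arrive with a 200 status, so the status-code query below will not count them unless you record them separately.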

Prometheus metric: llm_requests_total (counter, labels: model, provider, status_code)

Query — error rate:

sum(rate(llm_requests_total{status_code=~"4..|5.."}[5m])) / sum(rate(llm_requests_total[5m]))

5. Quality Signals Panel

Track semantic quality indicators: hallucination rate (if measured), retrieval precision (for RAG systems), and output validation pass rates.

Prometheus metric: llm_quality_score (gauge, labels: check_type)

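Query for the hallucination score, averaged over 1-hour windows (set the dashboard time range to 7 days to see the weekly trend):

avg by (check_type) (avg_over_time(llm_quality_score{check_type="hallucination"}[1h]))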

Grafana Dashboard JSON Template

This is a complete Grafana dashboard JSON for LLM monitoring. It includes:

Row 1: Token throughput (input vs output, by model)
Row 2: Latency percentiles (p50, p95, p99) for TTFT and total duration
Row 3: Cost attribution by model and provider
Row 4: Error rate and status code distribution
Row 5: Context window utilization (for models with context length limits)

How to Import

Copy the JSON below
In Grafana: Dashboards → Import → Paste JSON
Set your Prometheus data source
Adjust the panel queries if your metric or label names differ (the template defines only a $datasource variable, not $model or $provider)

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 1,
  "id": null,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 1,
      "panels": [],
      "title": "Token Throughput",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "Tokens/sec",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 20,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "smooth",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "normal"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "short"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "Input Tokens"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "blue",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "Output Tokens"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "green",
                  "mode": "fixed"
                }
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 1
      },
      "id": 2,
      "options": {
        "legend": {
          "calcs": ["mean", "max"],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "sum by (model) (rate(llm_tokens_total{direction=\"input\"}[5m]))",
          "legendFormat": "{{model}} - Input",
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "sum by (model) (rate(llm_tokens_total{direction=\"output\"}[5m]))",
          "legendFormat": "{{model}} - Output",
          "refId": "B"
        }
      ],
      "title": "Token Throughput by Model (5m rate)",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 50000
              },
              {
                "color": "red",
                "value": 100000
              }
            ]
          },
          "unit": "short"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 6,
        "x": 12,
        "y": 1
      },
      "id": 3,
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": {
          "calcs": ["lastNotNull"],
          "fields": "",
          "values": false
        },
        "textMode": "auto"
      },
      "pluginVersion": "10.0.0",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "sum(increase(llm_tokens_total[$__range]))",
          "legendFormat": "Total Tokens",
          "refId": "A"
        }
      ],
      "title": "Total Tokens Processed (Selected Time Range)",
      "type": "stat"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 9
      },
      "id": 5,
      "panels": [],
      "title": "Latency",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "Seconds",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "smooth",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "line"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 2
              },
              {
                "color": "red",
                "value": 5
              }
            ]
          },
          "unit": "s"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 10
      },
      "id": 6,
      "options": {
        "legend": {
          "calcs": ["mean", "max"],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "histogram_quantile(0.50, sum by (model, le) (rate(llm_ttft_seconds_bucket[5m])))",
          "legendFormat": "{{model}} p50 TTFT",
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "histogram_quantile(0.95, sum by (model, le) (rate(llm_ttft_seconds_bucket[5m])))",
          "legendFormat": "{{model}} p95 TTFT",
          "refId": "B"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "histogram_quantile(0.99, sum by (model, le) (rate(llm_ttft_seconds_bucket[5m])))",
          "legendFormat": "{{model}} p99 TTFT",
          "refId": "C"
        }
      ],
      "title": "Time to First Token (TTFT) Percentiles by Model",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "Seconds",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "smooth",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "s"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 10
      },
      "id": 7,
      "options": {
        "legend": {
          "calcs": ["mean", "max"],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "histogram_quantile(0.50, sum by (model, le) (rate(llm_request_duration_seconds_bucket[5m])))",
          "legendFormat": "{{model}} p50 Total",
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "histogram_quantile(0.95, sum by (model, le) (rate(llm_request_duration_seconds_bucket[5m])))",
          "legendFormat": "{{model}} p95 Total",
          "refId": "B"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "histogram_quantile(0.99, sum by (model, le) (rate(llm_request_duration_seconds_bucket[5m])))",
          "legendFormat": "{{model}} p99 Total",
          "refId": "C"
        }
      ],
      "title": "Total Request Duration Percentiles by Model",
      "type": "timeseries"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 18
      },
      "id": 8,
      "panels": [],
      "title": "Cost Attribution",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "$/hour",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "bars",
            "fillOpacity": 80,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "normal"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "currencyUSD"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 19
      },
      "id": 9,
      "options": {
        "legend": {
          "calcs": ["sum"],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "sum by (model, provider) (rate(llm_cost_total[1h])) * 3600",
          "legendFormat": "{{model}} ({{provider}})",
          "refId": "A"
        }
      ],
      "title": "Cost per Hour by Model and Provider",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 50
              },
              {
                "color": "red",
                "value": 200
              }
            ]
          },
          "unit": "currencyUSD"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 4,
        "w": 6,
        "x": 12,
        "y": 19
      },
      "id": 10,
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": {
          "calcs": ["lastNotNull"],
          "fields": "",
          "values": false
        },
        "textMode": "auto"
      },
      "pluginVersion": "10.0.0",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "sum(rate(llm_cost_total[1h])) * 86400",
          "legendFormat": "Est. Daily Cost",
          "refId": "A"
        }
      ],
      "title": "Estimated Daily Cost",
      "type": "stat"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 1000
              },
              {
                "color": "red",
                "value": 5000
              }
            ]
          },
          "unit": "currencyUSD"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 4,
        "w": 6,
        "x": 18,
        "y": 19
      },
      "id": 11,
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": {
          "calcs": ["lastNotNull"],
          "fields": "",
          "values": false
        },
        "textMode": "auto"
      },
      "pluginVersion": "10.0.0",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "sum(rate(llm_cost_total[1h])) * 86400 * 30",
          "legendFormat": "Est. Monthly Cost",
          "refId": "A"
        }
      ],
      "title": "Estimated Monthly Cost",
      "type": "stat"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 27
      },
      "id": 13,
      "panels": [],
      "title": "Error Rates & Health",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "Error Rate (%)",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 20,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "smooth",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "line"
            }
          },
          "mappings": [],
          "max": 100,
          "min": 0,
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 1
              },
              {
                "color": "red",
                "value": 5
              }
            ]
          },
          "unit": "percent"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 28
      },
      "id": 14,
      "options": {
        "legend": {
          "calcs": ["mean", "max"],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "100 * sum by (model) (rate(llm_requests_total{status_code=~\"4..|5..\"}[5m])) / sum by (model) (rate(llm_requests_total[5m]))",
          "legendFormat": "{{model}} Error Rate",
          "refId": "A"
        }
      ],
      "title": "Error Rate by Model (%)",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            }
          },
          "mappings": []
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 28
      },
      "id": 15,
      "options": {
        "displayLabels": ["name", "value"],
        "legend": {
          "displayMode": "table",
          "placement": "right",
          "showLegend": true,
          "values": ["value"]
        },
        "pieType": "pie",
        "reduceOptions": {
          "calcs": ["lastNotNull"],
          "fields": "",
          "values": false
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "sum by (status_code) (rate(llm_requests_total[5m]))",
          "legendFormat": "{{status_code}}",
          "refId": "A"
        }
      ],
      "title": "Status Code Distribution (5m rate)",
      "type": "piechart"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 36
      },
      "id": 16,
      "panels": [],
      "title": "Context Window Utilization",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "% of Max Context",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "smooth",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "line"
            }
          },
          "mappings": [],
          "max": 100,
          "min": 0,
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 80
              },
              {
                "color": "red",
                "value": 95
              }
            ]
          },
          "unit": "percent"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 24,
        "x": 0,
        "y": 37
      },
      "id": 17,
      "options": {
        "legend": {
          "calcs": ["mean", "max"],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "100 * avg by (model) (llm_context_tokens_used / llm_context_tokens_max)",
          "legendFormat": "{{model}} - Avg Context %",
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "expr": "100 * max by (model) (llm_context_tokens_used / llm_context_tokens_max)",
          "legendFormat": "{{model}} - Max Context %",
          "refId": "B"
        }
      ],
      "title": "Context Window Utilization by Model (%)",
      "type": "timeseries"
    }
  ],
  "refresh": "30s",
  "schemaVersion": 38,
  "style": "dark",
  "tags": ["llm", "observability", "grafana", "prometheus"],
  "templating": {
    "list": [
      {
        "current": {
          "selected": false,
          "text": "Prometheus",
          "value": "prometheus"
        },
        "hide": 0,
        "includeAll": false,
        "label": "Datasource",
        "multi": false,
        "name": "datasource",
        "options": [],
        "query": "prometheus",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      }
    ]
  },
  "time": {
    "from": "now-6h",
    "to": "now"
  },
  "timepicker": {
    "refresh_intervals": ["5s", "10s", "30s", "1m", "5m", "15m", "30m", "1h"]
  },
  "timezone": "browser",
  "title": "LLM Monitoring Dashboard",
  "uid": "llm-monitoring-dashboard",
  "version": 1,
  "weekStart": ""
}
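Before importing, it can help to confirm that the file parses as JSON and to list every PromQL expression it contains, so you can compare the metric names against what your application exports. The sketch below uses only the standard library. The helper names are illustrative, and it reads top-level panels only, which matches this template because its rows are not collapsed.

```python
import json
from typing import Dict, Iterator, Tuple


def load_dashboard(text: str) -> Dict:
    """Parse dashboard JSON; json.JSONDecodeError pinpoints syntax errors."""
    return json.loads(text)


def iter_panel_queries(dashboard: Dict) -> Iterator[Tuple[str, str]]:
    """Yield (panel title, PromQL expression) for every query target."""
    for panel in dashboard.get("panels", []):
        # Row panels carry no "targets" key, so they are skipped naturally.
        for target in panel.get("targets", []):
            yield panel.get("title", "<untitled>"), target.get("expr", "")


# Tiny stand-in for the full template, to keep the example short.
sample = """{"panels": [
  {"type": "row", "title": "Token Throughput", "panels": []},
  {"type": "stat", "title": "Total Tokens",
   "targets": [{"expr": "sum(llm_tokens_total)", "refId": "A"}]}
]}"""

queries = list(iter_panel_queries(load_dashboard(sample)))
for title, expr in queries:
    print(f"{title}: {expr}")
```

For the real template, read the saved file with open(path).read() and pass the text to load_dashboard. An unescaped quote inside an "expr" string is the most common reason the parse step fails.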

Prometheus Metrics Library for LLM Monitoring

The dashboard above requires these Prometheus metrics. Here's the metrics library you need to instrument your LLM application, written with the Python prometheus_client library:

Core Metrics (Python `prometheus_client`)

from prometheus_client import Counter, Histogram, Gauge

# Token counters
llm_tokens_total = Counter(
    'llm_tokens_total',
    'Total tokens processed',
    ['model', 'provider', 'direction']  # direction: input | output
)

# Latency histograms
llm_ttft_seconds = Histogram(
    'llm_ttft_seconds',
    'Time to first token in seconds',
    ['model', 'provider'],
    buckets=[0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
)

llm_request_duration_seconds = Histogram(
    'llm_request_duration_seconds',
    'Total request duration in seconds',
    ['model', 'provider'],
    buckets=[0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0, 120.0]
)

llm_tpot_seconds = Histogram(
    'llm_tpot_seconds',
    'Time per output token in seconds',
    ['model', 'provider'],
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5]  # assumed starting points; tune to your models
)

# Cost counter (in USD)
llm_cost_total = Counter(
    'llm_cost_total',
    'Total API cost in USD',
    ['model', 'provider']
)

# Request counter
llm_requests_total = Counter(
    'llm_requests_total',
    'Total API requests',
    ['model', 'provider', 'status_code']
)

# Context window utilization
llm_context_tokens_used = Gauge(
    'llm_context_tokens_used',
    'Average tokens used per request in the last bucket',
    ['model', 'provider']
)

llm_context_tokens_max = Gauge(
    'llm_context_tokens_max',
    'Maximum context window size for the model',
    ['model', 'provider']
)

# Quality metrics (optional, for RAG/LLM-as-judge setups)
llm_quality_score = Gauge(
    'llm_quality_score',
    'Quality score from evaluation checks',
    ['model', 'provider', 'check_type']  # check_type: hallucination | relevance | coherence
)
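To feed the latency histograms, TTFT and TPOT have to be measured around a streaming response. The following is a minimal sketch, assuming each streamed chunk counts as one token and defining TPOT as the time between the first and last chunk divided by the number of remaining chunks. time_stream and StreamTimings are hypothetical names, and the injectable clock exists only to make the example deterministic.

```python
import time
from typing import Callable, Iterable, NamedTuple


class StreamTimings(NamedTuple):
    ttft_seconds: float
    tpot_seconds: float
    duration_seconds: float
    output_chunks: int


def time_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.monotonic,
) -> StreamTimings:
    """Consume a streamed response and measure its latency components."""
    start = clock()
    first = last = None
    count = 0
    for _ in chunks:
        last = clock()
        if first is None:
            first = last  # arrival of the first chunk defines TTFT
        count += 1
    end = clock()
    if first is None:
        raise ValueError("stream produced no chunks")
    # TPOT is averaged over the chunks that followed the first one.
    tpot = (last - first) / (count - 1) if count > 1 else 0.0
    return StreamTimings(first - start, tpot, end - start, count)


# Deterministic example with a fake clock instead of a real model call.
fake_times = iter([0.0, 0.5, 0.6, 0.7, 0.8])
timings = time_stream(["Hel", "lo", "!"], clock=lambda: next(fake_times))
print(timings)

# The values would then be recorded with prometheus_client, for example:
#   llm_ttft_seconds.labels(model=..., provider=...).observe(timings.ttft_seconds)
#   llm_request_duration_seconds.labels(model=..., provider=...).observe(timings.duration_seconds)
```

Using a monotonic clock avoids negative durations when the system clock is adjusted mid-request. If your provider reports exact output token counts, divide by those instead of the chunk count.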

Complete Prometheus Scrape Config

# /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: 'llm-monitoring'
    static_configs:
      - targets: ['localhost:9091']  # your metrics endpoint
    metrics_path: '/metrics'
    scrape_interval: 15s
    scrape_timeout: 10s
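Once the scrape target is up, a quick check that the endpoint exposes the series the dashboard queries can save debugging time. The sketch below parses Prometheus text exposition output with the standard library only. The helper names and the sample text are illustrative, and it inspects sample lines rather than HELP/TYPE comments.

```python
from typing import Iterable, Set


def exposed_series_names(exposition_text: str) -> Set[str]:
    """Collect the series names present in Prometheus text exposition output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The series name ends at the first '{' (labels) or whitespace (value).
        names.add(line.split("{", 1)[0].split()[0])
    return names


def missing_series(exposition_text: str, expected: Iterable[str]) -> Set[str]:
    """Return the expected series names that the endpoint does not expose."""
    return set(expected) - exposed_series_names(exposition_text)


# Hypothetical /metrics output. In practice, fetch it with
# urllib.request.urlopen("http://localhost:9091/metrics").read().decode()
sample = """\
# HELP llm_tokens_total Total tokens processed
# TYPE llm_tokens_total counter
llm_tokens_total{direction="input",model="example-large",provider="example"} 1200.0
llm_ttft_seconds_bucket{le="0.5",model="example-large",provider="example"} 3.0
"""

missing = missing_series(
    sample, ["llm_tokens_total", "llm_ttft_seconds_bucket", "llm_cost_total"]
)
print(sorted(missing))
```

Note that prometheus_client emits no samples for a labeled metric until at least one label combination has been used, so a metric can be reported missing simply because no request has exercised it yet.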

OpenTelemetry Collector Config for LLM Metrics

If you route metrics through an OpenTelemetry Collector (recommended for production), here's a Collector pipeline that scrapes the application's /metrics endpoint and ships LLM metrics to Prometheus:

# otel-collector-config.yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'llm-apps'
          static_configs:
            - targets: ['llm-app:9091']
          metrics_path: '/metrics'

processors:
  batch:
    timeout: 10s
    send_batch_size: 1000

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    # No `namespace` here: it would prefix every metric name
    # (llm_llm_tokens_total) and break the dashboard queries.
    const_labels:
      service: 'llm-monitoring'

  prometheusremotewrite:
    endpoint: "https://your-prometheus-remote-write-endpoint/api/v1/write"
    # Add remote write auth here

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [prometheus, prometheusremotewrite]

Tool Spotlight: Grafana Cloud (Pre-built LLM Monitoring Dashboards)

Grafana Cloud free tier includes 10K series and 50GB logs — enough for small production LLM deployments.

Alerting Rules for LLM Production

# /etc/prometheus/alerts/llm-alerts.yml
groups:
  - name: llm-alerts
    rules:
      - alert: LLMP95LatencyHigh
        expr: histogram_quantile(0.95, sum by (model, le) (rate(llm_ttft_seconds_bucket[5m]))) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "LLM p95 TTFT above 5s for {{ $labels.model }}"
          
      - alert: LLMErrorRateHigh
        expr: 100 * sum(rate(llm_requests_total{status_code=~"5.."}[5m])) / sum(rate(llm_requests_total[5m])) > 1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "LLM 5xx error rate above 1%"
          
      - alert: LLMContextWindowNearLimit
        expr: 100 * avg(llm_context_tokens_used / llm_context_tokens_max) > 90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Context window utilization above 90% — consider truncation or smarter chunking"
          
      - alert: LLMDailyCostSpike
        expr: sum(rate(llm_cost_total[1h])) * 86400 > 500
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Estimated daily LLM cost projected above $500"

What to Monitor Beyond These Templates

These dashboard templates cover the fundamentals. As your LLM application matures, add panels for:

Prompt version tracking — Label requests with prompt_version to correlate output quality changes with prompt changes
Retrieval relevance scores — For RAG pipelines, track llm_rag_relevance_score to monitor embedding model degradation
Model rollouts — Track llm_model_version to detect behavioral regressions when updating to new model versions
Cache hit rates — If using semantic caching, track llm_cache_hit_rate to measure cache efficiency