Alert Provisioning
Introduction
Alert provisioning in Grafana allows you to automate the creation and management of alerting rules and notification policies through configuration files rather than the UI. This approach is particularly valuable for large-scale deployments, consistent environments, and infrastructure-as-code practices.
In this guide, we'll explore how to provision alerts in Grafana, the benefits of this approach, and practical examples to help you implement alert provisioning in your own monitoring environments.
What is Alert Provisioning?
Alert provisioning is the practice of defining your Grafana alerting configuration as code or configuration files that can be version-controlled, automated, and deployed consistently across multiple Grafana instances. This contrasts with manual alert creation through the Grafana UI.
Key Benefits
- Version Control: Track changes to your alerting configuration over time
- Automation: Integrate with CI/CD pipelines for automated deployments
- Consistency: Ensure identical alert configurations across environments
- Scalability: Efficiently manage large numbers of alerts
- Disaster Recovery: Quickly restore alert configurations if needed
Alert Provisioning Methods
This guide covers two methods for provisioning alerts (Grafana also offers an alerting provisioning HTTP API, which is not covered here):
- File-based provisioning: Using YAML configuration files
- Terraform: Using the Grafana Terraform provider
Let's explore each approach.
File-Based Alert Provisioning
Grafana can load alert rules and notification policies from YAML files stored in the alerting folder of its provisioning directory.
Directory Structure
For file-based provisioning, Grafana reads configuration files from the provisioning/alerting directory. One possible layout keeps rules and notification policies in separate subdirectories:
/etc/grafana/provisioning/alerting/
├── rules/
│   ├── cpu_alerts.yaml
│   └── memory_alerts.yaml
└── notification_policies/
    └── team_policies.yaml
Provisioning Alert Rules
Alert rules are defined in YAML files with a specific structure. Here's an example:
apiVersion: 1
groups:
  - name: CPU Usage Alerts
    folder: Infrastructure
    interval: 60s
    rules:
      - uid: high_cpu_usage  # illustrative UID; must be unique per rule
        title: High CPU Usage
        condition: B
        data:
          - refId: A
            datasourceUid: PBFA97CFB590B2093
            relativeTimeRange:
              from: 600
              to: 0
            model:
              expr: avg by(instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])) * 100
              refId: A
          - refId: B
            datasourceUid: __expr__
            model:
              conditions:
                - evaluator:
                    params: [80]
                    type: gt
                  operator:
                    type: and
                  query:
                    params: [A]
                  reducer:
                    type: avg
              type: classic_conditions
              refId: B
        noDataState: OK
        execErrState: Error
        for: 5m
        labels:
          severity: warning
          category: system
        annotations:
          summary: High CPU usage on {{ $labels.instance }}
          description: "CPU usage has been above 80% for 5 minutes. Current value: {{ $value }}%"
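A rule can only evaluate if its condition points at one of its own data entries, so it can be worth checking this before a file reaches Grafana. The following is a minimal sketch of such a check, assuming the rule group has already been parsed into a Python dict (parsing the YAML is left out). The function name `check_rule_group` and the set of required keys are assumptions made for illustration, not part of Grafana.

```python
def check_rule_group(group: dict) -> list:
    """Return a list of human-readable problems found in one rule group."""
    problems = []

    # Keys every group in the example above carries (assumed minimum set).
    for key in ("name", "folder", "interval", "rules"):
        if key not in group:
            problems.append(f"group is missing '{key}'")

    for rule in group.get("rules", []):
        label = rule.get("title") or rule.get("name") or "<unnamed rule>"
        ref_ids = [query.get("refId") for query in rule.get("data", [])]

        # Each query or expression needs its own refId.
        if len(ref_ids) != len(set(ref_ids)):
            problems.append(f"{label}: duplicate refId in data")

        # The condition must name one of the refIds defined in data.
        condition = rule.get("condition")
        if condition not in ref_ids:
            problems.append(f"{label}: condition {condition!r} does not match any refId")

    return problems


# Hypothetical parsed form of a rule group, reduced to the checked fields.
example_group = {
    "name": "CPU Usage Alerts",
    "folder": "Infrastructure",
    "interval": "60s",
    "rules": [
        {
            "title": "High CPU Usage",
            "condition": "B",
            "data": [{"refId": "A"}, {"refId": "B"}],
        }
    ],
}

for problem in check_rule_group(example_group):
    print(problem)
```

A check like this could run in a CI pipeline before the files are deployed. It only catches structural slips and does not replace Grafana's own validation.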
Provisioning Notification Policies
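Notification policies define how alerts are routed to contact points, which the configuration refers to as receivers. Each receiver must name a contact point that already exists. Here's an example: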
apiVersion: 1
policies:
  - receiver: default-email
    group_by: ['alertname']
    routes:
      - receiver: ops-team-slack
        group_by: ['alertname', 'instance']
        matchers:
          - severity =~ "critical|warning"
          - category = "system"
        group_wait: 30s
        group_interval: 5m
        repeat_interval: 4h
        mute_time_intervals:
          - weekends
      - receiver: dev-team-slack
        group_by: ['alertname', 'job']
        matchers:
          - category = "application"
        group_wait: 45s
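To see how a policy tree like this picks a receiver, it can help to model the routing by hand. The sketch below is a simplified model of Alertmanager-style routing, under these assumptions: child routes are tried in order, the first child whose matchers all match takes the alert, and an alert that matches no child falls back to the parent's receiver. It ignores grouping, timing, mute intervals, and the `continue` option, and it assumes every route names its own receiver. The helper names `matches` and `pick_receiver` are hypothetical.

```python
import re

# Parses matchers such as: severity =~ "critical|warning"
MATCHER_RE = re.compile(r'^\s*([A-Za-z_]\w*)\s*(=~|!~|!=|=)\s*"?(.*?)"?\s*$')


def matches(matcher: str, labels: dict) -> bool:
    """Evaluate one matcher string against an alert's labels."""
    parsed = MATCHER_RE.match(matcher)
    if parsed is None:
        raise ValueError(f"cannot parse matcher: {matcher!r}")
    name, op, value = parsed.groups()
    actual = labels.get(name, "")  # a missing label behaves like an empty value
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    # Regex matchers are anchored: the whole label value must match.
    hit = re.fullmatch(value, actual) is not None
    return hit if op == "=~" else not hit


def pick_receiver(route: dict, labels: dict) -> str:
    """Return the receiver of the deepest matching route."""
    for child in route.get("routes", []):
        if all(matches(m, labels) for m in child.get("matchers", [])):
            return pick_receiver(child, labels)
    return route["receiver"]


# The policy tree from the example, reduced to receivers and matchers.
policy = {
    "receiver": "default-email",
    "routes": [
        {
            "receiver": "ops-team-slack",
            "matchers": ['severity =~ "critical|warning"', 'category = "system"'],
        },
        {
            "receiver": "dev-team-slack",
            "matchers": ['category = "application"'],
        },
    ],
}

print(pick_receiver(policy, {"severity": "warning", "category": "system"}))
print(pick_receiver(policy, {"severity": "info", "category": "system"}))
```

In this model the first alert goes to ops-team-slack because both matchers hold, while the second falls back to default-email because its severity matches neither alternative in the regex.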
Provisioning with Terraform
For teams already using Terraform for infrastructure management, Grafana's Terraform provider offers another way to provision alerts.
Installing the Grafana Terraform Provider
Add the following to your Terraform configuration:
terraform {
  required_providers {
    grafana = {
      source = "grafana/grafana"
      version = "1.28.0"
    }
  }
}
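# Declares the credential referenced by the provider block below.
variable "grafana_api_key" {
  description = "API key or service account token used to authenticate against Grafana"
  type        = string
  sensitive   = true
}
provider "grafana" {
  url  = "http://grafana.example.com"
  auth = var.grafana_api_key
}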
Creating Alert Rules with Terraform
Here's an example of defining an alert rule with Terraform:
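# Folder that holds the rule group; referenced through folder_uid below.
resource "grafana_folder" "infrastructure" {
  title = "Infrastructure"
}
resource "grafana_rule_group" "cpu_alerts" {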
  name             = "CPU Usage Alerts"
  folder_uid       = grafana_folder.infrastructure.uid
  interval_seconds = 60
  
  rule {
    name           = "High CPU Usage"
    for            = "5m"
    condition      = "B"
    no_data_state  = "OK"
    exec_err_state = "Error"
    
    data {
      ref_id = "A"
      datasource_uid = "PBFA97CFB590B2093"
      
      relative_time_range {
        from = 600
        to   = 0
      }
      
      model = jsonencode({
        expr = "avg by(instance) (rate(node_cpu_seconds_total{mode!=\"idle\"}[5m])) * 100"
        refId = "A"
      })
    }
    
    data {
      ref_id = "B"
      datasource_uid = "__expr__"
      
      relative_time_range {
        from = 0
        to   = 0
      }
      
      model = jsonencode({
        conditions = [{
          evaluator = {
            params = [80]
            type   = "gt"
          }
          operator = {
            type = "and"
          }
          query = {
            params = ["A"]
          }
          reducer = {
            type = "avg"
          }
        }]
        type  = "classic_conditions"
        refId = "B"
      })
    }
    
    labels = {
      severity = "warning"
      category = "system"
    }
    
    annotations = {
      summary     = "High CPU usage on {{ $labels.instance }}"
      description = "CPU usage has been above 80% for 5 minutes. Current value: {{ $value }}%"
    }
  }
}
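The rule above is evaluated every 60 seconds (`interval_seconds`) and must stay above the threshold for 5 minutes (`for`) before it fires. The following sketch shows how those two settings interact, assuming a simplified model in which a rule becomes Pending on the first breaching evaluation and Alerting once the breach has lasted at least the `for` duration. It illustrates the timing only and is not Grafana's scheduler. The function `simulate` and its parameters are hypothetical.

```python
def simulate(values, threshold=80.0, interval_s=60, for_s=300):
    """Return (time_s, state) for each evaluation of a threshold rule."""
    breach_started = None  # time of the first evaluation in the current breach
    states = []
    for i, value in enumerate(values):
        now = i * interval_s
        if value > threshold:
            if breach_started is None:
                breach_started = now
            # Fire only once the breach has lasted the full 'for' duration.
            if now - breach_started >= for_s:
                state = "Alerting"
            else:
                state = "Pending"
        else:
            # Any evaluation below the threshold resets the pending period.
            breach_started = None
            state = "Normal"
        states.append((now, state))
    return states


# One CPU reading per evaluation: the breach starts at t=60s.
readings = [50, 90, 90, 90, 90, 90, 90, 70]
for time_s, state in simulate(readings):
    print(f"t={time_s:>3}s  {state}")
```

With these readings the rule stays Pending from t=60s through t=300s, switches to Alerting at t=360s (300 seconds after the breach began), and returns to Normal at t=420s.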
Practical Example: Implementing Alert Provisioning for a Web Application
Let's walk through a real-world example of setting up provisioned alerts for a web application environment.
Scenario
You manage a web application with:
- Frontend services
- Backend API services
- Database servers
- Messaging queue
Step 1: Define Your Alert Rules Structure
Create a directory structure for your alert provisioning:
/provisioning/
├── alerting/
│   ├── rules/
│   │   ├── frontend_alerts.yaml
│   │   ├── backend_alerts.yaml