# Prometheus Configuration

## Introduction
Configuration is a fundamental aspect of setting up and running Prometheus effectively. Prometheus uses a YAML configuration file that defines everything from which targets to monitor to how long data should be retained. In this guide, we'll explore the structure of Prometheus configuration, learn how to define monitoring targets, and understand best practices for a well-configured monitoring system.
Proper configuration allows you to:
- Define which services and endpoints Prometheus will monitor
- Set up scrape intervals and timeouts
- Configure rule files for alerting and recording
- Set up service discovery mechanisms
- Manage data retention and storage
## Configuration File Basics

Prometheus uses a single YAML configuration file (conventionally named `prometheus.yml`) as its primary configuration source. When you start Prometheus, you typically specify this file with the `--config.file` flag.
### Basic Structure

The configuration file has several main sections:

- `global`: settings that apply to the entire Prometheus server
- `scrape_configs`: defines the targets Prometheus will monitor
- `rule_files`: specifies recording and alerting rules
- `alerting`: configures where alert notifications should be sent
- `remote_write` and `remote_read`: set up integrations with remote storage systems
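`remote_write` and `remote_read` are not covered further in this guide; for reference, a minimal `remote_write` fragment (the endpoint URL is illustrative) looks like:

```yaml
remote_write:
  - url: "https://remote-storage.example.com/api/v1/write"
```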
Let's look at a minimal configuration file:
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
```
This simple configuration:
- Sets the default scrape and rule evaluation interval to 15 seconds
- Creates a single job named "prometheus" that monitors the Prometheus server itself
## The `global` Section

The `global` section defines parameters that are valid in all other configuration contexts; they serve as the server's defaults.
```yaml
global:
  # How frequently to scrape targets by default
  scrape_interval: 15s
  # How frequently to evaluate rules
  evaluation_interval: 15s
  # How long until a scrape request times out
  scrape_timeout: 10s
  # Labels attached to time series and alerts
  # when communicating with external systems
  external_labels:
    environment: production
    region: us-east-1
```
The `external_labels` are particularly important: they are attached to all time series and alerts when communicating with external systems such as remote storage or Alertmanager.
## Scrape Configurations

The `scrape_configs` section is where you define the targets Prometheus will monitor. Each entry is called a "job" and can contain multiple target endpoints.
### Static Configuration

The simplest way to define targets is with `static_configs`:
```yaml
scrape_configs:
  - job_name: "api-server"
    scrape_interval: 5s  # Override the global setting for this job
    static_configs:
      - targets: ["api.example.com:8080", "api-backup.example.com:8080"]
        labels:
          team: "backend"
          service: "api"
```
This configuration:
- Creates a job named "api-server"
- Overrides the global scrape interval to check these targets every 5 seconds
- Defines two static targets to monitor
- Attaches additional labels that will be added to all metrics from these targets
### Using File-Based Service Discovery

For more dynamic environments, Prometheus offers various service discovery mechanisms. Here's how to use file-based discovery:
```yaml
scrape_configs:
  - job_name: "file-sd-example"
    file_sd_configs:
      - files:
          - "targets/*.json"
        refresh_interval: 30s
```
With this configuration, Prometheus will look for target definition files in the `targets/` directory and reload them every 30 seconds.
A matching JSON target file might look like:
```json
[
  {
    "targets": ["server1:9100", "server2:9100"],
    "labels": {
      "env": "production",
      "job": "node-exporter"
    }
  }
]
```
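Target files like this are usually generated by a script or configuration management tool rather than maintained by hand. A minimal Python sketch (the file name and labels mirror the example above) that writes one atomically:

```python
import json
import os

def write_targets(path, targets, labels):
    """Write a Prometheus file_sd target file as JSON.
    Write-then-rename keeps the update atomic, so Prometheus
    never observes a half-written file."""
    entries = [{"targets": targets, "labels": labels}]
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(entries, f, indent=2)
    os.replace(tmp, path)  # atomic rename on POSIX

write_targets(
    "nodes.json",  # file name is illustrative; match your file_sd_configs glob
    ["server1:9100", "server2:9100"],
    {"env": "production", "job": "node-exporter"},
)
```

Because file-based discovery watches the files for changes, regenerating the file is all that is needed for Prometheus to pick up new targets.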
### Relabeling

Relabeling is a powerful feature that allows you to modify labels before metrics are ingested:
```yaml
scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
```
This configuration:

- Discovers all Kubernetes pods
- Keeps only those with the annotation `prometheus.io/scrape: "true"`
- Uses the `prometheus.io/path` annotation to set the metrics path
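The relabeling semantics (source label values joined with `;`, anchored regex matching, `keep` and `replace` actions) can be illustrated with a small Python sketch. This is a simplified model for building intuition, not Prometheus's actual implementation:

```python
import re

def relabel(labels, configs):
    """Apply a simplified subset of Prometheus relabel_configs
    ('keep' and 'replace' actions) to a dict of labels.
    Returns the rewritten dict, or None if the target is dropped."""
    labels = dict(labels)
    for cfg in configs:
        # Source label values are joined with ';', Prometheus's default separator
        value = ";".join(labels.get(name, "") for name in cfg["source_labels"])
        # Prometheus anchors the regex: it must match the entire value
        match = re.fullmatch(cfg.get("regex", "(.*)"), value)
        action = cfg.get("action", "replace")
        if action == "keep" and match is None:
            return None  # drop this target from scraping
        if action == "replace" and match is not None:
            # The default replacement is "$1" (the first capture group);
            # translate "$" group references into Python's backslash syntax
            replacement = cfg.get("replacement", "$1").replace("$", "\\")
            labels[cfg["target_label"]] = match.expand(replacement)
    return labels

# The two rules from the kubernetes-pods example above
configs = [
    {"source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"],
     "action": "keep", "regex": "true"},
    {"source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_path"],
     "action": "replace", "target_label": "__metrics_path__", "regex": "(.+)"},
]

pod = {
    "__meta_kubernetes_pod_annotation_prometheus_io_scrape": "true",
    "__meta_kubernetes_pod_annotation_prometheus_io_path": "/custom/metrics",
}
result = relabel(pod, configs)  # pod is kept and __metrics_path__ is set
```

A pod without the scrape annotation fails the `keep` rule and is dropped entirely; a pod with both annotations is kept and has its metrics path rewritten.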
## Rule Configuration
Rules allow Prometheus to precompute frequently needed expressions and create alerts.
```yaml
rule_files:
  - "rules/recording_rules.yml"
  - "rules/alerting_rules.yml"
```
A simple alerting rule file might look like:
```yaml
groups:
  - name: example
    rules:
      - alert: HighCPULoad
        expr: node_load1 > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU load on {{ $labels.instance }}"
          description: "Load average has been above 0.8 for more than 5 minutes (current value: {{ $value }})"
```
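Recording rules use the same file format. A companion sketch (the metric name and expression are illustrative) that precomputes a per-job load average so dashboards can query it cheaply:

```yaml
groups:
  - name: example-recording
    rules:
      # New metric name, conventionally level:metric:operation
      - record: job:node_load1:avg
        expr: avg by (job) (node_load1)
```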
## Alertmanager Configuration
To send alerts, you need to configure the Alertmanager connection:
```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "alertmanager:9093"
```
## Storage Configuration

Data retention and the storage location are configured with command-line flags rather than in `prometheus.yml`:

```shell
prometheus --config.file=prometheus.yml \
  --storage.tsdb.path="/data" \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.retention.size=50GB
```

This tells Prometheus to:

- Store data in the `/data` directory
- Keep data for 15 days OR until the storage reaches 50GB, whichever comes first
## Practical Example: Complete Configuration
Let's put it all together with a more comprehensive real-world example:
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s
  external_labels:
    environment: production
    region: us-east-1

# rule_files lists the files from which rules are read
rule_files:
  - "rules/recording_rules.yml"
  - "rules/alerting_rules.yml"

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "alertmanager:9093"

# Scrape configs
scrape_configs:
  # Self monitoring
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # Node exporters
  - job_name: "node"
    file_sd_configs:
      - files:
          - "targets/nodes.json"
        refresh_interval: 5m
    relabel_configs:
      - source_labels: [env]
        action: replace
        target_label: environment

  # API servers with a different scrape interval
  - job_name: "api-server"
    scrape_interval: 5s
    metrics_path: /metrics
    scheme: https
    basic_auth:
      username: prometheus
      password: secret_password
    static_configs:
      - targets: ["api.example.com:8443"]
        labels:
          team: "backend"
```
This configuration:
- Sets global parameters for scraping and evaluation
- Configures rule files for alerting and recording rules
- Sets up the connection to Alertmanager
- Monitors Prometheus itself, node exporters via file-based service discovery, and an API server with basic authentication
## How to Reload Configuration

Prometheus can reload its configuration without restarting. You can trigger a reload by sending a `SIGHUP` signal to the Prometheus process, or by making an HTTP POST request to the `/-/reload` endpoint if the `--web.enable-lifecycle` flag is enabled.

For example:

```shell
curl -X POST http://localhost:9090/-/reload
```
## Best Practices for Configuration

1. **Use service discovery** instead of static configs when possible, to make your configuration adaptable to changing environments
2. **Organize rule files logically** by separating recording rules from alerting rules, and grouping rules by function or team
3. **Set appropriate scrape intervals**: more frequent for critical services, less frequent for stable services with little change
4. **Use relabeling sparingly**: while powerful, complex relabeling can make debugging difficult
5. **Version control your configuration files** to track changes and enable rollbacks
6. **Use hierarchical structures**: apply defaults in the `global` section and override only when necessary at the job level
7. **Test configuration changes** before applying them to production:

   ```bash
   promtool check config prometheus.yml
   ```
## Summary
Prometheus configuration is flexible and powerful, allowing you to define monitoring targets, alerting rules, and storage settings. The YAML-based configuration file provides a declarative way to specify what and how Prometheus should monitor.
Key points to remember:
- The configuration file defines global settings, scrape targets, rules, and alerting
- Service discovery mechanisms help adapt to dynamic environments
- Relabeling allows for powerful transformations of metadata
- Configuration can be reloaded without restarting Prometheus
## Exercises

1. Create a basic `prometheus.yml` file that monitors Prometheus itself and a Node Exporter instance.
2. Modify your configuration to add an alerting rule that fires when a target has been down for more than 5 minutes.
3. Set up file-based service discovery for a group of web servers, with different scrape intervals based on their importance.
4. Use relabeling to add environment and team labels based on the hostname of the target.
5. Configure Prometheus to keep metrics for 30 days, but limit the total storage size to 100GB.