# Prometheus Configuration

## Introduction
Configuration is a fundamental aspect of setting up and running Prometheus effectively. Prometheus uses a YAML configuration file that defines everything from which targets to monitor to how long data should be retained. In this guide, we'll explore the structure of Prometheus configuration, learn how to define monitoring targets, and understand best practices for a well-configured monitoring system.
Proper configuration allows you to:
- Define which services and endpoints Prometheus will monitor
- Set up scrape intervals and timeouts
- Configure rule files for alerting and recording
- Set up service discovery mechanisms
- Manage data retention and storage
## Configuration File Basics

Prometheus uses a single YAML configuration file (conventionally named `prometheus.yml`) as its primary configuration source. When you start Prometheus, you typically specify this file with the `--config.file` flag.
### Basic Structure

The configuration file has several main sections:

- `global`: settings that apply to the entire Prometheus server
- `scrape_configs`: defines the targets Prometheus will monitor
- `rule_files`: specifies recording and alerting rules
- `alerting`: configures where alert notifications should be sent
- `remote_write` and `remote_read`: set up integrations with remote storage systems
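`remote_write` and `remote_read` are not covered further in this guide; for reference, a minimal `remote_write` fragment (the endpoint URL is illustrative) looks like:

```yaml
remote_write:
  - url: "https://remote-storage.example.com/api/v1/write"
```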
Let's look at a minimal configuration file:
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
```
This simple configuration:
- Sets the default scrape and rule evaluation interval to 15 seconds
- Creates a single job named "prometheus" that monitors the Prometheus server itself
## The `global` Section

The `global` section defines parameters that are valid in all other configuration contexts; they serve as the server's defaults.
```yaml
global:
  # How frequently to scrape targets by default
  scrape_interval: 15s
  # How frequently to evaluate rules
  evaluation_interval: 15s
  # How long until a scrape request times out
  scrape_timeout: 10s
  # Labels attached to time series and alerts
  # when communicating with external systems
  external_labels:
    environment: production
    region: us-east-1
```
The `external_labels` are particularly important: they are attached to all time series and alerts when communicating with external systems such as remote storage or Alertmanager.
## Scrape Configurations

The `scrape_configs` section is where you define the targets Prometheus will monitor. Each entry is called a "job" and can contain multiple target endpoints.
### Static Configuration

The simplest way to define targets is with `static_configs`:
```yaml
scrape_configs:
  - job_name: "api-server"
    scrape_interval: 5s  # Override the global setting for this job
    static_configs:
      - targets: ["api.example.com:8080", "api-backup.example.com:8080"]
        labels:
          team: "backend"
          service: "api"
```
This configuration:
- Creates a job named "api-server"
- Overrides the global scrape interval to check these targets every 5 seconds
- Defines two static targets to monitor
- Attaches additional labels that will be added to all metrics from these targets
### Using File-Based Service Discovery

For more dynamic environments, Prometheus offers various service discovery mechanisms. Here's how to use file-based discovery:
```yaml
scrape_configs:
  - job_name: "file-sd-example"
    file_sd_configs:
      - files:
          - "targets/*.json"
        refresh_interval: 30s
```
With this configuration, Prometheus will look for target definition files in the `targets/` directory and reload them every 30 seconds.
A matching JSON target file might look like:
```json
[
  {
    "targets": ["server1:9100", "server2:9100"],
    "labels": {
      "env": "production",
      "job": "node-exporter"
    }
  }
]
```
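Target files like this are usually generated by a script or configuration management tool rather than maintained by hand. A minimal Python sketch (the file name and labels mirror the example above) that writes one atomically:

```python
import json
import os

def write_targets(path, targets, labels):
    """Write a Prometheus file_sd target file as JSON.
    Write-then-rename keeps the update atomic, so Prometheus
    never observes a half-written file."""
    entries = [{"targets": targets, "labels": labels}]
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(entries, f, indent=2)
    os.replace(tmp, path)  # atomic rename on POSIX

write_targets(
    "nodes.json",  # file name is illustrative; match your file_sd_configs glob
    ["server1:9100", "server2:9100"],
    {"env": "production", "job": "node-exporter"},
)
```

Because file-based discovery watches the files for changes, regenerating the file is all that is needed for Prometheus to pick up new targets.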
### Relabeling

Relabeling is a powerful feature that allows you to modify labels before metrics are ingested:
```yaml
scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
```
This configuration:

- Discovers all Kubernetes pods
- Keeps only those with the annotation `prometheus.io/scrape: "true"`
- Uses the `prometheus.io/path` annotation to set the metrics path
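The relabeling semantics (source label values joined with `;`, anchored regex matching, `keep` and `replace` actions) can be illustrated with a small Python sketch. This is a simplified model for building intuition, not Prometheus's actual implementation:

```python
import re

def relabel(labels, configs):
    """Apply a simplified subset of Prometheus relabel_configs
    ('keep' and 'replace' actions) to a dict of labels.
    Returns the rewritten dict, or None if the target is dropped."""
    labels = dict(labels)
    for cfg in configs:
        # Source label values are joined with ';', Prometheus's default separator
        value = ";".join(labels.get(name, "") for name in cfg["source_labels"])
        # Prometheus anchors the regex: it must match the entire value
        match = re.fullmatch(cfg.get("regex", "(.*)"), value)
        action = cfg.get("action", "replace")
        if action == "keep" and match is None:
            return None  # drop this target from scraping
        if action == "replace" and match is not None:
            # The default replacement is "$1" (the first capture group);
            # translate "$" group references into Python's backslash syntax
            replacement = cfg.get("replacement", "$1").replace("$", "\\")
            labels[cfg["target_label"]] = match.expand(replacement)
    return labels

# The two rules from the kubernetes-pods example above
configs = [
    {"source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"],
     "action": "keep", "regex": "true"},
    {"source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_path"],
     "action": "replace", "target_label": "__metrics_path__", "regex": "(.+)"},
]

pod = {
    "__meta_kubernetes_pod_annotation_prometheus_io_scrape": "true",
    "__meta_kubernetes_pod_annotation_prometheus_io_path": "/custom/metrics",
}
result = relabel(pod, configs)  # pod is kept and __metrics_path__ is set
```

A pod without the scrape annotation fails the `keep` rule and is dropped entirely; a pod with both annotations is kept and has its metrics path rewritten.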
## Rule Configuration
Rules allow Prometheus to precompute frequently needed expressions and create alerts.
```yaml
rule_files:
  - "rules/recording_rules.yml"
  - "rules/alerting_rules.yml"
```
A simple alerting rule file might look like:
```yaml
groups:
  - name: example
    rules:
      - alert: HighCPULoad
        expr: node_load1 > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU load on {{ $labels.instance }}"
          description: "Load average has been above 0.8 for more than 5 minutes (current value: {{ $value }})"
```
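Recording rules use the same file format. A companion sketch (the metric name and expression are illustrative) that precomputes a per-job load average so dashboards can query it cheaply:

```yaml
groups:
  - name: example-recording
    rules:
      # New metric name, conventionally level:metric:operation
      - record: job:node_load1:avg
        expr: avg by (job) (node_load1)
```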
## Alertmanager Configuration
To send alerts, you need to configure the Alertmanager connection:
```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "alertmanager:9093"
```
## Storage Configuration

Data retention and the storage location are configured with command-line flags rather than in `prometheus.yml`:

```shell
prometheus --config.file=prometheus.yml \
  --storage.tsdb.path="/data" \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.retention.size=50GB
```

This tells Prometheus to:

- Store data in the `/data` directory
- Keep data for 15 days OR until the storage reaches 50GB, whichever comes first
## Practical Example: Complete Configuration
Let's put it all together with a more comprehensive real-world example:
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s
  external_labels:
    environment: production
    region: us-east-1

# rule_files lists the files from which rules are read
rule_files:
  - "rules/recording_rules.yml"
  - "rules/alerting_rules.yml"

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "alertmanager:9093"

# Scrape configs
scrape_configs:
  # Self monitoring
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # Node exporters
  - job_name: "node"
    file_sd_configs:
      - files:
          - "targets/nodes.json"
        refresh_interval: 5m
    relabel_configs:
      - source_labels: [env]
        action: replace
        target_label: environment

  # API servers with a different scrape interval
  - job_name: "api-server"
    scrape_interval: 5s
    metrics_path: /metrics
    scheme: https
    basic_auth:
      username: prometheus
      password: secret_password
    static_configs:
      - targets: ["api.example.com:8443"]
        labels:
          team: "backend"
```
This configuration:
- Sets global parameters for scraping and evaluation
- Configures rule files for alerting and recording rules
- Sets up the connection to Alertmanager
- Monitors Prometheus itself, node exporters via file-based service discovery, and an API server with basic authentication
## How to Reload Configuration

Prometheus can reload its configuration without restarting. You can trigger a reload by sending a `SIGHUP` signal to the Prometheus process, or by making an HTTP POST request to the `/-/reload` endpoint if the `--web.enable-lifecycle` flag is enabled.

For example:

```shell
curl -X POST http://localhost:9090/-/reload
```
## Best Practices for Configuration

1. **Use service discovery** instead of static configs when possible, to make your configuration adaptable to changing environments
2. **Organize rule files logically** by separating recording rules from alerting rules, and grouping rules by function or team
3. **Set appropriate scrape intervals**: more frequent for critical services, less frequent for stable services with little change
4. **Use relabeling sparingly**: while powerful, complex relabeling can make debugging difficult
5. **Version control your configuration files** to track changes and enable rollbacks
6. **Use hierarchical structures**: apply defaults in the `global` section and override only when necessary at the job level
7. **Test configuration changes** before applying them to production:

   ```bash
   promtool check config prometheus.yml
   ```
## Summary
Prometheus configuration is flexible and powerful, allowing you to define monitoring targets, alerting rules, and storage settings. The YAML-based configuration file provides a declarative way to specify what and how Prometheus should monitor.
Key points to remember:
- The configuration file defines global settings, scrape targets, rules, and alerting
- Service discovery mechanisms help adapt to dynamic environments
- Relabeling allows for powerful transformations of metadata
- Configuration can be reloaded without restarting Prometheus
## Exercises

1. Create a basic `prometheus.yml` file that monitors Prometheus itself and a Node Exporter instance.
2. Modify your configuration to add an alerting rule that fires when a target has been down for more than 5 minutes.
3. Set up file-based service discovery for a group of web servers, with different scrape intervals based on their importance.
4. Use relabeling to add environment and team labels based on the hostname of the target.
5. Configure Prometheus to keep metrics for 30 days, but limit the total storage size to 100GB.