Prometheus Naming Conventions
Introduction
When working with Prometheus, having consistent and meaningful metric names is crucial for creating maintainable monitoring systems. Good naming conventions make metrics easier to discover, understand, and use in queries. This guide covers the standard naming patterns used in Prometheus metrics and best practices that will help you design a clean and consistent metrics architecture.
Basics of Prometheus Metric Names
Prometheus metric names follow a specific structure that helps users quickly understand what the metric represents and how it should be interpreted.
Metric Name Format
Metric names in Prometheus:
- Must match the regex
[a-zA-Z_:][a-zA-Z0-9_:]*
- Should be human-readable and meaningful
- Typically use underscores (
_
) to separate words - Should have a domain-specific prefix
- Typically include units as a suffix
Here's the standard format for a Prometheus metric name:
<prefix>_<name>_<unit>_<suffix>
For example: http_requests_total
or node_memory_usage_bytes
Naming Components Explained
Metric Prefixes
Metric names should begin with a prefix that identifies the monitored system:
http_
- for HTTP-related metricsnode_
- for machine-level metricsprocess_
- for process-level metricsapp_
- for application-specific metrics
Example:
# Good - Uses appropriate prefix
http_requests_total
# Not recommended - Missing prefix
requests_total
Metric Suffixes and Units
The suffix typically describes the unit of measurement:
_total
- for counters_count
- for count values in histograms and summaries_sum
- for sum values in histograms and summaries_bucket
- for histogram buckets_ratio
- for ratios or percentages (0-1)_percent
- for percentages (0-100)_bytes
- for sizes in bytes_seconds
- for durations or time periods
Example:
# Good - Includes unit
process_cpu_seconds_total
# Not recommended - Missing unit
process_cpu
Naming by Metric Types
Each Prometheus metric type has specific naming conventions:
Counters
Counters should use the _total
suffix to indicate they only increase over time:
# Counter examples
http_requests_total
errors_total
http_request_duration_seconds_total
Gauges
Gauges represent a snapshot value that can go up or down, so they should not use the _total
suffix:
# Gauge examples
memory_usage_bytes
cpu_usage_percent
active_connections
queue_length
Histograms and Summaries
These complex metrics automatically create several related time series with specific suffixes:
# Histogram generates:
http_request_duration_seconds_bucket{le="0.1"}
http_request_duration_seconds_bucket{le="0.5"}
http_request_duration_seconds_bucket{le="1"}
http_request_duration_seconds_count
http_request_duration_seconds_sum
# Summary generates:
http_request_duration_seconds{quantile="0.5"}
http_request_duration_seconds{quantile="0.9"}
http_request_duration_seconds{quantile="0.99"}
http_request_duration_seconds_count
http_request_duration_seconds_sum
Best Practices
1. Be Consistent
Consistency helps both humans and machines understand your metrics:
# Consistent naming
api_http_requests_total
api_http_failures_total
api_http_duration_seconds_sum
# Inconsistent naming (avoid)
api_requests
http_api_failures
http_duration_ms_api
2. Include Units
Always include units in metric names to avoid confusion:
# Good - Units included
memory_usage_bytes
request_duration_seconds
file_size_bytes
# Not recommended - Missing units
memory
request_duration
file_size
3. Use Appropriate Labels Instead of Embedding Information in Names
Use labels to differentiate aspects of what's being measured rather than creating new metrics:
# Bad approach - Information embedded in name
http_requests_login_total
http_requests_logout_total
http_requests_profile_total
# Better approach - Using labels
http_requests_total{endpoint="/login"}
http_requests_total{endpoint="/logout"}
http_requests_total{endpoint="/profile"}
4. Avoid Reserved Words
Avoid using reserved words or characters that have special meaning in PromQL:
job
- used as a label by Prometheusinstance
- used as a label by Prometheus- Special characters like
*
,{
,}
,[
,]
Real-World Examples
Let's look at some examples from common exporters and applications:
Node Exporter Metrics
The Node Exporter follows consistent naming conventions:
# CPU metrics
node_cpu_seconds_total{mode="idle"}
node_cpu_seconds_total{mode="user"}
# Memory metrics
node_memory_MemTotal_bytes
node_memory_MemFree_bytes
# Disk metrics
node_disk_io_time_seconds_total
node_disk_read_bytes_total
Custom Application Metrics
Here's how you might instrument a web application:
// Counter for HTTP requests
const httpRequestsTotal = new prometheus.Counter({
name: 'app_http_requests_total',
help: 'Total number of HTTP requests',
labelNames: ['method', 'path', 'status']
});
// Gauge for active connections
const activeConnections = new prometheus.Gauge({
name: 'app_active_connections',
help: 'Current number of active connections'
});
// Histogram for request duration
const httpRequestDurationSeconds = new prometheus.Histogram({
name: 'app_http_request_duration_seconds',
help: 'HTTP request duration in seconds',
labelNames: ['method', 'path'],
buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5]
});
OpenMetrics Standard
Prometheus naming conventions align with the OpenMetrics standard, which provides more formal specifications for metrics. Key aspects from the OpenMetrics standard include:
- Metric names must match the regex
[a-zA-Z_:][a-zA-Z0-9_:]*
- Metric names beginning with
_
are reserved - Colons (
:
) are reserved for user-defined recording rules
Evolving Metrics Over Time
When evolving your metrics, follow these principles:
- Avoid renaming metrics - This creates discontinuity in data
- Add new metrics rather than changing existing ones
- Use recording rules to create aggregated metrics while preserving raw data
# Example recording rule to create a new aggregated metric
groups:
- name: example
rules:
- record: job:http_requests_total:rate5m
expr: sum(rate(http_requests_total[5m])) by (job)
Visualization Considerations
Good naming conventions make it easier to create meaningful dashboards:
Common Pitfalls
- Inconsistent casing: Stick to snake_case for all metrics
- Missing units: Always include relevant units in metric names
- Too many labels: Use labels judiciously, as they increase cardinality
- Embedding label values in names: Use label dimensions instead of creating multiple metrics
Summary
Effective Prometheus naming conventions follow these key principles:
- Use clear, descriptive names with appropriate prefixes
- Include units in metric names
- Follow type-specific naming patterns (e.g.,
_total
for counters) - Use labels rather than creating multiple metrics
- Maintain consistency across your metrics ecosystem
By following these conventions, you'll create a monitoring system that is easier to query, understand, and maintain.
Additional Resources
Exercises
-
Rename these poorly named metrics to follow Prometheus conventions:
requests
memory_usage
errors_api_login
duration_ms
-
Create a proper naming scheme for metrics that would monitor:
- Database connection pool usage
- API endpoint response times
- Failed login attempts
- Queue processing rates
-
Convert these metrics into a single metric with appropriate labels:
http_get_requests_total
http_post_requests_total
http_put_requests_total
http_delete_requests_total
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)