Prometheus Summaries
Introduction
In Prometheus monitoring, a summary is one of the four core metric types. It tracks the distribution of observed values over time. Unlike counters or gauges, which track single values, summaries are designed to capture and analyze observations that vary across a range, such as request durations, response sizes, or resource consumption measurements.
Summaries are particularly useful when you need to:
- Track percentiles (quantiles) of observed values
- Calculate average values within sliding time windows
- Analyze distributions without predefined buckets (unlike histograms)
By the end of this guide, you'll understand how summaries work, when to use them, and how to implement them in your applications.
How Summaries Work
A summary metric in Prometheus consists of multiple components:
- A count of the total number of observations
- A sum of all observed values
- Quantiles (configurable percentiles) of the observed values
When you track a value using a summary, Prometheus calculates and exposes these components so you can analyze the distribution of values over time.
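To make those three components concrete, here is a toy pure-Python sketch of what a summary tracks. This is illustrative only: it stores every observation, whereas real client libraries use streaming quantile estimation with bounded memory and configurable error margins.

```python
import statistics

class ToySummary:
    """Illustrative only: keeps all observations in memory.
    Real Prometheus clients use a streaming quantile algorithm."""

    def __init__(self):
        self.observations = []

    def observe(self, value):
        self.observations.append(value)

    @property
    def count(self):
        # The _count component: total number of observations
        return len(self.observations)

    @property
    def sum(self):
        # The _sum component: sum of all observed values
        return sum(self.observations)

    def quantile(self, q):
        # A quantile component, e.g. q=0.5 for the median
        return statistics.quantiles(self.observations, n=100)[int(q * 100) - 1]

s = ToySummary()
for v in [0.1, 0.2, 0.3, 0.4, 0.5]:
    s.observe(v)
print(s.count, round(s.sum, 1), s.quantile(0.5))  # count=5, sum≈1.5, p50=0.3
```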
Summary vs. Histogram
Before diving deeper, it's important to understand when to use a summary versus a histogram:
| Summary | Histogram |
|---|---|
| Client-side quantile calculation | Server-side quantile calculation |
| More accurate for specific quantiles | Better for exploring data with flexible quantiles |
| Higher client CPU usage | Lower client CPU usage |
| Cannot be aggregated across instances | Can be aggregated across instances |
Choose a summary when you need precise quantiles calculated on the client side and don't need to aggregate them across multiple instances.
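The aggregation caveat is worth seeing concretely: averaging per-instance quantiles is not the quantile of the combined data. A small sketch with hypothetical request durations from two instances:

```python
import statistics

# Hypothetical request durations (seconds) from two instances
instance_a = [0.1] * 190   # busy instance, fast responses
instance_b = [1.0] * 10    # quiet instance, slow responses

p50_a = statistics.median(instance_a)   # 0.1
p50_b = statistics.median(instance_b)   # 1.0

# Averaging the per-instance medians suggests 0.55s...
averaged = (p50_a + p50_b) / 2

# ...but the true median over all 200 requests is 0.1s
true_p50 = statistics.median(instance_a + instance_b)

print(averaged, true_p50)  # 0.55 vs 0.1
```

This is why histograms, whose buckets can be summed across instances before computing quantiles, are the better choice for fleet-wide percentiles.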
Creating Summaries in Prometheus Client Libraries
Let's look at how to create and use summaries with different Prometheus client libraries.
Go Example
```go
package main

import (
	"math/rand"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	requestDuration = prometheus.NewSummary(
		prometheus.SummaryOpts{
			Name:       "http_request_duration_seconds",
			Help:       "HTTP request duration in seconds",
			Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
		},
	)
)

func init() {
	prometheus.MustRegister(requestDuration)
}

func main() {
	// Simulate requests with random durations
	go func() {
		for {
			// Simulate a request duration between 0 and 500ms
			duration := rand.Float64() * 0.5
			requestDuration.Observe(duration)
			time.Sleep(100 * time.Millisecond)
		}
	}()

	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```
This example creates a summary metric that tracks HTTP request durations. The `Objectives` map defines the quantiles to track along with their allowed error margins.
Python Example
```python
from prometheus_client import Summary, start_http_server
import random
import time

# Create a summary to track request duration, labeled by endpoint
REQUEST_DURATION = Summary(
    'http_request_duration_seconds',
    'HTTP request duration in seconds',
    ['endpoint']
)

# A decorator to measure the execution time of a function
@REQUEST_DURATION.labels('api').time()
def process_request():
    # Simulate processing time
    time.sleep(random.uniform(0, 0.5))
    return "processed"

if __name__ == '__main__':
    # Start up the server to expose the metrics
    start_http_server(8000)
    # Generate some requests
    while True:
        process_request()
        time.sleep(1)
```
In this Python example, we create a summary and use its `.time()` decorator to automatically time a function's execution. The metric has an `endpoint` label (set here to `api`) to allow for more granular analysis.
Java Example
```java
import io.prometheus.client.Summary;
import io.prometheus.client.exporter.HTTPServer;

import java.io.IOException;
import java.util.Random;

public class SummaryExample {
    private static final Summary requestDuration = Summary.build()
            .name("http_request_duration_seconds")
            .help("HTTP request duration in seconds")
            .quantile(0.5, 0.05)   // 50th percentile with 5% error
            .quantile(0.9, 0.01)   // 90th percentile with 1% error
            .quantile(0.99, 0.001) // 99th percentile with 0.1% error
            .register();

    public static void main(String[] args) throws IOException, InterruptedException {
        // Start an HTTP server to expose metrics
        HTTPServer server = new HTTPServer(8080);
        Random random = new Random();

        while (true) {
            // Use a timer to automatically time execution and observe the duration
            Summary.Timer timer = requestDuration.startTimer();
            try {
                // Simulate work
                Thread.sleep(random.nextInt(500));
            } finally {
                timer.observeDuration();
            }
            Thread.sleep(100);
        }
    }
}
```
Understanding Summary Output
When you scrape a summary metric, you'll see multiple time series like this:
```
# HELP http_request_duration_seconds HTTP request duration in seconds
# TYPE http_request_duration_seconds summary
http_request_duration_seconds{quantile="0.5"} 0.24714789
http_request_duration_seconds{quantile="0.9"} 0.41428945
http_request_duration_seconds{quantile="0.99"} 0.48916918
http_request_duration_seconds_sum 612.92
http_request_duration_seconds_count 1592
```
This output shows:
- Three quantile values (50th, 90th, and 99th percentiles)
- The sum of all observed values (612.92 seconds)
- The count of observations (1592)
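The `_sum` and `_count` series are also useful on their own: dividing them gives the average observed value. Using the numbers from the example scrape above:

```python
# Values taken from the example scrape output
total_seconds = 612.92   # http_request_duration_seconds_sum
observations = 1592      # http_request_duration_seconds_count

# Lifetime average request duration
average = total_seconds / observations
print(f"{average:.3f} s")  # 0.385 s
```

In practice you would compute this in PromQL over a time window with `rate()`, as shown in the queries section below.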
Practical Use Cases
Case 1: Monitoring API Response Times
One of the most common applications of summaries is tracking API response times:
```javascript
const client = require('prom-client');

// Create a summary
const apiResponseTime = new client.Summary({
  name: 'api_response_time_seconds',
  help: 'Response time of the API in seconds',
  labelNames: ['endpoint', 'status_code'],
  percentiles: [0.5, 0.9, 0.95, 0.99]
});

// In your request handler
app.get('/api/data', (req, res) => {
  const end = apiResponseTime.startTimer({ endpoint: '/api/data' });

  // Process request
  fetchData()
    .then(data => {
      res.status(200).json(data);
      end({ status_code: 200 });
    })
    .catch(err => {
      res.status(500).json({ error: err.message });
      end({ status_code: 500 });
    });
});
```
With this implementation, you can answer questions like:
- What's the median response time for the `/api/data` endpoint?
- What is the 99th percentile response time for successful requests?
- How does the response time distribution change during peak hours?
Case 2: Tracking Resource Consumption
You can use summaries to track resource consumption patterns:
```python
from prometheus_client import Summary, start_http_server
import psutil
import time

# Create a summary for memory usage (percentage)
MEMORY_USAGE = Summary(
    'system_memory_usage_percent',
    'System memory usage in percent',
    ['process']
)

# Create a summary for CPU usage (percentage)
CPU_USAGE = Summary(
    'system_cpu_usage_percent',
    'System CPU usage in percent',
    ['process']
)

def collect_metrics():
    while True:
        # Observe memory usage
        memory_percent = psutil.virtual_memory().percent
        MEMORY_USAGE.labels('system').observe(memory_percent)

        # Observe CPU usage
        cpu_percent = psutil.cpu_percent(interval=1)
        CPU_USAGE.labels('system').observe(cpu_percent)

        time.sleep(10)

if __name__ == '__main__':
    # Expose the metrics and start collecting
    start_http_server(8000)
    collect_metrics()
```
This allows you to analyze:
- Typical memory usage patterns (median usage)
- Peak memory consumption (99th percentile)
- Variations in CPU utilization throughout the day
Best Practices for Using Summaries
- Choose Appropriate Quantiles: Only track the quantiles you actually need. Each quantile increases memory usage.
- Use Proper Error Margins: Each quantile has a configurable error margin. Smaller error margins require more memory.
- Consider Summary vs. Histogram: Use summaries when you need precise quantiles calculated on the client side and don't need to aggregate them.
- Add Useful Labels: Include labels that help you analyze the data, but avoid high-cardinality labels.
- Set Reasonable Timeouts: For timing operations, set reasonable timeouts to avoid skewed metrics from hung operations.
Common Pitfalls
- Aggregation Limitations: Quantiles from summaries cannot be meaningfully aggregated across instances. If you need to aggregate percentiles, use histograms instead.
- Memory Usage: Tracking many quantiles with low error margins can significantly increase memory usage.
- Duration Calculation: When manually calculating durations, be careful about time units and precision.
- Label Cardinality: Avoid using high-cardinality labels with summaries to prevent memory bloat.
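The unit pitfall in particular is easy to hit when timing by hand: Prometheus convention is base units, so durations should be observed in seconds, not milliseconds. A minimal sketch (library-independent; `observe` stands in for any summary's observe method) that also uses a monotonic clock so wall-clock adjustments can't skew the measurement:

```python
import time

def timed_call(fn, observe):
    """Run fn, then report its duration in seconds (Prometheus convention).
    perf_counter is monotonic, so system clock changes can't skew it."""
    start = time.perf_counter()
    try:
        return fn()
    finally:
        observe(time.perf_counter() - start)  # seconds, not milliseconds

# Stand-in for summary.observe: just collect the durations
durations = []
timed_call(lambda: time.sleep(0.05), durations.append)
print(durations[0])  # roughly 0.05
```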
PromQL Queries for Summaries
Here are some useful PromQL queries for working with summaries:
```promql
# Get the 90th percentile request duration for an API endpoint
http_request_duration_seconds{quantile="0.9",endpoint="/api/users"}

# Calculate request rate
rate(http_request_duration_seconds_count{endpoint="/api/users"}[5m])

# Calculate average request duration over 5 minutes
rate(http_request_duration_seconds_sum{endpoint="/api/users"}[5m]) /
rate(http_request_duration_seconds_count{endpoint="/api/users"}[5m])
```
Summary
In this guide, we've covered:
- What Prometheus summaries are and how they work
- When to use summaries vs. histograms
- How to implement summaries in different programming languages
- How to interpret summary metrics
- Practical use cases for summaries
- Best practices and common pitfalls
- Useful PromQL queries for analyzing summary data
Summaries are a powerful tool in your monitoring toolkit when you need to track distributions of values and analyze percentiles. By choosing the right metric type for your use case and following best practices, you can build effective monitoring solutions that help you understand your system's behavior.
Exercises
- Implement a summary metric to track the response size of an API endpoint.
- Create a dashboard that shows the 50th, 90th, and 99th percentiles of response times.
- Compare the results of using a summary versus a histogram for the same metric.
- Implement a summary with appropriate labels to track database query times by query type.
- Use the `rate()` function with summary metrics to analyze throughput and latency together.