Prometheus Custom Collectors
Introduction
In your monitoring journey with Prometheus, you'll eventually encounter scenarios where the default exporters don't provide the specific metrics you need. This is where Custom Collectors come into play. Custom collectors allow you to define and implement your own metrics collection logic, enabling you to monitor virtually any aspect of your applications or systems.
In this guide, we'll explore how to create custom collectors in Prometheus, understand their internal workings, and implement practical examples that demonstrate their real-world applications.
What Are Custom Collectors?
A collector in Prometheus is a component responsible for gathering specific metrics. While Prometheus provides many ready-to-use exporters (like the Node Exporter for hardware and OS metrics), custom collectors let you define exactly what and how to measure.
Custom collectors implement the Collector interface, which requires methods to:
- Describe the metrics being collected
- Collect the current values of those metrics
When to Use Custom Collectors
You might need custom collectors when:
- You need to monitor a system without an existing exporter
- You want to instrument your application with business-specific metrics
- You need to collect metrics from multiple sources and present them in a unified way
- The default exporters don't provide the granularity or specific metrics you need
Creating Basic Custom Collectors
Let's start by understanding how to implement a custom collector in Go, the language Prometheus itself is written in and the one used for every example in this guide.
The Collector Interface
In the Go client library (client_golang), every collector must implement two methods:
type Collector interface {
// Describe sends the super-set of all possible descriptors of metrics
Describe(chan<- *Desc)
// Collect is called by the Prometheus registry when collecting metrics
Collect(chan<- Metric)
}
A Simple Example: System Uptime Collector
Let's create a custom collector that reports system uptime (which might not be available in your environment through standard exporters):
package main
import (
"fmt"
"log"
"net/http"
"os/exec"
"strconv"
"strings"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promhttp"
)
// UptimeCollector implements the Collector interface
type UptimeCollector struct {
uptimeMetric *prometheus.Desc
}
// NewUptimeCollector creates a new UptimeCollector
func NewUptimeCollector() *UptimeCollector {
return &UptimeCollector{
uptimeMetric: prometheus.NewDesc(
"system_uptime_seconds",
"Current system uptime in seconds",
nil, nil,
),
}
}
// Describe implements the prometheus.Collector interface
func (c *UptimeCollector) Describe(ch chan<- *prometheus.Desc) {
ch <- c.uptimeMetric
}
// Collect implements the prometheus.Collector interface
func (c *UptimeCollector) Collect(ch chan<- prometheus.Metric) {
// Read /proc/uptime (Linux only) by running cat; the first field is the uptime in seconds
cmd := exec.Command("cat", "/proc/uptime")
output, err := cmd.Output()
if err != nil {
log.Printf("Error reading /proc/uptime: %v", err)
return
}
// Parse the output to get uptime in seconds
uptimeString := strings.Split(string(output), " ")[0]
uptime, err := strconv.ParseFloat(uptimeString, 64)
if err != nil {
log.Printf("Error parsing uptime: %v", err)
return
}
// Create a metric with the uptime value
ch <- prometheus.MustNewConstMetric(
c.uptimeMetric,
prometheus.GaugeValue,
uptime,
)
}
func main() {
// Create a new registry
reg := prometheus.NewRegistry()
// Create and register our custom collector
collector := NewUptimeCollector()
reg.MustRegister(collector)
// Expose metrics on /metrics endpoint
http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
fmt.Println("Starting server on :8080")
log.Fatal(http.ListenAndServe(":8080", nil))
}
When you run this code and access http://localhost:8080/metrics, you'll see output similar to:
# HELP system_uptime_seconds Current system uptime in seconds
# TYPE system_uptime_seconds gauge
system_uptime_seconds 345678.45
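The parsing step inside Collect can also be pulled out into a small pure function, which makes it testable without a real /proc filesystem. The following is a minimal sketch under that assumption; the name parseUptime is hypothetical and not part of the collector above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseUptime extracts the first whitespace-separated field of
// /proc/uptime-style content, which holds the seconds since boot
// (for example "345678.45 1234567.89").
func parseUptime(contents string) (float64, error) {
	fields := strings.Fields(contents)
	if len(fields) == 0 {
		return 0, fmt.Errorf("uptime content is empty")
	}
	return strconv.ParseFloat(fields[0], 64)
}

func main() {
	uptime, err := parseUptime("345678.45 1234567.89\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("uptime: %.2f seconds\n", uptime)
}
```

Using strings.Fields rather than splitting on a single space tolerates trailing newlines and repeated whitespace, and the length check avoids indexing into an empty result.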
Understanding Metric Types in Custom Collectors
When creating custom collectors, you'll need to choose the appropriate metric type for each measurement. Prometheus supports four main metric types:
- Counter: A value that only increases (e.g., number of requests processed)
- Gauge: A value that can go up and down (e.g., current memory usage)
- Histogram: Samples observations and counts them in configurable buckets (e.g., request durations)
- Summary: Similar to a histogram, but computes configurable quantiles on the client side (e.g., 95th percentile of request durations)
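The "only increases" rule for counters matters because consumers of a counter usually interpret a drop in value as a process restart. The sketch below is a simplified illustration of that idea, not Prometheus's actual implementation; the function name increase is hypothetical.

```go
package main

import "fmt"

// increase returns how much a counter grew across consecutive samples.
// A sample lower than its predecessor is treated as a reset to zero,
// so the new value itself counts as growth since the reset.
func increase(samples []float64) float64 {
	total := 0.0
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1], samples[i]
		if cur >= prev {
			total += cur - prev
		} else {
			total += cur // counter restarted
		}
	}
	return total
}

func main() {
	fmt.Println(increase([]float64{1000, 1040, 1042})) // steady growth
	fmt.Println(increase([]float64{1000, 1040, 12}))   // drop after 1040 is read as a restart
}
```

A collector that reports a counter which sometimes decreases for other reasons would therefore produce misleading growth figures; a gauge is the safer choice for such values.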
Let's explore a more complex example that uses different metric types.
Advanced Example: Database Connection Pool Collector
Monitoring a database connection pool is a common requirement. Let's create a custom collector for a fictional database pool:
package main
import (
"fmt"
"log"
"math/rand"
"net/http"
"time"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promhttp"
)
// DBPoolCollector collects metrics about a database connection pool
type DBPoolCollector struct {
activeConnections *prometheus.Desc
maxConnections *prometheus.Desc
connectionsCreated *prometheus.Desc
queryDuration *prometheus.Desc
}
// NewDBPoolCollector creates a new DBPoolCollector
func NewDBPoolCollector() *DBPoolCollector {
return &DBPoolCollector{
activeConnections: prometheus.NewDesc(
"db_pool_connections_active",
"The number of active connections in the database pool",
[]string{"db_name"}, nil,
),
maxConnections: prometheus.NewDesc(
"db_pool_connections_max",
"The maximum number of connections allowed",
[]string{"db_name"}, nil,
),
connectionsCreated: prometheus.NewDesc(
"db_pool_connections_created_total",
"The total number of connections created",
[]string{"db_name"}, nil,
),
queryDuration: prometheus.NewDesc(
"db_pool_query_duration_seconds",
"The duration of database queries in seconds",
[]string{"db_name", "query_type"}, nil,
),
}
}
// Describe implements the prometheus.Collector interface
func (c *DBPoolCollector) Describe(ch chan<- *prometheus.Desc) {
ch <- c.activeConnections
ch <- c.maxConnections
ch <- c.connectionsCreated
ch <- c.queryDuration
}
// Collect implements the prometheus.Collector interface
func (c *DBPoolCollector) Collect(ch chan<- prometheus.Metric) {
// In a real scenario, these would come from your actual DB pool
// For this example, we'll simulate the values
// Simulate active connections (gauge)
activeConns := float64(rand.Intn(100))
ch <- prometheus.MustNewConstMetric(
c.activeConnections,
prometheus.GaugeValue,
activeConns,
"production_db", // label value for db_name
)
// Maximum connections (gauge)
ch <- prometheus.MustNewConstMetric(
c.maxConnections,
prometheus.GaugeValue,
200,
"production_db",
)
// Total connections created (counter)
// In a real implementation, this must be a cumulative value that never
// decreases; the random value below is only a placeholder
ch <- prometheus.MustNewConstMetric(
c.connectionsCreated,
prometheus.CounterValue,
1000 + float64(rand.Intn(100)),
"production_db",
)
// Query durations for different query types (histogram data)
// In a real scenario, you would have actual timing data
queryTypes := []string{"select", "insert", "update", "delete"}
for _, queryType := range queryTypes {
var baseDuration float64
switch queryType {
case "select":
baseDuration = 0.05
case "insert":
baseDuration = 0.1
case "update":
baseDuration = 0.15
case "delete":
baseDuration = 0.12
}
// Add some randomness to the durations
duration := baseDuration + rand.Float64()*0.1
ch <- prometheus.MustNewConstMetric(
c.queryDuration,
prometheus.GaugeValue, // In a real implementation, you might use a Histogram
duration,
"production_db", queryType,
)
}
}
func main() {
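// Seed the random number generator (only needed before Go 1.20; newer
// versions seed the global generator automatically and deprecate rand.Seed)
rand.Seed(time.Now().UnixNano())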
// Create a new registry
reg := prometheus.NewRegistry()
// Create and register our custom collector
collector := NewDBPoolCollector()
reg.MustRegister(collector)
// Expose metrics on /metrics endpoint
http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
fmt.Println("Starting server on :8080")
log.Fatal(http.ListenAndServe(":8080", nil))
}
When this code runs, it will expose metrics like:
# HELP db_pool_connections_active The number of active connections in the database pool
# TYPE db_pool_connections_active gauge
db_pool_connections_active{db_name="production_db"} 87
# HELP db_pool_connections_max The maximum number of connections allowed
# TYPE db_pool_connections_max gauge
db_pool_connections_max{db_name="production_db"} 200
# HELP db_pool_connections_created_total The total number of connections created
# TYPE db_pool_connections_created_total counter
db_pool_connections_created_total{db_name="production_db"} 1042
# HELP db_pool_query_duration_seconds The duration of database queries in seconds
# TYPE db_pool_query_duration_seconds gauge
db_pool_query_duration_seconds{db_name="production_db",query_type="select"} 0.123
db_pool_query_duration_seconds{db_name="production_db",query_type="insert"} 0.187
db_pool_query_duration_seconds{db_name="production_db",query_type="update"} 0.226
db_pool_query_duration_seconds{db_name="production_db",query_type="delete"} 0.144
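The query duration above is exposed as a gauge holding a single sample. To report it as a histogram instead, a collector needs cumulative bucket counts, a total count, and a sum. In client_golang, prometheus.MustNewConstHistogram accepts exactly those three values, with the buckets given as a map from upper bound to cumulative count. The sketch below shows only the bucket arithmetic, using the standard library; the function name, sample durations, and bucket bounds are hypothetical.

```go
package main

import "fmt"

// cumulativeBuckets counts, for each upper bound, how many observations
// are less than or equal to it. It also returns the total count and sum.
func cumulativeBuckets(observations, bounds []float64) (map[float64]uint64, uint64, float64) {
	buckets := make(map[float64]uint64, len(bounds))
	for _, b := range bounds {
		buckets[b] = 0
	}
	var sum float64
	for _, o := range observations {
		sum += o
		for _, b := range bounds {
			if o <= b {
				buckets[b]++
			}
		}
	}
	return buckets, uint64(len(observations)), sum
}

func main() {
	durations := []float64{0.03, 0.07, 0.12, 0.4} // hypothetical query timings in seconds
	bounds := []float64{0.05, 0.1, 0.25}          // hypothetical bucket upper bounds
	buckets, count, sum := cumulativeBuckets(durations, bounds)
	for _, b := range bounds {
		fmt.Printf("le=%g: %d\n", b, buckets[b])
	}
	fmt.Printf("count=%d sum=%g\n", count, sum)
}
```

Because the counts are cumulative, an observation of 0.03 is counted in every bucket whose bound is at least 0.03. The implicit +Inf bucket equals the total count, so it does not need an entry in the map.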
Using Labels in Custom Collectors
As you've seen in the previous example, labels provide a powerful way to add dimensions to your metrics. They allow you to:
- Categorize metrics (e.g., by database name, server instance, or query type)
- Query and filter metrics in Prometheus expressions
- Create more targeted alerts and dashboards
When designing your custom collectors, carefully consider which labels to include:
- Use labels for dimensions that are important for querying and alerting
- Avoid high-cardinality labels (e.g., user IDs or timestamps) as they can impact Prometheus performance
- Keep label names and values consistent across related metrics
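To see why high-cardinality labels are risky, it helps to estimate the worst-case number of time series a single metric can produce, which is the product of the number of distinct values of each label. The following sketch does that arithmetic; the function name and the value counts are hypothetical.

```go
package main

import "fmt"

// seriesUpperBound multiplies the number of distinct values per label to
// give the worst-case number of time series for a single metric name.
func seriesUpperBound(valuesPerLabel map[string]int) int {
	total := 1
	for _, n := range valuesPerLabel {
		total *= n
	}
	return total
}

func main() {
	// Hypothetical counts: 3 databases and 4 query types.
	fmt.Println(seriesUpperBound(map[string]int{"db_name": 3, "query_type": 4}))
	// The same metric with a hypothetical user_id label holding 100000 values.
	fmt.Println(seriesUpperBound(map[string]int{"db_name": 3, "query_type": 4, "user_id": 100000}))
}
```

The first call yields 12 series, the second 1200000, which illustrates how a single unbounded label dominates the total.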
Registering Custom Collectors
There are two main ways to register your custom collectors:
1. Register directly with a registry:
reg := prometheus.NewRegistry()
collector := NewMyCustomCollector()
reg.MustRegister(collector)
2. Use the default registry:
collector := NewMyCustomCollector()
prometheus.MustRegister(collector)
Using a custom registry is useful when you want to expose different sets of metrics on different endpoints or when you want to control exactly which metrics are exposed.
Real-World Applications
Let's look at some practical scenarios where custom collectors are valuable:
1. External API Monitoring
// APIHealthCollector monitors external API health and response times
type APIHealthCollector struct {
apiHealth *prometheus.Desc
apiResponseTime *prometheus.Desc
}
func (c *APIHealthCollector) Collect(ch chan<- prometheus.Metric) {
// Check multiple APIs
apis := map[string]string{
"payment_gateway": "https://payment.example.com/health",
"auth_service": "https://auth.example.com/health",
"data_service": "https://data.example.com/health",
}
for name, url := range apis {
// Measure response time
startTime := time.Now()
resp, err := http.Get(url)
duration := time.Since(startTime).Seconds()
// Record response time
ch <- prometheus.MustNewConstMetric(
c.apiResponseTime,
prometheus.GaugeValue,
duration,
name,
)
// Record health status (1 = healthy, 0 = unhealthy)
var health float64 = 0
if err == nil {
resp.Body.Close() // release the underlying connection
if resp.StatusCode == http.StatusOK {
health = 1
}
}
ch <- prometheus.MustNewConstMetric(
c.apiHealth,
prometheus.GaugeValue,
health,
name,
)
}
}
2. File System Monitoring
// FileSystemCollector monitors specific directories
type FileSystemCollector struct {
directorySize *prometheus.Desc
fileCount *prometheus.Desc
}
func (c *FileSystemCollector) Collect(ch chan<- prometheus.Metric) {
// Monitor critical directories
directories := []string{"/var/log", "/tmp", "/var/lib/mysql"}
for _, dir := range directories {
// Get directory size and file count
var size int64 = 0
var count int64 = 0
err := filepath.Walk(dir, func(_ string, info os.FileInfo, err error) error {
if err != nil {
return err
}
if !info.IsDir() {
size += info.Size()
count++
}
return nil
})
if err == nil {
ch <- prometheus.MustNewConstMetric(
c.directorySize,
prometheus.GaugeValue,
float64(size),
dir,
)
ch <- prometheus.MustNewConstMetric(
c.fileCount,
prometheus.GaugeValue,
float64(count),
dir,
)
}
}
}
3. Business Metrics Collector
Business metrics are often overlooked but can provide valuable insights:
// BusinessMetricsCollector collects business-related metrics
type BusinessMetricsCollector struct {
activeUsers *prometheus.Desc
conversionRate *prometheus.Desc
averageOrderValue *prometheus.Desc
}
func (c *BusinessMetricsCollector) Collect(ch chan<- prometheus.Metric) {
// In a real application, these would come from your database or analytics service
// Simulate active users count from different regions
regions := []string{"north_america", "europe", "asia", "other"}
for _, region := range regions {
var baseUsers float64
switch region {
case "north_america":
baseUsers = 5000
case "europe":
baseUsers = 3500
case "asia":
baseUsers = 4200
case "other":
baseUsers = 1800
}
activeUsers := baseUsers + rand.Float64()*500
ch <- prometheus.MustNewConstMetric(
c.activeUsers,
prometheus.GaugeValue,
activeUsers,
region,
)
}
// Conversion rates for different product categories
categories := []string{"electronics", "clothing", "home_goods", "food"}
for _, category := range categories {
var baseRate float64
switch category {
case "electronics":
baseRate = 0.032
case "clothing":
baseRate = 0.045
case "home_goods":
baseRate = 0.028
case "food":
baseRate = 0.067
}
conversionRate := baseRate + (rand.Float64()-0.5)*0.01
ch <- prometheus.MustNewConstMetric(
c.conversionRate,
prometheus.GaugeValue,
conversionRate,
category,
)
// Average order values
var baseOrderValue float64
switch category {
case "electronics":
baseOrderValue = 250
case "clothing":
baseOrderValue = 85
case "home_goods":
baseOrderValue = 120
case "food":
baseOrderValue = 45
}
orderValue := baseOrderValue * (1 + (rand.Float64()-0.5)*0.2)
ch <- prometheus.MustNewConstMetric(
c.averageOrderValue,
prometheus.GaugeValue,
orderValue,
category,
)
}
}
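In a real application, derived values such as conversion rate and average order value would be computed from raw totals rather than simulated. The sketch below shows one way to do that division safely; the ratio helper and the sample totals are hypothetical, and returning 0 for an empty denominator is only one possible choice (skipping the sample is another).

```go
package main

import "fmt"

// ratio returns numerator/denominator, or 0 when the denominator is 0,
// so a quiet period does not produce NaN or +Inf samples.
func ratio(numerator, denominator float64) float64 {
	if denominator == 0 {
		return 0
	}
	return numerator / denominator
}

func main() {
	// Hypothetical raw totals for one product category.
	visits, orders, revenue := 2000.0, 64.0, 16000.0
	fmt.Println("conversion rate:", ratio(orders, visits))
	fmt.Println("average order value:", ratio(revenue, orders))
}
```

With these totals the conversion rate is 0.032 and the average order value is 250, matching the base values used for the electronics category above.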
Best Practices for Custom Collectors
When implementing custom collectors, follow these best practices:
- Naming Conventions: Follow Prometheus naming conventions
  - Use lowercase with underscores (snake_case)
  - Include relevant units and suffixes (e.g., _seconds, _bytes, and _total for counters)
  - Use consistent prefixes for related metrics
- Performance Considerations:
  - Keep metric collection lightweight; heavy operations can impact your application
  - Implement caching for expensive operations
  - Consider timeouts for external dependencies
- Error Handling:
  - Handle errors gracefully in the Collect method
  - Log issues but don't block metric collection if one metric fails
  - Provide fallback values when appropriate
- Documentation:
  - Add helpful descriptions to your metrics
  - Document the meaning of labels
  - Include unit information in the metric name or description
- Testing:
  - Write unit tests for your collectors
  - Simulate edge cases and error conditions
  - Test the performance impact of your collectors
Flow Diagram of a Custom Collector
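Custom collectors fit into the Prometheus ecosystem as follows: the Prometheus server scrapes the /metrics endpoint, the HTTP handler asks the registry to gather metrics, the registry calls Collect on each registered collector, and the resulting metrics are returned to the server in the text exposition format.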
Summary
Custom collectors are a powerful feature of Prometheus that allow you to extend its monitoring capabilities to virtually any system or application. By implementing the Collector interface, you can create metrics tailored to your specific needs, whether they're technical metrics like system performance or business metrics like conversion rates.
In this guide, we've explored:
- What custom collectors are and when to use them
- How to implement the Collector interface
- Different metric types and their appropriate uses
- Using labels to add dimensions to your metrics
- Real-world examples of custom collectors
- Best practices for designing and implementing collectors
Custom collectors provide the flexibility you need to build a comprehensive monitoring solution that covers not just standard system metrics, but also application-specific and business-relevant metrics that can give you deeper insights into your systems.
Exercises
- Create a custom collector that monitors the number of files in a specific directory and their total size.
- Extend the database connection pool collector to include metrics about query errors and slow queries.
- Implement a custom collector for a third-party API that your application depends on, tracking response times and error rates.
- Create a collector that provides business metrics relevant to your application domain (e.g., user registrations, active sessions, or transaction values).
- Design a custom collector that combines data from multiple sources into a single coherent set of metrics.