Custom Metric Collection

Introduction

In the world of monitoring, predefined metrics can take you only so far. To gain deeper insights into your specific applications and services, you'll often need to create and collect custom metrics tailored to your unique use cases.

Custom metric collection in Prometheus allows you to instrument your code with specific measurements that matter to your application's performance, health, and business logic. Whether you're tracking user logins, order processing times, or memory usage patterns specific to your application, custom metrics provide the visibility you need.

In this guide, we'll explore how to define, implement, and collect custom metrics using Prometheus client libraries, understand best practices, and see real-world applications of custom metric collection.

Understanding Prometheus Metric Types

Before diving into creating custom metrics, let's understand the four fundamental metric types in Prometheus:

Counter

A counter is a cumulative metric that represents a single monotonically increasing value. Counters can only increase or be reset to zero (usually when the process restarts).

Use cases:

  • Number of requests processed
  • Number of errors
  • Total tasks completed

Gauge

A gauge represents a single numerical value that can arbitrarily go up and down.

Use cases:

  • Memory usage
  • Current temperature
  • Number of active connections

Histogram

A histogram samples observations and counts them in configurable buckets. It also provides a sum of all observed values.

Use cases:

  • Request durations
  • Response sizes
  • Latency measurements
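
When scraped, a histogram is exposed as a set of cumulative _bucket series (one per upper bound, plus +Inf), together with a _sum and a _count series. For a hypothetical myapp_request_duration_seconds histogram, the output on /metrics would look roughly like this (the numbers are illustrative):

myapp_request_duration_seconds_bucket{le="0.1"} 240
myapp_request_duration_seconds_bucket{le="0.5"} 310
myapp_request_duration_seconds_bucket{le="+Inf"} 320
myapp_request_duration_seconds_sum 45.7
myapp_request_duration_seconds_count 320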

Summary

Similar to a histogram, a summary samples observations. Instead of buckets, it calculates configurable quantiles over a sliding time window.

Use cases:

  • Request durations with quantile calculations
  • When you need precise percentile measurements
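
As a quick sketch, here is how a summary could be created with the Python client, using a hypothetical myapp_request_latency_seconds metric. Note that the official Python client does not calculate quantiles for summaries; it exposes only the _count and _sum series (the Go client supports client-side quantiles via Objectives).

python
from prometheus_client import Summary

# Summary tracking request handling time; exposes _count and _sum
REQUEST_LATENCY = Summary('myapp_request_latency_seconds',
                          'Time spent handling a request')

# The time() decorator observes the duration of every call
@REQUEST_LATENCY.time()
def handle_request():
    ...  # actual request handling goes here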

Creating Custom Metrics with Client Libraries

Prometheus offers client libraries for many programming languages. Let's explore how to create custom metrics in some popular ones:

Go

go
package main

import (
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // Counter example
    requestCounter := promauto.NewCounter(prometheus.CounterOpts{
        Name: "myapp_requests_total",
        Help: "The total number of processed requests",
    })

    // Gauge example
    connectionGauge := promauto.NewGauge(prometheus.GaugeOpts{
        Name: "myapp_active_connections",
        Help: "The current number of active connections",
    })

    // Histogram example
    durationHistogram := promauto.NewHistogram(prometheus.HistogramOpts{
        Name:    "myapp_request_duration_seconds",
        Help:    "Request duration distribution",
        Buckets: prometheus.LinearBuckets(0.01, 0.05, 10), // 10 buckets, from 0.01 to 0.46 seconds
    })

    // Simulate some metrics
    go func() {
        for {
            requestCounter.Inc()
            connectionGauge.Set(float64(100 + time.Now().Second()))
            durationHistogram.Observe(0.1 + float64(time.Now().Nanosecond())/1e9)
            time.Sleep(1 * time.Second)
        }
    }()

    // Expose metrics on the /metrics endpoint
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":2112", nil)
}

Python

python
from prometheus_client import start_http_server, Counter, Gauge, Histogram
import random
import time

# Create metrics
REQUEST_COUNT = Counter('myapp_requests_total', 'Total app requests')
ACTIVE_CONNECTIONS = Gauge('myapp_active_connections', 'Number of active connections')
REQUEST_DURATION = Histogram('myapp_request_duration_seconds',
                             'Request duration in seconds',
                             buckets=[0.01, 0.05, 0.1, 0.5, 1, 5])

# Start server
start_http_server(8000)

# Generate some metrics
while True:
    # Increment counter
    REQUEST_COUNT.inc()

    # Set gauge to random value
    connection_count = random.randint(80, 120)
    ACTIVE_CONNECTIONS.set(connection_count)

    # Observe histogram value
    duration = random.random() * 0.5
    REQUEST_DURATION.observe(duration)

    time.sleep(1)

Java

java
import io.prometheus.client.Counter;
import io.prometheus.client.Gauge;
import io.prometheus.client.Histogram;
import io.prometheus.client.exporter.HTTPServer;

import java.io.IOException;
import java.util.Random;

public class CustomMetricsExample {
    static final Counter requestCounter = Counter.build()
        .name("myapp_requests_total")
        .help("Total requests processed")
        .register();

    static final Gauge connectionGauge = Gauge.build()
        .name("myapp_active_connections")
        .help("Current number of active connections")
        .register();

    static final Histogram requestDuration = Histogram.build()
        .name("myapp_request_duration_seconds")
        .help("Request duration distribution")
        .buckets(0.01, 0.05, 0.1, 0.5, 1, 5)
        .register();

    public static void main(String[] args) throws IOException, InterruptedException {
        HTTPServer server = new HTTPServer(8000);
        Random random = new Random();

        while (true) {
            // Increment counter
            requestCounter.inc();

            // Update gauge
            connectionGauge.set(80 + random.nextInt(41));

            // Record histogram value
            requestDuration.observe(random.nextDouble() * 0.5);

            Thread.sleep(1000);
        }
    }
}

Node.js

javascript
const express = require('express');
const client = require('prom-client');
const app = express();

// Create a Registry to register the metrics
const register = new client.Registry();
client.collectDefaultMetrics({ register });

// Create custom metrics
const requestCounter = new client.Counter({
  name: 'myapp_requests_total',
  help: 'Total number of requests',
  registers: [register]
});

const connectionGauge = new client.Gauge({
  name: 'myapp_active_connections',
  help: 'Number of active connections',
  registers: [register]
});

const requestDuration = new client.Histogram({
  name: 'myapp_request_duration_seconds',
  help: 'Request duration distribution',
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 5],
  registers: [register]
});

// Simulate metrics
setInterval(() => {
  requestCounter.inc();
  connectionGauge.set(80 + Math.floor(Math.random() * 41));
  requestDuration.observe(Math.random() * 0.5);
}, 1000);

// Expose metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

app.listen(8000, () => {
  console.log('Server is running on http://localhost:8000');
});
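
Whichever language you choose, Prometheus still needs to be told where these endpoints live. A minimal scrape job for the examples above might look like the following, assuming the application and Prometheus run on the same host (the Go example listens on port 2112, the others on 8000):

yaml
scrape_configs:
  - job_name: 'myapp'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:8000']   # or localhost:2112 for the Go example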

Best Practices for Custom Metric Collection

1. Naming Conventions

Follow a consistent naming pattern for your metrics:

<namespace>_<subsystem>_<name>_<unit>

For example:

  • http_requests_total
  • node_memory_usage_bytes
  • api_request_duration_seconds

Counters should end in _total, and base units such as seconds or bytes should appear in the name so each metric is self-describing.

2. Labels and Cardinality

Use labels to add dimensions to your metrics, but be cautious about cardinality explosion:

python
# Good - Low cardinality
HTTP_REQUESTS = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'status_code', 'endpoint']
)

# Bad - High cardinality (user_id could have millions of values)
HTTP_REQUESTS = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'status_code', 'user_id']
)
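
Once a labelled metric is defined, every distinct combination of label values becomes its own time series. Incrementing the low-cardinality version above might look like this (the method, status code, and endpoint values are purely illustrative):

python
# Each distinct (method, status_code, endpoint) combination is a separate series
HTTP_REQUESTS.labels(method='GET', status_code='200', endpoint='/api/products').inc()
HTTP_REQUESTS.labels(method='POST', status_code='500', endpoint='/api/checkout').inc()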

3. Choosing the Right Metric Type

Select the appropriate metric type based on what you're measuring:

  • Use counters for events or totals
  • Use gauges for current values
  • Use histograms for distributions of values, especially latencies

4. Documentation

Always include comprehensive help text for each metric:

go
requestCounter := promauto.NewCounter(prometheus.CounterOpts{
    Name: "myapp_requests_total",
    Help: "The total number of HTTP requests processed, labeled by method, status code, and endpoint",
})

Real-World Custom Metric Collection Example

Let's build a more comprehensive example that monitors a fictional e-commerce application:

E-commerce Application Metrics

python
from prometheus_client import start_http_server, Counter, Gauge, Histogram, Summary
import random
import time

# Business metrics
CHECKOUT_COUNTER = Counter('ecommerce_checkouts_total',
                           'Total number of completed checkouts')

CART_ABANDONMENT = Counter('ecommerce_cart_abandonments_total',
                           'Total number of abandoned shopping carts')

PRODUCT_VIEWS = Counter('ecommerce_product_views_total',
                        'Product views',
                        ['product_category'])

# Technical metrics
API_REQUEST_DURATION = Histogram('ecommerce_api_request_duration_seconds',
                                 'API request duration in seconds',
                                 ['endpoint'],
                                 buckets=[0.01, 0.05, 0.1, 0.5, 1, 5])

DB_CONNECTION_POOL = Gauge('ecommerce_db_connections_active',
                           'Number of active database connections')

PAYMENT_PROCESSING_TIME = Summary('ecommerce_payment_processing_seconds',
                                  'Time spent processing payments')

# Start Prometheus HTTP server
start_http_server(8000)
print("Metrics available at http://localhost:8000/metrics")

# Simulate application activity
product_categories = ['electronics', 'clothing', 'home', 'food', 'toys']
api_endpoints = ['products', 'cart', 'checkout', 'user', 'search']

while True:
    # Simulate product views
    category = random.choice(product_categories)
    PRODUCT_VIEWS.labels(product_category=category).inc()

    # Simulate checkouts and cart abandonments
    if random.random() < 0.1:  # 10% checkout
        CHECKOUT_COUNTER.inc()
    if random.random() < 0.2:  # 20% abandon cart
        CART_ABANDONMENT.inc()

    # Simulate API requests with different durations
    endpoint = random.choice(api_endpoints)
    duration = 0.05 + (random.random() * 0.3)  # Between 0.05s and 0.35s
    API_REQUEST_DURATION.labels(endpoint=endpoint).observe(duration)

    # Simulate DB connection pool fluctuations
    connections = random.randint(5, 20)
    DB_CONNECTION_POOL.set(connections)

    # Simulate payment processing time
    if random.random() < 0.05:  # 5% of iterations process a payment
        payment_time = 0.5 + (random.random() * 2.0)  # Between 0.5s and 2.5s
        PAYMENT_PROCESSING_TIME.observe(payment_time)

    time.sleep(0.1)  # Generate metrics quickly for demonstration

Resulting Metrics

This example would generate metrics such as:

# HELP ecommerce_checkouts_total Total number of completed checkouts
# TYPE ecommerce_checkouts_total counter
ecommerce_checkouts_total 42

# HELP ecommerce_cart_abandonments_total Total number of abandoned shopping carts
# TYPE ecommerce_cart_abandonments_total counter
ecommerce_cart_abandonments_total 87

# HELP ecommerce_product_views_total Product views
# TYPE ecommerce_product_views_total counter
ecommerce_product_views_total{product_category="electronics"} 132
ecommerce_product_views_total{product_category="clothing"} 98
ecommerce_product_views_total{product_category="home"} 65
ecommerce_product_views_total{product_category="food"} 43
ecommerce_product_views_total{product_category="toys"} 54

# HELP ecommerce_api_request_duration_seconds API request duration in seconds
# TYPE ecommerce_api_request_duration_seconds histogram
...

# HELP ecommerce_db_connections_active Number of active database connections
# TYPE ecommerce_db_connections_active gauge
ecommerce_db_connections_active 12

# HELP ecommerce_payment_processing_seconds Time spent processing payments
# TYPE ecommerce_payment_processing_seconds summary
...

Visualizing Custom Metrics

Once you've collected custom metrics, you can create meaningful dashboards in Grafana. Here are some example PromQL queries for our e-commerce metrics:

  1. Checkout Conversion Rate:

    sum(rate(ecommerce_checkouts_total[5m])) / sum(rate(ecommerce_product_views_total[5m]))
  2. Cart Abandonment Rate:

    sum(rate(ecommerce_cart_abandonments_total[5m])) / (sum(rate(ecommerce_cart_abandonments_total[5m])) + sum(rate(ecommerce_checkouts_total[5m])))
  3. API Latency by Endpoint (95th Percentile):

    histogram_quantile(0.95, sum(rate(ecommerce_api_request_duration_seconds_bucket[5m])) by (endpoint, le))
  4. Top Product Categories by Views:

    topk(3, sum(rate(ecommerce_product_views_total[1h])) by (product_category))

Push vs. Pull for Custom Metrics

Prometheus primarily uses a pull model where the Prometheus server scrapes metrics endpoints. However, sometimes you need to push metrics:

When to Use Push Gateway

  • Short-lived jobs that may complete before scraping
  • Batch jobs
  • Systems behind firewalls without direct access

Example using the Push Gateway:

python
from prometheus_client import CollectorRegistry, Counter, push_to_gateway

# Use a dedicated registry so only this job's metrics are pushed
registry = CollectorRegistry()
job_completion = Counter('batch_job_completions_total',
                         'Number of completed batch jobs',
                         registry=registry)

# Do some work
job_completion.inc()

# Push to the Pushgateway
push_to_gateway('localhost:9091', job='batch_processor', registry=registry)
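
Metrics pushed this way remain in the Pushgateway until they are overwritten or explicitly removed. If a batch job's series should not linger after it finishes, they can be deleted with delete_from_gateway, as in this sketch (same job name as above):

python
from prometheus_client import delete_from_gateway

# Remove all metrics previously pushed under this job name
delete_from_gateway('localhost:9091', job='batch_processor')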

Implementing a Custom Collector

Sometimes you need to collect metrics from systems that don't support Prometheus directly. You can implement a custom collector:

python
from prometheus_client.core import GaugeMetricFamily, CounterMetricFamily, REGISTRY

class CustomCollector(object):
    def collect(self):
        # Yield metrics
        c = CounterMetricFamily('my_custom_counter', 'Description of counter', labels=['label1'])
        c.add_metric(['value1'], 15)
        yield c

        g = GaugeMetricFamily('my_custom_gauge', 'Description of gauge', labels=['label1'])
        g.add_metric(['value1'], 12.3)
        yield g

# Register the custom collector
REGISTRY.register(CustomCollector())

Working with Exporters

When you can't instrument your application directly, use or build an exporter that translates metrics from another system's format into the Prometheus exposition format.

Here's a simple example of a custom exporter for a legacy API:

python
import requests
import time
from prometheus_client import start_http_server, Gauge, Counter

# Define metrics
USERS_ONLINE = Gauge('legacy_users_online', 'Number of users currently online')
API_ERRORS = Counter('legacy_api_errors_total', 'Number of legacy API errors')

def scrape_legacy_api():
    try:
        # Call the legacy API
        response = requests.get('http://legacy-service/stats')

        if not response.ok:
            API_ERRORS.inc()
            return

        # Update Prometheus metrics from the response
        data = response.json()
        USERS_ONLINE.set(data['active_users'])
    except Exception:
        API_ERRORS.inc()

# Start server
start_http_server(8000)

# Main loop
while True:
    scrape_legacy_api()
    time.sleep(15)  # Scrape every 15 seconds

Advanced Custom Metrics

Let's look at some more advanced metrics concepts:

Multi-process Metrics Collection

When an application runs as multiple processes (for example, Gunicorn or uWSGI workers), each process keeps its own metric values, so they need to be aggregated before being exposed:

python
import os

# PROMETHEUS_MULTIPROC_DIR must point to a writable directory and must be set
# before any metrics are created; normally it is set in the environment before
# the application starts rather than in code.
os.environ.setdefault('PROMETHEUS_MULTIPROC_DIR', '/tmp')

from prometheus_client import Counter, CollectorRegistry, multiprocess, start_http_server

# Metrics are created as usual in each worker process
c = Counter('my_counter', 'My counter help')
c.inc()

# For exposition, aggregate the per-process values into a dedicated registry
registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)
start_http_server(8000, registry=registry)

Metric with Timestamps

The standard client objects do not accept explicit timestamps: Prometheus stamps samples at scrape time, and set_to_current_time() simply sets a gauge's value to the current Unix time. If you genuinely need to export samples with their own timestamps (for example, when relaying data recorded by another system), use a custom collector and pass the timestamp to add_metric:

python
import time
from prometheus_client.core import GaugeMetricFamily, REGISTRY

class TimestampedCollector(object):
    def collect(self):
        g = GaugeMetricFamily('my_gauge', 'Description', labels=['label'])
        # Attach an explicit Unix timestamp (in seconds) to the sample
        g.add_metric(['value'], 15, timestamp=time.time())
        yield g

REGISTRY.register(TimestampedCollector())

Summary

Custom metric collection in Prometheus provides a powerful way to gain deep insights into your applications and infrastructure. In this guide, we've covered:

  • The four types of Prometheus metrics: Counter, Gauge, Histogram, and Summary
  • How to implement custom metrics in various programming languages
  • Best practices for naming, labeling, and documenting metrics
  • Real-world examples of custom metrics for business and technical monitoring
  • Advanced topics like the Push Gateway, custom collectors, and exporters

By implementing custom metrics, you can:

  • Track business-relevant indicators
  • Measure technical performance
  • Create comprehensive monitoring dashboards
  • Set up meaningful alerts

Exercises

  1. Basic Instrumentation: Add custom metrics to an existing application to track:

    • Number of API requests
    • Response times
    • Error rates
  2. Business Metrics: Design and implement metrics for a fictional e-commerce site that would help answer these questions:

    • What's the conversion rate?
    • Which products are most viewed?
    • When do we experience the most traffic?
  3. Grafana Dashboard: Create a Grafana dashboard showing your custom metrics with:

    • A graph of request rates over time
    • A heatmap of request durations
    • A gauge showing current active users
    • A table of top endpoints by request count
  4. Custom Exporter: Build a simple exporter that collects system information not available through the node exporter and exposes it as Prometheus metrics.


