Node Exporter

Introduction

Node Exporter is a critical component in the Prometheus monitoring ecosystem that enables system-level metrics collection from Unix-like operating systems. As part of the Prometheus Exporters family, Node Exporter specifically focuses on hardware and OS metrics such as CPU usage, memory, disk I/O, filesystem fullness, and network statistics.

Unlike application-specific exporters, Node Exporter provides visibility into the underlying infrastructure running your applications. This makes it an essential tool for identifying performance bottlenecks, predicting resource exhaustion, and understanding system behavior during incidents.

What is Node Exporter?

Node Exporter is an official Prometheus exporter designed to expose a wide variety of hardware and kernel-related metrics about the host machine. The name "Node" refers to a host machine or node in your infrastructure.

Node Exporter runs as a daemon on the target systems you want to monitor, collecting metrics that aren't available to Prometheus by default. These metrics are exposed via an HTTP endpoint (typically on port 9100) that Prometheus can scrape at regular intervals.
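To make the scrape format more concrete, here is a minimal Python sketch that parses a few lines of the text exposition format into (name, labels, value) tuples. The sample payload and the function name parse_samples are illustrative assumptions, and the parser deliberately ignores timestamps, escaped quotes, and other details that a real client library would handle.

```python
import re

# Hypothetical excerpt of what GET /metrics might return; the values are made up.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_cpu_seconds_total{cpu="0",mode="user"} 234.5
"""

LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples.

    Simplified: skips comment lines, ignores timestamps and escaped quotes.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a plain "name{labels} value" line
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_samples(SAMPLE):
    print(name, labels, value)
```

Each non-comment line is one sample: a metric name, an optional set of labels in braces, and a numeric value.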

Key Features of Node Exporter

Comprehensive metrics collection: Collects hundreds of metrics across various subsystems
Modular collector design: Lets you enable or disable individual collectors as needed
Low resource footprint: Minimal impact on system performance
Cross-platform support: Works on various Unix-like systems (Linux, FreeBSD, macOS, etc.)
Standardized metric naming: Follows Prometheus naming conventions

Installation and Setup

Prerequisites

A Unix-like operating system (Linux, FreeBSD, Darwin, etc.)
Root or sudo access (for some collectors)
Basic understanding of Prometheus concepts

Installing Node Exporter on Linux

Method 1: Using Binary Release

Download a release from the Prometheus downloads page. This example uses version 1.6.1; substitute the current version number as needed:

wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz

Extract the downloaded archive:

tar xvfz node_exporter-1.6.1.linux-amd64.tar.gz

Move to the extracted directory:

cd node_exporter-1.6.1.linux-amd64

Run Node Exporter:

./node_exporter

You should see output similar to:

level=info ts=2023-06-16T14:30:25.042Z caller=node_exporter.go:182 msg="Starting node_exporter" version="1.6.1"
level=info ts=2023-06-16T14:30:25.042Z caller=node_exporter.go:183 msg="Build context" build_context="go=1.20.4 user=root date=20230607-15:47:21 sha=9a51a674eb32454e9aa91855e2a03cb1"
level=info ts=2023-06-16T14:30:25.042Z caller=node_exporter.go:185 msg="Enabled collectors"
level=info ts=2023-06-16T14:30:25.042Z caller=node_exporter.go:197 collector=arp
level=info ts=2023-06-16T14:30:25.042Z caller=node_exporter.go:197 collector=bcache
...
level=info ts=2023-06-16T14:30:25.043Z caller=tls_config.go:195 msg="TLS is disabled." http2=false
level=info ts=2023-06-16T14:30:25.043Z caller=node_exporter.go:1375 msg="Listening on" address=:9100
level=info ts=2023-06-16T14:30:25.043Z caller=node_exporter.go:1376 msg="Listening on" address=[::]:9100

Method 2: Using Docker

If you prefer using Docker, you can run Node Exporter as a container:

docker run -d \
  --net="host" \
  --pid="host" \
  -v "/:/host:ro,rslave" \
  --name node_exporter \
  prom/node-exporter:latest \
  --path.rootfs=/host

Method 3: Using Package Managers

On Debian/Ubuntu:

sudo apt-get update
sudo apt-get install prometheus-node-exporter

On RHEL/CentOS:

sudo yum install prometheus-node-exporter

Running Node Exporter as a Service

For production environments, it's recommended to run Node Exporter as a systemd service:

Create a Node Exporter user:

sudo useradd --no-create-home --shell /bin/false node_exporter

Move the binary to a standard location:

sudo cp node_exporter /usr/local/bin/
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter

Create a systemd service file:

sudo nano /etc/systemd/system/node_exporter.service

Add the following content:

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter

Check the status:

sudo systemctl status node_exporter

Configuring Prometheus to Scrape Node Exporter

Once Node Exporter is running, you need to configure Prometheus to scrape metrics from it:

Open your Prometheus configuration file (prometheus.yml):

sudo nano /etc/prometheus/prometheus.yml

Add a scrape configuration for Node Exporter:

scrape_configs:
  # Other scrape configs...

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']

If you're monitoring multiple nodes, you can add multiple targets:

scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets:
        - 'node1.example.com:9100'
        - 'node2.example.com:9100'
        - 'node3.example.com:9100'
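If the host list lives somewhere else, such as an inventory file, the multi-target block can be generated rather than typed by hand. The following Python sketch renders the same YAML as above from a list of hostnames. The function name render_scrape_config is hypothetical, and the sketch assumes every host exposes Node Exporter on the same port.

```python
def render_scrape_config(job_name, hosts, port=9100):
    """Render a scrape_configs entry as YAML text for the given hosts."""
    lines = [
        "scrape_configs:",
        f"  - job_name: '{job_name}'",
        "    static_configs:",
        "      - targets:",
    ]
    for host in hosts:
        # One target per host, all on the same Node Exporter port.
        lines.append(f"        - '{host}:{port}'")
    return "\n".join(lines) + "\n"


hosts = ["node1.example.com", "node2.example.com", "node3.example.com"]
print(render_scrape_config("node_exporter", hosts), end="")
```

The output is plain text, so it should be reviewed and merged into prometheus.yml by hand rather than overwriting an existing configuration.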

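Reload Prometheus to apply the changes. The HTTP reload endpoint is available only when Prometheus was started with the --web.enable-lifecycle flag; otherwise, restart the service or send the process a SIGHUP: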

curl -X POST http://localhost:9090/-/reload

Understanding Node Exporter Metrics

Node Exporter exposes a wide variety of metrics. Here are some of the most important categories:

CPU Metrics

node_cpu_seconds_total: Seconds the CPUs spent in each mode
node_load1, node_load5, node_load15: System load averages

Memory Metrics

node_memory_MemTotal_bytes: Total memory
node_memory_MemFree_bytes: Free memory
node_memory_MemAvailable_bytes: Available memory

Disk Metrics

node_filesystem_avail_bytes: Filesystem space available
node_filesystem_size_bytes: Filesystem size
node_disk_io_time_seconds_total: Total seconds spent doing I/O

Network Metrics

node_network_receive_bytes_total: Network bytes received
node_network_transmit_bytes_total: Network bytes transmitted
node_network_up: Network interface up (1) or down (0)

Working with Node Exporter Metrics

Let's explore some practical examples of how to use Node Exporter metrics with PromQL (Prometheus Query Language).

Example 1: CPU Usage Percentage

To calculate the CPU usage percentage:

100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)

This query:

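Takes the per-second rate of increase in idle CPU time over 1 minute, for each core
Averages that rate across all cores of each instance
Multiplies by 100 to get an idle percentage
Subtracts from 100 to get the usage percentage rather than the idle percentage

The range window must contain at least two samples, so use a wider window such as [5m] if your scrape interval is 30 seconds or longer.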
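The arithmetic behind this query can be reproduced by hand. The Python sketch below is a simplified model, not how Prometheus computes rate() internally: it takes the idle counter of each core at the start and end of a window and applies the same steps. The function name and the counter values are made up for illustration.

```python
def cpu_usage_percent(idle_start, idle_end, elapsed_seconds):
    """Approximate 100 - avg(rate(idle)) * 100 for one instance.

    idle_start / idle_end map a CPU id to its idle counter (in seconds)
    at the start and end of the window.
    """
    # Per-core rate: idle seconds gained per second of wall-clock time (0..1).
    idle_rates = [
        (idle_end[cpu] - idle_start[cpu]) / elapsed_seconds
        for cpu in idle_start
    ]
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100 - avg_idle * 100


# Made-up counters for a two-core host, sampled 60 seconds apart.
start = {"0": 1000.0, "1": 2000.0}
end = {"0": 1045.0, "1": 2030.0}
print(cpu_usage_percent(start, end, 60))  # 37.5
```

Core 0 was idle for 45 of the 60 seconds (0.75) and core 1 for 30 (0.5), so the average idle fraction is 0.625 and the usage is 37.5%.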

Example 2: Memory Usage Percentage

To calculate memory usage percentage:

(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

Example 3: Disk Space Usage Percentage

For disk space usage percentage:

(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100
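The memory and disk queries share the same shape: one minus the available fraction, times 100. As a quick sanity check, this Python sketch applies that formula to made-up numbers. The helper name usage_percent is an assumption for illustration only.

```python
def usage_percent(available_bytes, total_bytes):
    """Mirror (1 - available / total) * 100 from the queries above."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return (1 - available_bytes / total_bytes) * 100


GIB = 1024 ** 3

# Made-up host: 16 GiB of RAM with 4 GiB available.
print(usage_percent(4 * GIB, 16 * GIB))   # 75.0

# Made-up filesystem: 80 GiB in size with 10 GiB available.
print(usage_percent(10 * GIB, 80 * GIB))  # 87.5
```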

Example 4: Network I/O

For network traffic rate:

rate(node_network_receive_bytes_total[5m])

rate(node_network_transmit_bytes_total[5m])
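These metrics are counters, so they only ever increase until the host reboots and they restart from zero. The Python sketch below shows, in simplified form, how a per-second rate can be derived from raw counter samples while tolerating such a reset. It is not the exact rate() algorithm (it does no extrapolation to the window edges), and the sample values are made up.

```python
def simple_rate(samples):
    """Per-second increase over a list of (timestamp_seconds, counter_value).

    Simplified relative to PromQL's rate(): a drop in the counter is treated
    as a reset to zero, and no extrapolation to the window edges is done.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # After a reset the counter restarts from 0, so count curr in full.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Made-up receive counter scraped every 15 s; the host reboots before t=45.
samples = [
    (0, 1_000_000),
    (15, 1_150_000),
    (30, 1_300_000),
    (45, 50_000),
    (60, 200_000),
]
print(simple_rate(samples))  # bytes per second
```

The total increase here is 500,000 bytes over 60 seconds. Without the reset handling, the drop at t=45 would be counted as a large negative step.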

Enabling/Disabling Specific Collectors

Node Exporter has a modular architecture with different collectors for various metrics. You can enable or disable specific collectors based on your needs.

To see available collectors:

./node_exporter --help

To run Node Exporter with only specific collectors:

./node_exporter --collector.disable-defaults --collector.cpu --collector.meminfo --collector.loadavg

To run Node Exporter with all default collectors except a few:

./node_exporter --no-collector.hwmon --no-collector.mdadm

Common Collectors

Collector	Description	Default
cpu	CPU statistics	Enabled
diskstats	Disk I/O statistics	Enabled
filesystem	Filesystem statistics	Enabled
loadavg	System load average	Enabled
meminfo	Memory statistics	Enabled
netdev	Network interface statistics	Enabled
netstat	Network statistics from /proc/net/netstat	Enabled
stat	Kernel statistics from /proc/stat	Enabled
time	Current system time	Enabled
uname	System information	Enabled
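When the collector selection is managed by a deployment script, the two flag patterns shown earlier can be built programmatically. This Python sketch assembles the argument list from collector names. The helper name collector_flags is hypothetical, and only the flag forms already used in this section are generated.

```python
def collector_flags(enabled=None, disabled=None):
    """Build node_exporter arguments for the two patterns shown above.

    enabled:  run only these collectors (turns off all defaults first).
    disabled: keep the defaults but switch these collectors off.
    """
    if enabled and disabled:
        raise ValueError("choose either an allow-list or a deny-list")
    if enabled:
        return ["--collector.disable-defaults"] + [
            f"--collector.{name}" for name in enabled
        ]
    return [f"--no-collector.{name}" for name in (disabled or [])]


only = collector_flags(enabled=["cpu", "meminfo", "loadavg"])
print(" ".join(["./node_exporter"] + only))

without = collector_flags(disabled=["hwmon"])
print(" ".join(["./node_exporter"] + without))
```

The first call reproduces the allow-list command from this section; the second keeps the defaults and turns off a single collector.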

Creating Dashboards with Node Exporter Metrics

Grafana is commonly used to visualize Prometheus metrics. Here's a simple example of how to create a basic system monitoring dashboard:

Add Prometheus as a data source in Grafana
Create a new dashboard
Add panels for key metrics:
- CPU Usage
- Memory Usage
- Disk Usage
- Network I/O

For example, to create a CPU usage panel:

Add a new panel

Use this query:

100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)

Set the visualization type to Graph or Gauge
Set appropriate thresholds (e.g., yellow at 70%, red at 90%)

Creating Alerts with Node Exporter Metrics

You can set up alerts based on Node Exporter metrics to be notified of potential issues:

Example Alert Rule in Prometheus

groups:
- name: node_exporter_alerts
  rules:
  - alert: HighCPULoad
    expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High CPU load (instance {{ $labels.instance }})"
      description: "CPU load is above 80% for 5 minutes\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

  - alert: HighMemoryLoad
    expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High memory usage (instance {{ $labels.instance }})"
      description: "Memory usage is above 80% for 5 minutes\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

  - alert: DiskSpaceRunningOut
    expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 20
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Disk space running out (instance {{ $labels.instance }})"
      description: "Disk space is below 20% for 5 minutes\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
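The for: 5m clause in these rules means a condition must stay true across consecutive evaluations for five minutes before the alert fires; until then the alert is pending. The Python sketch below is a simplified model of that lifecycle, with a hypothetical function name and made-up evaluation results, and is not Prometheus's actual implementation.

```python
def alert_states(evaluations, for_seconds):
    """Yield (timestamp, state) for each (timestamp, condition_is_true) pair.

    States follow the usual lifecycle: inactive -> pending -> firing.
    """
    pending_since = None
    for timestamp, is_true in evaluations:
        if not is_true:
            # Any false evaluation resets the pending timer.
            pending_since = None
            yield timestamp, "inactive"
            continue
        if pending_since is None:
            pending_since = timestamp
        held_for = timestamp - pending_since
        yield timestamp, "firing" if held_for >= for_seconds else "pending"


# Made-up results of evaluating the rule expression once per minute.
evaluations = [
    (0, False), (60, True), (120, True), (180, False),
    (240, True), (300, True), (360, True), (420, True),
    (480, True), (540, True),
]
for timestamp, state in alert_states(evaluations, for_seconds=300):
    print(timestamp, state)
```

The brief dip at t=180 resets the timer, so the alert only fires at t=540, five minutes after the condition became true again at t=240. This is why short spikes do not trigger notifications.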

Advanced Node Exporter Usage

Custom Textfile Collector

Node Exporter includes a "textfile" collector that can read metrics from files in a directory. This allows you to extend Node Exporter with custom metrics:

Create a directory for the textfile collector:

sudo mkdir -p /var/lib/node_exporter/textfile_collector

Run Node Exporter with the textfile collector pointed at that directory:

./node_exporter --collector.textfile.directory=/var/lib/node_exporter/textfile_collector

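Create a file with metrics in the Prometheus text format. The file name must end in .prom, and writing into this directory needs root privileges, hence sudo tee:

echo '# HELP custom_metric_example Example of a custom metric
# TYPE custom_metric_example gauge
custom_metric_example{label="example"} 1.0' | sudo tee /var/lib/node_exporter/textfile_collector/custom_metrics.prom > /dev/null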

Prometheus will now scrape these custom metrics along with the standard Node Exporter metrics.
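Scripts that refresh such a file on a schedule should avoid exposing a half-written file to a scrape. One common approach, sketched below in Python, is to write to a temporary file in the same directory and then rename it into place. The function name write_textfile_metric is hypothetical, and the demo writes to a throwaway temporary directory rather than the real collector path.

```python
import os
import tempfile


def write_textfile_metric(directory, filename, name, help_text, value, labels=None):
    """Write one gauge to <directory>/<filename> via a temp file and rename."""
    label_str = ""
    if labels:
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    body = (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name}{label_str} {value}\n"
    )
    # Write to a temp file in the same directory, then rename it into place,
    # so a scrape never reads a half-written file. The .tmp suffix keeps the
    # collector from picking it up, since it only reads *.prom files.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(body)
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; the exporter user must read it
    final_path = os.path.join(directory, filename)
    os.replace(tmp_path, final_path)
    return final_path


# Demo against a throwaway directory rather than the real collector path.
demo_dir = tempfile.mkdtemp()
path = write_textfile_metric(
    demo_dir, "custom_metrics.prom",
    "custom_metric_example", "Example of a custom metric", 1.0,
    labels={"label": "example"},
)
with open(path) as handle:
    print(handle.read(), end="")
```

In real use, the directory argument would be the path passed to --collector.textfile.directory, and the script would run from cron or a systemd timer with permission to write there.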

Monitoring NVIDIA GPUs

If you need to monitor NVIDIA GPUs, you can run NVIDIA's DCGM exporter alongside Node Exporter. It listens on port 9400 by default:

docker run --privileged --rm -e NVIDIA_VISIBLE_DEVICES=all -p 9400:9400 nvcr.io/nvidia/k8s/dcgm-exporter:3.1.7-3.1.4-ubuntu20.04

Then add it to your Prometheus configuration:

scrape_configs:
  - job_name: 'nvidia_gpu'
    static_configs:
      - targets: ['localhost:9400']

Troubleshooting Node Exporter

Common Issues and Solutions

Node Exporter won't start
- Check for permission issues
- Verify the binary is executable
- Check if another process is using port 9100

Metrics not appearing in Prometheus
- Verify Node Exporter is running: curl http://localhost:9100/metrics
- Check Prometheus configuration
- Check network connectivity and firewall rules

High resource usage
- Disable collectors you don't need
- Increase scrape interval in Prometheus

Debugging Tips

Run Node Exporter with debug logging:

./node_exporter --log.level=debug

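Check which collectors are enabled. The startup log lists them, and each active collector also reports a node_scrape_collector_success series:

curl -s http://localhost:9100/metrics | grep node_scrape_collector_success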

Test the metrics endpoint manually:

curl http://localhost:9100/metrics
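The same check can be scripted. Node Exporter reports a node_scrape_collector_success series for each active collector, with a value of 1 when the last collection succeeded and 0 when it failed. The Python sketch below extracts that status from the metrics text. The function names and the example excerpt are illustrative assumptions; fetch_metrics is provided for use against a live exporter but is not called here, so the example runs offline.

```python
import re
import urllib.request

SUCCESS_RE = re.compile(
    r'^node_scrape_collector_success\{collector="([^"]+)"\}\s+(\S+)$'
)


def collector_status(metrics_text):
    """Map collector name -> True/False from node_scrape_collector_success lines."""
    status = {}
    for line in metrics_text.splitlines():
        match = SUCCESS_RE.match(line.strip())
        if match:
            status[match.group(1)] = float(match.group(2)) == 1.0
    return status


def fetch_metrics(url="http://localhost:9100/metrics", timeout=5.0):
    """Download the raw metrics page; raises OSError if it is unreachable."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Made-up excerpt so the example runs without a live exporter.
EXAMPLE = """\
node_scrape_collector_success{collector="cpu"} 1
node_scrape_collector_success{collector="hwmon"} 0
node_load1 0.42
"""
status = collector_status(EXAMPLE)
print(sorted(name for name, ok in status.items() if not ok))  # ['hwmon']
```

Against a running exporter, pass fetch_metrics() to collector_status instead of EXAMPLE. A collector that is listed but failing usually points to a permission or kernel-support problem on that host.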

Node Exporter Architecture Diagram

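Node Exporter sits between the host and Prometheus: its collectors read hardware and kernel statistics on each monitored machine and expose them at an HTTP endpoint (port 9100 by default). Prometheus scrapes that endpoint at regular intervals and stores the results, which Grafana dashboards and alert rules then query.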

Summary

Node Exporter is a powerful tool for monitoring system-level metrics in Prometheus. Key takeaways include:

Node Exporter provides comprehensive hardware and OS metrics collection
It's easy to install and configure with minimal resource overhead
The modular collector design allows for customization based on your needs
Node Exporter metrics can be used to build effective dashboards and alerts
The textfile collector enables extending Node Exporter with custom metrics

By monitoring system-level metrics with Node Exporter, you can gain valuable insights into your infrastructure's health and performance, enabling proactive monitoring and faster troubleshooting.

Exercises

Install Node Exporter on a test system and configure Prometheus to scrape it.
Create a Grafana dashboard showing CPU, memory, disk, and network metrics.
Set up alert rules for high CPU usage, low disk space, and high memory usage.
Use the textfile collector to create a custom metric that tracks the number of users logged into the system.
Compare the resource usage of different collectors and determine which ones you need for your specific monitoring requirements.
