Node Exporter
Introduction
Node Exporter is a critical component in the Prometheus monitoring ecosystem that enables system-level metrics collection from Unix-like operating systems. As part of the Prometheus Exporters family, Node Exporter specifically focuses on hardware and OS metrics such as CPU usage, memory, disk I/O, filesystem fullness, and network statistics.
Unlike application-specific exporters, Node Exporter provides visibility into the underlying infrastructure running your applications. This makes it an essential tool for identifying performance bottlenecks, predicting resource exhaustion, and understanding system behavior during incidents.
What is Node Exporter?
Node Exporter is an official Prometheus exporter designed to expose a wide variety of hardware and kernel-related metrics about the host machine. The name "Node" refers to a host machine or node in your infrastructure.
Node Exporter runs as a daemon on the target systems you want to monitor, collecting metrics that aren't available to Prometheus by default. These metrics are exposed via an HTTP endpoint (typically on port 9100) that Prometheus can scrape at regular intervals.
Key Features of Node Exporter
- Comprehensive metrics collection: Collects hundreds of metrics across various subsystems
- Modular collector design: Lets you enable or disable specific collectors based on your needs
- Low resource footprint: Minimal impact on system performance
- Cross-platform support: Works on various Unix-like systems (Linux, FreeBSD, macOS, etc.)
- Standardized metric naming: Follows Prometheus naming conventions
Installation and Setup
Prerequisites
- A Unix-like operating system (Linux, FreeBSD, Darwin, etc.)
- Root or sudo access (for some collectors)
- Basic understanding of Prometheus concepts
Installing Node Exporter on Linux
Method 1: Using Binary Release
- Download a release archive from the Prometheus downloads page or the GitHub releases page (v1.6.1 is used here; substitute the current version):
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
- Extract the downloaded archive:
tar xvfz node_exporter-1.6.1.linux-amd64.tar.gz
- Move to the extracted directory:
cd node_exporter-1.6.1.linux-amd64
- Run Node Exporter:
./node_exporter
You should see output similar to:
level=info ts=2023-06-16T14:30:25.042Z caller=node_exporter.go:182 msg="Starting node_exporter" version="1.6.1"
level=info ts=2023-06-16T14:30:25.042Z caller=node_exporter.go:183 msg="Build context" build_context="go=1.20.4 user=root date=20230607-15:47:21 sha=9a51a674eb32454e9aa91855e2a03cb1"
level=info ts=2023-06-16T14:30:25.042Z caller=node_exporter.go:185 msg="Enabled collectors"
level=info ts=2023-06-16T14:30:25.042Z caller=node_exporter.go:197 collector=arp
level=info ts=2023-06-16T14:30:25.042Z caller=node_exporter.go:197 collector=bcache
...
level=info ts=2023-06-16T14:30:25.043Z caller=tls_config.go:195 msg="TLS is disabled." http2=false
level=info ts=2023-06-16T14:30:25.043Z caller=node_exporter.go:1375 msg="Listening on" address=:9100
level=info ts=2023-06-16T14:30:25.043Z caller=node_exporter.go:1376 msg="Listening on" address=[::]:9100
Method 2: Using Docker
If you prefer using Docker, you can run Node Exporter as a container:
docker run -d \
  --net="host" \
  --pid="host" \
  -v "/:/host:ro,rslave" \
  --name node_exporter \
  prom/node-exporter:latest \
  --path.rootfs=/host
Method 3: Using Package Managers
On Debian/Ubuntu:
sudo apt-get update
sudo apt-get install prometheus-node-exporter
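On RHEL/CentOS (the package is not in the base repositories; its name and availability depend on which additional repository, such as EPEL, you have enabled):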
sudo yum install prometheus-node-exporter
Running Node Exporter as a Service
For production environments, it's recommended to run Node Exporter as a systemd service:
- Create a Node Exporter user:
sudo useradd --no-create-home --shell /bin/false node_exporter
- Move the binary to a standard location:
sudo cp node_exporter /usr/local/bin/
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
- Create a systemd service file:
sudo nano /etc/systemd/system/node_exporter.service
- Add the following content:
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
- Enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
- Check the status:
sudo systemctl status node_exporter
Configuring Prometheus to Scrape Node Exporter
Once Node Exporter is running, you need to configure Prometheus to scrape metrics from it:
- Open your Prometheus configuration file (prometheus.yml):
sudo nano /etc/prometheus/prometheus.yml
- Add a scrape configuration for Node Exporter:
scrape_configs:
  # Other scrape configs...
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']
- If you're monitoring multiple nodes, you can add multiple targets:
scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets:
          - 'node1.example.com:9100'
          - 'node2.example.com:9100'
          - 'node3.example.com:9100'
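- Reload Prometheus to apply the changes. The HTTP reload endpoint only works if Prometheus was started with --web.enable-lifecycle; otherwise restart the service or send the process a SIGHUP: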
curl -X POST http://localhost:9090/-/reload
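When the list of nodes grows, the multi-target scrape configuration shown above can be generated instead of typed by hand. The following is a minimal sketch: the function name `render_scrape_config` is hypothetical, and it only renders a standalone fragment, so merging it into an existing prometheus.yml is left to you.

```python
def render_scrape_config(hosts, job_name="node_exporter", port=9100):
    """Return a scrape_configs YAML fragment with one target per host.

    Hypothetical helper: it builds the text by hand and does not
    validate host names or merge with an existing configuration.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    lines = [
        "scrape_configs:",
        f"  - job_name: '{job_name}'",
        "    static_configs:",
        "      - targets:",
    ]
    for host in hosts:
        # Each target is host:port, quoted as in the examples above.
        lines.append(f"          - '{host}:{port}'")
    return "\n".join(lines) + "\n"
```

Calling `render_scrape_config(["node1.example.com", "node2.example.com"])` returns text in the same layout as the multi-target example, which you can review before pasting it into your configuration.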
Understanding Node Exporter Metrics
Node Exporter exposes a wide variety of metrics. Here are some of the most important categories:
CPU Metrics
- node_cpu_seconds_total: Seconds the CPUs spent in each mode
- node_load1, node_load5, node_load15: System load averages
Memory Metrics
- node_memory_MemTotal_bytes: Total memory
- node_memory_MemFree_bytes: Free memory
- node_memory_MemAvailable_bytes: Available memory
Disk Metrics
- node_filesystem_avail_bytes: Filesystem space available
- node_filesystem_size_bytes: Filesystem size
- node_disk_io_time_seconds_total: Total seconds spent doing I/O
Network Metrics
- node_network_receive_bytes_total: Network bytes received
- node_network_transmit_bytes_total: Network bytes transmitted
- node_network_up: Network interface up (1) or down (0)
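All of these metrics are served as plain text, one sample per line, with optional labels in braces. As a rough illustration of that format, the sketch below parses a few lines into (name, labels, value) tuples. The function `parse_samples` is hypothetical and handles only simple cases: it skips comment lines, ignores timestamps, and does not unescape label values.

```python
import re

# Metric name, optional {labels}, then the sample value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair such as mode="idle".
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a line this simplified parser understands
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

For real workloads an established client library parser is a better choice; this sketch is only meant to make the line format concrete.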
Working with Node Exporter Metrics
Let's explore some practical examples of how to use Node Exporter metrics with PromQL (Prometheus Query Language).
Example 1: CPU Usage Percentage
To calculate the CPU usage percentage:
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)
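This query:
- Takes the per-second rate of increase in idle CPU time over 1 minute, which is the fraction of time each core spent idle
- Averages that fraction across all cores of each instance
- Multiplies by 100 to get a percentage
- Subtracts from 100 to get the usage percentage rather than idle percentage
The range window must contain at least two samples, so widen it (for example to [5m]) if your scrape interval is close to or longer than one minute.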
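To make the arithmetic behind the CPU query concrete, here is a small sketch that applies the same steps to two snapshots of the idle counter. The function name and the input shape (a dict of core id to idle seconds) are assumptions for illustration; Prometheus performs this calculation internally over the samples in the range window.

```python
def cpu_usage_percent(idle_before, idle_after, elapsed_seconds):
    """Estimate CPU usage from two readings of the idle-seconds counter.

    idle_before / idle_after map a core id to its idle counter value
    (hypothetical input shape chosen for this example).
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # Per-core idle fraction: idle seconds gained per wall-clock second.
    idle_rates = [
        (idle_after[cpu] - idle_before[cpu]) / elapsed_seconds
        for cpu in idle_before
    ]
    # Average across cores, as avg by (instance) does in the query.
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100 - avg_idle * 100
```

With two cores that gained 45 and 30 idle seconds over a 60-second window, the idle fractions are 0.75 and 0.5, the average is 0.625, and the usage works out to 37.5%.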
Example 2: Memory Usage Percentage
To calculate memory usage percentage:
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
Example 3: Disk Space Usage Percentage
For disk space usage percentage:
(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100
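The memory and disk queries share the same shape: one minus the available fraction, times 100. The sketch below spells that out for plain numbers; `used_percent` is a hypothetical helper and the byte counts in the usage note are made up.

```python
def used_percent(available_bytes, total_bytes):
    """Return the used share as a percentage: (1 - available / total) * 100."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return (1 - available_bytes / total_bytes) * 100
```

For a host with 16 GiB of memory and 4 GiB available, `used_percent(4 * 2**30, 16 * 2**30)` gives 75.0. The same function applied to node_filesystem_avail_bytes and node_filesystem_size_bytes values gives the disk usage percentage.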
Example 4: Network I/O
For network traffic rate:
rate(node_network_receive_bytes_total[5m])
rate(node_network_transmit_bytes_total[5m])
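Byte counters restart from zero when a host reboots or an interface is recreated, and rate() accounts for such resets. The following simplified sketch shows the idea on a list of (timestamp, value) samples. It is an approximation under stated assumptions: the real rate() function also extrapolates to the edges of the range window, which is omitted here, and `simple_rate` is a hypothetical name.

```python
def simple_rate(samples):
    """Per-second increase of a counter, treating any drop as a reset.

    samples is a list of (timestamp_seconds, counter_value), oldest first.
    Returns None when there are too few samples to compute a rate.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The counter went backwards, so assume it restarted at zero.
            increase += current
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed
```

For the samples (0, 1000), (15, 2500), (30, 4000), (45, 500), (60, 2000), the drop at 45 seconds is treated as a reset, so the total increase is 1500 + 1500 + 500 + 1500 = 5000 bytes over 60 seconds.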
Enabling/Disabling Specific Collectors
Node Exporter has a modular architecture with different collectors for various metrics. You can enable or disable specific collectors based on your needs.
To see available collectors:
./node_exporter --help
To run Node Exporter with only specific collectors:
./node_exporter --collector.disable-defaults --collector.cpu --collector.meminfo --collector.loadavg
To run Node Exporter with all default collectors except a few:
./node_exporter --no-collector.arp --no-collector.hwmon
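If you manage these flags from a deployment script, the two styles above (an allow-list built on --collector.disable-defaults, or a deny-list of --no-collector flags) can be produced by one small helper. This is a sketch with a hypothetical function name; it only assembles the argument list and does not check that the collector names exist.

```python
def collector_args(only=None, disable=None):
    """Build node_exporter collector flags.

    only: collectors to enable exclusively (all defaults are turned off).
    disable: default collectors to switch off.
    The two modes are mutually exclusive in this sketch.
    """
    only = list(only or [])
    disable = list(disable or [])
    if only and disable:
        raise ValueError("use either 'only' or 'disable', not both")
    if only:
        return ["--collector.disable-defaults"] + [
            f"--collector.{name}" for name in only
        ]
    return [f"--no-collector.{name}" for name in disable]
```

For example, `collector_args(only=["cpu", "meminfo", "loadavg"])` yields the same flags as the allow-list command above, ready to be appended to the ExecStart line or passed to a process launcher.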
Common Collectors
Collector | Description | Default |
---|---|---|
cpu | CPU statistics | Enabled |
diskstats | Disk I/O statistics | Enabled |
filesystem | Filesystem statistics | Enabled |
loadavg | System load average | Enabled |
meminfo | Memory statistics | Enabled |
netdev | Network interface statistics | Enabled |
netstat | Network statistics from /proc/net/netstat | Enabled |
stat | Kernel statistics from /proc/stat | Enabled |
time | Current system time | Enabled |
uname | System information | Enabled |
Creating Dashboards with Node Exporter Metrics
Grafana is commonly used to visualize Prometheus metrics. Here's a simple example of how to create a basic system monitoring dashboard:
- Add Prometheus as a data source in Grafana
- Create a new dashboard
- Add panels for key metrics:
  - CPU Usage
  - Memory Usage
  - Disk Usage
  - Network I/O
For example, to create a CPU usage panel:
- Add a new panel
- Use this query:
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)
- Set the visualization type to Time series (called Graph in older Grafana versions) or Gauge
- Set appropriate thresholds (e.g., yellow at 70%, red at 90%)
Creating Alerts with Node Exporter Metrics
You can set up alerts based on Node Exporter metrics to be notified of potential issues:
Example Alert Rule in Prometheus
groups:
  - name: node_exporter_alerts
    rules:
      - alert: HighCPULoad
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU load (instance {{ $labels.instance }})"
          description: "CPU load is above 80% for 5 minutes\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
      - alert: HighMemoryLoad
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage (instance {{ $labels.instance }})"
          description: "Memory usage is above 80% for 5 minutes\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
      - alert: DiskSpaceRunningOut
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 20
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Disk space running out (instance {{ $labels.instance }})"
          description: "Disk space is below 20% for 5 minutes\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
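The for: 5m clause in each rule means the condition must hold continuously for five minutes before the alert fires; until then the alert is only pending, and a single evaluation below the threshold resets the timer. The sketch below imitates that behaviour for a series of evaluations. The function name and the state strings are chosen for this example and are not part of any Prometheus API.

```python
def alert_states(samples, threshold, for_seconds):
    """Return the alert state at each evaluation.

    samples is a list of (timestamp_seconds, value), oldest first.
    The condition modelled here is value > threshold.
    """
    states = []
    active_since = None
    for timestamp, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = timestamp  # condition just became true
            if timestamp - active_since >= for_seconds:
                states.append("firing")
            else:
                states.append("pending")
        else:
            active_since = None  # any dip resets the timer
            states.append("inactive")
    return states
```

With one evaluation per minute, a threshold of 80, and a 300-second hold, a CPU series that dips below 80% once after three minutes starts its five-minute wait again from the next breach rather than firing on schedule.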
Advanced Node Exporter Usage
Custom Textfile Collector
Node Exporter includes a "textfile" collector that can read metrics from files in a directory. This allows you to extend Node Exporter with custom metrics:
- Create a directory for the textfile collector:
sudo mkdir -p /var/lib/node_exporter/textfile_collector
- Run Node Exporter with the textfile collector pointed at that directory:
./node_exporter --collector.textfile.directory=/var/lib/node_exporter/textfile_collector
- Create a file with metrics in the Prometheus text format. The file name must end in .prom, and writing under /var/lib requires root:
sudo tee /var/lib/node_exporter/textfile_collector/custom_metrics.prom > /dev/null <<'EOF'
# HELP custom_metric_example Example of a custom metric
# TYPE custom_metric_example gauge
custom_metric_example{label="example"} 1.0
EOF
- Prometheus will now scrape these custom metrics along with the standard Node Exporter metrics.
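Scripts that refresh a textfile metric on a schedule should avoid leaving a half-written file for the collector to read. A common approach is to write to a temporary file in the same directory and then rename it into place. The sketch below does that for a single gauge; the function name and parameters are hypothetical, and label values are written without escaping.

```python
import os
import tempfile


def write_textfile_metric(directory, filename, name, help_text, value, labels=None):
    """Atomically write one gauge to a .prom file and return the text written."""
    label_str = ""
    if labels:
        pairs = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    body = (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name}{label_str} {value}\n"
    )
    # Write to a temporary file first; the .tmp suffix keeps it from
    # being picked up as a .prom file while it is incomplete.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(body)
        os.chmod(tmp_path, 0o644)  # readable by the node_exporter user
        os.replace(tmp_path, os.path.join(directory, filename))
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return body
```

Run from cron or a systemd timer, a call such as `write_textfile_metric("/var/lib/node_exporter/textfile_collector", "custom_metrics.prom", "custom_metric_example", "Example of a custom metric", 1.0, {"label": "example"})` reproduces the file from the step above.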
Monitoring NVIDIA GPUs
If you need to monitor NVIDIA GPUs, you can run NVIDIA's DCGM Exporter alongside Node Exporter. It serves metrics on port 9400 by default and requires the NVIDIA container runtime on the host:
docker run --privileged --rm -e NVIDIA_VISIBLE_DEVICES=all -p 9400:9400 nvcr.io/nvidia/k8s/dcgm-exporter:3.1.7-3.1.4-ubuntu20.04
Then add it to your Prometheus configuration:
scrape_configs:
  - job_name: 'nvidia_gpu'
    static_configs:
      - targets: ['localhost:9400']
Troubleshooting Node Exporter
Common Issues and Solutions
- Node Exporter won't start
  - Check for permission issues
  - Verify the binary is executable
  - Check if another process is using port 9100
- Metrics not appearing in Prometheus
  - Verify Node Exporter is running:
    curl http://localhost:9100/metrics
  - Check Prometheus configuration
  - Check network connectivity and firewall rules
- High resource usage
  - Disable collectors you don't need
  - Increase scrape interval in Prometheus
Debugging Tips
- Run Node Exporter with debug logging:
./node_exporter --log.level=debug
- Check which collectors are enabled and scraping successfully (the startup log also lists every enabled collector):
curl -s http://localhost:9100/metrics | grep node_scrape_collector_success
- Test the metrics endpoint manually:
curl http://localhost:9100/metrics
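The manual check can also be scripted, for instance to confirm that the metrics your dashboards rely on are actually being exported. The sketch below separates fetching from checking so the check can be exercised on saved output. Both function names are hypothetical, and the default URL assumes Node Exporter is listening locally on port 9100.

```python
import urllib.request


def fetch_metrics_text(url="http://localhost:9100/metrics", timeout=5.0):
    """Download the raw metrics page (assumes the exporter is reachable)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def missing_metrics(text, expected):
    """Return the expected metric names that do not appear in the text."""
    present = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first '{' or whitespace.
        parts = line.split("{", 1)[0].split(None, 1)
        if parts:
            present.add(parts[0])
    return sorted(set(expected) - present)
```

A typical use is `missing_metrics(fetch_metrics_text(), ["node_cpu_seconds_total", "node_memory_MemAvailable_bytes"])`, which returns an empty list when both metrics are present.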
Node Exporter Architecture Diagram
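Node Exporter fits into the Prometheus ecosystem as follows: it runs on each monitored host and exposes its metrics over HTTP on port 9100, Prometheus scrapes those endpoints at regular intervals and evaluates alert rules against the collected data, and Grafana queries Prometheus to render dashboards.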
Summary
Node Exporter is a powerful tool for monitoring system-level metrics in Prometheus. Key takeaways include:
- Node Exporter provides comprehensive hardware and OS metrics collection
- It's easy to install and configure with minimal resource overhead
- The modular collector design allows for customization based on your needs
- Node Exporter metrics can be used to build effective dashboards and alerts
- The textfile collector enables extending Node Exporter with custom metrics
By monitoring system-level metrics with Node Exporter, you can gain valuable insights into your infrastructure's health and performance, enabling proactive monitoring and faster troubleshooting.
Additional Resources
- Official Node Exporter GitHub Repository
- Prometheus Documentation
- Grafana Dashboard for Node Exporter
Exercises
- Install Node Exporter on a test system and configure Prometheus to scrape it.
- Create a Grafana dashboard showing CPU, memory, disk, and network metrics.
- Set up alert rules for high CPU usage, low disk space, and high memory usage.
- Use the textfile collector to create a custom metric that tracks the number of users logged into the system.
- Compare the resource usage of different collectors and determine which ones you need for your specific monitoring requirements.