Nagios Plugins
Plugin: scripts.d.plugin Module: nagios
Overview
This collector runs Nagios-compatible plugins and custom scripts. It provides:
- Check state monitoring — tracks whether each check returns OK, WARNING, CRITICAL, or UNKNOWN
- Execution metrics — measures run duration, CPU time, and memory usage of each check
- Automatic performance data charts — any Nagios performance data in the check output is parsed and charted automatically
- Threshold-based alerting — when performance data includes warning/critical thresholds, Netdata derives threshold state and creates built-in alerts
Netdata executes each configured command on a schedule, reads the process exit code to determine the check state, and parses the standard output for a status message and optional performance data. Any performance data is automatically converted into charts.
You can use packaged Nagios plugins or write your own scripts — any executable that follows the Nagios plugin output format will work.
Nagios Plugin Output Format
A Nagios-compatible plugin communicates through two channels: the process exit code and standard output. For the full specification, see the Nagios Plugin Development Guidelines.
Exit Codes
The exit code is the only thing that determines the check state — the output text is for display purposes only.
| Exit Code | State | Meaning |
|---|---|---|
| 0 | OK | Check passed |
| 1 | WARNING | Above warning threshold or degraded |
| 2 | CRITICAL | Above critical threshold or service down |
| 3 | UNKNOWN | Invalid arguments or internal error |
Standard Output
The output follows this structure:
```
STATUS TEXT | perfdata1=val;warn;crit;min;max perfdata2=val
LONG OUTPUT LINE 1
LONG OUTPUT LINE 2 | more_perfdata=val
```
| Part | Description |
|---|---|
| Status text | Text before the pipe on the first line. Shown as the job's status message. |
| Performance data | Text after the pipe on any line. Parsed into charts automatically. |
| Long output | Lines 2+ before the pipe. Additional detail text. |
Note: The pipe separator is optional. Without it, the entire first line is the status text and no performance data charts are created.
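The output structure above can be sketched as a runnable check. Everything here (disk names, counts, thresholds) is invented for illustration; a real plugin would `exit 1` for WARNING, while this sketch returns the state from a function so the script is easy to test:

```shell
#!/bin/sh
# Sketch of a check emitting a status line, long output, and perfdata
# on two lines. All names and values are made up.
check_disks() {
    # First line: status text before the pipe, perfdata after it.
    echo "WARNING - 2 of 10 disks above 80% | disks_warn=2;1;5;0;10"
    # Lines 2+: long output; perfdata after a pipe here is parsed too.
    echo "sda1 is 84% full"
    echo "sdb1 is 91% full | worst_disk_pct=91%;80;95;0;100"
    return 1   # a real plugin would: exit 1
}

if check_disks; then state=0; else state=$?; fi
echo "WARNING maps to exit code $state"
```

Netdata would show the first line as the status message, keep lines 2 and 3 as long output, and chart both disks_warn and worst_disk_pct.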
Performance Data Format
Each performance data metric uses this format:
```
'label'=value[UOM];[warn];[crit];[min];[max]
```
| Field | Required | Description |
|---|---|---|
| label | Yes | Metric name. Quote with single quotes if it contains spaces. |
| value | Yes | Numeric value. |
| UOM | No | Unit of measurement (see table below). |
| warn | No | Warning threshold range. |
| crit | No | Critical threshold range. |
| min | No | Minimum possible value. |
| max | No | Maximum possible value. |
Separate multiple metrics with spaces.
Supported Units of Measurement (UOM):
| UOM | Meaning | How Netdata charts it |
|---|---|---|
| (none) | Unitless number | Charted as-is |
| s | Seconds (also ms, us, ns) | Normalized to seconds |
| % | Percentage | Charted as percentage |
| B | Bytes (also KB, MB, GB, TB) | Charted in bytes |
| b | Bits (also Kb, Mb, Gb, Tb) | Charted in bits |
| c | Continuous counter | Charted as incremental rate |
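One perfdata line can mix these UOMs freely. A sketch (every name and value is invented): load is unitless, latency uses ms (normalized to seconds), cache uses B for bytes, and requests uses c for a continuous counter.

```shell
#!/bin/sh
# Illustrative perfdata line mixing several UOMs. Values are made up;
# latency also carries warn (100) and crit (250) thresholds.
emit_perfdata() {
    echo "OK - sample service | load=0.42 latency=12ms;100;250;0; cache=10485760B;;;0; requests=1234c"
}

emit_perfdata
```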
Threshold Ranges
Thresholds use the format [@]start:end, where a bare number like 10 is shorthand for 0:10 and ~ represents negative infinity (no lower bound). An alert triggers when the value falls outside the range (or inside with the @ prefix):
| Range | Alert when... |
|---|---|
| 10 | value < 0 or value > 10 |
| 10: | value < 10 |
| ~:10 | value > 10 |
| 10:20 | value < 10 or value > 20 |
| @10:20 | 10 ≤ value ≤ 20 |
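The range semantics above can be sketched as a small shell function. This is a simplified illustration, not the collector's implementation: it assumes integer values and only the common forms shown in the table ("10", "10:", "~:10", "10:20", "@10:20"):

```shell
#!/bin/sh
# Prints "alert" when the value should alert for the given range, "ok" otherwise.
range_state() {
    value=$1 range=$2 inside=0
    # A leading "@" inverts the logic: alert when the value is INSIDE the range.
    case $range in @*) inside=1; range=${range#@} ;; esac
    # Split into start:end; a bare number N is shorthand for 0:N.
    case $range in
        *:*) start=${range%%:*}; end=${range#*:} ;;
        *)   start=0; end=$range ;;
    esac
    if [ -z "$start" ]; then start="~"; fi   # "~" means no lower bound
    outside=0
    if [ "$start" != "~" ] && [ "$value" -lt "$start" ]; then outside=1; fi
    if [ -n "$end" ] && [ "$value" -gt "$end" ]; then outside=1; fi
    # Without "@": alert when outside. With "@": alert when inside.
    if [ "$outside" -ne "$inside" ]; then echo alert; else echo ok; fi
}

range_state 15 "10"      # -> alert (outside 0:10)
range_state 5  "10:"     # -> alert (below the floor)
range_state 15 "10:20"   # -> ok (inside the band)
range_state 15 "@10:20"  # -> alert (inside, inverted by @)
```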
When warn and crit ranges are provided on non-counter metrics, Netdata automatically derives a threshold state (ok / warning / critical) and creates charts with built-in alerts.
Common threshold patterns:
| I want to alert when... | warn | crit |
|---|---|---|
| Value exceeds a limit (e.g., response time > 2s) | ~:2 | ~:5 |
| Value drops below a floor (e.g., free space < 10%) | 10: | 5: |
| Value is outside a band (e.g., temperature 20–80) | 20:80 | 10:90 |
Example
A minimal Nagios-compatible script:
```sh
#!/bin/sh
echo "OK - 85% free memory | free_pct=85%;20:;10:;0;100 used_kb=2380912KB;;;0;16380000"
exit 0
```
This produces:
- Check state: OK (exit code 0)
- Status text: OK - 85% free memory
- Charts: free_pct (percentage with warning/critical thresholds) and used_kb (bytes)
Retry behavior: When a check returns a non-OK state, Netdata does not alert immediately. The check enters a soft state and retries at the retry_interval rate. Only after max_check_attempts consecutive failures does it become a hard state and trigger alerts. If the check recovers during retries, it returns to OK without alerting. The retry dimension on state charts indicates a soft state is in progress.
This collector is supported on all platforms.
This collector supports collecting metrics from multiple instances of this integration, including remote instances.
No additional permissions are required by the collector itself. If a check needs access to protected files, sockets, or system commands, provide that access to the check command or helper it uses.
Default Behavior
Auto-Detection
No automatic detection is performed. Add one or more jobs explicitly and point each job to the script or executable you want Netdata to run.
Limits
Each job runs one configured command. Additional charts are created only when the check emits Nagios performance data.
Performance Impact
Each job starts an external command. The impact depends mostly on how often the job runs and how expensive the check command itself is.
Setup
Prerequisites
Install check commands
Install the Nagios plugins or other Nagios-compatible scripts that you want Netdata to run.
Most Linux distributions provide Nagios plugin packages:
```sh
# Debian/Ubuntu
apt install nagios-plugins

# RHEL/CentOS/Fedora
dnf install nagios-plugins-all
```
Make sure the configured command path exists and is executable by the netdata user.
Prepare custom check scripts
If you are writing your own check scripts instead of using packaged Nagios plugins:
- Place scripts anywhere accessible to the netdata user (e.g., /usr/local/lib/netdata/checks/)
- Make scripts executable: chmod +x /path/to/script.sh
- Test as the netdata user to verify permissions and environment: sudo -u netdata /path/to/script.sh
- Verify the exit code: echo $? (must be 0, 1, 2, or 3)
- Verify the output matches the Nagios plugin output format described in the Overview above
Configuration
Options
Add jobs under the jobs: key. Each job runs one Nagios-compatible check command.
Config options
| Group | Option | Description | Default | Required |
|---|---|---|---|---|
| Collection | update_every | How often the collector's internal scheduler ticks, in seconds. Controls chart granularity. In most cases you only need to set check_interval. | 10 | no |
| Target | plugin | Absolute path to the Nagios-compatible executable to run. This can be a packaged Nagios plugin or your own executable. If you need a script interpreter, point plugin to that interpreter and pass the script path in args. The command should return exit code 0, 1, 2, or 3 and may print performance data after \|. | | yes |
| | args | Arguments passed to the command. | | no |
| | arg_values | Values exposed to $ARG1$ through $ARG32$ for macro expansion. The first value maps to $ARG1$, the second to $ARG2$, and so on. | | no |
| | working_directory | Working directory used when running the command. | | no |
| Scheduling | timeout | Maximum time allowed for one command run. If the check exceeds this limit, the job state becomes timeout. | 5s | no |
| | check_interval | Interval between regular checks. | 5m | no |
| | retry_interval | Interval between retries while a check remains in a non-OK soft state. | 1m | no |
| | max_check_attempts | Number of attempts before a non-OK result becomes a hard state. | 3 | no |
| | check_period | Name of the time period that controls when the job is allowed to run. The built-in 24x7 period (always allowed) is the default. Outside the active period, the check does not execute and the job state becomes paused. | 24x7 | no |
| | time_periods | Custom named time periods defined inside the same job. Supports weekly, nth_weekday, and date rule types. | | no |
| Environment | environment | Extra environment variables added on top of the collector's limited execution baseline. The check does not inherit the full Netdata process environment. | | no |
| | custom_vars | Custom service variables exposed to the check as Nagios-style macros. | | no |
| Virtual Node | vnode | Associate the job with a virtual node so the check can use host-specific labels and macros. | | no |
| Misc | notes | Optional notes for the job definition. | | no |
environment
A key-value map of environment variables injected into the check's process. Use this when your script depends on variables that are not part of the collector's default environment.
```yaml
jobs:
  - name: oracle_check
    plugin: /usr/local/bin/check_oracle.sh
    environment:
      ORACLE_HOME: /opt/oracle/product/19c
      LD_LIBRARY_PATH: /opt/oracle/product/19c/lib
```
custom_vars
A key-value map of custom service variables. Each entry is exposed as a NAGIOS__SERVICE<UPPERCASE_KEY> environment variable and can be referenced in args using the Nagios macro syntax $_SERVICE<KEY>$.
```yaml
jobs:
  - name: check_db
    plugin: /usr/lib/nagios/plugins/check_pgsql
    args: ["-H", "$_SERVICEDBHOST$", "-d", "$_SERVICEDBNAME$"]
    custom_vars:
      DBHOST: db.example.com
      DBNAME: production
```
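Besides macro expansion in args, a check can read these variables directly from its environment. A sketch (the fallback value "unset" is illustrative; the variable name follows the NAGIOS__SERVICE&lt;UPPERCASE_KEY&gt; convention described above):

```shell
#!/bin/sh
# Sketch: read a custom_vars entry (DBHOST) from the environment instead
# of taking it as a command-line argument.
report_db_host() {
    host="${NAGIOS__SERVICEDBHOST:-unset}"
    echo "OK - configured DB host is $host"
}

report_db_host
```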
via File
The configuration file name for this integration is scripts.d/nagios.conf.
You can edit the configuration file using the edit-config script from the
Netdata config directory.
```sh
cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config scripts.d/nagios.conf
```
Examples
Basic check
Run a Nagios check command on a fixed interval.
Config
```yaml
jobs:
  - name: ping_localhost
    plugin: /usr/lib/nagios/plugins/check_ping
    args: ["-H", "127.0.0.1", "-w", "100.0,20%", "-c", "200.0,40%"]
    timeout: 5s
    check_interval: 1m
    retry_interval: 30s
    max_check_attempts: 3
```
End-to-end custom script
Write a custom check script, then configure Netdata to run it.
1. Create the script (e.g., /usr/local/lib/netdata/checks/check_api.sh):
```sh
#!/bin/sh
# Check HTTP endpoint health

URL="http://localhost:8080/health"

response=$(curl -s -o /dev/null -w "%{http_code} %{time_total}" --max-time 5 "$URL" 2>/dev/null)
curl_exit=$?

if [ "$curl_exit" -ne 0 ]; then
    echo "UNKNOWN - Could not connect to $URL (curl exit code $curl_exit)"
    exit 3
fi

http_code=$(echo "$response" | cut -d' ' -f1)
response_time=$(echo "$response" | cut -d' ' -f2)

if [ "$http_code" -ge 500 ]; then
    echo "CRITICAL - $URL returned HTTP $http_code | response_time=${response_time}s;2;5;0;"
    exit 2
elif [ "$http_code" -ne 200 ]; then
    echo "WARNING - $URL returned HTTP $http_code | response_time=${response_time}s;2;5;0;"
    exit 1
fi

echo "OK - $URL returned HTTP $http_code | response_time=${response_time}s;2;5;0;"
exit 0
```
2. Make it executable and test it:
```sh
chmod +x /usr/local/lib/netdata/checks/check_api.sh
sudo -u netdata /usr/local/lib/netdata/checks/check_api.sh
echo "Exit code: $?"
```
3. Add the configuration below, then restart Netdata (sudo systemctl restart netdata). After restarting, look for nagios.job.execution_state and related charts in the Netdata dashboard.
Config
```yaml
jobs:
  - name: api_health
    plugin: /usr/local/lib/netdata/checks/check_api.sh
    timeout: 10s
    check_interval: 1m
    retry_interval: 30s
    max_check_attempts: 3
```
Custom script (minimal)
Run your own Nagios-compatible shell script with minimal configuration.
Config
```yaml
jobs:
  - name: custom_memory_check
    plugin: /opt/netdata/check_memory.sh
    timeout: 5s
    check_interval: 1m
```
Check with a job-local schedule
Run a check only during selected hours by defining time periods inside the job.
Config
```yaml
jobs:
  - name: business_hours_http
    plugin: /usr/lib/nagios/plugins/check_http
    args: ["-H", "example.com"]
    check_period: business_hours
    time_periods:
      - name: business_hours
        alias: Business hours
        rules:
          - type: weekly
            days: [monday, tuesday, wednesday, thursday, friday]
            ranges: ["09:00-18:00"]
```
Check with virtual node macros
Run a check against a virtual node and fill command arguments from Nagios-style macros.
Config
```yaml
jobs:
  - name: check_ssh
    plugin: /usr/lib/nagios/plugins/check_ssh
    args: ["-H", "$HOSTADDRESS$", "-p", "$ARG1$"]
    arg_values: ["22"]
    vnode: remote-server
    check_interval: 5m
```
Alerts
The following alerts are available:
| Alert name | On metric | Description |
|---|---|---|
| nagios_job_execution_state_warn | nagios.job.execution_state | Nagios job ${label:nagios_job} is in WARNING state |
| nagios_job_execution_state_crit | nagios.job.execution_state | Nagios job ${label:nagios_job} is in CRITICAL state |
| nagios_job_perfdata_threshold_state_warn | nagios.job.perfdata_threshold_state | Nagios job ${label:nagios_job} perfdata ${label:perfdata_value} is in WARNING threshold state |
| nagios_job_perfdata_threshold_state_crit | nagios.job.perfdata_threshold_state | Nagios job ${label:nagios_job} perfdata ${label:perfdata_value} is in CRITICAL threshold state |
Metrics
Metrics grouped by scope.
The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
Each configured job produces execution state and resource usage charts. When a check emits Nagios performance data, additional charts are created automatically for each metric. Non-counter perfdata with warning/critical thresholds also get threshold state charts for alerting.
Per job
These metrics refer to each configured check job.
Labels:
| Label | Description |
|---|---|
| nagios_job | Job name as defined in the configuration. |
| perfdata_value | Identifies which performance data metric a threshold state belongs to. Format is <unit_class>_<label>, where <unit_class> is derived from the UOM (time, bytes, bits, percent, or generic) and <label> is the sanitized metric label from the check output. For example, repl_lag=5s produces time_repl_lag. |
Metrics:
| Metric | Dimensions | Unit |
|---|---|---|
| nagios.job.execution_state | ok, warning, critical, unknown, timeout, paused, retry | state |
| nagios.job.perfdata_threshold_state | no_threshold, ok, warning, critical, retry | state |
| nagios.job.execution_duration | duration | seconds |
| nagios.job.execution_cpu_total | total | seconds |
| nagios.job.execution_max_rss | rss | bytes |
Troubleshooting
The command cannot be executed
Confirm that the path in plugin exists, is executable, and can be accessed by the netdata user. If the check depends on external files or helpers, verify those paths and permissions too.
No performance-data charts appear
Performance-data charts are created only when the check prints Nagios performance data after the | separator. If the command returns only a status line without performance data, Netdata will still show the job state but no extra charts.
Some performance-data values are ignored
Check that each metric uses the Nagios performance-data format label=value[UOM];warn;crit;min;max and that multiple metrics are separated by spaces. If a label contains spaces, quote it. Netdata charts the main value for every perfdata metric, and for non-counter metrics it derives threshold state from warn and crit; it does not create separate charts for raw min, max, or raw threshold bounds.
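A sketch of a perfdata line that follows these rules, quoting labels that contain spaces and separating metrics with a single space (mount names and values are invented):

```shell
#!/bin/sh
# Labels with spaces must be wrapped in single quotes; metrics are
# space-separated. Without the quotes, "root fs" would be split into
# two tokens and the metric would be ignored.
emit_quoted_perfdata() {
    echo "OK - filesystems | 'root fs'=72%;80;90;0;100 'var fs'=55%;80;90;0;100"
}

emit_quoted_perfdata
```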
The job state does not match the output text
The visible text does not decide the state. Netdata uses the process exit code instead: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. If the check exceeds the configured timeout, Netdata reports timeout even if the script never had a chance to print its own final state. If the current time is outside check_period, Netdata reports paused until the check is allowed to run again.
Only the first output line appears as the main status
This is expected. Netdata uses the first line as the summary shown for the job. Additional lines are kept as long output, and any | sections found on later lines are also parsed for performance data.
Macros are not expanded as expected
Check that positional values are provided in arg_values, custom service variables are defined in custom_vars, and any virtual-node labels needed for host macros are present on the selected vnode.
The script works in a shell but fails under Netdata
Nagios checks run with a limited execution environment rather than inheriting the full Netdata process environment. If the script depends on extra variables, set them explicitly in environment instead of relying on ambient shell state.
Built-in alerts cover warning and critical states only
This collector installs stock Netdata health alerts for the warning and critical states on nagios.job.execution_state and nagios.job.perfdata_threshold_state. Both stock alert families suppress soft retry states by checking that retry is not active. If you also want alerts for unknown, timeout, paused, or more specific perfdata behavior, build your own rules on top of these contexts. The nagios.job.perfdata_threshold_state chart uses the perfdata_value label to identify which perfdata metric each threshold state belongs to.
Configuration changes are not picked up
After editing scripts.d/nagios.conf, restart the Netdata Agent for changes to take effect: sudo systemctl restart netdata.
Script stderr output is not visible
Netdata captures the check's standard output for status and performance data parsing. Standard error (stderr) is logged by the collector but not used for state or charts. If your script writes errors to stderr, check the Netdata error log for details.
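A common pattern is to keep diagnostics on stderr so they can never leak into the status line or perfdata. A minimal sketch (the service and port are invented):

```shell
#!/bin/sh
# Diagnostics go to stderr; only the status/perfdata line goes to stdout,
# which is what Netdata parses.
run_check() {
    echo "debug: probing service on port 8080" >&2
    echo "OK - service responding | up=1;;;0;1"
}

run_check 2>/dev/null
```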
Windows checks need an executable entry point
The collector runs the command named in plugin directly. On Windows, point plugin to an executable or to an interpreter such as powershell.exe and pass the script path in args.