Nagios Plugins and Custom Scripts

Plugin: scripts.d.plugin Module: nagios

Overview

This collector runs Nagios-compatible plugins and custom scripts in any language (Bash, PowerShell, Python, Go, etc.). It provides:

Check state monitoring — tracks whether each check returns OK, WARNING, CRITICAL, or UNKNOWN
Execution metrics — measures run duration, CPU time, and memory usage of each check
Automatic performance data charts — any Nagios performance data in the check output is parsed and charted automatically
Threshold-based alerting — when performance data includes warning/critical thresholds, Netdata derives threshold state and creates built-in alerts

Netdata executes each configured command on a schedule, reads the process exit code to determine the check state, and parses the standard output for a status message and optional performance data. Any performance data is automatically converted into charts.

Tip: You can use packaged Nagios plugins or write your own scripts. Any executable that follows the Nagios plugin output format will work.

Nagios Plugin Output Format

A Nagios-compatible plugin communicates through two channels: the process exit code and standard output. For the full specification, see the Nagios Plugin Development Guidelines.

Exit Codes

The exit code alone determines which of the four check states below is reported. The output text is for display purposes only.

Exit Code	State	Meaning
0	OK	Check passed
1	WARNING	Above warning threshold or degraded
2	CRITICAL	Above critical threshold or service down
3	UNKNOWN	Invalid arguments or internal error
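As a quick illustration of the table above, the following sketch maps an exit code to its state name. The `state_name` helper is hypothetical and exists only for this example; it is not part of Netdata or the Nagios plugins.

```shell
#!/bin/sh
# Illustrative helper (not part of Netdata): map an exit code to the
# state name from the table above.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "outside 0-3" ;;  # not covered by the table
    esac
}

# Run a stand-in check that exits with 2, capture its exit code, and name it.
code=0
sh -c 'exit 2' || code=$?
state_name "$code"    # prints: CRITICAL
```

The same pattern works for a real check: replace the `sh -c 'exit 2'` stand-in with the plugin command you want to inspect.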

Standard Output

The output follows this structure:

STATUS TEXT | perfdata1=val;warn;crit;min;max perfdata2=val
LONG OUTPUT LINE 1
LONG OUTPUT LINE 2 | more_perfdata=val

Part	Description
Status text	Text before the pipe on the first line. Shown as the job's status message.
Performance data	Text after the pipe on any line. Parsed into charts automatically.
Long output	Lines 2+ before the pipe. Additional detail text.

Note: The pipe separator is optional. Without it, the entire first line is the status text and no performance data charts are created.
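To make the first-line structure concrete, here is a minimal sketch that splits one output line at the first pipe. The `status_text` and `perfdata` helpers are hypothetical names used for illustration; this is not Netdata's parser, only a way to see which part of a line ends up where.

```shell
#!/bin/sh
# Text before the first "|" (the whole line when there is no pipe),
# with trailing whitespace removed.
status_text() {
    printf '%s\n' "${1%%|*}" | sed 's/[[:space:]]*$//'
}

# Text after the first "|", with leading whitespace removed.
# Prints an empty line when the input has no pipe.
perfdata() {
    case "$1" in
        *'|'*) printf '%s\n' "${1#*|}" | sed 's/^[[:space:]]*//' ;;
        *)     printf '\n' ;;
    esac
}

line='OK - 85% free memory | free_pct=85%;20:;10:;0;100'
status_text "$line"   # prints: OK - 85% free memory
perfdata "$line"      # prints: free_pct=85%;20:;10:;0;100
```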

Performance Data Format

Each performance data metric uses this format:

'label'=value[UOM];[warn];[crit];[min];[max]

Field	Required	Description
`label`	Yes	Metric name. Quote with single quotes if it contains spaces.
`value`	Yes	Numeric value.
`UOM`	No	Unit of measurement (see table below).
`warn`	No	Warning threshold range.
`crit`	No	Critical threshold range.
`min`	No	Minimum possible value.
`max`	No	Maximum possible value.

Separate multiple metrics with spaces.
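When writing your own check, it can help to build each metric with a small formatter instead of hand-assembling the string. The sketch below assumes a hypothetical `perf_metric` helper that takes the seven fields in order and quotes the label when it contains a space.

```shell
#!/bin/sh
# Hypothetical helper: print one perfdata metric.
# Arguments: label value uom warn crit min max (pass "" to leave a field empty).
perf_metric() {
    label=$1
    case "$label" in
        *' '*) label="'$label'" ;;   # quote labels that contain spaces
    esac
    printf '%s=%s%s;%s;%s;%s;%s\n' "$label" "$2" "$3" "$4" "$5" "$6" "$7"
}

perf_metric free_pct 85 % 20: 10: 0 100
# prints: free_pct=85%;20:;10:;0;100

perf_metric "disk usage" 42 % "" "" 0 100
# prints: 'disk usage'=42%;;;0;100
```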

Supported Units of Measurement (UOM):

UOM	Meaning	How Netdata charts it
(none)	Unitless number	Charted as-is
`s`	Seconds (also `ms`, `us`, `ns`)	Normalized to seconds
`%`	Percentage	Charted as percentage
`B`	Bytes (also `KB`, `MB`, `GB`, `TB`)	Charted in bytes
`b`	Bits (also `Kb`, `Mb`, `Gb`, `Tb`)	Charted in bits
`c`	Continuous counter	Charted as incremental rate
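The "normalized to seconds" row means that a value such as `250ms` is charted as `0.25` seconds. The sketch below shows only that arithmetic, using a hypothetical `to_seconds` helper; it is not Netdata's implementation and deliberately leaves out the byte and bit units, because this page does not state their multipliers.

```shell
#!/bin/sh
# Hypothetical helper: convert VALUE in a time UOM (s, ms, us, ns) to seconds.
to_seconds() {
    case "$2" in
        s)  div=1 ;;
        ms) div=1000 ;;
        us) div=1000000 ;;
        ns) div=1000000000 ;;
        *)  echo "not a time unit: $2" >&2; return 1 ;;
    esac
    awk -v v="$1" -v d="$div" 'BEGIN { printf "%g\n", v / d }'
}

to_seconds 250 ms    # prints: 0.25
to_seconds 1500 us   # prints: 0.0015
```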

Threshold Ranges

Thresholds use the format [@]start:end, where a bare number like 10 is shorthand for 0:10 and ~ represents negative infinity (no lower bound). An alert triggers when the value falls outside the range (or inside with the @ prefix):

Range	Alert when...
`10`	value < 0 or value > 10
`10:`	value < 10
`~:10`	value > 10
`10:20`	value < 10 or value > 20
`@10:20`	10 ≤ value ≤ 20

When warn and crit ranges are provided on non-counter metrics, Netdata automatically derives a threshold state (ok / warning / critical) and creates charts with built-in alerts.

Common threshold patterns:

I want to alert when...	`warn`	`crit`
Value exceeds a limit (e.g., response time > 2s)	`~:2`	`~:5`
Value drops below a floor (e.g., free space < 10%)	`10:`	`5:`
Value is outside a band (e.g., temperature 20–80)	`20:80`	`10:90`
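The range rules above can be checked with a short script before you commit thresholds to a check. This is a sketch under stated assumptions: `range_alerts` and `threshold_state` are hypothetical helpers, an empty start is treated as 0, and critical is evaluated before warning (the usual Nagios convention, not something this page specifies for Netdata).

```shell
#!/bin/sh
# Succeeds (exit 0) when VALUE triggers an alert for RANGE ([@]start:end).
range_alerts() {
    awk -v v="$1" -v r="$2" 'BEGIN {
        inside = 0
        if (substr(r, 1, 1) == "@") { inside = 1; r = substr(r, 2) }
        n = index(r, ":")
        if (n == 0) { lo = "0"; hi = r }   # bare number N means 0:N
        else { lo = substr(r, 1, n - 1); hi = substr(r, n + 1) }
        in_range = 1
        if (lo != "~" && v + 0 < lo + 0) in_range = 0   # "~" = no lower bound
        if (hi != ""  && v + 0 > hi + 0) in_range = 0   # empty = no upper bound
        alert = inside ? in_range : !in_range
        exit (alert ? 0 : 1)
    }'
}

# Print ok, warning, or critical for VALUE given WARN and CRIT ranges.
threshold_state() {
    if [ -n "$3" ] && range_alerts "$1" "$3"; then echo critical
    elif [ -n "$2" ] && range_alerts "$1" "$2"; then echo warning
    else echo ok
    fi
}

threshold_state 1.2 '~:2' '~:5'   # prints: ok
threshold_state 3   '~:2' '~:5'   # prints: warning
threshold_state 7   '~:2' '~:5'   # prints: critical
threshold_state 8   '10:' '5:'    # prints: warning
```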

Example

A minimal Nagios-compatible script:

#!/bin/sh
echo "OK - 85% free memory | free_pct=85%;20:;10:;0;100 used_kb=2380912KB;;;0;16380000"
exit 0

This produces:

Check state: OK (exit code 0)
Status text: OK - 85% free memory
Charts: free_pct (percentage with warning/critical thresholds) and used_kb (bytes)

Retry behavior: When a check returns a non-OK state, Netdata does not alert immediately. The check enters a soft state and retries at the retry_interval rate. Only after max_check_attempts consecutive failures does it become a hard state and trigger alerts. If the check recovers during retries, it returns to OK without alerting. The retry dimension on state charts indicates a soft state is in progress.
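The soft and hard state logic can be modelled in a few lines. The sketch below is a simplified simulation of the behavior described above, not Netdata's scheduler: the hypothetical `classify_results` helper takes max_check_attempts followed by a sequence of exit codes and reports when a failure would become a hard state.

```shell
#!/bin/sh
# Hypothetical simulation: classify a sequence of exit codes as OK, soft, or hard.
# Usage: classify_results MAX_CHECK_ATTEMPTS code...
classify_results() {
    max=$1; shift
    attempts=0
    for code in "$@"; do
        if [ "$code" -eq 0 ]; then
            attempts=0                      # recovery resets the counter
            echo "exit=$code -> OK"
        else
            attempts=$((attempts + 1))
            if [ "$attempts" -ge "$max" ]; then
                echo "exit=$code -> hard (attempt $attempts, alert)"
            else
                echo "exit=$code -> soft (attempt $attempts, retrying)"
            fi
        fi
    done
}

classify_results 3 0 2 2 0 2 2 2
```

With max_check_attempts set to 3, the two failures followed by a recovery never leave the soft state, while the final run of three consecutive failures ends with `hard (attempt 3, alert)`.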

This collector is supported on all platforms.

This collector supports collecting metrics from multiple instances of this integration, including remote instances.

No additional permissions are required by the collector itself. If a check needs access to protected files, sockets, or system commands, provide that access to the check command or helper it uses.

Default Behavior

Auto-Detection

No automatic detection is performed. Add one or more jobs explicitly and point each job to the script or executable you want Netdata to run.

Limits

Each job runs one configured command. Additional charts are created only when the check emits Nagios performance data.

Performance Impact

Each job starts an external command. The impact depends mostly on how often the job runs and how expensive the check command itself is.

Setup

Prerequisites

Security requirements for plugin executables

Netdata validates the plugin path before execution. On Linux/macOS, the executable must meet these requirements:

Must be a regular file (not a directory or device node)
Must be executable (at least one execute bit set)
Owned by root
Not writable by group or others (no g+w or o+w)
All ancestor directories (up to and including /) owned by root
All ancestor directories not writable by group or others
Symlinks are resolved, and the resolved target must meet the same rules

On Windows, path validation is not enforced. Ensure executables are stored in directories with appropriate ACLs.

This prevents local privilege escalation through a modified check script. If validation fails, the job will not start and an error is logged.
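You can approximate these rules yourself before adding a job. The following pre-flight sketch uses only POSIX `find` tests; `matches` and `check_plugin_path` are hypothetical helpers, and the script assumes the path you pass is absolute and already resolved (it does not follow symlinks). It is an approximation of the listed requirements, not Netdata's own validation code.

```shell
#!/bin/sh
# Succeeds when the find test expression matches PATH itself (no descent).
matches() {
    target=$1; shift
    [ -n "$(find "$target" -prune "$@" -print 2>/dev/null)" ]
}

# Approximate the listed rules for an absolute, symlink-free path.
check_plugin_path() {
    path=$1
    case "$path" in
        /*) ;;
        *) echo "need an absolute path: $path"; return 1 ;;
    esac
    if ! matches "$path" -type f \( -perm -100 -o -perm -010 -o -perm -001 \); then
        echo "not a regular file with an execute bit: $path"; return 1
    fi
    p=$path
    while :; do                              # walk from the file up to /
        if ! matches "$p" -user root; then
            echo "not owned by root: $p"; return 1
        fi
        if matches "$p" \( -perm -020 -o -perm -002 \); then
            echo "writable by group or others: $p"; return 1
        fi
        if [ "$p" = "/" ]; then break; fi
        p=$(dirname "$p")
    done
    echo "passes these checks: $path"
}

check_plugin_path /usr/lib/nagios/plugins/check_ping || true
```

If the helper reports a problem, fix ownership or permissions on the named path and run it again.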

Caution (Linux/macOS): Using an interpreter (e.g. /bin/bash) as plugin with a script path in args is discouraged. Netdata validates the interpreter binary but cannot verify scripts passed via args, so a writable script in args is a privilege escalation vector. Instead, make scripts directly executable and point plugin to the script itself.

Caution (Windows): Point plugin directly to a .ps1, .bat, or .cmd script; Netdata invokes the correct interpreter automatically. Path validation is not enforced on Windows, so store scripts in directories with appropriate ACLs.

Install check commands

Install the Nagios plugins or other Nagios-compatible scripts that you want Netdata to run.

Most Linux distributions provide Nagios plugin packages:

# Debian/Ubuntu
apt install nagios-plugins

# RHEL/CentOS/Fedora
dnf install nagios-plugins-all

Packaged Nagios plugins are typically installed as root-owned executables, which satisfies the security requirements above.

Prepare custom check scripts

If you are writing your own check scripts instead of using packaged Nagios plugins:

Place scripts in a root-owned directory (e.g., /usr/local/lib/netdata/checks/)
Set ownership and permissions: sudo chown root:root /path/to/script.sh && sudo chmod 755 /path/to/script.sh
Test as the netdata user to verify permissions and environment: sudo -u netdata /path/to/script.sh
Verify the exit code: echo $? (must be 0, 1, 2, or 3)
Verify the output matches the Nagios plugin output format described in the Overview above

Configuration

Options

Add jobs under jobs:. Each job runs one Nagios-compatible check command.

Config options

Group	Option	Description	Default	Required
Collection	update_every	Minimum resolution of the collector's scheduler, in seconds. `check_interval` and `retry_interval` are rounded up to the nearest multiple of this value. For example, if `update_every` is 10 and `check_interval` is 25s, the check actually runs every 30s. In most cases the default is fine — just set `check_interval`.	10	no
Target	check_name	Name that identifies this check type for chart grouping and metric naming. If omitted, Netdata derives it from the basename of `plugin` (removing any file extension). Use this when the plugin filename is generic and you want a more descriptive chart section — for example, when multiple jobs run `check_nrpe` against different remote checks, set `check_name` to distinguish them (`check_disk`, `check_load`, etc.). In the dashboard, charts appear under `Synthetic > Nagios > Perfdata > <check_name>`. For example, with `check_name: check_memory` and a script that outputs `caches=2380912KB`, Netdata creates: - `nagios.perfdata.check_memory.job.execution_state` — check state (ok, warning, critical, unknown, timeout, paused, retry) - `nagios.perfdata.check_memory.bytes_caches` — perfdata value chart - `nagios.perfdata.check_memory.bytes_caches_threshold_state` — threshold state (if warn/crit thresholds are present)		no
	plugin	Absolute path to the Nagios-compatible check command to run. This can be a packaged Nagios plugin or your own executable script. The executable must be root-owned and not writable by group or others (see prerequisites). The command should return exit code `0`, `1`, `2`, or `3` and may print performance data after `|`.		yes
	args	Arguments passed to the command.		no
	arg_values	Values exposed to $ARG1$ through $ARG32$ for macro expansion. The first value maps to $ARG1$, the second to $ARG2$, and so on.		no
	working_directory	Working directory used when running the command.		no
Scheduling	timeout	Maximum time allowed for one command run. If the check exceeds this limit, the job state becomes `timeout`.	5s	no
	check_interval	Interval between regular checks.	5m	no
	retry_interval	Interval between retries while a check remains in a non-OK soft state.	1m	no
	max_check_attempts	Number of attempts before a non-OK result becomes a hard state.	3	no
	check_period	Name of the time period that controls when the job is allowed to run. The built-in `24x7` period (always allowed) is the default. Outside the active period, the check does not execute and the job state becomes `paused`.	24x7	no
	time_periods	Custom named time periods defined inside the same job. Supports `weekly`, `nth_weekday`, and `date` rule types.		no
Environment	environment	Extra environment variables added on top of the collector's limited execution baseline. The check does not inherit the full Netdata process environment.		no
	custom_vars	Custom service variables exposed to the check as Nagios-style macros.		no
Virtual Node	vnode	Associate the job with a virtual node so the check can use host-specific labels and macros.		no
Misc	notes	Optional notes for the job definition.		no
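The rounding described for update_every is plain integer arithmetic. The sketch below reproduces the example from the table with a hypothetical `round_up_to_multiple` helper; all values are in seconds.

```shell
#!/bin/sh
# Hypothetical helper: round INTERVAL up to the nearest multiple of STEP.
round_up_to_multiple() {
    interval=$1
    step=$2
    echo $(( (interval + step - 1) / step * step ))
}

round_up_to_multiple 25 10    # prints: 30  (check_interval 25s, update_every 10)
round_up_to_multiple 300 10   # prints: 300 (the 5m default is already a multiple)
```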

environment

A key-value map of environment variables injected into the check's process. Use this when your script depends on variables that are not part of the collector's default environment.

jobs:
  - name: oracle_check
    plugin: /usr/local/bin/check_oracle.sh
    environment:
      ORACLE_HOME: /opt/oracle/product/19c
      LD_LIBRARY_PATH: /opt/oracle/product/19c/lib

custom_vars

A key-value map of custom service variables. Each entry is exposed as a NAGIOS__SERVICE<UPPERCASE_KEY> environment variable and can be referenced in args using the Nagios macro syntax $_SERVICE<KEY>$.

jobs:
  - name: check_db
    plugin: /usr/lib/nagios/plugins/check_pgsql
    args: ["-H", "$_SERVICEDBHOST$", "-d", "$_SERVICEDBNAME$"]
    custom_vars:
      DBHOST: db.example.com
      DBNAME: production
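Inside a custom script, the same values can be read from the environment. The sketch below follows the NAGIOS__SERVICE<UPPERCASE_KEY> naming rule stated above; `custom_var_env_name` is a hypothetical helper, and the `localhost` fallback is an assumption made for the example.

```shell
#!/bin/sh
# Hypothetical helper: derive the environment variable name for a custom_vars key.
custom_var_env_name() {
    printf 'NAGIOS__SERVICE%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
}

custom_var_env_name dbhost    # prints: NAGIOS__SERVICEDBHOST

# In a check script, read the value with a fallback for manual test runs.
db_host=${NAGIOS__SERVICEDBHOST:-localhost}
echo "checking database host: $db_host"
```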

via File

The configuration file name for this integration is scripts.d/nagios.conf.

You can edit the configuration file using the edit-config script from the Netdata config directory.

cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config scripts.d/nagios.conf

Examples

Basic check

Run a Nagios check command on a fixed interval.

Config

jobs:
  - name: ping_localhost
    plugin: /usr/lib/nagios/plugins/check_ping
    args: ["-H", "127.0.0.1", "-w", "100.0,20%", "-c", "200.0,40%"]
    timeout: 5s
    check_interval: 1m
    retry_interval: 30s
    max_check_attempts: 3

End-to-end custom script

Write a custom check script, then configure Netdata to run it.

1. Create the script (e.g., /usr/local/lib/netdata/checks/check_api.sh):

#!/bin/sh
# Check HTTP endpoint health
URL="http://localhost:8080/health"

response=$(curl -s -o /dev/null -w "%{http_code} %{time_total}" --max-time 5 "$URL" 2>/dev/null)
curl_exit=$?

if [ "$curl_exit" -ne 0 ]; then
    echo "UNKNOWN - Could not connect to $URL (curl exit code $curl_exit)"
    exit 3
fi

http_code=$(echo "$response" | cut -d' ' -f1)
response_time=$(echo "$response" | cut -d' ' -f2)

if [ "$http_code" -ge 500 ]; then
    echo "CRITICAL - $URL returned HTTP $http_code | response_time=${response_time}s;2;5;0;"
    exit 2
elif [ "$http_code" -ne 200 ]; then
    echo "WARNING - $URL returned HTTP $http_code | response_time=${response_time}s;2;5;0;"
    exit 1
fi

echo "OK - $URL returned HTTP $http_code | response_time=${response_time}s;2;5;0;"
exit 0

2. Set ownership, permissions, and test it:

sudo chown root:root /usr/local/lib/netdata/checks/check_api.sh
sudo chmod 755 /usr/local/lib/netdata/checks/check_api.sh
sudo -u netdata /usr/local/lib/netdata/checks/check_api.sh
echo "Exit code: $?"

3. Add the configuration below, then restart Netdata (sudo systemctl restart netdata). After restarting, look for nagios.job.execution_state and related charts in the Netdata dashboard.

Config

jobs:
  - name: api_health
    plugin: /usr/local/lib/netdata/checks/check_api.sh
    timeout: 10s
    check_interval: 1m
    retry_interval: 30s
    max_check_attempts: 3

Custom script (minimal)

Run your own Nagios-compatible shell script with minimal configuration.

Config

jobs:
  - name: custom_memory_check
    plugin: /opt/netdata/check_memory.sh
    timeout: 5s
    check_interval: 1m

Windows PowerShell check

On Windows, point plugin directly to a .ps1 script. Netdata automatically invokes it through powershell.exe with -NoProfile -ExecutionPolicy Bypass -File. The .bat and .cmd scripts are also supported (invoked via cmd.exe /c).

1. Create the script (e.g., C:\Netdata\checks\check_service.ps1):

# Check if a Windows service is running
param([string]$ServiceName = "W3SVC")

$svc = Get-Service -Name $ServiceName -ErrorAction SilentlyContinue

if (-not $svc) {
    Write-Host "UNKNOWN - Service $ServiceName not found | running=0;;;0;1"
    exit 3
}

if ($svc.Status -eq 'Running') {
    Write-Host "OK - $ServiceName is running | running=1;;;0;1"
    exit 0
} else {
    Write-Host "CRITICAL - $ServiceName is $($svc.Status) | running=0;;;0;1"
    exit 2
}

2. Test from PowerShell (run as the user the Netdata service runs under):

powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\Netdata\checks\check_service.ps1"
echo "Exit code: $LASTEXITCODE"

3. Add the configuration below, then restart Netdata (Restart-Service netdata).

Config

jobs:
  - name: service_health_win
    plugin: C:\Netdata\checks\check_service.ps1
    timeout: 10s
    check_interval: 1m

Remote check via NRPE

Run a check on a remote host using check_nrpe. This works exactly like a Nagios NRPE configuration — install nagios-nrpe-plugin and point to the remote NRPE agent. Increase timeout if the remote host is slow to respond.

Config

jobs:
  - name: remote_disk
    plugin: /usr/lib/nagios/plugins/check_nrpe
    args: ["-H", "192.168.1.10", "-c", "check_disk", "-a", "20% 10% /"]
    timeout: 30s
    check_interval: 5m

Check with a job-local schedule

Run a check only during selected hours by defining time periods inside the job.

Config

jobs:
  - name: business_hours_http
    plugin: /usr/lib/nagios/plugins/check_http
    args: ["-H", "example.com"]
    check_period: business_hours
    time_periods:
      - name: business_hours
        alias: Business hours
        rules:
          - type: weekly
            days: [monday, tuesday, wednesday, thursday, friday]
            ranges: ["09:00-18:00"]

Check with virtual node macros

Run a check against a virtual node and fill command arguments from Nagios-style macros.

Config

jobs:
  - name: check_ssh
    plugin: /usr/lib/nagios/plugins/check_ssh
    args: ["-H", "$HOSTADDRESS$", "-p", "$ARG1$"]
    arg_values: ["22"]
    vnode: remote-server
    check_interval: 5m

Alerts

The following alerts are available:

Alert name	On metric	Description
nagios_job_execution_state_warn	nagios.job.execution_state	Nagios job ${label:nagios_job} is in WARNING state
nagios_job_execution_state_crit	nagios.job.execution_state	Nagios job ${label:nagios_job} is in CRITICAL state
nagios_job_perfdata_threshold_state_warn	nagios.job.perfdata_threshold_state	Nagios job ${label:nagios_job} perfdata ${label:perfdata_value} is in WARNING threshold state
nagios_job_perfdata_threshold_state_crit	nagios.job.perfdata_threshold_state	Nagios job ${label:nagios_job} perfdata ${label:perfdata_value} is in CRITICAL threshold state

Metrics

Metrics grouped by scope.

The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.

Each configured job produces execution state and resource usage charts. When a check emits Nagios performance data, additional charts are created automatically for each metric. Non-counter perfdata with warning/critical thresholds also get threshold state charts for alerting.

Per job

These metrics refer to each configured check job.

Labels:

Label	Description
nagios_job	Job name as defined in the configuration.
perfdata_value	Identifies which performance data metric a threshold state belongs to. Format is `<unit_class>_<label>`, where `<unit_class>` is derived from the UOM (`time`, `bytes`, `bits`, `percent`, or `generic`) and `<label>` is the sanitized metric label from the check output. For example, `repl_lag=5s` produces `time_repl_lag`.
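To predict the perfdata_value label for a metric, combine the unit class with the metric label. The sketch below covers only the UOMs whose class is stated on this page; `unit_class` and `perfdata_value_label` are hypothetical helpers, the counter UOM `c` is left out because its class is not documented here, and label sanitization is not modelled.

```shell
#!/bin/sh
# Hypothetical helper: map a UOM to the unit class named in the label table.
unit_class() {
    case "$1" in
        s|ms|us|ns)     echo time ;;
        B|KB|MB|GB|TB)  echo bytes ;;
        b|Kb|Mb|Gb|Tb)  echo bits ;;
        %)              echo percent ;;
        '')             echo generic ;;
        *)              return 1 ;;      # class not stated on this page
    esac
}

# Usage: perfdata_value_label LABEL UOM
perfdata_value_label() {
    class=$(unit_class "$2") || return 1
    echo "${class}_$1"
}

perfdata_value_label repl_lag s    # prints: time_repl_lag
perfdata_value_label caches KB     # prints: bytes_caches
```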

Metrics:

Metric	Dimensions	Unit
nagios.job.execution_state	ok, warning, critical, unknown, timeout, paused, retry	state
nagios.job.perfdata_threshold_state	no_threshold, ok, warning, critical, retry	state
nagios.job.execution_duration	duration	seconds
nagios.job.execution_cpu_total	total	seconds
nagios.job.execution_max_rss	rss	bytes

Troubleshooting

The command cannot be executed

Confirm that the path in plugin exists, is executable, and can be accessed by the netdata user. If the check depends on external files or helpers, verify those paths and permissions too.

No performance-data charts appear

Performance-data charts are created only when the check prints Nagios performance data after the | separator. If the command returns only a status line without performance data, Netdata will still show the job state but no extra charts.

Some performance-data values are ignored

Check that each metric uses the Nagios performance-data format label=value[UOM];warn;crit;min;max and that multiple metrics are separated by spaces. If a label contains spaces, quote it. Netdata charts the main value for every perfdata metric, and for non-counter metrics it derives threshold state from warn and crit; it does not create separate charts for raw min, max, or raw threshold bounds.

The job state does not match the output text

The visible text does not decide the state. Netdata uses the process exit code instead: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. If the check exceeds the configured timeout, Netdata reports timeout even if the script never had a chance to print its own final state. If the current time is outside check_period, Netdata reports paused until the check is allowed to run again.

Only the first output line appears as the main status

This is expected. Netdata uses the first line as the summary shown for the job. Additional lines are kept as long output, and any | sections found on later lines are also parsed for performance data.

Macros are not expanded as expected

Check that positional values are provided in arg_values, custom service variables are defined in custom_vars, and any virtual-node labels needed for host macros are present on the selected vnode.

The script works in a shell but fails under Netdata

Nagios checks run with a limited execution environment rather than inheriting the full Netdata process environment. If the script depends on extra variables, set them explicitly in environment instead of relying on ambient shell state.
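One way to reproduce this class of failure is to run the check with a nearly empty environment. The sketch below uses `env -i` for that; the `run_with_minimal_env` helper and the PATH value are assumptions for illustration and do not reproduce the collector's actual baseline environment.

```shell
#!/bin/sh
# Hypothetical helper: run a command with only PATH set.
run_with_minimal_env() {
    env -i PATH=/usr/bin:/bin "$@"
}

# A variable exported in your shell is not visible to the command.
DEMO_VAR=1
export DEMO_VAR
run_with_minimal_env sh -c 'echo "DEMO_VAR=${DEMO_VAR:-unset}"'
# prints: DEMO_VAR=unset
```

If a check passes in your shell but fails when run through this helper, list the variables it needs under environment in the job configuration.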

Built-in alerts cover warning and critical states only

This collector installs stock Netdata health alerts for the warning and critical states on nagios.job.execution_state and nagios.job.perfdata_threshold_state. Both stock alert families suppress soft retry states by checking that retry is not active. If you also want alerts for unknown, timeout, paused, or more specific perfdata behavior, build your own rules on top of these contexts. The nagios.job.perfdata_threshold_state chart uses the perfdata_value label to identify which perfdata metric each threshold state belongs to.

Configuration changes are not picked up

After editing scripts.d/nagios.conf, restart the Netdata Agent for changes to take effect: sudo systemctl restart netdata.

Script stderr output is not visible

Netdata captures the check's standard output for status and performance data parsing. Standard error (stderr) is logged by the collector but not used for state or charts. If your script writes errors to stderr, check the Netdata error log for details.

Job state is always timeout

The default timeout is 5 seconds, which is too short for many checks — especially remote checks (check_nrpe, check_ssh) or HTTP checks with SSL negotiation. Increase the timeout value in your job configuration (e.g. timeout: 30s).

Check works as root but fails under Netdata

The Netdata Agent runs as the netdata user. If a check needs to read protected files, access SNMP, or connect to local sockets, it must be accessible to the netdata user. Test as that user first: sudo -u netdata /path/to/check. Common fixes include adding the netdata user to the required system group or using sudo with a specific NOPASSWD rule for the check command.

Windows script support

On Windows, point plugin directly to a .ps1, .bat, or .cmd script. Netdata automatically invokes .ps1 scripts through powershell.exe and .bat/.cmd scripts through cmd.exe. Ensure scripts are stored in directories with appropriate ACLs.
