Node Ephemerality in Netdata
Node Types
Netdata categorizes nodes into two types:
Type | Description | Common Use Cases |
---|---|---|
Ephemeral | Expected to disconnect or reconnect frequently | Auto-scaling cloud instances; dynamic containers and VMs; IoT devices with intermittent connectivity; development/test environments with frequent restarts |
Permanent | Expected to maintain continuous connectivity | Production servers; core infrastructure nodes; critical monitoring systems; stable database servers |
Note: Disconnections in permanent nodes indicate potential system failures and require immediate attention.
Key Benefits
- Reduced Alert Noise: Disconnection alerts now apply only to permanent nodes, helping teams focus on actual issues.
- Improved Dynamic Infrastructure Support: Auto-scaling cloud instances, containers, and other temporary resources can be designated as ephemeral to prevent unnecessary alerts.
- Automated Node Cleanup: Ephemeral nodes are removed based on configurable retention periods, keeping dashboards relevant and uncluttered.
Configuring Ephemeral Nodes
By default, Netdata treats all nodes as permanent. To mark a node as ephemeral:
- Open the netdata.conf file on the target node.
- Add the following configuration:
    [global]
    is ephemeral node = yes
- Restart the node.
This setting applies the _is_ephemeral host label, which propagates to Netdata Parents and Netdata Cloud.
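If you manage many nodes, you may prefer to script this edit. The following is a minimal Python sketch, not part of Netdata: the helper name set_option is hypothetical, and it only transforms the text of netdata.conf, so reading the file, writing it back, and restarting are left to you. It assumes plain [section] headers and key = value lines, and it ignores commented-out lines.

```python
def set_option(conf_text: str, section: str, key: str, value: str) -> str:
    """Return conf_text with `key = value` set inside `[section]`.

    Hypothetical helper for illustration; it is not a Netdata API.
    """
    header = f"[{section}]"
    new_line = f"    {key} = {value}"
    out = []
    in_section = False  # True while we are inside the target section
    done = False        # True once the option has been written

    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new section starts. If we are leaving the target section
            # without having seen the key, add it before moving on.
            if in_section and not done:
                out.append(new_line)
                done = True
            in_section = stripped == header
        elif in_section and not done and not stripped.startswith("#"):
            name = stripped.split("=", 1)[0].strip()
            if "=" in stripped and name == key:
                # Replace the existing value instead of adding a duplicate.
                out.append(new_line)
                done = True
                continue
        out.append(line)

    if not done:
        # Either the target section was the last one, or it did not exist.
        if not in_section:
            out.append(header)
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "[global]\n    hostname = web-1\n[db]\n    mode = dbengine\n"
    print(set_option(sample, "global", "is ephemeral node", "yes"))
```

Applying the helper twice gives the same text as applying it once, so it is safe to run repeatedly from a provisioning script.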
Alerts for Parent Nodes
Netdata v2.3.0 introduces two new alerts specifically for permanent nodes:
Alert | Trigger Condition |
---|---|
streaming_never_connected | A permanent node has never connected to a Netdata Parent. |
streaming_disconnected | A previously connected permanent node has disconnected. |
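To make the two trigger conditions concrete, here is a small Python model of the table above. It is only an illustration of the logic as described on this page, not how Netdata evaluates alerts internally; the function name and its three boolean parameters are hypothetical.

```python
from typing import Optional


def streaming_alert(is_ephemeral: bool, ever_connected: bool,
                    connected_now: bool) -> Optional[str]:
    """Return the alert name that would apply to a node, or None.

    Illustrative model only; parameter names are assumptions.
    """
    # Ephemeral nodes never raise these alerts, and a connected node
    # has nothing to report.
    if is_ephemeral or connected_now:
        return None
    # A permanent node that is not connected falls into one of two cases.
    if ever_connected:
        return "streaming_disconnected"
    return "streaming_never_connected"


if __name__ == "__main__":
    print(streaming_alert(is_ephemeral=False, ever_connected=True,
                          connected_now=False))
```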
Monitoring Child Node Status
To investigate an alert:
- Open the Top tab in your Netdata dashboard.
- Select the Netdata-streaming function.
- Review the node status table:
  - Red lines: Connection issues when nodes attempt to connect to a Parent.
  - Yellow lines: Restreaming issues when a Parent streams data to another Parent.
  - Color highlighting applies only to permanent nodes.
- Use the Ephemerality filter to view only permanent nodes.
- Check InStatus, InReason, and InAge for incoming connection status.
- Check OutStatus, OutReason, and OutAge for outgoing streaming status.
Managing Offline Nodes
The Netdata CLI tool has two commands for working with archived nodes.
mark-stale-nodes-ephemeral
To mark permanently offline nodes, including virtual nodes, as ephemeral:
netdatacli mark-stale-nodes-ephemeral <node_id | machine_guid | hostname | ALL_NODES>
This keeps the previously collected metrics data available for querying and clears any active alerts.
Note: Nodes will revert to permanent status if they reconnect unless explicitly configured as ephemeral in netdata.conf.
remove-stale-node
To fully remove permanently offline nodes:
netdatacli remove-stale-node <node_id | machine_guid | hostname | ALL_NODES>
This is like the mark-stale-nodes-ephemeral subcommand, but it also removes the nodes so they are no longer available for querying.
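If you want to call these subcommands from automation, a thin wrapper can guard against typos before anything is executed. The sketch below is an assumption-laden example rather than an official tool: the function names are hypothetical, only the two subcommands and the ALL_NODES target shown above are used, and it defaults to a dry run because remove-stale-node discards queryable data.

```python
import subprocess

# The two subcommands documented on this page.
STALE_NODE_SUBCOMMANDS = ("mark-stale-nodes-ephemeral", "remove-stale-node")


def stale_node_command(subcommand: str, target: str = "ALL_NODES") -> list[str]:
    """Build the netdatacli argument list for one stale-node subcommand.

    `target` is a node_id, machine_guid, hostname, or ALL_NODES.
    Hypothetical helper; it does not validate that the node exists.
    """
    if subcommand not in STALE_NODE_SUBCOMMANDS:
        raise ValueError(f"unsupported subcommand: {subcommand!r}")
    if not target.strip():
        raise ValueError("target must not be empty")
    return ["netdatacli", subcommand, target]


def run_stale_node_command(subcommand: str, target: str,
                           dry_run: bool = True) -> list[str]:
    """Run the command, or only return it when dry_run is True."""
    argv = stale_node_command(subcommand, target)
    if not dry_run:
        # Raises CalledProcessError if netdatacli exits with a non-zero status.
        subprocess.run(argv, check=True)
    return argv


if __name__ == "__main__":
    # Dry run: prints the command without executing it.
    print(" ".join(run_stale_node_command("mark-stale-nodes-ephemeral", "web-1")))
```

Passing the arguments as a list avoids shell quoting problems with hostnames, and the dry-run default lets you review the exact command first.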
Cloud Integration
In Netdata Cloud, ephemeral nodes remain visible but are marked as 'stale' as long as at least one Agent reports having queryable metrics data for that node. Once all Agents report the node as offline, ephemeral nodes are automatically removed from Cloud.
From v2.3.0 onward, Netdata Cloud sends unreachable-node notifications only for permanent nodes, reducing unnecessary alerts.
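The visibility rule for ephemeral nodes can be summarized as a small decision function. This Python sketch is a simplified model of the behavior described in this section, not Cloud's actual implementation. The report values "live", "stale", and "offline" and the function name are assumptions made for the example, as is the "live" result when an Agent still has the node connected.

```python
from typing import Iterable


def ephemeral_node_cloud_state(agent_reports: Iterable[str]) -> str:
    """Model how Cloud would show an ephemeral node.

    Each report is one Agent's view of the node (assumed values):
      "live"    - the node is currently connected to that Agent
      "stale"   - not connected, but the Agent still has queryable data
      "offline" - the Agent has nothing to offer for this node
    """
    reports = list(agent_reports)
    if any(r == "live" for r in reports):
        return "live"
    # At least one Agent can still answer queries: keep the node, marked stale.
    if any(r == "stale" for r in reports):
        return "stale"
    # Every Agent reports the node as offline: Cloud removes it.
    return "removed"


if __name__ == "__main__":
    print(ephemeral_node_cloud_state(["offline", "stale"]))
    print(ephemeral_node_cloud_state(["offline", "offline"]))
```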
Automatically Removing Ephemeral Nodes
By default, Netdata does not automatically remove disconnected ephemeral nodes. To enable automatic cleanup:
- Open the netdata.conf file on Netdata Parent nodes.
- Add the following configuration:
    [db]
    cleanup ephemeral hosts after = 1d
- Restart the node.
This setting removes ephemeral nodes from queries after 24 hours of disconnection. Once all Parent nodes remove a node, Netdata Cloud automatically deletes it as well.
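As a rough illustration of the retention rule, the following Python sketch decides whether a node is due for cleanup under a 1d setting. It is a model of the behavior described above, not Netdata's code: the function and parameter names are hypothetical, and treating a node as eligible exactly at the retention boundary is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def due_for_cleanup(is_ephemeral: bool,
                    disconnected_since: Optional[datetime],
                    now: datetime,
                    retention: timedelta = timedelta(days=1)) -> bool:
    """Return True if an ephemeral node has been disconnected long enough.

    `disconnected_since` is None while the node is still connected.
    """
    # Permanent nodes and connected nodes are never cleaned up automatically.
    if not is_ephemeral or disconnected_since is None:
        return False
    return now - disconnected_since >= retention


if __name__ == "__main__":
    now = datetime(2025, 1, 2, 12, 0)
    print(due_for_cleanup(True, now - timedelta(hours=25), now))
    print(due_for_cleanup(True, now - timedelta(hours=23), now))
```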