Nodes Ephemerality in Netdata
Overview
Netdata v2.3.0 changes how ephemeral nodes are defined and managed in distributed monitoring environments. This update enhances monitoring reliability while providing flexibility for dynamic infrastructure management.
Key Changes:
Netdata now defines ephemeral nodes as "nodes that are expected to disconnect without raising alerts," replacing the previous definition of nodes that are forgotten after one day of disconnection. This change provides three major benefits:
- Improved Permanent Node Monitoring: Disconnection alerts are now triggered only for permanent nodes, reducing alert noise and helping teams focus on genuine operational issues.
- Better Support for Dynamic Infrastructure: Organizations using auto-scaling cloud instances, containers, and other dynamic resources can now designate nodes as ephemeral, preventing unnecessary alerts.
- Automated Node Management: The system automatically removes ephemeral nodes based on configurable retention periods, maintaining clean and relevant monitoring dashboards.
Node Types
Netdata supports two types of nodes:
Type | Description | Common Examples |
---|---|---|
Ephemeral | Nodes expected to disconnect or reconnect frequently | • Auto-scaling cloud instances • Dynamic containers and VMs • IoT devices with intermittent connectivity • Development/test environments with frequent restarts |
Permanent | Nodes expected to maintain continuous connectivity | • Production servers • Core infrastructure nodes • Critical monitoring systems • Stable database servers |
Note: Disconnections in permanent nodes indicate potential system failures requiring immediate attention.
Setting Up Ephemeral Nodes
By default, Netdata treats all nodes as permanent. To mark a node as ephemeral:
- Open netdata.conf on the target node
- Add the following configuration:
  [global]
  is ephemeral node = yes
- Restart the node
This configuration sets the _is_ephemeral host label, which propagates to Netdata Parents and Netdata Cloud.
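As a quick sanity check before restarting, the setting above can be verified with a few lines of Python. This is a minimal sketch, not part of Netdata: the function name `is_marked_ephemeral` is hypothetical, the parser only understands simple `[section]` and `key = value` lines, and it treats only the value `yes` (as shown above) as enabling the option.

```python
def is_marked_ephemeral(conf_text: str) -> bool:
    """Return True if the [global] section contains 'is ephemeral node = yes'.

    Hypothetical helper for illustration; it is a simplified line-based
    reader, not Netdata's own configuration parser.
    """
    section = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()  # entering a new section
            continue
        if section == "global" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "is ephemeral node":
                return value.lower() == "yes"
    return False


sample = """
[global]
    is ephemeral node = yes
"""
print(is_marked_ephemeral(sample))
```

To check a real node, pass in the contents of its netdata.conf; the file location depends on how Netdata was installed.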
Alerts: Parent Node Alerts
Netdata v2.3.0 adds two alerts specifically for permanent nodes:
Alert | Triggers |
---|---|
streaming_never_connected | When permanent nodes have never connected to a Netdata Parent |
streaming_disconnected | When previously connected permanent nodes disconnect |
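The rule in the table can be summarized as a small decision function. This is only a model of the documented behavior, not Netdata's implementation: the function and parameter names are hypothetical, and the real alerts are evaluated by the Parent itself.

```python
from typing import Optional


def streaming_alert(ephemeral: bool, ever_connected: bool, connected_now: bool) -> Optional[str]:
    """Return the name of the alert that would apply to a node, or None.

    Illustrative model of the table above; parameter names are assumptions.
    """
    if ephemeral:
        return None  # ephemeral nodes are expected to disconnect, so no alert
    if not ever_connected:
        return "streaming_never_connected"
    if not connected_now:
        return "streaming_disconnected"
    return None  # permanent node that is currently connected


# A permanent node that was connected earlier and has dropped off:
print(streaming_alert(ephemeral=False, ever_connected=True, connected_now=False))
```

The ephemeral check comes first on purpose: it reflects that both alerts target permanent nodes only.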
Monitoring Child Node Status
To investigate an alert:
- Navigate to the Top tab in your dashboard
- Select the Netdata-streaming function
- Review the detailed node status table:
  - Red lines: Node connection problems (when nodes attempt to connect to this Parent)
  - Yellow lines: Restreaming issues (when this Parent attempts to stream data to other Parent nodes)
  - Color highlighting applies only to permanent nodes
- Filter by Ephemerality to focus on permanent nodes
- Use the InStatus, InReason, and InAge columns to analyze nodes connecting to this Parent
- Use the OutStatus, OutReason, and OutAge columns to analyze this Parent's restreaming to other Parent nodes
Managing Archived Nodes
To clear alerts for permanently offline nodes:
netdatacli mark-stale-nodes-ephemeral <node_id | machine_guid | hostname | ALL_NODES>
Note: Nodes will revert to permanent status if they reconnect unless configured as ephemeral in their netdata.conf.
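When this command has to be run from automation, it can be wrapped in a small script. The sketch below assumes `netdatacli` is on the PATH of the machine where it runs; the helper names `mark_stale_cmd` and `mark_stale` are hypothetical. The target is passed through unchanged as one of the documented forms (node ID, machine GUID, hostname, or `ALL_NODES`).

```python
import subprocess
from typing import List


def mark_stale_cmd(target: str) -> List[str]:
    """Build the argument list for the command shown above.

    `target` is a node_id, machine_guid, hostname, or the literal ALL_NODES.
    Hypothetical helper; it does not validate that the node exists.
    """
    target = target.strip()
    if not target:
        raise ValueError("target must not be empty")
    return ["netdatacli", "mark-stale-nodes-ephemeral", target]


def mark_stale(target: str) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(mark_stale_cmd(target), check=True)


# Show the command that would be executed for all stale nodes:
print(" ".join(mark_stale_cmd("ALL_NODES")))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with unusual hostnames.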
Cloud Integration
Starting with v2.3.0, Netdata Cloud sends node-unreachable notifications exclusively for permanent nodes, improving alert relevance.
Automatic Ephemeral Nodes Cleanup
The automatic removal of disconnected ephemeral nodes is disabled by default in v2.3.0+. To enable this feature:
- Edit the netdata.conf file on Netdata Parent nodes
- Add the following configuration:
  [db]
  cleanup ephemeral hosts after = 1d
- Restart the node
This setting removes ephemeral nodes from queries 24 hours after disconnection. When all parent nodes remove a node, Netdata Cloud automatically deletes it too.
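The retention rule can be illustrated with a short sketch. It models the behavior described above under stated assumptions and is not Netdata's code: the function names are hypothetical, the exact boundary handling (here, "at least 24 hours") is an assumption, and the Cloud rule is reduced to "every Parent has removed the node".

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

RETENTION = timedelta(days=1)  # mirrors "cleanup ephemeral hosts after = 1d"


def due_for_cleanup(ephemeral: bool, disconnected_at: Optional[datetime],
                    now: datetime, retention: timedelta = RETENTION) -> bool:
    """True if a Parent would drop the node under the rule described above."""
    if not ephemeral or disconnected_at is None:
        return False  # permanent or still-connected nodes are never removed
    return now - disconnected_at >= retention  # boundary choice is an assumption


def cloud_deletes(removed_by_parent: Dict[str, bool]) -> bool:
    """True once every Parent that knew the node has removed it."""
    return bool(removed_by_parent) and all(removed_by_parent.values())


now = datetime(2025, 1, 2, 12, 0)
print(due_for_cleanup(True, now - timedelta(hours=25), now))
print(cloud_deletes({"parent-a": True, "parent-b": False}))
```

In this model a node that one Parent still retains stays visible in Netdata Cloud, which is why the setting has to be applied on all Parents.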