Node States

Netdata provides dashboards at multiple levels of your infrastructure. Each level displays node states based on what it can observe. This page explains what each state means, when transitions happen, and how to configure the behavior.

Dashboard Levels

Netdata's distributed architecture provides three observation points:

| Dashboard | What You See | How to Access |
|---|---|---|
| Agent | Local node only | http://node-ip:19999 |
| Parent | All nodes streaming to this Parent | http://parent-ip:19999 |
| Netdata Cloud | All nodes claimed to your Space | https://app.netdata.cloud |

How data flows:

Agent → (streaming) → Parent → (ACLK) → Netdata Cloud
Agent → (ACLK) → Netdata Cloud (standalone, no Parent)

Node states reflect this flow: if a link breaks, states change based on where data is still available.

States on Netdata Cloud

| State | Meaning |
|---|---|
| Live | Node is connected to Netdata Cloud (directly or via Parents) and providing live metrics |
| Stale | Node disconnected, but a Parent connected to Netdata Cloud has its historical data |
| Offline | Node is disconnected and no data is available |
| Unseen | Node was claimed but has never connected |

Stale vs Offline

The difference is data availability:

| Scenario | State | Can Query Data? |
|---|---|---|
| Child disconnected, Parent connected to Cloud | Stale | Yes, via Parent |
| Standalone Agent disconnected | Offline | No |
| Child disconnected, all Parents disconnected from Cloud | Offline | No |

Stale nodes remain queryable because the Parent serves as a data cache. This is why you cannot delete Stale nodes from the UI: they still have accessible data.
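The rule above can be sketched as a small decision function. This is an illustrative model of the documented states, not Netdata Cloud's implementation; the function name and its three boolean inputs are assumptions made for the example.

```python
def cloud_state(ever_connected: bool,
                live_metrics_reach_cloud: bool,
                cloud_parent_has_data: bool) -> str:
    """Return the state Cloud would show, following the tables above.

    All parameters are hypothetical inputs for this sketch:
    - ever_connected: the node has connected at least once since claiming
    - live_metrics_reach_cloud: live metrics arrive, directly or via Parents
    - cloud_parent_has_data: a Cloud-connected Parent holds the node's data
    """
    if not ever_connected:
        return "Unseen"
    if live_metrics_reach_cloud:
        return "Live"
    if cloud_parent_has_data:
        return "Stale"   # still queryable through the Parent
    return "Offline"     # no data source is reachable


# The three scenarios from the Stale vs Offline table.
print(cloud_state(True, False, True))   # child down, Parent on Cloud
print(cloud_state(True, False, False))  # standalone Agent down
print(cloud_state(True, False, False))  # child down, all Parents off Cloud
```

The order of the checks matters in this sketch: a node is only Stale or Offline once it is known not to be Live, and the only thing separating the two is whether a Cloud-connected Parent still holds its data.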

States on Parent Dashboards

Parents display nodes that stream (or have streamed) to them:

| State | Meaning |
|---|---|
| Live | Node is actively streaming metrics |
| Stale | Node stopped streaming, historical data retained |

Parents don't show Offline or Unseen states. When a node's retention expires or cleanup runs, it disappears from the Parent's view entirely.

State Mapping: Parent → Cloud

When a Parent connects to Netdata Cloud, it reports the state of all its children:

| Parent Sees | Cloud Shows | Why |
|---|---|---|
| Live | Live | Data flowing through Parent |
| Stale | Stale | Parent connected to Cloud has historical data |
| (removed) | Offline | No data source available |

High-availability setups (recommended): With two Parents (Child → P1 → P2), children stream to one Parent, which replicates to the other. Both Parents connect to Cloud.

If the child connects to Cloud only via Parents:

| Event | Result | Why |
|---|---|---|
| P1 disconnects from Cloud | No change (Live) | P1 still runs, replicates to P2, P2 reports to Cloud |
| P1 stops | Brief Stale, then Live | Child fails over to P2 |
| P2 disconnects from Cloud | No change (Live) | P1 still connected to Cloud |
| P2 stops | No change (Live) | Child still streams to P1, P1 reports to Cloud |
| Both disconnect from Cloud | Offline | No Parent can report to Cloud |

If the child also connects directly to Cloud, it remains Live regardless of Parent status.

Single Parent setups: When the only Parent disconnects from Cloud, all its children become Offline because Cloud can no longer query their data.

When a Parent reconnects: Children with retained data appear as Stale (or Live if actively streaming).
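The high-availability and single-Parent cases can be checked against a toy model. The sketch below assumes a child that is up and streaming with no direct Cloud link, running Parents that replicate to each other, and failover that has already completed (so the brief Stale window after P1 stops is not modeled). The `Parent` class and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Parent:
    running: bool = True
    cloud_connected: bool = True


def child_state_via_parents(parents: list[Parent]) -> str:
    """Steady-state Cloud view of a streaming child without a direct Cloud link.

    Assumption of this sketch: every running Parent ends up with the child's
    data (through streaming, replication, or failover), so the child is Live
    as long as one running Parent can still report to Cloud.
    """
    if any(p.running and p.cloud_connected for p in parents):
        return "Live"
    return "Offline"


# Rows of the HA table (P1, P2), followed by the single-Parent case.
print(child_state_via_parents([Parent(cloud_connected=False), Parent()]))  # P1 off Cloud
print(child_state_via_parents([Parent(running=False), Parent()]))          # P1 stops
print(child_state_via_parents([Parent(cloud_connected=False),
                               Parent(cloud_connected=False)]))            # both off Cloud
print(child_state_via_parents([Parent(cloud_connected=False)]))            # single Parent
```

The model makes the benefit of a second Parent visible: with two Parents, only the simultaneous loss of both Cloud connections takes the child Offline.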

Transition Timings

Detection Speed

How Netdata Cloud detects an Agent or Parent disconnection:

| Event | Detection Time | Mechanism |
|---|---|---|
| Agent or Parent loses Cloud connection | ~60 seconds | MQTT keepalive (60s interval) |
| UI reflects state change | 1-2 minutes | Cloud processing + UI refresh |

How a Parent detects a child disconnection:

| Event | Detection Time | Mechanism |
|---|---|---|
| Child shuts down gracefully | Immediate | Socket close detected |
| Child crashes or network drops | ~60 seconds | TCP keepalive probes (30s idle + 3×10s probes) |
| Child silently stops sending data | 10 minutes | Idle activity timeout |

These timings are hardcoded and not user-configurable.
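As a worked example, the ~60 second figure for a crashed child follows directly from the keepalive parameters quoted in the table. The helper and dictionary below are illustrative names only; the numbers are the ones stated above.

```python
def keepalive_detection_seconds(idle_s: int, probes: int, interval_s: int) -> int:
    """Time before a dead peer is noticed: idle period plus all unanswered probes."""
    return idle_s + probes * interval_s


# Approximate detection time per failure mode, in seconds, per the table above.
PARENT_DETECTION_SECONDS = {
    "graceful_shutdown": 0,                                          # socket close
    "crash_or_network_drop": keepalive_detection_seconds(30, 3, 10), # TCP keepalive
    "silent_stall": 10 * 60,                                         # idle timeout
}

print(PARENT_DETECTION_SECONDS["crash_or_network_drop"])  # 60
```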

Standalone Agent Transitions

A standalone Agent connects directly to Cloud without a Parent.

| Event | From | To | Timing |
|---|---|---|---|
| Agent starts, connects to Cloud | Unseen/Offline | Live | Immediate on connection |
| Agent stops or loses network | Live | Offline | Immediate to ~60 seconds |
| Agent restarts | Offline | Live | Immediate on reconnection |

No Stale state: Standalone agents go directly to Offline because there's no Parent holding their data.

Child Node Transitions

A child streams metrics to a Parent, which connects to Cloud.

| Event | From | To | Timing |
|---|---|---|---|
| Child connects to Parent | Unseen/Offline | Live | Immediate |
| Child stops streaming | Live | Stale | Immediate to ~60 seconds (see Detection Speed) |
| Child restarts streaming | Stale | Live | Immediate |
| All Parents go offline | Live/Stale | Offline | Immediate to ~60 seconds |
| Parent reconnects (child still down) | Offline | Stale | Immediate (if data retained) |

First Connection

| Event | From | To | Timing |
|---|---|---|---|
| Node claimed to Space | - | Unseen | Immediate |
| Node connects for first time | Unseen | Live | Immediate on connection |
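The transition tables in this section can be read as a small state machine. The sketch below transcribes them into lookup tables and replays a sequence of events; the event names are hypothetical labels chosen for the example, and timings are ignored.

```python
# (current state, event) -> next state, transcribed from the tables above.
STANDALONE = {
    ("Unseen", "agent_connects"): "Live",
    ("Offline", "agent_connects"): "Live",
    ("Live", "agent_disconnects"): "Offline",  # no Parent, so never Stale
}

CHILD = {
    ("Unseen", "child_connects"): "Live",
    ("Offline", "child_connects"): "Live",
    ("Live", "child_stops_streaming"): "Stale",
    ("Stale", "child_resumes_streaming"): "Live",
    ("Live", "all_parents_offline"): "Offline",
    ("Stale", "all_parents_offline"): "Offline",
    ("Offline", "parent_reconnects"): "Stale",  # child still down, data retained
}


def replay(table: dict, start: str, events: list) -> str:
    """Apply events in order; events with no matching row leave the state unchanged."""
    state = start
    for event in events:
        state = table.get((state, event), state)
    return state


# A child connects, stops streaming, loses its Parents, then one Parent returns.
print(replay(CHILD, "Unseen", [
    "child_connects",
    "child_stops_streaming",
    "all_parents_offline",
    "parent_reconnects",
]))  # Stale
```

Keeping the standalone and child tables separate mirrors the documentation: the Stale state only exists in the table for nodes that have a Parent.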

Automatic Cleanup

Netdata Cloud Cleanup

Cloud automatically removes nodes that remain Offline or Unseen:

| Node Type | Cleanup After | Notes |
|---|---|---|
| Standalone agents (0 hops) | 7 days | Direct Cloud connection |
| Child nodes (1+ hops) | 48 hours | Connected via Parent |
| Unseen nodes | 48 hours | Claimed but never connected |

Stale nodes are never automatically removed. They have queryable data via their Parent.

These thresholds are managed by Netdata Cloud infrastructure and are not user-configurable.
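The cleanup rules can be summarized as a lookup on state and hop count. This is a sketch of the documented thresholds, not Cloud's actual cleanup job; the function names are hypothetical, and treating a node as eligible exactly at the threshold is an assumption.

```python
from datetime import timedelta
from typing import Optional


def cleanup_threshold(state: str, hops: int) -> Optional[timedelta]:
    """Return how long a node may stay in `state` before automatic removal."""
    if state == "Unseen":
        return timedelta(hours=48)
    if state == "Offline":
        # 0 hops = standalone Agent, 1+ hops = child behind a Parent
        return timedelta(days=7) if hops == 0 else timedelta(hours=48)
    return None  # only Offline and Unseen nodes are removed automatically


def eligible_for_cleanup(state: str, hops: int, time_in_state: timedelta) -> bool:
    threshold = cleanup_threshold(state, hops)
    # Assumption: eligibility starts once the threshold is reached.
    return threshold is not None and time_in_state >= threshold


print(eligible_for_cleanup("Offline", 0, timedelta(days=3)))   # standalone, 3 days
print(eligible_for_cleanup("Offline", 1, timedelta(days=3)))   # child, 3 days
print(eligible_for_cleanup("Stale", 1, timedelta(days=365)))   # Stale is never removed
```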

Configuration Options

Ephemeral Nodes

For dynamic infrastructure (auto-scaling groups, containers, spot instances), mark nodes as ephemeral:

# On the child node's netdata.conf
[global]
is ephemeral node = yes

Effects:

  • No disconnection alerts for this node
  • Node label _is_ephemeral=true propagates to Parents and Cloud

Marking Existing Nodes as Ephemeral

To mark already-offline nodes as ephemeral (clears alerts, keeps data queryable):

netdatacli mark-stale-nodes-ephemeral <node-id | hostname | ALL_NODES>

Removing Nodes

To force-remove a Stale node:

netdatacli remove-stale-node <node-id | hostname | ALL_NODES>

This sends an offline signal to Cloud. The node transitions to Offline, after which it is eligible for automatic cleanup and can also be deleted from the UI immediately.

See Remove Node for detailed instructions.

Connection Hops

Check how a node connects to Cloud:

| Hops | Meaning |
|---|---|
| 0 | Direct Cloud connection (standalone) |
| 1 | Connected via one Parent |
| 2+ | Connected via chained Parents |

View hops in Netdata Cloud by clicking the node info button.

Nodes with more hops have more potential failure points, but also benefit from Parent data caching (Stale state instead of Offline).

Troubleshooting

Log Filtering with MESSAGE_ID

Netdata logs include MESSAGE_IDs for filtering specific events. Use journalctl to view relevant logs:

# Cloud connection events (ACLK)
journalctl -u netdata MESSAGE_ID=acb33cb9-5778-476b-aac7-02eb7e4e151d

# Streaming from children (on Parent)
journalctl -u netdata MESSAGE_ID=ed4cdb8f-1beb-4ad3-b57c-b3cae2d162fa

# Streaming to parent (on Child)
journalctl -u netdata MESSAGE_ID=6e2e3839-0676-4896-8b64-6045dbf28d66

Node shows Stale, expected Live

Cause: Node stopped streaming to its Parent.

Check:

  1. Is the node's Netdata Agent running? systemctl status netdata
  2. Can the node reach the Parent? Check network/firewall
  3. Check streaming config: cat /etc/netdata/stream.conf
  4. Check agent logs: journalctl -u netdata | grep -i stream

Node shows Offline, expected Stale

Cause: Either it's a standalone Agent, or all its Parents are disconnected from Cloud.

Check:

  1. Is this a standalone Agent or does it stream to a Parent?
  2. If streaming: Is the Parent online and connected to Cloud?
  3. Check the Parent's dashboard: does it show the child?

Node shows Unseen

Cause: Node was claimed but never successfully connected to Cloud.

Check:

  1. Is the Netdata Agent running?
  2. Can the agent reach app.netdata.cloud? Check firewall/proxy
  3. Is the claiming token correct?
  4. Check agent logs: journalctl -u netdata | grep -i aclk

All children went Offline simultaneously

Cause: All Parents lost their Cloud connection. With HA setups (two Parents), this only happens if both disconnect.

Check:

  1. Are the Parents online? systemctl status netdata
  2. Can they reach Cloud? Check network
  3. Check Parent logs: journalctl -u netdata | grep -i aclk

Can't delete node from UI

Cause: The node is Stale (its data is still available via a Parent). The UI prevents deletion to protect queryable data.

Solution: Use CLI to remove:

netdatacli remove-stale-node <node-id>

Node reappears after deletion

Cause: Agent is still running and configured to reconnect.

Solution:

  1. Stop the agent: systemctl stop netdata
  2. Remove claim: rm /var/lib/netdata/cloud.d/claimed_id
  3. Clear any claiming-related environment variables, if set, so the Agent does not re-claim itself on the next start
