Netdata Streaming Routing

Streaming routing controls how Netdata child nodes connect to parent nodes when multiple parents are available. It handles three key operations: initial parent selection, connection management, and failover.

Prerequisites

This feature requires configuring streaming in netdata.conf. See Streaming Configuration for setup instructions.

How Streaming Routing Works

1. Initial Parent Selection

When a child node starts, it queries all configured parents simultaneously to determine the best connection:

How it works:

  1. The child sends HTTP requests to all parents in parallel
  2. Each parent responds with:
     • The last timestamp of this child's data (if any)
     • A random seed for load balancing
  3. The child calculates the time delta for each parent
  4. The child selects a parent based on data recency (lowest time delta), not on the amount of data stored
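The steps above can be sketched in a few lines of Python. This is an illustrative model only, not Netdata's implementation: the `ParentReply` type, its field names, and the rule that the seed breaks ties between equally recent parents are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParentReply:
    """Hypothetical shape of one parent's reply to the child's query."""
    name: str
    last_timestamp: Optional[int]  # newest sample held for this child; None if no data
    seed: int                      # random value supplied by the parent


def time_delta(reply: ParentReply, now: int) -> float:
    """Seconds between now and the parent's newest sample for this child."""
    if reply.last_timestamp is None:
        # A parent with no history for this child ranks last.
        return float("inf")
    return max(0, now - reply.last_timestamp)


def select_parent(replies: List[ParentReply], now: int) -> ParentReply:
    """Pick the parent with the lowest time delta.

    Assumption: equally recent parents are separated by their random seed,
    which spreads children across parents.
    """
    return min(replies, key=lambda r: (time_delta(r, now), r.seed))
```

With two parents holding equally recent data and a third holding none, the third is only chosen if the other two are missing from the reply list. `select_parent` raises `ValueError` on an empty list, which corresponds to no parent answering.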

Example:

# In child's stream.conf
[stream]
enabled = yes
destination = parent-a:19999 parent-b:19999 parent-c:19999
api key = YOUR_API_KEY

With this configuration:

Child Node startup:

├─→ Parent A (has historical data) ✓ Selected (random between A & B)
├─→ Parent B (has historical data)
└─→ Parent C (no historical data) ← Lower priority

2. Connection Management

Once connected, the child maintains a persistent connection:

  • Connection timeout: 60 seconds (default)
  • Keepalive: Continuous streaming maintains connection
  • No automatic rebalancing: Child stays connected until failure
  • Data integrity: Historical metrics are replicated automatically after reconnection

Data Recovery

Netdata automatically replicates missing historical data when the child reconnects. Data is lost only if all of the following are true:

  • The child restarts during the disconnection
  • The child uses memory mode = ram (metrics stored in memory)
  • The disconnection exceeds the retention period (default: 1 hour for RAM mode)

For persistent data, use memory mode = dbengine.
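As a quick way to reason about the data-loss conditions listed above, the rule can be written as a single predicate. The function and parameter names are hypothetical; the 3600-second default mirrors the 1-hour RAM retention mentioned in the text.

```python
def data_is_lost(child_restarted: bool,
                 memory_mode: str,
                 disconnected_seconds: float,
                 retention_seconds: float = 3600) -> bool:
    """Return True only when every data-loss condition holds at once."""
    return (
        child_restarted
        and memory_mode == "ram"
        and disconnected_seconds > retention_seconds
    )
```

For example, a two-hour outage on a child using dbengine returns `False`, because the memory mode condition is not met.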

Important

Children do not automatically reconnect to their original parent after failover. This prevents connection flapping but requires manual intervention for load redistribution.

3. Failover and Reconnection

When the active connection fails, the child repeats the parent selection process.

Smart Failover

Unlike traditional round-robin failover, Netdata re-evaluates all parents on each attempt. This means a child might connect to a different parent than expected if data states have changed.

Failover Example:

Normal:   Child → Parent A

Failure:  Child ✗ Parent A (connection lost)
          Child → Parent B (immediate failover)

Recovery: Parent A comes back online
          Child → Parent B (stays connected - no automatic switch)
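The failover sequence above can be modeled as a small state machine. This is a simplified sketch, not Netdata code: the `Child` class, the `online` set, and the `pick` callback (a stand-in for the recency-based selection) are assumptions made for illustration.

```python
from typing import Callable, List, Optional, Set


class Child:
    """Toy model of a child that keeps its connection until it breaks."""

    def __init__(self, parents: List[str]) -> None:
        self.parents = list(parents)
        self.connected_to: Optional[str] = None

    def on_event(self, online: Set[str],
                 pick: Callable[[List[str]], str]) -> Optional[str]:
        """Re-run selection only if the current connection is gone."""
        if self.connected_to in online:
            # Sticky: a recovered parent never pulls the child back.
            return self.connected_to
        candidates = [p for p in self.parents if p in online]
        self.connected_to = pick(candidates) if candidates else None
        return self.connected_to


# Walk through the three phases of the example.
child = Child(["parent-a", "parent-b"])
first = lambda candidates: candidates[0]  # placeholder selection rule
normal = child.on_event({"parent-a", "parent-b"}, first)    # connects to parent-a
failure = child.on_event({"parent-b"}, first)               # fails over to parent-b
recovery = child.on_event({"parent-a", "parent-b"}, first)  # stays on parent-b
```

The key point is the early return: selection is skipped while the current parent is reachable, which is why the child remains on Parent B after Parent A recovers.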

Key Routing Behaviors

| Behavior | Description | Impact |
|---|---|---|
| Data Recency Priority | Selects parent with most recent data (lowest time delta) | Minimizes gap in historical data |
| Parallel Parent Query | Queries all parents simultaneously via HTTP | Fast parent selection, no sequential delays |
| Sticky Connections | No automatic rebalancing after failover | Requires manual intervention to redistribute load |
| Smart Failover | Re-evaluates all parents on each connection attempt | May connect to different parent based on current data state |
| Connection Persistence | Maintains connection until failure occurs | Prevents unnecessary reconnections and data gaps |
| No Health Checks | Doesn't proactively test parent availability | Discovers failures only when connection breaks |
| Randomized Delays | Reconnection waits random time (5s to configured maximum) | Prevents thundering herd during mass reconnections |

Configuration Reference

Essential Parameters

[stream]
# Streaming targets (space-separated list)
# Order doesn't matter - selection is based on data recency
destination = parent1:19999 parent2:19999 parent3:19999

# Reconnection delay - randomized between 5 and this value (seconds)
# Default: 5, Minimum: 5
reconnect delay seconds = 5

# Initial connection timeout
timeout seconds = 60
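To illustrate how a randomized reconnect delay spreads out reconnections, here is a minimal sketch. It assumes a uniform distribution between the 5-second minimum and the configured value; the actual distribution Netdata uses is not stated here, and the function name is hypothetical.

```python
import random

MIN_DELAY_SECONDS = 5  # minimum stated in the configuration comments


def reconnect_delay(configured_max: float, rng=random) -> float:
    """Return a wait time between the minimum and the configured maximum.

    Values below the minimum are raised to it, so a setting of 5 (the
    default) always yields exactly 5 seconds.
    """
    upper = max(MIN_DELAY_SECONDS, configured_max)
    return rng.uniform(MIN_DELAY_SECONDS, upper)
```

With the default of 5 there is no spread at all, so deployments with many children may want a larger value to avoid all of them reconnecting at the same moment.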

Multi-Tier Setup

For larger deployments:

Child Nodes ──→ Parent Proxies ──→ Ultimate Parents
                (forward only)     (store & analyze)

Configure intermediate parents as proxies to distribute load without storage overhead.

Monitoring Streaming Status

Check Connection Status

Using the UI

The Netdata Streaming function (under the "Functions" tab) provides:

  • Comprehensive overview of all streaming connections
  • Status, replication completion time, and connection details
  • Works on both parent and child nodes:
    • On child: Shows outgoing connections
    • On parent: Shows incoming connections (InHops = 1 for direct children, >1 for proxied connections)

Viewing Logs

# Check journal for streaming-related messages
journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep stream

Verify Parent Connectivity

# Test each parent
nc -zv parent-a 19999
nc -zv parent-b 19999
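When `nc` is not available, the same TCP reachability check can be scripted. The sketch below uses only the Python standard library; the helper names are hypothetical, and the check confirms only that the port accepts connections, not that streaming or the API key is configured correctly.

```python
import socket
from typing import Dict, Iterable


def can_connect(host: str, port: int = 19999, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_parents(hosts: Iterable[str], port: int = 19999) -> Dict[str, bool]:
    """Map each parent hostname to its reachability."""
    return {host: can_connect(host, port) for host in hosts}
```

Calling `check_parents(["parent-a", "parent-b"])` returns a dictionary keyed by hostname, which makes it easy to spot the unreachable parent in a longer list.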

Troubleshooting

If a child connects to an unexpected parent, check the data retention on all parents. The child prefers parents that already have its historical data.

Common Scenarios

| Scenario | What Happens | Why |
|---|---|---|
| Parent A fails | Child switches to Parent B | Automatic failover |
| All parents fail | Child cycles through list every second | Continuous retry |
| Parent A recovers | Child stays on Parent B | No automatic rebalancing |
| New child starts | Randomly selects from parents with data | Load distribution |

Maintenance Planning

When you take a parent offline for maintenance, its children fail over to other parents and do not return automatically. Plan capacity on the remaining parents accordingly.

Best Practices

  1. List parents in priority order - First parent is preferred if all equal
  2. Configure at least 3 parents - Ensures availability during maintenance
  3. Monitor parent data completeness - Affects routing decisions
  4. Plan maintenance carefully - Children won't automatically return
