Netdata Streaming Routing

Streaming routing controls how Netdata child nodes connect to parent nodes when multiple parents are available. It handles three key operations: initial parent selection, connection management, and failover.

Prerequisites

This feature requires streaming to be configured in stream.conf. See Streaming Configuration for setup instructions.

How Streaming Routing Works

1. Initial Parent Selection

When a child node starts, it queries all configured parents simultaneously to determine the best connection:

How it works:

- Child sends HTTP requests to all parents in parallel
- Each parent responds with:
  - Last timestamp of this child's data (if any)
  - Random seed for load balancing
- Child calculates time delta for each parent
- Selection based on data recency (not data amount)

Example:

# In child's stream.conf
[stream]
    enabled = yes
    destination = parent-a:19999 parent-b:19999 parent-c:19999
    api key = YOUR_API_KEY

With this configuration:

Child Node startup:
    │
    ├─→ Parent A (has historical data) ✓ Selected (random between A & B)
    ├─→ Parent B (has historical data) 
    └─→ Parent C (no historical data)   ← Lower priority
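
The selection steps above can be sketched in a few lines of Python. This is an illustrative model, not Netdata's implementation: `ParentReply`, `time_delta`, and `select_parent` are hypothetical names. The sketch also assumes that a parent with no data for the child ranks last and that ties are broken by a uniform random choice. How the parent-supplied random seed feeds into the tie-break is not specified here, so it is not modeled.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ParentReply:
    """Hypothetical model of one parent's answer to the child's query."""
    name: str
    last_timestamp: Optional[int]  # last sample held for this child, None if no data


def time_delta(reply: ParentReply, now: int) -> float:
    """Seconds between now and the parent's newest sample for this child."""
    if reply.last_timestamp is None:
        return float("inf")  # assumption: parents without data rank last
    return max(0, now - reply.last_timestamp)


def select_parent(replies: Sequence[ParentReply], now: int,
                  rng: Optional[random.Random] = None) -> ParentReply:
    """Pick the parent with the lowest time delta; break ties randomly."""
    if not replies:
        raise ValueError("no parent replied")
    rng = rng or random.Random()
    best = min(time_delta(r, now) for r in replies)
    candidates = [r for r in replies if time_delta(r, now) == best]
    return rng.choice(candidates)
```

With replies from parent-a and parent-b that share the same last timestamp and a reply from parent-c with no data, `select_parent` returns parent-a or parent-b, which matches the diagram above.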

2. Connection Management

Once connected, the child maintains a persistent connection:

Connection timeout: 60 seconds (default)
Keepalive: Continuous streaming maintains connection
No automatic rebalancing: Child stays connected until failure
Data integrity: Historical metrics are replicated automatically after reconnection

Data Recovery

Netdata automatically replicates missing historical data when reconnection occurs. Data is lost only if all of the following are true:

- Child restarts during disconnection
- Child uses memory mode = ram (metrics stored in memory)
- Disconnection exceeds retention period (default: 1 hour for RAM mode)

For persistent data, use memory mode = dbengine.
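
As a quick way to reason about the conditions above, they can be written as a single predicate. This is a sketch of the rule as stated in this section only. The function name `data_lost` and its parameters are hypothetical, and the one-hour default is the RAM-mode retention quoted above.

```python
def data_lost(child_restarted: bool, memory_mode: str,
              outage_seconds: float, retention_seconds: float = 3600) -> bool:
    """Return True only when every loss condition from this section holds."""
    return (
        child_restarted                           # child restarted while disconnected
        and memory_mode == "ram"                  # metrics were held in memory only
        and outage_seconds > retention_seconds    # outage outlasted retention
    )
```

For example, a two-hour outage with a restarted child in `ram` mode evaluates to true, while the same outage with `dbengine` evaluates to false.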

Important

Children do not automatically reconnect to their original parent after failover. This prevents connection flapping but requires manual intervention for load redistribution.

3. Failover and Reconnection

When the active connection fails, the child repeats the parent selection process.

Smart Failover

Unlike traditional round-robin failover, Netdata re-evaluates all parents on each attempt. This means a child might connect to a different parent than expected if data states have changed.

Failover Example:

Normal:     Child → Parent A
            
Failure:    Child ✗ Parent A (connection lost)
            Child → Parent B (immediate failover)
            
Recovery:   Parent A comes back online
            Child → Parent B (stays connected - no automatic switch)
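
The sticky behavior in the example above can be modeled as a small state machine. This is a simplified sketch under stated assumptions: `ChildConnection`, `tick`, and `pick_first` are hypothetical names, and `pick_first` is only a stand-in for the recency-based selection described in section 1.

```python
from typing import Callable, Optional, Sequence, Set


def pick_first(candidates: Sequence[str]) -> str:
    """Stand-in for the real selection logic (assumption for this sketch)."""
    return candidates[0]


class ChildConnection:
    """Tracks which parent a child is connected to across availability changes."""

    def __init__(self, parents: Sequence[str],
                 select: Callable[[Sequence[str]], str] = pick_first) -> None:
        self.parents = list(parents)
        self.select = select
        self.current: Optional[str] = None

    def tick(self, online: Set[str]) -> Optional[str]:
        """Re-run selection only when the current connection is gone."""
        if self.current in online:
            return self.current  # sticky: stay put while the parent is reachable
        candidates = [p for p in self.parents if p in online]
        self.current = self.select(candidates) if candidates else None
        return self.current
```

Feeding it the three phases from the example (both parents online, only parent-b online, both online again) yields parent-a, then parent-b, then parent-b again, because a recovered parent never triggers a switch on its own.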

Key Routing Behaviors

Behavior	Description	Impact
Data Recency Priority	Selects parent with most recent data (lowest time delta)	Minimizes gap in historical data
Parallel Parent Query	Queries all parents simultaneously via HTTP	Fast parent selection, no sequential delays
Sticky Connections	No automatic rebalancing after failover	Requires manual intervention to redistribute load
Smart Failover	Re-evaluates all parents on each connection attempt	May connect to different parent based on current data state
Connection Persistence	Maintains connection until failure occurs	Prevents unnecessary reconnections and data gaps
No Health Checks	Doesn't proactively test parent availability	Discovers failures only when connection breaks
Randomized Delays	Reconnection waits random time (5s to configured maximum)	Prevents thundering herd during mass reconnections

Configuration Reference

Essential Parameters

[stream]
    # Streaming targets (space-separated list)
    # Order doesn't matter - selection is based on data recency
    destination = parent1:19999 parent2:19999 parent3:19999
    
    # Reconnection delay - randomized between 5 and this value (seconds)
    # Default: 5, Minimum: 5
    reconnect delay seconds = 5
    
    # Initial connection timeout
    timeout seconds = 60
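
To illustrate how a randomized reconnection delay spreads out retries, here is a minimal Python sketch. It assumes a uniform distribution between the 5-second minimum and the configured value. The actual distribution Netdata uses is not stated here, and `reconnect_delay` is a hypothetical helper.

```python
import random
from typing import Optional

MIN_DELAY_SECONDS = 5  # minimum quoted in the configuration comments above


def reconnect_delay(configured_max: float,
                    rng: Optional[random.Random] = None) -> float:
    """Return a delay in seconds between the minimum and the configured maximum."""
    rng = rng or random.Random()
    upper = max(MIN_DELAY_SECONDS, configured_max)  # values below the minimum are clamped
    return rng.uniform(MIN_DELAY_SECONDS, upper)
```

With the default of 5 the delay is always 5 seconds. Raising the value widens the window, so children that lose the same parent at the same moment are less likely to reconnect simultaneously.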

Multi-Tier Setup

For larger deployments:

Child Nodes ──→ Parent Proxies ──→ Ultimate Parents
                 (forward only)      (store & analyze)

Configure intermediate parents as proxies to distribute load without storage overhead.

Monitoring Streaming Status

Check Connection Status

Using the UI

The Netdata Streaming function (under the "Functions" tab) provides:

- Comprehensive overview of all streaming connections
- Status, replication completion time, and connection details
- Works on both parent and child nodes:
  - On child: Shows outgoing connections
  - On parent: Shows incoming connections (InHops = 1 for direct children, >1 for proxied connections)

Viewing Logs

# Check journal for streaming-related messages
journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep stream

Verify Parent Connectivity

# Test each parent
nc -zv parent-a 19999
nc -zv parent-b 19999
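
When `nc` is not available, or when the check should cover every entry of a `destination` line in one pass, the same TCP reachability test can be scripted. This is a sketch using only the Python standard library. `can_connect` and `check_destinations` are hypothetical helper names, and the sketch assumes plain `host:port` entries with 19999 as the fallback port.

```python
import socket
from typing import Dict


def can_connect(host: str, port: int = 19999, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def check_destinations(destination: str) -> Dict[str, bool]:
    """Test every space-separated host:port entry of a destination value."""
    results = {}
    for entry in destination.split():
        host, sep, port = entry.rpartition(":")
        if not sep:  # no port given, fall back to 19999
            host, port = entry, "19999"
        results[entry] = can_connect(host, int(port))
    return results
```

Calling `check_destinations("parent-a:19999 parent-b:19999")` returns a mapping from each entry to whether the port accepted a connection. Like `nc -zv`, this only confirms TCP reachability, not that streaming or the API key is accepted.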

Troubleshooting

If a child connects to an unexpected parent, check the data retention on all parents. The child prefers parents that already have its historical data.

Common Scenarios

Scenario	What Happens	Why
Parent A fails	Child switches to Parent B	Automatic failover
All parents fail	Child cycles through list every second	Continuous retry
Parent A recovers	Child stays on Parent B	No automatic rebalancing
New child starts	Randomly selects from parents with data	Load distribution

Maintenance Planning

When taking a parent offline for maintenance, its children will fail over to other parents and won't automatically return. Plan capacity accordingly.

Best Practices

- List parents in priority order: the first parent is preferred if all are equal
- Configure at least 3 parents: ensures availability during maintenance
- Monitor parent data completeness: it affects routing decisions
- Plan maintenance carefully: children won't automatically return
