Metric Correlations
The Metric Correlations feature helps you quickly identify metrics and charts relevant to a specific time window of interest, allowing for faster root cause analysis.
By filtering the standard Netdata dashboard to display only the most relevant charts, Metric Correlations makes it easier to pinpoint anomalies and investigate issues.
Since it leverages every available metric in your infrastructure with up to 1-second granularity, Metric Correlations provides highly accurate insights.
Using Metric Correlations
When viewing the Metrics tab or a single-node dashboard, you'll find the Metric Correlations button in the top-right corner.
To start:
- Click Metric Correlations.
- Highlight a timeframe on a single chart. The selected timeframe must be at least 15 seconds.
- The menu displays details about the selected area and reference baseline. Metric Correlations compares the highlighted window to a reference baseline, which is four times its length and precedes it immediately.
- Click Find Correlations. This button is only active if a valid timeframe is selected.
- The process evaluates all available metrics and returns a filtered Netdata dashboard showing only the most changed metrics between the baseline and the highlighted window.
- If needed, select another window and press Find Correlations again to refine your analysis.
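The relationship between the highlighted window and its reference baseline can be sketched in a few lines. This is only an illustration of the rule described above (a baseline four times as long, ending where the highlight begins); the `Window` class, the `baseline_for` function, and the use of Unix-second timestamps are assumptions made for this example, not Netdata identifiers.

```python
from dataclasses import dataclass

MIN_HIGHLIGHT_SECONDS = 15  # minimum selection length stated in the steps above
BASELINE_MULTIPLIER = 4     # the baseline is four times the highlight length


@dataclass(frozen=True)
class Window:
    """Hypothetical time window, expressed in Unix seconds."""
    after: int   # start of the window
    before: int  # end of the window

    @property
    def length(self) -> int:
        return self.before - self.after


def baseline_for(highlight: Window) -> Window:
    """Return the baseline window that immediately precedes the highlight."""
    if highlight.length < MIN_HIGHLIGHT_SECONDS:
        raise ValueError("highlighted window must be at least 15 seconds")
    return Window(
        after=highlight.after - BASELINE_MULTIPLIER * highlight.length,
        before=highlight.after,
    )
```

For a 60-second highlight, this yields a 240-second baseline that ends exactly where the highlight starts.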
Metric Correlations Options
Metric Correlations offers adjustable parameters for deeper data exploration. Since different data types and incidents require different approaches, these settings allow for flexible analysis.
Method
Two algorithms are available for scoring metrics based on changes between the baseline and highlight windows:
- KS2 (Kolmogorov-Smirnov Test): A statistical method comparing distributions between the highlighted and baseline windows to detect significant changes.
- Volume: A heuristic approach based on percentage change in averages, designed to handle edge cases.
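To make the difference between the two scoring ideas concrete, here is a minimal sketch of each. It is not Netdata's implementation: `ks2_statistic` computes the textbook two-sample Kolmogorov-Smirnov statistic, `volume_change` computes a plain percentage change in averages, and both function names, as well as the handling of an all-zero baseline, are assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import fmean


def ks2_statistic(baseline, highlight):
    """Largest gap between the empirical CDFs of the two samples.

    0.0 means the distributions look identical; 1.0 means they do not overlap.
    """
    a, b = sorted(baseline), sorted(highlight)
    if not a or not b:
        raise ValueError("both windows need at least one value")
    gap = 0.0
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)  # share of baseline values <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of highlight values <= x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def volume_change(baseline, highlight):
    """Percentage change of the highlight average relative to the baseline average."""
    base_avg, high_avg = fmean(baseline), fmean(highlight)
    if base_avg == 0:
        # Assumption for this sketch: a metric that was idle in the baseline
        # and is now active counts as an unbounded change.
        return 0.0 if high_avg == 0 else float("inf")
    return (high_avg - base_avg) / abs(base_avg) * 100.0
```

With a baseline of `[5, 5, 5, 5]` and a highlight of `[0, 10, 0, 10]`, the averages are equal, so `volume_change` reports no change, while `ks2_statistic` reports a clear distribution shift. The reverse happens for a metric that is all zeros in the baseline and then becomes active.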
Aggregation
To accommodate different window lengths, Netdata aggregates raw data as needed. The default aggregation method is Average, but you can also choose Median, Min, Max, or Stddev.
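As a rough illustration of what aggregation means here, the sketch below reduces a series of raw samples to a smaller number of points by applying one of the listed methods to consecutive groups. The `aggregate` function, its `points` parameter, the lowercase method keys, and the choice of population standard deviation are all assumptions for this example; Netdata's actual grouping logic may differ.

```python
from statistics import fmean, median, pstdev

# Hypothetical mapping from the option names above to reducer functions.
AGGREGATORS = {
    "average": fmean,
    "median": median,
    "min": min,
    "max": max,
    "stddev": pstdev,  # population stddev, so single-sample groups are valid
}


def aggregate(values, points, method="average"):
    """Reduce `values` to at most `points` values, one per consecutive group."""
    if points <= 0:
        raise ValueError("points must be positive")
    if not values:
        return []
    reducer = AGGREGATORS[method]
    size = -(-len(values) // points)  # ceiling division: samples per group
    return [reducer(values[i:i + size]) for i in range(0, len(values), size)]
```

For example, 240 one-second baseline samples aggregated to 60 points produce one value per 4-second group, which matches the point count of a 60-second highlight.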
Data Type
Netdata assigns an Anomaly Bit to each metric in real-time, flagging whether it deviates significantly from normal behavior. You can analyze either raw data or anomaly rates:
- Metrics: Runs Metric Correlations on raw metric values.
- Anomaly Rate: Runs Metric Correlations on anomaly rates for each metric.
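One simple way to think about an anomaly rate is as the share of samples in a window whose Anomaly Bit is set. The sketch below assumes exactly that definition and represents the bits as a plain sequence of 0/1 values; the `anomaly_rate` function is hypothetical and only meant to show what kind of series the Anomaly Rate data type operates on.

```python
def anomaly_rate(anomaly_bits):
    """Percentage of samples in a window whose anomaly bit is set."""
    bits = list(anomaly_bits)
    if not bits:
        return 0.0  # assumption: an empty window has no anomalies
    flagged = sum(1 for bit in bits if bit)
    return 100.0 * flagged / len(bits)
```

A metric flagged in 1 of 4 samples has an anomaly rate of 25 percent for that window.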
Metric Correlations on the Agent
Metric Correlations (MC) requests to Netdata Cloud are handled in two ways:
- If MC is enabled on any node, the request is routed to the highest-level node (a Parent node or the node itself).
- If MC is not enabled on any node, Netdata Cloud processes the request by collecting data from nodes and computing correlations on its backend.
Usage Tips
When running Metric Correlations from the Metrics tab across multiple nodes, refine your results by grouping by node:
- Run MC on all nodes if you're unsure which ones are relevant.
- Group the most interesting charts by node to determine whether changes affect all nodes or just a subset.
- If a subset of nodes stands out, filter for those nodes and rerun MC to get more precise results.
Choose the Volume algorithm for sparse metrics (e.g., request latency with few requests). Otherwise, use KS2.
- KS2 is ideal for detecting complex distribution changes, such as shifts in variance.
- Volume is better for detecting metrics that were inactive and then spiked (or vice versa).
Example: Volume can highlight network traffic suddenly turning on. KS2 can detect entropy distribution changes missed by Volume.
Combine Volume and Anomaly Rate to identify the most anomalous metrics within a timeframe. Expand the anomaly rate chart to visualize results more clearly.