How to optimize the Netdata Agent's performance

We designed the Netdata Agent to be incredibly lightweight, even when it's collecting a few thousand dimensions every second and visualizing that data into hundreds of charts. The Agent itself should never use more than 1% of a single CPU core, roughly 100 MiB of RAM, and minimal disk I/O to collect, store, and visualize all this data.

We take this scalability seriously. We have one user running Netdata on a system with 144 cores and 288 threads. Despite collecting 100,000 metrics every second, the Agent still uses only 9% of a single core.

But not everyone has such powerful systems at their disposal. For example, you might run the Agent on a cloud VM with only 512 MiB of RAM, or an IoT device like a Raspberry Pi. In these cases, reducing Netdata's footprint beyond its already diminutive size can pay big dividends, giving your services more horsepower while still monitoring the health and the performance of the node, OS, hardware, and applications.

Prerequisites

  • A node running the Netdata Agent.
  • Familiarity with configuring the Netdata Agent with edit-config.

If you're not familiar with how to configure the Netdata Agent, read our node configuration doc before continuing with this guide. This guide assumes familiarity with the Netdata config directory, using edit-config, and the process of uncommenting/editing various settings in netdata.conf and other configuration files.

What affects Netdata's performance?

Netdata's performance is primarily affected by data collection/retention and clients accessing data.

You can configure almost all aspects of data collection/retention, but only certain aspects of how clients access data. For example, you can't control how many users might be viewing a local Agent dashboard, viewing an infrastructure in real time with Netdata Cloud, or running Metric Correlations.

The Netdata Agent runs with the lowest possible process scheduling priority, which is nice 19, and uses the idle process scheduler. Together, these settings ensure that the Agent only gets CPU resources when the node has CPU resources to spare. If the node reaches 100% CPU utilization, the Agent is stopped first to ensure your applications get any available resources. In addition, under heavy load, collectors that require disk I/O may stop and show gaps in charts.
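
To check these scheduling settings on a running node, a minimal Python sketch along the following lines can read a process's nice value and scheduling policy through the standard os module. It assumes a Linux node, and the helper name scheduling_info is illustrative rather than part of Netdata. Pass the Agent's PID instead of 0, which here means the current process.

```python
import os


def scheduling_info(pid=0):
    """Return the nice value and scheduling policy name for a process.

    pid=0 means the calling process. The policy falls back to "unknown"
    on platforms that do not expose os.sched_getscheduler.
    """
    nice = os.getpriority(os.PRIO_PROCESS, pid)
    policy = "unknown"
    if hasattr(os, "sched_getscheduler"):
        # Build a lookup from policy constant to its name, skipping any
        # constants this platform does not define.
        names = {
            getattr(os, name): name
            for name in ("SCHED_OTHER", "SCHED_BATCH", "SCHED_IDLE",
                         "SCHED_FIFO", "SCHED_RR")
            if hasattr(os, name)
        }
        policy = names.get(os.sched_getscheduler(pid), "unknown")
    return {"nice": nice, "policy": policy}


if __name__ == "__main__":
    print(scheduling_info())
```

For an Agent running with the defaults described above, you would expect a nice value of 19 and the SCHED_IDLE policy.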

Let's walk through the best ways to improve the Netdata Agent's performance.

Reduce collection frequency

The fastest way to improve the Agent's resource utilization is to reduce how often it collects metrics.

Global

If you don't need per-second metrics, or if the Netdata Agent uses a lot of CPU even when no one is viewing that node's dashboard, configure the Agent to collect metrics less often.

Open netdata.conf and edit the update every setting. The default is 1, meaning that the Agent collects metrics every second.

If you change this to 2, Netdata enforces a minimum update every of 2 seconds and collects metrics every other second, which effectively halves CPU utilization. Set this to 5 or 10 to collect metrics every 5 or 10 seconds, respectively.

[global]
update every = 5
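
As a rough illustration of why a longer interval helps, the following sketch counts how many samples the Agent would collect per day for a given update every. The figure of 2,000 dimensions is an assumed example, not a measurement, and samples_per_day is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(dimensions, update_every):
    """Number of samples collected per day across all dimensions."""
    if update_every < 1:
        raise ValueError("update every must be at least 1 second")
    # One sample per dimension per collection cycle.
    return dimensions * (SECONDS_PER_DAY // update_every)


if __name__ == "__main__":
    # 2000 dimensions is an assumed workload for illustration only.
    for interval in (1, 2, 5, 10):
        print(interval, samples_per_day(2000, interval))
```

Doubling the interval halves the number of samples to collect and store, which is where the CPU savings come from.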

Specific plugin or collector

Every collector and plugin has its own update every setting, which you can also change in the go.d.conf, python.d.conf, node.d.conf, or charts.d.conf files, or in individual collector configuration files. If the update every for an individual collector is less than the global setting, the Netdata Agent uses the global setting. See the enable or configure a collector doc for details.
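
The precedence rule above can be sketched in one line of Python. The function name effective_update_every is hypothetical and only models the behavior described here, not the Agent's internals.

```python
def effective_update_every(global_every, collector_every):
    """Interval (in seconds) a collector actually runs at.

    A collector value lower than the global one is overridden by the
    global setting, so the larger of the two values wins.
    """
    return max(global_every, collector_every)


if __name__ == "__main__":
    print(effective_update_every(5, 1))   # collector asks for 1s, global is 5s
    print(effective_update_every(5, 10))  # collector asks for 10s, global is 5s
```

In other words, a per-collector setting can only slow collection down relative to the global interval, never speed it up.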

To reduce the frequency of an internal plugin/collector, open netdata.conf and find the appropriate section. For example, to reduce the frequency of the apps plugin, which collects and visualizes metrics on application resource utilization:

[plugin:apps]
update every = 5

To configure an individual collector, open its specific configuration file with edit-config and look for the update_every setting. For example, to reduce the frequency of the nginx collector, run sudo ./edit-config go.d/nginx.conf:

# [ GLOBAL ]
update_every: 10

Disable unneeded plugins or collectors

If you know that you don't need an entire plugin or a specific collector, you can disable any of them. Keep in mind that if a plugin/collector has nothing to do, it simply shuts down and does not consume system resources. You will only improve the Agent's performance by disabling plugins/collectors that are actively collecting metrics.

Open netdata.conf and scroll down to the [plugins] section. To disable any plugin, uncomment it and set the value to no. For example, to explicitly keep the proc and go.d plugins enabled while disabling python.d, charts.d, and node.d:

[plugins]
proc = yes
python.d = no
charts.d = no
node.d = no
go.d = yes
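
To double-check which plugins end up disabled, a small script can read the [plugins] section back. The sketch below uses Python's standard configparser on an inline copy of the example above. The helper name disabled_plugins is hypothetical, and a real netdata.conf may contain constructs this simple parser does not handle.

```python
import configparser

# Inline copy of the [plugins] example from this guide.
EXAMPLE_CONF = """
[plugins]
proc = yes
python.d = no
charts.d = no
node.d = no
go.d = yes
"""


def disabled_plugins(conf_text):
    """Return the names of plugins explicitly set to "no" in [plugins]."""
    # Disable interpolation and strict mode so "%" characters and
    # repeated keys do not raise errors.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_section("plugins"):
        return []
    return [
        name
        for name, value in parser.items("plugins")
        if value.strip().lower() == "no"
    ]


if __name__ == "__main__":
    print(disabled_plugins(EXAMPLE_CONF))
```

Commented-out lines are ignored by the parser, so only plugins you have uncommented and set to no are reported.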

Disable specific collectors by opening their respective plugin configuration files, uncommenting the line for the collector, and setting its value to no.

sudo ./edit-config go.d.conf
sudo ./edit-config python.d.conf
sudo ./edit-config node.d.conf
sudo ./edit-config charts.d.conf

For example, to disable a few Python collectors:

modules:
  apache: no
  dockerd: no
  fail2ban: no

Lower memory usage for metrics retention

Reduce the disk space that the database engine uses to retain metrics by editing the dbengine multihost disk space option in netdata.conf. The default value is 256 MiB, but it can be set as low as 64 MiB. With a smaller disk space allocation, Netdata also needs to store less metadata in the node's memory.

The page cache size option also directly impacts Netdata's memory usage, but has a minimum value of 32 MiB.

Reducing the value of dbengine multihost disk space does slim down Netdata's resource usage, but it also reduces how long Netdata retains metrics. Find the right balance of performance and metrics retention by using the dbengine calculator.

All the settings are found in the [global] section of netdata.conf:

[global]
memory mode = dbengine
page cache size = 32
dbengine multihost disk space = 256
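
The trade-off between disk space and retention can be approximated with simple arithmetic. The sketch below is not the dbengine calculator. It assumes a constant on-disk cost per sample (bytes_per_sample, a placeholder you must supply yourself) and ignores metadata, so treat the result as a rough proportion rather than a prediction.

```python
MIB = 1024 * 1024


def estimated_retention_hours(disk_space_mib, dimensions, update_every,
                              bytes_per_sample):
    """Rough retention estimate in hours.

    bytes_per_sample is an assumed average on-disk cost per stored
    sample; the real value depends on compression and your data.
    """
    bytes_per_second = dimensions * bytes_per_sample / update_every
    return disk_space_mib * MIB / bytes_per_second / 3600


if __name__ == "__main__":
    # 2000 dimensions and 2.0 bytes per sample are illustrative
    # placeholders, not measured values.
    for space in (256, 64):
        print(space, round(estimated_retention_hours(space, 2000, 1, 2.0), 1))
```

Under these assumptions, retention scales linearly with the disk space allocation, so dropping from 256 to 64 cuts retention to a quarter. Use the dbengine calculator for real sizing.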

Run Netdata behind Nginx

A dedicated web server like Nginx provides far more robustness than the Agent's internal web server. Nginx can handle more concurrent connections, reuse idle connections, and use fast gzip compression to reduce payloads.

For details on installing Nginx as a proxy for the local Agent dashboard, see our Nginx doc.

After you complete Nginx setup according to the doc linked above, we recommend setting keepalive to 1024, and using gzip compression with the following options in the location / block:

location / {
    ...
    gzip on;
    gzip_proxied any;
    gzip_types *;
}

Finally, edit netdata.conf with the following settings:

[global]
bind socket to IP = 127.0.0.1
access log = none
disconnect idle web clients after seconds = 3600
enable web responses gzip compression = no

Disable/lower gzip compression for the dashboard

If you choose not to run the Agent behind Nginx, you can disable or lower gzip compression in the Agent's web server. While gzip compression reduces the size of the HTML/CSS/JS payload, it uses additional CPU while a user is looking at the local Agent dashboard.

To disable gzip compression, open netdata.conf and find the [web] section:

[web]
enable gzip compression = no

Or to lower the default compression level:

[web]
enable gzip compression = yes
gzip compression level = 1
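
To get a feel for what a lower compression level trades away, the sketch below compresses a sample payload at several levels with Python's zlib module and reports the resulting sizes. It assumes the Agent's setting follows the usual gzip scale of 1 (fastest) to 9 (smallest). The payload is a made-up stand-in for dashboard content, and compressed_sizes is a hypothetical helper.

```python
import zlib


def compressed_sizes(payload, levels=(1, 6, 9)):
    """Map each zlib compression level to the compressed size in bytes."""
    return {level: len(zlib.compress(payload, level)) for level in levels}


if __name__ == "__main__":
    # Hypothetical payload standing in for dashboard HTML/CSS/JS.
    payload = b"".join(
        b'<div class="chart" id="chart-%d">value %d</div>\n' % (i, i * 7 % 101)
        for i in range(5000)
    )
    print(len(payload), compressed_sizes(payload))
```

Level 1 typically still shrinks text payloads substantially while spending less CPU per response than higher levels, which is why lowering the level is a reasonable middle ground before disabling compression entirely.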

Disable logs

If your installation is working correctly, and you're not actively auditing Netdata's logs, disable them in netdata.conf.

[global]
debug log = none
error log = none
access log = none

What's next?

We hope this guide helped you better understand how to optimize the performance of the Netdata Agent.

Now that your Agent is running smoothly, we recommend you secure your nodes if you haven't already.

Next, dive into some of Netdata's more complex features, such as configuring its health watchdog or exporting metrics to an external time-series database.
