Version: nightly

Nvidia GPU monitoring with Netdata

Monitors performance metrics (memory usage, fan speed, PCIe bandwidth utilization, temperature, etc.) using the nvidia-smi CLI tool.

Warning: under development, collects fewer metrics than the Python version.

Metrics

All metric names have the "nvidia_smi." prefix (for example, nvidia_smi.gpu_temperature).

Labels per scope:

  • gpu: uuid, product_name.
  • mig: gpu_uuid, gpu_product_name, gpu_instance_id.
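
| Metric                            | Scope | Dimensions               | Units   | XML | CSV |
|-----------------------------------|-------|--------------------------|---------|-----|-----|
| gpu_pcie_bandwidth_usage          | gpu   | rx, tx                   | B/s     | yes | no  |
| gpu_pcie_bandwidth_utilization    | gpu   | rx, tx                   | %       | yes | no  |
| gpu_fan_speed_perc                | gpu   | fan_speed                | %       | yes | yes |
| gpu_utilization                   | gpu   | gpu                      | %       | yes | yes |
| gpu_memory_utilization            | gpu   | memory                   | %       | yes | yes |
| gpu_decoder_utilization           | gpu   | decoder                  | %       | yes | no  |
| gpu_encoder_utilization           | gpu   | encoder                  | %       | yes | no  |
| gpu_frame_buffer_memory_usage     | gpu   | free, used, reserved     | B       | yes | yes |
| gpu_bar1_memory_usage             | gpu   | free, used               | B       | yes | no  |
| gpu_temperature                   | gpu   | temperature              | Celsius | yes | yes |
| gpu_voltage                       | gpu   | voltage                  | V       | yes | no  |
| gpu_clock_freq                    | gpu   | graphics, video, sm, mem | MHz     | yes | yes |
| gpu_power_draw                    | gpu   | power_draw               | Watts   | yes | yes |
| gpu_performance_state             | gpu   | P0-P15                   | state   | yes | yes |
| gpu_mig_mode_current_status       | gpu   | enabled, disabled        | status  | yes | no  |
| gpu_mig_devices_count             | gpu   | mig                      | devices | yes | no  |
| gpu_mig_frame_buffer_memory_usage | mig   | free, used, reserved     | B       | yes | no  |
| gpu_mig_bar1_memory_usage         | mig   | free, used               | B       | yes | no  |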

Configuration

This module supports data collection in CSV and XML formats. The default is CSV.

  • XML provides more metrics, but requesting GPU information consumes more CPU, especially if there are multiple GPU cards in the system.
  • CSV provides fewer metrics, but is much lighter than XML in terms of CPU usage.
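
To get a feel for the difference between the two formats, you can query nvidia-smi by hand. The sketch below is illustrative only: the CSV field list is an assumption chosen for the example, not necessarily the query the collector issues, and the NVIDIA_SMI variable and both function names are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: compare the two nvidia-smi output formats by hand.
# NVIDIA_SMI, query_xml and query_csv are illustrative names (assumptions).
# The CSV field list is an example, not the collector's exact query.

NVIDIA_SMI="${NVIDIA_SMI:-nvidia-smi}"

# Full XML report for every GPU (more data, heavier to produce and parse).
query_xml() {
    "$NVIDIA_SMI" -q -x
}

# A handful of selected fields as CSV, one line per GPU (lighter).
query_csv() {
    "$NVIDIA_SMI" \
        --query-gpu=uuid,name,temperature.gpu,utilization.gpu,memory.used,power.draw \
        --format=csv,noheader,nounits
}

if command -v "$NVIDIA_SMI" >/dev/null 2>&1; then
    query_csv
    query_xml | head -n 20
else
    echo "nvidia-smi not found in PATH" >&2
fi
```

Timing each call (for example with `time`) on a multi-GPU host is one way to see the CPU cost difference described above.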

The format can be changed in the configuration file.

Edit the go.d/nvidia_smi.conf configuration file using edit-config from the Netdata config directory, which is typically at /etc/netdata.
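
A minimal sketch of that step, assuming the config directory is /etc/netdata and that edit-config needs root privileges on your system. NETDATA_CONFIG_DIR and the function name are hypothetical helpers introduced for this example.

```shell
#!/bin/sh
# Sketch: open the collector's config file with edit-config.
# NETDATA_CONFIG_DIR and edit_nvidia_smi_conf are illustrative names (assumptions).
# /etc/netdata is the typical location; override the variable if yours differs.
NETDATA_CONFIG_DIR="${NETDATA_CONFIG_DIR:-/etc/netdata}"

edit_nvidia_smi_conf() {
    if [ ! -x "$NETDATA_CONFIG_DIR/edit-config" ]; then
        echo "edit-config not found in $NETDATA_CONFIG_DIR" >&2
        return 1
    fi
    # edit-config is run from inside the config directory.
    ( cd "$NETDATA_CONFIG_DIR" && sudo ./edit-config go.d/nvidia_smi.conf )
}

edit_nvidia_smi_conf || echo "set NETDATA_CONFIG_DIR to your Netdata config directory" >&2
```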

    jobs:
      - name: nvidia_smi
        use_csv_format: no # set to 'no' to use the XML format.

Troubleshooting

To troubleshoot issues with the nvidia_smi collector, run the go.d.plugin with the debug option enabled. The output should give you clues as to why the collector isn't working.

  • Navigate to the plugins.d directory, usually at /usr/libexec/netdata/plugins.d/. If that's not the case on your system, open netdata.conf and look for the plugins setting under [directories].

    cd /usr/libexec/netdata/plugins.d/
  • Switch to the netdata user.

    sudo -u netdata -s
  • Run the go.d.plugin to debug the collector:

    ./go.d.plugin -d -m nvidia_smi
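
The steps above can also be run in one go. The following is a sketch under stated assumptions: netdata.conf lives at /etc/netdata/netdata.conf, the plugins setting lists the plugins.d directory first, and running the plugin through `sudo -u netdata` is an acceptable substitute for switching to an interactive shell. NETDATA_CONF, DEFAULT_PLUGINS_DIR and find_plugins_dir are hypothetical names used only here.

```shell
#!/bin/sh
# Sketch: the troubleshooting steps as a single non-interactive run.
# NETDATA_CONF, DEFAULT_PLUGINS_DIR and find_plugins_dir are illustrative
# names (assumptions); the paths are the typical locations, adjust as needed.
NETDATA_CONF="${NETDATA_CONF:-/etc/netdata/netdata.conf}"
DEFAULT_PLUGINS_DIR="/usr/libexec/netdata/plugins.d"

# Print the first path of the "plugins" setting under [directories],
# or the usual default when the setting is absent or commented out.
find_plugins_dir() {
    dir=""
    if [ -r "$1" ]; then
        dir="$(awk '
            /^[ \t]*\[/ { in_dirs = ($0 ~ /^[ \t]*\[directories\]/); next }
            in_dirs && /^[ \t]*plugins[ \t]*=/ {
                sub(/^[^=]*=[ \t]*/, "")   # drop the "plugins =" part
                gsub(/"/, "")              # drop surrounding quotes
                print $1                   # first listed directory
                exit
            }' "$1")"
    fi
    echo "${dir:-$DEFAULT_PLUGINS_DIR}"
}

PLUGINS_DIR="$(find_plugins_dir "$NETDATA_CONF")"

if [ -x "$PLUGINS_DIR/go.d.plugin" ]; then
    # Run as the netdata user, with debug output for this collector only.
    sudo -u netdata "$PLUGINS_DIR/go.d.plugin" -d -m nvidia_smi
else
    echo "go.d.plugin not found in $PLUGINS_DIR" >&2
fi
```

Redirecting the output to a file (for example by appending `2>&1 | tee debug.log` to the plugin command) makes it easier to attach to a bug report.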
