Memory modules (DIMMs)

Plugin: proc.plugin
Module: /sys/devices/system/edac/mc

Overview

The Error Detection and Correction (EDAC) subsystem detects and reports errors in the system's memory, primarily ECC (Error-Correcting Code) memory errors.

The collector provides data for:

  • Per memory controller (MC): correctable and uncorrectable errors, of two kinds:

    • errors related to a DIMM
    • errors that cannot be associated with a DIMM
  • Per memory DIMM: correctable and uncorrectable errors. How DIMMs are reported depends on the memory controller:

    • some memory controllers identify the physical DIMMs and report errors directly for them;
    • others report errors for memory address ranges that can be linked to DIMMs. In this case, more DIMMs may be reported than are physically installed.
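The controller and module layout described above can be inspected directly in sysfs. The following is a minimal sketch, not the collector's actual implementation: it assumes the usual Linux EDAC layout in which each controller is an mcX directory containing dimmX or rankX subdirectories. The function name discover_edac is hypothetical.

```python
from pathlib import Path

# Module path taken from this page; the mcX/dimmX/rankX layout is assumed.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def discover_edac(root=EDAC_ROOT):
    """Map each mcX directory name to its sorted dimmX/rankX directory names."""
    root = Path(root)
    layout = {}
    if not root.is_dir():
        # No EDAC driver loaded, or the path does not exist on this system.
        return layout
    for mc in sorted(root.glob("mc[0-9]*")):
        if not mc.is_dir():
            continue
        # Keep only module directories; skip entries such as "power".
        layout[mc.name] = sorted(
            entry.name
            for entry in mc.iterdir()
            if entry.is_dir() and entry.name.startswith(("dimm", "rank"))
        )
    return layout


if __name__ == "__main__":
    for controller, modules in discover_edac().items():
        print(controller, modules)
```

On a machine without an EDAC driver the function returns an empty dictionary, so the script prints nothing.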

This collector is supported on all platforms.

This collector supports collecting metrics from multiple instances of this integration, including remote instances.

Default Behavior

Auto-Detection

This integration doesn't support auto-detection.

Limits

The default configuration for this integration does not impose any limits on data collection.

Performance Impact

The default configuration for this integration is not expected to impose a significant performance impact on the system.

Metrics

Metrics grouped by scope.

The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.

Per memory controller

These metrics refer to the memory controller.

Labels:

| Label | Description |
|---|---|
| controller | mcX directory name of this memory controller. |
| mc_name | Memory controller type. |
| size_mb | The amount of memory in megabytes that this memory controller manages. |
| max_location | Last available memory slot in this memory controller. |

Metrics:

| Metric | Dimensions | Unit |
|---|---|---|
| mem.edac_mc_errors | correctable, uncorrectable, correctable_noinfo, uncorrectable_noinfo | errors |
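As an illustration of where these four dimensions could come from, the sketch below reads the per-controller counter files that the Linux EDAC sysfs interface exposes (ce_count, ue_count, ce_noinfo_count, ue_noinfo_count). The mapping from dimension names to files is an assumption made for this example, and read_mc_errors is a hypothetical helper, not part of Netdata.

```python
from pathlib import Path

# Assumed mapping from the dimensions on this page to EDAC sysfs counter files.
MC_COUNTER_FILES = {
    "correctable": "ce_count",
    "uncorrectable": "ue_count",
    "correctable_noinfo": "ce_noinfo_count",
    "uncorrectable_noinfo": "ue_noinfo_count",
}


def read_mc_errors(mc_dir):
    """Return {dimension: counter or None} for one mcX directory."""
    mc_dir = Path(mc_dir)
    values = {}
    for dimension, filename in MC_COUNTER_FILES.items():
        try:
            values[dimension] = int((mc_dir / filename).read_text().strip())
        except (OSError, ValueError):
            # File missing, unreadable, or not a number on this controller.
            values[dimension] = None
    return values


if __name__ == "__main__":
    print(read_mc_errors("/sys/devices/system/edac/mc/mc0"))
```

Counters that cannot be read are reported as None rather than 0, so a missing file is not mistaken for an error-free controller.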

Per memory module

These metrics refer to the memory module (or rank, depending on the memory controller).

Labels:

| Label | Description |
|---|---|
| controller | mcX directory name of this memory controller. |
| dimm | dimmX or rankX directory name of this memory module. |
| dimm_dev_type | Type of DRAM device used in this memory module. For example, x1, x2, x4, x8. |
| dimm_edac_mode | Type of error detection and correction in use. For example, S4ECD4ED means Chipkill with x4 DRAM. |
| dimm_label | Label assigned to this memory module. |
| dimm_location | Location of the memory module. |
| dimm_mem_type | Type of the memory module. |
| size | The amount of memory in megabytes that this memory module manages. |

Metrics:

| Metric | Dimensions | Unit |
|---|---|---|
| mem.edac_mc_dimm_errors | correctable, uncorrectable | errors |
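To make the labels and dimensions above concrete, here is a hedged sketch that reads one dimmX or rankX directory. It assumes the attribute files of the Linux EDAC sysfs interface (dimm_ce_count, dimm_ue_count, dimm_label, dimm_location, dimm_mem_type, dimm_dev_type, dimm_edac_mode, size); the helper names are hypothetical and this is not how the collector itself is written.

```python
from pathlib import Path

# Assumed: each label on this page is read from a sysfs file of the same name.
DIMM_LABEL_FILES = (
    "dimm_dev_type",
    "dimm_edac_mode",
    "dimm_label",
    "dimm_location",
    "dimm_mem_type",
    "size",
)


def _read_text(path):
    """Return the stripped file content, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def _read_int(path):
    """Return the file content as an int, or None if missing or malformed."""
    text = _read_text(path)
    try:
        return int(text) if text is not None else None
    except ValueError:
        return None


def read_dimm(dimm_dir):
    """Return (labels, errors) for one dimmX/rankX directory."""
    dimm_dir = Path(dimm_dir)
    # controller and dimm labels come from the directory names themselves.
    labels = {"controller": dimm_dir.parent.name, "dimm": dimm_dir.name}
    for name in DIMM_LABEL_FILES:
        text = _read_text(dimm_dir / name)
        if text:
            labels[name] = text
    errors = {
        "correctable": _read_int(dimm_dir / "dimm_ce_count"),
        "uncorrectable": _read_int(dimm_dir / "dimm_ue_count"),
    }
    return labels, errors
```

Labels whose files are absent or empty are simply omitted, since not every memory controller driver populates every attribute.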

Alerts

The following alerts are available:

| Alert name | On metric | Description |
|---|---|---|
| ecc_memory_mc_noinfo_correctable | mem.edac_mc_errors | memory controller ${label:controller} ECC correctable errors (unknown DIMM slot) |
| ecc_memory_mc_noinfo_uncorrectable | mem.edac_mc_errors | memory controller ${label:controller} ECC uncorrectable errors (unknown DIMM slot) |
| ecc_memory_dimm_correctable | mem.edac_mc_dimm_errors | DIMM ${label:dimm} controller ${label:controller} (location ${label:dimm_location}) ECC correctable errors |
| ecc_memory_dimm_uncorrectable | mem.edac_mc_dimm_errors | DIMM ${label:dimm} controller ${label:controller} (location ${label:dimm_location}) ECC uncorrectable errors |
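The descriptions above contain ${label:NAME} placeholders that refer to the instance labels listed under Metrics. The sketch below shows one way such a placeholder could be expanded. It is an illustration only: render_description is a hypothetical helper and does not reproduce Netdata's own templating, and the label values in the usage example are made up.

```python
import re

# Matches placeholders of the form ${label:NAME}.
_LABEL_RE = re.compile(r"\$\{label:([A-Za-z0-9_]+)\}")


def render_description(template, labels):
    """Replace each ${label:NAME} with labels[NAME]; leave unknown names as-is."""
    return _LABEL_RE.sub(
        lambda match: labels.get(match.group(1), match.group(0)),
        template,
    )


if __name__ == "__main__":
    template = (
        "DIMM ${label:dimm} controller ${label:controller} "
        "(location ${label:dimm_location}) ECC correctable errors"
    )
    # Illustrative label values, not taken from a real system.
    labels = {
        "dimm": "dimm0",
        "controller": "mc0",
        "dimm_location": "csrow 0 channel 0",
    }
    print(render_description(template, labels))
```

Leaving unknown placeholders untouched makes a missing label visible in the rendered text instead of silently producing an empty string.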

Setup

Prerequisites

No action required.

Configuration

File

There is no configuration file.

Options

There are no configuration options.

Examples

There are no configuration examples.
