Ceph
Plugin: go.d.plugin Module: ceph
Overview
This collector monitors the overall health status and performance of your Ceph clusters. It gathers key metrics for the entire cluster, individual pools, and OSDs.
It collects metrics by periodically issuing HTTP GET requests to the Ceph Manager REST API:
- /api/monitor (queried only once, to obtain the Ceph cluster ID, fsid)
- /api/health/minimal
- /api/osd
- /api/pool?stats=true
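The request pattern above can be sketched as follows. This is a minimal illustration only, not the collector's actual implementation (the collector is a go.d.plugin module). The base URL and the `build_poll_plan` helper are hypothetical names chosen for this example.

```python
from urllib.parse import urljoin

# Endpoint paths taken from the list above.
ONE_TIME_ENDPOINTS = ["/api/monitor"]  # queried once to learn the cluster fsid
PERIODIC_ENDPOINTS = [
    "/api/health/minimal",
    "/api/osd",
    "/api/pool?stats=true",
]


def build_poll_plan(base_url: str, fsid_known: bool) -> list[str]:
    """Return the full URLs to request in one collection cycle.

    base_url is a hypothetical Ceph Manager address such as
    "https://127.0.0.1:8443". /api/monitor is included only until
    the fsid has been learned.
    """
    paths = list(PERIODIC_ENDPOINTS)
    if not fsid_known:
        paths = ONE_TIME_ENDPOINTS + paths
    return [urljoin(base_url, path) for path in paths]
```

Under these assumptions, the first cycle requests four URLs and every later cycle requests three.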
This collector is only supported on the following platforms:
- Linux
This collector supports collecting metrics from multiple instances of this integration, including remote instances.
Default Behavior
Auto-Detection
The collector can automatically detect Ceph Manager instances that are:
- running on localhost and listening on port 8443
- running within Docker containers
Note that the Ceph REST API requires a username and password. While Netdata can automatically detect Ceph Manager instances and create data collection jobs, these jobs will fail unless you provide the necessary credentials.
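To illustrate why credentials are needed: according to the Ceph Dashboard REST API documentation, a client first sends a POST to /api/auth with a username and password, then presents the returned token as a bearer token on later requests. The sketch below only constructs that login request and does not send it. The host, username, and password are placeholders, and `build_auth_request` is a hypothetical helper, not part of Netdata.

```python
import json
from urllib.request import Request


def build_auth_request(base_url: str, username: str, password: str) -> Request:
    """Build (but do not send) a login request for the Ceph REST API.

    Assumption: the API follows the Ceph Dashboard convention of
    POST /api/auth with a JSON body and a versioned Accept header.
    """
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return Request(
        url=base_url.rstrip("/") + "/api/auth",
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.ceph.api.v1.0+json",
            "Content-Type": "application/json",
        },
    )
```

Without valid credentials this login step cannot succeed, which is why an auto-detected job fails until a username and password are configured.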
Limits
The default configuration for this integration does not impose any limits on data collection.
Performance Impact
The default configuration for this integration is not expected to impose a significant performance impact on the system.
Metrics
Metrics are grouped by scope.
The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
Per cluster
These metrics refer to the entire Ceph cluster.
Labels:
| Label | Description |
|---|---|
| fsid | A unique identifier of the cluster. |
Metrics:
| Metric | Dimensions | Unit |
|---|---|---|
| ceph.cluster_status | ok, err, warn | status |
| ceph.cluster_hosts_count | hosts | hosts |
| ceph.cluster_monitors_count | monitors | monitors |
| ceph.cluster_osds_count | osds | osds |
| ceph.cluster_osds_by_status_count | up, down, in, out | status |
| ceph.cluster_managers_count | active, standby | managers |
| ceph.cluster_object_gateways_count | object | gateways |
| ceph.cluster_iscsi_gateways_count | iscsi | gateways |
| ceph.cluster_iscsi_gateways_by_status_count | up, down | gateways |
| ceph.cluster_physical_capacity_utilization | utilization | percent |
| ceph.cluster_physical_capacity_usage | avail, used | bytes |
| ceph.cluster_objects_count | objects | objects |
| ceph.cluster_objects_by_status_distribution | healthy, misplaced, degraded, unfound | percent |
| ceph.cluster_pools_count | pools | pools |
| ceph.cluster_pgs_count | pgs | pgs |
| ceph.cluster_pgs_by_status_count | clean, working, warning, unknown | pgs |
| ceph.cluster_pgs_per_osd_count | per_osd | pgs |
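As a worked example of how the capacity metrics relate to each other, the sketch below derives a utilization percentage from the `used` and `avail` dimensions of ceph.cluster_physical_capacity_usage. The formula used / (used + avail) is an assumption about the natural relationship between the two charts, not a statement of how the collector or Ceph computes the value. The function name is hypothetical.

```python
def capacity_utilization_percent(used_bytes: int, avail_bytes: int) -> float:
    """Return used capacity as a percentage of (used + avail).

    Assumption: total physical capacity equals used + avail.
    Returns 0.0 for an empty cluster to avoid division by zero.
    """
    total = used_bytes + avail_bytes
    if total == 0:
        return 0.0
    return 100.0 * used_bytes / total
```

For instance, with 3 TiB used and 1 TiB available, this sketch yields 75 percent.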