Kubernetes Cluster State

Plugin: go.d.plugin Module: k8s_state

Overview

This collector monitors Kubernetes Nodes, Deployments, CronJobs, Pods and Containers.

This collector is supported on all platforms.

This collector only supports collecting metrics from a single instance of this integration.

Default Behavior

Auto-Detection

This integration doesn't support auto-detection.

Limits

The default configuration for this integration does not impose any limits on data collection.

Performance Impact

The default configuration for this integration is not expected to impose a significant performance impact on the system.

Metrics

Metrics grouped by scope.

The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.

Per node

These metrics refer to the Node.

Labels:

Label	Description
k8s_cluster_id	Cluster ID. This is equal to the kube-system namespace UID.
k8s_cluster_name	Cluster name. Cluster name discovery only works in GKE.
k8s_node_name	Node name.
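
Because k8s_cluster_id is equal to the kube-system namespace UID, the expected label value can be looked up directly from the cluster. The following is a minimal sketch, assuming kubectl is installed and configured for the target cluster; cluster_id is a hypothetical helper name, not part of Netdata.

```shell
# Hypothetical helper: print the UID of the kube-system namespace.
# Assumes kubectl is on PATH and points at the cluster being monitored.
cluster_id() {
  kubectl get namespace kube-system -o jsonpath='{.metadata.uid}'
}
```

Running cluster_id in a shell with cluster access should print the value you would expect to see in the k8s_cluster_id label.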

Metrics:

Metric	Dimensions	Unit
k8s_state.node_allocatable_cpu_requests_utilization	requests	%
k8s_state.node_allocatable_cpu_requests_used	requests	millicpu
k8s_state.node_allocatable_cpu_limits_utilization	limits	%
k8s_state.node_allocatable_cpu_limits_used	limits	millicpu
k8s_state.node_allocatable_mem_requests_utilization	requests	%
k8s_state.node_allocatable_mem_requests_used	requests	bytes
k8s_state.node_allocatable_mem_limits_utilization	limits	%
k8s_state.node_allocatable_mem_limits_used	limits	bytes
k8s_state.node_allocatable_pods_utilization	allocated	%
k8s_state.node_allocatable_pods_usage	available, allocated	pods
k8s_state.node_condition	Ready, DiskPressure, MemoryPressure, NetworkUnavailable, PIDPressure	status
k8s_state.node_schedulability	schedulable, unschedulable	state
k8s_state.node_pods_readiness	ready	%
k8s_state.node_pods_readiness_state	ready, unready	pods
k8s_state.node_pods_condition	pod_ready, pod_scheduled, pod_initialized, containers_ready	pods
k8s_state.node_pods_phase	running, failed, succeeded, pending	pods
k8s_state.node_containers	containers, init_containers	containers
k8s_state.node_containers_state	running, waiting, terminated	containers
k8s_state.node_init_containers_state	running, waiting, terminated	containers
k8s_state.node_age	age	seconds
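
The utilization metrics above are reported in percent and the matching "used" metrics in absolute units. As a rough illustration of how the two might relate, the sketch below computes a percentage from a used amount and a node's allocatable amount. The formula (used divided by allocatable, times 100) is an assumption for illustration and is not confirmed by this page; utilization_pct is a hypothetical helper name.

```shell
# Hypothetical helper: utilization_pct USED ALLOCATABLE
# Prints USED as a percentage of ALLOCATABLE with two decimals.
# Assumption: utilization = used / allocatable * 100.
utilization_pct() {
  awk -v used="$1" -v total="$2" 'BEGIN {
    if (total > 0) printf "%.2f\n", used * 100 / total
    else print "0.00"   # avoid division by zero
  }'
}
```

For example, utilization_pct 1500 4000 treats 1500 millicpu of requests against 4000 millicpu allocatable and prints 37.50.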

Per deployment

These metrics refer to Deployments.

Labels:

Label	Description
k8s_cluster_id	Cluster ID. This is equal to the kube-system namespace UID.
k8s_cluster_name	Cluster name. Cluster name discovery only works in GKE.
k8s_deployment_name	Deployment name.
k8s_namespace	Namespace.

Metrics:

Metric	Dimensions	Unit
k8s_state.deployment_conditions	available, replica_failure, progressing	status
k8s_state.deployment_replicas	desired, current, ready	replicas
k8s_state.deployment_age	age	seconds

Per cronjob

These metrics refer to CronJobs.

Labels:

Label	Description
k8s_cluster_id	Cluster ID. This is equal to the kube-system namespace UID.
k8s_cluster_name	Cluster name. Cluster name discovery only works in GKE.
k8s_cronjob_name	CronJob name.
k8s_namespace	Namespace.

Metrics:

Metric	Dimensions	Unit
k8s_state.cronjob_jobs_count_by_status	completed, failed, running, suspended	jobs
k8s_state.cronjob_jobs_failed_by_reason	pod_failure_policy, backoff_limit_exceeded, deadline_exceeded	jobs
k8s_state.cronjob_last_execution_status	completed, failed	status
k8s_state.cronjob_last_completion_duration	last_completion	seconds
k8s_state.cronjob_last_completed_time_ago	last_completed_ago	seconds
k8s_state.cronjob_last_schedule_time_ago	last_schedule_ago	seconds
k8s_state.cronjob_suspend_status	enabled, suspended	status
k8s_state.cronjob_age	age	seconds

Per pod

These metrics refer to the Pod.

Labels:

Label	Description
k8s_cluster_id	Cluster ID. This is equal to the kube-system namespace UID.
k8s_cluster_name	Cluster name. Cluster name discovery only works in GKE.
k8s_node_name	Node name.
k8s_namespace	Namespace.
k8s_controller_kind	Controller kind (ReplicaSet, DaemonSet, StatefulSet, Job, etc.).
k8s_controller_name	Controller name.
k8s_pod_name	Pod name.
k8s_qos_class	Pod QOS class (burstable, guaranteed, besteffort).

Metrics:

Metric	Dimensions	Unit
k8s_state.pod_cpu_requests_used	requests	millicpu
k8s_state.pod_cpu_limits_used	limits	millicpu
k8s_state.pod_mem_requests_used	requests	bytes
k8s_state.pod_mem_limits_used	limits	bytes
k8s_state.pod_condition	pod_ready, pod_scheduled, pod_initialized, containers_ready	state
k8s_state.pod_phase	running, failed, succeeded, pending	state
k8s_state.pod_status_reason	Evicted, NodeAffinity, NodeLost, Shutdown, UnexpectedAdmissionError, Other	status
k8s_state.pod_age	age	seconds
k8s_state.pod_containers	containers, init_containers	containers
k8s_state.pod_containers_state	running, waiting, terminated	containers
k8s_state.pod_init_containers_state	running, waiting, terminated	containers

Per container

These metrics refer to the Pod container.

Labels:

Label	Description
k8s_cluster_id	Cluster ID. This is equal to the kube-system namespace UID.
k8s_cluster_name	Cluster name. Cluster name discovery only works in GKE.
k8s_node_name	Node name.
k8s_namespace	Namespace.
k8s_controller_kind	Controller kind (ReplicaSet, DaemonSet, StatefulSet, Job, etc.).
k8s_controller_name	Controller name.
k8s_pod_name	Pod name.
k8s_qos_class	Pod QOS class (burstable, guaranteed, besteffort).
k8s_container_name	Container name.

Metrics:

Metric	Dimensions	Unit
k8s_state.pod_container_readiness_state	ready	state
k8s_state.pod_container_restarts	restarts	restarts
k8s_state.pod_container_state	running, waiting, terminated	state
k8s_state.pod_container_waiting_state_reason	ContainerCreating, CrashLoopBackOff, CreateContainerConfigError, CreateContainerError, ErrImagePull, ImagePullBackOff, InvalidImageName, PodInitializing, Other	state
k8s_state.pod_container_terminated_state_reason	Completed, ContainerCannotRun, DeadlineExceeded, Error, Evicted, OOMKilled, Other	state

Alerts

The following alerts are available:

Alert name	On metric	Description
k8s_state_deployment_condition_available	k8s_state.deployment_conditions	Deployment ${label:k8s_deployment_name} does not have the minimum required replicas
k8s_state_cronjob_last_execution_failed	k8s_state.cronjob_last_execution_status	CronJob ${label:k8s_cronjob_name} in ${label:k8s_namespace} failing

Setup

Prerequisites

No action required.

Configuration

File

The configuration file name for this integration is go.d/k8s_state.conf.

The file format is YAML. Generally, the structure is:

update_every: 1
autodetection_retry: 0
jobs:
  - name: some_name1
  - name: some_name2

You can edit the configuration file using the edit-config script from the Netdata config directory.

cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config go.d/k8s_state.conf

Options

There are no configuration options.

Examples

There are no configuration examples.

Troubleshooting

Debug Mode

Important: Debug mode is not supported for data collection jobs created via the UI using the Dyncfg feature.

To troubleshoot issues with the k8s_state collector, run the go.d.plugin with the debug option enabled. The output should give you clues as to why the collector isn't working.

Navigate to the plugins.d directory, usually at /usr/libexec/netdata/plugins.d/. If that's not the case on your system, open netdata.conf and look for the plugins setting under [directories].
```
cd /usr/libexec/netdata/plugins.d/
```
Switch to the netdata user.
```
sudo -u netdata -s
```

Run the go.d.plugin to debug the collector:

./go.d.plugin -d -m k8s_state

To debug a specific job:

./go.d.plugin -d -m k8s_state -j jobName
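
If the plugins.d directory is not at the usual path, the plugins setting under [directories] can be read from netdata.conf with a small helper. This is a minimal sketch, assuming the section header and the setting each sit on their own uncommented line; plugins_setting is a hypothetical helper name, and the path to netdata.conf is passed in by the caller.

```shell
# Hypothetical helper: plugins_setting NETDATA_CONF
# Prints the value of the "plugins" setting found in the [directories] section.
plugins_setting() {
  awk '
    # Track whether the current section is [directories].
    /^\[/ { in_dirs = ($0 == "[directories]"); next }
    # Inside that section, print the value after "plugins =" and stop.
    in_dirs && /^[ \t]*plugins[ \t]*=/ {
      sub(/^[^=]*=[ \t]*/, "")
      print
      exit
    }
  ' "$1"
}
```

Call it with the path to your netdata.conf, for example plugins_setting /etc/netdata/netdata.conf, then cd into the directory it prints.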

Getting Logs

If you're encountering problems with the k8s_state collector, follow these steps to retrieve logs and identify potential issues:

Run the command specific to your system (systemd, non-systemd, or Docker container).
Examine the output for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.

System with systemd

Use the following command to view logs generated since the last Netdata service restart:

journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep k8s_state

System without systemd

Locate the collector log file, typically at /var/log/netdata/collector.log, and use grep to filter for the collector's name:

grep k8s_state /var/log/netdata/collector.log

Note: This method shows logs from all restarts. Focus on the latest entries for troubleshooting current issues.
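
To focus on the latest entries, the grep output can be piped through tail. The helper below is a small sketch; recent_collector_logs is a hypothetical name, and the default of 50 lines is an arbitrary choice.

```shell
# Hypothetical helper: recent_collector_logs [LOG_FILE] [COUNT]
# Prints the last COUNT log lines that mention k8s_state.
recent_collector_logs() {
  log_file="${1:-/var/log/netdata/collector.log}"
  count="${2:-50}"
  grep k8s_state "$log_file" | tail -n "$count"
}
```

Called without arguments, it reads the typical log path; pass a different file or line count if your setup differs.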

Docker Container

If your Netdata runs in a Docker container named "netdata" (replace if different), use this command:

docker logs netdata 2>&1 | grep k8s_state
