Netdata Helm chart for Kubernetes deployments

Based on the work of varyumin (https://github.com/varyumin/netdata).

Introduction

This chart bootstraps a Netdata deployment on a Kubernetes cluster using the Helm package manager.

By default, the chart installs:

A Netdata child pod on each node of a cluster, using a DaemonSet.
A Netdata k8s state monitoring pod on one node, using a Deployment. This virtual node is called netdata-k8s-state.
A Netdata parent pod on one node, using a Deployment. This virtual node is called netdata-parent.

Disabled by default:

A Netdata restarter CronJob. Its main purpose is to automatically update Netdata when using nightly releases.

The child pods and the state pod function as headless collectors that collect all metrics and forward them to the parent pod. The parent pod stores metrics and alarms on persistent volumes, handles alarm notifications, and serves the Netdata UI through an ingress controller.
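
As a rough sketch, the component layout described above corresponds to the following switches in a values file. The keys and defaults are the ones listed in the configuration tables of this document; the file name is only an illustration.

```yaml
# components-values.yaml (hypothetical file name)
# Component switches with the chart's documented defaults.
child:
  enabled: true      # DaemonSet: one collector pod per node
k8sState:
  enabled: true      # Deployment: Kubernetes state collector (netdata-k8s-state)
parent:
  enabled: true      # Deployment: receives metrics from the other components (netdata-parent)
restarter:
  enabled: false     # CronJob: updates Netdata pods, mainly useful with nightly releases
```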

Please validate that the settings are suitable for your cluster before using them in production.

Prerequisites

A working cluster running Kubernetes v1.9 or newer.
The kubectl command line tool, within one minor version difference of your cluster, on an administrative system.
The Helm package manager v3.8.0 or newer on the same administrative system.

Required Resources and Permissions

Netdata is a comprehensive monitoring solution that requires specific access to host resources to function effectively. By design, monitoring solutions like Netdata need visibility into various system components to collect metrics and provide insights. The following mounts, privileges, and capabilities are essential for Netdata's operation, and applying restrictive security profiles or limiting these accesses may significantly impact functionality or render the monitoring solution ineffective.

See required mounts, privileges and RBAC resources

Required Mounts

Mount	Type	Node	Components & Descriptions
`/`	hostPath	child	• diskspace.plugin: Host mount points monitoring.
`/proc`	hostPath	child	• proc.plugin: Host system monitoring (CPU, memory, network interfaces, disks, etc.).
`/sys`	hostPath	child	• cgroups.plugin: Docker containers monitoring and name resolution.
`/var/log`	hostPath	child	• systemd-journal.plugin: Viewing, exploring and analyzing systemd journal logs.
`/etc/os-release`	hostPath	child, parent, k8sState	• netdata: Host info detection.
`/etc/passwd`, `/etc/group`	hostPath	child	• apps.plugin: Monitoring of host system resource usage by each user and user group.
`{{ .Values.child.persistence.hostPath }}/var/lib/netdata`	hostPath (DirectoryOrCreate)	child	• netdata: Persistence of Netdata's /var/lib/netdata directory which contains netdata public unique ID and other files that should persist across container recreations. Without persistence, a new netdata unique ID is generated for each child on every container recreation, causing children to appear as new nodes on the Parent instance.

Required Privileges and Capabilities

Privilege/Capability	Node	Components & Descriptions
Host Network Mode	child	• proc.plugin: Host system networking stack monitoring. • go.d.plugin: Monitoring applications running on the host and inside containers. • local-listeners: Discovering local services/applications. Map open (listening) ports to running services/applications. • network-viewer.plugin: Discovering all current network sockets and building a network-map.
Host PID Mode	child	• cgroups.plugin: Container network interfaces monitoring. Map virtual interfaces in the system namespace to interfaces inside containers.
SYS_ADMIN	child	• cgroups.plugin: Container network interfaces monitoring. Map virtual interfaces in the system namespace to interfaces inside containers. • network-viewer.plugin: Discovering all current network sockets and building a network-map.
SYS_PTRACE	child	• local-listeners: Discovering local services/applications. Map open (listening) ports to running services/applications.
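
For orientation, the entries in the table above map onto standard Kubernetes pod spec fields roughly as follows. This is an illustrative fragment only, not the manifest the chart renders, and the container name is a placeholder.

```yaml
# Illustrative pod spec fragment for a child pod (not the chart's rendered output).
spec:
  hostNetwork: true          # Host Network Mode
  hostPID: true              # Host PID Mode
  containers:
    - name: netdata          # placeholder container name
      securityContext:
        capabilities:
          add:
            - SYS_ADMIN      # cgroups.plugin, network-viewer.plugin
            - SYS_PTRACE     # local-listeners
```

A restrictive Pod Security admission level or a similar policy that rejects any of these fields would block the corresponding collectors.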

Required Kubernetes RBAC Resources

Resource	Verbs	Components & Descriptions
pods	get, list, watch	• service discovery: Used for discovering services. • go.d/k8s_state: Kubernetes state monitoring. • netdata: Used by cgroup-name.sh and get-kubernetes-labels.sh scripts.
services	get, list, watch	• service discovery: Used for discovering services.
configmaps	get, list, watch	• service discovery: Used for discovering services.
secrets	get, list, watch	• service discovery: Used for discovering services.
nodes	get, list, watch	• go.d/k8s_state: Kubernetes state monitoring.
nodes/metrics	get, list, watch	• go.d/k8s_kubelet: Used when querying Kubelet HTTPS endpoint.
nodes/proxy	get, list, watch	• netdata: Used by cgroup-name.sh when querying Kubelet /pods endpoint.
deployments (apps)	get, list, watch	• go.d/k8s_state: Kubernetes state monitoring.
cronjobs (batch)	get, list, watch	• go.d/k8s_state: Kubernetes state monitoring.
jobs (batch)	get, list, watch	• go.d/k8s_state: Kubernetes state monitoring.
namespaces	get	• go.d/k8s_state: Kubernetes state monitoring. • netdata: Used by cgroup-name.sh and get-kubernetes-labels.sh scripts.
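
Expressed as a Kubernetes manifest, the permissions in the table above would look roughly like the ClusterRole below. This is a sketch derived from the table, assuming the standard API groups for each resource; the role name is a placeholder, and the chart's own template may group the rules differently.

```yaml
# Illustrative ClusterRole derived from the table above.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: netdata-example          # placeholder name
rules:
  # Core API group: service discovery, k8s_state, k8s_kubelet, helper scripts
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps", "secrets", "nodes", "nodes/metrics", "nodes/proxy"]
    verbs: ["get", "list", "watch"]
  # Workload resources read by go.d/k8s_state
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["batch"]
    resources: ["cronjobs", "jobs"]
    verbs: ["get", "list", "watch"]
  # Namespaces only need read access to individual objects
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get"]
```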

Installing the Helm chart

You can install the Helm chart via our Helm repository, or by cloning this repository.

Installing via our Helm repository (recommended)

To use Netdata's Helm repository, run the following commands:

helm repo add netdata https://netdata.github.io/helmchart/
helm install netdata netdata/netdata

See our install Netdata on Kubernetes documentation for detailed installation and configuration instructions. The remainder of this document assumes you installed the Helm chart by cloning this repository, and thus uses slightly different helm install/helm upgrade commands.

Install by cloning the repository

Clone the repository locally.

git clone https://github.com/netdata/helmchart.git netdata-helmchart

To install the chart with the release name netdata:

helm install netdata ./netdata-helmchart/charts/netdata

The command deploys Netdata on the Kubernetes cluster in the default configuration, which includes an Ingress. The configuration section lists the parameters that can be configured during installation.

Tip: List all releases using helm list.

Uninstalling the Chart

To uninstall/delete the netdata release:

helm delete netdata

The command removes all the Kubernetes components associated with the chart and deletes the release.

Configuration

The following table lists the configurable parameters of the netdata chart and their default values.

General settings

Key	Type	Default	Description
replicaCount	int	1	Number of `replicas` for the parent netdata `Deployment`
deploymentStrategy.type	string	"Recreate"	Deployment strategy for pod deployments. Recreate is the safest value.
imagePullSecrets	list	[]	An optional list of references to secrets in the same namespace to use for pulling any of the images
image.repository	string	"netdata/netdata"	Container image repository
image.tag	string	"{{ .Chart.AppVersion }}"	Container image tag
image.pullPolicy	string	"Always"	Container image pull policy
initContainersImage.repository	string	"alpine"	Init containers' image repository
initContainersImage.tag	string	"latest"	Init containers' image tag
initContainersImage.pullPolicy	string	"Always"	Init containers' image pull policy
sysctlInitContainer.enabled	bool	false	Enable an init container to modify Kernel settings
sysctlInitContainer.command	list	[]	sysctl init container command to execute
sysctlInitContainer.resources	object	{}	sysctl Init container CPU/Memory resource requests/limits
service.type	string	"ClusterIP"	Parent service type
service.port	int	19999	Parent service port
service.annotations	object	{}	Additional annotations to add to the service
service.loadBalancerIP	string	""	Static LoadBalancer IP, only to be used with service type=LoadBalancer
service.loadBalancerSourceRanges	list	[]	List of allowed IPs for LoadBalancer
service.externalTrafficPolicy	string	""	Denotes if this Service desires to route external traffic to node-local or cluster-wide endpoints
service.healthCheckNodePort	string	null	Specifies the health check node port (only to be used with type LoadBalancer and external traffic policy Local)
service.clusterIP	string	""	Specific cluster IP when service type is cluster IP. Use `None` for headless service
ingress.enabled	bool	true	Create Ingress to access the netdata web UI
ingress.annotations	object	See values.yaml for defaults	Associate annotations to the Ingress
ingress.path	string	"/"	URL path for the ingress. If changed, a proxy server needs to be configured in front of netdata to translate path from a custom one to a `/`
ingress.pathType	string	"Prefix"	pathType for your ingress controller. Default value is correct for nginx. If you use your own ingress controller, check the correct value
ingress.hosts[0]	string	"netdata.k8s.local"	URL hostnames for the ingress (they need to resolve to the external IP of the ingress controller)
httpRoute.enabled	bool	false	Create HTTPRoute to access the netdata web UI via Gateway API
httpRoute.annotations	object	{}	Additional annotations to add to the HTTPRoute
httpRoute.labels	object	{}	Additional labels to add to the HTTPRoute
httpRoute.parentRefs	list	[]	Parent references for Gateway API HTTPRoute. Required when `httpRoute.enabled=true`
httpRoute.hostnames	list	[]	Hostnames for the HTTPRoute
httpRoute.rules	list	[]	Optional explicit HTTPRoute rules. If empty, a default PathPrefix `/` rule is generated
rbac.create	bool	true	if true, create & use RBAC resources
rbac.pspEnabled	bool	true	Specifies whether a PodSecurityPolicy should be created
serviceAccount.create	bool	true	if true, create a service account
serviceAccount.name	string	"netdata"	The name of the service account to use. If not set and create is true, a name is generated using the fullname template
serviceAccount.annotations	object	{}	Annotations to add to the service account (e.g. an AWS IRSA `eks.amazonaws.com/role-arn`)
restarter.enabled	bool	false	Install CronJob to update Netdata Pods
restarter.schedule	string	"00 06 * * *"	The schedule in Cron format
restarter.image.repository	string	"rancher/kubectl"	Container image repo
restarter.image.tag	string	".auto"	Container image tag. If `.auto`, the image tag version of the rancher/kubectl will reflect the Kubernetes cluster version
restarter.image.pullPolicy	string	"Always"	Container image pull policy
restarter.restartPolicy	string	"Never"	Container restart policy
restarter.resources	object	{}	Container resources
restarter.concurrencyPolicy	string	"Forbid"	Specifies how to treat concurrent executions of a job
restarter.startingDeadlineSeconds	int	60	Optional deadline in seconds for starting the job if it misses scheduled time for any reason
restarter.successfulJobsHistoryLimit	int	3	The number of successful finished jobs to retain
restarter.failedJobsHistoryLimit	int	3	The number of failed finished jobs to retain
notifications.slack.webhook_url	string	""	Slack webhook URL
notifications.slack.recipient	string	""	Slack recipient list
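
If the cluster exposes services through the Gateway API rather than an Ingress controller, the `httpRoute.*` keys above could be combined along these lines. The Gateway name, namespace, and hostname are placeholders, and the shape of the `parentRefs` entry follows the upstream Gateway API, which this sketch assumes the chart passes through unchanged.

```yaml
# Sketch: expose the Netdata UI through an HTTPRoute instead of an Ingress.
ingress:
  enabled: false                     # assumption: the Ingress is not needed alongside the HTTPRoute
httpRoute:
  enabled: true
  parentRefs:                        # required when httpRoute.enabled=true
    - name: example-gateway          # placeholder Gateway name
      namespace: gateway-system      # placeholder namespace
  hostnames:
    - netdata.example.com            # placeholder hostname
  # rules is left empty, so a default PathPrefix "/" rule is generated
```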

Service Discovery

Key	Type	Default	Description
sd.image.repository	string	"netdata/agent-sd"	Container image repository
sd.image.tag	string	"v0.2.10"	Container image tag
sd.image.pullPolicy	string	"Always"	Container image pull policy
sd.child.enabled	bool	true	Add service-discovery sidecar container to the netdata child pod definition
sd.child.configmap.name	string	"netdata-child-sd-config-map"	Child service-discovery ConfigMap name
sd.child.configmap.key	string	"config.yml"	Child service-discovery ConfigMap key
sd.child.configmap.from.file	string	""	File to use for child service-discovery configuration generation
sd.child.configmap.from.value	object	{}	Value to use for child service-discovery configuration generation
sd.child.resources	object	See values.yaml for defaults	Child service-discovery container CPU/Memory resource requests/limits

Parent

Key	Type	Default	Description
parent.hostname	string	"netdata-parent"	Parent node hostname
parent.enabled	bool	true	Install parent Deployment to receive metrics from children nodes
parent.port	int	19999	Parent's listen port
parent.resources	object	{}	Resources for the parent deployment
parent.livenessProbe.initialDelaySeconds	int	0	Number of seconds after the container has started before liveness probes are initiated
parent.livenessProbe.failureThreshold	int	3	When a liveness probe fails, Kubernetes will try failureThreshold times before giving up. Giving up the liveness probe means restarting the container
parent.livenessProbe.periodSeconds	int	30	How often (in seconds) to perform the liveness probe
parent.livenessProbe.successThreshold	int	1	Minimum consecutive successes for the liveness probe to be considered successful after having failed
parent.livenessProbe.timeoutSeconds	int	1	Number of seconds after which the liveness probe times out
parent.readinessProbe.initialDelaySeconds	int	0	Number of seconds after the container has started before readiness probes are initiated
parent.readinessProbe.failureThreshold	int	3	When a readiness probe fails, Kubernetes will try failureThreshold times before giving up. Giving up the readiness probe means marking the Pod Unready
parent.readinessProbe.periodSeconds	int	30	How often (in seconds) to perform the readiness probe
parent.readinessProbe.successThreshold	int	1	Minimum consecutive successes for the readiness probe to be considered successful after having failed
parent.readinessProbe.timeoutSeconds	int	1	Number of seconds after which the readiness probe times out
parent.securityContext.runAsUser	int	201	The UID to run the container process
parent.securityContext.runAsGroup	int	201	The GID to run the container process
parent.securityContext.fsGroup	int	201	The supplementary group for setting permissions on volumes
parent.terminationGracePeriodSeconds	int	300	Duration in seconds the pod needs to terminate gracefully
parent.nodeSelector	object	{}	Node selector for the parent deployment
parent.tolerations	list	[]	Tolerations settings for the parent deployment
parent.affinity	object	{}	Affinity settings for the parent deployment
parent.priorityClassName	string	""	Pod priority class name for the parent deployment
parent.env	object	{}	Set environment parameters for the parent deployment
parent.envFrom	list	[]	Set environment parameters for the parent deployment from ConfigMap and/or Secrets
parent.podLabels	object	{}	Additional labels to add to the parent pods
parent.podAnnotations	object	{}	Additional annotations to add to the parent pods
parent.dnsPolicy	string	"Default"	DNS policy for pod
parent.database.persistence	bool	true	Whether the parent should use a persistent volume for the DB
parent.database.storageclass	string	"-"	The storage class for the persistent volume claim of the parent's database store, mounted to `/var/cache/netdata`
parent.database.volumesize	string	"15Gi"	The storage space for the PVC of the parent database
parent.alarms.persistence	bool	true	Whether the parent should use a persistent volume for the alarms log
parent.alarms.storageclass	string	"-"	The storage class for the persistent volume claim of the parent's alarm log, mounted to `/var/lib/netdata`
parent.alarms.volumesize	string	"1Gi"	The storage space for the PVC of the parent alarm log
parent.configs	object	See values.yaml for defaults	Manage custom parent's configs
parent.claiming.enabled	bool	false	Enable parent claiming for netdata cloud
parent.claiming.token	string	""	Claim token
parent.claiming.rooms	string	""	Comma separated list of claim rooms IDs. Empty value = All nodes room only
parent.extraVolumeMounts	list	[]	Additional volumeMounts to add to the parent pods
parent.extraVolumes	list	[]	Additional volumes to add to the parent pods
parent.extraInitContainers	list	[]	Additional init containers to add to the parent pods
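
A minimal sketch of how a few of the parent keys above might be combined in a values file. The resource sizes, claim token, and room IDs are placeholders, not recommendations.

```yaml
parent:
  resources:
    requests:
      cpu: 500m                          # placeholder sizing
      memory: 1Gi
    limits:
      memory: 2Gi
  claiming:
    enabled: true
    token: "<your-claim-token>"          # placeholder
    rooms: "<room-id-1>,<room-id-2>"     # comma-separated; empty = All nodes room only
```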

Child

Key	Type	Default	Description
child.enabled	bool	true	Install child DaemonSet to gather data from nodes
child.port	string	"{{ .Values.parent.port }}"	Children's listen port
child.updateStrategy	object	{}	An update strategy to replace existing DaemonSet pods with new pods
child.resources	object	{}	Resources for the child DaemonSet
child.livenessProbe.initialDelaySeconds	int	0	Number of seconds after the container has started before liveness probes are initiated
child.livenessProbe.failureThreshold	int	3	When a liveness probe fails, Kubernetes will try failureThreshold times before giving up. Giving up the liveness probe means restarting the container
child.livenessProbe.periodSeconds	int	30	How often (in seconds) to perform the liveness probe
child.livenessProbe.successThreshold	int	1	Minimum consecutive successes for the liveness probe to be considered successful after having failed
child.livenessProbe.timeoutSeconds	int	1	Number of seconds after which the liveness probe times out
child.readinessProbe.initialDelaySeconds	int	0	Number of seconds after the container has started before readiness probes are initiated
child.readinessProbe.failureThreshold	int	3	When a readiness probe fails, Kubernetes will try failureThreshold times before giving up. Giving up the readiness probe means marking the Pod Unready
child.readinessProbe.periodSeconds	int	30	How often (in seconds) to perform the readiness probe
child.readinessProbe.successThreshold	int	1	Minimum consecutive successes for the readiness probe to be considered successful after having failed
child.readinessProbe.timeoutSeconds	int	1	Number of seconds after which the readiness probe times out
child.terminationGracePeriodSeconds	int	30	Duration in seconds the pod needs to terminate gracefully
child.nodeSelector	object	{}	Node selector for the child daemonsets
child.tolerations	list	See values.yaml for defaults	Tolerations settings for the child daemonsets
child.affinity	object	{}	Affinity settings for the child daemonsets
child.priorityClassName	string	""	Pod priority class name for the child daemonsets
child.podLabels	object	{}	Additional labels to add to the child pods
child.podAnnotationAppArmor.enabled	bool	true	Whether or not to include the AppArmor security annotation
child.podAnnotations	object	{}	Additional annotations to add to the child pods
child.hostNetwork	bool	true	Usage of host networking and ports
child.dnsPolicy	string	"ClusterFirstWithHostNet"	DNS policy for pod. Should be `ClusterFirstWithHostNet` if `child.hostNetwork = true`
child.persistence.enabled	bool	true	Whether or not to persist `/var/lib/netdata` in the `child.persistence.hostPath`
child.persistence.hostPath	string	"/var/lib/netdata-k8s-child"	Host node directory for storing child instance data
child.podsMetadata.useKubelet	bool	false	Send requests to the Kubelet /pods endpoint instead of Kubernetes API server to get pod metadata
child.podsMetadata.kubeletUrl	string	"https://localhost:10250"	Kubelet URL
child.configs	object	See values.yaml for defaults	Manage custom child's configs
child.env	object	{}	Set environment parameters for the child daemonset
child.envFrom	list	[]	Set environment parameters for the child daemonset from ConfigMap and/or Secrets
child.claiming.enabled	bool	false	Enable child claiming for netdata cloud
child.claiming.token	string	""	Claim token
child.claiming.rooms	string	""	Comma separated list of claim rooms IDs. Empty value = All nodes room only
child.extraVolumeMounts	list	[]	Additional volumeMounts to add to the child pods
child.extraVolumes	list	[]	Additional volumes to add to the child pods
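
The following sketch combines some of the child keys above. It assumes you want pod metadata to come from the local Kubelet rather than the API server; the node selector is only an example and can be dropped.

```yaml
child:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet       # should match hostNetwork: true
  podsMetadata:
    useKubelet: true                       # query the Kubelet /pods endpoint instead of the API server
    kubeletUrl: "https://localhost:10250"  # documented default
  nodeSelector:
    kubernetes.io/os: linux                # example selector; adjust or remove
```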

K8s State

Key	Type	Default	Description
k8sState.hostname	string	"netdata-k8s-state"	K8s state node hostname
k8sState.enabled	bool	true	Install this Deployment to gather data from K8s cluster
k8sState.port	string	"{{ .Values.parent.port }}"	Listen port
k8sState.resources	object	{}	Compute resources required by this Deployment
k8sState.livenessProbe.initialDelaySeconds	int	0	Number of seconds after the container has started before liveness probes are initiated
k8sState.livenessProbe.failureThreshold	int	3	When a liveness probe fails, Kubernetes will try failureThreshold times before giving up. Giving up the liveness probe means restarting the container
k8sState.livenessProbe.periodSeconds	int	30	How often (in seconds) to perform the liveness probe
k8sState.livenessProbe.successThreshold	int	1	Minimum consecutive successes for the liveness probe to be considered successful after having failed
k8sState.livenessProbe.timeoutSeconds	int	1	Number of seconds after which the liveness probe times out
k8sState.readinessProbe.initialDelaySeconds	int	0	Number of seconds after the container has started before readiness probes are initiated
k8sState.readinessProbe.failureThreshold	int	3	When a readiness probe fails, Kubernetes will try failureThreshold times before giving up. Giving up the readiness probe means marking the Pod Unready
k8sState.readinessProbe.periodSeconds	int	30	How often (in seconds) to perform the readiness probe
k8sState.readinessProbe.successThreshold	int	1	Minimum consecutive successes for the readiness probe to be considered successful after having failed
k8sState.readinessProbe.timeoutSeconds	int	1	Number of seconds after which the readiness probe times out
k8sState.terminationGracePeriodSeconds	int	30	Duration in seconds the pod needs to terminate gracefully
k8sState.nodeSelector	object	{}	Node selector
k8sState.tolerations	list	[]	Tolerations settings
k8sState.affinity	object	{}	Affinity settings
k8sState.priorityClassName	string	""	Pod priority class name
k8sState.podLabels	object	{}	Additional labels
k8sState.podAnnotationAppArmor.enabled	bool	true	Whether or not to include the AppArmor security annotation
k8sState.podAnnotations	object	{}	Additional annotations
k8sState.dnsPolicy	string	"ClusterFirstWithHostNet"	DNS policy for pod
k8sState.persistence.enabled	bool	true	Whether should use a persistent volume for `/var/lib/netdata`
k8sState.persistence.storageclass	string	"-"	The storage class for the persistent volume claim of `/var/lib/netdata`
k8sState.persistence.volumesize	string	"1Gi"	The storage space for the PVC of `/var/lib/netdata`
k8sState.env	object	{}	Set environment parameters
k8sState.envFrom	list	[]	Set environment parameters from ConfigMap and/or Secrets
k8sState.configs	object	See values.yaml for defaults	Manage custom configs
k8sState.claiming.enabled	bool	false	Enable claiming for netdata cloud
k8sState.claiming.token	string	""	Claim token
k8sState.claiming.rooms	string	""	Comma separated list of claim rooms IDs. Empty value = All nodes room only
k8sState.extraVolumeMounts	list	[]	Additional volumeMounts to add to the k8sState pods
k8sState.extraVolumes	list	[]	Additional volumes to add to the k8sState pods

Netdata OpenTelemetry

Key	Type	Default	Description
netdataOpentelemetry.enabled	bool	false	Enable the Netdata OpenTelemetry Deployment
netdataOpentelemetry.hostname	string	"netdata-otel"	Hostname for the Netdata OpenTelemetry instance
netdataOpentelemetry.port	string	"{{ .Values.parent.port }}"	Listen port
netdataOpentelemetry.service.type	string	"ClusterIP"	Service type
netdataOpentelemetry.service.port	int	4317	Service port
netdataOpentelemetry.service.annotations	object	{}	Service annotations
netdataOpentelemetry.service.clusterIP	string	""	Cluster IP address (only used with service.type ClusterIP)
netdataOpentelemetry.service.loadBalancerIP	string	""	LoadBalancer IP address (only used with service.type LoadBalancer)
netdataOpentelemetry.service.loadBalancerSourceRanges	list	[]	Allowed source ranges for LoadBalancer (only used with service.type LoadBalancer)
netdataOpentelemetry.service.externalTrafficPolicy	string	""	External traffic policy (only used with service.type LoadBalancer)
netdataOpentelemetry.service.healthCheckNodePort	string	""	Health check node port (only used with service.type LoadBalancer and external traffic policy Local)
netdataOpentelemetry.resources	object	{}	Compute resources required by this Deployment
netdataOpentelemetry.livenessProbe.initialDelaySeconds	int	0	Number of seconds after the container has started before liveness probes are initiated
netdataOpentelemetry.livenessProbe.failureThreshold	int	3	When a liveness probe fails, Kubernetes will try failureThreshold times before giving up
netdataOpentelemetry.livenessProbe.periodSeconds	int	30	How often (in seconds) to perform the liveness probe
netdataOpentelemetry.livenessProbe.successThreshold	int	1	Minimum consecutive successes for the liveness probe to be considered successful after having failed
netdataOpentelemetry.livenessProbe.timeoutSeconds	int	1	Number of seconds after which the liveness probe times out
netdataOpentelemetry.readinessProbe.initialDelaySeconds	int	0	Number of seconds after the container has started before readiness probes are initiated
netdataOpentelemetry.readinessProbe.failureThreshold	int	3	When a readiness probe fails, Kubernetes will try failureThreshold times before giving up
netdataOpentelemetry.readinessProbe.periodSeconds	int	30	How often (in seconds) to perform the readiness probe
netdataOpentelemetry.readinessProbe.successThreshold	int	1	Minimum consecutive successes for the readiness probe to be considered successful after having failed
netdataOpentelemetry.readinessProbe.timeoutSeconds	int	1	Number of seconds after which the readiness probe times out
netdataOpentelemetry.securityContext.runAsUser	int	201	The UID to run the container process
netdataOpentelemetry.securityContext.runAsGroup	int	201	The GID to run the container process
netdataOpentelemetry.securityContext.fsGroup	int	201	The supplementary group for setting permissions on volumes
netdataOpentelemetry.terminationGracePeriodSeconds	int	30	Duration in seconds the pod needs to terminate gracefully
netdataOpentelemetry.nodeSelector	object	{}	Node selector
netdataOpentelemetry.tolerations	list	[]	Tolerations settings
netdataOpentelemetry.affinity	object	{}	Affinity settings
netdataOpentelemetry.priorityClassName	string	""	Pod priority class name
netdataOpentelemetry.podLabels	object	{}	Additional labels
netdataOpentelemetry.podAnnotationAppArmor.enabled	bool	true	Whether or not to include the AppArmor security annotation
netdataOpentelemetry.podAnnotations	object	{}	Additional annotations
netdataOpentelemetry.dnsPolicy	string	"ClusterFirst"	DNS policy for pod
netdataOpentelemetry.persistence.enabled	bool	true	Whether to use persistent volumes
netdataOpentelemetry.persistence.storageclass	string	"-"	The storage class for the persistent volume claim (both varlib and varlog volumes)
netdataOpentelemetry.persistence.volumesize	string	"10Gi"	The storage space for the logs (varlog volume)
netdataOpentelemetry.configs	object	See values.yaml for defaults	Manage custom configs
netdataOpentelemetry.env	object	{}	Set environment parameters
netdataOpentelemetry.envFrom	list	[]	Set environment parameters from ConfigMap and/or Secrets
netdataOpentelemetry.claiming.enabled	bool	false	Enable claiming for netdata cloud
netdataOpentelemetry.claiming.token	string	""	Claim token
netdataOpentelemetry.claiming.rooms	string	""	Comma separated list of claim rooms IDs. Empty value = All nodes room only
netdataOpentelemetry.extraVolumeMounts	list	[]	Additional volumeMounts
netdataOpentelemetry.extraVolumes	list	[]	Additional volumes

OpenTelemetry Collector

Key	Type	Default	Description
otel-collector.enabled	bool	false	Set to true to enable the OpenTelemetry Collector
otel-collector.mode	string	"daemonset"	Deployment mode: daemonset, deployment, or statefulset
otel-collector.image.repository	string	"otel/opentelemetry-collector-k8s"	Image repository
otel-collector.presets.kubernetesAttributes.enabled	bool	true	Enable Kubernetes attributes collection
otel-collector.presets.logsCollection.enabled	bool	true	Enable logs collection from Kubernetes pods
otel-collector.presets.logsCollection.includeCollectorLogs	bool	false	Include collector logs in the collection
otel-collector.config	object	See values.yaml for defaults	OpenTelemetry Collector configuration
otel-collector.resources	object	See values.yaml for defaults	Resources
otel-collector.serviceAccount.create	bool	true	Create service account
otel-collector.clusterRole.create	bool	true	Create cluster role
otel-collector.clusterRole.rules	list	See values.yaml for defaults	Cluster role rules
otel-collector.tolerations	list	See values.yaml for defaults	Tolerations to run on all nodes

Example to set the parameters from the command line:

helm install my-release ./netdata-helmchart/charts/netdata \
    --set notifications.slack.webhook_url=MySlackAPIURL \
    --set notifications.slack.recipient="@MyUser MyChannel"

Another example, to set a different ingress controller.

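By default, the kubernetes.io/ingress.class annotation is set to nginx. To use Traefik as your ingress controller instead, override the annotation through ingress.annotations. The dots inside the annotation key must be escaped so that Helm does not treat them as nesting separators:

helm install my-release ./netdata-helmchart/charts/netdata \
    --set 'ingress.annotations.kubernetes\.io/ingress\.class=traefik'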

Instead of passing each parameter on the command line, you can provide a YAML file that specifies the values when installing the chart. For example:

helm install my-release ./netdata-helmchart/charts/netdata -f values.yaml

Tip: You can use the default values.yaml
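
As a sketch, the two `--set` examples above could be written in a values file like this. The webhook URL and recipients are the same placeholders used on the command line, and only keys from the General settings table appear.

```yaml
# Overrides equivalent to the --set examples (sketch)
notifications:
  slack:
    webhook_url: MySlackAPIURL              # placeholder
    recipient: "@MyUser MyChannel"
ingress:
  annotations:
    kubernetes.io/ingress.class: traefik    # no escaping needed in a YAML file
```

In a values file the annotation key is an ordinary YAML map key, so the dots need no special handling.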

Note: To opt out of anonymous statistics, set the DO_NOT_TRACK environment variable to a non-zero or non-empty value in the parent.env / child.env configuration (e.g., DO_NOT_TRACK: 1), or uncomment the corresponding line in values.yaml.
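
In a values file, the opt-out described in the note would look roughly like this:

```yaml
parent:
  env:
    DO_NOT_TRACK: 1
child:
  env:
    DO_NOT_TRACK: 1
```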

Configuration files

Parameter	Description	Default
`parent.configs.netdata`	Contents of the parent's `netdata.conf`	`memory mode = dbengine`
`parent.configs.stream`	Contents of the parent's `stream.conf`	Store child data, accept all connections, and issue alarms for child data.
`parent.configs.health`	Contents of `health_alarm_notify.conf`	Email disabled, a sample of the required settings for Slack notifications
`parent.configs.exporting`	Contents of `exporting.conf`	Disabled
`k8sState.configs.netdata`	Contents of `netdata.conf`	No persistent storage, no alarms
`k8sState.configs.stream`	Contents of `stream.conf`	Send metrics to the parent at netdata:{{ service.port }}
`k8sState.configs.exporting`	Contents of `exporting.conf`	Disabled
`k8sState.configs.go.d`	Contents of `go.d.conf`	Only k8s_state enabled
`k8sState.configs.go.d-k8s_state`	Contents of `go.d/k8s_state.conf`	k8s_state configuration
`child.configs.netdata`	Contents of the child's `netdata.conf`	No persistent storage, no alarms, no UI
`child.configs.stream`	Contents of the child's `stream.conf`	Send metrics to the parent at netdata:{{ service.port }}
`child.configs.exporting`	Contents of the child's `exporting.conf`	Disabled
`child.configs.kubelet`	Contents of the child's `go.d/k8s_kubelet.conf` that drives the kubelet collector	Update metrics every sec, do not retry to detect the endpoint, look for the kubelet metrics at http://127.0.0.1:10255/metrics
`child.configs.kubeproxy`	Contents of the child's `go.d/k8s_kubeproxy.conf` that drives the kubeproxy collector	Update metrics every sec, do not retry to detect the endpoint, look for the kube-proxy metrics at http://127.0.0.1:10249/metrics

To deploy additional Netdata user configuration files, add similar entries to either parent.configs or child.configs. This works both for config files that reside directly under /etc/netdata and for files in a subdirectory such as /etc/netdata/go.d; use the provided configurations as a reference. For example, parent.configs includes an example alarm that would be triggered if the python.d example module were enabled. When a configuration contains sensitive data, such as database credentials, you can store it in a Kubernetes Secret by specifying storedType: secret in that configuration entry. By default, all configurations are placed in a Kubernetes ConfigMap.
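
A hedged sketch of such an additional entry is shown below. Only `data` and `storedType: secret` are confirmed by this document; the `enabled` and `path` fields are assumed to mirror the stock entries in values.yaml, so check that file before relying on them. The entry name, the collector config, and the connection string are hypothetical.

```yaml
child:
  configs:
    postgres:                                # hypothetical entry name
      enabled: true                          # assumed field, as in the stock entries
      path: /etc/netdata/go.d/postgres.conf  # assumed field: target path inside the container
      storedType: secret                     # keep this file in a Secret instead of a ConfigMap
      data: |
        jobs:
          - name: example_db
            dsn: 'postgresql://user:password@db.example.com:5432/postgres'
```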

Note that in this chart's default configuration, the parent performs the health checks and triggers alarms but collects little data. As a result, the only other configuration files that might make sense to add to the parent are the alarm and alarm template definitions, under /etc/netdata/health.d.

Tip: Pay attention to the indentation of the config file contents, because it affects how the YAML file is parsed. The first line under a block scalar such as data: |- must be indented two more spaces than the key that precedes it:

  data: |-
    config line 1 #Need those two spaces
        config line 2 #No problem indenting more here
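
As a rough sketch of the steps above, the following writes a values override that adds one extra collector config to the child pods and stores it in a Secret. The entry name mysql, the DSN, and the file name extra-config-values.yaml are hypothetical, and the enabled/path/data layout is assumed to mirror the existing entries in the chart's values.yaml, so check it against your chart version before use.

```shell
# Write a values override that adds one extra config file to the child pods.
# The "mysql" entry and the DSN are hypothetical placeholders.
cat > extra-config-values.yaml <<'EOF'
child:
  configs:
    mysql:
      enabled: true                        # assumed key, mirrors existing entries
      path: /etc/netdata/go.d/mysql.conf   # assumed key, target file in the pod
      storedType: secret                   # keep credentials out of the ConfigMap
      data: |
        jobs:
          - name: example_db
            dsn: 'netdata:CHANGE_ME@tcp(mysql.default.svc:3306)/'
EOF

# Apply it with, for example:
#   helm upgrade -f extra-config-values.yaml netdata ./netdata-helmchart/charts/netdata
```

Note how the first line under data: | is indented two spaces deeper than the data key, as described in the tip above.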

Persistent volumes

There are two different persistent volumes on the parent node by design (not counting any ConfigMap/Secret mounts). Both are optional. Keep in mind that if a persistent volume is not used for the parent, the data that volume would hold is lost when the pod is removed.

database (/var/cache/netdata) - all metrics data is stored here. Performance of this volume affects query timings.
alarms (/var/lib/netdata) - the alarm log is stored here. If this volume is not persistent, recreating the pod makes the parent appear as a new node in netdata.cloud, because ./registry/ and ./cloud.d/ are removed.

The child instances are simpler. By default, the hostPath /var/lib/netdata-k8s-child is mounted in each child pod at /var/lib/netdata. You can disable this mount, but it is effectively required in real-world deployments: without it, every pod deletion makes the child appear to the parent as a new node.
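
A minimal sketch of a values override that keeps both parent volumes and the child hostPath mount persistent is shown here. The key names (parent.database.*, parent.alarms.*, child.persistence.*) are assumptions based on the chart's values.yaml, and the StorageClass name standard and the sizes are placeholders; confirm them against the configuration table for your chart version.

```shell
# Write a values override for persistence settings.
# Key names are assumed from the chart's values.yaml; "standard" is a
# hypothetical StorageClass name and the sizes are placeholders.
cat > persistence-values.yaml <<'EOF'
parent:
  database:
    persistence: true          # metrics data, mounted at /var/cache/netdata
    storageclass: "standard"
    volumesize: 10Gi
  alarms:
    persistence: true          # alarm log, registry and cloud.d, at /var/lib/netdata
    storageclass: "standard"
    volumesize: 1Gi
child:
  persistence:
    enabled: true
    hostPath: /var/lib/netdata-k8s-child
EOF

# Apply it with, for example:
#   helm upgrade -f persistence-values.yaml netdata ./netdata-helmchart/charts/netdata
```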

Collecting logs with OpenTelemetry

Netdata can ingest, store, and visualize your cluster's container logs through OpenTelemetry. This relies on two components — both disabled by default:

netdataOpentelemetry — Netdata's OpenTelemetry receiver. It runs as its own Deployment (the netdata-otel node) and listens for OTLP data on port 4317. It receives, stores, and displays the logs in the Netdata UI. It does not collect logs itself.
otel-collector — the upstream OpenTelemetry Collector, bundled as an optional subchart. It runs as a DaemonSet (one pod per node), reads each node's local container log files, and forwards them over OTLP to netdataOpentelemetry.

netdataOpentelemetry is the destination; otel-collector is what feeds it. The collector is bundled only as a convenient, working default, and is off by default because it is not the only option. If you already run a log pipeline (Fluent Bit, Vector, an existing Collector, or any OTLP-capable agent), leave otel-collector disabled and point that pipeline's OTLP exporter at the netdata-otel service on port 4317 instead.

Log flow:

container stdout/stderr
   │  the container runtime writes them to node-local files (/var/log/pods/…)
   ▼
otel-collector DaemonSet  (one pod per node, needs host log access)
   │  reads each node's log files, pushes over OTLP
   ▼
netdata-otel:4317  →  stored and shown in the Netdata UI
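
If you already run your own OpenTelemetry Collector instead of the bundled subchart, the last hop of this flow might look like the fragment below. It is only a sketch: the namespace netdata in the service DNS name and the filelog receiver name are assumptions, and the fragment is meant to be merged into an existing Collector configuration rather than used on its own. The plaintext setting matches the receiver's default of TLS being disabled.

```shell
# Write a Collector config fragment that exports logs to the netdata-otel service.
# The "netdata" namespace and the "filelog" receiver are assumptions.
cat > external-collector-fragment.yaml <<'EOF'
exporters:
  otlp:
    endpoint: netdata-otel.netdata.svc.cluster.local:4317
    tls:
      insecure: true           # plaintext gRPC; the receiver has TLS off by default
service:
  pipelines:
    logs:
      receivers: [filelog]     # keep whatever receiver your pipeline already uses
      exporters: [otlp]
EOF
```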

Securing the endpoint with TLS

By default the netdata-otel receiver listens on port 4317 in plaintext — TLS is disabled (tls_cert_path, tls_key_path, and tls_ca_cert_path are unset in netdataOpentelemetry.configs.otel.data). The steps below turn it on with a self-signed certificate. TLS affects both sides: the receiver must serve the certificate, and every client (including the bundled otel-collector) must be switched to TLS, or it will stop delivering data.

1. Generate a self-signed certificate and key (Linux, openssl):

openssl req -x509 -newkey rsa:4096 -nodes \
  -keyout tls.key -out tls.crt -days 365 \
  -subj "/CN=netdata-otel"

2. Create a Kubernetes TLS secret from those files, in the chart's namespace:

kubectl create secret tls netdata-otel-tls \
  --cert=tls.crt --key=tls.key \
  --namespace <your-namespace>

3. Mount the secret into the receiver and point the config at it. The certificate paths live inside netdataOpentelemetry.configs.otel.data, which is a single block — supply it in full with the two tls_*_path values filled in (keep the metrics and logs sections in sync with the chart's values.yaml). Mounting the secret alone does nothing until these paths are set:

netdataOpentelemetry:
  extraVolumes:
    - name: otel-tls
      secret:
        secretName: netdata-otel-tls
  extraVolumeMounts:
    - name: otel-tls
      mountPath: /etc/netdata/otel-certs
      readOnly: true
  configs:
    otel:
      data: |
        endpoint:
          path: "0.0.0.0:4317"
          tls_cert_path: /etc/netdata/otel-certs/tls.crt
          tls_key_path: /etc/netdata/otel-certs/tls.key
          tls_ca_cert_path: null
        metrics:
          print_flattened: false
          buffer_samples: 10
          throttle_charts: 100
          chart_configs_dir: otel.d/v1/metrics
        logs:
          journal_dir: otel/v1
          size_of_journal_file: "100MB"
          number_of_journal_files: 10
          size_of_journal_files: "1GB"
          duration_of_journal_files: "7 days"
          duration_of_journal_file: "2 hours"
          store_otlp_json: false

4. Switch every client to TLS. A TLS listener rejects plaintext connections, so any sender must be reconfigured — otherwise logs silently stop arriving. For the bundled otel-collector, enable TLS on its OTLP exporter. Because the certificate is self-signed, skip verification with insecure_skip_verify — this keeps the connection encrypted but does not validate the certificate chain (only the tls block is overridden; the exporter's endpoint is kept from the chart defaults):

otel-collector:
  config:
    exporters:
      otlp:
        tls:
          insecure: false
          insecure_skip_verify: true

Apply the same change to any external OTLP client — Fluent Bit, Vector, or another Collector — pointing at the netdata-otel service.

For production, replace the self-signed certificate with one issued by a trusted CA, give it a CN/SAN that matches the netdata-otel service DNS name, and have clients trust that CA via tls_ca_cert_path instead of skipping verification.
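
For the bundled otel-collector, verifying the receiver against a CA might look roughly like the sketch below. It assumes the upstream OpenTelemetry Collector chart's extraVolumes and extraVolumeMounts values and the Collector's ca_file TLS setting; the secret name netdata-otel-ca, its ca.crt key, and the mount path are hypothetical.

```shell
# Write a values override that makes the bundled collector verify the
# receiver's certificate against a CA instead of skipping verification.
# The secret name, its "ca.crt" key, and the mount path are hypothetical.
cat > otel-collector-ca-values.yaml <<'EOF'
otel-collector:
  extraVolumes:
    - name: netdata-otel-ca
      secret:
        secretName: netdata-otel-ca
  extraVolumeMounts:
    - name: netdata-otel-ca
      mountPath: /etc/otel/ca
      readOnly: true
  config:
    exporters:
      otlp:
        tls:
          insecure: false
          insecure_skip_verify: false   # verification is back on
          ca_file: /etc/otel/ca/ca.crt
EOF
```

Verification only succeeds if the exporter connects using a host name that appears in the certificate's SAN list.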

Service discovery and supported services

Netdata's service discovery, which is installed as part of the Helm chart installation, finds what services are running on a cluster's pods, converts that into configuration files, and exports them, so they can be monitored.

Applications

Service discovery currently supports the following applications via their associated collector:

Prometheus endpoints

Service discovery supports Prometheus endpoints via the Prometheus collector.

Annotations on pods allow fine-grained control of the scraping process:

prometheus.io/scrape: The default configuration scrapes all pods. Set this annotation to false to exclude a pod from the scraping process.
prometheus.io/path: If the metrics path is not /metrics, define it with this annotation.
prometheus.io/port: Scrape the pod on the indicated port instead of the pod's declared ports.
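
As an illustration, a pod that exposes metrics on a non-default path and port could carry the three annotations like this. The pod name, image, path, and port numbers are made up for the example. Annotation values are quoted because Kubernetes requires them to be strings.

```shell
# Write an example Pod manifest that uses the three annotations.
# Name, image, path, and ports are hypothetical.
cat > annotated-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: example-app
  annotations:
    prometheus.io/scrape: "true"             # set to "false" to exclude the pod
    prometheus.io/path: "/internal/metrics"  # only needed when not /metrics
    prometheus.io/port: "9102"               # scrape this port only
spec:
  containers:
    - name: app
      image: registry.example.com/example-app:1.0
      ports:
        - containerPort: 8080
        - containerPort: 9102
EOF

# Apply it with, for example:
#   kubectl apply -f annotated-pod.yaml
```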

Configure service discovery

If your cluster runs services on non-default ports or uses non-default names, you may need to configure service discovery to start collecting metrics from your services. To do so, edit the default ConfigMap that is shipped with the Helm chart and deploy it to your cluster.

First, copy netdata-helmchart/sdconfig/child.yml to a new location outside the netdata-helmchart directory. The destination can be anywhere you like, but the following examples assume it resides next to the netdata-helmchart directory.

cp netdata-helmchart/sdconfig/child.yml .

Edit the new child.yml file according to your needs. See the Helm chart configuration and the file itself for details. You can then run helm install/helm upgrade with the --set-file argument to use your configured child.yml file instead of the default, changing the path if you copied it elsewhere.

helm install --set-file sd.child.configmap.from.value=./child.yml netdata ./netdata-helmchart/charts/netdata
helm upgrade --set-file sd.child.configmap.from.value=./child.yml netdata ./netdata-helmchart/charts/netdata

Now that you have pushed an edited ConfigMap to your cluster, service discovery should find your non-default service and set up metrics collection for it.

Custom pod labels and annotations

Occasionally, you will want to add specific labels and annotations to the parent and/or child pods. You might want to do this to tell other applications on the cluster how to treat your pods, or simply to categorize applications on your cluster. You can label and annotate the parent and child pods by using the podLabels and podAnnotations dictionaries under the parent and child objects, respectively.

For example, suppose you're installing Netdata on all your database nodes, and you'd like the child pods to be labeled with workload: database so that you're able to recognize this.

At the same time, say you've configured chaoskube to kill all pods annotated with chaoskube.io/enabled: true, and you'd like chaoskube to be enabled for the parent pod but not for the child pods.

You would do this by installing as:

helm install \
  --set child.podLabels.workload=database \
  --set 'child.podAnnotations.chaoskube\.io/enabled=false' \
  --set 'parent.podAnnotations.chaoskube\.io/enabled=true' \
  netdata ./netdata-helmchart/charts/netdata
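
The same settings can also be kept in a values file, which avoids escaping the dots in annotation keys. This is a sketch of an equivalent override; the file name pod-metadata-values.yaml is arbitrary. The annotation values are quoted so they stay strings, which is what Kubernetes expects for annotations.

```shell
# Write a values override equivalent to the --set flags above.
cat > pod-metadata-values.yaml <<'EOF'
child:
  podLabels:
    workload: database
  podAnnotations:
    chaoskube.io/enabled: "false"
parent:
  podAnnotations:
    chaoskube.io/enabled: "true"
EOF

# Install with, for example:
#   helm install -f pod-metadata-values.yaml netdata ./netdata-helmchart/charts/netdata
```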

Contributing

If you want to contribute, we're humbled!

Take a look at our Contributing Guidelines.
This repository is under the Netdata Code Of Conduct.
Chat about your contribution and let us help you in our forum!
