
Netdata Helm chart for Kubernetes deployments


Version: 3.7.160

AppVersion: v2.9.0

Based on the work of varyumin (https://github.com/varyumin/netdata).

Introduction

This chart bootstraps a Netdata deployment on a Kubernetes cluster using the Helm package manager.

By default, the chart installs:

  • A Netdata child pod on each node of the cluster, using a DaemonSet
  • A Netdata k8s state monitoring pod on one node, using a Deployment. This virtual node is called netdata-k8s-state.
  • A Netdata parent pod on one node, using a Deployment. This virtual node is called netdata-parent.

Disabled by default:

  • A Netdata restarter CronJob. Its main purpose is to automatically update Netdata when using nightly releases.

The child pods and the state pod function as headless collectors that collect and forward all the metrics to the parent pod. The parent pod uses persistent volumes to store metrics and alarms, handle alarm notifications, and provide the Netdata UI to view metrics using an ingress controller.

Please validate that the settings are suitable for your cluster before using them in production.

Prerequisites

Required Resources and Permissions

Netdata is a comprehensive monitoring solution that requires specific access to host resources to function effectively. By design, monitoring solutions like Netdata need visibility into various system components to collect metrics and provide insights. The following mounts, privileges, and capabilities are essential for Netdata's operation, and applying restrictive security profiles or limiting these accesses may significantly impact functionality or render the monitoring solution ineffective.

See required mounts, privileges and RBAC resources

Required Mounts

| Mount | Type | Node | Components & Descriptions |
|-------|------|------|---------------------------|
| `/` | hostPath | child | `diskspace.plugin`: Host mount points monitoring. |
| `/proc` | hostPath | child | `proc.plugin`: Host system monitoring (CPU, memory, network interfaces, disks, etc.). |
| `/sys` | hostPath | child | `cgroups.plugin`: Docker containers monitoring and name resolution. |
| `/var/log` | hostPath | child | `systemd-journal.plugin`: Viewing, exploring and analyzing systemd journal logs. |
| `/etc/os-release` | hostPath | child, parent, k8sState | `netdata`: Host info detection. |
| `/etc/passwd`, `/etc/group` | hostPath | child | `apps.plugin`: Monitoring of host system resource usage by each user and user group. |
| `{{ .Values.child.persistence.hostPath }}/var/lib/netdata` | hostPath (DirectoryOrCreate) | child | `netdata`: Persistence of Netdata's `/var/lib/netdata` directory, which contains the netdata public unique ID and other files that should persist across container recreations. Without persistence, a new netdata unique ID is generated for each child on every container recreation, causing children to appear as new nodes on the parent instance. |

Required Privileges and Capabilities

| Privilege/Capability | Node | Components & Descriptions |
|----------------------|------|---------------------------|
| Host Network Mode | child | `proc.plugin`: Host system networking stack monitoring.<br>`go.d.plugin`: Monitoring applications running on the host and inside containers.<br>`local-listeners`: Discovering local services/applications. Maps open (listening) ports to running services/applications.<br>`network-viewer.plugin`: Discovering all current network sockets and building a network map. |
| Host PID Mode | child | `cgroups.plugin`: Container network interfaces monitoring. Maps virtual interfaces in the system namespace to interfaces inside containers. |
| SYS_ADMIN | child | `cgroups.plugin`: Container network interfaces monitoring. Maps virtual interfaces in the system namespace to interfaces inside containers.<br>`network-viewer.plugin`: Discovering all current network sockets and building a network map. |
| SYS_PTRACE | child | `local-listeners`: Discovering local services/applications. Maps open (listening) ports to running services/applications. |

Required Kubernetes RBAC Resources

| Resource | Verbs | Components & Descriptions |
|----------|-------|---------------------------|
| pods | get, list, watch | service discovery: Used for discovering services.<br>go.d/k8s_state: Kubernetes state monitoring.<br>netdata: Used by the `cgroup-name.sh` and `get-kubernetes-labels.sh` scripts. |
| services | get, list, watch | service discovery: Used for discovering services. |
| configmaps | get, list, watch | service discovery: Used for discovering services. |
| secrets | get, list, watch | service discovery: Used for discovering services. |
| nodes | get, list, watch | go.d/k8s_state: Kubernetes state monitoring. |
| nodes/metrics | get, list, watch | go.d/k8s_kubelet: Used when querying the Kubelet HTTPS endpoint. |
| nodes/proxy | get, list, watch | netdata: Used by `cgroup-name.sh` when querying the Kubelet `/pods` endpoint. |
| deployments (apps) | get, list, watch | go.d/k8s_state: Kubernetes state monitoring. |
| cronjobs (batch) | get, list, watch | go.d/k8s_state: Kubernetes state monitoring. |
| jobs (batch) | get, list, watch | go.d/k8s_state: Kubernetes state monitoring. |
| namespaces | get | go.d/k8s_state: Kubernetes state monitoring.<br>netdata: Used by the `cgroup-name.sh` and `get-kubernetes-labels.sh` scripts. |

Installing the Helm chart

You can install the Helm chart via our Helm repository, or by cloning this repository.

To use Netdata's Helm repository, run the following commands:

helm repo add netdata https://netdata.github.io/helmchart/
helm install netdata netdata/netdata

See our install Netdata on Kubernetes documentation for detailed installation and configuration instructions. The remainder of this document assumes you installed the Helm chart by cloning this repository, and thus uses slightly different helm install/helm upgrade commands.

Install by cloning the repository

Clone the repository locally.

git clone https://github.com/netdata/helmchart.git netdata-helmchart

To install the chart with the release name netdata:

helm install netdata ./netdata-helmchart/charts/netdata

The command deploys Netdata on the Kubernetes cluster in the default configuration. The configuration section lists the parameters that can be configured during installation.

Tip: List all releases using helm list.

Uninstalling the Chart

To uninstall/delete the netdata release:

 helm delete netdata

The command removes all the Kubernetes components associated with the chart and deletes the release.

Configuration

The following table lists the configurable parameters of the netdata chart and their default values.

General settings

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `replicaCount` | int | `1` | Number of replicas for the parent netdata Deployment |
| `deploymentStrategy.type` | string | `"Recreate"` | Deployment strategy for pod deployments. `Recreate` is the safest value |
| `imagePullSecrets` | list | `[]` | An optional list of references to secrets in the same namespace to use for pulling any of the images |
| `image.repository` | string | `"netdata/netdata"` | Container image repository |
| `image.tag` | string | `"{{ .Chart.AppVersion }}"` | Container image tag |
| `image.pullPolicy` | string | `"Always"` | Container image pull policy |
| `initContainersImage.repository` | string | `"alpine"` | Init containers' image repository |
| `initContainersImage.tag` | string | `"latest"` | Init containers' image tag |
| `initContainersImage.pullPolicy` | string | `"Always"` | Init containers' image pull policy |
| `sysctlInitContainer.enabled` | bool | `false` | Enable an init container to modify kernel settings |
| `sysctlInitContainer.command` | list | `[]` | sysctl init container command to execute |
| `sysctlInitContainer.resources` | object | `{}` | sysctl init container CPU/memory resource requests/limits |
| `service.type` | string | `"ClusterIP"` | Parent service type |
| `service.port` | int | `19999` | Parent service port |
| `service.annotations` | object | `{}` | Additional annotations to add to the service |
| `service.loadBalancerIP` | string | `""` | Static LoadBalancer IP, only to be used with service type `LoadBalancer` |
| `service.loadBalancerSourceRanges` | list | `[]` | List of allowed IPs for the LoadBalancer |
| `service.externalTrafficPolicy` | string | `""` | Denotes whether this Service routes external traffic to node-local or cluster-wide endpoints |
| `service.healthCheckNodePort` | string | `null` | Specifies the health check node port (only to be used with type `LoadBalancer` and external traffic policy `Local`) |
| `service.clusterIP` | string | `""` | Specific cluster IP when the service type is `ClusterIP`. Use `None` for a headless service |
| `ingress.enabled` | bool | `true` | Create an Ingress to access the netdata web UI |
| `ingress.annotations` | object | See values.yaml for defaults | Annotations to associate with the Ingress |
| `ingress.path` | string | `"/"` | URL path for the ingress. If changed, a proxy server needs to be configured in front of netdata to translate the custom path to `/` |
| `ingress.pathType` | string | `"Prefix"` | pathType for your ingress controller. The default value is correct for nginx; if you use your own ingress controller, check the correct value |
| `ingress.hosts[0]` | string | `"netdata.k8s.local"` | URL hostnames for the ingress (they need to resolve to the external IP of the ingress controller) |
| `rbac.create` | bool | `true` | If true, create and use RBAC resources |
| `rbac.pspEnabled` | bool | `true` | Specifies whether a PodSecurityPolicy should be created |
| `serviceAccount.create` | bool | `true` | If true, create a service account |
| `serviceAccount.name` | string | `"netdata"` | The name of the service account to use. If not set and `create` is true, a name is generated using the fullname template |
| `restarter.enabled` | bool | `false` | Install a CronJob to update Netdata pods |
| `restarter.schedule` | string | `"00 06 * * *"` | The schedule in Cron format |
| `restarter.image.repository` | string | `"rancher/kubectl"` | Container image repository |
| `restarter.image.tag` | string | `".auto"` | Container image tag. If `.auto`, the rancher/kubectl image tag reflects the Kubernetes cluster version |
| `restarter.image.pullPolicy` | string | `"Always"` | Container image pull policy |
| `restarter.restartPolicy` | string | `"Never"` | Container restart policy |
| `restarter.resources` | object | `{}` | Container resources |
| `restarter.concurrencyPolicy` | string | `"Forbid"` | Specifies how to treat concurrent executions of a job |
| `restarter.startingDeadlineSeconds` | int | `60` | Optional deadline in seconds for starting the job if it misses its scheduled time for any reason |
| `restarter.successfulJobsHistoryLimit` | int | `3` | The number of successful finished jobs to retain |
| `restarter.failedJobsHistoryLimit` | int | `3` | The number of failed finished jobs to retain |
| `notifications.slack.webhook_url` | string | `""` | Slack webhook URL |
| `notifications.slack.recipient` | string | `""` | Slack recipient list |
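As a sketch of how these general settings are typically combined, the following values file (illustrative only; the image tag and source range are examples, and the key names come from the table above) pins the image, exposes the parent through a LoadBalancer instead of an Ingress, and enables the restarter CronJob:

```yaml
# my-values.yaml -- illustrative overrides for the general settings above
image:
  tag: "v2.9.0"            # pin a tag instead of tracking .Chart.AppVersion
  pullPolicy: IfNotPresent

service:
  type: LoadBalancer
  loadBalancerSourceRanges:
    - 10.0.0.0/8           # example CIDR; restrict who can reach the UI

ingress:
  enabled: false           # not needed when the service is a LoadBalancer

restarter:
  enabled: true            # CronJob that updates Netdata pods
  schedule: "00 06 * * *"
```

Apply it with `helm install netdata ./netdata-helmchart/charts/netdata -f my-values.yaml`.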

Service Discovery

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `sd.image.repository` | string | `"netdata/agent-sd"` | Container image repository |
| `sd.image.tag` | string | `"v0.2.10"` | Container image tag |
| `sd.image.pullPolicy` | string | `"Always"` | Container image pull policy |
| `sd.child.enabled` | bool | `true` | Add a service-discovery sidecar container to the netdata child pod definition |
| `sd.child.configmap.name` | string | `"netdata-child-sd-config-map"` | Child service-discovery ConfigMap name |
| `sd.child.configmap.key` | string | `"config.yml"` | Child service-discovery ConfigMap key |
| `sd.child.configmap.from.file` | string | `""` | File to use for child service-discovery configuration generation |
| `sd.child.configmap.from.value` | object | `{}` | Value to use for child service-discovery configuration generation |
| `sd.child.resources` | object | See values.yaml for defaults | Child service-discovery container CPU/memory resource requests/limits |
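A minimal sketch of how these keys fit together in a values file (the resource numbers are illustrative, not shipped defaults; the generated ConfigMap name and key are the defaults from the table above):

```yaml
sd:
  image:
    tag: "v0.2.10"          # pin the service-discovery sidecar image
  child:
    enabled: true           # run the sidecar next to each child pod
    configmap:
      name: netdata-child-sd-config-map
      key: config.yml
    resources:
      requests:
        cpu: 50m            # illustrative values; see values.yaml for defaults
        memory: 100Mi
```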

Parent

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `parent.hostname` | string | `"netdata-parent"` | Parent node hostname |
| `parent.enabled` | bool | `true` | Install the parent Deployment to receive metrics from child nodes |
| `parent.port` | int | `19999` | Parent's listen port |
| `parent.resources` | object | `{}` | Resources for the parent deployment |
| `parent.livenessProbe.initialDelaySeconds` | int | `0` | Number of seconds after the container has started before liveness probes are initiated |
| `parent.livenessProbe.failureThreshold` | int | `3` | When a liveness probe fails, Kubernetes tries `failureThreshold` times before giving up. Giving up the liveness probe means restarting the container |
| `parent.livenessProbe.periodSeconds` | int | `30` | How often (in seconds) to perform the liveness probe |
| `parent.livenessProbe.successThreshold` | int | `1` | Minimum consecutive successes for the liveness probe to be considered successful after having failed |
| `parent.livenessProbe.timeoutSeconds` | int | `1` | Number of seconds after which the liveness probe times out |
| `parent.readinessProbe.initialDelaySeconds` | int | `0` | Number of seconds after the container has started before readiness probes are initiated |
| `parent.readinessProbe.failureThreshold` | int | `3` | When a readiness probe fails, Kubernetes tries `failureThreshold` times before giving up. Giving up the readiness probe means marking the Pod Unready |
| `parent.readinessProbe.periodSeconds` | int | `30` | How often (in seconds) to perform the readiness probe |
| `parent.readinessProbe.successThreshold` | int | `1` | Minimum consecutive successes for the readiness probe to be considered successful after having failed |
| `parent.readinessProbe.timeoutSeconds` | int | `1` | Number of seconds after which the readiness probe times out |
| `parent.securityContext.runAsUser` | int | `201` | The UID to run the container process |
| `parent.securityContext.runAsGroup` | int | `201` | The GID to run the container process |
| `parent.securityContext.fsGroup` | int | `201` | The supplementary group for setting permissions on volumes |
| `parent.terminationGracePeriodSeconds` | int | `300` | Duration in seconds the pod needs to terminate gracefully |
| `parent.nodeSelector` | object | `{}` | Node selector for the parent deployment |
| `parent.tolerations` | list | `[]` | Tolerations settings for the parent deployment |
| `parent.affinity` | object | `{}` | Affinity settings for the parent deployment |
| `parent.priorityClassName` | string | `""` | Pod priority class name for the parent deployment |
| `parent.env` | object | `{}` | Set environment parameters for the parent deployment |
| `parent.envFrom` | list | `[]` | Set environment parameters for the parent deployment from ConfigMaps and/or Secrets |
| `parent.podLabels` | object | `{}` | Additional labels to add to the parent pods |
| `parent.podAnnotations` | object | `{}` | Additional annotations to add to the parent pods |
| `parent.dnsPolicy` | string | `"Default"` | DNS policy for the pod |
| `parent.database.persistence` | bool | `true` | Whether the parent should use a persistent volume for the DB |
| `parent.database.storageclass` | string | `"-"` | The storage class for the persistent volume claim of the parent's database store, mounted to `/var/cache/netdata` |
| `parent.database.volumesize` | string | `"5Gi"` | The storage space for the PVC of the parent database |
| `parent.alarms.persistence` | bool | `true` | Whether the parent should use a persistent volume for the alarms log |
| `parent.alarms.storageclass` | string | `"-"` | The storage class for the persistent volume claim of the parent's alarm log, mounted to `/var/lib/netdata` |
| `parent.alarms.volumesize` | string | `"1Gi"` | The storage space for the PVC of the parent alarm log |
| `parent.configs` | object | See values.yaml for defaults | Manage custom parent configs |
| `parent.claiming.enabled` | bool | `false` | Enable parent claiming for Netdata Cloud |
| `parent.claiming.token` | string | `""` | Claim token |
| `parent.claiming.rooms` | string | `""` | Comma-separated list of claim room IDs. Empty value = All nodes room only |
| `parent.extraVolumeMounts` | list | `[]` | Additional volumeMounts to add to the parent pods |
| `parent.extraVolumes` | list | `[]` | Additional volumes to add to the parent pods |
| `parent.extraInitContainers` | list | `[]` | Additional init containers to add to the parent pods |
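For example, a parent that keeps a larger metrics database on a specific storage class and is claimed to Netdata Cloud could use values like these (the storage class name, volume size, and token are placeholders):

```yaml
parent:
  database:
    persistence: true
    storageclass: "fast-ssd"   # placeholder; must exist in your cluster
    volumesize: 20Gi           # illustrative; default is 5Gi
  alarms:
    persistence: true          # keep ./registry/ and ./cloud.d/ across restarts
  claiming:
    enabled: true
    token: "<claim-token>"     # placeholder; obtain from Netdata Cloud
    rooms: ""                  # empty = All nodes room only
```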

Child

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `child.enabled` | bool | `true` | Install the child DaemonSet to gather data from nodes |
| `child.port` | string | `"{{ .Values.parent.port }}"` | Children's listen port |
| `child.updateStrategy` | object | `{}` | An update strategy to replace existing DaemonSet pods with new pods |
| `child.resources` | object | `{}` | Resources for the child DaemonSet |
| `child.livenessProbe.initialDelaySeconds` | int | `0` | Number of seconds after the container has started before liveness probes are initiated |
| `child.livenessProbe.failureThreshold` | int | `3` | When a liveness probe fails, Kubernetes tries `failureThreshold` times before giving up. Giving up the liveness probe means restarting the container |
| `child.livenessProbe.periodSeconds` | int | `30` | How often (in seconds) to perform the liveness probe |
| `child.livenessProbe.successThreshold` | int | `1` | Minimum consecutive successes for the liveness probe to be considered successful after having failed |
| `child.livenessProbe.timeoutSeconds` | int | `1` | Number of seconds after which the liveness probe times out |
| `child.readinessProbe.initialDelaySeconds` | int | `0` | Number of seconds after the container has started before readiness probes are initiated |
| `child.readinessProbe.failureThreshold` | int | `3` | When a readiness probe fails, Kubernetes tries `failureThreshold` times before giving up. Giving up the readiness probe means marking the Pod Unready |
| `child.readinessProbe.periodSeconds` | int | `30` | How often (in seconds) to perform the readiness probe |
| `child.readinessProbe.successThreshold` | int | `1` | Minimum consecutive successes for the readiness probe to be considered successful after having failed |
| `child.readinessProbe.timeoutSeconds` | int | `1` | Number of seconds after which the readiness probe times out |
| `child.terminationGracePeriodSeconds` | int | `30` | Duration in seconds the pod needs to terminate gracefully |
| `child.nodeSelector` | object | `{}` | Node selector for the child DaemonSet |
| `child.tolerations` | list | See values.yaml for defaults | Tolerations settings for the child DaemonSet |
| `child.affinity` | object | `{}` | Affinity settings for the child DaemonSet |
| `child.priorityClassName` | string | `""` | Pod priority class name for the child DaemonSet |
| `child.podLabels` | object | `{}` | Additional labels to add to the child pods |
| `child.podAnnotationAppArmor.enabled` | bool | `true` | Whether or not to include the AppArmor security annotation |
| `child.podAnnotations` | object | `{}` | Additional annotations to add to the child pods |
| `child.hostNetwork` | bool | `true` | Use host networking and ports |
| `child.dnsPolicy` | string | `"ClusterFirstWithHostNet"` | DNS policy for the pod. Should be `ClusterFirstWithHostNet` if `child.hostNetwork = true` |
| `child.persistence.enabled` | bool | `true` | Whether or not to persist `/var/lib/netdata` in `child.persistence.hostPath` |
| `child.persistence.hostPath` | string | `"/var/lib/netdata-k8s-child"` | Host node directory for storing child instance data |
| `child.podsMetadata.useKubelet` | bool | `false` | Send requests to the Kubelet `/pods` endpoint instead of the Kubernetes API server to get pod metadata |
| `child.podsMetadata.kubeletUrl` | string | `"https://localhost:10250"` | Kubelet URL |
| `child.configs` | object | See values.yaml for defaults | Manage custom child configs |
| `child.env` | object | `{}` | Set environment parameters for the child DaemonSet |
| `child.envFrom` | list | `[]` | Set environment parameters for the child DaemonSet from ConfigMaps and/or Secrets |
| `child.claiming.enabled` | bool | `false` | Enable child claiming for Netdata Cloud |
| `child.claiming.token` | string | `""` | Claim token |
| `child.claiming.rooms` | string | `""` | Comma-separated list of claim room IDs. Empty value = All nodes room only |
| `child.extraVolumeMounts` | list | `[]` | Additional volumeMounts to add to the child pods |
| `child.extraVolumes` | list | `[]` | Additional volumes to add to the child pods |
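A typical child override caps resources and keeps the default host-path persistence (the resource numbers and the toleration are illustrative, not chart defaults):

```yaml
child:
  resources:
    requests:
      cpu: 100m        # illustrative request for a busy node
      memory: 256Mi
    limits:
      memory: 512Mi
  persistence:
    enabled: true                        # keep the child's unique ID across recreations
    hostPath: /var/lib/netdata-k8s-child
  tolerations:
    - operator: Exists                   # example: also run on tainted nodes
      effect: NoSchedule
```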

K8s State

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `k8sState.hostname` | string | `"netdata-k8s-state"` | K8s state node hostname |
| `k8sState.enabled` | bool | `true` | Install this Deployment to gather data from the K8s cluster |
| `k8sState.port` | string | `"{{ .Values.parent.port }}"` | Listen port |
| `k8sState.resources` | object | `{}` | Compute resources required by this Deployment |
| `k8sState.livenessProbe.initialDelaySeconds` | int | `0` | Number of seconds after the container has started before liveness probes are initiated |
| `k8sState.livenessProbe.failureThreshold` | int | `3` | When a liveness probe fails, Kubernetes tries `failureThreshold` times before giving up. Giving up the liveness probe means restarting the container |
| `k8sState.livenessProbe.periodSeconds` | int | `30` | How often (in seconds) to perform the liveness probe |
| `k8sState.livenessProbe.successThreshold` | int | `1` | Minimum consecutive successes for the liveness probe to be considered successful after having failed |
| `k8sState.livenessProbe.timeoutSeconds` | int | `1` | Number of seconds after which the liveness probe times out |
| `k8sState.readinessProbe.initialDelaySeconds` | int | `0` | Number of seconds after the container has started before readiness probes are initiated |
| `k8sState.readinessProbe.failureThreshold` | int | `3` | When a readiness probe fails, Kubernetes tries `failureThreshold` times before giving up. Giving up the readiness probe means marking the Pod Unready |
| `k8sState.readinessProbe.periodSeconds` | int | `30` | How often (in seconds) to perform the readiness probe |
| `k8sState.readinessProbe.successThreshold` | int | `1` | Minimum consecutive successes for the readiness probe to be considered successful after having failed |
| `k8sState.readinessProbe.timeoutSeconds` | int | `1` | Number of seconds after which the readiness probe times out |
| `k8sState.terminationGracePeriodSeconds` | int | `30` | Duration in seconds the pod needs to terminate gracefully |
| `k8sState.nodeSelector` | object | `{}` | Node selector |
| `k8sState.tolerations` | list | `[]` | Tolerations settings |
| `k8sState.affinity` | object | `{}` | Affinity settings |
| `k8sState.priorityClassName` | string | `""` | Pod priority class name |
| `k8sState.podLabels` | object | `{}` | Additional labels |
| `k8sState.podAnnotationAppArmor.enabled` | bool | `true` | Whether or not to include the AppArmor security annotation |
| `k8sState.podAnnotations` | object | `{}` | Additional annotations |
| `k8sState.dnsPolicy` | string | `"ClusterFirstWithHostNet"` | DNS policy for the pod |
| `k8sState.persistence.enabled` | bool | `true` | Whether to use a persistent volume for `/var/lib/netdata` |
| `k8sState.persistence.storageclass` | string | `"-"` | The storage class for the persistent volume claim of `/var/lib/netdata` |
| `k8sState.persistence.volumesize` | string | `"1Gi"` | The storage space for the PVC of `/var/lib/netdata` |
| `k8sState.env` | object | `{}` | Set environment parameters |
| `k8sState.envFrom` | list | `[]` | Set environment parameters from ConfigMaps and/or Secrets |
| `k8sState.configs` | object | See values.yaml for defaults | Manage custom configs |
| `k8sState.claiming.enabled` | bool | `false` | Enable claiming for Netdata Cloud |
| `k8sState.claiming.token` | string | `""` | Claim token |
| `k8sState.claiming.rooms` | string | `""` | Comma-separated list of claim room IDs. Empty value = All nodes room only |
| `k8sState.extraVolumeMounts` | list | `[]` | Additional volumeMounts to add to the k8sState pods |
| `k8sState.extraVolumes` | list | `[]` | Additional volumes to add to the k8sState pods |
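A short sketch of the most commonly touched k8sState keys (the node selector and volume size are illustrative examples, not chart defaults):

```yaml
k8sState:
  enabled: true
  nodeSelector:
    kubernetes.io/os: linux   # example: constrain where the single pod runs
  persistence:
    enabled: true
    volumesize: 2Gi           # illustrative; default is 1Gi
```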

Netdata OpenTelemetry

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `netdataOpentelemetry.enabled` | bool | `false` | Enable the Netdata OpenTelemetry Deployment |
| `netdataOpentelemetry.hostname` | string | `"netdata-otel"` | Hostname for the Netdata OpenTelemetry instance |
| `netdataOpentelemetry.port` | string | `"{{ .Values.parent.port }}"` | Listen port |
| `netdataOpentelemetry.service.type` | string | `"ClusterIP"` | Service type |
| `netdataOpentelemetry.service.port` | int | `4317` | Service port |
| `netdataOpentelemetry.service.annotations` | object | `{}` | Service annotations |
| `netdataOpentelemetry.service.clusterIP` | string | `""` | Cluster IP address (only used with service.type `ClusterIP`) |
| `netdataOpentelemetry.service.loadBalancerIP` | string | `""` | LoadBalancer IP address (only used with service.type `LoadBalancer`) |
| `netdataOpentelemetry.service.loadBalancerSourceRanges` | list | `[]` | Allowed source ranges for the LoadBalancer (only used with service.type `LoadBalancer`) |
| `netdataOpentelemetry.service.externalTrafficPolicy` | string | `""` | External traffic policy (only used with service.type `LoadBalancer`) |
| `netdataOpentelemetry.service.healthCheckNodePort` | string | `""` | Health check node port (only used with service.type `LoadBalancer` and external traffic policy `Local`) |
| `netdataOpentelemetry.resources` | object | `{}` | Compute resources required by this Deployment |
| `netdataOpentelemetry.livenessProbe.initialDelaySeconds` | int | `0` | Number of seconds after the container has started before liveness probes are initiated |
| `netdataOpentelemetry.livenessProbe.failureThreshold` | int | `3` | When a liveness probe fails, Kubernetes tries `failureThreshold` times before giving up |
| `netdataOpentelemetry.livenessProbe.periodSeconds` | int | `30` | How often (in seconds) to perform the liveness probe |
| `netdataOpentelemetry.livenessProbe.successThreshold` | int | `1` | Minimum consecutive successes for the liveness probe to be considered successful after having failed |
| `netdataOpentelemetry.livenessProbe.timeoutSeconds` | int | `1` | Number of seconds after which the liveness probe times out |
| `netdataOpentelemetry.readinessProbe.initialDelaySeconds` | int | `0` | Number of seconds after the container has started before readiness probes are initiated |
| `netdataOpentelemetry.readinessProbe.failureThreshold` | int | `3` | When a readiness probe fails, Kubernetes tries `failureThreshold` times before giving up |
| `netdataOpentelemetry.readinessProbe.periodSeconds` | int | `30` | How often (in seconds) to perform the readiness probe |
| `netdataOpentelemetry.readinessProbe.successThreshold` | int | `1` | Minimum consecutive successes for the readiness probe to be considered successful after having failed |
| `netdataOpentelemetry.readinessProbe.timeoutSeconds` | int | `1` | Number of seconds after which the readiness probe times out |
| `netdataOpentelemetry.securityContext.runAsUser` | int | `201` | The UID to run the container process |
| `netdataOpentelemetry.securityContext.runAsGroup` | int | `201` | The GID to run the container process |
| `netdataOpentelemetry.securityContext.fsGroup` | int | `201` | The supplementary group for setting permissions on volumes |
| `netdataOpentelemetry.terminationGracePeriodSeconds` | int | `30` | Duration in seconds the pod needs to terminate gracefully |
| `netdataOpentelemetry.nodeSelector` | object | `{}` | Node selector |
| `netdataOpentelemetry.tolerations` | list | `[]` | Tolerations settings |
| `netdataOpentelemetry.affinity` | object | `{}` | Affinity settings |
| `netdataOpentelemetry.priorityClassName` | string | `""` | Pod priority class name |
| `netdataOpentelemetry.podLabels` | object | `{}` | Additional labels |
| `netdataOpentelemetry.podAnnotationAppArmor.enabled` | bool | `true` | Whether or not to include the AppArmor security annotation |
| `netdataOpentelemetry.podAnnotations` | object | `{}` | Additional annotations |
| `netdataOpentelemetry.dnsPolicy` | string | `"Default"` | DNS policy for the pod |
| `netdataOpentelemetry.persistence.enabled` | bool | `true` | Whether to use a persistent volume |
| `netdataOpentelemetry.persistence.storageclass` | string | `"-"` | The storage class for the persistent volume claim |
| `netdataOpentelemetry.persistence.volumesize` | string | `"10Gi"` | The storage space for the PVC |
| `netdataOpentelemetry.configs` | object | See values.yaml for defaults | Manage custom configs |
| `netdataOpentelemetry.env` | object | `{}` | Set environment parameters |
| `netdataOpentelemetry.envFrom` | list | `[]` | Set environment parameters from ConfigMaps and/or Secrets |
| `netdataOpentelemetry.claiming.enabled` | bool | `false` | Enable claiming for Netdata Cloud |
| `netdataOpentelemetry.claiming.token` | string | `""` | Claim token |
| `netdataOpentelemetry.claiming.rooms` | string | `""` | Comma-separated list of claim room IDs. Empty value = All nodes room only |
| `netdataOpentelemetry.extraVolumeMounts` | list | `[]` | Additional volumeMounts |
| `netdataOpentelemetry.extraVolumes` | list | `[]` | Additional volumes |

OpenTelemetry Collector

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `otel-collector.enabled` | bool | `false` | Set to true to enable the OpenTelemetry Collector |
| `otel-collector.mode` | string | `"daemonset"` | Deployment mode: daemonset, deployment, or statefulset |
| `otel-collector.image.repository` | string | `"otel/opentelemetry-collector-k8s"` | Image repository |
| `otel-collector.presets.kubernetesAttributes.enabled` | bool | `true` | Enable Kubernetes attributes collection |
| `otel-collector.presets.logsCollection.enabled` | bool | `true` | Enable logs collection from Kubernetes pods |
| `otel-collector.presets.logsCollection.includeCollectorLogs` | bool | `false` | Include collector logs in the collection |
| `otel-collector.config` | object | See values.yaml for defaults | OpenTelemetry Collector configuration |
| `otel-collector.resources` | object | See values.yaml for defaults | Resources |
| `otel-collector.serviceAccount.create` | bool | `true` | Create a service account |
| `otel-collector.clusterRole.create` | bool | `true` | Create a cluster role |
| `otel-collector.clusterRole.rules` | list | See values.yaml for defaults | Cluster role rules |
| `otel-collector.tolerations` | list | See values.yaml for defaults | Tolerations to run on all nodes |
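Both OpenTelemetry pieces are disabled by default. A minimal sketch of a values file that enables the Netdata OpenTelemetry instance together with the collector (only keys from the two tables above are used; everything else keeps its default):

```yaml
netdataOpentelemetry:
  enabled: true
  service:
    type: ClusterIP
    port: 4317          # OTLP/gRPC port exposed by the service

otel-collector:
  enabled: true
  mode: daemonset       # one collector per node
  presets:
    logsCollection:
      enabled: true     # tail pod logs on each node
```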

Example to set the parameters from the command line:

helm install netdata ./netdata-helmchart/charts/netdata \
  --set notifications.slack.webhook_url=MySlackAPIURL \
  --set notifications.slack.recipient="@MyUser MyChannel"

Another example, to set a different ingress controller.

By default, the kubernetes.io/ingress.class annotation is set to use nginx as the ingress controller, but you can set Traefik as your ingress controller by overriding ingress.annotations:

helm install netdata ./netdata-helmchart/charts/netdata \
  --set 'ingress.annotations.kubernetes\.io/ingress\.class=traefik'

As an alternative to passing each variable on the command line, you can provide a YAML file that specifies the values for the parameters while installing the chart. For example:

helm install netdata ./netdata-helmchart/charts/netdata -f values.yaml

Tip: You can use the default values.yaml

Note: To opt out of anonymous statistics, set the DO_NOT_TRACK environment variable to a non-zero or non-empty value in the parent.env / child.env configuration (e.g. DO_NOT_TRACK: 1), or uncomment the corresponding line in values.yaml.
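In values-file form, that opt-out looks like this:

```yaml
# Disable anonymous statistics on both the parent and the children
parent:
  env:
    DO_NOT_TRACK: 1
child:
  env:
    DO_NOT_TRACK: 1
```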

Configuration files

| Parameter | Description | Default |
|-----------|-------------|---------|
| `parent.configs.netdata` | Contents of the parent's netdata.conf | memory mode = dbengine |
| `parent.configs.stream` | Contents of the parent's stream.conf | Store child data, accept all connections, and issue alarms for child data. |
| `parent.configs.health` | Contents of health_alarm_notify.conf | Email disabled, a sample of the required settings for Slack notifications |
| `parent.configs.exporting` | Contents of exporting.conf | Disabled |
| `k8sState.configs.netdata` | Contents of netdata.conf | No persistent storage, no alarms |
| `k8sState.configs.stream` | Contents of stream.conf | Send metrics to the parent at netdata:{{ service.port }} |
| `k8sState.configs.exporting` | Contents of exporting.conf | Disabled |
| `k8sState.configs.go.d` | Contents of go.d.conf | Only k8s_state enabled |
| `k8sState.configs.go.d-k8s_state` | Contents of go.d/k8s_state.conf | k8s_state configuration |
| `child.configs.netdata` | Contents of the child's netdata.conf | No persistent storage, no alarms, no UI |
| `child.configs.stream` | Contents of the child's stream.conf | Send metrics to the parent at netdata:{{ service.port }} |
| `child.configs.exporting` | Contents of the child's exporting.conf | Disabled |
| `child.configs.kubelet` | Contents of the child's go.d/k8s_kubelet.conf that drives the kubelet collector | Update metrics every second, do not retry to detect the endpoint, look for the kubelet metrics at http://127.0.0.1:10255/metrics |
| `child.configs.kubeproxy` | Contents of the child's go.d/k8s_kubeproxy.conf that drives the kubeproxy collector | Update metrics every second, do not retry to detect the endpoint, look for the kubeproxy metrics at http://127.0.0.1:10249/metrics |

To deploy additional netdata user configuration files, add similar entries to the parent.configs or child.configs maps. You can use the provided configurations as a reference, regardless of whether the files reside directly under /etc/netdata or in a subdirectory such as /etc/netdata/go.d. For example, the parent.configs map includes an example alarm that would be triggered if the python.d example module was enabled. Whenever a configuration contains sensitive data, such as database credentials, you can store it in a Kubernetes Secret by specifying storedType: secret in that configuration. By default, all configurations are placed in a Kubernetes ConfigMap.

Note that in this chart's default configuration, the parent performs the health checks and triggers alarms but collects little data. As a result, the only other configuration files that might make sense to add to the parent are the alarm and alarm template definitions, under /etc/netdata/health.d.
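As a sketch of the mechanism described above (the entry names, paths, and contents are illustrative, not shipped defaults; the entries assume the same enabled/path/data shape used by the configs in values.yaml), an extra health configuration on the parent and a secret-backed collector config on the children could look like this:

```yaml
parent:
  configs:
    example-health:                           # illustrative entry name
      enabled: true
      path: /etc/netdata/health.d/example.conf
      data: |
        # alarm and alarm template definitions go here

child:
  configs:
    example-secret:                           # illustrative entry name
      enabled: true
      storedType: secret                      # store in a Secret instead of the ConfigMap
      path: /etc/netdata/go.d/example.conf
      data: |
        # credentials or other sensitive collector settings go here
```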

Tip: Pay attention to the indentation of the config file contents, as it matters for parsing the YAML file. Note that the first line under data: |- must be indented with two more spaces relative to the preceding line:

  data: |-
    config line 1 # Needs those two extra spaces
      config line 2 # No problem indenting more here

Persistent volumes

By design, the parent node uses two different persistent volumes (not counting any ConfigMap/Secret mounts). Both can be used, but they don't have to be. Keep in mind that whenever a persistent volume is not used on the parent, all the data on that volume is lost if the pod is removed.

  1. database (/var/cache/netdata): all metrics data is stored here. The performance of this volume affects query timings.
  2. alarms (/var/lib/netdata): the alarm log. If it is not persistent, pod recreation results in the parent appearing as a new node in netdata.cloud (because ./registry/ and ./cloud.d/ are removed).

The child instances are a bit simpler. By default, the host directory /var/lib/netdata-k8s-child is mounted in each child at /var/lib/netdata. You can disable this, but persistence is pretty much required in a real-life scenario: without it, each pod deletion results in a new replication node appearing on the parent.

Service discovery and supported services

Netdata's service discovery, which is installed as part of the Helm chart installation, finds what services are running on a cluster's pods, converts that into configuration files, and exports them, so they can be monitored.

Applications

Service discovery currently supports the following applications via their associated collector:

Prometheus endpoints

Service discovery supports Prometheus endpoints via the Prometheus collector.

Annotations on pods allow a fine control of the scraping process:

  • prometheus.io/scrape: The default configuration will scrape all pods and, if set to false, this annotation excludes the pod from the scraping process.
  • prometheus.io/path: If the metrics path is not /metrics, define it with this annotation.
  • prometheus.io/port: Scrape the pod on the indicated port instead of the pod's declared ports.
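For example, a pod that exposes Prometheus metrics on a non-default path and port would carry annotations like these (the pod name, image, and port are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                 # illustrative name
  annotations:
    prometheus.io/scrape: "true"                 # explicitly opt in
    prometheus.io/path: "/actuator/prometheus"   # metrics path is not /metrics
    prometheus.io/port: "9100"                   # scrape this port
spec:
  containers:
    - name: my-app
      image: my-app:latest     # illustrative image
      ports:
        - containerPort: 9100
```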

Configure service discovery

If your cluster runs services on non-default ports or uses non-default names, you may need to configure service discovery to start collecting metrics from your services. You have to edit the default ConfigMap that is shipped with the Helm chart and deploy it to your cluster.

First, copy netdata-helmchart/sdconfig/child.yml to a new location outside the netdata-helmchart directory. The destination can be anywhere you like, but the following examples assume it resides next to the netdata-helmchart directory.

cp netdata-helmchart/sdconfig/child.yml .

Edit the new child.yml file according to your needs. See the Helm chart configuration and the file itself for details. You can then run helm install/helm upgrade with the --set-file argument to use your configured child.yml file instead of the default, changing the path if you copied it elsewhere.

helm install --set-file sd.child.configmap.from.value=./child.yml netdata ./netdata-helmchart/charts/netdata
helm upgrade --set-file sd.child.configmap.from.value=./child.yml netdata ./netdata-helmchart/charts/netdata

Now that you have pushed an edited ConfigMap to your cluster, service discovery should find and set up metrics collection from your non-default services.

Custom pod labels and annotations

Occasionally, you will want to add specific labels and annotations to the parent and/or child pods. You might want to do this to tell other applications on the cluster how to treat your pods, or simply to categorize applications on your cluster. You can label and annotate the parent and child pods by using the podLabels and podAnnotations dictionaries under the parent and child objects, respectively.

For example, suppose you're installing Netdata on all your database nodes, and you'd like the child pods to be labeled with workload: database so that you're able to recognize this.

At the same time, say you've configured chaoskube to kill all pods annotated with chaoskube.io/enabled: true, and you'd like chaoskube to be enabled for the parent pod but not the children.

You would do this by installing as:

$ helm install \
--set child.podLabels.workload=database \
--set 'child.podAnnotations.chaoskube\.io/enabled=false' \
--set 'parent.podAnnotations.chaoskube\.io/enabled=true' \
netdata ./netdata-helmchart/charts/netdata

Contributing

If you want to contribute, we're humbled!

