Consul
Plugin: go.d.plugin Module: consul
Overview
This collector monitors key metrics of Consul Agents: transaction timings, leadership changes, memory usage and more.
It periodically sends HTTP requests to the Consul REST API.
This collector is supported on all platforms.
This collector supports collecting metrics from multiple instances of this integration, including remote instances.
Default Behavior
Auto-Detection
This collector discovers instances running on the local host that provide metrics on port 8500.
On startup, it tries to collect metrics from:
- http://localhost:8500
- http://127.0.0.1:8500
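To confirm that an agent actually exposes telemetry before relying on auto-detection, you can query its metrics endpoint by hand. This is only a verification sketch; the address is an example, and the endpoint returns data only once telemetry is enabled (see Prerequisites below):

```bash
# Request Prometheus-formatted metrics from the local Consul agent.
# An empty or error response usually means prometheus_retention_time is still 0.
curl "http://127.0.0.1:8500/v1/agent/metrics?format=prometheus"
```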
Limits
The default configuration for this integration does not impose any limits on data collection.
Performance Impact
The default configuration for this integration is not expected to impose a significant performance impact on the system.
Metrics
Metrics grouped by scope.
The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
The set of metrics depends on the Consul Agent mode.
Per Consul instance
These metrics refer to the entire monitored application.
This scope has no labels.
Metrics:
Metric | Dimensions | Unit | Leader | Follower | Client |
---|---|---|---|---|---|
consul.client_rpc_requests_rate | rpc | requests/s | • | • | • |
consul.client_rpc_requests_exceeded_rate | exceeded | requests/s | • | • | • |
consul.client_rpc_requests_failed_rate | failed | requests/s | • | • | • |
consul.memory_allocated | allocated | bytes | • | • | • |
consul.memory_sys | sys | bytes | • | • | • |
consul.gc_pause_time | gc_pause | seconds | • | • | • |
consul.kvs_apply_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms | • | • | |
consul.kvs_apply_operations_rate | kvs_apply | ops/s | • | • | |
consul.txn_apply_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms | • | • | |
consul.txn_apply_operations_rate | txn_apply | ops/s | • | • | |
consul.autopilot_health_status | healthy, unhealthy | status | • | • | |
consul.autopilot_failure_tolerance | failure_tolerance | servers | • | • | |
consul.autopilot_server_health_status | healthy, unhealthy | status | • | • | |
consul.autopilot_server_stable_time | stable | seconds | • | • | |
consul.autopilot_server_serf_status | active, failed, left, none | status | • | • | |
consul.autopilot_server_voter_status | voter, not_voter | status | • | • | |
consul.network_lan_rtt | min, max, avg | ms | • | • | |
consul.raft_commit_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms | • | | |
consul.raft_commits_rate | commits | commits/s | • | | |
consul.raft_leader_last_contact_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms | • | | |
consul.raft_leader_oldest_log_age | oldest_log_age | seconds | • | | |
consul.raft_follower_last_contact_leader_time | leader_last_contact | ms | | • | |
consul.raft_rpc_install_snapshot_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms | | • | |
consul.raft_leader_elections_rate | leader | elections/s | • | • | |
consul.raft_leadership_transitions_rate | leadership | transitions/s | • | • | |
consul.server_leadership_status | leader, not_leader | status | • | • | |
consul.raft_thread_main_saturation_perc | quantile_0.5, quantile_0.9, quantile_0.99 | percentage | • | • | |
consul.raft_thread_fsm_saturation_perc | quantile_0.5, quantile_0.9, quantile_0.99 | percentage | • | • | |
consul.raft_fsm_last_restore_duration | last_restore_duration | ms | • | • | |
consul.raft_boltdb_freelist_bytes | freelist | bytes | • | • | |
consul.raft_boltdb_logs_per_batch_rate | written | logs/s | • | • | |
consul.raft_boltdb_store_logs_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms | • | • | |
consul.license_expiration_time | license_expiration | seconds | • | • | • |
Per node check
Metrics about health checks at the node level.
Labels:
Label | Description |
---|---|
datacenter | Datacenter Identifier |
node_name | The node's name |
check_name | The check's name |
Metrics:
Metric | Dimensions | Unit | Leader | Follower | Client |
---|---|---|---|---|---|
consul.node_health_check_status | passing, maintenance, warning, critical | status | • | • | • |
Per service check
Metrics about health checks at the service level.
Labels:
Label | Description |
---|---|
datacenter | Datacenter Identifier |
node_name | The node's name |
check_name | The check's name |
service_name | The service's name |
Metrics:
Metric | Dimensions | Unit | Leader | Follower | Client |
---|---|---|---|---|---|
consul.service_health_check_status | passing, maintenance, warning, critical | status | • | • | • |
Alerts
The following alerts are available:
Alert name | On metric | Description |
---|---|---|
consul_node_health_check_status | consul.node_health_check_status | node health check ${label:check_name} has failed on server ${label:node_name} datacenter ${label:datacenter} |
consul_service_health_check_status | consul.service_health_check_status | service health check ${label:check_name} for service ${label:service_name} has failed on server ${label:node_name} datacenter ${label:datacenter} |
consul_client_rpc_requests_exceeded | consul.client_rpc_requests_exceeded_rate | number of rate-limited RPC requests made by server ${label:node_name} datacenter ${label:datacenter} |
consul_client_rpc_requests_failed | consul.client_rpc_requests_failed_rate | number of failed RPC requests made by server ${label:node_name} datacenter ${label:datacenter} |
consul_gc_pause_time | consul.gc_pause_time | time spent in stop-the-world garbage collection pauses on server ${label:node_name} datacenter ${label:datacenter} |
consul_autopilot_health_status | consul.autopilot_health_status | datacenter ${label:datacenter} cluster is unhealthy as reported by server ${label:node_name} |
consul_autopilot_server_health_status | consul.autopilot_server_health_status | server ${label:node_name} from datacenter ${label:datacenter} is unhealthy |
consul_raft_leader_last_contact_time | consul.raft_leader_last_contact_time | median time elapsed since leader server ${label:node_name} datacenter ${label:datacenter} was last able to contact the follower nodes |
consul_raft_leadership_transitions | consul.raft_leadership_transitions_rate | there has been a leadership change and server ${label:node_name} datacenter ${label:datacenter} has become the leader |
consul_raft_thread_main_saturation | consul.raft_thread_main_saturation_perc | average saturation of the main Raft goroutine on server ${label:node_name} datacenter ${label:datacenter} |
consul_raft_thread_fsm_saturation | consul.raft_thread_fsm_saturation_perc | average saturation of the FSM Raft goroutine on server ${label:node_name} datacenter ${label:datacenter} |
consul_license_expiration_time | consul.license_expiration_time | Consul Enterprise license expiration time on node ${label:node_name} datacenter ${label:datacenter} |
Setup
Prerequisites
Enable Prometheus telemetry
Enable telemetry on your Consul Agent by increasing the value of `prometheus_retention_time` from `0`.
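As an illustration, a minimal agent configuration fragment enabling the Prometheus endpoint could look like the snippet below; the file name and the 360h retention value are arbitrary examples, and the agent needs a restart for the change to apply:

```hcl
# Example Consul agent configuration (e.g. /etc/consul.d/telemetry.hcl).
# Any non-zero retention enables the Prometheus output of /v1/agent/metrics,
# which this collector reads.
telemetry {
  prometheus_retention_time = "360h"
}
```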
Add required ACLs to Token
Required only if authentication is enabled.
ACL | Endpoint |
---|---|
operator:read | autopilot health status |
node:read | checks |
agent:read | configuration, metrics, and lan coordinates |
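If ACLs are enforced on your cluster, one way to obtain a suitable token is to create a read-only policy covering the endpoints above and attach it to a new token. The policy name, description, and rules below are an illustrative sketch, not an official recipe:

```bash
# Create a policy that grants the reads listed above (operator, node, agent).
consul acl policy create -name "netdata-read" -rules '
operator = "read"
agent_prefix "" { policy = "read" }
node_prefix ""  { policy = "read" }
'

# Create a token carrying that policy; use its SecretID as the collector's acl_token.
consul acl token create -description "netdata consul collector" -policy-name "netdata-read"
```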
Configuration
File
The configuration file name for this integration is `go.d/consul.conf`.
You can edit the configuration file using the `edit-config` script from the Netdata config directory.

```bash
cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config go.d/consul.conf
```
Options
The following options can be defined globally: update_every, autodetection_retry.
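As a sketch of how the global options relate to per-job settings (the values here are examples only), the top of `go.d/consul.conf` could look like this:

```yaml
# Top-level values apply to every job unless a job overrides them.
update_every: 5          # collect every 5 seconds instead of the default 1
autodetection_retry: 60  # retry failed auto-detection every 60 seconds

jobs:
  - name: local
    url: http://127.0.0.1:8500
```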
All options
Name | Description | Default | Required |
---|---|---|---|
update_every | Data collection frequency. | 1 | no |
autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |
url | Server URL. | http://localhost:8500 | yes |
acl_token | ACL token used in every request. | | no |
max_checks | Checks processing/charting limit. | | no |
max_filter | Checks processing/charting filter. Uses simple patterns. | | no |
username | Username for basic HTTP authentication. | | no |
password | Password for basic HTTP authentication. | | no |
proxy_url | Proxy URL. | | no |
proxy_username | Username for proxy basic HTTP authentication. | | no |
proxy_password | Password for proxy basic HTTP authentication. | | no |
timeout | HTTP request timeout. | 1 | no |
method | HTTP request method. | GET | no |
body | HTTP request body. | | no |
headers | HTTP request headers. | | no |
not_follow_redirects | Redirect handling policy. Controls whether the client follows redirects. | no | no |
tls_skip_verify | Server certificate chain and hostname validation policy. Controls whether the client performs this check. | no | no |
tls_ca | Certification authority that the client uses when verifying the server's certificates. | | no |
tls_cert | Client TLS certificate. | | no |
tls_key | Client TLS key. | | no |
Examples
Basic
An example configuration.
```yaml
jobs:
  - name: local
    url: http://127.0.0.1:8500
    acl_token: "ec15675e-2999-d789-832e-8c4794daa8d7"
```
Basic HTTP auth
Local server with basic HTTP authentication.
Config
```yaml
jobs:
  - name: local
    url: http://127.0.0.1:8500
    acl_token: "ec15675e-2999-d789-832e-8c4794daa8d7"
    username: foo
    password: bar
```
Multi-instance
Note: When you define multiple jobs, their names must be unique.
Collecting metrics from local and remote instances.
Config
```yaml
jobs:
  - name: local
    url: http://127.0.0.1:8500
    acl_token: "ec15675e-2999-d789-832e-8c4794daa8d7"

  - name: remote
    url: http://203.0.113.10:8500
    acl_token: "ada7f751-f654-8872-7f93-498e799158b6"
```
Troubleshooting
Debug Mode
Important: Debug mode is not supported for data collection jobs created via the UI using the Dyncfg feature.
To troubleshoot issues with the `consul` collector, run the `go.d.plugin` with the debug option enabled. The output should give you clues as to why the collector isn't working.
1. Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.

   ```bash
   cd /usr/libexec/netdata/plugins.d/
   ```

2. Switch to the `netdata` user.

   ```bash
   sudo -u netdata -s
   ```

3. Run the `go.d.plugin` to debug the collector:

   ```bash
   ./go.d.plugin -d -m consul
   ```
Getting Logs
If you're encountering problems with the `consul` collector, follow these steps to retrieve logs and identify potential issues:
- Run the command specific to your system (systemd, non-systemd, or Docker container).
- Examine the output for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.
System with systemd
Use the following command to view logs generated since the last Netdata service restart:
```bash
journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep consul
```
System without systemd
Locate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for the collector's name:

```bash
grep consul /var/log/netdata/collector.log
```
Note: This method shows logs from all restarts. Focus on the latest entries for troubleshooting current issues.
Docker Container
If your Netdata runs in a Docker container named "netdata" (replace if different), use this command:
```bash
docker logs netdata 2>&1 | grep consul
```