Amazon CloudWatch

Plugin: go.d.plugin Module: cloudwatch

Overview

Monitor AWS infrastructure through Amazon CloudWatch. This collector discovers CloudWatch metrics for a curated set of AWS services and renders them as Netdata charts, with minimal configuration.

Monitored services:

  • Amazon EC2 (compute)
  • Amazon RDS (relational databases)
  • Elastic Load Balancing -- Classic (ELB), Application (ALB), and Network (NLB) load balancers
  • Amazon S3 (object storage)
  • AWS Lambda (serverless functions)
  • Amazon SQS (message queues)
  • Amazon DynamoDB (NoSQL databases)
  • Amazon API Gateway (REST APIs)
  • AWS Step Functions (workflow orchestration)
  • NAT Gateway (VPC networking)
  • Amazon Kinesis Data Streams (streaming ingestion)
  • Amazon Data Firehose (delivery streams)
  • Amazon SNS (pub/sub messaging)
  • Amazon EBS (block storage volumes)
  • Amazon EFS (elastic file systems)
  • Amazon ECS (container services)
  • Amazon ElastiCache (in-memory cache)
  • Amazon OpenSearch Service (search and analytics)
  • Amazon DocumentDB (document database)
  • Amazon Redshift (data warehouse)
  • Amazon MSK (Kafka streaming)
  • Amazon CloudFront (content delivery network / CDN)
  • AWS Auto Scaling (EC2 Auto Scaling group capacity)
  • Amazon Bedrock (foundation-model invocations and tokens)
  • Amazon EventBridge (event rules)
  • AWS Site-to-Site VPN (VPN connections)
  • Amazon EKS (Kubernetes control plane: API server, scheduler, etcd)

Each service is defined by a profile -- a YAML file declaring its CloudWatch namespace, the metrics and statistics to collect, and a chart template -- so coverage can be extended without code changes.

Need a service that isn't listed?

Request a profile -- it's just a YAML file, no code change. Open a feature request and attach the service's CloudWatch metric schema, captured with this read-only command. It prints only metric and dimension names (no resource IDs, ARNs, or metric values), so the output is safe to share:

aws cloudwatch list-metrics --namespace "AWS/<Service>" --region <your-region> --output json \
| jq -c '[.Metrics[] | {metric: .MetricName, dimensions: ([.Dimensions[].Name] | sort)}] | unique'

Replace AWS/<Service> with the service namespace (for example AWS/AmazonMQ) and <your-region> with a Region where the service runs. The exact metrics and dimensions in the output are what we need to author a correct profile quickly.

The collector discovers available metrics with the CloudWatch ListMetrics API (one paginated call per selected service profile and region; the collector then keeps only the metrics whose dimension set matches each profile's instance dimensions) and queries them in batches with the GetMetricData API. Account identity is resolved once at startup via sts:GetCallerIdentity. Authentication uses the AWS SDK default credential chain, static access keys, or an assumed IAM role.
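The dimension-set filter described above can be illustrated with a minimal sketch. It assumes "matches" means the set of dimension names on a listed metric equals the profile's instance dimensions; the function name and sample data are hypothetical, and the collector's actual matching logic may differ.

```python
def match_profile_metrics(listed_metrics, instance_dimensions):
    """Keep metrics whose dimension names equal the profile's instance dimensions.

    Assumption: exact set equality on dimension names; this is an illustration,
    not the collector's implementation.
    """
    wanted = set(instance_dimensions)
    kept = []
    for metric in listed_metrics:
        names = {d["Name"] for d in metric.get("Dimensions", [])}
        if names == wanted:
            kept.append(metric)
    return kept


# Sample entries shaped like the "Metrics" array returned by ListMetrics.
listed = [
    {"Namespace": "AWS/EC2", "MetricName": "CPUUtilization",
     "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}]},
    {"Namespace": "AWS/EC2", "MetricName": "CPUUtilization",
     "Dimensions": [{"Name": "AutoScalingGroupName", "Value": "web"}]},
    {"Namespace": "AWS/EC2", "MetricName": "CPUUtilization", "Dimensions": []},
]

per_instance = match_profile_metrics(listed, ["InstanceId"])
print([m["Dimensions"][0]["Value"] for m in per_instance])
```

Only the per-instance entry survives; the Auto Scaling aggregate and the dimensionless aggregate of the same metric are dropped, which is why each chart instance maps to one resource.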

This collector is supported on all platforms.

This collector supports collecting metrics from multiple instances of this integration, including remote instances.

The configured IAM identity requires cloudwatch:ListMetrics, cloudwatch:GetMetricData, and sts:GetCallerIdentity. When auth.mode is assume_role, it also requires sts:AssumeRole.

Default Behavior

Auto-Detection

With profiles.mode: auto (default), the collector discovers metrics for all built-in service profiles across the configured regions and emits charts only for services that have live metrics. Discovery is cached and refreshed every discovery.refresh_every seconds (default 300).

Limits

  • Minimum collection interval is 60 seconds (CloudWatch's minimum metric period).
  • CloudWatch publishes metrics with a delay; the effective query offset is max(query_offset, period), so long-period metrics (such as the daily S3 storage metrics) are inherently about one period behind.
  • There is no cap on discovered resources; a warning is logged at 1000 or more discovered instances (collection is never truncated).
  • Resources are labeled by their identifying CloudWatch dimensions (for example EC2 instance_id), not by their Name tag or other resource tags; tag-based naming and filtering are not currently supported. (A dimension that is constant across resources, such as CloudFront's Region=Global, is used to match and query metrics but is not turned into a label.)
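The effective query offset from the limits above, max(query_offset, period), can be worked through in a short sketch. The helper names are hypothetical, and the single-period window is an assumption made for illustration; the collector's real windowing may differ.

```python
from datetime import datetime, timedelta, timezone


def effective_offset(query_offset, period):
    """Offset in seconds actually applied: never less than one metric period."""
    return max(query_offset, period)


def query_window(now, period, query_offset=600):
    """Return (start, end) of a one-period window ending effective_offset seconds ago.

    Assumption: a single-period window, used here only to show the arithmetic.
    """
    end = now - timedelta(seconds=effective_offset(query_offset, period))
    start = end - timedelta(seconds=period)
    return start, end


now = datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc)
five_min = query_window(now, period=300)    # offset stays at 600 s
daily = query_window(now, period=86400)     # offset grows to one day
print(five_min[1].isoformat())  # 2024-01-01T23:50:00+00:00
print(daily[1].isoformat())     # 2024-01-01T00:00:00+00:00
```

With the default query_offset of 600, a 5-minute metric is read 10 minutes behind, while a daily metric is read a full day behind regardless of query_offset.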

Performance Impact

AWS bills CloudWatch API usage. GetMetricData (the metric queries) is the cost driver, billed per metric requested; ListMetrics discovery is covered by the free tier first and costs a fraction as much beyond it. As a rough anchor, GetMetricData is billed at roughly $0.01 per 1,000 metrics requested -- confirm current CloudWatch pricing for your region. Each combination of instance, metric, and statistic is one billed query, run once per its own period (not once per collection cycle), so cost scales with discovered instances, metrics, statistics, and their periods -- not with update_every. The collector already minimizes cost with curated per-service profiles, single-statistic defaults, exact dimension filtering, cached discovery, and recently_active_only. To reduce cost further, restrict services with profiles.mode: exact or narrow the regions list.
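A back-of-the-envelope estimate of the GetMetricData cost model above, as a rough sketch. The price is the approximate anchor quoted in this section, not an authoritative rate; the 30-day month, the function name, and the example fleet (100 instances, 7 metrics each) are assumptions chosen for illustration.

```python
SECONDS_PER_30_DAYS = 30 * 24 * 3600


def monthly_getmetricdata_cost(instances, metrics, statistics, period_s,
                               price_per_1000=0.01):
    """Estimate GetMetricData cost in USD for a 30-day month.

    One billed metric per (instance, metric, statistic) per period.
    price_per_1000 is the rough anchor from the text; check current pricing.
    """
    queries_per_series = SECONDS_PER_30_DAYS // period_s
    billed_metrics = instances * metrics * statistics * queries_per_series
    return billed_metrics * price_per_1000 / 1000


# Hypothetical fleet: 100 instances, 7 metrics each, 1 statistic, 300-second period.
cost = monthly_getmetricdata_cost(instances=100, metrics=7, statistics=1, period_s=300)
print(f"{cost:.2f}")  # 60.48
```

Doubling the period halves the estimate, while changing update_every leaves it unchanged, which matches the scaling described above.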

Setup

You can configure the cloudwatch collector in two ways:

| Method | Best for | How to |
|---|---|---|
| UI | Fast setup without editing files | Go to Nodes → Configure this node → Collectors → Jobs, search for cloudwatch, then click + to add a job. |
| File | If you prefer configuring via file, or need to automate deployments (e.g., with Ansible) | Edit go.d/cloudwatch.conf and add a job. |

Important: UI configuration requires a paid Netdata Cloud plan.

Prerequisites

Create an AWS IAM identity with CloudWatch read access

The collector needs an IAM identity (user or role) allowed to read CloudWatch metrics and resolve the AWS account identity.

Attach a policy such as:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricData",
        "sts:GetCallerIdentity"
      ],
      "Resource": "*"
    }
  ]
}

cloudwatch:ListMetrics, cloudwatch:GetMetricData, and sts:GetCallerIdentity do not support resource-level permissions, so "Resource": "*" is required -- this is already least-privilege for these read actions. In assume_role mode, scope sts:AssumeRole to the specific role ARN(s) rather than *.

Then provide credentials with one of the auth.mode options:

  • default -- the AWS SDK default credential chain (environment variables, shared config/credentials files, EC2 instance profile, or EKS IRSA). Recommended when Netdata runs inside AWS.
  • access_key -- a static access key ID and secret access key.
  • assume_role -- assume an IAM role by ARN (add sts:AssumeRole to the base identity's policy).
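For assume_role mode, the extra statement on the base identity can be scoped to specific role ARNs rather than *. The sketch below only builds and prints such a policy document; the helper name is hypothetical, and the ARN is the placeholder used in this page's example configuration.

```python
import json


def assume_role_policy(role_arns):
    """Build a policy document allowing sts:AssumeRole on the given roles only."""
    arns = list(role_arns)
    if not arns or "*" in arns:
        raise ValueError("list the specific role ARN(s); do not use '*'")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": arns,
            }
        ],
    }


# Placeholder ARN; replace with the role the collector should assume.
policy = assume_role_policy(["arn:aws:iam::123456789012:role/netdata-cloudwatch"])
print(json.dumps(policy, indent=2))
```

The printed JSON can be attached to the base identity alongside its other permissions; the assumed role's trust policy must still permit that identity.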

Configuration

Options

The following options can be defined globally or per job.

Profile file locations:

| Type | Path |
|---|---|
| Stock profiles | /usr/lib/netdata/conf.d/go.d/cloudwatch.profiles/default/ |
| User overrides | /etc/netdata/go.d/cloudwatch.profiles/ |

A user profile file with the same basename as a stock profile overrides it.
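The override rule can be pictured as a basename lookup in which the user directory is read last. This is a minimal sketch run against a temporary directory rather than the real Netdata paths; the function name and the .yaml suffix are assumptions, and the collector's loader may behave differently in detail.

```python
import tempfile
from pathlib import Path


def resolve_profiles(stock_dir, user_dir):
    """Map each profile basename to the file that wins; user files override stock."""
    resolved = {}
    for directory in (Path(stock_dir), Path(user_dir)):  # user dir last, so it wins
        if not directory.is_dir():
            continue
        for path in sorted(directory.glob("*.yaml")):  # ".yaml" suffix is an assumption
            resolved[path.name] = path
    return resolved


# Demo in a throwaway directory instead of the real Netdata paths.
with tempfile.TemporaryDirectory() as tmp:
    stock = Path(tmp, "stock")
    user = Path(tmp, "user")
    stock.mkdir()
    user.mkdir()
    (stock / "ec2.yaml").write_text("# stock ec2 profile\n")
    (stock / "rds.yaml").write_text("# stock rds profile\n")
    (user / "ec2.yaml").write_text("# customized ec2 profile\n")
    sources = {name: p.parent.name for name, p in resolve_profiles(stock, user).items()}

print(sources)  # {'ec2.yaml': 'user', 'rds.yaml': 'stock'}
```

Because the user file replaces the stock file as a whole, a customized profile should start as a full copy of the stock one.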

Config options

| Group | Option | Description | Default | Required |
|---|---|---|---|---|
| Collection | update_every | Data collection interval (seconds). Must be at least 60 (CloudWatch's minimum period). | 60 | no |
| Collection | autodetection_retry | Recheck interval (seconds) when the job fails to start. Default 0 means no retry; set a positive value to keep retrying. | 0 | no |
| Collection | regions | List of AWS regions to collect from. At least one region is required; all regions must be in one AWS partition. | | yes |
| Collection | query_offset | Seconds subtracted from the current time when building query windows, to account for CloudWatch publish latency. The effective offset is max(query_offset, period). | 600 | no |
| Collection | timeout | Timeout for AWS API requests (seconds). | 30 | no |
| Authentication | auth.mode | Authentication method: default, access_key, or assume_role. | default | yes |
| Authentication | auth.mode_access_key.access_key_id | AWS access key ID (used in access_key mode). | | no |
| Authentication | auth.mode_access_key.secret_access_key | AWS secret access key (used in access_key mode). | | no |
| Authentication | auth.mode_access_key.session_token | Optional AWS session token for temporary credentials (used in access_key mode). | | no |
| Authentication | auth.mode_assume_role.roles | A single-element list with the IAM role to assume (used in assume_role mode); each entry has role_arn and an optional external_id. Exactly one role is supported per job -- to monitor multiple accounts, run one job per account/role. | | no |
| Profiles | profiles.mode | Profile selection: auto (default service profiles), exact (only the profiles you list, by basename), or combined (default profiles plus deep-grain per-target-group / per-operation / per-request-filter profiles). | auto | no |
| Profiles | profiles.mode_exact.entries | List of profiles to collect by basename (required when profiles.mode is exact). Each entry has a name, e.g. ec2 or alb_target. | | no |
| Discovery | discovery.refresh_every | How often (seconds) to re-discover metrics. Minimum 60. | 300 | no |
| Discovery | discovery.recently_active_only | List only metrics active in the last 3 hours. Automatically disabled for metrics whose period exceeds 3 hours (such as the daily S3 storage metrics). | yes | no |
| Virtual Node | vnode | Associates this data collection job with a Virtual Node. | | no |

auth.mode

Determines how the collector authenticates with AWS.

| Mode | When to use | Required options |
|---|---|---|
| default | Running inside AWS, or with credentials in the environment / shared config | None |
| access_key | Explicit static credentials | access_key_id, secret_access_key |
| assume_role | Assume an IAM role (cross-account or scoped access) | roles[].role_arn |

via UI

Configure the cloudwatch collector from the Netdata web interface:

  1. Go to Nodes.
  2. Select the node where you want the cloudwatch data-collection job to run and click Configure this node. That node will run the data collection.
  3. The Collectors → Jobs view opens by default.
  4. In the Search box, type cloudwatch (or scroll the list) to locate the cloudwatch collector.
  5. Click the + next to the cloudwatch collector to add a new job.
  6. Fill in the job fields, then click Test to verify the configuration and Submit to save.
    • Test runs the job with the provided settings and shows whether data can be collected.
    • If it fails, an error message appears with details (for example, connection refused, timeout, or command execution errors), so you can adjust and retest.

via File

The configuration file name for this integration is go.d/cloudwatch.conf.

The file format is YAML. Generally, the structure is:

update_every: 60
autodetection_retry: 0
jobs:
  - name: some_name1
  - name: some_name2

You can edit the configuration file using the edit-config script from the Netdata config directory.

cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config go.d/cloudwatch.conf

Examples

Default credentials, single region

Collect from us-east-1 using the AWS SDK default credential chain. Best when Netdata runs on an EC2 instance or in EKS with an attached IAM role.

Config
jobs:
  - name: default_credentials
    regions:
      - us-east-1
    auth:
      mode: default

Static access key, multiple regions

Collect from two regions using a static access key.

Config
jobs:
  - name: access_key
    regions:
      - us-east-1
      - eu-west-1
    auth:
      mode: access_key
      mode_access_key:
        access_key_id: "your-access-key-id"
        secret_access_key: "your-secret-access-key"

Assume an IAM role

Assume a CloudWatch read-only role, for example to collect from another account.

Config
jobs:
  - name: assume_role
    regions:
      - us-east-1
    auth:
      mode: assume_role
      mode_assume_role:
        roles:
          - role_arn: "arn:aws:iam::123456789012:role/netdata-cloudwatch"
            # external_id: "your-external-id" # add if the role's trust policy requires it

Specific services only

Collect only EC2 and RDS instead of auto-discovering all built-in services.

Config
jobs:
  - name: ec2_rds
    regions:
      - us-east-1
    profiles:
      mode: exact
      mode_exact:
        entries:
          - name: ec2
          - name: rds
    auth:
      mode: default

All services including deep-grain profiles

Use combined mode to also collect the opt-in deep-grain profiles (ALB target groups, DynamoDB operations, S3 request metrics).

Config
jobs:
  - name: combined
    regions:
      - us-east-1
    profiles:
      mode: combined
    auth:
      mode: default

Alerts

There are no alerts configured by default for this integration.

Metrics

Charts are generated at runtime from the active service profiles. Each discovered AWS resource becomes a chart instance identified by its account_id, region, and the service's own dimensions (for example instance_id for EC2, or bucket_name and storage_type for S3); its contexts live under the cloudwatch. namespace. All CloudWatch metrics appear on the node running the collector -- individual AWS resources are distinguished by labels, not as separate Netdata nodes. Because CloudWatch publishes with a delay, allow a few minutes for the first data points.

Key terms:

  • Namespace -- AWS's grouping for a service's metrics (e.g. AWS/EC2).
  • Dimension -- a name/value pair that identifies a resource within a namespace (e.g. InstanceId).
  • Statistic -- the CloudWatch aggregation applied per period (e.g. average, sum, maximum).
  • Profile -- the Netdata YAML file that maps a namespace's metrics to charts.
  • Partition -- an isolated AWS region group (standard aws, GovCloud aws-us-gov, or China aws-cn); all of a job's regions must share one.
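The single-partition rule can be checked before a job is configured. The sketch below infers the partition from the region name prefix, which covers only the three partitions named above; the function names are hypothetical, and the collector's own validation may be implemented differently.

```python
def partition_of(region):
    """Infer the AWS partition from a region name (simplified assumption)."""
    if region.startswith("us-gov-"):
        return "aws-us-gov"
    if region.startswith("cn-"):
        return "aws-cn"
    return "aws"


def check_single_partition(regions):
    """Return the shared partition, or raise if regions are empty or mixed."""
    if not regions:
        raise ValueError("at least one region is required")
    partitions = {partition_of(r) for r in regions}
    if len(partitions) > 1:
        raise ValueError(f"regions span multiple partitions: {sorted(partitions)}")
    return partitions.pop()


print(check_single_partition(["us-east-1", "eu-west-1"]))  # aws
print(check_single_partition(["us-gov-west-1"]))           # aws-us-gov
```

A list such as us-east-1 plus cn-north-1 raises an error here; in practice that case calls for two separate jobs, one per partition.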

The built-in profiles ship the following charts by default. Each service links to its profile -- the authoritative definition of its exact metrics, statistics, dimensions, and charts:

| Profile | Metric prefix | Description |
|---|---|---|
| Amazon EC2 | cloudwatch.ec2.* | CPU utilization, network traffic, disk operations, status-check failures |
| Amazon RDS | cloudwatch.rds.* | CPU utilization, database connections, freeable memory, free storage space, disk throughput, IOPS, latency |
| Classic Load Balancer (ELB) | cloudwatch.elb.* | request count, backend and load-balancer response codes, backend connection errors, latency, host count, spillover count |
| Application Load Balancer (ALB) | cloudwatch.alb.* | request count, target and load-balancer response codes, connection rate, active connections, processed traffic, target response time, consumed LCUs |
| Network Load Balancer (NLB) | cloudwatch.nlb.* | active and new flow counts, processed bytes and packets, consumed LCUs, TCP resets |
| Amazon S3 | cloudwatch.s3.* | bucket size, number of objects (daily storage metrics) |
| AWS Lambda | cloudwatch.lambda.* | invocations, errors and throttles, duration |
| Amazon SQS | cloudwatch.sqs.* | message throughput, empty receives, queue depth, age of oldest message, sent message size |
| Amazon DynamoDB | cloudwatch.dynamodb.* | consumed and provisioned capacity, throttle events |
| Amazon API Gateway | cloudwatch.api_gateway.* | requests, errors, latency |
| AWS Step Functions | cloudwatch.step_functions.* | executions, throttled events, execution time |
| NAT Gateway | cloudwatch.nat_gateway.* | traffic, active connections, connection rate, errors, idle timeouts |
| Amazon Kinesis Data Streams | cloudwatch.kinesis.* | data throughput, records, GetRecords iterator age, operation latency, throughput exceeded, PutRecords rejected |
| Amazon Data Firehose | cloudwatch.firehose.* | records, throughput, put requests, throttled records, S3 delivery freshness and success |
| Amazon SNS | cloudwatch.sns.* | messages published, notifications, published message size |
| Amazon EBS | cloudwatch.ebs.* | volume throughput, IOPS, queue length, idle time, burst balance |
| Amazon EFS | cloudwatch.efs.* | I/O throughput, metered vs permitted throughput, percent I/O limit, burst credit balance, client connections |
| Amazon ECS | cloudwatch.ecs.* | service utilization, EBS filesystem utilization, live task count |
| Amazon ElastiCache | cloudwatch.elasticache.* | CPU utilization, memory, database memory usage, current and new connections, cache hits and misses, evictions, network traffic |
| Amazon OpenSearch Service | cloudwatch.opensearch.* | cluster status, index writes blocked, nodes, CPU utilization, JVM memory pressure, free storage space, search and indexing rate, search and indexing latency |
| Amazon DocumentDB | cloudwatch.docdb.* | CPU utilization, freeable memory, connections, buffer cache hit ratio, disk IOPS, latency, throughput, replica lag, cursors timed out |
| Amazon Redshift | cloudwatch.redshift.* | health, CPU utilization, disk space used, database connections, disk IOPS, throughput, network throughput |
| Amazon MSK | cloudwatch.msk.* | broker throughput, messages in, CPU, disk used, memory, partitions, connections |
| Amazon CloudFront | cloudwatch.cloudfront.* | requests, downloaded and uploaded traffic, total/4xx/5xx error rates |
| AWS Auto Scaling | cloudwatch.auto_scaling.* | group sizing (min/max/desired/total) and instances by state (in-service, pending, standby, terminating) |
| Amazon Bedrock | cloudwatch.bedrock.* | invocations, invocation errors, token throughput, invocation and time-to-first-token latency |
| Amazon EventBridge | cloudwatch.eventbridge.* | target invocations, rule activity (matched events, triggered rules), ingestion-to-invocation latency |
| AWS Site-to-Site VPN | cloudwatch.vpn.* | tunnel traffic (in/out) and tunnel state (fraction of tunnels up) |
| Amazon EKS | cloudwatch.eks.* | control-plane health: API server request rate, errors, p99 latency, and in-flight requests; etcd database size; scheduler pending pods and scheduling attempts |

Each profile also carries optional metrics that are commented out to keep cost and cardinality low; uncomment a metric and its matching chart, then restart the Netdata Agent (profiles are loaded once per go.d process and cached). Stock profiles are shipped at /usr/lib/netdata/conf.d/go.d/cloudwatch.profiles/default/. To customize a service, copy its profile into /etc/netdata/go.d/cloudwatch.profiles/ (keep the same filename) and edit it -- a user profile fully replaces the stock one of the same name -- then restart the Agent.

With profiles.mode: combined, these deep-grain profiles are collected in addition to the defaults:

| Profile | Metric prefix | Description |
|---|---|---|
| ALB Target Groups | cloudwatch.alb_target.* | per-target-group host count, requests per target, response time, response codes, connection errors |
| DynamoDB Operations | cloudwatch.dynamodb_operation.* | per-operation successful request latency, system errors, throttled requests, returned items |
| S3 Request Metrics | cloudwatch.s3_requests.* | requests, request errors, request latency, request data transfer |

These deep-grain profiles are the highest-cardinality data the collector emits. S3 Request Metrics additionally require per-bucket request-metrics configuration in AWS and are billed at CloudWatch custom-metric rates; they collect nothing until enabled on the bucket.

Troubleshooting

Debug Mode

Important: Debug mode is not supported for data collection jobs created via the UI using the Dyncfg feature.

To troubleshoot issues with the cloudwatch collector, run the go.d.plugin with the debug option enabled. The output should give you clues as to why the collector isn't working.

  • Navigate to the plugins.d directory, usually at /usr/libexec/netdata/plugins.d/. If that's not the case on your system, open netdata.conf and look for the plugins setting under [directories].

    cd /usr/libexec/netdata/plugins.d/
  • Switch to the netdata user.

    sudo -u netdata -s
  • Run the go.d.plugin to debug the collector:

    ./go.d.plugin -d -m cloudwatch

    To debug a specific job:

    ./go.d.plugin -d -m cloudwatch -j jobName

Getting Logs

If you're encountering problems with the cloudwatch collector, follow these steps to retrieve logs and identify potential issues:

  • Run the command specific to your system (systemd, non-systemd, or Docker container).
  • Examine the output for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.

System with systemd

Use the following command to view logs generated since the last Netdata service restart:

journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep cloudwatch

System without systemd

Locate the collector log file, typically at /var/log/netdata/collector.log, and use grep to filter for the collector's name:

grep cloudwatch /var/log/netdata/collector.log

Note: This method shows logs from all restarts. Focus on the latest entries for troubleshooting current issues.

Docker Container

If your Netdata runs in a Docker container named "netdata" (replace if different), use this command:

docker logs netdata 2>&1 | grep cloudwatch

No metrics are collected

Check the following:

  • Permissions -- the IAM identity allows cloudwatch:ListMetrics, cloudwatch:GetMetricData, and sts:GetCallerIdentity (plus sts:AssumeRole in assume_role mode).
  • Regions -- the regions list includes the regions where your resources run. Some services are global and report to a single region: Amazon CloudFront publishes its CloudWatch metrics only in us-east-1 (with a constant Region=Global), so regions must include us-east-1 to collect it.
  • Resources are active -- confirm in the AWS CloudWatch console that the resources are publishing metrics.
  • Collector logs -- check for authentication or API errors:
    # systemd
    journalctl -u netdata --namespace=netdata --grep cloudwatch --since "5 minutes ago"
    # non-systemd
    grep cloudwatch /var/log/netdata/collector.log

Missing metrics for some services

  • Profile mode -- ensure profiles.mode: auto (default), or that the service's profile basename is listed under profiles.mode_exact.entries.
  • Daily metrics -- S3 storage metrics are published once per day. They are inherently delayed by about a day, and recently_active_only is automatically disabled for them.
  • Resource activity -- some metrics only appear when the resource is actively processing data (for example, EventBridge and Bedrock publish a metric only when its value is non-zero).
  • Auto Scaling group metrics -- these metrics (cloudwatch.auto_scaling.*) are not published until group-metrics collection is enabled on the group (for example, aws autoscaling enable-metrics-collection --auto-scaling-group-name <group-name> --granularity 1Minute). Amazon EKS managed node groups have it enabled by default.
  • EKS control-plane metrics -- EKS control-plane metrics (cloudwatch.eks.*) are published to the AWS/EKS namespace automatically, at no additional EKS charge, only for clusters running Kubernetes 1.28 or later; older clusters do not report them. These are distinct from Container Insights / the CloudWatch Observability add-on (agent-based, billed separately).

Charts have gaps or incomplete data

CloudWatch publishes metrics with a delay.

  • The collector uses query_offset (default 600 seconds), and the effective offset is at least one full metric period.
  • If charts still have gaps, increase query_offset.

Access denied or authentication errors

  • Verify the credentials selected by auth.mode are valid and not expired.
  • For assume_role, confirm the base identity is allowed to sts:AssumeRole the target role and that the role's trust policy permits it.
  • For AWS GovCloud or China partitions, ensure every region in regions belongs to the same partition.
