Amazon CloudWatch
Plugin: go.d.plugin Module: cloudwatch
Overview
Monitor AWS infrastructure through Amazon CloudWatch. This collector discovers CloudWatch metrics for a curated set of AWS services and renders them as Netdata charts, with minimal configuration.
Monitored services:
- Amazon EC2 (compute)
- Amazon RDS (relational databases)
- Elastic Load Balancing -- Classic (ELB), Application (ALB), and Network (NLB) load balancers
- Amazon S3 (object storage)
- AWS Lambda (serverless functions)
- Amazon SQS (message queues)
- Amazon DynamoDB (NoSQL databases)
- Amazon API Gateway (REST APIs)
- AWS Step Functions (workflow orchestration)
- NAT Gateway (VPC networking)
- Amazon Kinesis Data Streams (streaming ingestion)
- Amazon Data Firehose (delivery streams)
- Amazon SNS (pub/sub messaging)
- Amazon EBS (block storage volumes)
- Amazon EFS (elastic file systems)
- Amazon ECS (container services)
- Amazon ElastiCache (in-memory cache)
- Amazon OpenSearch Service (search and analytics)
- Amazon DocumentDB (document database)
- Amazon Redshift (data warehouse)
- Amazon MSK (Kafka streaming)
- Amazon CloudFront (content delivery network / CDN)
- AWS Auto Scaling (EC2 Auto Scaling group capacity)
- Amazon Bedrock (foundation-model invocations and tokens)
- Amazon EventBridge (event rules)
- AWS Site-to-Site VPN (VPN connections)
- Amazon EKS (Kubernetes control plane: API server, scheduler, etcd)
Each service is defined by a profile -- a YAML file declaring its CloudWatch namespace, the metrics and statistics to collect, and a chart template -- so coverage can be extended without code changes.
Request a profile -- it's just a YAML file, no code change. Open a feature request and attach the service's CloudWatch metric schema, captured with this read-only command. It prints only metric and dimension names (no resource IDs, ARNs, or metric values), so the output is safe to share:
aws cloudwatch list-metrics --namespace "AWS/<Service>" --region <your-region> --output json \
| jq -c '[.Metrics[] | {metric: .MetricName, dimensions: ([.Dimensions[].Name] | sort)}] | unique'
Replace AWS/<Service> with the service namespace (for example AWS/AmazonMQ) and <your-region> with a Region where the service runs. The exact metrics and dimensions in the output are what we need to author a correct profile quickly.
The collector discovers available metrics with the CloudWatch ListMetrics API (one paginated call per selected service profile and region; the collector then keeps only the metrics whose dimension set matches each profile's instance dimensions) and queries them in batches with the GetMetricData API. Account identity is resolved once at startup via sts:GetCallerIdentity. Authentication uses the AWS SDK default credential chain, static access keys, or an assumed IAM role.
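The dimension-matching step described above can be illustrated with a minimal Python sketch. It assumes that "matches" means the metric's dimension names equal the profile's instance dimensions exactly; the function name `matches_profile` and the sample records are hypothetical, and the collector's real matching logic may differ (for example in how it treats constant dimensions).

```python
from typing import Iterable


def matches_profile(metric_dimensions: Iterable[str], instance_dimensions: Iterable[str]) -> bool:
    """Keep a metric only when its dimension names equal the profile's set (assumed rule)."""
    return set(metric_dimensions) == set(instance_dimensions)


# Hypothetical ListMetrics results for AWS/EC2, reduced to names only.
listed = [
    {"metric": "CPUUtilization", "dimensions": ["InstanceId"]},
    {"metric": "CPUUtilization", "dimensions": ["AutoScalingGroupName"]},
    {"metric": "CPUUtilization", "dimensions": []},
    {"metric": "NetworkIn", "dimensions": ["InstanceId"]},
]

# A profile whose instance dimension is InstanceId keeps only per-instance metrics.
kept = [m for m in listed if matches_profile(m["dimensions"], ["InstanceId"])]
print([m["metric"] for m in kept])
```

Under this assumption, aggregate series (no dimensions, or grouped by Auto Scaling group) are dropped, so each kept metric maps to exactly one resource.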
This collector is supported on all platforms.
This collector supports collecting metrics from multiple instances of this integration, including remote instances.
The configured IAM identity requires cloudwatch:ListMetrics, cloudwatch:GetMetricData, and sts:GetCallerIdentity. When auth.mode is assume_role, it also requires sts:AssumeRole.
Default Behavior
Auto-Detection
With profiles.mode: auto (default), the collector discovers metrics for all built-in service profiles across the configured regions and emits charts only for services that have live metrics. Discovery is cached and refreshed every discovery.refresh_every seconds (default 300).
Limits
- Minimum collection interval is 60 seconds (CloudWatch's minimum metric period).
- CloudWatch publishes metrics with a delay; the effective query offset is max(query_offset, period), so long-period metrics (such as the daily S3 storage metrics) are inherently about one period behind.
- There is no cap on discovered resources; a warning is logged at 1000 or more discovered instances (collection is never truncated).
- Resources are labeled by their identifying CloudWatch dimensions (for example EC2 instance_id), not by their Name tag or other resource tags; tag-based naming and filtering are not currently supported. (A dimension that is constant across resources, such as CloudFront's Region=Global, is used to match and query metrics but is not turned into a label.)
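To make the effective offset concrete, here is a small Python sketch of how a query window could be derived from max(query_offset, period). The function name `query_window` and the choice of a single period-long window are assumptions made for illustration, not the collector's actual implementation.

```python
from datetime import datetime, timedelta, timezone


def query_window(now: datetime, period: int, query_offset: int = 600) -> tuple[datetime, datetime]:
    """Return (start, end) of one period-long window, shifted back by the effective offset."""
    effective_offset = max(query_offset, period)  # the rule stated in the Limits list
    end = now - timedelta(seconds=effective_offset)
    start = end - timedelta(seconds=period)  # assumption: the window spans exactly one period
    return start, end


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
for period in (60, 86400):  # a 1-minute metric and a daily metric
    start, end = query_window(now, period)
    print(period, start.isoformat(), end.isoformat())
```

With the default query_offset of 600, a 60-second metric is read about ten minutes behind real time, while a daily metric is read a full day behind because its period exceeds the offset.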
Performance Impact
AWS bills CloudWatch API usage. GetMetricData (the metric queries) is the cost driver, billed per metric requested; ListMetrics discovery falls under the free tier and then costs a fraction as much. As a rough anchor, GetMetricData is billed at roughly $0.01 per 1,000 metrics requested -- confirm current CloudWatch pricing for your region. Each combination of instance, metric, and statistic is one billed query, run once per its own period (not once per collection cycle), so cost scales with discovered instances, metrics, statistics, and their periods -- not with update_every. The collector already minimizes it with curated per-service profiles, single-statistic defaults, exact dimension filtering, cached discovery, and recently_active_only. To reduce it further, restrict services with profiles.mode: exact or narrow regions.
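As a back-of-the-envelope aid, the scaling rule above can be turned into a rough estimator. This is a sketch only: it uses the approximate $0.01 per 1,000 metrics figure quoted above, assumes a 30-day month, and the instance and metric counts are made-up inputs rather than what any profile actually collects.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month, an approximation


def monthly_queries(instances: int, metrics: int, statistics: int, period: int) -> int:
    """One billed query per instance x metric x statistic, once per period."""
    periods_per_month = SECONDS_PER_MONTH // period
    return instances * metrics * statistics * periods_per_month


def monthly_cost_usd(queries: int, usd_per_1000: float = 0.01) -> float:
    """Rough GetMetricData cost; usd_per_1000 is the approximate rate quoted above."""
    return queries / 1000 * usd_per_1000


# Hypothetical fleets: 100 instances with five 60-second metrics,
# and 50 buckets with two daily metrics.
ec2 = monthly_queries(instances=100, metrics=5, statistics=1, period=60)
s3 = monthly_queries(instances=50, metrics=2, statistics=1, period=86400)
print(f"EC2: {ec2:,} queries, about ${monthly_cost_usd(ec2):.2f} per month")
print(f"S3:  {s3:,} queries, about ${monthly_cost_usd(s3):.2f} per month")
```

The comparison shows why the period matters far more than update_every: the daily metrics contribute only a few thousand queries per month, while minute-level metrics dominate the bill.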
Setup
You can configure the cloudwatch collector in two ways:
| Method | Best for | How to |
|---|---|---|
| UI | Fast setup without editing files | Go to Nodes → Configure this node → Collectors → Jobs, search for cloudwatch, then click + to add a job. |
| File | If you prefer configuring via file, or need to automate deployments (e.g., with Ansible) | Edit go.d/cloudwatch.conf and add a job. |
UI configuration requires a paid Netdata Cloud plan.
Prerequisites
Create an AWS IAM identity with CloudWatch read access
The collector needs an IAM identity (user or role) allowed to read CloudWatch metrics and resolve the AWS account identity.
Attach a policy such as:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricData",
        "sts:GetCallerIdentity"
      ],
      "Resource": "*"
    }
  ]
}
cloudwatch:ListMetrics, cloudwatch:GetMetricData, and sts:GetCallerIdentity do not support resource-level permissions, so "Resource": "*" is required -- this is already least-privilege for these read actions. In assume_role mode, scope sts:AssumeRole to the specific role ARN(s) rather than *.
Then provide credentials with one of the auth.mode options:
- default -- the AWS SDK default credential chain (environment variables, shared config/credentials files, EC2 instance profile, or EKS IRSA). Recommended when Netdata runs inside AWS.
- access_key -- a static access key ID and secret access key.
- assume_role -- assume an IAM role by ARN (add sts:AssumeRole to the base identity's policy).
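For assume_role mode, the extra statement on the base identity could look like the output of the following Python sketch. The helper name `assume_role_policy` is hypothetical and the role ARN is only the placeholder used in the examples on this page; adapt both to your environment.

```python
import json


def assume_role_policy(role_arns: list[str]) -> dict:
    """Build an IAM policy that allows sts:AssumeRole only on the listed role ARNs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": list(role_arns),  # specific ARNs rather than "*"
            }
        ],
    }


# Placeholder ARN; replace with the role the collector should assume.
policy = assume_role_policy(["arn:aws:iam::123456789012:role/netdata-cloudwatch"])
print(json.dumps(policy, indent=2))
```

The target role's trust policy must separately allow the base identity to assume it; this sketch covers only the caller side.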
Configuration
Options
The following options can be defined globally or per job.
Profile file locations:
| Type | Path |
|---|---|
| Stock profiles | /usr/lib/netdata/conf.d/go.d/cloudwatch.profiles/default/ |
| User overrides | /etc/netdata/go.d/cloudwatch.profiles/ |
A user profile file with the same basename as a stock profile overrides it.
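The override rule can be pictured with a short Python sketch that resolves profiles from two directories, letting user files win by basename. It is illustrative only: the `.yaml` extension, the function name `resolve_profiles`, and the temporary directories standing in for the stock and user paths are assumptions.

```python
import tempfile
from pathlib import Path


def resolve_profiles(stock_dir: Path, user_dir: Path) -> dict[str, Path]:
    """Map profile basename -> file path; user files replace stock files of the same name."""
    resolved: dict[str, Path] = {}
    for directory in (stock_dir, user_dir):  # user directory is read last, so it wins
        if not directory.is_dir():
            continue
        for path in sorted(directory.glob("*.yaml")):  # assumed file extension
            resolved[path.stem] = path
    return resolved


# Demo with temporary directories in place of the real stock and user locations.
with tempfile.TemporaryDirectory() as tmp:
    stock, user = Path(tmp, "stock"), Path(tmp, "user")
    stock.mkdir()
    user.mkdir()
    for name in ("ec2.yaml", "rds.yaml"):
        (stock / name).write_text("# stock profile\n")
    (user / "ec2.yaml").write_text("# user override\n")
    resolved = resolve_profiles(stock, user)
    sources = {name: path.parent.name for name, path in resolved.items()}

print(sources)
```

In the demo, ec2 resolves to the user copy while rds still comes from the stock directory, which mirrors the whole-file replacement described above.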
Config options
| Group | Option | Description | Default | Required |
|---|---|---|---|---|
| Collection | update_every | Data collection interval (seconds). Must be at least 60 (CloudWatch's minimum period). | 60 | no |
| | autodetection_retry | Recheck interval (seconds) when the job fails to start. Default 0 means no retry; set a positive value to keep retrying. | 0 | no |
| | regions | List of AWS regions to collect from. At least one region is required; all regions must be in one AWS partition. | | yes |
| | query_offset | Seconds subtracted from the current time when building query windows, to account for CloudWatch publish latency. The effective offset is max(query_offset, period). | 600 | no |
| | timeout | Timeout for AWS API requests (seconds). | 30 | no |
| Authentication | auth.mode | Authentication method: default, access_key, or assume_role. | default | yes |
| | auth.mode_access_key.access_key_id | AWS access key ID (used in access_key mode). | | no |
| | auth.mode_access_key.secret_access_key | AWS secret access key (used in access_key mode). | | no |
| | auth.mode_access_key.session_token | Optional AWS session token for temporary credentials (used in access_key mode). | | no |
| | auth.mode_assume_role.roles | A single-element list with the IAM role to assume (used in assume_role mode); each entry has role_arn and an optional external_id. Exactly one role is supported per job -- to monitor multiple accounts, run one job per account/role. | | no |
| Profiles | profiles.mode | Profile selection: auto (default service profiles), exact (only the profiles you list, by basename), or combined (default profiles plus deep-grain per-target-group / per-operation / per-request-filter profiles). | auto | no |
| | profiles.mode_exact.entries | List of profiles to collect by basename (required when profiles.mode is exact). Each entry has a name, e.g. ec2 or alb_target. | | no |
| Discovery | discovery.refresh_every | How often (seconds) to re-discover metrics. Minimum 60. | 300 | no |
| | discovery.recently_active_only | List only metrics active in the last 3 hours. Automatically disabled for metrics whose period exceeds 3 hours (such as the daily S3 storage metrics). | yes | no |
| Virtual Node | vnode | Associates this data collection job with a Virtual Node. | | no |
auth.mode
Determines how the collector authenticates with AWS.
| Mode | When to use | Required options |
|---|---|---|
| default | Running inside AWS, or with credentials in the environment / shared config | None |
| access_key | Explicit static credentials | access_key_id, secret_access_key |
| assume_role | Assume an IAM role (cross-account or scoped access) | roles[].role_arn |
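The required options per mode can be checked before deploying a job. The following Python sketch validates an `auth` mapping shaped like the YAML examples on this page; `validate_auth` is a hypothetical helper and the messages are illustrative, not the collector's own validation or error text.

```python
def validate_auth(auth: dict) -> list[str]:
    """Return a list of problems found in an auth mapping (empty list means it looks valid)."""
    mode = auth.get("mode", "default")
    if mode == "default":
        return []  # nothing else is required
    if mode == "access_key":
        section = auth.get("mode_access_key") or {}
        return [
            f"auth.mode_access_key.{key} is required"
            for key in ("access_key_id", "secret_access_key")
            if not section.get(key)
        ]
    if mode == "assume_role":
        roles = (auth.get("mode_assume_role") or {}).get("roles") or []
        if len(roles) != 1:  # exactly one role is supported per job
            return ["auth.mode_assume_role.roles must contain exactly one role"]
        if not roles[0].get("role_arn"):
            return ["auth.mode_assume_role.roles[0].role_arn is required"]
        return []
    return [f"unknown auth.mode: {mode!r}"]


# An access_key job that forgot the secret.
print(validate_auth({"mode": "access_key", "mode_access_key": {"access_key_id": "your-access-key-id"}}))
```

Running such a check in a deployment pipeline (for example before an Ansible rollout) can catch missing credentials earlier than a failed job start.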
via UI
Configure the cloudwatch collector from the Netdata web interface:
- Go to Nodes.
- Select the node where you want the cloudwatch data-collection job to run and click the ⚙ (Configure this node). That node will run the data collection.
- The Collectors → Jobs view opens by default.
- In the Search box, type cloudwatch (or scroll the list) to locate the cloudwatch collector.
- Click the + next to the cloudwatch collector to add a new job.
- Fill in the job fields, then click Test to verify the configuration and Submit to save.
- Test runs the job with the provided settings and shows whether data can be collected.
- If it fails, an error message appears with details (for example, connection refused, timeout, or command execution errors), so you can adjust and retest.
via File
The configuration file name for this integration is go.d/cloudwatch.conf.
The file format is YAML. Generally, the structure is:
update_every: 60
autodetection_retry: 0
jobs:
  - name: some_name1
  - name: some_name2
You can edit the configuration file using the edit-config script from the
Netdata config directory.
cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config go.d/cloudwatch.conf
Examples
Default credentials, single region
Collect from us-east-1 using the AWS SDK default credential chain. Best when Netdata runs on an EC2 instance or in EKS with an attached IAM role.
Config
jobs:
  - name: default_credentials
    regions:
      - us-east-1
    auth:
      mode: default
Static access key, multiple regions
Collect from two regions using a static access key.
Config
jobs:
  - name: access_key
    regions:
      - us-east-1
      - eu-west-1
    auth:
      mode: access_key
      mode_access_key:
        access_key_id: "your-access-key-id"
        secret_access_key: "your-secret-access-key"
Assume an IAM role
Assume a CloudWatch read-only role, for example to collect from another account.
Config
jobs:
  - name: assume_role
    regions:
      - us-east-1
    auth:
      mode: assume_role
      mode_assume_role:
        roles:
          - role_arn: "arn:aws:iam::123456789012:role/netdata-cloudwatch"
            # external_id: "your-external-id" # add if the role's trust policy requires it
Specific services only
Collect only EC2 and RDS instead of auto-discovering all built-in services.
Config
jobs:
  - name: ec2_rds
    regions:
      - us-east-1
    profiles:
      mode: exact
      mode_exact:
        entries:
          - name: ec2
          - name: rds
    auth:
      mode: default
All services including deep-grain profiles
Use combined mode to also collect the opt-in deep-grain profiles (ALB target groups, DynamoDB operations, S3 request metrics).
Config
jobs:
  - name: combined
    regions:
      - us-east-1
    profiles:
      mode: combined
    auth:
      mode: default
Alerts
There are no alerts configured by default for this integration.
Metrics
Charts are generated at runtime from the active service profiles. Each discovered AWS resource becomes a chart instance identified by its account_id, region, and the service's own dimensions (for example instance_id for EC2, or bucket_name and storage_type for S3); its contexts live under the cloudwatch. namespace. All CloudWatch metrics appear on the node running the collector -- individual AWS resources are distinguished by labels, not as separate Netdata nodes. Because CloudWatch publishes with a delay, allow a few minutes for the first data points.
Key terms:
- Namespace -- AWS's grouping for a service's metrics (e.g. AWS/EC2).
- Dimension -- a name/value pair that identifies a resource within a namespace (e.g. InstanceId).
- Statistic -- the CloudWatch aggregation applied per period (e.g. average, sum, maximum).
- Profile -- the Netdata YAML file that maps a namespace's metrics to charts.
- Partition -- an isolated AWS region group (standard aws, GovCloud aws-us-gov, or China aws-cn); all of a job's regions must share one.
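The partition rule can be checked ahead of time with a few lines of Python. This sketch infers the partition from the region-name prefix (`us-gov-` and `cn-`), which covers the three partitions named above; the helper names are hypothetical, and other isolated partitions are deliberately ignored.

```python
def partition_of(region: str) -> str:
    """Infer the partition from the region name (covers aws, aws-us-gov, aws-cn only)."""
    if region.startswith("us-gov-"):
        return "aws-us-gov"
    if region.startswith("cn-"):
        return "aws-cn"
    return "aws"


def check_single_partition(regions: list[str]) -> str:
    """Return the shared partition, or raise if the list is empty or mixes partitions."""
    if not regions:
        raise ValueError("at least one region is required")
    partitions = {partition_of(region) for region in regions}
    if len(partitions) > 1:
        raise ValueError(f"regions span multiple partitions: {sorted(partitions)}")
    return partitions.pop()


print(check_single_partition(["us-east-1", "eu-west-1"]))
print(check_single_partition(["us-gov-west-1", "us-gov-east-1"]))
```

A job that needs both commercial and GovCloud regions would fail this check; split it into one job per partition.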
The built-in profiles ship the following charts by default. Each service links to its profile -- the authoritative definition of its exact metrics, statistics, dimensions, and charts:
| Profile | Metric prefix | Description |
|---|---|---|
| Amazon EC2 | cloudwatch.ec2.* | CPU utilization, network traffic, disk operations, status-check failures |
| Amazon RDS | cloudwatch.rds.* | CPU utilization, database connections, freeable memory, free storage space, disk throughput, IOPS, latency |
| Classic Load Balancer (ELB) | cloudwatch.elb.* | request count, backend and load-balancer response codes, backend connection errors, latency, host count, spillover count |
| Application Load Balancer (ALB) | cloudwatch.alb.* | request count, target and load-balancer response codes, connection rate, active connections, processed traffic, target response time, consumed LCUs |
| Network Load Balancer (NLB) | cloudwatch.nlb.* | active and new flow counts, processed bytes and packets, consumed LCUs, TCP resets |
| Amazon S3 | cloudwatch.s3.* | bucket size, number of objects (daily storage metrics) |
| AWS Lambda | cloudwatch.lambda.* | invocations, errors and throttles, duration |
| Amazon SQS | cloudwatch.sqs.* | message throughput, empty receives, queue depth, age of oldest message, sent message size |
| Amazon DynamoDB | cloudwatch.dynamodb.* | consumed and provisioned capacity, throttle events |
| Amazon API Gateway | cloudwatch.api_gateway.* | requests, errors, latency |
| AWS Step Functions | cloudwatch.step_functions.* | executions, throttled events, execution time |
| NAT Gateway | cloudwatch.nat_gateway.* | traffic, active connections, connection rate, errors, idle timeouts |
| Amazon Kinesis Data Streams | cloudwatch.kinesis.* | data throughput, records, GetRecords iterator age, operation latency, throughput exceeded, PutRecords rejected |
| Amazon Data Firehose | cloudwatch.firehose.* | records, throughput, put requests, throttled records, S3 delivery freshness and success |
| Amazon SNS | cloudwatch.sns.* | messages published, notifications, published message size |
| Amazon EBS | cloudwatch.ebs.* | volume throughput, IOPS, queue length, idle time, burst balance |
| Amazon EFS | cloudwatch.efs.* | I/O throughput, metered vs permitted throughput, percent I/O limit, burst credit balance, client connections |
| Amazon ECS | cloudwatch.ecs.* | service utilization, EBS filesystem utilization, live task count |
| Amazon ElastiCache | cloudwatch.elasticache.* | CPU utilization, memory, database memory usage, current and new connections, cache hits and misses, evictions, network traffic |
| Amazon OpenSearch Service | cloudwatch.opensearch.* | cluster status, index writes blocked, nodes, CPU utilization, JVM memory pressure, free storage space, search and indexing rate, search and indexing latency |
| Amazon DocumentDB | cloudwatch.docdb.* | CPU utilization, freeable memory, connections, buffer cache hit ratio, disk IOPS, latency, throughput, replica lag, cursors timed out |
| Amazon Redshift | cloudwatch.redshift.* | health, CPU utilization, disk space used, database connections, disk IOPS, throughput, network throughput |
| Amazon MSK | cloudwatch.msk.* | broker throughput, messages in, CPU, disk used, memory, partitions, connections |
| Amazon CloudFront | cloudwatch.cloudfront.* | requests, downloaded and uploaded traffic, total/4xx/5xx error rates |
| AWS Auto Scaling | cloudwatch.auto_scaling.* | group sizing (min/max/desired/total) and instances by state (in-service, pending, standby, terminating) |
| Amazon Bedrock | cloudwatch.bedrock.* | invocations, invocation errors, token throughput, invocation and time-to-first-token latency |
| Amazon EventBridge | cloudwatch.eventbridge.* | target invocations, rule activity (matched events, triggered rules), ingestion-to-invocation latency |
| AWS Site-to-Site VPN | cloudwatch.vpn.* | tunnel traffic (in/out) and tunnel state (fraction of tunnels up) |
| Amazon EKS | cloudwatch.eks.* | control-plane health: API server request rate, errors, p99 latency, and in-flight requests; etcd database size; scheduler pending pods and scheduling attempts |
Each profile also carries optional metrics that are commented out to keep cost and cardinality low; uncomment a metric and its matching chart, then restart the Netdata Agent (profiles are loaded once per go.d process and cached). Stock profiles are shipped at /usr/lib/netdata/conf.d/go.d/cloudwatch.profiles/default/. To customize a service, copy its profile into /etc/netdata/go.d/cloudwatch.profiles/ (keep the same filename) and edit it -- a user profile fully replaces the stock one of the same name -- then restart the Agent.
With profiles.mode: combined, these deep-grain profiles are collected in addition to the defaults:
| Profile | Metric prefix | Description |
|---|---|---|
| ALB Target Groups | cloudwatch.alb_target.* | per-target-group host count, requests per target, response time, response codes, connection errors |
| DynamoDB Operations | cloudwatch.dynamodb_operation.* | per-operation successful request latency, system errors, throttled requests, returned items |
| S3 Request Metrics | cloudwatch.s3_requests.* | requests, request errors, request latency, request data transfer |
These deep-grain profiles are the highest-cardinality data the collector emits. S3 Request Metrics additionally require per-bucket request-metrics configuration in AWS and are billed at CloudWatch custom-metric rates; they collect nothing until enabled on the bucket.
Troubleshooting
Debug Mode
Important: Debug mode is not supported for data collection jobs created via the UI using the Dyncfg feature.
To troubleshoot issues with the cloudwatch collector, run the go.d.plugin with the debug option enabled. The output
should give you clues as to why the collector isn't working.
- Navigate to the plugins.d directory, usually at /usr/libexec/netdata/plugins.d/. If that's not the case on your system, open netdata.conf and look for the plugins setting under [directories].
  cd /usr/libexec/netdata/plugins.d/
- Switch to the netdata user.
  sudo -u netdata -s
- Run the go.d.plugin to debug the collector:
  ./go.d.plugin -d -m cloudwatch
  To debug a specific job:
  ./go.d.plugin -d -m cloudwatch -j jobName
Getting Logs
If you're encountering problems with the cloudwatch collector, follow these steps to retrieve logs and identify potential issues:
- Run the command specific to your system (systemd, non-systemd, or Docker container).
- Examine the output for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.
System with systemd
Use the following command to view logs generated since the last Netdata service restart:
journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep cloudwatch
System without systemd
Locate the collector log file, typically at /var/log/netdata/collector.log, and use grep to filter for the collector's name:
grep cloudwatch /var/log/netdata/collector.log
Note: This method shows logs from all restarts. Focus on the latest entries for troubleshooting current issues.
Docker Container
If your Netdata runs in a Docker container named "netdata" (replace if different), use this command:
docker logs netdata 2>&1 | grep cloudwatch
No metrics are collected
Check the following:
- Permissions -- the IAM identity allows cloudwatch:ListMetrics, cloudwatch:GetMetricData, and sts:GetCallerIdentity (plus sts:AssumeRole in assume_role mode).
- Regions -- the regions list includes the regions where your resources run. Some services are global and report to a single region: Amazon CloudFront publishes its CloudWatch metrics only in us-east-1 (with a constant Region=Global), so regions must include us-east-1 to collect it.
- Resources are active -- confirm in the AWS CloudWatch console that the resources are publishing metrics.
- Collector logs -- check for authentication or API errors:
# systemd
journalctl -u netdata --namespace=netdata --grep cloudwatch --since "5 minutes ago"
# non-systemd
grep cloudwatch /var/log/netdata/collector.log
Missing metrics for some services
- Profile mode -- ensure profiles.mode: auto (default), or that the service's profile basename is listed under profiles.mode_exact.entries.
- Daily metrics -- S3 storage metrics are published once per day. They are inherently delayed by about a day, and recently_active_only is automatically disabled for them.
- Resource activity -- some metrics only appear when the resource is actively processing data (for example, EventBridge and Bedrock publish a metric only when its value is non-zero).
- Auto Scaling group metrics -- the cloudwatch.auto_scaling.* metrics are not published until group-metrics collection is enabled on the group (aws autoscaling enable-metrics-collection --granularity 1Minute). Amazon EKS managed node groups have it enabled by default.
- EKS control-plane metrics -- the cloudwatch.eks.* metrics are published to the AWS/EKS namespace automatically, at no additional EKS charge, only for clusters running Kubernetes 1.28 or later; older clusters do not report them. These are distinct from Container Insights / the CloudWatch Observability add-on (agent-based, billed separately).
Charts have gaps or incomplete data
CloudWatch publishes metrics with a delay.
- The collector uses query_offset (default 600 seconds), and the effective offset is at least one full metric period.
- If charts still have gaps, increase query_offset.
Access denied or authentication errors
- Verify the credentials selected by auth.mode are valid and not expired.
- For assume_role, confirm the base identity is allowed to sts:AssumeRole the target role and that the role's trust policy permits it.
- For AWS GovCloud or China partitions, ensure every region in regions belongs to the same partition.