Amazon CloudWatch

Plugin: go.d.plugin Module: cloudwatch

Overview

Monitor AWS infrastructure through Amazon CloudWatch. This collector discovers CloudWatch metrics for a curated set of AWS services and renders them as Netdata charts, with minimal configuration.

Monitored services:

Area	Services
Compute and containers	Amazon EC2, AWS Lambda, Amazon ECS, Amazon EKS (Kubernetes control plane), AWS Auto Scaling
Databases and analytics	Amazon RDS, Amazon DynamoDB, Amazon ElastiCache, Amazon DocumentDB, Amazon Redshift, Amazon OpenSearch Service
Storage	Amazon S3, Amazon EBS, Amazon EFS
Networking and content delivery	Classic (ELB), Application (ALB), and Network (NLB) load balancers, NAT Gateway, AWS PrivateLink endpoints and endpoint services, Amazon CloudFront, AWS Site-to-Site VPN
Messaging, streaming, and events	Amazon SQS, Amazon SNS, Amazon Kinesis Data Streams, Amazon Data Firehose, Amazon MSK, Amazon EventBridge
Application services and AI	Amazon API Gateway, AWS Step Functions, Amazon Bedrock
Cost	AWS Billing estimated month-to-date charges (opt-in)

Key terms used throughout this page:

Term	Meaning
Namespace	AWS's grouping for a service's metrics (for example `AWS/EC2`).
Dimension	A name/value pair that identifies a resource within a namespace (for example `InstanceId`).
Statistic	The CloudWatch aggregation applied per period (for example Average, Sum, Maximum).
Profile	A Netdata YAML file that maps a namespace's metrics to charts.
Grain	The exact dimension set a profile matches -- the level of detail one chart instance represents (for example one PrivateLink endpoint vs one endpoint per subnet).
Target	A named AWS identity to monitor: a credential source used directly or through one assumed role.
Rule	An ordered configuration entry that selects targets, profiles, metrics, and regions.
Series	One metric/statistic pair for one resource instance -- the unit counted by plan limits and AWS billing.
Partition	An isolated AWS region group (standard `aws`, GovCloud `aws-us-gov`, China `aws-cn`). All regions selected for one target must share a partition, and an assumed-role ARN must match it.
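The partition rule in the last row can be sketched as a pre-flight check. This Go example is a minimal illustration under stated assumptions, not the collector's code. `partitionOf` and `checkTargetPartition` are hypothetical names. Only the three partitions listed above are modeled: regions starting `us-gov-` map to `aws-us-gov`, regions starting `cn-` map to `aws-cn`, and all others map to `aws`.

```go
package main

import (
	"fmt"
	"strings"
)

// partitionOf maps a region code to one of the three partitions named above.
// Assumption: other AWS partitions are not modeled in this sketch.
func partitionOf(region string) string {
	switch {
	case strings.HasPrefix(region, "us-gov-"):
		return "aws-us-gov"
	case strings.HasPrefix(region, "cn-"):
		return "aws-cn"
	default:
		return "aws"
	}
}

// checkTargetPartition (hypothetical) rejects a target whose selected regions
// span partitions, or whose assumed-role ARN names a different partition.
func checkTargetPartition(regions []string, roleARN string) error {
	if len(regions) == 0 {
		return fmt.Errorf("no regions selected")
	}
	want := partitionOf(regions[0])
	for _, r := range regions[1:] {
		if p := partitionOf(r); p != want {
			return fmt.Errorf("region %s is in partition %s, expected %s", r, p, want)
		}
	}
	if roleARN != "" {
		// An IAM role ARN has the form arn:<partition>:iam::<account>:role/<name>.
		parts := strings.SplitN(roleARN, ":", 3)
		if len(parts) < 3 || parts[0] != "arn" {
			return fmt.Errorf("malformed role ARN %q", roleARN)
		}
		if parts[1] != want {
			return fmt.Errorf("role ARN partition %s does not match %s", parts[1], want)
		}
	}
	return nil
}
```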

Coverage is defined by profiles -- YAML files declaring a CloudWatch namespace, an exact resource-dimension grain, supported regions, metrics, statistics, and chart template. A service can use multiple profiles when AWS publishes distinct grains, and coverage can be extended without collector code changes. See the AWS CloudWatch profile format for the complete schema and authoring rules.

Need a service that isn't listed?

Request a profile -- it's just a YAML file, no code change. Open a feature request and attach the service's CloudWatch metric schema, captured with this read-only command. It prints only metric and dimension names (no resource IDs, ARNs, or metric values), so the output is safe to share:

aws cloudwatch list-metrics --namespace "AWS/<Service>" --region <your-region> --output json \
  | jq -c '[.Metrics[] | {metric: .MetricName, dimensions: ([.Dimensions[].Name] | sort)}] | unique'

Replace AWS/<Service> with the service namespace (for example AWS/AmazonMQ) and <your-region> with a region where the service runs. The exact metrics and dimensions in the output are what we need to author a correct profile quickly.

This collector reads runtime metrics from CloudWatch. It complements the AWS EC2 Compute instances integration (EC2 inventory and capacity) and the AWS Quota integration (AWS Service Quotas). They use different AWS data sources and do not replace one another.

The collector works in three stages:

Plan -- named credential sources, monitored targets, and ordered collection rules are compiled into a fixed runtime plan. Each target resolves its AWS account ID through sts:GetCallerIdentity; target names remain distinct identities even when they resolve to the same account. When selections overlap, rule order -- then target order within the rule -- decides ownership: the first match owns each exported metric/statistic series.
Discover -- a profile with identifying dimensions finds its resources with one CloudWatch ListMetrics scan per target, region, and namespace, then applies its exact dimension matcher. A profile whose dimensions are all constants is a known static instance and skips ListMetrics. Optional resource-tag filters are resolved with the Resource Groups Tagging API before queries are expanded.
Query -- every selected series gets a resolved timing policy: aggregation period, rolling lookback, and publication delay. GetMetricData searches the aligned rolling window for the newest complete datapoint, and Netdata receives the retained value on every collection cycle.
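First-match ownership from the Plan stage can be illustrated with a short Go sketch. It assumes each rule has already been expanded into the series it selects for each target. It also assumes that a series key identifies the exported AWS series (account, region, namespace, dimensions, metric, statistic), which is why two targets resolving to the same account can collide. The `Rule` type and `assignOwners` function are hypothetical.

```go
package main

import "fmt"

// Rule is a hypothetical, already-expanded collection rule.
type Rule struct {
	Name    string
	Targets []string            // ordered target names
	Series  map[string][]string // target -> series keys it selects
}

// assignOwners walks rules in order, then targets in rule order, and gives
// each series key to the first (rule, target) pair that selects it.
// Assumption: a series key encodes account, region, namespace, dimensions,
// metric, and statistic, so targets sharing an account can overlap.
func assignOwners(rules []Rule) map[string]string {
	owner := make(map[string]string)
	for _, r := range rules {
		for _, t := range r.Targets {
			for _, s := range r.Series[t] {
				if _, taken := owner[s]; !taken {
					owner[s] = fmt.Sprintf("%s/%s", r.Name, t)
				}
			}
		}
	}
	return owner
}
```

Because rules are walked in order and targets in rule order, a later rule can only pick up series that no earlier pair selected. The disjoint Lambda example under Examples relies on this property.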

Built-in collector-activity charts show CloudWatch API calls, calculated billable metric requests, and raw queries, so you can tune the plan and relate collector work to AWS billing.

To start collecting, jump to Setup -- every job needs three blocks: credentials (how to authenticate), targets (which AWS identity to monitor), and rules (which services and regions to collect). The sections in between are tuning and cost reference.

This collector is supported on all platforms.

This collector supports collecting metrics from multiple instances of this integration, including remote instances.

The collector needs these AWS permissions:

Permission	Needed when
`cloudwatch:GetMetricData`	Always, for every target.
`cloudwatch:ListMetrics`	Any selected profile has an identifying dimension and therefore needs discovery. An all-constant profile such as `billing_total` is queried directly and does not need it.
`sts:AssumeRole`	A target sets `assume_role`; grant it on the credential source's identity for that role ARN.
`tag:GetResources`	Resource tag filters (`rule_defaults.filters.resource_tags`, `rules[].filters.resource_tags`) or resource tag labels (`labels.resource_tags`) are configured.

The collector also calls sts:GetCallerIdentity for account attribution, but AWS does not require an explicit permission grant for that operation.

Default Behavior

Auto-Detection

The defaults are designed so a minimal configuration collects something useful:

A rule that omits profiles selects all default-enabled profiles for its targets and regions.
A rule that omits metrics collects the default-enabled metrics from those profiles.
A metric group changes only its named profile; other selected profiles keep their defaults. The group keeps that profile's default-enabled metrics unless it sets defaults: false; either way, it adds the exact AWS MetricNames it lists.
Statistics resolve from the metric entry, then the group, then the profile declaration.
Charts appear only for profiles with live metrics.
Discovery and the query blueprint are cached; discovery refreshes every discovery.refresh_every seconds (default 300).
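The statistics fallback in the list above (metric entry, then group, then profile declaration) is a short precedence chain. The Go sketch below uses hypothetical names. It treats an empty list as "omitted". It normalizes only the named statistics, which the option reference describes as case-insensitive, and passes percentiles such as `p90` through unchanged.

```go
package main

import "strings"

// canonical maps lowercase named statistics to CloudWatch's spelling.
var canonical = map[string]string{
	"average":     "Average",
	"minimum":     "Minimum",
	"maximum":     "Maximum",
	"sum":         "Sum",
	"samplecount": "SampleCount",
}

// resolveStatistics (hypothetical) returns the first non-empty list among the
// metric entry, the group, and the profile declaration, in that order.
func resolveStatistics(metric, group, profile []string) []string {
	chosen := profile
	if len(group) > 0 {
		chosen = group
	}
	if len(metric) > 0 {
		chosen = metric
	}
	out := make([]string, 0, len(chosen))
	for _, s := range chosen {
		if c, ok := canonical[strings.ToLower(s)]; ok {
			s = c // named statistic: normalize case
		}
		out = append(out, s) // percentiles pass through as written
	}
	return out
}
```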

Limits

Timing

The minimum collection interval is 60 seconds (CloudWatch's minimum metric period).
Query timing resolves field-by-field: rules[].query, then rule_defaults.query, then metric and profile defaults, then the built-in 10-minute publication delay. The combined publication_delay + lookback + period horizon cannot exceed 14 days.
The stock S3 storage profile uses a conservative one-day delay policy: AWS documents that S3 storage metrics are reported once per day, without guaranteeing publication within one day.
A successful sparse query can replay its newest eligible CloudWatch value for up to lookback. During transient AWS failures, the retained value can be replayed longer, until a successful query replaces or expires it.

Plan size

The collector refuses a plan too large to query safely: more than 20,000 selected series, 600,000 datapoints due in one cycle, or 40 batched GetMetricData requests (up to five statistics for one metric count as a single request). Only very broad rules approach these bounds; narrow regions, profiles, or metrics if you do.
limits.max_instances (default 1000) bounds distinct final static or discovered instances after tag filtering and overlap resolution. Overflow rejects the refreshed plan; instances are never silently truncated.
limits.max_discovery_groups (default 64, hard maximum 100) bounds unique (target, region, namespace) discovery groups. Compatible rules and profiles share a group. Larger collection must be split across jobs: one refresh can admit at most 100 groups that reach ListMetrics.
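The refusal rules amount to an all-or-nothing pre-flight check. In this Go sketch, `PlanSize` and `checkPlan` are hypothetical, and the counts are assumed to be computed already. It does not model how the collector derives datapoints due or batch counts. The point is that any overflow rejects the plan instead of trimming it.

```go
package main

import "fmt"

// Hard plan bounds stated above.
const (
	maxSeries        = 20000
	maxDatapointsDue = 600000
	maxBatches       = 40
	hardMaxGroups    = 100
)

// PlanSize is a hypothetical summary of a compiled plan.
type PlanSize struct {
	Series, DatapointsDue, Batches, Instances, DiscoveryGroups int
}

// checkPlan rejects the whole plan on any overflow; nothing is truncated.
func checkPlan(p PlanSize, maxInstances, maxGroups int) error {
	if maxGroups < 1 || maxGroups > hardMaxGroups {
		return fmt.Errorf("max_discovery_groups %d outside 1..%d", maxGroups, hardMaxGroups)
	}
	switch {
	case p.Series > maxSeries:
		return fmt.Errorf("%d series exceed %d", p.Series, maxSeries)
	case p.DatapointsDue > maxDatapointsDue:
		return fmt.Errorf("%d datapoints due exceed %d", p.DatapointsDue, maxDatapointsDue)
	case p.Batches > maxBatches:
		return fmt.Errorf("%d GetMetricData batches exceed %d", p.Batches, maxBatches)
	case p.Instances > maxInstances:
		return fmt.Errorf("%d instances exceed limits.max_instances=%d", p.Instances, maxInstances)
	case p.DiscoveryGroups > maxGroups:
		return fmt.Errorf("%d discovery groups exceed limits.max_discovery_groups=%d", p.DiscoveryGroups, maxGroups)
	}
	return nil
}
```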

Discovery bounds

Each discovery group is bounded to 100 ListMetrics pages, 50,000 scanned metrics, 1,000,000 residual profile matches, and 20,000 candidate instances. Overflow fails the group without replacing its previous snapshot.
One whole discovery refresh is additionally capped at 100 admitted ListMetrics operations, the same scan/match/candidate totals, 64 MiB of candidate storage, and one shared timeout. Every non-skipped group that resolves a client runs its first admitted operation before continuations share the remaining budget.
Exhausting an aggregate limit or the timeout discards the attempted refresh atomically: the existing snapshot stays active and discovery retries after discovery.refresh_every. On the first pass, all-constant static profiles keep collecting while dynamic discovery waits for its retry; without static work, total discovery failure makes the collection attempt fail.

Labels

Resources are labeled by their identifying CloudWatch dimensions (for example EC2 instance_id). Selected resource tags can be attached as non-identity labels via labels.resource_tags; changing those tags updates labels without changing chart identity. A dimension that is constant across resources (such as CloudFront's Region=Global) is matched and queried but not turned into a label.

Performance Impact

What AWS bills

AWS bills CloudWatch API usage. GetMetricData is the cost driver -- roughly $0.01 per 1,000 metrics requested (confirm current CloudWatch pricing for your region). ListMetrics discovery counts toward the CloudWatch API free tier and, beyond it, is billed per API request rather than per metric, so it usually costs far less. Up to five statistics requested for the same metric count as one billable metric request, and the collector's batching preserves that grouping.

To estimate a job's normal daily cost:

billable metric requests/day ≈ instances × billable metrics per instance × (86,400 / period seconds)

For example, one Billing series at the stock 10-minute period is 86,400 / 600 = 144 requests per day. For a running job, skip the arithmetic and read the CloudWatch Metric Requests chart described below -- it reports the billable metric requests actually submitted.
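The same estimate as a Go sketch. `dailyMetricRequests` and `dailyCostUSD` are hypothetical helpers. The rate argument should be the current GetMetricData price for your region, and retries are not included.

```go
package main

// dailyMetricRequests applies:
// instances × billable metrics per instance × (86,400 / period seconds).
// "Billable metrics" already folds up to five statistics of one metric into
// a single request. Integer division assumes the period divides a day evenly.
func dailyMetricRequests(instances, billableMetricsPerInstance, periodSeconds int) int {
	return instances * billableMetricsPerInstance * (86400 / periodSeconds)
}

// dailyCostUSD converts requests to dollars at a caller-supplied rate per 1,000.
func dailyCostUSD(requests int, usdPer1000 float64) float64 {
	return float64(requests) / 1000 * usdPer1000
}
```

For example (illustrative numbers), 100 instances with two billable metrics each at a 1-minute period give 100 × 2 × 1,440 = 288,000 requests a day, about $2.88 at $0.01 per 1,000. A 5-minute period cuts that to 57,600.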

How the collector keeps cost down

Each series is queried once per newly eligible period window, not once per Netdata collection cycle.
Curated profiles, exact metric/statistic/resource-tag selection, and single-statistic defaults keep the selected set small.
Compatible rules and profiles share discovery scans; discovery and query plans are cached; recently_active_only narrows scans to active resources.
A transient failure retries after one update_every, then doubles the delay within the same eligible window, capped at the effective period. Retries are billable, so update_every affects failure-time cost even though it does not set the normal query cadence.

Cost scales with selected targets, instances, metrics, statistics beyond AWS's grouping, periods, and lookback length. Longer lookbacks increase requested datapoints and can disable CloudWatch's three-hour recently-active discovery filter. To reduce cost, narrow rules[].targets, rules[].profiles, rules[].metrics, rules[].regions, or configure resource tag filters.

Watching collector-issued work

Three collector-activity chart types expose the inputs behind that cost model:

Chart	Counts	Instance labels
CloudWatch API Calls	Collector-issued `ListMetrics` and `GetMetricData` calls, including every pagination page	`account_id`, `region`, `operation`
CloudWatch Metric Requests	Calculated billable `GetMetricData` metric requests, using AWS's up-to-five-statistics grouping	`account_id`, `region`
CloudWatch Raw Queries	Submitted `MetricDataQuery` items, for plan tuning	`account_id`, `region`, `profile`

Calls and billable metric requests deliberately have no profile attribution because one shared scan or request can serve multiple profiles; targets that resolve to the same account are aggregated. Each chart reports an absolute count for the interval since the preceding successfully committed collector frame: a cached interval with no real AWS work reports zero, activity from failed cycles carries into the next successful frame, and job replacement or a process restart resets it. The gauges exclude SDK-internal retries and are billing inputs, not an AWS invoice.
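The interval rule for these charts behaves like an accumulator that is drained only when a frame is committed. This Go sketch uses a hypothetical `activityCounter` type and illustrates the documented semantics rather than the collector's implementation.

```go
package main

// activityCounter accumulates collector-issued AWS work between committed
// frames. Failed cycles add to it without draining, so their work appears in
// the next successful frame; a fresh counter (job restart) starts at zero.
type activityCounter struct {
	pending int64
}

// add records work issued during the current cycle.
func (c *activityCounter) add(n int64) { c.pending += n }

// commit returns the count to report for this frame and resets the interval.
// A cycle served entirely from cache adds nothing and so reports zero.
func (c *activityCounter) commit() int64 {
	n := c.pending
	c.pending = 0
	return n
}
```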

Billing profile cost

The opt-in Billing profiles use a 10-minute period, so each selected Billing series normally produces 144 billable metric requests per day before retries (200 series produce 28,800). AWS charges by metrics requested, not by the datapoint slots that the 24-hour lookback reserves. The total profile is static and performs no ListMetrics; the three dynamic Billing grains share one namespace discovery stream per target and refresh interval. Billing cardinality grows with services, linked accounts, and observed account/service pairs, so select only the grains you need.

Setup

You can configure the cloudwatch collector in two ways:

Method	Best for	How to
UI	Fast setup without editing files	Go to Nodes → Configure this node → Collectors → Jobs, search for cloudwatch, then click + to add a job.
File	If you prefer configuring via file, or need to automate deployments (e.g., with Ansible)	Edit `go.d/cloudwatch.conf` and add a job.

Important: UI configuration requires a paid Netdata Cloud plan.

Prerequisites

Create an AWS IAM identity with CloudWatch read access

The collector needs an IAM identity (user or role) allowed to read CloudWatch metrics. It resolves the AWS account identity with sts:GetCallerIdentity, which does not require an explicit permission grant.

Attach a policy such as:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricData"
      ],
      "Resource": "*"
    }
  ]
}

Permission notes:

cloudwatch:ListMetrics and cloudwatch:GetMetricData do not support resource-level permissions, so "Resource": "*" is already least-privilege for these read actions.
A job that selects only profiles whose dimensions are all constants, such as billing_total, can omit cloudwatch:ListMetrics; every dynamic profile needs it.
sts:GetCallerIdentity needs no explicit grant.
Scope sts:AssumeRole to the specific role ARN(s) rather than *.
To enable resource tag filtering or labels, also grant tag:GetResources (it likewise requires "Resource": "*").
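The notes above can be condensed into a helper that lists the IAM actions one job needs. In this Go sketch, `JobNeeds` and `requiredActions` are hypothetical, and the flags summarize the job configuration. You still scope `sts:AssumeRole` to specific role ARNs, on the credential source's identity, in the policy itself.

```go
package main

import "sort"

// JobNeeds is a hypothetical summary of what a job's configuration selects.
type JobNeeds struct {
	AnyDynamicProfile bool // a selected profile has an identifying dimension
	AnyAssumeRole     bool // some target sets assume_role
	AnyResourceTags   bool // resource tag filters or labels.resource_tags are set
}

// requiredActions returns the IAM actions the job needs, sorted.
// sts:GetCallerIdentity is omitted because AWS does not require a grant for it.
// sts:AssumeRole belongs on the credential source's identity, scoped to role ARNs.
func requiredActions(n JobNeeds) []string {
	actions := []string{"cloudwatch:GetMetricData"}
	if n.AnyDynamicProfile {
		actions = append(actions, "cloudwatch:ListMetrics")
	}
	if n.AnyAssumeRole {
		actions = append(actions, "sts:AssumeRole")
	}
	if n.AnyResourceTags {
		actions = append(actions, "tag:GetResources")
	}
	sort.Strings(actions)
	return actions
}
```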

In the collector configuration, define one or more named credential sources:

Type	Behavior	Use when
`default`	AWS SDK default credential chain: environment variables, shared config/credentials files, EC2 instance profile, or EKS IRSA	Netdata runs inside AWS
`static`	Explicit access key ID and secret access key, plus an optional session token	Keys are provisioned externally; use go.d secret references, not plaintext values

A target can use either source directly or use it to assume one IAM role. If the role trust policy requires an external ID, the role owner supplies that value; it is not an AWS password or access key. See AWS guidance for third-party access.

Enable CloudWatch Billing metrics before selecting Billing profiles

The Billing profiles are opt-in. Before collecting them, enable Receive CloudWatch Billing Alerts in AWS Billing Preferences as described in AWS's Billing alarm documentation.

This setup action is separate from the collector's runtime IAM policy:

Enabling the preference requires the account root user or an IAM principal allowed to view Billing information. The collector identity still needs only the CloudWatch read permissions described above.
AWS states that once the preference is enabled, Billing metric data collection cannot be disabled. Deleting CloudWatch alarms does not disable the metric feed.
The first data normally appears about 15 minutes after first enablement. AWS then calculates and publishes estimates several times daily, so a working job can legitimately have no chart or can hold the last published value between updates.
Billing data is published only in us-east-1, represents worldwide charges, and is reported only in USD. The charts show the latest published estimate for the current month, not a forecast.
For consolidated billing, enable the preference in the management/payer account. That account can expose the consolidated total plus linked-account views; a standalone/member view can contain fewer grains. If the management/payer account changes, enable the preference again in the new account.
AWS does not publish these Billing metrics for Amazon Partner Network (APN) accounts.

The collector samples and holds the newest successfully retrieved value. Treat the charts as the latest published month-to-date estimate, not an invoice or real-time ledger; chart behavior details are described under Metrics.

Configuration

Options

The following options can be defined globally or per job.

Profile file locations:

Type	Path
Stock profiles	`/usr/lib/netdata/conf.d/go.d/cloudwatch.profiles/default/`
User overrides	`/etc/netdata/go.d/cloudwatch.profiles/`

A user profile file with the same basename as a stock profile overrides it.

Config options

Group	Option	Description	Default	Required
Collection	update_every	Data collection interval (seconds). Must be at least 60 (CloudWatch's minimum period).	60	no
	autodetection_retry	Recheck interval (seconds) when the job fails to start. Default `0` means no retry; set a positive value to keep retrying.	0	no
	timeout	AWS operation timeout (seconds). Identity, resource-tag, and query operations use it for their operation scope; discovery shares one timeout across its whole refresh stage.	30	no
Authentication	credentials	Up to 64 named credential sources. Every source has a `type` of `default` (AWS SDK default chain) or `static` (explicit access/session credentials in `type_static`). Multiple targets can share one credential source.		yes
	credentials[].name	Credential source name referenced by targets. Names are lowercase, start with a letter, use only letters, digits, `_`, and `-`, and are at most 64 characters; the same format applies to target and rule names.		yes
	credentials[].type	Credential source type: `default` uses the AWS SDK chain; `static` requires `type_static`.		yes
	credentials[].type_static	Configuration used only when the credential source `type` is `static`.		no
	credentials[].type_static.access_key_id	AWS access key ID. Required in `type_static`. Use a go.d secret reference such as `${env:AWS_ACCESS_KEY_ID}`.		no
	credentials[].type_static.secret_access_key	AWS secret access key. Required in `type_static`. Use a go.d secret reference; do not store plaintext credentials in the file.		no
	credentials[].type_static.session_token	Optional AWS session token in `type_static` for temporary credentials. Use a go.d secret reference.		no
Targets	targets	Up to 64 named monitored AWS identities. A target uses one credential source directly or uses that source to assume one role. Targets remain distinct even when they resolve to the same AWS account.		yes
	targets[].name	Target name referenced by collection rules.		yes
	targets[].credentials	Name of the credential source used by this target.		yes
	targets[].assume_role.role_arn	IAM role ARN to assume using the target's credential source. Required when `assume_role` is present.		no
	targets[].assume_role.external_id	Optional value supplied by the role owner when the role trust policy requires an external ID. It is not an AWS password or access key.		no
Rules	rules	Up to 256 ordered collection rules. Each rule selects targets, profiles, optional per-profile metric overrides, regions, and optional resource-tag filters. The earliest matching rule and target own each overlapping exported metric/statistic series.		yes
	rules[].name	Unique rule name used in diagnostics.		yes
	rules[].targets	Ordered names of monitored targets selected by this rule. Order breaks overlap ties within the rule.		yes
	rules[].profiles.defaults	Include all default-enabled profiles. Defaults to `true` when `profiles` or `defaults` is omitted.	yes	no
	rules[].profiles.include	Profile basenames to add explicitly. Set `defaults` to `false` to collect only this list, including profiles disabled by default. PrivateLink detail choices are `privatelink_endpoint_subnet`, `privatelink_service_az`, `privatelink_service_load_balancer`, `privatelink_service_az_load_balancer`, and `privatelink_service_vpc_endpoint`. Billing choices are `billing_total`, `billing_service`, `billing_linked_account`, and `billing_linked_account_service`.		no
	rules[].profiles.exclude	Profile basenames to remove from the selected set. A profile cannot be both included and excluded.		no
	rules[].metrics	Optional per-profile metric overrides. Omit `metrics` to collect default-enabled metrics. Profiles without a group keep their defaults; a group can add opt-in metrics or switch to an exact-only set. Explicit selections expand to at most 256 metric/statistic pairs per rule.		no
	rules[].metrics[].profile	Profile basename. It must already be selected by `rules[].profiles` and may appear in only one metrics group per rule. Other selected profiles keep their default-enabled metrics.		yes
	rules[].metrics[].defaults	Include this profile's default-enabled metrics before adding the exact MetricNames below. Set `false` for an exact-only selection.	yes	no
	rules[].metrics[].statistics	Optional non-empty AWS statistics inherited by included metrics that omit their own list. When both lists are omitted, the profile-declared statistics are used. Named statistics are case-insensitive.		no
	rules[].metrics[].include	Non-empty list of exact, case-sensitive AWS CloudWatch MetricNames added by this group. Duplicate names are rejected.		yes
	rules[].metrics[].include[].name	Exact, case-sensitive AWS CloudWatch MetricName exported by the profile.		yes
	rules[].metrics[].include[].statistics	Optional non-empty replacement for the group statistics. When both are omitted, inherit every statistic declared for the metric by the profile. Use `Average`, `Minimum`, `Maximum`, `Sum`, `SampleCount`, or `p<N>`; named statistics are case-insensitive.		no
	rules[].regions	Canonical lowercase AWS region codes selected by this rule. The compiler intersects them with intrinsic profile restrictions; CloudFront and the Billing profiles support only `us-east-1`.		yes
Query Policy	rule_defaults.query	Shared query timing inherited field-by-field by collection rules. Omitted fields fall through to profile or built-in fallbacks. The resolved `publication_delay + lookback + period` horizon cannot exceed 14 days.		no
	rule_defaults.query.period	Default CloudWatch aggregation period from `1m` through `24h`, as an exact multiple of `1m`. An omitted rule period inherits this value before nested metric and profile query defaults.		no
	rule_defaults.query.lookback	Default rolling window searched for the newest eligible datapoint. It must be at least the effective period, an exact period multiple, and no more than 1,440 buckets (a bucket is one period).		no
	rule_defaults.query.publication_delay	Default collector wait after a bucket closes before it becomes eligible. This is a scheduling policy, not an AWS publication guarantee. Omission falls through to the profile value and then the built-in `10m` fallback. Setting this option overrides profile-specific delays for every inheriting rule, including the stock S3 storage profile's conservative `1d`; AWS documents only that S3 storage metrics are reported once per day, so use a shorter default only after verifying each workload's publication timing.		no
	rules[].query	Optional query timing overrides for this rule. Each omitted field independently inherits `rule_defaults.query`, then the relevant profile or built-in fallback.		no
	rules[].query.period	CloudWatch aggregation period for every series selected by this rule, from `1m` through `24h` as an exact multiple of `1m`. Rate metrics are normalized using this effective period.		no
	rules[].query.lookback	Rolling window searched for the newest complete finite datapoint. It must be at least the effective period, an exact period multiple, and no more than 1,440 buckets (a bucket is one period). Successful queries may present the retained datapoint as current for up to this duration; longer lookbacks increase response work.		no
	rules[].query.publication_delay	Collector wait after a bucket closes before querying it. This is a scheduling policy, not an AWS publication guarantee. Explicit `0s` is allowed for metrics known to publish immediately.		no
Resource Filters	rule_defaults.filters.resource_tags	Job-wide list of exact, case-sensitive AWS resource tag predicates inherited by rules that omit `rules[].filters.resource_tags`. All keys must match; any listed value for one key may match. The Resource Groups Tagging API performs the focused lookup and requires `tag:GetResources`.		no
	rule_defaults.filters.resource_tags[].key	Exact AWS resource tag key. A filter list supports at most 50 distinct keys.		yes
	rule_defaults.filters.resource_tags[].values	One to 20 exact, case-sensitive accepted values for this key. Values for one key are ORed.		yes
	rules[].filters.resource_tags	Per-rule replacement for `rule_defaults.filters.resource_tags`. Omit it to inherit the default, provide a non-empty list to replace the default, or set `[]` to disable tag filtering for this rule.		no
	rules[].filters.resource_tags[].key	Exact AWS resource tag key. A filter list supports at most 50 distinct keys.		yes
	rules[].filters.resource_tags[].values	One to 20 exact, case-sensitive accepted values for this key. Values for one key are ORed.		yes
Resource Labels	labels.resource_tags	Optional AWS resource tags copied to charts as non-identity labels. This is presentation only and does not select resources. Tag values may contain personal data, so expose only keys intended for Netdata. Requires `tag:GetResources`.		no
	labels.resource_tags[].key	Exact, case-sensitive AWS resource tag key.		yes
	labels.resource_tags[].label	Optional Netdata label key. When omitted, the AWS key is normalized (`Name` becomes `name`). Use an explicit label to avoid invalid names or collisions with identity labels such as `region`.		no
Limits	limits.max_instances	Maximum distinct final static or discovered CloudWatch instances that emit at least one selected series after filtering and exported-series overlap resolution. Metric/statistic fan-out is not counted. Overflow rejects the refreshed plan; collection never truncates to the first N instances.	1000	no
	limits.max_discovery_groups	Maximum unique `(target, region, namespace)` discovery groups compiled for the job. Compatible rules and profiles share groups. The default is an accidental-expansion safeguard; raise it only for intentional work. Valid range 1 to 100. Split larger collection across jobs because one refresh can admit at most 100 groups that reach `ListMetrics`.	64	no
Discovery	discovery.refresh_every	How often (seconds) to re-discover metrics. Minimum 60.	300	no
	discovery.recently_active_only	Restrict `ListMetrics` discovery to metrics CloudWatch saw activity for in the last three hours, which keeps scans smaller and cheaper. Profiles sharing one target, region, and namespace share one scan; the filter applies only while every participating series has `publication_delay + lookback + period` of three hours or less, and a single longer-horizon series keeps that whole scan unfiltered.	yes	no
Virtual Node	vnode	Associates this data collection job with a Virtual Node.		no
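The resource tag filter options use two levels of logic:

- AND across keys: every listed key must match.
- OR within a key: any listed value for that key may match.

This Go sketch evaluates the filter locally for illustration only; in the collector, the Resource Groups Tagging API performs the lookup. `TagFilter`, `effectiveFilters`, and `matchesTags` are hypothetical. A nil slice stands in for an omitted `rules[].filters.resource_tags`, and an empty slice stands in for an explicit `[]`.

```go
package main

// TagFilter is one predicate: an exact key and its accepted values.
type TagFilter struct {
	Key    string
	Values []string
}

// effectiveFilters models rules[].filters.resource_tags inheritance:
// nil (omitted) inherits the job default, an empty non-nil slice ([])
// disables filtering, and a non-empty list replaces the default.
func effectiveFilters(rule, defaults []TagFilter) []TagFilter {
	if rule == nil {
		return defaults
	}
	return rule
}

// matchesTags ANDs across keys and ORs across the values of one key.
// Keys and values are compared exactly, so "Env" and "env" differ.
func matchesTags(tags map[string]string, filters []TagFilter) bool {
	for _, f := range filters {
		v, ok := tags[f.Key]
		if !ok {
			return false
		}
		matched := false
		for _, want := range f.Values {
			if v == want {
				matched = true
				break
			}
		}
		if !matched {
			return false
		}
	}
	return true
}
```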

via UI

Configure the cloudwatch collector from the Netdata web interface:

Go to Nodes.
Select the node where you want the cloudwatch data-collection job to run and click the ⚙ (Configure this node). That node will run the data collection.
The Collectors → Jobs view opens by default.
In the Search box, type cloudwatch (or scroll the list) to locate the cloudwatch collector.
Click the + next to the cloudwatch collector to add a new job.
Fill in the job fields, then click Test to verify the configuration and Submit to save.
- Test runs the job with the provided settings and shows whether data can be collected.
- If it fails, an error message appears with details (for example, connection refused, timeout, or command execution errors), so you can adjust and retest.

via File

The configuration file name for this integration is go.d/cloudwatch.conf.

The file format is YAML. Generally, the structure is:

update_every: 60
autodetection_retry: 0
jobs:
  - name: some_name1
  - name: some_name2

You can edit the configuration file using the edit-config script from the Netdata config directory.

cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config go.d/cloudwatch.conf

Examples

Default credentials, single region

Monitor the base AWS identity in us-east-1 using the SDK default credential chain and all default-enabled profiles.

Config

jobs:
  - name: default_credentials
    credentials:
      - name: sdk_default
        type: default
    targets:
      - name: base
        credentials: sdk_default
    rules:
      - name: base-defaults
        targets: [base]
        regions: [us-east-1]

Static credentials assume multiple roles

Use one static/session credential source to assume roles for multiple monitored targets. Store credentials in supported secret providers, not in plaintext.

Config

jobs:
  - name: cross_account
    credentials:
      - name: bootstrap
        type: static
        type_static:
          access_key_id: ${env:AWS_ACCESS_KEY_ID}
          secret_access_key: ${env:AWS_SECRET_ACCESS_KEY}
          session_token: ${env:AWS_SESSION_TOKEN}
    targets:
      - name: production
        credentials: bootstrap
        assume_role:
          role_arn: "arn:aws:iam::[ACCOUNT]:role/[ROLE]"
          external_id: ${env:AWS_EXTERNAL_ID}
      - name: staging
        credentials: bootstrap
        assume_role:
          role_arn: "arn:aws:iam::[ACCOUNT]:role/[ROLE]"
    rules:
      - name: both-defaults
        targets: [production, staging]
        regions: [us-east-1, eu-west-1]

Add an opt-in metric to a profile

Keep the default EC2 metrics and add CPU credit balance for burstable instances. Because the metric group omits defaults, it keeps the EC2 profile's default-enabled metrics. Because neither the group nor the CPUCreditBalance entry sets statistics, the metric inherits the profile-declared Average statistic.

Config

jobs:
  - name: ec2_with_cpu_credits
    credentials:
      - name: sdk_default
        type: default
    targets:
      - name: base
        credentials: sdk_default
    rules:
      - name: ec2
        targets: [base]
        profiles:
          defaults: false
          include: [ec2]
        metrics:
          - profile: ec2
            include:
              - name: CPUCreditBalance
        regions: [us-east-1]

Different timing for metrics in one profile

Use disjoint exact metric selections when one service needs different query timing. Earlier rules own only the metric/statistic series they select, so these two Lambda rules do not shadow each other.

Config

jobs:
  - name: lambda_split_policy
    credentials:
      - name: sdk_default
        type: default
    targets:
      - name: base
        credentials: sdk_default
    rules:
      - name: lambda-activity
        targets: [base]
        profiles:
          defaults: false
          include: [lambda]
        metrics:
          - profile: lambda
            defaults: false
            statistics: [Sum]
            include:
              - name: Invocations
        regions: [us-east-1]
        query:
          period: 5m
          lookback: 30m
          publication_delay: 10m
      - name: lambda-latency
        targets: [base]
        profiles:
          defaults: false
          include: [lambda]
        metrics:
          - profile: lambda
            defaults: false
            statistics: [Average, p90]
            include:
              - name: Duration
        regions: [us-east-1]
        query:
          period: 1m
          lookback: 5m
          publication_delay: 5m
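The ownership rule above can be modeled as first-match assignment of (metric, statistic) series. The sketch below is a simplified model, not the collector's code. The `Rule` tuple and the `assign_series` helper are hypothetical, and the model ignores targets, regions, and profiles. It adds a hypothetical third rule to show what an overlap does.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical, simplified rule: a name plus the exact series it selects."""

    name: str
    series: frozenset  # (metric_name, statistic) pairs


def assign_series(rules):
    """Assign each (metric, statistic) series to the first rule that selects it."""
    owner = {}
    for rule in rules:
        for series in rule.series:
            # setdefault keeps the earlier owner; later rules only add new series.
            owner.setdefault(series, rule.name)
    return owner


rules = [
    Rule("lambda-activity", frozenset({("Invocations", "Sum")})),
    Rule("lambda-latency", frozenset({("Duration", "Average"), ("Duration", "p90")})),
    # Hypothetical later rule that overlaps with lambda-activity on one series.
    Rule("lambda-catch-all", frozenset({("Invocations", "Sum"), ("Errors", "Sum")})),
]

for (metric, stat), rule_name in sorted(assign_series(rules).items()):
    print(f"{metric}/{stat} -> {rule_name}")
```

In this model the two Lambda rules from the example never compete. The overlapping catch-all rule ends up owning only Errors/Sum, because Invocations/Sum was already claimed.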

Lower resolution to reduce cost

Collect the default profiles at five-minute resolution and refresh discovery less often. rule_defaults.query applies field by field to every rule that does not override that field, and it replaces the profiles' own timing defaults. The daily S3 storage profile is therefore excluded here, so the job-wide five-minute policy does not query it before AWS publishes. update_every controls how often charts update and how soon failed queries are retried; it does not change the normal query cost.

Config

jobs:
  - name: low_resolution
    update_every: 300
    credentials:
      - name: sdk_default
        type: default
    targets:
      - name: base
        credentials: sdk_default
    rule_defaults:
      query:
        period: 5m
        lookback: 15m
    rules:
      - name: five-minute-defaults
        targets: [base]
        profiles:
          exclude: [s3]
        regions: [us-east-1]
    discovery:
      refresh_every: 900
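Field-by-field inheritance can be pictured as a layered lookup. The sketch below is an assumption-based model, not the collector's implementation. It resolves each timing field from rules[].query, then rule_defaults.query, then the profile, and applies the documented 10m publication_delay fallback. It also checks the rule that lookback must be an exact multiple of the effective period. `effective_query` and the profile timing values are hypothetical; durations are in seconds.

```python
FIELDS = ("period", "lookback", "publication_delay")
BUILTIN_PUBLICATION_DELAY = 600  # the built-in 10m fallback, in seconds


def effective_query(profile, rule_defaults=None, rule=None):
    """Resolve query timing field by field, in seconds.

    Precedence, highest first: rules[].query, rule_defaults.query, then the
    profile's declared timing. Assumes the profile declares period and lookback.
    """
    layers = [rule or {}, rule_defaults or {}, profile]
    resolved = {}
    for name in FIELDS:
        for layer in layers:
            if name in layer:
                resolved[name] = layer[name]
                break
    resolved.setdefault("publication_delay", BUILTIN_PUBLICATION_DELAY)
    # The lookback window must contain a whole number of periods.
    if resolved["lookback"] % resolved["period"] != 0:
        raise ValueError("lookback must be an exact multiple of the effective period")
    return resolved


# Hypothetical profile timing: 1m period, 5m lookback, 5m publication delay.
profile = {"period": 60, "lookback": 300, "publication_delay": 300}
job_wide = {"period": 300, "lookback": 900}  # like the low_resolution job above

print(effective_query(profile, rule_defaults=job_wide))
```

In this model the profile's publication_delay survives because the job-wide block sets only period and lookback. Setting publication_delay in rule_defaults.query would replace it for every inheriting rule.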

AWS Billing estimated charges

Collect each available exact Billing grain independently. Billing metrics must first be enabled for the AWS account. They are published only in us-east-1 and do not support resource-tag filters or resource-tag-derived labels. The stock profiles use a 10-minute period and a 24-hour retrieval window; a rules[].query block overrides only the fields it sets.

Config

jobs:
  - name: billing_estimated_charges
    credentials:
      - name: sdk_default
        type: default
    targets:
      - name: billing
        credentials: sdk_default
    rules:
      - name: billing-grains
        targets: [billing]
        profiles:
          defaults: false
          include:
            - billing_total
            - billing_service
            - billing_linked_account
            - billing_linked_account_service
        regions: [us-east-1]
        filters:
          # Billing is not a Resource Groups Tagging API (RGTA)
          # resource; this explicitly disables any inherited
          # resource-tag filter.
          resource_tags: []

AWS PrivateLink endpoints with split timing

Collect endpoint-level Average statistics every minute, independently collect six-hour processed-byte Sum windows normalized to bytes/s, and opt into the higher-cardinality endpoint-by-subnet view. Both endpoint grains support the same VPC endpoint resource-tag filters and labels; subnet charts inherit their parent endpoint's tags.

Config

jobs:
  - name: privatelink_endpoints
    credentials:
      - name: sdk_default
        type: default
    targets:
      - name: base
        credentials: sdk_default
    rule_defaults:
      filters:
        resource_tags:
          - key: environment
            values: [production]
    rules:
      - name: endpoint-averages
        targets: [base]
        profiles:
          defaults: false
          include: [privatelink_endpoint]
        metrics:
          - profile: privatelink_endpoint
            defaults: false
            statistics: [Average]
            include:
              - name: ActiveConnections
              - name: BytesProcessed
              - name: NewConnections
        regions: [us-east-1]
        query:
          period: 1m
          lookback: 5m
          publication_delay: 5m
      - name: endpoint-six-hour-bytes
        targets: [base]
        profiles:
          defaults: false
          include: [privatelink_endpoint]
        metrics:
          - profile: privatelink_endpoint
            defaults: false
            include:
              - name: BytesProcessed
                statistics: [Sum]
        regions: [us-east-1]
        query:
          period: 6h
          lookback: 6h
          publication_delay: 5m
      - name: endpoint-subnets
        targets: [base]
        profiles:
          defaults: false
          include: [privatelink_endpoint_subnet]
        regions: [us-east-1]
    labels:
      resource_tags:
        - key: Name
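Normalizing a six-hour Sum window to bytes/s is a division by the window length in seconds. The sketch below assumes the collector divides the Sum by the effective period. The helper name and the datapoint are hypothetical.

```python
def sum_to_rate(window_sum: float, period_seconds: int) -> float:
    """Convert a CloudWatch Sum over one period into an average per-second rate."""
    if period_seconds <= 0:
        raise ValueError("period must be positive")
    return window_sum / period_seconds


SIX_HOURS = 6 * 60 * 60  # 21,600 seconds

# Hypothetical datapoint: 43.2 GB processed by one endpoint in a six-hour window.
print(sum_to_rate(43_200_000_000, SIX_HOURS))  # 2000000.0 bytes/s
```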

AWS PrivateLink services with split timing

Collect provider-side traffic averages every minute and the connected-endpoint count every five minutes, and independently collect six-hour processed-byte Sum windows normalized to bytes/s. Every PrivateLink service grain, including the opt-in service-by-consumer-endpoint view, joins tags through its parent VPC endpoint service.

Config

jobs:
  - name: privatelink_services
    credentials:
      - name: sdk_default
        type: default
    targets:
      - name: base
        credentials: sdk_default
    rule_defaults:
      filters:
        resource_tags:
          - key: environment
            values: [production]
    rules:
      - name: service-traffic-averages
        targets: [base]
        profiles:
          defaults: false
          include: [privatelink_service]
        metrics:
          - profile: privatelink_service
            defaults: false
            statistics: [Average]
            include:
              - name: ActiveConnections
              - name: BytesProcessed
              - name: NewConnections
              - name: RstPacketsSent
        regions: [us-east-1]
        query:
          period: 1m
          lookback: 5m
          publication_delay: 5m
      - name: service-endpoint-count
        targets: [base]
        profiles:
          defaults: false
          include: [privatelink_service]
        metrics:
          - profile: privatelink_service
            defaults: false
            include:
              - name: EndpointsCount
                statistics: [Average]
        regions: [us-east-1]
        query:
          period: 5m
          lookback: 5m
          publication_delay: 5m
      - name: service-six-hour-bytes
        targets: [base]
        profiles:
          defaults: false
          include: [privatelink_service]
        metrics:
          - profile: privatelink_service
            defaults: false
            include:
              - name: BytesProcessed
                statistics: [Sum]
        regions: [us-east-1]
        query:
          period: 6h
          lookback: 6h
          publication_delay: 5m
    labels:
      resource_tags:
        - key: Name

All services including opt-in profiles

Select defaults and explicitly add every disabled opt-in profile, including detailed PrivateLink endpoint/service grains and four Billing grains. Their cardinality, prerequisites, and cost guidance still apply.

Config

jobs:
  - name: defaults_and_opt_in
    credentials:
      - name: sdk_default
        type: default
    targets:
      - name: base
        credentials: sdk_default
    rules:
      - name: expanded-services
        targets: [base]
        profiles:
          defaults: true
          include:
            - alb_target
            - dynamodb_operation
            - s3_requests
            - ebs_stalled_io
            - privatelink_endpoint_subnet
            - privatelink_service_az
            - privatelink_service_load_balancer
            - privatelink_service_az_load_balancer
            - privatelink_service_vpc_endpoint
            - billing_total
            - billing_service
            - billing_linked_account
            - billing_linked_account_service
        regions: [us-east-1]

Filter resources by tag and add tag labels

Apply one job-wide exact tag filter, disable it for an unsupported profile, and expose selected AWS tags as mutable non-identity chart labels.

Config

jobs:
  - name: tagged_resources
    credentials:
      - name: sdk_default
        type: default
    targets:
      - name: base
        credentials: sdk_default
    rule_defaults:
      filters:
        resource_tags:
          - key: managed-by
            values: [platform]
    rules:
      - name: filtered-defaults
        targets: [base]
        regions: [us-east-1]
      - name: unfiltered-cloudfront
        targets: [base]
        profiles:
          defaults: false
          include: [cloudfront]
        regions: [us-east-1]
        filters:
          resource_tags: []
    labels:
      resource_tags:
        - key: Name
        - key: owner
          label: resource_owner
    limits:
      max_instances: 1000
      max_discovery_groups: 64

Present AWS metrics on a virtual node

Attach the job to a virtual node so CloudWatch metrics appear as their own Netdata node instead of on the node running the collector. The virtual node must already be defined in the Agent's vnodes configuration.

Config

jobs:
  - name: aws_production
    vnode: aws-production
    credentials:
      - name: sdk_default
        type: default
    targets:
      - name: base
        credentials: sdk_default
    rules:
      - name: base-defaults
        targets: [base]
        regions: [us-east-1]

Alerts

The following alerts are available:

Alert name	On metric	Description
aws_cloudwatch_ec2_status_check_failed	cloudwatch.ec2.status_check_failed	EC2 status check failed on ${label:instance_id}
aws_cloudwatch_ec2_attached_ebs_status_check_failed	cloudwatch.ec2.status_check_failed	EC2 attached EBS status check failed on ${label:instance_id}
aws_cloudwatch_alb_target_group_unhealthy_hosts	cloudwatch.alb_target_health.unhealthy_hosts	ALB target group has unhealthy targets on ${label:load_balancer}/${label:target_group}
aws_cloudwatch_nlb_target_group_unhealthy_hosts	cloudwatch.nlb_target_health.unhealthy_hosts	NLB target group has unhealthy targets on ${label:load_balancer}/${label:target_group}
aws_cloudwatch_ebs_stalled_io_check_failed	cloudwatch.ebs_stalled_io.stalled_io_check	EBS volume stalled I/O check failed on ${label:volume_id}; requires the opt-in ebs_stalled_io profile
aws_cloudwatch_nat_gateway_port_allocation_errors	cloudwatch.nat_gateway.errors	NAT Gateway port allocation errors on ${label:nat_gateway_id}
aws_cloudwatch_efs_io_limit_reached	cloudwatch.efs.io_limit	EFS I/O limit reached on ${label:file_system_id}
aws_cloudwatch_efs_burst_credits_exhausted	cloudwatch.efs.burst_credit	EFS burst credits exhausted on ${label:file_system_id}
aws_cloudwatch_ecs_cpu_utilization	cloudwatch.ecs.utilization	ECS service CPU utilization high on ${label:cluster_name}/${label:service_name}
aws_cloudwatch_ecs_memory_utilization	cloudwatch.ecs.utilization	ECS service memory utilization high on ${label:cluster_name}/${label:service_name}
aws_cloudwatch_ecs_ebs_filesystem_utilization	cloudwatch.ecs.ebs_filesystem_utilization	ECS EBS filesystem utilization high on ${label:cluster_name}/${label:service_name}
aws_cloudwatch_opensearch_cluster_status_red	cloudwatch.opensearch.cluster_status	OpenSearch cluster red on ${label:domain_name}
aws_cloudwatch_opensearch_cluster_status_yellow	cloudwatch.opensearch.cluster_status	OpenSearch cluster yellow on ${label:domain_name}
aws_cloudwatch_opensearch_index_writes_blocked	cloudwatch.opensearch.index_writes_blocked	OpenSearch index writes blocked on ${label:domain_name}
aws_cloudwatch_opensearch_jvm_memory_pressure	cloudwatch.opensearch.jvm_memory_pressure	OpenSearch JVM memory pressure high on ${label:domain_name}
aws_cloudwatch_opensearch_cpu_utilization	cloudwatch.opensearch.cpu	OpenSearch CPU utilization high on ${label:domain_name}
aws_cloudwatch_opensearch_automated_snapshot_failure	cloudwatch.opensearch.automated_snapshot_failure	OpenSearch automated snapshot failed on ${label:domain_name}
aws_cloudwatch_opensearch_old_gen_jvm_memory_pressure	cloudwatch.opensearch.old_gen_jvm_memory_pressure	OpenSearch old-gen JVM memory pressure high on ${label:domain_name}
aws_cloudwatch_elasticache_engine_cpu_utilization	cloudwatch.elasticache.cpu	ElastiCache engine CPU utilization high on ${label:cache_cluster_id}/${label:cache_node_id}
aws_cloudwatch_msk_active_controller_missing	cloudwatch.msk_cluster.active_controllers	MSK cluster has no active controller on ${label:cluster_name}
aws_cloudwatch_msk_multiple_active_controllers	cloudwatch.msk_cluster.active_controllers	MSK cluster has multiple active controllers on ${label:cluster_name}
aws_cloudwatch_msk_offline_partitions	cloudwatch.msk_cluster.offline_partitions	MSK cluster has offline partitions on ${label:cluster_name}
aws_cloudwatch_msk_cpu_utilization	cloudwatch.msk.cpu	MSK broker CPU utilization high on ${label:cluster_name}/${label:broker_id}
aws_cloudwatch_msk_data_logs_disk_used	cloudwatch.msk.disk_used	MSK broker data-log disk utilization high on ${label:cluster_name}/${label:broker_id}
aws_cloudwatch_msk_heap_memory_after_gc	cloudwatch.msk.heap_memory_after_gc	MSK broker heap memory after GC high on ${label:cluster_name}/${label:broker_id}
aws_cloudwatch_msk_under_replicated_partitions	cloudwatch.msk.partitions	MSK broker has sustained under-replicated partitions on ${label:cluster_name}/${label:broker_id}
aws_cloudwatch_msk_under_min_isr_partitions	cloudwatch.msk.under_min_isr	MSK broker has partitions below minimum ISR on ${label:cluster_name}/${label:broker_id}
aws_cloudwatch_rds_replica_lag	cloudwatch.rds.replica_lag	RDS replica lag high on ${label:db_instance_identifier}
aws_cloudwatch_rds_maximum_used_transaction_ids	cloudwatch.rds.maximum_used_transaction_ids	RDS transaction ID usage high on ${label:db_instance_identifier}
aws_cloudwatch_rds_ebs_byte_balance	cloudwatch.rds.ebs_balance	RDS EBS byte balance low on ${label:db_instance_identifier}
aws_cloudwatch_rds_ebs_io_balance	cloudwatch.rds.ebs_balance	RDS EBS I/O balance low on ${label:db_instance_identifier}
aws_cloudwatch_vpn_tunnel_down	cloudwatch.vpn.tunnel_state	VPN tunnel down on ${label:vpn_id}
aws_cloudwatch_sns_invalid_notification_attributes	cloudwatch.sns.invalid_notifications	SNS invalid notification attributes on ${label:topic_name}
aws_cloudwatch_sns_invalid_notification_body	cloudwatch.sns.invalid_notifications	SNS invalid notification message body on ${label:topic_name}
aws_cloudwatch_sns_notifications_redriven_to_dlq	cloudwatch.sns.dlq_redrive	SNS notifications redriven to DLQ on ${label:topic_name}
aws_cloudwatch_sns_notifications_failed_to_redrive_to_dlq	cloudwatch.sns.dlq_redrive	SNS notifications failed to redrive to DLQ on ${label:topic_name}

Metrics

Metrics grouped by scope.

The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.

Charts are generated at runtime from the active service profiles:

Each static or discovered AWS instance becomes a chart instance identified by its account_id, region, and the profile's identifying dimensions (for example instance_id for EC2, or bucket_name and storage_type for S3).
All contexts live under the cloudwatch.* namespace.
Metrics land on the job's configured vnode when present, otherwise on the node running the collector. Individual AWS resources are distinguished by labels, not created as separate Netdata nodes.
CloudWatch publishes with a delay, so allow a few minutes for the first datapoints.

Every job also emits three collector-activity chart types in the same cloudwatch.* namespace: cloudwatch.collector_api_calls (labeled operation instances, calls dimension), cloudwatch.collector_metric_requests (requests dimension), and cloudwatch.collector_queries (labeled profile instances, queries dimension). They report absolute counts for the interval since the preceding successfully committed collector frame; they measure collector-issued work, not an AWS invoice.

The built-in profiles ship the following charts by default. Each service's profile is the authoritative definition of its exact metrics, statistics, dimensions, and charts:

Profile	Metric prefix	Description
Amazon EC2	`cloudwatch.ec2.*`	CPU utilization, network traffic, disk operations, status-check failures, attached-EBS status-check failures
Amazon RDS	`cloudwatch.rds.*`	CPU utilization, database connections, freeable memory, swap usage, free storage space, disk queue depth, disk and network throughput, IOPS, latency, replica lag, PostgreSQL transaction ID usage, EBS credit balance
Classic Load Balancer (ELB)	`cloudwatch.elb.*`	request count, backend and load-balancer response codes, backend connection errors, latency, host count, spillover count
Application Load Balancer (ALB)	`cloudwatch.alb.*`	request count, target and load-balancer response codes, connection rate, active connections, processed traffic, target response time, consumed LCUs
ALB Target Health	`cloudwatch.alb_target_health.*`	per-target-group unhealthy host count
Network Load Balancer (NLB)	`cloudwatch.nlb.*`	active and new flow counts, processed bytes and packets, consumed LCUs, TCP resets
NLB Target Health	`cloudwatch.nlb_target_health.*`	per-target-group unhealthy host count
Amazon S3	`cloudwatch.s3.*`	bucket size, number of objects (daily storage metrics)
AWS Lambda	`cloudwatch.lambda.*`	invocations, errors and throttles, duration
Amazon SQS	`cloudwatch.sqs.*`	message throughput, empty receives, queue depth, maximum age of oldest message, sent message size
Amazon DynamoDB	`cloudwatch.dynamodb.*`	consumed and provisioned capacity, throttle events
Amazon API Gateway	`cloudwatch.api_gateway.*`	requests, errors, latency
AWS Step Functions	`cloudwatch.step_functions.*`	executions, throttled events, execution time
NAT Gateway	`cloudwatch.nat_gateway.*`	traffic, active connections, connection rate, errors, idle timeouts
AWS PrivateLink endpoints	`cloudwatch.privatelink_endpoint.*`	endpoint-level active and new connections, processed bytes, dropped packets, and received reset packets
AWS PrivateLink endpoint services	`cloudwatch.privatelink_service.*`	provider-side active and new connections, connected endpoints, processed bytes, and sent reset packets
Amazon Kinesis Data Streams	`cloudwatch.kinesis.*`	data throughput, records, GetRecords iterator age, operation latency, throughput exceeded, PutRecords rejected
Amazon Data Firehose	`cloudwatch.firehose.*`	records, throughput, put requests, throttled records, S3 delivery freshness and success
Amazon SNS	`cloudwatch.sns.*`	messages published, notifications, invalid notification filters, DLQ redrive, published message size
Amazon EBS	`cloudwatch.ebs.*`	volume throughput, IOPS, queue length, idle time, burst balance
Amazon EFS	`cloudwatch.efs.*`	I/O throughput, metered vs permitted throughput, percent I/O limit, burst credit balance, client connections
Amazon ECS	`cloudwatch.ecs.*`	service utilization, EBS filesystem utilization, live task count
Amazon ElastiCache	`cloudwatch.elasticache.*`	CPU utilization, memory, database memory usage, current and new connections, cache hits and misses, evictions, network traffic
Amazon OpenSearch Service	`cloudwatch.opensearch.*`	cluster status, index writes blocked, automated snapshot failures, nodes, CPU utilization, JVM memory pressure, old-gen JVM memory pressure, free storage space, search and indexing rate, search and indexing latency
Amazon DocumentDB	`cloudwatch.docdb.*`	CPU utilization, freeable memory, connections, buffer cache hit ratio, disk IOPS, latency, throughput, replica lag, cursors timed out
Amazon Redshift	`cloudwatch.redshift.*`	health, CPU utilization, disk space used, database connections, disk IOPS, throughput, network throughput
Amazon MSK	`cloudwatch.msk.*`	broker throughput, messages in, CPU, disk used, memory, heap memory after GC, partitions, under-min-ISR partitions, connections
Amazon MSK Cluster	`cloudwatch.msk_cluster.*`	active controllers and offline partitions
Amazon CloudFront	`cloudwatch.cloudfront.*`	requests, downloaded and uploaded traffic, total/4xx/5xx error rates
AWS Auto Scaling	`cloudwatch.auto_scaling.*`	group sizing (min/max/desired/total) and instances by state (in-service, pending, standby, terminating)
Amazon Bedrock	`cloudwatch.bedrock.*`	invocations, invocation errors, token throughput, invocation and time-to-first-token latency
Amazon EventBridge	`cloudwatch.eventbridge.*`	target invocations, rule activity (matched events, triggered rules), ingestion-to-invocation latency
AWS Site-to-Site VPN	`cloudwatch.vpn.*`	tunnel traffic (in/out) and tunnel state (fraction of tunnels up)
Amazon EKS	`cloudwatch.eks.*`	control-plane health: API server request rate, errors, p99 latency, and in-flight requests; etcd database size; scheduler pending pods and scheduling attempts

Stock profiles can also declare opt-in metrics (disabled: true on the metric). They are not queried or billed by default, and their chart definitions are already part of the profile:

Enable one through that profile's rules[].metrics[].include; the chart materializes once the selected series emits data.
The metric group keeps the profile's default-enabled metrics unless defaults: false, and omitted statistics inherit the profile declaration.
There is no need to copy and edit a stock profile merely to enable a curated metric.
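The selection rules above can be modeled as a small function over a profile's declared metrics. This is a sketch under stated assumptions, not the collector's code. `ProfileMetric`, `select_series`, and the two-metric EC2 slice are hypothetical. The model assumes that kept default metrics use the profile's own statistics, and that include[].name must be declared by the profile.

```python
from dataclasses import dataclass


@dataclass
class ProfileMetric:
    """Hypothetical view of one metric declared in a stock profile."""

    name: str
    statistics: tuple
    disabled: bool = False  # opt-in metrics declare disabled: true


def select_series(profile_metrics, group=None):
    """Return the (metric, statistic) pairs that one metric group selects."""
    group = group or {}
    declared = {m.name: m for m in profile_metrics}
    selected = set()
    if group.get("defaults", True):
        # Assumption: kept default metrics use the profile's own statistics.
        for metric in profile_metrics:
            if not metric.disabled:
                selected.update((metric.name, s) for s in metric.statistics)
    for entry in group.get("include", []):
        if entry["name"] not in declared:
            raise KeyError(f"{entry['name']} is not declared by this profile")
        # Statistics: include entry, then group, then profile declaration.
        stats = (entry.get("statistics") or group.get("statistics")
                 or declared[entry["name"]].statistics)
        selected.update((entry["name"], s) for s in stats)
    return selected


# Hypothetical two-metric slice of an EC2-like profile.
ec2 = [
    ProfileMetric("CPUUtilization", ("Average",)),
    ProfileMetric("CPUCreditBalance", ("Average",), disabled=True),
]

print(sorted(select_series(ec2, {"include": [{"name": "CPUCreditBalance"}]})))
```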

Stock profiles are shipped at /usr/lib/netdata/conf.d/go.d/cloudwatch.profiles/default/; custom profile overrides live under /etc/netdata/go.d/cloudwatch.profiles/ and require a go.d restart because the catalog is cached process-wide.

These disabled opt-in profiles are collected when a rule names them in profiles.include:

Profile	Metric prefix	Description
ALB Target Groups	`cloudwatch.alb_target.*`	per-target-group host count, requests per target, response time, response codes, connection errors
DynamoDB Operations	`cloudwatch.dynamodb_operation.*`	per-operation successful request latency, system errors, throttled requests, returned items
EBS Stalled I/O	`cloudwatch.ebs_stalled_io.*`	per-volume stalled I/O health check
S3 Request Metrics	`cloudwatch.s3_requests.*`	requests, request errors, request latency, request data transfer
AWS PrivateLink endpoints by subnet	`cloudwatch.privatelink_endpoint_subnet.*`	the endpoint metrics split by `subnet_id`; one endpoint can produce several chart instances
AWS PrivateLink services by Availability Zone	`cloudwatch.privatelink_service_az.*`	provider-side traffic split by `availability_zone` and `service_id`
AWS PrivateLink services by load balancer	`cloudwatch.privatelink_service_load_balancer.*`	provider-side traffic split by `load_balancer_arn` and `service_id`
AWS PrivateLink services by Availability Zone and load balancer	`cloudwatch.privatelink_service_az_load_balancer.*`	provider-side traffic split by `availability_zone`, `load_balancer_arn`, and `service_id`
AWS PrivateLink services by VPC endpoint	`cloudwatch.privatelink_service_vpc_endpoint.*`	provider-side traffic split by `service_id` and consumer `vpc_endpoint_id`
AWS Billing total	`cloudwatch.billing_total.*`	latest worldwide estimated month-to-date charge; identity labels: `account_id`, `region`
AWS Billing by service	`cloudwatch.billing_service.*`	estimated charges by `service_name`
AWS Billing by linked account	`cloudwatch.billing_linked_account.*`	estimated charges by `linked_account_id` when the payer/management account publishes this grain
AWS Billing by linked account and service	`cloudwatch.billing_linked_account_service.*`	estimated charges by `linked_account_id` and `service_name` when available

Billing grains. The Billing profiles are exact grains rather than one wildcard, so select only the views you need. All of them use the EstimatedCharges metric with the Maximum statistic, report in USD, and use a 10-minute period with a 24-hour retrieval window.

AWS publishes the underlying estimate several times daily; the collector re-emits the newest successful value between publications.
Around the UTC month boundary, a prior-month value can remain visible until a successful query replaces or expires it, and transient AWS failures can retain it longer.
region=us-east-1 identifies the CloudWatch publication/query location; the charge itself is worldwide.
Billing dimensions are not Resource Groups Tagging API resources, so resource-tag filters and tag-derived labels do not apply.
A valid job can produce no Billing chart when AWS has not published the selected grain.

PrivateLink endpoint grains. The default privatelink_endpoint profile identifies one endpoint by endpoint_type, service_name, vpc_endpoint_id, and vpc_id. The opt-in privatelink_endpoint_subnet profile adds subnet_id; enable it deliberately, because every endpoint can produce several subnet chart instances and seven additional metric/statistic queries per instance. Both profiles share one AWS/PrivateLinkEndpoints discovery scan and one VPC endpoint tag association, so endpoint resource-tag filters and labels also apply to every subnet child. The stock surface exports Average and per-second Sum views where both are useful; exact metric/statistic rules can assign different timing without shadowing siblings.

PrivateLink service grains. The default privatelink_service profile identifies the provider service by service_id and is the only grain that exports EndpointsCount. The opt-in profiles split by availability_zone, load_balancer_arn, both, or consumer vpc_endpoint_id. All five share one AWS/PrivateLinkServices discovery scan and join tags through the parent endpoint service (ec2:vpc-endpoint-service). The collector deliberately does not attach endpoint or load-balancer tags to detailed children -- a vpc_endpoint_id can identify a consumer endpoint outside the service-owning account. EndpointsCount is read on its documented five-minute cadence and records zero when no datapoint is published; traffic gauges show gaps when absent.

PrivateLink cost. At stock timing (five-minute period, lookback, and publication delay), an endpoint, subnet, or default service instance has five structural CloudWatch metrics: (86,400 / 300) × 5 = 1,440 billable metric requests per day before retries. Each opt-in detailed service instance has four, or 1,152 per day. A one-minute override runs its selected metrics five times as often; narrow profiles, metrics, regions, grains, and resource tags when that freshness is not required.
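The cost arithmetic above generalizes to any period and instance count. The sketch below assumes every structural metric is requested once per period, as in that calculation. `daily_metric_requests` is a hypothetical helper, and the 40-instance fleet is an illustrative number.

```python
SECONDS_PER_DAY = 86_400


def daily_metric_requests(metrics_per_instance, period_seconds, instances=1):
    """Billable metric requests per day, before retries.

    Assumes every structural metric is requested once per period.
    """
    if SECONDS_PER_DAY % period_seconds:
        raise ValueError("period must divide one day evenly")
    return (SECONDS_PER_DAY // period_seconds) * metrics_per_instance * instances


print(daily_metric_requests(5, 300))  # 1440: endpoint, subnet, or default service instance
print(daily_metric_requests(4, 300))  # 1152: one opt-in detailed service instance
print(daily_metric_requests(5, 60))   # 7200: the same five metrics at a one-minute override
print(daily_metric_requests(5, 300, instances=40))  # 57600: hypothetical 40 endpoint instances
```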

Cardinality warning. These opt-in profiles include potentially high-cardinality data. S3 Request Metrics additionally require per-bucket request-metrics configuration in AWS and are billed at CloudWatch custom-metric rates; they collect nothing until enabled on the bucket. PrivateLink cardinality grows with endpoint subnets, service Availability Zones, load balancers, and consumer endpoints; the combined Availability Zone/load-balancer grain multiplies those dimensions. The Billing service/account grains grow with the payer's services and linked accounts.

Per AWS account, region, and operation

Collector-issued CloudWatch API work attributed to one resolved AWS account, region, and API operation.

Labels:

Label	Description
account_id	Resolved AWS account ID.
region	AWS region where the collector issued the operation.
operation	Collector-issued CloudWatch API operation (`list_metrics` or `get_metric_data`).

Metrics:

Metric	Dimensions	Unit
cloudwatch.collector_api_calls	calls	calls

Per AWS account and region

Calculated billable CloudWatch metric requests attributed to one resolved AWS account and region.

Labels:

Label	Description
account_id	Resolved AWS account ID.
region	AWS region where the collector submitted the metric requests.

Metrics:

Metric	Dimensions	Unit
cloudwatch.collector_metric_requests	requests	requests

Per AWS account, region, and profile

Raw CloudWatch metric-data queries attributed to their source profile for collection-plan tuning.

Labels:

Label	Description
account_id	Resolved AWS account ID.
region	AWS region where the collector submitted the queries.
profile	CloudWatch profile that produced the submitted raw queries.

Metrics:

Metric	Dimensions	Unit
cloudwatch.collector_queries	queries	queries

Troubleshooting

Debug Mode

Important: Debug mode is not supported for data collection jobs created via the UI using the Dyncfg feature.

To troubleshoot issues with the cloudwatch collector, run the go.d.plugin with the debug option enabled. The output should give you clues as to why the collector isn't working.

Navigate to the plugins.d directory, usually at /usr/libexec/netdata/plugins.d/. If that's not the case on your system, open netdata.conf and look for the plugins setting under [directories].
```
cd /usr/libexec/netdata/plugins.d/
```
Switch to the netdata user.
```
sudo -u netdata -s
```

Run the go.d.plugin to debug the collector:

```
./go.d.plugin -d -m cloudwatch
```

To debug a specific job:

```
./go.d.plugin -d -m cloudwatch -j jobName
```

Getting Logs

If you're encountering problems with the cloudwatch collector, follow these steps to retrieve logs and identify potential issues:

Run the command specific to your system (systemd, non-systemd, or Docker container).
Examine the output for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.

System with systemd

Use the following command to view logs generated since the last Netdata service restart:

```
journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep cloudwatch
```

System without systemd

Locate the collector log file, typically at /var/log/netdata/collector.log, and use grep to filter for the collector's name:

```
grep cloudwatch /var/log/netdata/collector.log
```

Note: This method shows logs from all restarts. Focus on the latest entries for troubleshooting current issues.

Docker Container

If your Netdata runs in a Docker container named "netdata" (replace if different), use this command:

```
docker logs netdata 2>&1 | grep cloudwatch
```

No metrics are collected

Check the following:

Permissions -- every target must allow cloudwatch:GetMetricData; targets selecting any profile with dynamic dimensions also require cloudwatch:ListMetrics. All-constant profiles such as billing_total skip discovery and do not need ListMetrics. sts:GetCallerIdentity needs no explicit grant. Targets with assume_role also require sts:AssumeRole on the source identity. Resource-tag filters and tag-derived labels require tag:GetResources.
Rules -- rules[].targets, rules[].profiles, optional rules[].metrics, and rules[].regions select the expected target, service, exact metric/statistic, and region. CloudFront publishes metrics only in us-east-1; its profile enforces this automatically.
Resource filters -- a rule that omits filters.resource_tags inherits rule_defaults.filters.resource_tags. An explicitly included profile without a safe Resource Groups Tagging API association is rejected; use filters.resource_tags: [] for a deliberate unfiltered rule.
Resources are active -- confirm in the AWS CloudWatch console that the resources are publishing metrics.
Timeout -- discovery shares one timeout (default 30 seconds) across a whole refresh. A large scope can exhaust it; the refresh is then discarded and retried after discovery.refresh_every. Raise timeout or narrow the scope.

Collector logs -- check for authentication or API errors:

```
# systemd
journalctl -u netdata --namespace=netdata --grep cloudwatch --since "5 minutes ago"
# non-systemd
grep cloudwatch /var/log/netdata/collector.log
```
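The permissions checklist can be turned into IAM policy documents. The sketch below applies the rules from that checklist. The helper names are hypothetical, and the role ARN keeps this page's [ACCOUNT]/[ROLE] placeholders. CloudWatch GetMetricData and ListMetrics, and tag:GetResources, do not support resource-level permissions, hence "Resource": "*". The assumed role's trust policy, which must also allow the source identity, is outside this sketch.

```python
import json


def monitoring_policy(dynamic_dimensions=True, resource_tags=False):
    """IAM policy for the identity that queries CloudWatch (the assumed role, if any)."""
    actions = ["cloudwatch:GetMetricData"]
    if dynamic_dimensions:
        # Needed by any selected profile with dynamic dimensions (discovery).
        actions.append("cloudwatch:ListMetrics")
    if resource_tags:
        # Needed for filters.resource_tags or labels.resource_tags.
        actions.append("tag:GetResources")
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": "*"}],
    }


def source_identity_policy(role_arn):
    """Extra policy for the source identity of a target that uses assume_role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": "sts:AssumeRole", "Resource": role_arn}],
    }


print(json.dumps(monitoring_policy(resource_tags=True), indent=2))
print(json.dumps(source_identity_policy("arn:aws:iam::[ACCOUNT]:role/[ROLE]"), indent=2))
```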

Missing metrics for some services

Profile selection -- omit rules[].profiles to select defaults, or ensure the service basename appears under rules[].profiles.include and is not excluded.
Metric selection -- omit rules[].metrics to collect default-enabled metrics. A group changes only its named profile and keeps those defaults unless defaults: false; every include[].name must be an exact AWS MetricName. Statistics may come from the metric entry, group, or profile declaration.
Daily metrics -- AWS documents that S3 storage metrics are reported once per day, but does not state a publication-within-one-day guarantee. The stock profile therefore uses a conservative 1d collector delay policy, and recently_active_only is automatically disabled for its long query horizon.
Resource activity -- some metrics only appear when the resource is actively processing data (for example, EventBridge and Bedrock publish a metric only when its value is non-zero).
Auto Scaling group metrics -- Auto Scaling group metrics (cloudwatch.auto_scaling.*) are not published until group-metrics collection is enabled on the group (aws autoscaling enable-metrics-collection --auto-scaling-group-name [GROUP_NAME] --granularity 1Minute). Amazon EKS managed node groups have it enabled by default.
EKS control-plane metrics -- EKS control-plane metrics (cloudwatch.eks.*) are published to the AWS/EKS namespace automatically, at no additional EKS charge, only for clusters running Kubernetes 1.28 or later; older clusters do not report them. These are distinct from Container Insights / the CloudWatch Observability add-on (agent-based, billed separately).
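You can think of the metric-selection rules as a merge over the metrics one profile declares. The Go sketch below rests on several assumptions. The types and functions are hypothetical, and the profile contents are made up. Statistics are assumed to resolve in the order metric entry, then group, then profile. An include name that the profile does not declare, including a name that differs only in case, is assumed to be an error.

```go
package main

import (
	"fmt"
	"sort"
)

// profileMetric is a hypothetical view of one metric declared in a profile.
type profileMetric struct {
	Default    bool     // collected unless a group sets defaults: false
	Statistics []string // profile-level statistics
}

// includeEntry and metricsGroup are hypothetical shapes for rules[].metrics.
type includeEntry struct {
	Name       string   // exact, case-sensitive AWS MetricName
	Statistics []string // optional per-metric statistics
}

type metricsGroup struct {
	Defaults   *bool // nil means "not set", so profile defaults are kept
	Statistics []string
	Include    []includeEntry
}

// firstNonEmpty applies the assumed precedence: entry, group, profile.
func firstNonEmpty(lists ...[]string) []string {
	for _, l := range lists {
		if len(l) > 0 {
			return l
		}
	}
	return nil
}

// selectMetrics resolves metric -> statistics for one profile.
func selectMetrics(profile map[string]profileMetric, g *metricsGroup) (map[string][]string, error) {
	out := map[string][]string{}
	if g == nil || g.Defaults == nil || *g.Defaults {
		for name, m := range profile {
			if m.Default {
				out[name] = m.Statistics
			}
		}
	}
	if g == nil {
		return out, nil
	}
	for _, inc := range g.Include {
		m, ok := profile[inc.Name] // no case folding: the name must match exactly
		if !ok {
			return nil, fmt.Errorf("metric %q is not an exact MetricName in this profile", inc.Name)
		}
		out[inc.Name] = firstNonEmpty(inc.Statistics, g.Statistics, m.Statistics)
	}
	return out, nil
}

// keys returns sorted map keys for stable printing.
func keys(m map[string][]string) []string {
	ks := make([]string, 0, len(m))
	for k := range m {
		ks = append(ks, k)
	}
	sort.Strings(ks)
	return ks
}

func main() {
	profile := map[string]profileMetric{
		"CPUUtilization":   {Default: true, Statistics: []string{"Average"}},
		"NetworkIn":        {Default: true, Statistics: []string{"Sum"}},
		"CPUCreditBalance": {Default: false, Statistics: []string{"Average"}},
	}

	// Adding an opt-in metric keeps the profile's defaults.
	got, _ := selectMetrics(profile, &metricsGroup{
		Include: []includeEntry{{Name: "CPUCreditBalance", Statistics: []string{"Minimum"}}},
	})
	fmt.Println(keys(got), got["CPUCreditBalance"]) // [CPUCreditBalance CPUUtilization NetworkIn] [Minimum]

	// defaults: false keeps only the explicit includes.
	off := false
	got, _ = selectMetrics(profile, &metricsGroup{Defaults: &off, Statistics: []string{"Maximum"},
		Include: []includeEntry{{Name: "CPUUtilization"}}})
	fmt.Println(keys(got), got["CPUUtilization"]) // [CPUUtilization] [Maximum]

	// In this sketch, a near-miss name is rejected rather than ignored.
	_, err := selectMetrics(profile, &metricsGroup{Include: []includeEntry{{Name: "cpuutilization"}}})
	fmt.Println(err != nil) // true
}
```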

Charts have gaps or incomplete data

CloudWatch publishes metrics with a delay.

Keep rules[].query.period at or above the metric's real publication cadence. A shorter override increases billed query frequency but cannot make AWS publish more often, so it can create empty windows.
Set rules[].query.publication_delay when a workload publishes completed buckets later than its profile's delay or the built-in 10m fallback allows.
Check rule_defaults.query.publication_delay before overriding an individual rule. A job-wide value replaces profile-specific delays for every inheriting rule, including the stock S3 storage profile's conservative 1d. A shorter value can therefore query daily data before it is published.
Set rules[].query.lookback to search a wider rolling window for sparse datapoints. It must be an exact multiple of the effective period.
Stale-looking values -- a successful query presents its newest eligible datapoint on every Netdata collection cycle, so an old CloudWatch value can appear current for up to the lookback window. During transient AWS failures, the value can be replayed for longer, until a successful query replaces or expires it.
Longer lookbacks increase response work and may disable recently_active_only for the shared discovery scan.
Transient query failures preserve the retained value and retry after one update_every; later delays double within the same eligible window up to the effective period. A newly eligible window resets the backoff.

Discovery group limit exceeded

A discovery group is one unique (target, region, namespace) combination. Compatible rules and profiles share the same group. limits.max_discovery_groups defaults to 64 to bound accidental ListMetrics expansion.

This can result from accidental target/region/profile expansion or from a legitimate large installation. Verify the derived scope first. For intentional scale, raise the safeguard up to 100. Beyond 100, split the configuration across multiple jobs. One bounded refresh admits at most 100 ListMetrics SDK operations, and every non-skipped group that resolves a client has its first operation admitted before any continuation (paginated) operations. Skipped groups and client-resolution failures consume no operation budget. Splitting preserves metric coverage while keeping each job's discovery cost, memory, and completion time bounded.
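A discovery group is keyed by (target, region, namespace). The group count therefore grows as the product of the targets, regions, and namespaces each rule expands to, and fully overlapping rules add nothing. The Go sketch below uses hypothetical types. It ignores any further compatibility conditions the collector may apply, and it assumes that a configured limit above 100 is rejected.

```go
package main

import "fmt"

// groupKey is one discovery group: a unique (target, region, namespace).
type groupKey struct {
	Target, Region, Namespace string
}

// ruleScope is a hypothetical, fully expanded view of one rule's selection.
type ruleScope struct {
	Targets, Regions, Namespaces []string
}

// ceilingGroups is the documented maximum for limits.max_discovery_groups.
const ceilingGroups = 100

// countGroups deduplicates groups across rules and enforces the limit.
func countGroups(rules []ruleScope, maxGroups int) (int, error) {
	if maxGroups > ceilingGroups {
		return 0, fmt.Errorf("max_discovery_groups %d is above %d; split across jobs", maxGroups, ceilingGroups)
	}
	seen := make(map[groupKey]struct{})
	for _, r := range rules {
		for _, t := range r.Targets {
			for _, reg := range r.Regions {
				for _, ns := range r.Namespaces {
					seen[groupKey{t, reg, ns}] = struct{}{}
				}
			}
		}
	}
	if len(seen) > maxGroups {
		return len(seen), fmt.Errorf("%d discovery groups exceed limit %d", len(seen), maxGroups)
	}
	return len(seen), nil
}

func main() {
	targets := []string{"prod", "staging", "dev"}
	regions := []string{"us-east-1", "us-west-2", "eu-west-1", "eu-central-1"}
	namespaces := []string{"AWS/EC2", "AWS/EBS", "AWS/RDS", "AWS/Lambda", "AWS/S3", "AWS/SQS"}
	rules := []ruleScope{
		{Targets: targets, Regions: regions, Namespaces: namespaces},
		// Fully overlaps the first rule, so it adds no groups.
		{Targets: targets[:1], Regions: regions[:1], Namespaces: namespaces[:2]},
	}

	fmt.Println(countGroups(rules, 64))  // 72 72 discovery groups exceed limit 64
	fmt.Println(countGroups(rules, 100)) // 72 <nil>
}
```

In this example, three targets across four regions and six namespaces already exceed the default of 64. Dropping one region brings the count to 54.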

Job fails to start or the plan is rejected

Configuration validation and plan preflight fail loudly instead of silently truncating; the collector log names the exact bound or reference that failed.

Names and references -- credential, target, and rule names must start with a lowercase letter, use only lowercase letters, digits, _, and -, and be at most 64 characters long. Every target and rule reference must point to a defined name.
Partition mismatch -- all regions selected for one target must belong to one AWS partition, and an assumed-role ARN must match it. Split partitions across targets or jobs.
Plan size -- a plan with more than 20,000 selected series, more than 600,000 datapoints due in one cycle, or more than 40 batched GetMetricData requests is rejected. Narrow the rules or split the configuration across jobs.
limits.max_instances -- a refreshed plan with more distinct final instances than the limit (default 1000) is rejected. Raise the limit deliberately or narrow the selection; instances are never silently truncated.
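The preflight checks above can be sketched in Go. The name pattern and the numeric bounds come from this page. The `plan` type is hypothetical, and the request count is an assumption. GetMetricData accepts at most 500 queries per call and takes a single start and end time per call, so this sketch packs each query window separately. As a result, many small windows can exceed the request bound long before the series bound.

```go
package main

import (
	"fmt"
	"regexp"
)

// namePattern encodes the documented rule for credential, target, and rule
// names: lowercase, starts with a letter, only a-z, 0-9, _ and -, max 64 chars.
var namePattern = regexp.MustCompile(`^[a-z][a-z0-9_-]{0,63}$`)

const (
	maxSeries     = 20000  // selected series per plan
	maxDatapoints = 600000 // datapoints due in one cycle
	maxRequests   = 40     // batched GetMetricData requests
	// GetMetricData's per-call query limit; packing at this size is assumed.
	queriesPerRequest = 500
)

// plan is a hypothetical summary: series counts grouped by query window
// (one start/end time per GetMetricData call), plus totals.
type plan struct {
	SeriesPerWindow    []int
	DatapointsPerCycle int
	Instances          int
}

// preflight fails loudly on the first violated bound instead of truncating.
func preflight(names []string, p plan, maxInstances int) error {
	for _, n := range names {
		if !namePattern.MatchString(n) {
			return fmt.Errorf("invalid name %q", n)
		}
	}
	series, requests := 0, 0
	for _, n := range p.SeriesPerWindow {
		series += n
		requests += (n + queriesPerRequest - 1) / queriesPerRequest // ceil(n/500)
	}
	switch {
	case series > maxSeries:
		return fmt.Errorf("%d series exceed %d", series, maxSeries)
	case p.DatapointsPerCycle > maxDatapoints:
		return fmt.Errorf("%d datapoints per cycle exceed %d", p.DatapointsPerCycle, maxDatapoints)
	case requests > maxRequests:
		return fmt.Errorf("%d GetMetricData requests exceed %d", requests, maxRequests)
	case p.Instances > maxInstances:
		return fmt.Errorf("%d instances exceed limits.max_instances %d", p.Instances, maxInstances)
	}
	return nil
}

func main() {
	names := []string{"prod", "rule_ec2", "staging-eu"}
	ok := plan{SeriesPerWindow: []int{1200, 300}, DatapointsPerCycle: 9000, Instances: 400}
	fmt.Println(preflight(names, ok, 1000)) // <nil>

	fmt.Println(preflight([]string{"Prod"}, plan{}, 1000)) // invalid name "Prod"

	// 41 distinct query windows with 10 series each: only 410 series,
	// but 41 requests under this sketch's packing assumption.
	many := make([]int, 41)
	for i := range many {
		many[i] = 10
	}
	fmt.Println(preflight(names, plan{SeriesPerWindow: many}, 1000)) // 41 GetMetricData requests exceed 40
}
```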

Access denied or authentication errors

Verify the credential source referenced by the failing target is valid and not expired.
For a target with assume_role, confirm its source identity is allowed to assume the role and that the role trust policy permits it. If the trust policy requires an external ID, use the value supplied by the role owner.
For AWS GovCloud or China partitions, ensure each target's selected rule regions match its role ARN partition.
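One way to reason about partition errors is to compare the partition implied by each region with the partition field of the role ARN. The Go sketch below covers only the three partitions named on this page. It uses region-name prefixes (`us-gov-`, `cn-`) as a simplification. The function names and the role name are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// partitionOf maps a region to its partition by name prefix. This covers
// only the three partitions named on this page.
func partitionOf(region string) string {
	switch {
	case strings.HasPrefix(region, "us-gov-"):
		return "aws-us-gov"
	case strings.HasPrefix(region, "cn-"):
		return "aws-cn"
	default:
		return "aws"
	}
}

// arnPartition returns the partition field of an ARN
// (arn:partition:service:region:account-id:resource).
func arnPartition(arn string) (string, error) {
	parts := strings.SplitN(arn, ":", 6)
	if len(parts) != 6 || parts[0] != "arn" {
		return "", fmt.Errorf("malformed ARN %q", arn)
	}
	return parts[1], nil
}

// checkTarget applies the documented rule: one partition per target, and an
// assumed-role ARN (if any) in that same partition.
func checkTarget(regions []string, roleARN string) error {
	if len(regions) == 0 {
		return nil
	}
	want := partitionOf(regions[0])
	for _, r := range regions[1:] {
		if got := partitionOf(r); got != want {
			return fmt.Errorf("region %s is in partition %s, expected %s", r, got, want)
		}
	}
	if roleARN == "" {
		return nil
	}
	got, err := arnPartition(roleARN)
	if err != nil {
		return err
	}
	if got != want {
		return fmt.Errorf("role ARN partition %s does not match region partition %s", got, want)
	}
	return nil
}

func main() {
	role := "arn:aws:iam::123456789012:role/netdata-read" // hypothetical role
	fmt.Println(checkTarget([]string{"us-east-1", "eu-west-1"}, role))
	// <nil>
	fmt.Println(checkTarget([]string{"us-gov-west-1", "us-east-1"}, ""))
	// region us-east-1 is in partition aws, expected aws-us-gov
	fmt.Println(checkTarget([]string{"cn-north-1"}, role))
	// role ARN partition aws does not match region partition aws-cn
}
```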
