
Prometheus and CloudWatch Integration for AWS Metric Collection

Understand how to collect and query AWS CloudWatch metrics in Prometheus using the CloudWatch exporter, including setup, IAM configuration, and best practices.

Jun 26th, ‘25

The Prometheus CloudWatch exporter pulls AWS CloudWatch metrics into your Prometheus setup, giving you a unified view of your infrastructure alongside application metrics.

If you're already running Prometheus and need visibility into AWS services like EC2, RDS, or Lambda, this exporter handles the integration without forcing you to switch monitoring stacks.

This post covers how to configure the exporter, which metrics matter most, and how to avoid the common pitfalls that can spike your CloudWatch API costs.

Why CloudWatch Is a Smart Backend for Prometheus Workloads

AWS CloudWatch stores detailed metrics about your infrastructure, but accessing them through separate dashboards fragments your monitoring. The CloudWatch exporter brings these metrics into Prometheus, where you can correlate them with application metrics, set up unified alerting rules, and build comprehensive dashboards.

The exporter works by querying CloudWatch APIs on a schedule and converting the data into Prometheus format. This means you get AWS infrastructure metrics alongside your custom application metrics in the same time-series database.
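
As a rough illustration, a converted series on the exporter's /metrics endpoint might look like the line below. The metric name follows the aws_<namespace>_<metric>_<statistic> pattern described later in this post; the label name is an assumption based on how dimensions are typically mapped, so check your own exporter output for the exact spelling.

# Hypothetical sample of one converted CloudWatch series
aws_ec2_cpuutilization_average{instance_id="i-1234567890abcdef0"} 42.5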

💡
For a look at how to handle logs across AWS accounts in a similar way, check out this guide on centralized AWS logging.

Getting the CloudWatch Exporter Running on Your Infrastructure

The Prometheus CloudWatch exporter runs as a separate service that Prometheus scrapes. You'll need AWS credentials with CloudWatch read permissions and a configuration file that defines which metrics to collect.

There are two patterns you can start with: collect a metric across every matching resource, or pin it to specific resources.

Basic Configuration

Start with a minimal config file (config.yml):

# Standard format
region: us-east-1
metrics:
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_dimensions: [InstanceId]
    aws_statistics: [Average]

Targeting a specific resource:

# Same structure, pinned to one instance with aws_dimension_select
region: us-east-1
metrics:
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_dimensions: [InstanceId]
    aws_dimension_select:
      InstanceId: [i-12345678]
    aws_statistics: [Average]

Both patterns use the exporter's standard structure: aws_namespace, aws_metric_name, and aws_dimensions. The first is more flexible and better suited for dynamically collecting metrics across multiple resources, like all EC2 instances in a region, without hardcoding instance IDs.

The second is more specific. It's useful when you want to target a known resource directly, such as a particular EC2 instance, by selecting its dimension values explicitly with aws_dimension_select. That works well for small-scale setups or debugging, but it doesn't scale easily.

Download the JAR or Use Docker:

You can run the CloudWatch exporter in a couple of ways. The most straightforward is Docker, but you can also download the release JAR and run it directly. The exporter is a Java application, so running the JAR requires a Java runtime on the host.

Run the JAR:

# Grab the jar-with-dependencies from the project's GitHub releases
# (or Maven Central), then run it with a port and your config file
java -jar cloudwatch_exporter-<version>-jar-with-dependencies.jar 9106 config.yml

The first argument is the port to listen on and the second is your configuration file. The exporter starts serving metrics for Prometheus to scrape on port 9106.

Using Docker:

docker run -p 9106:9106 -v $(pwd)/config.yml:/config/config.yml \
  prom/cloudwatch-exporter:latest

The Docker approach mounts your local config file into the container at /config/config.yml, where the image expects it, and exposes port 9106 for Prometheus to scrape. This method keeps your host system clean and handles the Java dependency for you.

💡
If you're sending custom metrics to CloudWatch alongside standard ones, this guide covers setup and types with clear examples.

IAM Policies Required to Scrape CloudWatch Metrics

The exporter needs read access to CloudWatch APIs. Create an IAM user or role with the CloudWatchReadOnlyAccess policy, or create a custom policy with these permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:GetMetricData",
        "cloudwatch:ListMetrics"
      ],
      "Resource": "*"
    }
  ]
}

This policy grants the minimum permissions needed: fetching metric data, getting statistics over time ranges, and listing available metrics. The exporter uses these to discover and collect your CloudWatch metrics.

For EC2 instances, attach this policy to an IAM role and assign the role to your instance. For local development, use AWS CLI credentials or environment variables:

export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_REGION=us-east-1

These environment variables tell the exporter how to authenticate with AWS and which region to query for metrics.
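
If you prefer to script the role setup, here's a minimal sketch using standard AWS CLI commands. The role name and the trust-policy file are placeholders you'd replace with your own:

# Create a role that EC2 can assume (ec2-trust-policy.json is a placeholder you provide)
aws iam create-role --role-name cloudwatch-exporter-role \
  --assume-role-policy-document file://ec2-trust-policy.json

# Attach the managed read-only policy mentioned above
aws iam attach-role-policy --role-name cloudwatch-exporter-role \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchReadOnlyAccess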

Add the exporter to your Prometheus config:

scrape_configs:
  - job_name: 'cloudwatch'
    static_configs:
      - targets: ['localhost:9106']
    scrape_interval: 300s  # CloudWatch has 5-minute resolution

Essential CloudWatch Metrics for Production Monitoring

Focus on metrics that help you understand system health and performance patterns.

EC2 Performance Indicators: CPU, Network, and Disk Metrics

metrics:
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_dimensions: [InstanceId]
    aws_statistics: [Average]
  
  - aws_namespace: AWS/EC2
    aws_metric_name: NetworkIn
    aws_dimensions: [InstanceId]
    aws_statistics: [Sum]
    
  - aws_namespace: AWS/EC2
    aws_metric_name: DiskReadOps
    aws_dimensions: [InstanceId]
    aws_statistics: [Sum]

This collects the three core EC2 metrics: CPU usage as a percentage, network bytes received (Sum gives you total traffic), and disk read operations. These metrics help identify compute bottlenecks, network saturation, and storage performance issues.
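
Once these series land in Prometheus, you can alert on them like any other metric. A minimal sketch, assuming the aws_ec2_cpuutilization_average name produced by the config above; the threshold and label name are illustrative:

groups:
  - name: aws-ec2
    rules:
      - alert: EC2HighCPU
        expr: aws_ec2_cpuutilization_average > 80
        for: 15m
        annotations:
          summary: "EC2 instance {{ $labels.instance_id }} CPU above 80% for 15 minutes"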

RDS Database Health: Connection and Latency Tracking

Database metrics from Amazon RDS often reveal performance bottlenecks before they affect users:

metrics:
  - aws_namespace: AWS/RDS
    aws_metric_name: DatabaseConnections
    aws_dimensions: [DBInstanceIdentifier]
    aws_statistics: [Average]

  - aws_namespace: AWS/RDS
    aws_metric_name: ReadLatency
    aws_dimensions: [DBInstanceIdentifier]
    aws_statistics: [Average]

DatabaseConnections shows how many active connections your database is handling - useful for capacity planning and detecting connection leaks. ReadLatency measures how long read queries take, which directly impacts application response times.
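
As a quick sketch, here are two PromQL checks you could build on these series. The read-latency metric name follows the exporter's naming convention but is an assumption, and the connection ceiling is illustrative:

# Connections approaching a self-imposed ceiling of 90
aws_rds_database_connections_average > 90

# Read latency averaged over the last 15 minutes, per DB instance
avg_over_time(aws_rds_read_latency_average[15m])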

Application Load Balancer Monitoring: Response Times and Error Rates

metrics:
  - aws_namespace: AWS/ApplicationELB
    aws_metric_name: TargetResponseTime
    aws_dimensions: [LoadBalancer]
    aws_statistics: [Average]

  - aws_namespace: AWS/ApplicationELB
    aws_metric_name: HTTPCode_Target_5XX_Count
    aws_dimensions: [LoadBalancer]
    aws_statistics: [Sum]

TargetResponseTime measures how long your application takes to respond through the load balancer – this is what users experience. HTTPCode_Target_5XX_Count tracks server errors, helping you catch application issues before they become widespread.
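
To make the 5XX count alertable, you can express it as a share of total requests. This sketch assumes you also collect RequestCount with a Sum statistic (not shown above); the metric names are illustrative, so check your exporter's /metrics output for the exact spelling:

# Share of requests returning 5XX at the latest scrape
sum(aws_applicationelb_httpcode_target_5xx_count_sum)
  /
sum(aws_applicationelb_request_count_sum)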


How to Control CloudWatch API Costs

Each metric you configure generates CloudWatch API calls. With hundreds of instances, this adds up quickly. The exporter makes one API call per metric per dimension combination.
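
As a back-of-the-envelope example: collecting 3 EC2 metrics across 200 instances means 3 × 200 = 600 API calls per scrape. At one scrape every 5 minutes, that's 600 × 12 = 7,200 calls per hour, or roughly 172,800 per day, before you add any other namespaces.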

Target Specific Resources: Avoid the "Collect Everything" Trap

Instead of collecting metrics for all instances:

# This creates API calls for every EC2 instance
aws_dimensions: [InstanceId]

This configuration makes one API call per instance, which can get expensive with hundreds of instances.

Target specific instances or use auto-discovery sparingly:

# More focused approach
aws_dimensions: [InstanceId]
aws_dimension_select:
  InstanceId: [i-1234567890abcdef0, i-0987654321fedcba0]

This limits collection to only the specified instances, reducing API calls and costs. Use this pattern for critical instances that need detailed monitoring.

Match CloudWatch Resolution: Don't Over-Scrape

CloudWatch metrics have different native resolutions. Basic monitoring provides 5-minute data points, while detailed monitoring offers 1-minute resolution.

Set your Prometheus scrape interval to match:

scrape_configs:
  - job_name: 'cloudwatch'
    scrape_interval: 300s  # 5 minutes for basic monitoring
    scrape_timeout: 120s   # Give API calls time to complete

Scraping every 5 minutes matches CloudWatch's basic monitoring cadence. Scraping more frequently just creates duplicate data points and wastes API calls. The timeout gives CloudWatch APIs enough time to respond during busy periods.
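
The exporter also has its own timing options that should line up with your scrape interval. A minimal sketch using the per-metric period_seconds and delay_seconds settings; the values are illustrative:

metrics:
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_dimensions: [InstanceId]
    aws_statistics: [Average]
    period_seconds: 300   # match basic monitoring's 5-minute datapoints
    delay_seconds: 600    # CloudWatch data arrives late; query slightly behind real time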

Advanced Exporter Configuration Patterns

Auto-Discovery: Monitoring Dynamic Infrastructure

For environments where instances come and go, use dimension filters:

metrics:
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_dimensions: [InstanceId]
    aws_dimension_select_regex:
      InstanceId: ["i-.*"]  # All instances
    aws_statistics: [Average]

The regex pattern i-.* matches all EC2 instance IDs, automatically picking up new instances as they launch. This works well for auto-scaling groups where instance IDs change frequently.

Cleaner Metric Names: Customizing Prometheus Output

The default naming can get verbose. Customize metric names for cleaner Prometheus queries:

metrics:
  - aws_namespace: AWS/RDS
    aws_metric_name: DatabaseConnections
    aws_dimensions: [DBInstanceIdentifier]
    aws_statistics: [Average]
    set_timestamp: false
    # Results in: aws_rds_database_connections_average

Setting set_timestamp: false uses the scrape timestamp instead of CloudWatch's timestamp, which works better with Prometheus's time-series model. The resulting metric name follows Prometheus naming conventions.

Monitor the Exporter with Meta Metrics

The CloudWatch exporter exposes its own operational metrics on /metrics. Watch for API throttling and collection duration:

  • cloudwatch_requests_total - Total API requests made
  • cloudwatch_api_requests_duration_seconds - How long API calls take
  • cloudwatch_get_metric_statistics_requests_total - Specific metric requests

Set up Prometheus alerts when API calls start failing or taking too long:

- alert: CloudWatchExporterDown
  expr: up{job="cloudwatch"} == 0
  for: 5m
  annotations:
    summary: "CloudWatch exporter is down"

This alert fires when Prometheus can't scrape the exporter for 5 minutes, indicating either the exporter has crashed or has network connectivity issues.
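
You can also catch slow collections before they turn into gaps by alerting on Prometheus's own scrape_duration_seconds for the job. The threshold here is a rough starting point, not a recommendation:

- alert: CloudWatchScrapeSlow
  expr: scrape_duration_seconds{job="cloudwatch"} > 60
  for: 15m
  annotations:
    summary: "CloudWatch exporter scrapes are taking longer than 60 seconds"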

Connect to Broader Observability Platforms

Once you're collecting CloudWatch metrics in Prometheus, you can plug them into your broader observability setup. At Last9, we run a managed platform built for high-cardinality telemetry, so you don’t have to worry about scale, cost blowups, or broken dashboards.

You don’t have to change your setup; just a few minimal tweaks are enough to get started with Last9. We work with your existing Prometheus and CloudWatch exporters, so you can keep your workflows and dashboards while gaining better performance, cost controls, and observability at scale.

Troubleshooting Common CloudWatch Exporter Issues

API Rate Limiting: When CloudWatch Pushes Back

CloudWatch APIs have rate limits. If you're collecting many metrics, you'll hit them. The exporter includes built-in retry logic, but you might need to reduce collection frequency or split metrics across multiple exporter instances.

# Split high-volume metrics across instances
# Instance 1: EC2 metrics
# Instance 2: RDS metrics  
# Instance 3: ELB metrics

This approach distributes API load across multiple exporter processes, each with its own rate-limit bucket.
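
On the Prometheus side, the split might look like the sketch below: each exporter instance runs with its own config file and port. The job names and the second port are hypothetical:

scrape_configs:
  - job_name: 'cloudwatch-ec2'
    scrape_interval: 300s
    static_configs:
      - targets: ['localhost:9106']  # exporter running the EC2-only config
  - job_name: 'cloudwatch-rds'
    scrape_interval: 300s
    static_configs:
      - targets: ['localhost:9107']  # second exporter instance with the RDS-only config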

Timestamp Synchronization: CloudWatch vs Prometheus Time

CloudWatch metrics have timestamps, but Prometheus prefers current timestamps for scraping. Set set_timestamp: false in your config to use scrape time instead of CloudWatch timestamps.

metrics:
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_dimensions: [InstanceId]
    aws_statistics: [Average]
    set_timestamp: false  # Use Prometheus scrape time

This prevents time-series alignment issues when CloudWatch timestamps don't match your Prometheus scrape intervals.

Managing High-Cardinality Dimensions

Some AWS services have high-cardinality dimensions (like Lambda function versions). These can create thousands of metric series. Keep the aws_dimensions list short to limit which dimensions you request, and use aws_dimension_select to restrict the dimension values you collect.

metrics:
  - aws_namespace: AWS/Lambda
    aws_metric_name: Invocations
    aws_dimensions: [FunctionName]  # Skip version dimension
    aws_statistics: [Sum]

This collects Lambda invocations by function name only, avoiding the version dimension that could create separate series for each deployment.

💡
Now, fix production CloudWatch and Prometheus issues instantly right from your IDE, with AI and Last9 MCP. Bring real-time production context into your local environment to auto-fix code faster.

Final Thoughts

This guide covered how to get AWS CloudWatch metrics into Prometheus using the CloudWatch exporter. We looked at the setup process, the IAM permissions needed, which metrics are useful to track, and how to monitor the exporter itself.

The idea is to make AWS metrics part of your existing Prometheus workflow, without rebuilding everything from scratch.

You can also combine CloudWatch metrics with Prometheus recording rules to create derived metrics that span both AWS infrastructure and application performance.
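
As a small illustration, a recording rule could pre-compute a fleet-wide CPU average from the per-instance CloudWatch series. The rule name is arbitrary and the source metric name assumes the EC2 config shown earlier:

groups:
  - name: aws-derived
    rules:
      - record: aws_ec2:cpuutilization:avg
        expr: avg(aws_ec2_cpuutilization_average)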

💡
And if you're dealing with edge cases or need help debugging exporter configs, our Discord Community is open. We have a channel where you can discuss specific setups with other developers.

FAQs

Q: How often should I scrape the CloudWatch exporter?

A: Match your scrape interval to CloudWatch's resolution. Basic monitoring provides 5-minute data points, so scraping every 5 minutes (300s) makes sense. Scraping more frequently just creates duplicate data points.

Q: What are the minimum IAM permissions needed?

A: You need cloudwatch:GetMetricStatistics, cloudwatch:GetMetricData, and cloudwatch:ListMetrics. The CloudWatchReadOnlyAccess managed policy includes these plus some extras, but works fine for most setups.

Q: Can I run this on EC2 or locally?

A: Both work. EC2 instances can use IAM roles (recommended), while local development typically uses AWS CLI credentials or environment variables. Running on EC2 in the same region as your resources reduces latency.

Q: What happens if the exporter can't reach CloudWatch APIs?

A: Failed API calls return no data for that scrape interval. Prometheus will see gaps in the time series. The exporter includes retry logic, but persistent failures usually indicate network issues or credential problems.

Q: How do I collect custom CloudWatch metrics?

A: Use the same configuration format but specify your custom namespace. If you're publishing custom metrics to MyApp/Performance, set aws_namespace: MyApp/Performance in your config.
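
For example, a sketch for a hypothetical MyApp/Performance namespace might look like this (the metric and dimension names are placeholders for your own):

metrics:
  - aws_namespace: MyApp/Performance
    aws_metric_name: RequestLatency   # hypothetical custom metric
    aws_dimensions: [ServiceName]     # hypothetical dimension
    aws_statistics: [Average]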

Q: Can I filter metrics by tags?

A: Not directly through the exporter config. You'll need to use CloudWatch's dimension filtering or set up separate exporter instances for different tagged resources.

Authors
Anjali Udasi

Helping to make the tech a little less intimidating. I love breaking down complex concepts into easy-to-understand terms.
