The Prometheus CloudWatch exporter pulls AWS CloudWatch metrics into your Prometheus setup, giving you a unified view of your infrastructure alongside application metrics.
If you're already running Prometheus and need visibility into AWS services like EC2, RDS, or Lambda, this exporter handles the integration without forcing you to switch monitoring stacks.
This post covers how to configure the exporter, which metrics matter most, and how to avoid the common pitfalls that can spike your CloudWatch API costs.
Why CloudWatch Is a Smart Backend for Prometheus Workloads
AWS CloudWatch stores detailed metrics about your infrastructure, but accessing them through separate dashboards fragments your monitoring. The CloudWatch exporter brings these metrics into Prometheus, where you can correlate them with application metrics, set up unified alerting rules, and build comprehensive dashboards.
The exporter works by querying CloudWatch APIs on a schedule and converting the data into Prometheus format. This means you get AWS infrastructure metrics alongside your custom application metrics in the same time-series database.
Getting the CloudWatch Exporter Running on Your Infrastructure
The Prometheus CloudWatch exporter runs as a separate service that Prometheus scrapes. You'll need AWS credentials with CloudWatch read permissions and a configuration file that defines which metrics to collect.
There are two formats you can start with:
Basic Configuration
Start with a minimal config file (config.yml):
# Standard format
region: us-east-1
metrics:
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_dimensions: [InstanceId]
    aws_statistics: [Average]
Alternative configuration format:
# Alternative format (also valid)
region: us-east-1
namespace: AWS/EC2
metrics:
  - name: CPUUtilization
    dimensions:
      - name: InstanceId
        value: i-12345678
Both formats work, but they serve slightly different use cases. The first format follows the CloudWatch Exporter's standard structure, using aws_namespace, aws_metric_name, and aws_dimensions. It's more flexible and better suited for dynamically collecting metrics across multiple resources, like all EC2 instances in a region, without hardcoding instance IDs.
The second format is more specific. It’s useful when you want to target a known resource directly, such as a particular EC2 instance, and define its dimensions explicitly. This can work well for small-scale setups or debugging, but doesn’t scale easily.
Download the Jar or Use Docker:
You can run the CloudWatch exporter in a couple of ways. The most straightforward is Docker, but you can also download the release jar and run it directly with Java, since the exporter is a Java application.
Download and run the jar:
# Get the latest release jar (the exporter is a Java application);
# if the release has no jar attached, grab it from Maven Central instead
curl -s https://api.github.com/repos/prometheus/cloudwatch_exporter/releases/latest \
  | grep browser_download_url | grep jar-with-dependencies | cut -d '"' -f 4 | wget -i -
# Run it: the first argument is the listen port, the second is your config file
java -jar cloudwatch_exporter-*-jar-with-dependencies.jar 9106 config.yml
This downloads the latest release jar directly from GitHub and starts the exporter with your configuration file. The exporter listens on the port you pass; 9106 is its conventional port.
Using Docker:
docker run -p 9106:9106 -v $(pwd)/config.yml:/config/config.yml \
  prom/cloudwatch-exporter:latest
The Docker approach mounts your local config file to /config/config.yml, the path the image reads by default, and exposes port 9106 for Prometheus to scrape. This method keeps your host system clean and handles the Java runtime for you.
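If you want the exporter and Prometheus managed together, a docker-compose sketch along these lines can work. The service names, versions, and file paths are assumptions to adapt to your environment:
# docker-compose.yml, a minimal sketch; assumes config.yml and prometheus.yml sit next to it
version: "3.8"
services:
  cloudwatch-exporter:
    image: prom/cloudwatch-exporter:latest
    volumes:
      - ./config.yml:/config/config.yml  # path the image reads by default
    environment:
      - AWS_ACCESS_KEY_ID      # passed through from the host environment
      - AWS_SECRET_ACCESS_KEY
      - AWS_REGION
    ports:
      - "9106:9106"
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"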
IAM Policies Required to Scrape CloudWatch Metrics
The exporter needs read access to CloudWatch APIs. Create an IAM user or role with the CloudWatchReadOnlyAccess policy, or create a custom policy with these permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:GetMetricData",
        "cloudwatch:ListMetrics"
      ],
      "Resource": "*"
    }
  ]
}
This policy grants the minimum permissions needed: fetching metric data, getting statistics over time ranges, and listing available metrics. The exporter uses these to discover and collect your CloudWatch metrics.
For EC2 instances, attach this policy to an IAM role and assign the role to your instance. For local development, use AWS CLI credentials or environment variables:
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_REGION=us-east-1
These environment variables tell the exporter how to authenticate with AWS and which region to query for metrics.
Add the exporter to your Prometheus config:
scrape_configs:
  - job_name: 'cloudwatch'
    static_configs:
      - targets: ['localhost:9106']
    scrape_interval: 300s  # CloudWatch has 5-minute resolution
Essential CloudWatch Metrics for Production Monitoring
Focus on metrics that help you understand system health and performance patterns.
EC2 Performance Indicators: CPU, Network, and Disk Metrics
metrics:
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_dimensions: [InstanceId]
    aws_statistics: [Average]
  - aws_namespace: AWS/EC2
    aws_metric_name: NetworkIn
    aws_dimensions: [InstanceId]
    aws_statistics: [Sum]
  - aws_namespace: AWS/EC2
    aws_metric_name: DiskReadOps
    aws_dimensions: [InstanceId]
    aws_statistics: [Sum]
This collects the three core EC2 metrics: CPU usage as a percentage, network bytes received (Sum gives you total traffic), and disk read operations. These metrics help identify compute bottlenecks, network saturation, and storage performance issues.
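As a quick sketch of what you can do once these series are in Prometheus, the rule below alerts on sustained high CPU. The metric and label names (aws_ec2_cpuutilization_average, instance_id) follow the exporter's default naming, and the 85% threshold is an arbitrary assumption; check your exporter's /metrics output for the exact names.
groups:
  - name: ec2-cloudwatch
    rules:
      - alert: EC2HighCPU
        # Metric/label names assume the exporter's default naming; verify on /metrics
        expr: aws_ec2_cpuutilization_average > 85
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "EC2 instance {{ $labels.instance_id }} CPU above 85% for 15 minutes"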
RDS Database Health: Connection and Latency Tracking
Database metrics from Amazon RDS often reveal performance bottlenecks before they affect users:
- aws_namespace: AWS/RDS
  aws_metric_name: DatabaseConnections
  aws_dimensions: [DBInstanceIdentifier]
  aws_statistics: [Average]
- aws_namespace: AWS/RDS
  aws_metric_name: ReadLatency
  aws_dimensions: [DBInstanceIdentifier]
  aws_statistics: [Average]
DatabaseConnections shows how many active connections your database is handling - useful for capacity planning and detecting connection leaks. ReadLatency measures how long read queries take, which directly impacts application response times.
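For example, once the exporter is publishing these series, you can watch for connection saturation with a rule like the sketch below. The metric name aws_rds_database_connections_average follows the exporter's default output for this config; the dbinstance_identifier label and the threshold of 80 connections are assumptions to tune against your instance class and max_connections setting.
- alert: RDSConnectionsHigh
  # Threshold is a placeholder; compare against your instance's max_connections
  expr: aws_rds_database_connections_average > 80
  for: 10m
  annotations:
    summary: "RDS instance {{ $labels.dbinstance_identifier }} is holding an unusually high number of connections"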
Application Load Balancer Monitoring: Response Times and Error Rates
- aws_namespace: AWS/ApplicationELB
  aws_metric_name: TargetResponseTime
  aws_dimensions: [LoadBalancer]
  aws_statistics: [Average]
- aws_namespace: AWS/ApplicationELB
  aws_metric_name: HTTPCode_Target_5XX_Count
  aws_dimensions: [LoadBalancer]
  aws_statistics: [Sum]
TargetResponseTime measures how long your application takes to respond through the load balancer – this is what users experience. HTTPCode_Target_5XX_Count tracks server errors, helping you catch application issues before they become widespread.
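To turn the response-time series into something actionable, a sketch like this can help; the metric and label names (aws_applicationelb_target_response_time_average, load_balancer) assume the exporter's default name mangling, and the 1-second threshold is a placeholder:
- alert: ALBSlowResponses
  # Names assume the exporter's default output; verify on /metrics
  expr: aws_applicationelb_target_response_time_average > 1
  for: 10m
  annotations:
    summary: "Load balancer {{ $labels.load_balancer }} average response time above 1s"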

How to Control CloudWatch API Costs
Each metric you configure generates CloudWatch API calls. With hundreds of instances, this adds up quickly. The exporter makes one API call per metric per dimension combination.
Target Specific Resources: Avoid the "Collect Everything" Trap
Instead of collecting metrics for all instances:
# This creates API calls for every EC2 instance
aws_dimensions: [InstanceId]
This configuration makes one API call per instance, which can get expensive with hundreds of instances.
Target specific instances or use auto-discovery sparingly:
# More focused approach
aws_dimensions: [InstanceId]
aws_dimension_select:
  InstanceId: [i-1234567890abcdef0, i-0987654321fedcba0]
This limits collection to only the specified instances, reducing API calls and costs. Use this pattern for critical instances that need detailed monitoring.
Match CloudWatch Resolution: Don't Over-Scrape
CloudWatch metrics have different native resolutions. Basic monitoring provides 5-minute data points, while detailed monitoring offers 1-minute resolution.
Set your Prometheus scrape interval to match:
scrape_configs:
  - job_name: 'cloudwatch'
    scrape_interval: 300s  # 5 minutes for basic monitoring
    scrape_timeout: 120s   # Give API calls time to complete
Scraping every 5 minutes matches CloudWatch's basic monitoring cadence. Scraping more frequently just creates duplicate data points and wastes API calls. The timeout gives CloudWatch APIs enough time to respond during busy periods.
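On the exporter side, you can also align the query window with the metric's resolution. The exporter supports a per-metric period_seconds setting; the sketch below uses a 5-minute period for a basic-monitoring metric, and the values are illustrative:
metrics:
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_dimensions: [InstanceId]
    aws_statistics: [Average]
    period_seconds: 300  # match basic monitoring's 5-minute datapoints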
Advanced Exporter Configuration Patterns
Auto-Discovery: Monitoring Dynamic Infrastructure
For environments where instances come and go, use dimension filters:
metrics:
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_dimensions: [InstanceId]
    aws_dimension_select_regex:
      InstanceId: ["i-.*"]  # All instances
    aws_statistics: [Average]
The regex pattern i-.* matches all EC2 instance IDs, automatically picking up new instances as they launch. This works well for auto-scaling groups where instance IDs change frequently.
Cleaner Metric Names: Customizing Prometheus Output
The default naming can get verbose. Customize metric names for cleaner Prometheus queries:
metrics:
  - aws_namespace: AWS/RDS
    aws_metric_name: DatabaseConnections
    aws_dimensions: [DBInstanceIdentifier]
    aws_statistics: [Average]
    set_timestamp: false
    # Results in: aws_rds_database_connections_average
Setting set_timestamp: false uses the scrape timestamp instead of CloudWatch's timestamp, which works better with Prometheus's time-series model. The resulting metric name follows Prometheus naming conventions.
Monitor the Exporter with Meta Metrics
The CloudWatch exporter exposes its own metrics on /metrics. Watch for API throttling and collection duration:
cloudwatch_requests_total - Total API requests made
cloudwatch_api_requests_duration_seconds - How long API calls take
cloudwatch_get_metric_statistics_requests_total - Specific metric requests
Set up Prometheus alerts when API calls start failing or taking too long:
- alert: CloudWatchExporterDown
  expr: up{job="cloudwatch"} == 0
  for: 5m
  annotations:
    summary: "CloudWatch exporter is down"
This alert fires when Prometheus can't scrape the exporter for 5 minutes, indicating either the exporter has crashed or has network connectivity issues.
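Building on the cloudwatch_requests_total counter above, you can also alert when API call volume creeps up, which is useful for catching cost regressions; the threshold here is a placeholder to tune against your own baseline:
- alert: CloudWatchAPICallVolumeHigh
  # Placeholder threshold; set it just above your normal request rate
  expr: sum(rate(cloudwatch_requests_total[15m])) > 5
  for: 30m
  annotations:
    summary: "CloudWatch exporter API call rate is above the expected baseline"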
Connect to Broader Observability Platforms
Once you're collecting CloudWatch metrics in Prometheus, you can plug them into your broader observability setup. At Last9, we run a managed platform built for high-cardinality telemetry, so you don’t have to worry about scale, cost blowups, or broken dashboards.
You don’t have to change your setup; just a few minimal tweaks are enough to get started with Last9. We work with your existing Prometheus and CloudWatch exporters, so you can keep your workflows and dashboards while gaining better performance, cost controls, and observability at scale.
Troubleshooting Common CloudWatch Exporter Issues
API Rate Limiting: When CloudWatch Pushes Back
CloudWatch APIs have rate limits. If you're collecting many metrics, you'll hit them. The exporter includes built-in retry logic, but you might need to reduce collection frequency or split metrics across multiple exporter instances.
# Split high-volume metrics across instances
# Instance 1: EC2 metrics
# Instance 2: RDS metrics
# Instance 3: ELB metrics
This approach distributes API load across multiple exporter processes, each with its own rate-limit bucket.
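On the Prometheus side, a split setup might look like the sketch below; the ports and job names are arbitrary assumptions, one per exporter instance:
scrape_configs:
  - job_name: 'cloudwatch-ec2'
    scrape_interval: 300s
    static_configs:
      - targets: ['localhost:9106']  # exporter instance configured with EC2 metrics only
  - job_name: 'cloudwatch-rds'
    scrape_interval: 300s
    static_configs:
      - targets: ['localhost:9107']  # second instance, RDS metrics
  - job_name: 'cloudwatch-elb'
    scrape_interval: 300s
    static_configs:
      - targets: ['localhost:9108']  # third instance, ELB metrics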
Timestamp Synchronization: CloudWatch vs Prometheus Time
CloudWatch metrics have timestamps, but Prometheus prefers current timestamps for scraping. Set set_timestamp: false in your config to use scrape time instead of CloudWatch timestamps.
metrics:
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_dimensions: [InstanceId]
    aws_statistics: [Average]
    set_timestamp: false  # Use Prometheus scrape time
This prevents time-series alignment issues when CloudWatch timestamps don't match your Prometheus scrape intervals.
Managing High-Cardinality Dimensions
Some AWS services have high-cardinality dimensions (like Lambda function versions). These can create thousands of metric series. Use aws_dimension_select to limit which dimensions you collect.
metrics:
  - aws_namespace: AWS/Lambda
    aws_metric_name: Invocations
    aws_dimensions: [FunctionName]  # Skip version dimension
    aws_statistics: [Sum]
This collects Lambda invocations by function name only, avoiding the version dimension that could create separate series for each deployment.
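If you do need per-function detail for a few critical functions, you can combine the FunctionName dimension with aws_dimension_select; the function names below are placeholders:
metrics:
  - aws_namespace: AWS/Lambda
    aws_metric_name: Invocations
    aws_dimensions: [FunctionName]
    aws_dimension_select:
      FunctionName: [checkout-handler, payments-worker]  # placeholder function names
    aws_statistics: [Sum]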
Final Thoughts
This guide covered how to get AWS CloudWatch metrics into Prometheus using the CloudWatch exporter. We looked at the setup process, the IAM permissions needed, which metrics are useful to track, and how to monitor the exporter itself.
The idea is to make AWS metrics part of your existing Prometheus workflow, without rebuilding everything from scratch.
You can also combine CloudWatch metrics with Prometheus recording rules to create derived metrics that span both AWS infrastructure and application performance.
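As a sketch of that idea, the recording rule below divides average EC2 CPU by application request rate to get a rough capacity-per-request signal. The aws_ec2_cpuutilization_average name assumes the exporter's default output, and app_http_requests_total is a hypothetical counter from your own instrumentation:
groups:
  - name: cloudwatch-derived
    rules:
      - record: job:cpu_per_request:ratio
        # Both metric names are assumptions; swap in your own series
        expr: avg(aws_ec2_cpuutilization_average) / sum(rate(app_http_requests_total[5m]))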
FAQs
Q: How often should I scrape the CloudWatch exporter?
A: Match your scrape interval to CloudWatch's resolution. Basic monitoring provides 5-minute data points, so scraping every 5 minutes (300s) makes sense. Scraping more frequently just creates duplicate data points.
Q: What are the minimum IAM permissions needed?
A: You need cloudwatch:GetMetricStatistics, cloudwatch:GetMetricData, and cloudwatch:ListMetrics. The CloudWatchReadOnlyAccess managed policy includes these plus some extras, but works fine for most setups.
Q: Should I run this on EC2 or locally?
A: Both work. EC2 instances can use IAM roles (recommended), while local development typically uses AWS CLI credentials or environment variables. Running on EC2 in the same region as your resources reduces latency.
Q: What happens if the exporter can't reach CloudWatch APIs?
A: Failed API calls return no data for that scrape interval. Prometheus will see gaps in the time series. The exporter includes retry logic, but persistent failures usually indicate network issues or credential problems.
Q: How do I collect custom CloudWatch metrics?
A: Use the same configuration format but specify your custom namespace. If you're publishing custom metrics to MyApp/Performance, set aws_namespace: MyApp/Performance in your config.
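For instance, assuming a hypothetical custom metric RequestProcessingTime published with a Service dimension under that namespace, the config would look like:
metrics:
  - aws_namespace: MyApp/Performance
    aws_metric_name: RequestProcessingTime  # placeholder custom metric
    aws_dimensions: [Service]               # placeholder dimension
    aws_statistics: [Average]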
Q: Can I filter metrics by tags?
A: Not directly through the exporter config. You'll need to use CloudWatch's dimension filtering or set up separate exporter instances for different tagged resources.