AWS RDS
Monitor AWS RDS database instances and clusters with CloudWatch metrics for comprehensive database performance observability
Monitor your Amazon RDS database instances and clusters by collecting their CloudWatch metrics with the OpenTelemetry Collector and sending them to Last9. The setup covers CPU utilization, memory, storage, connections, I/O latency and throughput, and more.
Prerequisites
Before setting up AWS RDS monitoring, ensure you have:
- AWS Account: With access to RDS and CloudWatch services
- RDS Instances: Running database instances to monitor
- CloudWatch Permissions: IAM permissions to read CloudWatch metrics
- Monitoring Server: Where you can install and run OpenTelemetry Collector
- Last9 Account: With metrics integration credentials
Install OpenTelemetry Collector
Install the OpenTelemetry Collector with AWS receiver support:
For Debian/Ubuntu systems:
    wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.118.0/otelcol-contrib_0.118.0_linux_amd64.deb
    sudo dpkg -i otelcol-contrib_0.118.0_linux_amd64.deb
For Red Hat/CentOS systems:
    wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.118.0/otelcol-contrib_0.118.0_linux_amd64.rpm
    sudo rpm -ivh otelcol-contrib_0.118.0_linux_amd64.rpm
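The two downloads differ only in the package extension, so the release URL can be built from a version and a package format. The following is a minimal sketch, assuming the release naming pattern shown above also holds for the version you choose. `otelcol_pkg_url` and `detect_pkg_format` are hypothetical helper names, not part of the collector.

```shell
#!/bin/sh
# Hypothetical helper: build the otelcol-contrib release URL used above.
# Arguments: version (for example 0.118.0), package format (deb or rpm).
otelcol_pkg_url() {
  version="$1"
  format="$2"
  base="https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download"
  printf '%s/v%s/otelcol-contrib_%s_linux_amd64.%s\n' "$base" "$version" "$version" "$format"
}

# Hypothetical helper: choose the package format from the package manager
# that is installed. Prints "deb" or "rpm"; fails if neither is found.
detect_pkg_format() {
  if command -v dpkg >/dev/null 2>&1; then
    echo deb
  elif command -v rpm >/dev/null 2>&1; then
    echo rpm
  else
    echo "no supported package manager found" >&2
    return 1
  fi
}
```

With these defined, `wget "$(otelcol_pkg_url 0.118.0 "$(detect_pkg_format)")"` fetches the package that matches the host.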
Configure AWS Credentials
Set up AWS credentials for CloudWatch access:
Create or update ~/.aws/credentials:
    [default]
    aws_access_key_id = YOUR_ACCESS_KEY_ID
    aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
    region = us-east-1
Set environment variables:
    export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID
    export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY
    export AWS_REGION=us-east-1
If running on EC2, attach an IAM role with the following policy:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "cloudwatch:GetMetricStatistics",
            "cloudwatch:ListMetrics",
            "rds:DescribeDBInstances",
            "rds:DescribeDBClusters"
          ],
          "Resource": "*"
        }
      ]
    }
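If you use the environment-variable option, a quick check before starting the collector can catch a variable that was never set. This is a minimal sketch; `missing_env` is a hypothetical helper name. It does not apply when credentials come from an IAM role or from the credentials file.

```shell
#!/bin/sh
# Hypothetical helper: print the names of the given variables that are
# unset or empty, one per line. Returns non-zero if any are missing.
# Only pass trusted variable names, because the lookup uses eval.
missing_env() {
  status=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION` prints nothing and succeeds when all three variables are set.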
Create OpenTelemetry Collector Configuration
Create the collector configuration file:
    sudo mkdir -p /etc/otelcol-contrib
    sudo nano /etc/otelcol-contrib/config.yaml
Add the following configuration to collect RDS CloudWatch metrics:
    receivers:
      awscloudwatch:
        region: us-east-1 # Change to your AWS region
        metrics:
          - metric_name: CPUUtilization
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*" # Monitor all RDS instances
          - metric_name: DatabaseConnections
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
          - metric_name: FreeStorageSpace
            namespace: AWS/RDS
            stat: [Average, Minimum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
          - metric_name: ReadLatency
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
          - metric_name: WriteLatency
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
          - metric_name: ReadThroughput
            namespace: AWS/RDS
            stat: [Average, Sum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
          - metric_name: WriteThroughput
            namespace: AWS/RDS
            stat: [Average, Sum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
          - metric_name: ReadIOPS
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
          - metric_name: WriteIOPS
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
          - metric_name: FreeableMemory
            namespace: AWS/RDS
            stat: [Average, Minimum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
          - metric_name: SwapUsage
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
        collection_interval: 300s # 5 minutes (CloudWatch default)
    processors:
      batch:
        timeout: 30s
        send_batch_size: 10000
        send_batch_max_size: 10000
      resourcedetection/cloud:
        detectors: ["aws"]
      transform/metrics:
        metric_statements:
          - context: metric
            statements:
              - set(resource.attributes["service.name"], "aws-rds")
              - set(resource.attributes["deployment.environment"], "production")
    exporters:
      prometheusremotewrite:
        endpoint: "$last9_remote_write_url"
        auth:
          authenticator: basicauth/metrics
        resource_to_telemetry_conversion:
          enabled: true
      debug:
        verbosity: detailed
    extensions:
      basicauth/metrics:
        client_auth:
          username: "$last9_remote_write_username"
          password: "$last9_remote_write_password"
    service:
      extensions: [basicauth/metrics]
      pipelines:
        metrics:
          receivers: [awscloudwatch]
          processors: [batch, resourcedetection/cloud, transform/metrics]
          exporters: [prometheusremotewrite]
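The metric entries in this configuration all share one shape, so they can be generated rather than typed by hand. The following is a minimal sketch, assuming each entry sits under `receivers.awscloudwatch.metrics` with the field names used above. `rds_metric_entry` is a hypothetical helper name, and the indentation it prints is an assumption about how the entries are nested in your file.

```shell
#!/bin/sh
# Hypothetical helper: print one metric entry in the shape used by the
# configuration above.
# Arguments: metric name, stat list, dimension name, dimension value.
rds_metric_entry() {
  cat <<EOF
      - metric_name: $1
        namespace: AWS/RDS
        stat: [$2]
        dimensions:
          - name: $3
            value: "$4"
EOF
}
```

For example, `rds_metric_entry ReadLatency "Average, Maximum" DBInstanceIdentifier "*"` prints a six-line entry that can be pasted under `metrics:`. Wrapping the call in a `for` loop over metric names produces several entries with the same statistics and dimension.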
Configure Specific RDS Instances (Optional)
To monitor specific RDS instances instead of all instances, modify the dimensions:
    receivers:
      awscloudwatch:
        region: us-east-1
        metrics:
          - metric_name: CPUUtilization
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "production-db" # Specific instance
          - metric_name: DatabaseConnections
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "production-db"
Add RDS Cluster Metrics (for Aurora)
If you’re using Aurora clusters, add cluster-specific metrics:
    # Add these metrics to your existing configuration
    receivers:
      awscloudwatch:
        metrics:
          - metric_name: CPUUtilization
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBClusterIdentifier
                value: "*" # Monitor all clusters
          - metric_name: AuroraReplicaLag
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBClusterIdentifier
                value: "*"
          - metric_name: VolumeReadIOPs
            namespace: AWS/RDS
            stat: [Average, Sum]
            dimensions:
              - name: DBClusterIdentifier
                value: "*"
          - metric_name: VolumeWriteIOPs
            namespace: AWS/RDS
            stat: [Average, Sum]
            dimensions:
              - name: DBClusterIdentifier
                value: "*"
Create Systemd Service Configuration
Create a systemd service file:
    sudo nano /etc/systemd/system/otelcol-contrib.service
Add the service configuration:
    [Unit]
    Description=OpenTelemetry Collector for AWS RDS Monitoring
    After=network.target

    [Service]
    ExecStart=/usr/bin/otelcol-contrib --config /etc/otelcol-contrib/config.yaml
    Restart=always
    User=root
    Group=root
    Environment=AWS_REGION=us-east-1

    [Install]
    WantedBy=multi-user.target
Start and Enable the Service
Start the OpenTelemetry Collector service:
    sudo systemctl daemon-reload
    sudo systemctl enable otelcol-contrib
    sudo systemctl start otelcol-contrib
Understanding RDS Metrics
The AWS RDS integration collects comprehensive CloudWatch metrics:
Performance Metrics
- CPU Utilization: Processor usage percentage across database instances
- Database Connections: Active connections to the database
- Freeable Memory: Available RAM for database operations
- Swap Usage: Swap space utilization indicating memory pressure
Storage Metrics
- Free Storage Space: Available disk space on the DB instance
- Read/Write IOPS: Disk read and write operations per second on the DB instance
- Read/Write Latency: Average time per disk I/O operation
- Read/Write Throughput: Bytes read/written per second
Aurora Cluster Metrics
- Aurora Replica Lag: Time lag between primary and read replica
- Volume Read/Write IOPs: Cluster-level I/O operations
- Aurora Volume: Storage volume utilization for Aurora clusters
Engine-Specific Metrics
Different RDS engines provide additional specialized metrics:
- MySQL/MariaDB: Slow queries, binary log usage
- PostgreSQL: Transaction logs, vacuum operations
- Oracle: Archive log usage, shared pool efficiency
- SQL Server: Lock waits, buffer cache hit ratio
Advanced Configuration
Multi-Region Monitoring
Monitor RDS instances across multiple AWS regions:
    receivers:
      awscloudwatch/us-east-1:
        region: us-east-1
        metrics:
          - metric_name: CPUUtilization
            namespace: AWS/RDS
            stat: [Average, Maximum]
      awscloudwatch/us-west-2:
        region: us-west-2
        metrics:
          - metric_name: CPUUtilization
            namespace: AWS/RDS
            stat: [Average, Maximum]
    service:
      pipelines:
        metrics:
          receivers: [awscloudwatch/us-east-1, awscloudwatch/us-west-2]
Custom Metric Collection
Add custom CloudWatch metrics for specific monitoring needs:
    receivers:
      awscloudwatch:
        metrics:
          - metric_name: BinLogDiskUsage
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "mysql-instance"
          - metric_name: TransactionLogsDiskUsage
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "postgres-instance"
Enhanced Monitoring Integration
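RDS Enhanced Monitoring publishes more detailed operating-system metrics to CloudWatch Logs (the RDSOSMetrics log group). You can complement the CloudWatch metrics above with this system-level data by setting up an additional collector, for example on an EC2 instance.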
Verification
Check Service Status
Verify the OpenTelemetry Collector is running:
    sudo systemctl status otelcol-contrib
Monitor Service Logs
Check for any configuration errors:
    sudo journalctl -u otelcol-contrib -f
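When the log is long, counting suspicious lines over a recent window can be quicker than following it live. This is a minimal sketch; `count_error_lines` is a hypothetical helper name, and the patterns it matches are an assumption about what is worth flagging, not a list of messages the collector is known to emit.

```shell
#!/bin/sh
# Hypothetical helper: count the lines on stdin that contain "error" or
# "denied" (case-insensitive). Prints 0 when nothing matches.
count_error_lines() {
  # grep -c exits non-zero when the count is 0; keep the function's status 0.
  grep -ciE 'error|denied' || true
}
```

For example, `sudo journalctl -u otelcol-contrib --since "10min ago" --no-pager | count_error_lines` reports how many such lines were logged in the last ten minutes.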
Verify AWS Connectivity
Test AWS API access:
    aws rds describe-db-instances --region us-east-1
    aws cloudwatch list-metrics --namespace AWS/RDS --region us-east-1
Verify Metrics in Last9
Log into your Last9 account and check that RDS metrics are being received in Grafana.
Look for metrics like:
- CPUUtilization
- DatabaseConnections
- FreeStorageSpace
- ReadLatency
- WriteLatency
Key Metrics to Monitor
Critical Performance Indicators
| Metric | Description | Alert Threshold |
|---|---|---|
| CPUUtilization | Database CPU usage percentage | > 80% for sustained periods |
| DatabaseConnections | Active database connections | Near max_connections limit |
| FreeStorageSpace | Available disk space | < 20% of total storage |
| ReadLatency | Average read operation latency | > 200 ms consistently |
| WriteLatency | Average write operation latency | > 200 ms consistently |
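The thresholds in this table are not in the units CloudWatch reports: ReadLatency and WriteLatency arrive in seconds, and FreeStorageSpace arrives in bytes. The following is a minimal sketch of the conversions needed before comparing a raw value against a threshold. `latency_ms` and `free_storage_pct` are hypothetical helper names, and the allocated storage size is something you supply yourself (RDS reports it in GiB).

```shell
#!/bin/sh
# Hypothetical helper: convert a latency value from seconds to milliseconds.
latency_ms() {
  awk -v s="$1" 'BEGIN { printf "%.1f\n", s * 1000 }'
}

# Hypothetical helper: express free storage as a percentage of allocated
# storage. Arguments: free space in bytes, allocated storage in GiB.
free_storage_pct() {
  awk -v free="$1" -v gib="$2" \
    'BEGIN { printf "%.1f\n", free / (gib * 1024 * 1024 * 1024) * 100 }'
}
```

A ReadLatency value of 0.25 corresponds to `latency_ms 0.25`, which prints 250.0 and is above the 200 ms threshold. For an instance with 100 GiB allocated and 21474836480 bytes free, `free_storage_pct 21474836480 100` prints 20.0.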
Capacity Planning
| Metric | Description | Monitoring Focus |
|---|---|---|
| FreeableMemory | Available RAM | Track trends for memory sizing |
| ReadThroughput | Data read throughput | Monitor growth patterns |
| WriteThroughput | Data write throughput | Monitor growth patterns |
| ReadIOPS | Read operations per second | Track I/O capacity needs |
| WriteIOPS | Write operations per second | Track I/O capacity needs |
Troubleshooting
CloudWatch API Issues
Permission Denied:
    # Verify AWS credentials are configured
    aws sts get-caller-identity
    # Check IAM permissions for CloudWatch
    aws iam simulate-principal-policy \
      --policy-source-arn arn:aws:iam::ACCOUNT:user/USERNAME \
      --action-names cloudwatch:GetMetricStatistics \
      --resource-arns "*"
Rate Limiting:
    # Adjust collection interval to reduce API calls
    receivers:
      awscloudwatch:
        collection_interval: 600s # 10 minutes instead of 5
Missing Metrics
No Data Appearing:
- Verify RDS instances exist in the specified region
- Check CloudWatch console for metric availability
- Ensure metric names match exactly (case-sensitive)
Partial Data:
    # List available RDS metrics
    aws cloudwatch list-metrics --namespace AWS/RDS --region us-east-1
    # Check specific instance metrics
    aws cloudwatch get-metric-statistics \
      --namespace AWS/RDS \
      --metric-name CPUUtilization \
      --dimensions Name=DBInstanceIdentifier,Value=your-db-instance \
      --start-time 2024-01-01T00:00:00Z \
      --end-time 2024-01-01T01:00:00Z \
      --period 300 \
      --statistics Average
High Costs
CloudWatch API Costs:
- Increase collection intervals for less critical metrics
- Use metric filtering to collect only necessary metrics
- Consider using AWS CloudWatch Agent for EC2-based monitoring
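To see how much a longer interval or a shorter metric list helps, it can be useful to estimate the request volume first. The sketch below rests on a simplifying assumption that is not taken from the collector's documentation: one CloudWatch request per metric, per instance, per collection interval. `requests_per_day` is a hypothetical helper name, and the real request count may differ.

```shell
#!/bin/sh
# Hypothetical helper: rough estimate of CloudWatch requests per day.
# Assumption: one request per metric, per instance, per collection interval.
# Arguments: number of metrics, number of instances, interval in seconds.
requests_per_day() {
  echo $(( $1 * $2 * 86400 / $3 ))
}
```

Under that assumption, 11 metrics on 5 instances at a 300-second interval give `requests_per_day 11 5 300`, which prints 15840. Doubling the interval to 600 seconds halves that to 7920.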
Best Practices
Security
- IAM Roles: Use IAM roles instead of access keys when running on EC2
- Least Privilege: Grant only necessary CloudWatch and RDS permissions
- Credential Management: Store credentials securely using AWS Secrets Manager or environment variables
Performance
- Collection Intervals: Balance monitoring granularity with API costs
- Metric Selection: Monitor only metrics relevant to your use case
- Regional Optimization: Deploy collectors in the same region as RDS instances
Monitoring Strategy
- Alerting: Set up alerts for critical metrics like CPU, connections, and storage
- Dashboards: Create comprehensive dashboards for different stakeholders
- Baseline Metrics: Establish performance baselines for comparison
- Multi-Environment: Use different service names or tags per environment
Need Help?
If you encounter any issues or have questions:
- Join our Discord community for real-time support
- Contact our support team at support@last9.io