
AWS RDS

Monitor AWS RDS database instances and clusters with CloudWatch metrics for comprehensive database performance observability

Monitor your Amazon RDS database instances and clusters through the CloudWatch metrics integration. This setup uses an OpenTelemetry Collector to pull RDS metrics from CloudWatch and send them to Last9, covering CPU utilization, memory, storage, connections, and I/O latency and throughput.

Prerequisites

Before setting up AWS RDS monitoring, ensure you have:

  • AWS Account: With access to RDS and CloudWatch services
  • RDS Instances: Running database instances to monitor
  • CloudWatch Permissions: IAM permissions to read CloudWatch metrics
  • Monitoring Server: A host where you can install and run the OpenTelemetry Collector
  • Last9 Account: With metrics integration credentials
  1. Install OpenTelemetry Collector

    Install the OpenTelemetry Collector with AWS receiver support:

    For Debian/Ubuntu systems:

    wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.118.0/otelcol-contrib_0.118.0_linux_amd64.deb
    sudo dpkg -i otelcol-contrib_0.118.0_linux_amd64.deb
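The download above targets amd64 hosts. As a minimal sketch, the helper below builds the release URL from a version and the machine type reported by `uname -m`. The function name `otel_deb_url` is hypothetical, and the arm64 asset name is an assumption based on the naming pattern of the amd64 package shown above, so check the releases page before relying on it.

```shell
# Hypothetical helper: build the otelcol-contrib .deb URL for a version and
# a machine type as reported by `uname -m`.
otel_deb_url() {
  local version="$1" machine="$2" arch base
  case "$machine" in
    x86_64)        arch="amd64" ;;
    aarch64|arm64) arch="arm64" ;;  # assumed asset name, verify on the releases page
    *) echo "unsupported machine type: $machine" >&2; return 1 ;;
  esac
  base="https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download"
  printf '%s/v%s/otelcol-contrib_%s_linux_%s.deb\n' "$base" "$version" "$version" "$arch"
}

# Example usage (not run here):
#   wget "$(otel_deb_url 0.118.0 "$(uname -m)")"
```

Keeping the version in one argument also makes it harder for the URL and the filename passed to `dpkg -i` to drift apart during an upgrade.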
  2. Configure AWS Credentials

    Set up AWS credentials for CloudWatch access:

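    Create or update ~/.aws/credentials for the user that runs the collector (the systemd unit in step 6 runs as root, so this is /root/.aws/credentials):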

    [default]
    aws_access_key_id = YOUR_ACCESS_KEY_ID
    aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
    region = us-east-1
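Because the credentials file holds a long-lived secret, it is worth confirming that only its owner can read it. The sketch below is one way to check; `credentials_are_private` is a hypothetical helper, and it assumes GNU `stat` as shipped on Debian/Ubuntu.

```shell
# Hypothetical helper: succeed only if the file is accessible to its owner
# alone (mode 600 or 400). Assumes GNU stat (`stat -c`), as on Debian/Ubuntu.
credentials_are_private() {
  local file="$1" mode
  mode="$(stat -c '%a' "$file")" || return 2
  case "$mode" in
    600|400) return 0 ;;
    *) echo "$file has mode $mode; consider: chmod 600 $file" >&2; return 1 ;;
  esac
}

# Example usage (not run here):
#   credentials_are_private ~/.aws/credentials
```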
  3. Create OpenTelemetry Collector Configuration

    Create the collector configuration file:

    sudo mkdir -p /etc/otelcol-contrib
    sudo nano /etc/otelcol-contrib/config.yaml

    Add the following configuration to collect RDS CloudWatch metrics:

    receivers:
      awscloudwatch:
        region: us-east-1 # Change to your AWS region
        metrics:
          - metric_name: CPUUtilization
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*" # Monitor all RDS instances
          - metric_name: DatabaseConnections
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
          - metric_name: FreeStorageSpace
            namespace: AWS/RDS
            stat: [Average, Minimum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
          - metric_name: ReadLatency
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
          - metric_name: WriteLatency
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
          - metric_name: ReadThroughput
            namespace: AWS/RDS
            stat: [Average, Sum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
          - metric_name: WriteThroughput
            namespace: AWS/RDS
            stat: [Average, Sum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
          - metric_name: ReadIOPS
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
          - metric_name: WriteIOPS
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
          - metric_name: FreeableMemory
            namespace: AWS/RDS
            stat: [Average, Minimum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
          - metric_name: SwapUsage
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "*"
        collection_interval: 300s # 5 minutes (CloudWatch default)
    processors:
      batch:
        timeout: 30s
        send_batch_size: 10000
        send_batch_max_size: 10000
      resourcedetection/cloud:
        detectors: ["aws"]
      transform/metrics:
        metric_statements:
          - context: metric
            statements:
              - set(resource.attributes["service.name"], "aws-rds")
              - set(resource.attributes["deployment.environment"], "production")
    exporters:
      prometheusremotewrite:
        endpoint: "$last9_remote_write_url"
        auth:
          authenticator: basicauth/metrics
        resource_to_telemetry_conversion:
          enabled: true
      debug:
        verbosity: detailed
    extensions:
      basicauth/metrics:
        client_auth:
          username: "$last9_remote_write_username"
          password: "$last9_remote_write_password"
    service:
      extensions: [basicauth/metrics]
      pipelines:
        metrics:
          receivers: [awscloudwatch]
          processors: [batch, resourcedetection/cloud, transform/metrics]
          exporters: [prometheusremotewrite]
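The configuration above contains three `$last9_...` placeholders that must be replaced with the values from your Last9 account before the collector can export anything. A small sketch of a pre-start check follows; `find_last9_placeholders` is a hypothetical helper, not part of the collector or of Last9 tooling.

```shell
# Hypothetical helper: print every line (with its line number) that still
# contains an unreplaced $last9_... placeholder.
# Exit status follows grep: 0 if placeholders remain, 1 if none were found.
find_last9_placeholders() {
  grep -n '\$last9_[a-z_]*' "$1"
}

# Example usage (not run here):
#   if find_last9_placeholders /etc/otelcol-contrib/config.yaml; then
#     echo "replace the placeholders above before starting the collector" >&2
#   fi
```

If your collector build includes the `validate` subcommand (check `otelcol-contrib --help`), running it against the same file is a useful second check, since it parses the configuration against the components compiled into the binary.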
  4. Configure Specific RDS Instances (Optional)

    To monitor specific RDS instances instead of all instances, modify the dimensions:

    receivers:
      awscloudwatch:
        region: us-east-1
        metrics:
          - metric_name: CPUUtilization
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "production-db" # Specific instance
          - metric_name: DatabaseConnections
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBInstanceIdentifier
                value: "production-db"
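Listing instances explicitly means repeating the same entry once per metric and per instance, which is easy to get wrong by hand. The sketch below prints entries in the shape used in this step so they can be pasted under `metrics:`. `emit_metric_entries` is a hypothetical helper, and the indentation assumes two-space YAML nesting under `receivers`, `awscloudwatch`, `metrics`.

```shell
# Hypothetical helper: print one metrics entry per instance.
#   $1 = metric name, $2 = comma-separated statistics, remaining args = instance IDs
emit_metric_entries() {
  local metric="$1" stats="$2" instance
  shift 2
  for instance in "$@"; do
    printf '      - metric_name: %s\n' "$metric"
    printf '        namespace: AWS/RDS\n'
    printf '        stat: [%s]\n' "$stats"
    printf '        dimensions:\n'
    printf '          - name: DBInstanceIdentifier\n'
    printf '            value: "%s"\n' "$instance"
  done
}

# Example usage (not run here):
#   emit_metric_entries CPUUtilization "Average, Maximum" production-db staging-db
```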
  5. Add RDS Cluster Metrics (for Aurora)

    If you’re using Aurora clusters, add cluster-specific metrics:

    # Add these metrics to your existing configuration
    receivers:
      awscloudwatch:
        metrics:
          - metric_name: CPUUtilization
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBClusterIdentifier
                value: "*" # Monitor all clusters
          - metric_name: AuroraReplicaLag
            namespace: AWS/RDS
            stat: [Average, Maximum]
            dimensions:
              - name: DBClusterIdentifier
                value: "*"
          - metric_name: VolumeReadIOPs
            namespace: AWS/RDS
            stat: [Average, Sum]
            dimensions:
              - name: DBClusterIdentifier
                value: "*"
          - metric_name: VolumeWriteIOPs
            namespace: AWS/RDS
            stat: [Average, Sum]
            dimensions:
              - name: DBClusterIdentifier
                value: "*"
  6. Create Systemd Service Configuration

    Create a systemd service file:

    sudo nano /etc/systemd/system/otelcol-contrib.service

    Add the service configuration:

    [Unit]
    Description=OpenTelemetry Collector for AWS RDS Monitoring
    After=network.target
    [Service]
    ExecStart=/usr/bin/otelcol-contrib --config /etc/otelcol-contrib/config.yaml
    Restart=always
    User=root
    Group=root
    Environment=AWS_REGION=us-east-1
    [Install]
    WantedBy=multi-user.target
  7. Start and Enable the Service

    Start the OpenTelemetry Collector service:

    sudo systemctl daemon-reload
    sudo systemctl enable otelcol-contrib
    sudo systemctl start otelcol-contrib

Understanding RDS Metrics

The AWS RDS integration collects comprehensive CloudWatch metrics:

Performance Metrics

  • CPU Utilization: Processor usage percentage across database instances
  • Database Connections: Active connections to the database
  • Freeable Memory: Available RAM for database operations
  • Swap Usage: Swap space utilization indicating memory pressure

Storage Metrics

  • Free Storage Space: Available disk space on the DB instance
  • Read/Write IOPS: Disk I/O operations per second on the DB instance
  • Read/Write Latency: Average time per disk I/O operation (CloudWatch reports this in seconds)
  • Read/Write Throughput: Bytes read/written per second

Aurora Cluster Metrics

  • Aurora Replica Lag: Time lag between primary and read replica
  • Volume Read/Write IOPs: Cluster-level I/O operations
  • Aurora Volume: Storage volume utilization for Aurora clusters

Engine-Specific Metrics

Different RDS engines provide additional specialized metrics:

  • MySQL/MariaDB: Slow queries, binary log usage
  • PostgreSQL: Transaction logs, vacuum operations
  • Oracle: Archive log usage, shared pool efficiency
  • SQL Server: Lock waits, buffer cache hit ratio

Advanced Configuration

Multi-Region Monitoring

Monitor RDS instances across multiple AWS regions:

receivers:
  awscloudwatch/us-east-1:
    region: us-east-1
    metrics:
      - metric_name: CPUUtilization
        namespace: AWS/RDS
        stat: [Average, Maximum]
  awscloudwatch/us-west-2:
    region: us-west-2
    metrics:
      - metric_name: CPUUtilization
        namespace: AWS/RDS
        stat: [Average, Maximum]
service:
  pipelines:
    metrics:
      receivers: [awscloudwatch/us-east-1, awscloudwatch/us-west-2]
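Each named receiver has to appear in the pipeline's `receivers` list, and that list is easy to leave out of date when a region is added. As a small sketch, the hypothetical helper `receiver_list` below builds the list from region names, following the `awscloudwatch/<region>` naming used in the example above.

```shell
# Hypothetical helper: build the pipeline `receivers` value for a set of
# regions, using the awscloudwatch/<region> naming convention.
receiver_list() {
  local region list=""
  for region in "$@"; do
    # Append with a ", " separator after the first element.
    list="${list:+$list, }awscloudwatch/$region"
  done
  printf '[%s]\n' "$list"
}

# Example usage (not run here):
#   receiver_list us-east-1 us-west-2
```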

Custom Metric Collection

Add custom CloudWatch metrics for specific monitoring needs:

receivers:
  awscloudwatch:
    metrics:
      - metric_name: BinLogDiskUsage
        namespace: AWS/RDS
        stat: [Average, Maximum]
        dimensions:
          - name: DBInstanceIdentifier
            value: "mysql-instance"
      - metric_name: TransactionLogsDiskUsage
        namespace: AWS/RDS
        stat: [Average, Maximum]
        dimensions:
          - name: DBInstanceIdentifier
            value: "postgres-instance"

Enhanced Monitoring Integration

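RDS Enhanced Monitoring provides more detailed OS-level metrics than the CloudWatch metrics above. RDS delivers this data to CloudWatch Logs (the RDSOSMetrics log group), so you can complement the CloudWatch metrics by running an additional collector, for example on an EC2 instance, that reads it from there.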

Verification

  1. Check Service Status

    Verify the OpenTelemetry Collector is running:

    sudo systemctl status otelcol-contrib
  2. Monitor Service Logs

    Check for any configuration errors:

    sudo journalctl -u otelcol-contrib -f
  3. Verify AWS Connectivity

    Test AWS API access:

    aws rds describe-db-instances --region us-east-1
    aws cloudwatch list-metrics --namespace AWS/RDS --region us-east-1
  4. Verify Metrics in Last9

    Log into your Last9 account and check that RDS metrics are being received in Grafana.

    Look for metrics like:

    • CPUUtilization
    • DatabaseConnections
    • FreeStorageSpace
    • ReadLatency
    • WriteLatency
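For the log check in step 2, following the journal by eye works for a first run, but a count of recent error lines is easier to script. The sketch below is a rough filter only: `count_error_lines` is a hypothetical helper, and it assumes that error-level collector log lines contain the standalone word "error".

```shell
# Hypothetical helper: read log lines on stdin and print how many contain
# the standalone word "error" (case-insensitive). A rough filter only.
count_error_lines() {
  # grep -c prints 0 but exits non-zero when nothing matches; `|| true`
  # keeps the helper's exit status at 0 in that case.
  grep -ciw 'error' || true
}

# Example usage (not run here):
#   journalctl -u otelcol-contrib --since "10 minutes ago" --no-pager | count_error_lines
```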

Key Metrics to Monitor

Critical Performance Indicators

Metric              | Description                     | Alert Threshold
--------------------|---------------------------------|----------------------------
CPUUtilization      | Database CPU usage percentage   | > 80% for sustained periods
DatabaseConnections | Active database connections     | Near max_connections limit
FreeStorageSpace    | Available disk space            | < 20% of total storage
ReadLatency         | Average read operation latency  | > 200ms consistently
WriteLatency        | Average write operation latency | > 200ms consistently
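The FreeStorageSpace threshold is a percentage, but CloudWatch reports the metric in bytes, so an alert needs the instance's allocated storage as well. The sketch below shows the arithmetic. Both function names are hypothetical, and the allocated size is assumed to come from the `AllocatedStorage` field of `aws rds describe-db-instances`, which is expressed in GiB.

```shell
# Hypothetical helper: percentage of allocated storage that is still free.
#   $1 = FreeStorageSpace in bytes (as reported by CloudWatch)
#   $2 = allocated storage in GiB (assumed to come from AllocatedStorage)
free_storage_percent() {
  local free_bytes="$1" allocated_gib="$2"
  echo $(( free_bytes * 100 / (allocated_gib * 1024 * 1024 * 1024) ))
}

# Succeeds (exit 0) when free space is below the 20% threshold from the table.
storage_alert() {
  [ "$(free_storage_percent "$1" "$2")" -lt 20 ]
}

# Example usage (not run here):
#   storage_alert 10737418240 100 && echo "less than 20% storage free"
```

For example, `free_storage_percent 21474836480 100` (20 GiB free of 100 GiB) prints 20, which sits exactly on the threshold and does not trigger `storage_alert`.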

Capacity Planning

Metric          | Description                 | Monitoring Focus
----------------|-----------------------------|------------------------------
FreeableMemory  | Available RAM               | Track trends for memory sizing
ReadThroughput  | Data read throughput        | Monitor growth patterns
WriteThroughput | Data write throughput       | Monitor growth patterns
ReadIOPS        | Read operations per second  | Track I/O capacity needs
WriteIOPS       | Write operations per second | Track I/O capacity needs

Troubleshooting

CloudWatch API Issues

Permission Denied:

# Verify AWS credentials are configured
aws sts get-caller-identity
# Check IAM permissions for CloudWatch
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::ACCOUNT:user/USERNAME \
  --action-names cloudwatch:GetMetricStatistics \
  --resource-arns "*"

Rate Limiting:

# Adjust collection interval to reduce API calls
receivers:
  awscloudwatch:
    collection_interval: 600s # 10 minutes instead of 5

Missing Metrics

No Data Appearing:

  • Verify RDS instances exist in the specified region
  • Check CloudWatch console for metric availability
  • Ensure metric names match exactly (case-sensitive)

Partial Data:

# List available RDS metrics
aws cloudwatch list-metrics --namespace AWS/RDS --region us-east-1
# Check specific instance metrics for the last hour (GNU date syntax).
# Use a recent window: CloudWatch only retains 5-minute data points for a limited time.
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name CPUUtilization \
  --dimensions Name=DBInstanceIdentifier,Value=your-db-instance \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Average

High Costs

CloudWatch API Costs:

  • Increase collection intervals for less critical metrics
  • Use metric filtering to collect only necessary metrics
  • Consider using AWS CloudWatch Agent for EC2-based monitoring
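To compare the effect of these options before changing the configuration, a rough estimate of daily query volume is often enough. The sketch below assumes one metric query per configured metric, per matched instance, per collection cycle. That is an illustrative assumption about how requests scale, not a statement of how the receiver batches API calls or how CloudWatch bills them, and `estimate_daily_queries` is a hypothetical helper.

```shell
# Hypothetical estimate: metric queries per day for one collector.
#   $1 = number of configured metrics
#   $2 = number of DB instances matched
#   $3 = collection interval in seconds
# Assumes one query per metric per instance per collection cycle.
estimate_daily_queries() {
  local metrics="$1" instances="$2" interval="$3"
  echo $(( metrics * instances * (86400 / interval) ))
}

# Example usage (not run here):
#   estimate_daily_queries 11 3 300
```

With the 11 instance-level metrics from the main configuration and three instances, `estimate_daily_queries 11 3 300` prints 9504, and moving to a 600s interval halves that to 4752.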

Best Practices

Security

  • IAM Roles: Use IAM roles instead of access keys when running on EC2
  • Least Privilege: Grant only necessary CloudWatch and RDS permissions
  • Credential Management: Store credentials securely using AWS Secrets Manager or environment variables

Performance

  • Collection Intervals: Balance monitoring granularity with API costs
  • Metric Selection: Monitor only metrics relevant to your use case
  • Regional Optimization: Deploy collectors in the same region as RDS instances

Monitoring Strategy

  • Alerting: Set up alerts for critical metrics like CPU, connections, and storage
  • Dashboards: Create comprehensive dashboards for different stakeholders
  • Baseline Metrics: Establish performance baselines for comparison
  • Multi-Environment: Use different service names or tags per environment

Need Help?

If you encounter any issues or have questions: