
AWS SQS

Monitor AWS SQS queue performance, message throughput, and dead letter queues with CloudWatch metrics for comprehensive message queue observability

Monitor your Amazon SQS (Simple Queue Service) queues with CloudWatch metrics integration. This setup provides comprehensive monitoring of queue performance, message throughput, processing delays, dead letter queues, and overall queue health.

Prerequisites

Before setting up AWS SQS monitoring, ensure you have:

  • AWS Account: With access to SQS and CloudWatch services
  • SQS Queues: Running queues to monitor
  • CloudWatch Permissions: IAM permissions to read CloudWatch metrics
  • Monitoring Server: Where you can install and run the OpenTelemetry Collector
  • Last9 Account: With metrics integration credentials
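For the CloudWatch Permissions prerequisite, the sketch below writes a least-privilege IAM policy document. It assumes the collector reads SQS metrics through the CloudWatch GetMetricData, GetMetricStatistics, and ListMetrics APIs (check which ones your receiver version actually calls), and it adds the read-only SQS actions used by the aws CLI commands in this guide. The function name write_sqs_monitoring_policy, the file path, and the policy name last9-sqs-monitoring are hypothetical.

```shell
# Hypothetical helper: write a least-privilege IAM policy for the collector.
# Assumption: the receiver uses GetMetricData/GetMetricStatistics/ListMetrics;
# the sqs:* read actions are only needed for the aws CLI checks in this guide.
write_sqs_monitoring_policy() {
  # $1: path of the policy document to create
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadSqsCloudWatchMetrics",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricData",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics"
      ],
      "Resource": "*"
    },
    {
      "Sid": "InspectSqsQueues",
      "Effect": "Allow",
      "Action": [
        "sqs:ListQueues",
        "sqs:GetQueueAttributes"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

# Usage (path and policy name are hypothetical examples):
#   write_sqs_monitoring_policy /tmp/last9-sqs-policy.json
#   aws iam create-policy --policy-name last9-sqs-monitoring \
#     --policy-document file:///tmp/last9-sqs-policy.json
```

CloudWatch read actions do not support resource-level restrictions, which is why the first statement uses "Resource": "*"; the SQS statement can be narrowed to specific queue ARNs if you prefer.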
  1. Install OpenTelemetry Collector

    Install the OpenTelemetry Collector with AWS receiver support:

    For Debian/Ubuntu systems:

    wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.118.0/otelcol-contrib_0.118.0_linux_amd64.deb
    sudo dpkg -i otelcol-contrib_0.118.0_linux_amd64.deb
  2. Configure AWS Credentials

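    Set up AWS credentials for CloudWatch access. The systemd service in step 6 runs as root, so the collector reads these files from /root/.aws/. On EC2, prefer an IAM instance role instead (see Best Practices).

    Create or update ~/.aws/credentials:

    [default]
    aws_access_key_id = YOUR_ACCESS_KEY_ID
    aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

    Set the default region in ~/.aws/config:

    [default]
    region = us-east-1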
  3. Create OpenTelemetry Collector Configuration

    Create the collector configuration file:

    sudo mkdir -p /etc/otelcol-contrib
    sudo nano /etc/otelcol-contrib/config.yaml

    Add the following configuration to collect SQS CloudWatch metrics:

    receivers:
      awscloudwatch:
        region: us-east-1 # Change to your AWS region
        metrics:
          # Message flow metrics
          - metric_name: NumberOfMessagesSent
            namespace: AWS/SQS
            stat: [Sum, Average]
            dimensions:
              - name: QueueName
                value: "*" # Monitor all queues
          - metric_name: NumberOfMessagesReceived
            namespace: AWS/SQS
            stat: [Sum, Average]
            dimensions:
              - name: QueueName
                value: "*"
          - metric_name: NumberOfMessagesDeleted
            namespace: AWS/SQS
            stat: [Sum, Average]
            dimensions:
              - name: QueueName
                value: "*"
          # Queue state metrics
          - metric_name: ApproximateNumberOfMessagesVisible
            namespace: AWS/SQS
            stat: [Average, Maximum]
            dimensions:
              - name: QueueName
                value: "*"
          - metric_name: ApproximateNumberOfMessagesNotVisible
            namespace: AWS/SQS
            stat: [Average, Maximum]
            dimensions:
              - name: QueueName
                value: "*"
          - metric_name: ApproximateNumberOfMessagesDelayed
            namespace: AWS/SQS
            stat: [Average, Maximum]
            dimensions:
              - name: QueueName
                value: "*"
          # Age metrics
          - metric_name: ApproximateAgeOfOldestMessage
            namespace: AWS/SQS
            stat: [Average, Maximum]
            dimensions:
              - name: QueueName
                value: "*"
          # Size and polling metrics
          - metric_name: SentMessageSize
            namespace: AWS/SQS
            stat: [Average, Maximum, Sum]
            dimensions:
              - name: QueueName
                value: "*"
          - metric_name: NumberOfEmptyReceives
            namespace: AWS/SQS
            stat: [Sum, Average]
            dimensions:
              - name: QueueName
                value: "*"
        collection_interval: 300s # Poll every 5 minutes

    processors:
      batch:
        timeout: 30s
        send_batch_size: 10000
        send_batch_max_size: 10000
      resourcedetection/cloud:
        detectors: ["ec2"] # Adds host metadata when the collector runs on EC2
      transform/metrics:
        metric_statements:
          - context: metric
            statements:
              - set(resource.attributes["service.name"], "aws-sqs")
              - set(resource.attributes["deployment.environment"], "production")

    exporters:
      prometheusremotewrite:
        endpoint: "$last9_remote_write_url"
        auth:
          authenticator: basicauth/metrics
        resource_to_telemetry_conversion:
          enabled: true
      debug:
        verbosity: detailed

    extensions:
      basicauth/metrics:
        client_auth:
          username: "$last9_remote_write_username"
          password: "$last9_remote_write_password"

    service:
      extensions: [basicauth/metrics]
      pipelines:
        metrics:
          receivers: [awscloudwatch]
          processors: [resourcedetection/cloud, transform/metrics, batch]
          exporters: [prometheusremotewrite]
  4. Configure Specific Queues (Optional)

    To monitor specific SQS queues instead of all queues, modify the dimensions:

    receivers:
      awscloudwatch:
        region: us-east-1
        metrics:
          - metric_name: ApproximateNumberOfMessagesVisible
            namespace: AWS/SQS
            stat: [Average, Maximum]
            dimensions:
              - name: QueueName
                value: "production-orders" # Specific queue
          - metric_name: NumberOfMessagesSent
            namespace: AWS/SQS
            stat: [Sum, Average]
            dimensions:
              - name: QueueName
                value: "production-orders"
  5. Add FIFO Queue Metrics (if applicable)

    If you're using FIFO queues, add the FIFO-only CloudWatch metrics:

    receivers:
      awscloudwatch:
        metrics:
          - metric_name: NumberOfDeduplicatedSentMessages
            namespace: AWS/SQS
            stat: [Sum]
            dimensions:
              - name: QueueName
                value: "*.fifo" # Monitor all FIFO queues
          - metric_name: ApproximateNumberOfGroupsWithInflightMessages
            namespace: AWS/SQS
            stat: [Average, Maximum]
            dimensions:
              - name: QueueName
                value: "*.fifo"

    ContentBasedDeduplication, DeduplicationScope, and FifoThroughputLimit are queue attributes (readable with aws sqs get-queue-attributes), not CloudWatch metrics, so they cannot be collected this way.
  6. Create Systemd Service Configuration

    Create a systemd service file:

    sudo nano /etc/systemd/system/otelcol-contrib.service

    Add the service configuration:

    [Unit]
    Description=OpenTelemetry Collector for AWS SQS Monitoring
    After=network.target
    [Service]
    ExecStart=/usr/bin/otelcol-contrib --config /etc/otelcol-contrib/config.yaml
    Restart=always
    User=root
    Group=root
    Environment=AWS_REGION=us-east-1
    [Install]
    WantedBy=multi-user.target
  7. Start and Enable the Service

    Start the OpenTelemetry Collector service:

    sudo systemctl daemon-reload
    sudo systemctl enable otelcol-contrib
    sudo systemctl start otelcol-contrib
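The configuration in step 3 contains placeholders such as $last9_remote_write_url. If one is left in place, the collector will not be able to reach Last9. A small pre-flight check, sketched below with a hypothetical check_last9_placeholders helper, catches this before you restart the service. The commented usage line assumes your collector build includes the validate subcommand found in recent releases; if it does not, rely on the journalctl check in the Verification section.

```shell
# Hypothetical helper: fail if any $last9_* placeholder is still present.
#   $1: path to the collector configuration file
check_last9_placeholders() {
  # \$ matches a literal dollar sign; -n prints the offending line numbers.
  if grep -n '\$last9_' "$1"; then
    echo "Replace the placeholders above with your Last9 credentials." >&2
    return 1
  fi
  return 0
}

# Usage (config path from this guide):
#   check_last9_placeholders /etc/otelcol-contrib/config.yaml &&
#     otelcol-contrib validate --config=/etc/otelcol-contrib/config.yaml &&
#     sudo systemctl restart otelcol-contrib
```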

Understanding SQS Metrics

The AWS SQS integration collects the following CloudWatch metrics:

Message Flow Metrics

  • NumberOfMessagesSent: Messages added to the queue
  • NumberOfMessagesReceived: Messages retrieved from the queue
  • NumberOfMessagesDeleted: Messages successfully processed and removed
  • NumberOfEmptyReceives: Polling attempts that returned no messages

Queue State Metrics

  • ApproximateNumberOfMessagesVisible: Messages available for retrieval (aws sqs get-queue-attributes reports the same count as ApproximateNumberOfMessages)
  • ApproximateNumberOfMessagesNotVisible: Messages being processed (in-flight)
  • ApproximateNumberOfMessagesDelayed: Messages delayed for future delivery

Performance Metrics

  • ApproximateAgeOfOldestMessage: Age of the oldest message in the queue, in seconds
  • SentMessageSize: Size of messages sent to the queue, in bytes

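Dead Letter Queue Metrics

A dead letter queue (DLQ) is an ordinary SQS queue, so it reports the same metrics as any other queue. The ones to watch are:

  • ApproximateNumberOfMessagesVisible (on the DLQ): Messages moved to the DLQ after exceeding the source queue's maxReceiveCount
  • ApproximateAgeOfOldestMessage (on the DLQ): How long the oldest failed message has been waiting

Messages moved to a DLQ by a redrive policy are not counted in the DLQ's NumberOfMessagesSent. The relationship between a DLQ and its source queues comes from the ListDeadLetterSourceQueues API (aws sqs list-dead-letter-source-queues), not from a metric.

FIFO Queue Metrics (FIFO Queues Only)

  • NumberOfDeduplicatedSentMessages: Messages sent to the queue that were deduplicated
  • ApproximateNumberOfGroupsWithInflightMessages: Message groups with at least one in-flight message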
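The message flow and queue state metrics can be combined into a rough estimate of how long the current backlog will take to clear. The sketch assumes the sent and deleted rates observed over the last period stay constant, which is rarely exactly true, so treat the result as a trend indicator rather than a forecast. drain_time_seconds is a hypothetical helper name.

```shell
# Hypothetical helper: rough backlog drain-time estimate.
# Assumes sent/deleted rates from the last period stay constant.
#   $1 visible  ApproximateNumberOfMessagesVisible (latest value)
#   $2 sent     NumberOfMessagesSent, Sum over the period
#   $3 deleted  NumberOfMessagesDeleted, Sum over the period
#   $4 period   period length in seconds (e.g. 300)
drain_time_seconds() {
  visible=$1 sent=$2 deleted=$3 period=$4
  # Net messages removed per period after accounting for new arrivals.
  net=$((deleted - sent))
  if [ "$net" -le 0 ]; then
    # Consumers are not keeping up, so the backlog is flat or growing.
    echo "not-draining"
    return 1
  fi
  echo $((visible * period / net))
}
```

For example, 1,000 visible messages with 2,000 sent and 3,000 deleted over a 300-second period gives a net drain of 1,000 messages per period, so drain_time_seconds 1000 2000 3000 300 prints 300.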

Advanced Configuration

Multi-Region Monitoring

Monitor SQS queues across multiple AWS regions:

receivers:
  awscloudwatch/us-east-1:
    region: us-east-1
    metrics:
      - metric_name: ApproximateNumberOfMessagesVisible
        namespace: AWS/SQS
        stat: [Average, Maximum]
  awscloudwatch/us-west-2:
    region: us-west-2
    metrics:
      - metric_name: ApproximateNumberOfMessagesVisible
        namespace: AWS/SQS
        stat: [Average, Maximum]
service:
  pipelines:
    metrics:
      receivers: [awscloudwatch/us-east-1, awscloudwatch/us-west-2]
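With more than a couple of regions, writing each receiver by hand gets repetitive. The sketch below generates the per-region receivers and the matching pipeline list in the same layout as the example above. gen_region_receivers is a hypothetical helper, and the generated keys simply mirror this guide's receiver configuration rather than any separately verified schema.

```shell
# Hypothetical helper: print one awscloudwatch/<region> receiver per region
# argument, then the pipeline receivers list that references them.
gen_region_receivers() {
  echo "receivers:"
  for region in "$@"; do
    cat <<EOF
  awscloudwatch/$region:
    region: $region
    metrics:
      - metric_name: ApproximateNumberOfMessagesVisible
        namespace: AWS/SQS
        stat: [Average, Maximum]
EOF
  done
  list=""
  for region in "$@"; do
    # Insert ", " between entries but not before the first one.
    list="${list:+$list, }awscloudwatch/$region"
  done
  printf 'service:\n  pipelines:\n    metrics:\n      receivers: [%s]\n' "$list"
}

# Usage: gen_region_receivers us-east-1 us-west-2 eu-west-1 > regions.yaml
```

Merge the generated fragment into your main configuration so the pipeline keeps its processors and exporters.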

Queue-Specific Monitoring

Monitor different queue types with specific configurations:

receivers:
  awscloudwatch/standard-queues:
    region: us-east-1
    metrics:
      - metric_name: ApproximateNumberOfMessagesVisible
        namespace: AWS/SQS
        stat: [Average, Maximum]
        dimensions:
          - name: QueueName
            value: "production-*" # Standard queues
  awscloudwatch/fifo-queues:
    region: us-east-1
    metrics:
      - metric_name: ApproximateNumberOfMessagesVisible
        namespace: AWS/SQS
        stat: [Average, Maximum]
        dimensions:
          - name: QueueName
            value: "*.fifo" # FIFO queues only

Dead Letter Queue Monitoring

Specific configuration for monitoring dead letter queues:

receivers:
  awscloudwatch/dlq:
    region: us-east-1
    metrics:
      - metric_name: ApproximateNumberOfMessagesVisible
        namespace: AWS/SQS
        stat: [Average, Maximum, Sum]
        dimensions:
          - name: QueueName
            value: "*-dlq" # Dead letter queues
      - metric_name: ApproximateAgeOfOldestMessage
        namespace: AWS/SQS
        stat: [Maximum]
        dimensions:
          - name: QueueName
            value: "*-dlq"
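The *-dlq pattern only works if your dead letter queues follow that naming convention. To confirm which queue a source queue actually redrives to, you can read the source queue's RedrivePolicy attribute, which contains the DLQ's ARN. This sketch uses sed rather than jq so it needs no extra tools; the helper names and the orders-dlq ARN in the test data are hypothetical.

```shell
# Hypothetical helper: extract deadLetterTargetArn from a RedrivePolicy string.
dlq_arn_from_policy() {
  printf '%s\n' "$1" |
    sed -n 's/.*"deadLetterTargetArn" *: *"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: the queue name is the last colon-separated ARN field.
queue_name_from_arn() {
  printf '%s\n' "${1##*:}"
}

# Hypothetical helper: look up the DLQ name for a live source queue.
#   $1: source queue URL (requires sqs:GetQueueAttributes)
dlq_name_for_queue() {
  policy=$(aws sqs get-queue-attributes --queue-url "$1" \
    --attribute-names RedrivePolicy \
    --query 'Attributes.RedrivePolicy' --output text) || return 1
  arn=$(dlq_arn_from_policy "$policy")
  # An empty ARN means the queue has no redrive policy.
  [ -n "$arn" ] || return 1
  queue_name_from_arn "$arn"
}
```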

Verification

  1. Check Service Status

    Verify the OpenTelemetry Collector is running:

    sudo systemctl status otelcol-contrib
  2. Monitor Service Logs

    Check for any configuration errors:

    sudo journalctl -u otelcol-contrib -f
  3. Verify AWS Connectivity

    Test AWS API access:

    aws sqs list-queues --region us-east-1
    aws cloudwatch list-metrics --namespace AWS/SQS --region us-east-1
  4. Generate SQS Activity

    Create some queue activity to generate metrics:

    # Send test messages to a queue
    aws sqs send-message \
    --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/test-queue \
    --message-body "Test message 1"
    # Receive messages
    aws sqs receive-message \
    --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/test-queue
    # Check queue attributes
    aws sqs get-queue-attributes \
    --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/test-queue \
    --attribute-names All
  5. Verify Metrics in Last9

    Log into your Last9 account and check that SQS metrics are being received in Grafana.

    Look for metrics like:

    • ApproximateNumberOfMessagesVisible
    • NumberOfMessagesSent
    • NumberOfMessagesReceived
    • ApproximateAgeOfOldestMessage

Key Metrics to Monitor

Critical Queue Health Indicators

| Metric | Description | Alert Threshold |
|---|---|---|
| ApproximateNumberOfMessagesVisible | Messages waiting in the queue | > 1000 for high-throughput queues |
| ApproximateAgeOfOldestMessage | Age of the oldest unprocessed message | > 300 seconds (5 minutes) |
| NumberOfMessagesReceived | Messages delivered to consumers | A sudden drop indicates consumer issues |
| NumberOfEmptyReceives | Receive calls that returned no messages | High values indicate inefficient polling |
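To turn these thresholds into a scripted check (for example from cron or a smoke test), a minimal sketch might look like the following. The 1000-message and 300-second limits are the example values from the table, not universal defaults, and queue_health is a hypothetical helper.

```shell
# Hypothetical helper: apply this guide's example thresholds.
#   $1 visible  ApproximateNumberOfMessagesVisible
#   $2 age      ApproximateAgeOfOldestMessage, in seconds
# Prints OK, or ALERT with every threshold that was exceeded.
queue_health() {
  alerts=""
  if [ "$1" -gt 1000 ]; then alerts="depth>1000"; fi
  if [ "$2" -gt 300 ]; then alerts="${alerts:+$alerts }age>300s"; fi
  if [ -z "$alerts" ]; then
    echo "OK"
    return 0
  fi
  echo "ALERT: $alerts"
  return 1
}
```

Returning a non-zero status on alert lets the check plug directly into shell conditionals or a scheduler that reports failed jobs.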

Performance Monitoring

| Metric | Description | Monitoring Focus |
|---|---|---|
| NumberOfMessagesSent | Production rate | Track message ingestion trends |
| NumberOfMessagesDeleted | Processing rate | Should match messages sent over time |
| SentMessageSize | Message size distribution | Watch for size limits and costs |
| ReceiveMessageWaitTimeSeconds (queue attribute, not a metric) | Long-polling wait time | Raise it (up to 20 seconds) when NumberOfEmptyReceives is high |

Dead Letter Queue Monitoring

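| Metric | Description | Alert Condition |
|---|---|---|
| ApproximateNumberOfMessagesVisible (DLQ) | Failed messages waiting in the DLQ | > 0 (any message in the DLQ) |
| ApproximateNumberOfMessagesVisible (DLQ), trend | Rate at which failed messages arrive | An increasing trend indicates recurring processing failures |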

Troubleshooting

CloudWatch API Issues

Permission Denied:

# Verify AWS credentials
aws sts get-caller-identity
# Test SQS access
aws sqs list-queues --region us-east-1
# Check CloudWatch permissions
aws cloudwatch list-metrics --namespace AWS/SQS --region us-east-1 | head -10

Rate Limiting:

# Adjust collection interval to reduce API calls
receivers:
  awscloudwatch:
    collection_interval: 600s # 10 minutes instead of 5
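To gauge how much the collection interval matters, you can estimate the number of metric requests per day. The sketch assumes one request per metric, per queue, per poll, which is an upper bound on API calls: receivers that batch through GetMetricData make fewer calls, although CloudWatch bills GetMetricData by the number of metrics requested. queries_per_day is a hypothetical helper.

```shell
# Hypothetical helper: upper-bound metric requests per day.
# Assumes one request per metric per queue per poll.
#   $1 metrics, $2 queues, $3 collection_interval in seconds
queries_per_day() {
  echo $(( $1 * $2 * (86400 / $3) ))
}
```

For 9 metrics across 20 queues, queries_per_day 9 20 300 prints 51840 (288 polls per day), and doubling the interval to 600 seconds halves that to 25920.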

Missing Metrics

No Queue Metrics:

# Verify queues exist
aws sqs list-queues --region us-east-1
# Check specific queue metrics availability
aws cloudwatch get-metric-statistics \
--namespace AWS/SQS \
--metric-name ApproximateNumberOfMessagesVisible \
--dimensions Name=QueueName,Value=your-queue-name \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--period 300 \
--statistics Average
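The date -d syntax in the command above is specific to GNU coreutils; the BSD date shipped with macOS uses -v for relative dates instead. A small helper that tries the BSD form first and falls back to GNU lets the same command run on both. utc_hours_ago is a hypothetical name.

```shell
# Hypothetical helper: ISO-8601 UTC timestamp for N hours ago.
# Tries BSD/macOS date (-v) first; GNU date rejects -v and falls through to -d.
utc_hours_ago() {
  date -u -v-"$1"H +%Y-%m-%dT%H:%M:%SZ 2>/dev/null ||
    date -u -d "$1 hours ago" +%Y-%m-%dT%H:%M:%SZ
}

# Usage in the get-metric-statistics command above:
#   --start-time "$(utc_hours_ago 1)" --end-time "$(utc_hours_ago 0)"
```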

Partial Data:

# List all available SQS metrics
aws cloudwatch list-metrics --namespace AWS/SQS --region us-east-1
# Check queue-specific metrics
aws sqs get-queue-attributes \
--queue-url https://sqs.us-east-1.amazonaws.com/123456789012/queue-name \
--attribute-names All
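When only some metrics show up, it helps to compare what CloudWatch has published for a queue against the list this guide configures. The sketch reads metric names from aws cloudwatch list-metrics on standard input and prints every expected name that is missing. ListMetrics only returns metrics that have received data points in the past two weeks, so an idle queue may legitimately lack some of them. The helper and variable names are hypothetical.

```shell
# Hypothetical list: the SQS metrics configured in this guide.
EXPECTED_SQS_METRICS="NumberOfMessagesSent NumberOfMessagesReceived NumberOfMessagesDeleted NumberOfEmptyReceives ApproximateNumberOfMessagesVisible ApproximateNumberOfMessagesNotVisible ApproximateNumberOfMessagesDelayed ApproximateAgeOfOldestMessage SentMessageSize"

# Hypothetical helper: read available metric names (whitespace separated)
# on stdin and print each expected name that is not among them.
missing_sqs_metrics() {
  # One name per line; squeeze the blank lines produced by repeated separators.
  available=$(tr -s ' \t' '\n\n')
  for m in $EXPECTED_SQS_METRICS; do
    printf '%s\n' "$available" | grep -qx "$m" || echo "$m"
  done
}

# Usage (queue name is a placeholder):
#   aws cloudwatch list-metrics --namespace AWS/SQS \
#     --dimensions Name=QueueName,Value=your-queue-name \
#     --query 'Metrics[].MetricName' --output text | missing_sqs_metrics
```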

High Message Age

Troubleshoot Message Processing:

# Check queue attributes for visibility timeout
aws sqs get-queue-attributes \
--queue-url YOUR_QUEUE_URL \
--attribute-names VisibilityTimeout ReceiveMessageWaitTimeSeconds
# Monitor consumer behavior
aws sqs get-queue-attributes \
--queue-url YOUR_QUEUE_URL \
--attribute-names ApproximateNumberOfMessagesNotVisible
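If a consumer regularly takes longer than the queue's VisibilityTimeout, the message becomes visible again before it is deleted and can be received a second time, which inflates NumberOfMessagesReceived relative to NumberOfMessagesDeleted. The sketch compares the configured timeout with a processing time you measure yourself; check_visibility_timeout is a hypothetical helper.

```shell
# Hypothetical helper: compare VisibilityTimeout with processing time.
#   $1  VisibilityTimeout in seconds, e.g. from:
#       aws sqs get-queue-attributes --queue-url YOUR_QUEUE_URL \
#         --attribute-names VisibilityTimeout \
#         --query 'Attributes.VisibilityTimeout' --output text
#   $2  worst-case processing time in seconds (your own measurement)
check_visibility_timeout() {
  if [ "$2" -ge "$1" ]; then
    echo "increase VisibilityTimeout: processing ${2}s >= timeout ${1}s"
    return 1
  fi
  echo "ok"
}
```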

Best Practices

Security

  • IAM Roles: Use IAM roles instead of access keys when running on EC2
  • Least Privilege: Grant only necessary CloudWatch and SQS permissions
  • Queue Access: Restrict SQS queue access to authorized consumers and producers

Performance

  • Collection Intervals: Balance monitoring granularity with CloudWatch API costs
  • Metric Selection: Monitor only metrics relevant to your specific queues
  • Regional Optimization: Deploy collectors in the same region as SQS queues

Monitoring Strategy

  • Queue Depth Alerts: Set alerts for excessive queue depth
  • Consumer Health: Monitor message processing rates and age
  • Dead Letter Queues: Always monitor DLQs for failed message processing
  • Cost Optimization: Use appropriate CloudWatch metric collection intervals

Queue Management

  • Visibility Timeout: Configure appropriate visibility timeouts for your workload
  • Message Retention: Set appropriate message retention periods
  • Redrive Policy: Configure dead letter queues with appropriate maxReceiveCount
  • Long Polling: Use long polling to reduce empty receives and costs
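The redrive and long-polling recommendations above map to two queue attributes, RedrivePolicy and ReceiveMessageWaitTimeSeconds (20 seconds is the maximum long-polling wait). Because RedrivePolicy is itself a JSON string nested inside the attributes document, its quotes have to be escaped, which is easy to get wrong by hand. The helper below builds that document; its name, the example ARN, and the maxReceiveCount of 5 are illustrative assumptions.

```shell
# Hypothetical helper: build the --attributes JSON for set-queue-attributes.
#   $1  DLQ ARN (deadLetterTargetArn)
#   $2  maxReceiveCount: receives before a message moves to the DLQ
#   $3  long-polling wait in seconds (optional, defaults to 20, the maximum)
redrive_attributes_json() {
  # RedrivePolicy is JSON inside a JSON string, so its inner quotes are
  # escaped; in a printf format, \\ prints a single backslash.
  printf '{"RedrivePolicy":"{\\"deadLetterTargetArn\\":\\"%s\\",\\"maxReceiveCount\\":\\"%s\\"}","ReceiveMessageWaitTimeSeconds":"%s"}\n' \
    "$1" "$2" "${3:-20}"
}

# Example (hypothetical queue URL and DLQ ARN):
#   aws sqs set-queue-attributes --queue-url YOUR_QUEUE_URL \
#     --attributes "$(redrive_attributes_json arn:aws:sqs:us-east-1:123456789012:orders-dlq 5)"
```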

Need Help?

If you encounter any issues or have questions: