
AWS SQS

Monitor AWS SQS queue performance, message throughput, and dead letter queues with CloudWatch metrics for comprehensive message queue observability

Monitor your Amazon SQS (Simple Queue Service) queues with CloudWatch metrics integration. This setup provides comprehensive monitoring of queue performance, message throughput, processing delays, dead letter queues, and overall queue health.

Prerequisites

Before setting up AWS SQS monitoring, ensure you have:

  • AWS Account: With access to SQS and CloudWatch services
  • SQS Queues: One or more active queues to monitor
  • CloudWatch Permissions: IAM permissions to read CloudWatch metrics (a sample policy follows this list)
  • Monitoring Server: Where you can install and run OpenTelemetry Collector
  • Last9 Account: With metrics integration credentials
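
A minimal IAM policy for the collector's read-only CloudWatch access might look like the sketch below. Treat it as a starting point rather than a definitive policy, and scope it down further if your organization requires resource-level restrictions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricData",
        "cloudwatch:GetMetricStatistics"
      ],
      "Resource": "*"
    }
  ]
}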
Setup

  1. Install OpenTelemetry Collector

    Install the OpenTelemetry Collector with AWS receiver support:

    For Debian/Ubuntu systems:

    wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.118.0/otelcol-contrib_0.118.0_linux_amd64.deb
    sudo dpkg -i otelcol-contrib_0.118.0_linux_amd64.deb
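
    To confirm the install, print the collector version (recent releases support a --version flag):

    otelcol-contrib --version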
  2. Configure AWS Credentials

    Set up AWS credentials for CloudWatch access:

    Create or update ~/.aws/credentials:

    [default]
    aws_access_key_id = YOUR_ACCESS_KEY_ID
    aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
    region = us-east-1
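
    Alternatively, for containerized or CI environments, the standard AWS environment variables work too; the SDK credential chain picks them up without a credentials file:

    export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID
    export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY
    export AWS_REGION=us-east-1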
  3. Create OpenTelemetry Collector Configuration

    Create the collector configuration file:

    sudo mkdir -p /etc/otelcol-contrib
    sudo nano /etc/otelcol-contrib/config.yaml

    Add the following configuration to collect SQS CloudWatch metrics:

    receivers:
      awscloudwatch:
        region: us-east-1 # Change to your AWS region
        metrics:
          # Queue Message Metrics
          - metric_name: NumberOfMessagesSent
            namespace: AWS/SQS
            stat: [Sum, Average]
            dimensions:
              - name: QueueName
                value: "*" # Monitor all queues
          - metric_name: NumberOfMessagesReceived
            namespace: AWS/SQS
            stat: [Sum, Average]
            dimensions:
              - name: QueueName
                value: "*"
          - metric_name: NumberOfMessagesDeleted
            namespace: AWS/SQS
            stat: [Sum, Average]
            dimensions:
              - name: QueueName
                value: "*"
          - metric_name: ApproximateNumberOfMessages
            namespace: AWS/SQS
            stat: [Average, Maximum]
            dimensions:
              - name: QueueName
                value: "*"
          - metric_name: ApproximateNumberOfMessagesVisible
            namespace: AWS/SQS
            stat: [Average, Maximum]
            dimensions:
              - name: QueueName
                value: "*"
          - metric_name: ApproximateNumberOfMessagesNotVisible
            namespace: AWS/SQS
            stat: [Average, Maximum]
            dimensions:
              - name: QueueName
                value: "*"
          # Dead Letter Queue Metrics
          - metric_name: ApproximateNumberOfMessagesDelayed
            namespace: AWS/SQS
            stat: [Average, Maximum]
            dimensions:
              - name: QueueName
                value: "*"
          - metric_name: NumberOfMessagesMoved
            namespace: AWS/SQS
            stat: [Sum, Average]
            dimensions:
              - name: QueueName
                value: "*"
          # Age and Processing Metrics
          - metric_name: ApproximateAgeOfOldestMessage
            namespace: AWS/SQS
            stat: [Average, Maximum]
            dimensions:
              - name: QueueName
                value: "*"
          - metric_name: ReceiveMessageWaitTime
            namespace: AWS/SQS
            stat: [Average, Maximum]
            dimensions:
              - name: QueueName
                value: "*"
          # Size and Throughput Metrics
          - metric_name: SentMessageSize
            namespace: AWS/SQS
            stat: [Average, Maximum, Sum]
            dimensions:
              - name: QueueName
                value: "*"
          - metric_name: NumberOfEmptyReceives
            namespace: AWS/SQS
            stat: [Sum, Average]
            dimensions:
              - name: QueueName
                value: "*"
        collection_interval: 300s # 5 minutes (CloudWatch default)

    processors:
      batch:
        timeout: 30s
        send_batch_size: 10000
        send_batch_max_size: 10000
      resourcedetection/cloud:
        detectors: ["aws"]
      transform/metrics:
        metric_statements:
          - context: metric
            statements:
              - set(resource.attributes["service.name"], "aws-sqs")
              - set(resource.attributes["deployment.environment"], "production")

    exporters:
      prometheusremotewrite:
        endpoint: "$last9_remote_write_url"
        auth:
          authenticator: basicauth/metrics
        resource_to_telemetry_conversion:
          enabled: true
      debug:
        verbosity: detailed

    extensions:
      basicauth/metrics:
        client_auth:
          username: "$last9_remote_write_username"
          password: "$last9_remote_write_password"

    service:
      extensions: [basicauth/metrics]
      pipelines:
        metrics:
          receivers: [awscloudwatch]
          processors: [batch, resourcedetection/cloud, transform/metrics]
          exporters: [prometheusremotewrite]
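
    Before starting the service, it is worth validating the file. Recent otelcol-contrib releases include a validate subcommand that catches YAML and pipeline errors without starting the collector:

    otelcol-contrib validate --config /etc/otelcol-contrib/config.yaml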
  4. Configure Specific Queues (Optional)

    To monitor specific SQS queues instead of all queues, modify the dimensions:

    receivers:
      awscloudwatch:
        region: us-east-1
        metrics:
          - metric_name: ApproximateNumberOfMessages
            namespace: AWS/SQS
            stat: [Average, Maximum]
            dimensions:
              - name: QueueName
                value: "production-orders" # Specific queue
          - metric_name: NumberOfMessagesSent
            namespace: AWS/SQS
            stat: [Sum, Average]
            dimensions:
              - name: QueueName
                value: "production-orders"
  5. Add FIFO Queue Metrics (if applicable)

    If you’re using FIFO queues, add FIFO-specific metrics:

    receivers:
      awscloudwatch:
        metrics:
          - metric_name: ContentBasedDeduplication
            namespace: AWS/SQS
            stat: [Sum]
            dimensions:
              - name: QueueName
                value: "*.fifo" # Monitor all FIFO queues
          - metric_name: DeduplicationScope
            namespace: AWS/SQS
            stat: [Sum]
            dimensions:
              - name: QueueName
                value: "*.fifo"
          - metric_name: FifoThroughputLimit
            namespace: AWS/SQS
            stat: [Sum]
            dimensions:
              - name: QueueName
                value: "*.fifo"
  6. Create Systemd Service Configuration

    Create a systemd service file:

    sudo nano /etc/systemd/system/otelcol-contrib.service

    Add the service configuration:

    [Unit]
    Description=OpenTelemetry Collector for AWS SQS Monitoring
    After=network.target

    [Service]
    ExecStart=/usr/bin/otelcol-contrib --config /etc/otelcol-contrib/config.yaml
    Restart=always
    User=root
    Group=root
    Environment=AWS_REGION=us-east-1

    [Install]
    WantedBy=multi-user.target
  7. Start and Enable the Service

    Start the OpenTelemetry Collector service:

    sudo systemctl daemon-reload
    sudo systemctl enable otelcol-contrib
    sudo systemctl start otelcol-contrib

Understanding SQS Metrics

The AWS SQS integration collects comprehensive CloudWatch metrics:

Message Flow Metrics

  • NumberOfMessagesSent: Messages added to the queue
  • NumberOfMessagesReceived: Messages retrieved from the queue
  • NumberOfMessagesDeleted: Messages successfully processed and removed
  • NumberOfEmptyReceives: Polling attempts that returned no messages

Queue State Metrics

  • ApproximateNumberOfMessages: Total messages in the queue
  • ApproximateNumberOfMessagesVisible: Messages available for retrieval
  • ApproximateNumberOfMessagesNotVisible: Messages being processed (in-flight)
  • ApproximateNumberOfMessagesDelayed: Messages delayed for future delivery

Performance Metrics

  • ApproximateAgeOfOldestMessage: Age of the oldest message in seconds
  • ReceiveMessageWaitTime: Wait time for long polling operations
  • SentMessageSize: Size of messages being sent

Dead Letter Queue Metrics

  • NumberOfMessagesMoved: Messages moved to dead letter queues
  • DeadLetterQueueSourceQueues: The source queues that route failed messages into a given dead letter queue

FIFO Queue Metrics (FIFO Queues Only)

  • ContentBasedDeduplication: Messages deduplicated by content
  • DeduplicationScope: Deduplication behavior per message group
  • FifoThroughputLimit: FIFO queue throughput limitations

Advanced Configuration

Multi-Region Monitoring

Monitor SQS queues across multiple AWS regions:

receivers:
  awscloudwatch/us-east-1:
    region: us-east-1
    metrics:
      - metric_name: ApproximateNumberOfMessages
        namespace: AWS/SQS
        stat: [Average, Maximum]
  awscloudwatch/us-west-2:
    region: us-west-2
    metrics:
      - metric_name: ApproximateNumberOfMessages
        namespace: AWS/SQS
        stat: [Average, Maximum]

service:
  pipelines:
    metrics:
      receivers: [awscloudwatch/us-east-1, awscloudwatch/us-west-2]

Queue-Specific Monitoring

Monitor different queue types with specific configurations:

receivers:
  awscloudwatch/standard-queues:
    region: us-east-1
    metrics:
      - metric_name: ApproximateNumberOfMessages
        namespace: AWS/SQS
        stat: [Average, Maximum]
        dimensions:
          - name: QueueName
            value: "production-*" # Standard queues
  awscloudwatch/fifo-queues:
    region: us-east-1
    metrics:
      - metric_name: ApproximateNumberOfMessages
        namespace: AWS/SQS
        stat: [Average, Maximum]
        dimensions:
          - name: QueueName
            value: "*.fifo" # FIFO queues only

Dead Letter Queue Monitoring

Specific configuration for monitoring dead letter queues:

receivers:
  awscloudwatch/dlq:
    region: us-east-1
    metrics:
      - metric_name: ApproximateNumberOfMessages
        namespace: AWS/SQS
        stat: [Average, Maximum, Sum]
        dimensions:
          - name: QueueName
            value: "*-dlq" # Dead letter queues
      - metric_name: ApproximateAgeOfOldestMessage
        namespace: AWS/SQS
        stat: [Maximum]
        dimensions:
          - name: QueueName
            value: "*-dlq"

Verification

  1. Check Service Status

    Verify the OpenTelemetry Collector is running:

    sudo systemctl status otelcol-contrib
  2. Monitor Service Logs

    Check for any configuration errors:

    sudo journalctl -u otelcol-contrib -f
  3. Verify AWS Connectivity

    Test AWS API access:

    aws sqs list-queues --region us-east-1
    aws cloudwatch list-metrics --namespace AWS/SQS --region us-east-1
  4. Generate SQS Activity

    Create some queue activity to generate metrics:

    # Send test messages to a queue
    aws sqs send-message \
      --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/test-queue \
      --message-body "Test message 1"

    # Receive messages
    aws sqs receive-message \
      --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/test-queue

    # Check queue attributes
    aws sqs get-queue-attributes \
      --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/test-queue \
      --attribute-names All
  5. Verify Metrics in Last9

    Log into your Last9 account and check that SQS metrics are being received in Grafana.

    Look for metrics like:

    • ApproximateNumberOfMessages
    • NumberOfMessagesSent
    • NumberOfMessagesReceived
    • ApproximateAgeOfOldestMessage

Key Metrics to Monitor

Critical Queue Health Indicators

Metric | Description | Alert Threshold
ApproximateNumberOfMessages | Messages waiting in queue | > 1000 for high-throughput queues
ApproximateAgeOfOldestMessage | Age of oldest unprocessed message | > 300 seconds (5 minutes)
NumberOfMessagesReceived | Messages being processed | Sudden drops indicate consumer issues
NumberOfEmptyReceives | Polling without messages | High values indicate inefficient polling
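
If you also want a guardrail at the AWS layer, independent of the Last9 pipeline, the same queue-depth threshold can be encoded as a CloudWatch alarm. A sketch using the ApproximateNumberOfMessagesVisible CloudWatch metric; the queue name and SNS topic ARN are placeholders:

aws cloudwatch put-metric-alarm \
  --alarm-name sqs-queue-depth-production-orders \
  --namespace AWS/SQS \
  --metric-name ApproximateNumberOfMessagesVisible \
  --dimensions Name=QueueName,Value=production-orders \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 1000 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts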

Performance Monitoring

Metric | Description | Monitoring Focus
NumberOfMessagesSent | Production rate | Track message ingestion trends
NumberOfMessagesDeleted | Processing rate | Should match sent messages over time
SentMessageSize | Message size distribution | Monitor for size limits and costs
ReceiveMessageWaitTime | Long polling efficiency | Optimize consumer polling strategy

Dead Letter Queue Monitoring

Metric | Description | Alert Condition
ApproximateNumberOfMessages (DLQ) | Failed messages | > 0 (any messages in DLQ)
NumberOfMessagesMoved | Messages moved to DLQ | Increasing trend indicates issues

Trace Context Propagation through SQS

To get end-to-end distributed traces across services that communicate via SQS, you need to propagate W3C TraceContext (traceparent and tracestate) through SQS MessageAttributes.

Producer — Injecting Trace Context

On the producer (the service sending messages to SQS), inject the current trace context into MessageAttributes before calling SendMessage:

import json

from opentelemetry.propagate import inject

def inject_trace_context() -> dict:
    """Serialize the active trace context (traceparent/tracestate) into SQS MessageAttributes."""
    carrier = {}
    inject(carrier)
    message_attributes = {}
    for key, value in carrier.items():
        message_attributes[key] = {
            "DataType": "String",
            "StringValue": value,
        }
    return message_attributes

# Usage (sqs is a boto3 SQS client)
response = sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps(payload),
    MessageAttributes=inject_trace_context(),
)

Consumer — Extracting Trace Context

On the consumer (Lambda or any SQS reader), extract the trace context from MessageAttributes and use it as the parent context for new spans.

Field | ESM Format (Lambda trigger) | SDK Format (ReceiveMessage)
String value | stringValue | StringValue
Data type | dataType | DataType

See the AWS Lambda integration — SQS Trace Propagation for consumer-side extraction code.
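
For the SDK format specifically, extraction is the mirror image of the producer's inject. A minimal sketch, assuming boto3-style ReceiveMessage responses and the OTel Python SDK (process_with_trace_context is a hypothetical helper name):

from opentelemetry import trace
from opentelemetry.propagate import extract

tracer = trace.get_tracer("sqs-consumer")

def process_with_trace_context(message: dict) -> None:
    # SDK format: {"MessageAttributes": {"traceparent": {"StringValue": "..."}}}
    carrier = {
        key.lower(): attr["StringValue"]
        for key, attr in (message.get("MessageAttributes") or {}).items()
        if "StringValue" in attr
    }
    parent_ctx = extract(carrier)  # Context carrying the producer's span
    with tracer.start_as_current_span("queue process", context=parent_ctx):
        ...  # your business logic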

Auto-Injection with @opentelemetry/instrumentation-aws-sdk

If you use the AWS SDK instrumentation for Node.js, trace context is injected and extracted automatically — no manual carrier code needed.

import { registerInstrumentations } from "@opentelemetry/instrumentation";
import { AwsInstrumentation } from "@opentelemetry/instrumentation-aws-sdk";

registerInstrumentations({
  instrumentations: [
    new AwsInstrumentation({
      // false (default): extract traceparent from MessageAttributes
      // true: extract traceparent from message body JSON field;
      // needed for SNS→SQS fanout (MessageAttributes stripped) or Lambda producers
      sqsExtractContextPropagationFromPayload: false,
    }),
  ],
});
sqsExtractContextPropagationFromPayload | Extracts from | Use when
false (default) | MessageAttributes.traceparent | Producer uses AwsInstrumentation or manually injects into MessageAttributes
true | Message body JSON field traceparent | Lambda ESM triggers, non-OTel producers that embed context in body

SNS → SQS — Raw vs Wrapped Delivery

If your system uses both direct app → SQS and SNS → SQS on the same consumer, the delivery mode on the SNS subscription determines where traceparent ends up:

SNS subscription delivery | traceparent location in SQS message | Extraction
Raw message delivery ON | MessageAttributes.traceparent | Standard — sqsExtractContextPropagationFromPayload: false
Raw message delivery OFF (SNS default) | Inside body JSON envelope: body.MessageAttributes.traceparent.Value | Requires custom body parsing

Recommended: enable raw message delivery on SNS→SQS subscriptions. This preserves MessageAttributes through the fanout, so a single extraction path works for both direct and SNS-originated messages.

aws sns set-subscription-attributes \
  --subscription-arn <your-subscription-arn> \
  --attribute-name RawMessageDelivery \
  --attribute-value true

If you cannot change the subscription config, extract from both paths — MessageAttributes first, SNS envelope body as fallback:

import { SpanContext } from "@opentelemetry/api";
import { Message } from "@aws-sdk/client-sqs";

function extractProducerContext(message: Message): SpanContext | null {
  // Path 1: direct app → SQS, or SNS→SQS with rawMessageDelivery=true
  const fromAttrs = extractFromMessageAttributes(message);
  if (fromAttrs) return fromAttrs;
  // Path 2: SNS→SQS with rawMessageDelivery=false
  // SNS wraps body as: {"Type":"Notification","MessageAttributes":{"traceparent":{"Type":"String","Value":"..."}}}
  return extractFromSnsEnvelope(message);
}

function extractFromSnsEnvelope(message: Message): SpanContext | null {
  try {
    const envelope = JSON.parse(message.Body ?? "{}");
    if (envelope.Type !== "Notification" || !envelope.MessageAttributes) return null;
    const carrier: Record<string, string> = {};
    for (const [key, attr] of Object.entries(envelope.MessageAttributes as Record<string, { Type: string; Value: string }>)) {
      if (attr.Type === "String") carrier[key.toLowerCase()] = attr.Value;
    }
    // spanContextFrom: your helper that runs propagation.extract over the carrier
    return spanContextFrom(carrier);
  } catch {
    return null;
  }
}

Polling-based Consumers — Per-Poll and Per-Message Correlation

Long-polling consumers (e.g., setInterval/setTimeout loops) typically receive a batch of up to 10 messages per call. Without explicit spans, all message processing is invisible inside a single receive operation.

The recommended pattern creates two levels of spans:

sqs.poll_cycle (SPAN_KIND_INTERNAL)        ← one per interval tick
├── <queue> receive (SPAN_KIND_CONSUMER)   ← auto by AwsInstrumentation
├── <queue> process (SPAN_KIND_CONSUMER)   ← manual, per message
│   ├── link → producer trace              ← cross-trace navigation
│   └── SQS.DeleteMessage                  ← auto by AwsInstrumentation
└── <queue> process (SPAN_KIND_CONSUMER)   ← parallel per message

Why links and not parent? Setting the producer’s span as parent collapses producer and consumer into one trace tree. Using links keeps them as independent traces that can navigate to each other — correct per the OTel messaging spec.

import { trace, context, SpanKind, SpanStatusCode, propagation } from "@opentelemetry/api";
import { ReceiveMessageCommand, DeleteMessageCommand, SQSClient, Message } from "@aws-sdk/client-sqs";

const tracer = trace.getTracer("sqs-poller");
const sqs = new SQSClient({ region: process.env.AWS_REGION });

async function pollOnce() {
  // Root span groups the entire interval tick
  const pollSpan = tracer.startSpan("sqs.poll_cycle", {
    kind: SpanKind.INTERNAL,
    attributes: { "messaging.system": "aws.sqs", "messaging.destination.name": QUEUE_NAME },
  });
  await context.with(trace.setSpan(context.active(), pollSpan), async () => {
    try {
      // AwsInstrumentation auto-instruments this as a CONSUMER child span
      const { Messages = [] } = await sqs.send(new ReceiveMessageCommand({
        QueueUrl: QUEUE_URL,
        MaxNumberOfMessages: 10,
        WaitTimeSeconds: 5,
        MessageAttributeNames: ["All"], // required for traceparent extraction
      }));
      pollSpan.setAttribute("messaging.batch.message_count", Messages.length);
      await Promise.all(Messages.map(processMessage));
      pollSpan.setStatus({ code: SpanStatusCode.OK });
    } catch (err) {
      pollSpan.recordException(err as Error);
      pollSpan.setStatus({ code: SpanStatusCode.ERROR, message: String(err) });
    } finally {
      pollSpan.end();
    }
  });
}

async function processMessage(message: Message) {
  // Extract producer's span context from MessageAttributes
  const carrier: Record<string, string> = {};
  for (const [key, attr] of Object.entries(message.MessageAttributes ?? {})) {
    const val = attr as { StringValue?: string };
    if (val.StringValue) carrier[key.toLowerCase()] = val.StringValue;
  }
  const producerCtx = trace.getSpanContext(propagation.extract(context.active(), carrier));

  const msgSpan = tracer.startSpan(`${QUEUE_NAME} process`, {
    kind: SpanKind.CONSUMER,
    links: producerCtx ? [{ context: producerCtx }] : [],
    attributes: {
      "messaging.system": "aws.sqs",
      "messaging.message.id": message.MessageId,
      "messaging.operation": "process",
    },
  });
  await context.with(trace.setSpan(context.active(), msgSpan), async () => {
    try {
      await handleMessage(message); // your business logic
      await sqs.send(new DeleteMessageCommand({ QueueUrl: QUEUE_URL, ReceiptHandle: message.ReceiptHandle! }));
      msgSpan.setStatus({ code: SpanStatusCode.OK });
    } catch (err) {
      msgSpan.recordException(err as Error);
      msgSpan.setStatus({ code: SpanStatusCode.ERROR, message: String(err) });
    } finally {
      msgSpan.end();
    }
  });
}

Log Correlation

To correlate logs with traces, emit log records through the OTel Logs API with context.active(). The SDK automatically attaches trace_id and span_id from the active span context.

First, set up a LoggerProvider in your instrumentation bootstrap:

import { LoggerProvider, BatchLogRecordProcessor } from "@opentelemetry/sdk-logs";
import { OTLPLogExporter } from "@opentelemetry/exporter-logs-otlp-http";
import { logs } from "@opentelemetry/api-logs";

// `resource` is the same Resource instance used by your tracer provider
const loggerProvider = new LoggerProvider({ resource });
loggerProvider.addLogRecordProcessor(new BatchLogRecordProcessor(new OTLPLogExporter()));
logs.setGlobalLoggerProvider(loggerProvider);

Then emit structured logs from inside your span context:

import { SeverityNumber, logs } from "@opentelemetry/api-logs";
import { context } from "@opentelemetry/api";

const logger = logs.getLogger("sqs-consumer");

// Inside processMessage(), while msgSpan is active:
logger.emit({
  severityNumber: SeverityNumber.INFO,
  severityText: "INFO",
  body: "message_processing_start",
  attributes: { messageId: message.MessageId, queueName: QUEUE_NAME },
  context: context.active(), // SDK attaches trace_id + span_id automatically
});

Log records arrive in Last9 with trace_id and span_id fields, enabling direct navigation from a log line to its trace in the Last9 UI.

Full Examples

Pattern | Example
NestJS polling consumer — per-poll + per-message spans + log correlation | javascript/nestjs-sqs-correlation
Python SQS → Lambda trace propagation | python/aws-sqs-lambda

Best Practices

Security

  • IAM Roles: Use IAM roles instead of access keys when running on EC2
  • Least Privilege: Grant only necessary CloudWatch and SQS permissions
  • Queue Access: Restrict SQS queue access to authorized consumers and producers

Performance

  • Collection Intervals: Balance monitoring granularity with CloudWatch API costs
  • Metric Selection: Monitor only metrics relevant to your specific queues
  • Regional Optimization: Deploy collectors in the same region as SQS queues

Monitoring Strategy

  • Queue Depth Alerts: Set alerts for excessive queue depth
  • Consumer Health: Monitor message processing rates and age
  • Dead Letter Queues: Always monitor DLQs for failed message processing
  • Cost Optimization: Use appropriate CloudWatch metric collection intervals

Queue Management

  • Visibility Timeout: Configure appropriate visibility timeouts for your workload
  • Message Retention: Set appropriate message retention periods
  • Redrive Policy: Configure dead letter queues with appropriate maxReceiveCount
  • Long Polling: Use long polling to reduce empty receives and costs (see the example below)
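
For example, long polling and a redrive policy can both be set in a single set-queue-attributes call; the queue URL and DLQ ARN below are placeholders:

aws sqs set-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/production-orders \
  --attributes '{"ReceiveMessageWaitTimeSeconds":"20","RedrivePolicy":"{\"deadLetterTargetArn\":\"arn:aws:sqs:us-east-1:123456789012:orders-dlq\",\"maxReceiveCount\":\"5\"}"}'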

Troubleshooting

  • CloudWatch API issues

    • Permission denied. Verify credentials and access to SQS and CloudWatch:

      aws sts get-caller-identity
      aws sqs list-queues --region us-east-1
      aws cloudwatch list-metrics --namespace AWS/SQS --region us-east-1 | head -10
    • Rate limiting. Increase the receiver’s collection_interval to reduce API call volume:

      receivers:
        awscloudwatch:
          collection_interval: 600s # 10 minutes instead of 5
  • Missing metrics

    • No queue metrics appearing. Confirm queues exist and that CloudWatch has metric data for them:

      aws sqs list-queues --region us-east-1
      aws cloudwatch get-metric-statistics \
        --namespace AWS/SQS \
        --metric-name ApproximateNumberOfMessages \
        --dimensions Name=QueueName,Value=your-queue-name \
        --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
        --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
        --period 300 \
        --statistics Average
    • Partial data. List the full set of SQS metrics available and inspect queue attributes:

      aws cloudwatch list-metrics --namespace AWS/SQS --region us-east-1
      aws sqs get-queue-attributes \
        --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/queue-name \
        --attribute-names All
  • High message age

    • Slow consumer processing. Check the visibility timeout, long-polling settings, and in-flight message count:

      aws sqs get-queue-attributes \
        --queue-url YOUR_QUEUE_URL \
        --attribute-names VisibilityTimeoutSeconds,ReceiveMessageWaitTimeSeconds

      aws sqs get-queue-attributes \
        --queue-url YOUR_QUEUE_URL \
        --attribute-names ApproximateNumberOfMessagesNotVisible

Please get in touch with us on Discord or Email if you have any questions.