
Stream AWS Metrics to Grafana with Last9 in 10 minutes

Visualize AWS metrics from services like Lambda, API Gateway, and RDS in Grafana using Last9. No agents, no code, and setup in under 10 minutes.

Jul 18th, ‘25

It’s 2:47 AM and your Lambda functions are timing out. API response times are spiking. You’re flipping between the CloudWatch console, your APM tool, and your logs, trying to figure out what’s going wrong.

CloudWatch has the metrics you need: CPU usage, memory pressure, and request rates — but connecting that data to what your app is doing takes time. The delay in stitching it all together slows down your incident response. By the time you have a full view, your MTTR has already crossed your SLA.

The CloudWatch Data Correlation Problem

Most teams today run a distributed observability stack: Grafana for custom dashboards, Prometheus for time-series data, an APM for traces, and CloudWatch for AWS metrics. These tools do their jobs well, but they work in isolation.

The real bottleneck isn’t the amount of data—it’s connecting the dots across systems. When something breaks, you need to quickly correlate:

  • High response latency with DB connection pool issues
  • API timeouts with frequent Lambda cold starts
  • RDS CPU spikes with delays in transaction processing
  • ECS resource limits with increased HTTP 5xx errors

CloudWatch’s built-in dashboards offer basic charts, but lack the flexibility you need during an incident. At the same time, your application data lives in Grafana with rich templating and PromQL. Bridging these gaps during outages adds unnecessary friction when you need answers fast.

💡
If you're comparing CloudWatch Metric Streams with Prometheus-based approaches, this guide on Prometheus–CloudWatch integration breaks down the setup and tradeoffs.

Why Use Last9 for CloudWatch Integration?

Before jumping into setup, it’s worth asking: why stream CloudWatch metrics to Last9 at all?

The Problem with Tool Sprawl

Most teams juggle multiple observability tools:

  • Prometheus for metric storage
  • Grafana for dashboards
  • APMs for tracing
  • CloudWatch for AWS metrics
  • Logging systems for application logs

This fragmentation leads to real problems:

  • Operational overhead — More tools mean more integrations and configs
  • Inconsistent data — Each system has its own query language and retention model
  • Vendor lock-in — Proprietary formats make switching harder
  • Pricing complexity — Multiple billing models are tough to predict

What Last9 Solves

Last9 reduces this complexity with a unified, developer-friendly observability platform.

  • Prometheus-compatible: Keep your PromQL queries, dashboards, and alerts as-is
  • Built-in Grafana: Comes pre-wired—no setup, no plugin hassle
  • Multi-source ingestion: Ingest metrics from:
    • AWS CloudWatch
    • Kubernetes (via exporters)
    • App metrics (via client libraries)
    • 3rd-party APIs
    • Custom sources via HTTP
  • Predictable pricing: Charged by usage volume, not per host or metric

Built for High-Scale Workloads

  • Handles high cardinality: Built for noisy container and microservice metrics
  • Long-term retention: Store months of data without slowdowns
  • Extended PromQL: Run complex queries for anomaly detection and forecasting
  • Enterprise-ready: SOC2-compliant, encrypted, and access-controlled

No duct-taped dashboards. No jumping across tools. Just unified observability, built around your existing metrics.

💡
If you're evaluating CloudWatch against OpenTelemetry for observability, this CloudWatch vs OpenTelemetry comparison outlines how they differ across metrics, traces, and cost control.

How It Works: Streaming CloudWatch Metrics to Last9

CloudWatch Metric Streams let you export AWS metrics in near real time: no polling, no API throttling. You can stream them directly into Last9 using Amazon Kinesis Firehose.

Here’s the architecture in action:

AWS Services (Lambda, RDS, ECS) → CloudWatch Metric Streams → Amazon Kinesis Firehose → Last9 HTTP Endpoint (OpenTelemetry format) → Grafana (Unified Dashboards)

Why This Setup Works Well

  • Real-time metrics
    Sub-minute latency. No need to wait for scrapers or API pulls.
  • Lower AWS costs
    Avoids the pricey CloudWatch GetMetricData calls.
  • Scales automatically
    Kinesis Firehose can handle high-throughput metric streams out of the box.
  • Standards-based format
    Metrics are streamed in OpenTelemetry format, making them easy to ingest, analyze, and visualize alongside your app metrics.

With this pipeline, your AWS infrastructure metrics show up in Grafana, next to app and business metrics, without juggling exporters or writing custom collectors.

10-Minute Setup from CloudWatch to Grafana

Here’s how to route your AWS metrics from CloudWatch into Grafana using Last9’s observability platform.

Step 1: Get Your Last9 Integration Credentials

In your Last9 dashboard:

  • Go to Home → Integrations → CloudWatch
  • Copy the following credentials:
    • HTTP Endpoint URL: Target for your Firehose delivery stream
    • Username and Password: Used for HTTP basic auth

You’ll use these to authenticate AWS Kinesis Firehose with Last9’s metric ingestion endpoint.

Step 2: Set Up IAM Permissions

CloudWatch metric streaming relies on multiple AWS services. Create an IAM policy with the following permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:StartMetricStreams",
        "cloudwatch:PutMetricStream",
        "cloudwatch:GetMetricStream",
        "cloudwatch:GetMetricData",
        "cloudwatch:ListMetrics",
        "cloudwatch:ListMetricStreams"
      ],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "firehose:CreateDeliveryStream",
        "firehose:PutRecord",
        "firehose:PutRecordBatch",
        "firehose:DescribeDeliveryStream",
        "firehose:UpdateDestination",
        "firehose:ListDeliveryStreams"
      ],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket",
        "s3:GetBucketLocation",
        "s3:ListAllMyBuckets",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": ["arn:aws:s3:::*"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:CreateRole",
        "iam:CreatePolicy",
        "iam:AttachRolePolicy",
        "iam:CreatePolicyVersion",
        "iam:DeletePolicyVersion",
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::<account_id>:role/*",
        "arn:aws:iam::<account_id>:policy/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream"
      ],
      "Resource": [
        "arn:aws:logs:<region>:<account_id>:log-group:*:log-stream:*"
      ]
    }
  ]
}

Why each block matters:

  • CloudWatch: Enables metric stream setup and retrieval
  • Firehose: Handles metric delivery to Last9
  • S3: Provides backup storage for failed records
  • IAM: Grants Firehose the ability to assume roles and attach policies
  • CloudWatch Logs: Lets you monitor Firehose delivery failures
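
If you prefer to script this step, here's a minimal CLI sketch. The policy and role names are placeholders; the second command creates the role that CloudWatch Metric Streams will assume when writing into Firehose (used in Step 4).

# Create the setup policy from the JSON above (saved locally as a file)
aws iam create-policy \
  --policy-name last9-cloudwatch-streaming-setup \
  --policy-document file://last9-cloudwatch-streaming-setup.json

# Create the role CloudWatch Metric Streams will assume to put records into Firehose
aws iam create-role \
  --role-name last9-metric-stream-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "streams.metrics.cloudwatch.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'

Before Step 4, attach a policy to this role that allows firehose:PutRecord and firehose:PutRecordBatch on the delivery stream you create in Step 3.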

Step 3: Create Kinesis Data Firehose Delivery Stream

  1. Go to the AWS Kinesis → Delivery Streams section
  2. Click Create delivery stream
  3. Choose Direct PUT as the source
  4. Set the stream name as: last9-{your-org-name}
  5. Configure HTTP Endpoint:
    • Endpoint URL: Paste Last9’s ingestion URL
    • Authentication: Use basic auth with your credentials
  6. Set buffer parameters:
    • Size: 1 MB (default)
    • Interval: 60 seconds
  7. Enable GZIP compression
  8. Set an S3 bucket for backup of failed deliveries
  9. Enable CloudWatch Logs for error tracking
  10. Click Create delivery stream

Notes:

  • GZIP cuts bandwidth by ~70% for time series payloads
  • Buffering affects latency vs request volume tradeoff
  • S3 + Logs improve durability and visibility during failures
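
The console flow above maps to a single API call. Here's a hedged CLI sketch with the same settings; the endpoint URL, role name, and bucket name are placeholders, the role needs the S3 permissions from Step 2, and authentication should follow whatever the Last9 integration page specifies (the console flow uses basic auth).

aws firehose create-delivery-stream \
  --delivery-stream-name last9-your-org \
  --delivery-stream-type DirectPut \
  --http-endpoint-destination-configuration '{
    "EndpointConfiguration": {"Url": "<last9-http-endpoint-url>", "Name": "last9"},
    "BufferingHints": {"SizeInMBs": 1, "IntervalInSeconds": 60},
    "RequestConfiguration": {"ContentEncoding": "GZIP"},
    "RoleArn": "arn:aws:iam::<account_id>:role/last9-firehose-delivery-role",
    "S3BackupMode": "FailedDataOnly",
    "S3Configuration": {
      "RoleArn": "arn:aws:iam::<account_id>:role/last9-firehose-delivery-role",
      "BucketARN": "arn:aws:s3:::your-backup-bucket"
    }
  }'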

Step 4: Create CloudWatch Metric Stream

  1. In the CloudWatch → Metrics → Streams section, click Create metric stream
  2. Choose your metrics:
    • All metrics: For initial observability
    • Selective: For cost-controlled setups
  3. Set Firehose delivery stream as the destination
  4. Choose Output format: OpenTelemetry 0.7
  5. Name your stream: last9-{your-org-name}
  6. Leave state as Enabled
  7. Click Create

Tips for metric filtering:

  • Start broad, then filter based on what’s useful
  • Namespaces: Focus on Lambda, RDS, ECS, API Gateway
  • Metrics: Avoid high-cardinality metrics unless they’re actionable
  • Statistics: Sum, Avg, Max are common; adjust for alerting needs
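
If you'd rather script Step 4, the equivalent CLI call looks roughly like this. The ARNs are placeholders, and the role must be assumable by streams.metrics.cloudwatch.amazonaws.com with firehose:PutRecord and firehose:PutRecordBatch permissions on the delivery stream (as sketched in Step 2).

aws cloudwatch put-metric-stream \
  --name last9-your-org \
  --firehose-arn arn:aws:firehose:<region>:<account_id>:deliverystream/last9-your-org \
  --role-arn arn:aws:iam::<account_id>:role/last9-metric-stream-role \
  --output-format opentelemetry0.7 \
  --include-filters Namespace=AWS/Lambda Namespace=AWS/RDS Namespace=AWS/ECS Namespace=AWS/ApiGateway

Omit --include-filters to stream every namespace (the "All metrics" option above).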

Step 5: Verify Ingestion in Grafana

Within a few minutes, AWS metrics will begin appearing in Last9-powered Grafana dashboards. All CloudWatch namespaces are prefixed with amazonaws_com_AWS_.

Some examples to validate:

# Lambda function duration
amazonaws_com_AWS_Lambda_Duration{function_name="your-function-name"}

# RDS CPU Utilization
amazonaws_com_AWS_RDS_CPUUtilization{db_instance_identifier="your-db-instance"}

# ECS container CPU
amazonaws_com_AWS_ECS_CPUUtilization{cluster_name="your-cluster-name"}
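
If nothing shows up after a few minutes, it helps to confirm the AWS side is actually delivering before debugging dashboards. A minimal sketch that checks Firehose's own delivery-success metric for the last 15 minutes (GNU date syntax; stream name is a placeholder):

# Average delivery success for the last 15 minutes (1.0 means every batch was accepted)
aws cloudwatch get-metric-statistics \
  --namespace AWS/Firehose \
  --metric-name "DeliveryToHttpEndpoint.Success" \
  --dimensions Name=DeliveryStreamName,Value=last9-your-org \
  --start-time "$(date -u -d '-15 minutes' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Average
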
💡
To understand how to publish and use custom metrics in CloudWatch, check out this guide on AWS CloudWatch custom metrics with detailed examples.

Advanced Dashboard Implementation

Once CloudWatch metrics are flowing through Last9, you can build dashboards that directly tie AWS service behavior to application performance characteristics. Below are a few practical examples.

1. Lambda Performance and Cold Start Impact

Monitor Lambda execution behavior across P50–P99 durations, cold start latency, and downstream dependencies.

PromQL Queries:

Database connection usage:

db_connection_pool_active_connections{service="api-handler"}

Upstream request latency:

histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{handler="/api/users"}[5m]))

Error rate vs. invocations:

rate(amazonaws_com_AWS_Lambda_Errors[5m]) / rate(amazonaws_com_AWS_Lambda_Invocations[5m])

Memory pressure:

amazonaws_com_AWS_Lambda_MemoryUtilization{function_name="api-handler"}

Cold start time and frequency:

amazonaws_com_AWS_Lambda_InitDuration{function_name="api-handler"}
rate(amazonaws_com_AWS_Lambda_InitDuration[5m])

Latency percentiles:

histogram_quantile(0.50, amazonaws_com_AWS_Lambda_Duration{function_name="api-handler"})
histogram_quantile(0.95, amazonaws_com_AWS_Lambda_Duration{function_name="api-handler"})
histogram_quantile(0.99, amazonaws_com_AWS_Lambda_Duration{function_name="api-handler"})

This dashboard helps correlate cold starts with tail latencies, track memory usage under burst loads, and detect application bottlenecks caused by external service calls or DB saturation.

2. RDS Metrics and Query Latency Correlation

Visualize infrastructure-level metrics alongside database and app-level behavior to spot query slowdowns, CPU pressure, or connection pool starvation.

PromQL Queries:

5xx error trends:

rate(http_requests_total{status=~"5.."}[5m])

Connection pool wait times:

db_connection_pool_wait_time_seconds

Query durations (app-side):

db_query_duration_seconds{query_type="SELECT"}
db_query_duration_seconds{query_type="INSERT"}

Disk I/O and latency:

amazonaws_com_AWS_RDS_ReadIOPS{db_instance_identifier="prod-db"}
amazonaws_com_AWS_RDS_WriteIOPS{db_instance_identifier="prod-db"}
amazonaws_com_AWS_RDS_ReadLatency{db_instance_identifier="prod-db"}
amazonaws_com_AWS_RDS_WriteLatency{db_instance_identifier="prod-db"}

Connection usage:

amazonaws_com_AWS_RDS_DatabaseConnections{db_instance_identifier="prod-db"}

CPU and credit balance:

amazonaws_com_AWS_RDS_CPUUtilization{db_instance_identifier="prod-db"}
amazonaws_com_AWS_RDS_CPUCreditBalance{db_instance_identifier="prod-db"}

Use this dashboard to detect when spikes in latency or errors align with increased connection usage, CPU saturation, or degraded IOPS. It also helps surface whether query-level slowness is infrastructure-bound or coming from upstream services.

3. ECS Resource Usage and Application Load

Track how ECS-managed workloads consume CPU/memory, scale task counts, and respond to application-level traffic patterns.

PromQL Queries:

Container restarts (Kube or ECS-level):

increase(container_restarts_total[1h])

Request volume and latency (from app metrics):

rate(http_requests_total[5m])
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

Network I/O:

amazonaws_com_AWS_ECS_NetworkRxBytes{cluster_name="production"}
amazonaws_com_AWS_ECS_NetworkTxBytes{cluster_name="production"}

Running and pending tasks:

amazonaws_com_AWS_ECS_RunningTaskCount{cluster_name="production", service_name=~".*"}
amazonaws_com_AWS_ECS_PendingTaskCount{cluster_name="production", service_name=~".*"}

Memory utilization:

amazonaws_com_AWS_ECS_MemoryUtilization{cluster_name="production", service_name=~".*"}

Service-level CPU usage:

amazonaws_com_AWS_ECS_CPUUtilization{cluster_name="production", service_name=~".*"}

This dashboard is useful for tracking scaling anomalies, resource constraints across ECS services, and changes in network or memory usage that could lead to degraded app performance or restarts.

Production Optimization Strategies

Cost Management

CloudWatch metric streaming costs can quickly add up, especially if you're streaming large volumes of data you don’t use. To control spend without compromising observability:

1. Namespace Filtering

Restrict to essential AWS services only. For most applications, that includes:

"IncludeFilters": [
  { "Namespace": "AWS/Lambda" },
  { "Namespace": "AWS/RDS" },
  { "Namespace": "AWS/ECS" },
  { "Namespace": "AWS/ApplicationELB" }
]

2. Metric-Level Filtering

Filter out noisy or low-value metrics. Stream only what's needed for alerting and performance tracking:

"MetricNames": [
  "Duration", "Errors", "Invocations",             // Lambda
  "CPUUtilization", "DatabaseConnections",         // RDS
  "MemoryUtilization"                              // ECS
]

3. Statistic Selection

Avoid streaming every statistic. Focus on percentiles for SLO/SLA tracking:

"AdditionalStatistics": [ "p50", "p95", "p99" ]

Performance Tuning

Fine-tune Kinesis Firehose delivery settings based on your monitoring needs.

Buffer Configuration

  • Low latency (near real-time):
    • Buffer size: 1MB
    • Interval: 60 seconds
  • High throughput (batch analytics):
    • Buffer size: 5MB
    • Interval: 300 seconds

Compression

  • Enable GZIP to reduce network overhead.

Monitor Delivery Stream Health

Track the performance of metric delivery using built-in CloudWatch metrics:

  • Delivery success rate:
    amazonaws_com_AWS_KinesisFirehose_DeliveryToHttpEndpoint_Success
  • Delivery latency (freshness):
    amazonaws_com_AWS_KinesisFirehose_DeliveryToHttpEndpoint_DataFreshness
  • Processing failures:
    amazonaws_com_AWS_KinesisFirehose_DeliveryToHttpEndpoint_ProcessingFailed

Error Handling and Monitoring

To prevent data loss during delivery failures:

S3 Backup Setup

  • Configure a fallback S3 bucket for failed deliveries.
  • Apply a 30-day retention policy for debugging or replay.
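
The 30-day retention policy can be applied directly on the backup bucket. A minimal sketch, assuming the bucket name used in the Firehose setup:

aws s3api put-bucket-lifecycle-configuration \
  --bucket your-backup-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-failed-firehose-records",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Expiration": {"Days": 30}
    }]
  }'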

Alerting

  • Set up CloudWatch Alarms on S3 object count to detect spikes in delivery failures.
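
One way to wire this up from the CLI, sketched with placeholder names and an assumed SNS topic. S3 reports NumberOfObjects once a day, so the alarm fires once failed records start accumulating in the backup bucket:

aws cloudwatch put-metric-alarm \
  --alarm-name last9-firehose-backup-objects \
  --namespace AWS/S3 \
  --metric-name NumberOfObjects \
  --dimensions Name=BucketName,Value=your-backup-bucket Name=StorageType,Value=AllStorageTypes \
  --statistic Average \
  --period 86400 \
  --evaluation-periods 1 \
  --threshold 0 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:<region>:<account_id>:<alerts-topic>
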
💡
Last9 includes full monitoring capabilities: alerting, notifications, and all the knobs you need. If you're dealing with gaps in coverage, alert fatigue, or cleanup overhead, here’s how Last9 alerting helps you manage it.

Example PromQL Queries

# Failed delivery rate over 5 minutes
rate(amazonaws_com_AWS_KinesisFirehose_DeliveryToHttpEndpoint_ProcessingFailed[5m])

# P95 delivery latency
histogram_quantile(0.95, rate(amazonaws_com_AWS_KinesisFirehose_DeliveryToHttpEndpoint_DataFreshness_bucket[5m]))

Troubleshoot CloudWatch Metric Streaming Issues

This section outlines common failure scenarios and how to systematically debug issues across CloudWatch, Kinesis Firehose, and Last9’s ingestion pipeline.

1. Metrics Are Not Appearing in Grafana

Check CloudWatch Metric Stream Status

Run the following command to verify if the metric stream is active:

aws cloudwatch get-metric-stream --name last9-your-org

Check the State field in the output; if it's stopped (or the command errors because the stream doesn't exist), metrics won't flow. A stopped stream can be re-enabled with aws cloudwatch start-metric-streams --names last9-your-org.

Validate Kinesis Firehose Configuration

Make sure the delivery stream is active and correctly set up to push data to an HTTP endpoint:

aws firehose describe-delivery-stream --delivery-stream-name last9-your-org

Look for DeliveryStreamStatus: ACTIVE and a valid HTTP destination.

Inspect Firehose Logs for Delivery Failures

Check if metrics are failing to reach Last9 due to network errors, timeouts, or invalid endpoints:

aws logs describe-log-groups --log-group-name-prefix /aws/kinesisfirehose/last9-your-org

Look for DeliveryToHttpEndpoint_Failure metrics or log entries with 4xx/5xx status codes.
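
To pull the actual error entries rather than just spotting the log group, a quick sketch for the last hour (the log group name follows Firehose's default naming; GNU shell arithmetic builds the millisecond timestamp):

aws logs filter-log-events \
  --log-group-name /aws/kinesisfirehose/last9-your-org \
  --start-time "$(($(date +%s) - 3600))000" \
  --filter-pattern "error"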

2. Unexpected Increase in AWS Costs

Analyze High-Volume Metric Namespaces

CloudWatch Metric Streams are billed per metric update, so volume drives cost. Busy namespaces like AWS/EC2 and AWS/ApplicationELB, or high-cardinality custom metrics, can push the bill up quickly.

Use the following metric to understand volume:

amazonaws_com_AWS_CloudWatch_MetricStreamRecords{stream_name="last9-your-org"}

Review Firehose Data Throughput

Large payloads or inefficient batching can spike costs. Check:

amazonaws_com_AWS_KinesisFirehose_DeliveryToHttpEndpoint_Bytes

Reduce frequency, batch more aggressively, and enable GZIP compression if needed.

Audit S3 Backup Usage

If S3BackupMode is enabled, undelivered metrics may be stored, incurring additional storage charges.

Check the configured S3 bucket:

aws s3 ls s3://your-backup-bucket --recursive

3. Metrics Are Delayed or Dropped

Monitor Delivery Buffer Limits

Firehose buffering settings may be too small (many tiny requests) or too large (added delivery latency). Check the delivery stream's BufferingHints:

  • IntervalInSeconds
  • SizeInMBs

Check if your stream regularly hits the upper threshold and adjust as needed:

# Get the destination ID and current version ID from describe-delivery-stream first
aws firehose update-destination \
  --delivery-stream-name last9-your-org \
  --current-delivery-stream-version-id <version-id> \
  --destination-id <destination-id> \
  --http-endpoint-destination-update '{"BufferingHints":{"SizeInMBs":10,"IntervalInSeconds":300},"RequestConfiguration":{"ContentEncoding":"GZIP"}}'

Inspect Firehose Throughput Limits

Firehose doesn't use shards; each delivery stream has its own Direct PUT throughput quotas (roughly 1–5 MiB/sec and 1,000–2,000 requests/sec by default, depending on the region). If you're consistently hitting them:

  • Request a service quota increase for the delivery stream
  • Spread load across multiple delivery streams (if applicable)
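
You can check the quotas that currently apply to your account and region with Service Quotas. The service code and quota names below are an assumption; the loose filter keeps the query tolerant of naming differences:

aws service-quotas list-service-quotas \
  --service-code firehose \
  --query "Quotas[?contains(QuotaName, 'Direct PUT')].[QuotaName, Value]" \
  --output table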

Investigate Endpoint Health

If Last9’s ingestion endpoint is overloaded or rate-limited, Firehose retries can add latency or drop metrics. Check for:

  • Increased DeliveryToHttpEndpoint_Failure metrics
  • Elevated retry attempts
  • 429 or 5xx responses in logs
💡
Fix production issues faster with Last9 MCP: access CloudWatch logs, metrics, and traces right from your IDE. Get real-time context to debug AWS services without switching tabs.

Architecture Patterns for CloudWatch Metric Streaming

Choose integration patterns based on your architecture type. Below are the key CloudWatch namespaces and metric selectors to stream into Last9 for effective monitoring and troubleshooting.

Observability Patterns for Microservices on AWS

In a microservices setup, AWS services are often split across compute, data, and messaging layers. To monitor these distributed components, stream from these CloudWatch namespaces:

  • AWS/Lambda – Event-driven compute
  • AWS/RDS – Relational database layer
  • AWS/ElastiCache – In-memory cache stores
  • AWS/SQS – Queueing and decoupling services
  • AWS/ApplicationELB – Load balancing
  • AWS/ApiGateway – API management and routing

These provide coverage across service interactions—ideal for tracing tail latency, retries, or timeouts across upstream and downstream dependencies.

Monitoring Serverless Architectures with CloudWatch

Serverless systems depend on tightly coupled services like Lambda, API Gateway, and DynamoDB. These metric selectors help track cold starts, latency spikes, and throughput issues.

Key Lambda, API Gateway, and DynamoDB Metrics:

# Lambda duration and concurrency
amazonaws_com_AWS_Lambda_Duration{function_name=~".*"}
amazonaws_com_AWS_Lambda_ConcurrentExecutions{function_name=~".*"}

# API Gateway latency and error rates
amazonaws_com_AWS_ApiGateway_Latency{api_name=~".*"}
amazonaws_com_AWS_ApiGateway_4XXError{api_name=~".*"}

# DynamoDB table capacity
amazonaws_com_AWS_DynamoDB_ConsumedReadCapacityUnits{table_name=~".*"}
amazonaws_com_AWS_DynamoDB_ConsumedWriteCapacityUnits{table_name=~".*"}

Use these to detect concurrency bottlenecks, misconfigured throttling, or excessive cold starts under load.

Observability for Container-Based Deployments (ECS/EKS)

When running applications on ECS or Kubernetes (EKS), metric visibility must include both cluster-level resource usage and service-level performance.

ECS Resource and Task Metrics:

# Cluster resource usage
amazonaws_com_AWS_ECS_CPUUtilization{cluster_name="production"}
amazonaws_com_AWS_ECS_MemoryUtilization{cluster_name="production"}

# Scaling signals
amazonaws_com_AWS_ECS_RunningTaskCount{service_name=~".*"}
amazonaws_com_AWS_ECS_PendingTaskCount{service_name=~".*"}

Combine with Application-Level Metrics:

# Request rate and latency histograms
rate(http_requests_total[5m])
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

This pairing enables direct correlation between infra behavior (like CPU spikes) and application symptoms (like slow response times or elevated error rates).

Final Notes

CloudWatch collects a ton of useful metrics, but getting them out in a usable, cost-efficient, and queryable form is where most setups fall short.

By streaming CloudWatch metrics directly to Last9, you skip the painful parts: no polling, no re-learning query syntax, no brittle dashboard workarounds. Just clean Prometheus-style metrics you can use.

This setup works well across stacks, be it serverless, containers, or traditional EC2-based microservices. And once it’s running, your team gets actual visibility into AWS workloads, not just surface-level graphs.

💡
And if you need to go deeper, our Discord community has a dedicated channel for technical discussions. Bring your use case, compare notes, and get input from other engineers.
