
AWS Prometheus: Production Patterns That Help You Scale

Run Prometheus reliably on AWS with patterns for scale, cost control, and visibility across EKS, EC2, and multi-region setups.

Sep 19th, 2025

You've got Prometheus running in one cluster — maybe a dev environment, a single EKS cluster, or a proof-of-concept setup. The configuration is straightforward: node_exporter on a few EC2 instances, some service discovery for pods, and a single Prometheus server scraping everything. Storage is local, retention is 15 days, and you can keep all the default recording rules without worrying about costs.

As you expand beyond that first cluster, new complexity appears. Now you're looking at multi-region deployments, dozens of services across different teams, and workloads that generate vastly different metric volumes. Some services emit thousands of metrics per instance; others barely register above the baseline. You need to route metrics to multiple backends — some for long-term storage, others for real-time alerting.

At this scale, operational realities start showing up:

  • Prometheus servers that run out of memory during high-cardinality scrapes
  • Retention policies that work in staging but create storage pressure in production
  • remote_write configurations that work fine until AWS network costs become a line item
  • Federation setups that drop metrics during region failovers

Solving each of these as a one-off works in the moment, but patterns matter more than individual fixes. Building reliable Prometheus infrastructure on AWS means choosing deployment models that handle variable workloads, maintain visibility across regions, and scale without breaking the bank.

This blog covers the production deployment patterns that work across AWS environments, configuration practices that prevent drift, and strategies for managing costs while keeping full metric coverage.

The Reality of Prometheus Resource Usage

Before rolling out Prometheus on AWS, it helps to know what resources it actually consumes. Getting this right up front makes it easier to size instances, set retention policies, and plan costs with realistic numbers instead of guesses.

AWS’s own benchmarks for Amazon Managed Service for Prometheus (AMP) highlight some common patterns:

  • Memory. Expect roughly 2GB RAM for every million active series, plus another 1–2GB for query execution. If your workloads are spiky or highly dimensional, budget extra headroom.
  • Storage. Each sample takes about 1–2 bytes on disk once compressed. Daily churn adds up quickly, so retention settings directly affect how much storage you’ll need.
  • CPU. Ingestion volume doesn’t usually stress CPUs; query complexity does. A simple range query may barely touch a core, while an expensive PromQL join can max one out.
  • Network. A remote_write setup generates roughly 100–200MB/hr per 100k active series, depending on scrape intervals and label cardinality. Cross-AZ traffic costs can add up if you’re not careful.
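
As a quick back-of-the-envelope exercise, here's how those numbers translate for a hypothetical cluster (the 3 million series, scrape interval, and instance choice below are illustrative assumptions, not AWS guidance):

# Example: ~3M active series, 15s scrape interval, 15d local retention
Memory:  3M series x ~2GB per 1M series ≈ 6GB, plus 1–2GB for queries → plan for ~8GB+
Storage: 3M series x 5,760 samples/day x ~1.5 bytes ≈ 26GB/day → ~390GB for 15 days
Network: 3M series = 30 x (100–200MB/hr per 100k series) → ~3–6GB/hr of remote_write traffic

Something like an r6i.xlarge (4 vCPU, 32GiB) would leave comfortable headroom for query spikes in this scenario.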

With these numbers, you can:

  • Choose the right EC2 instance types or AMP workspace sizes for your expected series counts
  • Set retention periods that strike a balance between visibility and storage cost
  • Plan remote_write bandwidth usage and anticipate cross-AZ transfer charges
  • Build federation hierarchies that spread query and storage load across regions effectively
💡
To extend Prometheus visibility beyond cluster metrics, check out how to integrate Prometheus with CloudWatch for AWS metric collection.

Prometheus Deployment Patterns on AWS

Prometheus works well in small setups, but production environments on AWS quickly demand more thoughtful deployment choices. Most teams end up using one of three patterns, each with strengths and trade-offs.

1. Single Prometheus Server

The simplest setup is running one Prometheus server that scrapes all targets, stores metrics locally, and handles queries and alerts. You’ll see this deployed on EC2 (sometimes behind an ALB) or as Amazon Managed Service for Prometheus (AMP).

Why teams start here:

  • A single configuration to manage
  • No coordination across servers
  • Easy service discovery through EC2, EKS, or ECS APIs

This pattern fits best when:

  • You’re in a single region with predictable workloads
  • Active series counts stay below ~1–2 million
  • Teams want low operational overhead
  • Multi-tenant isolation isn’t a requirement

Example — EKS scrape configuration:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true

  - job_name: 'ec2-nodes'
    ec2_sd_configs:
    - region: us-west-2
      port: 9100

When it starts breaking down:

  • Queries slow down during scrape bursts
  • Memory usage sits above 70% regularly
  • remote_write queues clog during traffic spikes
  • Federation queries time out or return incomplete data

Once OOM kills or scrape delays become routine, vertical scaling only stretches so far—you’ll need to distribute the load.
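
One way to catch that drift before it becomes an outage is a capacity alert on the active series count, set against whatever budget you sized the server for. A minimal sketch (the alert name, threshold, and duration are illustrative):

groups:
  - name: prometheus-capacity
    rules:
      - alert: PrometheusActiveSeriesHigh
        expr: prometheus_tsdb_head_series > 1500000   # set this to your sizing budget
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Active series approaching the limit this server was sized for"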

2. Federated Prometheus

Federation introduces hierarchy: regional or cluster-level Prometheus servers scrape local workloads, while a global server pulls aggregated metrics from them using the /federate endpoint.

Why it works well:

  • Regional metrics stay close to workloads
  • Cross-region latency and transfer costs stay manageable
  • Teams can run their own Prometheus instances, but still share dashboards
  • Only aggregate or selected metrics are shipped upstream

Example — global Prometheus scraping regional servers:

scrape_configs:
  - job_name: 'federate-us-west-2'
    scrape_interval: 30s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~"kubernetes-.*"}'
        - '{__name__=~"up|prometheus_.*"}'
    static_configs:
      - targets: ['prom-us-west-2.internal:9090']

  - job_name: 'federate-eu-west-1'
    scrape_interval: 30s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~"kubernetes-.*"}'
        - '{__name__=~"up|prometheus_.*"}'
    static_configs:
      - targets: ['prom-eu-west-1.internal:9090']

Things to keep in mind:

  • Regional servers need retention tuned for federation intervals
  • Network issues can create blind spots in global dashboards
  • Labels must be handled carefully to avoid collisions; honor_labels on the global server helps, as do distinct external_labels on each regional server (see the snippet below)
  • Federation queries can become costly as series counts grow
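
For the label-collision point above, a common safeguard is giving every regional server its own external_labels so federated series always carry their origin (region and cluster values below are illustrative):

# On each regional Prometheus server
global:
  external_labels:
    region: us-west-2
    cluster: prod-usw2

With honor_labels: true on the global server, these labels survive federation and keep series from different regions distinct.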

3. Remote Storage with Local Prometheus

Here, Prometheus servers live close to workloads for scraping and short-term retention, but metrics are streamed via remote_write into centralized storage (Amazon AMP, Cortex, or commercial backends).

Why this model is popular:

  • Local Prometheus handles fast queries and alerts
  • Centralized storage keeps metrics for the long haul
  • Failures are isolated — local queries still work if remote storage stalls
  • You can switch backends without changing scrape configs

Example — remote_write to AMP:

global:
  scrape_interval: 15s

remote_write:
  - url: https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-abc123/api/v1/remote_write
    sigv4:
      region: us-west-2
    queue_config:
      capacity: 10000
      max_samples_per_send: 5000
      batch_send_deadline: 5s

scrape_configs:
  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
    - role: node

Operational tuning tips:

  • Set the queue capacity to cover at least 2–3 minutes of data if the backend slows down
  • Adjust batch size for memory vs. network trade-offs
  • Watch WAL (write-ahead log) usage, which grows with queue depth and retention settings.
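
To confirm the tuning holds up, two self-metrics are worth watching continuously; a sketch of the queries (standard Prometheus remote-write metrics, with the lag expression mirroring the upstream Prometheus mixin):

# WAL directory size on disk; sustained growth means ingestion is outpacing remote_write
prometheus_tsdb_wal_storage_size_bytes

# Seconds of lag between what was ingested and what remote_write has shipped
prometheus_remote_storage_highest_timestamp_in_seconds
  - ignoring(remote_name, url) group_right
    prometheus_remote_storage_queue_highest_sent_timestamp_seconds
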
💡
To understand how to collect and use messaging metrics alongside your AWS infrastructure metrics, this write-up on Amazon SQS metrics collection shows practical examples of what to monitor and how it fits into a full observability strategy.

Amazon Managed Service for Prometheus at Scale

Amazon Managed Service for Prometheus (AMP) takes care of the heavy lifting: server infrastructure, automatic scaling, and cross–Availability Zone replication. You send metrics with remote_write and query them using the same PromQL syntax and Grafana integrations you already use.

Multi-Region Setup for AMP

In global environments, you’ll often create workspaces in multiple regions and enable cross-region access for queries.

# Create workspaces in primary regions
aws amp create-workspace --alias production-us-west-2 --region us-west-2
aws amp create-workspace --alias production-eu-west-1 --region eu-west-1

# Allow cross-region queries with a resource policy
aws amp put-resource-policy \
    --resource-arn arn:aws:aps:us-west-2:123456789012:workspace/ws-abc123 \
    --policy-text file://amp-cross-region-policy.json

This gives each region its own Prometheus workspace, while still allowing global queries when needed.
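
On the query side, a common approach is one Grafana datasource per workspace, provisioned with SigV4 auth. A sketch (the workspace IDs and datasource names are illustrative, and SigV4 support must be enabled in your Grafana configuration):

apiVersion: 1
datasources:
  - name: AMP us-west-2
    type: prometheus
    url: https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-abc123
    jsonData:
      sigV4Auth: true
      sigV4AuthType: default
      sigV4Region: us-west-2
  - name: AMP eu-west-1
    type: prometheus
    url: https://aps-workspaces.eu-west-1.amazonaws.com/workspaces/ws-def456
    jsonData:
      sigV4Auth: true
      sigV4AuthType: default
      sigV4Region: eu-west-1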

Cost Control in AMP

AMP pricing is based on two dimensions: the number of metric samples ingested and the compute time spent on queries. To keep costs in check, a few practices help.

Filter Metrics at the Source

Instead of pushing every raw time series, use recording rules in local Prometheus to pre-aggregate and send only the reduced dataset.

# Recording rules to reduce remote_write volume
groups:
  - name: cost_optimization
    interval: 30s
    rules:
      - record: instance:cpu_utilization:rate5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
      
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))

This configuration aggregates CPU and HTTP request metrics before sending them upstream, cutting ingestion volume significantly.

Keep Metrics Regional

Store metrics in the same region where they are generated to avoid cross-region transfer charges and keep query performance consistent.

Align Retention with Usage

AMP has fixed retention, but you control what gets ingested. Dropping unnecessary series at ingestion is cheaper and cleaner than storing everything and filtering later.
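
In practice, that dropping is usually done with write_relabel_configs on the remote_write block, shipping only recording-rule outputs plus a small allowlist of raw series. A sketch (the allowlist regex is an illustrative assumption; tune it to what your dashboards and alerts actually use):

remote_write:
  - url: https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-abc123/api/v1/remote_write
    sigv4:
      region: us-west-2
    write_relabel_configs:
      # Keep recording-rule outputs (instance:* / job:*) and a few raw essentials
      - source_labels: [__name__]
        regex: '(instance:.*|job:.*|up|kube_pod_status_phase)'
        action: keep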

With these patterns, AMP supports large-scale, multi-region monitoring while keeping costs predictable and queries responsive.

Prometheus Deployment Models on EKS

Running Prometheus inside EKS gives you control over resources, storage, and configuration. You decide the instance types, storage classes, and retention periods. The trade-off is that you take on operational overhead—high availability, scaling, and upgrades are now your responsibility.

High-Availability Prometheus on EKS

A common approach is deploying Prometheus as a StatefulSet with persistent volumes and anti-affinity rules so replicas don’t end up on the same node. This ensures metrics survive pod restarts and that workloads are resilient to node failures.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
spec:
  serviceName: prometheus
  replicas: 2
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus:v2.45.0
        args:
          - '--storage.tsdb.path=/prometheus'
          - '--storage.tsdb.retention.time=15d'
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--web.enable-lifecycle'
          - '--web.enable-admin-api'
        volumeMounts:
        - name: prometheus-storage
          mountPath: /prometheus
        - name: prometheus-config
          mountPath: /etc/prometheus
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config   # assumes a ConfigMap holding prometheus.yml
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: prometheus
            topologyKey: kubernetes.io/hostname
  volumeClaimTemplates:
  - metadata:
      name: prometheus-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: gp3
      resources:
        requests:
          storage: 200Gi

Here, Prometheus runs with two replicas, each with 200Gi of gp3 storage. The anti-affinity rules ensure pods are scheduled on separate nodes, reducing the chance of losing both replicas in one failure event.
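
One detail worth adding when running two replicas like this: give each replica a distinguishing external label so downstream storage can deduplicate the two copies (AMP keys its HA deduplication on the cluster and __replica__ labels; Thanos uses a configurable replica label). A sketch, with the replica value templated per pod:

# In each replica's prometheus.yml
global:
  external_labels:
    cluster: production-cluster
    __replica__: prometheus-0   # e.g. derived from the pod name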

Service Discovery for Mixed Workloads

Most EKS clusters host a mix of Kubernetes-native workloads and external services running on EC2. Prometheus can scrape both by combining Kubernetes and EC2 service discovery in the same configuration.

scrape_configs:
  # Kubernetes service discovery for pods
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true

  # EC2 service discovery for external services  
  - job_name: 'ec2-external-services'
    ec2_sd_configs:
    - region: us-west-2
      port: 8080
      filters:
      - name: tag:PrometheusMonitoring
        values: [enabled]
      - name: instance-state-name  
        values: [running]
    
    relabel_configs:
    - source_labels: [__meta_ec2_instance_id]
      target_label: instance_id
    - source_labels: [__meta_ec2_tag_Environment]
      target_label: environment

In this setup:

  • Kubernetes services are automatically discovered through pod annotations.
  • EC2 instances are filtered by tags (PrometheusMonitoring=enabled) and running state, with useful metadata like instance_id and environment added as labels.

This mixed service discovery pattern is common in hybrid environments where some workloads run inside Kubernetes, while others remain on EC2.

With Prometheus on EKS, you get flexibility to shape deployments for your environment—whether it’s HA setups with persistent storage or hybrid discovery across Kubernetes and EC2. The key trade-off: more power, but more responsibility to manage upgrades, scaling, and failover.

💡
If you’re looking at how logs fit into observability with Prometheus and infrastructure metrics, here’s a guide on AWS centralized logging that walks through collecting, querying, and managing logs across services.

Storage and Retention Strategies for Prometheus on AWS

Prometheus performance and cost depend on how storage is set up. The goal is to keep recent data on fast disks for quick queries, while moving older data into cheaper storage for long-term retention. On AWS, two approaches are most common: tuning EBS volumes for short-term workloads and using S3 for historical metrics.

EBS Volumes Optimized for Short-Term Prometheus Workloads

Prometheus generates a lot of random I/O during queries and compaction. gp3 volumes are a strong choice because IOPS and throughput can be tuned independently of volume size, letting you balance speed and cost.

Example — StorageClass for Prometheus with tuned EBS gp3 volumes:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prometheus-optimized
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "4000"      # Higher IOPS for query performance
  throughput: "250" # Sufficient for compaction operations
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

This setup provisions gp3 volumes with 4000 IOPS and 250 MB/s throughput, ensuring queries stay responsive while compaction jobs don’t bottleneck.

Tiered Storage Model with S3 for Historical Metrics

For data that needs to live beyond Prometheus’ local retention window, systems like Thanos or Cortex can push blocks into S3. A common pattern is to keep 15–30 days of recent metrics on EBS for fast queries, while shifting older blocks to S3 for durability at lower cost.

Example — Thanos sidecar configuration for S3 storage:

apiVersion: v1
kind: ConfigMap
metadata:
  name: thanos-storage-config
data:
  storage.yaml: |
    type: s3
    config:
      bucket: prometheus-metrics-long-term
      endpoint: s3.us-west-2.amazonaws.com
      region: us-west-2
      sse_config:
        type: SSE-S3

This setup ensures engineers get quick access to recent metrics during debugging while still having months or years of data archived in S3 for capacity planning and SLA reviews.
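
Wiring this config into Prometheus is typically done with a Thanos sidecar container added to the StatefulSet above; a sketch (the image tag and mount names are illustrative):

# Added to the Prometheus pod spec, next to the prometheus container
- name: thanos-sidecar
  image: quay.io/thanos/thanos:v0.34.0
  args:
    - sidecar
    - --tsdb.path=/prometheus
    - --prometheus.url=http://localhost:9090
    - --objstore.config-file=/etc/thanos/storage.yaml
  volumeMounts:
    - name: prometheus-storage
      mountPath: /prometheus
    - name: thanos-storage-config   # mounts the ConfigMap above
      mountPath: /etc/thanos

The pod also needs a volumes entry mapping thanos-storage-config to that ConfigMap, and Prometheus is usually run with --storage.tsdb.min-block-duration=2h and --storage.tsdb.max-block-duration=2h so the sidecar gets upload-friendly blocks.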

Combining EBS for short-term queries with S3 for long-term durability gives Prometheus deployments both speed and cost efficiency.

Reduce AWS Costs Without Losing Visibility

Once the Prometheus infrastructure on AWS is stable, the next challenge is managing costs. Bills can rise quickly as metric volumes increase, storage grows, and cross-region transfers add up. The good news is that there are patterns to cut costs without sacrificing the visibility needed for operations.

Remote Write Optimization for Lower Network Costs

remote_write traffic can become expensive in high-volume setups. Payloads are already snappy-compressed, so the main levers are batching and retry behavior; tuning them reduces overhead and keeps data transfer costs in check.

remote_write:
  - url: https://your-backend.com/api/v1/write
    # Larger batches reduce per-request network overhead;
    # payloads are snappy-compressed by default
    queue_config:
      max_samples_per_send: 10000
      batch_send_deadline: 10s
      capacity: 20000
      # Shorter retry backoff clears queues faster after transient failures
      min_backoff: 1s
      max_backoff: 5s

Larger batch sizes reduce per-request overhead, while shorter retry backoffs prevent queues from building up during spikes.

Metric Filtering to Control Volume

Not every metric deserves the same retention window or resolution. Relabeling can't do probabilistic sampling, but it can drop noisy series outright and prune histogram buckets you never query, which achieves a similar volume reduction.

# Applied under a scrape job's metric_relabel_configs
metric_relabel_configs:
# Drop noisy metrics that don't provide value
- source_labels: [__name__]
  regex: 'container_fs_.*|container_network_.*_errors_total'
  action: drop

# Prune fine-grained buckets of a high-cardinality histogram
- source_labels: [__name__, le]
  regex: 'http_request_duration_seconds_bucket;(0\.005|0\.01|0\.025|0\.25|2\.5)'
  action: drop

This configuration discards low-value container metrics and prunes histogram buckets nobody queries, cutting ingestion volume while keeping quantile estimates usable at a coarser resolution.

Regional Data Locality to Avoid Transfer Charges

Cross-region data transfer is one of the most common hidden costs in AWS monitoring. Keeping Prometheus writes within the same region as the workload eliminates those charges and improves query performance.

# Regional Prometheus configuration
global:
  external_labels:
    region: us-west-2
    cluster: production-cluster

remote_write:
  - url: https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-regional/api/v1/remote_write
    sigv4:
      region: us-west-2  # Same region to avoid transfer costs

Adding region and cluster labels ensures queries can still correlate across environments, while raw data stays within its originating region.

💡
With Last9 MCP, you can pull live Prometheus metrics, logs, and traces from AWS into your local environment, giving you the production context needed to troubleshoot issues and fix code faster.

Troubleshoot Prometheus Problems on AWS

A scrape failure can look like a storage problem. High query latency might actually be a network hiccup. The steady way to debug is to walk the data path in order—discovery → scrape → storage → query—and confirm each layer before touching the next.

Start with Prometheus Health

If ingestion is shaky, everything downstream is noise.

  • Process health: pod/instance status, recent restarts, CPU/RAM.
  • Config status: last successful reload, syntax errors in logs.
  • Target inventory: discovered vs. expected targets per job.
# Quick health checks
curl http://localhost:9090/-/healthy
curl http://localhost:9090/-/ready
curl http://localhost:9090/api/v1/targets | jq '.data.activeTargets | length'

Healthy/ready endpoints confirm the server is up; target count tells you whether discovery matches expectations.
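
For the config-status check, Prometheus also exposes a self-metric that makes it a one-liner:

# 1 = last reload succeeded, 0 = Prometheus is still running the previous config
prometheus_config_last_reload_successful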

Confirm Target Discovery

Discovery gaps often masquerade as scrape errors.

  • Compare kubernetes_sd and ec2_sd results with what you expect.
  • Verify relabel rules aren’t dropping targets you need.
# Count discovered targets per job
curl -s http://localhost:9090/api/v1/targets \
 | jq '.data.activeTargets | group_by(.labels.job) | map({job: .[0].labels.job, count: length})'

This surfaces jobs with zero or surprisingly low target counts.

Verify Scrape Health

Once discovery looks right, check that scrapes succeed and stay within budgets.

# Targets down (0 = down)
up == 0

# P95 scrape duration per job (watch for spikes)
quantile by (job) (0.95, scrape_duration_seconds)

# Samples dropped by relabel rules or limits, per job
sum by (job) (scrape_samples_scraped - scrape_samples_post_metric_relabeling)

A run of up == 0 across many targets points to auth, DNS, or network problems. Rising P95 scrape duration hints at latency or overloaded targets. A gap between scraped and post-relabeling sample counts means relabel filters or sample limits are cutting data.

Check Storage Layer

High memory and slow queries often trace back to WAL pressure or compaction issues.

# WAL growth (backlog from remote_write or heavy ingest)
prometheus_tsdb_wal_storage_size_bytes

# Compaction failures (should stay at zero)
increase(prometheus_tsdb_compactions_failed_total[1h]) > 0

# Remote write backlog
prometheus_remote_storage_samples_pending

Continuous WAL growth with pending samples usually means disk I/O or the remote endpoint can’t keep up. Compaction failures drag query speed; check EBS gp3 IOPS/throughput and disk latency.

Assess Query Performance

If storage looks fine but dashboards crawl, focus on query load.

# P95 query latency for instant queries
histogram_quantile(0.95, rate(prometheus_http_request_duration_seconds_bucket{handler="/api/v1/query"}[5m]))

# Concurrency limits
prometheus_engine_queries_concurrent_max

Heavy joins, wide regexes, and unbounded range vectors chew CPU. Tighten dashboard panels and alert rules; prefer pre-aggregated recording rules for hot paths.

Account for AWS Network Factors

Issues outside Prometheus often explain “random” failures.

  • Security groups: confirm ports for targets and remote_write.
  • Cross-AZ latency: watch scrape duration for cross-AZ targets.
  • ELB quirks: health checks and idle timeouts can clash with scrape intervals.
  • EBS throttles: check CloudWatch for volume IOPS/throughput caps during compaction.
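
For the security-group check, a quick CLI sketch (the group ID and port are illustrative):

# Does this security group allow inbound traffic on the scrape port?
aws ec2 describe-security-groups \
  --group-ids sg-0123456789abcdef0 \
  --query 'SecurityGroups[].IpPermissions[?FromPort<=`9100` && ToPort>=`9100`]'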

Next Steps

Pick the deployment model that fits your current scale: single Prometheus for smaller setups, federation for multi-region clusters, or AMP if you want AWS to manage infrastructure. Focus on stable service discovery and storage first, then expand into AWS services like ELB, RDS, and Lambda as visibility needs grow.

Where Last9 Adds Value

Pairing Prometheus with Last9 ensures you keep visibility without the usual cost or performance trade-offs:

  • Sensitive-data filtering and remapping — Control what labels and values are ingested, so you avoid sending unnecessary or sensitive metadata upstream.
  • Streaming aggregation — Aggregate or compress data before storage to reduce cost while preserving key signals.
  • Cold storage with rehydration — Store months or years of metrics in cost-efficient S3-style storage and pull them back when long-range queries are needed.
  • High-cardinality support — Handle workloads with millions of unique labels while keeping queries responsive.

Integrating Last9 into your AWS Prometheus pipeline gives you the flexibility to scale telemetry volume, retention, and analysis without running into bottlenecks or unpredictable bills.

Start for free today or talk to our team to discuss how it fits into your stack!

FAQs

Q: Should I use Amazon Managed Service for Prometheus or self-hosted?
A: Use AMP if you prefer managed infrastructure and can work within its fixed retention and configuration limits. Choose self-hosted if you need control over storage tuning, custom recording rules, or federation setups.

Q: How do I run Prometheus with high availability across AWS availability zones?
A: Deploy multiple Prometheus instances with identical configurations, add external labels to distinguish them, and route queries through a load balancer that targets healthy nodes.

Q: What’s the best way to monitor EC2 instances and EKS workloads together?
A: Combine service discovery methods: ec2_sd for EC2 instances and kubernetes_sd for pods. Use consistent labels (e.g., environment, region) to make correlation easier.

Q: How much does remote_write cost on AWS?
A: Same-region traffic is typically under $0.09/GB. Cross-region transfers range from $0.09–0.15/GB depending on the source and destination regions.

Q: Can I migrate from self-hosted Prometheus to AMP without losing data?
A: AMP doesn’t support importing existing data. The common approach is to run both systems in parallel and use remote_write to AMP while keeping local storage until the cutover is complete.

Q: What retention period works best for Prometheus on EKS?
A: Keep 15 days on local storage for fast queries. Use remote storage such as S3 via Thanos for longer retention. Most teams want 1–2 weeks of immediate visibility, with older data acceptable at reduced resolution.

Authors
Anjali Udasi

Helping to make the tech a little less intimidating.
