You've got Prometheus
running in one cluster — maybe a dev environment, a single EKS
cluster, or a proof-of-concept setup. The configuration is straightforward: node_exporter
on a few EC2
instances, some service discovery for pods, and a single Prometheus server scraping everything. Storage is local, retention is 15 days, and you can keep all the default recording rules without worrying about costs.
As you expand beyond that first cluster, new complexity appears. Now you're looking at multi-region deployments, dozens of services across different teams, and workloads that generate vastly different metric volumes. Some services emit thousands of metrics per instance; others barely register above the baseline. You need to route metrics to multiple backends — some for long-term storage, others for real-time alerting.
At this scale, operational realities start showing up:
- Prometheus servers that run out of memory during high-cardinality scrapes
- Retention policies that work in staging but create storage pressure in production
- remote_write configurations that work fine until AWS network costs become a line item
- Federation setups that drop metrics during region failovers
Solving these issues one at a time works in the moment, but patterns matter more than individual fixes. Building reliable Prometheus infrastructure on AWS means choosing deployment models that handle variable workloads, maintain visibility across regions, and scale without breaking the bank.
This blog covers the production deployment patterns that work across AWS environments, configuration practices that prevent drift, and strategies for managing costs while keeping full metric coverage.
The Reality of Prometheus Resource Usage
Before rolling out Prometheus on AWS, it helps to know what resources it actually consumes. Getting this right up front makes it easier to size instances, set retention policies, and plan costs with realistic numbers instead of guesses.
AWS’s own benchmarks for Amazon Managed Service for Prometheus (AMP) highlight some common patterns:
- Memory. Expect roughly 2GB of RAM for every million active series, plus another 1–2GB for query execution. If your workloads are spiky or highly dimensional, budget extra headroom.
- Storage. Each sample takes about 1–2 bytes on disk once compressed. Daily churn adds up quickly, so retention settings directly affect how much storage you’ll need.
- CPU. Ingestion volume doesn’t usually stress CPUs; query complexity does. A simple range query may barely touch a core, while an expensive PromQL join can max one out.
- Network. A remote_write setup generates roughly 100–200MB/hr per 100k active series, depending on scrape intervals and label cardinality. Cross-AZ traffic costs can add up if you’re not careful.
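To make these figures concrete, here is a quick back-of-envelope calculation; it assumes 3 million active series, a 15-second scrape interval, roughly 1.5 bytes per compressed sample, and 15-day retention, so swap in your own numbers:
awk -v series=3000000 -v interval=15 -v bytes=1.5 -v days=15 'BEGIN {
  # ~2GB of RAM per million active series, plus query headroom
  printf "RAM:     ~%.0f GB for active series, plus 1-2 GB for queries\n", series / 1e6 * 2
  # samples per day = series / scrape interval * 86400 seconds
  printf "Storage: ~%.0f GB over %d days\n", series / interval * 86400 * bytes * days / 1e9, days
}'
That works out to roughly 6GB of RAM for the series alone and close to 400GB of disk over the retention window.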
With these numbers, you can:
- Choose the right EC2 instance types or AMP workspace sizes for your expected series counts
- Set retention periods that strike a balance between visibility and storage cost
- Plan remote_write bandwidth usage and anticipate cross-AZ transfer charges
- Build federation hierarchies that spread query and storage load across regions effectively
Prometheus Deployment Patterns on AWS
Prometheus works well in small setups, but production environments on AWS quickly demand more thoughtful deployment choices. Most teams end up using one of three patterns, each with strengths and trade-offs.
1. Single Prometheus Server
The simplest setup is running one Prometheus server that scrapes all targets, stores metrics locally, and handles queries and alerts. You’ll see this deployed on EC2 (sometimes behind an ALB) or as Amazon Managed Service for Prometheus (AMP).
Why teams start here:
- A single configuration to manage
- No coordination across servers
- Easy service discovery through EC2, EKS, or ECS APIs
This pattern fits best when:
- You’re in a single region with predictable workloads
- Active series counts stay below ~1–2 million
- Teams want low operational overhead
- Multi-tenant isolation isn’t a requirement
Example — EKS scrape configuration:
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- job_name: 'ec2-nodes'
ec2_sd_configs:
- region: us-west-2
port: 9100
When it starts breaking down:
- Queries slow down during scrape bursts
- Memory usage sits above 70% regularly
- remote_write queues clog during traffic spikes
- Federation queries time out or return incomplete data
Once OOM kills or scrape delays become routine, vertical scaling only stretches so far—you’ll need to distribute the load.
2. Federated Prometheus
Federation introduces hierarchy: regional or cluster-level Prometheus servers scrape local workloads, while a global server pulls aggregated metrics from them using the /federate
endpoint.
Why it works well:
- Regional metrics stay close to workloads
- Cross-region latency and transfer costs stay manageable
- Teams can run their own Prometheus instances, but still share dashboards
- Only aggregate or selected metrics are shipped upstream
Example — global Prometheus scraping regional servers:
scrape_configs:
- job_name: 'federate-us-west-2'
scrape_interval: 30s
honor_labels: true
metrics_path: '/federate'
params:
'match[]':
- '{job=~"kubernetes-.*"}'
- '{__name__=~"up|prometheus_.*"}'
static_configs:
- targets: ['prom-us-west-2.internal:9090']
- job_name: 'federate-eu-west-1'
scrape_interval: 30s
honor_labels: true
metrics_path: '/federate'
params:
'match[]':
- '{job=~"kubernetes-.*"}'
- '{__name__=~"up|prometheus_.*"}'
static_configs:
- targets: ['prom-eu-west-1.internal:9090']
Things to keep in mind:
- Regional servers need retention tuned for federation intervals
- Network issues can create blind spots in global dashboards
- Labels must be handled carefully to avoid collisions (honor_labels is key)
- Federation queries can become costly as series counts grow
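One practical way to keep labels from colliding is to stamp each regional server with unique external labels, so series pulled through /federate stay distinguishable at the global layer. A minimal sketch for the us-west-2 regional server (label values are illustrative):
global:
  scrape_interval: 15s
  external_labels:
    region: us-west-2
    cluster: prod-us-west-2   # any label that is unique per source server works
With honor_labels: true on the global server, as in the federation config above, these labels arrive intact on every federated series, so two regions reporting the same job and instance no longer collide.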
3. Remote Storage with Local Prometheus
Here, Prometheus servers live close to workloads for scraping and short-term retention, but metrics are streamed via remote_write
into centralized storage (Amazon AMP, Cortex, or commercial backends).
Why this model is popular:
- Local Prometheus handles fast queries and alerts
- Centralized storage keeps metrics for the long haul
- Failures are isolated — local queries still work if remote storage stalls
- You can switch backends without changing scrape configs
Example — remote_write to AMP:
global:
scrape_interval: 15s
remote_write:
- url: https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-abc123/api/v1/remote_write
sigv4:
region: us-west-2
queue_config:
capacity: 10000
max_samples_per_send: 5000
batch_send_deadline: 5s
scrape_configs:
- job_name: 'kubernetes-nodes'
kubernetes_sd_configs:
- role: node
Operational tuning tips:
- Set the queue capacity to cover at least 2–3 minutes of data if the backend slows down
- Adjust batch size for memory vs. network trade-offs
- Watch WAL (write-ahead log) usage, which grows with queue depth and retention settings.
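To notice when those queues and the WAL start falling behind, it helps to alert on remote write lag. A sketch of a rule that fires when the newest sample Prometheus has read from the WAL is more than two minutes ahead of the newest sample it has successfully sent (threshold and labels are illustrative):
groups:
  - name: remote-write-health
    rules:
      - alert: RemoteWriteFallingBehind
        expr: |
          (
            prometheus_remote_storage_highest_timestamp_in_seconds
          - ignoring(remote_name, url) group_right
            prometheus_remote_storage_queue_highest_sent_timestamp_seconds
          ) > 120
        for: 10m
        labels:
          severity: warning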
Amazon Managed Service for Prometheus at Scale
Amazon Managed Service for Prometheus (AMP) takes care of the heavy lifting: server infrastructure, automatic scaling, and cross–Availability Zone replication. You send metrics with remote_write
and query them using the same PromQL
syntax and Grafana integrations you already use.
Multi-Region Setup for AMP
In global environments, you’ll often create workspaces in multiple regions and enable cross-region access for queries.
# Create workspaces in primary regions
aws amp create-workspace --alias production-us-west-2 --region us-west-2
aws amp create-workspace --alias production-eu-west-1 --region eu-west-1
# Allow cross-region queries with a resource policy
aws amp put-resource-policy \
--resource-arn arn:aws:aps:us-west-2:123456789012:workspace/ws-abc123 \
--policy-text file://amp-cross-region-policy.json
This gives each region its own Prometheus workspace, while still allowing global queries when needed.
Cost Control in AMP
AMP pricing is based on two dimensions: the number of metric samples ingested and the compute time spent on queries. To keep costs in check, a few practices help.
Filter Metrics at the Source
Instead of pushing every raw time series, use recording rules in local Prometheus to pre-aggregate and send only the reduced dataset.
# Recording rules to reduce remote_write volume
groups:
- name: cost_optimization
interval: 30s
rules:
- record: instance:cpu_utilization:rate5m
expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
- record: job:http_requests:rate5m
expr: sum by (job) (rate(http_requests_total[5m]))
This configuration aggregates CPU and HTTP request metrics before sending them upstream, cutting ingestion volume significantly.
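To make sure only that reduced dataset leaves the cluster, pair the rules with write_relabel_configs on the remote_write block so raw series are filtered out before they are sent. A sketch, reusing the example AMP workspace URL from earlier:
remote_write:
  - url: https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-abc123/api/v1/remote_write
    sigv4:
      region: us-west-2
    write_relabel_configs:
      # Ship only the pre-aggregated recording rule outputs plus `up`
      - source_labels: [__name__]
        regex: 'instance:cpu_utilization:rate5m|job:http_requests:rate5m|up'
        action: keep
Everything else stays queryable locally but never hits AMP ingestion pricing.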
Keep Metrics Regional
Store metrics in the same region where they are generated to avoid cross-region transfer charges and keep query performance consistent.
Align Retention with Usage
AMP has fixed retention, but you control what gets ingested. Dropping unnecessary series at ingestion is cheaper and cleaner than storing everything and filtering later.
With these patterns, AMP supports large-scale, multi-region monitoring while keeping costs predictable and queries responsive.
Prometheus Deployment Models on EKS
Running Prometheus inside EKS
gives you control over resources, storage, and configuration. You decide the instance types, storage classes, and retention periods. The trade-off is that you take on operational overhead—high availability, scaling, and upgrades are now your responsibility.
High-Availability Prometheus on EKS
A common approach is deploying Prometheus as a StatefulSet
with persistent volumes and anti-affinity rules so replicas don’t end up on the same node. This ensures metrics survive pod restarts and that workloads are resilient to node failures.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
spec:
  serviceName: prometheus
  replicas: 2
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus   # matched by the selector and anti-affinity rules below
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.45.0
          args:
            - '--storage.tsdb.path=/prometheus'
            - '--storage.tsdb.retention.time=15d'
            - '--config.file=/etc/prometheus/prometheus.yml'
            - '--web.enable-lifecycle'
            - '--web.enable-admin-api'
          volumeMounts:
            - name: prometheus-storage
              mountPath: /prometheus
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: prometheus
              topologyKey: kubernetes.io/hostname
  volumeClaimTemplates:
    - metadata:
        name: prometheus-storage
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3
        resources:
          requests:
            storage: 200Gi
Here, Prometheus runs with two replicas, each with 200Gi
of gp3
storage. The anti-affinity rules ensure pods are scheduled on separate nodes, reducing the chance of losing both replicas in one failure event.
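The serviceName: prometheus field above assumes a matching headless Service, which is what gives each replica a stable DNS identity. A minimal sketch:
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  labels:
    app: prometheus
spec:
  clusterIP: None   # headless: each StatefulSet pod gets a stable DNS name
  selector:
    app: prometheus
  ports:
    - name: web
      port: 9090
      targetPort: 9090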
Service Discovery for Mixed Workloads
Most EKS clusters host a mix of Kubernetes-native workloads and external services running on EC2. Prometheus can scrape both by combining Kubernetes and EC2 service discovery in the same configuration.
scrape_configs:
# Kubernetes service discovery for pods
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
# EC2 service discovery for external services
- job_name: 'ec2-external-services'
ec2_sd_configs:
- region: us-west-2
port: 8080
filters:
- name: tag:PrometheusMonitoring
values: [enabled]
- name: instance-state-name
values: [running]
relabel_configs:
- source_labels: [__meta_ec2_instance_id]
target_label: instance_id
- source_labels: [__meta_ec2_tag_Environment]
target_label: environment
In this setup:
- Kubernetes services are automatically discovered through pod annotations.
- EC2 instances are filtered by tags (PrometheusMonitoring=enabled) and running state, with useful metadata like instance_id and environment added as labels.
This mixed service discovery pattern is common in hybrid environments where some workloads run inside Kubernetes, while others remain on EC2.
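For the Kubernetes side of this pattern, discovery hinges on the pod annotations that the keep rule above checks. A sketch of a hypothetical Deployment with the conventional annotations (the port and path annotations only take effect if you add the matching relabel rules):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api   # hypothetical workload
spec:
  replicas: 2
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
      annotations:
        prometheus.io/scrape: "true"    # picked up by the keep rule above
        prometheus.io/port: "8080"      # conventional; requires a relabel rule to rewrite __address__
        prometheus.io/path: "/metrics"  # conventional; requires a relabel rule to set __metrics_path__
    spec:
      containers:
        - name: payments-api
          image: payments-api:1.0.0     # hypothetical image
          ports:
            - containerPort: 8080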
With Prometheus on EKS, you get flexibility to shape deployments for your environment—whether it’s HA setups with persistent storage or hybrid discovery across Kubernetes and EC2. The key trade-off: more power, but more responsibility to manage upgrades, scaling, and failover.
Storage and Retention Strategies for Prometheus on AWS
Prometheus performance and cost depend on how storage is set up. The goal is to keep recent data on fast disks for quick queries, while moving older data into cheaper storage for long-term retention. On AWS, two approaches are most common: tuning EBS volumes for short-term workloads and using S3 for historical metrics.
EBS Volumes Optimized for Short-Term Prometheus Workloads
Prometheus generates a lot of random I/O during queries and compaction. gp3
volumes are a strong choice because IOPS and throughput can be tuned independently of volume size, letting you balance speed and cost.
Example — StorageClass for Prometheus with tuned EBS gp3 volumes:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: prometheus-optimized
provisioner: ebs.csi.aws.com
parameters:
type: gp3
iops: "4000" # Higher IOPS for query performance
throughput: "250" # Sufficient for compaction operations
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
This setup provisions gp3
volumes with 4000 IOPS and 250 MB/s throughput, ensuring queries stay responsive while compaction jobs don’t bottleneck.
Tiered Storage Model with S3 for Historical Metrics
For data that needs to live beyond Prometheus’ local retention window, systems like Thanos or Cortex can push blocks into S3. A common pattern is to keep 15–30 days of recent metrics on EBS for fast queries, while shifting older blocks to S3 for durability at lower cost.
Example — Thanos sidecar configuration for S3 storage:
apiVersion: v1
kind: ConfigMap
metadata:
name: thanos-storage-config
data:
storage.yaml: |
type: s3
config:
bucket: prometheus-metrics-long-term
endpoint: s3.us-west-2.amazonaws.com
region: us-west-2
encrypt_sse: true
This setup ensures engineers get quick access to recent metrics during debugging while still having months or years of data archived in S3 for capacity planning and SLA reviews.
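To put that bucket config to work, a Thanos sidecar runs next to Prometheus in the StatefulSet shown earlier, reading the same TSDB volume and uploading completed blocks to S3. A sketch of the extra container (image tag and mount paths are assumptions to adapt):
        - name: thanos-sidecar
          image: quay.io/thanos/thanos:v0.34.1
          args:
            - sidecar
            - --tsdb.path=/prometheus
            - --prometheus.url=http://localhost:9090
            - --objstore.config-file=/etc/thanos/storage.yaml
          volumeMounts:
            - name: prometheus-storage        # same PVC Prometheus writes to
              mountPath: /prometheus
            - name: thanos-storage-config     # mounts the ConfigMap above
              mountPath: /etc/thanos
The sidecar also needs IAM permissions to write to the bucket, for example via IRSA on EKS.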
Combining EBS for short-term queries with S3 for long-term durability gives Prometheus deployments both speed and cost efficiency.
Reduce AWS Costs Without Losing Visibility
Once the Prometheus infrastructure on AWS is stable, the next challenge is managing costs. Bills can rise quickly as metric volumes increase, storage grows, and cross-region transfers add up. The good news is that there are patterns to cut costs without sacrificing the visibility needed for operations.
Remote Write Optimization for Lower Network Costs
remote_write traffic can become expensive in high-volume setups. Adjusting batching and retry behavior reduces overhead and keeps data transfer costs in check; payloads are already snappy-compressed on the wire.
remote_write:
  - url: https://your-backend.com/api/v1/write
    queue_config:
      # Larger batches reduce per-request network overhead
      max_samples_per_send: 10000
      batch_send_deadline: 10s
      capacity: 20000
      # Tighter backoff clears queues faster after transient failures
      min_backoff: 1s
      max_backoff: 5s
Larger batch sizes make each (snappy-compressed) request more efficient, while shorter backoff intervals prevent queues from building up during spikes.
Metric Filtering to Control Volume
Not every metric deserves the same retention window or resolution. Relabeling rules let you drop noisy series outright or thin out high-cardinality ones, such as histogram buckets you rarely query.
# Drop noisy metrics that don't provide value
metric_relabel_configs:
- source_labels: [__name__]
regex: 'container_fs_.*|container_network_.*_errors_total'
action: drop
# Reduce histogram resolution by dropping a subset of the default latency buckets
- source_labels: [__name__, le]
  regex: 'http_request_duration_seconds_bucket;(0.025|0.25|2.5)'
  action: drop
This configuration discards low-value container metrics and reduces the resolution of expensive histogram series, cutting ingestion volume while preserving essential visibility.
Regional Data Locality to Avoid Transfer Charges
Cross-region data transfer is one of the most common hidden costs in AWS monitoring. Keeping Prometheus writes within the same region as the workload eliminates those charges and improves query performance.
# Regional Prometheus configuration
global:
external_labels:
region: us-west-2
cluster: production-cluster
remote_write:
- url: https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-regional/api/v1/remote_write
sigv4:
region: us-west-2 # Same region to avoid transfer costs
Adding region and cluster labels ensures queries can still correlate across environments, while raw data stays within its originating region.
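Once every server attaches these labels, cross-environment questions stay answerable with ordinary PromQL, for example against the recording rule shipped from each region:
# Request rate per region, using the external labels attached above
sum by (region) (job:http_requests:rate5m)
# The same series broken out per cluster within each region
sum by (region, cluster) (job:http_requests:rate5m)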
Troubleshoot Prometheus Problems on AWS
A scrape failure can look like a storage problem. High query latency might actually be a network hiccup. The steady way to debug is to walk the data path in order—discovery → scrape → storage → query—and confirm each layer before touching the next.
Start with Prometheus Health
If ingestion is shaky, everything downstream is noise.
- Process health: pod/instance status, recent restarts, CPU/RAM.
- Config status: last successful reload, syntax errors in logs.
- Target inventory: discovered vs. expected targets per job.
# Quick health checks
curl http://localhost:9090/-/healthy
curl http://localhost:9090/-/ready
curl http://localhost:9090/api/v1/targets | jq '.data.activeTargets | length'
Healthy/ready endpoints confirm the server is up; target count tells you whether discovery matches expectations.
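For the config-status check, promtool validates the file before you trigger a reload (the path shown is a common default; adjust for your layout):
# Validate syntax and referenced rule files without restarting Prometheus
promtool check config /etc/prometheus/prometheus.yml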
Confirm Target Discovery
Discovery gaps often masquerade as scrape errors.
- Compare kubernetes_sd and ec2_sd results with what you expect.
- Verify relabel rules aren’t dropping targets you need.
# Count discovered targets per job
curl -s http://localhost:9090/api/v1/targets \
| jq '.data.activeTargets | group_by(.labels.job) | map({job: .[0].labels.job, count: length})'
This surfaces jobs with zero or surprisingly low target counts.
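To check the relabeling side specifically, the same targets API can list what service discovery found but relabel rules dropped:
# Targets discovered by service discovery but dropped by relabel rules
curl -s 'http://localhost:9090/api/v1/targets?state=dropped' \
  | jq '.data.droppedTargets | length'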
Verify Scrape Health
Once discovery looks right, check that scrapes succeed and stay within budgets.
# Targets down (0 = down)
up == 0
# P95 scrape duration per job (scrape_duration_seconds is a per-target gauge)
quantile by (job) (0.95, scrape_duration_seconds)
# Samples dropped by relabel rules or limits, per job
sum by (job) (scrape_samples_scraped - scrape_samples_post_metric_relabeling)
A run of up == 0 across many targets points to auth, DNS, or network problems. Rising P95 scrape time hints at latency or overloaded targets. Dropped samples signal relabel filters or sample limits cutting data.
Check Storage Layer
High memory and slow queries often trace back to WAL pressure or compaction issues.
# WAL growth (backlog from remote_write or heavy ingest)
prometheus_tsdb_wal_size_bytes
# Compaction failures (should stay at zero)
increase(prometheus_tsdb_compactions_failed_total[1h]) > 0
# Remote write backlog
prometheus_remote_storage_pending_samples
Continuous WAL growth with pending samples usually means disk I/O or the remote endpoint can’t keep up. Compaction failures drag query speed; check EBS gp3 IOPS/throughput and disk latency.
Assess Query Performance
If storage looks fine but dashboards crawl, focus on query load.
# P95 query latency for instant queries
histogram_quantile(0.95, rate(prometheus_http_request_duration_seconds_bucket{handler="/api/v1/query"}[5m]))
# Concurrency limits
prometheus_engine_queries_concurrent_max
Heavy joins, wide regexes, and unbounded range vectors chew CPU. Tighten dashboard panels and alert rules; prefer pre-aggregated recording rules for hot paths.
Account for AWS Network Factors
Issues outside Prometheus often explain “random” failures.
- Security groups: confirm ports for targets and remote_write.
- Cross-AZ latency: watch scrape duration for cross-AZ targets.
- ELB quirks: health checks and idle timeouts can clash with scrape intervals.
- EBS throttles: check CloudWatch for volume IOPS/throughput caps during compaction.
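For the EBS item in particular, CloudWatch shows whether the volume brushed its provisioned limits during compaction windows. A sketch, with a placeholder volume ID and GNU date syntax:
# Write ops over the past hour for the Prometheus data volume
aws cloudwatch get-metric-statistics \
  --namespace AWS/EBS \
  --metric-name VolumeWriteOps \
  --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Sum \
  --region us-west-2
Sustained values near the provisioned IOPS ceiling during compaction are a sign to raise gp3 IOPS or throughput.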
Next Steps
Pick the deployment model that fits your current scale: single Prometheus for smaller setups, federation for multi-region clusters, or AMP if you want AWS to manage infrastructure. Focus on stable service discovery and storage first, then expand into AWS services like ELB, RDS, and Lambda as visibility needs grow.
Where Last9 Adds Value
Pairing Prometheus with Last9 ensures you keep visibility without the usual cost or performance trade-offs:
- Sensitive-data filtering and remapping — Control what labels and values are ingested, so you avoid sending unnecessary or sensitive metadata upstream.
- Streaming aggregation — Aggregate or compress data before storage to reduce cost while preserving key signals.
- Cold storage with rehydration — Store months or years of metrics in cost-efficient S3-style storage and pull them back when long-range queries are needed.
- High-cardinality support — Handle workloads with millions of unique labels while keeping queries responsive.
Integrating Last9 into your AWS Prometheus pipeline gives you the flexibility to scale telemetry volume, retention, and analysis without running into bottlenecks or unpredictable bills.
Start for free today or talk to your team to discuss how it fits into your stack!
FAQs
Q: Should I use Amazon Managed Service for Prometheus or self-hosted?
A: Use AMP if you prefer managed infrastructure and can work within its fixed retention and configuration limits. Choose self-hosted if you need control over storage tuning, custom recording rules, or federation setups.
Q: How do I run Prometheus with high availability across AWS availability zones?
A: Deploy multiple Prometheus instances with identical configurations, add external labels to distinguish them, and route queries through a load balancer that targets healthy nodes.
Q: What’s the best way to monitor EC2 instances and EKS workloads together?
A: Combine service discovery methods: ec2_sd for EC2 instances and kubernetes_sd for pods. Use consistent labels (e.g., environment, region) to make correlation easier.
Q: How much does remote_write cost on AWS?
A: Same-region traffic is typically under $0.09/GB. Cross-region transfers range from $0.09–0.15/GB depending on the source and destination regions.
Q: Can I migrate from self-hosted Prometheus to AMP without losing data?
A: AMP doesn’t support importing existing data. The common approach is to run both systems in parallel and use remote_write
to AMP while keeping local storage until the cutover is complete.
Q: What retention period works best for Prometheus on EKS?
A: Keep 15 days on local storage for fast queries. Use remote storage such as S3 via Thanos for longer retention. Most teams want 1–2 weeks of immediate visibility, with older data acceptable at reduced resolution.