
Prometheus with Docker Compose: The Complete Setup Guide

Learn how to set up, configure, and run Prometheus with Docker Compose for efficient monitoring, alerting, and visualization.

Feb 24th, 2025

This guide shows you how to set up a comprehensive monitoring solution with Prometheus, Grafana, Node Exporter, cAdvisor, and Alertmanager using Docker Compose. Follow the step-by-step instructions to deploy a powerful monitoring stack for your containerized environment.

Why Use Docker Compose for Prometheus?

Docker Compose offers significant advantages for deploying monitoring tools:

  • Simplified deployment: Define your entire monitoring stack in a single configuration file
  • Consistent environments: Deploy the same setup across development, testing, and production
  • Easy updates: Upgrade components individually with minimal disruption
  • Infrastructure as code: Version control your monitoring configuration

These benefits make Docker Compose ideal for setting up and maintaining Prometheus-based monitoring systems.

💡
For a closer look at container-level metrics and what they tell you about performance, check out our guide on Docker container performance metrics.

Prerequisites for Installation

Before you begin, ensure you have:

  • Docker Engine (version 19.03.0+)
  • Docker Compose (version 1.27.0+)
  • 2+ CPU cores and 4GB+ RAM
  • At least 20GB free disk space
  • Basic understanding of Docker concepts

Step-by-Step Guide to Setting Up Prometheus Monitoring with Docker Compose

Step 1: Creating the Project Structure

Start by setting up a well-organized directory structure:

# Create project directory
mkdir prometheus-monitoring
cd prometheus-monitoring

# Create subdirectories for configurations
mkdir -p prometheus/rules alertmanager grafana/provisioning/{datasources,dashboards}

This directory structure helps keep things clean and manageable as your monitoring setup grows.

  • prometheus/rules/ is where you’ll store custom alerting and recording rules.
  • alertmanager/ will hold the Alertmanager config file, including routing and notification settings.
  • grafana/provisioning/ is split into datasources/ and dashboards/ to support automated Grafana setup—so your dashboards and data sources load automatically on startup.

Organizing your files this way makes it easier to version-control, update configs independently, and troubleshoot issues faster.
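
The resulting layout should look like this (shown with the tree utility, if you have it installed):

# Run from inside prometheus-monitoring/
tree .
# .
# ├── alertmanager
# ├── grafana
# │   └── provisioning
# │       ├── dashboards
# │       └── datasources
# └── prometheus
#     └── rules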

Step 2: Defining the Docker Compose Configuration

Create a docker-compose.yml file in the project root:

version: '3.8'

volumes:
  prometheus_data: {}
  grafana_data: {}

networks:
  monitoring:
    driver: bridge

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    volumes:
      - ./prometheus:/etc/prometheus
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
      - '--web.enable-lifecycle'
    ports:
      - "9090:9090"
    networks:
      - monitoring
    restart: unless-stopped

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'  # '$$' escapes '$' from Compose interpolation
    ports:
      - "9100:9100"
    networks:
      - monitoring
    restart: unless-stopped

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    ports:
      - "8080:8080"
    networks:
      - monitoring
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
    ports:
      - "3000:3000"
    networks:
      - monitoring
    restart: unless-stopped

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    volumes:
      - ./alertmanager:/etc/alertmanager
    command:
      - '--config.file=/etc/alertmanager/config.yml'
      - '--storage.path=/alertmanager'
    ports:
      - "9093:9093"
    networks:
      - monitoring
    restart: unless-stopped

This Docker Compose setup wires together all the key components for a solid monitoring stack:

  • Prometheus handles time-series data collection and storage. It pulls metrics from exporters and other endpoints based on your configuration. The --web.enable-lifecycle flag lets you trigger config reloads without restarting the container (see the curl example below).
  • Node Exporter collects low-level system metrics from the host—like CPU usage, memory, and disk stats. We're mounting /proc and /sys read-only so Prometheus can scrape accurate host metrics without affecting the system.
  • cAdvisor focuses on container-level metrics, offering insights into resource usage per container—handy when you’re running multiple services on the same host.
  • Grafana sits on top of Prometheus and provides a user-friendly interface to visualize your data. The provisioning folders (datasources and dashboards) ensure everything is set up automatically on first run.
  • Alertmanager receives alerts from Prometheus and routes them to the right place—Slack, PagerDuty, email, etc. Mounting the config from your local folder keeps it easy to tweak as your alerting needs evolve.

The volumes ensure data persists across restarts, and the shared monitoring network lets all services communicate internally. This setup gives you full control and visibility over your Docker environment—with minimal manual steps.
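
Because --web.enable-lifecycle is enabled on the Prometheus container, configuration changes can be applied with a single HTTP call once the stack is running, instead of a restart (assuming the default 9090 port mapping):

# Reload prometheus.yml and rule files without restarting the container
curl -X POST http://localhost:9090/-/reload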

💡
If you’re running into issues with service restarts, this guide on Docker Compose restart can help clarify what’s actually happening behind the scenes.

Step 3: Configuring Prometheus

Create a prometheus.yml file in the prometheus directory:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

# Load rules once and periodically evaluate them
rule_files:
  - "rules/*.yml"

# Scrape configurations
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

Optimize Performance with Target-Specific Scrape Intervals

Most tutorials use the same scrape interval for all targets, but this is inefficient. Instead, customize intervals based on how frequently metrics change:

scrape_configs:
  # System metrics change frequently, scrape more often
  - job_name: 'node-exporter'
    scrape_interval: 10s
    static_configs:
      - targets: ['node-exporter:9100']

  # Container metrics are also volatile
  - job_name: 'cadvisor'
    scrape_interval: 10s
    static_configs:
      - targets: ['cadvisor:8080']

  # Prometheus itself changes slowly, scrape less frequently
  - job_name: 'prometheus'
    scrape_interval: 30s
    static_configs:
      - targets: ['localhost:9090']

This approach reduces unnecessary scrapes while ensuring critical metrics are captured with appropriate frequency.
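
Whichever intervals you settle on, you can validate the configuration with promtool before (re)starting the stack; the binary ships inside the prom/prometheus image, so this sketch runs it from the project root without installing anything:

# Validate prometheus.yml (and any referenced rule files) without starting Prometheus
docker run --rm \
  -v "$(pwd)/prometheus:/etc/prometheus" \
  --entrypoint promtool \
  prom/prometheus:latest \
  check config /etc/prometheus/prometheus.yml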

Step 4: Setting Up Alert Rules

Create an alert rules file at prometheus/rules/node_alerts.yml:

groups:
- name: node_alerts
  rules:
  - alert: HighCPULoad
    expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High CPU load (instance {{ $labels.instance }})"
      description: "CPU load is > 80%\n  VALUE = {{ $value }}%\n  LABELS: {{ $labels }}"
      
  - alert: HighMemoryLoad
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High memory load (instance {{ $labels.instance }})"
      description: "Memory load is > 80%\n  VALUE = {{ $value }}%\n  LABELS: {{ $labels }}"
      
  - alert: HighDiskUsage
    expr: (node_filesystem_size_bytes{fstype=~"ext4|xfs"} - node_filesystem_free_bytes{fstype=~"ext4|xfs"}) / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100 > 85
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High disk usage (instance {{ $labels.instance }})"
      description: "Disk usage is > 85%\n  VALUE = {{ $value }}%\n  LABELS: {{ $labels }}"

Creating Predictive Alerts to Detect Unusual Metric Patterns

Most monitoring setups only alert on threshold violations. Here's a unique alert that detects abnormal patterns:

# Add this to your alert rules file
- alert: UnusualMemoryGrowth
  expr: deriv(node_memory_MemAvailable_bytes[30m]) < -10 * 1024 * 1024
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Unusual memory consumption rate (instance {{ $labels.instance }})"
    description: "Memory is being consumed at a rate of more than 10MB/min\n  VALUE = {{ $value | humanize }}B/s"

This alert detects unusual memory consumption patterns even before critical thresholds are reached, providing an earlier warning of potential issues.
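
Once the stack is running, you can sanity-check the underlying expression against live data through the Prometheus HTTP API before relying on the alert (assuming the default port mapping from the Compose file):

# Evaluate the deriv() expression directly; sustained values below
# -10485760 (bytes per second) would satisfy the alert condition
curl -s -G http://localhost:9090/api/v1/query \
  --data-urlencode 'query=deriv(node_memory_MemAvailable_bytes[30m])'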

💡
Last9 offers full monitoring support, including alerts and notifications. But no matter which tool you use, alerting still comes with familiar challenges—gaps in coverage, alert fatigue, and cleanup overhead. These aren’t problems with simple fixes.

Practical Container Alert Examples

Create a file at prometheus/rules/container_alerts.yml with practical container-specific alerts:

groups:
- name: container_alerts
  rules:
  - alert: ContainerRestarting
    expr: delta(container_start_time_seconds{name!=""}[15m]) > 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container restarting ({{ $labels.name }})"
      description: "Container {{ $labels.name }} has restarted in the last 15 minutes"
      
  - alert: ContainerHighMemoryUsage
    expr: (container_memory_usage_bytes{name!=""} / container_spec_memory_limit_bytes{name!=""} * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container high memory usage ({{ $labels.name }})"
      description: "Container {{ $labels.name }} memory usage is {{ $value }}%"
      
  - alert: ContainerCPUThrottling
    expr: rate(container_cpu_cfs_throttled_periods_total{name!=""}[5m]) / rate(container_cpu_cfs_periods_total{name!=""}[5m]) > 0.25
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container CPU throttling ({{ $labels.name }})"
      description: "Container {{ $labels.name }} is being throttled {{ $value | humanizePercentage }}"

These alerts catch real-world container issues that often go unnoticed:

  • Container restarts that might indicate application crashes or configuration issues
  • Memory pressure that can lead to OOM kills
  • CPU throttling that affects application performance

Step 5: Configuring Alertmanager

Create a basic Alertmanager configuration in alertmanager/config.yml:

global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email-notifications'

receivers:
- name: 'email-notifications'
  email_configs:
  - to: 'your-email@example.com'
    from: 'alertmanager@example.com'
    smarthost: 'smtp.example.com:587'
    auth_username: 'your-username'
    auth_password: 'your-password'
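
Before wiring in real credentials, it's worth validating the file with amtool, which is bundled in the prom/alertmanager image (a sketch, run from the project root):

# Validate the Alertmanager configuration without starting the service
docker run --rm \
  -v "$(pwd)/alertmanager:/etc/alertmanager" \
  --entrypoint amtool \
  prom/alertmanager:latest \
  check-config /etc/alertmanager/config.yml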

Implementing Team-Based Alert Routing for Efficient Incident Response

Unlike basic setups, you can route alerts to different teams based on service and severity:

route:
  # Default receiver
  receiver: 'operations-team'
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  
  # Specific routing rules
  routes:
  - match:
      severity: critical
    receiver: 'pager-duty'
    repeat_interval: 1h
    continue: true
    
  - match_re:
      service: database|redis|elasticsearch
    receiver: 'database-team'
    
  - match_re:
      service: frontend|api
    receiver: 'application-team'

receivers:
- name: 'operations-team'
  email_configs:
  - to: 'ops@example.com'
    
- name: 'pager-duty'
  pagerduty_configs:
  - service_key: 'your-pagerduty-key'
    
- name: 'database-team'
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/YOUR_KEY'
    channel: '#db-alerts'
    
- name: 'application-team'
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/YOUR_KEY'
    channel: '#app-alerts'

This configuration ensures alerts reach the right teams with appropriate urgency.
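
You can also dry-run the routing tree with amtool to see which receivers a given label set would reach (a sketch, assuming the config above is saved as alertmanager/config.yml):

# Show where a critical database alert would be routed
docker run --rm \
  -v "$(pwd)/alertmanager:/etc/alertmanager" \
  --entrypoint amtool \
  prom/alertmanager:latest \
  config routes test \
  --config.file=/etc/alertmanager/config.yml \
  severity=critical service=database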

Step 6: Setting Up Grafana Dashboards

Configure Grafana to connect to Prometheus automatically by creating grafana/provisioning/datasources/datasource.yml:

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true

Set up dashboard provisioning in grafana/provisioning/dashboards/dashboards.yml:

apiVersion: 1

providers:
  - name: 'Default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /var/lib/grafana/dashboards

Create a grafana/dashboards directory to store dashboard JSON files:

mkdir -p grafana/dashboards

Grafana supports automatic setup through provisioning.

  • In datasource.yml, we define Prometheus as the default data source using the internal Docker URL.
  • In dashboards.yml, we tell Grafana to load dashboards from a specific folder.
  • The grafana/dashboards directory is where you’ll store those dashboard JSON files.

With this setup, Grafana connects to Prometheus and loads dashboards automatically—no manual steps needed.
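
To confirm provisioning worked once the stack is up, you can query Grafana's HTTP API (a quick check using the admin credentials defined in the Compose file):

# List provisioned data sources; Prometheus should appear with isDefault=true
curl -s -u admin:admin http://localhost:3000/api/datasources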

💡
If you want a deeper look at running Grafana in Docker containers, take a peek at our detailed guide on Grafana and Docker.

How to Create a Consolidated Dashboard for Complete System Visibility

Most setups require you to import separate dashboards for different components. Create a unified system dashboard at grafana/dashboards/system-overview.json that shows key metrics from all sources on a single screen:

{
  "title": "System Overview",
  "uid": "system-overview",
  "version": 1,
  "panels": [
    {
      "title": "CPU Usage",
      "type": "gauge",
      "gridPos": {"h": 8, "w": 6, "x": 0, "y": 0},
      "targets": [{"expr": "100 - (avg by(instance) (irate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)"}]
    },
    {
      "title": "Memory Usage",
      "type": "gauge",
      "gridPos": {"h": 8, "w": 6, "x": 6, "y": 0},
      "targets": [{"expr": "(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100"}]
    },
    {
      "title": "Disk Usage",
      "type": "gauge",
      "gridPos": {"h": 8, "w": 6, "x": 12, "y": 0},
      "targets": [{"expr": "(node_filesystem_size_bytes{mountpoint=\"/\"} - node_filesystem_free_bytes{mountpoint=\"/\"}) / node_filesystem_size_bytes{mountpoint=\"/\"} * 100"}]
    },
    {
      "title": "Container CPU Usage",
      "type": "graph",
      "gridPos": {"h": 8, "w": 12, "x": 0, "y": 8},
      "targets": [{"expr": "sum by(name) (rate(container_cpu_usage_seconds_total{name!=\"\"}[5m])) * 100"}]
    },
    {
      "title": "Container Memory Usage",
      "type": "graph",
      "gridPos": {"h": 8, "w": 12, "x": 12, "y": 8},
      "targets": [{"expr": "sum by(name) (container_memory_usage_bytes{name!=\"\"})"}]
    }
  ]
}

Update your Docker Compose file to mount this directory:

grafana:
  # ... existing configuration ...
  volumes:
    - grafana_data:/var/lib/grafana
    - ./grafana/provisioning:/etc/grafana/provisioning
    - ./grafana/dashboards:/var/lib/grafana/dashboards  # Add this line

This approach automates dashboard creation and provides a unified view of system and container metrics.
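
Once the stack is running, the provisioned dashboard should be discoverable through Grafana's search API (another quick check using the admin credentials from the Compose file):

# The provisioned dashboard should appear in Grafana's search results
curl -s -u admin:admin "http://localhost:3000/api/search?query=System%20Overview"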

Step 7: Performance Optimization for Production

For production deployments, optimize Prometheus for better performance:

# Update in docker-compose.yml
prometheus:
  # ... existing configuration ...
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
    - '--storage.tsdb.path=/prometheus'
    - '--storage.tsdb.retention.time=15d'      # Adjust retention period
    - '--storage.tsdb.wal-compression'         # Compress write-ahead log
    - '--web.enable-lifecycle'                 # Enable runtime reloading
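
If Prometheus is already running, recreate the service so the new flags take effect, then confirm them through the runtime flags endpoint (assuming the default 9090 port mapping):

# Recreate only the Prometheus service with the updated command flags
docker-compose up -d prometheus

# Inspect the effective runtime flags (retention period, WAL compression, etc.)
curl -s http://localhost:9090/api/v1/status/flags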

Implement Pre-Computed Metrics to Accelerate Dashboard Performance

Most setups calculate metrics on the fly, causing dashboard slowdowns. Pre-compute frequently used metrics with recording rules in prometheus/rules/recording_rules.yml:

groups:
- name: recording_rules
  interval: 1m
  rules:
  - record: node:cpu_usage:avg5m
    expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
    
  - record: node:memory_usage:percent
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100
    
  - record: container:cpu_usage:avg5m
    expr: sum by(name) (rate(container_cpu_usage_seconds_total{name!=""}[5m])) * 100

Then update your dashboards to use these pre-computed metrics:

# Original query
100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Optimized query using recording rule
node:cpu_usage:avg5m

This significantly improves dashboard performance, especially with many concurrent users.
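
Once the rule group has been evaluated at least once, the recorded series can be queried like any other metric, which is a quick way to confirm the recording rules are active (assuming the default port mapping):

# The recorded series should return data once the rule group has run
curl -s -G http://localhost:9090/api/v1/query \
  --data-urlencode 'query=node:cpu_usage:avg5m'

# Recording and alerting rule status is also visible here
curl -s http://localhost:9090/api/v1/rules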

Best Practices for Production Deployments

Version Pinning for Stability

Always use specific versions instead of "latest" tags for production:

prometheus:
  image: prom/prometheus:v2.40.0  # Specific version, not 'latest'
  # ...

grafana:
  image: grafana/grafana:9.3.2  # Specific version
  # ...

This ensures consistent behavior and prevents unexpected changes during updates.

Security Considerations

  1. Use Strong Passwords: Never use default credentials in production; pull secrets from environment variables instead (see the .env example after this list):
grafana:
  # ...
  environment:
    - GF_SECURITY_ADMIN_USER=admin
    - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}  # Use env variable
    - GF_USERS_ALLOW_SIGN_UP=false
    - GF_AUTH_ANONYMOUS_ENABLED=false
  2. Network Restriction: Never expose monitoring services directly to the internet. Use a reverse proxy with authentication:
nginx:
  image: nginx:latest
  ports:
    - "443:443"
  volumes:
    - ./nginx/nginx.conf:/etc/nginx/conf.d/default.conf
    - ./nginx/certs:/etc/nginx/certs
    - ./nginx/.htpasswd:/etc/nginx/.htpasswd
  3. Use Non-Root Users: Run containers with non-root users when possible:
grafana:
  # ...
  user: "472"  # Grafana's built-in user

Step 8: Launching the Monitoring Stack

Start your monitoring stack:

docker-compose up -d

Verify all containers are running:

docker-compose ps

Access the monitoring interfaces:

  • Prometheus: http://localhost:9090
  • Grafana: http://localhost:3000 (login with admin/admin)
  • cAdvisor: http://localhost:8080
  • Alertmanager: http://localhost:9093

Verify Your Setup

In Prometheus, go to Status > Targets to confirm all targets show "UP" status.

In Grafana, navigate to your dashboards to see system metrics.
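
For a quicker check from the command line, each component exposes a health or metrics endpoint (assuming the default port mappings from the Compose file):

curl -s http://localhost:9090/-/healthy       # Prometheus
curl -s http://localhost:9093/-/healthy       # Alertmanager
curl -s http://localhost:3000/api/health      # Grafana
curl -s http://localhost:9100/metrics | head  # Node Exporter (raw metrics)
curl -s http://localhost:8080/metrics | head  # cAdvisor (raw metrics)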

💡
If you’re weighing orchestration options, take a look at our comparison of Docker vs. Docker Swarm key differences to see which fits your setup best.

Step 9: Monitoring Docker Containers

cAdvisor automatically collects container metrics. View these metrics in Prometheus with queries like:

  • Container CPU: sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[1m]))
  • Container Memory: sum by (name) (container_memory_usage_bytes{name!=""})
  • Container Disk I/O: sum by (name) (rate(container_fs_reads_bytes_total{name!=""}[1m]))

Identifying Unhealthy Containers with Targeted Monitoring Queries

Add these queries to your monitoring to identify unhealthy containers:

  • Restarting Containers: sum by(name) (delta(container_start_time_seconds{name!=""}[15m])) > 0
  • High Container CPU: rate(container_cpu_usage_seconds_total{name!=""}[1m]) * 100 > 80
  • OOM Risk: container_memory_usage_bytes{name!=""} / container_spec_memory_limit_bytes{name!=""} * 100 > 80

Troubleshooting Prometheus and Grafana in Docker Compose

If you encounter problems:

No data in Grafana

  • Check Prometheus targets in Status > Targets
  • Verify Prometheus data source is working in Grafana
  • Test queries directly in Prometheus UI
  • Verify the time range in Grafana includes data collection period

Container not showing in metrics

  • Ensure cAdvisor has access to Docker socket
  • Check container is running with docker ps
  • Verify metrics in Prometheus with container_cpu_usage_seconds_total

Alerts not firing

  • Check alert rules in Prometheus UI
  • Verify Alertmanager configuration
  • Test alert notifications manually
  • Check that conditions persist long enough to trigger alerts (the for duration)

Performance issues

  • Reduce scrape frequency for less important targets
  • Use recording rules for frequently queried expressions
  • Adjust the retention period based on available disk space
  • Consider using remote storage for longer retention

Next Steps

You’ve now got a solid monitoring setup for your Docker environment. From here, it’s easy to extend—add service-specific exporters, refine dashboards, and connect alerting tools your team already uses.

If you’re starting to feel the limits of managing this stack yourself—or just want something that’s easier to scale and maintain—this is where Last9 can help.

We work with teams at Probo, CleverTap, Replit, and others to handle high-cardinality observability at scale. With native support for OpenTelemetry and Prometheus, Last9 brings metrics, logs, and traces into one place—optimized for performance, cost, and real-time debugging. We’ve even monitored 11 of the 20 largest live-streaming events in history.

Let us handle the complexity, so you can focus on building. Book some time with us today!

Authors

Prathamesh Sonpatki

Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.