Docker

Monitor Docker containers comprehensively with OpenTelemetry Collector for metrics, logs, and container lifecycle events

Monitor Docker containers and containerized applications using Last9’s OpenTelemetry endpoint. This integration collects comprehensive container metrics, logs, and lifecycle events from your Docker infrastructure.

Prerequisites

  • Docker and Docker Compose installed
  • Containers logging to stdout/stderr
  • Docker daemon with stats API access
  • Last9 account with OTLP endpoint configured

How It Works

  • OpenTelemetry Collector: Collects metrics from Docker containers using the Docker Stats API
  • Logspout: Captures logs from all Docker containers and forwards them to the OpenTelemetry Collector
  • Data Processing: The collector processes, batches, and enriches the telemetry data
  • Export: The processed data is sent to Last9 for visualization and analysis
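
As a rough sketch of how these four stages map onto collector configuration, the trimmed outline below wires one metrics pipeline and one logs pipeline. It is not a complete config: the endpoint value is a placeholder, and the full file in the Configuration section adds more receivers and processors.

```yaml
receivers:
  docker_stats: {}               # container metrics from the Docker Stats API
  tcplog/docker:                 # syslog stream forwarded by Logspout
    listen_address: "0.0.0.0:2255"
processors:
  batch: {}                      # groups telemetry before export
exporters:
  otlp/last9:
    endpoint: <your-last9-otlp-endpoint>   # placeholder, not a real value
service:
  pipelines:
    metrics:
      receivers: [docker_stats]
      processors: [batch]
      exporters: [otlp/last9]
    logs:
      receivers: [tcplog/docker]
      processors: [batch]
      exporters: [otlp/last9]
```

Each pipeline lists its components in the order data flows through them: receivers first, then processors, then exporters.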

Configuration

  1. Create Docker Compose Configuration

    Create last9-docker-monitoring.yaml for the monitoring stack:

    version: "3.8"
    services:
      otel-collector:
        image: otel/opentelemetry-collector-contrib:0.118.0
        container_name: last9-otel-collector
        command:
          [
            "--config=/etc/otel-collector-config.yaml",
            "--feature-gates=transform.flatten.logs",
          ]
        volumes:
          - ./otel-config.yaml:/etc/otel-collector-config.yaml:ro
          - /var/run/docker.sock:/var/run/docker.sock:ro
          # Host filesystem mounts for the hostmetrics receiver.
          - /proc:/hostfs/proc:ro
          - /sys:/hostfs/sys:ro
          - /etc/os-release:/hostfs/etc/os-release:ro
          - /:/hostfs/root:ro,rslave
        ports:
          - "4317:4317" # OTLP gRPC receiver
          - "4318:4318" # OTLP HTTP receiver
          - "8888:8888" # Collector's own Prometheus metrics
          - "8889:8889" # Health check endpoint
        restart: unless-stopped
        user: "0" # root user to access docker stats and /proc
        pid: host # so the process scraper sees host PIDs
        environment:
          - LOGSPOUT=ignore
          - HOST_PROC=/hostfs/proc
          - HOST_SYS=/hostfs/sys
          - HOST_ETC=/hostfs/etc
        networks:
          - last9_monitoring
        # No in-container healthcheck is defined: the collector image ships
        # without a shell or wget, so a wget-based check would always fail.
        # Probe http://localhost:8889/ from the host instead.
        labels:
          - "monitoring.last9=otel-collector"
          - "logging.disable=false"
      logspout:
        image: "gliderlabs/logspout:v3.2.14"
        container_name: last9-logspout
        volumes:
          - /etc/hostname:/etc/host_hostname:ro
          - /var/run/docker.sock:/var/run/docker.sock:ro
        command: syslog+tcp://otel-collector:2255
        depends_on:
          - otel-collector
        restart: unless-stopped
        environment:
          - LOGSPOUT_MULTICAST=true
          - BACKLOG=false
        networks:
          - last9_monitoring
        labels:
          - "monitoring.last9=logspout"
      # Optional: cAdvisor for additional container metrics
      cadvisor:
        image: gcr.io/cadvisor/cadvisor:v0.47.0
        container_name: last9-cadvisor
        privileged: true
        devices:
          - /dev/kmsg:/dev/kmsg
        volumes:
          - /:/rootfs:ro
          - /var/run:/var/run:rw
          - /sys:/sys:ro
          - /var/lib/docker/:/var/lib/docker:ro
          - /cgroup:/cgroup:ro
        ports:
          - "8080:8080"
        restart: unless-stopped
        networks:
          - last9_monitoring
        labels:
          - "monitoring.last9=cadvisor"
    networks:
      last9_monitoring:
        name: last9_monitoring
        driver: bridge
  2. Create OpenTelemetry Collector Configuration

    Create otel-config.yaml with comprehensive Docker monitoring:

    receivers:
      # Docker container metrics
      docker_stats:
        collection_interval: 30s
        timeout: 20s
        api_version: "1.40" # quoted so YAML does not read it as the float 1.4
        metrics:
          # CPU metrics
          container.cpu.usage.total:
            enabled: true
          container.cpu.usage.kernelmode:
            enabled: true
          container.cpu.usage.usermode:
            enabled: true
          container.cpu.throttling_data.periods:
            enabled: true
          container.cpu.throttling_data.throttled_periods:
            enabled: true
          container.cpu.throttling_data.throttled_time:
            enabled: true
          container.cpu.utilization:
            enabled: true
          container.cpu.percent:
            enabled: true
          # Memory metrics
          container.memory.usage.limit:
            enabled: true
          container.memory.usage.total:
            enabled: true
          container.memory.usage.max:
            enabled: true
          container.memory.percent:
            enabled: true
          container.memory.cache:
            enabled: true
          container.memory.rss:
            enabled: true
          container.memory.swap:
            enabled: true
          # Network metrics
          container.network.io.usage.rx_bytes:
            enabled: true
          container.network.io.usage.tx_bytes:
            enabled: true
          container.network.io.usage.rx_packets:
            enabled: true
          container.network.io.usage.tx_packets:
            enabled: true
          container.network.io.usage.rx_dropped:
            enabled: true
          container.network.io.usage.tx_dropped:
            enabled: true
          container.network.io.usage.rx_errors:
            enabled: true
          container.network.io.usage.tx_errors:
            enabled: true
          # Block I/O metrics
          container.blockio.io_service_bytes_recursive:
            enabled: true
          container.blockio.io_serviced_recursive:
            enabled: true
          # Process metrics
          container.pids.count:
            enabled: true
          container.pids.limit:
            enabled: true
      # Container logs via syslog from logspout
      tcplog/docker:
        listen_address: "0.0.0.0:2255"
        operators:
          - type: syslog_parser
            protocol: rfc5424
      # Host (VM / bare-metal) metrics. root_path: /hostfs + HOST_* env vars
      # point each scraper at the host filesystem rather than the collector
      # container's own /proc.
      hostmetrics:
        collection_interval: 60s
        root_path: /hostfs
        scrapers:
          cpu:
            metrics:
              system.cpu.logical.count:
                enabled: true
              system.cpu.utilization:
                enabled: true
          memory:
            metrics:
              system.memory.utilization:
                enabled: true
              system.memory.limit:
                enabled: true
          load:
          disk:
          filesystem:
            metrics:
              system.filesystem.utilization:
                enabled: true
            exclude_mount_points:
              mount_points:
                - /hostfs/sys/*
                - /hostfs/proc/*
                - /var/lib/docker/*
                - /hostfs/root/var/lib/docker/*
              match_type: regexp
          network:
          paging:
          processes:
          # mute_process_*_error: true silences routine "permission denied"
          # noise for kernel threads and short-lived processes whose
          # /proc/<pid>/{exe,io,cmdline} entries can't be read.
          process:
            mute_process_user_error: true
            mute_process_io_error: true
            mute_process_exe_error: true
            metrics:
              process.cpu.utilization:
                enabled: true
              process.memory.utilization:
                enabled: true
      # Optional: Prometheus scraping for cadvisor
      prometheus/cadvisor:
        config:
          scrape_configs:
            - job_name: "cadvisor"
              static_configs:
                - targets: ["cadvisor:8080"]
              scrape_interval: 30s
    processors:
      # Transform docker logs
      transform/docker_logs:
        error_mode: ignore
        flatten_data: true
        log_statements:
          - context: log
            statements:
              - set(log.body, log.attributes["message"])
              - delete_key(log.attributes, "message")
              - set(resource.attributes["service.name"], log.attributes["appname"])
              - set(log.attributes["container.name"], log.attributes["appname"])
              - set(log.attributes["log.source"], "docker")
      # Add resource attributes for containers
      resource/docker:
        attributes:
          - key: monitoring.tool
            value: last9-otel-docker
            action: insert
          - key: deployment.environment
            from_attribute: docker.container.label.environment
            action: insert
          - key: service.version
            from_attribute: docker.container.label.version
            action: insert
      # Batch processing for performance
      batch:
        send_batch_size: 10000
        send_batch_max_size: 10000
        timeout: 10s
      # Memory limiter to prevent OOM
      memory_limiter:
        check_interval: 1s
        limit_mib: 512
        spike_limit_mib: 128
      # Resource detection
      resourcedetection/system:
        detectors: [env, system, docker]
        system:
          hostname_sources: ["os"]
        timeout: 5s
        override: false
      # Cloud-specific VM detectors. Each detector is a no-op on hosts that
      # aren't on the matching cloud, so it is safe to list all three; only
      # the one that matches the runtime environment will populate cloud.*
      # labels. ECS / EKS / AKS detectors are omitted because they require
      # Kubernetes service env vars or ECS task metadata.
      resourcedetection/cloud:
        detectors: [ec2, gcp, azure]
        timeout: 5s
      # Exclude the monitoring stack's own containers from container metrics
      filter/exclude_monitoring:
        metrics:
          exclude:
            match_type: regexp
            resource_attributes:
              - key: container.name
                value: "(last9-otel-collector|last9-logspout|last9-cadvisor)"
      # Promote container resource attributes to datapoint attributes so they
      # are guaranteed to land as filterable labels on metrics in Last9. The
      # docker_stats receiver sets container.name / container.image.name as
      # resource attributes; this transform copies them onto each datapoint.
      transform/container_labels:
        error_mode: ignore
        metric_statements:
          - context: datapoint
            statements:
              - set(datapoint.attributes["container.name"], resource.attributes["container.name"])
              - set(datapoint.attributes["container.image.name"], resource.attributes["container.image.name"])
              - set(datapoint.attributes["container.image.tag"], resource.attributes["container.image.tag"])
              - set(datapoint.attributes["container.id"], resource.attributes["container.id"])
      # Promote host + cloud resource attributes onto each host metric
      # datapoint so they appear as filterable labels in Last9.
      transform/hostmetrics:
        error_mode: ignore
        metric_statements:
          - context: datapoint
            statements:
              - set(datapoint.attributes["host.name"], resource.attributes["host.name"])
              - set(datapoint.attributes["host.type"], resource.attributes["host.type"])
              - set(datapoint.attributes["host.image.id"], resource.attributes["host.image.id"])
              - set(datapoint.attributes["cloud.provider"], resource.attributes["cloud.provider"])
              - set(datapoint.attributes["cloud.platform"], resource.attributes["cloud.platform"])
              - set(datapoint.attributes["cloud.region"], resource.attributes["cloud.region"])
              - set(datapoint.attributes["cloud.account.id"], resource.attributes["cloud.account.id"])
              - set(datapoint.attributes["cloud.availability_zone"], resource.attributes["cloud.availability_zone"])
    exporters:
      # Debug exporter for troubleshooting
      debug:
        verbosity: basic
        sampling_initial: 2
        sampling_thereafter: 500
      # Last9 OTLP exporter
      otlp/last9:
        # $last9_otlp_endpoint and $last9_otlp_auth_header are placeholders:
        # substitute the OTLP endpoint and authorization header from your
        # Last9 account.
        endpoint: $last9_otlp_endpoint
        headers:
          Authorization: $last9_otlp_auth_header
        compression: gzip
        retry_on_failure:
          enabled: true
          initial_interval: 1s
          max_interval: 30s
          max_elapsed_time: 300s
        sending_queue:
          enabled: true
          num_consumers: 10
          queue_size: 5000
    extensions:
      # Health check extension
      health_check:
        endpoint: "0.0.0.0:8889"
      # Performance profiling
      pprof:
        endpoint: "0.0.0.0:1777"
    service:
      # The memory_ballast extension is not listed: it is no longer shipped in
      # current collector releases, and memory_limiter bounds memory use here.
      extensions: [health_check, pprof]
      pipelines:
        # Metrics pipeline
        metrics:
          receivers: [docker_stats, hostmetrics, prometheus/cadvisor]
          processors:
            [
              memory_limiter,
              resourcedetection/system,
              resourcedetection/cloud,
              resource/docker,
              filter/exclude_monitoring,
              transform/container_labels,
              transform/hostmetrics,
              batch,
            ]
          exporters: [otlp/last9]
        # Logs pipeline
        logs:
          receivers: [tcplog/docker]
          processors:
            [
              memory_limiter,
              transform/docker_logs,
              resourcedetection/system,
              resourcedetection/cloud,
              resource/docker,
              batch,
            ]
          exporters: [otlp/last9]
  3. Start the Monitoring Stack

    # Start the monitoring services
    docker compose -f last9-docker-monitoring.yaml up -d
    # Check the status
    docker compose -f last9-docker-monitoring.yaml ps
    # View logs
    docker compose -f last9-docker-monitoring.yaml logs -f otel-collector

Application Integration

Environment Variables for Containers

Set these environment variables in your application containers for better monitoring:

# docker-compose.yml for your applications
version: "3.8"
services:
  your-app:
    image: your-app:latest
    environment:
      # OpenTelemetry configuration for direct instrumentation
      - OTEL_SERVICE_NAME=your-app-service
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://last9-otel-collector:4317
      - OTEL_RESOURCE_ATTRIBUTES=service.version=1.0.0,deployment.environment=production
    labels:
      # Labels for container identification
      - "service.name=your-app-service"
      - "service.version=1.0.0"
      - "environment=production"
      - "component=api"
    networks:
      - your_app_network
      - last9_monitoring # Connect to monitoring network
    depends_on:
      - database
  database:
    image: postgres:15
    environment:
      - POSTGRES_DB=appdb
      - POSTGRES_USER=appuser
      - POSTGRES_PASSWORD=secret
    labels:
      - "service.name=postgres-db"
      - "component=database"
    networks:
      - your_app_network
      - last9_monitoring
networks:
  your_app_network:
    name: your_app_network
  last9_monitoring:
    external: true
    name: last9_monitoring
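
The example above points OTEL_EXPORTER_OTLP_ENDPOINT at port 4317 on the collector, but the otel-config.yaml from step 2 lists no otlp receiver and no traces pipeline, so the collector would need one before it can accept that data. A minimal sketch of the addition, assuming the processor and exporter names from step 2 are kept as-is:

```yaml
# Merge into otel-config.yaml (fragment, not a complete config)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resourcedetection/system, batch]
      exporters: [otlp/last9]
```

If the application also emits OTLP metrics or logs, otlp can be appended to the receivers list of the existing metrics and logs pipelines in the same way.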

Custom Log Formatting

For applications producing structured logs:

# Dockerfile with structured logging
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
# Configure structured logging
ENV NODE_ENV=production
ENV LOG_FORMAT=json
ENV LOG_LEVEL=info
# OpenTelemetry configuration
ENV OTEL_SERVICE_NAME=nodejs-app
ENV OTEL_RESOURCE_ATTRIBUTES="service.version=1.0.0,deployment.environment=production"
CMD ["node", "server.js"]

Health Check Integration

# Add health checks to your services
services:
  your-app:
    image: your-app:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    labels:
      - "health.endpoint=/health"
      - "monitoring.enabled=true"

Advanced Configuration

Custom Metrics Collection

# Add custom receivers to otel-config.yaml
receivers:
  # JMX metrics for Java applications
  jmx:
    jar_path: /opt/opentelemetry-jmx-metrics.jar
    endpoint: service:jmx:rmi:///jndi/rmi://java-app:9999/jmxrmi
    target_system: jvm
    collection_interval: 60s
  # StatsD metrics from applications
  statsd:
    endpoint: "0.0.0.0:8125"
    aggregation_interval: 60s
  # HTTP availability check (status and response time) for an application endpoint
  httpcheck:
    targets:
      - endpoint: http://your-app:3000/metrics
        method: GET
    collection_interval: 30s
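
Defining a receiver does not start it; the collector only runs receivers that a pipeline references. A sketch of the wiring, assuming the metrics pipeline from step 2 and enabling only the statsd and httpcheck receivers from the block above:

```yaml
# Fragment of otel-config.yaml: only the receivers list changes
service:
  pipelines:
    metrics:
      receivers: [docker_stats, hostmetrics, prometheus/cadvisor, statsd, httpcheck]
      # processors and exporters stay as defined in step 2
```

For applications outside the last9_monitoring network to reach the StatsD listener, the collector service would also need to publish the port (for example "8125:8125/udp" under ports). The jmx receiver is left out of this sketch because it runs the JAR at jar_path with a Java runtime, which the stock collector image is not expected to provide.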

Log Enrichment

# Enhanced log processing
processors:
  transform/enrich_logs:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          # Add container metadata
          - set(attributes["container.id"], resource.attributes["container.id"])
          - set(attributes["container.image"], resource.attributes["container.image.name"])
          # Parse JSON logs (only when the body looks like a JSON object)
          - merge_maps(cache, ParseJSON(body), "insert") where IsMatch(body, "^\\{")
          - set(body, cache["message"]) where cache["message"] != nil
          - set(attributes["log.level"], cache["level"]) where cache["level"] != nil
          - set(attributes["log.timestamp"], cache["timestamp"]) where cache["timestamp"] != nil
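
To make the effect of these statements concrete, here is a hypothetical log record before and after transform/enrich_logs, written in YAML notation purely for illustration. The field values, and the presence of container.id and container.image.name on the resource, are assumptions.

```yaml
# Before: body is a JSON string, no log attributes yet
resource:
  container.id: "abc123"
  container.image.name: "your-app"
body: '{"level":"error","message":"db timeout","timestamp":"2024-05-01T12:00:00Z"}'
attributes: {}
---
# After: message promoted to body, remaining fields copied to attributes
resource:
  container.id: "abc123"
  container.image.name: "your-app"
body: "db timeout"
attributes:
  container.id: "abc123"
  container.image: "your-app"
  log.level: "error"
  log.timestamp: "2024-05-01T12:00:00Z"
```

A record whose body is not JSON passes through with only the container attributes added, since the remaining statements are guarded by their where clauses.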

Network Monitoring

# Add network inspection
receivers:
  tcplog/network:
    listen_address: "0.0.0.0:2256"
    operators:
      - type: json_parser
processors:
  transform/network:
    log_statements:
      - context: log
        statements:
          - set(attributes["network.protocol"], "tcp")
          - set(attributes["monitoring.type"], "network")
Kubernetes Integration

For Kubernetes environments, use a DaemonSet approach:

# kubernetes-docker-monitoring.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: last9-otel-collector
  namespace: monitoring
spec:
  selector:
    matchLabels:
      name: last9-otel-collector
  template:
    metadata:
      labels:
        name: last9-otel-collector
    spec:
      serviceAccountName: otel-collector
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:0.118.0
          # Point the collector at the mounted config file
          args: ["--config=/etc/otel-collector-config.yaml"]
          env:
            - name: K8S_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: LAST9_OTLP_ENDPOINT
              valueFrom:
                secretKeyRef:
                  name: last9-credentials
                  key: endpoint
            - name: LAST9_OTLP_AUTH_HEADER
              valueFrom:
                secretKeyRef:
                  name: last9-credentials
                  key: auth-header
          volumeMounts:
            - name: config
              mountPath: /etc/otel-collector-config.yaml
              subPath: config.yaml
            - name: docker-socket
              mountPath: /var/run/docker.sock
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
        - name: docker-socket
          hostPath:
            path: /var/run/docker.sock
Monitoring Capabilities

This integration provides:

  • Container Metrics: CPU, memory, network, disk I/O, process counts
  • Container Logs: All stdout/stderr output with metadata
  • Host Metrics: System-level resource usage
  • Container Lifecycle: Start, stop, restart events
  • Network Traffic: Inter-container communication patterns
  • Resource Limits: Utilization vs. configured limits
  • Health Status: Container and service health checks

Best Practices

  1. Resource Management: Set appropriate memory limits for the collector
  2. Network Configuration: Isolate monitoring traffic when possible
  3. Security: Use read-only Docker socket access
  4. Retention: Configure appropriate log retention policies
  5. Alerting: Set up alerts for high resource usage or container failures
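
For the first practice, one way to cap the collector's memory is a container limit plus a Go runtime soft limit. The values below are illustrative assumptions, chosen only so that the container limit sits above the 512 MiB memory_limiter setting used earlier.

```yaml
# Fragment of last9-docker-monitoring.yaml
services:
  otel-collector:
    mem_limit: 768m            # hard cap enforced by Docker
    environment:
      - GOMEMLIMIT=640MiB      # Go runtime soft limit, kept below mem_limit
```

Keeping GOMEMLIMIT below the container limit gives the Go garbage collector room to reclaim memory before Docker kills the container.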

Troubleshooting

  • No container metrics appearing. Verify the Docker socket is mounted on the collector and readable, check the collector runs as user: "0" so it can read the socket, and confirm docker_stats is enabled in the metrics pipeline.

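    # Run on the host (the collector image has no shell utilities for docker exec)
    ls -la /var/run/docker.sock
    docker inspect --format '{{json .Mounts}}' last9-otel-collector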
  • Missing logs. Verify that Logspout can access the Docker socket, that containers log to stdout/stderr (Logspout does not pick up file-based logs), and that Logspout and the collector share a Docker network so the syslog endpoint (syslog+tcp://otel-collector:2255) is reachable.

  • High resource usage. Adjust collection intervals, configure memory limits, and use batch processing. Example tuning for high-volume environments:

    processors:
      batch:
        send_batch_size: 50000
        send_batch_max_size: 50000
        timeout: 30s
      memory_limiter:
        limit_mib: 1024
        spike_limit_mib: 256
  • Need to inspect collector behavior. Enable debug logging and check connectivity:

    # Check collector logs
    docker logs last9-otel-collector -f
    # Check logspout logs
    docker logs last9-logspout -f
    # Test the health endpoint from the host (port 8889 is published;
    # the collector image has no wget for docker exec)
    wget -qO- http://localhost:8889/
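
The last two items mention adjusting collection intervals and enabling debug logging without showing either. A sketch of both as a fragment of otel-config.yaml; the interval values are illustrative, and the debug exporter is the one already defined in step 2.

```yaml
receivers:
  docker_stats:
    collection_interval: 60s    # step 2 uses 30s
  hostmetrics:
    collection_interval: 120s   # step 2 uses 60s
service:
  telemetry:
    logs:
      level: debug              # verbose collector logs
  pipelines:
    logs:
      # receivers and processors stay as defined in step 2
      exporters: [otlp/last9, debug]
```

Both the debug log level and the debug exporter add load, so they are best reverted once the issue is understood.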

Please get in touch with us on Discord or Email if you have any questions.