
Docker Integration

Monitor Docker containers comprehensively with OpenTelemetry Collector for metrics, logs, and container lifecycle events

Monitor Docker containers and containerized applications using Last9’s OpenTelemetry endpoint. This integration collects comprehensive container metrics, logs, and lifecycle events from your Docker infrastructure.

Prerequisites

  • Docker and Docker Compose installed
  • Containers logging to stdout/stderr
  • Docker daemon with stats API access
  • Last9 account with OTLP endpoint configured

How It Works

  • OpenTelemetry Collector: Collects metrics from Docker containers using the Docker Stats API
  • Logspout: Captures logs from all Docker containers and forwards them to the OpenTelemetry Collector
  • Data Processing: The collector processes, batches, and enriches the telemetry data
  • Export: The processed data is sent to Last9 for visualization and analysis
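As a rough sketch of how these stages connect, the collector wires each receiver through its processors to the exporter in `service.pipelines`. The fragment below is trimmed from the full configuration in step 2 (component names match that file) and is not a complete configuration on its own:

```yaml
# Sketch only: every component named here must also be defined in its own
# top-level receivers/processors/exporters section (see otel-config.yaml).
service:
  pipelines:
    metrics:
      receivers: [docker_stats]           # Docker Stats API via /var/run/docker.sock
      processors: [memory_limiter, batch] # memory_limiter first, batch last
      exporters: [otlp/last9]
    logs:
      receivers: [tcplog/docker]          # syslog stream sent by logspout to port 2255
      processors: [memory_limiter, transform/docker_logs, batch]
      exporters: [otlp/last9]
```

memory_limiter comes first so it can refuse data before other processors allocate memory, and batch comes last so data is grouped just before export; the full configuration uses the same order.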

Configuration

  1. Create Docker Compose Configuration

    Create last9-docker-monitoring.yaml for the monitoring stack:

    version: "3.8"
    services:
      otel-collector:
        image: otel/opentelemetry-collector-contrib:0.118.0
        container_name: last9-otel-collector
        command:
          [
            "--config=/etc/otel-collector-config.yaml",
            "--feature-gates=transform.flatten.logs",
          ]
        volumes:
          - ./otel-config.yaml:/etc/otel-collector-config.yaml
          - /var/run/docker.sock:/var/run/docker.sock:ro
          - /sys/fs/cgroup:/hostfs/sys/fs/cgroup:ro
          - /proc:/hostfs/proc:ro
          - /etc/os-release:/etc/os-release:ro
        ports:
          - "4317:4317" # OTLP gRPC receiver
          - "4318:4318" # OTLP HTTP receiver
          - "8888:8888" # Collector's internal Prometheus metrics
          - "8889:8889" # Health check endpoint
        restart: unless-stopped
        user: "0" # root user to access docker stats and host metrics
        environment:
          - LOGSPOUT=ignore # keep logspout from forwarding the collector's own logs
          - HOST_PROC=/hostfs/proc
          - HOST_SYS=/hostfs/sys
          - HOST_ETC=/hostfs/etc
        networks:
          - last9_monitoring
        # No container healthcheck: the collector image is built FROM scratch and
        # contains no shell, wget, or curl. Probe the health_check extension from
        # the host instead: curl http://localhost:8889/
        labels:
          - "monitoring.last9=otel-collector"
          - "logging.disable=false"
      logspout:
        image: "gliderlabs/logspout:v3.2.14"
        container_name: last9-logspout
        volumes:
          - /etc/hostname:/etc/host_hostname:ro
          - /var/run/docker.sock:/var/run/docker.sock:ro
        command: syslog+tcp://otel-collector:2255
        depends_on:
          otel-collector:
            # service_healthy cannot work without a healthcheck; if logspout exits
            # because the collector is not listening yet, restart: unless-stopped retries
            condition: service_started
        restart: unless-stopped
        environment:
          - LOGSPOUT_MULTICAST=true
          - BACKLOG=false
        networks:
          - last9_monitoring
          # Add your application networks here to monitor their containers
          # - your_app_network_1
          # - your_app_network_2
        labels:
          - "monitoring.last9=logspout"
      # Optional: Cadvisor for additional container metrics
      cadvisor:
        image: gcr.io/cadvisor/cadvisor:v0.47.0
        container_name: last9-cadvisor
        privileged: true
        devices:
          - /dev/kmsg:/dev/kmsg
        volumes:
          - /:/rootfs:ro
          - /var/run:/var/run:rw
          - /sys:/sys:ro
          - /var/lib/docker/:/var/lib/docker:ro
          - /cgroup:/cgroup:ro
        ports:
          - "8080:8080"
        restart: unless-stopped
        networks:
          - last9_monitoring
        labels:
          - "monitoring.last9=cadvisor"
    networks:
      last9_monitoring:
        name: last9_monitoring
        driver: bridge
      # Add your existing application networks as external
      # Uncomment and modify as needed:
      # your_app_network_1:
      #   external: true
      #   name: your_app_network_1
      # your_app_network_2:
      #   external: true
      #   name: your_app_network_2
  2. Create OpenTelemetry Collector Configuration

    Create otel-config.yaml with comprehensive Docker monitoring:

    receivers:
      # Docker container metrics
      docker_stats:
        collection_interval: 30s
        timeout: 20s
        # Quote the version: an unquoted 1.40 is parsed as the number 1.4
        api_version: "1.40"
        metrics:
          # CPU metrics
          container.cpu.usage.total:
            enabled: true
          container.cpu.usage.kernelmode:
            enabled: true
          container.cpu.usage.usermode:
            enabled: true
          container.cpu.throttling_data.periods:
            enabled: true
          container.cpu.throttling_data.throttled_periods:
            enabled: true
          container.cpu.throttling_data.throttled_time:
            enabled: true
          # container.cpu.utilization replaces the deprecated container.cpu.percent
          container.cpu.utilization:
            enabled: true
          # Memory metrics
          container.memory.usage.limit:
            enabled: true
          container.memory.usage.total:
            enabled: true
          container.memory.usage.max:
            enabled: true
          container.memory.percent:
            enabled: true
          container.memory.cache:
            enabled: true
          container.memory.rss:
            enabled: true
          # Network metrics
          container.network.io.usage.rx_bytes:
            enabled: true
          container.network.io.usage.tx_bytes:
            enabled: true
          container.network.io.usage.rx_packets:
            enabled: true
          container.network.io.usage.tx_packets:
            enabled: true
          container.network.io.usage.rx_dropped:
            enabled: true
          container.network.io.usage.tx_dropped:
            enabled: true
          container.network.io.usage.rx_errors:
            enabled: true
          container.network.io.usage.tx_errors:
            enabled: true
          # Block I/O metrics
          container.blockio.io_service_bytes_recursive:
            enabled: true
          container.blockio.io_serviced_recursive:
            enabled: true
          # Process metrics
          container.pids.count:
            enabled: true
          container.pids.limit:
            enabled: true
      # Container logs via syslog from logspout
      tcplog/docker:
        listen_address: "0.0.0.0:2255"
        operators:
          - type: syslog_parser
            protocol: rfc5424
      # OTLP from instrumented applications (ports 4317/4318 in the compose file)
      otlp:
        protocols:
          grpc:
            endpoint: "0.0.0.0:4317"
          http:
            endpoint: "0.0.0.0:4318"
      # Optional: Host metrics if cadvisor is not used
      hostmetrics:
        collection_interval: 60s
        scrapers:
          cpu:
            metrics:
              system.cpu.utilization:
                enabled: true
          memory:
            metrics:
              system.memory.utilization:
                enabled: true
          disk:
            metrics:
              system.disk.io_time:
                enabled: true
          network:
          load:
          filesystem:
      # Optional: Prometheus scraping for cadvisor
      prometheus/cadvisor:
        config:
          scrape_configs:
            - job_name: "cadvisor"
              static_configs:
                - targets: ["cadvisor:8080"]
              scrape_interval: 30s
    processors:
      # Transform docker logs
      transform/docker_logs:
        error_mode: ignore
        flatten_data: true # requires --feature-gates=transform.flatten.logs
        log_statements:
          - context: log
            statements:
              - set(body, attributes["message"])
              - delete_key(attributes, "message")
              - set(resource.attributes["service.name"], attributes["appname"])
              - set(attributes["container.name"], attributes["appname"])
              - set(attributes["log.source"], "docker")
      # Add resource attributes for containers
      resource/docker:
        attributes:
          - key: monitoring.tool
            value: last9-otel-docker
            action: insert
          - key: deployment.environment
            from_attribute: docker.container.label.environment
            action: insert
          - key: service.version
            from_attribute: docker.container.label.version
            action: insert
      # Batch processing for performance
      batch:
        send_batch_size: 10000
        send_batch_max_size: 10000
        timeout: 10s
      # Memory limiter to prevent OOM (check_interval is required)
      memory_limiter:
        check_interval: 1s
        limit_mib: 512
        spike_limit_mib: 128
      # Resource detection
      resourcedetection:
        detectors: [env, system, docker]
        timeout: 5s
        override: false
      # Drop metrics from the monitoring stack's own containers
      filter/exclude_monitoring:
        metrics:
          exclude:
            match_type: regexp
            resource_attributes:
              - key: container.name
                value: "(last9-otel-collector|last9-logspout|last9-cadvisor)"
    exporters:
      # Debug exporter for troubleshooting
      debug:
        verbosity: basic
        sampling_initial: 2
        sampling_thereafter: 500
      # Last9 OTLP exporter: replace both placeholders with the values from your Last9 account
      otlp/last9:
        endpoint: $last9_otlp_endpoint
        headers:
          Authorization: $last9_otlp_auth_header
        compression: gzip
        retry_on_failure:
          enabled: true
          initial_interval: 1s
          max_interval: 30s
          max_elapsed_time: 300s
        sending_queue:
          enabled: true
          num_consumers: 10
          queue_size: 5000
    extensions:
      # Health check extension
      health_check:
        endpoint: "0.0.0.0:8889"
      # Performance profiling
      pprof:
        endpoint: "0.0.0.0:1777"
      # memory_ballast is no longer shipped in recent collector releases;
      # set the GOMEMLIMIT environment variable on the container instead
    service:
      extensions: [health_check, pprof]
      pipelines:
        # Metrics pipeline
        metrics:
          receivers: [docker_stats, hostmetrics, prometheus/cadvisor, otlp]
          processors:
            [
              memory_limiter,
              resourcedetection,
              resource/docker,
              filter/exclude_monitoring,
              batch,
            ]
          exporters: [otlp/last9]
        # Logs pipeline
        logs:
          receivers: [tcplog/docker]
          processors:
            [
              memory_limiter,
              transform/docker_logs,
              resourcedetection,
              resource/docker,
              batch,
            ]
          exporters: [otlp/last9]
        # Traces and logs sent over OTLP by instrumented applications
        traces:
          receivers: [otlp]
          processors: [memory_limiter, resourcedetection, batch]
          exporters: [otlp/last9]
        logs/otlp:
          receivers: [otlp]
          processors: [memory_limiter, resourcedetection, batch]
          exporters: [otlp/last9]
  3. Start the Monitoring Stack

    # Start the monitoring services
    docker compose -f last9-docker-monitoring.yaml up -d
    # Check the status
    docker compose -f last9-docker-monitoring.yaml ps
    # View logs
    docker compose -f last9-docker-monitoring.yaml logs -f otel-collector
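If you would rather not paste credentials into otel-config.yaml, the collector can read them from its environment with the `${env:NAME}` syntax. A minimal sketch, assuming you choose the variable names `LAST9_OTLP_ENDPOINT` and `LAST9_OTLP_AUTH_HEADER` (the same names the Kubernetes example uses) and keep their values in a `.env` file next to the compose file:

```yaml
# otel-config.yaml: the exporter reads its settings from environment variables
exporters:
  otlp/last9:
    endpoint: ${env:LAST9_OTLP_ENDPOINT}
    headers:
      Authorization: ${env:LAST9_OTLP_AUTH_HEADER}

# last9-docker-monitoring.yaml: add these entries to the otel-collector
# service's environment list. Compose substitutes ${...} from your shell
# or from a .env file in the project directory.
#
#       - LAST9_OTLP_ENDPOINT=${LAST9_OTLP_ENDPOINT}
#       - LAST9_OTLP_AUTH_HEADER=${LAST9_OTLP_AUTH_HEADER}
```

After changing the environment, run `docker compose -f last9-docker-monitoring.yaml up -d` again; after editing only otel-config.yaml, restart the collector with `docker compose -f last9-docker-monitoring.yaml restart otel-collector`.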

Application Integration

Environment Variables for Containers

Set these environment variables and labels in your application containers so their telemetry can be identified and attributed in Last9:

# docker-compose.yml for your applications
version: "3.8"
services:
  your-app:
    image: your-app:latest
    environment:
      # OpenTelemetry configuration for direct instrumentation
      - OTEL_SERVICE_NAME=your-app-service
      # 4317 is OTLP/gRPC; use http://last9-otel-collector:4318 if your SDK exports OTLP over HTTP
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://last9-otel-collector:4317
      - OTEL_RESOURCE_ATTRIBUTES=service.version=1.0.0,deployment.environment=production
    labels:
      # Labels for container identification
      - "service.name=your-app-service"
      - "service.version=1.0.0"
      - "environment=production"
      - "component=api"
    networks:
      - your_app_network
      - last9_monitoring # Connect to monitoring network
    depends_on:
      - database
  database:
    image: postgres:15
    environment:
      - POSTGRES_DB=appdb
      - POSTGRES_USER=appuser
      - POSTGRES_PASSWORD=secret
    labels:
      - "service.name=postgres-db"
      - "component=database"
    networks:
      - your_app_network
      - last9_monitoring
networks:
  your_app_network:
    name: your_app_network
  last9_monitoring:
    external: true
    name: last9_monitoring
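The labels above are visible to Docker, but docker_stats does not copy container labels onto metrics by default. Its `container_labels_to_metric_labels` setting maps Docker label names to resource attribute names. A sketch that reuses the labels from the example; merge it into the existing `docker_stats` block in otel-config.yaml (the attribute names on the right are a choice, not a requirement):

```yaml
receivers:
  docker_stats:
    # Docker label name -> resource attribute name on emitted metrics
    container_labels_to_metric_labels:
      service.name: service.name
      environment: deployment.environment
      component: component
```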

Custom Log Formatting

For applications producing structured logs:

# Dockerfile with structured logging
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
# Configure structured logging (these variables only take effect if your app's logger reads them)
ENV NODE_ENV=production
ENV LOG_FORMAT=json
ENV LOG_LEVEL=info
# OpenTelemetry configuration
ENV OTEL_SERVICE_NAME=nodejs-app
ENV OTEL_RESOURCE_ATTRIBUTES="service.version=1.0.0,deployment.environment=production"
CMD ["node", "server.js"]

Health Check Integration

# Add health checks to your services
services:
  your-app:
    image: your-app:latest
    healthcheck:
      # curl must be installed in the image (Alpine-based images often lack it)
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    labels:
      - "health.endpoint=/health"
      - "monitoring.enabled=true"
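The Docker healthcheck above only changes the container's status inside Docker; this configuration does not collect that status. One way to see the same endpoint in Last9 is to probe it from the collector with the httpcheck receiver. A sketch, assuming `your-app` shares the `last9_monitoring` network with the collector and listens on port 3000 (the receiver name `httpcheck/your_app` is illustrative):

```yaml
# otel-config.yaml
receivers:
  httpcheck/your_app:
    targets:
      - endpoint: http://your-app:3000/health
        method: GET
    collection_interval: 30s
# Then add httpcheck/your_app to service.pipelines.metrics.receivers.
```

The receiver reports metrics such as `httpcheck.status` and `httpcheck.duration` for each target.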

Advanced Configuration

Custom Metrics Collection

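# Add custom receivers to otel-config.yaml, then reference them in a pipeline
receivers:
  # JMX metrics for Java applications (the collector needs a Java runtime to
  # launch the JMX metrics gatherer JAR; the contrib image does not include one)
  jmx:
    jar_path: /opt/opentelemetry-jmx-metrics.jar
    endpoint: service:jmx:rmi:///jndi/rmi://java-app:9999/jmxrmi
    target_system: jvm
    collection_interval: 60s
  # StatsD metrics from applications (UDP port 8125 must be reachable)
  statsd:
    endpoint: "0.0.0.0:8125"
    aggregation_interval: 60s
  # HTTP availability check: records status and response time,
  # not the metrics the endpoint serves
  httpcheck:
    targets:
      - endpoint: http://your-app:3000/metrics
        method: GET
    collection_interval: 30s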
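Receivers only start when a pipeline references them. A sketch of a separate metrics pipeline for the receivers above (the pipeline name `metrics/custom` and the `prometheus/your_app` receiver are illustrative). Because httpcheck measures availability rather than reading what /metrics serves, the sketch adds a Prometheus receiver to scrape that endpoint, assuming it serves the Prometheus text format:

```yaml
# otel-config.yaml
receivers:
  # Scrape Prometheus-format metrics exposed by the application
  prometheus/your_app:
    config:
      scrape_configs:
        - job_name: "your-app"
          scrape_interval: 30s
          metrics_path: /metrics
          static_configs:
            - targets: ["your-app:3000"]

service:
  pipelines:
    # Add alongside the existing pipelines. jmx is left out because it
    # needs a Java runtime inside the collector container.
    metrics/custom:
      receivers: [statsd, httpcheck, prometheus/your_app]
      processors: [memory_limiter, batch]
      exporters: [otlp/last9]
```

StatsD clients outside the monitoring network also need the UDP port published, for example `- "8125:8125/udp"` under the collector's `ports`.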

Log Enrichment

# Enhanced log processing
processors:
  transform/enrich_logs:
    error_mode: ignore # log statement errors and continue with the next record
    log_statements:
      - context: log
        statements:
          # Add container metadata
          - set(attributes["container.id"], resource.attributes["container.id"])
          - set(attributes["container.image"], resource.attributes["container.image.name"])
          # Parse JSON logs (only bodies that look like JSON objects)
          - merge_maps(cache, ParseJSON(body), "insert") where IsMatch(body, "^\\{")
          - set(body, cache["message"]) where cache["message"] != nil
          - set(attributes["log.level"], cache["level"]) where cache["level"] != nil
          - set(attributes["log.timestamp"], cache["timestamp"]) where cache["timestamp"] != nil
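The block above stores the parsed level in an attribute, but OTLP log records also carry a dedicated `severity_text` field that backends commonly use for filtering. A sketch of a separate processor that copies the level there, plus a logs pipeline order that runs JSON parsing after transform/docker_logs has moved the syslog message into the body (the processor name `transform/severity` is illustrative):

```yaml
processors:
  transform/severity:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          # Copy the level parsed by transform/enrich_logs into the OTLP severity field
          - set(severity_text, attributes["log.level"]) where attributes["log.level"] != nil

service:
  pipelines:
    logs:
      receivers: [tcplog/docker]
      processors:
        [
          memory_limiter,
          transform/docker_logs,
          transform/enrich_logs,
          transform/severity,
          resourcedetection,
          resource/docker,
          batch,
        ]
      exporters: [otlp/last9]
```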

Network Monitoring

# Accept newline-delimited JSON over TCP on port 2256 (for example, from a network monitoring tool)
receivers:
  tcplog/network:
    listen_address: "0.0.0.0:2256"
    operators:
      - type: json_parser
processors:
  transform/network:
    log_statements:
      - context: log
        statements:
          - set(attributes["network.protocol"], "tcp")
          - set(attributes["monitoring.type"], "network")
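Like any receiver, tcplog/network does nothing until a pipeline uses it. A sketch of a dedicated logs pipeline for it (the name `logs/network` is illustrative); the sender must write one JSON object per line to port 2256, which is reachable only from the last9_monitoring network unless you also publish it in the compose file:

```yaml
service:
  pipelines:
    # Add alongside the existing pipelines in otel-config.yaml
    logs/network:
      receivers: [tcplog/network]
      processors: [memory_limiter, transform/network, batch]
      exporters: [otlp/last9]
```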

Kubernetes Integration

For Kubernetes environments, use a DaemonSet approach:

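# kubernetes-docker-monitoring.yaml
# Requires nodes whose container runtime is Docker Engine; nodes running
# containerd or CRI-O typically have no /var/run/docker.sock to mount.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: last9-otel-collector
  namespace: monitoring
spec:
  selector:
    matchLabels:
      name: last9-otel-collector
  template:
    metadata:
      labels:
        name: last9-otel-collector
    spec:
      serviceAccountName: otel-collector
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:0.118.0
          # Point the collector at the mounted config (the image's default path differs)
          args: ["--config=/etc/otel-collector-config.yaml"]
          securityContext:
            runAsUser: 0 # read docker.sock, like user: "0" in the compose file
          env:
            - name: K8S_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            # Reference these in the config as ${env:LAST9_OTLP_ENDPOINT}
            # and ${env:LAST9_OTLP_AUTH_HEADER}
            - name: LAST9_OTLP_ENDPOINT
              valueFrom:
                secretKeyRef:
                  name: last9-credentials
                  key: endpoint
            - name: LAST9_OTLP_AUTH_HEADER
              valueFrom:
                secretKeyRef:
                  name: last9-credentials
                  key: auth-header
          volumeMounts:
            - name: config
              mountPath: /etc/otel-collector-config.yaml
              subPath: config.yaml
            - name: docker-socket
              mountPath: /var/run/docker.sock
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
        - name: docker-socket
          hostPath:
            path: /var/run/docker.sock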
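The DaemonSet references a ServiceAccount, a Secret, and a ConfigMap that must already exist in the `monitoring` namespace. A minimal sketch of all three; the secret values are placeholders, and the config is a metrics-only subset of otel-config.yaml that reads credentials from the environment variables the DaemonSet sets:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector
  namespace: monitoring
---
apiVersion: v1
kind: Secret
metadata:
  name: last9-credentials
  namespace: monitoring
type: Opaque
stringData:
  endpoint: "<your Last9 OTLP endpoint>"   # placeholder
  auth-header: "<your Last9 auth header>"  # placeholder
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: monitoring
data:
  config.yaml: |
    receivers:
      docker_stats:
        endpoint: unix:///var/run/docker.sock
        collection_interval: 30s
        api_version: "1.40"
    processors:
      memory_limiter:
        check_interval: 1s
        limit_mib: 512
        spike_limit_mib: 128
      # Tag every metric with the node this DaemonSet pod runs on
      resource/node:
        attributes:
          - key: k8s.node.name
            value: ${env:K8S_NODE_NAME}
            action: upsert
      batch: {}
    exporters:
      otlp/last9:
        endpoint: ${env:LAST9_OTLP_ENDPOINT}
        headers:
          Authorization: ${env:LAST9_OTLP_AUTH_HEADER}
    service:
      pipelines:
        metrics:
          receivers: [docker_stats]
          processors: [memory_limiter, resource/node, batch]
          exporters: [otlp/last9]
```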

Troubleshooting

Common Issues

  1. No container metrics appearing:

    • Verify Docker socket access: ls -la /var/run/docker.sock
    • Check that the collector runs as root (user: "0"), which it needs to read the Docker socket
    • Check the collector logs for docker_stats errors, such as an unsupported api_version
  2. Missing logs:

    • Verify logspout can access Docker socket
    • Check containers are logging to stdout/stderr
    • Ensure logspout can reach otel-collector:2255 over the last9_monitoring network
  3. High resource usage:

    • Adjust collection intervals
    • Configure memory limits
    • Use batch processing
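For high resource usage, the cheapest levers are usually longer collection intervals and fewer series. A sketch of settings to change in the existing otel-config.yaml sections (values are illustrative; keep the rest of each block as it is):

```yaml
receivers:
  docker_stats:
    collection_interval: 60s # was 30s: half as many Docker Stats API calls
    metrics:
      # Disable optional series you do not chart or alert on
      container.network.io.usage.rx_packets:
        enabled: false
      container.network.io.usage.tx_packets:
        enabled: false
  hostmetrics:
    collection_interval: 120s # was 60s
```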

Debug Mode

Inspect the stack's logs and the collector's health endpoint:

# Check collector logs
docker logs last9-otel-collector -f
# Check logspout logs
docker logs last9-logspout -f
# Query the health_check endpoint published on host port 8889
# (run this from the host: the collector image has no shell, wget, or curl)
curl -s http://localhost:8889/
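The commands above only show what the collector already logs at its default level. To raise the collector's own log level, and to print a sample of the records flowing through the logs pipeline with the `debug` exporter that otel-config.yaml already defines, a sketch of the changes (revert both after troubleshooting):

```yaml
exporters:
  debug:
    verbosity: detailed # print record contents, not just counts

service:
  telemetry:
    logs:
      level: debug # collector's own log level (default: info)
  pipelines:
    logs:
      receivers: [tcplog/docker]
      processors:
        [memory_limiter, transform/docker_logs, resourcedetection, resource/docker, batch]
      exporters: [otlp/last9, debug]
```

The debug output goes to the collector's stdout, so `docker logs last9-otel-collector -f` shows it; because the collector container sets `LOGSPOUT=ignore`, that output is not fed back into the pipeline.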

Performance Tuning

# Optimize for high-volume environments
processors:
  batch:
    send_batch_size: 50000
    send_batch_max_size: 50000
    timeout: 30s
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024 # keep below the collector container's memory limit
    spike_limit_mib: 256
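The memory_limiter values only help if the container itself has room for them. A sketch of matching container settings, assuming a 1.5 GiB budget for the collector; add these keys to the otel-collector service in last9-docker-monitoring.yaml, appending the variable to its existing environment list (the numbers are illustrative):

```yaml
services:
  otel-collector:
    # Hard cap enforced by Docker; keep memory_limiter's limit_mib below it
    mem_limit: 1536m
    environment:
      # Soft limit for the Go runtime, roughly 80% of mem_limit, so garbage
      # collection runs harder before the container reaches its cap
      - GOMEMLIMIT=1200MiB
```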

Monitoring Capabilities

This integration provides:

  • Container Metrics: CPU, memory, network, disk I/O, process counts
  • Container Logs: All stdout/stderr output with metadata
  • Host Metrics: System-level resource usage
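  • Container Lifecycle: Restart counts and uptime, via the optional container.restarts and container.uptime docker_stats metrics
  • Network Traffic: Per-container bytes, packets, drops, and errors received and sent
  • Resource Limits: Utilization vs. configured limits
  • Health Status: Collector health via the health_check extension; application endpoints via the optional httpcheck receiver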

Best Practices

  1. Resource Management: Set appropriate memory limits for collector
  2. Network Configuration: Isolate monitoring traffic when possible
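  3. Security: Mount the Docker socket read-only (:ro), but note that the flag does not restrict Docker API calls made over the socket; treat any container that mounts it as having root-equivalent access to the host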
  4. Retention: Configure appropriate log retention policies
  5. Alerting: Set up alerts for high resource usage or container failures
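For the retention point, two separate stores matter: retention in Last9, and the container log files Docker keeps on the host, which logspout reads through the Docker logs API. A sketch of host-side rotation for an application service (values are illustrative):

```yaml
services:
  your-app:
    logging:
      driver: json-file # supports the Docker logs API that logspout reads from
      options:
        max-size: "10m" # rotate each file at 10 MB
        max-file: "3"   # keep at most three files per container
```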

Your Docker infrastructure will now provide comprehensive observability data to Last9, enabling detailed monitoring of containerized applications and infrastructure health.