Docker
Monitor Docker containers comprehensively with OpenTelemetry Collector for metrics, logs, and container lifecycle events
Monitor Docker containers and containerized applications using Last9’s OpenTelemetry endpoint. This integration collects comprehensive container metrics, logs, and lifecycle events from your Docker infrastructure.
Prerequisites
- Docker and Docker Compose installed
- Containers logging to stdout/stderr
- Docker daemon with stats API access
- Last9 account with OTLP endpoint configured
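The local prerequisites above can be checked from a shell before deploying anything. The following is a minimal sketch, not part of Last9's tooling: the function names `have_cmd` and `check_prereqs` are hypothetical, and `/var/run/docker.sock` is assumed as the default socket path because that is the path mounted in the Compose file in this guide. It does not verify the Last9 account or whether your containers log to stdout/stderr.

```shell
#!/bin/sh
# Preflight sketch: report which local prerequisites appear to be missing.
# Function names and the default socket path are illustrative assumptions.

# have_cmd NAME: succeed if NAME resolves to a command on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# check_prereqs [SOCKET]: print one line per check; return non-zero if any check fails.
check_prereqs() {
  sock="${1:-/var/run/docker.sock}"
  rc=0
  if have_cmd docker; then
    echo "ok: docker CLI found"
  else
    echo "missing: docker CLI"
    rc=1
  fi
  if have_cmd docker && docker compose version >/dev/null 2>&1; then
    echo "ok: docker compose plugin found"
  else
    echo "missing: docker compose plugin"
    rc=1
  fi
  if [ -S "$sock" ]; then
    echo "ok: Docker socket at $sock"
  else
    echo "missing: Docker socket at $sock"
    rc=1
  fi
  return "$rc"
}
```

After sourcing the file, `check_prereqs` prints one line per check and returns a non-zero status if anything is missing, so it can gate a deployment script.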
How It Works
- OpenTelemetry Collector: Collects metrics from Docker containers using the Docker Stats API
- Logspout: Captures logs from all Docker containers and forwards them to the OpenTelemetry Collector
- Data Processing: The collector processes, batches, and enriches the telemetry data
- Export: The processed data is sent to Last9 for visualization and analysis
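To make the log path more concrete, the sketch below builds one RFC 5424 syslog line of the kind the collector's `syslog_parser` is configured to parse. It is an illustration only: the host name, application name, process ID, and message are made-up values, and the exact fields Logspout fills in depend on its configuration. The collector configuration in this guide reads the parsed `appname` and `message` fields, which correspond to the APP-NAME and MSG positions shown here.

```shell
#!/bin/sh
# Sketch of the RFC 5424 layout; all sample values below are hypothetical.

# syslog_pri FACILITY SEVERITY: PRI = facility * 8 + severity (RFC 5424).
syslog_pri() {
  echo $(( $1 * 8 + $2 ))
}

# rfc5424_line PRI TIMESTAMP HOSTNAME APP-NAME PROCID MSG
# Layout: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
# MSGID and STRUCTURED-DATA are emitted as the nil value "-".
rfc5424_line() {
  printf '<%s>1 %s %s %s %s - - %s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Facility 1 (user-level) with severity 6 (informational) gives PRI 14.
pri=$(syslog_pri 1 6)
rfc5424_line "$pri" "2024-01-01T00:00:00Z" "docker-host" "your-app" "1234" "request served"
# prints: <14>1 2024-01-01T00:00:00Z docker-host your-app 1234 - - request served
```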
Configuration
Create Docker Compose Configuration
Create last9-docker-monitoring.yaml for the monitoring stack:

    version: "3.8"
    services:
      otel-collector:
        image: otel/opentelemetry-collector-contrib:0.118.0
        container_name: last9-otel-collector
        command:
          [
            "--config=/etc/otel-collector-config.yaml",
            "--feature-gates=transform.flatten.logs",
          ]
        volumes:
          - ./otel-config.yaml:/etc/otel-collector-config.yaml:ro
          - /var/run/docker.sock:/var/run/docker.sock:ro
          # Host filesystem mounts for the hostmetrics receiver.
          - /proc:/hostfs/proc:ro
          - /sys:/hostfs/sys:ro
          - /etc/os-release:/hostfs/etc/os-release:ro
          - /:/hostfs/root:ro,rslave
        ports:
          - "4317:4317" # OTLP gRPC receiver
          - "4318:4318" # OTLP HTTP receiver
          - "8888:8888" # Prometheus metrics
          - "8889:8889" # Health check endpoint
        restart: unless-stopped
        user: "0" # root user to access docker stats and /proc
        pid: host # so the process scraper sees host PIDs
        environment:
          - LOGSPOUT=ignore
          - HOST_PROC=/hostfs/proc
          - HOST_SYS=/hostfs/sys
          - HOST_ETC=/hostfs/etc
        networks:
          - last9_monitoring
        healthcheck:
          # Assumes wget is available inside the collector image; adjust the
          # check if your image does not ship it.
          test:
            [
              "CMD",
              "wget",
              "--no-verbose",
              "--tries=1",
              "--spider",
              "http://localhost:8889/",
            ]
          interval: 30s
          timeout: 5s
          retries: 3
        labels:
          - "monitoring.last9=otel-collector"
          - "logging.disable=false"

      logspout:
        image: "gliderlabs/logspout:v3.2.14"
        container_name: last9-logspout
        volumes:
          - /etc/hostname:/etc/host_hostname:ro
          - /var/run/docker.sock:/var/run/docker.sock:ro
        command: syslog+tcp://otel-collector:2255
        depends_on:
          otel-collector:
            condition: service_healthy
        restart: unless-stopped
        environment:
          - LOGSPOUT_MULTICAST=true
          - BACKLOG=false
        networks:
          - last9_monitoring
        labels:
          - "monitoring.last9=logspout"

      # Optional: cAdvisor for additional container metrics
      cadvisor:
        image: gcr.io/cadvisor/cadvisor:v0.47.0
        container_name: last9-cadvisor
        privileged: true
        devices:
          - /dev/kmsg:/dev/kmsg
        volumes:
          - /:/rootfs:ro
          - /var/run:/var/run:rw
          - /sys:/sys:ro
          - /var/lib/docker/:/var/lib/docker:ro
          - /cgroup:/cgroup:ro
        ports:
          - "8080:8080"
        restart: unless-stopped
        networks:
          - last9_monitoring
        labels:
          - "monitoring.last9=cadvisor"

    networks:
      last9_monitoring:
        name: last9_monitoring
        driver: bridge

Create OpenTelemetry Collector Configuration
Create otel-config.yaml with comprehensive Docker monitoring:

    receivers:
      # Docker container metrics
      docker_stats:
        collection_interval: 30s
        timeout: 20s
        # Quoted so YAML does not parse the version as the float 1.4
        api_version: "1.40"
        metrics:
          # CPU metrics
          container.cpu.usage.total:
            enabled: true
          container.cpu.usage.kernelmode:
            enabled: true
          container.cpu.usage.usermode:
            enabled: true
          container.cpu.throttling_data.periods:
            enabled: true
          container.cpu.throttling_data.throttled_periods:
            enabled: true
          container.cpu.throttling_data.throttled_time:
            enabled: true
          container.cpu.utilization:
            enabled: true
          container.cpu.percent:
            enabled: true
          # Memory metrics
          container.memory.usage.limit:
            enabled: true
          container.memory.usage.total:
            enabled: true
          container.memory.usage.max:
            enabled: true
          container.memory.percent:
            enabled: true
          container.memory.cache:
            enabled: true
          container.memory.rss:
            enabled: true
          container.memory.swap:
            enabled: true
          # Network metrics
          container.network.io.usage.rx_bytes:
            enabled: true
          container.network.io.usage.tx_bytes:
            enabled: true
          container.network.io.usage.rx_packets:
            enabled: true
          container.network.io.usage.tx_packets:
            enabled: true
          container.network.io.usage.rx_dropped:
            enabled: true
          container.network.io.usage.tx_dropped:
            enabled: true
          container.network.io.usage.rx_errors:
            enabled: true
          container.network.io.usage.tx_errors:
            enabled: true
          # Block I/O metrics
          container.blockio.io_service_bytes_recursive:
            enabled: true
          container.blockio.io_serviced_recursive:
            enabled: true
          # Process metrics
          container.pids.count:
            enabled: true
          container.pids.limit:
            enabled: true

      # Container logs via syslog from logspout
      tcplog/docker:
        listen_address: "0.0.0.0:2255"
        operators:
          - type: syslog_parser
            protocol: rfc5424

      # Host (VM / bare-metal) metrics. root_path: /hostfs + HOST_* env vars
      # point each scraper at the host filesystem rather than the collector
      # container's own /proc.
      hostmetrics:
        collection_interval: 60s
        root_path: /hostfs
        scrapers:
          cpu:
            metrics:
              system.cpu.logical.count:
                enabled: true
              system.cpu.utilization:
                enabled: true
          memory:
            metrics:
              system.memory.utilization:
                enabled: true
              system.memory.limit:
                enabled: true
          load:
          disk:
          filesystem:
            metrics:
              system.filesystem.utilization:
                enabled: true
            exclude_mount_points:
              mount_points:
                - /hostfs/sys/*
                - /hostfs/proc/*
                - /var/lib/docker/*
                - /hostfs/root/var/lib/docker/*
              match_type: regexp
          network:
          paging:
          processes:
          # mute_process_*_error: true silences routine "permission denied"
          # noise for kernel threads and short-lived processes whose
          # /proc/<pid>/{exe,io,cmdline} entries can't be read.
          process:
            mute_process_user_error: true
            mute_process_io_error: true
            mute_process_exe_error: true
            metrics:
              process.cpu.utilization:
                enabled: true
              process.memory.utilization:
                enabled: true

      # Optional: Prometheus scraping for cAdvisor
      prometheus/cadvisor:
        config:
          scrape_configs:
            - job_name: "cadvisor"
              static_configs:
                - targets: ["cadvisor:8080"]
              scrape_interval: 30s

    processors:
      # Transform docker logs
      transform/docker_logs:
        error_mode: ignore
        flatten_data: true
        log_statements:
          - context: log
            statements:
              - set(log.body, log.attributes["message"])
              - delete_key(log.attributes, "message")
              - set(resource.attributes["service.name"], log.attributes["appname"])
              - set(log.attributes["container.name"], log.attributes["appname"])
              - set(log.attributes["log.source"], "docker")

      # Add resource attributes for containers
      resource/docker:
        attributes:
          - key: monitoring.tool
            value: last9-otel-docker
            action: insert
          - key: deployment.environment
            from_attribute: docker.container.label.environment
            action: insert
          - key: service.version
            from_attribute: docker.container.label.version
            action: insert

      # Batch processing for performance
      batch:
        send_batch_size: 10000
        send_batch_max_size: 10000
        timeout: 10s

      # Memory limiter to prevent OOM
      memory_limiter:
        # check_interval must be greater than zero for the limiter to run
        check_interval: 1s
        limit_mib: 512
        spike_limit_mib: 128

      # Resource detection
      resourcedetection/system:
        detectors: [env, system, docker]
        system:
          hostname_sources: ["os"]
        timeout: 5s
        override: false

      # Cloud-specific VM detectors. Each detector is a no-op on hosts that
      # aren't on the matching cloud, so it is safe to list all three: only
      # the one that matches the runtime environment will populate cloud.*
      # labels. ECS / EKS / AKS detectors are omitted because they require
      # Kubernetes service env vars or ECS task metadata.
      resourcedetection/cloud:
        detectors: [ec2, gcp, azure]
        timeout: 5s

      # Filter out the monitoring stack's own containers
      filter/exclude_monitoring:
        metrics:
          exclude:
            match_type: regexp
            resource_attributes:
              - key: container.name
                value: "(last9-otel-collector|last9-logspout|last9-cadvisor)"

      # Promote container resource attributes to datapoint attributes so they
      # are guaranteed to land as filterable labels on metrics in Last9. The
      # docker_stats receiver sets container.name / container.image.name as
      # resource attributes; this transform copies them onto each datapoint.
      transform/container_labels:
        error_mode: ignore
        metric_statements:
          - context: datapoint
            statements:
              - set(datapoint.attributes["container.name"], resource.attributes["container.name"])
              - set(datapoint.attributes["container.image.name"], resource.attributes["container.image.name"])
              - set(datapoint.attributes["container.image.tag"], resource.attributes["container.image.tag"])
              - set(datapoint.attributes["container.id"], resource.attributes["container.id"])

      # Promote host + cloud resource attributes onto each host metric
      # datapoint so they appear as filterable labels in Last9.
      transform/hostmetrics:
        error_mode: ignore
        metric_statements:
          - context: datapoint
            statements:
              - set(datapoint.attributes["host.name"], resource.attributes["host.name"])
              - set(datapoint.attributes["host.type"], resource.attributes["host.type"])
              - set(datapoint.attributes["host.image.id"], resource.attributes["host.image.id"])
              - set(datapoint.attributes["cloud.provider"], resource.attributes["cloud.provider"])
              - set(datapoint.attributes["cloud.platform"], resource.attributes["cloud.platform"])
              - set(datapoint.attributes["cloud.region"], resource.attributes["cloud.region"])
              - set(datapoint.attributes["cloud.account.id"], resource.attributes["cloud.account.id"])
              - set(datapoint.attributes["cloud.availability_zone"], resource.attributes["cloud.availability_zone"])

    exporters:
      # Debug exporter for troubleshooting
      debug:
        verbosity: basic
        sampling_initial: 2
        sampling_thereafter: 500

      # Last9 OTLP exporter
      otlp/last9:
        endpoint: $last9_otlp_endpoint
        headers:
          Authorization: $last9_otlp_auth_header
        compression: gzip
        retry_on_failure:
          enabled: true
          initial_interval: 1s
          max_interval: 30s
          max_elapsed_time: 300s
        sending_queue:
          enabled: true
          num_consumers: 10
          queue_size: 5000

    extensions:
      # Health check extension
      health_check:
        endpoint: "0.0.0.0:8889"
      # Performance profiling
      pprof:
        endpoint: "0.0.0.0:1777"
      # The memory_ballast extension has been deprecated upstream in favor of
      # the GOMEMLIMIT environment variable, so it is not configured here.

    service:
      extensions: [health_check, pprof]
      pipelines:
        # Metrics pipeline
        metrics:
          receivers: [docker_stats, hostmetrics, prometheus/cadvisor]
          processors:
            [
              memory_limiter,
              resourcedetection/system,
              resourcedetection/cloud,
              resource/docker,
              filter/exclude_monitoring,
              transform/container_labels,
              transform/hostmetrics,
              batch,
            ]
          exporters: [otlp/last9]
        # Logs pipeline
        logs:
          receivers: [tcplog/docker]
          processors:
            [
              memory_limiter,
              transform/docker_logs,
              resourcedetection/system,
              resourcedetection/cloud,
              resource/docker,
              batch,
            ]
          exporters: [otlp/last9]

Start the Monitoring Stack

    # Start the monitoring services
    docker compose -f last9-docker-monitoring.yaml up -d
    # Check the status
    docker compose -f last9-docker-monitoring.yaml ps
    # View logs
    docker compose -f last9-docker-monitoring.yaml logs -f otel-collector

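Right after startup the collector may need a few seconds before its health endpoint responds. A small retry helper can make that wait explicit. This is a sketch under assumptions: the `retry` function is hypothetical, and the usage shown afterwards assumes `curl` is installed on the host and that port 8889 is published as in the Compose file in this guide.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS tries have been made.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, `retry 10 3 curl -fsS http://localhost:8889/` polls the health check endpoint up to ten times, three seconds apart, and returns non-zero if the collector never becomes reachable.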
Application Integration
Environment Variables for Containers
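Set these environment variables and labels in your application containers so that their telemetry carries consistent service metadata. Note that OTEL_EXPORTER_OTLP_ENDPOINT only takes effect if the collector defines an otlp receiver and a matching pipeline; the otel-config.yaml in this guide publishes ports 4317 and 4318 but does not define that receiver, so add one before relying on direct instrumentation. Example: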

    # docker-compose.yml for your applications
    version: "3.8"

    services:
      your-app:
        image: your-app:latest
        environment:
          # OpenTelemetry configuration for direct instrumentation
          - OTEL_SERVICE_NAME=your-app-service
          - OTEL_EXPORTER_OTLP_ENDPOINT=http://last9-otel-collector:4317
          - OTEL_RESOURCE_ATTRIBUTES=service.version=1.0.0,deployment.environment=production
        labels:
          # Labels for container identification
          - "service.name=your-app-service"
          - "service.version=1.0.0"
          - "environment=production"
          - "component=api"
        networks:
          - your_app_network
          - last9_monitoring # Connect to monitoring network
        depends_on:
          - database

      database:
        image: postgres:15
        environment:
          - POSTGRES_DB=appdb
          - POSTGRES_USER=appuser
          - POSTGRES_PASSWORD=secret
        labels:
          - "service.name=postgres-db"
          - "component=database"
        networks:
          - your_app_network
          - last9_monitoring

    networks:
      your_app_network:
        name: your_app_network
      last9_monitoring:
        external: true
        name: last9_monitoring

Custom Log Formatting
For applications producing structured logs:

    # Dockerfile with structured logging
    FROM node:18-alpine

    WORKDIR /app
    COPY package*.json ./
    RUN npm ci --production

    COPY . .

    # Configure structured logging
    ENV NODE_ENV=production
    ENV LOG_FORMAT=json
    ENV LOG_LEVEL=info

    # OpenTelemetry configuration
    ENV OTEL_SERVICE_NAME=nodejs-app
    ENV OTEL_RESOURCE_ATTRIBUTES="service.version=1.0.0,deployment.environment=production"

    CMD ["node", "server.js"]

Health Check Integration

    # Add health checks to your services
    services:
      your-app:
        image: your-app:latest
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
          interval: 30s
          timeout: 10s
          retries: 3
          start_period: 40s
        labels:
          - "health.endpoint=/health"
          - "monitoring.enabled=true"

Advanced Configuration
Custom Metrics Collection

    # Add custom receivers to otel-config.yaml
    receivers:
      # JMX metrics for Java applications
      jmx:
        jar_path: /opt/opentelemetry-jmx-metrics.jar
        endpoint: service:jmx:rmi:///jndi/rmi://java-app:9999/jmxrmi
        target_system: jvm
        collection_interval: 60s

      # StatsD metrics from applications
      statsd:
        endpoint: "0.0.0.0:8125"
        aggregation_interval: 60s

      # HTTP availability check against an application endpoint
      httpcheck:
        targets:
          - endpoint: http://your-app:3000/metrics
            method: GET
        collection_interval: 30s

Each receiver only produces data once it is also listed under a pipeline's receivers in the service section.

Log Enrichment

    # Enhanced log processing
    processors:
      transform/enrich_logs:
        # Skip statements that fail (for example on non-JSON bodies)
        error_mode: ignore
        log_statements:
          - context: log
            statements:
              # Add container metadata
              - set(attributes["container.id"], resource.attributes["container.id"])
              - set(attributes["container.image"], resource.attributes["container.image.name"])

              # Parse JSON logs (only bodies that look like a JSON object)
              - merge_maps(cache, ParseJSON(body), "insert") where IsMatch(body, "^\\{")
              - set(body, cache["message"]) where cache["message"] != nil
              - set(attributes["log.level"], cache["level"]) where cache["level"] != nil
              - set(attributes["log.timestamp"], cache["timestamp"]) where cache["timestamp"] != nil

Network Monitoring

    # Add network inspection
    receivers:
      tcplog/network:
        listen_address: "0.0.0.0:2256"
        operators:
          - type: json_parser

    processors:
      transform/network:
        log_statements:
          - context: log
            statements:
              - set(attributes["network.protocol"], "tcp")
              - set(attributes["monitoring.type"], "network")

Kubernetes Integration
For Kubernetes environments, use a DaemonSet approach:

    # kubernetes-docker-monitoring.yaml
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: last9-otel-collector
      namespace: monitoring
    spec:
      selector:
        matchLabels:
          name: last9-otel-collector
      template:
        metadata:
          labels:
            name: last9-otel-collector
        spec:
          serviceAccountName: otel-collector
          containers:
            - name: otel-collector
              image: otel/opentelemetry-collector-contrib:0.118.0
              # Point the collector at the mounted config file
              args: ["--config=/etc/otel-collector-config.yaml"]
              env:
                - name: K8S_NODE_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: spec.nodeName
                - name: LAST9_OTLP_ENDPOINT
                  valueFrom:
                    secretKeyRef:
                      name: last9-credentials
                      key: endpoint
                - name: LAST9_OTLP_AUTH_HEADER
                  valueFrom:
                    secretKeyRef:
                      name: last9-credentials
                      key: auth-header
              volumeMounts:
                - name: config
                  mountPath: /etc/otel-collector-config.yaml
                  subPath: config.yaml
                - name: docker-socket
                  mountPath: /var/run/docker.sock
                  readOnly: true
          volumes:
            - name: config
              configMap:
                name: otel-collector-config
            - name: docker-socket
              hostPath:
                path: /var/run/docker.sock

Monitoring Capabilities
This integration provides:
- Container Metrics: CPU, memory, network, disk I/O, process counts
- Container Logs: All stdout/stderr output with metadata
- Host Metrics: System-level resource usage
- Container Lifecycle: Start, stop, restart events
- Network Traffic: Inter-container communication patterns
- Resource Limits: Utilization vs. configured limits
- Health Status: Container and service health checks
Best Practices
- Resource Management: Set appropriate memory limits for the collector
- Network Configuration: Isolate monitoring traffic when possible
- Security: Use read-only Docker socket access
- Retention: Configure appropriate log retention policies
- Alerting: Set up alerts for high resource usage or container failures
Troubleshooting
- No container metrics appearing. Verify that the Docker socket is mounted on the collector and readable, check that the collector runs as user: "0" so it can read the socket, and confirm that docker_stats is listed in the metrics pipeline.

      docker exec last9-otel-collector ls -la /var/run/docker.sock

- Missing logs. Verify that Logspout can access the Docker socket, check that containers are logging to stdout/stderr (Logspout will not pick up file-based logs), and confirm that Logspout and the collector are on the same Docker network so the syslog target (syslog+tcp://otel-collector:2255) is reachable.

- High resource usage. Adjust collection intervals, configure memory limits, and use batch processing. Example tuning for high-volume environments:

      processors:
        batch:
          send_batch_size: 50000
          send_batch_max_size: 50000
          timeout: 30s
        memory_limiter:
          check_interval: 1s
          limit_mib: 1024
          spike_limit_mib: 256

- Need to inspect collector behavior. Check the collector and Logspout logs, then test connectivity to the health check endpoint:

      # Check collector logs
      docker logs last9-otel-collector -f
      # Check logspout logs
      docker logs last9-logspout -f
      # Test connectivity
      docker exec last9-otel-collector wget -qO- http://localhost:8889/

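When tuning memory_limiter, it helps to know where the collector starts pushing back. As documented for the processor, it begins refusing data at a soft limit equal to limit_mib minus spike_limit_mib, and forces garbage collection at the hard limit (limit_mib). The sketch below works through that arithmetic for the two value pairs used in this guide; the helper name `soft_limit_mib` is hypothetical.

```shell
#!/bin/sh
# soft_limit_mib LIMIT_MIB SPIKE_LIMIT_MIB
# Soft limit for the memory_limiter processor: limit_mib - spike_limit_mib.
soft_limit_mib() {
  echo $(( $1 - $2 ))
}

# Values from the main configuration (512 / 128)
echo "default config: soft limit $(soft_limit_mib 512 128) MiB"
# prints: default config: soft limit 384 MiB

# Values from the high-volume tuning example (1024 / 256)
echo "high-volume tuning: soft limit $(soft_limit_mib 1024 256) MiB"
# prints: high-volume tuning: soft limit 768 MiB
```

If the collector container itself has a memory cap, keeping limit_mib comfortably below that cap leaves room for the process to reach the hard limit before the container runtime kills it.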
Please get in touch with us on Discord or Email if you have any questions.