Docker Integration
Monitor Docker containers comprehensively with OpenTelemetry Collector for metrics, logs, and container lifecycle events
Monitor Docker containers and containerized applications by sending their telemetry to Last9's OpenTelemetry endpoint. This integration collects container resource metrics, container logs, and host-level metrics from your Docker infrastructure.
Prerequisites
- Docker and Docker Compose installed
- Containers logging to stdout/stderr
- Docker daemon with stats API access
- Last9 account with OTLP endpoint configured
How It Works
- OpenTelemetry Collector: Collects metrics from Docker containers using the Docker Stats API
- Logspout: Captures stdout/stderr logs from all Docker containers and forwards them to the OpenTelemetry Collector as syslog over TCP (port 2255)
- Data Processing: The collector processes, batches, and enriches the telemetry data
- Export: The processed data is sent to Last9 for visualization and analysis
Configuration
Create Docker Compose Configuration
Create `last9-docker-monitoring.yaml` for the monitoring stack:

```yaml
version: "3.8"

services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.118.0
    container_name: last9-otel-collector
    command:
      [
        "--config=/etc/otel-collector-config.yaml",
        "--feature-gates=transform.flatten.logs",
      ]
    volumes:
      - ./otel-config.yaml:/etc/otel-collector-config.yaml
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /sys/fs/cgroup:/hostfs/sys/fs/cgroup:ro
      - /proc:/hostfs/proc:ro
      - /etc/os-release:/etc/os-release:ro
    ports:
      - "4317:4317" # OTLP gRPC receiver
      - "4318:4318" # OTLP HTTP receiver
      - "8888:8888" # Collector internal metrics (Prometheus format)
      - "8889:8889" # Health check endpoint
    restart: unless-stopped
    user: "0" # root user to access docker stats and host metrics
    environment:
      - LOGSPOUT=ignore # Logspout skips containers that set this variable
      - HOST_PROC=/hostfs/proc
      - HOST_SYS=/hostfs/sys
      - HOST_ETC=/hostfs/etc
    networks:
      - last9_monitoring
    # No Docker healthcheck here: the collector-contrib image is built from scratch
    # and contains no shell, wget, or curl. Probe http://localhost:8889/ from the host.
    labels:
      - "monitoring.last9=otel-collector"
      - "logging.disable=false"

  logspout:
    image: "gliderlabs/logspout:v3.2.14"
    container_name: last9-logspout
    volumes:
      - /etc/hostname:/etc/host_hostname:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    command: syslog+tcp://otel-collector:2255
    depends_on:
      otel-collector:
        condition: service_started
    restart: unless-stopped
    environment:
      - LOGSPOUT_MULTICAST=true
      - BACKLOG=false
    networks:
      - last9_monitoring
      # Add your application networks here to monitor their containers
      # - your_app_network_1
      # - your_app_network_2
    labels:
      - "monitoring.last9=logspout"

  # Optional: cAdvisor for additional container metrics
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.0
    container_name: last9-cadvisor
    privileged: true
    devices:
      - /dev/kmsg:/dev/kmsg
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /cgroup:/cgroup:ro
    ports:
      - "8080:8080"
    restart: unless-stopped
    networks:
      - last9_monitoring
    labels:
      - "monitoring.last9=cadvisor"

networks:
  last9_monitoring:
    name: last9_monitoring
    driver: bridge
  # Add your existing application networks as external
  # Uncomment and modify as needed:
  # your_app_network_1:
  #   external: true
  #   name: your_app_network_1
  # your_app_network_2:
  #   external: true
  #   name: your_app_network_2
```
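You may prefer not to hard-code the Last9 endpoint and credentials in `otel-config.yaml`. One option is to pass them into the collector container as environment variables. The sketch below assumes the variable names `LAST9_OTLP_ENDPOINT` and `LAST9_OTLP_AUTH_HEADER`, the same names the Kubernetes example uses. It also assumes both are set in your shell or in a `.env` file next to the Compose file. Add the two entries to the existing `environment` list of the `otel-collector` service:

```yaml
services:
  otel-collector:
    environment:
      # Docker Compose interpolates these from the shell or a .env file
      - LAST9_OTLP_ENDPOINT=${LAST9_OTLP_ENDPOINT}
      - LAST9_OTLP_AUTH_HEADER=${LAST9_OTLP_AUTH_HEADER}
```

The `otlp/last9` exporter can then read them as `${env:LAST9_OTLP_ENDPOINT}` and `${env:LAST9_OTLP_AUTH_HEADER}` instead of literal placeholder values.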
Create OpenTelemetry Collector Configuration
Create `otel-config.yaml` with the Docker monitoring pipelines:

```yaml
receivers:
  # Docker container metrics
  docker_stats:
    collection_interval: 30s
    timeout: 20s
    # Quote the version: an unquoted 1.40 is parsed as the number 1.4
    api_version: "1.40"
    metrics:
      # CPU metrics
      container.cpu.usage.total:
        enabled: true
      container.cpu.usage.kernelmode:
        enabled: true
      container.cpu.usage.usermode:
        enabled: true
      container.cpu.throttling_data.periods:
        enabled: true
      container.cpu.throttling_data.throttled_periods:
        enabled: true
      container.cpu.throttling_data.throttled_time:
        enabled: true
      container.cpu.utilization:
        enabled: true
      container.cpu.percent:
        enabled: true
      # Memory metrics
      container.memory.usage.limit:
        enabled: true
      container.memory.usage.total:
        enabled: true
      container.memory.usage.max:
        enabled: true
      container.memory.percent:
        enabled: true
      container.memory.cache:
        enabled: true
      container.memory.rss:
        enabled: true
      container.memory.swap:
        enabled: true
      # Network metrics
      container.network.io.usage.rx_bytes:
        enabled: true
      container.network.io.usage.tx_bytes:
        enabled: true
      container.network.io.usage.rx_packets:
        enabled: true
      container.network.io.usage.tx_packets:
        enabled: true
      container.network.io.usage.rx_dropped:
        enabled: true
      container.network.io.usage.tx_dropped:
        enabled: true
      container.network.io.usage.rx_errors:
        enabled: true
      container.network.io.usage.tx_errors:
        enabled: true
      # Block I/O metrics
      container.blockio.io_service_bytes_recursive:
        enabled: true
      container.blockio.io_serviced_recursive:
        enabled: true
      # Process metrics
      container.pids.count:
        enabled: true
      container.pids.limit:
        enabled: true

  # Container logs via syslog from Logspout
  tcplog/docker:
    listen_address: "0.0.0.0:2255"
    operators:
      - type: syslog_parser
        protocol: rfc5424

  # Optional: host metrics if cAdvisor is not used
  hostmetrics:
    collection_interval: 60s
    scrapers:
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
      memory:
        metrics:
          system.memory.utilization:
            enabled: true
      disk:
        metrics:
          system.disk.io_time:
            enabled: true
      network:
      load:
      filesystem:

  # Optional: Prometheus scraping for cAdvisor
  prometheus/cadvisor:
    config:
      scrape_configs:
        - job_name: "cadvisor"
          static_configs:
            - targets: ["cadvisor:8080"]
          scrape_interval: 30s

processors:
  # Transform Docker logs
  transform/docker_logs:
    error_mode: ignore
    flatten_data: true # requires the transform.flatten.logs feature gate set in the Compose file
    log_statements:
      - context: log
        statements:
          - set(body, attributes["message"])
          - delete_key(attributes, "message")
          - set(resource.attributes["service.name"], attributes["appname"])
          - set(attributes["container.name"], attributes["appname"])
          - set(attributes["log.source"], "docker")

  # Add resource attributes for containers
  resource/docker:
    attributes:
      - key: monitoring.tool
        value: last9-otel-docker
        action: insert
      - key: deployment.environment
        from_attribute: docker.container.label.environment
        action: insert
      - key: service.version
        from_attribute: docker.container.label.version
        action: insert

  # Batch processing for performance
  batch:
    send_batch_size: 10000
    send_batch_max_size: 10000
    timeout: 10s

  # Memory limiter to prevent OOM (check_interval is required)
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128

  # Resource detection
  resourcedetection:
    detectors: [env, system, docker]
    timeout: 5s
    override: false

  # Filter out the monitoring stack's own containers
  filter/exclude_monitoring:
    metrics:
      exclude:
        match_type: regexp
        resource_attributes:
          - key: container.name
            value: "(last9-otel-collector|last9-logspout|last9-cadvisor)"

exporters:
  # Debug exporter for troubleshooting
  debug:
    verbosity: basic
    sampling_initial: 2
    sampling_thereafter: 500

  # Last9 OTLP exporter
  otlp/last9:
    endpoint: $last9_otlp_endpoint # replace with your Last9 OTLP endpoint
    headers:
      Authorization: $last9_otlp_auth_header # replace with your Last9 auth header
    compression: gzip
    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000

extensions:
  # Health check extension
  health_check:
    endpoint: "0.0.0.0:8889"
  # Performance profiling
  pprof:
    endpoint: "0.0.0.0:1777"
  # memory_ballast has been removed from recent Collector releases and fails to load
  # in 0.118.0; set the GOMEMLIMIT environment variable for GC tuning instead.

service:
  extensions: [health_check, pprof]
  pipelines:
    # Metrics pipeline
    metrics:
      receivers: [docker_stats, hostmetrics, prometheus/cadvisor]
      processors:
        [
          memory_limiter,
          resourcedetection,
          resource/docker,
          filter/exclude_monitoring,
          batch,
        ]
      exporters: [otlp/last9]
    # Logs pipeline
    logs:
      receivers: [tcplog/docker]
      processors:
        [
          memory_limiter,
          transform/docker_logs,
          resourcedetection,
          resource/docker,
          batch,
        ]
      exporters: [otlp/last9]
```
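The `resource/docker` processor copies values from `docker.container.label.environment` and `docker.container.label.version`. However, the `docker_stats` receiver only attaches container labels to metric resources when they are listed under its `container_labels_to_metric_labels` option. The following is a minimal sketch that assumes your containers carry `environment` and `version` labels. Adjust the keys to the label names you actually use; the application example below, for instance, labels its version as `service.version`.

```yaml
receivers:
  docker_stats:
    # Keys are container label names; values are the resource attribute names to set
    container_labels_to_metric_labels:
      environment: docker.container.label.environment
      version: docker.container.label.version
```

With this mapping merged into the existing `docker_stats` block, the `from_attribute` lookups in `resource/docker` have values to copy into `deployment.environment` and `service.version` on container metrics.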
Start the Monitoring Stack
```shell
# Start the monitoring services
docker compose -f last9-docker-monitoring.yaml up -d

# Check the status
docker compose -f last9-docker-monitoring.yaml ps

# View logs
docker compose -f last9-docker-monitoring.yaml logs -f otel-collector
```
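The Compose file publishes ports 4317 and 4318, and the application example below sends OTLP data to `last9-otel-collector:4317`. However, `otel-config.yaml` as written defines no `otlp` receiver, so nothing listens on those ports. The following sketch adds one, assuming you also want application traces forwarded to Last9. Merge these keys into the existing sections rather than replacing them:

```yaml
receivers:
  # Accept OTLP from instrumented applications on the published ports
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

service:
  pipelines:
    # New pipeline for application traces
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resourcedetection, batch]
      exporters: [otlp/last9]
    # Existing pipelines, with otlp added to their receivers
    metrics:
      receivers: [otlp, docker_stats, hostmetrics, prometheus/cadvisor]
    logs:
      receivers: [otlp, tcplog/docker]
```

After restarting the collector, applications on the `last9_monitoring` network can export traces, metrics, and logs to it over OTLP.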
Application Integration
Environment Variables for Containers
Configure your application containers with OpenTelemetry environment variables, identifying labels, and a connection to the `last9_monitoring` network:
```yaml
# docker-compose.yml for your applications
version: "3.8"

services:
  your-app:
    image: your-app:latest
    environment:
      # OpenTelemetry configuration for direct instrumentation
      - OTEL_SERVICE_NAME=your-app-service
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://last9-otel-collector:4317
      # 4317 is the OTLP gRPC port; some SDKs default to http/protobuf (port 4318)
      - OTEL_EXPORTER_OTLP_PROTOCOL=grpc
      - OTEL_RESOURCE_ATTRIBUTES=service.version=1.0.0,deployment.environment=production
    labels:
      # Labels for container identification
      - "service.name=your-app-service"
      - "service.version=1.0.0"
      - "environment=production"
      - "component=api"
    networks:
      - your_app_network
      - last9_monitoring # Connect to monitoring network
    depends_on:
      - database

  database:
    image: postgres:15
    environment:
      - POSTGRES_DB=appdb
      - POSTGRES_USER=appuser
      - POSTGRES_PASSWORD=secret
    labels:
      - "service.name=postgres-db"
      - "component=database"
    networks:
      - your_app_network
      - last9_monitoring

networks:
  your_app_network:
    name: your_app_network
  last9_monitoring:
    external: true
    name: last9_monitoring
```

Custom Log Formatting
For applications producing structured logs:
```dockerfile
# Dockerfile with structured logging
FROM node:18-alpine

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

COPY . .

# Configure structured logging
# (these variables only take effect if your application's logger reads them)
ENV NODE_ENV=production
ENV LOG_FORMAT=json
ENV LOG_LEVEL=info

# OpenTelemetry configuration
ENV OTEL_SERVICE_NAME=nodejs-app
ENV OTEL_RESOURCE_ATTRIBUTES="service.version=1.0.0,deployment.environment=production"

CMD ["node", "server.js"]
```

Health Check Integration
```yaml
# Add health checks to your services
services:
  your-app:
    image: your-app:latest
    healthcheck:
      # curl must be installed in the application image
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    labels:
      - "health.endpoint=/health"
      - "monitoring.enabled=true"
```

Advanced Configuration
Custom Metrics Collection
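If an application already exposes Prometheus-format metrics over HTTP, the `prometheus` receiver (already used in `otel-config.yaml` for cAdvisor) can scrape that endpoint directly. The following sketch assumes a hypothetical service reachable as `your-app:3000` on the `last9_monitoring` network, serving metrics at `/metrics`:

```yaml
receivers:
  prometheus/your_app:
    config:
      scrape_configs:
        - job_name: "your-app"
          scrape_interval: 30s
          metrics_path: /metrics
          static_configs:
            - targets: ["your-app:3000"]

service:
  pipelines:
    metrics:
      # Keep the existing receivers and append the new one
      receivers: [docker_stats, hostmetrics, prometheus/cadvisor, prometheus/your_app]
```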
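```yaml
# Add custom receivers to otel-config.yaml, then list each one you use
# in the metrics pipeline's receivers
receivers:
  # JMX metrics for Java applications
  # Runs the JMX metrics JAR with Java, so the collector container needs a JRE
  # and the JAR; the default contrib image includes neither
  jmx:
    jar_path: /opt/opentelemetry-jmx-metrics.jar
    endpoint: service:jmx:rmi:///jndi/rmi://java-app:9999/jmxrmi
    target_system: jvm
    collection_interval: 60s

  # StatsD metrics from applications (listens on UDP by default)
  statsd:
    endpoint: "0.0.0.0:8125"
    aggregation_interval: 60s

  # HTTP availability checks: records response status and duration,
  # does not parse metrics from the response body
  httpcheck:
    targets:
      - endpoint: http://your-app:3000/metrics
        method: GET
    collection_interval: 30s
```

Log Enrichment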
```yaml
# Enhanced log processing
# Add transform/enrich_logs to the logs pipeline after transform/docker_logs,
# which moves the syslog message into the body
processors:
  transform/enrich_logs:
    # Skip statements that fail, for example ParseJSON on a non-JSON body
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          # Add container metadata
          - set(attributes["container.id"], resource.attributes["container.id"])
          - set(attributes["container.image"], resource.attributes["container.image.name"])

          # Parse JSON logs
          - merge_maps(cache, ParseJSON(body), "insert") where body != nil
          - set(body, cache["message"]) where cache["message"] != nil
          - set(attributes["log.level"], cache["level"]) where cache["level"] != nil
          - set(attributes["log.timestamp"], cache["timestamp"]) where cache["timestamp"] != nil
```

Network Monitoring
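```yaml
# Add network inspection
receivers:
  # Accepts newline-delimited JSON records over TCP
  tcplog/network:
    listen_address: "0.0.0.0:2256"
    operators:
      - type: json_parser

processors:
  transform/network:
    log_statements:
      - context: log
        statements:
          - set(attributes["network.protocol"], "tcp")
          - set(attributes["monitoring.type"], "network")

service:
  pipelines:
    # Separate logs pipeline for the network listener
    logs/network:
      receivers: [tcplog/network]
      processors: [memory_limiter, transform/network, batch]
      exporters: [otlp/last9]
```

Kubernetes Integration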
For Kubernetes environments, run the collector as a DaemonSet so that one instance runs on every node.
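The DaemonSet references the following objects:

- a `monitoring` namespace
- a service account named `otel-collector`
- a Secret named `last9-credentials` with `endpoint` and `auth-header` keys
- a ConfigMap named `otel-collector-config` with a `config.yaml` key

The following sketch creates those supporting objects. The values in angle brackets are placeholders you must replace, and the ConfigMap body stands in for your own collector configuration.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector
  namespace: monitoring
---
apiVersion: v1
kind: Secret
metadata:
  name: last9-credentials
  namespace: monitoring
type: Opaque
stringData:
  endpoint: "<your Last9 OTLP endpoint>" # placeholder
  auth-header: "<your Last9 authorization header>" # placeholder
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: monitoring
data:
  config.yaml: |
    # Placeholder: paste your otel-config.yaml here and reference the credentials
    # as ${env:LAST9_OTLP_ENDPOINT} and ${env:LAST9_OTLP_AUTH_HEADER}
```

Apply these objects before the DaemonSet, for example with `kubectl apply -f`.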
```yaml
# kubernetes-docker-monitoring.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: last9-otel-collector
  namespace: monitoring
spec:
  selector:
    matchLabels:
      name: last9-otel-collector
  template:
    metadata:
      labels:
        name: last9-otel-collector
    spec:
      serviceAccountName: otel-collector
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:0.118.0
          # The image's default config path differs, so point it at the mounted file
          args:
            - "--config=/etc/otel-collector-config.yaml"
            - "--feature-gates=transform.flatten.logs"
          # Run as root to read the Docker socket, matching user: "0" in Compose
          securityContext:
            runAsUser: 0
          env:
            - name: K8S_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: LAST9_OTLP_ENDPOINT
              valueFrom:
                secretKeyRef:
                  name: last9-credentials
                  key: endpoint
            - name: LAST9_OTLP_AUTH_HEADER
              valueFrom:
                secretKeyRef:
                  name: last9-credentials
                  key: auth-header
          volumeMounts:
            - name: config
              mountPath: /etc/otel-collector-config.yaml
              subPath: config.yaml
            # Only exists on nodes running Docker Engine, not containerd-only nodes
            - name: docker-socket
              mountPath: /var/run/docker.sock
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
        - name: docker-socket
          hostPath:
            path: /var/run/docker.sock
```

Troubleshooting
Common Issues
- No container metrics appearing:
  - Verify Docker socket access: `ls -la /var/run/docker.sock`
  - Check that the collector has root privileges
  - Ensure the container is in the monitoring network
- Missing logs:
  - Verify Logspout can access the Docker socket
  - Check that containers log to stdout/stderr
  - Ensure proper network connectivity between Logspout and the collector (port 2255)
- High resource usage:
  - Adjust collection intervals
  - Configure memory limits
  - Use batch processing
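When metrics or logs do not show up in Last9, it helps to first confirm whether they reach the collector at all. `otel-config.yaml` already defines a `debug` exporter that is not attached to any pipeline. The following sketch attaches it next to `otlp/last9` and raises the collector's own log level. Revert both changes afterwards, since they add log volume.

```yaml
service:
  telemetry:
    logs:
      level: debug # the collector's own log verbosity
  pipelines:
    metrics:
      exporters: [otlp/last9, debug]
    logs:
      exporters: [otlp/last9, debug]
```

Summaries of the received data then appear in the collector's container logs.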
Debug Mode
Inspect the collector and Logspout logs, and check that the collector's health endpoint responds:
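```shell
# Check collector logs
docker logs last9-otel-collector -f

# Check logspout logs
docker logs last9-logspout -f

# Test connectivity from the host (port 8889 is published); the collector image
# has no shell or wget, so this cannot run inside it with docker exec
curl -s http://localhost:8889/
```

Performance Tuning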
```yaml
# Optimize for high-volume environments
processors:
  batch:
    send_batch_size: 50000
    send_batch_max_size: 50000
    timeout: 30s

  memory_limiter:
    check_interval: 1s
    limit_mib: 1024
    spike_limit_mib: 256
```

Monitoring Capabilities
This integration provides:
- Container Metrics: CPU, memory, network, disk I/O, process counts
- Container Logs: All stdout/stderr output with metadata
- Host Metrics: System-level resource usage
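- Container Lifecycle: Containers appear in the metrics when they start and drop out when they stop; restart counts and uptime are available as optional `docker_stats` metrics
- Network Traffic: Per-container bytes, packets, drops, and errors received and transmitted
- Resource Limits: Utilization vs. configured limits
- Health Status: Collector health through the `health_check` extension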
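The `docker_stats` receiver also offers restart and uptime metrics. They are disabled by default and not enabled in the configuration above. The following sketch enables them, giving a metrics-based view of container restarts. Merge it into the existing `docker_stats` block:

```yaml
receivers:
  docker_stats:
    metrics:
      container.restarts:
        enabled: true
      container.uptime:
        enabled: true
```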
Best Practices
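- Resource Management: Set appropriate memory limits for the collector
- Network Configuration: Isolate monitoring traffic when possible
- Security: Mount the Docker socket read-only, but still treat it as privileged access, since `:ro` does not restrict which Docker API calls can be made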
- Retention: Configure appropriate log retention policies
- Alerting: Set up alerts for high resource usage or container failures
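For resource management, one way to bound the collector in Docker Compose is to combine a container memory limit with the Go runtime's `GOMEMLIMIT` soft limit. The values in the following sketch are illustrative, chosen so that the `memory_limiter` setting of 512 MiB in `otel-config.yaml` sits below both. Size them for your own traffic, and add the variable to the service's existing `environment` list:

```yaml
services:
  otel-collector:
    mem_limit: 1g # hard cap enforced by Docker
    environment:
      - GOMEMLIMIT=800MiB # Go runtime soft limit, roughly 80% of mem_limit
```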
Your Docker infrastructure will now provide comprehensive observability data to Last9, enabling detailed monitoring of containerized applications and infrastructure health.