Docker Swarm's logging model shifts from individual container logs to service-level aggregation.
The docker service logs command batch-retrieves the logs present at the time of execution, pulling data from all containers that belong to a service across your cluster. This approach gives you a unified view of distributed applications, but it comes with its own patterns and considerations for effective observability.
This blog covers how Swarm aggregates logs across replicas, how to set up centralized logging for distributed services, and the practical trade-offs between local and remote log access in production clusters.
How Docker Swarm Handles Logs
Swarm doesn't maintain a separate log store; it adds its own metadata (such as the service name and replica number) to each container's existing log output. When you run docker service logs nginx_web, you're seeing output from all replicas of that service, not just one container.
Each log entry includes metadata that helps you track which replica generated the message:
# Sample output from docker service logs
nginx_web.1.abc123def456@node-1 | 192.168.1.10 - - [01/Jul/2025:14:30:15 +0000] "GET / HTTP/1.1" 200 612
nginx_web.2.ghi789jkl012@node-2 | 192.168.1.11 - - [01/Jul/2025:14:30:16 +0000] "GET /api HTTP/1.1" 200 128
The format includes the service name, replica number, task ID, and the node where the container is running. This metadata becomes crucial when you're tracking down issues in a multi-replica service.
Essential Log Commands for Docker Swarm Services
The docker service logs command works with service names, service IDs, or specific task IDs. This feature became available in Docker 17.03, marking a significant improvement in Swarm log accessibility:
# View logs for entire service
docker service logs web_service
# View logs for specific task
docker service logs web_service.1.abc123def456
# Follow logs in real-time
docker service logs --follow web_service
# View last 50 lines with timestamps
docker service logs --tail 50 --timestamps web_service
This command only works for services that use the json-file or journald logging driver. If you're using other logging drivers like fluentd or syslog, you'll need to set up your log aggregation infrastructure separately.
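If you're not sure which driver is in effect, you can check both the node default and any per-service override; a quick sketch (web_service reuses the example name from above, and the exact inspect field path can vary slightly between Docker versions):
# Show the daemon's default logging driver on this node
docker info --format '{{.LoggingDriver}}'
# Show a service's log-driver override, if any (null means it falls back to the node default)
docker service inspect --format '{{json .Spec.TaskTemplate.LogDriver}}' web_service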
Filter Logs by Time and Task
Time-based filtering helps you narrow down logs to specific incidents:
# Logs since 2 hours ago
docker service logs --since 2h web_service
# Logs from specific timeframe
docker service logs --since 2025-07-01T10:00:00 --until 2025-07-01T11:00:00 web_service
# Combine with other filters
docker service logs --since 1h --tail 100 --timestamps web_service
You can also target specific replicas when you need to isolate issues:
# First, identify the task ID
docker service ps web_service
# Then view logs for that specific task
docker service logs web_service.1.abc123def456
What Happens to Logs Across Nodes in Docker Swarm
Docker does cache logs locally: it captures each container's stdout and stderr on the Docker engine of the node where that task runs. This means when you run docker service logs from a manager node, it may need to fetch logs over the network from worker nodes.
The practical implication: if you're troubleshooting a service with many replicas spread across multiple nodes, there's network overhead in aggregating those logs.
When you run docker service logs on a node where none of the service's containers are running, Docker pulls the logs over the network every time; it never caches them locally.
For high-volume logging scenarios, this can become a bottleneck. Teams often pair service logs with centralized logging solutions to avoid repeatedly pulling logs over the network.
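If you need to avoid that network hop during a one-off investigation, you can map each task to its node and read the logs locally on that machine; a quick sketch (web_service reuses the earlier example name):
# List each task of the service along with the node it's scheduled on
docker service ps web_service --format "{{.Name}} {{.Node}} {{.CurrentState}}"
# On that node, find the task's container and read its logs without the manager-to-worker hop
docker ps --filter "name=web_service" --format "{{.ID}} {{.Names}}"
docker logs <container-id>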
docker logs --tail explains how to retrieve just what you need.
Steps to Set Up Centralized Logging
To centralize your logs, each node in the swarm will need to be configured to forward both daemon and container logs to the destination. You can configure this at the daemon level or per-service.
At the daemon level, configure /etc/docker/daemon.json:
{
  "log-driver": "syslog",
  "log-opts": {
    "syslog-address": "udp://log-collector:514",
    "tag": "{{.Name}}/{{.ID}}"
  }
}
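Changes to daemon.json only take effect after the Docker daemon restarts on each node, and the new default only applies to containers created after that restart; a minimal sketch, assuming systemd-managed hosts:
# Validate the JSON before restarting
python3 -m json.tool /etc/docker/daemon.json
# Restart the daemon so new tasks pick up the default log driver
sudo systemctl restart docker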
Or configure individual services with specific logging drivers:
version: '3.8'
services:
  web:
    image: nginx
    deploy:
      replicas: 3
    logging:
      driver: "fluentd"
      options:
        fluentd-address: "log-collector:24224"
        tag: "web.{{.Name}}"
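To apply this, deploy the file as a stack and confirm the driver landed on the service; a small sketch (the stack name web_stack and file name docker-compose.yml are placeholders):
# Deploy (or update) the stack with the logging options above
docker stack deploy -c docker-compose.yml web_stack
# Confirm the log driver was applied to the service
docker service inspect --format '{{json .Spec.TaskTemplate.LogDriver}}' web_stack_web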
You can also configure logging at service creation time using the Docker CLI:
# Create service with GELF logging driver
docker service create \
  --log-driver=gelf \
  --log-opt gelf-address=udp://your.gelf.ip.address:port \
  --log-opt tag="YourIdentifier" \
  --name web_service \
  nginx
# Update existing service with new logging configuration
docker service update \
  --log-driver=gelf \
  --log-opt gelf-address=udp://graylog:12201 \
  web_service
This configuration gives you the flexibility to route different services to different log destinations based on their requirements.
Log Storage and Rotation
By default, no log rotation is performed. As a result, log files stored by the default json-file logging driver can consume a significant amount of disk space for containers that generate a lot of output.
Configure log rotation to prevent disk space issues:
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
For other situations, the local logging driver is recommended, as it performs log rotation by default and uses a more efficient file format.
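A minimal daemon.json sketch for switching to the local driver (the sizes here are illustrative, and it's worth confirming that your Docker version's docker service logs can read local-driver logs before rolling it out cluster-wide):
{
  "log-driver": "local",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}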
Example: Debugging an Nginx Container Crash in a Distributed Environment
Consider a setup where Nginx is deployed as part of a Docker Swarm service, fronted by a load balancer. One of the replicas fails due to a configuration issue, but without centralized monitoring or logging, the only indication is that some requests start timing out.
Step-by-Step Investigation:
1. Initial Symptoms
Monitoring dashboards show intermittent 504 Gateway Timeout errors. A deeper look at service-level logs reveals that one of the Nginx replicas has stopped emitting log lines.
2. Observe Log Timeline
A typical sequence of events might look like:
- 10:05:00 → Service logs from one replica stop.
- 10:05:02 → Daemon logs report the container crash.
- 10:05:04 → Swarm initiates rescheduling.
- 10:05:08 → A new container instance starts and emits startup logs.
3. Review Daemon Logs on the Node
On the node where the task failed, check the Docker daemon logs:
journalctl -u docker.service
These logs confirm that the container exited due to a config error and that Swarm initiated a task reschedule.
4. Access Container Logs
Identify the failed task’s container ID and inspect its logs:
docker logs <container-id>
You find:
nginx: [emerg] "server" directive is not allowed here in /etc/nginx/nginx.conf:23
This indicates a misconfiguration in the Nginx config file.
5. Inspect the Service Status
Run the following to check task states:
docker service ps nginx-service
One of the tasks is marked as FAILED, with an exit code indicating a runtime error.
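Once you've spotted the bad directive, it helps to validate the corrected config before redeploying; a quick sketch (the ./nginx.conf path is a placeholder for wherever your config actually lives, and how you roll out the fix depends on whether the config is bind-mounted, a Docker config object, or baked into the image):
# Test the corrected config in a throwaway container before rolling it out
docker run --rm -v "$(pwd)/nginx.conf:/etc/nginx/nginx.conf:ro" nginx nginx -t
# Restart the service's tasks so they pick up the fix (assumes the config is updated out of band)
docker service update --force nginx-service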
Why This Pattern is Important
Debugging distributed systems often requires correlating information from multiple sources:
- Service logs: To understand application-level errors.
- Node-level metrics: To detect resource issues that might cause instability.
- Docker daemon logs: To trace container lifecycle and orchestration events.
- Centralized logging: To correlate events across services and nodes without manual log aggregation.
This layered approach makes it easier to spot patterns, like log gaps during replica failure, and trace them back to their root cause.
Integrate with Observability Tools
At Last9, we provide a cost-effective, fully managed observability platform built to handle high-cardinality telemetry at scale. We integrate seamlessly with OpenTelemetry and Prometheus, so your metrics, logs, and traces aren’t scattered across tools or lost in translation.
Whether you're running a single-node setup or a multi-node Docker Swarm cluster, we make it easier to correlate service-level logs, node metrics, and orchestration events, without surprises in cost or visibility gaps.
Teams at Probo, CleverTap, and Replit trust us to monitor their distributed applications, debug incidents faster, and keep their systems reliable.
Need to plug into your existing stack? You can easily pair Last9 with:
- Grafana Loki for log aggregation
- Fluentd or Fluent Bit for log forwarding
- Prometheus for metrics collection
The key is picking tools that work well with Docker’s logging drivers and that can keep up with the distributed, ephemeral nature of containers and services in production.
Common Log Troubleshooting Issues (and How to Fix Them)
1. Missing or Incomplete Logs
Symptom: You don’t see expected log lines, or logs are just... gone.
Checklist:
- Check your logging driver: Run docker inspect <container> and verify the LogConfig section. Most setups use json-file, but if you're using none or a custom driver (like Fluentd or syslog), logs may not appear as expected (a quick inspect one-liner follows this list).
- Verify stdout/stderr usage: Your application should be writing logs to standard output (stdout) and standard error (stderr). Writing logs to files inside the container (/var/log/...) bypasses Docker's logging pipeline.
- Inspect container behavior: If your logs are buffered (e.g., due to Python's I/O buffering), add PYTHONUNBUFFERED=1 or flush output after writes.
- Look for file permission issues: If using mounted volumes for logs, the container user might lack write access.
2. Network Timeouts When Fetching Logs
Symptom: docker service logs hangs or fails, especially on busy services.
Checklist:
- Use --tail to limit logs: Avoid fetching everything. Try docker service logs --tail 100 <service> to retrieve only the last 100 lines.
- Target specific tasks: Instead of querying the whole service, use docker service ps <service> to find a task ID, and then run docker logs <task-container-id> on the node where that task runs.
- Investigate log size: Huge log files can choke retrieval. Set the max-size and max-file options in the log driver to enable rotation:
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"
3. Delayed or Inconsistent Logs in Aggregation Systems
Symptom: Logs show up late in your centralized logging tool (e.g., Loki, ELK, FluentBit).
Checklist:
- Know where your logs are coming from: If you're accessing logs from a manager node, but the service runs on a worker, logs must hop over the network first. Expect latency.
- Use a proper log forwarder: Tools like Fluentd, FluentBit, or Logspout can forward logs from each node directly to your aggregator.
- Avoid relying solely on docker service logs: It's better for debugging than for real-time observability.
- Centralize and standardize: Use a centralized logging architecture where each node sends logs independently to a backend; this removes the bottleneck of fetching logs over Docker's internal plumbing (see the global-mode forwarder sketch after this list).
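One common pattern is running the forwarder as a global service so every node ships its own logs; a rough sketch, assuming Fluent Bit with a tail input pointed at the json-file logs in fluent-bit.conf (the image tag, mount paths, and config file are illustrative and depend on your setup):
# Run one forwarder task per node; each reads the local json-file logs and ships them to your backend
docker service create \
  --name log-forwarder \
  --mode global \
  --mount type=bind,source=/var/lib/docker/containers,target=/var/lib/docker/containers,readonly \
  --mount type=bind,source=/etc/fluent-bit/fluent-bit.conf,target=/fluent-bit/etc/fluent-bit.conf,readonly \
  fluent/fluent-bit:latest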
Use Logs to Monitor Service Health
To complement throughput metrics, log patterns give you more context around service behavior. You can watch for:
- Error rate changes across replicas
- Startup/shutdown patterns when services scale
- Request distribution across service instances
- Performance degradation before it shows up in metrics
Teams often pair volume metrics with timing metrics to spot stuck requests or failing health checks earlier.
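For a rough, ad hoc view of error rates across replicas before dashboards are in place, you can lean on the metadata prefix in the aggregated output; a small sketch (web_service and the " 500 " pattern are placeholders for your own service and error signature):
# Count HTTP 500 responses per replica over the last hour
docker service logs --since 1h web_service 2>&1 | grep " 500 " | cut -d'|' -f1 | sort | uniq -c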
Final Thoughts
Docker Swarm logs provide service-level visibility that's essential for distributed applications. The built-in aggregation works well for smaller deployments, but production environments typically need centralized logging for better performance and retention.
FAQs
Q: Can I view logs from stopped services?
A: No, docker service logs only works with active services. Once a service is removed, its logs are no longer accessible through this command.
Q: How do I view logs from all services at once?
A: There's no single command to aggregate logs from multiple services. You'll need to use a centralized logging solution or run separate docker service logs commands for each service.
Q: What happened to Docker logs before version 17.03?
A: Before Docker 17.03, docker service logs wasn't available. Teams had to rely on centralized logging solutions or SSH into individual nodes to access container logs using docker logs.
Q: Why are my logs showing up with different timestamps?
A: Each node in your Swarm may have slightly different system times. Consider using NTP synchronization across your cluster nodes for consistent timestamps.
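On systemd-based hosts, a quick way to check and fix this (assuming timedatectl is available; other init systems differ):
# Check whether this node's clock is NTP-synchronized
timedatectl status
# Enable NTP synchronization if it isn't
sudo timedatectl set-ntp true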
Q: Can I search through service logs?
A: The docker service logs command doesn't have built-in search functionality. You can pipe the output to grep or use a log management tool for more advanced search capabilities.
Q: How do I handle logs from services with many replicas?
A: Use the --tail option to limit output, or target specific task IDs instead of the entire service to reduce network overhead and improve performance.