In cloud environments, logs are often spread across numerous services, making it difficult to track down issues or gather meaningful insights. For AWS users, this challenge can become especially time-consuming. Centralized logging in AWS helps by bringing all your logs into a single platform, making management and analysis easier.
This guide covers everything DevOps engineers need to know about setting up centralized logging in AWS, from the basics to advanced setups, along with troubleshooting tips for resolving common issues.
What is AWS Centralized Logging?
AWS centralized logging is the practice of collecting, storing, and analyzing logs from multiple AWS services in a single location. Instead of jumping between CloudWatch, S3, and various application logs, you get one unified view of your entire infrastructure.
The core benefit? You can spot patterns, troubleshoot faster, and get better insights across your entire AWS environment without the headache of context-switching between different services.
Benefits of Centralized Logging for DevOps Teams
The typical AWS infrastructure generates tons of logs from EC2 instances, Lambda functions, API Gateway, RDS, and dozens of other services. Without centralization, you're essentially:
- Wasting time hunting down logs across different services
- Missing connections between related events
- Struggling to get a complete picture during incidents
- Finding it impossible to set up meaningful alerts
Teams that centralize their logging commonly report cutting incident response times by 30-50%. That's not just faster fixes; it means better uptime and happier customers.
Choosing the Right AWS Logging Architecture for Your Needs
Before diving into architectures, let's review the key AWS services that generate logs you'll want to centralize:
AWS Service | Log Type | Description |
---|---|---|
EC2 | Application logs, system logs | Logs from your applications and the OS |
CloudTrail | API activity logs | Records all AWS API calls made in your account |
VPC Flow Logs | Network traffic logs | Captures IP traffic going to and from network interfaces |
AWS Config | Configuration change logs | Records configuration changes to your AWS resources |
AWS WAF | Web application firewall logs | Logs traffic patterns and blocked attacks |
Load Balancers | Access logs | Records client connections and requests |
RDS | Database logs | Includes error logs, audit logs, and slow query logs |
Lambda | Function execution logs | Records function invocation and execution details |
Now, let's look at the three most popular approaches for AWS centralized logging:
CloudWatch Logs + Insights: The AWS Native Solution
The simplest approach uses AWS's built-in services:
- All AWS services send logs to CloudWatch Logs
- CloudWatch Log Insights provides the search and analysis layer
- CloudWatch Dashboards visualize the important metrics
Pros: Native AWS integration, minimal setup, works with most AWS services out of the box
Cons: Limited retention options, can get expensive at scale, less flexible for custom analysis
ELK Stack on AWS: The Open Source Powerhouse
For more power and flexibility:
- AWS services send logs to a collection pipeline (Logstash or Fluentd)
- Logs are processed and stored in Elasticsearch
- Kibana provides visualization and search capabilities
Pros: Highly customizable, powerful search capabilities, great visualizations
Cons: More complex to set up and maintain, requires separate infrastructure
S3 + Athena + QuickSight: The Cost-Effective Data Lake Approach
The data lake approach:
- Logs are delivered to S3 buckets (many services, including CloudTrail and load balancers, can write to S3 natively)
- Amazon Athena runs SQL queries against the log data
- QuickSight creates dashboards and visualizations
Pros: Cost-effective for long-term storage, works well for compliance, scales infinitely
Cons: Not real-time, requires SQL knowledge, more setup work
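To make the Athena option concrete, here's a sketch of a query run from the CLI. It assumes you've already created a cloudtrail_logs table over your CloudTrail bucket (the CREATE TABLE statement is in the Athena documentation) and that my-athena-results is a placeholder results bucket:

# Hypothetical setup: the "cloudtrail_logs" table and results bucket are placeholders
aws athena start-query-execution \
  --work-group "primary" \
  --result-configuration "OutputLocation=s3://my-athena-results/" \
  --query-string "SELECT eventsource, eventname, count(*) AS calls \
    FROM cloudtrail_logs \
    WHERE errorcode IS NOT NULL \
    GROUP BY eventsource, eventname \
    ORDER BY calls DESC LIMIT 10"

This surfaces the API calls that fail most often, which is a common first question during an incident review.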
Step-by-Step AWS Centralized Logging Setup
Let's walk through implementing a CloudWatch-based centralized logging solution, which offers the best balance of simplicity and power for most teams.
Step 1: Configuring Comprehensive Log Collection Across Services
First, ensure all your AWS services are sending logs to CloudWatch:
# Example: Configure CloudWatch agent on EC2 instance
sudo yum install -y amazon-cloudwatch-agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json
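A quick way to confirm the agent actually started (a stopped agent is the most common reason EC2 logs never arrive):

# Should report "status": "running"
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status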
For Lambda functions, update your function configuration:
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: function/
      Handler: app.handler
      Runtime: nodejs18.x  # nodejs14.x is deprecated; use a currently supported runtime
      Tracing: Active
      Policies:
        - CloudWatchLambdaInsightsExecutionRolePolicy
      Layers:
        - !Sub "arn:aws:lambda:${AWS::Region}:580247275435:layer:LambdaInsightsExtension:14"
Step 2: Optimizing Log Retention for Compliance and Cost Balance
Manage your log retention periods to balance cost and compliance:
# Set 30-day retention for production logs
aws logs put-retention-policy --log-group-name "/aws/lambda/production-api" --retention-in-days 30
# Set 7-day retention for development logs
aws logs put-retention-policy --log-group-name "/aws/lambda/development-api" --retention-in-days 7
Step 3: Implementing Real-time Log Processing with Subscriptions
To forward logs to other services:
# Create subscription filter to send logs to a Lambda processor
aws logs put-subscription-filter \
--log-group-name "/aws/lambda/api-gateway-logs" \
--filter-name "ErrorProcessor" \
--filter-pattern "ERROR" \
--destination-arn "arn:aws:lambda:us-east-1:123456789012:function:LogProcessor"
Step 4: Building Actionable Dashboards for Operational Visibility
Build dashboards for your most important metrics. Because the Logs Insights query contains single quotes, pass the dashboard body as a file instead of wrestling with shell escaping. Save this as dashboard.json:

{
  "widgets": [
    {
      "type": "log",
      "x": 0,
      "y": 0,
      "width": 24,
      "height": 6,
      "properties": {
        "query": "SOURCE '/aws/lambda/production-api' | fields @timestamp, @message\n| filter @message like /ERROR/\n| sort @timestamp desc\n| limit 20",
        "region": "us-east-1",
        "title": "Recent API Errors",
        "view": "table"
      }
    }
  ]
}

Then create the dashboard:

aws cloudwatch put-dashboard \
  --dashboard-name "ServiceHealthOverview" \
  --dashboard-body file://dashboard.json
Step 5: Scaling to Enterprise: Multi-Account Logging Strategies
For multi-account setups:
- In the destination account, create a CloudWatch Logs destination
- Set up IAM permissions to allow the source account to write to it
- Create subscription filters in the source accounts pointing to the destination
Here's a simplified version of the access policy required (note this is a resource policy attached to the destination, not a standard IAM policy):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::SOURCE_ACCOUNT_ID:root"
      },
      "Action": "logs:PutSubscriptionFilter",
      "Resource": "arn:aws:logs:REGION:DESTINATION_ACCOUNT_ID:destination:DESTINATION_NAME"
    }
  ]
}
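For reference, the destination itself is created in the logging account and points at a Kinesis data stream. A sketch with placeholder names, assuming the policy above is saved as destination-policy.json:

# In the destination account: create the destination, then attach the policy above
aws logs put-destination \
  --destination-name "DESTINATION_NAME" \
  --target-arn "arn:aws:kinesis:REGION:DESTINATION_ACCOUNT_ID:stream/central-log-stream" \
  --role-arn "arn:aws:iam::DESTINATION_ACCOUNT_ID:role/CWLtoKinesisRole"

aws logs put-destination-policy \
  --destination-name "DESTINATION_NAME" \
  --access-policy file://destination-policy.json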
Advanced Techniques to Take Your AWS Logging to the Next Level
Once you have the basics running, consider these advanced techniques:
Leveraging Kinesis Firehose for Flexible Log Delivery and Storage
For more flexibility in where your logs end up, Amazon Kinesis Data Firehose is a game-changer:
# Create a Firehose delivery stream to S3
aws firehose create-delivery-stream \
--delivery-stream-name "centralized-logs-to-s3" \
--delivery-stream-type DirectPut \
--s3-destination-configuration \
"RoleARN=arn:aws:iam::123456789012:role/FirehoseS3DeliveryRole,\
BucketARN=arn:aws:s3:::centralized-logs-bucket,\
Prefix=logs/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/,\
ErrorOutputPrefix=errors/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/!{firehose:error-output-type}/,\
BufferingHints={SizeInMBs=128,IntervalInSeconds=300}"
# Set up CloudWatch to send logs to Firehose
aws logs put-subscription-filter \
--log-group-name "/aws/lambda/production-api" \
--filter-name "SendToFirehose" \
--filter-pattern "" \
--destination-arn "arn:aws:firehose:us-east-1:123456789012:deliverystream/centralized-logs-to-s3" \
--role-arn "arn:aws:iam::123456789012:role/CWLtoKinesisFirehoseRole"
Protecting Your Log Data: Essential Security Measures and Encryption
Protect your log data with these critical security measures:
Set up CloudTrail to record access to your log data.
Implement strict IAM policies for log access:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:StartQuery",
        "logs:GetQueryResults"
      ],
      "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:/aws/lambda/*",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalTag/Department": "SecurityTeam"
        }
      }
    }
  ]
}
Encrypt logs at rest:
# Enable encryption on CloudWatch Log Group
aws logs create-log-group \
--log-group-name "/aws/lambda/secure-service" \
--kms-key-id "arn:aws:kms:us-east-1:123456789012:key/abcd1234-ab12-cd34-ef56-abcdef123456"
Extracting Insights with CloudWatch Logs Insights
CloudWatch Logs Insights lets you run powerful queries across your logs:
fields @timestamp, @message
| filter @message like /Exception/
| parse @message "user: *, action: *" as user, action
| stats count(*) as exceptionCount by user, action
| sort exceptionCount desc
| limit 10
This query finds the top 10 user/action combinations causing exceptions.
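You can run the same queries from scripts. start-query returns a query ID that you poll with get-query-results; a minimal sketch against the production log group used earlier:

# Run an Insights query over the last hour (times are epoch seconds)
QUERY_ID=$(aws logs start-query \
  --log-group-name "/aws/lambda/production-api" \
  --start-time $(($(date +%s) - 3600)) \
  --end-time $(date +%s) \
  --query-string 'fields @timestamp, @message | filter @message like /Exception/ | limit 10' \
  --output text --query 'queryId')

# Poll until the status field reads "Complete"
aws logs get-query-results --query-id "$QUERY_ID"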
Automated Responses with EventBridge
Set up automated responses to specific log patterns:
- Create a CloudWatch Logs Metric Filter to detect patterns
- Set a CloudWatch Alarm on that metric
- Configure an EventBridge rule to trigger automated actions
For example, automatically restart a service when it logs specific error messages:
# Create metric filter
aws logs put-metric-filter \
--log-group-name "API-Gateway-Execution-Logs" \
--filter-name "5xxErrorFilter" \
--filter-pattern '{ $.status >= 500 && $.status <= 599 }' \
--metric-transformations \
metricName=5xxErrors,metricNamespace=APIGateway,metricValue=1
# Create alarm
aws cloudwatch put-metric-alarm \
--alarm-name "APIGateway5xxAlarm" \
--metric-name "5xxErrors" \
--namespace "APIGateway" \
--threshold 5 \
--comparison-operator GreaterThanOrEqualToThreshold \
--evaluation-periods 1 \
--period 60 \
--statistic Sum \
--alarm-actions "arn:aws:sns:us-east-1:123456789012:AlertTopic"
How to Enhance Visibility with Third-Party Platforms
While AWS's native tools are useful, specialized observability platforms provide more advanced capabilities.
Our platform, Last9, is designed specifically for high-cardinality observability at scale—essential as your AWS infrastructure expands. Unlike more expensive alternatives, we offer a managed observability solution that fits within your budget.
What sets our platform apart in AWS centralized logging is its integration with OpenTelemetry and Prometheus. This allows you to unify your metrics, logs, and traces for complete visibility.
Many teams rely on Last9 for real-time insights with correlated monitoring and alerting, an approach that companies like Probo, CleverTap, and Replit depend on for reliable monitoring of high-scale operations.
Other options that integrate well with AWS include:
- Grafana
- Dynatrace
- Sumo Logic
Optimization Strategies for AWS Centralized Logging
Logging costs can add up quickly. Here's how to keep them in check:
Strategic Log Retention: Balancing Compliance Requirements with Costs
Not all logs are equally valuable:
Log Type | Suggested Retention | Reasoning |
---|---|---|
Security Audit Logs | 1+ years | Compliance requirements, breach investigations |
Production Error Logs | 30-90 days | Troubleshooting, pattern analysis |
Debug/Verbose Logs | 3-7 days | Short-term debugging only |
Access Logs | 30 days | Traffic analysis, security investigations |
Reducing Volume While Maintaining Visibility
For busy services, consider sampling logs instead of recording everything:
import random

def lambda_handler(event, context):
    # Always log errors; sample successful events at roughly 10%
    if not event.get('success') or random.random() < 0.1:
        print(f"Event ID: {event['id']}, Status: {event['status']}")

    # ...normal request processing continues here regardless of sampling...
Log Compression Techniques for Long-term Archiving
When archiving logs to S3, use compression. If logs flow through Kinesis Data Firehose (as in the delivery stream created earlier), add CompressionFormat=GZIP to the S3 destination configuration; plain-text logs typically compress many times over. Once logs are safely archived, keep only a short CloudWatch retention so you aren't paying to store the same data twice:

# Keep just 1 day in CloudWatch once logs are archived to S3
aws logs put-retention-policy \
  --log-group-name "/aws/lambda/high-volume-service" \
  --retention-in-days 1
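On the S3 side, a lifecycle rule keeps archive costs falling over time. A sketch against the centralized-logs-bucket from the Firehose example, assuming a 90-day Glacier transition and one-year expiry (tune both to your compliance needs):

# Move archived logs to Glacier after 90 days, delete after a year
aws s3api put-bucket-lifecycle-configuration \
  --bucket "centralized-logs-bucket" \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [{ "Days": 90, "StorageClass": "GLACIER" }],
      "Expiration": { "Days": 365 }
    }]
  }'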
Troubleshooting Your AWS Centralized Logging Setup
Even well-designed logging systems run into issues. Here's how to tackle the most common problems:
Solving Disappearing Data: Fixing Missing Log Issues
If logs aren't showing up where expected:
- Check IAM permissions for the logging services
- Verify the CloudWatch agent is running (for EC2 instances)
- Look for throttling in the AWS CloudTrail logs
- Confirm log group names match exactly what you're querying
Example fix for IAM permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}
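Once permissions look right, confirm events are actually arriving by checking when each stream last received data; a quick sketch:

# Show the five most recently active streams and their last ingestion time
aws logs describe-log-streams \
  --log-group-name "/aws/lambda/production-api" \
  --order-by "LastEventTime" \
  --descending \
  --max-items 5 \
  --query 'logStreams[].[logStreamName,lastIngestionTime]' \
  --output table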
Reducing Log Delivery Delays: Addressing High Latency Problems
If logs are delayed:
- Look for network bottlenecks between services
- Check if you're hitting CloudWatch API limits
- Consider batching log writes more efficiently
- Check the AWS Health Dashboard for CloudWatch service issues
Standardizing Formats and Fixing Parsing Errors
When logs aren't parsing correctly:
- Standardize log formats across services
- Use structured logging (JSON format) where possible
- Create test cases for your parsing logic
- Update filter patterns to match the actual format
Example of good structured logging in Node.js:
// Structured logging: one JSON object per event, trivially parseable downstream
const logger = require('pino')();
const { performance } = require('perf_hooks');

function processRequest(req) {
  logger.info({
    requestId: req.id,
    user: req.user,
    action: req.action,
    duration: performance.now() - req.startTime,
    status: "success"
  });
}
Controlling Unexpected Logging Costs
If your logging costs spike:
- Look for runaway logging in specific services (see the command below)
- Check for recursive logging patterns (logs about logs)
- Review and adjust retention periods
- Implement log sampling for high-volume, low-value logs
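To find those runaway services, sort log groups by stored bytes; the biggest offenders are usually obvious:

# List the ten largest log groups by stored data
aws logs describe-log-groups \
  --query 'sort_by(logGroups, &storedBytes)[-10:].[logGroupName,storedBytes]' \
  --output table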
How to Set Up Effective Logging Alerts
Proactive monitoring is crucial for effective operations:
Creating Smart Alert Thresholds
Set up graduated alerting based on severity:
Log Pattern | Threshold | Action | Priority |
---|---|---|---|
5xx Errors | >10 in 5 min | Slack notification | Low: Investigate during work hours |
5xx Errors | >50 in 5 min | PagerDuty alert | Medium: Investigate within 1 hour |
5xx Errors | >200 in 5 min | Incident declared | High: Immediate response |
Auth Failures | >20 in 10 min | Security team alert | Medium: Investigate within 1 hour |
Latency >2s | >5% of requests | Performance team notification | Low: Investigate during work hours |
Setting Up CloudWatch Alarms for Log Metrics
# Create metric filter for critical errors
aws logs put-metric-filter \
--log-group-name "/aws/lambda/payment-processor" \
--filter-name "CriticalErrorFilter" \
--filter-pattern "ERROR" \
--metric-transformations \
metricName=CriticalErrors,metricNamespace=PaymentService,metricValue=1
# Create alarm with SNS notification
aws cloudwatch put-metric-alarm \
--alarm-name "PaymentProcessorCriticalErrors" \
--comparison-operator GreaterThanOrEqualToThreshold \
--evaluation-periods 1 \
--metric-name "CriticalErrors" \
--namespace "PaymentService" \
--period 300 \
--statistic Sum \
--threshold 5 \
--alarm-description "Alarm when 5 or more critical errors occur within 5 minutes" \
--alarm-actions "arn:aws:sns:us-east-1:123456789012:AlertsTopic"
Conclusion
To wrap things up, setting up centralized logging in AWS helps bring clarity to your log management, making it easier to troubleshoot and monitor your cloud environment.
By combining the right AWS services and configurations, you’ll simplify the process and gain valuable insights. As your system grows, centralized logging will save you time and keep things running smoothly.
FAQs
Q: How much does AWS centralized logging cost? A: The cost varies based on volume and retention. For a medium-sized application, expect $200-500/month using native AWS services. Costs can be higher with third-party tools, but often come with more capabilities.
Q: Can I use AWS centralized logging for compliance requirements? A: Yes, with proper configuration. For regulations like HIPAA, PCI-DSS, or SOC2, you'll need to set appropriate retention periods, encryption settings, and access controls.
Q: How do I handle log data across multiple AWS regions? A: You can either set up regional log aggregation points or forward all logs to a central region. The best approach depends on your latency requirements and data residency needs.
Q: What's the best way to handle PII in logs? A: Implement log scrubbing before centralization. Use regex patterns to identify and mask sensitive data like credit cards, emails, and personal information.
Q: How do I monitor the health of my logging system itself? A: Set up metrics on log volume, ingestion rates, and query performance. Alert on unexpected drops in log volume, which often indicate logging failures.
Q: Can serverless applications use the same centralized logging approach? A: Yes, though the implementation differs slightly. Lambda functions automatically integrate with CloudWatch Logs, but you may need custom code for forwarding to other destinations.