EC2 Monitoring: A Practical Guide for AWS Engineers

Monitoring your EC2 instances shouldn’t be complicated or exhausting. Yet, too often, engineers find themselves troubleshooting issues in the middle of the night, searching for the root cause of an unexpected failure.

Whether you're managing a few instances or hundreds spread across multiple regions, effective EC2 monitoring helps you stay ahead of problems instead of constantly reacting to them. And if you've ever dealt with a critical alert at an inconvenient hour, you know how important that is.

This guide breaks down EC2 monitoring into clear, practical steps—no jargon, just straightforward advice to help you keep your systems running smoothly.

Why EC2 Monitoring Matters

EC2 monitoring isn't just about knowing if your instances are running—it's about understanding their behavior, predicting problems before they happen, and making sure you're not burning cash on underutilized resources.

Here's what proper EC2 monitoring gives you:

Early problem detection: Catch issues while they're minor annoyances, not full-blown outages
Performance insights: Know when it's time to scale up, down, or change instance types
Cost control: Identify instances that are costing more than they should
Security awareness: Spot unusual behavior that might indicate compromise
Sleep: Perhaps the most valuable commodity in tech

💡

If your EC2 instances rely on Redis, keeping an eye on its performance is just as important. Learn more in our Redis Metrics Monitoring guide.

Step-by-Step Process for Setting Up EC2 Monitoring

AWS provides basic monitoring out of the box—here's how to make the most of it.

Step 1: Enable Detailed Monitoring

By default, EC2 comes with basic monitoring that sends metrics to CloudWatch every 5 minutes. Detailed monitoring bumps this to every 1 minute.

# Enable detailed monitoring on a new instance
aws ec2 run-instances --image-id ami-0abcdef1234567890 --instance-type t2.micro --monitoring Enabled=true

# Enable detailed monitoring on an existing instance
aws ec2 monitor-instances --instance-ids i-1234567890abcdef0

Is it worth the extra cost? That depends. For production workloads, critical systems, or anything that needs rapid response to problems, absolutely. For dev environments or non-critical systems, maybe not.

Step 2: Set Up Basic CloudWatch Alarms

Now that you've got metrics flowing, let's make sure you're alerted when things go sideways:

Navigate to the CloudWatch console
Select "Alarms" then "Create Alarm"
Choose the EC2 instance and metric (e.g., CPU Utilization)
Set appropriate thresholds (e.g., CPU > 80% for 5 minutes)
Add notification actions (SNS topic, email, etc.)

Pro tip: Don't set thresholds too low, or you'll be drowning in false positives faster than you can say "alert fatigue."
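The console steps above can also be scripted. The sketch below builds the arguments for CloudWatch's PutMetricAlarm API for the "CPU > 80% for 5 minutes" example. The alarm name format, the instance ID, and the SNS topic ARN are placeholders chosen for illustration, and the final boto3 call is left as a comment because it needs credentials.

```python
import json


def build_cpu_alarm(instance_id: str, sns_topic_arn: str, threshold: float = 80.0) -> dict:
    """Return keyword arguments for CloudWatch's PutMetricAlarm API."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # evaluate one 5-minute window
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
        "TreatMissingData": "missing",
    }


if __name__ == "__main__":
    alarm = build_cpu_alarm(
        "i-1234567890abcdef0",
        "arn:aws:sns:us-east-1:123456789012:ops-alerts",  # placeholder topic
    )
    print(json.dumps(alarm, indent=2))
    # With boto3 installed and credentials configured:
    #   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

Keeping the alarm definition in a function like this makes it easy to apply the same threshold to every instance and to review threshold changes in version control.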

Step 3: Create a CloudWatch Dashboard

A well-organized dashboard lets you spot patterns at a glance:

In CloudWatch, go to "Dashboards" and create a new one
Add widgets for your most important metrics:
- CPU Utilization
- Network In/Out
- Disk Read/Write Operations
- Status Check Failures

Arrange them logically—group similar instances together or organize by environment (prod, staging, dev).
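If you prefer dashboards as code, the same layout can be generated. This is a minimal sketch that builds a dashboard body with one widget per metric group from the list above; the region, widget sizes, and function name are assumptions. Note that DiskReadOps and DiskWriteOps cover instance store volumes, so EBS-only instances may need different metrics.

```python
import json

# Widget title -> EC2 metric names in the AWS/EC2 namespace
WIDGET_METRICS = {
    "CPU Utilization": ["CPUUtilization"],
    "Network In/Out": ["NetworkIn", "NetworkOut"],
    "Disk Read/Write Operations": ["DiskReadOps", "DiskWriteOps"],
    "Status Check Failures": ["StatusCheckFailed"],
}


def build_dashboard_body(instance_ids, region="us-east-1"):
    """Return a JSON dashboard body with one full-width widget per metric group."""
    widgets = []
    for row, (title, metric_names) in enumerate(WIDGET_METRICS.items()):
        metrics = [
            ["AWS/EC2", name, "InstanceId", instance_id]
            for name in metric_names
            for instance_id in instance_ids
        ]
        widgets.append({
            "type": "metric",
            "x": 0, "y": row * 6, "width": 24, "height": 6,
            "properties": {
                "title": title,
                "metrics": metrics,
                "region": region,
                "stat": "Average",
                "period": 300,
            },
        })
    return json.dumps({"widgets": widgets})


if __name__ == "__main__":
    body = build_dashboard_body(["i-1234567890abcdef0"])
    print(body)
    # With boto3 installed and credentials configured:
    #   boto3.client("cloudwatch").put_dashboard(
    #       DashboardName="ec2-overview", DashboardBody=body)
```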

💡

If your EC2 instances power APIs, keeping them reliable is crucial. Check out our Top 11 API Monitoring Tools to find the right solution.

The Essential EC2 Metrics You Should Monitor

Not all metrics are created equal. Here are the ones that matter:

System-Level Metrics

Metric	What It Tells You	Warning Signs
CPU Utilization	How hard your instance is working	Sustained periods above 80%
Memory Usage*	How much RAM is in use (requires custom metric)	Consistently above 85%
Disk Space*	Free space on your volumes (requires custom metric)	Less than 20% free space
Network In/Out	Data transfer volume	Sudden spikes or drops

Health Metrics

Metric	What It Tells You	Warning Signs
Status Check (System)	Hardware/AWS issues	Any failures
Status Check (Instance)	OS-level problems	Any failures
Instance State	Whether the instance is running	Unexpected state changes

Load & Performance Metrics

Metric	What It Tells You	Warning Signs
EBS Volume Queue Length	Backup of I/O operations	Consistently above 1
CPU Credit Balance (for burstable instances)	Available burst capacity	Approaching zero
Network Packet Loss	Connection quality	Any non-zero values

💡

For more control over your EC2 monitoring, custom metrics can help track exactly what matters. Learn how to set them up in our AWS CloudWatch Custom Metrics Guide.

Advanced EC2 Monitoring

Basic metrics can only tell you so much. For real visibility, you need to dig deeper.

Step 1: Install the CloudWatch Agent

The CloudWatch agent lets you collect system-level metrics that aren't available by default:

# Install the CloudWatch agent on Amazon Linux 2
sudo yum install amazon-cloudwatch-agent -y

# Create a basic config file
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard

# Start the agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json

A minimal config should include:

Memory usage
Disk space utilization
Swap usage
Key process metrics
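As a rough illustration of that minimal config, the sketch below generates the "metrics" section of an agent configuration covering memory, disk, swap, and per-process stats. The process names and the choice of the root filesystem are assumptions; adjust them to your instances before writing the result to the agent's config file.

```python
import json


def build_agent_metrics_config(process_names, namespace="CWAgent"):
    """Return a CloudWatch agent config dict with memory, disk, swap and process metrics."""
    return {
        "metrics": {
            "namespace": namespace,
            # Tag every metric with the instance it came from
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
                "disk": {"measurement": ["used_percent"], "resources": ["/"]},
                "swap": {"measurement": ["swap_used_percent"]},
                "procstat": [
                    {"exe": name, "measurement": ["cpu_usage", "memory_rss"]}
                    for name in process_names
                ],
            },
        }
    }


if __name__ == "__main__":
    # "nginx" and "node" are example process names, not requirements
    print(json.dumps(build_agent_metrics_config(["nginx", "node"]), indent=2))
```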

Step 2: Set Up Custom Metrics

Some things are specific to your application. Custom metrics help you track what matters to your business:

# Using the AWS CLI to publish a custom metric
aws cloudwatch put-metric-data --metric-name ActiveUsers --namespace MyApplication --value 42

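// Or from within your application code using the AWS SDK for JavaScript (v2)
const AWS = require('aws-sdk');
const cloudwatch = new AWS.CloudWatch();

// In SDK v2 the request is only sent once you pass a callback or call .promise()
cloudwatch.putMetricData({
  Namespace: 'MyApplication',
  MetricData: [
    { MetricName: 'ActiveUsers', Value: 42, Unit: 'Count' }
  ]
}).promise()
  .then(() => console.log('Metric published'))
  .catch((err) => console.error('Failed to publish metric', err));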

Good candidates for custom metrics:

Application-specific counters (users, transactions, etc.)
Business metrics (checkout completions, signups)
Application health indicators (error rates, response times)

Step 3: Implement Log Monitoring

Metrics tell you what's happening; logs tell you why.

Set up metric filters to convert log events to metrics:

# Pattern to match PHP fatal errors
"PHP Fatal error:"
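To turn that pattern into a metric programmatically, something like the following sketch could work. It builds the arguments for the CloudWatch Logs PutMetricFilter API; the log group, filter name, metric name, and namespace are hypothetical. The pattern is wrapped in double quotes so it matches the exact phrase rather than the individual words.

```python
def build_php_fatal_filter(log_group_name: str) -> dict:
    """Return keyword arguments for the CloudWatch Logs PutMetricFilter API."""
    return {
        "logGroupName": log_group_name,
        "filterName": "php-fatal-errors",          # hypothetical name
        "filterPattern": '"PHP Fatal error:"',     # quoted: match the exact phrase
        "metricTransformations": [{
            "metricName": "PhpFatalErrors",        # hypothetical metric name
            "metricNamespace": "MyApplication",
            "metricValue": "1",                    # count one per matching log event
            "defaultValue": 0.0,                   # report 0 when nothing matches
        }],
    }


if __name__ == "__main__":
    params = build_php_fatal_filter("php-error-logs")  # placeholder log group
    print(params)
    # With boto3 installed and credentials configured:
    #   boto3.client("logs").put_metric_filter(**params)
```

Once the filter exists, the resulting metric can be alarmed on like any other CloudWatch metric.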

Create CloudWatch Logs Insights queries for common issues:

# Find error patterns
filter @message like /(?i)(error|exception|failed|failure|timeout)/
| stats count(*) by bin(30m)

# Track specific events
filter @message like "Database connection"
| stats count(*) as connectionAttempts by bin(5m)

Configure the CloudWatch agent to collect logs:

{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/messages",
            "log_group_name": "system-logs",
            "log_stream_name": "{instance_id}"
          },
          {
            "file_path": "/var/log/nginx/error.log",
            "log_group_name": "nginx-error-logs",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  }
}

4 Popular EC2 Monitoring Tools

AWS's built-in tools are a good start, but third-party solutions offer more power and flexibility.

Here's how they stack up:

Last9: The Telemetry Data Platform for Cloud-Native Monitoring

Why Last9?

Trusted by industry leaders like Games24x7, CleverTap, and Replit.
Optimized for cloud-native environments, balancing performance, cost, and user experience.
Seamless integration with OpenTelemetry, Prometheus, and other observability tools.
Unifies metrics, logs, and traces, efficiently handling high-cardinality data.
Smart alerting and real-time insights via the Last9 Control Plane for proactive monitoring.

Best for:

Engineering teams managing complex distributed systems that require deep visibility without added complexity.

Datadog: The Feature-Rich Option

Strengths:

Comprehensive coverage across AWS services
Strong APM capabilities
Extensive integration library
Advanced dashboarding

Best for: Enterprises with diverse technology stacks and dedicated monitoring teams.

New Relic: The APM Specialist

Strengths:

Deep code-level visibility
Strong focus on application performance
Good EC2 resource monitoring
Robust alerting system

Best for: Development teams focused on application performance optimization.

Prometheus + Grafana: The Open-Source Combo

Strengths:

Complete control and customization
No per-host or per-metric fees
Powerful query language (PromQL)
Highly extensible

Best for: Budget-conscious teams with in-house monitoring expertise.

💡

If you're considering Datadog for EC2 monitoring, understanding its pricing is key. Get a detailed breakdown in our Datadog Pricing Guide.

How to Optimize Costs with Smarter EC2 Monitoring

Monitoring isn't just for reliability—it's for keeping your AWS bill in check too.

Identifying Underutilized Instances

Create a CloudWatch dashboard that highlights resource efficiency:

Add metrics for CPU utilization (average, min, max)
Include memory usage if you're using the CloudWatch agent
Set up a weekly report showing instances with consistently low utilization

For burstable instances (T2/T3/T4g), monitor credit balances. If the balance stays near its maximum, the instance rarely bursts, and a smaller size may be enough.
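For the weekly report, "consistently low" needs a concrete definition. One possible reading, sketched below, is that the daily average CPU stayed under a threshold on every day of the window. The 20% threshold and 14-day window are assumptions borrowed from the right-sizing numbers used in this guide, and the input is assumed to be daily averages you have already pulled from CloudWatch.

```python
def consistently_underutilized(daily_cpu_averages, threshold=20.0, min_days=14):
    """Return IDs of instances whose daily average CPU was below `threshold` every day.

    `daily_cpu_averages` maps an instance ID to a list of daily average
    CPU percentages. Instances with fewer than `min_days` of data are skipped.
    """
    flagged = []
    for instance_id, days in sorted(daily_cpu_averages.items()):
        if len(days) >= min_days and max(days) < threshold:
            flagged.append(instance_id)
    return flagged


if __name__ == "__main__":
    sample = {
        "i-idle": [5.0] * 14,              # quiet every day
        "i-spiky": [5.0] * 13 + [60.0],    # one busy day, so not flagged
        "i-new": [5.0] * 3,                # not enough history yet
    }
    print(consistently_underutilized(sample))
```

Using the maximum of the daily averages rather than the overall mean avoids flagging instances that are idle most days but busy on a schedule.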

Right-sizing Recommendations

Use AWS Cost Explorer's Resource Optimization to get automatic suggestions, or build your own logic:

# Basic right-sizing logic; inputs are two-week average utilization in percent
def right_size(cpu_util, memory_util):
    """Return a recommendation for one instance: 'downsize', 'upsize', or 'keep'."""
    if cpu_util < 20 and memory_util < 30:
        return "downsize"
    if cpu_util > 80 or memory_util > 80:
        return "upsize"
    return "keep"

# Example: an instance averaging 12% CPU and 25% memory
print(right_size(12, 25))  # downsize

Automating Instance Scheduling

Not all instances need to run 24/7. Use EC2 monitoring data to identify patterns, then implement scheduling:

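# Create an EventBridge (CloudWatch Events) rule to stop dev instances at 20:00 UTC on weekdays
aws events put-rule --name "StopDevInstancesNightly" --schedule-expression "cron(0 20 ? * MON-FRI *)"

# Allow the rule to invoke the Lambda function
aws lambda add-permission --function-name StopEC2Instances --statement-id StopDevInstancesNightly --action lambda:InvokeFunction --principal events.amazonaws.com --source-arn arn:aws:events:region:account-id:rule/StopDevInstancesNightly

# Add the Lambda function as the rule's target
aws events put-targets --rule "StopDevInstancesNightly" --targets "Id"="1","Arn"="arn:aws:lambda:region:account-id:function:StopEC2Instances"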
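The rule above targets a Lambda function named StopEC2Instances, which the guide does not show. A minimal sketch of what such a function might look like follows. It assumes dev instances carry an Environment=dev tag, which is a convention invented for this example, and it takes the EC2 client as a parameter so the logic can be exercised without AWS access.

```python
def stop_dev_instances(ec2):
    """Stop every running instance tagged Environment=dev; return the IDs stopped."""
    paginator = ec2.get_paginator("describe_instances")
    pages = paginator.paginate(Filters=[
        {"Name": "tag:Environment", "Values": ["dev"]},       # assumed tagging scheme
        {"Name": "instance-state-name", "Values": ["running"]},
    ])
    instance_ids = [
        instance["InstanceId"]
        for page in pages
        for reservation in page["Reservations"]
        for instance in reservation["Instances"]
    ]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
    return instance_ids


def lambda_handler(event, context):
    """Lambda entry point invoked by the scheduled rule."""
    import boto3  # provided by the AWS Lambda Python runtime
    return {"stopped": stop_dev_instances(boto3.client("ec2"))}
```

The function's execution role would need permission to describe and stop instances, and a matching morning rule is needed if the instances should start again automatically.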

Performance Tuning Based on EC2 Monitoring Data

Monitoring becomes truly valuable when you use it to improve performance.

Step 1: Establish Performance Baselines

Before tuning, know what "normal" looks like:

Collect at least two weeks of data across various load conditions
Calculate percentiles (p50, p90, p99) for key metrics
Document these baselines for comparison
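Percentiles are straightforward to compute once the samples are exported. The sketch below uses Python's standard library; the function name and the idea of passing in a plain list of samples (for example, response times or CPU readings) are assumptions for illustration.

```python
from statistics import quantiles


def baseline(samples):
    """Return the p50, p90 and p99 of a list of metric samples."""
    # n=100 yields 99 cut points; cut point k (1-based) is the k-th percentile
    cuts = quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p90": cuts[89], "p99": cuts[98]}


if __name__ == "__main__":
    cpu_samples = [12, 15, 14, 80, 22, 18, 17, 95, 16, 13, 19, 21]
    print(baseline(cpu_samples))
```

Comparing p50 with p99 shows how far the worst periods sit from a typical one, which an average alone hides.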

Step 2: Identify Bottlenecks

Use your monitoring data to spot constraints:

CPU-bound? Look for high CPU utilization but low memory/disk/network usage
Memory-bound? Watch for high swap usage or OOM errors in logs
I/O-bound? Check EBS volume queue length and I/O operations
Network-bound? Monitor network throughput against instance limits
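The checklist above can be expressed as a small classifier, sketched here. The CPU and queue-length cutoffs follow the warning signs listed earlier in this guide; the swap and network cutoffs are assumptions picked for the example and should be tuned against your own baselines.

```python
def classify_bottleneck(cpu_pct, swap_used_pct, ebs_queue_length, network_pct_of_limit):
    """Return the list of likely constraints for one instance, in checklist order."""
    findings = []
    if cpu_pct > 80:                 # sustained high CPU
        findings.append("cpu")
    if swap_used_pct > 10:           # assumed cutoff: noticeable swap use
        findings.append("memory")
    if ebs_queue_length > 1:         # I/O requests are backing up
        findings.append("io")
    if network_pct_of_limit > 80:    # assumed cutoff: close to the instance limit
        findings.append("network")
    return findings


if __name__ == "__main__":
    print(classify_bottleneck(cpu_pct=95, swap_used_pct=0,
                              ebs_queue_length=0.2, network_pct_of_limit=10))
```

An empty result suggests the slowdown lies outside these four resources, for example in a downstream dependency.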

Step 3: Implement and Verify Improvements

Make one change at a time and measure the impact:

Modify the potential bottleneck (instance type, EBS volume type, etc.)
Monitor the targeted metrics for 24-48 hours
Compare against your baseline
Document the improvement (or rollback if ineffective)

Real-world example: An e-commerce site was seeing slow response times during peak hours. EC2 monitoring showed CPU utilization spiking to 100%, while memory usage stayed below 40%. Because the workload was CPU-bound, moving from a general-purpose to a compute-optimized instance with more CPU power reduced response times by 62%.

💡

Choosing the right tools for EC2 monitoring is essential. Explore our Best Infrastructure Monitoring Tools to find the best fit for your needs.

Advanced EC2 Monitoring Techniques for Power Users

Try these advanced techniques:

Custom CloudWatch Composite Alarms

Instead of simple threshold-based alarms, create composite conditions:

# Create a composite alarm that triggers only when both CPU and memory are high
aws cloudwatch put-composite-alarm \
  --alarm-name HighResourceUtilization \
  --alarm-rule "(ALARM(HighCPUAlarm) AND ALARM(HighMemoryAlarm))"

This reduces false positives by ensuring multiple conditions are met before alerting.

EC2 Instance Group Monitoring

Monitor groups of related instances together:

Create a CloudWatch dashboard with metrics aggregated across instance groups
Set up alarms on the aggregate metrics

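Use metric math with a search expression to aggregate across a group, for example all t3.large instances (per-instance-type metrics require detailed monitoring):

SEARCH('{AWS/EC2,InstanceType} MetricName="CPUUtilization" InstanceType="t3.large"', 'Average', 300)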

Synthetic Canary Monitoring

Don't just monitor the infrastructure—test the user experience:

Create a CloudWatch Synthetics canary that simulates user actions
Schedule it to run every few minutes
Alert on failures or performance degradation

# Create a simple canary using the AWS CLI
# (upload the zipped script to S3 first; use any currently supported runtime version)
aws synthetics create-canary \
  --name api-test-canary \
  --artifact-s3-location s3://my-bucket/canary/artifacts \
  --execution-role-arn arn:aws:iam::account-id:role/CanaryRole \
  --schedule Expression="rate(5 minutes)" \
  --run-config TimeoutInSeconds=60 \
  --runtime-version syn-nodejs-puppeteer-9.1 \
  --code Handler=index.handler,S3Bucket=my-bucket,S3Key=canary/canary-script.zip

💡

Monitoring EC2 instances is just one piece of the puzzle. Synthetic Monitoring helps you test performance and catch issues before users do.

EC2 Monitoring Strategy: A Practical Example

Let's put it all together with a practical example for a mid-sized web application:

The Architecture

8 EC2 instances across 2 AZs (t3.large)
Application tier running Node.js
RDS for database
ElastiCache for session storage

The Monitoring Setup

Basic Infrastructure Monitoring
- CloudWatch detailed monitoring enabled
- Status checks with automated recovery actions
- EBS volume performance metrics
Application-Specific Monitoring
- CloudWatch agent collecting custom metrics:
  - Request rate, response time, error rate
  - Node.js event loop lag
  - Heap usage, garbage collection stats
- Log monitoring for error patterns
User Experience Monitoring
- Synthetic transactions for critical user flows
- Real user monitoring via client-side instrumentation
Alerting Strategy
- P1 alerts (immediate response): Instance failures, severe performance degradation
- P2 alerts (business hours): Elevated error rates, resource constraints
- Weekly performance reviews using collected data

The Results

72% reduction in mean time to detection (MTTD)
45% fewer false-positive alerts
30% improvement in instance resource utilization
Zero unexpected outages in 6 months

Conclusion

Start with the basics, gradually add complexity as needed, and always tie your monitoring strategy to business outcomes.

Remember these key principles:

Monitor what matters to your users and your business
Alert on symptoms, not causes
Automate routine monitoring tasks
Use monitoring data to drive continuous improvement
Choose tools that fit your team's skills and workflow

💡

What's your biggest EC2 monitoring challenge? Drop it in our Discord community, where we talk cloud monitoring, trade war stories, and share those hard-earned lessons that never make it into the AWS documentation.

FAQs

What's the difference between basic and detailed EC2 monitoring?

Basic monitoring sends metrics to CloudWatch every 5 minutes and is free. Detailed monitoring increases the frequency to every 1 minute and costs extra, but provides more timely data for alerting and auto-scaling.

Do I need to install anything on my EC2 instances for monitoring?

For basic metrics like CPU and network, no. For memory, disk space, and application-specific metrics, you'll need to install the CloudWatch agent or a third-party monitoring agent.

How much does EC2 monitoring cost?

It varies based on your setup. Basic CloudWatch metrics are free, detailed monitoring costs about $2.10 per instance per month, and custom metrics cost $0.30 per metric per month. Third-party tools typically charge per host or per metric.
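As a quick worked example using the figures quoted above (which vary by region and volume tier), here is a small estimator. The instance and metric counts are made up for illustration, and the calculation ignores alarms, logs, dashboards, and free-tier allowances.

```python
DETAILED_MONITORING_CENTS = 210  # about $2.10 per instance per month, as quoted above
CUSTOM_METRIC_CENTS = 30         # $0.30 per custom metric per month, as quoted above


def monthly_metric_cost(instances, custom_metrics_per_instance):
    """Estimate monthly CloudWatch metric charges in dollars."""
    cents = instances * (
        DETAILED_MONITORING_CENTS
        + custom_metrics_per_instance * CUSTOM_METRIC_CENTS
    )
    return cents / 100


if __name__ == "__main__":
    # 8 instances with detailed monitoring and 5 custom metrics each:
    # 8 * (2.10 + 5 * 0.30) = 8 * 3.60 = 28.80
    print(f"${monthly_metric_cost(8, 5):.2f}")
```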

Can I monitor Windows EC2 instances the same way as Linux?

Yes, though the CloudWatch agent configuration differs slightly. Windows instances also have some Windows-specific metrics like available memory and page file usage.

What's the best EC2 monitoring tool for a small startup?

Start with CloudWatch and the CloudWatch agent for basic needs. As you grow, consider Last9 for its balance of powerful features and user-friendliness without requiring a dedicated monitoring team.

How do I monitor instances across multiple AWS accounts?

Use AWS Organizations and CloudWatch cross-account observability, or implement a third-party solution like Last9 that supports multi-account monitoring natively.

Can I reduce CloudWatch costs while still maintaining good monitoring?

Yes, by being selective about which metrics you collect and at what frequency. Focus on critical metrics at 1-minute intervals and less important ones at 5-minute intervals. Use metric math instead of custom metrics where possible.

How do I correlate EC2 issues with application problems?

Implement distributed tracing with services like X-Ray or third-party APM tools that can connect infrastructure metrics to application performance. This helps identify whether application slowness stems from code issues or resource constraints.
