Last9

Apr 16th, ‘25 / 13 min read

Getting Started with Elastic Load Balancer (ELB) Metrics

Learn the key ELB metrics that help you monitor traffic, spot issues early, and keep your load balancers running smoothly in production.

As your infrastructure grows, keeping track of what your elastic load balancers are doing isn’t just useful—it’s necessary. Without clear visibility into this layer, it’s easy to miss bottlenecks, uneven traffic distribution, or failing targets.

This guide walks through the key ELB metrics that matter when you're running real workloads in production.

Elastic Load Balancer Statistics: Core Metrics Explained

Elastic load balancer statistics are the data points that show how your load balancers are performing. They're the numbers that tell you whether your traffic is flowing smoothly or if you've got bottlenecks that need attention.

These stats cover everything from request counts and latency to error rates and connection details. Think of them as your load balancer's vital signs—they let you know if things are healthy or if something's about to break.

💡
For a broader view of network behavior alongside ELB stats, you might find this guide on how to view and understand VPC Flow Logs useful.

Why Proactive ELB Statistics Tracking is Critical for System Reliability

You can't fix what you don't measure, and with load balancers sitting at the gateway of your infrastructure, flying blind here is particularly risky. Here's why tracking these numbers is worth your time:

  • Spot issues before users do – See performance drops before they affect your users
  • Right-size your resources – Know exactly when to scale up or down based on actual usage patterns
  • Budget better – Understand traffic patterns to predict future costs and avoid surprise bills
  • Sleep easier – Set up intelligent alerts that only wake you up when there's a real problem
  • Improve mean time to resolution – When incidents occur, having historical data speeds up troubleshooting
  • Optimize for peak traffic – Prepare your infrastructure for high-demand periods by understanding past patterns

Essential Elastic Load Balancer Metrics That Reveal System Health

Let's break down the stats that deserve your attention:

Crucial Request and Connection Metrics for Traffic Analysis

| Metric | What It Tells You | Why It Matters | Typical Warning Signs |
|---|---|---|---|
| RequestCount | Total number of requests processed per period | Traffic patterns and capacity planning | Sudden 2x+ increases without explanation |
| ActiveConnectionCount | Number of concurrent connections currently open | Real-time load assessment and connection pool health | Steady climb without a corresponding increase in requests |
| NewConnectionCount | New connections established per period | Growth trends and sudden spikes | Rapid oscillation patterns indicating connection issues |
| RejectedConnectionCount | Connections refused due to capacity limits | Early warning of capacity issues | Any non-zero values during normal operation |
| SurgeQueueLength | Requests waiting for processing | Backlog size and potential timeout risks | Consistently above zero with an increasing trend |

These metrics show you the raw volume your load balancers are handling. Sudden spikes might indicate legitimate traffic surges (like during a marketing campaign), but they could also signal potential DDoS attacks or misconfigured clients hammering your endpoints.

Pay special attention to the ratio between ActiveConnectionCount and RequestCount. A high number of connections with low request counts often indicates connection pooling issues in your clients.
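One way to watch this ratio continuously is CloudWatch metric math. Here's a sketch with boto3; the `app/...` load balancer value and the helper names are hypothetical placeholders, and `fetch_ratio` requires AWS credentials:

```python
def ratio_queries(alb):
    """Build get_metric_data queries for ActiveConnectionCount / RequestCount."""
    def metric(mid, name):
        return {
            "Id": mid,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ApplicationELB",
                    "MetricName": name,
                    # ALB dimension value is the 'app/<name>/<id>' ARN suffix
                    "Dimensions": [{"Name": "LoadBalancer", "Value": alb}],
                },
                "Period": 300,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }
    return [
        metric("m1", "ActiveConnectionCount"),
        metric("m2", "RequestCount"),
        # Metric math: connections per request; persistently high values
        # suggest clients aren't reusing connections (broken pooling)
        {"Id": "ratio", "Expression": "m1 / m2",
         "Label": "ConnectionsPerRequest", "ReturnData": True},
    ]

def fetch_ratio(alb):  # requires AWS credentials; illustrative only
    import boto3
    from datetime import datetime, timedelta
    cw = boto3.client("cloudwatch")
    resp = cw.get_metric_data(
        MetricDataQueries=ratio_queries(alb),
        StartTime=datetime.utcnow() - timedelta(hours=3),
        EndTime=datetime.utcnow(),
    )
    return resp["MetricDataResults"][0]["Values"]
```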

💡
If you're setting up AWS traffic routing end to end, this CloudFront basics and setup guide can help you connect the dots.

Performance Metrics That Impact User Experience

| Metric | What It Tells You | Why It Matters | Target Values |
|---|---|---|---|
| Latency (overall) | Total time to process requests end-to-end | Direct impact on user-perceived performance | Web: <200ms, API: <100ms |
| TargetResponseTime | Time spent waiting for backend servers | Backend service health and database performance | Should be 70-80% of total latency |
| ProcessedBytes | Data volume processed by the load balancer | Bandwidth usage planning and cost forecasting | Monitor for sudden changes vs. historical patterns |
| TLS Negotiation Time | Time spent establishing secure connections | Security overhead impact | Should be <50ms with session resumption enabled |
| RequestProcessingTime | Time the load balancer spends processing | ELB efficiency and potential configuration issues | Should be minimal (<10ms) |

Latency is particularly crucial—users notice when things get slow. Studies show that each 100ms of added latency can reduce conversion rates by up to 7%. A sudden increase here means it's time to start investigating immediately.

For e-commerce or financial applications, you should track the 95th and 99th percentile latency values, not just averages. A smooth average can hide terrible experiences for a significant portion of your users.
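To see why averages hide tail pain, here's a small standard-library sketch; the latency samples are made up:

```python
import statistics

def latency_summary(samples_ms):
    """Return mean, p95, and p99 for a list of latency samples (ms)."""
    # quantiles with n=100 yields the 1st..99th percentile cut points
    cuts = statistics.quantiles(samples_ms, n=100)
    return {
        "mean": statistics.mean(samples_ms),
        "p95": cuts[94],  # 95th percentile
        "p99": cuts[98],  # 99th percentile
    }

# 95 fast requests and 5 slow ones: the mean looks fine, the tail doesn't
samples = [50] * 95 + [2000] * 5
summary = latency_summary(samples)
```

Here the mean is 147.5ms while p99 is a full 2 seconds, which is exactly the gap that average-only dashboards hide.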

Health and Error Metrics: Your Early Warning System

These metrics serve as canaries in the coal mine for your application health:

| Metric | What It Tells You | Warning Threshold | Critical Threshold |
|---|---|---|---|
| HTTPCode_ELB_4XX_Count | Client errors (bad requests) | >5% of total requests | >10% of total requests |
| HTTPCode_ELB_5XX_Count | Server errors (your problem) | >0.1% of total requests | >1% of total requests |
| HTTPCode_Target_2XX_Count | Successful responses | <90% of total requests | <80% of total requests |
| HealthyHostCount | Number of backend servers passing health checks | <90% of total hosts | <70% of total hosts |
| UnHealthyHostCount | Number of failing backend servers | >10% of total hosts | >30% of total hosts |
| FailedHealthChecks | Number of individual health check failures | Any consistent pattern | Steady increase over time |

A sharp rise in 5XX errors typically means something's broken in your backend services. But don't ignore 4XX errors – a sudden increase often indicates a client library issue or API change that's causing compatibility problems.

For high-traffic systems, you should track the ratio of errors to total requests rather than absolute numbers. A system handling millions of requests can have thousands of errors while still being "healthy" from a percentage perspective.
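A minimal sketch of ratio-based error tracking; the threshold values come from the table above, and the function names are illustrative:

```python
def error_rate(error_count, total_requests):
    """Error ratio as a percentage; 0 when there's no traffic."""
    if total_requests == 0:
        return 0.0
    return 100.0 * error_count / total_requests

def classify_5xx(rate_pct):
    """Map a 5XX error percentage onto the thresholds in the table above."""
    if rate_pct > 1.0:
        return "critical"
    if rate_pct > 0.1:
        return "warning"
    return "ok"

# 3,000 errors out of 5 million requests is only 0.06% -- still "ok"
rate = error_rate(3_000, 5_000_000)
```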

💡
Looking at load balancer stats is a good start, but issues can also crop up downstream—like DynamoDB throttling during traffic spikes.

How to Set Up ELB Statistics Collection

Getting these metrics isn't complicated, but doing it right requires attention to detail. Here's how to get started with a production-grade setup:

Using CloudWatch for Foundational Metric Collection

AWS automatically sends ELB metrics to CloudWatch without any configuration on your part. To view them:

  1. Go to the CloudWatch console in your AWS account
  2. Navigate to Metrics > All metrics in the left navigation
  3. Select the "AWS/ApplicationELB" or "AWS/NetworkELB" namespace depending on your load balancer type
  4. Choose the metrics you want to view from the available dimensions
  5. Adjust the period to see trends (last 3 hours, 1 day, 1 week, or custom)
  6. Consider setting up a CloudWatch dashboard for frequently referenced metrics
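The same browsing can be scripted. Here's a sketch using boto3's `list_metrics`; it assumes AWS credentials are configured, and `browse_elb_metrics` is an illustrative helper, not an AWS API:

```python
def metric_names(list_metrics_response):
    """Extract unique metric names from a CloudWatch list_metrics response."""
    return sorted({m["MetricName"] for m in list_metrics_response["Metrics"]})

def browse_elb_metrics():  # requires AWS credentials; illustrative only
    import boto3
    cw = boto3.client("cloudwatch")
    names = set()
    # list_metrics is paginated, so walk all pages of the ALB namespace
    for page in cw.get_paginator("list_metrics").paginate(Namespace="AWS/ApplicationELB"):
        names.update(metric_names(page))
    return sorted(names)
```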

But basic CloudWatch metrics only tell part of the story – they provide aggregated data that can mask underlying issues. For production systems, you'll want more detailed insights.

Implementing Enhanced Monitoring with Access Logs for Detailed Analysis

To get deeper insights that can help with troubleshooting and pattern recognition:

  1. Enable access logs in your ELB settings (under Attributes in the ELB console)
  2. Set a lifecycle policy that matches your retention needs (90 days is common for production systems)
  3. Create regular reports using Athena queries or connect to a visualization tool
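Step 1 can also be done programmatically. A sketch with the `elbv2` API; the ARN, bucket, and prefix are placeholders:

```python
def access_log_attributes(bucket, prefix):
    """ALB attribute entries that turn on access logging to S3."""
    return [
        {"Key": "access_logs.s3.enabled", "Value": "true"},
        {"Key": "access_logs.s3.bucket", "Value": bucket},
        {"Key": "access_logs.s3.prefix", "Value": prefix},
    ]

def enable_access_logs(lb_arn, bucket, prefix):  # requires AWS credentials
    import boto3
    boto3.client("elbv2").modify_load_balancer_attributes(
        LoadBalancerArn=lb_arn,
        Attributes=access_log_attributes(bucket, prefix),
    )
```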

Set up Athena tables to query these logs efficiently:

CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs (
  type                          string,
  time                          string,
  elb                           string,
  client_ip                     string,
  client_port                   int,
  target_ip                     string,
  target_port                   int,
  request_processing_time       double,
  target_processing_time        double,
  response_processing_time      double,
  elb_status_code               int,
  target_status_code            string,
  received_bytes                bigint,
  sent_bytes                    bigint,
  request_verb                  string,
  request_url                   string,
  request_proto                 string,
  user_agent                    string,
  ssl_cipher                    string,
  ssl_protocol                  string,
  target_group_arn              string,
  trace_id                      string,
  domain_name                   string,
  chosen_cert_arn               string,
  matched_rule_priority         string,
  request_creation_time         string,
  actions_executed              string,
  redirect_url                  string,
  lambda_error_reason           string,
  target_port_list              string,
  target_status_code_list       string,
  classification                string,
  classification_reason         string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1',
  'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"'
)
LOCATION 's3://your-bucket-name/prefix/AWSLogs/your-account-id/elasticloadbalancing/your-region/';

Configure an S3 bucket to receive these logs with appropriate permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::elb-account-id:root"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::your-bucket-name/prefix/AWSLogs/your-account-id/*"
    }
  ]
}

Access logs give you details on every request, including source IP, request path, user agent, and processing times—perfect for troubleshooting specific issues and identifying patterns that aggregate metrics might miss.

For high-traffic systems, these logs can grow to terabytes, so set up appropriate partitioning and lifecycle policies.

💡
Once you’re familiar with ELB metrics, it’s worth checking out how CloudWatch metrics work to track and manage system-wide performance.

Building Custom Dashboards

Raw numbers are nice, but visualizations tell the story better:

# Grafana dashboard JSON snippet for ELB metrics
{
  "panels": [
    {
      "title": "ELB Request Count",
      "type": "graph",
      "datasource": "CloudWatch",
      "targets": [
        {
          "region": "us-east-1",
          "namespace": "AWS/ApplicationELB",
          "metricName": "RequestCount",
          "statistics": ["Sum"],
          "dimensions": {
            "LoadBalancer": "app/your-load-balancer-name/lb-id"
          }
        }
      ]
    }
  ]
}

Pro tip: Don't build dashboards that just look pretty. Build ones that answer your actual questions.

Production-Grade Alert Thresholds Based on Industry Experience

Knowing what to look for is half the battle in monitoring. Generic thresholds are a starting point, but truly effective monitoring requires context-specific values.

Here are starting points for different types of applications:

Web Application Alert Thresholds

| Metric | Warning Threshold | Critical Threshold | Duration | When to Adjust |
|---|---|---|---|---|
| P95 Latency | >500ms | >1000ms | 5 minutes | Lower for premium services, higher for admin portals |
| 5XX Error Rate | >0.5% | >1% | 3 minutes | Lower for checkout flows, higher for non-critical paths |
| 4XX Error Rate | >5% | >10% | 5 minutes | Higher after API changes or client updates |
| Healthy Hosts | <80% | <70% | 2 minutes | Higher for small clusters, lower for large pools |
| Rejected Connections | Any | >100/minute | Immediate | Adjust based on connection limits |
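As a concrete example, the "Rejected Connections" row above can be turned into a CloudWatch alarm. A sketch with boto3; the ALB suffix and SNS topic ARN are placeholders, and `create_alarm` requires AWS credentials:

```python
def rejected_connections_alarm(alb, topic_arn):
    """Build put_metric_alarm kwargs for the 'any rejected connection' rule."""
    return {
        "AlarmName": f"rejected-connections-{alb.replace('/', '-')}",
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "RejectedConnectionCount",
        "Dimensions": [{"Name": "LoadBalancer", "Value": alb}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no datapoints means no rejections
        "AlarmActions": [topic_arn],
    }

def create_alarm(alb, topic_arn):  # requires AWS credentials; illustrative only
    import boto3
    boto3.client("cloudwatch").put_metric_alarm(
        **rejected_connections_alarm(alb, topic_arn)
    )
```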

API Service Alert Thresholds

| Metric | Warning Threshold | Critical Threshold | Duration | When to Adjust |
|---|---|---|---|---|
| P95 Latency | >200ms | >500ms | 3 minutes | Lower for real-time APIs, higher for batch processes |
| 5XX Error Rate | >0.1% | >0.5% | 2 minutes | Critical APIs should have near-zero tolerance |
| Request Rate Drop | >20% from baseline | >50% from baseline | 5 minutes | Adjust based on normal traffic patterns |
| Backend Target Errors | >1% | >5% | 2 minutes | Lower for critical infrastructure |

E-commerce Alert Thresholds

| Metric | Warning Threshold | Critical Threshold | Duration | Context |
|---|---|---|---|---|
| Cart/Checkout Latency | >300ms | >700ms | 1 minute | Studies show checkout abandonment increases 7% per 100ms delay |
| Payment Processing Errors | >0.1% | >0.5% | Immediate | Money is on the line |
| Product Page Latency | >800ms | >2000ms | 5 minutes | Less critical than checkout flows |

Adjust these based on your application's specific needs and business context. What's normal for an e-commerce site might be a crisis for a trading platform. For mission-critical systems, consider implementing multi-level alerts with escalation policies that match your SLAs.

💡
If you're already working with ELB, setting up AWS WAF can add an extra layer of protection against unwanted traffic.

Advanced ELB Statistics Analysis Techniques for Performance Optimization

Once you've mastered the basics of ELB monitoring, it's time to level up with more sophisticated analysis techniques that can uncover hidden patterns and opportunities for optimization.

Multi-Dimensional Correlation Analysis for Root Cause Detection

The real insights come from connecting metrics across different parts of your stack. Here are patterns to look for:

| Pattern | Potential Cause | Recommended Action |
|---|---|---|
| Spike in latency + increase in CPU on targets | Insufficient backend capacity | Scale up backend resources or optimize code |
| Increase in 4XX errors after deployment | Client/API compatibility issue | Roll back or implement backwards compatibility |
| High connection count + low request count | Connection leak in client code | Fix keep-alive handling in client applications |
| Increased latency + normal CPU/memory | Network issues or DNS problems | Check network paths and DNS resolution times |
| Cyclical latency spikes + high disk I/O | Cache misses or database issues | Tune caching strategy or optimize database queries |
| Degraded performance during specific hours | Maintenance jobs or cron tasks | Reschedule background tasks to off-peak hours |

When troubleshooting complex issues, create correlation graphs that plot multiple metrics on the same timeline—this visual approach often reveals relationships that raw numbers miss.

Data-Driven Capacity Planning with Historical Load Balancer Metrics

Export your metrics data to analyze growth patterns and forecast future needs with statistical confidence:

# Example Python code for exporting CloudWatch metrics
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client('cloudwatch')

response = cloudwatch.get_metric_data(
    MetricDataQueries=[
        {
            'Id': 'request_count',
            'MetricStat': {
                'Metric': {
                    'Namespace': 'AWS/ApplicationELB',
                    'MetricName': 'RequestCount',
                    'Dimensions': [
                        # For ALBs the dimension is 'LoadBalancer' and the value
                        # is the 'app/<name>/<id>' suffix of the load balancer ARN
                        {'Name': 'LoadBalancer', 'Value': 'app/my-load-balancer/1234567890abcdef'},
                    ]
                },
                'Period': 3600,  # 1 hour
                'Stat': 'Sum',
            },
            'ReturnData': True,
        },
    ],
    StartTime=datetime.now() - timedelta(days=30),
    EndTime=datetime.now(),
)

This data can help you predict when you'll need to add capacity or optimize your setup.
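For a first-pass forecast you don't need a statistics package; a least-squares trend line over the exported datapoints is often enough. A sketch, with made-up history values:

```python
def linear_forecast(values, steps_ahead):
    """Fit a least-squares line to equally spaced samples and extrapolate."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # Project forward from the last observed sample
    return intercept + slope * (n - 1 + steps_ahead)

# Hourly request sums with steady growth: where will we be 24 hours out?
history = [1000, 1100, 1200, 1300, 1400]
projected = linear_forecast(history, 24)
```

Real traffic is rarely this linear, so treat the projection as a rough capacity signal, not a guarantee.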

Enterprise-Grade Integration with Observability Platforms

CloudWatch alone provides only a fragmented view of your infrastructure. For production systems, you need a unified observability approach that connects load balancer performance with backend services, user experience, and business outcomes.

Comprehensive Observability Platform Comparison

Looking for a tool that brings all your metrics together with context? Here's a detailed comparison of leading options:

| Platform | Strengths | Best For | Integration Complexity | Pricing Model |
|---|---|---|---|---|
| Last9 | High-cardinality observability at scale; predictable pricing; excellent correlation capabilities; OpenTelemetry support | Organizations with variable loads and budget constraints; teams needing unified metrics, logs, and traces | Low-Medium (well-documented APIs) | Events-based pricing (predictable) |
| Prometheus with OpenTelemetry | Open-source; highly customizable; strong community | Teams with Kubernetes; organizations with in-house expertise | Medium-High (requires management) | Free (infrastructure costs only) |
| Grafana Cloud | Beautiful visualizations; multi-cloud support; strong dashboarding | Teams needing visual insights; multi-cloud environments | Low-Medium | Resource-based with free tier |
| Dynatrace | AI-powered anomaly detection; session replay | Enterprises with complex environments | Medium | Host/application based |
| Honeycomb | High-cardinality debugging; BubbleUp feature | Troubleshooting complex distributed systems | Low | Event-based with sampling |

If you're watching your budget while needing serious observability horsepower, Last9 stands out from the crowd. We've handled monitoring for 11 of the 20 largest live-streaming events in history, proving scalability under extreme conditions.

How to Create a Unified Monitoring Strategy

Don't treat ELB stats as isolated metrics. The real power comes from connecting them with:

  • Application metrics
  • Infrastructure stats
  • Business KPIs

This lets you answer questions like "Did that latency spike affect our conversion rate?" instead of just "Is the load balancer working?"

💡
To get a fuller picture of your AWS setup beyond ELB, this guide on AWS monitoring tools walks through what's available and how to use them effectively.

Troubleshooting Common ELB Issues Using Statistics

When things go wrong, here's how to use statistics to find the culprit:

High Latency

If your latency metrics are climbing:

  1. Check TargetResponseTime to see if it's a backend issue
  2. Look at ActiveConnectionCount to check for connection pool exhaustion
  3. Review RequestCount for unexpected traffic patterns

Error Rate Spikes

For a sudden increase in errors:

  1. Check HTTP code breakdowns to identify client vs. server errors
  2. Review access logs to find common patterns in failing requests
  3. Check HealthyHostCount to see if backends are failing

Connection Problems

If clients can't connect:

  1. Check RejectedConnectionCount for capacity issues
  2. Look at the security group and network ACL metrics for blocked connections
  3. Review SurgeQueueLength to see if requests are backing up

💡
Now, fix production ELB metric issues instantly—right from your IDE, with AI and Last9 MCP.

ELB Statistics in a CI/CD World

Modern deployment pipelines should include metric validation:

Canary Deployments

Use metrics to validate canary releases:

  1. Deploy to a small percentage of targets
  2. Compare error rates and latency between the canary and stable
  3. Automatically promote or rollback based on statistical thresholds
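Step 3 can be reduced to a simple comparison rule. A sketch; the 2x ratio and 500-request minimum are illustrative defaults, not AWS guidance:

```python
def canary_verdict(canary_errors, canary_total, stable_errors, stable_total,
                   max_ratio=2.0, min_requests=500):
    """Promote the canary unless its error rate is clearly worse than stable's."""
    if canary_total < min_requests:
        return "wait"  # not enough traffic for a meaningful comparison
    canary_rate = canary_errors / canary_total
    stable_rate = max(stable_errors / stable_total, 1e-6)  # avoid divide-by-zero
    return "rollback" if canary_rate > max_ratio * stable_rate else "promote"

# Canary at 3% errors vs. stable at 0.5%: six times worse, so roll back
verdict = canary_verdict(canary_errors=30, canary_total=1000,
                         stable_errors=50, stable_total=10000)
```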

Blue/Green Deployment Monitoring

When doing blue/green deployments:

  1. Set up duplicate dashboards for both environments
  2. Compare key metrics side by side
  3. Only shift traffic if the new environment's metrics look healthy

Post-Deployment Monitoring

After each release:

  1. Watch for statistically significant changes in any key metrics
  2. Set lower alert thresholds during the first hour after deployment
  3. Record baseline metrics for future comparison

Cost Optimization Through ELB Statistics

Your metrics can help you save money:

Right-sizing Based on Usage Patterns

  1. Analyze RequestCount and ProcessedBytes over time
  2. Identify peak periods vs. quiet times
  3. Consider switching to smaller load balancers during predictable low-traffic periods

Idle Load Balancer Detection

Create reports to find underutilized resources:

-- Example Athena query to find underutilized ALBs
-- (queries the alb_logs table defined earlier; Athena's Presto engine
--  uses date_add for date math and doesn't allow aliases in HAVING)
SELECT
    elb,
    COUNT(*) AS total_requests,
    AVG(target_processing_time) AS avg_latency
FROM alb_logs
WHERE from_iso8601_timestamp(time) >= date_add('day', -30, now())
GROUP BY elb
HAVING COUNT(*) < 1000 -- Define your threshold
ORDER BY total_requests ASC;

Cross-Region Traffic Analysis

If you're using Global Accelerator or similar:

  1. Compare metrics across regions
  2. Adjust traffic distribution to maximize performance while minimizing cost
  3. Consider shutting down redundant capacity in low-use regions

ELB Statistics for Security and Compliance

Your metrics aren't just for performance—they're a security tool too:

Detecting Unusual Traffic Patterns

Set up anomaly detection for:

  • Unusual spikes in request volume
  • Unexpected geographic distribution of traffic
  • Abnormal request size patterns
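A simple way to start is z-score detection against a recent baseline. A sketch; the 3-sigma threshold is a common default, not a universal rule:

```python
import statistics

def is_anomalous(history, current, z_threshold=3.0):
    """Flag `current` if it sits more than z_threshold stdevs from the baseline."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return current != mean  # flat baseline: any deviation is unusual
    return abs(current - mean) / stdev > z_threshold

# Steady baseline around 1,000 req/min; a jump to 5,000 should trip the detector
baseline = [980, 1010, 995, 1005, 990, 1020, 1000]
flag = is_anomalous(baseline, 5000)
```

CloudWatch also offers built-in anomaly detection on metrics, but a hand-rolled check like this is useful when you export data to your own pipeline.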

Compliance Reporting

For regulated industries:

  1. Set up automatic exports of key metrics
  2. Document uptime and performance statistics
  3. Create custom dashboards for auditors showing security-relevant metrics

DDoS Detection and Mitigation

Configure alerts for potential attacks:

  • Sudden increases in rejected connections
  • Unusual patterns in client errors
  • Spikes in specific HTTP methods (especially uncommon ones)

Conclusion

Understanding Elastic Load Balancer statistics isn’t just something you turn to when things break—it’s part of running stable, efficient systems day to day. Start with the core metrics, get familiar with the patterns, and build from there.

💡
If you're curious to see how others are tackling similar challenges, our Discord community has plenty of ongoing conversations around observability and practical monitoring setups.

FAQs

How frequently are ELB metrics updated in CloudWatch?

Most ELB metrics are published to CloudWatch at one-minute intervals by default. Unlike EC2, ELB doesn't offer a higher-resolution "detailed monitoring" tier, so if you need sub-minute or per-request visibility for real-time troubleshooting, enable access logs and analyze requests directly. Also note that CloudWatch only reports datapoints for intervals in which traffic actually flowed through the load balancer.

Do ELB statistics count against my CloudWatch costs?

Yes, standard CloudWatch pricing applies to ELB metrics. The basic metrics at 1-minute resolution are included at no additional charge, but dashboards, alarms, custom metrics, API requests, and long-term log storage will incur additional costs. A typical production setup with reasonable alerting can add $50-200/month to your CloudWatch bill depending on scale.

How long should I retain ELB statistics for different business needs?

| Business Need | Recommended Retention | Reasoning |
|---|---|---|
| Operational monitoring | 2-4 weeks | Sufficient for troubleshooting recent issues |
| Capacity planning | 13 months | Enables year-over-year comparisons and seasonal analysis |
| Compliance (finance) | 3-7 years | Many financial regulations require extended retention |
| Compliance (healthcare) | 6+ years | HIPAA and related standards have lengthy requirements |
| Security forensics | 1+ year | Allows investigation of long-term patterns and breaches |

Always consult your legal and compliance teams for specific requirements in regulated industries.

Can I export ELB metrics to external systems for custom analysis?

Yes, you have several options for exporting metrics:

  1. CloudWatch API: Programmatically extract metrics using boto3 or AWS SDK
  2. CloudWatch Metric Streams: Stream metrics in real-time to Kinesis Data Firehose
  3. Third-party integrations: Most observability platforms offer direct CloudWatch integration
  4. Custom Lambda exporters: Build custom export logic for specific requirements

For large-scale environments, consider using CloudWatch Metric Streams with Firehose to an S3 data lake for cost-effective long-term storage and analysis.

How do ELB statistics differ between Application Load Balancers and Network Load Balancers?

| Feature | Application Load Balancer | Network Load Balancer |
|---|---|---|
| Layer | Layer 7 (HTTP/HTTPS) | Layer 4 (TCP/UDP/TLS) |
| Request metrics | Full HTTP(S) metrics | Connection-based metrics |
| Status code tracking | HTTP status codes (2XX, 4XX, 5XX) | TCP connection status |
| Unique metrics | Rule evaluation counts, HTTP header processing | TCP_ELB_Reset_Count, flow metrics |
| Health check metrics | HTTP-based health checks | TCP connection health checks |
| Latency components | Request processing, response processing | Connection establishment time |

This fundamental difference means your monitoring strategy should align with the load balancer type you're using.

What's the difference between load balancer latency and target response time?

Load Balancer Latency = Request Processing Time + Target Response Time + Response Processing Time

  • Request Processing Time: Time from receiving request to sending it to target (includes TLS handshake, rule evaluation)
  • Target Response Time: Time spent waiting for backend target to respond (includes application logic, database queries)
  • Response Processing Time: Time to process and send response back to client

When troubleshooting high latency, these component metrics help pinpoint whether the issue lies in the load balancer configuration, network path, or backend application code.
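Given the three component times (from access logs or CloudWatch), the decomposition is simple arithmetic. A sketch with hypothetical numbers:

```python
def decompose_latency(request_ms, target_ms, response_ms):
    """Break total latency into shares to spot where the time actually goes."""
    total = request_ms + target_ms + response_ms
    return {
        "total_ms": total,
        "target_share": target_ms / total,              # backend/application time
        "lb_share": (request_ms + response_ms) / total,  # load balancer overhead
    }

# Hypothetical request: 5ms in, 180ms in the backend, 3ms out
parts = decompose_latency(5, 180, 3)
```

Here the backend accounts for over 95% of the total, so tuning the load balancer itself would be wasted effort.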

How can I accurately calculate the cost impact of my ELB usage?

To build a comprehensive cost model:

  1. Base cost: Number of load balancers × hourly rate × hours in month
  2. Data transfer: ProcessedBytes metric × appropriate data transfer rate
    • Internet outbound: $0.09-$0.15/GB (varies by region and volume)
    • Inter-AZ: ~$0.01/GB
    • Intra-AZ: Free
  3. LCU costs (for ALBs): Calculate Load Balancer Capacity Units based on:
    • New connections per second (divided by 25)
    • Active connections per minute (divided by 3,000)
    • Processed bytes per hour (divided by 1GB)
    • Rule evaluations per second (divided by 1,000)
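The LCU math above can be sketched directly. Note that ALB bills on the *largest* of the four dimensions, not their sum; the $0.008/LCU-hour price used below is the common us-east-1 rate and should be treated as an assumption:

```python
def alb_lcus(new_conns_per_sec, active_conns_per_min, gb_per_hour, rules_per_sec):
    """ALB charges for the highest-consuming LCU dimension in each hour."""
    return max(
        new_conns_per_sec / 25,
        active_conns_per_min / 3_000,
        gb_per_hour / 1,
        rules_per_sec / 1_000,
    )

def monthly_lcu_cost(lcus, price_per_lcu_hour=0.008, hours=730):
    """Rough monthly LCU bill; price is an assumed us-east-1 rate."""
    return lcus * price_per_lcu_hour * hours

# New connections dominate here: 100/25 = 4 LCUs
lcus = alb_lcus(new_conns_per_sec=100, active_conns_per_min=9_000,
                gb_per_hour=2.2, rules_per_sec=500)
```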

Pro tip: Many organizations discover that data transfer costs often exceed the hourly load balancer costs by 3-5x, especially for content-heavy applications. Implement proper caching strategies and CDN integration to minimize these costs.


Authors

Anjali Udasi

Helping to make tech a little less intimidating. I love breaking down complex concepts into easy-to-understand terms.