As your infrastructure grows, keeping track of what your elastic load balancers are doing isn’t just useful—it’s necessary. Without clear visibility into this layer, it’s easy to miss bottlenecks, uneven traffic distribution, or failing targets.
This guide walks through the key ELB metrics that matter when you're running real workloads in production.
Elastic Load Balancer Statistics: Core Metrics Explained
Elastic load balancer statistics are the data points that show how your load balancers are performing. They're the numbers that tell you whether your traffic is flowing smoothly or if you've got bottlenecks that need attention.
These stats cover everything from request counts and latency to error rates and connection details. Think of them as your load balancer's vital signs—they let you know if things are healthy or if something's about to break.
Why Proactive ELB Statistics Tracking is Critical for System Reliability
You can't fix what you don't measure, and with load balancers sitting at the gateway of your infrastructure, flying blind here is particularly risky. Here's why tracking these numbers is worth your time:
- Spot issues before users do – See performance drops before they affect your users
- Right-size your resources – Know exactly when to scale up or down based on actual usage patterns
- Budget better – Understand traffic patterns to predict future costs and avoid surprise bills
- Sleep easier – Set up intelligent alerts that only wake you up when there's a real problem
- Improve mean time to resolution – When incidents occur, having historical data speeds up troubleshooting
- Optimize for peak traffic – Prepare your infrastructure for high-demand periods by understanding past patterns
Essential Elastic Load Balancer Metrics That Reveal System Health
Let's break down the stats that deserve your attention:
Crucial Request and Connection Metrics for Traffic Analysis
| Metric | What It Tells You | Why It Matters | Typical Warning Signs | 
|---|---|---|---|
| RequestCount | Total number of requests processed per period | Traffic patterns and capacity planning | Sudden 2x+ increases without explanation | 
| ActiveConnectionCount | Number of concurrent connections currently open | Real-time load assessment and connection pool health | Steady climb without corresponding increase in requests | 
| NewConnectionCount | New connections established per period | Growth trends and sudden spikes | Rapid oscillation patterns indicating connection issues | 
| RejectedConnectionCount | Connections refused due to capacity limits | Early warning of capacity issues | Any non-zero values during normal operation | 
| SurgeQueueLength | Requests waiting for processing | Backlog size and potential timeout risks | Consistently above zero with increasing trend | 
These metrics show you the raw volume your load balancers are handling. Sudden spikes might indicate legitimate traffic surges (like during a marketing campaign), but they could also signal potential DDoS attacks or misconfigured clients hammering your endpoints.
Pay special attention to the ratio between ActiveConnectionCount and RequestCount. A high number of connections with low request counts often indicates connection pooling issues in your clients.
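As a rough illustration of that ratio check, the sketch below divides open connections by requests for a single period. The function names and the `max_ratio` cutoff are assumptions made for this example, not AWS-defined values; tune the cutoff against your own baseline.

```python
def connections_per_request(active_connections: float, request_count: float) -> float:
    """Open connections per request for one period."""
    if request_count <= 0:
        # No requests at all: any open connection is pure overhead.
        return float("inf") if active_connections > 0 else 0.0
    return active_connections / request_count


def looks_like_pooling_issue(active_connections: float, request_count: float,
                             max_ratio: float = 1.0) -> bool:
    # max_ratio is an assumed cutoff: more open connections than requests
    # in the same period suggests clients are holding connections idle.
    return connections_per_request(active_connections, request_count) > max_ratio


# Hypothetical one-minute samples.
healthy = looks_like_pooling_issue(active_connections=500, request_count=10_000)
suspect = looks_like_pooling_issue(active_connections=5_000, request_count=200)
```

A single period can be noisy, so it is probably safer to evaluate the ratio over several consecutive periods before acting on it.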
Performance Metrics That Impact User Experience
| Metric | What It Tells You | Why It Matters | Target Values | 
|---|---|---|---|
| Latency (overall) | Total time to process requests end-to-end | Direct impact on user-perceived performance | Web: <200ms, API: <100ms | 
| TargetResponseTime | Time spent waiting for backend servers | Backend service health and database performance | Should be 70-80% of total latency | 
| ProcessedBytes | Data volume processed by the load balancer | Bandwidth usage planning and cost forecasting | Monitor for sudden changes vs. historical patterns | 
| TLS Negotiation Time | Time spent establishing secure connections | Security overhead impact | Should be <50ms with session resumption enabled | 
| RequestProcessingTime | Time the load balancer spends processing | ELB efficiency and potential configuration issues | Should be minimal (<10ms) | 
Latency is particularly crucial—users notice when things get slow. Studies show that each 100ms of added latency can reduce conversion rates by up to 7%. A sudden increase here means it's time to start investigating immediately.
For e-commerce or financial applications, you should track the 95th and 99th percentile latency values, not just averages. A smooth average can hide terrible experiences for a significant portion of your users.
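To see how an average can hide a slow tail, here is a minimal sketch using a nearest-rank percentile. The latency values are made up for illustration, and `percentile` is a helper written for this example rather than a CloudWatch feature (in CloudWatch you would request statistics such as p95 directly).

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in ms: 94 fast requests and 6 very slow ones.
latencies_ms = [100] * 94 + [1500] * 6

summary = {
    "mean": mean(latencies_ms),
    "p95": percentile(latencies_ms, 95),
    "p99": percentile(latencies_ms, 99),
}
```

In this made-up sample the mean stays under the 200ms web target from the table above, while the p95 and p99 values expose the slow requests.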
Health and Error Metrics: Your Early Warning System
These metrics serve as canaries in the coal mine for your application health:
| Metric | What It Tells You | Warning Threshold | Critical Threshold | 
|---|---|---|---|
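| HTTPCode_ELB_4XX_Count | Client errors generated at the load balancer (malformed or bad requests) | >5% of total requests | >10% of total requests | 
| HTTPCode_ELB_5XX_Count | Server errors generated by the load balancer itself (for example, no healthy targets); track HTTPCode_Target_5XX_Count for errors returned by your backends | >0.1% of total requests | >1% of total requests | 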
| HTTPCode_Target_2XX_Count | Successful responses | <90% of total requests | <80% of total requests | 
| HealthyHostCount | Number of backend servers passing health checks | <90% of total hosts | <70% of total hosts | 
| UnhealthyHostCount | Number of failing backend servers | >10% of total hosts | >30% of total hosts | 
| FailedHealthChecks | Number of individual health check failures | Any consistent pattern | Steady increase over time | 
A sharp rise in 5XX errors typically means something's broken in your backend services. But don't ignore 4XX errors – a sudden increase often indicates a client library issue or API change that's causing compatibility problems.
For high-traffic systems, you should track the ratio of errors to total requests rather than absolute numbers. A system handling millions of requests can have thousands of errors while still being "healthy" from a percentage perspective.
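A small sketch of ratio-based classification, using the 5XX warning and critical percentages from the table above as defaults. The function names are illustrative, and the counts would come from whichever metrics you collect for the same period.

```python
def error_rate(error_count: int, request_count: int) -> float:
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    if request_count <= 0:
        return 0.0
    return error_count / request_count


def classify_5xx(error_count: int, request_count: int,
                 warning: float = 0.001, critical: float = 0.01) -> str:
    # Defaults mirror the table: warning above 0.1%, critical above 1%.
    rate = error_rate(error_count, request_count)
    if rate > critical:
        return "critical"
    if rate > warning:
        return "warning"
    return "ok"


# 3,000 errors sounds alarming, but it is 0.06% of 5 million requests.
busy_system = classify_5xx(3_000, 5_000_000)
# 30 errors out of 2,000 requests is 1.5%.
small_system = classify_5xx(30, 2_000)
```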
How to Set Up ELB Statistics Collection
Getting these metrics isn't complicated, but doing it right requires attention to detail. Here's how to get started with a production-grade setup:
Using CloudWatch for Foundational Metric Collection
AWS automatically sends ELB metrics to CloudWatch without any configuration on your part. To view them:
- Go to the CloudWatch console in your AWS account
- Navigate to Metrics > All metrics in the left navigation
- Select the "AWS/ApplicationELB", "AWS/NetworkELB", or "AWS/ELB" (Classic Load Balancer) namespace depending on your load balancer type
- Choose the metrics you want to view from the available dimensions
- Adjust the time range to see trends (last 3 hours, 1 day, 1 week, or custom)
- Consider setting up a CloudWatch dashboard for frequently referenced metrics
But basic CloudWatch metrics only tell part of the story – they provide aggregated data that can mask underlying issues. For production systems, you'll want more detailed insights.
Implementing Enhanced Monitoring with Access Logs for Detailed Analysis
To get deeper insights that can help with troubleshooting and pattern recognition:
- Enable access logs in your ELB settings (under Attributes in the ELB console)
- Set a lifecycle policy that matches your retention needs (90 days is common for production systems)
- Create regular reports using Athena queries or connect to a visualization tool
Set up Athena tables to query these logs efficiently:
CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs (
  type                          string,
  time                          string,
  elb                           string,
  client_ip                     string,
  client_port                   int,
  target_ip                     string,
  target_port                   int,
  request_processing_time       double,
  target_processing_time        double,
  response_processing_time      double,
  elb_status_code               int,
  target_status_code            string,
  received_bytes                bigint,
  sent_bytes                    bigint,
  request_verb                  string,
  request_url                   string,
  request_proto                 string,
  user_agent                    string,
  ssl_cipher                    string,
  ssl_protocol                  string,
  target_group_arn              string,
  trace_id                      string,
  domain_name                   string,
  chosen_cert_arn               string,
  matched_rule_priority         string,
  request_creation_time         string,
  actions_executed              string,
  redirect_url                  string,
  lambda_error_reason           string,
  target_port_list              string,
  target_status_code_list       string,
  classification                string,
  classification_reason         string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1',
  'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"'
)
LOCATION 's3://your-bucket-name/prefix/AWSLogs/your-account-id/elasticloadbalancing/your-region/';
Configure an S3 bucket to receive these logs with appropriate permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::elb-account-id:root"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::your-bucket-name/prefix/AWSLogs/your-account-id/*"
    }
  ]
}
Access logs give you details on every request, including source IP, request path, user agent, and processing times. That makes them well suited to troubleshooting specific issues and identifying patterns that aggregate metrics might miss.
For high-traffic systems, these logs can grow to terabytes, so set up appropriate partitioning and lifecycle policies.
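For quick local checks before reaching for Athena, a log entry can also be split in Python. The sketch below is a simplified parser that names only the leading fields of an ALB entry and keeps the client and target as `ip:port` strings. The sample line is fabricated for illustration, and `parse_alb_log_line` is a hypothetical helper, not part of any AWS SDK.

```python
import csv

# Names for the leading fields of an ALB access log entry, in order.
LEADING_FIELDS = [
    "type", "time", "elb", "client", "target",
    "request_processing_time", "target_processing_time",
    "response_processing_time", "elb_status_code", "target_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
]
NUMERIC_FIELDS = (
    "request_processing_time",
    "target_processing_time",
    "response_processing_time",
)


def parse_alb_log_line(line: str) -> dict:
    """Split one entry on spaces while keeping double-quoted fields intact."""
    fields = next(csv.reader([line], delimiter=" ", quotechar='"'))
    record = dict(zip(LEADING_FIELDS, fields))
    for key in NUMERIC_FIELDS:
        record[key] = float(record[key])
    return record


# Fabricated sample entry (truncated after the TLS fields).
sample_line = (
    'http 2024-01-01T00:00:00.000000Z app/my-load-balancer/abc123 '
    '203.0.113.10:51234 10.0.0.5:80 0.001 0.250 0.000 200 200 120 512 '
    '"GET http://example.com:80/cart HTTP/1.1" "curl/8.0" - -'
)
record = parse_alb_log_line(sample_line)
slow_backend = record["target_processing_time"] > 0.2
```

The sketch assumes every entry has at least the fourteen leading fields; malformed or shorter lines would need explicit handling in real use.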
Building Custom Dashboards
Raw numbers are nice, but visualizations tell the story better:
# Grafana dashboard JSON snippet for ELB metrics
{
  "panels": [
    {
      "title": "ELB Request Count",
      "type": "graph",
      "datasource": "CloudWatch",
      "targets": [
        {
          "region": "us-east-1",
          "namespace": "AWS/ApplicationELB",
          "metricName": "RequestCount",
          "statistics": ["Sum"],
          "dimensions": {
            "LoadBalancer": "app/your-load-balancer-name/your-load-balancer-id"
          }
        }
      ]
    }
  ]
}
Pro tip: Don't build dashboards that just look pretty. Build ones that answer your actual questions.
Production-Grade Alert Thresholds Based on Industry Experience
Knowing what to look for is half the battle in monitoring. Generic thresholds are a starting point, but truly effective monitoring requires context-specific values.
Here are starting points for different types of applications:
Web Application Alert Thresholds
| Metric | Warning Threshold | Critical Threshold | Duration | When to Adjust | 
|---|---|---|---|---|
| P95 Latency | >500ms | >1000ms | 5 minutes | Lower for premium services, higher for admin portals | 
| 5XX Error Rate | >0.5% | >1% | 3 minutes | Lower for checkout flows, higher for non-critical paths | 
| 4XX Error Rate | >5% | >10% | 5 minutes | Higher after API changes or client updates | 
| Healthy Hosts | <80% | <70% | 2 minutes | Higher for small clusters, lower for large pools | 
| Rejected Connections | Any | >100/minute | Immediate | Adjust based on connection limits | 
API Service Alert Thresholds
| Metric | Warning Threshold | Critical Threshold | Duration | When to Adjust | 
|---|---|---|---|---|
| P95 Latency | >200ms | >500ms | 3 minutes | Lower for real-time APIs, higher for batch processes | 
| 5XX Error Rate | >0.1% | >0.5% | 2 minutes | Critical APIs should have near-zero tolerance | 
| Request Rate Drop | >20% from baseline | >50% from baseline | 5 minutes | Adjust based on normal traffic patterns | 
| Backend Target Errors | >1% | >5% | 2 minutes | Lower for critical infrastructure | 
E-commerce Alert Thresholds
| Metric | Warning Threshold | Critical Threshold | Duration | Context | 
|---|---|---|---|---|
| Cart/Checkout Latency | >300ms | >700ms | 1 minute | Studies show checkout abandonment increases 7% per 100ms delay | 
| Payment Processing Errors | >0.1% | >0.5% | Immediate | Money is on the line | 
| Product Page Latency | >800ms | >2000ms | 5 minutes | Less critical than checkout flows | 
Adjust these based on your application's specific needs and business context. What's normal for an e-commerce site might be a crisis for a trading platform. For mission-critical systems, consider implementing multi-level alerts with escalation policies that match your SLAs.
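The Duration column matters as much as the threshold itself. One way to express "above the threshold for N minutes" is to require that the most recent N one-minute datapoints all breach it, as in this sketch. The function names and sample values are illustrative, and CloudWatch alarms offer comparable behavior through their evaluation-period settings.

```python
def sustained_breach(datapoints, threshold, required_points):
    """True if the most recent `required_points` datapoints all exceed the threshold."""
    if required_points <= 0 or len(datapoints) < required_points:
        return False
    return all(value > threshold for value in datapoints[-required_points:])


def alert_level(datapoints, warning, critical, required_points):
    """Return 'critical', 'warning', or 'ok' for a series of per-minute values."""
    if sustained_breach(datapoints, critical, required_points):
        return "critical"
    if sustained_breach(datapoints, warning, required_points):
        return "warning"
    return "ok"


# Hypothetical P95 latency per minute (ms), checked against the web thresholds above.
steady_degradation = alert_level([300, 520, 610, 580, 700, 650], 500, 1000, 5)
isolated_spikes = alert_level([300, 1200, 400, 1100, 1300, 1250], 500, 1000, 5)
```

Requiring consecutive breaches keeps isolated spikes from paging anyone, at the cost of reacting a few minutes later to a real incident.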
Advanced ELB Statistics Analysis Techniques for Performance Optimization
Once you've mastered the basics of ELB monitoring, it's time to level up with more sophisticated analysis techniques that can uncover hidden patterns and opportunities for optimization.
Multi-Dimensional Correlation Analysis for Root Cause Detection
The real insights come from connecting metrics across different parts of your stack. Here are patterns to look for:
| Pattern | Potential Cause | Recommended Action | 
|---|---|---|
| Spike in latency + increase in CPU on targets | Insufficient backend capacity | Scale up backend resources or optimize code | 
| Increase in 4XX errors after deployment | Client/API compatibility issue | Rollback or implement backwards compatibility | 
| High connection count + low request count | Connection leak in client code | Fix keep-alive handling in client applications | 
| Increased latency + normal CPU/memory | Network issues or DNS problems | Check network paths and DNS resolution times | 
| Cyclical latency spikes + high disk I/O | Cache misses or database issues | Tune caching strategy or optimize database queries | 
| Degraded performance during specific hours | Maintenance jobs or cron tasks | Reschedule background tasks to off-peak hours | 
When troubleshooting complex issues, create correlation graphs that plot multiple metrics on the same timeline—this visual approach often reveals relationships that raw numbers miss.
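Alongside the visual approach, a correlation coefficient gives a quick numeric check on whether two series move together. The sketch below computes Pearson's r by hand so it runs on any Python 3 version; the sample series are invented, and a high coefficient only suggests a relationship worth investigating, not a cause.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical per-minute samples taken over the same window.
latency_ms = [120, 135, 180, 240, 310, 290]
target_cpu_pct = [35, 40, 55, 72, 90, 86]
r = pearson(latency_ms, target_cpu_pct)
```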
Data-Driven Capacity Planning with Historical Load Balancer Metrics
Export your metrics data to analyze growth patterns and forecast future needs with statistical confidence:
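# Example Python code for exporting CloudWatch metrics
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client('cloudwatch')

# CloudWatch works in UTC, so use timezone-aware datetimes.
end_time = datetime.now(timezone.utc)
start_time = end_time - timedelta(days=30)

response = cloudwatch.get_metric_data(
    MetricDataQueries=[
        {
            'Id': 'request_count',
            'MetricStat': {
                'Metric': {
                    'Namespace': 'AWS/ApplicationELB',
                    'MetricName': 'RequestCount',
                    'Dimensions': [
                        # ALB metrics use the 'LoadBalancer' dimension; its value is the
                        # ARN suffix (app/<name>/<id>). The value below is a placeholder.
                        {'Name': 'LoadBalancer', 'Value': 'app/my-load-balancer/your-load-balancer-id'},
                    ]
                },
                'Period': 3600,  # 1 hour
                'Stat': 'Sum',
            },
            'ReturnData': True,
        },
    ],
    StartTime=start_time,
    EndTime=end_time,
)

# Pair each timestamp with its hourly request total.
result = response['MetricDataResults'][0]
for timestamp, value in zip(result['Timestamps'], result['Values']):
    print(timestamp.isoformat(), value)
This data can help you predict when you'll need to add capacity or optimize your setup.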
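As a sketch of what that prediction could look like, the code below fits a straight line to an exported series and estimates how many periods remain until the trend reaches a capacity limit. The function names, the sample numbers, and the capacity figure are all assumptions for illustration, and a linear trend ignores seasonality, so treat the result as a rough estimate.

```python
def fit_line(values):
    """Least-squares slope and intercept for values observed at x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(values, capacity):
    """Periods after the last observation until the trend line reaches capacity.

    Returns None when the trend is flat or falling.
    """
    slope, intercept = fit_line(values)
    if slope <= 0:
        return None
    x_at_capacity = (capacity - intercept) / slope
    return max(0.0, x_at_capacity - (len(values) - 1))


# Hypothetical weekly peak request rates (requests/second).
weekly_peaks = [100, 110, 120, 130]
weeks_left = periods_until(weekly_peaks, capacity=200)
```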
Enterprise-Grade Integration with Observability Platforms
CloudWatch alone provides only a fragmented view of your infrastructure. For production systems, you need a unified observability approach that connects load balancer performance with backend services, user experience, and business outcomes.
Comprehensive Observability Platform Comparison
Looking for a tool that brings all your metrics together with context? Here's a detailed comparison of leading options:
| Platform | Strengths | Best For | Integration Complexity | Pricing Model | 
|---|---|---|---|---|
| Last9 | High-cardinality observability at scale; predictable pricing; excellent correlation capabilities; OpenTelemetry support | Organizations with variable loads and budget constraints; teams needing unified metrics, logs, and traces | Low-Medium (well-documented APIs) | Events-based pricing (predictable) | 
| Prometheus with OpenTelemetry | Open-source; highly customizable; strong community | Teams with Kubernetes; organizations with in-house expertise | Medium-High (requires management) | Free (infrastructure costs only) | 
| Grafana Cloud | Beautiful visualizations; multi-cloud support; strong dashboarding | Teams needing visual insights; multi-cloud environments | Low-Medium | Resource-based with free tier | 
| Dynatrace | AI-powered anomaly detection; session replay | Enterprises with complex environments | Medium | Host/application based | 
| Honeycomb | High-cardinality debugging; BubbleUp feature | Troubleshooting complex distributed systems | Low | Event-based with sampling | 
If you're watching your budget while needing serious observability horsepower, Last9 stands out from the crowd. We've handled monitoring for 11 of the 20 largest live-streaming events in history, proving scalability under extreme conditions.
How to Create a Unified Monitoring Strategy
Don't treat ELB stats as isolated metrics. The real power comes from connecting them with:
- Application metrics
- Infrastructure stats
- Business KPIs
This lets you answer questions like "Did that latency spike affect our conversion rate?" instead of just "Is the load balancer working?"
Troubleshooting Common ELB Issues Using Statistics
When things go wrong, here's how to use statistics to find the culprit:
High Latency
If your latency metrics are climbing:
- Check TargetResponseTime to see if it's a backend issue
- Look at ActiveConnectionCount to check for connection pool exhaustion
- Review RequestCount for unexpected traffic patterns
Error Rate Spikes
For a sudden increase in errors:
- Check HTTP code breakdowns to identify client vs. server errors
- Review access logs to find common patterns in failing requests
- Check HealthyHostCount to see if backends are failing
Connection Problems
If clients can't connect:
- Check RejectedConnectionCount for capacity issues
- Review security group and network ACL rules, and use VPC Flow Logs to spot rejected connections
- Review SurgeQueueLength to see if requests are backing up
ELB Statistics in a CI/CD World
Modern deployment pipelines should include metric validation:
Canary Deployments
Use metrics to validate canary releases:
- Deploy to a small percentage of targets
- Compare error rates and latency between the canary and stable
- Automatically promote or rollback based on statistical thresholds
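The steps above could be wired into a pipeline gate along these lines. This is a deliberately simple sketch that compares error rates by absolute difference instead of running a formal statistical test; `canary_decision`, `max_rate_increase`, and `min_requests` are hypothetical names and defaults chosen for the example.

```python
def canary_decision(stable_errors, stable_requests,
                    canary_errors, canary_requests,
                    max_rate_increase=0.005, min_requests=500):
    """Return 'wait', 'rollback', or 'promote' for a canary release."""
    # Too little canary traffic to judge: keep collecting data.
    if canary_requests < min_requests:
        return "wait"
    stable_rate = stable_errors / stable_requests if stable_requests else 0.0
    canary_rate = canary_errors / canary_requests
    # Roll back when the canary's error rate exceeds stable by more than the allowance.
    if canary_rate - stable_rate > max_rate_increase:
        return "rollback"
    return "promote"


good_release = canary_decision(50, 100_000, 2, 1_000)
bad_release = canary_decision(50, 100_000, 20, 1_000)
too_early = canary_decision(50, 100_000, 1, 100)
```

A production gate would likely compare latency percentiles as well and use a significance test once traffic volumes allow it.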
Blue/Green Deployment Monitoring
When doing blue/green deployments:
- Set up duplicate dashboards for both environments
- Compare key metrics side by side
- Only shift traffic if the new environment's metrics look healthy
Post-Deployment Monitoring
After each release:
- Watch for statistically significant changes in any key metrics
- Set lower alert thresholds during the first hour after deployment
- Record baseline metrics for future comparison
Cost Optimization Through ELB Statistics
Your metrics can help you save money:
Right-sizing Based on Usage Patterns
- Analyze RequestCount and ProcessedBytes over time
- Identify peak periods vs. quiet times
- Consider consolidating or removing lightly used load balancers; ALBs and NLBs scale automatically, so there is no smaller size to switch to
Idle Load Balancer Detection
Create reports to find underutilized resources:
-- Example Athena query to find underutilized ELBs
SELECT
    elb_name,
    SUM(request_count) AS total_requests,
    AVG(latency) AS avg_latency
FROM elb_logs
-- Athena uses interval arithmetic rather than DATE_SUB
WHERE "date" BETWEEN current_date - INTERVAL '30' DAY AND current_date
GROUP BY elb_name
-- Repeat the aggregate here; SELECT aliases are not resolved in HAVING
HAVING SUM(request_count) < 1000 -- Define your threshold
ORDER BY total_requests ASC;
Cross-Region Traffic Analysis
If you're using Global Accelerator or similar:
- Compare metrics across regions
- Adjust traffic distribution to maximize performance while minimizing cost
- Consider shutting down redundant capacity in low-use regions
ELB Statistics for Security and Compliance
Your metrics aren't just for performance—they're a security tool too:
Detecting Unusual Traffic Patterns
Set up anomaly detection for:
- Unusual spikes in request volume
- Unexpected geographic distribution of traffic
- Abnormal request size patterns
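A basic form of anomaly detection is a z-score against recent history, sketched below. The 3-sigma threshold and the sample request counts are assumptions for illustration; managed options such as CloudWatch anomaly detection use more sophisticated models that account for seasonality.

```python
from statistics import mean, pstdev


def is_anomalous(history, current, z_threshold=3.0):
    """Flag `current` when it sits more than z_threshold deviations from the historical mean."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > z_threshold


# Hypothetical requests per minute over the previous five minutes.
recent_requests = [1000, 1020, 980, 1010, 990]
spike = is_anomalous(recent_requests, 1500)
normal = is_anomalous(recent_requests, 1015)
```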
Compliance Reporting
For regulated industries:
- Set up automatic exports of key metrics
- Document uptime and performance statistics
- Create custom dashboards for auditors showing security-relevant metrics
DDoS Detection and Mitigation
Configure alerts for potential attacks:
- Sudden increases in rejected connections
- Unusual patterns in client errors
- Spikes in specific HTTP methods (especially uncommon ones)
Conclusion
Understanding Elastic Load Balancer statistics isn’t just something you turn to when things break—it’s part of running stable, efficient systems day to day. Start with the core metrics, get familiar with the patterns, and build from there.
FAQs
How frequently are ELB metrics updated in CloudWatch?
ELB publishes its metrics to CloudWatch at 60-second intervals, and only reports data points while requests are flowing through the load balancer. The built-in metrics have no higher-resolution option, so for finer granularity you would need to derive your own measurements (for example, from access logs) and publish them as custom metrics, which adds to your CloudWatch costs.
Do ELB statistics count against my CloudWatch costs?
The built-in ELB metrics at 1-minute resolution are included at no additional charge. Custom metrics, dashboards, alarms, and long-term storage of exported data are billed separately. A typical production setup with reasonable alerting can add $50-200/month to your CloudWatch bill depending on scale.
How long should I retain ELB statistics for different business needs?
| Business Need | Recommended Retention | Reasoning | 
|---|---|---|
| Operational monitoring | 2-4 weeks | Sufficient for troubleshooting recent issues | 
| Capacity planning | 13 months | Enables year-over-year comparisons and seasonal analysis | 
| Compliance (finance) | 3-7 years | Many financial regulations require extended retention | 
| Compliance (healthcare) | 6+ years | HIPAA and related standards have lengthy requirements | 
| Security forensics | 1+ year | Allows investigation of long-term patterns and breaches | 
Always consult your legal and compliance teams for specific requirements in regulated industries.
Can I export ELB metrics to external systems for custom analysis?
Yes, you have several options for exporting metrics:
- CloudWatch API: Programmatically extract metrics using boto3 or AWS SDK
- CloudWatch Metric Streams: Stream metrics in real-time to Kinesis Data Firehose
- Third-party integrations: Most observability platforms offer direct CloudWatch integration
- Custom Lambda exporters: Build custom export logic for specific requirements
For large-scale environments, consider using CloudWatch Metric Streams with Firehose to an S3 data lake for cost-effective long-term storage and analysis.
How do ELB statistics differ between Application Load Balancers and Network Load Balancers?
| Feature | Application Load Balancer | Network Load Balancer | 
|---|---|---|
| Layer | Layer 7 (HTTP/HTTPS) | Layer 4 (TCP/UDP/TLS) | 
| Request metrics | Full HTTP(S) metrics | Connection-based metrics | 
| Status code tracking | HTTP status codes (2XX, 4XX, 5XX) | TCP connection status | 
| Unique metrics | Rule evaluation counts, HTTP header processing | TCP_ELB_Reset_Count, flow metrics | 
| Health check metrics | HTTP-based health checks | TCP connection health checks | 
| Latency components | Request processing, response processing | Connection establishment time | 
This fundamental difference means your monitoring strategy should align with the load balancer type you're using.
What's the difference between load balancer latency and target response time?
Load Balancer Latency = Request Processing Time + Target Response Time + Response Processing Time
- Request Processing Time: Time from receiving request to sending it to target (includes TLS handshake, rule evaluation)
- Target Response Time: Time spent waiting for backend target to respond (includes application logic, database queries)
- Response Processing Time: Time to process and send response back to client
When troubleshooting high latency, these component metrics help pinpoint whether the issue lies in the load balancer configuration, network path, or backend application code.
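The breakdown can be computed directly from the three component times, as in this small sketch. The function name and the sample values (in seconds) are illustrative.

```python
def latency_breakdown(request_processing, target_response, response_processing):
    """Return total latency, each component's share of it, and the dominant component."""
    parts = {
        "request_processing": request_processing,
        "target_response": target_response,
        "response_processing": response_processing,
    }
    total = sum(parts.values())
    if total <= 0:
        raise ValueError("total latency must be positive")
    shares = {name: value / total for name, value in parts.items()}
    dominant = max(shares, key=shares.get)
    return total, shares, dominant


# Hypothetical component times in seconds for one request.
total, shares, dominant = latency_breakdown(0.002, 0.180, 0.003)
```

When `dominant` is `target_response`, the backend is the place to look; a large request-processing share points back at the load balancer or TLS setup.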
How can I accurately calculate the cost impact of my ELB usage?
To build a comprehensive cost model:
- Base cost: Number of load balancers × hourly rate × hours in month
- Data transfer: ProcessedBytes metric × appropriate data transfer rate
  - Internet outbound: $0.09-$0.15/GB (varies by region and volume)
  - Inter-AZ: ~$0.01/GB
  - Intra-AZ: Free
- LCU costs (for ALBs): Calculate Load Balancer Capacity Units based on:
  - New connections per second (divided by 25)
  - Active connections per minute (divided by 3,000)
  - Processed bytes per hour (divided by 1GB)
  - Rule evaluations per second (divided by 1,000)
Pro tip: Many organizations discover that data transfer costs often exceed the hourly load balancer costs by 3-5x, especially for content-heavy applications. Implement proper caching strategies and CDN integration to minimize these costs.
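The LCU part of the model can be sketched as follows. An ALB is billed on whichever of the four dimensions is highest in a given hour, so the code takes the maximum. The divisors come from the list above; the hourly rates in the example are placeholders, not real prices, so substitute the current rates for your region.

```python
def lcus_per_hour(new_conn_per_sec, active_conn_per_min, gb_per_hour, rule_evals_per_sec):
    """Return (LCUs consumed, name of the dimension that drives the bill)."""
    dimensions = {
        "new_connections": new_conn_per_sec / 25,
        "active_connections": active_conn_per_min / 3000,
        "processed_bytes": gb_per_hour / 1,
        "rule_evaluations": rule_evals_per_sec / 1000,
    }
    billing_dimension = max(dimensions, key=dimensions.get)
    return dimensions[billing_dimension], billing_dimension


def monthly_alb_cost(lcus, lcu_hourly_rate, lb_hourly_rate, hours=730):
    """Base hourly charge plus LCU charge, over the given number of hours."""
    return (lb_hourly_rate + lcus * lcu_hourly_rate) * hours


# Hypothetical steady-state traffic.
lcus, driver = lcus_per_hour(
    new_conn_per_sec=50, active_conn_per_min=3000,
    gb_per_hour=1.5, rule_evals_per_sec=100,
)
# Placeholder rates for illustration only.
estimate = monthly_alb_cost(lcus, lcu_hourly_rate=0.01, lb_hourly_rate=0.02, hours=100)
```

Data transfer charges are not included here; add them separately using the ProcessedBytes figures and the rates that apply to your traffic paths.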