Last9

Apr 16th, ‘25 / 13 min read

Getting Started with Elastic Load Balancer (ELB) Metrics

Learn the key ELB metrics that help you monitor traffic, spot issues early, and keep your load balancers running smoothly in production.

As your infrastructure grows, keeping track of what your elastic load balancers are doing isn’t just useful—it’s necessary. Without clear visibility into this layer, it’s easy to miss bottlenecks, uneven traffic distribution, or failing targets.

This guide walks through the key ELB metrics that matter when you're running real workloads in production.

Elastic Load Balancer Statistics: Core Metrics Explained

Elastic load balancer statistics are the data points that show how your load balancers are performing. They're the numbers that tell you whether your traffic is flowing smoothly or if you've got bottlenecks that need attention.

These stats cover everything from request counts and latency to error rates and connection details. Think of them as your load balancer's vital signs—they let you know if things are healthy or if something's about to break.

💡
For a broader view of network behavior alongside ELB stats, you might find this guide on how to view and understand VPC Flow Logs useful.

Why Proactive ELB Statistics Tracking is Critical for System Reliability

You can't fix what you don't measure, and with load balancers sitting at the gateway of your infrastructure, flying blind here is particularly risky. Here's why tracking these numbers is worth your time:

  • Spot issues before users do – See performance drops before they affect your users
  • Right-size your resources – Know exactly when to scale up or down based on actual usage patterns
  • Budget better – Understand traffic patterns to predict future costs and avoid surprise bills
  • Sleep easier – Set up intelligent alerts that only wake you up when there's a real problem
  • Improve mean time to resolution – When incidents occur, having historical data speeds up troubleshooting
  • Optimize for peak traffic – Prepare your infrastructure for high-demand periods by understanding past patterns

Essential Elastic Load Balancer Metrics That Reveal System Health

Let's break down the stats that deserve your attention:

Crucial Request and Connection Metrics for Traffic Analysis

| Metric | What It Tells You | Why It Matters | Typical Warning Signs |
|---|---|---|---|
| RequestCount | Total number of requests processed per period | Traffic patterns and capacity planning | Sudden 2x+ increases without explanation |
| ActiveConnectionCount | Number of concurrent connections currently open | Real-time load assessment and connection pool health | Steady climb without a corresponding increase in requests |
| NewConnectionCount | New connections established per period | Growth trends and sudden spikes | Rapid oscillation patterns indicating connection issues |
| RejectedConnectionCount | Connections refused due to capacity limits | Early warning of capacity issues | Any non-zero values during normal operation |
| SurgeQueueLength | Requests waiting for processing | Backlog size and potential timeout risks | Consistently above zero with an increasing trend |

These metrics show you the raw volume your load balancers are handling. Sudden spikes might indicate legitimate traffic surges (like during a marketing campaign), but they could also signal potential DDoS attacks or misconfigured clients hammering your endpoints.

Pay special attention to the ratio between ActiveConnectionCount and RequestCount. A high number of connections with low request counts often indicates connection pooling issues in your clients.
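One way to watch this ratio continuously is CloudWatch metric math. Here's a sketch with boto3; the `app/...` load balancer value and the helper names are hypothetical placeholders, and `fetch_ratio` requires AWS credentials:

```python
def ratio_queries(alb):
    """Build get_metric_data queries for ActiveConnectionCount / RequestCount."""
    def metric(mid, name):
        return {
            "Id": mid,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ApplicationELB",
                    "MetricName": name,
                    # ALB dimension value is the 'app/<name>/<id>' ARN suffix
                    "Dimensions": [{"Name": "LoadBalancer", "Value": alb}],
                },
                "Period": 300,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }
    return [
        metric("m1", "ActiveConnectionCount"),
        metric("m2", "RequestCount"),
        # Metric math: connections per request; persistently high values
        # suggest clients aren't reusing connections (broken pooling)
        {"Id": "ratio", "Expression": "m1 / m2",
         "Label": "ConnectionsPerRequest", "ReturnData": True},
    ]

def fetch_ratio(alb):  # requires AWS credentials; illustrative only
    import boto3
    from datetime import datetime, timedelta
    cw = boto3.client("cloudwatch")
    resp = cw.get_metric_data(
        MetricDataQueries=ratio_queries(alb),
        StartTime=datetime.utcnow() - timedelta(hours=3),
        EndTime=datetime.utcnow(),
    )
    return resp["MetricDataResults"][0]["Values"]
```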

💡
If you're setting up AWS traffic routing end to end, this CloudFront basics and setup guide can help you connect the dots.

Performance Metrics That Impact User Experience

| Metric | What It Tells You | Why It Matters | Target Values |
|---|---|---|---|
| Latency (overall) | Total time to process requests end-to-end | Direct impact on user-perceived performance | Web: <200ms, API: <100ms |
| TargetResponseTime | Time spent waiting for backend servers | Backend service health and database performance | Should be 70-80% of total latency |
| ProcessedBytes | Data volume processed by the load balancer | Bandwidth usage planning and cost forecasting | Monitor for sudden changes vs. historical patterns |
| TLS Negotiation Time | Time spent establishing secure connections | Security overhead impact | Should be <50ms with session resumption enabled |
| RequestProcessingTime | Time the load balancer spends processing | ELB efficiency and potential configuration issues | Should be minimal (<10ms) |

Latency is particularly crucial—users notice when things get slow. Studies show that each 100ms of added latency can reduce conversion rates by up to 7%. A sudden increase here means it's time to start investigating immediately.

For e-commerce or financial applications, you should track the 95th and 99th percentile latency values, not just averages. A smooth average can hide terrible experiences for a significant portion of your users.
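To see why averages hide tail pain, here's a small standard-library sketch; the latency samples are made up:

```python
import statistics

def latency_summary(samples_ms):
    """Return mean, p95, and p99 for a list of latency samples (ms)."""
    # quantiles with n=100 yields the 1st..99th percentile cut points
    cuts = statistics.quantiles(samples_ms, n=100)
    return {
        "mean": statistics.mean(samples_ms),
        "p95": cuts[94],  # 95th percentile
        "p99": cuts[98],  # 99th percentile
    }

# 95 fast requests and 5 slow ones: the mean looks fine, the tail doesn't
samples = [50] * 95 + [2000] * 5
summary = latency_summary(samples)
```

Here the mean is 147.5ms while p99 is a full 2 seconds, which is exactly the gap that average-only dashboards hide.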

Health and Error Metrics: Your Early Warning System

These metrics serve as canaries in the coal mine for your application health:

| Metric | What It Tells You | Warning Threshold | Critical Threshold |
|---|---|---|---|
| HTTPCode_ELB_4XX_Count | Client errors (bad requests) | >5% of total requests | >10% of total requests |
| HTTPCode_ELB_5XX_Count | Server errors (your problem) | >0.1% of total requests | >1% of total requests |
| HTTPCode_Target_2XX_Count | Successful responses | <90% of total requests | <80% of total requests |
| HealthyHostCount | Number of backend servers passing health checks | <90% of total hosts | <70% of total hosts |
| UnHealthyHostCount | Number of failing backend servers | >10% of total hosts | >30% of total hosts |
| FailedHealthChecks | Number of individual health check failures | Any consistent pattern | Steady increase over time |

A sharp rise in 5XX errors typically means something's broken in your backend services. But don't ignore 4XX errors – a sudden increase often indicates a client library issue or API change that's causing compatibility problems.

For high-traffic systems, you should track the ratio of errors to total requests rather than absolute numbers. A system handling millions of requests can have thousands of errors while still being "healthy" from a percentage perspective.
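A minimal sketch of ratio-based error tracking; the threshold values come from the table above, and the function names are illustrative:

```python
def error_rate(error_count, total_requests):
    """Error ratio as a percentage; 0 when there's no traffic."""
    if total_requests == 0:
        return 0.0
    return 100.0 * error_count / total_requests

def classify_5xx(rate_pct):
    """Map a 5XX error percentage onto the thresholds in the table above."""
    if rate_pct > 1.0:
        return "critical"
    if rate_pct > 0.1:
        return "warning"
    return "ok"

# 3,000 errors out of 5 million requests is only 0.06% -- still "ok"
rate = error_rate(3_000, 5_000_000)
```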

💡
Looking at load balancer stats is a good start, but issues can also crop up downstream—like DynamoDB throttling during traffic spikes.

How to Set Up ELB Statistics Collection

Getting these metrics isn't complicated, but doing it right requires attention to detail. Here's how to get started with a production-grade setup:

Using CloudWatch for Foundational Metric Collection

AWS automatically sends ELB metrics to CloudWatch without any configuration on your part. To view them:

  1. Go to the CloudWatch console in your AWS account
  2. Navigate to Metrics > All metrics in the left navigation
  3. Select the "AWS/ApplicationELB" or "AWS/NetworkELB" namespace depending on your load balancer type
  4. Choose the metrics you want to view from the available dimensions
  5. Adjust the period to see trends (last 3 hours, 1 day, 1 week, or custom)
  6. Consider setting up a CloudWatch dashboard for frequently referenced metrics
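The same browsing can be scripted. Here's a sketch using boto3's `list_metrics`; it assumes AWS credentials are configured, and `browse_elb_metrics` is an illustrative helper, not an AWS API:

```python
def metric_names(list_metrics_response):
    """Extract unique metric names from a CloudWatch list_metrics response."""
    return sorted({m["MetricName"] for m in list_metrics_response["Metrics"]})

def browse_elb_metrics():  # requires AWS credentials; illustrative only
    import boto3
    cw = boto3.client("cloudwatch")
    names = set()
    # list_metrics is paginated, so walk all pages of the ALB namespace
    for page in cw.get_paginator("list_metrics").paginate(Namespace="AWS/ApplicationELB"):
        names.update(metric_names(page))
    return sorted(names)
```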

But basic CloudWatch metrics only tell part of the story – they provide aggregated data that can mask underlying issues. For production systems, you'll want more detailed insights.

Implementing Enhanced Monitoring with Access Logs for Detailed Analysis

To get deeper insights that can help with troubleshooting and pattern recognition:

  1. Enable access logs in your ELB settings (under Attributes in the ELB console)
  2. Set a lifecycle policy that matches your retention needs (90 days is common for production systems)
  3. Create regular reports using Athena queries or connect to a visualization tool
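Step 1 can also be done programmatically. A sketch with the `elbv2` API; the ARN, bucket, and prefix are placeholders:

```python
def access_log_attributes(bucket, prefix):
    """ALB attribute entries that turn on access logging to S3."""
    return [
        {"Key": "access_logs.s3.enabled", "Value": "true"},
        {"Key": "access_logs.s3.bucket", "Value": bucket},
        {"Key": "access_logs.s3.prefix", "Value": prefix},
    ]

def enable_access_logs(lb_arn, bucket, prefix):  # requires AWS credentials
    import boto3
    boto3.client("elbv2").modify_load_balancer_attributes(
        LoadBalancerArn=lb_arn,
        Attributes=access_log_attributes(bucket, prefix),
    )
```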

Set up Athena tables to query these logs efficiently:

CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs (
  type                          string,
  time                          string,
  elb                           string,
  client_ip                     string,
  client_port                   int,
  target_ip                     string,
  target_port                   int,
  request_processing_time       double,
  target_processing_time        double,
  response_processing_time      double,
  elb_status_code               int,
  target_status_code            string,
  received_bytes                bigint,
  sent_bytes                    bigint,
  request_verb                  string,
  request_url                   string,
  request_proto                 string,
  user_agent                    string,
  ssl_cipher                    string,
  ssl_protocol                  string,
  target_group_arn              string,
  trace_id                      string,
  domain_name                   string,
  chosen_cert_arn               string,
  matched_rule_priority         string,
  request_creation_time         string,
  actions_executed              string,
  redirect_url                  string,
  lambda_error_reason           string,
  target_port_list              string,
  target_status_code_list       string,
  classification                string,
  classification_reason         string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1',
  'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"'
)
LOCATION 's3://your-bucket-name/prefix/AWSLogs/your-account-id/elasticloadbalancing/your-region/';

Configure an S3 bucket to receive these logs with appropriate permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::elb-account-id:root"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::your-bucket-name/prefix/AWSLogs/your-account-id/*"
    }
  ]
}

Access logs give you details on every request, including source IP, request path, user agent, and processing times—perfect for troubleshooting specific issues and identifying patterns that aggregate metrics might miss.

For high-traffic systems, these logs can grow to terabytes, so set up appropriate partitioning and lifecycle policies.

💡
Once you’re familiar with ELB metrics, it’s worth checking out how CloudWatch metrics work to track and manage system-wide performance.

Building Custom Dashboards

Raw numbers are nice, but visualizations tell the story better:

# Grafana dashboard JSON snippet for ELB metrics
{
  "panels": [
    {
      "title": "ELB Request Count",
      "type": "graph",
      "datasource": "CloudWatch",
      "targets": [
        {
          "region": "us-east-1",
          "namespace": "AWS/ApplicationELB",
          "metricName": "RequestCount",
          "statistics": ["Sum"],
          "dimensions": {
            "LoadBalancer": "app/your-load-balancer-name/lb-id"
          }
        }
      ]
    }
  ]
}

Pro tip: Don't build dashboards that just look pretty. Build ones that answer your actual questions.

Production-Grade Alert Thresholds Based on Industry Experience

Knowing what to look for is half the battle in monitoring. Generic thresholds are a starting point, but truly effective monitoring requires context-specific values.

Here are starting points for different types of applications:

Web Application Alert Thresholds

| Metric | Warning Threshold | Critical Threshold | Duration | When to Adjust |
|---|---|---|---|---|
| P95 Latency | >500ms | >1000ms | 5 minutes | Lower for premium services, higher for admin portals |
| 5XX Error Rate | >0.5% | >1% | 3 minutes | Lower for checkout flows, higher for non-critical paths |
| 4XX Error Rate | >5% | >10% | 5 minutes | Higher after API changes or client updates |
| Healthy Hosts | <80% | <70% | 2 minutes | Higher for small clusters, lower for large pools |
| Rejected Connections | Any | >100/minute | Immediate | Adjust based on connection limits |
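As a concrete example, the "Rejected Connections" row above can be turned into a CloudWatch alarm. A sketch with boto3; the ALB suffix and SNS topic ARN are placeholders, and `create_alarm` requires AWS credentials:

```python
def rejected_connections_alarm(alb, topic_arn):
    """Build put_metric_alarm kwargs for the 'any rejected connection' rule."""
    return {
        "AlarmName": f"rejected-connections-{alb.replace('/', '-')}",
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "RejectedConnectionCount",
        "Dimensions": [{"Name": "LoadBalancer", "Value": alb}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no datapoints means no rejections
        "AlarmActions": [topic_arn],
    }

def create_alarm(alb, topic_arn):  # requires AWS credentials; illustrative only
    import boto3
    boto3.client("cloudwatch").put_metric_alarm(
        **rejected_connections_alarm(alb, topic_arn)
    )
```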

API Service Alert Thresholds

| Metric | Warning Threshold | Critical Threshold | Duration | When to Adjust |
|---|---|---|---|---|
| P95 Latency | >200ms | >500ms | 3 minutes | Lower for real-time APIs, higher for batch processes |
| 5XX Error Rate | >0.1% | >0.5% | 2 minutes | Critical APIs should have near-zero tolerance |
| Request Rate Drop | >20% from baseline | >50% from baseline | 5 minutes | Adjust based on normal traffic patterns |
| Backend Target Errors | >1% | >5% | 2 minutes | Lower for critical infrastructure |

E-commerce Alert Thresholds

| Metric | Warning Threshold | Critical Threshold | Duration | Context |
|---|---|---|---|---|
| Cart/Checkout Latency | >300ms | >700ms | 1 minute | Studies show checkout abandonment increases 7% per 100ms delay |
| Payment Processing Errors | >0.1% | >0.5% | Immediate | Money is on the line |
| Product Page Latency | >800ms | >2000ms | 5 minutes | Less critical than checkout flows |

Adjust these based on your application's specific needs and business context. What's normal for an e-commerce site might be a crisis for a trading platform. For mission-critical systems, consider implementing multi-level alerts with escalation policies that match your SLAs.

💡
If you're already working with ELB, setting up AWS WAF can add an extra layer of protection against unwanted traffic.

Advanced ELB Statistics Analysis Techniques for Performance Optimization

Once you've mastered the basics of ELB monitoring, it's time to level up with more sophisticated analysis techniques that can uncover hidden patterns and opportunities for optimization.

Multi-Dimensional Correlation Analysis for Root Cause Detection

The real insights come from connecting metrics across different parts of your stack. Here are patterns to look for:

| Pattern | Potential Cause | Recommended Action |
|---|---|---|
| Spike in latency + increase in CPU on targets | Insufficient backend capacity | Scale up backend resources or optimize code |
| Increase in 4XX errors after deployment | Client/API compatibility issue | Roll back or implement backwards compatibility |
| High connection count + low request count | Connection leak in client code | Fix keep-alive handling in client applications |
| Increased latency + normal CPU/memory | Network issues or DNS problems | Check network paths and DNS resolution times |
| Cyclical latency spikes + high disk I/O | Cache misses or database issues | Tune caching strategy or optimize database queries |
| Degraded performance during specific hours | Maintenance jobs or cron tasks | Reschedule background tasks to off-peak hours |

When troubleshooting complex issues, create correlation graphs that plot multiple metrics on the same timeline—this visual approach often reveals relationships that raw numbers miss.

Data-Driven Capacity Planning with Historical Load Balancer Metrics

Export your metrics data to analyze growth patterns and forecast future needs with statistical confidence:

# Example Python code for exporting CloudWatch metrics
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client('cloudwatch')

response = cloudwatch.get_metric_data(
    MetricDataQueries=[
        {
            'Id': 'request_count',
            'MetricStat': {
                'Metric': {
                    'Namespace': 'AWS/ApplicationELB',
                    'MetricName': 'RequestCount',
                    'Dimensions': [
                        # For ALBs the dimension is 'LoadBalancer' and the value
                        # is the 'app/<name>/<id>' suffix of the load balancer ARN
                        {'Name': 'LoadBalancer', 'Value': 'app/my-load-balancer/1234567890abcdef'},
                    ]
                },
                'Period': 3600,  # 1 hour
                'Stat': 'Sum',
            },
            'ReturnData': True,
        },
    ],
    StartTime=datetime.now() - timedelta(days=30),
    EndTime=datetime.now(),
)

This data can help you predict when you'll need to add capacity or optimize your setup.
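For a first-pass forecast you don't need a statistics package; a least-squares trend line over the exported datapoints is often enough. A sketch, with made-up history values:

```python
def linear_forecast(values, steps_ahead):
    """Fit a least-squares line to equally spaced samples and extrapolate."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # Project forward from the last observed sample
    return intercept + slope * (n - 1 + steps_ahead)

# Hourly request sums with steady growth: where will we be 24 hours out?
history = [1000, 1100, 1200, 1300, 1400]
projected = linear_forecast(history, 24)
```

Real traffic is rarely this linear, so treat the projection as a rough capacity signal, not a guarantee.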

Enterprise-Grade Integration with Observability Platforms

CloudWatch alone provides only a fragmented view of your infrastructure. For production systems, you need a unified observability approach that connects load balancer performance with backend services, user experience, and business outcomes.

Comprehensive Observability Platform Comparison

Looking for a tool that brings all your metrics together with context? Here's a detailed comparison of leading options:

| Platform | Strengths | Best For | Integration Complexity | Pricing Model |
|---|---|---|---|---|
| Last9 | High-cardinality observability at scale; predictable pricing; excellent correlation capabilities; OpenTelemetry support | Organizations with variable loads and budget constraints; teams needing unified metrics, logs, and traces | Low-Medium (well-documented APIs) | Events-based pricing (predictable) |
| Prometheus with OpenTelemetry | Open-source; highly customizable; strong community | Teams with Kubernetes; organizations with in-house expertise | Medium-High (requires management) | Free (infrastructure costs only) |
| Grafana Cloud | Beautiful visualizations; multi-cloud support; strong dashboarding | Teams needing visual insights; multi-cloud environments | Low-Medium | Resource-based with free tier |
| Dynatrace | AI-powered anomaly detection; session replay | Enterprises with complex environments | Medium | Host/application based |
| Honeycomb | High-cardinality debugging; BubbleUp feature | Troubleshooting complex distributed systems | Low | Event-based with sampling |

If you're watching your budget while needing serious observability horsepower, Last9 stands out from the crowd. We've handled monitoring for 11 of the 20 largest live-streaming events in history, proving scalability under extreme conditions.

How to Create a Unified Monitoring Strategy

Don't treat ELB stats as isolated metrics. The real power comes from connecting them with:

  • Application metrics
  • Infrastructure stats
  • Business KPIs

This lets you answer questions like "Did that latency spike affect our conversion rate?" instead of just "Is the load balancer working?"

💡
To get a fuller picture of your AWS setup beyond ELB, this guide on AWS monitoring tools walks through what's available and how to use them effectively.

Troubleshooting Common ELB Issues Using Statistics

When things go wrong, here's how to use statistics to find the culprit:

High Latency

If your latency metrics are climbing:

  1. Check TargetResponseTime to see if it's a backend issue
  2. Look at ActiveConnectionCount to check for connection pool exhaustion
  3. Review RequestCount for unexpected traffic patterns

Error Rate Spikes

For a sudden increase in errors:

  1. Check HTTP code breakdowns to identify client vs. server errors
  2. Review access logs to find common patterns in failing requests
  3. Check HealthyHostCount to see if backends are failing

Connection Problems

If clients can't connect:

  1. Check RejectedConnectionCount for capacity issues
  2. Look at the security group and network ACL metrics for blocked connections
  3. Review SurgeQueueLength to see if requests are backing up

💡
Now, fix production ELB metric issues instantly—right from your IDE, with AI and Last9 MCP.

ELB Statistics in a CI/CD World

Modern deployment pipelines should include metric validation:

Canary Deployments

Use metrics to validate canary releases:

  1. Deploy to a small percentage of targets
  2. Compare error rates and latency between the canary and stable
  3. Automatically promote or rollback based on statistical thresholds
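Step 3 can be reduced to a simple comparison rule. A sketch; the 2x ratio and 500-request minimum are illustrative defaults, not AWS guidance:

```python
def canary_verdict(canary_errors, canary_total, stable_errors, stable_total,
                   max_ratio=2.0, min_requests=500):
    """Promote the canary unless its error rate is clearly worse than stable's."""
    if canary_total < min_requests:
        return "wait"  # not enough traffic for a meaningful comparison
    canary_rate = canary_errors / canary_total
    stable_rate = max(stable_errors / stable_total, 1e-6)  # avoid divide-by-zero
    return "rollback" if canary_rate > max_ratio * stable_rate else "promote"

# Canary at 3% errors vs. stable at 0.5%: six times worse, so roll back
verdict = canary_verdict(canary_errors=30, canary_total=1000,
                         stable_errors=50, stable_total=10000)
```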

Blue/Green Deployment Monitoring

When doing blue/green deployments:

  1. Set up duplicate dashboards for both environments
  2. Compare key metrics side by side
  3. Only shift traffic if the new environment's metrics look healthy

Post-Deployment Monitoring

After each release:

  1. Watch for statistically significant changes in any key metrics
  2. Set lower alert thresholds during the first hour after deployment
  3. Record baseline metrics for future comparison

Cost Optimization Through ELB Statistics

Your metrics can help you save money:

Right-sizing Based on Usage Patterns

  1. Analyze RequestCount and ProcessedBytes over time
  2. Identify peak periods vs. quiet times
  3. Consider switching to smaller load balancers during predictable low-traffic periods

Idle Load Balancer Detection

Create reports to find underutilized resources:

-- Example Athena query to find underutilized ALBs
-- (queries the alb_logs table defined earlier; Athena's Presto engine
--  uses date_add for date math and doesn't allow aliases in HAVING)
SELECT
    elb,
    COUNT(*) AS total_requests,
    AVG(target_processing_time) AS avg_latency
FROM alb_logs
WHERE from_iso8601_timestamp(time) >= date_add('day', -30, now())
GROUP BY elb
HAVING COUNT(*) < 1000 -- Define your threshold
ORDER BY total_requests ASC;

Cross-Region Traffic Analysis

If you're using Global Accelerator or similar:

  1. Compare metrics across regions
  2. Adjust traffic distribution to maximize performance while minimizing cost
  3. Consider shutting down redundant capacity in low-use regions

ELB Statistics for Security and Compliance

Your metrics aren't just for performance—they're a security tool too:

Detecting Unusual Traffic Patterns

Set up anomaly detection for:

  • Unusual spikes in request volume
  • Unexpected geographic distribution of traffic
  • Abnormal request size patterns
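A simple way to start is z-score detection against a recent baseline. A sketch; the 3-sigma threshold is a common default, not a universal rule:

```python
import statistics

def is_anomalous(history, current, z_threshold=3.0):
    """Flag `current` if it sits more than z_threshold stdevs from the baseline."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return current != mean  # flat baseline: any deviation is unusual
    return abs(current - mean) / stdev > z_threshold

# Steady baseline around 1,000 req/min; a jump to 5,000 should trip the detector
baseline = [980, 1010, 995, 1005, 990, 1020, 1000]
flag = is_anomalous(baseline, 5000)
```

CloudWatch also offers built-in anomaly detection on metrics, but a hand-rolled check like this is useful when you export data to your own pipeline.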

Compliance Reporting

For regulated industries:

  1. Set up automatic exports of key metrics
  2. Document uptime and performance statistics
  3. Create custom dashboards for auditors showing security-relevant metrics

DDoS Detection and Mitigation

Configure alerts for potential attacks:

  • Sudden increases in rejected connections
  • Unusual patterns in client errors
  • Spikes in specific HTTP methods (especially uncommon ones)

Conclusion

Understanding Elastic Load Balancer statistics isn’t just something you turn to when things break—it’s part of running stable, efficient systems day to day. Start with the core metrics, get familiar with the patterns, and build from there.

💡
If you're curious to see how others are tackling similar challenges, our Discord community has plenty of ongoing conversations around observability and practical monitoring setups.

FAQs

How frequently are ELB metrics updated in CloudWatch?

Most ELB metrics are published to CloudWatch at one-minute intervals by default. Unlike EC2, ELB doesn't offer a higher-resolution "detailed monitoring" tier, so if you need sub-minute or per-request visibility for real-time troubleshooting, enable access logs and analyze requests directly. Also note that CloudWatch only reports datapoints for intervals in which traffic actually flowed through the load balancer.

Do ELB statistics count against my CloudWatch costs?

Yes, standard CloudWatch pricing applies to ELB metrics. The basic metrics at 1-minute resolution are included at no additional charge, but dashboards, alarms, custom metrics, API requests, and long-term log storage will incur additional costs. A typical production setup with reasonable alerting can add $50-200/month to your CloudWatch bill depending on scale.

How long should I retain ELB statistics for different business needs?

| Business Need | Recommended Retention | Reasoning |
|---|---|---|
| Operational monitoring | 2-4 weeks | Sufficient for troubleshooting recent issues |
| Capacity planning | 13 months | Enables year-over-year comparisons and seasonal analysis |
| Compliance (finance) | 3-7 years | Many financial regulations require extended retention |
| Compliance (healthcare) | 6+ years | HIPAA and related standards have lengthy requirements |
| Security forensics | 1+ year | Allows investigation of long-term patterns and breaches |

Always consult your legal and compliance teams for specific requirements in regulated industries.

Can I export ELB metrics to external systems for custom analysis?

Yes, you have several options for exporting metrics:

  1. CloudWatch API: Programmatically extract metrics using boto3 or AWS SDK
  2. CloudWatch Metric Streams: Stream metrics in real-time to Kinesis Data Firehose
  3. Third-party integrations: Most observability platforms offer direct CloudWatch integration
  4. Custom Lambda exporters: Build custom export logic for specific requirements

For large-scale environments, consider using CloudWatch Metric Streams with Firehose to an S3 data lake for cost-effective long-term storage and analysis.

How do ELB statistics differ between Application Load Balancers and Network Load Balancers?

| Feature | Application Load Balancer | Network Load Balancer |
|---|---|---|
| Layer | Layer 7 (HTTP/HTTPS) | Layer 4 (TCP/UDP/TLS) |
| Request metrics | Full HTTP(S) metrics | Connection-based metrics |
| Status code tracking | HTTP status codes (2XX, 4XX, 5XX) | TCP connection status |
| Unique metrics | Rule evaluation counts, HTTP header processing | TCP_ELB_Reset_Count, flow metrics |
| Health check metrics | HTTP-based health checks | TCP connection health checks |
| Latency components | Request processing, response processing | Connection establishment time |

This fundamental difference means your monitoring strategy should align with the load balancer type you're using.

What's the difference between load balancer latency and target response time?

Load Balancer Latency = Request Processing Time + Target Response Time + Response Processing Time

  • Request Processing Time: Time from receiving request to sending it to target (includes TLS handshake, rule evaluation)
  • Target Response Time: Time spent waiting for backend target to respond (includes application logic, database queries)
  • Response Processing Time: Time to process and send response back to client

When troubleshooting high latency, these component metrics help pinpoint whether the issue lies in the load balancer configuration, network path, or backend application code.
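Given the three component times (from access logs or CloudWatch), the decomposition is simple arithmetic. A sketch with hypothetical numbers:

```python
def decompose_latency(request_ms, target_ms, response_ms):
    """Break total latency into shares to spot where the time actually goes."""
    total = request_ms + target_ms + response_ms
    return {
        "total_ms": total,
        "target_share": target_ms / total,              # backend/application time
        "lb_share": (request_ms + response_ms) / total,  # load balancer overhead
    }

# Hypothetical request: 5ms in, 180ms in the backend, 3ms out
parts = decompose_latency(5, 180, 3)
```

Here the backend accounts for over 95% of the total, so tuning the load balancer itself would be wasted effort.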

How can I accurately calculate the cost impact of my ELB usage?

To build a comprehensive cost model:

  1. Base cost: Number of load balancers × hourly rate × hours in month
  2. Data transfer: ProcessedBytes metric × appropriate data transfer rate
    • Internet outbound: $0.09-$0.15/GB (varies by region and volume)
    • Inter-AZ: ~$0.01/GB
    • Intra-AZ: Free
  3. LCU costs (for ALBs): Calculate Load Balancer Capacity Units based on:
    • New connections per second (divided by 25)
    • Active connections per minute (divided by 3,000)
    • Processed bytes per hour (divided by 1GB)
    • Rule evaluations per second (divided by 1,000)
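The LCU math above can be sketched directly. Note that ALB bills on the *largest* of the four dimensions, not their sum; the $0.008/LCU-hour price used below is the common us-east-1 rate and should be treated as an assumption:

```python
def alb_lcus(new_conns_per_sec, active_conns_per_min, gb_per_hour, rules_per_sec):
    """ALB charges for the highest-consuming LCU dimension in each hour."""
    return max(
        new_conns_per_sec / 25,
        active_conns_per_min / 3_000,
        gb_per_hour / 1,
        rules_per_sec / 1_000,
    )

def monthly_lcu_cost(lcus, price_per_lcu_hour=0.008, hours=730):
    """Rough monthly LCU bill; price is an assumed us-east-1 rate."""
    return lcus * price_per_lcu_hour * hours

# New connections dominate here: 100/25 = 4 LCUs
lcus = alb_lcus(new_conns_per_sec=100, active_conns_per_min=9_000,
                gb_per_hour=2.2, rules_per_sec=500)
```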

Pro tip: Many organizations discover that data transfer costs often exceed the hourly load balancer costs by 3-5x, especially for content-heavy applications. Implement proper caching strategies and CDN integration to minimize these costs.


Authors

Anjali Udasi

Helping to make tech a little less intimidating. I love breaking down complex concepts into easy-to-understand terms.