Standard Deviation Alerting

Create adaptive alerts that detect anomalies using statistical analysis instead of fixed thresholds.

Standard deviation alerting automatically detects unusual behavior in your services by comparing current metrics against historical patterns. Instead of setting fixed thresholds that may not account for normal traffic variations, these alerts adapt to your service’s baseline behavior and trigger when metrics deviate significantly from the norm.

Access Alerting in your Last9 dashboard to create alerts using the standard deviation macro.

How It Works

The adaptive_std_cmp macro is a built-in Last9 function that returns 1 when your metric is behaving anomalously and 0 when it is not. Use this macro as the query when you create a Static Threshold alert in Alerting.

Implementation Pattern

adaptive_std_cmp(query, std_factor, duration)

Parameters:

  • query: Your base PromQL metric query
  • std_factor: Number of standard deviations from the mean beyond which a value counts as anomalous (typically 2-3)
  • duration: Time window for the statistical calculation, written without quotes (for example, 10m)

Output: Boolean value where 1 = anomaly detected, 0 = normal behavior
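
To make the 0/1 output concrete, here is a minimal Python sketch of the kind of comparison the macro describes. This page does not specify how Last9 computes the result internally, so the function name std_cmp, the use of the population standard deviation, and the two-sided check are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def std_cmp(window, current, std_factor):
    """Hypothetical stand-in for the macro's comparison.

    Returns 1 if `current` lies more than `std_factor` standard deviations
    from the mean of `window` (the samples inside `duration`), else 0.
    """
    if len(window) < 2:
        return 0  # too little history to judge (assumed behavior)
    mu = mean(window)
    sigma = pstdev(window)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return 1 if current != mu else 0
    return 1 if abs(current - mu) > std_factor * sigma else 0

# Illustrative latency samples in milliseconds (made-up values).
history = [100, 102, 98, 101, 99, 100, 103, 97]
print(std_cmp(history, 101, 2))  # 0: within 2 standard deviations
print(std_cmp(history, 130, 2))  # 1: far outside the baseline
```

Because the result is always 0 or 1, it can be fed directly into a threshold-based alert.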

Setting Up Standard Deviation Alerts

  1. Navigate to Alerting and click Create Alert
  2. Select Static Threshold as your alert type
  3. Enter your adaptive_std_cmp query in the query field
  4. Set your threshold to 0.5 (alert when output goes above 0.5, meaning anomaly detected)
  5. Configure sensitivity as the number of bad minutes out of total minutes

Threshold Setting

  • Recommended threshold: 0.5
  • Why: Since the macro outputs 0 (normal) or 1 (anomaly), setting threshold at 0.5 triggers the alert when an anomaly is detected

Sensitivity (Bad out of Total Minutes)

  • For critical services: 1 out of 3 minutes (high sensitivity)
  • For general monitoring: 2 out of 5 minutes (balanced)
  • For noisy metrics: 3 out of 10 minutes (low sensitivity, reduces false positives)

Common Use Cases

Response Time Anomalies

adaptive_std_cmp(trace_service_response_time{service_name="prod-api-service"}, 2, 10m)

Throughput Anomalies

adaptive_std_cmp(sum(trace_endpoint_count{service_name="prod-api-service",span_kind="SPAN_KIND_SERVER"}), 3, 15m)

External Service Performance

adaptive_std_cmp(trace_client_duration{service_name="prod-api-service",net_peer_name="external_host"}, 2.5, 5m)
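
If you generate alert queries from scripts or templates, a small helper can keep the argument order and the unquoted duration consistent. The sketch below is not part of Last9; the function name build_std_alert_query and the set of duration units it accepts (s, m, h, d, w) are assumptions.

```python
import re

# Assumed duration format: an integer followed by a single unit letter.
_DURATION = re.compile(r"^\d+[smhdw]$")

def build_std_alert_query(query, std_factor, duration):
    """Format an adaptive_std_cmp(query, std_factor, duration) expression."""
    if std_factor <= 0:
        raise ValueError("std_factor must be positive")
    if not _DURATION.match(duration):
        raise ValueError(f"unexpected duration: {duration!r}")
    # The duration is emitted without quotes, as the macro expects.
    return f"adaptive_std_cmp({query}, {std_factor}, {duration})"

base = 'trace_service_response_time{service_name="prod-api-service"}'
print(build_std_alert_query(base, 2, "10m"))
```

The printed string matches the Response Time Anomalies query shown above.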

Understanding Your Alerts

When the adaptive_std_cmp query returns 1, it means your metric has deviated beyond the specified number of standard deviations from its historical average over the defined duration. The alert will fire based on your threshold (0.5) and sensitivity settings.

Example Alert Behavior:

  • Query returns 1 (anomaly detected)
  • Threshold 0.5 is exceeded
  • If sensitivity is “2 out of 5 minutes”, the alert fires when at least 2 minutes within any 5-minute window show anomalous behavior
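
The interaction between the 0.5 threshold and the "bad out of total minutes" setting can be sketched in Python as follows. How Last9 evaluates the window internally is not described on this page, so the sliding-window logic and the name alert_fires are assumptions for illustration only.

```python
def alert_fires(samples, threshold=0.5, bad=2, total=5):
    """Return True if any window of up to `total` consecutive minutes
    contains at least `bad` samples above `threshold`.

    `samples` holds one macro output (0 or 1) per minute, oldest first.
    """
    flags = [1 if s > threshold else 0 for s in samples]
    for end in range(1, len(flags) + 1):
        window = flags[max(0, end - total):end]
        if sum(window) >= bad:
            return True
    return False

# Two anomalous minutes inside one 5-minute window: the alert fires.
print(alert_fires([0, 1, 0, 0, 1, 0, 0]))  # True
# Two anomalous minutes, but more than 5 minutes apart: it does not.
print(alert_fires([0, 1, 0, 0, 0, 0, 1]))  # False
```

Raising `total` relative to `bad` makes the alert more tolerant of isolated anomalous minutes.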

Standard Deviation Factor Guidelines

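  • std_factor = 2: About 95% of values in a normally distributed metric fall within 2 standard deviations, so roughly 5% of ordinary points can still be flagged (more sensitive)
  • std_factor = 2.5: Balanced approach for most services
  • std_factor = 3: About 99.7% of values fall within 3 standard deviations, so only extreme deviations are flagged (less sensitive, critical alerts only)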
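
These percentages come from the normal distribution. The short Python sketch below reproduces them for any std_factor, assuming the metric is approximately normally distributed; real latency and throughput metrics are often skewed, so treat the results as rough guides rather than exact false-positive rates.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (2, 2.5, 3):
    inside = fraction_within(k)
    print(f"std_factor={k}: {inside:.2%} within, {1 - inside:.2%} outside")
# std_factor=2: 95.45% within, 4.55% outside
# std_factor=2.5: 98.76% within, 1.24% outside
# std_factor=3: 99.73% within, 0.27% outside
```

The "outside" share is the portion of ordinary samples that would still be flagged under the normality assumption.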

Duration Window Guidelines

  • 5-10 minutes: Fast-changing services, real-time monitoring
  • 15-30 minutes: Standard web services
  • 1+ hours: Batch jobs, daily patterns

Best Practices

  • Start with std_factor=2.5 and duration=15m for most services
  • Use shorter durations for latency-sensitive applications
  • Use longer durations for services with natural daily/weekly patterns
  • Monitor alert frequency during initial setup and adjust sensitivity accordingly
  • Combine with traditional threshold alerts for comprehensive coverage

Standard deviation alerting provides a middle ground between static thresholds and advanced pattern detection algorithms. Use it when you need adaptive behavior but don’t require the specialized pattern matching of high/low spikes, level changes, or trend deviation algorithms.


Troubleshooting

Please get in touch with us on Discord or by email if you have any questions.