Staring at a monitoring dashboard and still feeling like you're missing half the picture? Happens more often than you'd think. Especially when you're dealing with metrics like request durations or payload sizes—data that doesn’t behave nicely or fit into neat little averages.
This is where Prometheus' histogram buckets step in. They're not just another metric type; they're a better way to track the messy, uneven world of performance data.
In this guide, we’ll walk through how histogram buckets work, how to configure them properly, and how to squeeze real, useful insights out of them.
Understanding Prometheus Histogram Buckets
Histogram buckets are Prometheus' way of tracking the distribution of values across predefined ranges. Unlike simple counters or gauges, histograms tell you how many observations fall within specific value ranges.
Here's the deal: When you measure things like HTTP request duration, a single average (or even a median) won't show you the full picture. Some requests might be lightning fast while others crawl—and knowing this distribution is crucial for spotting issues.
Each bucket counts observations less than or equal to a specific upper bound. So if you have buckets at 0.1, 0.5, and 1.0 seconds, they'll tell you how many requests finished within those timeframes.
Technically speaking, a Prometheus histogram consists of three components:
- A counter for each bucket (_bucket{le="<upper bound>"}) - counts all values less than or equal to that upper bound
- A sum of all observed values (_sum) - tracks the total sum of all observations
- A count of all observations (_count) - tracks the total number of data points
When you create a histogram metric like http_request_duration_seconds, the client library automatically creates all three components, giving you the raw data needed for sophisticated analysis.
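Scraping such a histogram produces series along these lines (the numbers here are purely illustrative):
http_request_duration_seconds_bucket{le="0.1"} 24054
http_request_duration_seconds_bucket{le="0.5"} 33444
http_request_duration_seconds_bucket{le="1"} 34001
http_request_duration_seconds_bucket{le="+Inf"} 34230
http_request_duration_seconds_sum 8953.332
http_request_duration_seconds_count 34230
Note that buckets are cumulative: the le="0.5" series includes everything already counted under le="0.1", and the +Inf bucket always equals _count.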
Why Histogram Buckets Matter for Production Monitoring and SRE Work
Why should you care about these buckets? Because they unlock insights that simpler metrics hide:
- They reveal performance outliers that averages mask: A service with an average response time of 300ms might have 1% of requests taking 5+ seconds
- They help you set realistic SLOs by showing actual service behavior: Set meaningful SLOs at the 95th or 99th percentile instead of averages
- They make it possible to calculate percentiles (like p95 or p99) on the fly: Calculate any percentile without pre-defining it at collection time
- They allow you to spot gradual performance shifts: Detect when your p95 starts creeping up while your average stays stable
- They enable better capacity planning: Understand how your system behaves under different load conditions
- They provide deeper insights into user experience: Correlate actual user experience with specific latency bands
Without histograms, you might see your average response time looking healthy at 200ms while completely missing that 5% of requests that take over 2 seconds—exactly the kind of thing that makes users bounce.
Consider this scenario: An e-commerce site sees an average page load time of 1.2 seconds, which seems acceptable. However, histogram data reveals that during peak hours, 10% of checkout page loads take 4+ seconds, directly correlating with cart abandonment. Without histogram buckets, this critical insight would remain hidden.
Once the raw bucket data is in Prometheus, you can analyze it with PromQL functions like rate(), histogram_quantile(), and friends.
Setting Up Your First Histogram in Prometheus
Getting started with histograms is straightforward. Here's how to implement one in your code using the Prometheus client library (shown in Go, but similar principles apply to other languages):
// Import the Prometheus client
import (
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
"time"
)
// Create a histogram with custom buckets
responseTimeHistogram := promauto.NewHistogram(prometheus.HistogramOpts{
Name: "http_request_duration_seconds",
Help: "HTTP request duration in seconds",
Buckets: []float64{0.1, 0.3, 0.5, 0.7, 1.0, 2.0, 5.0, 10.0},
})
// Use it in your code
func handleRequest() {
start := time.Now()
// ... handle the request ...
duration := time.Since(start).Seconds()
responseTimeHistogram.Observe(duration)
}
This code creates a histogram that tracks how many requests take less than 0.1s, 0.3s, and so on. Each time you call Observe(), the observation is counted toward every bucket whose upper bound is at least the observed value, and the _sum and _count series are updated as well.
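If you'd rather not track start times by hand, the Go client also provides a small timer helper; here's a minimal sketch using the histogram defined above:
func handleRequest() {
    // ObserveDuration records the elapsed time into the histogram when the function returns
    timer := prometheus.NewTimer(responseTimeHistogram)
    defer timer.ObserveDuration()
    // ... handle the request ...
}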
You can also use the default buckets provided by Prometheus if you're just starting:
// Using default buckets (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10)
responseTimeHistogram := promauto.NewHistogram(prometheus.HistogramOpts{
Name: "http_request_duration_seconds",
Help: "HTTP request duration in seconds",
})
For Python users, the implementation is similarly straightforward:
from prometheus_client import Histogram
import time
REQUEST_TIME = Histogram('http_request_duration_seconds',
'HTTP request duration in seconds',
buckets=[0.1, 0.3, 0.5, 0.7, 1.0, 2.0, 5.0, 10.0])
def process_request():
start = time.time()
# ... handle the request ...
duration = time.time() - start
REQUEST_TIME.observe(duration)
Choosing the Right Bucket Boundaries
Here's where many teams trip up: picking bucket boundaries that make sense for your data. You need buckets that provide useful information without creating unnecessary cardinality.
Strategic Bucket Selection Process
- Start with your service level objectives (SLOs)
- If your SLO is "99% of requests under 300ms," include buckets at 200ms, 300ms, and 400ms
- If you have multiple SLOs (e.g., 95% under 200ms, 99% under 500ms), include boundaries for each
- Consider user experience thresholds
- Research shows users perceive latency differently at specific thresholds:
- Under 100ms feels instantaneous
- 100-300ms feels quick but noticeable
- 300-1000ms creates friction
- Over 1000ms (1s) feels broken
- Include bucket boundaries at these perceptual thresholds
- Use logarithmic or exponential scales
- Linear spacing wastes resources and provides poor resolution
- For latency, a common approach is powers of 2 (0.0625, 0.125, 0.25, 0.5, 1, 2, 4, 8, 16)
- Or powers of 10 with intermediate steps (0.001, 0.01, 0.05, 0.1, 0.5, 1, 5, 10)
- Focus resolution where you need it most
- Place more buckets around your SLO thresholds
- For a 300ms SLO, consider extra buckets at 250ms, 275ms, 300ms, 325ms, 350ms
For a typical web service, you might want something like:
[]float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10}
This gives you good visibility into both very fast responses (5-50ms) and slower outliers (1-10s).
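Rather than typing these slices out by hand, you can also generate them with the bucket helpers that ship with the Go client library; the specific parameters below are just examples:
import "github.com/prometheus/client_golang/prometheus"

// 8 buckets starting at 5ms, each double the previous: 0.005, 0.01, ..., 0.64
latencyBuckets := prometheus.ExponentialBuckets(0.005, 2, 8)

// 5 evenly spaced buckets: 0.25, 0.30, 0.35, 0.40, 0.45 - handy for extra
// resolution right around an SLO threshold
sloBandBuckets := prometheus.LinearBuckets(0.25, 0.05, 5)

// prometheus.DefBuckets holds the default layout listed earlier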
Common Histogram Bucket Mistakes and Pitfalls to Avoid
Let's talk about what not to do with your histogram buckets:
1. Creating Too Many Buckets: The Resource Trap
Adding tons of buckets seems tempting—more data is better, right? Not when each bucket:
- Increases TSDB storage requirements
- Slows down query performance
- Adds networking and CPU overhead
- Potentially increases your monitoring costs
Real-world impact: A team with 100 services added 50 buckets per histogram and found their Prometheus storage grew by 300% while query latency doubled. They reduced to 12 carefully chosen buckets and maintained the same insight quality.
Recommendation: Stick to 10-15 buckets for most use cases. If you need more resolution in specific areas, consider using multiple histograms with different bucket layouts.
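If you do need finer resolution somewhere, one pattern (sketched below with made-up metric names) is to pair a cheap, coarse histogram recorded everywhere with a finer one scoped to a single hot path:
// Coarse histogram recorded for every request - few buckets, cheap to store
requestLatencyCoarse := promauto.NewHistogram(prometheus.HistogramOpts{
    Name:    "http_request_duration_seconds",
    Help:    "HTTP request duration in seconds",
    Buckets: []float64{0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10},
})

// Fine-grained histogram only around the SLO band, for the one endpoint you care about
checkoutLatencyFine := promauto.NewHistogram(prometheus.HistogramOpts{
    Name:    "checkout_request_duration_seconds",
    Help:    "Checkout request duration in seconds (high resolution near the SLO)",
    Buckets: []float64{0.2, 0.25, 0.275, 0.3, 0.325, 0.35, 0.4},
})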
2. Using Linear Bucket Spacing: The Efficiency Killer
Using evenly spaced buckets (like 1s, 2s, 3s, 4s...) is one of the most common mistakes. Why it fails:
- Most performance data follows exponential or power-law distributions
- Linear spacing wastes resolution on unlikely values
- You get poor resolution where you need it most (often in the lower ranges)
Example: A team using linear buckets [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] had great visibility into the 100ms-1s range but couldn't differentiate between 10ms and 90ms responses—which represented 80% of their traffic!
Better approach: Use exponential spacing like [0.001, 0.01, 0.1, 0.5, 1, 5, 10] or domain-specific clustering.
3. Not Covering the Full Range: The Blind Spot Problem
If your largest bucket is 5 seconds but you sometimes get 10-second requests, you'll miss important outliers. This creates dangerous blind spots in your monitoring.
Consequences:
- Inability to detect truly pathological cases
- "Surprise" timeouts that seem to come from nowhere
- Difficulty troubleshooting extreme outliers
Solution: Always include a bucket that exceeds your system timeout or worst expected case by at least 2×.
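For example, assuming a 30-second upstream timeout, a layout like the following keeps the pathological tail visible (the exact numbers are illustrative; the client library's implicit +Inf bucket still catches anything beyond the last bound):
// Largest explicit bucket sits at 2x the assumed 30s timeout, so timeouts and
// near-timeouts stay distinguishable from merely slow requests
buckets := []float64{0.1, 0.5, 1, 2.5, 5, 10, 30, 60}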
4. Focusing Only on Happy Path: The False Security Issue
Don't just create buckets for your expected performance range. Include buckets for seriously degraded performance to catch issues early.
Warning sign: If most of your buckets are below your SLO threshold, you're monitoring for confirmation, not detection.
Better practice: Allocate at least half your buckets to above-SLO ranges to catch degradation early.
5. High-Cardinality Label Combinations: The Explosion Problem
Adding high-cardinality labels to histograms multiplies storage requirements dramatically. Each label value creates a complete set of buckets.
Example to avoid:
httpRequestDuration.WithLabelValues(userID, endpoint, browser, region).Observe(duration)
Better approach:
// Only use labels that have manageable cardinality
httpRequestDuration.WithLabelValues(endpoint, region).Observe(duration)
Advanced Histogram Techniques
Let's explore some advanced histogram techniques that separate monitoring novices from experts.
Programmatic Dynamic Bucket Calculation
For systems where the performance profile might change or you need fine-tuned buckets, consider generating them programmatically:
func generateLogarithmicBuckets(min, max float64, count int) []float64 {
buckets := make([]float64, count)
logMin := math.Log(min)
logMax := math.Log(max)
for i := 0; i < count; i++ {
factor := float64(i) / float64(count-1)
buckets[i] = math.Exp(logMin + factor*(logMax-logMin))
}
return buckets
}
// Generate 10 buckets between 1ms and 10s
myBuckets := generateLogarithmicBuckets(0.001, 10, 10)
This creates logarithmically distributed buckets that adapt to your specific range needs.
You can also create buckets clustered around key thresholds:
func generateClusteredBuckets(targetValue, spread float64, count int) []float64 {
buckets := make([]float64, count)
// Create more buckets around the target value with specified spread
for i := 0; i < count; i++ {
// Sigmoid-like distribution centered on targetValue
position := float64(i)/float64(count-1)*2.0 - 1.0 // -1 to 1
buckets[i] = targetValue + spread*math.Tanh(position*2)
}
sort.Float64s(buckets) // Ensure sorted buckets
return buckets
}
// Generate buckets clustered around 0.3s (your SLO) with 0.2s spread
sloBuckets := generateClusteredBuckets(0.3, 0.2, 8)
Advanced Histogram Quantile Calculations
Prometheus lets you calculate quantiles from histograms using the histogram_quantile function:
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
This PromQL query gives you the 95th percentile response time over the last 5 minutes.
For more advanced analysis, you can:
- Compare percentiles across different time windows:
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) /
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[1h])) by (le))
- Track percentile shifts over time with recording rules:
groups:
- name: LatencyPercentiles
rules:
- record: service:request_duration:p95_5m
expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
- Calculate percentile ratios to detect distribution skew:
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) /
histogram_quantile(0.50, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
A sudden increase in this ratio can indicate that outliers are becoming more extreme while median performance stays stable.
Note that these calculated quantiles are approximations based on your bucket boundaries. The more buckets you have around the percentile you're calculating, the more accurate it will be.
Histogram Aggregation and Multi-Window Analysis
Histograms excel at aggregation across instances. This powerful technique lets you calculate global percentiles across your entire fleet:
# Calculate p99 across all frontend instances
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
You can also create multi-window analyses to detect trend changes:
# Calculate how much p95 latency has changed in the last hour vs last day
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[1h])) by (le)) /
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[1d])) by (le))
Values significantly above 1.0 indicate recent performance degradation.
Histogram vs. Summary: When to Choose the Right Distribution Metric Type
Prometheus offers two ways to track distributions: histograms and summaries. Understanding their differences is crucial for selecting the right approach for your monitoring needs:
Feature | Histogram | Summary |
---|---|---|
Server-side aggregation | Yes - can aggregate across instances | No - percentiles can't be meaningfully aggregated |
Client-side percentiles | No - calculated at query time | Yes - pre-calculated at collection time |
Calculation flexibility | High - any percentile at query time | Low - only pre-defined percentiles available |
Accuracy | Depends on bucket layout | Higher - exact values within time window |
Resource usage | Lower CPU but higher storage | Higher CPU but lower storage |
Memory pressure | Lower on clients | Higher on clients |
Query performance | Can be slower for percentile calculation | Faster for pre-defined percentiles |
Ecosystem integration | Better supported in alerting/dashboards | More limited query options |
When to Choose Histograms
Use histograms when:
- You need to aggregate data across multiple instances (e.g., global p99 across all API servers)
- You want the flexibility to calculate different percentiles without reconfiguring metrics
- Your percentile needs might change over time
- You're operating at scale and client-side CPU usage is a concern
- You want to create heat maps or distribution visualizations
- You're OK with approximations based on bucket boundaries
When to Choose Summaries
Use summaries when:
- You need highly accurate percentiles
- You don't need to aggregate across instances
- The exact percentiles are known in advance and won't change
- You're willing to trade higher client CPU for better accuracy
- You have a small number of instances
- Query performance is critical
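To make the trade-off concrete, here's a minimal Go sketch using the same client library as the earlier examples (the metric names and quantile objectives are arbitrary):
// Histogram: bucket layout is fixed at creation, percentiles come later from PromQL
latencyHistogram := promauto.NewHistogram(prometheus.HistogramOpts{
    Name:    "rpc_duration_seconds_histogram",
    Help:    "RPC latency tracked as a histogram",
    Buckets: prometheus.DefBuckets,
})

// Summary: quantiles are fixed at creation and computed client-side;
// the map is quantile -> allowed absolute error
latencySummary := promauto.NewSummary(prometheus.SummaryOpts{
    Name:       "rpc_duration_seconds_summary",
    Help:       "RPC latency tracked as a summary",
    Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
})

// Both are fed observations the same way
latencyHistogram.Observe(0.042)
latencySummary.Observe(0.042)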
Histogram Bucket Configurations for Common Services
Let's examine battle-tested histogram bucket configurations for different types of services, based on real production systems:
RESTful API Service Bucket Configuration
// RESTful API with 200ms target response time
apiLatencyBuckets := []float64{
0.005, // 5ms - ultra fast responses
0.025, // 25ms - very fast responses
0.050, // 50ms - fast responses
0.100, // 100ms - good responses
0.150, // 150ms - approaching target
0.200, // 200ms - target response time
0.300, // 300ms - slightly slow
0.500, // 500ms - noticeably slow
1.000, // 1s - user frustration threshold
2.500, // 2.5s - serious issue
5.000, // 5s - critical slowness
10.000, // 10s - approaching timeout
}
Why this works: Notice the increased resolution around the 200ms target, with enough higher buckets to catch degradation and enough lower buckets to track improvements.
Database Query Performance Monitoring
// Database query times with wide performance range
dbQueryBuckets := []float64{
0.001, // 1ms - cache hits/simple lookups
0.005, // 5ms - very fast queries
0.010, // 10ms - index lookups
0.025, // 25ms - good performance
0.050, // 50ms - acceptable
0.100, // 100ms - getting slow
0.250, // 250ms - slow queries
0.500, // 500ms - very slow
1.000, // 1s - problematic
2.500, // 2.5s - seriously problematic
5.000, // 5s - approaching timeout
10.000, // 10s - likely timeout
30.000, // 30s - long-running analytical queries
60.000, // 60s - very long running queries
}
Why this works: Databases exhibit extremely wide performance ranges—from sub-millisecond cache hits to multi-second analytical queries. This configuration provides visibility across that entire spectrum.
Background Job Processing System
// Background job processing times (in seconds)
jobProcessingBuckets := []float64{
1, // 1s - trivial jobs
5, // 5s - very quick jobs
15, // 15s - quick jobs
30, // 30s - typical jobs
60, // 1m - standard batch jobs
180, // 3m - longer batch jobs
300, // 5m - substantial jobs
600, // 10m - large batch jobs
1200, // 20m - very large jobs
1800, // 30m - approaching timeout
3600, // 1h - maximum expected runtime
7200, // 2h - abnormal runtime
}
Why this works: Background processing jobs often run for minutes rather than milliseconds. This configuration scales appropriately for tracking long-running processes.
Microservice Event Processing Pipelines
// Event processing pipeline latency
eventPipelineBuckets := []float64{
0.010, // 10ms - minimal processing
0.050, // 50ms - fast processing
0.100, // 100ms - normal processing
0.250, // 250ms - moderately complex processing
0.500, // 500ms - complex processing
1.000, // 1s - very complex processing
2.500, // 2.5s - includes external calls
5.000, // 5s - multiple external dependencies
10.000, // 10s - approaching timeout
15.000, // 15s - maximum expected runtime
}
Why this works: Event processing often involves both internal computation and external service calls, requiring buckets that span both fast and slow scenarios.
File Upload/Download Operations
// File transfer operations
fileTransferBuckets := []float64{
0.100, // 100ms - tiny files
0.500, // 500ms - small files
1.000, // 1s - moderate files
2.500, // 2.5s - larger files
5.000, // 5s - large files
10.000, // 10s - very large files
30.000, // 30s - huge files
60.000, // 1m - approaching timeout for standard files
120.000, // 2m - large media files
300.000, // 5m - maximum expected time
}
Why this works: File operations have a direct correlation between file size and processing time, requiring a wide range of buckets to accommodate different file sizes.
Implementing Effective Alerting Based on Histogram Data
One of the most powerful uses of histograms is building alerts on percentiles rather than averages. Here's how to implement a comprehensive alerting strategy based on histogram data:
Creating Multi-level Percentile Alerts
groups:
- name: LatencyAlerts
rules:
# Warning level - P95 latency exceeding SLO
- alert: HighP95Latency
expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)) > 0.5
for: 10m
labels:
severity: warning
annotations:
summary: "High latency on {{ $labels.service }}"
description: "P95 latency is above 500ms for {{ $labels.service }} for 10 minutes"
dashboard: "https://grafana.example.com/d/latency/service-latency?var-service={{ $labels.service }}"
runbook: "https://wiki.example.com/sre/runbooks/high-latency"
# Critical level - P99 latency exceeding SLO
- alert: CriticalP99Latency
expr: histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)) > 1.0
for: 5m
labels:
severity: critical
annotations:
summary: "Critical latency spike on {{ $labels.service }}"
description: "P99 latency is above 1s for {{ $labels.service }} for 5 minutes"
dashboard: "https://grafana.example.com/d/latency/service-latency?var-service={{ $labels.service }}"
runbook: "https://wiki.example.com/sre/runbooks/critical-latency"
This creates a two-tier alerting system that warns on moderate issues and escalates for critical ones.
Detecting Distribution Skew
Sometimes the problem isn't just high percentiles but a change in the distribution shape:
- alert: LatencyDistributionSkew
expr: |
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)) /
histogram_quantile(0.50, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)) > 10
for: 15m
labels:
severity: warning
annotations:
summary: "Latency distribution skew detected for {{ $labels.service }}"
description: "P99/P50 ratio exceeds 10× for {{ $labels.service }}, indicating outliers are growing while median remains stable"
This alert detects when your outliers get much worse while your median stays stable—often an early warning of developing problems.
Alerting on SLO Burn Rate
For more sophisticated alerting, you can create an SLO burn rate alert based on histogram data:
# First, define a recording rule for requests exceeding SLO
- record: service:requests_exceeding_slo:ratio_5m
expr: |
(
  sum(rate(http_request_duration_seconds_count[5m])) by (service)
  -
  sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m])) by (service)
)
/
sum(rate(http_request_duration_seconds_count[5m])) by (service)
# Then alert on burn rate
- alert: SLOBurnRateTooHigh
expr: service:requests_exceeding_slo:ratio_5m > 4 * 0.05 # 4× the allowed error budget
for: 15m
labels:
severity: warning
annotations:
summary: "SLO burn rate too high for {{ $labels.service }}"
description: "Error budget is being consumed too quickly for {{ $labels.service }}"
This alert triggers when your service is consuming an error budget 4 times faster than it should—providing early warning before you risk breaking your quarterly SLO.
Implementing Multi-window Alerting
For extra protection against false positives and negatives, implement multi-window alerting:
- alert: SustainedLatencyIncrease
expr: |
(histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)) > 0.5)
and
(histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[1h])) by (le, service)) > 0.5)
for: 5m
labels:
severity: warning
annotations:
summary: "Sustained latency increase for {{ $labels.service }}"
description: "P95 latency is above 500ms in both 5m and 1h windows for {{ $labels.service }}"
This approach reduces alert noise by requiring confirmation across different time windows.
Histogram Bucket Performance Optimization Techniques
For high-traffic services, histogram buckets can generate significant storage needs and performance overhead.
Here are advanced optimization techniques to keep your Prometheus deployment efficient:
1. Strategic Label Usage to Control Cardinality
Labels multiply the number of time series dramatically. Each unique combination of label values creates a complete set of histogram buckets:
// BAD: High cardinality - creates unique buckets per endpoint AND method
httpRequestDuration.WithLabelValues(endpoint, method, statusCode).Observe(duration)
// BETTER: Group by meaningful dimensions only
httpRequestDuration.WithLabelValues(endpoint, statusCode).Observe(duration)
Impact calculation: With 10 buckets, 100 endpoints, 4 methods, and 5 status codes:
- Bad approach: 10 × 100 × 4 × 5 = 20,000 time series
- Better approach: 10 × 100 × 5 = 5,000 time series (75% reduction)
Consider creating separate histograms for different dimensions rather than using labels when appropriate.
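As a sketch of that last suggestion (metric names are illustrative), splitting one heavily labeled histogram into purpose-specific ones keeps cardinality bounded while preserving the views you actually query:
// Per-endpoint latency, with a bounded set of endpoint values
endpointLatency := promauto.NewHistogramVec(prometheus.HistogramOpts{
    Name:    "http_request_duration_seconds",
    Help:    "Request duration by endpoint",
    Buckets: prometheus.DefBuckets,
}, []string{"endpoint"})

// Error latency only, labeled by status class rather than exact status code
errorLatency := promauto.NewHistogramVec(prometheus.HistogramOpts{
    Name:    "http_error_duration_seconds",
    Help:    "Duration of requests that ended in an error",
    Buckets: prometheus.DefBuckets,
}, []string{"status_class"}) // e.g. "4xx", "5xx"

// Each histogram answers one question, with its own small label set
endpointLatency.WithLabelValues("/checkout").Observe(duration)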
2. Implementing Client-side Aggregation
For services with many instances, perform client-side aggregation:
// Use PushGateway for batch processing jobs
func submitHistogramOnCompletion() {
registry := prometheus.NewRegistry()
registry.MustRegister(jobDurationHistogram)
pusher := push.New("pushgateway:9091", "batch_job").
Gatherer(registry)
// Push metrics once at the end of the job
if err := pusher.Push(); err != nil {
log.Errorf("Could not push to Pushgateway: %v", err)
}
}
Or keep the normal pull model locally and forward everything to a central Prometheus using agent mode:
# prometheus.yml for an instance started with the --enable-feature=agent flag,
# which forwards scraped samples via remote_write instead of storing them long-term
global:
  scrape_interval: 15s

remote_write:
  - url: "https://prometheus-central:9090/api/v1/write"
    name: central_prometheus
3. Bucket Selection Optimization
Remove unnecessary buckets that don't provide valuable insights:
// Original buckets
originalBuckets := []float64{0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10}
// Optimized buckets that still cover key thresholds but with fewer points
optimizedBuckets := []float64{0.01, 0.05, 0.1, 0.5, 1, 5, 10}
Storage impact: Going from 14 buckets to 7 can reduce TSDB storage requirements by 50% for that histogram.
4. Sampling and Filtering Techniques
For ultra-high-volume metrics, consider implementing sampling:
func shouldSample() bool {
return rand.Float64() < 0.1 // 10% sampling rate
}
func handleRequest() {
// Always measure the duration
start := time.Now()
// ... handle the request ...
duration := time.Since(start).Seconds()
// But only record to histogram for a percentage of requests
if shouldSample() {
requestDurationHistogram.Observe(duration)
}
}
This works well for services with thousands of requests per second where you don't need to measure every request.
5. Time Series Retention and Downsampling
Configure appropriate retention periods based on access patterns:
# Retention is set with a command-line flag rather than in prometheus.yml:
#   prometheus --storage.tsdb.retention.time=15d   # keep raw histogram data for 15 days
# Then use recording rules for downsampled long-term storage
- record: job:request_duration:histogram_p95_1h
  expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[1h])) by (le, job))
This pattern keeps full-resolution histogram data for recent analysis while preserving key metrics for longer-term trending.
Troubleshooting Common Prometheus Histogram Implementation Issues
Even experienced teams encounter issues with Prometheus histograms. Here's a detailed troubleshooting guide for common problems:
Diagnosing Inaccurate Percentile Calculations
When your histogram_quantile calculations produce unexpected or seemingly wrong results:
Problem: Percentiles jumping erratically between queries
- Root cause: Insufficient data in the time window or poor bucket selection
Solution: Increase the time window in the rate() function:
# More stable with a longer window
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[10m])) by (le))
Problem: Percentiles always land exactly on bucket boundaries
- Root cause: Linear interpolation assumes even distribution within buckets
Solution: Add more buckets around critical percentiles:
// Add fine-grained buckets around p95 target (300ms)
[]float64{0.25, 0.27, 0.29, 0.3, 0.31, 0.33, 0.35}
Problem: Percentiles reporting lower than minimum observed values
- Root cause: Often occurs with low traffic and rate() calculations
Solution: Use increase() instead of rate() for low-volume services:
histogram_quantile(0.95, sum(increase(http_request_duration_seconds_bucket[10m])) by (le))
Resolving High Cardinality Explosions
When histograms cause excessive storage or memory usage:
Problem: Prometheus crashes or slows dramatically after adding histograms
- Root cause: Too many label combinations multiplying bucket cardinality
- Solution: Implement one or more of these fixes:
- Reduce label dimensions on high-cardinality histograms
- Increase Prometheus storage allocation
- Shard your Prometheus instances by metric type
Check: Run this query to identify the worst offenders:
topk(10, count by (__name__, job) ({__name__=~".+_bucket"}))
Problem: Queries on histogram data become extremely slow
- Root cause: Too many histogram buckets across too many services
Solution: Create recording rules for common percentile calculations:
- record: job:http_request_duration:p95_5m
  expr: histogram_quantile(0.95, sum by (le, job) (rate(http_request_duration_seconds_bucket[5m])))
Fixing Missing or Incomplete Histogram Data
When your histograms aren't capturing all the data you expect:
Problem: Some requests don't appear in any bucket
- Root cause: Metrics being dropped or relabeled in the collection pipeline (client libraries always add an implicit +Inf bucket, so values above your largest explicit bound are still counted)
- Solution: Check relabeling rules and scrape configuration, and make sure every instance exposes the same histogram
Diagnosis: Check that the _count series matches the +Inf bucket:
sum(http_request_duration_seconds_count) - sum(http_request_duration_seconds_bucket{le="+Inf"})
(Should be zero; if not, there's a problem)
Problem: Histogram data disappears after service restarts
- Root cause: Counter reset behavior with incorrect query formulation
Solution: Use increase() or rate() instead of raw counters:
# Handles counter resets properly
sum(increase(http_request_duration_seconds_bucket[5m])) by (le)
Problem: Inconsistent histogram data across service instances
- Root cause: Different bucket configurations between instances
- Solution: Standardize histogram bucket definitions in a shared configuration or library
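One lightweight way to do that in Go is a tiny internal package that every service imports for its bucket layouts; the package and variable names below are hypothetical:
// Package metricsdefs is a hypothetical internal library shared by all services.
package metricsdefs

// LatencyBuckets is the single bucket layout every HTTP histogram should use,
// so dashboards and recording rules behave identically across services.
var LatencyBuckets = []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10}

// Services then create histograms with:
//   promauto.NewHistogram(prometheus.HistogramOpts{
//       Name:    "http_request_duration_seconds",
//       Help:    "HTTP request duration in seconds",
//       Buckets: metricsdefs.LatencyBuckets,
//   })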
Resolving Resource Consumption Issues
When histograms consume excessive resources:
Problem: Prometheus storage growing too quickly
- Root cause: Too many histograms with too many buckets
Solution: Implement a histogram bucket reduction strategy:
// Before: 14 buckets
[]float64{0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20}

// After: 7 strategically chosen buckets
[]float64{0.001, 0.01, 0.1, 0.5, 1, 10, 20}
Problem: Histogram instrumentation adds too much overhead
- Root cause: High-frequency observations in critical paths
Solution: Implement adaptive sampling:
sampleRate := 0.01 // Sample 1% by default
if duration > 1.0 { // But sample 100% of slow requests
    sampleRate = 1.0
}
if rand.Float64() <= sampleRate {
    requestDurationHistogram.Observe(duration)
}
Integrating with Last9 for Advanced Histogram Analytics and Visualization
Prometheus histograms offer insights, but Last9 takes them further with enhanced visualization, correlation, and management features.
Key Benefits:
- Visualization: Heat maps, percentile comparisons, and anomaly detection.
- SLO Management: Track error budgets and predict violations.
- Correlation: Link latency with infrastructure metrics and deployments.
- High-cardinality Management: Optimize bucketing and reduce inefficiencies.
Cost-effective:
Event-based pricing keeps costs predictable, even with high traffic or complex architectures.
Talk to us if you'd like to know more, or if you want to explore on your own, get started for free!
FAQs
What exactly is the difference between histogram and summary metrics in Prometheus?
Histograms and summaries track distribution data differently:
Histograms:
- Store observations in configurable buckets (counters of values ≤ each threshold)
- Calculate percentiles at query time using histogram_quantile()
- Allow aggregation across multiple instances (crucial for distributed systems)
- Provide flexibility to calculate any percentile without pre-configuration
- Take less client-side CPU but more storage space
- Work well with Prometheus recording rules and alerting
Summaries:
- Pre-calculate percentiles in the client application
- Store specific quantiles (e.g., 0.5, 0.9, 0.99) directly
- Provide more accurate percentiles within single instances
- Cannot be meaningfully aggregated across instances
- Use more client-side resources but less storage
- Have fixed percentiles that can't be changed after collection
Choose histograms when you need cross-instance aggregation or flexible percentile selection. Choose summaries when you need exact percentiles on single instances.
How do I determine the optimal number of buckets for a Prometheus histogram?
The optimal bucket count balances accuracy against resource usage:
- General guideline: 10-15 buckets work well for most applications
- Minimum effective number: At least 7 buckets (to cover 2-3 orders of magnitude)
- Resource-constrained systems: Stick to 7-10 strategically placed buckets
- High-precision requirements: Up to 20-25 buckets, focusing resolution where needed
Focus bucket density around:
- Your SLO thresholds (e.g., more buckets around your p95 target)
- User experience breakpoints (e.g., 100ms, 300ms, 1s)
- Expected operational ranges for your specific service
Remember that each bucket creates a separate time series, so costs grow linearly with bucket count.
When changing histogram bucket definitions, what happens to historical data?
When you modify histogram bucket definitions:
- New time series creation: Prometheus creates entirely new time series for the new buckets
- Historical data limitation: Historical data won't be retroactively available in the new buckets
- Dual maintenance period: You'll need to maintain both old and new histograms during transition
- Recording rule approach: For critical metrics, create recording rules with the old buckets before changing
Best practices for bucket changes:
- Plan bucket layouts carefully before going to production
- When changing is necessary, keep the old metric name for twice your retention period
- Use a new metric name for the new bucket layout (e.g., http_request_duration_seconds_v2); see the dual-write sketch below
- Create a recording rule that combines old and new data during the transition
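Here's a rough sketch of the dual-maintenance period in Go, assuming a migration to a _v2 metric name (names and bucket values are illustrative):
// Old layout - keep emitting it until dashboards and alerts have migrated
requestDurationV1 := promauto.NewHistogram(prometheus.HistogramOpts{
    Name:    "http_request_duration_seconds",
    Help:    "HTTP request duration in seconds (legacy buckets)",
    Buckets: []float64{0.1, 0.5, 1, 5, 10},
})

// New layout under a new name
requestDurationV2 := promauto.NewHistogram(prometheus.HistogramOpts{
    Name:    "http_request_duration_seconds_v2",
    Help:    "HTTP request duration in seconds (revised buckets)",
    Buckets: []float64{0.05, 0.1, 0.2, 0.3, 0.5, 1, 2.5, 5, 10},
})

func observeRequestDuration(d float64) {
    // Observe into both histograms during the transition window
    requestDurationV1.Observe(d)
    requestDurationV2.Observe(d)
}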
What's the most effective way to calculate accurate percentiles from Prometheus histogram buckets?
For accurate percentile calculations:
# Basic p99 calculation
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
# For more stability in low-traffic services
histogram_quantile(0.99, sum(increase(http_request_duration_seconds_bucket[10m])) by (le))
# Aggregating across job instances while preserving endpoint dimension
histogram_quantile(0.95, sum by (le, endpoint) (rate(http_request_duration_seconds_bucket[5m])))
To improve accuracy:
- Use more buckets around the percentile you're calculating
- Use longer time windows for stability (5-10m instead of 1m)
- For critical percentiles, create recording rules to ensure calculation consistency
Remember that percentile accuracy is always limited by your bucket layout—more buckets around key percentiles yield better accuracy.
How can high cardinality with Prometheus histograms be effectively managed?
High cardinality management strategies:
- Label discipline:
- Limit high-cardinality labels (like user_id, request_id) from histograms
- Use no more than 2-3 label dimensions per histogram
- Move high-cardinality dimensions to separate metrics when needed
- Bucket optimization:
- Use only necessary buckets (8-12 is often sufficient)
- Standardize bucket layouts across services
- Remove buckets that don't provide actionable insights
- Architecture approaches:
- Implement client-side aggregation for high-volume services
- Use federation or hierarchical Prometheus for large-scale deployments
- Create recording rules for commonly queried percentiles
- Sampling techniques:
- Implement probabilistic sampling for ultra-high-volume services
- Use higher sampling rates for outliers and errors
- Consider exemplar-based approaches for detailed analysis
How can I improve the accuracy of percentiles calculated from histogram buckets?
Percentile accuracy depends on your bucket configuration:
- Add targeted bucket density:
- Place more buckets around critical percentiles (e.g., your p95 or p99 target)
- Example: For a p95 target of 300ms, add buckets at 250ms, 275ms, 300ms, 325ms, 350ms
- Use logarithmic distribution:
- Linear buckets create poor resolution; use exponential/logarithmic spacing
- Evenly distribute bucket density in log-space, not linear space
- Incorporate historical performance:
- Analyze several weeks of data to identify your actual distribution
- Place buckets based on observed percentiles, not theoretical ones
- Evaluate specific service patterns:
- Services with bimodal distributions need buckets covering both modes
- Cache-heavy services need extra resolution in lower latency ranges
The error of a calculated percentile is bounded by the width of the bucket it falls into, roughly ±(upper_bound - lower_bound)/2 in practice; for example, if your p95 lands between the 250ms and 500ms boundaries, the estimate can be off by around 125ms.
Beyond request timing, what other metrics benefit from histogram bucket analysis?
Histograms are valuable for many distributions beyond request duration:
- Resource utilization: Memory usage, CPU utilization, disk IOPS
- Queue metrics: Queue depth, time in queue, batch sizes
- Network performance: Packet sizes, network latency, throughput
- Database metrics: Query execution time, connection pool usage, row counts
- Cache performance: Cache hit ratios, time-to-cache, object sizes
- User behavior: Session duration, items per cart, clicks per session
- Message processing: Message size, processing latency, retry counts
- Batch job metrics: Records processed per second, job duration, error rates
- API response sizes: Payload sizes for requests and responses
- Thread pool metrics: Thread usage, task execution time, queue wait time
The pattern applies whenever you need to understand a distribution rather than just averages or totals.
How do I implement histogram bucket monitoring for non-time measurements like request sizes?
For non-time measurements:
- Adjust bucket scales to match data characteristics:
- Memory usage: Consider MB-scale buckets like [128, 256, 512, 1024, 2048, 4096]
- Queue depth: Use application-appropriate buckets like [1, 5, 10, 50, 100, 500]
- Message counts: Linear buckets might work better, e.g., [10, 20, 50, 100, 250, 500]
Create SLOs on size distributions when appropriate:
# Alert when p95 message size exceeds 500KB
histogram_quantile(0.95, sum(rate(message_size_bytes_bucket[5m])) by (le, topic)) > 512000
Observe distribution patterns:
// Track API response sizes
sizeBytes := float64(len(responseData))
responseSizeHistogram.Observe(sizeBytes)
Choose appropriate units and scale:
// For request sizes in bytes, using powers-of-10 scale
requestSizeHistogram := prometheus.NewHistogram(prometheus.HistogramOpts{
Name: "http_request_size_bytes",
Help: "HTTP request size in bytes",
Buckets: []float64{10, 100, 1000, 10000, 100000, 1000000, 10000000},
})