Modern systems throw off a lot of data—metrics, traces, logs—sometimes more than we know what to do with. When you're trying to understand how values spread out over time (like response times, memory usage, or queue lengths), averages alone don’t tell the full story.
OpenTelemetry histograms help fill in those gaps.
This guide walks through what they are, why they matter, and how DevOps engineers can use them to improve observability in real systems.
What Is an OpenTelemetry Histogram?
An OpenTelemetry histogram is a type of metric that captures the distribution of values rather than just simple counts or gauges. Think of it as a sophisticated bucket system that groups measurements into ranges, allowing you to understand not just averages but the actual spread of your data.
Histograms track how often values fall within specific ranges (buckets), making them well suited to measuring things like request latencies, response sizes, or resource consumption patterns. The OpenTelemetry specification defines histograms as one of the core instrument types for tracking distributions of values.
Key Components of OpenTelemetry Histograms
OpenTelemetry histograms consist of three main elements:
- Count: The total number of measurements recorded
- Sum: The sum of all recorded values
- Buckets: Predefined ranges that track how many values fall within each range
This structure lets you estimate percentiles, calculate averages, and identify outliers, all critical capabilities for understanding system performance. The OpenTelemetry SDK maintains these components for you each time you record a value.
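To make the three components concrete, here is a minimal pure-Python sketch of an explicit-bucket histogram. It is not how the SDK is implemented: the `SimpleHistogram` class and the boundaries are hypothetical, and the sketch assumes each boundary is an inclusive upper bound.

```python
from bisect import bisect_left


class SimpleHistogram:
    """Toy explicit-bucket histogram (illustrative only, not the OpenTelemetry SDK)."""

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)
        # One bucket per boundary plus a final overflow bucket.
        self.bucket_counts = [0] * (len(self.boundaries) + 1)
        self.count = 0
        self.sum = 0.0

    def record(self, value):
        # bisect_left treats each boundary as an inclusive upper bound.
        self.bucket_counts[bisect_left(self.boundaries, value)] += 1
        self.count += 1
        self.sum += value

    def mean(self):
        return self.sum / self.count if self.count else 0.0


hist = SimpleHistogram([100, 300, 1000])
for latency_ms in [40, 100, 250, 900, 2500]:
    hist.record(latency_ms)

print(hist.count, hist.sum, hist.bucket_counts, hist.mean())
```

Note that the average comes from sum and count alone, while anything about the shape of the data has to come from the bucket counts.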
Understanding Histogram Types
The OpenTelemetry specification supports two primary histogram implementations:
- Explicit bucket histogram: The traditional approach, where you define specific bucket boundaries
- Exponential histograms: An advanced implementation that automatically adjusts bucket boundaries using an exponential scale
Exponential histograms provide better precision with fewer buckets, making them more storage-efficient while maintaining analytical power.
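The exponential scale can be sketched in a few lines. In the OpenTelemetry data model, a `scale` parameter sets the base to 2^(2^-scale), and bucket `i` covers values greater than base^i and up to base^(i+1). The helper names below are hypothetical, and the floating-point math is a simplification that may be off by one very close to a bucket edge.

```python
import math


def exponential_bucket_index(value, scale):
    """Map a positive value to its bucket index for the given scale."""
    # base = 2 ** (2 ** -scale); bucket i covers (base**i, base**(i + 1)].
    return math.ceil(math.log2(value) * (2 ** scale)) - 1


def bucket_bounds(index, scale):
    """Return the (exclusive lower, inclusive upper) bounds of a bucket."""
    base = 2 ** (2 ** -scale)
    return base ** index, base ** (index + 1)


# At scale 0 the base is 2, so buckets double in width.
idx_coarse = exponential_bucket_index(3, 0)
# At scale 2 the base is 2 ** 0.25, so the same value lands in a narrower bucket.
idx_fine = exponential_bucket_index(3, 2)
print(idx_coarse, bucket_bounds(idx_coarse, 0))
print(idx_fine, bucket_bounds(idx_fine, 2))
```

A higher scale means narrower buckets and better resolution; a lower scale covers the same range with fewer buckets.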
Why DevOps Engineers Should Care About Histograms
Averages lie. That's the hard truth every DevOps engineer eventually learns. A system with an average response time of 200ms might still be frustrating users if 5% of requests take over 2 seconds.
Histograms solve this problem by showing you the full picture:
- They reveal performance outliers hiding behind "good" averages
- They help set realistic SLOs (Service Level Objectives)
- They make troubleshooting faster by showing where problems concentrate
- They provide data for capacity planning based on actual usage patterns
Real-World Example
Imagine you're monitoring API response times. A simple average shows 150ms - seems great! But a histogram reveals that while 90% of responses complete in under 100ms, 5% take over 500ms. That pattern suggests an intermittent problem affecting specific requests - something you'd never catch with basic metrics.
By visualizing this data in your monitoring dashboards, you can immediately spot these anomalies and take action.
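The numbers in this example are easy to reproduce. The sketch below uses synthetic latencies (made-up values chosen to match the scenario, not real measurements) to show how a 150ms average can sit on top of a slow tail.

```python
from statistics import mean

# Synthetic latencies in ms: 90 fast requests, 5 medium, 5 very slow.
latencies = [60] * 90 + [300] * 5 + [1620] * 5

average = mean(latencies)
share_under_100 = sum(v < 100 for v in latencies) / len(latencies)
share_over_500 = sum(v > 500 for v in latencies) / len(latencies)

print(f"average: {average} ms")
print(f"under 100 ms: {share_under_100:.0%}, over 500 ms: {share_over_500:.0%}")
```

The average alone gives no hint that one request in twenty takes more than a second and a half.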
OpenTelemetry Metrics: Understanding the Data Model
Before diving deeper into histograms, it's helpful to understand how they fit into the broader OpenTelemetry data model.
Core Metric Instruments
OpenTelemetry instruments are the measurement points in your code. The specification defines several instrument types:
- Counter: Measures a value that only increases (monotonic), like the number of requests
- UpDownCounter: Measures a value that can increase or decrease, like queue size
- ObservableGauge: Captures current value snapshots on demand
- Histogram: Captures distributions of values
Each instrument produces data points that get aggregated into metrics. Understanding which instrument type to use is crucial for effective monitoring.
Synchronous vs. Asynchronous Instruments
OpenTelemetry instruments are categorized as either:
- Synchronous instruments: These are called directly in your code at the point of measurement (Counter, UpDownCounter, Histogram)
- Asynchronous instruments: These use a callback mechanism to collect measurements periodically (ObservableCounter, ObservableUpDownCounter, ObservableGauge)
Histograms are synchronous instruments, meaning you record values explicitly when they occur. The OpenTelemetry metrics API has no asynchronous histogram; the asynchronous instruments cover counters, up-down counters, and gauges, for cases where periodic sampling is more appropriate.
Histograms vs. Other Metric Types
OpenTelemetry supports several metric types. Here's how histograms compare:
Metric Type | Best For | Limitations | Temporality |
---|---|---|---|
Counter | Counting discrete events (requests, errors) | No distribution data | Monotonic, cumulative |
UpDownCounter | Tracking values that increase and decrease | No distribution data | Non-monotonic, cumulative |
Histogram | Distribution of values (request duration, payload size) | Higher storage requirements | Cumulative or delta temporality |
ObservableGauge | Current value snapshots (memory usage) | No history or patterns | Latest measurement only |
Histograms require more storage than simple metrics, but they let you estimate percentiles and see the shape of a distribution, which counters and gauges cannot do. That is a worthwhile trade-off for critical systems. When analyzing time-series data, histograms give you the most complete picture of how values are spread.
Understanding Temporality in Histograms
Temporality refers to how metric data points relate to previous measurements. OpenTelemetry supports two types of temporality:
- Cumulative temporality: Each data point represents all measurements since the start
- Delta temporality: Each data point represents only the measurements since the last collection
Histograms support both temporality models, though different backends may prefer one over the other. Understanding temporality is critical when setting up metric pipelines and interpreting histogram data.
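The relationship between the two models is simple arithmetic. As a rough sketch, a delta point can be derived by subtracting two consecutive cumulative points, assuming the stream was not reset in between and both points share the same bucket boundaries. The dictionary layout and the numbers are hypothetical.

```python
def cumulative_to_delta(previous, current):
    """Derive a delta data point from two consecutive cumulative points."""
    return {
        "count": current["count"] - previous["count"],
        "sum": current["sum"] - previous["sum"],
        "bucket_counts": [
            c - p for p, c in zip(previous["bucket_counts"], current["bucket_counts"])
        ],
    }


# Two cumulative snapshots of the same stream (hypothetical numbers).
t1 = {"count": 10, "sum": 1200.0, "bucket_counts": [6, 3, 1]}
t2 = {"count": 15, "sum": 2100.0, "bucket_counts": [8, 5, 2]}

delta = cumulative_to_delta(t1, t2)
print(delta)
```

Going the other way (delta to cumulative) means keeping running totals, which is why a pipeline that converts between the two has to hold state.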
Setting Up OpenTelemetry Histograms
Getting started with OpenTelemetry histograms involves a few key steps:
- Install the OpenTelemetry SDK for your language
- Configure a Meter Provider as the central component for creating and managing metrics
- Create and use a histogram instrument to record your measurements
Here's a simplified example of creating a histogram:
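from opentelemetry import metrics

# Get a meter from the globally configured MeterProvider
meter = metrics.get_meter(__name__)

# Create a histogram instrument
request_duration = meter.create_histogram(
    name="http.request.duration",
    description="HTTP request duration in milliseconds",
    unit="ms",  # Unit of measure for clarity
)

# Record values to the histogram; `duration` and `request` come from your request handler
request_duration.record(duration, {"path": request.path, "method": request.method})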
The metric name and set of attributes together form a unique identifier for your metrics. This combination helps when querying and analyzing data later.
Advanced Histogram Techniques
Once you've mastered the basics, these advanced techniques will help you get even more from OpenTelemetry histograms:
Custom Bucket Boundaries
The default bucket boundaries might not match your needs. For an explicit bucket histogram, define custom boundaries that make sense for your application. For web services, focus on user-perceptible thresholds (100ms, 300ms, 1s, etc.).
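To see why boundary choice matters, the sketch below buckets the same latencies with two different sets of boundaries. It is plain Python rather than SDK configuration (in the SDKs, boundaries are usually set through a View), and the `bucket_counts` helper and sample values are hypothetical.

```python
from bisect import bisect_left


def bucket_counts(values, boundaries):
    """Count values per bucket; each boundary is an inclusive upper bound."""
    counts = [0] * (len(boundaries) + 1)
    for value in values:
        counts[bisect_left(boundaries, value)] += 1
    return counts


latencies_ms = [35, 80, 120, 180, 260, 340, 700, 1400]

# Coarse boundaries cannot say how many requests finished within 100 ms.
coarse = bucket_counts(latencies_ms, [250, 1000])
# Boundaries at user-perceptible thresholds can.
fine = bucket_counts(latencies_ms, [100, 300, 1000])

print(coarse)
print(fine)
```

With the coarse set, the first bucket mixes 35ms and 180ms requests; with thresholds at 100ms and 300ms, the same data answers the questions you actually care about.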
Working with Exponential Histograms
Exponential histograms offer improved precision with fewer buckets, making them ideal for scenarios with wide value ranges. They automatically adjust bucket boundaries using an exponential scale, providing better resolution for both small and large values.
Contextual Dimensions with Labels
Add context to your histogram data with labels (also called tags or attributes):
# Record with multiple dimensions
request_duration.record(
    duration,
    {
        "service.name": "payment-api",
        "http.method": "POST",
        "endpoint": "/process",
        "customer.tier": "premium",
    },
)
These dimensions transform basic metrics into powerful analytical tools. You can slice and dice data to find patterns - like seeing if premium customers experience different performance than others.
OpenTelemetry's semantic conventions provide guidelines for standardized attribute naming, improving compatibility across different systems.
Using Asynchronous Instruments with Callbacks
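Histograms themselves are always synchronous, but the other instrument families offer callback-based variants (ObservableCounter, ObservableUpDownCounter, ObservableGauge) that sample values periodically. This approach is useful for integrating with systems that already collect measurements internally, alongside the histograms you record directly.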
The callback mechanism makes it easy to connect existing monitoring logic with OpenTelemetry instruments.
Analyzing Histogram Data
Collecting histogram data is just half the battle. Here's how to extract insights from your metric data:
Understanding Raw Histogram Data
At its core, a histogram metric generates these data points:
- Count: Total observations
- Sum: Sum of all observations
- Bucket counts: Number of observations in each bucket
- Bucket boundaries: The upper bounds of each bucket
These raw components can be transformed into powerful visualizations and analytics.
Calculating Percentiles
Percentiles tell you the value below which a percentage of observations fall:
- p50 (median): 50% of requests are faster than this
- p95: 95% of requests are faster than this
- p99: 99% of requests are faster than this
Monitoring these values helps spot performance issues before they affect users. A sudden jump in p99 often indicates problems even when averages look fine.
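Because a histogram stores bucket counts rather than raw values, percentiles are estimates. One common approach is linear interpolation inside the bucket that contains the target rank. The sketch below is an illustration of that idea, not any particular backend's algorithm; it assumes non-negative values (so the first bucket starts at 0), and the function name and sample data are hypothetical.

```python
def estimate_percentile(q, boundaries, bucket_counts):
    """Estimate the q-th percentile (0 < q <= 100) from explicit-bucket counts."""
    total = sum(bucket_counts)
    if total == 0:
        return None
    rank = q * total / 100
    seen = 0
    for i, count in enumerate(bucket_counts):
        if count and seen + count >= rank:
            if i == len(boundaries):
                # The overflow bucket has no upper bound; report the last boundary.
                return boundaries[-1]
            lower = boundaries[i - 1] if i > 0 else 0.0
            upper = boundaries[i]
            # Assume values are spread evenly inside the bucket.
            return lower + (upper - lower) * (rank - seen) / count
        seen += count
    return boundaries[-1]


boundaries = [100, 300, 1000]
counts = [90, 5, 4, 1]  # last entry is the overflow bucket

p50 = estimate_percentile(50, boundaries, counts)
p95 = estimate_percentile(95, boundaries, counts)
p99 = estimate_percentile(99, boundaries, counts)
print(p50, p95, p99)
```

The estimate can only be as precise as the bucket it falls in, which is one reason boundary placement around your SLO thresholds matters.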
Visualizing Histogram Data
Raw numbers rarely tell the full story. Consider these visualization techniques:
- Heatmaps: Show the distribution changing over time
- Percentile graphs: Track p50/p95/p99 trends
- Bucket breakdowns: See which buckets contain the most measurements
Most observability dashboards support these visualizations when connected to OpenTelemetry data.
Common Histogram Use Cases
Histograms shine in these scenarios:
Response Time Monitoring
Track how long your services take to respond. This helps you:
- Set realistic SLAs based on actual performance
- Spot degradation before it becomes critical
- Identify which request types need optimization
Monitoring the number of requests along with their latency distribution gives you a complete picture of system performance.
Resource Consumption Patterns
Monitor how resources like memory, CPU, or connection pools are distributed across your application:
- Find resource hogs
- Right-size containers and instances
- Plan capacity based on actual usage patterns
Batch Processing Performance
For systems that process batches of work:
- Track processing time per item
- Identify problematic batch types
- Optimize scheduling based on performance characteristics
OpenTelemetry Backend Integrations
One of OpenTelemetry's strengths is its open-source nature and compatibility with various backends. The OpenTelemetry Collector serves as a central gateway for processing and forwarding telemetry data to your chosen backend.
Runtime Metadata and Resource Attributes
Adding runtime metadata to your metrics helps with filtering and analysis. Resource attributes become part of your metrics' identity and are additive to any attributes specified when recording values. This metadata helps you filter and group metrics when analyzing them in your observability platform.
Integrating with Observability Platforms
OpenTelemetry histograms work with most modern observability solutions. Here are some popular options:
Last9
If you're looking for an observability solution that’s built for scale and reliability, we’ve got you covered. Our platform has handled high-cardinality observability for some of the biggest live-streaming events out there, so we know what it takes to manage massive volumes of data.
Plus, with Last9 MCP, you can bring your production context directly into your workspace and tackle issues faster using AI and our platform’s capabilities. And since our pricing is consumption-based, you only pay for what you use, making costs both predictable and manageable.
Other Compatible Platforms
- Prometheus: Native histogram support with the OpenTelemetry Collector
- Grafana: Powerful visualization for histogram data and time-series analytics
- Jaeger: Trace-centric observability; it stores traces rather than metrics, so pair it with a metrics backend for histogram data
- Elastic APM: Full-stack monitoring with OpenTelemetry integration
When selecting a backend, consider how well it handles histogram data specifically, as some platforms provide better tools for percentile calculations and distribution analysis.
Troubleshooting OpenTelemetry Histograms
Even the best tools sometimes need troubleshooting:
Missing or Incomplete Data
If the histogram data isn't appearing:
- Check that the buckets match your value range
- Verify exporter configuration and endpoints
- Ensure proper initialization of the meter provider
- Confirm that timestamp handling is correct in your configuration
High Cardinality Issues
Too many dimension combinations can cause performance issues:
- Limit labels to essential dimensions
- Avoid using high-cardinality values as labels (like user IDs)
- Consider aggregating some dimensions
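A quick back-of-the-envelope calculation shows how fast this grows. The sketch below multiplies the number of distinct values per attribute and assumes a backend that stores one series per bucket plus one each for sum and count; the function name, attribute names, and cardinalities are all hypothetical.

```python
from math import prod


def estimated_series(attribute_cardinalities, bucket_count):
    """Rough upper bound on the time series one histogram can produce."""
    combinations = prod(attribute_cardinalities.values())
    # Assumes one series per bucket, plus one for sum and one for count.
    return combinations * (bucket_count + 2)


modest = estimated_series({"method": 5, "route": 20, "status": 6}, bucket_count=10)
with_user_id = estimated_series(
    {"method": 5, "route": 20, "status": 6, "user_id": 10_000}, bucket_count=10
)

print(modest)
print(with_user_id)
```

Adding a single user ID attribute turns a few thousand series into tens of millions, which is why it belongs in logs or traces rather than metric labels.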
Resource Usage Concerns
Histograms use more resources than simple metrics:
- Start with fewer buckets and add more as needed
- Sample extremely high-volume measurements
- Consider aggregating at the collector level
- Use exponential histograms for better efficiency
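Aggregating histograms works because points with identical bucket boundaries can be merged by simple addition. The sketch below shows only that arithmetic, not collector configuration; the dictionary layout and numbers are hypothetical.

```python
def merge_histograms(a, b):
    """Combine two histogram points that share the same bucket boundaries."""
    if a["boundaries"] != b["boundaries"]:
        raise ValueError("bucket boundaries must match to merge")
    return {
        "boundaries": a["boundaries"],
        "count": a["count"] + b["count"],
        "sum": a["sum"] + b["sum"],
        "bucket_counts": [
            x + y for x, y in zip(a["bucket_counts"], b["bucket_counts"])
        ],
    }


# Two series that differ only by an attribute you decided to drop.
pod_a = {"boundaries": [100, 300], "count": 3, "sum": 450.0, "bucket_counts": [2, 1, 0]}
pod_b = {"boundaries": [100, 300], "count": 2, "sum": 700.0, "bucket_counts": [0, 1, 1]}

merged = merge_histograms(pod_a, pod_b)
print(merged)
```

The merged point still supports percentile estimates, which is something you cannot get by averaging per-series percentiles after the fact.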
Conclusion
OpenTelemetry histograms give you a much clearer view of your system’s performance by showing the full distribution of measurements, not just averages. This extra detail can make a big difference when you’re looking to really understand and optimize your system.
FAQs
What is a histogram in OpenTelemetry?
A histogram in OpenTelemetry is a metric type that measures the distribution of values across predefined buckets. It consists of three components: count (total number of measurements), sum (total of all values), and buckets (counters for how many values fall within specific ranges). Histograms are ideal for tracking distributions like request durations, response sizes, and other values where understanding the spread is important.
What are the disadvantages of OpenTelemetry?
While OpenTelemetry offers many benefits, it does have some challenges:
- Initial setup complexity and learning curve
- Potential performance impact with excessive instrumentation
- Storage costs for high-cardinality data or too many histogram buckets
- The relatively young ecosystem means some features are still maturing
- Configuration can be complex for advanced use cases
What is the difference between gauge and histogram in Prometheus?
A gauge in Prometheus measures a single value that can go up or down (like current memory usage), while a histogram captures the distribution of values across predefined buckets. Gauges tell you what's happening at a specific moment, whereas histograms help you understand patterns and outliers in your data over time. Histograms enable percentile calculations, which gauges cannot provide.
What are the default buckets for OpenTelemetry histogram?
The OpenTelemetry SDK specification defines default boundaries for explicit bucket histograms: [0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000]. The steps grow roughly geometrically, but they are a fixed list rather than a true exponential scale. Individual SDKs and instrumentation libraries can override them, so you should always check your specific SDK documentation.
What are OpenTelemetry metrics?
OpenTelemetry metrics are numerical measurements collected from your applications and infrastructure. The main types are counters (values that only increase), up-down counters and gauges (values that go up and down), and histograms (distribution of values). The data model also includes a summary type, which carries quantiles pre-calculated on the client; it exists mainly for compatibility with other formats and is not recommended for new instrumentation. Metrics help you monitor system health, performance, and business indicators.
How to start using OpenTelemetry Metrics?
To start using OpenTelemetry metrics:
- Install the OpenTelemetry SDK for your language
- Configure a MeterProvider and register metric exporters
- Create a Meter instance for your service
- Define metrics (counters, gauges, histograms) that matter to your application
- Record values to these metrics throughout your code
- Configure an exporter to send metrics to your observability platform
Is it correct to create a histogram out of custom counter metrics?
No, it's not recommended to build histograms from custom counters. Histograms require bucket definitions at creation time and track the distribution directly. Attempting to create a histogram from counter metrics would lose precision and require additional computation. OpenTelemetry provides native histogram types that handle the distribution calculations efficiently.
How can I find or correctly export resource labels of OpenTelemetry metrics in Prometheus?
To export resource labels from OpenTelemetry to Prometheus:
- Ensure your OpenTelemetry collector is configured with the Prometheus exporter
- Add resource attributes to your OpenTelemetry SDK configuration
- Use the resource.attributes configuration in the collector to control which attributes are exported
- Check the Prometheus target endpoint (usually /metrics) to verify labels are being exported correctly
- In Prometheus queries, these will appear as standard labels you can filter and group by
How to change the default path of the Prometheus exporter in OpenTelemetry Collector?
To change the default path of the Prometheus exporter in the OpenTelemetry Collector, modify your collector configuration YAML to include a custom path parameter under the Prometheus exporter configuration. Restart the collector after making these changes.
Is there any guidance documented for using histogram vs gauge while using OpenTelemetry?
The general guideline is:
- Use gauges for single values that can increase or decrease (memory usage, CPU utilization, queue length)
- Use histograms when you need to understand the distribution of values and calculate percentiles (request durations, payload sizes)
- Use counters for values that only increase (total requests, errors)
Choose histograms when the "shape" of your data matters more than just the current value.
How do you implement a histogram using OpenTelemetry?
To implement a histogram using OpenTelemetry:
- Set up a meter provider and get a meter instance
- Create a histogram with appropriate name, description, and unit
- Optionally define custom bucket boundaries if the defaults don't suit your needs
- Record values to the histogram at the appropriate points in your code
- Add dimensions using attributes/labels for better analysis
The exact implementation will vary by programming language but follows this general pattern.
How do you implement a histogram with OpenTelemetry for monitoring response times?
For monitoring response times, create a histogram with milliseconds as the unit of measure, then record the duration of each request with relevant attributes like method, route, and status code. This setup will give you detailed insights into your response time distribution across different endpoints and status codes.
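One way to wire this up is a small timing helper that measures a block of code and hands the elapsed milliseconds to a recording function. The sketch below uses a stand-in recorder so it runs without the SDK; `timed` and `fake_record` are hypothetical names, and the attributes are fixed up front for simplicity (in a real handler the status code is only known after the request completes).

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(record, attributes):
    """Measure the wrapped block and pass elapsed milliseconds to `record`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        record(elapsed_ms, attributes)


# Stand-in for a histogram's record method; it just collects the calls.
recorded = []


def fake_record(value, attributes):
    recorded.append((value, attributes))


with timed(fake_record, {"method": "GET", "route": "/health", "status_code": 200}):
    time.sleep(0.01)  # placeholder for real request handling

print(recorded)
```

With the SDK in place, you would pass your histogram's `record` method instead of `fake_record`. Recording in a `finally` block means failed requests are timed too, which keeps slow errors visible in the distribution.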