Apr 25th, ‘25 / 10 min read

Everything You Need to Know About OpenTelemetry Histograms

OpenTelemetry histograms help you go beyond averages. Learn how they work and why they matter for real-world observability in DevOps.

Modern systems throw off a lot of data—metrics, traces, logs—sometimes more than we know what to do with. When you're trying to understand how values spread out over time (like response times, memory usage, or queue lengths), averages alone don’t tell the full story.

OpenTelemetry histograms help fill in those gaps.

This guide walks through what they are, why they matter, and how DevOps engineers can use them to improve observability in real systems.

What Is an OpenTelemetry Histogram?

An OpenTelemetry histogram is a type of metric that captures the distribution of values rather than just simple counts or gauges. Think of it as a sophisticated bucket system that groups measurements into ranges, allowing you to understand not just averages but the actual spread of your data.

Histograms track how often values fall within specific ranges (buckets), which makes them well suited to measuring request latencies, response sizes, or resource consumption patterns. The OpenTelemetry specification defines the histogram as one of its core instrument types.

💡
If you're setting up histograms in your observability pipeline, this OpenTelemetry Collector guide walks through the setup steps and common gotchas to watch out for.

Key Components of OpenTelemetry Histograms

OpenTelemetry histograms consist of three main elements:

  • Count: The total number of measurements recorded
  • Sum: The sum of all recorded values
  • Buckets: Predefined ranges that track how many values fall within each range

This structure lets you estimate percentiles, compute averages (sum divided by count), and identify outliers, all critical capabilities for understanding system performance. In your code you only record values through the metrics API; the SDK maintains these components for you.
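To make the three components concrete, here is a minimal plain-Python sketch of what an explicit bucket histogram keeps track of. It is an illustration only, not the OpenTelemetry SDK: the `SimpleHistogram` class and the boundary values are hypothetical. It assumes each boundary is an inclusive upper bound, with one extra overflow bucket for values above the last boundary.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class SimpleHistogram:
    """Hypothetical stand-in for an explicit bucket histogram data point."""

    boundaries: list[float]                 # upper bounds, sorted ascending
    count: int = 0                          # total number of measurements
    total: float = 0.0                      # sum of all recorded values
    bucket_counts: list[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        # One bucket per boundary plus an overflow bucket.
        self.bucket_counts = [0] * (len(self.boundaries) + 1)

    def record(self, value: float) -> None:
        self.count += 1
        self.total += value
        # bisect_left places a value equal to a boundary in that boundary's bucket.
        self.bucket_counts[bisect_left(self.boundaries, value)] += 1


hist = SimpleHistogram(boundaries=[100, 300, 1000])
for latency_ms in [42, 87, 100, 250, 1200]:
    hist.record(latency_ms)

print(hist.count, hist.total, hist.bucket_counts)
```

Note that the individual values are not stored. Only the count, the sum, and the per-bucket tallies survive, which is why percentiles derived from a histogram are estimates.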

Understanding Histogram Types

The OpenTelemetry specification supports two primary histogram implementations:

  1. Explicit bucket histograms: The traditional approach, where you define specific bucket boundaries
  2. Exponential histograms: An advanced implementation that automatically adjusts bucket boundaries using an exponential scale

Exponential histograms provide better precision with fewer buckets, making them more storage-efficient while maintaining analytical power.

💡
If you're working with histograms in OpenTelemetry, you might also find this guide on monitoring host metrics using OpenTelemetry helpful. It covers setting up the Collector, configuring receivers, and understanding key system metrics.

Why DevOps Engineers Should Care About Histograms

Averages lie. That's the hard truth every DevOps engineer eventually learns. A system with an average response time of 200ms might still be frustrating users if 5% of requests take over 2 seconds.

Histograms solve this problem by showing you the full picture:

  • They reveal performance outliers hiding behind "good" averages
  • They help set realistic SLOs (Service Level Objectives)
  • They make troubleshooting faster by showing where problems concentrate
  • They provide data for capacity planning based on actual usage patterns

Real-World Example

Imagine you're monitoring API response times. A simple average shows 150ms - seems great! But a histogram reveals that while 90% of responses complete in under 100ms, 5% take over 500ms. That pattern suggests an intermittent problem affecting specific requests - something you'd never catch with basic metrics.

By visualizing this data in your monitoring dashboards, you can immediately spot these anomalies and take action.
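The scenario above can be reproduced with a few lines of Python. The latency values below are made up so that they match the numbers in the example (a 150ms average, 90% of requests under 100ms, 5% over 500ms); they are not measurements from a real system.

```python
from statistics import mean

# Hypothetical latencies in milliseconds for 100 requests.
latencies_ms = [80] * 90 + [200] * 5 + [1360] * 5

average = mean(latencies_ms)
share_under_100 = sum(v < 100 for v in latencies_ms) / len(latencies_ms)
share_over_500 = sum(v > 500 for v in latencies_ms) / len(latencies_ms)

print(f"average: {average} ms")
print(f"under 100 ms: {share_under_100:.0%}, over 500 ms: {share_over_500:.0%}")
```

The average lands at 150ms even though no single request took anywhere near 150ms. Only the distribution shows the slow 5%.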

OpenTelemetry Metrics: Understanding the Data Model

Before diving deeper into histograms, it's helpful to understand how they fit into the broader OpenTelemetry data model.

Core Metric Instruments

OpenTelemetry instruments are the measurement points in your code. The specification defines several instrument types:

  • Counter: Measures a value that only increases (monotonic), like the number of requests
  • UpDownCounter: Measures a value that can increase or decrease, like queue size
  • ObservableGauge: Captures current value snapshots on demand
  • Histogram: Captures distributions of values

Each instrument produces data points that get aggregated into metrics. Understanding which instrument type to use is crucial for effective monitoring.

Synchronous vs. Asynchronous Instruments

OpenTelemetry instruments are categorized as either:

  • Synchronous instruments: These are called directly in your code at the point of measurement (Counter, UpDownCounter, Histogram)
  • Asynchronous instruments: These use a callback mechanism to collect measurements periodically (ObservableCounter, ObservableUpDownCounter, ObservableGauge)

The Histogram is a synchronous instrument: you record each value explicitly at the moment it occurs, and the specification defines no asynchronous (observable) histogram. The asynchronous instruments cover cases where periodically sampling a current value or a running total is more appropriate.

Histograms vs. Other Metric Types

OpenTelemetry supports several metric types. Here's how histograms compare:

Metric Type | Best For | Limitations | Temporality
Counter | Counting discrete events (requests, errors) | No distribution data | Monotonic, cumulative
UpDownCounter | Tracking values that increase and decrease | No distribution data | Non-monotonic, cumulative
Histogram | Distribution of values (request duration, payload size) | Higher storage requirements | Cumulative or delta temporality
ObservableGauge | Current value snapshots (memory usage) | No history or patterns | Latest measurement only

Histograms require more storage than simple metrics but provide far more analytical power, a worthwhile trade-off for critical systems. When analyzing time series data, they show the distribution behind each number rather than a single aggregate.

💡
Check out this guide on OpenTelemetry Logging to see how structured logging can complement your metrics and traces for better observability.

Understanding Temporality in Histograms

Temporality refers to how metric data points relate to previous measurements. OpenTelemetry supports two types of temporality:

  • Cumulative temporality: Each data point represents all measurements since the start
  • Delta temporality: Each data point represents only the measurements since the last collection

Histograms support both temporality models, though different backends may prefer one over the other. Understanding temporality is critical when setting up metric pipelines and interpreting histogram data.
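The relationship between the two models is simple arithmetic, as the sketch below shows. The `cumulative_to_delta` helper and the bucket counts are hypothetical, and the sketch assumes the cumulative series never resets (a real pipeline has to handle process restarts).

```python
def cumulative_to_delta(snapshots: list[list[int]]) -> list[list[int]]:
    """Turn successive cumulative bucket counts into per-interval deltas."""
    deltas = []
    previous = [0] * len(snapshots[0]) if snapshots else []
    for current in snapshots:
        deltas.append([c - p for c, p in zip(current, previous)])
        previous = current
    return deltas


# Hypothetical bucket counts exported at three collection intervals.
cumulative = [[2, 1, 0], [5, 3, 1], [9, 4, 1]]
deltas = cumulative_to_delta(cumulative)
print(deltas)
```

Summing the deltas bucket by bucket gives back the last cumulative snapshot, which is a quick sanity check when a backend converts between the two.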

Setting Up OpenTelemetry Histograms

Getting started with OpenTelemetry histograms involves a few key steps:

  1. Install the OpenTelemetry SDK for your language
  2. Configure a Meter Provider as the central component for creating and managing metrics
  3. Create and use a histogram instrument to record your measurements

Here's a simplified example of creating a histogram:

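from opentelemetry import metrics

# Get a meter from the globally configured MeterProvider
meter = metrics.get_meter(__name__)

# Create a histogram instrument
request_duration = meter.create_histogram(
    name="http.request.duration",
    description="HTTP request duration in milliseconds",
    unit="ms",  # Unit of measure for clarity
)

# Record values to the histogram; duration and request come from your request handler
request_duration.record(duration, {"path": request.path, "method": request.method})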

The metric name and set of attributes together form a unique identifier for your metrics. This combination helps when querying and analyzing data later.

Advanced Histogram Techniques

Once you've mastered the basics, these advanced techniques will help you get even more from OpenTelemetry histograms:

Custom Bucket Boundaries

The default bucket boundaries might not match your needs. For high-precision monitoring, define custom boundaries for your explicit bucket histogram with boundaries that make sense for your application - for web services, focus on user-perceptible thresholds (100ms, 300ms, 1s, etc.).
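A common way the defaults go wrong is a unit mismatch. The plain-Python sketch below (not SDK code) buckets the same latencies, recorded in seconds, against the specification's default boundaries and against custom ones. The custom boundaries, the sample values, and the `bucket_counts` helper are assumptions made for illustration.

```python
from bisect import bisect_left

# Default explicit boundaries from the OpenTelemetry specification.
DEFAULT_BOUNDARIES = [0, 5, 10, 25, 50, 75, 100, 250, 500, 750,
                      1000, 2500, 5000, 7500, 10000]
# Hypothetical custom boundaries for latencies recorded in seconds.
CUSTOM_BOUNDARIES = [0.1, 0.3, 1.0]


def bucket_counts(values, boundaries):
    """Count values per bucket; the last bucket holds values above all boundaries."""
    counts = [0] * (len(boundaries) + 1)
    for value in values:
        counts[bisect_left(boundaries, value)] += 1
    return counts


latencies_s = [0.04, 0.09, 0.25, 0.8, 2.5]
default_counts = bucket_counts(latencies_s, DEFAULT_BOUNDARIES)
custom_counts = bucket_counts(latencies_s, CUSTOM_BOUNDARIES)
print(default_counts)
print(custom_counts)
```

With the defaults, every sample falls into the single bucket between 0 and 5, so the histogram carries no usable distribution. The custom boundaries spread the same samples across four buckets.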

Working with Exponential Histograms

Exponential histograms offer improved precision with fewer buckets, making them ideal for scenarios with wide value ranges. They automatically adjust bucket boundaries using an exponential scale, providing better resolution for both small and large values.
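The exponential scale works as follows: a `scale` parameter fixes a base of 2^(2^-scale), and bucket `i` covers values greater than base^i and up to base^(i+1). The sketch below shows a logarithm-based mapping from a positive value to its bucket index. It is a simplified illustration with hypothetical function names. Real SDKs handle zero, negative values, and floating-point rounding near bucket boundaries more carefully.

```python
import math


def bucket_index(value: float, scale: int) -> int:
    """Index i such that base**i < value <= base**(i + 1), base = 2**(2**-scale)."""
    return math.ceil(math.log2(value) * 2 ** scale) - 1


def bucket_bounds(index: int, scale: int) -> tuple[float, float]:
    """Lower (exclusive) and upper (inclusive) bound of a bucket."""
    base = 2 ** (2 ** -scale)
    return base ** index, base ** (index + 1)


for value in [0.004, 1.0, 100.0, 25000.0]:
    index = bucket_index(value, scale=3)
    lower, upper = bucket_bounds(index, scale=3)
    print(f"{value} -> bucket {index}: ({lower:.4g}, {upper:.4g}]")
```

Because each bucket is a fixed ratio wider than the previous one, the relative error stays roughly constant whether the value is a few milliseconds or many seconds. That constant ratio is what lets one configuration cover a wide range.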

💡
For insights into integrating OpenTelemetry with GraphQL APIs, explore our guide here: How to Use OpenTelemetry with Your GraphQL Stack.

Contextual Dimensions with Labels

Add context to your histogram data with labels (also called tags or attributes):

# Record with multiple dimensions
request_duration.record(
    duration,
    {
        "service.name": "payment-api",
        "http.method": "POST", 
        "endpoint": "/process",
        "customer.tier": "premium"
    }
)

These dimensions transform basic metrics into powerful analytical tools. You can slice and dice data to find patterns - like seeing if premium customers experience different performance than others.

OpenTelemetry's semantic conventions provide guidelines for standardized attribute naming, improving compatibility across different systems.

Using Asynchronous Instruments with Callbacks

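Histograms themselves are synchronous, but callbacks are still useful alongside them. If a system already tracks values internally (the size of a connection pool, for example), an observable instrument can report them at each collection through a callback. When you need the distribution of such sampled values rather than only the latest one, sample them on your own schedule and record each sample to a histogram.

The callback mechanism makes it easy to connect existing monitoring logic with OpenTelemetry instruments.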

Analyzing Histogram Data

Collecting histogram data is just half the battle. Here's how to extract insights from your metric data:

Understanding Raw Histogram Data

At its core, a histogram metric generates these data points:

  • Count: Total observations
  • Sum: Sum of all observations
  • Bucket counts: Number of observations in each bucket
  • Bucket boundaries: The upper bounds of each bucket

These raw components can be transformed into powerful visualizations and analytics.
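As a small illustration, the sketch below derives an average and running (cumulative) bucket totals from one data point. The dictionary layout and its numbers are hypothetical, chosen to mirror the four components listed above rather than any exporter's actual wire format.

```python
from itertools import accumulate

# Hypothetical data point for one collection interval.
point = {
    "count": 100,
    "sum": 15000.0,
    "bucket_counts": [90, 5, 3, 2],      # last entry is the overflow bucket
    "explicit_bounds": [100, 300, 1000],
}

average = point["sum"] / point["count"]
cumulative_counts = list(accumulate(point["bucket_counts"]))

upper_bounds = point["explicit_bounds"] + [float("inf")]
for bound, running in zip(upper_bounds, cumulative_counts):
    print(f"<= {bound}: {running} observations")
print(f"average: {average}")
```

The running totals answer threshold questions directly. Here 95 of 100 observations are at or below 300, which is the kind of figure an SLO check needs.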

Calculating Percentiles

Percentiles tell you the value below which a percentage of observations fall:

  • p50 (median): 50% of requests are faster than this
  • p95: 95% of requests are faster than this
  • p99: 99% of requests are faster than this

Monitoring these values helps spot performance issues before they affect users. A sudden jump in p99 often indicates problems even when averages look fine.
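Since a histogram stores bucket counts rather than raw values, percentiles have to be estimated. One common approach, sketched below, finds the bucket containing the target rank and interpolates linearly inside it. The `estimate_percentile` function is a hypothetical illustration, not an OpenTelemetry API. It assumes values are non-negative and spread evenly within each bucket, and it caps results at the highest boundary when the rank falls in the overflow bucket.

```python
def estimate_percentile(boundaries, bucket_counts, q):
    """Estimate the q-quantile (0 < q <= 1) from explicit bucket counts."""
    total = sum(bucket_counts)
    if total == 0:
        raise ValueError("histogram is empty")
    rank = q * total
    seen = 0
    for i, count in enumerate(bucket_counts):
        if count > 0 and seen + count >= rank:
            if i == len(boundaries):
                # Overflow bucket has no upper bound; report the last boundary.
                return float(boundaries[-1])
            lower = boundaries[i - 1] if i > 0 else 0.0
            upper = boundaries[i]
            return lower + (upper - lower) * (rank - seen) / count
        seen += count
    return float(boundaries[-1])


boundaries = [100, 300, 1000]      # hypothetical boundaries in milliseconds
bucket_counts = [90, 5, 3, 2]      # hypothetical counts, overflow bucket last

p50 = estimate_percentile(boundaries, bucket_counts, 0.50)
p95 = estimate_percentile(boundaries, bucket_counts, 0.95)
p99 = estimate_percentile(boundaries, bucket_counts, 0.99)
print(p50, p95, p99)
```

The accuracy of the estimate depends entirely on bucket width around the percentile you care about, which is one reason to place boundaries near your latency targets.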

Visualizing Histogram Data

Raw numbers rarely tell the full story. Consider these visualization techniques:

  • Heatmaps: Show the distribution changing over time
  • Percentile graphs: Track p50/p95/p99 trends
  • Bucket breakdowns: See which buckets contain the most measurements

Most observability dashboards support these visualizations when connected to OpenTelemetry data.

💡
For a deeper look into setting up custom metrics in OpenTelemetry, check out our guide on Getting Started with OpenTelemetry Custom Metrics.

Common Histogram Use Cases

Histograms shine in these scenarios:

Response Time Monitoring

Track how long your services take to respond. This helps you:

  • Set realistic SLAs based on actual performance
  • Spot degradation before it becomes critical
  • Identify which request types need optimization

Monitoring the number of requests along with their latency distribution gives you a complete picture of system performance.

Resource Consumption Patterns

Monitor how resources like memory, CPU, or connection pools are distributed across your application:

  • Find resource hogs
  • Right-size containers and instances
  • Plan capacity based on actual usage patterns

Batch Processing Performance

For systems that process batches of work:

  • Track processing time per item
  • Identify problematic batch types
  • Optimize scheduling based on performance characteristics
💡
For a detailed walkthrough on setting up metrics aggregation in OpenTelemetry, check out our guide here: OpenTelemetry Metrics Aggregation.

OpenTelemetry Backend Integrations

One of OpenTelemetry's strengths is its open-source nature and compatibility with various backends. The OpenTelemetry Collector serves as a central gateway for processing and forwarding telemetry data to your chosen backend.

Runtime Metadata and Resource Attributes

Adding runtime metadata to your metrics helps with filtering and analysis. Resource attributes become part of your metrics' identity and are additive to any attributes specified when recording values. This metadata helps you filter and group metrics when analyzing them in your observability platform.

Integrating with Observability Platforms

OpenTelemetry histograms work with most modern observability solutions. Here are some popular options:

Last9

If you're looking for an observability solution that’s built for scale and reliability, we’ve got you covered. Our platform has handled high-cardinality observability for some of the biggest live-streaming events out there, so we know what it takes to manage massive volumes of data.

Plus, with Last9 MCP, you can bring your production context directly into your workspace and tackle issues faster using AI and our platform’s capabilities. And since our pricing is consumption-based, you only pay for what you use, making costs both predictable and manageable.

Other Compatible Platforms

  • Prometheus: Native histogram support with OpenTelemetry collector
  • Grafana: Powerful visualization for histogram data and timeseries analytics
  • Jaeger: Trace-centric observability; histogram metrics themselves need a metrics backend alongside it
  • Elastic APM: Full-stack monitoring with OpenTelemetry integration

When selecting a backend, consider how well it handles histogram data specifically, as some platforms provide better tools for percentile calculations and distribution analysis.

Troubleshooting OpenTelemetry Histograms

Even the best tools sometimes need troubleshooting:

Missing or Incomplete Data

If the histogram data isn't appearing:

  • Check that the buckets match your value range
  • Verify exporter configuration and endpoints
  • Ensure proper initialization of the meter provider
  • Confirm that timestamp handling is correct in your configuration

High Cardinality Issues

Too many dimension combinations can cause performance issues:

  • Limit labels to essential dimensions
  • Avoid using high-cardinality values as labels (like user IDs)
  • Consider aggregating some dimensions
💡
To better understand high cardinality and its impact on observability, explore our guide: What Is High Cardinality?.
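A rough back-of-the-envelope calculation shows why cardinality matters more for histograms than for simple metrics. The attribute names below come from the earlier recording example, while the distinct-value counts and the user count are invented. The result is a worst-case upper bound that assumes every combination of values actually occurs.

```python
from math import prod

# Hypothetical number of distinct values per attribute.
distinct_values = {"http.method": 5, "endpoint": 40, "customer.tier": 3}
bucket_count = 15 + 1  # 15 boundaries plus the overflow bucket

attribute_sets = prod(distinct_values.values())
# Each attribute set carries its own buckets plus a count and a sum.
stored_values = attribute_sets * (bucket_count + 2)

# Adding a user ID attribute with 10,000 distinct users multiplies everything.
with_user_id = stored_values * 10_000

print(f"attribute sets: {attribute_sets}")
print(f"stored values per interval: {stored_values}")
print(f"with a user ID attribute: {with_user_id}")
```

Every extra attribute multiplies the total, and each resulting combination carries a full set of buckets. This is why a single unbounded attribute can dominate storage.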

Resource Usage Concerns

Histograms use more resources than simple metrics:

  • Start with fewer buckets and add more as needed
  • Sample extremely high-volume measurements
  • Consider aggregating at the collector level
  • Use exponential histograms for better efficiency

Conclusion

OpenTelemetry histograms give you a much clearer view of your system’s performance by showing the full distribution of measurements, not just averages. This extra detail can make a big difference when you’re looking to really understand and optimize your system.

💡
And if you want to keep the conversation going, our Discord community is here for you. We’ve got a dedicated channel where you can chat with other developers about your specific use case.

FAQs

What is a histogram in OpenTelemetry?

A histogram in OpenTelemetry is a metric type that measures the distribution of values across predefined buckets. It consists of three components: count (total number of measurements), sum (total of all values), and buckets (counters for how many values fall within specific ranges). Histograms are ideal for tracking distributions like request durations, response sizes, and other values where understanding the spread is important.

What are the disadvantages of OpenTelemetry?

While OpenTelemetry offers many benefits, it does have some challenges:

  • Initial setup complexity and learning curve
  • Potential performance impact with excessive instrumentation
  • Storage costs for high-cardinality data or too many histogram buckets
  • The relatively young ecosystem means some features are still maturing
  • Configuration can be complex for advanced use cases

What is the difference between gauge and histogram in Prometheus?

A gauge in Prometheus measures a single value that can go up or down (like current memory usage), while a histogram captures the distribution of values across predefined buckets. Gauges tell you what's happening at a specific moment, whereas histograms help you understand patterns and outliers in your data over time. Histograms enable percentile calculations, which gauges cannot provide.

What are the default buckets for OpenTelemetry histogram?

The OpenTelemetry specification defines default boundaries for explicit bucket histograms: [0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000]. They widen as values grow but do not follow a strict exponential scale, and they are unrelated to the buckets used by exponential histograms. SDKs and instrumentation libraries can override them, so always check the documentation for the one you use.

What are OpenTelemetry metrics?

OpenTelemetry metrics are numerical measurements captured from your applications and infrastructure and exported at regular intervals. The main instrument types are counters (values that only increase), up-down counters (values that go up and down), gauges (current-value snapshots), and histograms (distributions of values). The data model also includes a legacy summary type, kept for compatibility with systems such as Prometheus, in which quantiles are precomputed by the client rather than derived from buckets. Metrics help you monitor system health, performance, and business indicators.

How to start using OpenTelemetry Metrics?

To start using OpenTelemetry metrics:

  1. Install the OpenTelemetry SDK for your language
  2. Configure a MeterProvider and register metric exporters
  3. Create a Meter instance for your service
  4. Define metrics (counters, gauges, histograms) that matter to your application
  5. Record values to these metrics throughout your code
  6. Configure an exporter to send metrics to your observability platform

Is it correct to create a histogram out of custom counter metrics?

No, it's not recommended to build histograms from custom counters. Histograms require bucket definitions at creation time and track the distribution directly. Attempting to create a histogram from counter metrics would lose precision and require additional computation. OpenTelemetry provides native histogram types that handle the distribution calculations efficiently.

How can I find or correctly export resource labels of OpenTelemetry metrics in Prometheus?

To export resource labels from OpenTelemetry to Prometheus:

  1. Ensure your OpenTelemetry collector is configured with the Prometheus exporter
  2. Add resource attributes to your OpenTelemetry SDK configuration
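  3. Enable the resource_to_telemetry_conversion setting on the Collector's Prometheus exporter to copy resource attributes onto each metric as labels; without it, they typically appear on the separate target_info metric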
  4. Check the Prometheus target endpoint (usually /metrics) to verify labels are being exported correctly
  5. In Prometheus queries, these will appear as standard labels you can filter and group by

How to change the default path of the Prometheus exporter in OpenTelemetry Collector?

To change the default path of the Prometheus exporter in the OpenTelemetry Collector, modify your collector configuration YAML to include a custom path parameter under the Prometheus exporter configuration. Restart the collector after making these changes.

Is there any guidance documented for using histogram vs gauge while using OpenTelemetry?

The general guideline is:

  • Use gauges for single values that can increase or decrease (memory usage, CPU utilization, queue length)
  • Use histograms when you need to understand the distribution of values and calculate percentiles (request durations, payload sizes)
  • Use counters for values that only increase (total requests, errors)

Choose histograms when the "shape" of your data matters more than just the current value.

How do you implement a histogram using OpenTelemetry?

To implement a histogram using OpenTelemetry:

  1. Set up a meter provider and get a meter instance
  2. Create a histogram with appropriate name, description, and unit
  3. Optionally define custom bucket boundaries if the defaults don't suit your needs
  4. Record values to the histogram at the appropriate points in your code
  5. Add dimensions using attributes/labels for better analysis

The exact implementation will vary by programming language but follows this general pattern.

How do you implement a histogram with OpenTelemetry for monitoring response times?

For monitoring response times, create a histogram with milliseconds as the unit of measure, then record the duration of each request with relevant attributes like method, route, and status code. This setup will give you detailed insights into your response time distribution across different endpoints and status codes.

Authors
Prathamesh Sonpatki

Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.
