What are OpenTelemetry Metrics? A Comprehensive Guide

OpenTelemetry metrics help you keep track of how your app is doing—things like request counts, memory usage, or how long something took to run. It’s a simple idea: record some numbers, look at them later, spot problems before they turn into outages.

OpenTelemetry (or just OTel) is an open-source project from the CNCF that brings some order to the chaos of observability. It gives you a standard way to collect metrics, logs, and traces across services.

In this post, we’re focusing on metrics—what they are, how they work in OTel, and what you need to know to actually use them in your setup.

What Are OpenTelemetry Metrics?

Metrics are numbers you track over time, things like how many requests your app handled, how long a DB call took, or how much memory your service is using.

In the OpenTelemetry world, metrics are one of the three core signals, along with logs and traces. If traces tell you what happened and logs tell you why, then metrics tell you how often and how bad.

They’re great for building dashboards, setting alerts, and keeping an eye on long-term trends. And since they’re lightweight, they’re usually the cheapest signal to collect at scale.

Types of OpenTelemetry Metric Instruments (And When to Use Each)

OpenTelemetry metrics are built around instruments—specific tools for recording values that matter to your system. Each instrument is suited to a particular kind of measurement.

Let's understand them:

Counter

Use this when you're counting things that only ever go up. Simple, reliable, and great for totals.

Example: number of HTTP requests, errors, messages processed, user signups—anything where you’re just tracking how many times something happened.

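from opentelemetry import metrics

meter = metrics.get_meter(__name__)  # uses the globally configured MeterProvider
request_counter = meter.create_counter(
    "requests", description="Number of requests"
)
request_counter.add(1)  # one more request handled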

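Counters can't decrease. That's by design: because the value only ever goes up, a backend can safely compute rates from it and treat any drop as a restart rather than as real data. If the number you're tracking can go down, use an UpDownCounter instead.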

UpDownCounter

As the name suggests, this one can increment and decrement. Perfect when the number you're tracking can both rise and fall.

Think: active connections, items in a queue, current memory in use, or threads running.

active_connections = meter.create_up_down_counter(
    "active_connections", description="Number of active connections"
)
active_connections.add(1)   # New connection opens
active_connections.add(-1)  # Connection closes

This is useful when you want to understand the current value of something that changes over time, not just how many times it happened.
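
One thing that's easy to get wrong with an UpDownCounter is forgetting the decrement when something fails partway through. The sketch below pairs the two calls in a context manager so the count always comes back down. `FakeUpDownCounter` and `track_in_flight` are hypothetical names; the fake only mimics the `add()` call so the example runs without the SDK, and in a real service you would pass in the instrument returned by `meter.create_up_down_counter(...)`.

```python
from contextlib import contextmanager


class FakeUpDownCounter:
    """Stand-in that mimics only the add() call of an UpDownCounter."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount


@contextmanager
def track_in_flight(counter):
    """Increment on entry and always decrement on exit, even on errors."""
    counter.add(1)
    try:
        yield
    finally:
        counter.add(-1)


active_connections = FakeUpDownCounter()

with track_in_flight(active_connections):
    inside = active_connections.value  # 1 while the connection is open

try:
    with track_in_flight(active_connections):
        raise ConnectionError("client went away")
except ConnectionError:
    pass

after = active_connections.value  # back to 0 despite the exception
```

Without the `finally`, every failed connection would leave the gauge one too high, and the drift would only show up much later as a number that never returns to zero.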

Histogram

Histograms are ideal when you want to capture not just single values, but the distribution of those values over time.

This is the go-to instrument for things like request durations, payload sizes, or DB query times.

response_size_histogram = meter.create_histogram(
    "response_size", description="Size of HTTP responses"
)
response_size_histogram.record(256)  # Record a 256-byte response

Instead of just tracking the latest value or a sum, histograms sort each recorded value into a bucket. From those bucket counts you can derive averages, estimate percentiles (p50, p90, p99), and spot outliers. This is what makes them great for performance monitoring.
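
To make the bucket idea concrete, here's a rough plain-Python sketch of how values land in buckets and how a percentile can be estimated from the counts alone. It's an illustration, not the SDK's implementation: the `BOUNDS` values and the function names are made up for this example, and real backends usually interpolate within a bucket rather than returning its upper bound.

```python
from bisect import bisect_left

# Upper bounds of each bucket; values above the last bound go to an overflow bucket.
BOUNDS = [5, 10, 25, 50, 100, 250, 500, 1000]


def bucket_counts(values, bounds=BOUNDS):
    """Count how many values fall into each (lower, upper] bucket."""
    counts = [0] * (len(bounds) + 1)
    for v in values:
        counts[bisect_left(bounds, v)] += 1
    return counts


def estimate_percentile(counts, q, bounds=BOUNDS):
    """Return the upper bound of the bucket holding the q-th percentile."""
    total = sum(counts)
    if total == 0:
        return None  # nothing recorded yet
    target = q / 100 * total
    seen = 0
    for i, c in enumerate(counts):
        seen += c
        if seen >= target:
            return bounds[i] if i < len(bounds) else float("inf")
    return float("inf")


durations_ms = [3, 7, 8, 12, 20, 22, 40, 45, 90, 480]
counts = bucket_counts(durations_ms)
p50 = estimate_percentile(counts, 50)
p99 = estimate_percentile(counts, 99)
```

Here the estimated p50 is 25 ms while the true median of the raw values is 21 ms. The estimate can only be as precise as the bucket boundaries, which is why it pays to pick boundaries that match the range you care about.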

Asynchronous Instruments

Some metrics aren’t updated by your application logic—they're observed periodically. That’s where asynchronous instruments come in.

Let’s say you want to track CPU usage. That’s something you query from the system at regular intervals, not something you "add to" like a counter.

OpenTelemetry handles this with observable instruments and callbacks:

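from opentelemetry.metrics import CallbackOptions, Observation

def get_cpu_usage(options: CallbackOptions):
    # Replace cpu_usage with your logic to read the current CPU usage
    yield Observation(cpu_usage)

meter.create_observable_gauge(
    "cpu_usage",
    callbacks=[get_cpu_usage],
    description="Current CPU usage"
)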

Each time the SDK collects metrics (on the reader's export interval, or when a pull-based exporter is scraped), it runs your callback and records what it reports.

There are three async instruments (ObservableCounter, ObservableUpDownCounter, and ObservableGauge), and they're mostly used for system-level metrics or values pulled from outside your app code.

How Aggregation Works in OpenTelemetry Metrics

Recording metrics is just the first step. What happens next is aggregation—how those values are grouped, summarized, or broken down so you can use them in charts, alerts, or reports.

Each instrument has a default aggregation, like summing up all counter values or bucketing histogram values by range. These defaults are sensible and usually good enough, but you can override them in the SDK with Views, for example to change histogram bucket boundaries or drop attributes you don't need.

For example, you can configure metrics to give you:

Total number of requests (sum)
Requests per second (a rate, usually derived from the sum by your backend)
Average duration of requests (mean from histogram)
Distribution of response sizes (histogram buckets)

This flexibility makes OpenTelemetry metrics a powerful signal, not just for quick alerts, but for building meaningful dashboards and long-term trend analysis.
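
As a rough illustration of how a rate falls out of a sum, the sketch below turns two samples of a cumulative counter into requests per second. The function name and the sample values are made up for this example. The reset handling is a simplified version of what backends such as Prometheus do, not a copy of any backend's algorithm.

```python
def per_second_rate(prev, curr):
    """Rate between two (timestamp_seconds, cumulative_count) samples."""
    (t0, v0), (t1, v1) = prev, curr
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    # A counter only goes up, so a drop means the process restarted at zero.
    increase = v1 - v0 if v1 >= v0 else v1
    return increase / elapsed


samples = [(0, 100), (60, 400), (120, 30)]  # restart before the last sample
rates = [per_second_rate(a, b) for a, b in zip(samples, samples[1:])]
```

The first interval works out to 5 requests per second. In the second, the count dropped from 400 to 30, so the sketch assumes a restart and counts the 30 requests seen since then, giving 0.5 per second instead of a negative rate.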

Exporting Metrics with OpenTelemetry

After you collect metrics, you’ll need to send them somewhere useful—whether that’s a dashboard, an alerting system, or long-term storage.

OpenTelemetry supports a few popular protocols and formats:

OTLP (OpenTelemetry Protocol) – the native, vendor-neutral option
Prometheus – ideal if you’re using a pull-based metrics setup
StatsD – lightweight and still used in some setups; the Collector's contrib distribution can ingest it through a StatsD receiver

Most teams use the OpenTelemetry Collector to handle this part. It sits between your app and your backend(s), receiving metrics (usually via OTLP) and exporting them where they need to go.

Here’s a simple example of a Collector config that receives OTLP metrics and exposes them to Prometheus:

receivers:
  otlp:
    protocols:
      grpc:

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]

This setup gives you flexibility—you can add more exporters later or switch destinations without touching your application code.

Push vs Pull Exporters: How Metrics Get From Your App to Your Tools

When exporting metrics, two main methods dominate: push and pull. Understanding these helps you pick the right setup.

Pull Exporters

In this model, your monitoring system (like Prometheus) periodically scrapes metrics from your app or collector. Your app exposes metrics at an HTTP endpoint, and Prometheus “pulls” data on its schedule.

Ideal for: systems where you want centralized control over data collection frequency. Prometheus is the classic example. It’s simple and scales well for many targets.

Push Exporters

Here, your app or collector pushes metrics directly to the backend (like a SaaS monitoring service or custom storage). Your code actively sends data whenever it’s ready, usually using protocols like OTLP or StatsD.

Ideal for: setups where your backend can't reach your app to scrape it, or where you want to reduce complexity on the backend side. Push models often suit cloud-native environments or systems behind firewalls.

Many setups combine both: your app pushes to an OpenTelemetry Collector, which then exposes a pull endpoint or pushes metrics further downstream. This hybrid flexibility lets you optimize for scale, reliability, and security.
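
To show what "pull" means on the wire, here's a hand-rolled sketch of an endpoint that renders counter totals in the Prometheus text exposition format and waits to be scraped. In practice the OpenTelemetry Prometheus exporter or the Collector does this for you. `REQUEST_TOTALS`, the metric name, the label names, and port 8000 are all assumptions made for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process totals, keyed by (method, status) label values.
REQUEST_TOTALS = {("GET", "200"): 1027, ("POST", "500"): 3}


def render_prometheus_text(totals):
    """Render counter totals in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Number of HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, status), value in sorted(totals.items()):
        lines.append(
            f'http_requests_total{{method="{method}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current totals whenever a scraper asks for /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_prometheus_text(REQUEST_TOTALS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block and wait to be scraped; call this from your own entry point."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()


text = render_prometheus_text(REQUEST_TOTALS)
```

Notice that the app never decides when data leaves: it only answers when asked. A push exporter inverts that, with the app (or Collector) sending batches on its own timer.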

How Labels Add Meaning — And Why Too Many Can Hurt

Labels (OpenTelemetry calls them attributes; other tools say dimensions or tags) give your metrics some much-needed context. Instead of just counting requests, you can break them down by things like service name, endpoint, user region, or error type. This makes your dashboards and alerts far more useful because you can slice the data along any of those dimensions.

But here’s the catch: cardinality.

What’s cardinality?
It’s how many unique label combinations your metrics end up creating. High cardinality means a huge number of unique label sets.

Why should you care?
Each unique combination turns into a separate time series in your monitoring backend. Too many of these can overload your storage, slow down queries, or even cause data loss.

Some quick tips:

Use labels sparingly. Stick to the important, low-cardinality ones.
Avoid labels like user IDs or session tokens — those explode cardinality fast.
If you need fine-grained data, aggregate it inside your app instead of exporting every little detail raw.

In short, smart label use keeps your metrics meaningful and your system running smoothly.
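
A quick back-of-the-envelope calculation shows why one careless label matters so much. The sketch below computes the worst-case number of time series as the product of distinct values per label. The label names and values are invented for the example, and real-world counts are often lower because not every combination actually occurs.

```python
from math import prod


def max_series(label_values):
    """Upper bound on time series: the product of distinct values per label."""
    return prod(len(values) for values in label_values.values())


labels = {
    "service": ["checkout", "search", "auth"],
    "endpoint": ["/cart", "/pay", "/login", "/query"],
    "status_class": ["2xx", "4xx", "5xx"],
}
baseline = max_series(labels)  # 3 * 4 * 3

# Adding a per-user label multiplies every existing combination.
labels_with_user = dict(labels, user_id=[f"user-{n}" for n in range(10_000)])
with_user = max_series(labels_with_user)
```

Three low-cardinality labels cap out at 36 series. Adding a `user_id` label with 10,000 users raises the ceiling to 360,000 for that single metric.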

Semantic Conventions: A Common Language for Your Metrics

OpenTelemetry comes with semantic conventions — basically, agreed-upon names and labels for common metrics and attributes. It’s like everyone using the same dictionary, so when different apps or teams talk metrics, they actually understand each other.

Why bother? Because sticking to these conventions means your metrics will:

Be easier to read and compare across different services.
Play nicely with tools and dashboards you’re already using.
Stay consistent as your observability setup grows bigger and more complex.
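
If you already have metrics with home-grown attribute names, a small rename step is one way to move toward the conventions. The sketch below is only an illustration: the keys on the left are hypothetical ad-hoc names, the ones on the right are attribute names from the HTTP semantic conventions as I understand them, and it's worth checking the official list since conventions evolve.

```python
# Left side: hypothetical ad-hoc keys a team might have used.
# Right side: attribute names from the HTTP semantic conventions.
ATTRIBUTE_RENAMES = {
    "method": "http.request.method",
    "status": "http.response.status_code",
    "path_template": "http.route",
}


def to_conventional(attributes):
    """Rename known ad-hoc keys; pass unknown keys through unchanged."""
    return {ATTRIBUTE_RENAMES.get(k, k): v for k, v in attributes.items()}


raw = {"method": "GET", "status": 200, "path_template": "/users/{id}", "tenant": "acme"}
normalized = to_conventional(raw)
```

Custom attributes such as `tenant` pass through untouched, so the rename only affects names that have a conventional equivalent.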

💡

You can also check out the official list here: OpenTelemetry Semantic Conventions!

Monitoring Scenarios with OpenTelemetry Metrics

OpenTelemetry metrics help teams in many practical ways:

Tracking request latency: Use histograms to measure how long API calls take and spot slow endpoints before they impact users.
Error monitoring: Count errors and get notified if failure rates suddenly jump.
Resource usage: Monitor CPU, memory, and disk usage with gauges to avoid capacity problems.
Queue depth: Keep an eye on queue sizes using up-down counters to catch processing slowdowns early.
Dependency health: Watch database or external service response times and errors to identify upstream issues.

These common use cases help you identify problems early and keep your apps running reliably.
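
For the error-monitoring case, the check behind a "failure rate jumped" alert can be sketched in a few lines. Alert rules normally live in your backend rather than in app code, so treat this as an illustration of the logic only. The function names, the 5% threshold, and the 50-request minimum are assumptions for the example.

```python
def error_rate(errors, total):
    """Fraction of failed requests; 0.0 when there was no traffic."""
    return errors / total if total else 0.0


def should_alert(windows, threshold=0.05, min_requests=50):
    """Alert when the latest window exceeds the threshold with enough traffic."""
    errors, total = windows[-1]
    return total >= min_requests and error_rate(errors, total) > threshold


# (errors, total requests) per one-minute window; values are made up.
quiet = [(1, 400), (2, 380), (1, 410)]
spike = [(1, 400), (2, 380), (45, 390)]
low_traffic = [(1, 400), (2, 3)]

alerts = [should_alert(w) for w in (quiet, spike, low_traffic)]
```

The minimum-traffic guard keeps a window with 2 errors out of 3 requests from paging anyone, even though its raw error rate is far above the threshold.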

How OpenTelemetry Works with Your Observability Tools

One of the big advantages of OpenTelemetry is that it doesn’t lock you into a single vendor.

You can export telemetry to:

Prometheus
Last9
Grafana (via Prometheus or other exporters)
Jaeger (for traces, but relevant if you’re using OTel holistically)
Zipkin (also for traces) and more

This flexibility makes it easy to standardize instrumentation while still choosing the tools that work best for your team.

Best Practices and Common Use Cases

Here are some best practices for working with OpenTelemetry metrics:

Use semantic conventions: OpenTelemetry defines semantic conventions for common metrics. Adhering to these makes metrics more interoperable and easier to understand.
Be mindful of cardinality: High-cardinality metrics can cause performance issues. Use labels judiciously.
Implement health checks: Use metrics to implement health checks for services. This can be invaluable for Kubernetes deployments.
Monitor dependencies: Use OpenTelemetry to monitor dependencies, like databases or external APIs. This can help identify bottlenecks in distributed systems.
Combine metrics with traces and logs: For a complete observability solution, use OpenTelemetry for all three pillars: metrics, traces, and logs.

Conclusion

OpenTelemetry metrics give you a straightforward way to track what's going on inside your app, no matter the size or complexity. OpenTelemetry sets a common language so you can easily gather and understand data across different parts of your system.

When you combine it with tools like Last9, you get extra help turning those metrics into clear insights, making it easier to spot and fix problems before they cause real trouble.

FAQs

What is the difference between OpenTelemetry metrics and OpenMetrics?

OpenTelemetry metrics and OpenMetrics are both open-source projects related to metrics, but they serve different purposes:

OpenTelemetry metrics is part of the larger OpenTelemetry project, which provides a complete observability framework including traces, logs, and metrics. It offers a standardized way to collect and export telemetry data across multiple languages and platforms.
OpenMetrics is a project focused on evolving the Prometheus exposition format into a standard. It's primarily about the format of metric data, rather than the collection and export process.

OpenTelemetry can export metrics in the OpenMetrics format, allowing for interoperability between the two standards.

What are telemetry metrics?

Telemetry metrics are measurements collected from software systems to monitor their performance, behavior, and health. These metrics can include things like:

Response times
Error rates
Resource utilization (CPU, memory, disk)
Business-specific measurements (e.g., number of orders processed)

Telemetry metrics are crucial for understanding system behavior, identifying issues, and making data-driven decisions about system performance and capacity.

What is OpenTelemetry data?

OpenTelemetry data refers to the telemetry data collected and processed using the OpenTelemetry framework. This includes:

Metrics: Numerical measurements of system behavior and performance.
Traces: Distributed traces that show the path of requests through a distributed system.
Logs: Timestamped records of discrete events that happened in the system.

OpenTelemetry provides a standardized way to collect, process, and export this data, making it easier to implement comprehensive observability across different languages and platforms.

What is the use of OpenTelemetry?

OpenTelemetry is used to implement observability in software systems. Its main uses include:

Performance Monitoring: Tracking system performance metrics to identify bottlenecks and optimize resources.
Error Detection: Quickly identifying and diagnosing errors in distributed systems.
Distributed Tracing: Following requests as they travel through microservices architectures.
Capacity Planning: Using historical data to predict future resource needs.
Business Intelligence: Tracking metrics that are important for business decisions.
Debugging: Providing detailed information to help developers understand and fix issues.

What are the types of OpenTelemetry metrics?

OpenTelemetry supports several types of metrics:

Counter: A cumulative metric that only increases in value (e.g., number of requests).
UpDownCounter: A metric that can both increase and decrease (e.g., number of active connections).
Histogram: A metric that samples observations and counts them in configurable buckets (e.g., request durations).
Gauge: A metric that represents a single numerical value that can arbitrarily go up and down (e.g., CPU usage).

Additionally, Counter, UpDownCounter, and Gauge come in asynchronous (observable) versions that report values through callbacks. Histogram is synchronous only.

What are some of the benefits of OpenTelemetry?

Some key benefits of OpenTelemetry include:

Standardization: Provides a single, vendor-neutral standard for telemetry data.
Language Support: Offers consistent APIs across multiple programming languages.
Flexibility: Can export data to multiple backends and supports various data formats.
Open Source: Backed by a large community and major industry players.
Reduced Vendor Lock-in: Makes it easier to switch between different observability tools.
Comprehensive: Covers metrics, traces, and logs in a single framework.
Performance: Designed to have minimal performance impact on the systems it monitors.

What is the difference between a metric and an event?

While both metrics and events are types of telemetry data, they serve different purposes:

Metrics:
- Represent numerical measurements of system behavior over time.
- Are typically aggregated (e.g., average, sum, count) over a time period.
- Are used to understand trends and patterns in system performance.
- Examples: CPU usage, request latency, error rate.
Events:
- Represent discrete occurrences at a specific point in time.
- Contain detailed information about what happened at that moment.
- Are used to understand specific actions or state changes in a system.
- Examples: User login, order placed, error occurred.

In OpenTelemetry, metrics are handled by the metrics API, while events are typically captured as part of logging or tracing.