
Nov 20th, ‘24 / 11 min read

Prometheus Metrics Types - A Deep Dive

A deep dive on different metric types in Prometheus and best practices

Updated: 15-Apr-2025.

This blog post explores the four key metric types supported by Prometheus, highlighting their use cases and the PromQL functions you can use to query them.

In Prometheus, metrics represent one or more time series, each consisting of a metric name, a set of labels, and a series of data points (with timestamps and values). Time series data is essentially a collection of data points indexed by time, which is crucial for monitoring system performance.

At its core, a metric is a quantifiable measure used to track and assess the status of a specific process or activity. Prometheus, as an open-source monitoring solution, offers a powerful data model for storing and querying this metric data.

Metrics Structure in Prometheus

Before jumping into metric types, it's important to understand how Prometheus handles metric data.

The Prometheus data model is designed to efficiently store time-series data, with each data point containing sample observations from your systems.

The structure of a metric typically includes the following key components:

  • Metric Name: An explicit identifier for the metric, often reflecting what it measures. For example, http_requests_total.
  • Labels: Key-value pairs that provide additional dimensions to the metric, enabling more detailed and specific tracking. An example label for http_requests_total could be {method="GET", endpoint="/api"}.
  • Metric Value: The actual data point representing the measurement, which could be a count, a duration, etc.
  • Timestamp: The point in time when the metric value was recorded (often added automatically by the monitoring system).

Consider a metric named user_logins_total.

This metric could have labels like {user_type="admin", location="EU"} and a numerical value indicating the total count of logins. The timestamp would denote when this count was recorded.
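Put together, such a metric might appear in Prometheus's text exposition format like this (the sample value shown is illustrative):

```
user_logins_total{user_type="admin", location="EU"} 1027
```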

💡
To get a better understanding of how Prometheus functions can be used in your queries, check out this article: Prometheus Functions.

Types of Prometheus Metrics

Prometheus, through its various client libraries including Python, Go, and Java clients, primarily deals with four types of metrics:

  1. Counter: A metric that only increases or resets to zero on restart. Ideal for tracking the number of requests, tasks completed, or errors.
  2. Gauge: This represents a value that can go up or down, like temperature or current memory usage.
  3. Histogram: Measures the distribution of events over time, such as request latencies or response sizes.
  4. Summary: Similar to histograms but provides a total count and sum of observed values.

Let's dive deeper into each of the Prometheus metric types.

Counters

A counter metric is a cumulative metric used primarily for tracking quantities that increase over time. Said simply, a counter... counts!

What are Counters Used For?

Counters are ideal for monitoring the rate of events, like:

  • Total number of HTTP requests to a web server
  • Task completions
  • Error occurrences

A counter is designed to only increase, which means its value should never decrease (except when reset to zero, usually due to a restart or reset of the process generating it).

Visualizing and Alerting with Counters

Counters are often visualized in dashboards to show trends over time, like the rate of requests to a web service. They can trigger alerts if the rate of errors or specific events exceeds a threshold, indicating potential issues in the monitored system.

Example: node_network_receive_bytes_total in Node Exporter is a counter that tracks the total number of bytes received on a network interface.

Working with Counters in PromQL

Several PromQL functions are commonly used with counters:

  • rate(): Calculates a metric's per-second average rate of increase over a given time interval
  • increase(): Calculates the cumulative increase of a metric over a given time range
  • resets(): Counts the number of times a counter has been reset during a given period
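For example, applied to the http_requests_total counter from earlier, these functions might be used as follows (illustrative queries):

```
rate(http_requests_total[5m])      # per-second request rate, averaged over the last 5 minutes
increase(http_requests_total[1h])  # total requests added over the last hour
resets(http_requests_total[1h])    # number of counter resets in the last hour
```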

Counter Reset Behavior

There are scenarios where a counter can reset. The most common reason is when the process generating the metric restarts due to a service restart, deployment, or system reboot. When this happens, the counter starts from zero again.

💡 An upcoming feature in Prometheus adds created timestamp metrics to address the long-standing issues with counter resets. See the talk from PromCon 2023.

This reset behavior is crucial for understanding how to interpret counter data. Functions like rate() or increase() in PromQL are designed to account for counter resets by detecting when the counter value decreases between scrape intervals.

Counter Implementation Example

```go
// Go example, using github.com/prometheus/client_golang/prometheus
requestCounter := prometheus.NewCounter(prometheus.CounterOpts{
    Name: "http_requests_total",
    Help: "Total number of HTTP requests",
})
prometheus.MustRegister(requestCounter)

// Increment the counter
requestCounter.Inc()
```

💡
For a closer look at how to configure Prometheus ports for optimal performance, check out this article: Prometheus Port Configuration.

Gauges

Gauges represent a metric that can increase or decrease, akin to a thermometer. They give a snapshot of a system's state at a specific point in time.

What are Gauges Used For?

Gauges are versatile and can measure values like:

  • Memory usage
  • Temperature
  • Queue sizes
  • CPU utilization
  • Current connections

Working with Gauges

Gauges are straightforward in terms of updating their value. They can be:

  • Set to a particular value at any given time
  • Incremented
  • Decremented

This flexibility makes them ideal for tracking metrics that fluctuate up and down.

Visualizing Gauges

Gauges are often visualized using line graphs in dashboards to depict their value changes over time. They are useful for observing the current state and trends of what's being measured rather than the rate of change.

Example: From the JMX Exporter, which is used for Java applications, a gauge might be employed to monitor the number of active threads in a JVM via the jvm_threads_current metric.

Analyzing Gauge Metrics with PromQL

When working with gauges, specific functions are typically used to calculate statistical measures over a time series:

  • avg_over_time() - for computing the average
  • max_over_time() - for finding the maximum value
  • min_over_time() - for the minimum value
  • quantile_over_time() - for determining percentiles within the specified period
  • delta() - for the difference in the gauge value over the time series

These functions are instrumental in analyzing the trends and variations of gauge metrics, providing valuable insights into the performance and state of monitored systems.
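For instance, applied to a gauge such as active_connections, these functions look like the following (illustrative queries):

```
avg_over_time(active_connections[10m])   # average value over the last 10 minutes
max_over_time(active_connections[10m])   # peak value in the window
delta(active_connections[10m])           # net change over the window
```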

Gauge Implementation Example

```go
// Go example
connectionGauge := prometheus.NewGauge(prometheus.GaugeOpts{
    Name: "active_connections",
    Help: "Number of active connections",
})
prometheus.MustRegister(connectionGauge)

// Set or modify the gauge
connectionGauge.Set(10)
connectionGauge.Inc()
connectionGauge.Dec()
```


Histograms

Histograms are used to sample and aggregate distributions, such as latencies. They use configurable buckets to sort measurements into predefined ranges, which can be adjusted based on your monitoring needs. Histograms are excellent for understanding the distribution of metric values and helpful in performance analysis, like tracking request latencies or response sizes.

How Histograms Work

Histograms efficiently categorize measurement data into defined intervals, known as buckets, and tally the number (i.e., a counter) of measurements that fit into each of these buckets. These buckets are pre-defined during the instrumentation stage.

A key thing to note in the Prometheus Histogram type is that the buckets are cumulative. This means each bucket counts all values less than or equal to its upper bound, providing a cumulative distribution of the data. Simply put, each bucket contains the counts of all prior buckets.

Example: Observing Response Times

Let's take an example of observing response times with buckets — We could classify request times into meaningful time buckets like:

  • 0 to 200ms - le="200" (less or equal to 200)
  • 200ms to 300ms - le="300" (less or equal to 300)
  • … and so on
  • Prometheus also adds a +Inf bucket by default (le="+Inf"), which counts all observations

Let's say our API's response time observed is 175ms; the count values for the bucket will look something like this:

Bucket      Count
0 - 200     1
0 - 300     1
0 - 500     1
0 - 1000    1
0 - +Inf    1

Here, you can see how the cumulative nature of the histogram works.

Let's say in the following observation our API's response time is 300ms; the count values will look like this:

Bucket      Count
0 - 200     1
0 - 300     2
0 - 500     2
0 - 1000    2
0 - +Inf    2

Histogram Metric Structure

It is essential to understand the structure of a histogram metric in order to query it properly.

Each bucket is exposed as a counter, which can be accessed by adding a _bucket suffix and the le label. The _count and _sum series are generated by default to support aggregate calculations such as averages and rates.

  • _count is a counter with the total number of measurements available for the said metric.
  • _sum is a counter with the total (or the sum) of all values of the measurement.

For Example:

  • http_request_duration_seconds_sum{host="example.last9.io"} 9754.113
  • http_request_duration_seconds_count{host="example.last9.io"} 6745
  • http_request_duration_seconds_bucket{host="example.last9.io", le="200"} 300
  • http_request_duration_seconds_bucket{host="example.last9.io", le="300"} 424
  • ...

Working with Histograms in PromQL

The histogram_quantile() function calculates quantiles (e.g., medians, 95th percentiles) from histograms. It takes a quantile (a value between 0 and 1) and a histogram metric as arguments and computes the estimated value at that quantile across the histogram's buckets.

For instance, histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) estimates the value below which 95% of recent observations fall, providing insight into distribution tails like request latencies.

💡
If you're looking to get more out of your Prometheus queries, check out these helpful tips: PromQL Tricks You Should Know.

Aggregating Histograms

The histogram data type can also be aggregated, i.e., combining multiple histograms into a single histogram. Suppose you're monitoring response times across different servers.

Each server emits a histogram of response times. You would aggregate these individual histograms to understand the overall response time distribution across all servers. This aggregation is done by summing up the counts in corresponding buckets across all histograms.

For example, you could use a PromQL query like this:

sum by (le) (rate(http_request_duration_seconds_bucket{endpoint="payment"}[5m]))

In this example, the sum by (le) part aggregates the counts in each bucket (le label) across all instances of the endpoint labeled "payment". The rate function is applied over a 5-minute interval ([5m]), calculating the per-second rate of increase for each bucket, which is helpful for histograms derived from counters. This query gives a unified view of the request duration distribution across all servers for the specified endpoint.

Histogram Implementation Example

```go
// Go example
requestDuration := prometheus.NewHistogram(prometheus.HistogramOpts{
    Name:    "http_request_duration_seconds",
    Help:    "HTTP request duration distribution",
    Buckets: prometheus.LinearBuckets(0.1, 0.1, 10), // 10 buckets, starting at 0.1, width of 0.1
})
prometheus.MustRegister(requestDuration)

// Observe a value
requestDuration.Observe(0.42)
```

Native Histograms

Starting with Prometheus version 2.40, experimental support for native histograms is available. A native histogram needs only a single time series, which includes a variable number of buckets along with the sum and count of observations. This feature offers significantly higher resolution while being more cost-effective.

Summaries

Summaries track the size and number of events, commonly used to calculate percentiles like the 99th percentile for latency monitoring. The total sum and count are automatically maintained for each summary metric.

What are Summaries Used For?

Summaries are ideal for calculating quantiles and averages. They are used for metrics where aggregating over time and space is essential, like request latency or transaction duration.

How Summaries Work

A summary metric automatically calculates and stores quantiles (e.g., 50th, 90th, 95th percentiles) over a sliding time window. This means it tracks both the number of observations (like requests) and their sizes (like latency), and then computes the quantiles of these observations in real-time.

A Prometheus summary typically consists of three parts:

  • The count (_count) of observed events
  • The sum of these events' values (_sum)
  • The calculated quantiles

Example of Summary Metrics

# HELP http_request_duration_seconds The duration of HTTP requests in seconds
# TYPE http_request_duration_seconds summary
http_request_duration_seconds{quantile="0.5"} 0.055
http_request_duration_seconds{quantile="0.9"} 0.098
http_request_duration_seconds{quantile="0.95"} 0.108
http_request_duration_seconds{quantile="0.99"} 0.15
http_request_duration_seconds_sum 600
http_request_duration_seconds_count 10000

Summaries vs. Histograms

Summaries are better suited when you need accurate quantiles for individual instances or components and don't intend to aggregate those quantiles across different dimensions or labels.

Histograms, by contrast, are helpful when aggregating data across multiple instances or dimensions, like calculating global request latency across several servers.

Limitations of Summaries

A significant limitation of summaries is that you cannot aggregate their quantiles across multiple instances. While you can sum the counts and sums, the quantiles are only meaningful within the context of a single summary instance.
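While quantiles can't be meaningfully aggregated, the _sum and _count series can. For example, an average request duration across all instances might be computed like this (an illustrative query, assuming the http_request_duration_seconds summary shown earlier):

```
sum(rate(http_request_duration_seconds_sum[5m]))
  /
sum(rate(http_request_duration_seconds_count[5m]))
```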

Resource Considerations

Summaries can be more resource-intensive since they compute quantiles on the fly and keep a sliding window of observations. Histograms can be more efficient regarding memory and CPU usage, especially when dealing with high-cardinality data. Since the bucket configuration is fixed, they can also be optimized for storage.

Summary Implementation Example

```go
// Go example
requestLatency := prometheus.NewSummary(prometheus.SummaryOpts{
    Name:       "http_request_processing_seconds",
    Help:       "Request processing time",
    Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
})
prometheus.MustRegister(requestLatency)

// Observe a value
requestLatency.Observe(0.27)
```

Performance and Cardinality Considerations

When working with Prometheus metrics, understanding performance implications and cardinality concerns is crucial for maintaining a healthy monitoring system.

Understanding Metric Cardinality

Cardinality refers to the number of unique time series in your Prometheus database. Each unique combination of metric name and label values creates a separate time series. High cardinality can lead to:

  • Increased storage requirements
  • Slower query performance
  • Higher memory usage in Prometheus servers
  • Potential system instability

Cardinality Impact by Metric Type

Different metric types have varying impacts on cardinality:

  • Counters and Gauges: Generally have the lowest cardinality impact since they create only one time series per unique label combination.
  • Histograms: Can significantly increase cardinality due to the creation of multiple time series:
    • One time series for each bucket (number of buckets × number of label combinations)
    • Additional time series for _count and _sum
    • For example, a histogram with 10 configured buckets and 5 label combinations can create 65 time series (11 buckets including the implicit +Inf × 5 combinations = 55, plus 5 _count and 5 _sum series)
  • Summaries: Generate fewer time series than comparable histograms since they don't use buckets, but still create multiple:
    • One time series for each configured quantile
    • Additional time series for _count and _sum
💡
For insights on troubleshooting common Prometheus challenges like cardinality and resource utilization, check out this article: Troubleshooting Common Prometheus Pitfalls.

Best Practices for Managing Cardinality

  • Label Usage:
    • Avoid high-cardinality labels like user IDs, session IDs, or timestamps
    • Use label values with bounded cardinality (e.g., HTTP status codes instead of exact error messages)
    • Consolidate similar label values when precise granularity isn't needed
  • Histogram Configuration:
    • Choose bucket counts and ranges carefully
    • More buckets = higher resolution but increased cardinality
    • Focus on ranges that matter most for your use case
  • Regular Monitoring:
    • Monitor the total number of time series in your Prometheus instance
    • Watch for unexpected cardinality increases
    • Use prometheus_tsdb_head_series metric to track active series
  • Aggregation and Recording Rules:
    • Use recording rules to pre-compute and store frequently accessed aggregations
    • This reduces query-time resource usage and can mitigate some cardinality issues
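As an example of the recording-rule approach, a rule that pre-computes a 95th-percentile latency might look like this (a sketch; the rule name and metric are illustrative):

```yaml
groups:
  - name: latency_rules
    rules:
      - record: job:http_request_duration_seconds:p95
        expr: histogram_quantile(0.95, sum by (le, job) (rate(http_request_duration_seconds_bucket[5m])))
```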

Resource Usage Considerations

  • Storage growth is directly proportional to the number of active time series
  • Memory requirements increase with the number of active series
  • Query performance degrades as cardinality increases
  • Network bandwidth between Prometheus and exporters increases with more time series
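To keep an eye on cardinality in practice, the series counts discussed above can be inspected directly in Prometheus (illustrative queries; the topk limit is adjustable):

```
prometheus_tsdb_head_series                        # number of active series in the TSDB head
topk(10, count by (__name__) ({__name__=~".+"}))   # ten metric names with the most series
```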

Balancing Detail and Performance

The key to effective Prometheus monitoring is finding the right balance between:

  • Detailed metrics that provide valuable insights
  • Controlled cardinality that maintains system performance

For most applications, being selective about labels and thoughtful about metric types will help maintain this balance.

Visualization and Integration

While Prometheus provides powerful querying capabilities through PromQL, many organizations use Grafana as their primary visualization tool for Prometheus metrics.

Grafana offers rich dashboarding capabilities and seamless integration with Prometheus data sources. You can also use Last9 to explore these metrics through user-friendly navigation and dashboards.


Summing up

The fundamentals we have covered in this post around metrics types in Prometheus will hopefully help you better grasp your monitoring setup.

In previous posts, we have covered the fundamentals of Prometheus Monitoring and Prometheus Cardinality.

If you or your team is looking to get started with Prometheus, consider hosted and managed Prometheus offerings that can help eliminate cardinality and long-term storage woes while significantly reducing your monitoring costs.

PromQL: A Developer’s Guide to Prometheus Query Language | Last9
Our developer’s guide breaks down Prometheus Query Language in an easy-to-understand way, helping you monitor and analyze your metrics like a pro.


Additional Resources

  • For more detailed information about implementing these metric types, refer to the official Prometheus docs and client-side documentation for your preferred programming language.
  • The prometheus client libraries provide comprehensive examples for different metric implementations across various programming languages like Python, Go, and Java.
