Sep 25th, ‘24 / 10 min read

Prometheus Rate Function: A Practical Guide to Using It

In this guide, we’ll walk you through the Prometheus rate function. You’ll discover how to analyze changes over time and use that information to enhance your monitoring strategy.

Prometheus is a powerful and flexible tool for observability and monitoring, with the rate() function standing out as a key feature for tracking system behavior over time.
This guide will walk you through how to use the rate() function effectively, covering its mechanics, use cases, and best practices to boost your monitoring and observability efforts, helping you build more reliable systems.

Understanding the Prometheus Rate Function

The rate() function is a key component of PromQL (Prometheus Query Language) used for analyzing the rate of change in counter metrics over time. At its core, rate() calculates the per-second average rate of increase of time series in a range vector.

rate() helps answer critical questions such as:

  • "What is the current request rate for a service?"
  • "How rapidly is CPU usage increasing over a given period?"
  • "What's the rate of errors occurring in our application?"

These insights help in understanding system performance and behavior.
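For example, each of those questions maps to a simple rate() query (the metric names here are illustrative assumptions about your setup; process_cpu_seconds_total is a standard client-library metric):

# Current request rate for a service
rate(http_requests_total[5m])

# How quickly CPU time is being consumed
rate(process_cpu_seconds_total[5m])

# Rate of 5xx errors in the application
rate(http_requests_total{status=~"5.."}[5m])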

How Scrape Intervals Influence Your Monitoring Setup

The scrape interval in Prometheus defines how frequently metrics are collected. This interval plays a crucial role in how the rate() function works and affects your results.

When using rate(counter[time_range]), Prometheus recommends that your time range should be at least 4 times the scrape interval. This ensures enough data points for accurate calculations while smoothing out irregularities.

For example, if your scrape interval is 15 seconds, your time range should be at least 1 minute (rate(counter[1m])). For metrics scraped every minute, consider using a 5-minute range (rate(counter[5m])).
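As a quick sketch of that guideline (using the http_requests_total counter from later examples purely for illustration):

# Scrape interval of 15s: range of at least 4 × 15s = 1m
rate(http_requests_total[1m])

# Scrape interval of 1m: a 5m range gives comfortable headroom
rate(http_requests_total[5m])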

Inadequate time ranges relative to scrape intervals can lead to:

  • Noisy and spiky graphs
  • Misleading trends
  • Missed counter resets

The Importance of the Rate Function in Monitoring

Understanding rates of change is crucial in monitoring systems for a few key reasons:

  1. Performance Monitoring: It helps identify sudden spikes or drops in system performance, allowing for quick detection of anomalies or potential issues.
  2. Capacity Planning: Analyzing trends in rates allows for predicting future resource needs and planning for scaling accordingly.
  3. Alerting: Rate-based alerts can catch issues before they become critical, enabling proactive problem-solving.
  4. SLO/SLA Tracking: Rates are often key components of service-level objectives and agreements, making them crucial for ensuring compliance and maintaining service quality.
  5. Trend Analysis: Rates provide valuable insights into long-term trends, helping in strategic decision-making and system optimization.

Using the Rate Function: Syntax and Basic Examples

The basic syntax of the rate() function is straightforward:

rate(metric[time_range])

Example:

Assume there's a counter metric, http_requests_total, which tracks the total number of HTTP requests to a service.

To calculate the rate of requests per second over the last 5 minutes, use the following query:

rate(http_requests_total[5m])

This query returns the per-second rate of increase for http_requests_total over the last 5 minutes.

Note: It's crucial to use rate() only with counter metrics. Applying it to gauge metrics will produce misleading results.

The Two-Sample Minimum Requirement

The rate() function requires at least two data points within the specified time range to calculate a rate. This is a fundamental requirement because rate calculation is based on the change between points over time.

If your time range is too small relative to your scrape interval, or if there are gaps in your data collection, you may see empty spots in your graphs where rate() couldn't calculate values.
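For instance, assuming a 15-second scrape interval, a 20-second window will often contain only a single sample and return nothing, while a 1-minute window has room for several:

# Frequently empty: usually only one sample falls in the window
rate(http_requests_total[20s])

# Returns a value: roughly four samples fall in the window
rate(http_requests_total[1m])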

This requirement becomes especially important when:

  • Working with sparse metrics
  • Handling service outages or collection gaps
  • Debugging missing data in dashboards

Rate vs. Irate: When to Use Each

Prometheus offers two functions for calculating rates from counter metrics, each with distinct purposes:

  • rate(): Calculates the per-second average rate over the entire time range, smoothing out fluctuations.
  • irate(): Calculates the instantaneous rate using only the last two data points, capturing rapid changes.

Choosing the Right Function:

  • Use rate() for:
    • Dashboards and trend analysis
    • Capacity planning
    • Stable metrics where consistency matters
    • Alerting on sustained issues
  • Use irate() for:
    • Highly variable metrics
    • Detecting sudden spikes
    • Real-time operational monitoring
    • When you need to see rapid changes
# Stable rate over 5 minutes
rate(http_requests_total[5m])

# Instant rate, captures rapid changes
irate(http_requests_total[5m])

The key difference is that rate() averages across your entire time range, providing stability at the cost of potentially masking brief spikes, while irate() is more responsive but produces noisier visualizations.

When to Use rate() vs. irate()

While both functions handle counter metrics, they serve different monitoring purposes:

Choosing the Right Function:

Function | Purpose | Use When
rate() | Calculates the per-second average rate over the entire time range | You need stable trends and want to smooth out spikes
irate() | Calculates the instantaneous rate using just the last two samples | You need to detect sudden spikes or rapid changes

Advanced Usage: Combining rate() with Other Functions

The true potential of the rate() function in Prometheus comes when it’s combined with other PromQL functions.

Below are a few advanced techniques to enhance your metrics analysis:

1. Calculating Request Error Rates

To calculate the ratio of 5xx errors to total requests:

sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))

This gives a clear view of your error rate by comparing 5xx errors to the total number of requests.

2. Using histogram_quantile() with Rates

To calculate the 95th percentile of request durations over the last 5 minutes:

histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

This is especially useful for tracking performance SLOs by providing a detailed view of latency.

3. Smoothing Out Spikes with avg_over_time()

To smooth out short-term fluctuations by averaging the request rate over a longer period:

avg_over_time(rate(http_requests_total[5m])[1h:])

This averages the 5-minute request rate over the last hour (a subquery evaluated at the default resolution step), smoothing short-term fluctuations and helping to track longer-term trends.

4. Comparing Rates Across Different Time Ranges

To detect sudden traffic spikes by comparing the short-term and long-term request rates:

rate(http_requests_total[5m]) / rate(http_requests_total[1h]) > 1.2

This query flags moments when the short-term rate is more than 20% above the long-term rate.

5. Calculating the Rate of Change for a Gauge Metric

For calculating the rate of change in gauge metrics (like memory usage) using deriv():

deriv(process_resident_memory_bytes{job="app"}[1h])

This estimates, using linear regression over the last hour, how quickly memory usage is changing per second.


How rate() Extrapolates Data

An important but often overlooked aspect of rate() is how it handles extrapolation. The function doesn't simply divide the raw difference between the first and last samples by the time between them. Instead, it:

  1. Computes the increase between the first and last samples in the range, adjusting for any counter resets
  2. Extrapolates that increase out to the boundaries of the time range (when the samples lie close enough to those boundaries)
  3. Divides the extrapolated increase by the range's duration to produce a per-second value

This extrapolation behavior means that if your traffic pattern changes dramatically partway through the time range, the older samples at the start of the window still anchor the calculation, potentially diluting or masking recent changes.

Understanding this behavior is crucial when interpreting graphs and setting appropriate time ranges for your specific monitoring needs.
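A useful consequence of this shared logic is that increase() behaves like rate() multiplied by the range's length in seconds, so both are affected by extrapolation in the same way:

# These two expressions return (approximately) the same value
increase(http_requests_total[5m])
rate(http_requests_total[5m]) * 300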

Advanced Applications of rate()

Beyond basic monitoring, the rate() function enables sophisticated observability patterns that can transform your Prometheus metrics into actionable insights.

Common Advanced Patterns

Pattern | Example | Use Case
Multi-instance Aggregation | sum(rate(http_requests_total{job="api"}[5m])) | Measure total request rate across all API servers
Error Percentage Calculation | sum(rate(http_requests_total{status="error"}[5m])) / sum(rate(http_requests_total[5m])) * 100 | Track error rates as a percentage of total traffic
Top Consumer Identification | topk(5, sum by(user) (rate(api_requests_total[10m]))) | Find users generating the most traffic
Historical Comparison | rate(http_requests_total[5m]) / rate(http_requests_total[5m] offset 1d) | Compare current traffic to the same time yesterday

These patterns form the building blocks for comprehensive monitoring systems that can detect anomalies, track performance regressions, and identify optimization opportunities in your infrastructure and applications.


5 Common Pitfalls and How to Avoid Them

When working with the rate() function, keep an eye on these common issues:

  1. Too Short Time Ranges: Using a time range that's close to (or shorter than) the scrape interval leaves rate() with too few samples, producing gaps or noisy graphs. A good rule of thumb is to use a time range at least 4x the scrape interval. For instance, if your scrape interval is 15 seconds, set the time range in your rate() function to at least 1 minute.
  2. Ignoring Counter Resets: The rate() function can handle counter resets (e.g., when a service restarts), but these resets can still cause temporary spikes. Be mindful of this when interpreting data, as the rate is calculated from the available information.
  3. Misunderstanding Aggregation: Applying rate() to counters that have already been summed across series breaks counter-reset handling and can hide individual instance restarts. Apply rate() to each series first and then aggregate the results, e.g. sum(rate(...)); see the sketch after this list.
  4. Inappropriate Use with Gauges: The rate() function is specifically for counters. Using it with gauge metrics will yield incorrect results, so avoid this combination.
  5. Neglecting Label Changes: Frequent changes in label values can create gaps in your data, leading to inaccurate rate calculations. Always account for potential label changes when working with metrics.
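To make pitfall 3 concrete, here's a minimal sketch contrasting the two aggregation orders (using http_requests_total as an example):

# Correct: take the rate per series, then aggregate
sum(rate(http_requests_total[5m]))

# Problematic: aggregating first hides per-instance counter resets
rate(sum(http_requests_total)[5m:])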

Practical Example: Monitoring API Request Rates

Let's consider a scenario where you need to monitor request rates for different API endpoints. Here's how it can be set up:

  1. Instrument the API: Expose a counter metric, api_requests_total, with labels for the endpoint and method.
  2. Create a Grafana Dashboard: Use the following PromQL query to visualize the request rates:
sum by (endpoint) (rate(api_requests_total{job="api-server"}[5m]))

This query calculates the request rate for each endpoint over the last 5 minutes and sums up the rates for all methods.

  3. Set Up Alerts: Trigger an alert if the request rate for any endpoint exceeds 100 requests per second:
sum by (endpoint) (rate(api_requests_total{job="api-server"}[5m])) > 100

This setup provides a real-time view of API usage patterns, enabling quick identification of bottlenecks and helping optimize heavily used endpoints.

Best Practices for Using the rate() Function

To make the most of the rate() function in Prometheus, keep these best practices in mind:

  • Choose the Right Time Range: Ensure the time range is long enough to capture meaningful trends while remaining short enough to respond to real-time changes.
  • Alerting with Care: When using rate() in alerts, opt for longer time ranges to reduce the risk of false positives caused by short-term fluctuations.
  • Use Other Functions: As highlighted in advanced examples, combining rate() with other PromQL functions can provide richer insights into your data.
  • Know Your Metrics: Understand the nature of your data—such as how often metrics update and their variability—to ensure accurate monitoring.
  • Test Thoroughly: Validate your queries with historical data to ensure they're reliable under different conditions and scenarios.

Best Practices for Selecting Time Ranges

Choosing the right time range for rate() involves balancing between smoothing and responsiveness:

  • For high-frequency metrics (scraped every 15s), use 1-2 minutes for alerting and 5-10 minutes for dashboards
  • For standard metrics (scraped every minute), use 5 minutes for alerting and 10-30 minutes for dashboards
  • For slow-changing metrics, longer ranges (up to several hours) may be appropriate

Tips for selecting the optimal range:

  • Start with 4x your scrape interval as a minimum
  • Increase the range if graphs appear too noisy
  • Decrease the range if you need to detect changes more quickly
  • Consider using different ranges for alerting vs. dashboarding
  • Test different ranges under various load conditions to find what works best

Remember that longer ranges provide more stability but mask short-term changes, while shorter ranges show more detail but may introduce noise.
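As a sketch of that trade-off, the same metric might be queried with a shorter range for alerting and a longer one for dashboards (the ranges and threshold here are illustrative):

# Alert expression: shorter range, reacts faster
sum(rate(http_requests_total[2m])) > 100

# Dashboard panel: longer range, smoother trend
sum(rate(http_requests_total[10m]))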


Conclusion

The Prometheus rate() function is a fundamental tool for monitoring, offering versatility from tracking request rates to analyzing performance and error metrics. Its strength lies in revealing the rate of change across various metrics, making it vital for any observability strategy.

💡
And if you’d like to discuss anything further, our Discord community is open. We have a dedicated channel where you can share your use case and connect with other developers.

FAQs

Q: How does the Prometheus rate() function differ from increase()?
A: While rate() calculates the per-second average rate of increase, increase() calculates the total increase in the counter's value over the time range. rate() is generally more useful for ongoing monitoring, while increase() can help understand total change over a specific period.

Q: How do you calculate request rates using the Prometheus rate function?
A: To calculate request rates, use a query like rate(http_requests_total[5m]). This will give the per-second rate of requests over the last 5 minutes. These rates can be summed or grouped as needed, e.g., sum(rate(http_requests_total[5m])) for the total request rate across all instances.

Q: Are there approaches for capturing spikes with PromQL?
A: Yes, max_over_time() can be used with rate() to capture spikes. For example, max_over_time(rate(http_requests_total[5m])[1h:]) will show the maximum rate observed in 5-minute windows over the last hour.

Q: How do you calculate the increase of a counter over time using Prometheus functions?
A: To calculate the total increase of a counter, use the increase() function. For example, increase(http_requests_total[1h]) will show the total increase in the number of requests over the last hour.

Q: Can rate() be used with all types of Prometheus metrics?
A: No, rate() should only be used with counter metrics. It doesn't make sense to use rate() with gauge metrics, as they don't represent cumulative values; for gauges, use functions such as deriv() or delta() instead.

Authors

Anjali Udasi

Helping to make tech a little less intimidating. I love breaking down complex concepts into easy-to-understand terms.